CN112882830A - Video memory management method, video memory management device, model training device, electronic equipment and storage medium - Google Patents

Info

Publication number
CN112882830A
CN112882830A (application number CN202110150321.XA)
Authority
CN
China
Prior art keywords
video memory
training
tensor
target
value
Prior art date
Legal status
Pending
Application number
CN202110150321.XA
Other languages
Chinese (zh)
Inventor
邓哲也
章玄润
高华佐
Current Assignee
Beijing Megvii Technology Co Ltd
Original Assignee
Beijing Megvii Technology Co Ltd
Priority date
Filing date
Publication date
Application filed by Beijing Megvii Technology Co Ltd filed Critical Beijing Megvii Technology Co Ltd
Priority to CN202110150321.XA priority Critical patent/CN112882830A/en
Publication of CN112882830A publication Critical patent/CN112882830A/en
Pending legal-status Critical Current

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00Arrangements for program control, e.g. control units
    • G06F9/06Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/46Multiprogramming arrangements
    • G06F9/50Allocation of resources, e.g. of the central processing unit [CPU]
    • G06F9/5005Allocation of resources, e.g. of the central processing unit [CPU] to service a request
    • G06F9/5011Allocation of resources, e.g. of the central processing unit [CPU] to service a request the resources being hardware resources other than CPUs, Servers and Terminals
    • G06F9/5022Mechanisms to release resources
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/08Learning methods
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T1/00General purpose image data processing
    • G06T1/20Processor architectures; Processor configuration, e.g. pipelining
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T1/00General purpose image data processing
    • G06T1/60Memory management

Abstract

The invention discloses a video memory management method and device, a model training method and device, an electronic device, and a storage medium. The video memory management method comprises the following steps: acquiring the video memory threshold corresponding to the current round of model training; determining a target tensor that satisfies a tensor selection rule when the video memory occupancy value of the electronic device is greater than the video memory threshold; and releasing the video memory occupied by the target tensor. By releasing tensors whose removal does not affect normal model training, the method reduces GPU video memory occupancy without obtaining global information about the whole computational graph in advance. It is therefore fully dynamic and realizes video memory management under deep learning frameworks with a dynamic graph mechanism.

Description

Video memory management method, video memory management device, model training device, electronic equipment and storage medium
Technical Field
The invention relates to the technical field of deep learning, and in particular to a video memory management method and device, a model training method and device, an electronic device, and a storage medium.
Background
In the field of deep learning, growing training data has greatly increased model size and complexity. During model training, a frequently encountered difficulty is that limited GPU (Graphics Processing Unit) video memory resources cannot meet the demands of training with a large batch size. This poses a new challenge for deep learning frameworks: whether limited computing and storage resources can be used effectively during model training, and in particular whether GPU video memory occupancy can be reduced, is an important index for evaluating the performance of a deep learning framework.
In the related art, deep learning frameworks reduce video memory occupancy in several ways: through suitable gradient definitions, so that the backward gradient computation of operators such as ReLU and Sigmoid does not depend on the forward output as input, allowing that part of the video memory to be released once forward computation completes; by computing the life cycle of each operator, so that operators with non-overlapping life cycles can share video memory; by trading extra data transfer for memory, for example swapping temporarily unused data from the GPU to the CPU (Central Processing Unit) and swapping it back when needed; or by trading extra computation for memory, for example the sub-linear video memory optimization method that recomputes intermediate results using gradient checkpointing.
However, all of the above methods need global information about the computation graph in advance, which requires the computation graph of the deep learning framework to be static. They are therefore unavailable to deep learning frameworks with a dynamic graph mechanism, so providing a video memory management method for such frameworks is a technical problem that those skilled in the art urgently need to solve.
Disclosure of Invention
The embodiments of the invention provide a video memory management method, a model training method, a video memory management device, a model training device, an electronic device, and a storage medium, aiming to solve the technical problem that existing video memory management methods are unavailable on deep learning frameworks with a dynamic graph mechanism.
According to a first aspect of the present invention, a video memory management method is disclosed, the method comprising:
acquiring a video memory threshold corresponding to the current round of training of the model;
determining a target tensor that satisfies a tensor selection rule when the video memory occupancy value of the electronic device is greater than the video memory threshold;
and releasing the video memory occupied by the target tensor.
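Taken together, the three steps above amount to a release loop. The following Python sketch is illustrative only; the helper names and the dict-based tensor representation are assumptions, not identifiers from the patent:

```python
def manage_video_memory(tensors, occupied_mb, threshold_mb, evaluate):
    """Release tensors until video memory occupancy drops to the threshold.

    tensors: list of dicts with a 'size_mb' entry; evaluate scores each
    tensor, and the highest-scoring one is released first (standing in
    for the "tensor selection rule"). All names here are illustrative.
    """
    released = []
    pool = list(tensors)
    while occupied_mb > threshold_mb and pool:
        target = max(pool, key=evaluate)   # tensor meeting the selection rule
        pool.remove(target)
        occupied_mb -= target['size_mb']   # release its video memory
        released.append(target)
    return occupied_mb, released
```

With the selection rule realized as "largest evaluation value first", the loop keeps releasing until the occupancy value no longer exceeds the threshold.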
Optionally, as some embodiments, the determining the target tensor satisfying the tensor selection rule includes:
calculating, according to a target evaluation function, an evaluation function value for each tensor in the video memory;
and determining the tensor with the largest evaluation function value as the target tensor.
Optionally, as some embodiments, the calculating, according to the target evaluation function, an evaluation function value for each tensor in the video memory includes:
calculating, according to the target evaluation function, an evaluation function value for each unlocked tensor in the video memory;
and the determining the tensor with the largest evaluation function value as the target tensor includes:
determining the unlocked tensor with the largest evaluation function value as the target tensor.
Optionally, as some embodiments, the method further comprises:
and determining the target evaluation function according to two or more of: the size of the video memory occupied by the tensor, the duration for which the tensor has occupied the video memory, the computation cost of the tensor, and the recomputation count of the tensor.
Optionally, as some embodiments, the target evaluation function is:
(formula published as an image in the original document: f(t), a function of M(t), L(t), C(t) and R(t) parameterized by α, β, γ and δ)
where t is a tensor, M(t) is the size of the video memory occupied by t, L(t) is the duration for which t has occupied the video memory, C(t) is the computation cost of t, R(t) is the recomputation count of t, and α, β, γ and δ are hyper-parameters of the target evaluation function.
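The formula itself appears only as an image in the published document, so its exact form is not recoverable here. The sketch below assumes a DTR-style combination of the four listed quantities that is consistent with the selection rule (the tensor with the largest value is released, so large, long-idle, cheap, rarely-recomputed tensors score high); the exponent structure is a labeled guess, not the patent's verbatim definition:

```python
def evaluation_value(m_t, l_t, c_t, r_t,
                     alpha=1.0, beta=1.0, gamma=1.0, delta=0.5):
    """Assumed form of the target evaluation function f(t).

    m_t: M(t), video memory occupied by t (MB); l_t: L(t), duration t
    has occupied the memory; c_t: C(t), computation cost (ms); r_t:
    R(t), recomputation count. The combination below is illustrative.
    """
    return (m_t ** alpha) * (l_t ** beta) / ((c_t ** gamma) * ((r_t + 1) ** delta))
```

The defaults match the first-round values the patent states (α, β, γ equal to 1 and δ equal to 1/2).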
Optionally, as some embodiments, when the current round of training of the model is the first round, the video memory threshold is half of the video memory capacity, α = β = γ = 1, and δ = 1/2.
Optionally, as some embodiments, the method further comprises:
acquiring the number of times the video memory occupancy value exceeded the video memory threshold during the N-th round of training;
and adjusting the video memory threshold corresponding to the (N+1)-th round of training according to that number, where N is an integer greater than or equal to 1.
Optionally, as some embodiments, the adjusting the video memory threshold corresponding to the (N+1)-th round of training according to the number of times the video memory occupancy value exceeded the video memory threshold includes:
increasing the video memory threshold corresponding to the (N+1)-th round of training when that number is greater than a first count threshold;
and decreasing the video memory threshold corresponding to the (N+1)-th round of training when that number is not greater than the first count threshold.
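A minimal sketch of this per-round threshold update; the patent does not state adjustment amounts, so the step size here is an assumption:

```python
def adjust_threshold(threshold_mb, exceed_count, first_count_threshold,
                     step_mb=256):
    """Raise the threshold for round N+1 when round N exceeded it more
    than first_count_threshold times; otherwise lower it. step_mb is
    an illustrative assumption, not a value from the patent."""
    if exceed_count > first_count_threshold:
        return threshold_mb + step_mb
    return max(threshold_mb - step_mb, 0)
```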
Optionally, as some embodiments, the method further comprises:
acquiring the number of times the requested space during the N-th round of training exceeded the largest free video memory fragment, and/or the percentage of the total training time spent on recomputation;
and adjusting the values of the hyper-parameters in the target evaluation function corresponding to the (N+1)-th round of training according to that number and/or that percentage.
Optionally, as some embodiments, the adjusting of the hyper-parameter values in the target evaluation function corresponding to the (N+1)-th round of training includes at least one of the following steps:
increasing the value of α corresponding to the (N+1)-th round of training when the number of times the requested space exceeded the largest free video memory fragment is greater than a second count threshold;
and decreasing the value of γ corresponding to the (N+1)-th round of training when the percentage of training time spent on recomputation increased compared with the previous round of model training.
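Both adjustment rules can be sketched as one update function; the step size and the comparison thresholds are illustrative assumptions:

```python
def adjust_hyperparams(alpha, gamma,
                       frag_fail_count, second_count_threshold,
                       recompute_pct, prev_recompute_pct, step=0.1):
    """Per-round hyper-parameter update for the evaluation function.

    Raise alpha when requested allocations exceeded the largest free
    fragment too often; lower gamma when the share of training time
    spent recomputing grew versus the previous round. step is an
    illustrative assumption."""
    if frag_fail_count > second_count_threshold:
        alpha += step
    if recompute_pct > prev_recompute_pct:
        gamma -= step
    return alpha, gamma
```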
Optionally, as some embodiments, the method further comprises:
calculating the time-cost parameter of the N-th round of training, namely recomputation time divided by original computation time;
and running a simulated annealing algorithm based on the time-cost parameter to adjust the video memory threshold corresponding to the (N+1)-th round of training and/or the values of the hyper-parameters in the target evaluation function.
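The patent does not give the annealing schedule or the cost function, so the following pairs the stated time-cost parameter with a generic simulated-annealing acceptance step over the tunable values, purely as an illustration:

```python
import math
import random

def time_cost_param(recompute_ms, original_ms):
    """The round-N time-cost parameter: recomputation time / original time."""
    return recompute_ms / original_ms

def anneal_step(params, cost_fn, temperature, rng):
    """One simulated-annealing step: perturb the tunables (threshold and
    hyper-parameters), accept if the cost drops, otherwise accept with
    probability exp(-delta / temperature). The perturbation range and
    cost function are illustrative assumptions."""
    candidate = {k: v * (1 + rng.uniform(-0.1, 0.1)) for k, v in params.items()}
    delta = cost_fn(candidate) - cost_fn(params)
    if delta < 0 or rng.random() < math.exp(-delta / temperature):
        return candidate
    return params
```

In practice cost_fn would be built from the time-cost parameter, so that configurations causing less recomputation overhead score lower.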
Optionally, as some embodiments, the acquisition of the computation cost of the tensor includes:
reading, from a cache, the historical computation cost of a historical tensor that has the same operator and/or the same input shape as the tensor;
and determining the read historical computation cost as the computation cost of the tensor.
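A sketch of this cost cache, keyed by operator name and input shape (the key structure and the measurement fallback are assumptions):

```python
# Hypothetical cost cache keyed by (operator name, input shape); the
# patent reuses the historical cost of a tensor whose operator and/or
# input shape matches the current tensor's.
_cost_cache = {}

def compute_cost(op_name, input_shape, measure):
    """Return the cached historical cost if present; otherwise call
    measure() once (standing in for actually timing the operator) and
    cache the result."""
    key = (op_name, tuple(input_shape))
    if key not in _cost_cache:
        _cost_cache[key] = measure()
    return _cost_cache[key]
```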
Optionally, as some embodiments, the acquisition of the duration for which the tensor has occupied the video memory includes:
acquiring the number of operators being executed in the current round of training and the time at which the tensor entered the video memory;
and determining the duration for which the tensor has occupied the video memory according to the number of operators being executed and the time at which the tensor entered the video memory.
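One way to realize this is to use the count of executed operators as a logical clock, so that a tensor's occupancy duration is the number of operators executed since it entered the video memory. This interpretation is an assumption:

```python
class Clock:
    """Logical clock driven by operator executions. A tensor's entry
    'time' is the operator count when it was produced; its occupancy
    duration L(t) is the operators executed since then (an assumed
    realization, not the patent's exact mechanism)."""
    def __init__(self):
        self.ops_executed = 0

    def tick(self):
        """Call once per operator execution."""
        self.ops_executed += 1

    def duration(self, entry_count):
        """L(t) for a tensor that entered at entry_count."""
        return self.ops_executed - entry_count
```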
Optionally, as some embodiments, the method further comprises:
determining whether the current round of training contains a target operator execution sequence, and if so, determining the tensors corresponding to the target operator execution sequence as target tensors;
wherein the target operator execution sequence is an operator execution sequence whose corresponding tensors have release records from the historical training process.
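A sketch of this lookup, matching the current round's operator sequence against sequences whose tensors were released in earlier rounds (contiguous-subsequence matching is an illustrative choice):

```python
def find_released_sequence(current_ops, released_sequences):
    """Return the first historically released operator sequence that
    occurs as a contiguous subsequence of the current round's operator
    list, or None. Matching granularity is an assumption."""
    for seq in released_sequences:
        n = len(seq)
        for i in range(len(current_ops) - n + 1):
            if current_ops[i:i + n] == seq:
                return seq
    return None
```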
Optionally, as some embodiments, after the step of releasing the video memory occupied by the target tensor, the method further includes:
and storing tensors newly generated in the current round of training in the released video memory.
According to a second aspect of the invention, a method of model training is disclosed, the method comprising:
acquiring a training sample set, wherein the training sample set comprises training data used for model training;
and performing model training based on the training sample set and the initial model, and in the model training process, managing tensors generated by training based on the video memory management method in the first aspect until a target model is obtained by training.
According to a third aspect of the present invention, a video memory management apparatus is disclosed, the apparatus comprising:
the first acquisition module is used for acquiring a video memory threshold corresponding to the current round of training of the model;
the first determining module is used for determining a target tensor meeting a tensor selection rule under the condition that the video memory occupation value of the electronic equipment is larger than the video memory threshold value;
and the releasing module is used for releasing the video memory occupied by the target tensor.
Optionally, as some embodiments, the first determining module includes:
the calculation submodule is used for calculating, according to the target evaluation function, an evaluation function value for each tensor in the video memory;
and the determining submodule is used for determining the tensor with the largest evaluation function value as the target tensor.
Optionally, as some embodiments, the computation submodule includes:
the calculation unit is used for calculating an evaluation function value corresponding to the unlocked tensor in the video memory according to the target evaluation function;
the determination sub-module includes:
and a determining unit configured to determine an unlocked tensor having a maximum evaluation function value as the target tensor.
Optionally, as some embodiments, the apparatus further comprises:
and the second determining module is used for determining the target evaluation function according to two or more of: the size of the video memory occupied by the tensor, the duration for which the tensor has occupied the video memory, the computation cost of the tensor, and the recomputation count of the tensor.
Optionally, as some embodiments, the target evaluation function is:
(formula published as an image in the original document: f(t), a function of M(t), L(t), C(t) and R(t) parameterized by α, β, γ and δ)
where t is a tensor, M(t) is the size of the video memory occupied by t, L(t) is the duration for which t has occupied the video memory, C(t) is the computation cost of t, R(t) is the recomputation count of t, and α, β, γ and δ are hyper-parameters of the target evaluation function.
Optionally, as some embodiments, when the current round of training of the model is the first round, the video memory threshold is half of the video memory capacity, α = β = γ = 1, and δ = 1/2.
Optionally, as some embodiments, the apparatus further comprises:
the second acquisition module is used for acquiring the number of times the video memory occupancy value exceeded the video memory threshold during the N-th round of training;
and the first adjusting module is used for adjusting the video memory threshold corresponding to the (N+1)-th round of training according to that number, where N is an integer greater than or equal to 1.
Optionally, as some embodiments, the first adjusting module includes:
the first adjusting submodule is used for increasing the video memory threshold corresponding to the (N+1)-th round of training when the number of times the video memory occupancy value exceeded the video memory threshold is greater than a first count threshold;
and the second adjusting submodule is used for decreasing the video memory threshold corresponding to the (N+1)-th round of training when that number is not greater than the first count threshold.
Optionally, as some embodiments, the apparatus further comprises:
the third acquisition module is used for acquiring the number of times the requested space during the N-th round of training exceeded the largest free video memory fragment, and/or the percentage of the total training time spent on recomputation;
and the second adjusting module is used for adjusting the values of the hyper-parameters in the target evaluation function corresponding to the (N+1)-th round of training according to that number and/or that percentage.
Optionally, as some embodiments, the second adjusting module includes at least one of the following sub-modules:
a third adjusting submodule, configured to increase the value of α corresponding to the (N+1)-th round of training when the number of times the requested space exceeded the largest free video memory fragment is greater than a second count threshold;
and a fourth adjusting submodule, configured to decrease the value of γ corresponding to the (N+1)-th round of training when the percentage of training time spent on recomputation increased compared with the previous round of model training.
Optionally, as some embodiments, the apparatus further comprises:
the calculating module is used for calculating the time-cost parameter of the N-th round of training, namely recomputation time divided by original computation time;
and the third adjusting module is used for running a simulated annealing algorithm based on the time-cost parameter and adjusting the video memory threshold corresponding to the (N+1)-th round of training and/or the values of the hyper-parameters in the target evaluation function.
Optionally, as some embodiments, the apparatus further comprises:
the reading module is used for reading the historical calculation cost of the historical tensor which has the same operator and/or the same input shape with the tensor from the cache;
and the third determining module is used for determining the read historical calculation cost as the calculation cost corresponding to the tensor.
Optionally, as some embodiments, the apparatus further comprises:
the fourth acquisition module is used for acquiring the number of operators being executed in the current round of training and the time of the tensor entering the video memory;
and the fourth determining module is used for determining the time length of the tensor occupying the video memory according to the number of the operators being executed and the time of the tensor entering the video memory.
Optionally, as some embodiments, the apparatus further comprises:
a fifth determining module, configured to determine whether the current round of training contains a target operator execution sequence, and if so, determine the tensors corresponding to the target operator execution sequence as target tensors;
wherein the target operator execution sequence is an operator execution sequence whose corresponding tensors have release records from the historical training process.
Optionally, as some embodiments, the apparatus further comprises:
and the storage module is used for storing tensors newly generated in the current round of training in the released video memory.
According to a fourth aspect of the present invention, there is disclosed a model training apparatus, the apparatus comprising:
the second acquisition module is used for acquiring a training sample set, wherein the training sample set comprises training data used for model training;
and the training module is used for carrying out model training based on the training sample set and the initial model, and managing tensors generated by training based on the video memory management device in the third aspect in the model training process until a target model is obtained by training.
According to a fifth aspect of the present invention, an electronic apparatus is disclosed, comprising: a memory, a processor, and a program stored on the memory and executable on the processor, wherein the program, when executed by the processor, performs the steps of the video memory management method of the first aspect or of the model training method of the second aspect.
According to a sixth aspect of the present invention, a computer readable storage medium is disclosed, having stored thereon a program which, when executed by a processor, carries out the steps of the video memory management method of the first aspect or of the model training method of the second aspect.
In the embodiments of the invention, the video memory threshold corresponding to the current round of model training can be obtained; when the video memory occupancy value of the electronic device is greater than that threshold, a target tensor satisfying the tensor selection rule is determined and the video memory it occupies is released. By releasing tensors whose removal does not affect normal model training, GPU video memory occupancy is reduced without obtaining global information about the whole computational graph in advance; the approach is fully dynamic and realizes video memory management under deep learning frameworks with a dynamic graph mechanism. Moreover, under such a framework, a user can train a model twice as large as before, on a machine with unchanged video memory capacity and without modifying the original training code, in a constant multiple of the original training time, so model training efficiency is higher.
Drawings
FIG. 1 is a flow diagram of a video memory management method of some embodiments of the invention;
FIG. 2 is an exemplary diagram of the correspondence between tensors in Python and tensors in C++ according to some embodiments of the present invention;
FIG. 3 is an exemplary diagram of tensor changes in the video memory when the video memory can hold only three tensors, according to some embodiments of the present invention;
FIG. 4 is a flow chart of a model training method of some embodiments of the present invention;
fig. 5 is a schematic structural diagram of a video memory management apparatus according to some embodiments of the present invention;
FIG. 6 is a schematic diagram of a model training apparatus according to some embodiments of the present invention.
Detailed Description
In order to make the aforementioned objects, features and advantages of the present invention comprehensible, embodiments accompanied with figures are described in further detail below.
It should be noted that, for simplicity of description, the method embodiments are described as a series of acts, but those skilled in the art will recognize that the present invention is not limited by the order of acts described, as some steps may be performed in other orders or concurrently in accordance with the embodiments of the present invention. Further, those skilled in the art will appreciate that the embodiments described in the specification are preferred embodiments, and that not every act involved is necessarily required by the present invention.
In the field of deep learning, growing training data has greatly increased model size and complexity, and a dilemma frequently encountered is that limited GPU video memory resources cannot meet the demands of training a model with a large batch size. This poses a new challenge for deep learning frameworks: whether limited computing and storage resources can be used effectively during model training, and in particular whether GPU video memory occupancy can be reduced, is an important index for evaluating the performance of a deep learning framework.
In the related art, given fixed computing and storage resources, deep learning frameworks have several methods for reducing video memory occupancy, specifically as follows:
Method 1: through suitable gradient definitions, the backward gradient computation of operators such as ReLU and Sigmoid does not depend on the forward output as input, so that part of the video memory can be released once forward computation completes.
Method 2: computing the life cycle of each operator; operators with non-overlapping life cycles can share video memory.
Method 3: reducing video memory occupancy through extra data transfer, for example swapping temporarily unused data from the GPU to the CPU and swapping it back when needed.
Method 4: reducing video memory occupancy through extra computation, for example the sub-linear video memory optimization method that recomputes intermediate results using gradient checkpointing.
The drawback of the above methods is that global information about the computation graph must be obtained in advance, which requires the computation graph to be static. The benefit of static graphs is that the structure of the neural network is generated when the program is compiled, which lets the compiler optimize to the maximum extent, for example computing the life cycle of each operator to allocate memory better, or computing optimal gradient checkpoints with search algorithms. However, this also means there is a large gap between what the compiler actually executes and the program the user expects to execute, so errors in the code are harder to find. By contrast, a program under a dynamic graph executes exactly in the order in which the user wrote the commands; it is easy to debug and more user-friendly. Thus, for convenience, users increasingly choose flexible frameworks with dynamic graph mechanisms, such as PyTorch and MegEngine.
It is precisely because of the flexibility of dynamic graphs that the above techniques do not migrate well from static graphs to dynamic graphs. For example, Method 2 is infeasible on a dynamic graph: the future computation order cannot be obtained, so the life cycle of each operator cannot be computed. Method 4 is likewise infeasible: a dynamic graph cannot provide information about the whole computation graph from the start, so optimal gradient checkpoints cannot be computed. The above video memory management methods are therefore not applicable to deep learning frameworks with a dynamic graph mechanism.
In order to solve the above technical problems, embodiments of the present invention provide a method and an apparatus for video memory management and model training, an electronic device, and a storage medium.
First, a video memory management method according to an embodiment of the present invention is described below.
It should be noted that the video memory management method provided in the embodiments of the present invention is applicable to an electronic device; in practical applications, the electronic device may be, for example, a server.
Fig. 1 is a flowchart of a video memory management method according to some embodiments of the present invention. As shown in fig. 1, the method may include steps 101, 102 and 103:
in step 101, a video memory threshold corresponding to the current round of training of the model is obtained.
For the convenience of understanding, the deep learning framework and the related contents of model training based on the deep learning framework involved in the embodiments of the present invention are described with reference to an example.
Before actual model training, a user deploys a deep learning framework supporting dynamic graphs on the machine that performs the training (mainly a server, which is used as the example below) and writes the corresponding model code in Python; for example, a user who wants to train an object classification model writes the Python code for that model. The written model code is then submitted to the deep learning framework on the server for execution. The framework provides an interpreter, which splits the model code into many small tasks and sends them to the server's GPU for model training, for example by passing kernel functions in the model code to the GPU. During training, data related to the training occupies GPU video memory; for example, tensors generated during training are stored in the video memory. The object of the embodiments of the present invention is to optimize this video memory occupancy during model training.
In the embodiment of the invention, one function supported by the interpreter is to bind the tensors in the model code one-to-one to tensor types in the C++ code, so that when some of the video memory occupied on the GPU needs to be released, this can be achieved simply by operating on the tensor structure in the C++ code.
As shown in fig. 2, which illustrates the correspondence between a tensor in the Python code and a tensor in C++, and taking a tensor t as an example, some auxiliary information may be recorded in the C++ tensor type: computation history and attribute values. The computation history includes the operator that computed t and all input tensors of that operator. The attribute values include: M(t), the size of the video memory occupied by t (unit: MB); L(t), the duration for which t has occupied the video memory; C(t), the computation cost of t (unit: ms); and R(t), the number of times t has been recomputed.
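The auxiliary information above can be sketched as a small bookkeeping structure. The following Python dataclass mirrors what the C++ tensor type is described as recording; the field names are illustrative, not taken from the patent:

```python
from dataclasses import dataclass, field
from typing import Callable, List, Optional

@dataclass
class TrackedTensor:
    # Attribute values recorded per tensor, mirroring the description above.
    size_mb: float                    # M(t): video memory occupied by t, in MB
    entry_stamp: int = 0              # when t entered the video memory (used for L(t))
    compute_cost_ms: float = 1.0      # C(t): cost of computing t, in ms
    recompute_count: int = 0          # R(t): number of times t has been recomputed
    # Computation history: the operator that produced t and all of its inputs.
    op: Optional[Callable] = None
    inputs: List["TrackedTensor"] = field(default_factory=list)
    in_memory: bool = True            # False once t's video memory is released
    locked: bool = False              # True while an operator consuming t is running
```

Only the computation history (operator plus inputs) is needed to rebuild a released tensor; the attribute values exist purely to guide the selection rule.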
In the embodiment of the present invention, each round of model training may be taken as a processing unit. Considering that a basic model (also referred to as an "initial model" or "backbone network") needs to be trained for multiple rounds to obtain the target model, and that the video memory occupancy of each round of training usually differs, the video memory occupancy of each round may be optimized with the same or with different video memory optimization parameters, according to the model, the video memory occupancy, and so on.
In the embodiment of the invention, during model training, the current training round number of the model is obtained each time a round of training starts, that is, which round the model has currently been trained to.
To prevent the video memory from being exhausted, in the embodiment of the invention a video memory threshold is set for each round of training. When the video memory occupancy value of the electronic device exceeds the video memory threshold, a target tensor that does not affect the subsequent training process of the model is searched for, according to the tensor selection rule, among the tensors stored in the video memory, and that target tensor is released from the video memory; the search and release operations are repeated until the video memory occupancy value of the electronic device falls below the video memory threshold.
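The search-and-release loop just described can be sketched as follows. Here `pick_target` stands in for whatever tensor selection rule is in force (a hypothetical helper), and tensors are plain objects with `size_mb` and `in_memory` fields:

```python
from types import SimpleNamespace

def enforce_threshold(tensors, occupancy_mb, threshold_mb, pick_target):
    # Repeat the search and release operations until occupancy falls
    # below the per-round video memory threshold.
    while occupancy_mb > threshold_mb:
        victim = pick_target([t for t in tensors if t.in_memory])
        if victim is None:
            break                       # nothing evictable remains
        victim.in_memory = False        # release its video memory; the
        occupancy_mb -= victim.size_mb  # computation path is kept for recovery
    return occupancy_mb

# Example selection rule: always evict the largest resident tensor.
pool = [SimpleNamespace(size_mb=s, in_memory=True) for s in (4.0, 2.0, 1.0)]
largest = lambda ts: max(ts, key=lambda t: t.size_mb) if ts else None
remaining = enforce_threshold(pool, occupancy_mb=7.0, threshold_mb=3.0,
                              pick_target=largest)
```

One eviction of the 4 MB tensor is enough here to bring occupancy down to the 3 MB threshold, so the loop stops after a single pass.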
In the embodiment of the present invention, for a released tensor, if the user needs to access it later, or it is needed during backward derivation, its original value may be restored from its computation path (stored in CPU memory), i.e., from its operator and all of its inputs. That is, the framework only needs to record the computation path of each tensor; the details of tensor search, video memory release, and recovery by recomputation are implemented without user involvement, so the whole optimization process is imperceptible to the user and does not affect the user's programming experience.
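Recovery of a released tensor from its recorded computation path can be sketched recursively; the field names are hypothetical, and an input that was itself released is restored first:

```python
from types import SimpleNamespace

def restore(t):
    # Recompute a released tensor from its operator and all of its inputs.
    if not t.in_memory:
        args = [restore(u) for u in t.inputs]   # recover inputs first if needed
        t.value = t.op(*args)
        t.in_memory = True
        t.recompute_count += 1
    return t.value

# Example: c = a + b was released; accessing c triggers recomputation.
a = SimpleNamespace(value=2, in_memory=True, inputs=[], op=None, recompute_count=0)
b = SimpleNamespace(value=3, in_memory=True, inputs=[], op=None, recompute_count=0)
c = SimpleNamespace(value=None, in_memory=False, inputs=[a, b],
                    op=lambda x, y: x + y, recompute_count=0)
```

A second access finds the tensor resident again and returns it directly, which is why the recomputation count R(t) matters to the selection rule.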
In the embodiment of the invention, during model training, to ensure the optimization effect, the video memory threshold and the tensor selection rule are dynamically adjusted to a certain extent as training progresses; that is, within the training process of one model, different training rounds usually have different video memory thresholds and tensor selection rules. Therefore, when optimizing video memory occupancy in the current round of training, the video memory threshold and the tensor selection rule corresponding to the current round need to be obtained.
In step 102, a target tensor meeting the tensor selection rule is determined under the condition that the video memory occupancy value of the electronic device is greater than the video memory threshold value.
In some embodiments provided by the present invention, the tensor selection rule may be implemented based on an evaluation function of the tensor, in this case, the step 102 may specifically include the following steps (not shown in the figure): step 1021 and step 1022, wherein,
in step 1021, under the condition that the video memory occupancy value of the electronic device is greater than the video memory threshold value, calculating an evaluation function value corresponding to a tensor in the video memory according to the target evaluation function;
in step 1022, the tensor having the largest evaluation function value is determined as the target tensor.
In the embodiment of the invention, the target evaluation function may be determined according to at least two of: the size of the video memory occupied by the tensor, the duration for which the tensor has occupied the video memory, the computation cost of the tensor, and the number of recomputations of the tensor.
In the embodiment of the present invention, when the target evaluation function is determined according to the size of the video memory occupied by the tensor, the duration for which the tensor has occupied the video memory, the computation cost of the tensor, and the number of recomputations of the tensor, the target evaluation function may be:

f(t) = (M(t)^α · L(t)^β) / (C(t)^γ · (R(t) + 1)^δ)

wherein t is a tensor, M(t) is the size of the video memory occupied by t, L(t) is the duration for which t has occupied the video memory, C(t) is the computation cost of t, R(t) is the number of recomputations of t, and α, β, γ and δ are hyper-parameters of the target evaluation function.
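As a sketch, the evaluation can be written directly from the variable definitions. The exponent placement below (memory size and residency in the numerator, computation cost and recomputation count in the denominator, with R(t)+1 so the value stays defined when R(t)=0) is one form consistent with releasing the tensor of largest value; it is an assumption, not a quotation of the published formula:

```python
from types import SimpleNamespace

def eval_value(t, alpha=1.0, beta=1.0, gamma=1.0, delta=0.5):
    # f(t) is larger for tensors that occupy much memory, have sat in
    # memory long, are cheap to recompute, and have rarely been recomputed.
    return ((t.size_mb ** alpha) * (t.staleness ** beta)
            / ((t.compute_cost_ms ** gamma) * ((t.recompute_count + 1) ** delta)))

def pick_target(tensors):
    # The tensor with the largest evaluation function value is the target.
    return max(tensors, key=eval_value) if tensors else None

pool = [
    SimpleNamespace(size_mb=8.0, staleness=10, compute_cost_ms=2.0, recompute_count=0),
    SimpleNamespace(size_mb=1.0, staleness=2, compute_cost_ms=4.0, recompute_count=3),
]
```

The defaults correspond to the initial strategy described below (α = β = γ = 1, δ = 1/2); the large, long-resident, cheap-to-recompute first tensor wins here.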
In the embodiment of the present invention, when the current round of training of the model is the first round, the video memory threshold is the video memory capacity/2, α = β = γ = 1, and δ = 1/2. That is, at the beginning of the first round of training, the initial strategy is: set the video memory threshold to the video memory capacity/2, set the hyper-parameters α, β and γ to 1, and set δ to 1/2. Of course, the embodiment of the invention does not limit the initial strategy at the start of the first round of training; other initial strategies may be set according to actual application requirements.
In the embodiment of the invention, the video memory threshold and the hyper-parameters of the target evaluation function corresponding to each round of training can be dynamically fine-tuned according to the historical training information of the model.
Based on the above dynamic fine-tuning idea, the video memory management method provided by the embodiment of the present invention may further include the following steps:
acquiring the number of times the video memory occupancy value exceeded the video memory threshold during the Nth round of training;
and adjusting the video memory threshold corresponding to the (N+1)th round of training according to the number of times the video memory occupancy value exceeded the video memory threshold, wherein N is an integer greater than or equal to 1.
In the embodiment of the present invention, when the video memory threshold corresponding to the (N+1)th round of training is adjusted according to the number of times the video memory occupancy value exceeded the video memory threshold, the following strategies may be adopted:
when the number of times the video memory occupancy value exceeded the video memory threshold is greater than a first count threshold, increasing the video memory threshold corresponding to the (N+1)th round of training;
and when the number of times the video memory occupancy value exceeded the video memory threshold is not greater than the first count threshold, reducing the video memory threshold corresponding to the (N+1)th round of training.
In one example, when the number of times the video memory occupancy value exceeded the video memory threshold is greater than the first count threshold, the setting is: video memory threshold for round N+1 = video memory threshold for round N x (1 + 10%); when the number of times is not greater than the first count threshold, the setting is: video memory threshold for round N+1 = video memory threshold for round N x (1 - 10%).
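The ±10% adjustment strategy above can be sketched as a one-line rule; the first count threshold itself is a tunable constant:

```python
def next_memory_threshold(threshold_mb, overflow_count, first_count_threshold):
    # Raise the threshold for round N+1 when round N overflowed it often;
    # tighten it otherwise.
    if overflow_count > first_count_threshold:
        return threshold_mb * (1 + 0.10)
    return threshold_mb * (1 - 0.10)
```

A frequently exceeded threshold means the eviction loop is running too often, so giving the round more headroom trades a little memory for less search overhead.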
Based on the above dynamic fine-tuning idea, the video memory management method provided by the embodiment of the present invention may further include the following steps:
acquiring, for the Nth round of training, the number of times the requested space was larger than the largest video memory fragment and/or the percentage of the total training duration spent on recomputation;
and adjusting the values of the hyper-parameters in the target evaluation function corresponding to the (N+1)th round of training according to the number of times the requested space was larger than the largest video memory fragment and/or the percentage of the total training duration spent on recomputation.
In the embodiment of the present invention, when the values of the hyper-parameters in the target evaluation function corresponding to the (N+1)th round of training are adjusted according to the number of times the requested space was larger than the largest video memory fragment and/or the percentage of the total training duration spent on recomputation, the following strategies may be adopted:
when the number of times the requested space was larger than the largest video memory fragment is greater than a second count threshold, increasing the value of α corresponding to the (N+1)th round of training;
and when the percentage of the total training duration spent on recomputation increased compared with the previous round of model training, reducing the value of γ corresponding to the (N+1)th round of training.
In one example, when the number of times the requested space was larger than the largest video memory fragment is greater than the second count threshold, this indicates that releasing tensors that occupy more video memory should be emphasized, and the setting is: α for round N+1 = α for round N x (1 + 10%). When the percentage of the total training duration spent on recomputation increased compared with the previous round of model training, this indicates that releasing tensors with lower recomputation cost should be emphasized, and the setting is: γ for round N+1 = γ for round N x (1 - 10%).
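The two fine-tuning rules can be sketched together; the update directions follow the text above, and all constants and names are illustrative:

```python
def adjust_hypers(alpha, gamma, frag_overflow_count, second_count_threshold,
                  recompute_share, prev_recompute_share):
    # Frequent allocation failures against the largest free fragment:
    # weight memory size more (raise alpha), so large tensors are
    # preferred for release.
    if frag_overflow_count > second_count_threshold:
        alpha *= (1 + 0.10)
    # Recomputation's share of training time grew versus the previous
    # round: lower gamma for round N+1, per the strategy described above.
    if recompute_share > prev_recompute_share:
        gamma *= (1 - 0.10)
    return alpha, gamma
```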
In addition to the foregoing adjustment strategies, considering that the time consumed by each round of training mainly consists of two parts, the original computation time and the recomputation time, and in order to train the model successfully without exceeding the video memory limit, a time-consumption parameter P = recomputation time / original computation time is introduced, and P is made as small as possible. In some embodiments, the hyper-parameters may be adjusted based on a simulated annealing algorithm, which is not limited by the embodiments of the present invention. In this case, the video memory management method provided in the embodiment of the present invention may further include the following steps:
calculating the time-consumption parameter of the Nth round of training, i.e., P = recomputation time / original computation time;
and running a simulated annealing algorithm based on the time-consumption parameter, and adjusting the video memory threshold corresponding to the (N+1)th round of training and/or the values of the hyper-parameters in the target evaluation function.
Since the simulated annealing algorithm is an existing algorithm, it is not described here again.
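A minimal simulated-annealing step over the tunable parameters might look like the following; `measure_p` is a hypothetical callback that runs (or simulates) a round of training with the candidate parameters and returns P = recomputation time / original computation time:

```python
import math
import random

def anneal_step(params, measure_p, temperature, rng):
    # Perturb every parameter by up to +/-10%, keep the candidate if P
    # improves, and otherwise keep it with probability exp(-dP / T).
    current_p = measure_p(params)
    candidate = {k: v * (1 + rng.uniform(-0.1, 0.1)) for k, v in params.items()}
    candidate_p = measure_p(candidate)
    accept = (candidate_p < current_p
              or rng.random() < math.exp(-(candidate_p - current_p) / temperature))
    return (candidate, candidate_p) if accept else (params, current_p)

# Toy objective: pretend P shrinks with gamma (purely illustrative).
params = {"memory_threshold_mb": 5000.0, "alpha": 1.0, "beta": 1.0,
          "gamma": 1.0, "delta": 0.5}
best, p = anneal_step(params, lambda q: q["gamma"], temperature=1.0,
                      rng=random.Random(0))
```

Repeating the step while lowering the temperature converges toward parameters with a small P, while occasional uphill acceptances keep the search out of local minima.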
In the embodiment of the present invention, when measuring the computation cost C(t) of a tensor, synchronization between devices is introduced, so excessive measurements, which would cause excessive extra overhead, should be avoided as much as possible. For this situation, cache (Cache) optimization may be introduced: for tensors with the same operator and/or the same input shape, the time consumption is measured only once and stored in the cache, and thereafter, when a tensor with the same operator and the same input shape is encountered, the value is read directly from the cache.
Correspondingly, based on the above idea, the step of obtaining the computation cost of a tensor may include the following steps: reading from the cache the historical computation cost of a historical tensor that has the same operator and/or the same input shape as the tensor; and determining the read historical computation cost as the computation cost corresponding to the tensor.
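A sketch of this cache optimization, keyed by (operator, input shape); `measure` is a hypothetical callback that times one execution of the operator in milliseconds:

```python
def make_cost_cache():
    cache = {}
    def cost_ms(op_name, input_shape, measure):
        # Measure only once per (operator, input shape); afterwards the
        # stored time consumption is read directly from the cache.
        key = (op_name, tuple(input_shape))
        if key not in cache:
            cache[key] = measure()
        return cache[key]
    return cost_ms

cost_ms = make_cost_cache()
calls = []
measure = lambda: calls.append(1) or 3.5   # pretend the kernel took 3.5 ms
first = cost_ms("conv2d", (32, 3, 224, 224), measure)
second = cost_ms("conv2d", (32, 3, 224, 224), measure)
```

The second lookup hits the cache, so the expensive timed (and device-synchronized) measurement runs exactly once per key.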
In the embodiment of the invention, when computing the duration L(t) for which a tensor has occupied the video memory, in order to avoid a large number of calls to functions that obtain the current time, the number of operators executed so far is used as the timestamp, and L(t) is approximated as the current timestamp minus the timestamp at which the tensor entered the video memory, thereby reducing the overhead of calling time functions.
Correspondingly, based on the above idea, the step of obtaining the duration for which a tensor has occupied the video memory may include the following steps: acquiring the number of operators executed in the current round of training and the time at which the tensor entered the video memory; and determining the duration for which the tensor has occupied the video memory according to the number of executed operators and the time at which the tensor entered the video memory.
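The operator-count timestamp can be sketched as a tiny clock (names illustrative):

```python
class OpClock:
    # Use the number of operators executed so far as an approximate clock,
    # avoiding a system-time call per tensor.
    def __init__(self):
        self.ops_executed = 0

    def tick(self):
        # Called once each time an operator finishes executing.
        self.ops_executed += 1

    def staleness(self, entry_stamp):
        # L(t) ~= current timestamp - timestamp when t entered video memory.
        return self.ops_executed - entry_stamp

clock = OpClock()
entry = clock.ops_executed          # a tensor enters video memory at stamp 0
for _ in range(5):
    clock.tick()                    # five operators execute afterwards
```

Because the evaluation function only needs L(t) for ranking tensors against each other, a monotonically increasing counter is as good as wall-clock time here.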
In the embodiment of the invention, before the kernel function of an operator is launched, the input tensors the operator depends on need to be locked to prevent them from being determined as target tensors; otherwise a tensor might no longer be in the video memory while the device is computing the operator. The locks on the input tensors may be released after the operator's computation finishes.
In view of the above situation, the step 1021 may specifically include the following steps:
calculating, according to the target evaluation function, the evaluation function values corresponding to the unlocked tensors in the video memory;
correspondingly, the step 1022 may specifically include the following steps:
and determining the tensor which is not locked and has the largest evaluation function value as the target tensor.
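Pinning an operator's inputs for the duration of its kernel can be sketched with a context manager; the `locked` flag is the hypothetical field the selection rule skips:

```python
from contextlib import contextmanager
from types import SimpleNamespace

@contextmanager
def lock_inputs(inputs):
    # Lock the operator's input tensors before its kernel is launched so the
    # eviction pass cannot select them; unlock once the computation finishes.
    for t in inputs:
        t.locked = True
    try:
        yield
    finally:
        for t in inputs:
            t.locked = False

x = SimpleNamespace(locked=False)
y = SimpleNamespace(locked=False)
seen = []
with lock_inputs([x, y]):
    seen.append((x.locked, y.locked))   # both pinned while the kernel runs
```

The `finally` block guarantees the locks are dropped even if the kernel launch raises, so a failed operator cannot permanently pin its inputs.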
In the embodiment of the invention, when a user deletes a tensor t in the model code, the interpreter needs to judge whether all tensors that depend on t as operator input have been deleted; as long as one tensor u still depends on t, t cannot be deleted inside the interpreter. That is, the interpreter may delete t only after all tensors that depend on t have been deleted by the user.
In step 103, the video memory occupied by the target tensor is released.
In the embodiment of the invention, after the video memory occupied by the target tensor is released, the newly generated tensor in the current round of training can be stored in the released video memory, so that the model training can be normally carried out.
In one example, as shown in fig. 3, fig. 3 shows how the tensors in the video memory change when the video memory capacity is three tensors. During model training, three tensors A, B and C are stored in the video memory; at this time, the video memory occupancy value of the electronic device is 100%. If the video memory threshold is 70%, then since the occupancy value of 100% is greater than the threshold of 70%, the target tensor in the video memory needs to be found; if, for example, C is the target tensor, the video memory occupied by C is released, thereby achieving the goal of optimizing the video memory.
As can be seen from the above embodiment, in this embodiment, the video memory threshold corresponding to the current round of training of the model can be obtained; when the video memory occupancy value of the electronic device is greater than the video memory threshold, the target tensor meeting the tensor selection rule is determined, and the video memory occupied by the target tensor is released. In this way, in the embodiment of the invention, GPU video memory occupancy can be reduced by releasing tensors that do not affect normal model training, without obtaining global information about the whole computation graph in advance; the approach is fully dynamic, realizing video memory management under a deep learning framework with a dynamic graph mechanism. In addition, under such a framework, a user can, on a machine with unchanged video memory capacity and without modifying the original training code, train a model twice as large as the original in a constant multiple of the original training time. For example, on a 2080Ti card (11 GB of video memory), the ResNet50 training batch size can be doubled to about 250, and model training efficiency is high.
In some embodiments provided by the present invention, the video memory management method may further add the following step on the basis of the embodiment shown in fig. 1: determining whether the current round of training includes a target operator execution sequence, and if so, determining the tensors corresponding to the target operator execution sequence as target tensors;
wherein the target operator execution sequence is a plurality of operators executed in a specific order, and the tensors corresponding to the target operator execution sequence have release records in the historical training process.
In one example, if the tensors corresponding to an operator sequence of length 7 of the form "convolution - batch normalization - ReLU - pooling - convolution - batch normalization - ReLU" were all released at some later moment, then when these 7 operators are encountered again in subsequent training, the video memory occupied by the corresponding tensors can be released proactively, avoiding a passive search for video memory when the occupancy later exceeds the threshold, and thereby reducing search overhead.
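Matching a previously recorded release pattern against the tail of the operator stream can be sketched as (names and data layout are illustrative):

```python
def proactive_targets(recent_ops, release_patterns):
    # release_patterns maps an operator sequence (executed in a specific
    # order) to the tensors that were all released in earlier training.
    for pattern, tensors in release_patterns.items():
        if tuple(recent_ops[-len(pattern):]) == pattern:
            return tensors            # release these without waiting for
    return None                       # the threshold to be exceeded

pattern = ("conv", "bn", "relu", "pool", "conv", "bn", "relu")
patterns = {pattern: ["t3", "t4", "t5"]}
hit = proactive_targets(["add"] + list(pattern), patterns)
miss = proactive_targets(["conv", "bn", "relu"], patterns)
```

A hit triggers release ahead of any threshold check; a non-matching or too-short tail leaves the normal passive search in charge.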
FIG. 4 is a flow chart of a model training method of some embodiments of the present invention, which, as shown in FIG. 4, may include the steps of: step 401 and step 402, wherein,
in step 401, a training sample set is obtained, wherein the training sample set includes training data for model training.
In step 402, model training is performed based on the training sample set and the initial model, and in the process of model training, tensors generated by training are managed based on a preset video memory management method until a target model is obtained by training.
In an embodiment of the present invention, the preset video memory management method is a video memory management method in any one of the above embodiments.
In the embodiment of the present invention, the target model includes, but is not limited to, models for the following purposes: a model for determining a class to which an image to be processed belongs, a model for recognizing a face of a person in the image to be processed, a model for detecting a specific object in the image to be processed, a model for segmenting a specific object in the image to be processed, and a model for generating a new image having a similar feature to that of the image to be processed, and the like.
In the embodiment of the application, a proper training sample set can be selected according to the purpose of the target model.
As can be seen from the foregoing embodiment, in this embodiment, the tensors generated in the model training process can be managed based on the video memory management method, so that a user can, on a machine with unchanged video memory capacity and without modifying the original training code, train a model twice as large as the original in a constant multiple of the original training time. For example, on a 2080Ti card (11 GB of video memory), the ResNet50 training batch size can be doubled to about 250, and model training efficiency is high.
Fig. 5 is a schematic structural diagram of a video memory management apparatus according to some embodiments of the present invention, and as shown in fig. 5, the video memory management apparatus 500 may include: a first acquisition module 501, a first determination module 502, and a release module 503, wherein,
a first obtaining module 501, configured to obtain a video memory threshold corresponding to a current round of training of a model;
a first determining module 502, configured to determine a target tensor meeting a tensor selection rule when a video memory occupancy value of the electronic device is greater than the video memory threshold;
a releasing module 503, configured to release the video memory occupied by the target tensor.
As can be seen from the above embodiment, in this embodiment, the video memory threshold corresponding to the current round of training of the model can be obtained; when the video memory occupancy value of the electronic device is greater than the video memory threshold, the target tensor meeting the tensor selection rule is determined, and the video memory occupied by the target tensor is released. In this way, GPU video memory occupancy can be reduced by releasing tensors that do not affect normal model training, without obtaining global information about the whole computation graph in advance; the approach is fully dynamic, realizing video memory management under a deep learning framework with a dynamic graph mechanism. In addition, under such a framework, a user can, on a machine with unchanged video memory capacity and without modifying the original training code, train a model twice as large as the original in a constant multiple of the original training time, so model training efficiency is higher.
Optionally, as some embodiments, the first determining module 502 may include:
the calculation submodule is used for calculating evaluation function values corresponding to tensors in the video memory according to the target evaluation function;
and the determining submodule is used for determining the tensor with the largest evaluation function value as the target tensor.
Optionally, as some embodiments, the computation submodule may include:
the calculation unit is used for calculating an evaluation function value corresponding to the unlocked tensor in the video memory according to the target evaluation function;
the determining sub-module may include:
and a determining unit configured to determine an unlocked tensor having a maximum evaluation function value as the target tensor.
Optionally, as some embodiments, the video memory management apparatus 500 may further include:
and the second determining module is used for determining the target evaluation function according to at least two of: the size of the video memory occupied by the tensor, the duration for which the tensor has occupied the video memory, the computation cost of the tensor, and the number of recomputations of the tensor.
Optionally, as some embodiments, the target evaluation function is:

f(t) = (M(t)^α · L(t)^β) / (C(t)^γ · (R(t) + 1)^δ)

wherein t is a tensor, M(t) is the size of the video memory occupied by t, L(t) is the duration for which t has occupied the video memory, C(t) is the computation cost of t, R(t) is the number of recomputations of t, and α, β, γ and δ are hyper-parameters of the target evaluation function.
Optionally, as some embodiments, when the current round of training of the model is the first round, the video memory threshold is the video memory capacity/2, α = β = γ = 1, and δ = 1/2.
Optionally, as some embodiments, the video memory management apparatus 500 may further include:
the second acquisition module is used for acquiring the number of times the video memory occupancy value exceeded the video memory threshold during the Nth round of training;
and the first adjusting module is used for adjusting the video memory threshold corresponding to the (N+1)th round of training according to the number of times the video memory occupancy value exceeded the video memory threshold, wherein N is an integer greater than or equal to 1.
Optionally, as some embodiments, the first adjusting module may include:
the first adjusting submodule is used for increasing the video memory threshold corresponding to the (N+1)th round of training when the number of times the video memory occupancy value exceeded the video memory threshold is greater than the first count threshold;
and the second adjusting submodule is used for reducing the video memory threshold corresponding to the (N+1)th round of training when the number of times the video memory occupancy value exceeded the video memory threshold is not greater than the first count threshold.
Optionally, as some embodiments, the video memory management apparatus 500 may further include:
the third acquisition module is used for acquiring, for the Nth round of training, the number of times the requested space was larger than the largest video memory fragment and/or the percentage of the total training duration spent on recomputation;
and the second adjusting module is used for adjusting the values of the hyper-parameters in the target evaluation function corresponding to the (N+1)th round of training according to the number of times the requested space was larger than the largest video memory fragment and/or the percentage of the total training duration spent on recomputation.
Optionally, as some embodiments, the second adjusting module may include at least one of the following sub-modules:
a third adjusting submodule, configured to increase the value of α corresponding to the (N+1)th round of training when the number of times the requested space was larger than the largest video memory fragment is greater than the second count threshold;
and a fourth adjusting submodule, configured to reduce the value of γ corresponding to the (N+1)th round of training when the percentage of the total training duration spent on recomputation increased compared with the previous round of model training.
Optionally, as some embodiments, the video memory management apparatus 500 may further include:
the calculating module is used for calculating the time-consumption parameter of the Nth round of training, i.e., P = recomputation time / original computation time;
and the third adjusting module is used for running a simulated annealing algorithm based on the time-consumption parameter, and adjusting the video memory threshold corresponding to the (N+1)th round of training and/or the values of the hyper-parameters in the target evaluation function.
Optionally, as some embodiments, the video memory management apparatus 500 may further include:
the reading module is used for reading from the cache the historical computation cost of a historical tensor that has the same operator and/or the same input shape as the tensor;
and the third determining module is used for determining the read historical computation cost as the computation cost corresponding to the tensor.
Optionally, as some embodiments, the video memory management apparatus 500 may further include:
the fourth acquisition module is used for acquiring the number of operators executed in the current round of training and the time at which the tensor entered the video memory;
and the fourth determining module is used for determining the duration for which the tensor has occupied the video memory according to the number of executed operators and the time at which the tensor entered the video memory.
Optionally, as some embodiments, the video memory management apparatus 500 may further include:
a fifth determining module, configured to determine whether the current round of training includes a target operator execution sequence, and if so, determine the tensors corresponding to the target operator execution sequence as the target tensors;
wherein the target operator execution sequence is a plurality of operators executed in a specific order, and the tensors corresponding to the target operator execution sequence have release records in the historical training process.
Optionally, as some embodiments, the video memory management apparatus 500 may further include:
and the storage module is used for storing the newly generated tensor in the current round of training into the released video memory.
Fig. 6 is a schematic structural diagram of a model training apparatus according to some embodiments of the present invention, and as shown in fig. 6, the model training apparatus 600 may include: a second acquisition module 601 and a training module 602, wherein,
a second obtaining module 601, configured to obtain a training sample set, where the training sample set includes training data used for model training;
and the training module 602 is configured to perform model training based on the training sample set and the initial model, and manage tensors generated by the training based on any one of the video memory management devices in a model training process until a target model is obtained by the training.
As can be seen from the foregoing embodiment, in this embodiment, the tensors generated in the model training process can be managed based on the video memory management method, so that a user can, on a machine with unchanged video memory capacity and without modifying the original training code, train a model twice as large as the original in a constant multiple of the original training time. For example, on a 2080Ti card (11 GB of video memory), the ResNet50 training batch size can be doubled to about 250, and model training efficiency is high.
For the device embodiment, since it is basically similar to the method embodiment, the description is simple, and for the relevant points, refer to the partial description of the method embodiment.
The present invention also provides, according to some embodiments thereof, an electronic device comprising: a memory, a processor and a program stored on the memory and executable on the processor, the program, when executed by the processor, implementing the steps in the video memory management method according to any of the embodiments or implementing the steps in the model training method according to some of the embodiments.
According to some embodiments of the present invention, the present invention further provides a computer readable storage medium, on which a program is stored, the program, when executed by a processor, implementing the steps in the video memory management method according to any of the embodiments or implementing the steps in the model training method according to some of the embodiments.
The embodiments in the present specification are described in a progressive manner, each embodiment focuses on differences from other embodiments, and the same and similar parts among the embodiments are referred to each other.
As will be appreciated by one skilled in the art, embodiments of the present invention may be provided as a method, apparatus, or computer program product. Accordingly, embodiments of the present invention may take the form of an entirely hardware embodiment, an entirely software embodiment or an embodiment combining software and hardware aspects. Furthermore, embodiments of the present invention may take the form of a computer program product embodied on one or more computer-usable storage media (including, but not limited to, disk storage, CD-ROM, optical storage, and the like) having computer-usable program code embodied therein.
Embodiments of the present invention are described with reference to flowchart illustrations and/or block diagrams of methods, terminal devices (systems), and computer program products according to embodiments of the invention. It will be understood that each flow and/or block of the flow diagrams and/or block diagrams, and combinations of flows and/or blocks in the flow diagrams and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to a processor of a general purpose computer, special purpose computer, embedded processor, or other programmable data processing terminal to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing terminal, create means for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be stored in a computer-readable memory that can direct a computer or other programmable data processing terminal to function in a particular manner, such that the instructions stored in the computer-readable memory produce an article of manufacture including instruction means which implement the function specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be loaded onto a computer or other programmable data processing terminal to cause a series of operational steps to be performed on the computer or other programmable terminal to produce a computer implemented process such that the instructions which execute on the computer or other programmable terminal provide steps for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
While preferred embodiments of the present invention have been described, additional variations and modifications of these embodiments may occur to those skilled in the art once they learn of the basic inventive concepts. Therefore, it is intended that the appended claims be interpreted as including preferred embodiments and all such alterations and modifications as fall within the scope of the embodiments of the invention.
Finally, it should also be noted that, herein, relational terms such as first and second, and the like may be used solely to distinguish one entity or action from another entity or action without necessarily requiring or implying any actual such relationship or order between such entities or actions. Also, the terms "comprises," "comprising," or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, method, article, or terminal that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, method, article, or terminal. Without further limitation, an element defined by the phrase "comprising an … …" does not exclude the presence of other like elements in a process, method, article, or terminal that comprises the element.
The video memory management method, model training method, devices, electronic equipment, and storage medium provided by the present invention have been described in detail above. Specific examples are used herein to explain the principle and implementation of the invention, and the description of the embodiments is intended only to help understand the method and its core idea. Meanwhile, a person skilled in the art may, according to the idea of the present invention, vary the specific embodiments and the application scope. In summary, the content of this specification should not be construed as limiting the present invention.

Claims (20)

1. A video memory management method is applied to electronic equipment and is characterized by comprising the following steps:
acquiring a video memory threshold corresponding to the current round of training of the model;
determining a target tensor meeting a tensor selection rule under the condition that the video memory occupancy value of the electronic equipment is larger than the video memory threshold;
and releasing the video memory occupied by the target tensor.
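The three steps of claim 1 can be sketched as follows. This is a minimal illustration with hypothetical names (`VideoMemoryManager`, `track`), not the patented implementation; the selection rule here is a size-based placeholder for the evaluation function introduced in the later claims.

```python
# Minimal sketch of claim 1 (hypothetical names): per training round, fetch a
# video-memory threshold, and whenever tracked occupancy exceeds it, pick a
# target tensor by a selection rule and release the memory it occupies.
class VideoMemoryManager:
    def __init__(self, round_thresholds):
        self.round_thresholds = round_thresholds   # threshold per training round
        self.live = []                             # tensors currently in memory

    def occupancy(self):
        return sum(t["size"] for t in self.live)

    def track(self, tensor, round_idx):
        self.live.append(tensor)
        threshold = self.round_thresholds[round_idx]          # step 1: get threshold
        while self.occupancy() > threshold and len(self.live) > 1:
            victim = max(self.live, key=lambda t: t["size"])  # placeholder rule
            self.live.remove(victim)                          # step 3: release memory

mgr = VideoMemoryManager(round_thresholds=[100])
for size in (40, 40, 40):          # three activations produced by forward ops
    mgr.track({"size": size}, round_idx=0)
print(mgr.occupancy())             # 80: occupancy is kept within the 100-unit budget
```

In a real system the released tensor would be recomputed on demand during the backward pass; here only the bookkeeping is shown.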
2. The method of claim 1, wherein determining the target tensor that satisfies the tensor selection rule comprises:
calculating an evaluation function value corresponding to the tensor in the video memory according to the target evaluation function;
and determining the tensor with the largest evaluation function value as the target tensor.
3. The method of claim 2, wherein the calculating the evaluation function value corresponding to the tensor in the video memory according to the target evaluation function comprises:
calculating an evaluation function value corresponding to each unlocked tensor in the video memory according to the target evaluation function;
the determining the tensor with the largest evaluation function value as the target tensor comprises the following steps:
and determining the tensor which is not locked and has the largest evaluation function value as the target tensor.
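Claims 2-3 amount to a filter-then-argmax over live tensors. The sketch below uses hypothetical names and a toy scoring function; the patent's actual evaluation function is the one of claim 5.

```python
# Hypothetical sketch of claims 2-3: score every unlocked tensor with an
# evaluation function and pick the highest-scoring one as the eviction target.
def select_target(tensors, evaluate):
    candidates = [t for t in tensors if not t["locked"]]  # claim 3: skip locked tensors
    if not candidates:
        return None
    return max(candidates, key=evaluate)                  # claim 2: largest value wins

tensors = [
    {"name": "a", "size": 64,  "locked": True},
    {"name": "b", "size": 32,  "locked": False},
    {"name": "c", "size": 128, "locked": False},
]
target = select_target(tensors, evaluate=lambda t: t["size"])
print(target["name"])   # "c": the largest unlocked tensor under this toy scorer
```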
4. A method according to claim 2 or 3, characterized in that the method further comprises:
and determining the target evaluation function according to at least two of: the size of the video memory occupied by the tensor, the time length for which the tensor has occupied the video memory, the computation cost of the tensor, and the number of recomputations of the tensor.
5. The method of claim 4, wherein the target evaluation function is:
f(t) = [formula published as image FDA0002932121720000011; not recoverable from this text]
wherein t is a tensor, m(t) is the size of the video memory occupied by t, l(t) is the time length for which t has occupied the video memory, c(t) is the computation cost of t, r(t) is the number of recomputations of t, and α, β, γ and δ are hyper-parameters of the target evaluation function.
6. The method according to claim 5, wherein, when the current round of training of the model is the first round of training, the video memory threshold is 2, and α = β = γ = 1, δ = 1/2.
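Claims 5-6 describe an evaluation function over four tensor quantities with hyper-parameters α, β, γ, δ. Since the actual formula is published only as an image, the block below shows one *assumed* multiplicative form consistent with claim 6's defaults (α = β = γ = 1, δ = 1/2) and with claim 2's "largest value is released" rule; it is not the patented formula.

```python
# Assumed form only: the patent's formula is an unrendered image. This sketch
# combines the four quantities of claim 5 so that large, long-resident,
# cheap-to-recompute, rarely-recomputed tensors score highest, i.e. they are
# the preferred eviction targets under the "largest value" rule of claim 2.
def evaluate(m, l, c, r, alpha=1.0, beta=1.0, gamma=1.0, delta=0.5):
    # m: memory size, l: residency time, c: compute cost, r: recompute count
    return (m ** alpha) * (l ** beta) / ((c ** gamma) * ((1 + r) ** delta))

# A big, stale, cheap tensor outscores a small, fresh, expensive one:
print(evaluate(m=256, l=10, c=2, r=0) > evaluate(m=4, l=1, c=50, r=3))  # True
```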
7. The method according to any one of claims 1-6, further comprising:
acquiring the number of times that the video memory occupancy value exceeds the video memory threshold in the Nth round of training;
and adjusting the video memory threshold corresponding to the (N + 1)th round of training according to the number of times that the video memory occupancy value exceeds the video memory threshold, wherein N is an integer greater than or equal to 1.
8. The method according to claim 7, wherein the adjusting the video memory threshold corresponding to the (N + 1)th round of training according to the number of times that the video memory occupancy value exceeds the video memory threshold comprises:
increasing the video memory threshold corresponding to the (N + 1)th round of training under the condition that the number of times the video memory occupancy value exceeds the video memory threshold is greater than a first count threshold;
and reducing the video memory threshold corresponding to the (N + 1)th round of training under the condition that the number of times the video memory occupancy value exceeds the video memory threshold is not greater than the first count threshold.
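The feedback rule of claims 7-8 can be sketched as a one-line adjustment; the 10% step size and all names are assumptions, not from the patent.

```python
# Sketch of claims 7-8 (hypothetical names and step size): after round N, count
# how often occupancy exceeded the threshold, then nudge round N+1's threshold
# up (too many overruns) or down (budget has slack) by an assumed 10% step.
def adjust_threshold(threshold, exceed_count, first_count_threshold, step=0.1):
    if exceed_count > first_count_threshold:
        return threshold * (1 + step)   # frequent overruns: raise the budget
    return threshold * (1 - step)       # few overruns: tighten the budget

print(adjust_threshold(1000, exceed_count=8, first_count_threshold=5) > 1000)  # True
print(adjust_threshold(1000, exceed_count=3, first_count_threshold=5) < 1000)  # True
```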
9. The method according to any one of claims 5-8, further comprising:
acquiring the number of times that the space requested in the Nth round of training is larger than the largest video memory fragment, and/or the percentage of recomputation time in the total training time;
and adjusting the values of the hyper-parameters in the target evaluation function corresponding to the (N + 1)th round of training according to the number of times that the requested space is larger than the largest video memory fragment and/or the percentage of recomputation time in the total training time.
10. The method according to claim 9, wherein the adjusting the values of the hyper-parameters in the target evaluation function corresponding to the (N + 1)th round of training according to the number of times that the requested space is larger than the largest video memory fragment and/or the percentage of recomputation time in the total training time comprises at least one of the following steps:
increasing the value of α corresponding to the (N + 1)th round of training under the condition that the number of times the requested space is larger than the largest video memory fragment is greater than a second count threshold;
and reducing the value of γ corresponding to the (N + 1)th round of training under the condition that the percentage of recomputation time in the total training time has increased compared with the previous round of model training.
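The two hyper-parameter adjustments of claims 9-10 can be sketched together; the function name, dictionary keys, and 0.1 step size are assumptions for illustration.

```python
# Sketch of claims 9-10 (assumed names and step size): raise alpha when
# allocation requests keep exceeding the largest free fragment (biasing
# eviction toward big tensors to relieve fragmentation), and lower gamma when
# recomputation consumes a growing share of the round's training time.
def adjust_hyperparams(params, frag_exceed_count, second_count_threshold,
                       recompute_pct, prev_recompute_pct, step=0.1):
    params = dict(params)                # do not mutate the caller's settings
    if frag_exceed_count > second_count_threshold:
        params["alpha"] += step          # fragmentation pressure: favour big tensors
    if recompute_pct > prev_recompute_pct:
        params["gamma"] -= step          # recompute share grew: weigh cost less
    return params

p = adjust_hyperparams({"alpha": 1.0, "gamma": 1.0},
                       frag_exceed_count=4, second_count_threshold=2,
                       recompute_pct=0.30, prev_recompute_pct=0.20)
print(p["alpha"] > 1.0, p["gamma"] < 1.0)   # True True
```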
11. The method according to any one of claims 5-10, further comprising:
calculating a time-cost parameter of the Nth round of training, namely recomputation time / original computation time;
and running a simulated annealing algorithm based on the time-cost parameter to adjust the video memory threshold corresponding to the (N + 1)th round of training and/or the values of the hyper-parameters in the target evaluation function.
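Claim 11 only names simulated annealing; the patent does not disclose its schedule. The following is a generic annealing loop over a single threshold, with `simulate_cost` standing in for actually running a training round at the candidate setting. Every detail (cooling schedule, perturbation width, step count) is an assumption.

```python
# Generic simulated-annealing sketch (not the patented procedure): search for a
# setting that minimises the time-cost parameter returned by simulate_cost.
import math
import random

def anneal(simulate_cost, x0, steps=200, t0=1.0, seed=0):
    rng = random.Random(seed)
    x, cost = x0, simulate_cost(x0)
    best_x, best_cost = x, cost
    for i in range(steps):
        temp = t0 * (1 - i / steps)            # linear cooling schedule
        cand = x + rng.uniform(-0.5, 0.5)      # perturb the current setting
        c = simulate_cost(cand)
        # accept improvements always; accept worse moves with Metropolis probability
        if c < cost or rng.random() < math.exp(-(c - cost) / max(temp, 1e-9)):
            x, cost = cand, c
            if c < best_cost:
                best_x, best_cost = cand, c
    return best_x, best_cost

# Toy cost landscape with its minimum at a threshold of 3:
best_x, best_cost = anneal(lambda x: (x - 3.0) ** 2, x0=0.0)
print(best_x, best_cost)   # best_cost can only improve on the starting cost of 9.0
```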
12. The method according to any of claims 4-11, wherein the step of obtaining the computed cost of the tensor comprises:
reading, from a cache, the historical computation cost of a historical tensor having the same operator and/or the same input shape as the tensor;
and determining the read historical computation cost as the computation cost of the tensor.
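Claim 12's cache can be sketched as a dictionary keyed by (operator, input shape); the key structure and names are hypothetical.

```python
# Sketch of claim 12 (hypothetical structure): cache an operator's measured
# compute cost keyed by (operator name, input shape), so a tensor produced by
# the same op on the same shape reuses the historical cost instead of being
# timed again.
cost_cache = {}

def get_compute_cost(op, input_shape, measure):
    key = (op, tuple(input_shape))
    if key not in cost_cache:
        cost_cache[key] = measure()     # first sight: actually time the op
    return cost_cache[key]

calls = []
measure = lambda: calls.append(1) or 3.5   # pretend timing returned 3.5 ms

a = get_compute_cost("conv2d", (1, 64, 56, 56), measure)
b = get_compute_cost("conv2d", (1, 64, 56, 56), measure)
print(a, b, len(calls))   # 3.5 3.5 1 -- the second lookup hits the cache
```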
13. The method according to any one of claims 4-12, wherein the obtaining of the duration of the video memory occupied by the tensor comprises:
acquiring the number of operators being executed in the current round of training and the time of the tensor entering the video memory;
and determining the time length of the tensor occupying the video memory according to the number of the operators being executed and the time of the tensor entering the video memory.
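One cheap reading of claim 13 (assumed semantics) is to measure residency in "operator ticks" rather than wall-clock time: record the operator count when the tensor enters video memory and subtract it from the current count.

```python
# Sketch of claim 13 (assumed semantics): the time length a tensor has occupied
# video memory, expressed as the number of operators executed since it entered.
def residency(current_op_count, entry_op_count):
    return current_op_count - entry_op_count

# Tensor entered memory as the 12th operator ran; 52 operators have run so far:
print(residency(current_op_count=52, entry_op_count=12))   # 40
```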
14. The method according to any one of claims 1-13, further comprising:
determining whether the current round of training comprises a target operator execution sequence, and if so, determining a tensor corresponding to the target operator execution sequence as the target tensor;
wherein the target operator execution sequence is an operator execution sequence whose corresponding tensor has a release record in the historical training process.
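Claim 14's history-based shortcut can be sketched as a set of previously seen sequences; the bookkeeping shown here (a set of operator-name tuples) is a hypothetical representation.

```python
# Sketch of claim 14 (hypothetical bookkeeping): remember operator execution
# sequences whose output tensors were released in earlier rounds, and mark the
# matching tensor as the eviction target when the same sequence recurs.
released_sequences = {("conv2d", "bn", "relu")}   # released in a historical round

def proactive_target(op_sequence, tensor):
    if tuple(op_sequence) in released_sequences:
        return tensor     # same sequence as before: release its tensor again
    return None

print(proactive_target(["conv2d", "bn", "relu"], "t1"))   # t1
print(proactive_target(["conv2d", "relu"], "t2"))         # None
```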
15. The method according to any of claims 1-14, further comprising, after the step of releasing the video memory occupied by the target tensor:
and storing the newly generated tensor in the current round of training into the released video memory.
16. A method of model training, the method comprising:
acquiring a training sample set, wherein the training sample set comprises training data used for model training;
and performing model training based on the training sample set and the initial model, and during the model training process, managing tensors generated by training based on the video memory management method of any one of claims 1 to 15 until a target model is obtained by training.
17. A video memory management apparatus, comprising:
the first acquisition module is used for acquiring a video memory threshold corresponding to the current round of training of the model;
the first determining module is used for determining a target tensor meeting a tensor selection rule under the condition that the video memory occupation value of the electronic equipment is larger than the video memory threshold value;
and the releasing module is used for releasing the video memory occupied by the target tensor.
18. A model training apparatus, the apparatus comprising:
the second acquisition module is used for acquiring a training sample set, wherein the training sample set comprises training data used for model training;
a training module, configured to perform model training based on the training sample set and the initial model, and in a model training process, manage tensors generated by training based on the video memory management apparatus of claim 17 until a target model is obtained by training.
19. An electronic device, comprising: memory, processor and program stored on the memory and executable on the processor, which when executed by the processor performs the steps of the video memory management method according to any one of claims 1 to 15 or the steps of the model training method according to claim 16.
20. A computer readable storage medium having stored thereon a program which, when being executed by a processor, carries out the steps of the video memory management method according to any one of claims 1 to 15 or the steps of the model training method according to claim 16.
CN202110150321.XA 2021-02-03 2021-02-03 Video memory management method, video memory management device, model training device, electronic equipment and storage medium Pending CN112882830A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110150321.XA CN112882830A (en) 2021-02-03 2021-02-03 Video memory management method, video memory management device, model training device, electronic equipment and storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202110150321.XA CN112882830A (en) 2021-02-03 2021-02-03 Video memory management method, video memory management device, model training device, electronic equipment and storage medium

Publications (1)

Publication Number Publication Date
CN112882830A true CN112882830A (en) 2021-06-01

Family

ID=76057032

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110150321.XA Pending CN112882830A (en) 2021-02-03 2021-02-03 Video memory management method, video memory management device, model training device, electronic equipment and storage medium

Country Status (1)

Country Link
CN (1) CN112882830A (en)

Cited By (10)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114003306A (en) * 2021-10-27 2022-02-01 上海商汤科技开发有限公司 Video memory optimization method, device, equipment and storage medium
CN114003306B (en) * 2021-10-27 2024-03-15 上海商汤科技开发有限公司 Video memory optimization method, device, equipment and storage medium
CN114692829A (en) * 2022-03-24 2022-07-01 西安交通大学 DNN model-based checkpoint selection method, equipment and storage medium
CN114692829B (en) * 2022-03-24 2024-04-02 西安交通大学 DNN model-based checkpoint selection method, device and storage medium
CN116432778A (en) * 2023-06-12 2023-07-14 摩尔线程智能科技(北京)有限责任公司 Data processing method and device, storage medium and electronic equipment
CN116432778B (en) * 2023-06-12 2023-09-19 摩尔线程智能科技(北京)有限责任公司 Data processing method and device, storage medium and electronic equipment
CN117032954A (en) * 2023-07-17 2023-11-10 北京泛睿科技合伙企业(有限合伙) Memory optimization method, system, equipment and medium for terminal training model
CN117032954B (en) * 2023-07-17 2024-04-26 北京泛睿科技合伙企业(有限合伙) Memory optimization method, system, equipment and medium for terminal training model
CN117130693A (en) * 2023-10-26 2023-11-28 之江实验室 Tensor unloading method, tensor unloading device, computer equipment and storage medium
CN117130693B (en) * 2023-10-26 2024-02-13 之江实验室 Tensor unloading method, tensor unloading device, computer equipment and storage medium

Similar Documents

Publication Publication Date Title
CN112882830A (en) Video memory management method, video memory management device, model training device, electronic equipment and storage medium
CN112199190B (en) Memory allocation method and device, storage medium and electronic equipment
JP2017228086A (en) Machine learning management program, machine learning management method, and machine learning management device
CN108153587B (en) Slow task reason detection method for big data platform
US10860892B1 (en) Systems and methods of synthetic data generation for data stream
JPWO2015015574A1 (en) Processing program, processing system, and processing method
CN114692829B (en) DNN model-based checkpoint selection method, device and storage medium
CN114936085A (en) ETL scheduling method and device based on deep learning algorithm
CN116401232B (en) Database parameter configuration optimization method and device, electronic equipment and storage medium
EP4170549A1 (en) Machine learning program, method for machine learning, and information processing apparatus
US20150081263A1 (en) Production simulation apparatus and production simulation method
CN108334935B (en) Deep learning neural network method and device for simplifying input and robot system
KR20210111677A (en) Method for clipping neural networks, method for calculating convolution of neural networks and apparatus for performing the methods
JP2019185121A (en) Learning device, learning method and program
KR101145278B1 (en) Method, apparatus and computer-readable recording medium for choosing representative images among similar images
CN116185568A (en) Container expansion method and device, electronic equipment and storage medium
JP2017224038A (en) Cache miss estimation program, cache miss estimation method and information processing device
KR102441442B1 (en) Method and apparatus for learning graph convolutional network
US20220101187A1 (en) Identifying and quantifying confounding bias based on expert knowledge
CN111898080B (en) Data sequence denoising method and device, electronic equipment and computer storage medium
CN112906728B (en) Feature comparison method, device and equipment
CN113626650A (en) Service processing method and device and electronic equipment
KR102523803B1 (en) Data processing apparatus for classification of machine learning data and the operating method thereof
CN112861951B (en) Image neural network parameter determining method and electronic equipment
US20230418468A1 (en) Optimizing storage-related costs with compression in a multi-tiered storage device

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination