CN112329834A - Video memory space allocation method and device during recurrent network model training

Info

Publication number
CN112329834A
CN112329834A (application number CN202011186142.3A)
Authority
CN
China
Prior art keywords
video memory
memory space
network model
optimization type
model
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202011186142.3A
Other languages
Chinese (zh)
Other versions
CN112329834B (en)
Inventor
徐扬凯
王桂彬
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Baidu Netcom Science and Technology Co Ltd
Original Assignee
Beijing Baidu Netcom Science and Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing Baidu Netcom Science and Technology Co Ltd
Priority to CN202011186142.3A
Publication of CN112329834A
Application granted
Publication of CN112329834B
Legal status: Active
Anticipated expiration

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 18/00 Pattern recognition
    • G06F 18/20 Analysing
    • G06F 18/21 Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F 18/214 Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00 Computing arrangements based on biological models
    • G06N 3/02 Neural networks
    • G06N 3/04 Architecture, e.g. interconnection topology
    • G06N 3/045 Combinations of networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00 Computing arrangements based on biological models
    • G06N 3/02 Neural networks
    • G06N 3/08 Learning methods
    • Y GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02T CLIMATE CHANGE MITIGATION TECHNOLOGIES RELATED TO TRANSPORTATION
    • Y02T 10/00 Road transport of goods or passengers
    • Y02T 10/10 Internal combustion engine [ICE] based vehicles
    • Y02T 10/40 Engine management systems

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Artificial Intelligence (AREA)
  • General Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • Evolutionary Computation (AREA)
  • Biophysics (AREA)
  • Computational Linguistics (AREA)
  • Software Systems (AREA)
  • Mathematical Physics (AREA)
  • Health & Medical Sciences (AREA)
  • Biomedical Technology (AREA)
  • Computing Systems (AREA)
  • Molecular Biology (AREA)
  • General Health & Medical Sciences (AREA)
  • Evolutionary Biology (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Two-Way Televisions, Distribution Of Moving Picture Or The Like (AREA)

Abstract

The application discloses a method and device for allocating video memory space during recurrent network model training, relating to data processing technology and in particular to the technical fields of deep learning and video memory allocation. The implementation scheme is as follows: acquire a recurrent network model, wherein the recurrent network model comprises a plurality of model layers; respectively obtain the optimization type of each model layer; acquire sample data and determine the sequence length of the sample data; and allocate the video memory space corresponding to the sample data according to the optimization type and the sequence length. The method of the embodiments ensures that all video memory allocated during recurrent network model training is used in every computation, avoids redundant waste of video memory, effectively compresses the video memory used in network computation, and thereby improves training speed.

Description

Video memory space allocation method and device during recurrent network model training
Technical Field
The application relates to the technical field of data processing, in particular to the technical fields of deep learning and video memory allocation, and specifically provides a method and device for allocating video memory space during recurrent network model training.
Background
With the continuous maturation and popularization of artificial intelligence technology, its applications have gradually penetrated many fields and scenarios of modern society. Voice assistants in speech recognition, face recognition in computer vision, machine translation in natural language processing, and similar conveniences all rely on deep learning technology. Deep learning continuously trains multilayer neural networks on large-scale data, extracting increasingly abstract data features layer by layer so as to solve complex problems. The Graphics Processing Unit (GPU) is the computing unit on which current deep learning platforms mainly depend, and training deep learning models on GPUs underpins deep learning technology.
At present, in actual model training, a deep learning framework needs to reduce video memory overhead as much as possible in order to use a larger batch size and thereby improve training efficiency. In a large-scale recurrent network, the inputs and outputs computed by every operator inside the loop can be released only after the whole loop iteration completes, which causes a high instantaneous peak in video memory occupation.
Disclosure of Invention
The invention provides a method and device for allocating video memory space during recurrent network model training. It addresses the following technical problem in the related art: when a recurrent network model is trained, the inputs and outputs of every operator in the loop are allocated in advance, and computed data are filled into the storage positions of the corresponding time steps as the loop runs; however, at each time step the model layers inside the loop (such as an activation layer or a fully connected layer) need only the data of a fixed context, so the other storage positions hold invalid data or go unused, producing redundant waste of video memory.
According to a first aspect, a method for allocating video memory space during recurrent network model training is provided, which includes:
acquiring a recurrent network model, wherein the recurrent network model comprises a plurality of model layers;
respectively obtaining the optimization type of each model layer;
acquiring sample data and determining the sequence length of the sample data; and
allocating the video memory space corresponding to the sample data according to the optimization type and the sequence length.
In the method for allocating video memory space during recurrent network model training, a recurrent network model comprising a plurality of model layers is first acquired; the optimization type of each model layer is respectively obtained; sample data is then acquired and its sequence length determined; and finally the video memory space corresponding to the sample data is allocated according to the optimization type and the sequence length. In this way, all video memory allocated during recurrent network model training is used in every computation, redundant waste of video memory is avoided, the video memory used in network computation is effectively compressed, and training speed is improved.
According to a second aspect, an apparatus for allocating video memory space during recurrent network model training is provided, which includes:
a first acquisition module, configured to acquire a recurrent network model, wherein the recurrent network model comprises a plurality of model layers;
a second acquisition module, configured to respectively obtain the optimization type of each model layer;
a determination module, configured to acquire sample data and determine the sequence length of the sample data; and
an allocation module, configured to allocate the video memory space corresponding to the sample data according to the optimization type and the sequence length.
The apparatus for allocating video memory space during recurrent network model training acquires the recurrent network model comprising a plurality of model layers through the first acquisition module, obtains the optimization type of each model layer through the second acquisition module, acquires sample data and determines its sequence length through the determination module, and allocates the video memory space corresponding to the sample data through the allocation module according to the optimization type and the sequence length. In this way, all video memory allocated during recurrent network model training is used in every computation, redundant waste of video memory is avoided, the video memory used in network computation is effectively compressed, and training speed is improved.
According to a third aspect, there is provided an electronic device comprising:
at least one processor; and
a memory communicatively coupled to the at least one processor; wherein,
the memory stores instructions executable by the at least one processor, and the instructions are executed by the at least one processor to enable the at least one processor to execute the method for allocating video memory space during recurrent network model training according to the embodiment of the above aspect.
According to a fourth aspect, a non-transitory computer-readable storage medium is provided, storing a computer program for causing a computer to execute the method for allocating video memory space during recurrent network model training according to the embodiment of the above aspect.
The technology of the application addresses the technical problem in the related art that, when a recurrent network model is trained, the inputs and outputs of every operator in the loop are allocated in advance and computed data are filled into the storage positions of the corresponding time steps during the loop, even though at each time step the model layers inside the loop need only the data of a fixed context, so the other storage positions hold invalid data or go unused and video memory is redundantly wasted. At the same time, the video memory used in network computation can be effectively compressed, thereby improving training speed.
It should be understood that the statements in this section do not necessarily identify key or critical features of the embodiments of the present disclosure, nor do they limit the scope of the present disclosure. Other features of the present disclosure will become apparent from the following description.
Drawings
The drawings are included to provide a better understanding of the present solution and are not intended to limit the present application. Wherein:
fig. 1 is a schematic flowchart of a method for allocating video memory space during recurrent network model training according to an embodiment of the present disclosure;
fig. 2 is a schematic flowchart of another method for allocating video memory space during recurrent network model training according to an embodiment of the present disclosure;
FIG. 3 is a schematic diagram of recurrent network model training provided by an embodiment of the present application;
fig. 4 is a schematic block diagram of an apparatus for allocating video memory space during recurrent network model training according to an embodiment of the present disclosure;
fig. 5 is a schematic block diagram of another apparatus for allocating video memory space during recurrent network model training according to an embodiment of the present disclosure; and
fig. 6 is a block diagram of an electronic device for allocating video memory space during recurrent network model training according to an embodiment of the present application.
Detailed Description
The following description of exemplary embodiments of the application, taken in conjunction with the accompanying drawings, includes various details of the embodiments to aid understanding, and these should be considered exemplary only. Accordingly, those of ordinary skill in the art will recognize that various changes and modifications can be made to the embodiments described herein without departing from the scope and spirit of the application. Likewise, descriptions of well-known functions and constructions are omitted below for clarity and conciseness.
The following describes the method, apparatus, electronic device, and storage medium for allocating video memory space during recurrent network model training according to embodiments of the application, with reference to the drawings.
The embodiments of the application provide a method for allocating video memory space during recurrent network model training, aimed at the technical problem in the related art that, when a recurrent network model is trained, the inputs and outputs of every operator in the loop are allocated in advance and computed data are filled into the storage positions of the corresponding time steps during the loop; however, at each time step the model layers inside the loop (such as an activation layer or a fully connected layer) need only the data of a fixed context, so the other storage positions hold invalid data or go unused, causing redundant waste of video memory.
In the method for allocating video memory space during recurrent network model training, the optimization type of each model layer in the recurrent network model and the sequence length of the sample data are obtained, and the video memory space corresponding to the sample data is allocated according to the optimization type and the sequence length. This solves the above problems in the related art, effectively compresses the video memory used in network computation, and improves training speed.
The method for allocating video memory space during recurrent network model training provided in the embodiments of the application may be executed by an electronic device, which may be a personal computer (PC), a tablet computer, a mobile phone, a palmtop computer, or the like; this is not limited here.
In the embodiments of the application, the electronic device may be provided with a processing component, a storage component, and a driving component. Optionally, the driving component and the processing component may be integrated; the storage component may store an operating system, application programs, or other program modules; and the processing component implements the method for allocating video memory space during recurrent network model training provided in the embodiments of the application by executing the application programs stored in the storage component.
Fig. 1 is a schematic flowchart of a method for allocating video memory space during recurrent network model training according to an embodiment of the present disclosure.
The method for allocating video memory space during recurrent network model training in the embodiments of the application can also be performed by the apparatus for allocating video memory space during recurrent network model training provided in the embodiments of the application. The apparatus can be configured in an electronic device so as to obtain the optimization type of each model layer in the recurrent network model and the sequence length of the sample data, and to allocate the video memory space corresponding to the sample data according to the optimization type and the sequence length.
As a possible situation, the method for allocating video memory space during recurrent network model training in the embodiments of the application may also be executed on a server side; the server may be a cloud server, so the method may be executed in the cloud.
It should be noted that the method for allocating video memory space during recurrent network model training according to the embodiments of the application may be applied to the training of recurrent network models, and such models can be applied in many fields, for example in device-side deep learning, such as deep learning for speech recognition devices, deep learning for text recognition devices, and deep learning for automatic driving.
As shown in fig. 1, the method for allocating video memory space during recurrent network model training may include the following steps:
step 101, obtaining a cyclic network model, wherein the cyclic network model comprises a plurality of model layers. It should be noted that the multiple model layers described in this embodiment may include an input layer, a convolutional layer, an activation function layer, a fully-connected layer, and the like.
In an embodiment of the application, the electronic device may obtain a recurrent network model.
To better describe the application, take a recurrent network model applied in a speech recognition system as an example: when training the recurrent network model of the speech recognition system, the electronic device may first obtain that recurrent network model.
Step 102, respectively obtaining the optimization type of each model layer.
In this embodiment of the application, each model layer includes one or more operators, and the optimization type of each model layer can be obtained by determining the time-step dependency of the inputs of the operators in that layer.
Specifically, after the electronic device acquires the recurrent network model, it may analyze the model to obtain the operators in its model layers, determine the time-step dependency of the inputs of the operators in each model layer, and obtain (determine) the optimization type corresponding to each model layer from that dependency.
It should be noted that the optimization types corresponding to the model layers described in this embodiment may be the same or different, and different optimization types correspond to different video memory allocation policies.
Step 103, obtaining sample data and determining the sequence length of the sample data.
It should be noted that in this embodiment the sample data may be TCM (Trellis Coded Modulation) data. To better describe the application, suppose the recurrent network model of a speech recognition system is being trained; the sample data may then be obtained by converting preset speech information through a preset algorithm.
In this embodiment of the application, the sample data may be acquired in multiple ways: the electronic device may intercept the input data of the recurrent network model through a software program while the model is in use, or the sample data may be created manually; this is not limited here.
Step 104, allocating the video memory space corresponding to the sample data according to the optimization type and the sequence length. It should be noted that the video memory space described in this embodiment may be provided by a graphics card (e.g., a graphics card in a PC or in a mobile device).
Specifically, after obtaining the optimization type of each model layer, the electronic device may first acquire sample data and analyze it to determine its sequence length, then generate a corresponding video memory allocation policy according to the optimization types and the sequence length, and finally allocate the video memory space corresponding to the sample data according to that policy.
In this embodiment of the application, a recurrent network model comprising a plurality of model layers is first acquired; the optimization type of each model layer is respectively obtained; sample data is then acquired and its sequence length determined; and finally the video memory space corresponding to the sample data is allocated according to the optimization type and the sequence length. In this way, all video memory allocated during recurrent network model training is used in every computation, redundant waste of video memory is avoided, the video memory used in network computation is effectively compressed, and training speed is improved.
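The flow above can be condensed into a short sketch. The following Python is a minimal, illustrative rendering of steps 101-104 under the assumption that each layer object exposes a context_steps attribute describing its operators' time-step dependency; optimization_type and buffer_steps are sketched after the fig. 2 discussion below. None of these names come from the patent.

```python
# Minimal sketch of steps 101-104; all identifiers are illustrative assumptions.
def allocate_video_memory(model, sample_data):
    seq_len = len(sample_data)    # step 103: sequence length of the sample
    plan = {}
    for layer in model.layers:    # step 101: the model's layers
        opt = optimization_type(layer.context_steps)                        # step 102
        plan[layer.name] = buffer_steps(opt, layer.context_steps, seq_len)  # step 104
    return plan                   # layer name -> number of time steps to allocate
```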
To clearly illustrate the above embodiment, in an embodiment of the application, as shown in fig. 2, respectively obtaining the optimization type of each model layer may include the following steps:
Step 201, obtaining the operators of each model layer.
In this embodiment of the application, each model layer may contain multiple operators, and the inputs of those operators may be the same.
Step 202, determining the time-step dependency between each operator and the input data, where the dependency may be that the input of the operator depends on the input data at the current time step; that it depends on the input data at the current time step and the L preceding time steps; or that it depends on the input data at all time steps, L being a positive integer.
It should be noted that, as shown in fig. 3, suppose io0, io1, io2 and io3 are the model layers in the recurrent network model, op0, op1, op2 and op3 are the operators in the corresponding model layers, op_s is the data input into the recurrent network model, and n is the sequence length of the sample data. When a model layer exists only within the current time-step loop, other operators have only a fixed time-step dependency on it; that is, when computing t = x, op2 depends on io2[t-1] and io2[t], and op1 depends on io1[t]. When a model layer is depended on by an operator outside the loop, its data for all time steps must be kept complete, as with op3's read of io3. Here t denotes a time step and x a particular time step.
That is, the input of op2 depends on the input data at the current time step and the one preceding step, the input of op1 depends on the input data at the current time step only, and the input of op3 depends on the input data at all time steps. At each time step, the model layers inside the loop need only the data of a fixed context, and the other storage positions hold invalid data or go unused.
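As a concrete illustration, the fig. 3 dependencies can be written down as data. The encoding below (0 for the current step only, an integer L for the current step plus the L preceding steps, None for all steps) is our assumption for the sketch, not notation from the patent.

```python
# Time-step dependencies of the fig. 3 operators, under the encoding above.
FIG3_DEPENDENCIES = {
    "op1": 0,     # reads io1[t] only
    "op2": 1,     # reads io2[t-1] and io2[t], i.e. L = 1
    "op3": None,  # reads all of io3 once the loop has finished
}
```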
Step 203, respectively obtaining the optimization type of each model layer according to the dependency.
Specifically, after the electronic device obtains the recurrent network model, it may analyze the model to obtain the operators in its model layers, determine the time-step dependency between the inputs of the operators in each model layer and the input data, and then obtain the optimization type of each model layer according to that dependency. Because the inputs of the operators in each model layer are defined in advance by software, the optimization type of each model layer can be obtained accurately from the time-step dependency of the operators' inputs.
To clearly illustrate the above embodiment, in an embodiment of the application, obtaining the optimization type of each model layer according to the dependency may include the following. When the dependency is that the input of the operator depends on the input data at the current time step, the model layer corresponding to the dependency is of a first optimization type; when the input of the operator depends on the input data at the current time step and the L preceding steps, the corresponding model layer is of a second optimization type; and when the input of the operator depends on the input data at all time steps, the corresponding model layer is of a third optimization type.
Specifically, after obtaining the dependency, the electronic device determines the optimization type of each model layer according to the time-step dependency between the inputs of its operators and the input data, using the three rules above. The electronic device then records the determined optimization type of each model layer, so the optimization type can be obtained accurately without changing the training logic of the original recurrent network model.
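A sketch of this classification, continuing the encoding from the previous example; the enum and function names are our assumptions:

```python
from enum import Enum

class OptimizationType(Enum):
    FIRST = 1   # input depends on the current time step only
    SECOND = 2  # input depends on the current step and the L preceding steps
    THIRD = 3   # input depends on all time steps

def optimization_type(context_steps):
    # context_steps: 0, a positive integer L, or None for "all steps".
    if context_steps is None:
        return OptimizationType.THIRD
    return OptimizationType.FIRST if context_steps == 0 else OptimizationType.SECOND
```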
To clearly illustrate the above embodiment, in an embodiment of the application, allocating the video memory space according to the optimization type and the sequence length may include: when the optimization type is the first optimization type, allocating a video memory space of sequence length 1 to the corresponding model layer; when the optimization type is the second optimization type, allocating a video memory space of sequence length L+1 to the corresponding model layer; and when the optimization type is the third optimization type, allocating a video memory space of the full sequence length to the corresponding model layer.
Specifically, after acquiring the time-step dependency of the operators in each model layer and the sequence length of the sample data, the electronic device may generate a video memory allocation policy from the optimization types and the sequence length according to the three rules above, and finally allocate the video memory space corresponding to the sample data according to that policy. The memory accesses of each model layer can thus be arranged according to its optimization type so that only necessary data is retained, which reduces video memory overhead, effectively compresses the video memory used in network computation, and improves training speed.
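The sizing rules, plus one natural way to reuse the short buffers, can be sketched as follows. The ring-buffer indexing in slot is our assumption about an implementation detail; the patent itself only specifies the buffer sizes.

```python
def buffer_steps(opt_type, context_steps, seq_len):
    # Allocation rules from the patent: 1 step, L+1 steps, or the full length n.
    if opt_type is OptimizationType.FIRST:
        return 1
    if opt_type is OptimizationType.SECOND:
        return context_steps + 1   # the current step plus the L preceding steps
    return seq_len                 # THIRD: every time step must be kept

def slot(t, buffer_len):
    # Writing step t at slot t % buffer_len turns the short buffer into a ring:
    # with L = 1 (op2 in fig. 3), steps t-1 and t alternate between two slots.
    return t % buffer_len
```

Under this sketch, with n = 1000 the fig. 3 layers read by op1, op2 and op3 would receive buffers of 1, 2 and 1000 time steps respectively, instead of three full 1000-step buffers.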
Fig. 4 is a schematic block diagram of an apparatus for allocating video memory space during recurrent network model training according to an embodiment of the present disclosure.
The apparatus for allocating video memory space during recurrent network model training can be configured in an electronic device to obtain the optimization type of each model layer in the recurrent network model and the sequence length of the sample data, and to allocate the video memory space corresponding to the sample data according to the optimization type and the sequence length.
The apparatus for allocating video memory space during recurrent network model training according to the embodiments of the application may be applied to the training of recurrent network models, and such models can be applied in many fields, for example in device-side deep learning, such as deep learning for speech recognition devices, deep learning for text recognition devices, and deep learning for automatic driving.
As shown in fig. 4, the apparatus 400 for allocating video memory space during recurrent network model training may include: a first acquisition module 410, a second acquisition module 420, a determination module 430, and an allocation module 440.
The first acquisition module 410 is configured to acquire a recurrent network model, wherein the recurrent network model comprises a plurality of model layers. It should be noted that the plurality of model layers described in this embodiment may include an input layer, a convolutional layer, an activation function layer, a fully connected layer, and the like.
To better describe the application, take a recurrent network model applied in a speech recognition system as an example: when that recurrent network model is trained, the first acquisition module 410 can obtain the recurrent network model of the speech recognition system.
The second acquisition module 420 is configured to respectively obtain the optimization type of each model layer.
In this embodiment of the application, each model layer includes one or more operators, and the optimization type of each model layer can be obtained by determining the time-step dependency of the inputs of the operators in that layer.
Specifically, after the first acquisition module 410 obtains the recurrent network model, the second acquisition module 420 may analyze the model to obtain the operators in its model layers, respectively determine the time-step dependency of the inputs of the operators of each model layer, and obtain (determine) the optimization type corresponding to each model layer according to that dependency.
It should be noted that the optimization types corresponding to the model layers described in this embodiment may be the same or different, and different optimization types correspond to different video memory allocation policies.
The determination module 430 is configured to acquire sample data and determine the sequence length of the sample data.
It should be noted that in this embodiment the sample data may be TCM (Trellis Coded Modulation) data. To better describe the application, suppose the recurrent network model of a speech recognition system is being trained; the sample data may then be obtained by converting preset speech information through a preset algorithm.
In this embodiment of the application, the sample data may be acquired in multiple ways: the determination module 430 may intercept the input data of the recurrent network model through a software program while the model is in use, or it may acquire sample data created manually; this is not limited here.
The allocation module 440 is configured to allocate the video memory space corresponding to the sample data according to the optimization type and the sequence length. It should be noted that the video memory space described in this embodiment may be provided by a graphics card (e.g., a graphics card in a PC or in a mobile device).
Specifically, after the second acquisition module 420 obtains the optimization type of each model layer, the determination module 430 may acquire sample data and analyze it to determine its sequence length; the allocation module 440 may then generate a corresponding video memory allocation policy according to the optimization types and the sequence length, and finally allocate the video memory space corresponding to the sample data according to that policy.
In this embodiment of the application, the recurrent network model comprising a plurality of model layers is acquired through the first acquisition module, the optimization type of each model layer is obtained through the second acquisition module, sample data is acquired and its sequence length determined through the determination module, and the video memory space corresponding to the sample data is allocated by the allocation module according to the optimization type and the sequence length. In this way, all video memory allocated during recurrent network model training is used in every computation, redundant waste of video memory is avoided, the video memory used in network computation is effectively compressed, and training speed is improved.
In another embodiment of the application, as shown in fig. 5, the apparatus 500 for allocating video memory space during recurrent network model training may include: a first acquisition module 510, a second acquisition module 520, a determination module 530, and an allocation module 540.
Specifically, the second acquisition module 520 may include a first acquisition unit 521, a determination unit 522, and a second acquisition unit 523.
The first acquisition unit 521 is configured to obtain the operators of each model layer.
The determination unit 522 is configured to determine the time-step dependency between each operator and the input data, where the dependency includes that the input of the operator depends on the input data at the current time step, on the input data at the current time step and the L preceding steps, or on the input data at all time steps, L being a positive integer.
The second acquisition unit 523 is configured to respectively obtain the optimization type of each model layer according to the dependency: when the input of the operator depends on the input data at the current time step, the corresponding model layer is of the first optimization type; when the input of the operator depends on the input data at the current time step and the L preceding steps, the corresponding model layer is of the second optimization type; and when the input of the operator depends on the input data at all time steps, the corresponding model layer is of the third optimization type.
It should be noted that the first acquisition module 510 and the first acquisition module 410, the second acquisition module 520 and the second acquisition module 420, the determination module 530 and the determination module 430, and the allocation module 540 and the allocation module 440 described in the above embodiments may have the same functions and structures.
In an embodiment of the application, as shown in fig. 5, the allocation module 540 is specifically configured to: when the optimization type is the first optimization type, allocate a video memory space of sequence length 1 to the corresponding model layer; when the optimization type is the second optimization type, allocate a video memory space of sequence length L+1 to the corresponding model layer; and when the optimization type is the third optimization type, allocate a video memory space of the full sequence length to the corresponding model layer.
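Gathering the four modules into one object gives a compact sketch of the apparatus. The module split follows the claims, while the class, its method names, and the context_steps/name attributes on layers are our assumptions, reusing the helpers from the earlier examples.

```python
# Illustrative sketch of the four-module apparatus; not an API from the patent.
class VideoMemoryAllocator:
    def acquire_model(self, model):                # first acquisition module
        self.layers = model.layers

    def acquire_optimization_types(self):          # second acquisition module
        self.opt_types = [optimization_type(l.context_steps) for l in self.layers]

    def determine_sequence_length(self, sample):   # determination module
        self.seq_len = len(sample)

    def allocate(self):                            # allocation module
        return {
            layer.name: buffer_steps(t, layer.context_steps, self.seq_len)
            for layer, t in zip(self.layers, self.opt_types)
        }
```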
It should be noted that the explanation of the foregoing embodiments of the method for allocating video memory space during recurrent network model training also applies to the apparatus of this embodiment, and is not repeated here.
The apparatus for allocating video memory space during recurrent network model training acquires the recurrent network model comprising a plurality of model layers through the first acquisition module, obtains the optimization type of each model layer through the second acquisition module, acquires sample data and determines its sequence length through the determination module, and allocates the video memory space corresponding to the sample data through the allocation module according to the optimization type and the sequence length. In this way, all video memory allocated during recurrent network model training is used in every computation, redundant waste of video memory is avoided, the video memory used in network computation is effectively compressed, and training speed is improved.
According to an embodiment of the present application, an electronic device and a readable storage medium are also provided.
Fig. 6 is a block diagram of an electronic device for the method of allocating video memory space during recurrent network model training according to an embodiment of the application. Electronic devices are intended to represent various forms of digital computers, such as laptops, desktops, workstations, personal digital assistants, servers, blade servers, mainframes, and other appropriate computers. Electronic devices may also represent various forms of mobile devices, such as personal digital processing devices, cellular phones, smart phones, wearable devices, and other similar computing devices. The components shown here, their connections and relationships, and their functions are meant as examples only and are not meant to limit the implementations of the application described and/or claimed herein.
As shown in fig. 6, the electronic device includes: one or more processors 601, a memory 602, and interfaces for connecting the components, including a high-speed interface and a low-speed interface. The components are interconnected by different buses and may be mounted on a common motherboard or in other manners as desired. The processor may process instructions executed within the electronic device, including instructions stored in or on the memory to display graphical information of a GUI on an external input/output apparatus (such as a display device coupled to an interface). In other embodiments, multiple processors and/or multiple buses may be used together with multiple memories, as desired. Likewise, multiple electronic devices may be connected, with each device providing part of the necessary operations (e.g., as a server array, a group of blade servers, or a multiprocessor system). In fig. 6, one processor 601 is taken as an example.
The memory 602 is a non-transitory computer-readable storage medium as provided herein. The memory stores instructions executable by at least one processor, so that the at least one processor executes the method for allocating video memory space during recurrent network model training provided by the application. The non-transitory computer-readable storage medium of the application stores computer instructions for causing a computer to perform that method.
As a non-transitory computer-readable storage medium, the memory 602 can be used to store non-transitory software programs, non-transitory computer-executable programs, and modules, such as the program instructions/modules corresponding to the method for allocating video memory space during recurrent network model training in the embodiments of the application (for example, the allocation apparatus 400 shown in fig. 4, which may include the first acquisition module 410, the second acquisition module 420, the determination module 430, and the allocation module 440). By running the non-transitory software programs, instructions, and modules stored in the memory 602, the processor 601 executes the various functional applications and data processing of the server, that is, implements the method for allocating video memory space during recurrent network model training of the above method embodiments.
The memory 602 may include a program storage area and a data storage area, where the program storage area may store an operating system and the application programs required by at least one function, and the data storage area may store data created by the use of the electronic device according to the method for allocating video memory space during recurrent network model training, and the like. Further, the memory 602 may include high-speed random access memory and may also include non-transitory memory, such as at least one magnetic disk storage device, a flash memory device, or another non-transitory solid-state storage device. In some embodiments, the memory 602 may optionally include memories remotely located relative to the processor 601, and these remote memories may be connected over a network to the electronic device performing the method. Examples of such networks include, but are not limited to, the internet, intranets, local area networks, mobile communication networks, and combinations thereof.
The electronic device for the method of allocating video memory space during recurrent network model training may further include: an input device 603 and an output device 604. The processor 601, the memory 602, the input device 603, and the output device 604 may be connected by a bus or in other ways; in fig. 6, connection by a bus is taken as an example.
The input device 603 may receive input numeric or character information and generate key signal inputs related to user settings and function control of the electronic device, and may be, for example, a touch screen, a keypad, a mouse, a track pad, a touch pad, a pointing stick, one or more mouse buttons, a trackball, or a joystick. The output device 604 may include a display device, auxiliary lighting devices (e.g., LEDs), tactile feedback devices (e.g., vibration motors), and the like. The display device may include, but is not limited to, a liquid crystal display (LCD), a light-emitting diode (LED) display, and a plasma display. In some implementations, the display device can be a touch screen.
Various implementations of the systems and techniques described here can be realized in digital electronic circuitry, integrated circuitry, application-specific integrated circuits (ASICs), computer hardware, firmware, software, and/or combinations thereof. These various embodiments may include implementation in one or more computer programs that are executable and/or interpretable on a programmable system including at least one programmable processor, which may be special-purpose or general-purpose, receiving data and instructions from, and transmitting data and instructions to, a storage system, at least one input device, and at least one output device.
These computer programs (also known as programs, software applications, or code) include machine instructions for a programmable processor, and may be implemented using high-level procedural and/or object-oriented programming languages, and/or assembly/machine languages. As used herein, the terms "machine-readable medium" and "computer-readable medium" refer to any computer program product, apparatus, and/or device (e.g., magnetic discs, optical disks, memory, Programmable Logic Devices (PLDs)) used to provide machine instructions and/or data to a programmable processor, including a machine-readable medium that receives machine instructions as a machine-readable signal. The term "machine-readable signal" refers to any signal used to provide machine instructions and/or data to a programmable processor.
To provide for interaction with a user, the systems and techniques described here can be implemented on a computer having: a display device (e.g., a CRT (cathode ray tube) or LCD (liquid crystal display) monitor) for displaying information to a user; and a keyboard and a pointing device (e.g., a mouse or a trackball) by which a user can provide input to the computer. Other kinds of devices may also be used to provide for interaction with a user; for example, feedback provided to the user can be any form of sensory feedback (e.g., visual feedback, auditory feedback, or tactile feedback); and input from the user may be received in any form, including acoustic, speech, or tactile input.
The systems and techniques described here can be implemented in a computing system that includes a back-end component (e.g., as a data server), or that includes a middleware component (e.g., an application server), or that includes a front-end component (e.g., a user computer having a graphical user interface or a web browser through which a user can interact with an implementation of the systems and techniques described here), or any combination of such back-end, middleware, or front-end components. The components of the system can be interconnected by any form or medium of digital data communication (e.g., a communication network). Examples of communication networks include: local Area Networks (LANs), Wide Area Networks (WANs), and the Internet.
The computer system may include clients and servers. A client and server are generally remote from each other and typically interact through a communication network. The relationship of client and server arises by virtue of computer programs running on the respective computers and having a client-server relationship to each other.
According to the technical scheme of the embodiments of the application, all video memory allocated during recurrent network model training is guaranteed to be used in every computation, redundant waste of video memory is avoided, and the video memory used in network computation is effectively compressed, thereby improving training speed.
It should be understood that the various forms of flow shown above may be used, with steps reordered, added, or deleted. For example, the steps described in the application may be executed in parallel, sequentially, or in different orders; this is not limited here, so long as the desired results of the technical solutions disclosed in the application can be achieved.
The above-described embodiments should not be construed as limiting the scope of the present application. It should be understood by those skilled in the art that various modifications, combinations, sub-combinations and substitutions may be made in accordance with design requirements and other factors. Any modification, equivalent replacement, and improvement made within the spirit and principle of the present application shall be included in the protection scope of the present application.

Claims (10)

1. A method for allocating video memory space during recurrent network model training, comprising:
acquiring a recurrent network model, wherein the recurrent network model comprises a plurality of model layers;
respectively obtaining the optimization type of each model layer;
acquiring sample data and determining the sequence length of the sample data; and
allocating the video memory space corresponding to the sample data according to the optimization type and the sequence length.
2. The method for allocating video memory space during recurrent network model training according to claim 1, wherein respectively obtaining the optimization type of each model layer comprises:
acquiring the operators of each model layer;
determining the time-step dependency between the operators and the input data, wherein the dependency comprises that the input of the operator depends on the input data at the current time step, that it depends on the input data at the current time step and the L preceding time steps, and that it depends on the input data at all time steps, L being a positive integer; and
respectively obtaining the optimization type of each model layer according to the dependency.
3. The method for allocating video memory space during recurrent network model training according to claim 2, wherein obtaining the optimization type of each model layer according to the dependency comprises:
when the dependency is that the input of the operator depends on the input data at the current time step, the model layer corresponding to the dependency is of a first optimization type;
when the dependency is that the input of the operator depends on the input data at the current time step and the L preceding time steps, the model layer corresponding to the dependency is of a second optimization type; and
when the dependency is that the input of the operator depends on the input data at all time steps, the model layer corresponding to the dependency is of a third optimization type.
4. The method for allocating video memory space during recurrent network model training according to claim 3, wherein allocating the video memory space corresponding to the sample data according to the optimization type and the sequence length comprises:
when the optimization type is the first optimization type, allocating a video memory space of sequence length 1 to the model layer corresponding to the optimization type;
when the optimization type is the second optimization type, allocating a video memory space of sequence length L+1 to the model layer corresponding to the optimization type; and
when the optimization type is the third optimization type, allocating a video memory space of the full sequence length to the model layer corresponding to the optimization type.
5. An apparatus for allocating video memory space during recurrent network model training, comprising:
a first acquisition module, configured to acquire a recurrent network model, wherein the recurrent network model comprises a plurality of model layers;
a second acquisition module, configured to respectively obtain the optimization type of each model layer;
a determination module, configured to acquire sample data and determine the sequence length of the sample data; and
an allocation module, configured to allocate the video memory space corresponding to the sample data according to the optimization type and the sequence length.
6. The apparatus for allocating video memory space during recurrent network model training according to claim 5, wherein the second acquisition module comprises:
a first acquisition unit, configured to obtain the operators of each model layer;
a determination unit, configured to determine the time-step dependency between the operators and the input data, wherein the dependency comprises that the input of the operator depends on the input data at the current time step, that it depends on the input data at the current time step and the L preceding time steps, and that it depends on the input data at all time steps, L being a positive integer; and
a second acquisition unit, configured to respectively obtain the optimization type of each model layer according to the dependency.
7. The apparatus for allocating video memory space during recurrent network model training according to claim 6, wherein:
when the dependency is that the input of the operator depends on the input data at the current time step, the model layer corresponding to the dependency is of a first optimization type;
when the dependency is that the input of the operator depends on the input data at the current time step and the L preceding time steps, the model layer corresponding to the dependency is of a second optimization type; and
when the dependency is that the input of the operator depends on the input data at all time steps, the model layer corresponding to the dependency is of a third optimization type.
8. The apparatus for allocating video memory space during recurrent network model training according to claim 7, wherein the allocation module is specifically configured to:
when the optimization type is the first optimization type, allocate a video memory space of sequence length 1 to the model layer corresponding to the optimization type;
when the optimization type is the second optimization type, allocate a video memory space of sequence length L+1 to the model layer corresponding to the optimization type; and
when the optimization type is the third optimization type, allocate a video memory space of the full sequence length to the model layer corresponding to the optimization type.
9. An electronic device, comprising:
at least one processor; and
a memory communicatively coupled to the at least one processor; wherein,
the memory stores instructions executable by the at least one processor to enable the at least one processor to perform the method for allocating video memory space during recurrent network model training of any one of claims 1-4.
10. A non-transitory computer-readable storage medium storing computer instructions for causing a computer to perform the method for allocating video memory space during recurrent network model training according to any one of claims 1-4.
CN202011186142.3A 2020-10-29 2020-10-29 Method and device for allocating video memory space during recurrent network model training Active CN112329834B

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202011186142.3A CN112329834B (en) 2020-10-29 2020-10-29 Method and device for allocating video memory space during recurrent network model training


Publications (2)

Publication Number Publication Date
CN112329834A 2021-02-05
CN112329834B 2023-08-01

Family

ID=74297510

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202011186142.3A Active CN112329834B (en) 2020-10-29 2020-10-29 Method and device for distributing video memory space during training of cyclic network model

Country Status (1)

Country Link
CN (1) CN112329834B (en)

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20170161604A1 (en) * 2015-12-02 2017-06-08 Google Inc. Determining Orders of Execution of a Neural Network
CN110688327A (en) * 2019-09-30 2020-01-14 百度在线网络技术(北京)有限公司 Video memory management method and device, electronic equipment and computer readable storage medium
CN110727462A (en) * 2018-07-16 2020-01-24 上海寒武纪信息科技有限公司 Data processor and data processing method
CN111209116A (en) * 2020-01-06 2020-05-29 西安芯瞳半导体技术有限公司 Method and device for distributing video memory space and computer storage medium
CN111767146A (en) * 2020-06-24 2020-10-13 杭州电子科技大学 Distributed machine learning system acceleration method based on network reconfiguration
CN111767995A (en) * 2019-04-02 2020-10-13 上海寒武纪信息科技有限公司 Operation method, device and related product


Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
BOJIAN ZHENG et al.: "Echo: Compiler-based GPU Memory Footprint Reduction for LSTM RNN Training", 2020 ACM/IEEE 47th Annual International Symposium on Computer Architecture *
马玮良; 彭轩; 熊倩; 石宣化; 金海: "A Survey of Memory Management in Deep Learning" (深度学习中的内存管理问题研究综述), Big Data (大数据), no. 04

Also Published As

Publication number Publication date
CN112329834B (en) 2023-08-01


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant