CN112329834B - Method and device for distributing video memory space during training of cyclic network model - Google Patents

Method and device for distributing video memory space during training of cyclic network model

Info

Publication number
CN112329834B
CN112329834B
Authority
CN
China
Prior art keywords
optimization type
memory space
video memory
network model
model
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202011186142.3A
Other languages
Chinese (zh)
Other versions
CN112329834A (en)
Inventor
徐扬凯
王桂彬
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Baidu Netcom Science and Technology Co Ltd
Original Assignee
Beijing Baidu Netcom Science and Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing Baidu Netcom Science and Technology Co Ltd filed Critical Beijing Baidu Netcom Science and Technology Co Ltd
Priority to CN202011186142.3A
Publication of CN112329834A
Application granted
Publication of CN112329834B

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/21Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/214Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • G06N3/045Combinations of networks
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/08Learning methods
    • YGENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02TCLIMATE CHANGE MITIGATION TECHNOLOGIES RELATED TO TRANSPORTATION
    • Y02T10/00Road transport of goods or passengers
    • Y02T10/10Internal combustion engine [ICE] based vehicles
    • Y02T10/40Engine management systems

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Artificial Intelligence (AREA)
  • General Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • Evolutionary Computation (AREA)
  • Biophysics (AREA)
  • Computational Linguistics (AREA)
  • Software Systems (AREA)
  • Mathematical Physics (AREA)
  • Health & Medical Sciences (AREA)
  • Biomedical Technology (AREA)
  • Computing Systems (AREA)
  • Molecular Biology (AREA)
  • General Health & Medical Sciences (AREA)
  • Evolutionary Biology (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Two-Way Televisions, Distribution Of Moving Picture Or The Like (AREA)

Abstract

The application discloses a method and a device for allocating video memory space during training of a cyclic network model, and relates to the field of computer technology, in particular to artificial intelligence fields such as computer vision and deep learning. The specific implementation scheme is as follows: acquiring a cyclic network model, wherein the cyclic network model comprises a plurality of model layers; respectively obtaining the optimization type of each model layer; acquiring sample data and determining the sequence length of the sample data; and allocating the video memory space corresponding to the sample data according to the optimization type and the sequence length. The method ensures that the video memory allocated during training of the cyclic network model is used in every calculation, avoids redundant waste of video memory, and effectively compresses the video memory used in network computation, thereby improving the training speed.

Description

Method and device for distributing video memory space during training of cyclic network model
Technical Field
The application relates to the field of computer technology, in particular to artificial intelligence fields such as computer vision and deep learning, and especially to a method and a device for allocating video memory space during training of a cyclic network model.
Background
With the continuous maturation and popularization of artificial intelligence technology, its applications have gradually permeated every sector of modern society. Voice assistants in speech recognition, face recognition in computer vision applications, machine translation in natural language processing, and the like all bring convenience to daily life, and all rely on deep learning technology. Deep learning continuously trains multi-layer neural networks on large-scale data, extracting data features of ever higher abstraction layer by layer, which facilitates solving complex problems. The graphics processing unit (GPU) is the computing unit on which current deep learning platforms mainly rely, and training deep learning models on GPUs underpins deep learning techniques.
At present, in the actual model training process, a deep learning computing framework needs to reduce video memory overhead as much as possible, so as to use a larger batch size and improve training efficiency.
Disclosure of Invention
The application provides a method and a device for distributing a video memory space during training of a cyclic network model.
According to one aspect of the present application, there is provided a method for allocating video memory space during training of a cyclic network model, including:
acquiring a cyclic network model, wherein the cyclic network model comprises a plurality of model layers;
respectively obtaining the optimization type of each model layer;
acquiring sample data and determining the sequence length of the sample data; and
allocating the video memory space corresponding to the sample data according to the optimization type and the sequence length.
According to another aspect of the present application, there is provided a device for allocating a video memory space during training of a cyclic network model, including:
the system comprises a first acquisition module, a second acquisition module and a third acquisition module, wherein the first acquisition module is used for acquiring a cyclic network model, and the cyclic network model comprises a plurality of model layers;
the second acquisition module is used for respectively acquiring the optimization type of each model layer;
the determining module is used for acquiring sample data and determining the sequence length of the sample data; and
and the distribution module is used for distributing the video memory space corresponding to the sample data according to the optimization type and the sequence length.
According to another aspect of the present application, there is provided an electronic device including:
at least one processor; and
a memory communicatively coupled to the at least one processor; wherein
the memory stores instructions executable by the at least one processor to enable the at least one processor to perform the method for allocating video memory space during cyclic network model training described in the embodiment of the above aspect.
According to another aspect of the present application, there is provided a non-transitory computer readable storage medium storing computer instructions for causing a computer to execute the method for allocating video memory space during training of a cyclic network model according to the embodiment of the above aspect.
According to another aspect of the present application, there is provided a computer program product, including a computer program, where the computer program when executed by a processor implements the method for allocating a video memory space during training of a cyclic network model according to the embodiment of the foregoing aspect.
It should be understood that the description in this section is not intended to identify key or critical features of the embodiments of the disclosure, nor is it intended to be used to limit the scope of the disclosure. Other features of the present disclosure will become apparent from the following specification.
Drawings
The drawings are for better understanding of the present solution and do not constitute a limitation of the present application. Wherein:
FIG. 1 is a flow chart of a method for allocating video memory space during training of a cyclic network model according to an embodiment of the present application;
FIG. 2 is a flow chart of another method for allocating video memory space during training of a cyclic network model according to an embodiment of the present application;
FIG. 3 is a training schematic diagram of a cyclic network model provided in an embodiment of the present application;
FIG. 4 is a schematic block diagram of a device for allocating video memory space during training of a cyclic network model according to an embodiment of the present application;
FIG. 5 is a schematic block diagram of a device for allocating video memory space during training of another cyclic network model according to an embodiment of the present application; and
fig. 6 is a block diagram of an electronic device for a method of allocating video memory space during training of a cyclic network model according to an embodiment of the present application.
Detailed Description
Exemplary embodiments of the present application are described below in conjunction with the accompanying drawings, which include various details of the embodiments of the present application to facilitate understanding, and should be considered as merely exemplary. Accordingly, one of ordinary skill in the art will recognize that various changes and modifications of the embodiments described herein can be made without departing from the scope and spirit of the present application. Also, descriptions of well-known functions and constructions are omitted in the following description for clarity and conciseness.
The following describes a method, a device, an electronic device and a storage medium for allocating a video memory space during training of a cyclic network model in the embodiment of the application with reference to the accompanying drawings.
Artificial intelligence is the discipline that studies how to use computers to simulate certain human thought processes and intelligent behaviors (for example, learning, reasoning, thinking, and planning); it spans both hardware-level and software-level technologies. Artificial intelligence hardware technologies generally include technologies such as sensors, dedicated artificial intelligence chips, cloud computing, distributed storage, and big data processing; artificial intelligence software technologies include computer vision technology, speech recognition technology, natural language processing technology, deep learning technology, big data processing technology, knowledge graph technology, and the like.
Computer vision is the science of studying how to make machines "see"; more specifically, it replaces human eyes with cameras and computers to identify, track, and measure targets, and further performs graphics processing so that the result is an image more suitable for human observation or for transmission to an instrument for detection. As a scientific discipline, computer vision studies the theory and technology needed to build artificial intelligence systems that can obtain "information" from images or multidimensional data. The information referred to here is Shannon-defined information that can be used to assist in making a "decision". Because perception can be seen as the extraction of information from sensory signals, computer vision can also be seen as the science of how to make an artificial system "perceive" from images or multi-dimensional data.
Deep learning is a new research direction in the field of machine learning. Deep learning learns the inherent regularities and representation hierarchies of sample data, and the information obtained during such learning is helpful in interpreting data such as text, images, and sounds. Its ultimate goal is to give machines human-like analytical learning capabilities, able to recognize text, image, and sound data. Deep learning is a complex machine learning algorithm that achieves results in speech and image recognition far beyond those of earlier related techniques.
In the related art, when training a cyclic network model, video memory for the input and output of each operator in the loop is applied for in advance for the full sequence, and computed data is filled into the storage positions of the corresponding time steps as the loop proceeds. However, at each time step, a model layer inside the loop (for example, an activation layer or a fully connected layer) needs only the data of a fixed context; the other storage positions hold invalid data or remain unused, producing redundant waste of video memory. Aiming at this technical problem, the embodiments of the application provide a method for allocating video memory space during training of a cyclic network model.
According to the method for allocating video memory space during training of a cyclic network model provided by the embodiments of the application, the optimization type of each model layer in the cyclic network model and the sequence length of the sample data are obtained, and the video memory space corresponding to the sample data is allocated according to the optimization type and the sequence length. This solves the problems in the related art and, at the same time, effectively compresses the video memory used in network computation, thereby improving the training speed.
The method for allocating video memory space during training of a cyclic network model provided by the embodiments of the application may be executed by an electronic device, where the electronic device may be a PC (Personal Computer), a tablet computer, a mobile phone, a palmtop computer, or the like, which is not limited in any way here.
In an embodiment of the application, the electronic device may be provided with a processing component, a storage component and a driving component. Optionally, the driving component and the processing component may be integrally provided, the storage component may store an operating system, an application program or other program modules, and the processing component implements the method for allocating video memory space during training of the cyclic network model provided in the embodiment of the present application by executing the application program stored in the storage component.
Fig. 1 is a flow chart of a method for allocating video memory space during training of a cyclic network model according to an embodiment of the present application.
The method for distributing the video memory space during the training of the cyclic network model can be further executed by the device for distributing the video memory space during the training of the cyclic network model, which is provided by the embodiment of the application, and the device can be configured in the electronic equipment to achieve the purposes of obtaining the optimization type of each model layer in the cyclic network model and the sequence length of sample data, and distributing the video memory space corresponding to the sample data according to the optimization type and the sequence length.
As a possible case, the method for allocating the video memory space during the training of the cyclic network model in the embodiment of the present application may also be executed at a server, where the server may be a cloud server, and the method for allocating the video memory space during the training of the cyclic network model may be executed at the cloud.
It should be noted that the method for allocating video memory space during training of a cyclic network model according to the embodiments of the present application may be applied to training cyclic network models in many fields, for example deep learning for devices, such as deep learning for speech recognition devices, deep learning for text recognition devices, deep learning for automatic driving, and the like.
As shown in fig. 1, the method for allocating the video memory space during the training of the cyclic network model may include the following steps:
step 101, a cyclic network model is obtained, wherein the cyclic network model comprises a plurality of model layers. It should be noted that the plurality of model layers described in this embodiment may include an input layer, a convolution layer, an activation function layer, a full connection layer, and the like.
In an embodiment of the present application, the electronic device may obtain the cyclic network model.
To better describe the application, take a cyclic network model used in a speech recognition system as an example: when training the cyclic network model in the speech recognition system, the electronic device may first acquire that cyclic network model.
Step 102, respectively acquiring the optimization type of each model layer.
In this embodiment of the application, each model layer includes an operator (there may be a plurality of operators per layer), and the optimization type of each model layer may be obtained by determining the time-step dependency relationship of the input of the operator in that model layer.
Specifically, after the electronic device obtains the cyclic network model, it may first parse the cyclic network model to obtain the operators in the plurality of model layers, determine the time-step dependency relationship of the inputs of the operators in each model layer, and then determine the optimization type corresponding to each model layer according to that dependency relationship.
It should be noted that, the optimization types corresponding to each model layer described in this embodiment may be the same or different, where different optimization types may correspond to different video memory space allocation policies.
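It should be noted that the embodiments above do not prescribe a concrete data structure for this classification. The following Python sketch is merely illustrative, under the assumption that each model layer declares the relative time-step offsets its operator reads; the `ModelLayer` structure, the `read_offsets` field, and the function names are hypothetical and introduced only for illustration:

```python
# Illustrative sketch of step 102 (not the actual implementation of the embodiments):
# classify each model layer by the time-step dependency of its operator input.
from dataclasses import dataclass
from enum import Enum

class OptimizationType(Enum):
    CURRENT_ONLY = 1      # input depends only on the current time step
    CURRENT_AND_PREV = 2  # input depends on the current and previous L steps
    ALL_STEPS = 3         # input depends on the input data at all time steps

@dataclass
class ModelLayer:
    name: str
    # Hypothetical field: offsets of the time steps the operator reads,
    # relative to the current step t, e.g. [0] or [-1, 0];
    # None means the output is read outside the loop, so all steps are needed.
    read_offsets: list | None

def optimization_type(layer: ModelLayer) -> OptimizationType:
    """Derive a layer's optimization type from its declared dependency."""
    if layer.read_offsets is None:
        return OptimizationType.ALL_STEPS
    if layer.read_offsets == [0]:
        return OptimizationType.CURRENT_ONLY
    return OptimizationType.CURRENT_AND_PREV
```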
Step 103, obtaining sample data and determining the sequence length of the sample data.
It should be noted that, in this embodiment, the sample data may be TCM (Trellis Coded Modulation) data. To better describe the application, assume that the cyclic network model in a speech recognition system is being trained; the sample data may then be obtained by converting preset speech information through a preset algorithm.
In the embodiment of the application, there are multiple ways to obtain the sample data: the electronic device may obtain it by intercepting, through a software program, the input data of the cyclic network model while the model is being applied, or the sample data may be created manually, which is not limited in any way here.
Step 104, allocating the video memory space corresponding to the sample data according to the optimization type and the sequence length. It should be noted that the video memory space described in this embodiment may be provided by a video card (for example, a video card in a PC or in a mobile device).
Specifically, after the electronic device obtains the optimization type of each model layer, the electronic device may obtain sample data first, analyze the sample data to determine the sequence length of the sample data, generate a corresponding allocation policy of the video memory space according to the optimization type and the sequence length, and allocate the video memory space corresponding to the sample data according to the allocation policy of the video memory space.
In the embodiment of the application, a cyclic network model is first obtained, wherein the cyclic network model comprises a plurality of model layers; the optimization type of each model layer is obtained respectively; sample data is then obtained and its sequence length determined; and finally the video memory space corresponding to the sample data is allocated according to the optimization type and the sequence length. Therefore, the video memory allocated during training of the cyclic network model is guaranteed to be used in every calculation, redundant waste of video memory is avoided, and the video memory used in network computation is effectively compressed, thereby improving the training speed.
To clearly illustrate the above embodiment, in one embodiment of the present application, as shown in fig. 2, the method for obtaining the optimization type of each model layer separately may include the following steps:
in step 201, operators for each model layer are obtained.
In embodiments of the present application, multiple operators may be used in each model layer, and the inputs of the multiple operators may be the same.
Step 202, determining a dependency relationship between the operator and the input data in time sequence, wherein the dependency relationship may include that the input of the operator depends on the input data at the current time, that the input of the operator depends on the input data at the current time and the previous L times, and that the input of the operator depends on the input data at all times, and L is a positive integer.
It should be noted that, as shown in fig. 3, assume that io0, io1, io2, and io3 are the model layers in the cyclic network model, op0, op1, op2, and op3 are the operators in the corresponding model layers, op_s is the data input into the cyclic network model, and n is the sequence length of the sample data. When a model layer is used only inside the current loop iteration, its operators have only fixed time-step dependencies: when computing t = x, op2 depends on io2[t-1] and io2[t], and op1 depends on io1[t]. When a model layer is depended on by an operator outside the loop, the complete time-series data must be kept, i.e., op3 reads the whole of io3. Here t denotes time and x a particular time step.
That is, the input of op2 depends on the input data at the current time and the previous 1 time, the input of op1 depends on the input data at the current time, and the input of op3 depends on the input data at all times. At each time step, a model layer inside the loop needs only the data of a fixed context; the remaining storage positions would hold invalid data or go unused.
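One natural realization of a fixed context of L previous steps, not stated verbatim in the embodiments above, is to reuse a buffer of L + 1 slots cyclically, indexing by t mod (L + 1). The sketch below illustrates this for the op2/io2 case, where L = 1:

```python
# Illustrative only: slot reuse for a layer whose operator reads io[t-1] and io[t].
L = 1            # op2 looks back one step, so L = 1
slots = L + 1    # two slots cover the whole sequence

def slot(t: int) -> int:
    """Buffer slot holding the data of time step t."""
    return t % slots

# For a sequence of length n = 5 the slots cycle 0, 1, 0, 1, 0: when step t
# is written, the slot being overwritten held step t - 2, which by then is
# no longer read by any operator inside the loop.
print([slot(t) for t in range(5)])  # -> [0, 1, 0, 1, 0]
```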
Step 203, respectively acquiring the optimization type of each model layer according to the dependency relationship.
Specifically, after the electronic device obtains the cyclic network model, it may first parse the cyclic network model to obtain the operators in the plurality of model layers, determine the time-step dependency relationship between the input of the operators in each model layer and the input data, and then obtain the optimization type of each model layer according to that dependency relationship. Because the inputs of the operators in each model layer are specified in advance in software, the optimization type of each model layer can be accurately obtained from the time-step dependency of the operator inputs.
To clearly illustrate the above embodiment, in one embodiment of the present application, obtaining the optimization type of each model layer according to the dependency relationship may include: when the dependency relationship is that the input of the operator depends on the input data at the current moment, the model layer corresponding to the dependency relationship is of a first optimization type; when the dependency relationship is that the input of the operator depends on the input data of the current moment and the previous L moments, the model layer corresponding to the dependency relationship is of a second optimization type; and when the dependency relationship is that the input of the operator depends on the input data at all moments, the model layer corresponding to the dependency relationship is of a third optimization type.
Specifically, after the electronic device obtains the dependency relationship, it determines the optimization type of each model layer according to the time-step dependency relationship between the input of the operator and the input data in that model layer: when the dependency relationship is that the input of the operator depends on the input data at the current moment, the model layer corresponding to the dependency relationship is of the first optimization type; when the dependency relationship is that the input of the operator depends on the input data of the current moment and the previous L moments, the model layer corresponding to the dependency relationship is of the second optimization type; and when the dependency relationship is that the input of the operator depends on the input data at all moments, the model layer corresponding to the dependency relationship is of the third optimization type. The electronic device then records the determined optimization type of each model layer, so that the optimization type of each model layer can be accurately obtained without changing the training logic of the original cyclic network model.
In order to clearly illustrate the above embodiment, in one embodiment of the present application, allocating the video memory space according to the optimization type and the sequence length may include: when the optimization type is the first optimization type, allocating a video memory space of sequence length 1 for the model layer corresponding to the optimization type; when the optimization type is the second optimization type, allocating a video memory space of sequence length L+1 for the model layer corresponding to the optimization type; and when the optimization type is the third optimization type, allocating a video memory space of the full sequence length for the model layer corresponding to the optimization type.
Specifically, after the electronic device obtains the time-step dependency relationship between the input of the operators in each model layer and the input data, together with the sequence length of the sample data, it may generate a video memory space allocation policy according to the optimization type and the sequence length, where the policy may include: when the optimization type is the first optimization type, allocating a video memory space of sequence length 1 for the model layer corresponding to the optimization type; when the optimization type is the second optimization type, allocating a video memory space of sequence length L+1 for the model layer corresponding to the optimization type; and when the optimization type is the third optimization type, allocating a video memory space of the full sequence length for the model layer corresponding to the optimization type. Finally, the electronic device allocates the video memory space corresponding to the sample data according to this policy, so that the accesses of each model layer can be optimized according to its optimization type, only necessary data is retained, the video memory overhead is reduced, and the video memory used in network computation is effectively compressed, thereby improving the training speed.
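Continuing the hypothetical `ModelLayer`/`optimization_type` sketch above, the allocation policy just described can be expressed compactly; the example mirrors FIG. 3 with an assumed sequence length of n = 100, and the helper names are again illustrative rather than the actual implementation:

```python
def allocate_slots(layers: list, sequence_length: int) -> dict:
    """Per-layer number of time-step slots, per the policy above: 1 for the
    first optimization type, L + 1 for the second, the full length for the third."""
    sizes = {}
    for layer in layers:
        kind = optimization_type(layer)
        if kind is OptimizationType.CURRENT_ONLY:
            sizes[layer.name] = 1
        elif kind is OptimizationType.CURRENT_AND_PREV:
            look_back = -min(layer.read_offsets)  # L, the deepest look-back
            sizes[layer.name] = look_back + 1
        else:  # ALL_STEPS: the layer's output is read outside the loop
            sizes[layer.name] = sequence_length
    return sizes

layers = [
    ModelLayer("io1", [0]),      # op1 reads only io1[t]
    ModelLayer("io2", [-1, 0]),  # op2 reads io2[t-1] and io2[t]
    ModelLayer("io3", None),     # op3 reads io3 outside the loop
]
print(allocate_slots(layers, sequence_length=100))
# -> {'io1': 1, 'io2': 2, 'io3': 100}
```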
Fig. 4 is a schematic block diagram of a device for allocating video memory space during training of a cyclic network model according to an embodiment of the present application.
The device for allocating video memory space during training of a cyclic network model can be configured in the electronic equipment to obtain the optimization type of each model layer in the cyclic network model and the sequence length of the sample data, and to allocate the video memory space corresponding to the sample data according to the optimization type and the sequence length.
It should be noted that the device for allocating video memory space during training of a cyclic network model according to the embodiments of the present application may be applied to training cyclic network models in many fields, for example deep learning for devices, such as deep learning for speech recognition devices, deep learning for text recognition devices, deep learning for automatic driving, and the like.
As shown in fig. 4, the device 400 for allocating the video memory space during the training of the cyclic network model may include: a first acquisition module 410, a second acquisition module 420, a determination module 430, and an allocation module 440.
The first obtaining module 410 is configured to obtain a cyclic network model, where the cyclic network model includes a plurality of model layers. It should be noted that the plurality of model layers described in this embodiment may include an input layer, a convolution layer, an activation function layer, a fully connected layer, and the like.
To better describe the present application, again taking a cyclic network model used in a speech recognition system as an example, the first acquisition module 410 may acquire the cyclic network model in the speech recognition system when that model is to be trained.
The second obtaining module 420 is configured to obtain an optimization type of each model layer.
In this embodiment of the application, each model layer includes an operator (there may be a plurality of operators per layer), and the optimization type of each model layer may be obtained by determining the time-step dependency relationship of the input of the operator in that model layer.
Specifically, after the first obtaining module 410 obtains the cyclic network model, the second obtaining module 420 may parse the cyclic network model to obtain the operators in the plurality of model layers, determine the time-step dependency relationship of the inputs of the operators in each model layer, and then determine the optimization type corresponding to each model layer according to that dependency relationship.
It should be noted that, the optimization types corresponding to each model layer described in this embodiment may be the same or different, where different optimization types may correspond to different video memory space allocation policies.
The determining module 430 is configured to obtain sample data and determine a sequence length of the sample data.
It should be noted that, in this embodiment, the sample data may be TCM (Trellis Coded Modulation) data. To better describe the application, assume that the cyclic network model in a speech recognition system is being trained; the sample data may then be obtained by converting preset speech information through a preset algorithm.
In the embodiment of the application, there are multiple ways to obtain the sample data: the determining module 430 may obtain it by intercepting, through a software program, the input data of the cyclic network model while the model is being applied, or the determining module 430 may obtain sample data created manually, which is not limited in any way here.
The allocation module 440 is configured to allocate the video memory space corresponding to the sample data according to the optimization type and the sequence length. It should be noted that the video memory space described in this embodiment may be provided by a video card (for example, a video card in a PC or in a mobile device).
Specifically, after the second obtaining module 420 obtains the optimization type of each model layer, the determining module 430 may obtain the sample data first, analyze the sample data to determine the sequence length of the sample data, and then the allocating module 440 may generate the allocation policy of the corresponding video memory space according to the optimization type and the sequence length, and finally allocate the video memory space corresponding to the sample data according to the allocation policy of the video memory space.
In the embodiment of the application, a cyclic network model is acquired through the first acquisition module, wherein the cyclic network model comprises a plurality of model layers; the optimization type of each model layer is acquired through the second acquisition module; sample data is acquired and its sequence length determined through the determining module; and the video memory space corresponding to the sample data is allocated through the allocation module according to the optimization type and the sequence length. Therefore, the video memory allocated during training of the cyclic network model is guaranteed to be used in every calculation, redundant waste of video memory is avoided, and the video memory used in network computation is effectively compressed, thereby improving the training speed.
In another embodiment of the present application, as shown in fig. 5, a device 500 for allocating video memory space during training of a cyclic network model may include: a first acquisition module 510, a second acquisition module 520, a determination module 530, and an allocation module 540.
Specifically, the second acquisition module 520 may include a first acquisition unit 521, a determination unit 522, and a second acquisition unit 523.
Wherein the first obtaining unit 521 is configured to obtain an operator of each model layer.
The determining unit 522 is configured to determine the dependency relationship between the operator and the input data in time sequence, wherein the dependency relationship includes: the input of the operator depends on the input data at the current time; the input of the operator depends on the input data at the current time and the previous L times; and the input of the operator depends on the input data at all times, where L is a positive integer.
The second obtaining unit 523 is configured to obtain an optimization type of each model layer according to a dependency relationship, where, when the dependency relationship is that the input of the operator depends on the input data at the current moment, the model layer corresponding to the dependency relationship is a first optimization type; when the dependency relationship is that the input of the operator depends on the input data of the current moment and the previous L moments, the model layer corresponding to the dependency relationship is of a second optimization type; and when the dependency relationship is that the input of the operator depends on the input data at all moments, the model layer corresponding to the dependency relationship is of a third optimization type.
It should be noted that the first acquiring module 510 and the first acquiring module 410, the second acquiring module 520 and the second acquiring module 420, the determining module 530 and the determining module 430, and the distributing module 540 and the distributing module 440 described in the above embodiments may have the same functions and structures.
In one embodiment of the present application, as shown in fig. 5, the allocation module 540 is specifically configured to: when the optimization type is the first optimization type, allocate a video memory space of sequence length 1 for the model layer corresponding to the optimization type; when the optimization type is the second optimization type, allocate a video memory space of sequence length L+1 for the model layer corresponding to the optimization type; and when the optimization type is the third optimization type, allocate a video memory space of the full sequence length for the model layer corresponding to the optimization type.
It should be noted that, the explanation of the foregoing embodiment of the method for allocating the video memory space during the training of the cyclic network model is also applicable to the apparatus for allocating the video memory space during the training of the cyclic network model in this embodiment, which is not described herein again.
According to the device for allocating video memory space during training of a cyclic network model of the embodiment of the application, a cyclic network model is obtained through the first obtaining module, wherein the cyclic network model comprises a plurality of model layers; the optimization type of each model layer is obtained through the second obtaining module; sample data is obtained and its sequence length determined through the determining module; and the video memory space corresponding to the sample data is allocated through the allocation module according to the optimization type and the sequence length. Therefore, the video memory allocated during training of the cyclic network model is guaranteed to be used in every calculation, redundant waste of video memory is avoided, and the video memory used in network computation is effectively compressed, thereby improving the training speed.
According to embodiments of the present disclosure, the present disclosure also provides an electronic device, a readable storage medium and a computer program product.
Fig. 6 illustrates a schematic block diagram of an example electronic device 600 that may be used to implement embodiments of the present disclosure. Electronic devices are intended to represent various forms of digital computers, such as laptops, desktops, workstations, personal digital assistants, servers, blade servers, mainframes, and other appropriate computers. The electronic device may also represent various forms of mobile devices, such as personal digital processing, cellular telephones, smartphones, wearable devices, and other similar computing devices. The components shown herein, their connections and relationships, and their functions, are meant to be exemplary only, and are not meant to limit implementations of the disclosure described and/or claimed herein.
As shown in fig. 6, the apparatus 600 includes a computing unit 601 that can perform various appropriate actions and processes according to a computer program stored in a Read Only Memory (ROM) 602 or a computer program loaded from a storage unit 608 into a Random Access Memory (RAM) 603. In the RAM 603, various programs and data required for the operation of the device 600 may also be stored. The computing unit 601, ROM 602, and RAM 603 are connected to each other by a bus 604. An input/output (I/O) interface 605 is also connected to bus 604.
Various components in the device 600 are connected to the I/O interface 605, including: an input unit 606 such as a keyboard, mouse, etc.; an output unit 607 such as various types of displays, speakers, and the like; a storage unit 608, such as a magnetic disk, optical disk, or the like; and a communication unit 609 such as a network card, modem, wireless communication transceiver, etc. The communication unit 609 allows the device 600 to exchange information/data with other devices via a computer network, such as the internet, and/or various telecommunication networks.
The computing unit 601 may be a variety of general and/or special purpose processing components having processing and computing capabilities. Some examples of computing unit 601 include, but are not limited to, a Central Processing Unit (CPU), a Graphics Processing Unit (GPU), various specialized Artificial Intelligence (AI) computing chips, various computing units running machine learning model algorithms, a Digital Signal Processor (DSP), and any suitable processor, controller, microcontroller, etc. The computing unit 601 performs the respective methods and processes described above, for example, a method of allocating a video memory space at the time of the cyclic network model training. For example, in some embodiments, the method of allocating video memory space during cyclic network model training may be implemented as a computer software program tangibly embodied on a machine-readable medium, such as storage unit 608. In some embodiments, part or all of the computer program may be loaded and/or installed onto the device 600 via the ROM 602 and/or the communication unit 609. When the computer program is loaded into the RAM 603 and executed by the computing unit 601, one or more steps of the above-described method of allocating video memory space upon training of the cyclic network model may be performed. Alternatively, in other embodiments, the computing unit 601 may be configured in any other suitable manner (e.g., by means of firmware) to perform the method of allocating video memory space when cycling the network model training.
Various implementations of the systems and techniques described here above may be implemented in digital electronic circuitry, integrated circuit systems, Field Programmable Gate Arrays (FPGAs), Application Specific Integrated Circuits (ASICs), Application Specific Standard Products (ASSPs), Systems on Chip (SOCs), Complex Programmable Logic Devices (CPLDs), computer hardware, firmware, software, and/or combinations thereof. These various embodiments may include: implementation in one or more computer programs that may be executed and/or interpreted on a programmable system including at least one programmable processor, which may be a special purpose or general-purpose programmable processor, that may receive data and instructions from, and transmit data and instructions to, a storage system, at least one input device, and at least one output device.
Program code for carrying out methods of the present disclosure may be written in any combination of one or more programming languages. These program code may be provided to a processor or controller of a general purpose computer, special purpose computer, or other programmable data processing apparatus such that the program code, when executed by the processor or controller, causes the functions/operations specified in the flowchart and/or block diagram to be implemented. The program code may execute entirely on the machine, partly on the machine, as a stand-alone software package, partly on the machine and partly on a remote machine or entirely on the remote machine or server.
In the context of this disclosure, a machine-readable medium may be a tangible medium that can contain, or store a program for use by or in connection with an instruction execution system, apparatus, or device. The machine-readable medium may be a machine-readable signal medium or a machine-readable storage medium. The machine-readable medium may include, but is not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any suitable combination of the foregoing. More specific examples of a machine-readable storage medium would include an electrical connection based on one or more wires, a portable computer diskette, a hard disk, a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing.
To provide for interaction with a user, the systems and techniques described here can be implemented on a computer having: a display device (e.g., a CRT (cathode ray tube) or LCD (liquid crystal display) monitor) for displaying information to a user; and a keyboard and pointing device (e.g., a mouse or trackball) by which a user can provide input to the computer. Other kinds of devices may also be used to provide for interaction with a user; for example, feedback provided to the user may be any form of sensory feedback (e.g., visual feedback, auditory feedback, or tactile feedback); and input from the user may be received in any form, including acoustic input, speech input, or tactile input.
The systems and techniques described here can be implemented in a computing system that includes a back-end component (e.g., a data server), or that includes a middleware component (e.g., an application server), or that includes a front-end component (e.g., a user computer having a graphical user interface or a web browser through which a user can interact with an implementation of the systems and techniques described here), or any combination of such back-end, middleware, or front-end components. The components of the system can be interconnected by any form or medium of digital data communication (e.g., a communication network). Examples of communication networks include: Local Area Networks (LANs), Wide Area Networks (WANs), the internet, and blockchain networks.
The computer system may include a client and a server. The client and server are typically remote from each other and typically interact through a communication network. The relationship of client and server arises by virtue of computer programs running on the respective computers and having a client-server relationship to each other. The server can be a cloud server, also called a cloud computing server or a cloud host; it is a host product in the cloud computing service system and overcomes the defects of high management difficulty and weak service scalability found in traditional physical hosts and VPS ("Virtual Private Server") services. The server may also be a server of a distributed system or a server that incorporates a blockchain.
It should be appreciated that various forms of the flows shown above may be used to reorder, add, or delete steps. For example, the steps recited in the present disclosure may be performed in parallel or sequentially or in a different order, provided that the desired results of the technical solutions of the present disclosure are achieved, and are not limited herein.
The above detailed description should not be taken as limiting the scope of the present disclosure. It will be apparent to those skilled in the art that various modifications, combinations, sub-combinations and alternatives are possible, depending on design requirements and other factors. Any modifications, equivalent substitutions and improvements made within the spirit and principles of the present disclosure are intended to be included within the scope of the present disclosure.

Claims (8)

1. A method for distributing video memory space during training of a cyclic network model comprises the following steps:
acquiring a cyclic network model, wherein the cyclic network model comprises a plurality of model layers;
acquiring operators of each model layer;
determining the dependency relationship of the operator on the input data in time sequence;
When the dependency relationship is that the input of the operator depends on the input data at the current moment, the model layer corresponding to the dependency relationship is of a first optimization type;
when the dependency relationship is that the input of the operator depends on the input data of the current moment and the previous L moments, the model layer corresponding to the dependency relationship is of a second optimization type; and
when the dependency relationship is that the input of the operator depends on input data at all moments, a model layer corresponding to the dependency relationship is of a third optimization type;
acquiring sample data and determining the sequence length of the sample data; and
and allocating the video memory space corresponding to the sample data according to the input data moments on which the operator input of each optimization type depends and according to the sequence length.
2. The method for allocating video memory space during training of a cyclic network model according to claim 1, wherein the dependency relationship comprises: the input of the operator depending on the input data at the current time; the input of the operator depending on the input data at the current time and the previous L times; and the input of the operator depending on the input data at all times, wherein L is a positive integer.
3. The method for allocating video memory space during training of a cyclic network model according to claim 2, wherein the allocating of the video memory space corresponding to the sample data according to the input data moments on which the operator input of each optimization type depends and according to the sequence length comprises:
when the optimization type is the first optimization type, a video memory space with the sequence length of 1 is allocated for a model layer corresponding to the optimization type;
when the optimization type is the second optimization type, a video memory space with the sequence length of L+1 is allocated for a model layer corresponding to the optimization type; and
and when the optimization type is the third optimization type, allocating a video memory space of the sequence length of the sample data for a model layer corresponding to the optimization type.
4. A device for distributing video memory space during training of a cyclic network model comprises:
the system comprises a first acquisition module, a second acquisition module and a third acquisition module, wherein the first acquisition module is used for acquiring a cyclic network model, and the cyclic network model comprises a plurality of model layers;
a second acquisition module comprising:
the first acquisition unit is used for acquiring operators of each model layer;
a determining unit for determining the dependency relationship between the operator and the input data in time sequence;
the second acquisition unit is used for determining a model layer corresponding to the dependency relationship as a first optimization type when the dependency relationship is that the input of the operator depends on the input data at the current moment;
when the dependency relationship is that the input of the operator depends on the input data of the current moment and the previous L moments, determining a model layer corresponding to the dependency relationship as a second optimization type; and
when the dependency relationship is that the input of the operator depends on the input data at all moments, determining a model layer corresponding to the dependency relationship as a third optimization type;
the determining module is used for acquiring sample data and determining the sequence length of the sample data; and
and the distribution module is used for allocating the video memory space corresponding to the sample data according to the input data moments on which the operator input of each optimization type depends and according to the sequence length.
5. The device for allocating video memory space during training of a cyclic network model according to claim 4, wherein the dependency relationship comprises: the input of the operator depending on the input data at the current time; the input of the operator depending on the input data at the current time and the previous L times; and the input of the operator depending on the input data at all times, wherein L is a positive integer.
6. The device for allocating video memory space during training of a cyclic network model according to claim 4, wherein the allocation module is specifically configured to:
when the optimization type is the first optimization type, a video memory space with the sequence length of 1 is allocated for a model layer corresponding to the optimization type;
when the optimization type is the second optimization type, a video memory space with the sequence length of L+1 is allocated for a model layer corresponding to the optimization type; and
and when the optimization type is the third optimization type, allocating a video memory space of the sequence length of the sample data for a model layer corresponding to the optimization type.
7. An electronic device, comprising:
at least one processor; and
a memory communicatively coupled to the at least one processor; wherein
the memory stores instructions executable by the at least one processor to enable the at least one processor to perform the method of allocating video memory space during cyclic network model training of any one of claims 1-3.
8. A non-transitory computer-readable storage medium storing computer instructions for causing the computer to perform the method of allocating video memory space at the time of cyclic network model training of any one of claims 1-3.
CN202011186142.3A 2020-10-29 2020-10-29 Method and device for distributing video memory space during training of cyclic network model Active CN112329834B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202011186142.3A CN112329834B (en) 2020-10-29 2020-10-29 Method and device for distributing video memory space during training of cyclic network model

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202011186142.3A CN112329834B (en) 2020-10-29 2020-10-29 Method and device for distributing video memory space during training of cyclic network model

Publications (2)

Publication Number Publication Date
CN112329834A CN112329834A (en) 2021-02-05
CN112329834B true CN112329834B (en) 2023-08-01

Family

ID=74297510

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202011186142.3A Active CN112329834B (en) 2020-10-29 2020-10-29 Method and device for distributing video memory space during training of cyclic network model

Country Status (1)

Country Link
CN (1) CN112329834B (en)

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110688327A (en) * 2019-09-30 2020-01-14 百度在线网络技术(北京)有限公司 Video memory management method and device, electronic equipment and computer readable storage medium
CN110727462A (en) * 2018-07-16 2020-01-24 上海寒武纪信息科技有限公司 Data processor and data processing method
CN111209116A (en) * 2020-01-06 2020-05-29 西安芯瞳半导体技术有限公司 Method and device for distributing video memory space and computer storage medium
CN111767146A (en) * 2020-06-24 2020-10-13 杭州电子科技大学 Distributed machine learning system acceleration method based on network reconfiguration
CN111767995A (en) * 2019-04-02 2020-10-13 上海寒武纪信息科技有限公司 Operation method, device and related product

Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US10699186B2 (en) * 2015-12-02 2020-06-30 Google Llc Determining orders of execution of a neural network

Patent Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110727462A (en) * 2018-07-16 2020-01-24 上海寒武纪信息科技有限公司 Data processor and data processing method
CN111767995A (en) * 2019-04-02 2020-10-13 上海寒武纪信息科技有限公司 Operation method, device and related product
CN110688327A (en) * 2019-09-30 2020-01-14 百度在线网络技术(北京)有限公司 Video memory management method and device, electronic equipment and computer readable storage medium
CN111209116A (en) * 2020-01-06 2020-05-29 西安芯瞳半导体技术有限公司 Method and device for distributing video memory space and computer storage medium
CN111767146A (en) * 2020-06-24 2020-10-13 杭州电子科技大学 Distributed machine learning system acceleration method based on network reconfiguration

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
Echo: Compiler-based GPU Memory Footprint Reduction for LSTM RNN Training; Bojian Zheng et al.; 2020 ACM/IEEE 47th Annual International Symposium on Computer Architecture; full text *
A Survey of Research on Memory Management Problems in Deep Learning; Ma Weiliang, Peng Xuan, Xiong Qian, Shi Xuanhua, Jin Hai; Big Data, no. 4; full text *

Also Published As

Publication number Publication date
CN112329834A (en) 2021-02-05

Similar Documents

Publication Publication Date Title
US11640528B2 (en) Method, electronic device and computer readable medium for information processing for accelerating neural network training
CN112559007B (en) Parameter updating method and device of multitask model and electronic equipment
CN113627536B (en) Model training, video classification method, device, equipment and storage medium
CN113870399B (en) Expression driving method and device, electronic equipment and storage medium
CN112669852B (en) Memory allocation method and device and electronic equipment
CN113409430A (en) Drivable three-dimensional character generation method and device, electronic equipment and storage medium
CN114332590B (en) Joint perception model training method, joint perception method, device, equipment and medium
CN112580666A (en) Image feature extraction method, training method, device, electronic equipment and medium
CN112528995A (en) Method for training target detection model, target detection method and device
CN114490116B (en) Data processing method and device, electronic equipment and storage medium
CN113641804A (en) Pre-training model obtaining method and device, electronic equipment and storage medium
CN113592932A (en) Training method and device for deep completion network, electronic equipment and storage medium
CN112329834B (en) Method and device for distributing video memory space during training of cyclic network model
US20230206080A1 (en) Model training method, system, device, and medium
CN114783597B (en) Method and device for diagnosing multi-class diseases, electronic equipment and storage medium
CN113408304B (en) Text translation method and device, electronic equipment and storage medium
CN113554550B (en) Training method and device for image processing model, electronic equipment and storage medium
CN113343997B (en) Optical character recognition method, device, electronic equipment and storage medium
CN113361575B (en) Model training method and device and electronic equipment
CN113033179B (en) Knowledge acquisition method, knowledge acquisition device, electronic equipment and readable storage medium
CN114998649A (en) Training method of image classification model, and image classification method and device
CN112558918B (en) Multiply-add operation method and device for neural network
CN114445668A (en) Image recognition method and device, electronic equipment and storage medium
CN113033219A (en) Model training method and device, electronic equipment and computer readable storage medium
CN114078097A (en) Method and device for acquiring image defogging model and electronic equipment

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant