CN109598344B - Model generation method and device - Google Patents

Model generation method and device

Info

Publication number
CN109598344B
Authority
CN
China
Prior art keywords
type
data
accuracy
precision
propagation
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201811534701.8A
Other languages
Chinese (zh)
Other versions
CN109598344A (en)
Inventor
胡耀全
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing ByteDance Network Technology Co Ltd
Original Assignee
Beijing ByteDance Network Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing ByteDance Network Technology Co Ltd filed Critical Beijing ByteDance Network Technology Co Ltd
Priority to CN201811534701.8A priority Critical patent/CN109598344B/en
Publication of CN109598344A publication Critical patent/CN109598344A/en
Application granted granted Critical
Publication of CN109598344B publication Critical patent/CN109598344B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical


Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING; COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Computing arrangements based on biological models using neural network models
    • G06N3/08Learning methods
    • G06N3/084Back-propagation

Abstract

The embodiments of the present disclosure disclose a model generation method and apparatus. A specific implementation of the method includes: acquiring training sample data; in a forward propagation process based on the training sample data and a model to be trained, performing calculation using data of a first precision type to obtain an actual output of the first precision type; and, in a back propagation process based on the actual output and the model to be trained, performing calculation using data of a second precision type, where the first precision type and the second precision type are different. This embodiment provides a new model generation approach.

Description

Model generation method and device
Technical Field
The embodiments of the present disclosure relate to the field of computer technology, and in particular to a model generation method and device.
Background
With the development of artificial intelligence, models based on neural networks play a role in more and more scenarios. A neural network here refers to an artificial neural network (ANN). A neural network is generally a computational model formed by connecting a large number of nodes (or neurons). Each node represents a particular output function, called an activation (excitation) function. Each connection between two nodes carries a weight for the signal passing through that connection; these weights are equivalent to the memory of the artificial neural network.
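The node behavior described above, a weighted sum of incoming signals plus a bias passed through an activation function, can be sketched in a few lines. This is a minimal illustration, not part of the disclosure; the function name `node_output` and the choice of ReLU as the activation are assumptions for the example.

```python
import numpy as np

def node_output(inputs, weights, bias):
    """One node: weighted sum of the incoming signals plus a bias,
    passed through an activation function (ReLU here)."""
    z = np.dot(inputs, weights) + bias
    return np.maximum(z, 0.0)

# Two inputs into one node: 1.0*0.5 + 2.0*(-0.25) + 0.1 = 0.1, and ReLU keeps it.
print(node_output(np.array([1.0, 2.0]), np.array([0.5, -0.25]), 0.1))
```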
In the training of neural network based models, it is common in the art to use data of one precision type for all calculations.
Disclosure of Invention
The embodiment of the disclosure provides a model generation method and a model generation device.
In a first aspect, an embodiment of the present disclosure provides a model generation method, the method including: acquiring training sample data; in a forward propagation process based on the training sample data and the model to be trained, performing calculation using data of a first precision type to obtain an actual output of the first precision type; and, in a back propagation process based on the actual output and the model to be trained, performing calculation using data of a second precision type, where the first precision type and the second precision type are different.
In some embodiments, the first precision type or the second precision type is a half precision type.
In some embodiments, the precision indicated by the first precision type is lower than the precision indicated by the second precision type.
In some embodiments, the precision indicated by the first precision type is higher than the precision indicated by the second precision type.
In some embodiments, the performing calculation using data of the first precision type in the forward propagation process based on the training sample data and the model to be trained, to obtain the actual output of the first precision type, includes: in response to determining that the training sample data is not data of the first precision type, converting the training sample data into data of the first precision type to generate first training sample data; in response to determining that the network parameters of the model to be trained are not data of the first precision type, converting the network parameters into data of the first precision type to generate first network parameters; and performing forward propagation calculation using the first training sample data and the first network parameters to obtain an actual output of the first precision type.
In some embodiments, the performing calculation using data of the second precision type in the back propagation process based on the actual output and the model to be trained includes: converting the actual output from the first precision type to the second precision type; in response to determining that the network parameters of the model to be trained are not data of the second precision type, converting the network parameters into data of the second precision type to generate second network parameters; and performing back propagation calculation according to the actual output of the second precision type and the second network parameters, so as to update the second network parameters.
In a second aspect, an embodiment of the present disclosure provides a model generation apparatus, including: an acquisition unit configured to acquire training sample data; a forward propagation unit configured to perform calculation using data of a first precision type in a forward propagation process based on the training sample data and the model to be trained, so as to obtain an actual output of the first precision type; and a back propagation unit configured to perform calculation using data of a second precision type in a back propagation process based on the actual output and the model to be trained, where the first precision type and the second precision type are different.
In some embodiments, the first precision type or the second precision type is a half precision type.
In some embodiments, the precision indicated by the first precision type is lower than the precision indicated by the second precision type.
In some embodiments, the precision indicated by the first precision type is higher than the precision indicated by the second precision type.
In some embodiments, the forward propagation unit is further configured to: in response to determining that the training sample data is not data of the first precision type, convert the training sample data into data of the first precision type to generate first training sample data; in response to determining that the network parameters of the model to be trained are not data of the first precision type, convert the network parameters into data of the first precision type to generate first network parameters; and perform forward propagation calculation using the first training sample data and the first network parameters to obtain an actual output of the first precision type.
In some embodiments, the back propagation unit is further configured to: convert the actual output from the first precision type to the second precision type; in response to determining that the network parameters of the model to be trained are not data of the second precision type, convert the network parameters into data of the second precision type to generate second network parameters; and perform back propagation calculation according to the actual output of the second precision type and the second network parameters, so as to update the second network parameters.
In a third aspect, an embodiment of the present disclosure provides an electronic device, including: one or more processors; a storage device, on which one or more programs are stored, which, when executed by the one or more processors, cause the one or more processors to implement the method as described in any implementation manner of the first aspect.
In a fourth aspect, the disclosed embodiments provide a computer-readable medium on which a computer program is stored, wherein the computer program, when executed by a processor, implements the method as described in any implementation manner of the first aspect.
According to the model generation method and apparatus provided by the embodiments of the present disclosure, during model training the forward propagation process performs its calculations with data of a first precision type and the back propagation process performs its calculations with data of a second precision type, the two types being different. The network parameters of the model to be trained can thus be updated to generate a new model, and the technical effects at least include: a new model generation approach is provided.
Drawings
Other features, objects and advantages of the disclosure will become more apparent upon reading of the following detailed description of non-limiting embodiments thereof, made with reference to the accompanying drawings in which:
FIG. 1 is an exemplary system architecture diagram in which some embodiments of the present disclosure may be applied;
FIG. 2 is a flow diagram for one embodiment of a model generation method according to the present disclosure;
FIG. 3 is a schematic diagram of one application scenario of a model generation method according to the present disclosure;
FIG. 4 is a flow diagram of yet another embodiment of a model generation method according to the present disclosure;
FIG. 5 is a schematic block diagram of one embodiment of a model generation apparatus according to the present disclosure;
FIG. 6 is a schematic block diagram of a computer system suitable for use in implementing an electronic device of an embodiment of the present disclosure.
Detailed Description
The present disclosure is described in further detail below with reference to the accompanying drawings and examples. It is to be understood that the specific embodiments described herein are merely illustrative of the relevant invention and not restrictive of the invention. It should be noted that, for convenience of description, only the portions related to the related invention are shown in the drawings.
It should be noted that, in the present disclosure, the embodiments and features of the embodiments may be combined with each other without conflict. The present disclosure will be described in detail below with reference to the accompanying drawings in conjunction with embodiments.
FIG. 1 illustrates an exemplary system architecture 100 to which embodiments of the model generation methods or model generation apparatus of the present disclosure may be applied.
As shown in fig. 1, the system architecture 100 may include terminal devices 101, 102, 103, a network 104, and a server 105. The network 104 may be a medium to provide communication links between the terminal devices 101, 102, 103 and the server 105. Network 104 may include various connection types, such as wired, wireless communication links, or fiber optic cables, to name a few.
The user may use the terminal devices 101, 102, 103 to interact with the server 105 via the network 104 to receive or send messages or the like. Various communication client applications, such as a model generation application, a call application, a live broadcast application, a search application, an instant messaging tool, a mailbox client, social platform software, and the like, may be installed on the terminal devices 101, 102, and 103.
The terminal devices 101, 102, and 103 may be hardware or software. When they are hardware, they may be various electronic devices with communication functions, including but not limited to smartphones, tablet computers, e-book readers, MP3 players (MPEG Audio Layer III), MP4 players (MPEG Audio Layer IV), laptop computers, desktop computers, and the like. When the terminal devices 101, 102, 103 are software, they may be installed in the electronic devices listed above and implemented either as multiple pieces of software or software modules (e.g., to provide distributed services) or as a single piece of software or software module. This is not specifically limited herein.
The server 105 may be a server providing various services, such as a background server supporting model generation class applications on the terminal devices 101, 102, 103. The terminal device can package some parameters (such as training sample data and the like) generated by the model into a model generation request, and then send the model generation request to the background server. The background server may analyze and perform other processing on the received data such as the model generation request, and feed back a processing result (e.g., various parameters of the model) to the terminal device.
It should be noted that the model generation method provided by the embodiment of the present disclosure is generally executed by the server 105, and accordingly, the model generation device is generally disposed in the server 105. Optionally, the model generation method provided by the embodiment of the present disclosure may also be executed by the terminal devices 101, 102, and 103.
The server may be hardware or software. When the server is hardware, it may be implemented as a distributed server cluster formed by multiple servers, or may be implemented as a single server. When the server is software, it may be implemented as multiple pieces of software or software modules (e.g., to provide distributed services), or as a single piece of software or software module. And is not particularly limited herein.
It should be understood that the number of terminal devices, networks, and servers in fig. 1 is merely illustrative. There may be any number of terminal devices, networks, and servers, as desired for implementation.
Referring to FIG. 2, a flow 200 of one embodiment of a model generation method is shown. The embodiment is mainly exemplified by applying the method to an electronic device with certain computing capability, and the electronic device may be the server shown in fig. 1. The model generation method comprises the following steps:
step 201, training sample data is obtained.
In this embodiment, an executing subject (for example, a server shown in fig. 1) of the model generation method may acquire training sample data.
Here, the training sample data may be used to train the model to be trained to generate a new model.
In this embodiment, the model to be trained may be an untrained neural network or a neural network whose training has not been completed. Here, a neural network refers to an artificial neural network. Common neural networks include, for example, deep neural networks (DNNs), convolutional neural networks (CNNs), recurrent neural networks (RNNs), and so on.
Optionally, the network structure of the model to be trained may be preset, for example, which layers the neural network includes, the connection order relationship between the layers, which neurons each layer includes, the weight (weight) and bias term (bias) corresponding to each neuron, the activation function of each layer, and the like need to be set. The network structure of the model to be trained may be represented by various network parameters, which may include, but are not limited to, weights, bias terms, and the like.
By way of example, when the model to be trained is a deep convolutional neural network, since the deep convolutional neural network is a multi-layer neural network, it needs to be determined which layers the deep convolutional neural network includes (e.g., convolutional layers, pooling layers, fully-connected layers, classifiers, etc.), the connection order relationship between layers, and which network parameters each layer includes (e.g., weights, bias terms, step size of convolution), etc. Among other things, convolutional layers may be used to extract image features. For each convolution layer, it can determine how many convolution kernels there are, the size of each convolution kernel, the weight of each neuron in each convolution kernel, the bias term corresponding to each convolution kernel, the step size between two adjacent convolutions, and the like.
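As a rough illustration of how the hyperparameters listed above determine a convolutional layer's parameters: the weight count is the number of kernels times the size of each kernel (input channels by kernel height by kernel width), plus one bias term per kernel. The helper name and the example values below are illustrative, not taken from the disclosure.

```python
def conv_layer_param_count(in_channels, num_kernels, kernel_h, kernel_w):
    """Weights: one (in_channels x kernel_h x kernel_w) filter per kernel.
    Biases: one bias term per kernel."""
    weights = num_kernels * in_channels * kernel_h * kernel_w
    biases = num_kernels
    return weights + biases

# 16 kernels of size 3x3 over a 3-channel input: 16*3*3*3 + 16 = 448 parameters.
print(conv_layer_param_count(in_channels=3, num_kernels=16, kernel_h=3, kernel_w=3))
```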
Step 202, in the forward propagation process based on the training sample data and the model to be trained, the data with the first precision type is used for calculation, and the actual output with the first precision type is obtained.
In this embodiment, the executing agent may perform calculation using data of the first precision type in a forward propagation process based on the training sample data and the model to be trained, so as to obtain an actual output of the first precision type.
In this embodiment, the model training process may use floating-point data for the calculations. Floating-point data can be classified by precision into the following types: the half precision type, the single precision type, and the double precision type. In general, 16-bit floating-point data is of the half precision type, 32-bit floating-point data is of the single precision type, and 64-bit floating-point data is of the double precision type.
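The three precision types map directly onto numpy's floating-point dtypes, which occupy 2, 4, and 8 bytes respectively. This is a library-level illustration, not part of the disclosure.

```python
import numpy as np

# Half, single, and double precision correspond to numpy's 16-, 32-, and 64-bit floats.
for name, dtype in [("half", np.float16), ("single", np.float32), ("double", np.float64)]:
    value = dtype(3.14159)
    print(name, value.nbytes, "bytes")  # 2, 4, and 8 bytes respectively
```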
In this embodiment, importing the training sample data into the model to be trained so that the output layer of the model produces an actual output may be referred to as forward propagation. The error of the output layer is then determined using the target output and the actual output of the model to be trained.
In this embodiment, performing calculation using data of the first precision type means that the data participating in the calculation is all of the first precision type, that is, both the training sample data participating in the calculation and the network parameters of the model to be trained are of the first precision type. If the acquired training sample data and the network parameters of the model to be trained are not of the first precision type, they can be converted into the first precision type before the forward propagation calculation is carried out.
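A minimal numpy sketch of this convert-then-forward logic. The helper names `to_precision` and `forward`, the linear model, and the choice of half precision as the first precision type are assumptions made for illustration only.

```python
import numpy as np

def to_precision(x, dtype):
    """Convert to the target precision type only if the data is not already of that type."""
    return x if x.dtype == dtype else x.astype(dtype)

def forward(samples, weights, first_type=np.float16):
    x = to_precision(samples, first_type)   # training sample data in the first precision type
    w = to_precision(weights, first_type)   # network parameters in the first precision type
    return x @ w                            # the actual output is of the first precision type

samples = np.random.rand(4, 8)   # float64 by default, so it gets converted
weights = np.random.rand(8, 2)
print(forward(samples, weights).dtype)  # float16
```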
Step 203, in the back propagation process based on the actual output and the model to be trained, the data with the second precision type is used for calculation.
In this embodiment, the executing agent may perform calculation using data of the second precision type in the back propagation process based on the actual output and the model to be trained. Thereby, the network parameters of the model to be trained can be updated, so that a new model can be generated based on the model to be trained.
In this embodiment, using the error value of the output layer to propagate the error backwards, so as to adjust the network parameters of the model to be trained, may be referred to as back propagation. As an example, the back propagation algorithm (BP algorithm) together with a gradient descent method (e.g., stochastic gradient descent) may be used to adjust the network parameters of the model to be trained.
In this embodiment, performing calculation using data of the second precision type means that the data involved in the calculation is of the second precision type, that is, both the actual output involved in the calculation and the network parameters of the model to be trained are of the second precision type. If the obtained actual output and the network parameters of the model to be trained are not of the second precision type, they can be converted into the second precision type before the back propagation calculation is carried out.
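The cast-then-back-propagate logic can be sketched for a linear model y = x @ w trained with a mean squared error loss. The model, loss, and the choice of single precision as the second precision type are illustrative stand-ins for the disclosure's generic model; the function name `backward_update` is an assumption.

```python
import numpy as np

def backward_update(actual_output, targets, inputs, weights, lr=0.1, second_type=np.float32):
    """Cast every operand to the second precision type, then back-propagate
    through the linear model y = x @ w under a mean squared error loss."""
    y = actual_output.astype(second_type)
    t = targets.astype(second_type)
    x = inputs.astype(second_type)
    w = weights.astype(second_type)
    grad_out = 2.0 * (y - t) / y.shape[0]  # dLoss/dy for the mean squared error
    grad_w = x.T @ grad_out                # chain rule through the linear layer
    return w - lr * grad_w                 # updated second-precision parameters
```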
Here, the first accuracy type and the second accuracy type are different.
It should be noted that, in the prior art, the forward propagation and back propagation processes use data of the same precision type for calculation. In the present disclosure, the precision types of the data used for forward propagation and back propagation are different. Thus, the technical effects may at least include:
first, a new model generation approach is provided.
Second, the model training process is divided into two parts, one calculating with higher-precision data and the other with lower-precision data, which can improve the speed of model training while ensuring its accuracy.
With continued reference to fig. 3, fig. 3 is a schematic diagram of an application scenario of the model generation method according to the embodiment shown in fig. 2. In the application scenario of fig. 3:
first, the server 301 may acquire training sample data.
Then, the server 301 may perform calculation using data of a first precision type in a forward propagation process based on the training sample data and the model to be trained, so as to obtain an actual output of the first precision type.
The server 301 may then perform calculation using data of a second precision type in a back propagation process based on the actual output and the model to be trained. The network parameters of the model to be trained can thereby be updated, yielding updated network parameters from which a new model is generated. Here, the first precision type and the second precision type are different.
In the method provided by the foregoing embodiment of the present disclosure, during model training the forward propagation process performs its calculations with data of a first precision type and the back propagation process performs its calculations with data of a second precision type, the two types being different. The network parameters of the model to be trained can thus be updated to generate a new model, and the technical effects may at least include: a new model generation approach is provided.
In some embodiments, the first precision type or the second precision type is a half precision type. That is, one of the following two cases holds:
First, the first precision type is the half precision type, and the second precision type is either the single precision type or the double precision type.
Second, the second precision type is the half precision type, and the first precision type is either the single precision type or the double precision type.
In the prior art, it is generally considered that half-precision data is suitable for transmission (it transfers quickly) but not for calculation (its accuracy is insufficient). In some implementations of the present disclosure, the inventor divides model training into two parts: one part of the calculation (forward propagation or back propagation) is performed using half-precision data, while the other part is performed using higher-precision data. This overcomes the technical prejudice that half-precision data is unsuitable for calculation: the part of the calculation using half-precision data increases the calculation speed, while the part using higher-precision data ensures the accuracy of model training.
In some embodiments, the precision indicated by the first precision type (which may be referred to as the first precision) is lower than the precision indicated by the second precision type (which may be referred to as the second precision).
It should be noted that when the first precision is lower than the second precision, the forward propagation process uses lower-precision data and the back propagation process uses higher-precision data. The calculation speed is therefore improved during forward propagation, and the accuracy of the updated network parameters is improved during back propagation. The speed of model training can thus be increased while its accuracy is ensured.
In some embodiments, the precision indicated by the first precision type is higher than the precision indicated by the second precision type.
It should be noted that when the first precision is higher than the second precision, the forward propagation process uses higher-precision data and the back propagation process uses lower-precision data. The calculation accuracy is therefore ensured during forward propagation, and the calculation speed is increased during back propagation. The accuracy of model training can thus be ensured while its speed is improved.
With further reference to FIG. 4, a flow 400 of yet another embodiment of a model generation method is illustrated. The process 400 of the model generation method includes the following steps:
step 401, obtaining training sample data and network parameters of a model to be trained.
In this embodiment, an executing subject (for example, a server shown in fig. 1) of the model generation method may obtain training sample data and network parameters of a model to be trained.
Here, the implementation details of step 401 may refer to the description in step 201, and are not described herein again.
Step 402, in response to determining that the training sample data is not data of the first precision type, converting the training sample data into data of the first precision type to generate first training sample data.
In this embodiment, the executing agent may first determine whether the training sample data is of the first precision type; if not, it converts the training sample data into data of the first precision type, thereby obtaining (generating) the first training sample data.
Step 403, in response to determining that the network parameters of the model to be trained are not data of the first precision type, converting the network parameters into data of the first precision type to generate first network parameters.
In this embodiment, the executing agent may first determine whether the network parameters are of the first precision type; if not, it converts them into data of the first precision type, thereby obtaining (generating) the first network parameters.
Here, the execution order of step 402 and step 403 is not limited.
Step 404, forward propagation calculation is performed by using the first training sample data and the first network parameter, so as to obtain an actual output of the first precision type.
In this embodiment, the executing agent may perform forward propagation calculation by using the first training sample data and the first network parameter, so as to obtain an actual output of a first precision type.
Step 405, converting the actual output from the first precision type to a second precision type.
In this embodiment, the execution body may convert the actual output from a first precision type to a second precision type.
Step 406, in response to determining that the network parameters of the model to be trained are not data of the second precision type, converting the network parameters into data of the second precision type to generate second network parameters.
In this embodiment, the executing agent may first determine whether the network parameters are of the second precision type; if not, it converts them into data of the second precision type, thereby obtaining (generating) the second network parameters.
Step 407, performing back propagation calculation according to the actual output of the second precision type and the second network parameters, so as to update the second network parameters.
In this embodiment, the executing agent may perform back propagation calculation according to the actual output of the second precision type and the second network parameters, so as to update them. Model training can thus be carried out to generate a new model.
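Steps 401 to 407 can be strung together into one illustrative training step for a linear model. Half precision is assumed as the first precision type and single precision as the second; the function names, the linear model, and the mean squared error loss are all hypothetical choices for the sketch, not specified by the disclosure.

```python
import numpy as np

FIRST, SECOND = np.float16, np.float32  # assumed choice: half forward, single backward

def cast(x, dtype):
    """Steps 402/403/406: convert only if the data is not already of the target type."""
    return x if x.dtype == dtype else x.astype(dtype)

def training_step(samples, targets, weights, lr=0.1):
    # Steps 402-404: first-precision forward propagation.
    x1 = cast(samples, FIRST)
    w1 = cast(weights, FIRST)
    actual = x1 @ w1                           # actual output of the first precision type

    # Step 405: convert the actual output to the second precision type.
    y2 = cast(actual, SECOND)
    # Step 406: second-precision copy of the network parameters.
    w2 = cast(weights, SECOND)

    # Step 407: back-propagate (linear model, mean squared error) and update.
    grad_out = 2.0 * (y2 - cast(targets, SECOND)) / y2.shape[0]
    grad_w = cast(x1, SECOND).T @ grad_out
    return w2 - lr * grad_w
```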
As can be seen from fig. 4, compared with the embodiment corresponding to fig. 2, the flow 400 of the model generation method in the present embodiment highlights the step of performing precision conversion on the data. Therefore, the technical effects of the solution described in this embodiment at least include:
first, a new model generation approach is provided.
Second, a more comprehensive model generation method is provided.
With further reference to fig. 5, as an implementation of the methods shown in the above-mentioned figures, the present disclosure provides an embodiment of a model generation apparatus, which corresponds to the embodiment of the method shown in fig. 2, and which can be applied in various electronic devices.
As shown in fig. 5, the model generation apparatus 500 of this embodiment includes: an acquisition unit 501, a forward propagation unit 502, and a back propagation unit 503. The acquisition unit is configured to acquire training sample data; the forward propagation unit is configured to perform calculation using data of a first precision type in a forward propagation process based on the training sample data and the model to be trained, so as to obtain an actual output of the first precision type; and the back propagation unit is configured to perform calculation using data of a second precision type in a back propagation process based on the actual output and the model to be trained, where the first precision type and the second precision type are different.
In this embodiment, specific processes of the obtaining unit 501, the forward propagation unit 502, and the backward propagation unit 503 of the model generating apparatus 500 and technical effects thereof can refer to related descriptions of step 201, step 202, and step 203 in the corresponding embodiment of fig. 2, which are not described herein again.
In some optional implementations of this embodiment, the determining unit is further configured to: determining the gradient value of the layer to be updated of the model to be trained as a first gradient value; and determining the scale factor of the layer to be updated according to the first gradient value and the current weight value of the weight in the layer to be updated.
In some optional implementations of this embodiment, the first precision type or the second precision type is a half precision type.
In some optional implementations of this embodiment, the precision indicated by the first precision type is lower than the precision indicated by the second precision type.
In some optional implementations of this embodiment, the precision indicated by the first precision type is higher than the precision indicated by the second precision type.
In some optional implementations of this embodiment, the forward propagation unit is further configured to: in response to determining that the training sample data is not data of a first precision type, converting the training sample data into data of the first precision type, and generating first training sample data; in response to determining that the network parameter of the model to be trained is not the data of the first accuracy type, converting the network parameter into the data of the first accuracy type, and generating a first network parameter; and performing forward propagation calculation by using the first training sample data and the first network parameter to obtain actual output of a first precision type.
In some optional implementations of this embodiment, the backward propagation unit is further configured to: convert the actual output from the first precision type to the second precision type; in response to determining that a network parameter of the model to be trained is not data of the second precision type, convert the network parameter into data of the second precision type to generate a second network parameter; and perform backward propagation calculation according to the actual output of the second precision type and the second network parameter, so as to update the second network parameter.
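These back-propagation steps can be sketched as below, assuming float16 and float32 as the first and second precision types; the squared-error loss and plain gradient step are illustrative, not specified by the patent:

```python
import numpy as np

SECOND = np.float32  # assumed second precision type

def to_second(arr):
    # Convert only when the array is not already of the second precision type.
    return arr if arr.dtype == SECOND else arr.astype(SECOND)

def backward_update(actual_output, target, x, params, lr=0.1):
    y_hat = actual_output.astype(SECOND)  # actual output: first -> second type
    w2 = to_second(params)                # second network parameter
    err = y_hat - to_second(target)
    grad = to_second(x).T @ err / len(x)  # gradient of mean squared error
    return w2 - lr * grad                 # updated second network parameter

x = np.ones((4, 2), dtype=np.float16)
params = np.zeros((2, 1), dtype=np.float16)
target = np.full((4, 1), 1.0, dtype=np.float32)
actual = x @ params                       # float16 actual output from forward pass
new_w = backward_update(actual, target, x, params)
```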
It should be noted that details of implementation and technical effects of each unit in the model generation apparatus provided in the embodiment of the present disclosure may refer to descriptions of other embodiments in the present disclosure, and are not described herein again.
Referring now to fig. 6, a schematic diagram of an electronic device (e.g., a terminal or server of fig. 1) 600 suitable for implementing embodiments of the present disclosure is shown. The electronic device shown in fig. 6 is only an example, and should not bring any limitation to the functions and the scope of use of the embodiments of the present disclosure.
As shown in fig. 6, the electronic device 600 may include a processing device (e.g., a central processing unit, a graphics processor, etc.) 601 that may perform various appropriate actions and processes according to a program stored in a read-only memory (ROM) 602 or a program loaded from a storage device 608 into a random access memory (RAM) 603. The RAM 603 also stores various programs and data necessary for the operation of the electronic device 600. The processing device 601, the ROM 602, and the RAM 603 are connected to each other via a bus 604. An input/output (I/O) interface 605 is also connected to the bus 604.
Generally, the following devices may be connected to the I/O interface 605: input devices 606 including, for example, a touch screen, touch pad, keyboard, mouse, camera, microphone, accelerometer, gyroscope, etc.; output devices 607 including, for example, a Liquid Crystal Display (LCD), a speaker, a vibrator, and the like; storage 608 including, for example, tape, hard disk, etc.; and a communication device 609. The communication means 609 may allow the electronic device 600 to communicate with other devices wirelessly or by wire to exchange data. While fig. 6 illustrates an electronic device 600 having various means, it is to be understood that not all illustrated means are required to be implemented or provided. More or fewer devices may alternatively be implemented or provided.
In particular, according to an embodiment of the present disclosure, the processes described above with reference to the flowcharts may be implemented as computer software programs. For example, embodiments of the present disclosure include a computer program product comprising a computer program embodied on a computer readable medium, the computer program comprising program code for performing the method illustrated in the flow chart. In such an embodiment, the computer program may be downloaded and installed from a network via the communication means 609, or may be installed from the storage means 608, or may be installed from the ROM 602. The computer program, when executed by the processing device 601, performs the above-described functions defined in the methods of the embodiments of the present disclosure.
It should be noted that the computer readable medium in the present disclosure can be a computer readable signal medium or a computer readable storage medium or any combination of the two. A computer readable storage medium may be, for example, but not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any combination of the foregoing. More specific examples of the computer readable storage medium may include, but are not limited to: an electrical connection having one or more wires, a portable computer diskette, a hard disk, a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing. In the present disclosure, a computer readable storage medium may be any tangible medium that can contain, or store a program for use by or in connection with an instruction execution system, apparatus, or device. In contrast, in the present disclosure, a computer readable signal medium may comprise a propagated data signal with computer readable program code embodied therein, either in baseband or as part of a carrier wave. Such a propagated data signal may take many forms, including, but not limited to, electro-magnetic, optical, or any suitable combination thereof. A computer readable signal medium may also be any computer readable medium that is not a computer readable storage medium and that can communicate, propagate, or transport a program for use by or in connection with an instruction execution system, apparatus, or device. Program code embodied on a computer readable medium may be transmitted using any appropriate medium, including but not limited to: electrical wires, optical cables, RF (radio frequency), etc., or any suitable combination of the foregoing.
The computer readable medium may be embodied in the electronic device; or may exist separately without being assembled into the electronic device.
The computer readable medium carries one or more programs which, when executed by the electronic device, cause the electronic device to: acquire training sample data; perform calculation using data of a first precision type in a forward propagation process based on the training sample data and a model to be trained, so as to obtain an actual output of the first precision type; and perform calculation using data of a second precision type in a backward propagation process based on the actual output and the model to be trained, wherein the first precision type and the second precision type are different.
Computer program code for carrying out operations of the present disclosure may be written in any combination of one or more programming languages, including object-oriented programming languages such as Java, Smalltalk, or C++, as well as conventional procedural programming languages such as the "C" programming language or similar programming languages. The program code may execute entirely on the user's computer, partly on the user's computer, as a stand-alone software package, partly on the user's computer and partly on a remote computer, or entirely on the remote computer or server. In the case of a remote computer, the remote computer may be connected to the user's computer through any type of network, including a local area network (LAN) or a wide area network (WAN), or the connection may be made to an external computer (for example, through the Internet using an Internet service provider).
The flowchart and block diagrams in the figures illustrate the architecture, functionality, and operation of possible implementations of systems, methods and computer program products according to various embodiments of the present disclosure. In this regard, each block in the flowchart or block diagrams may represent a module, segment, or portion of code, which comprises one or more executable instructions for implementing the specified logical function(s). It should also be noted that, in some alternative implementations, the functions noted in the block may occur out of the order noted in the figures. For example, two blocks shown in succession may, in fact, be executed substantially concurrently, or the blocks may sometimes be executed in the reverse order, depending upon the functionality involved. It will also be noted that each block of the block diagrams and/or flowchart illustration, and combinations of blocks in the block diagrams and/or flowchart illustration, can be implemented by special purpose hardware-based systems which perform the specified functions or acts, or combinations of special purpose hardware and computer instructions.
The units described in the embodiments of the present disclosure may be implemented by software or hardware. Where the name of a unit does not in some cases constitute a limitation on the unit itself, for example, an acquisition unit may also be described as a "unit to acquire training sample data".
The foregoing description is merely a description of the preferred embodiments of the present disclosure and of the technical principles employed. It will be appreciated by those skilled in the art that the scope of the disclosure is not limited to technical solutions formed by the particular combination of the above technical features, but also covers other technical solutions formed by any combination of the above technical features or their equivalents without departing from the inventive concept of the disclosure. For example, a technical solution may be formed by replacing the above features with (but not limited to) technical features having similar functions disclosed in the present disclosure.

Claims (14)

1. A model generation method, comprising:
acquiring training sample data;
calculating by using data of a first precision type in a forward propagation process based on the training sample data and the model to be trained to obtain actual output of the first precision type;
and in the back propagation process based on the actual output and the model to be trained, calculating by using data of a second precision type, wherein the first precision type is different from the second precision type.
2. The method of claim 1, wherein the first or second precision type is a half precision type.
3. The method of claim 1, wherein the precision indicated by the first precision type is lower than the precision indicated by the second precision type.
4. The method of claim 1, wherein the precision indicated by the first precision type is higher than the precision indicated by the second precision type.
5. The method according to any one of claims 1-4, wherein said performing a calculation using data of a first accuracy type in a forward propagation process based on said training sample data and a model to be trained to obtain an actual output of the first accuracy type comprises:
in response to determining that the training sample data is not data of a first precision type, converting the training sample data into data of the first precision type, and generating first training sample data;
in response to determining that the network parameters of the model to be trained are not data of the first precision type, converting the network parameters into data of the first precision type, and generating first network parameters;
and performing forward propagation calculation by using the first training sample data and the first network parameter to obtain actual output of a first precision type.
6. The method according to any one of claims 1-4, wherein said performing a calculation using data of a second precision type in a back propagation process based on said actual output and said model to be trained comprises:
converting the actual output from a first precision type to a second precision type;
in response to determining that the network parameters of the model to be trained are not the data of the second precision type, converting the network parameters into the data of the second precision type, and generating second network parameters;
and performing back propagation calculation according to the actual output of the second precision type and the second network parameter so as to update the second network parameter.
7. A model generation apparatus comprising:
an acquisition unit configured to acquire training sample data;
the forward propagation unit is configured to utilize data of a first precision type to perform calculation in a forward propagation process based on the training sample data and the model to be trained so as to obtain actual output of the first precision type;
a back propagation unit configured to perform a calculation using data of a second accuracy type in a back propagation process based on the actual output and the model to be trained, wherein the first accuracy type and the second accuracy type are different.
8. The apparatus of claim 7, wherein the first or second precision type is a half precision type.
9. The apparatus of claim 7, wherein the precision indicated by the first precision type is lower than the precision indicated by the second precision type.
10. The apparatus of claim 7, wherein the precision indicated by the first precision type is higher than the precision indicated by the second precision type.
11. The apparatus according to any one of claims 7-10, wherein the forward propagation unit is further configured to:
in response to determining that the training sample data is not data of a first precision type, converting the training sample data into data of the first precision type, and generating first training sample data;
in response to determining that the network parameters of the model to be trained are not data of the first precision type, converting the network parameters into data of the first precision type, and generating first network parameters;
and performing forward propagation calculation by using the first training sample data and the first network parameter to obtain actual output of a first precision type.
12. The apparatus of any one of claims 7-10, wherein the counter propagation unit is further configured to:
converting the actual output from a first precision type to a second precision type;
in response to determining that the network parameters of the model to be trained are not the data of the second precision type, converting the network parameters into the data of the second precision type, and generating second network parameters;
and performing back propagation calculation according to the actual output of the second precision type and the second network parameter so as to update the second network parameter.
13. An electronic device, comprising:
one or more processors;
a storage device having one or more programs stored thereon,
wherein the one or more programs, when executed by the one or more processors, cause the one or more processors to implement the method of any one of claims 1-6.
14. A computer-readable medium, on which a computer program is stored, wherein the program, when executed by a processor, implements the method of any one of claims 1-6.
CN201811534701.8A 2018-12-14 2018-12-14 Model generation method and device Active CN109598344B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201811534701.8A CN109598344B (en) 2018-12-14 2018-12-14 Model generation method and device


Publications (2)

Publication Number Publication Date
CN109598344A CN109598344A (en) 2019-04-09
CN109598344B true CN109598344B (en) 2020-10-02

Family

ID=65961893

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201811534701.8A Active CN109598344B (en) 2018-12-14 2018-12-14 Model generation method and device

Country Status (1)

Country Link
CN (1) CN109598344B (en)

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN106650931A (en) * 2016-12-09 2017-05-10 Dawning Information Industry (Beijing) Co., Ltd. Hybrid precision deep learning algorithm
CN107526709A (en) * 2016-06-15 2017-12-29 Nvidia Corporation Tensor processing using low precision formats
CN108734643A (en) * 2017-04-24 2018-11-02 Intel Corporation Mixed inference using low precision and high precision
CN108805263A (en) * 2017-04-28 2018-11-13 Intel Corporation Variable precision and mixed-type representation of multiple layers in a network

Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20180322382A1 (en) * 2017-05-03 2018-11-08 Intel Corporation Scaling half-precision floating point tensors for training deep neural networks


Non-Patent Citations (3)

* Cited by examiner, † Cited by third party
Title
Mixed precision training; Paulius Micikevicius et al.; ICLR 2018; 2018-02-18; abstract on page 1, Section 2 paragraph 3 and Section 3.1 paragraph 1 on page 2, paragraph 2 and Section 4.3 paragraph 1 on page 6 *
Training deep neural networks with low precision multiplications; Yoshua Bengio et al.; ICLR 2015; 2015-09-23; entire document *
A survey of parallelization of deep neural networks; Zhu Huming et al.; Chinese Journal of Computers; 2018-01-19; Vol. 41, No. 8; entire document *


Similar Documents

Publication Publication Date Title
CN108520220B (en) Model generation method and device
CN110516678B (en) Image processing method and device
CN109800732B (en) Method and device for generating cartoon head portrait generation model
CN109829164B (en) Method and device for generating text
WO2020207174A1 (en) Method and apparatus for generating quantized neural network
CN110288625B (en) Method and apparatus for processing image
CN111340220A (en) Method and apparatus for training a predictive model
CN109981787B (en) Method and device for displaying information
CN110009101B (en) Method and apparatus for generating a quantized neural network
CN109598344B (en) Model generation method and device
CN111177433B (en) Method and apparatus for parallel processing of information
CN111709784A (en) Method, apparatus, device and medium for generating user retention time
CN111310896A (en) Method and apparatus for training neural networks
CN109670577B (en) Model generation method and device
CN111353585A (en) Structure searching method and device of neural network model
CN110503181B (en) Method and apparatus for generating a multi-layer neural network
CN111523640A (en) Training method and device of neural network model
CN111354345A (en) Method, apparatus, device and medium for generating speech model and speech recognition
US20200050924A1 (en) Data Processing Method and Apparatus for Neural Network
CN111949860B (en) Method and apparatus for generating a relevance determination model
CN109840072B (en) Information processing method and device
CN109977905B (en) Method and apparatus for processing fundus images
CN111275799B (en) Animation generation method and device and electronic equipment
CN109840109B (en) Method and apparatus for generating software development toolkit
CN112348162A (en) Method and apparatus for generating recognition models

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant