CN109598344A - Model generation method and device

Model generation method and device

Info

Publication number: CN109598344A
Authority: CN (China)
Prior art keywords: precision, type, data, precision type, network parameter
Legal status: Granted; currently Active
Application number: CN201811534701.8A
Other languages: Chinese (zh)
Other versions: CN109598344B (en)
Inventor: 胡耀全
Current assignee: Douyin Vision Co., Ltd.; Douyin Vision (Beijing) Co., Ltd.
Original assignee: Beijing ByteDance Network Technology Co., Ltd.
Application filed by Beijing ByteDance Network Technology Co., Ltd.
Priority and filing date: 2018-12-14
Publication of CN109598344A: 2019-04-09
Grant and publication of CN109598344B: 2020-10-02

Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06N: COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00: Computing arrangements based on biological models
    • G06N 3/02: Neural networks
    • G06N 3/08: Learning methods
    • G06N 3/084: Backpropagation, e.g. using gradient descent

Abstract

Embodiments of the present disclosure disclose a model generation method and device. A specific embodiment of the method includes: obtaining training sample data; during forward propagation based on the training sample data and a model to be trained, performing computation with data of a first precision type to obtain an actual output of the first precision type; and, during backpropagation based on the actual output and the model to be trained, performing computation with data of a second precision type, where the first precision type and the second precision type are different. This embodiment provides a new way of generating models.

Description

Model generation method and device
Technical field
Embodiments of the present disclosure relate to the field of computer technology, and in particular to a model generation method and device.
Background
With the development of artificial intelligence, models based on neural networks play a role in more and more scenarios. A neural network here refers to an artificial neural network (Artificial Neural Network, ANN). A neural network is generally a computational model composed of a large number of interconnected nodes (or neurons). Each node represents a specific output function, called an activation function. Each connection between two nodes carries a weighted value for the signal passing through that connection, called a weight; the weights are the equivalent of the artificial neural network's memory.
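To make the node computation concrete, a single node with three inputs could be sketched as follows (a minimal illustration; the input values, weights and the choice of a sigmoid activation are assumptions, not taken from this disclosure):

```python
import math

# One node: a weighted sum of its inputs plus a bias, passed through an
# activation function (a sigmoid is assumed here). The weights on the
# incoming connections are the network's "memory" described above.
inputs = [0.5, -1.2, 3.0]
weights = [0.8, 0.1, -0.4]   # one weight per incoming connection
bias = 0.05

z = sum(w * x for w, x in zip(weights, inputs)) + bias
output = 1.0 / (1.0 + math.exp(-z))   # activation function output
print(output)
```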
In the prior art, during the training of a neural-network-based model, computation is generally performed with data of a single precision type.
Summary
Embodiments of the present disclosure propose a model generation method and device.
In a first aspect, an embodiment of the present disclosure provides a model generation method, the method comprising: obtaining training sample data; during forward propagation based on the training sample data and a model to be trained, performing computation with data of a first precision type to obtain an actual output of the first precision type; and, during backpropagation based on the actual output and the model to be trained, performing computation with data of a second precision type, where the first precision type and the second precision type are different.
In some embodiments, the first precision type or the second precision type is a half precision type.
In some embodiments, the precision indicated by the first precision type is lower than the precision indicated by the second precision type.
In some embodiments, the precision indicated by the first precision type is higher than the precision indicated by the second precision type.
In some embodiments, performing computation with data of the first precision type during forward propagation based on the training sample data and the model to be trained, to obtain an actual output of the first precision type, comprises: in response to determining that the training sample data is not data of the first precision type, converting the training sample data into data of the first precision type to generate first training sample data; in response to determining that a network parameter of the model to be trained is not data of the first precision type, converting the network parameter into data of the first precision type to generate a first network parameter; and performing a forward propagation computation with the first training sample data and the first network parameter to obtain an actual output of the first precision type.
In some embodiments, performing computation with data of the second precision type during backpropagation based on the actual output and the model to be trained comprises: converting the actual output from the first precision type to the second precision type; in response to determining that a network parameter of the model to be trained is not data of the second precision type, converting the network parameter into data of the second precision type to generate a second network parameter; and performing a backpropagation computation according to the actual output of the second precision type and the second network parameter, so as to update the second network parameter.
In a second aspect, an embodiment of the present disclosure provides a model generation device, the device comprising: an obtaining unit, configured to obtain training sample data; a forward propagation unit, configured to perform computation with data of a first precision type during forward propagation based on the training sample data and a model to be trained, to obtain an actual output of the first precision type; and a backpropagation unit, configured to perform computation with data of a second precision type during backpropagation based on the actual output and the model to be trained, where the first precision type and the second precision type are different.
In some embodiments, the first precision type or the second precision type is a half precision type.
In some embodiments, the precision indicated by the first precision type is lower than the precision indicated by the second precision type.
In some embodiments, the precision indicated by the first precision type is higher than the precision indicated by the second precision type.
In some embodiments, the forward propagation unit is further configured to: in response to determining that the training sample data is not data of the first precision type, convert the training sample data into data of the first precision type to generate first training sample data; in response to determining that a network parameter of the model to be trained is not data of the first precision type, convert the network parameter into data of the first precision type to generate a first network parameter; and perform a forward propagation computation with the first training sample data and the first network parameter to obtain an actual output of the first precision type.
In some embodiments, the backpropagation unit is further configured to: convert the actual output from the first precision type to the second precision type; in response to determining that a network parameter of the model to be trained is not data of the second precision type, convert the network parameter into data of the second precision type to generate a second network parameter; and perform a backpropagation computation according to the actual output of the second precision type and the second network parameter, so as to update the second network parameter.
In a third aspect, an embodiment of the present disclosure provides an electronic device, comprising: one or more processors; and a storage device storing one or more programs which, when executed by the one or more processors, cause the one or more processors to implement the method described in any implementation of the first aspect.
In a fourth aspect, an embodiment of the present disclosure provides a computer-readable medium storing a computer program which, when executed by a processor, implements the method described in any implementation of the first aspect.
In the model generation method and device provided by the embodiments of the present disclosure, during model training the forward propagation process is computed with data of a first precision type and the backpropagation process is computed with data of a second precision type, the first precision type being different from the second precision type. The network parameters of the model to be trained can thus be updated to generate a new model. The technical effect at least includes providing a new way of generating models.
Brief description of the drawings
Other features, objects and advantages of the present disclosure will become more apparent by reading the following detailed description of non-limiting embodiments with reference to the accompanying drawings:
Fig. 1 is an exemplary system architecture diagram to which some embodiments of the present disclosure may be applied;
Fig. 2 is a flowchart of one embodiment of the model generation method according to the present disclosure;
Fig. 3 is a schematic diagram of an application scenario of the model generation method according to the present disclosure;
Fig. 4 is a flowchart of a further embodiment of the model generation method according to the present disclosure;
Fig. 5 is a structural schematic diagram of one embodiment of the model generation device according to the present disclosure;
Fig. 6 is a structural schematic diagram of a computer system suitable for implementing an electronic device of an embodiment of the present disclosure.
Detailed description of the embodiments
The present disclosure is described in further detail below with reference to the accompanying drawings and embodiments. It should be understood that the specific embodiments described here are only used to explain the relevant invention, and are not a limitation of the invention. It should also be noted that, for ease of description, only the parts relevant to the invention are shown in the drawings.
It should be noted that, in the absence of conflict, the embodiments of the present disclosure and the features in the embodiments may be combined with each other. The present disclosure is described in detail below with reference to the accompanying drawings and in conjunction with the embodiments.
Fig. 1 shows an exemplary system architecture 100 to which embodiments of the model generation method or model generation device of the present disclosure may be applied.
As shown in Fig. 1, the system architecture 100 may include terminal devices 101, 102 and 103, a network 104 and a server 105. The network 104 is the medium providing communication links between the terminal devices 101, 102 and 103 and the server 105. The network 104 may include various connection types, such as wired or wireless communication links, or fiber optic cables.
A user may use the terminal devices 101, 102 and 103 to interact with the server 105 through the network 104 to receive or send messages. Various client applications may be installed on the terminal devices 101, 102 and 103, such as model generation applications, conversation applications, live streaming applications, search applications, instant messaging tools, email clients and social platform software.
The terminal devices 101, 102 and 103 may be hardware or software. When they are hardware, they may be various electronic devices with a communication function, including but not limited to smartphones, tablet computers, e-book readers, MP3 players (Moving Picture Experts Group Audio Layer III), MP4 players (Moving Picture Experts Group Audio Layer IV), laptop portable computers and desktop computers. When the terminal devices 101, 102 and 103 are software, they may be installed in the electronic devices listed above. They may be implemented as multiple pieces of software or software modules (for example, to provide distributed services), or as a single piece of software or software module. No specific limitation is made here.
The server 105 may be a server providing various services, for example a background server supporting a model generation application on the terminal devices 101, 102 and 103. A terminal device may package some parameters for model generation (such as training sample data) into a model generation request and send the request to the background server. The background server may analyze and otherwise process the received model generation request, and feed the processing result (such as the parameters of the model) back to the terminal device.
It should be noted that the model generation method provided by the embodiments of the present disclosure is generally executed by the server 105; accordingly, the model generation device is generally provided in the server 105. Optionally, the model generation method provided by the embodiments of the present disclosure may also be executed by the terminal devices 101, 102 and 103.
It should be noted that the server may be hardware or software. When the server is hardware, it may be implemented as a distributed server cluster composed of multiple servers, or as a single server. When the server is software, it may be implemented as multiple pieces of software or software modules (for example, to provide distributed services), or as a single piece of software or software module. No specific limitation is made here.
It should be understood that the numbers of terminal devices, networks and servers in Fig. 1 are merely illustrative. There may be any number of terminal devices, networks and servers according to implementation needs.
Referring to Fig. 2, a flow 200 of one embodiment of the model generation method is shown. This embodiment is mainly illustrated by applying the method to an electronic device with some computing capability, which may for example be the server shown in Fig. 1. The model generation method comprises the following steps:
Step 201, obtaining training sample data.
In this embodiment, the executing body of the model generation method (such as the server shown in Fig. 1) may obtain training sample data.
Here, the training sample data may be used to train a model to be trained, so as to generate a new model.
In this embodiment, the model to be trained may be an untrained neural network or a neural network whose training has not been completed. Here, a neural network refers to an artificial neural network. Common neural networks include, for example, deep neural networks (Deep Neural Network, DNN), convolutional neural networks (Convolutional Neural Network, CNN), and recurrent neural networks (Recurrent Neural Network, RNN).
Optionally, the network structure of the model to be trained may be preset. For example, it is necessary to set which layers the neural network includes, the connection order relationship between the layers, which neurons each layer includes, the weight and bias corresponding to each neuron, the activation function of each layer, and so on. The network structure of the model to be trained may be represented by various network parameters, which may include but are not limited to weights and biases.
As an example, when the model to be trained is a deep convolutional neural network, since a deep convolutional neural network is a multilayer neural network, it is necessary to determine which layers the deep convolutional neural network includes (for example, convolutional layers, pooling layers, fully connected layers, classifiers and the like), the connection order relationship between the layers, and which network parameters each layer includes (for example, weights, biases, the stride of a convolution) and so on. Among these, convolutional layers may be used to extract image features. For each convolutional layer, it is possible to determine how many convolution kernels there are, the size of each convolution kernel, the weight of each neuron in each convolution kernel, the bias corresponding to each convolution kernel, the stride between two adjacent convolutions, and so on.
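To make this presetting of a structure concrete, a minimal PyTorch sketch is given below; every concrete number (channel counts, kernel sizes, strides, and the 32x32 input they imply) is an illustrative assumption rather than a value taken from this disclosure:

```python
import torch
import torch.nn as nn

# A minimal deep CNN whose structure (layers, connection order, kernel
# sizes, strides, biases) is fixed up front, as described above.
# All concrete numbers here are illustrative assumptions.
class SmallConvNet(nn.Module):
    def __init__(self, num_classes: int = 10):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(3, 16, kernel_size=3, stride=1, padding=1, bias=True),
            nn.ReLU(),
            nn.MaxPool2d(2),                  # pooling layer
            nn.Conv2d(16, 32, kernel_size=3, stride=1, padding=1, bias=True),
            nn.ReLU(),
            nn.MaxPool2d(2),
        )
        # fully connected layer acting as the classifier (32x32 input assumed)
        self.classifier = nn.Linear(32 * 8 * 8, num_classes)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        x = self.features(x)                  # convolutional feature extraction
        return self.classifier(x.flatten(1))  # classification output

model = SmallConvNet()  # its weights and biases are the network parameters
```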
Step 202, during forward propagation based on the training sample data and the model to be trained, performing computation with data of a first precision type to obtain an actual output of the first precision type.
In this embodiment, the above executing body may perform computation with data of the first precision type during forward propagation based on the training sample data and the model to be trained, to obtain an actual output of the first precision type.
In this embodiment, the model training process may be computed with floating point data. Floating point data can be divided into the following types according to precision: half precision, single precision and double precision. In general, 16-bit floating point data belongs to the half precision type, 32-bit floating point data belongs to the single precision type, and 64-bit floating point data belongs to the double precision type.
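These three widths map directly onto the standard floating point dtypes of common frameworks; the quick PyTorch check below is an illustrative aside, not part of the disclosed method:

```python
import torch

# Half, single and double precision correspond to 16-, 32- and 64-bit floats.
for dtype in (torch.float16, torch.float32, torch.float64):
    info = torch.finfo(dtype)
    # eps is the smallest step from 1.0, a rough measure of precision
    print(f"{dtype}: {info.bits} bits, eps={info.eps}")
# Prints approximately:
# torch.float16: 16 bits, eps=0.000977
# torch.float32: 32 bits, eps=1.19e-07
# torch.float64: 64 bits, eps=2.22e-16
```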
In this embodiment, the training sample data is fed into the model to be trained, and the actual output is then obtained from the output layer of the model to be trained; this process may be called forward propagation. Using the target output and the actual output of the model to be trained, the output layer error is determined.
In this embodiment, performing computation with data of the first precision type means that the data participating in the computation is of the first precision type; that is, the training sample data and the network parameters of the model to be trained that participate in the computation are of the first precision type. If the obtained training sample data and the network parameters of the model to be trained are not of the first precision type, they may be converted to the first precision type before the forward propagation computation is carried out.
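A minimal sketch of this convert-then-forward step, assuming the first precision type is half precision and reusing the hypothetical SmallConvNet defined earlier (batch size and shapes are assumptions):

```python
import torch

first_dtype = torch.float16  # assumed first precision type

samples = torch.randn(4, 3, 32, 32)           # stand-in training sample data
model = SmallConvNet()                        # hypothetical model from above

# Convert only when the data is not already of the first precision type,
# mirroring the "in response to determining ..." steps described below.
if samples.dtype != first_dtype:
    samples = samples.to(first_dtype)         # first training sample data
if next(model.parameters()).dtype != first_dtype:
    model = model.to(first_dtype)             # first network parameters

# Forward propagation computed with data of the first precision type.
# Half precision compute may require a CUDA device on some framework
# builds, e.g. samples, model = samples.cuda(), model.cuda()
actual_output = model(samples)
assert actual_output.dtype == first_dtype     # actual output, first precision
```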
Step 203, during backpropagation based on the actual output and the model to be trained, performing computation with data of a second precision type.
In this embodiment, the above executing body may perform computation with data of the second precision type during backpropagation based on the above actual output and the above model to be trained. The network parameters of the model to be trained can thus be updated, so that a new model can be generated based on the model to be trained.
In this embodiment, the output layer error value is propagated backwards, and the network parameters of the model to be trained are adjusted accordingly; this process may be called backpropagation. As an example, the backpropagation algorithm (Back Propagation Algorithm, BP algorithm) and gradient descent (such as stochastic gradient descent) may be used to adjust the network parameters of the above model to be trained.
In this embodiment, performing computation with data of the second precision type means that the data participating in the computation is of the second precision type; that is, the actual output and the network parameters of the model to be trained that participate in the computation are of the second precision type. If the obtained actual output and the network parameters of the model to be trained are not of the second precision type, they may be converted to the second precision type before the backpropagation computation is carried out.
Here, the above first precision type is different from the above second precision type.
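Because common autograd engines tie the backward dtype to the dtypes used in the forward ops, the "forward in one precision, backpropagation in another" scheme is easiest to see with a hand-written layer. A minimal sketch with one linear layer, a squared-error loss and explicit gradient formulas follows; all of these concrete choices are assumptions for illustration:

```python
import torch

first_dtype, second_dtype = torch.float16, torch.float32  # assumed types

W = torch.randn(10, 3)             # network parameters, stored in fp32
b = torch.zeros(10)
x = torch.randn(4, 3)              # training sample data
y = torch.randn(4, 10)             # target output

# Forward propagation with data of the first precision type (half precision
# matmul may require a CUDA device on older framework builds).
x16, W16, b16 = x.to(first_dtype), W.to(first_dtype), b.to(first_dtype)
actual = x16 @ W16.t() + b16       # actual output, first precision type
assert actual.dtype == first_dtype

# Backpropagation with data of the second precision type.
actual32 = actual.to(second_dtype)             # convert the actual output
err = actual32 - y                             # output layer error
grad_W = err.t() @ x / len(x)                  # gradients of a squared loss
grad_b = err.mean(dim=0)
lr = 0.01                                      # assumed learning rate
W -= lr * grad_W                               # gradient descent update of
b -= lr * grad_b                               # the network parameters
```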
It should be noted that the prior art performs computation with data of the same precision type in both the forward propagation and backpropagation processes. In the present disclosure, forward propagation and backpropagation use data of different precision types. The technical effects thus at least may include:
First, a new way of generating models is provided.
Second, the model training process is divided into two parts, one part computed with higher precision data and the other with lower precision data, which can both increase the speed of model training and preserve the accuracy of model training.
With continued reference to Fig. 3, Fig. 3 is a schematic diagram of an application scenario of the model generation method according to the embodiment of Fig. 2. In the application scenario of Fig. 3:
First, the server 301 may obtain training sample data.
Then, during forward propagation based on the above training sample data and a model to be trained, the server 301 may perform computation with data of a first precision type to obtain an actual output of the first precision type.
Then, during backpropagation based on the above actual output and the above model to be trained, the server 301 may perform computation with data of a second precision type. The network parameters of the model to be trained can thus be updated to obtain updated network parameters, so as to generate a new model. Here, the above first precision type is different from the above second precision type.
In the method provided by the above embodiment of the present disclosure, during model training the forward propagation process is computed with data of a first precision type and the backpropagation process is computed with data of a second precision type, the first precision type being different from the second precision type. The network parameters of the model to be trained can thus be updated to generate a new model, and the technical effect at least includes providing a new way of generating models.
In some embodiments, the first precision type or the second precision type is a half precision type. That is, either of the following holds:
First, the first precision type is a half precision type, and the second precision type is any one of the following: a single precision type or a double precision type.
Second, the second precision type is a half precision type, and the first precision type is any one of the following: a single precision type or a double precision type.
It should be noted that in the prior art it is generally held that data of the half precision type is suitable for transmission (high transmission speed) but unsuitable for computation (insufficient precision). In some implementations of the present disclosure, the inventor realized that model training can be divided into two parts, with one part of the computation (forward propagation or backpropagation) carried out with half precision data and the other part with higher precision data. This overcomes the technical prejudice that half precision data is unsuitable for computation, and achieves the following: one part of the computation can be carried out with half precision data, increasing computation speed, while the other part is carried out with higher precision data, preserving the accuracy of model training.
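For context, mainstream frameworks expose this kind of precision split under the name automatic mixed precision. The sketch below shows PyTorch's general AMP idiom, which is the framework's own recipe rather than the specific scheme of this disclosure, and again reuses the hypothetical SmallConvNet:

```python
import torch
from torch import nn

# Typical automatic mixed precision (AMP) loop in PyTorch, shown for
# context: half precision where it is fast, single precision master
# weights, and a loss scaler that guards against fp16 gradient underflow.
device = "cuda"                                   # fp16 autocast targets CUDA
model = SmallConvNet().to(device)                 # hypothetical model from above
optimizer = torch.optim.SGD(model.parameters(), lr=0.01)
scaler = torch.cuda.amp.GradScaler()

def train_step(samples, targets):
    optimizer.zero_grad()
    with torch.autocast(device_type="cuda", dtype=torch.float16):
        output = model(samples)                   # forward largely in fp16
        loss = nn.functional.cross_entropy(output, targets)
    scaler.scale(loss).backward()                 # scaled backward pass
    scaler.step(optimizer)                        # unscale, then fp32 update
    scaler.update()

samples = torch.randn(8, 3, 32, 32, device=device)   # stand-in batch
targets = torch.randint(0, 10, (8,), device=device)
train_step(samples, targets)
```

The gradient scaler in this idiom exists precisely because of the accuracy concern the paragraph above describes: small fp16 gradients would otherwise underflow to zero.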
In some embodiments, the precision indicated by the above first precision type (which may be called the first precision) is lower than the precision indicated by the above second precision type (which may be called the second precision).
It should be noted that when the first precision is lower than the second precision, forward propagation uses the lower precision and backpropagation uses the higher precision. Computation speed can thus be increased during forward propagation, and the accuracy of the updated network parameters can be improved during backpropagation. This can both increase the speed of model training and preserve the accuracy of model training.
In some embodiments, the precision indicated by the above first precision type is higher than the precision indicated by the above second precision type.
It should be noted that when the first precision is higher than the second precision, forward propagation uses the higher precision and backpropagation uses the lower precision. Computation accuracy can thus be preserved during forward propagation, and computation speed can be increased during backpropagation. This can both preserve the accuracy of model training and increase the speed of model training.
With further reference to Fig. 4, a flow 400 of another embodiment of the model generation method is shown. The flow 400 of the model generation method comprises the following steps:
Step 401, obtaining training sample data and network parameters of a model to be trained.
In this embodiment, the executing body of the model generation method (such as the server shown in Fig. 1) may obtain training sample data and the network parameters of the model to be trained.
Here, for the implementation details of step 401, reference may be made to the description of step 201, which is not repeated here.
Step 402, in response to determining that the training sample data is not data of the first precision type, converting the training sample data into data of the first precision type to generate first training sample data.
In this embodiment, the above executing body may first determine whether the training sample data is of the first precision type, and if not, convert the training sample data into data of the first precision type to obtain, that is, generate, the first training sample data.
Step 403, in response to determining that the network parameters of the model to be trained are not data of the first precision type, converting the network parameters into data of the first precision type to generate first network parameters.
In this embodiment, the above executing body may first determine whether the above network parameters are of the first precision type, and if not, convert the above network parameters into data of the first precision type to obtain, that is, generate, the first network parameters.
Here, the execution order of step 402 and step 403 is not limited.
Step 404, performing a forward propagation computation with the first training sample data and the first network parameters to obtain an actual output of the first precision type.
In this embodiment, the above executing body may perform a forward propagation computation with the above first training sample data and the above first network parameters to obtain an actual output of the first precision type.
Step 405, converting the actual output from the first precision type to the second precision type.
In this embodiment, the above executing body may convert the above actual output from the first precision type to the second precision type.
Step 406, in response to determining that the network parameters of the model to be trained are not data of the second precision type, converting the network parameters into data of the second precision type to generate second network parameters.
In this embodiment, the above executing body may first determine whether the above network parameters are of the second precision type, and if not, convert the above network parameters into data of the second precision type to obtain, that is, generate, the second network parameters.
Step 407, performing a backpropagation computation according to the actual output of the second precision type and the second network parameters, to update the second network parameters.
In this embodiment, the above executing body may perform a backpropagation computation according to the actual output of the second precision type and the second network parameters, so as to update the second network parameters. Model training can thus be carried out to generate a new model.
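Pulling steps 401 through 407 together, one iteration of this flow for a single hand-written linear layer might look like the sketch below; the layer, the squared-error loss and the learning rate are illustrative assumptions, and half precision matmul may require a CUDA device on older framework builds:

```python
import torch

def train_iteration(x, y, W, b,
                    first_dtype=torch.float16,
                    second_dtype=torch.float32,
                    lr=0.01):
    """One pass of the Fig. 4 flow for a single linear layer (illustrative)."""
    # Steps 402/403: convert sample data and network parameters to the
    # first precision type if they are not already of that type.
    x1 = x if x.dtype == first_dtype else x.to(first_dtype)
    W1 = W if W.dtype == first_dtype else W.to(first_dtype)
    b1 = b if b.dtype == first_dtype else b.to(first_dtype)

    # Step 404: forward propagation, actual output in the first precision.
    actual = x1 @ W1.t() + b1

    # Step 405: convert the actual output to the second precision type.
    actual2 = actual.to(second_dtype)

    # Step 406: second network parameters in the second precision type.
    W2 = W if W.dtype == second_dtype else W.to(second_dtype)
    b2 = b if b.dtype == second_dtype else b.to(second_dtype)

    # Step 407: backpropagation in the second precision (squared loss),
    # updating the second network parameters.
    err = actual2 - y.to(second_dtype)       # output layer error
    W2 = W2 - lr * (err.t() @ x.to(second_dtype)) / len(x)
    b2 = b2 - lr * err.mean(dim=0)
    return W2, b2                            # updated second network parameters

# Example usage on stand-in data.
W, b = torch.randn(10, 3), torch.zeros(10)
x, y = torch.randn(4, 3), torch.randn(4, 10)
W, b = train_iteration(x, y, W, b)
```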
As can be seen from Fig. 4, compared with the embodiment corresponding to Fig. 2, the flow 400 of the model generation method in this embodiment highlights the steps of converting the precision of data. The technical effects of the scheme described in this embodiment may thus at least include:
First, a new way of generating models is provided.
Second, a more comprehensive model generation method is provided.
With further reference to Fig. 5, as an implementation of the methods shown in the above figures, the present disclosure provides an embodiment of a model generation device. This device embodiment corresponds to the method embodiment shown in Fig. 2, and the device may be applied to various electronic devices.
As shown in Fig. 5, the model generation device 500 of this embodiment includes: an obtaining unit 501, a forward propagation unit 502 and a backpropagation unit 503. The obtaining unit is configured to obtain training sample data; the forward propagation unit is configured to perform computation with data of a first precision type during forward propagation based on the training sample data and a model to be trained, to obtain an actual output of the first precision type; and the backpropagation unit is configured to perform computation with data of a second precision type during backpropagation based on the actual output and the model to be trained, where the first precision type and the second precision type are different.
In this embodiment, for the specific processing of the obtaining unit 501, the forward propagation unit 502 and the backpropagation unit 503 of the model generation device 500 and the technical effects they bring, reference may be made to the related descriptions of step 201, step 202 and step 203 in the embodiment corresponding to Fig. 2, which are not repeated here.
In some optional implementations of this embodiment, the above determination unit is further configured to: determine the gradient value of the layer to be updated of the above model to be trained as a first gradient value; and determine the scale factor of the above layer to be updated according to the above first gradient value and the current weight values of the weights in the above layer to be updated.
In some optional implementations of this embodiment, the first precision type or the second precision type is a half precision type.
In some optional implementations of this embodiment, the precision indicated by the first precision type is lower than the precision indicated by the second precision type.
In some optional implementations of this embodiment, the precision indicated by the first precision type is higher than the precision indicated by the second precision type.
In some optional implementations of this embodiment, the forward propagation unit is further configured to: in response to determining that the training sample data is not data of the first precision type, convert the training sample data into data of the first precision type to generate first training sample data; in response to determining that a network parameter of the model to be trained is not data of the first precision type, convert the network parameter into data of the first precision type to generate a first network parameter; and perform a forward propagation computation with the first training sample data and the first network parameter to obtain an actual output of the first precision type.
In some optional implementations of this embodiment, the backpropagation unit is further configured to: convert the actual output from the first precision type to the second precision type; in response to determining that a network parameter of the model to be trained is not data of the second precision type, convert the network parameter into data of the second precision type to generate a second network parameter; and perform a backpropagation computation according to the actual output of the second precision type and the second network parameter, so as to update the second network parameter.
It should be noted that, for the implementation details and technical effects of the units in the model generation device provided by the embodiments of the present disclosure, reference may be made to the descriptions of other embodiments in the present disclosure, which are not repeated here.
Referring now to Fig. 6, a structural schematic diagram of an electronic device 600 (such as the terminal or server of Fig. 1) suitable for implementing embodiments of the present disclosure is shown. The electronic device shown in Fig. 6 is only an example and should not impose any limitation on the functions and scope of use of the embodiments of the present disclosure.
As shown in Fig. 6, the electronic device 600 may include a processing device (such as a central processing unit or a graphics processor) 601, which can perform various appropriate actions and processing according to a program stored in a read-only memory (ROM) 602 or a program loaded from a storage device 608 into a random access memory (RAM) 603. Various programs and data required for the operation of the electronic device 600 are also stored in the RAM 603. The processing device 601, the ROM 602 and the RAM 603 are connected to each other through a bus 604. An input/output (I/O) interface 605 is also connected to the bus 604.
In general, the following devices may be connected to the I/O interface 605: input devices 606 including, for example, a touch screen, touch pad, keyboard, mouse, camera, microphone, accelerometer and gyroscope; output devices 607 including, for example, a liquid crystal display (LCD), speaker and vibrator; storage devices 608 including, for example, a magnetic tape and hard disk; and a communication device 609. The communication device 609 may allow the electronic device 600 to communicate with other devices wirelessly or by wire to exchange data. Although Fig. 6 shows an electronic device 600 with various devices, it should be understood that it is not required to implement or provide all the devices shown. More or fewer devices may alternatively be implemented or provided.
In particular, according to embodiments of the present disclosure, the processes described above with reference to the flowcharts may be implemented as computer software programs. For example, embodiments of the present disclosure include a computer program product comprising a computer program carried on a computer-readable medium, the computer program containing program code for executing the method shown in the flowcharts. In such embodiments, the computer program may be downloaded and installed from a network through the communication device 609, or installed from the storage device 608, or installed from the ROM 602. When the computer program is executed by the processing device 601, the above functions defined in the methods of the embodiments of the present disclosure are executed.
It should be noted that the above computer-readable medium of the present disclosure may be a computer-readable signal medium or a computer-readable storage medium, or any combination of the two. A computer-readable storage medium may be, for example, but is not limited to, an electrical, magnetic, optical, electromagnetic, infrared or semiconductor system, apparatus or device, or any combination of the above. More specific examples of computer-readable storage media may include, but are not limited to: an electrical connection with one or more wires, a portable computer disk, a hard disk, a random access memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the above. In the present disclosure, a computer-readable storage medium may be any tangible medium that contains or stores a program which may be used by or in combination with an instruction execution system, apparatus or device. In the present disclosure, a computer-readable signal medium may include a data signal propagated in baseband or as part of a carrier wave, carrying computer-readable program code. Such a propagated data signal may take various forms, including but not limited to an electromagnetic signal, an optical signal, or any suitable combination of the above. A computer-readable signal medium may also be any computer-readable medium other than a computer-readable storage medium that can send, propagate or transmit a program for use by or in combination with an instruction execution system, apparatus or device. The program code contained on a computer-readable medium may be transmitted by any suitable medium, including but not limited to: a wire, an optical cable, RF (radio frequency), or any suitable combination of the above.
The above computer-readable medium may be included in the above electronic device, or may exist alone without being assembled into the electronic device.
The above computer-readable medium carries one or more programs which, when executed by the electronic device, cause the electronic device to: obtain training sample data; during forward propagation based on the training sample data and a model to be trained, perform computation with data of a first precision type to obtain an actual output of the first precision type; and, during backpropagation based on the actual output and the model to be trained, perform computation with data of a second precision type, where the first precision type and the second precision type are different.
Computer program code for executing the operations of the present disclosure may be written in one or more programming languages or a combination thereof. The above programming languages include object-oriented programming languages such as Java, Smalltalk and C++, and also include conventional procedural programming languages such as the "C" language or similar programming languages. The program code may execute entirely on the user's computer, partly on the user's computer, as a stand-alone software package, partly on the user's computer and partly on a remote computer, or entirely on a remote computer or server. In cases involving a remote computer, the remote computer may be connected to the user's computer through any kind of network, including a local area network (LAN) or a wide area network (WAN), or may be connected to an external computer (for example, through the Internet using an Internet service provider).
The flowcharts and block diagrams in the accompanying drawings illustrate the possible architectures, functions and operations of systems, methods and computer program products according to various embodiments of the present disclosure. In this regard, each block in a flowchart or block diagram may represent a module, program segment or part of code that contains one or more executable instructions for implementing the specified logical function. It should also be noted that, in some alternative implementations, the functions marked in the blocks may occur in an order different from that marked in the drawings. For example, two blocks shown in succession may in fact be executed substantially in parallel, or sometimes in the reverse order, depending on the functions involved. It should also be noted that each block in the block diagrams and/or flowcharts, and combinations of blocks in the block diagrams and/or flowcharts, may be implemented by a dedicated hardware-based system that performs the specified functions or operations, or by a combination of dedicated hardware and computer instructions.
The units involved in the embodiments of the present disclosure may be implemented in software or in hardware. The name of a unit does not, under certain circumstances, constitute a limitation on the unit itself; for example, the obtaining unit may also be described as "a unit that obtains training sample data".
The above description is only a preferred embodiment of the present disclosure and an explanation of the technical principles applied. Those skilled in the art should understand that the scope of disclosure involved in the present disclosure is not limited to technical solutions formed by the specific combination of the above technical features, and should also cover other technical solutions formed by any combination of the above technical features or their equivalent features without departing from the above disclosed concept, for example, technical solutions formed by replacing the above features with technical features having similar functions disclosed in (but not limited to) the present disclosure.

Claims (14)

1. A model generation method, comprising:
obtaining training sample data;
during forward propagation based on the training sample data and a model to be trained, performing computation with data of a first precision type to obtain an actual output of the first precision type;
during backpropagation based on the actual output and the model to be trained, performing computation with data of a second precision type, wherein the first precision type and the second precision type are different.
2. The method according to claim 1, wherein the first precision type or the second precision type is a half precision type.
3. The method according to claim 1, wherein the precision indicated by the first precision type is lower than the precision indicated by the second precision type.
4. The method according to claim 1, wherein the precision indicated by the first precision type is higher than the precision indicated by the second precision type.
5. The method according to any one of claims 1-4, wherein performing computation with data of the first precision type during forward propagation based on the training sample data and the model to be trained, to obtain an actual output of the first precision type, comprises:
in response to determining that the training sample data is not data of the first precision type, converting the training sample data into data of the first precision type to generate first training sample data;
in response to determining that a network parameter of the model to be trained is not data of the first precision type, converting the network parameter into data of the first precision type to generate a first network parameter;
performing a forward propagation computation with the first training sample data and the first network parameter to obtain an actual output of the first precision type.
6. The method according to any one of claims 1-4, wherein performing computation with data of the second precision type during backpropagation based on the actual output and the model to be trained comprises:
converting the actual output from the first precision type to the second precision type;
in response to determining that a network parameter of the model to be trained is not data of the second precision type, converting the network parameter into data of the second precision type to generate a second network parameter;
performing a backpropagation computation according to the actual output of the second precision type and the second network parameter, to update the second network parameter.
7. A model generation device, comprising:
an obtaining unit, configured to obtain training sample data;
a forward propagation unit, configured to perform computation with data of a first precision type during forward propagation based on the training sample data and a model to be trained, to obtain an actual output of the first precision type;
a backpropagation unit, configured to perform computation with data of a second precision type during backpropagation based on the actual output and the model to be trained, wherein the first precision type and the second precision type are different.
8. The device according to claim 7, wherein the first precision type or the second precision type is a half precision type.
9. The device according to claim 7, wherein the precision indicated by the first precision type is lower than the precision indicated by the second precision type.
10. The device according to claim 7, wherein the precision indicated by the first precision type is higher than the precision indicated by the second precision type.
11. The device according to any one of claims 7-10, wherein the forward propagation unit is further configured to:
in response to determining that the training sample data is not data of the first precision type, convert the training sample data into data of the first precision type to generate first training sample data;
in response to determining that a network parameter of the model to be trained is not data of the first precision type, convert the network parameter into data of the first precision type to generate a first network parameter;
perform a forward propagation computation with the first training sample data and the first network parameter to obtain an actual output of the first precision type.
12. The device according to any one of claims 7-10, wherein the backpropagation unit is further configured to:
convert the actual output from the first precision type to the second precision type;
in response to determining that a network parameter of the model to be trained is not data of the second precision type, convert the network parameter into data of the second precision type to generate a second network parameter;
perform a backpropagation computation according to the actual output of the second precision type and the second network parameter, to update the second network parameter.
13. An electronic device, comprising:
one or more processors;
a storage device storing one or more programs,
which, when executed by the one or more processors, cause the one or more processors to implement the method according to any one of claims 1-6.
14. A computer-readable medium storing a computer program, wherein the program, when executed by a processor, implements the method according to any one of claims 1-6.
CN201811534701.8A (priority date 2018-12-14, filed 2018-12-14): Model generation method and device. Status: Active. Granted as CN109598344B (en).

Priority Applications (1)

CN201811534701.8A (priority date 2018-12-14, filed 2018-12-14): Model generation method and device


Publications (2)

CN109598344A (application publication): 2019-04-09
CN109598344B (granted publication): 2020-10-02

Family

ID=65961893

Family Applications (1)

CN201811534701.8A (Active): Model generation method and device; priority date 2018-12-14, filing date 2018-12-14

Country Status (1)

CN: CN109598344B (en)



Patent Citations (5)

* Cited by examiner, † Cited by third party

CN107526709A (en) *, priority 2016-06-15, published 2017-12-29, 辉达公司 (NVIDIA): Tensor processing using a low precision format
CN106650931A (en) *, priority 2016-12-09, published 2017-05-10, 曙光信息产业(北京)有限公司: Hybrid precision deep learning algorithm
CN108734643A (en) *, priority 2017-04-24, published 2018-11-02, 英特尔公司 (Intel): Mixed inference using low and high precision
CN108805263A (en) *, priority 2017-04-28, published 2018-11-13, 英特尔公司 (Intel): Variable precision and mixed type representation of multiple layers in a network
US20180322382A1 *, priority 2017-05-03, published 2018-11-08, Intel Corporation: Scaling half-precision floating point tensors for training deep neural networks

Non-Patent Citations (3)

* Cited by examiner, † Cited by third party

Paulius Micikevicius et al., "Mixed Precision Training", ICLR 2018. *
Yoshua Bengio et al., "Training Deep Neural Networks with Low Precision Multiplications", ICLR 2015. *
朱虎明 et al., "深度神经网络并行化研究综述" (A survey of research on the parallelization of deep neural networks), 《计算机学报》 (Chinese Journal of Computers). *

Cited By (5)

* Cited by examiner, † Cited by third party

CN110163368A (en) *, priority 2019-04-18, published 2019-08-23, 腾讯科技(深圳)有限公司: Deep learning model training method, apparatus and system based on mixed precision
CN110163368B *, priority 2019-04-18, granted 2023-10-20, 腾讯科技(深圳)有限公司: Deep learning model training method, device and system based on mixed precision
CN113469324A (en) *, priority 2021-03-23, published 2021-10-01, 中科创达软件股份有限公司: Model dynamic quantization method and device, electronic equipment and computer readable medium
CN113469324B *, priority 2021-03-23, granted 2024-03-22, 中科创达软件股份有限公司: Model dynamic quantization method, device, electronic equipment and computer readable medium
CN113435520A (en) *, priority 2021-06-30, published 2021-09-24, 深圳市商汤科技有限公司: Neural network training method, device, equipment and computer readable storage medium

Also Published As

CN109598344B (en): 2020-10-02


Legal Events

PB01: Publication
SE01: Entry into force of request for substantive examination
GR01: Patent grant
CP01: Change in the name or title of a patent holder
    Address (unchanged): 100041 B-0035, 2nd floor, Building 3, 30 Shixing Street, Shijingshan District, Beijing
    Patentee before: BEIJING BYTEDANCE NETWORK TECHNOLOGY Co., Ltd.
    Patentee after: Tiktok vision (Beijing) Co., Ltd.
CP01: Change in the name or title of a patent holder
    Address (unchanged): 100041 B-0035, 2nd floor, Building 3, 30 Shixing Street, Shijingshan District, Beijing
    Patentee before: Tiktok vision (Beijing) Co., Ltd.
    Patentee after: Douyin Vision Co., Ltd.