CN109670579A - Model generating method and device - Google Patents

Model generating method and device

Info

Publication number
CN109670579A
CN109670579A
Authority
CN
China
Prior art keywords
processor
gradient
network parameter
model to be trained
training
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201811536493.5A
Other languages
Chinese (zh)
Inventor
胡耀全
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing ByteDance Network Technology Co Ltd
Original Assignee
Beijing ByteDance Network Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing ByteDance Network Technology Co Ltd filed Critical Beijing ByteDance Network Technology Co Ltd
Priority to CN201811536493.5A
Publication of CN109670579A
Legal status: Pending

Links

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/045 Combinations of networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/08 Learning methods
    • G06N3/084 Backpropagation, e.g. using gradient descent

Abstract

The embodiments of the present disclosure disclose a model generating method and device. A specific embodiment of the method includes: sending, to each processor of at least two processors, a training sample subset from a training sample set, wherein the processor is configured to determine, by forward propagation, an actual output of a model to be trained based on the model to be trained and the received training sample subset; for each processor of the at least two processors, acquiring the actual output of the model to be trained determined by that processor; performing backpropagation based on the acquired actual outputs, and determining a first gradient corresponding to a pre-specified first network parameter of the model to be trained; and updating the first network parameter according to the first gradient. This embodiment provides a new way of generating models.

Description

Model generating method and device
Technical field
The embodiments of the present disclosure relate to the field of computer technology, and in particular to a model generating method and device.
Background art
With the development of artificial intelligence, models based on neural networks play a role in more and more scenarios. A neural network here may refer to an artificial neural network (Artificial Neural Network, ANN). A neural network is generally a computational model composed of a large number of interconnected nodes (or neurons). Each node represents a specific output function, referred to as an activation function. Each connection between two nodes carries a weighted value for the signal passing through that connection, referred to as a weight, which is equivalent to the memory of the artificial neural network.
Summary of the invention
The embodiments of the present disclosure propose a model generating method and device.
In a first aspect, an embodiment of the present disclosure provides a model generating method, the method comprising: sending, to each processor of at least two processors, a training sample subset from a training sample set, wherein the processor is configured to determine, by forward propagation, an actual output of a model to be trained based on the model to be trained and the received training sample subset; for each processor of the at least two processors, acquiring the actual output of the model to be trained determined by that processor; performing backpropagation based on the acquired actual outputs, and determining a first gradient corresponding to a pre-specified first network parameter of the model to be trained; and updating the first network parameter according to the first gradient.
In some embodiments, the first network parameter includes a network parameter in a batch normalization layer.
In some embodiments, each processor of the at least two processors is further configured to: perform error backpropagation based on the actual output determined by that processor, and determine a second gradient corresponding to a pre-specified second network parameter of the model to be trained.
In some embodiments, the method further includes: for each processor of the at least two processors, acquiring the second gradient determined by that processor; and updating the second network parameter according to the acquired second gradients.
In some embodiments, the processor is further configured to: perform the forward propagation computation using data of a first precision type, and perform the backpropagation computation using data of a second precision type, wherein the first precision type differs from the second precision type.
In some embodiments, the first precision type or the second precision type is a half-precision type.
In some embodiments, updating the first network parameter according to the first gradient includes: performing backpropagation using data of the second precision type, and determining the first gradient corresponding to the pre-specified first network parameter of the model to be trained.
In a second aspect, an embodiment of the present disclosure provides a model generating device, the device comprising: a sending unit configured to send, to each processor of at least two processors, a training sample subset from a training sample set, wherein the processor is configured to determine, by forward propagation, an actual output of a model to be trained based on the model to be trained and the received training sample subset; a first acquiring unit configured to acquire, for each processor of the at least two processors, the actual output of the model to be trained determined by that processor; a determining unit configured to perform backpropagation based on the acquired actual outputs and determine a first gradient corresponding to a pre-specified first network parameter of the model to be trained; and a first updating unit configured to update the first network parameter according to the first gradient.
In some embodiments, the first network parameter includes a network parameter in a batch normalization layer.
In some embodiments, each processor of the at least two processors is further configured to: perform error backpropagation based on the actual output determined by that processor, and determine a second gradient corresponding to a pre-specified second network parameter of the model to be trained.
In some embodiments, the device further includes: a second acquiring unit configured to acquire, for each processor of the at least two processors, the second gradient determined by that processor; and a second updating unit configured to update the second network parameter according to the acquired second gradients.
In some embodiments, the processor is further configured to: perform the forward propagation computation using data of a first precision type, and perform the backpropagation computation using data of a second precision type, wherein the first precision type differs from the second precision type.
In some embodiments, the first precision type or the second precision type is a half-precision type.
In some embodiments, the determining unit is further configured to: perform backpropagation using data of the second precision type, and determine the first gradient corresponding to the pre-specified first network parameter of the model to be trained.
In a third aspect, an embodiment of the present disclosure provides an electronic device comprising: one or more processors; and a storage device storing one or more programs which, when executed by the one or more processors, cause the one or more processors to implement the method described in any implementation of the first aspect.
In a fourth aspect, an embodiment of the present disclosure provides a computer-readable medium storing a computer program which, when executed by a processor, implements the method described in any implementation of the first aspect.
With the model generating method and device provided by the embodiments of the present disclosure, the executing subject sends training sample subsets to at least two processors, and each processor determines, by forward propagation, the actual output of the model to be trained based on the model to be trained and the received training sample subset; the executing subject then performs backpropagation based on the acquired actual outputs, determines the first gradient corresponding to the pre-specified first network parameter, and finally updates the first network parameter according to the first gradient. The network parameters of the model to be trained can thus be updated to generate a new model. The technical effects at least include providing a new way of generating models.
Brief description of the drawings
Other features, objects, and advantages of the present disclosure will become more apparent by reading the following detailed description of non-limiting embodiments with reference to the accompanying drawings:
Fig. 1 is an exemplary system architecture diagram to which some embodiments of the present disclosure may be applied;
Fig. 2 is a flowchart of an embodiment of the model generating method according to the present disclosure;
Fig. 3 is a schematic diagram of an application scenario of the model generating method according to the present disclosure;
Fig. 4 is a flowchart of a further embodiment of the model generating method according to the present disclosure;
Fig. 5 is a structural schematic diagram of an embodiment of the model generating device according to the present disclosure;
Fig. 6 is a structural schematic diagram of a computer system adapted to implement an electronic device of an embodiment of the present disclosure.
Specific embodiment
The present disclosure is described in further detail below with reference to the accompanying drawings and embodiments. It should be understood that the specific embodiments described here are used only to explain the related invention, not to limit it. It should also be noted that, for ease of description, only the parts relevant to the related invention are shown in the drawings.
It should be noted that, unless they conflict, the embodiments of the present disclosure and the features in the embodiments may be combined with each other. The present disclosure is described in detail below with reference to the accompanying drawings and in conjunction with the embodiments.
Fig. 1 shows an exemplary system architecture 100 to which embodiments of the model generating method or model generating device of the present disclosure may be applied.
As shown in Fig. 1, the system architecture 100 may include terminal devices 101, 102, 103, a network 104, and a server 105. The network 104 serves as a medium providing communication links between the terminal devices 101, 102, 103 and the server 105. The network 104 may include various connection types, such as wired or wireless communication links, or fiber optic cables.
A user may use the terminal devices 101, 102, 103 to interact with the server 105 via the network 104, to receive or send messages and the like. Various communication client applications may be installed on the terminal devices 101, 102, 103, such as model generating applications, conversation applications, live-streaming applications, search applications, instant messaging tools, email clients, and social platform software.
The terminal devices 101, 102, 103 may be hardware or software. When the terminal devices 101, 102, 103 are hardware, they may be various electronic devices with communication functions, including but not limited to smartphones, tablet computers, e-book readers, MP3 players (Moving Picture Experts Group Audio Layer III), MP4 players (Moving Picture Experts Group Audio Layer IV), laptop portable computers, desktop computers, and the like. When the terminal devices 101, 102, 103 are software, they may be installed in the electronic devices listed above. They may be implemented as multiple pieces of software or software modules (for example, for providing distributed services), or as a single piece of software or software module. No specific limitation is given here.
Optionally, the terminal devices 101, 102, 103 may each include one or more processors.
The server 105 may be a server providing various services, such as a backend server supporting the model generating applications on the terminal devices 101, 102, 103. The server 105 may send some parameters for model generation (such as the network parameters of the model to be trained and training sample data) to the terminal devices 101, 102, 103. The terminal devices 101, 102, 103 may perform computations using their respective processors to obtain computation results. The terminal devices 101, 102, 103 may then send the computation results to the server 105, and the server 105 may update the network parameters of the model to be trained according to the received computation results.
It should be noted that the model generating method provided by the embodiments of the present disclosure is generally executed by the server 105, and correspondingly, the model generating device is generally disposed in the server 105. Optionally, the model generating method provided by the embodiments of the present disclosure may also be executed by the terminal devices 101, 102, 103.
It should be noted that the server may be hardware or software. When the server is hardware, it may be implemented as a distributed server cluster composed of multiple servers, or as a single server. When the server is software, it may be implemented as multiple pieces of software or software modules (for example, for providing distributed services), or as a single piece of software or software module. No specific limitation is given here.
It should be understood that the numbers of terminal devices, networks, and servers in Fig. 1 are merely illustrative. There may be any number of terminal devices, networks, and servers according to implementation needs.
Referring to Fig. 2, a flow 200 of an embodiment of the model generating method is illustrated. This embodiment is mainly exemplified by applying the method to an electronic device with a certain computing capability, which may be, for example, the server shown in Fig. 1. The model generating method includes the following steps:
Step 201, sending, to each processor of at least two processors, a training sample subset from a training sample set.
In this embodiment, the executing subject of the model generating method (such as the server shown in Fig. 1) may send, to each processor of the at least two processors, a training sample subset from the training sample set.
Here, the processor is configured to: determine, by forward propagation, the actual output of the model to be trained based on the model to be trained and the received training sample subset.
In this embodiment, the processor may be a central processing unit (Central Processing Unit, CPU) or a graphics processing unit (Graphics Processing Unit, GPU).
In this embodiment, the at least two processors may be located on one computer or on multiple computers.
In this embodiment, the training sample set may be divided into multiple training sample subsets (mini-batches), and the model to be trained may be replicated into multiple copies. A model replica of the model to be trained is placed on each of the multiple processors. That is, in this embodiment, the model to be trained may be trained using the at least two processors in a data-parallel manner, with a training sample subset placed on each of the multiple processors.
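As an illustrative, non-limiting sketch of this data-parallel arrangement, the following Python code divides a training sample set into one mini-batch subset per processor. The function and variable names (split_into_subsets, samples, labels) are assumptions introduced here for illustration and are not part of the disclosed embodiment.

    import numpy as np

    def split_into_subsets(samples, labels, num_processors):
        """Divide a training sample set into one mini-batch subset per processor."""
        sample_chunks = np.array_split(samples, num_processors)
        label_chunks = np.array_split(labels, num_processors)
        return list(zip(sample_chunks, label_chunks))

    # Hypothetical usage: two processors, eight samples with four features each.
    samples = np.random.rand(8, 4)
    labels = np.random.randint(0, 2, size=8)
    subsets = split_into_subsets(samples, labels, num_processors=2)
    # Each processor would receive one (samples, labels) subset and hold its
    # own replica of the model to be trained.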
As an example, the executing subject may be responsible for gradient averaging and for updating the network parameters of the model to be trained. Different processors train their respective model replicas of the model to be trained. Since each replica is trained on its own training sample subset, the replicas are trained independently of one another. The steps by which the different processors train the model to be trained are: each processor takes a training sample subset and computes the actual output by forward propagation.
In this embodiment, the model to be trained may be an untrained neural network or a neural network whose training has not been completed. Here, a neural network may refer to an artificial neural network. Common neural networks include, for example, deep neural networks (Deep Neural Network, DNN), convolutional neural networks (Convolutional Neural Network, CNN), and recurrent neural networks (Recurrent Neural Network, RNN).
Optionally, the network structure of the model to be trained may be preset. For example, it is necessary to set which layers the neural network includes, which neurons each layer includes, the connection order relationships between the layers, the weight and bias term corresponding to each neuron, the activation function of each layer, and so on. The network structure of the model to be trained may be represented by various network parameters, which may include but are not limited to weights, bias terms, and the like.
As an example, when the model to be trained is a deep convolutional neural network, since a deep convolutional neural network is a multi-layer neural network, it is necessary to determine which layers the deep convolutional neural network includes (for example, convolutional layers, pooling layers, fully connected layers, a classifier, and so on), the connection order relationships between the layers, and which network parameters each layer includes (for example, weights, bias terms, and convolution strides). Convolutional layers may be used to extract image features. For each convolutional layer, it may be determined how many convolution kernels it has, the size of each convolution kernel, the weight of each neuron in each convolution kernel, the bias term corresponding to each convolution kernel, and the stride between two adjacent convolutions.
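Purely as an illustration of such a pre-set structure, the configuration of a small deep convolutional neural network might be recorded as plain data before training, as in the hypothetical sketch below. The dictionary keys and layer list are assumptions, not a structure prescribed by this embodiment.

    # A hypothetical structure specification for a small deep CNN: layer types,
    # connection order, and per-layer network parameters (number of kernels,
    # kernel size, stride, bias terms, and so on).
    cnn_structure = [
        {"type": "conv", "kernels": 16, "kernel_size": (3, 3), "stride": 1, "bias": True},
        {"type": "pool", "kind": "max", "window": (2, 2)},
        {"type": "conv", "kernels": 32, "kernel_size": (3, 3), "stride": 1, "bias": True},
        {"type": "pool", "kind": "max", "window": (2, 2)},
        {"type": "fully_connected", "units": 128, "activation": "relu"},
        {"type": "classifier", "classes": 10},
    ]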
Step 202, for each processor of the at least two processors, acquiring the actual output of the model to be trained determined by that processor.
In this embodiment, for each processor of the at least two processors, the executing subject may acquire the actual output of the model to be trained determined by that processor.
Step 203, performing backpropagation based on the acquired actual outputs, and determining the first gradient corresponding to the pre-specified first network parameter of the model to be trained.
In this embodiment, the executing subject may perform backpropagation based on the acquired actual outputs, and determine the first gradient corresponding to the pre-specified first network parameter of the model to be trained.
In this embodiment, there are at least two actual outputs acquired by the executing subject. The at least two actual outputs may be processed (for example, averaged) to obtain a processed actual output. The executing subject may then determine the output-layer error according to the target output of the model to be trained and the processed actual output.
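A minimal sketch of this aggregation step follows, assuming the actual outputs are NumPy arrays and that the output-layer error is a mean squared error; the function names are illustrative assumptions only.

    import numpy as np

    def aggregate_outputs(actual_outputs):
        """Average the actual outputs returned by the processors."""
        return np.mean(np.stack(actual_outputs), axis=0)

    def output_layer_error(target_output, processed_output):
        """One possible output-layer error: mean squared error."""
        return np.mean((target_output - processed_output) ** 2)

    # Hypothetical usage with actual outputs from two processors.
    outputs = [np.array([0.2, 0.7]), np.array([0.4, 0.5])]
    processed = aggregate_outputs(outputs)                  # -> [0.3, 0.6]
    error = output_layer_error(np.array([0.0, 1.0]), processed)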
In this embodiment, the executing subject performs error backpropagation using the output-layer error value, thereby adjusting the network parameters of the model to be trained; this process may be referred to as backpropagation. As an example, the backpropagation algorithm (Back Propagation Algorithm, BP algorithm) and a gradient descent method (such as stochastic gradient descent) may be used to adjust the network parameters of the model to be trained.
In this embodiment, during error backpropagation, a gradient descent method may be used to determine the magnitude and direction of each weight update step. Here, the gradient may be used to compute the direction and magnitude of weight changes during neural network training, so that the network weights are updated in the correct direction and by an appropriate amount.
In this embodiment, the executing subject performs the backpropagation and determines the first gradient for updating the first network parameter. Here, the first network parameter may be pre-specified.
Optionally, the layers of the model to be trained may be partitioned, distinguishing first layers to be updated from second layers to be updated. The network parameters in the first layers to be updated may be designated as the first network parameters, and the network parameters in the second layers to be updated may be designated as the second network parameters.
Optionally, the model to be trained may include a batch normalization (Batch Normalization, BN) layer. The first network parameter then includes a network parameter in the BN layer; that is, the network parameters in the BN layer are pre-specified as the first network parameters.
As an example, the computing principle of a BN layer is as follows: receive input data; compute the mean of the input data; compute the variance of the input data; standardize the input data according to the mean and the variance to obtain standardized values (for example, in the manner of a normal distribution); and transform the standardized values using the pre-trained network parameters α and β to obtain the output data.
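This computing principle can be sketched as follows: a simplified batch normalization forward pass, where eps is the usual small stabilizer added to the variance, alpha and beta are the pre-trained scale and shift parameters, and the function name is an assumption for illustration.

    import numpy as np

    def batch_norm_forward(x, alpha, beta, eps=1e-5):
        """Simplified BN layer: standardize the input, then scale and shift."""
        mean = x.mean(axis=0)                      # mean of the input data
        var = x.var(axis=0)                        # variance of the input data
        x_hat = (x - mean) / np.sqrt(var + eps)    # standardized values
        return alpha * x_hat + beta                # transform with trained alpha and beta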
It should be noted that, in a trained model, the BN layer usually performs batch normalization on the data corresponding to the global training sample set. In the prior art, if the model to be trained is trained using multiple processors, the gradients corresponding to the network parameters of the BN layer are usually computed separately on each individual processor. The inventor realized that if, instead of having a single processor compute the gradients of the BN layer's network parameters by backpropagation, the executing subject performs backpropagation over the combined actual outputs of all processors to obtain those gradients, the BN layer's network parameters can be updated jointly over the training sample set. This better matches the conditions under which a BN layer is applied (batch normalization over the data corresponding to the global training sample set), so that the updated values of the BN layer's network parameters are more accurate, improving the accuracy of the trained model. According to the inventor's practice, obtaining the gradients of the BN layer's network parameters by backpropagation at the executing subject can improve the accuracy of the trained model by 1.5%, which is acknowledged as a very large improvement in the field of artificial intelligence.
Step 204, updating the first network parameter according to the first gradient.
In this embodiment, the executing subject may update the first network parameter according to the first gradient.
As an example, a network parameter may be updated in the following manner: the product of the learning rate and the gradient is determined as the network parameter change; then, the difference between the current value of the network parameter to be updated and the network parameter change is determined as the new value of the network parameter to be updated.
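Expressed as code, this update rule is the familiar gradient descent step; the sketch below is illustrative and the names are assumptions.

    def update_parameter(current_value, gradient, learning_rate):
        """New value = current value - learning rate * gradient."""
        change = learning_rate * gradient    # the network parameter change
        return current_value - change        # the updated network parameter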
In this embodiment, by updating the first network parameter, the model can be updated; that is, a new model is generated.
It should be noted that, in the prior art, during the training of a neural-network-based model, the gradients corresponding to all network parameters are usually determined by the processors. In the present disclosure, the network parameters of the model to be trained may include the pre-specified first network parameter, whose corresponding gradient is determined by the executing subject. The technical effects thereby at least include:
First, a new way of generating models is provided.
Second, the model training speed is improved. The computing resources of the executing subject can also be utilized to perform part of the backpropagation computation, improving the speed of determining the overall gradient and thus the model training speed.
Third, some parameters, such as the network parameters of the BN layer, can be designated for global update. Computing a global gradient for the first network parameter, rather than computing gradients separately on each processor, can improve the accuracy of the updated first network parameter and thereby improve the performance of the generated model.
With continued reference to Fig. 3, Fig. 3 is a schematic diagram of an application scenario of the model generating method according to the embodiment of Fig. 2. In the application scenario of Fig. 3:
First, the server 301 may send a training sample subset 304 from the training sample set to the processor 302, and send a training sample subset 305 from the training sample set to the processor 303. Here, the processor 302 is configured to: determine, by forward propagation, the actual output 306 of the model to be trained based on the model to be trained and the received training sample subset 304; and the processor 303 is configured to: determine, by forward propagation, the actual output 307 of the model to be trained based on the model to be trained and the received training sample subset 305.
Then, the server 301 may acquire the actual output 306 and the actual output 307.
Next, the server 301 may perform backpropagation based on the acquired actual outputs 306 and 307, and determine the first gradient corresponding to the pre-specified first network parameter (such as a network parameter in the BN layer) of the model to be trained.
Finally, the server 301 may update the first network parameter according to the first gradient.
In the method provided by the above embodiment of the present disclosure, the executing subject sends training sample subsets to at least two processors, and each processor determines, by forward propagation, the actual output of the model to be trained based on the model to be trained and the received training sample subset; the executing subject then performs backpropagation based on the acquired actual outputs, determines the first gradient corresponding to the pre-specified first network parameter, and finally updates the first network parameter according to the first gradient. The network parameters of the model to be trained can thus be updated to generate a new model. The technical effects at least include providing a new way of generating models.
With further reference to Fig. 4, a flow 400 of another embodiment of the model generating method is illustrated. The flow 400 of the model generating method includes the following steps:
Step 401, sending, to each processor of at least two processors, a training sample subset from a training sample set.
In this embodiment, the executing subject of the model generating method (such as the server shown in Fig. 1) may send, to each processor of the at least two processors, a training sample subset from the training sample set.
Here, the processor is configured to: determine, by forward propagation, the actual output of the model to be trained based on the model to be trained and the received training sample subset; and perform error backpropagation based on the actual output determined by that processor, determining the second gradient corresponding to the pre-specified second network parameter of the model to be trained.
In some embodiments, the processor is further configured to: perform the forward propagation computation using data of a first precision type, and perform the backpropagation computation using data of a second precision type.
Here, floating-point data may be used for the computations in the model training process. Floating-point data can be divided into the following types according to precision: half precision, single precision, and double precision. Generally, 16-bit floating-point data belongs to the half-precision type, 32-bit floating-point data belongs to the single-precision type, and 64-bit floating-point data belongs to the double-precision type.
Here, computing using data of the first precision type means that the data participating in the computation are of the first precision type; that is, the training sample data and the network parameters of the model to be trained that participate in the computation are of the first precision type. If the acquired training sample data and network parameters of the model to be trained are not of the first precision type, they may be converted to the first precision type before the forward propagation computation is performed.
Here, computing using data of the second precision type means that the data participating in the computation are of the second precision type; that is, the actual outputs and the network parameters of the model to be trained that participate in the computation are of the second precision type. If the acquired actual outputs and network parameters of the model to be trained are not of the second precision type, they may be converted to the second precision type before the backpropagation computation is performed.
It should be noted that the model training process is divided into two parts: one part is computed using higher-precision data, and the other part is computed using lower-precision data. This can both improve the speed of model training and guarantee its accuracy.
In some embodiments, the first precision type or the second precision type is a half-precision type. That is, either of the following two cases applies:
First, the first precision type is a half-precision type, and the second precision type is any one of the following: the single-precision type and the double-precision type.
Second, the second precision type is a half-precision type, and the first precision type is any one of the following: the single-precision type and the double-precision type.
It should be noted that, in the prior art, it is generally believed that half-precision data are suitable for transmission (fast transmission speed) but unsuitable for computation (insufficient precision). In some implementations of the present disclosure, the inventor realized that model training can be divided into two parts, with one part of the computation (forward propagation or backpropagation) performed using half-precision data and the other part performed using higher-precision data. This overcomes the technical prejudice (that half-precision data are unsuitable for computation): one part of the computation can be performed using half-precision data, improving the computation speed, while the other part can be performed using higher-precision data, guaranteeing the accuracy of model training.
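A minimal NumPy sketch of this two-precision arrangement follows, assuming half precision (float16) for the forward pass and single precision (float32) for the backward pass of a single linear layer; it illustrates the idea only and is not the disclosed implementation.

    import numpy as np

    def forward_half_precision(x, w):
        """Forward propagation computed in half precision (the first precision type)."""
        x16, w16 = x.astype(np.float16), w.astype(np.float16)
        return x16 @ w16                     # float16 matrix product

    def backward_single_precision(x, grad_out):
        """Backpropagation computed in single precision (the second precision type)."""
        x32 = x.astype(np.float32)
        grad32 = grad_out.astype(np.float32)
        return x32.T @ grad32                # float32 gradient w.r.t. the weights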
Step 402, for each processor of the at least two processors, acquiring the actual output of the model to be trained determined by that processor.
In this embodiment, for each processor of the at least two processors, the executing subject may acquire the actual output of the model to be trained determined by that processor.
Step 403, performing backpropagation based on the acquired actual outputs, and determining the first gradient corresponding to the pre-specified first network parameter of the model to be trained.
In this embodiment, the executing subject may perform backpropagation based on the acquired actual outputs, and determine the first gradient corresponding to the pre-specified first network parameter of the model to be trained.
Step 404, updating the first network parameter according to the first gradient.
In this embodiment, the executing subject may update the first network parameter according to the first gradient.
In some embodiments, step 404 may include: performing backpropagation using data of the second precision type, and determining the first gradient corresponding to the pre-specified first network parameter of the model to be trained.
For the implementation details and technical effects of step 401, step 402, step 403, and step 404, please refer to the descriptions of step 201, step 202, step 203, and step 204; details are not repeated here.
Step 405, for each processor of the at least two processors, acquiring the second gradient determined by that processor.
In this embodiment, for each processor of the at least two processors, the executing subject may acquire the second gradient determined by that processor.
Optionally, the processor handling a training sample subset performs backpropagation to determine a second gradient, and the executing subject then combines the second gradients to determine the second gradient used for updating the second network parameter.
As an example, the processor may compute the output-layer loss according to the actual output and the target output, and compute the gradient corresponding to the designated second network parameter according to the output-layer loss. All processors output their second gradients to the executing subject. The executing subject may average the received second gradients of the second network parameter, and update the network parameters of the model to be trained according to the averaged second gradient obtained by the averaging operation.
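As an illustrative sketch of this averaging step (the names are assumptions introduced for illustration):

    import numpy as np

    def average_second_gradients(gradients):
        """Average the second gradients reported by all processors."""
        return np.mean(np.stack(gradients), axis=0)

    # Hypothetical usage: second gradients from two processors.
    grads = [np.array([0.10, -0.20]), np.array([0.30, 0.00])]
    avg_grad = average_second_gradients(grads)   # -> [0.20, -0.10]
    # The executing subject then applies avg_grad to the second network
    # parameter, e.g. param = param - learning_rate * avg_grad.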
Step 406, updating the second network parameter according to the acquired second gradients.
In this embodiment, the executing subject may update the second network parameter according to the acquired second gradients.
In this embodiment, by updating the first network parameter and the second network parameter, the model can be updated; that is, a new model is generated.
In some embodiments, the above step 406 may include: performing backpropagation using data of the second precision type, and determining the first gradient corresponding to the pre-specified first network parameter of the model to be trained. Optionally, the precision indicated by the second precision type may be higher than the precision indicated by the first precision type. The backpropagation process can thus use higher precision than the forward propagation, improving the accuracy of updating the network parameters based on backpropagation.
As can be seen from Fig. 4, compared with the embodiment corresponding to Fig. 2, the flow 400 of the model generating method in this embodiment highlights dividing the network parameters of the model to be trained into two kinds (the first network parameter and the second network parameter), with the gradient corresponding to the first network parameter determined by the executing subject and the gradient corresponding to the second network parameter determined by the processors. The technical effects of the scheme described in this embodiment thus at least include:
First, a new way of generating models is provided.
Second, a more comprehensive model generating method is provided.
Third, the model training speed is improved. While the processors perform backpropagation computation, the computing resources of the executing subject can also be utilized to perform part of the backpropagation computation. The executing subject and the processors can thus perform backpropagation computation simultaneously, improving the speed of determining the overall gradient and thus the model training speed.
With further reference to Fig. 5, as an implementation of the methods shown in the above figures, the present disclosure provides an embodiment of a model generating device. This device embodiment corresponds to the method embodiment shown in Fig. 2, and the device may be applied in various electronic devices.
As shown in Fig. 5, the model generating device 500 of this embodiment includes: a sending unit 501, a first acquiring unit 502, a determining unit 503, and a first updating unit 504. The sending unit is configured to send, to each processor of at least two processors, a training sample subset from a training sample set, wherein the processor is configured to determine, by forward propagation, the actual output of the model to be trained based on the model to be trained and the received training sample subset; the first acquiring unit is configured to acquire, for each processor of the at least two processors, the actual output of the model to be trained determined by that processor; the determining unit is configured to perform backpropagation based on the acquired actual outputs and determine the first gradient corresponding to the pre-specified first network parameter of the model to be trained; and the first updating unit is configured to update the first network parameter according to the first gradient.
In this embodiment, for the specific processing of the sending unit 501, the first acquiring unit 502, the determining unit 503, and the first updating unit 504 of the model generating device 500, and for the technical effects they bring about, reference may be made to the descriptions of step 201, step 202, step 203, and step 204 in the embodiment corresponding to Fig. 2; details are not repeated here.
In some embodiments, the first network parameter includes a network parameter in a batch normalization layer.
In some embodiments, each processor of the at least two processors is further configured to: perform error backpropagation based on the actual output determined by that processor, and determine the second gradient corresponding to the pre-specified second network parameter of the model to be trained.
In some embodiments, the device further includes: a second acquiring unit (not shown) configured to acquire, for each processor of the at least two processors, the second gradient determined by that processor; and a second updating unit (not shown) configured to update the second network parameter according to the acquired second gradients.
In some embodiments, the processor is further configured to: perform the forward propagation computation using data of a first precision type, and perform the backpropagation computation using data of a second precision type, wherein the first precision type differs from the second precision type.
In some embodiments, the first precision type or the second precision type is a half-precision type.
In some embodiments, the determining unit is further configured to: perform backpropagation using data of the second precision type, and determine the first gradient corresponding to the pre-specified first network parameter of the model to be trained.
It should be noted that, for the implementation details and technical effects of the units in the model generating device provided by the embodiments of the present disclosure, reference may be made to the descriptions of other embodiments in the present disclosure; details are not repeated here.
Referring now to Fig. 6, a structural schematic diagram of an electronic device (such as the terminal or the server of Fig. 1) 600 adapted to implement embodiments of the present disclosure is shown. The electronic device shown in Fig. 6 is merely an example and should not impose any limitation on the functions and scope of use of the embodiments of the present disclosure.
As shown in Fig. 6, the electronic device 600 may include a processing unit (such as a central processing unit or a graphics processor) 601, which may perform various appropriate actions and processing according to a program stored in a read-only memory (ROM) 602 or a program loaded from a storage device 608 into a random access memory (RAM) 603. The RAM 603 also stores various programs and data required for the operation of the electronic device 600. The processing unit 601, the ROM 602, and the RAM 603 are connected to one another through a bus 604. An input/output (I/O) interface 605 is also connected to the bus 604.
Generally, the following devices may be connected to the I/O interface 605: an input device 606 including, for example, a touch screen, a touch pad, a keyboard, a mouse, a camera, a microphone, an accelerometer, and a gyroscope; an output device 607 including, for example, a liquid crystal display (LCD), a speaker, and a vibrator; a storage device 608 including, for example, a magnetic tape and a hard disk; and a communication device 609. The communication device 609 may allow the electronic device 600 to communicate wirelessly or by wire with other devices to exchange data. Although Fig. 6 shows an electronic device 600 with various devices, it should be understood that it is not required to implement or provide all the devices shown; more or fewer devices may alternatively be implemented or provided.
In particular, according to embodiments of the present disclosure, the processes described above with reference to the flowcharts may be implemented as computer software programs. For example, an embodiment of the present disclosure includes a computer program product comprising a computer program carried on a computer-readable medium, the computer program containing program code for executing the method shown in the flowchart. In such an embodiment, the computer program may be downloaded and installed from a network via the communication device 609, installed from the storage device 608, or installed from the ROM 602. When the computer program is executed by the processing unit 601, the above-described functions defined in the method of the embodiments of the present disclosure are executed.
It should be noted that the computer-readable medium of the present disclosure may be a computer-readable signal medium or a computer-readable storage medium, or any combination of the two. A computer-readable storage medium may be, for example, but is not limited to, an electric, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any combination of the above. More specific examples of the computer-readable storage medium may include, but are not limited to: an electrical connection with one or more wires, a portable computer disk, a hard disk, a random access memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the above. In the present disclosure, a computer-readable storage medium may be any tangible medium that contains or stores a program, where the program may be used by or in connection with an instruction execution system, apparatus, or device. In the present disclosure, a computer-readable signal medium may include a data signal propagated in a baseband or as part of a carrier wave, in which computer-readable program code is carried. Such a propagated data signal may take various forms, including but not limited to an electromagnetic signal, an optical signal, or any suitable combination of the above. A computer-readable signal medium may also be any computer-readable medium other than a computer-readable storage medium; the computer-readable signal medium can send, propagate, or transmit a program for use by or in connection with an instruction execution system, apparatus, or device. The program code contained on the computer-readable medium may be transmitted by any suitable medium, including but not limited to: an electric wire, an optical cable, RF (radio frequency), or any suitable combination of the above.
The above computer-readable medium may be included in the above electronic device, or it may exist alone without being assembled into the electronic device.
The above computer-readable medium carries one or more programs which, when executed by the electronic device, cause the electronic device to: send, to each processor of at least two processors, a training sample subset from a training sample set, wherein the processor is configured to determine, by forward propagation, the actual output of the model to be trained based on the model to be trained and the received training sample subset; for each processor of the at least two processors, acquire the actual output of the model to be trained determined by that processor; perform backpropagation based on the acquired actual outputs, and determine the first gradient corresponding to the pre-specified first network parameter of the model to be trained; and update the first network parameter according to the first gradient.
Computer program code for executing the operations of the present disclosure may be written in one or more programming languages or a combination thereof. The programming languages include object-oriented programming languages such as Java, Smalltalk, and C++, and also conventional procedural programming languages such as the "C" language or similar programming languages. The program code may execute entirely on the user's computer, partly on the user's computer, as a stand-alone software package, partly on the user's computer and partly on a remote computer, or entirely on a remote computer or server. In cases involving a remote computer, the remote computer may be connected to the user's computer through any kind of network, including a local area network (LAN) or a wide area network (WAN), or may be connected to an external computer (for example, through the Internet using an Internet service provider).
The flowcharts and block diagrams in the accompanying drawings illustrate the possible architectures, functions, and operations of systems, methods, and computer program products according to various embodiments of the present disclosure. In this regard, each block in the flowcharts or block diagrams may represent a module, a program segment, or a part of code, which contains one or more executable instructions for implementing the specified logical functions. It should also be noted that, in some alternative implementations, the functions marked in the blocks may occur in an order different from that marked in the drawings. For example, two blocks shown in succession may actually be executed substantially in parallel, or sometimes in the reverse order, depending on the functions involved. It should also be noted that each block in the block diagrams and/or flowcharts, and combinations of blocks in the block diagrams and/or flowcharts, may be implemented by a dedicated hardware-based system that performs the specified functions or operations, or by a combination of dedicated hardware and computer instructions.
The units involved in the embodiments of the present disclosure may be implemented by software or by hardware. Under certain circumstances, the name of a unit does not constitute a limitation on the unit itself; for example, the sending unit may also be described as "a unit that sends a training sample subset from a training sample set".
The above description is merely a preferred embodiment of the present disclosure and an explanation of the applied technical principles. Those skilled in the art should understand that the scope of the invention involved in the present disclosure is not limited to technical solutions formed by the specific combination of the above technical features, and also covers other technical solutions formed by any combination of the above technical features or their equivalent features without departing from the concept disclosed above, for example, technical solutions formed by replacing the above features with technical features having similar functions disclosed in (but not limited to) the present disclosure.

Claims (16)

1. A model generating method, comprising:
sending, to each processor of at least two processors, a training sample subset from a training sample set, wherein the processor is configured to: determine, by forward propagation, an actual output of a model to be trained based on the model to be trained and the received training sample subset;
for each processor of the at least two processors, acquiring the actual output of the model to be trained determined by that processor;
performing backpropagation based on the acquired actual outputs, and determining a first gradient corresponding to a pre-specified first network parameter of the model to be trained; and
updating the first network parameter according to the first gradient.
2. The method according to claim 1, wherein the first network parameter comprises a network parameter in a batch normalization layer.
3. The method according to claim 1, wherein each processor of the at least two processors is further configured to:
perform error backpropagation based on the actual output determined by that processor, and determine a second gradient corresponding to a pre-specified second network parameter of the model to be trained.
4. The method according to claim 3, wherein the method further comprises:
for each processor of the at least two processors, acquiring the second gradient determined by that processor; and
updating the second network parameter according to the acquired second gradients.
5. The method according to any one of claims 1-4, wherein the processor is further configured to:
perform the forward propagation computation using data of a first precision type; and
perform the backpropagation computation using data of a second precision type, wherein the first precision type differs from the second precision type.
6. The method according to claim 5, wherein the first precision type or the second precision type is a half-precision type.
7. The method according to claim 5, wherein the updating the first network parameter according to the first gradient comprises:
performing backpropagation using data of the second precision type, and determining the first gradient corresponding to the pre-specified first network parameter of the model to be trained.
8. A model generating device, comprising:
a sending unit configured to send, to each processor of at least two processors, a training sample subset from a training sample set, wherein the processor is configured to: determine, by forward propagation, an actual output of a model to be trained based on the model to be trained and the received training sample subset;
a first acquiring unit configured to acquire, for each processor of the at least two processors, the actual output of the model to be trained determined by that processor;
a determining unit configured to perform backpropagation based on the acquired actual outputs, and determine a first gradient corresponding to a pre-specified first network parameter of the model to be trained; and
a first updating unit configured to update the first network parameter according to the first gradient.
9. The device according to claim 8, wherein the first network parameter comprises a network parameter in a batch normalization layer.
10. The device according to claim 8, wherein each processor of the at least two processors is further configured to:
perform error backpropagation based on the actual output determined by that processor, and determine a second gradient corresponding to a pre-specified second network parameter of the model to be trained.
11. The device according to claim 10, wherein the device further comprises:
a second acquiring unit configured to acquire, for each processor of the at least two processors, the second gradient determined by that processor; and
a second updating unit configured to update the second network parameter according to the acquired second gradients.
12. The device according to any one of claims 8-11, wherein the processor is further configured to:
perform the forward propagation computation using data of a first precision type; and
perform the backpropagation computation using data of a second precision type, wherein the first precision type differs from the second precision type.
13. The device according to claim 12, wherein the first precision type or the second precision type is a half-precision type.
14. The device according to claim 12, wherein the determining unit is further configured to:
perform backpropagation using data of the second precision type, and determine the first gradient corresponding to the pre-specified first network parameter of the model to be trained.
15. An electronic device, comprising:
one or more processors; and
a storage device storing one or more programs,
wherein the one or more programs, when executed by the one or more processors, cause the one or more processors to implement the method according to any one of claims 1-7.
16. A computer-readable medium storing a computer program, wherein the program, when executed by a processor, implements the method according to any one of claims 1-7.
CN201811536493.5A 2018-12-14 2018-12-14 Model generating method and device Pending CN109670579A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201811536493.5A CN109670579A (en) 2018-12-14 2018-12-14 Model generating method and device

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201811536493.5A CN109670579A (en) 2018-12-14 2018-12-14 Model generating method and device

Publications (1)

Publication Number Publication Date
CN109670579A true CN109670579A (en) 2019-04-23

Family

ID=66144357

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201811536493.5A Pending CN109670579A (en) 2018-12-14 2018-12-14 Model generating method and device

Country Status (1)

Country Link
CN (1) CN109670579A (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112988366A (en) * 2019-12-12 2021-06-18 中科寒武纪科技股份有限公司 Parameter server, master client, and weight parameter processing method and system

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103150596A (en) * 2013-02-22 2013-06-12 百度在线网络技术(北京)有限公司 Training system of back propagation neural network DNN (Deep Neural Network)
CN103996069A (en) * 2013-02-20 2014-08-20 百度在线网络技术(北京)有限公司 Multiple GPUs-based BPNN training method and apparatus
CN104899641A (en) * 2015-05-25 2015-09-09 杭州朗和科技有限公司 Deep neural network learning method, processor and deep neural network learning system
CN106650931A (en) * 2016-12-09 2017-05-10 曙光信息产业(北京)有限公司 Hybrid precision deep learning algorithm

Similar Documents

Publication Publication Date Title
CN107330715B (en) Method and device for selecting picture advertisement material
CN108197652B (en) Method and apparatus for generating information
CN109410253B Method, apparatus, electronic device, and computer-readable medium for generating information
CN111476871B (en) Method and device for generating video
JP7287397B2 (en) Information processing method, information processing apparatus, and information processing program
CN109981787B (en) Method and device for displaying information
CN109961141A Method and apparatus for generating a quantized neural network
CN109829164B (en) Method and device for generating text
CN109165736A (en) Information processing method and device applied to convolutional neural networks
CN109829432A (en) Method and apparatus for generating information
CN109800730A Method and apparatus for generating an avatar generation model
CN108182472A Method and apparatus for generating information
CN110288625A (en) Method and apparatus for handling image
CN111783810A (en) Method and apparatus for determining attribute information of user
CN110009101A Method and apparatus for generating a quantized neural network
CN109977905B (en) Method and apparatus for processing fundus images
CN108595211A Method and apparatus for outputting data
CN111008213A (en) Method and apparatus for generating language conversion model
CN109598344A (en) Model generating method and device
CN108509179B Face detection method and model generating device
CN109670579A (en) Model generating method and device
CN108257081A Method and apparatus for generating a picture
CN110097004B (en) Facial expression recognition method and device
CN116129501A (en) Face pose estimation method and device
CN110782017B (en) Method and device for adaptively adjusting learning rate

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination