CN104978601B - Neural network model training system and method - Google Patents

Neural network model training system and method

Info

Publication number
CN104978601B
CN104978601B
Authority
CN
China
Prior art keywords
computing device
node
network model
layer
neural network
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201510368328.3A
Other languages
Chinese (zh)
Other versions
CN104978601A (en)
Inventor
Guo Zhimao (郭志懋)
Zou Yongqiang (邹永强)
Jin Xing (金涬)
Li Yi (李毅)
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Shenzhen Tencent Computer Systems Co Ltd
Original Assignee
Shenzhen Tencent Computer Systems Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Shenzhen Tencent Computer Systems Co Ltd
Priority to CN201510368328.3A
Publication of CN104978601A
Application granted
Publication of CN104978601B
Legal status: Active
Anticipated expiration


Abstract

The present invention relates to a neural network model training system and method. The system includes a coordination device and a predetermined number of computing devices. The coordination device is used to synchronize the computing devices layer by layer through the neural network model. Under the layer-by-layer synchronization of the coordination device, each computing device processes, according to the training samples input to the neural network model and in the learning order of the neural network model, the nodes of each layer of the neural network model that are assigned to the computing device, and sends the data generated by processing its nodes to a model storage device or to the computing device holding the connected node of the next layer, until training on the input training samples ends. The neural network model training system and method provided by the present invention solve the problem that the limitations of a single physical device restrict the scale of the neural network model.

Description

Neural network model training system and method
Technical field
The present invention relates to the field of machine learning, and in particular to a neural network model training system and method.
Background art
A neural network model is a machine learning model that simulates the structure of the brain. In the field of machine learning, neural networks are often used to model complex tasks. The scale of a neural network, including its depth and width, is adjustable, depending on the application field and the problem scale. Owing to their strong expressive power, neural networks are widely used in application fields such as speech recognition, image classification, face recognition, natural language processing, and advertisement placement.
The structure of a neural network model includes multiple layers: the first layer is the input layer, the topmost layer is the output layer, and in between there are zero or more hidden layers; each layer includes one or more nodes. The scale of the input layer is determined by the number of input variables, while the scale of the output layer depends on the number of classes. A hidden layer includes multiple neurons, and adjusting the number of neurons adjusts the complexity and expressive power of the neural network model. In general, the wider and deeper a neural network is, the stronger its modeling ability, but the higher the cost of training such a model.
The training process of a neural network model is the process of iteratively adjusting the parameter values in the network according to the inputs and outputs of training samples until convergence; it is also called the learning process of the neural network. In practical applications of neural network models, the amount of data keeps growing, and in order to better mine patterns and information from massive data, large-scale neural network models need to be built. In the advertising field, for example, the input dimensionality may be on the order of hundreds of millions, so the number of nodes of the neural network model increases sharply. As a result, the memory of a single physical server cannot hold a large-scale neural network model, which limits the scale of neural network models and affects their performance.
Summary of the invention
In view of this, it is necessary to provide a neural network model training system and method directed at the technical problem that the limitations of current physical devices restrict the scale of neural network models.
A neural network model training system includes a coordination device and a predetermined number of computing devices:
the coordination device is configured to synchronize the computing devices layer by layer through the neural network model;
each computing device is configured to, under the layer-by-layer synchronization of the coordination device, process, according to the training samples input to the neural network model and in the learning order of the neural network model, the nodes of each layer of the neural network model that are assigned to the computing device, and to send the data generated by processing its nodes to a model storage device or to the computing device holding the connected node of the next layer, until training on the input training samples ends.
A neural network model training method includes:
a predetermined number of computing devices, under the layer-by-layer synchronization of a coordination device, process, according to the training samples input to the neural network model and in the learning order of the neural network model, the nodes of each layer of the neural network model that are assigned to each computing device, and send the data generated by processing their nodes to a model storage device or to the computing device holding the connected node of the next layer, until training on the input training samples ends.
In the above neural network model training system and method, the nodes of each layer of the neural network model are split and distributed over multiple computing devices, so each computing device handles only part of the computation tasks of the neural network model, and the number of computing devices can be configured according to the scale of the neural network model. The approach is therefore applicable to training all kinds of large-scale neural network models and solves the problem that the limitations of a single physical device restrict the scale of the neural network model. Moreover, the coordination device synchronizes the computing devices layer by layer through the neural network model, which ensures that the computing devices process their respective nodes synchronously by layer, prevents possible computation errors, and improves the reliability of neural network model training.
Brief description of the drawings
Fig. 1 is a schematic structural diagram of a neural network model in one embodiment;
Fig. 2 is a diagram of the environment of a neural network model training system in one embodiment;
Fig. 3 is a diagram of the environment of a neural network model training system in another embodiment;
Fig. 4 is a diagram of the environment of a neural network model training system in a further embodiment;
Fig. 5 is a schematic diagram of dividing a neural network model in one embodiment;
Fig. 6 is a schematic diagram of dividing a neural network model onto computing devices in one embodiment;
Fig. 7 is a diagram of the environment of a neural network model training system in one embodiment;
Fig. 8 is a schematic flowchart of a neural network model training method in one embodiment;
Fig. 9 is a timing diagram of a neural network model training method in a specific application scenario.
Detailed description of the embodiments
In order to make the objectives, technical solutions, and advantages of the present invention clearer, the present invention is described in further detail below with reference to the drawings and embodiments. It should be understood that the specific embodiments described here are merely illustrative of the present invention and are not intended to limit it.
Before the specific embodiments of the present invention are described, the training process of a neural network model is first explained. Referring to Fig. 1, Fig. 1 shows a fairly simple neural network model, which includes an input layer, two hidden layers, and an output layer. The input layer has three nodes: node A0, node A1, and node A2. The first hidden layer includes two nodes: node B0 and node B1. The second hidden layer includes two nodes: node C0 and node C1. The output layer includes one node, D0.
The line segments connecting nodes of different layers in a neural network model are called edges, and each edge has a corresponding edge weight. An edge weight represents the contribution of the node closer to the input layer, of the two nodes the edge connects, to the node farther from the input layer. Specifically in Fig. 1, W0,0 denotes the weight of the edge from node A0 of the input layer to node B0 of the first hidden layer, U0,0 denotes the weight of the edge from node B0 of the first hidden layer to node C0 of the second hidden layer, and V0,0 denotes the weight of the edge from node C0 of the second hidden layer to node D0 of the output layer.
The process of training the neural network model in Fig. 1 is as follows:
Step 1: randomly select a training sample from the training sample set. A training sample contains all input features, and an input feature may take the value 0 or 1, or a floating-point number. Each training sample also has an expected output.
Step 2: perform forward computation for the first hidden layer. The forward computation needs the weights of all incoming edges of the hidden layer and the activation values of the lower-layer nodes. An incoming edge is an edge pointing from a lower-layer node to a node of the current layer; correspondingly, an outgoing edge is an edge pointing from a node of the current layer to a higher-layer node. Here, taking a certain layer of the neural network model as the reference, the layer closer to the input layer is called the lower layer, and the layer farther from the input layer is called the higher layer.
For example, the intermediate value MIDB0 = ACTA0*W0,0 + ACTA1*W1,0 + ACTA2*W2,0 is computed, where ACTA0 denotes the activation value of node A0, ACTA1 the activation value of node A1, and ACTA2 the activation value of node A2; W1,0 denotes the weight of the edge from node A1 to node B0, and W2,0 the weight of the edge from node A2 to node B0. An activation function is then applied to the intermediate value MIDB0 as a nonlinear transformation, yielding the activation value ACTB0 of node B0.
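As an illustration only, the forward computation for node B0 described in step 2 can be sketched in Python as follows (a minimal sketch: the sigmoid activation and all numeric values are assumptions, since the patent does not name a specific activation function):

```python
import math

def sigmoid(x):
    # The patent does not fix a specific activation function; sigmoid is
    # assumed here purely to illustrate the nonlinear transformation.
    return 1.0 / (1.0 + math.exp(-x))

# Activation values of the input-layer nodes A0, A1, A2 (example values).
act = {"A0": 1.0, "A1": 0.0, "A2": 1.0}

# Weights of the incoming edges of node B0: W0,0, W1,0, W2,0.
w = {("A0", "B0"): 0.5, ("A1", "B0"): -0.3, ("A2", "B0"): 0.8}

# MID_B0 = ACT_A0*W0,0 + ACT_A1*W1,0 + ACT_A2*W2,0
mid_b0 = sum(act[src] * w[(src, "B0")] for src in ("A0", "A1", "A2"))

# ACT_B0 is the activation function applied to MID_B0.
act["B0"] = sigmoid(mid_b0)
print(act["B0"])  # activation value of node B0
```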
Step 3: similarly, perform forward computation for the second hidden layer.
Step 4: the output layer includes only one node, D0; compute the activation value and the residual value of node D0. The residual value represents the difference between the observed value and the predicted value; the residual value ED0 of node D0 can be computed from the expected output of the training sample and the computed activation value of node D0.
Step 5: perform backward computation for the second hidden layer: recompute the residual value of each node of the hidden layer according to the residual values of the output layer and the weights of the outgoing edges of the second hidden layer, and adjust the weights of the outgoing edges accordingly.
Specifically, when the residual value of node C0 is computed, the output layer has only one node D0, so the residual value ED0 of node D0 is multiplied by the weight V0,0 of the outgoing edge of node C0 and the product is substituted into the residual computation function, yielding the residual value EC0 of node C0. The residual value EC1 of node C1 is computed similarly.
When an edge weight is adjusted, the updated edge weight is the current edge weight minus an intermediate parameter, where the intermediate parameter is the preset step size multiplied by the residual value of the higher-layer node of the edge and by the activation value of the lower-layer node of the edge. Specifically, when the edge weight V0,0 is adjusted by the preset step size, the intermediate parameter subtracted from V0,0 is the preset step size multiplied by the residual value of node D0 and by the activation value ACTC0 of node C0, i.e. V0,0 = V0,0 − L*ED0*ACTC0, where L denotes the preset step size. The edge weight V1,0 is adjusted similarly.
Step 6: similarly, perform backward computation for the first hidden layer: recompute the residual value of each node of the first hidden layer according to the residual values of the nodes of the second hidden layer and the weights of the outgoing edges of the first hidden layer, and adjust the weights of the outgoing edges accordingly.
Specifically, when the residual value of node B0 is computed, the residual value of node C0 is multiplied by the weight U0,0 of node B0's outgoing edge to node C0, the residual value of node C1 is multiplied by the weight U0,1 of node B0's outgoing edge to node C1, and the two products are summed, i.e. EC0*U0,0 + EC1*U0,1; the sum is substituted into the residual computation function to obtain the residual value EB0 of node B0. The residual value EB1 of node B1 can be computed similarly.
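Continuing the illustration under the same assumptions, steps 5 and 6 can be sketched as follows; the residual computation function is assumed to be the standard backpropagation rule for a sigmoid unit (act*(1−act) times the weighted sum of higher-layer residuals), which the patent leaves unspecified, and all numeric values are illustrative:

```python
L_STEP = 0.1  # preset step size L (illustrative value)

def residual(act_value, weighted_sum):
    # Assumed residual computation function: the standard backprop
    # rule for a sigmoid unit, act * (1 - act) * incoming error.
    return act_value * (1.0 - act_value) * weighted_sum

# Example activation values from the forward pass and the output residual.
act = {"B0": 0.6, "B1": 0.4, "C0": 0.7, "C1": 0.3}
e_d0 = 0.25  # residual value ED0 of output node D0

# Outgoing-edge weights of the second hidden layer (V) and the first (U).
v = {("C0", "D0"): 0.2, ("C1", "D0"): -0.5}
u = {("B0", "C0"): 0.4, ("B0", "C1"): 0.1,
     ("B1", "C0"): -0.2, ("B1", "C1"): 0.3}

# Step 5: residuals of C0 and C1, then adjust the V weights.
e_c0 = residual(act["C0"], e_d0 * v[("C0", "D0")])
e_c1 = residual(act["C1"], e_d0 * v[("C1", "D0")])
v[("C0", "D0")] -= L_STEP * e_d0 * act["C0"]  # V0,0 = V0,0 - L*ED0*ACT_C0
v[("C1", "D0")] -= L_STEP * e_d0 * act["C1"]  # V1,0 adjusted similarly

# Step 6: residual of B0 sums over its outgoing edges to C0 and C1.
e_b0 = residual(act["B0"], e_c0 * u[("B0", "C0")] + e_c1 * u[("B0", "C1")])
e_b1 = residual(act["B1"], e_c0 * u[("B1", "C0")] + e_c1 * u[("B1", "C1")])
print(e_b0, e_b1)
```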
Step 7: perform backward computation for the input layer, adjusting the weights of the edges pointing from the input layer to the first hidden layer.
Step 8: return to step 1 to train the next training sample.
After all training samples have been processed according to steps 1 to 8 above, the training of the neural network model is complete. The neural network model in Fig. 1 is a very simple one. In practical applications, the width of a neural network model can be very large, for example on the order of hundreds of millions in the advertising field; to achieve better classification results the depth of the neural network model is generally also fairly large, and the number of training samples is very large as well. The computation pressure is therefore very high, and a single physical device cannot meet the computation demand.
As shown in Fig. 2, in one embodiment a neural network model training system is provided, including a coordination device 202 and a predetermined number of computing devices 204, where the predetermined number is greater than or equal to 2. The coordination device 202 is connected to each computing device 204, and the predetermined number of computing devices 204 are connected pairwise. The coordination device 202 and the computing devices 204 are named after the roles they play in the neural network model training system: the computing devices 204 mainly execute the computation tasks of training the neural network model, while the coordination device 202 mainly coordinates the operation of the computing devices 204. The coordination device 202 and the computing devices 204 may be electronic devices of different types or of the same type; electronic devices here include desktop computers and servers. In one embodiment, as shown in Fig. 3, the coordination device may be one of the predetermined number of computing devices.
In one embodiment, a neural network model training method is provided; this embodiment is illustrated by applying the method to the neural network model training system of Fig. 2 above. The method specifically includes: a predetermined number of computing devices, under the layer-by-layer synchronization of a coordination device, process, according to the training samples input to the neural network model and in the learning order of the neural network model, the nodes of each layer of the neural network model that are assigned to each computing device, and send the data generated by processing their nodes to a model storage device or to the computing device holding the connected node of the next layer, until training on the input training samples ends.
Specifically, with reference to steps 1 to 8 above, the neural network model is trained according to its layered structure. The neural network model to be trained includes at least an input layer and an output layer: each node of the input layer corresponds to one feature dimension of a training sample, and the nodes of the output layer correspond to the output result for the input training sample. The neural network model may also include hidden layers, and there may be multiple hidden layers. When the neural network model includes no hidden layer, it is in fact a logistic regression model; in that case the input layer has many nodes while the output layer has few, so the nodes of the output layer can be stored on the device that serves as both the coordination device and a computing device.
The coordination device synchronizes the computing devices layer by layer through the neural network; specifically, after all computing devices have finished processing one layer of nodes of the neural network model, it can control the computing devices to process the next layer of nodes of the neural network model. The learning order of the neural network model refers to the order in which the layers are processed when the model is trained: in the forward computation stage it is the layer-by-layer order from the input layer to the output layer, and in the backward computation stage it is the order from the output layer to the input layer.
The nodes of each layer of the neural network model are divided over the computing devices and stored in the memory of the corresponding computing devices, and the node division scheme is known to the coordination device and to every computing device. A computing device processing nodes here refers to the various necessary operations performed on the nodes of each layer of the neural network model during training, such as computing activation values, computing residual values, and adjusting edge weights.
Training samples may be input to the neural network model one by one, i.e., one training sample is input to the neural network model and fully processed before the next training sample is input; training samples may also be input to the neural network model in batches, i.e., a batch of training samples is input to the neural network model and fully processed before the next batch of training samples is input. The difference is that, when training samples are input in batches, the amount of computation and the amount of exchanged data become larger each time a layer of nodes is processed.
A computing device may send the data generated by processing its nodes to the computing device holding the connected nodes of the next layer. Specifically, this device's nodes are the nodes on the current computing device, and the nodes of the next layer that are connected to this device's nodes need the data generated by processing this device's nodes when they are processed. Correspondingly, the other computing devices also send the data generated by processing their nodes to this computing device, so the computing devices exchange data with one another. In this embodiment, exchanging data ensures that the data required for processing the next layer of nodes can be obtained in time, which guarantees training efficiency.
A computing device may also send the data generated by processing its nodes to a model storage device. In this embodiment, referring to Fig. 4, the neural network model training system further includes a model storage device 206. The model storage device is specifically used to store the data generated by processing each node and to provide those data to the computing devices that need them, so the other computing devices can request the data generated by processing a node from the model storage device when they need to use them. In this embodiment, storing the data generated by processing nodes in the model storage device facilitates centralized management of the data.
In one embodiment, a computing device may aggregate the data generated by processing its nodes according to the way the nodes of the next layer use the data during processing, and send the aggregated data to the model storage device or to the computing device holding the connected nodes of the next layer. Specifically, a computing device may hold multiple nodes; if the data generated by processing these nodes all need to be sent to one computing device corresponding to one or more nodes of the next layer, and the next layer uses the sum of these data when processing the corresponding node, the data can be summed before being sent to the corresponding computing device or the model storage device, which saves network bandwidth, as sketched below.
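A minimal sketch of this aggregation step, assuming the next-layer node sums the activation-weight products of the connected lower-layer nodes; the function name, message format, and placement map are hypothetical:

```python
from collections import defaultdict

def aggregate_outgoing(local_nodes, edge_weights, placement):
    """Sum per-destination contributions before sending.

    local_nodes:  {node_id: activation_value} held on this device
    edge_weights: {(src, dst): weight} for the outgoing edges
    placement:    {dst_node: device_id} for the next layer
    Returns {(device_id, dst_node): partial_sum} -- one message per
    destination node instead of one message per edge.
    """
    partial = defaultdict(float)
    for (src, dst), weight in edge_weights.items():
        if src in local_nodes:
            partial[(placement[dst], dst)] += local_nodes[src] * weight
    return dict(partial)

# Example: nodes B0 and B1 live on this device; C0 is on device 2.
msgs = aggregate_outgoing(
    {"B0": 0.6, "B1": 0.4},
    {("B0", "C0"): 0.4, ("B1", "C0"): -0.2},
    {"C0": 2},
)
print(msgs)  # {(2, 'C0'): 0.16} -- a single aggregated value
```

One aggregated message per destination node replaces one message per edge, which is where the bandwidth saving comes from.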
In the above neural network model training method, the nodes of each layer of the neural network model are split and distributed over multiple computing devices, so each computing device handles only part of the computation tasks of the neural network model, and the number of computing devices can be configured according to the scale of the neural network model. The method is therefore applicable to training all kinds of large-scale neural network models and solves the problem that the limitations of a single physical device restrict the scale of the neural network model. Moreover, the coordination device synchronizes the computing devices layer by layer through the neural network model, which ensures that the computing devices process their respective nodes synchronously by layer, prevents possible computation errors, and improves the reliability of neural network model training.
In one embodiment, any one of the following three modes may be used to divide the nodes of each layer of the neural network model:
Mode (1): after the nodes of each layer of the neural network model are numbered sequentially, the node numbers are taken modulo the predetermined number, and nodes with the same modulo result are divided onto the same computing device.
Specifically, if one layer of the neural network model includes N nodes, they are numbered 0, 1, 2, ..., N−1 starting from 0. Assuming the predetermined number is 4, taking the node numbers modulo 4 yields 0, 1, 2, 3, 0, 1, 2, 3, .... Nodes with the same modulo result are divided onto the same computing device: for example, the nodes whose modulo result is 0 are divided onto the computing device numbered 0, the nodes whose modulo result is 1 onto the computing device numbered 1, and, similarly, the nodes whose modulo results are 2 and 3 onto the computing devices numbered 2 and 3, respectively.
Referring to Fig. 5, if the predetermined number is 2, the nodes of each layer of the neural network model shown in Fig. 5 are numbered sequentially and taken modulo 2, so that, according to the modulo results, the shaded nodes of the neural network model shown in Fig. 5 are divided onto one computing device and the unshaded nodes onto another computing device.
Mode (2): the nodes of each layer of the neural network model are divided into a predetermined number of groups, each group including a specified quantity of contiguous adjacent nodes, and each group of nodes is divided onto one computing device.
Specifically, if one layer of the neural network model includes N nodes and the predetermined number is 4, the N nodes can be divided into 4 groups according to the order of the nodes within the layer, and the quantities of nodes per group may be equal or different.
The coordination device may be one of the predetermined number of computing devices. Referring to Fig. 6, the neural network model training system includes 4 computing devices, numbered 0 to 3, where computing device 0 also serves as the coordination device. The nodes of each layer of the neural network model are divided evenly over the computing devices.
Mode (3): the nodes of each layer of the neural network model are divided randomly onto the computing devices.
Specifically, the nodes of each layer of the neural network model can be divided randomly, and as evenly as possible, onto the computing devices, as sketched below.
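The three division modes might be sketched as follows (a minimal illustration; the even dealing of shuffled node numbers in mode (3) is just one possible way to keep the random division roughly balanced):

```python
import random

def partition_modulo(n_nodes, n_devices):
    # Mode (1): node number modulo the predetermined number.
    return {i: i % n_devices for i in range(n_nodes)}

def partition_contiguous(n_nodes, n_devices):
    # Mode (2): contiguous groups of (roughly) equal size.
    group = -(-n_nodes // n_devices)  # ceiling division
    return {i: i // group for i in range(n_nodes)}

def partition_random(n_nodes, n_devices, seed=0):
    # Mode (3): random division, kept as even as possible by
    # shuffling the node numbers and dealing them out in turn.
    ids = list(range(n_nodes))
    random.Random(seed).shuffle(ids)
    return {node: pos % n_devices for pos, node in enumerate(ids)}

print(partition_modulo(8, 4))      # {0: 0, 1: 1, 2: 2, 3: 3, 4: 0, ...}
print(partition_contiguous(8, 4))  # {0: 0, 1: 0, 2: 1, 3: 1, ...}
print(partition_random(8, 4))
```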
Of the above three modes of dividing the nodes of each layer of the neural network model, a suitable one can be selected as needed, so as to balance the load of the computing devices as far as possible.
In one embodiment, the neural network model training method further includes: each computing device carries a sequence number when interacting with the coordination device and increments the sequence number by a predetermined amount each time an interaction with the coordination device is completed; the coordination device detects repeated interactions or failed interactions according to the sequence numbers carried in the interactions of the computing devices, and performs a predetermined operation according to the detection result.
Specifically, the predetermined increment may be 1. A computing device increments its sequence number each time it completes an interaction with the coordination device, so the sequence number uniquely identifies one interaction between the computing device and the coordination device. If the coordination device receives multiple interaction requests carrying the same sequence number from the same computing device, repeated interaction requests exist, and the repeated interaction requests can be discarded to achieve deduplication. The predetermined operations include deduplication and retransmission.
In addition, if the sequence number carried in the interaction request of one computing device received by the coordination device differs from the sequence numbers carried in the interaction requests of the other computing devices, that computing device has experienced an interaction failure, and the coordination device can control that computing device to initiate the interaction again. One possible form of this check is sketched below.
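One possible interpretation of these sequence-number checks, folding in the preset threshold described further below, is sketched here; the function name, return labels, and threshold value are all hypothetical:

```python
THRESHOLD = 3  # preset threshold on the sequence-number difference

def classify_request(latest_seq, device, seq):
    # latest_seq maps each computing device to the sequence number
    # carried in its last interaction request; (device, seq) is new.
    if latest_seq.get(device) == seq:
        return "discard_duplicate"        # repeated interaction request
    latest_seq[device] = seq
    others = [s for d, s in latest_seq.items() if d != device]
    if not others or all(s == seq for s in others):
        return "ok"                       # in step with the other devices
    if max(abs(seq - s) for s in others) <= THRESHOLD:
        return "retry_interaction"        # recoverable: interact again
    return "reset_training"               # beyond recovery: retrain

latest = {}
print(classify_request(latest, 1, 5))  # "ok" (first request seen)
print(classify_request(latest, 1, 5))  # "discard_duplicate"
```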
In this embodiment, each computing device carries a sequence number when interacting with the coordination device, which prevents errors during interaction and guarantees that the training of the neural network model proceeds in order.
In one embodiment, the neural network model training method further includes: after the coordination device detects a predetermined event, the computing devices, synchronized layer by layer through the neural network model, train the neural network model again according to the input training samples. The predetermined event includes: the coordination device is restarted; and/or the communication between the coordination device and a computing device is disconnected for longer than a preset duration; and/or the difference between the sequence number, incremented by completed interactions, carried in the interaction request that the coordination device receives from one computing device and the sequence numbers carried in the interaction requests of the other computing devices exceeds a preset threshold.
In one embodiment, the neural network model training method further includes: after the coordination device is restarted, the computing devices, synchronized layer by layer through the neural network model, train the neural network model again according to the input training samples.
Specifically, the coordination device and the computing devices are strongly synchronized. After the coordination device is restarted because of a fault, for example after a crash, a hardware failure, or an unexpected power-off, the training of the neural network model cannot continue; the training work then needs to be reset, and the computing devices, synchronized layer by layer through the neural network model, train the neural network model again according to the input training samples.
In one embodiment, the neural network model training method further includes: if the communication between the coordination device and a computing device is disconnected for longer than a preset duration, the computing devices, synchronized layer by layer through the neural network model, train the neural network model again according to the input training samples.
Specifically, referring to Fig. 7, the coordination device runs a timeout detection thread; if it detects that a certain computing device has been unreachable for longer than the preset duration, the training needs to be reset, and the computing devices, synchronized layer by layer through the neural network model, train the neural network model again according to the input training samples. A sketch of such a thread is given below.
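A hypothetical sketch of such a timeout detection thread (all names and the preset duration are illustrative; the reset action is reduced to a callback):

```python
import threading
import time

PRESET_DURATION = 30.0  # preset duration in seconds (illustrative value)

def timeout_detector(last_contact, on_reset, stop):
    # last_contact: device id -> timestamp of the last successful
    # communication; on_reset() resets the training work.
    while not stop.is_set():
        now = time.time()
        if any(now - t > PRESET_DURATION for t in last_contact.values()):
            on_reset()  # some device has been unreachable too long
        time.sleep(1.0)

last_contact = {1: time.time(), 2: time.time(), 3: time.time()}
stop = threading.Event()
thread = threading.Thread(
    target=timeout_detector,
    args=(last_contact, lambda: print("reset training"), stop),
    daemon=True)
thread.start()
time.sleep(2.0)  # let the detector run briefly, then stop it
stop.set()
```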
In one embodiment, the coordination device records the training progress of the neural network model and the state of each computing device; the state of a computing device includes its processing progress, its communication status, the timestamp of the last communication, and so on. The coordination device establishes non-persistent transport connections with all computing devices for data exchange.
In one embodiment, the neural network model training method further includes: if the coordination device detects that the difference between the sequence number, incremented by completed interactions, carried when one computing device interacts and the sequence numbers carried when the other computing devices interact exceeds a preset threshold, the computing devices, synchronized layer by layer through the neural network model, train the neural network model again according to the input training samples.
Specifically, if the coordination device detects that the sequence number, incremented by completed interactions, carried when a computing device interacts differs from the sequence numbers carried when the other computing devices interact, the interactions between that computing device and the coordination device have failed. If the difference between the sequence numbers does not exceed the preset threshold, the number of failed interactions is within the recoverable range, and the corresponding computing device can be controlled to initiate the interaction again. If the difference between the sequence numbers exceeds the preset threshold, the number of failed interactions is beyond the recoverable range, and the training needs to be reset: the computing devices, synchronized layer by layer through the neural network model, train the neural network model again according to the input training samples.
As shown in Fig. 8, in one embodiment a neural network model training method is provided, which details the steps by which a predetermined number of computing devices, under the layer-by-layer synchronization of a coordination device, process, according to the training samples input to the neural network model and in the learning order of the neural network model, the nodes of each layer of the neural network model that are assigned to each computing device, and send the data generated by processing their nodes to a model storage device or to the computing device holding the connected node of the next layer, until training on the input training samples ends. The method specifically includes the following steps:
Step 802: each of the predetermined number of computing devices receives a first notice, sent by the coordination device, to process a designated layer of the neural network model according to the input training samples.
Specifically, the first notice is a notice sent by the coordination device to control the computing devices to start processing the designated layer of the neural network model; it is a means of synchronization by the layers of the neural network model.
Step 804: each computing device, according to the first notice, obtains the data needed to process the nodes of the designated layer that are assigned to the current computing device, and processes the nodes of the current computing device.
Specifically, after receiving the first notice, each computing device processes the nodes of the current computing device according to the first notice. Processing the nodes of the current computing device means, according to the data needed for processing, computing the activation values and residual values of the current computing device's nodes and adjusting edge weights. The data generated by processing nodes include activation values, residual values, edge weights, and so on. The data needed for the current device's processing include the activation values, residual values, and edge weights of the nodes of the previous layer of the designated layer of the neural network model.
Step 806: each computing device determines the next layer according to the learning order of the neural network model, and sends the data generated by processing its nodes to the computing devices corresponding to the nodes of the next layer that are connected to the nodes of the current computing device.
Specifically, the next layer here means the layer following the designated layer according to the learning order of the neural network model; it is a relative relation, whereas the higher layers and lower layers of the neural network model are absolute relations. For example, in the forward computation stage, the next layer of the designated layer is the layer above the designated layer; in the backward computation stage, the next layer of the designated layer is the layer below the designated layer.
When the computing devices corresponding to the nodes of the next layer that are connected to the nodes of the current computing device process those next-layer nodes, they use the data generated by processing the nodes of the designated layer; that is, those data are the data needed for processing the nodes of the next layer.
Step 808: each computing device sends to the coordination device a second notice that the nodes of the designated layer assigned to the current computing device have been processed; after receiving the second notices of all the computing devices, the coordination device sends to each computing device a first notice to process the next layer of the neural network model, until training on the input training samples ends.
Specifically, after finishing the processing of the nodes of the designated layer that are assigned to the current computing device, a computing device can send the second notice to the coordination device to inform the coordination device of its own training state. The computing device may send the second notice after the nodes of the current computing device have been processed and the data generated by processing those nodes have been sent to the computing devices corresponding to the connected nodes of the next layer.
To further ensure the reliability of neural network model training, a computing device may send the second notice after the nodes of the current computing device have been processed, the data generated by processing those nodes have been sent to the computing devices corresponding to the nodes of the next layer connected to the nodes of the current computing device, and the data generated by the processing of nodes on the other computing devices have been received. In this way, the second notice informs the coordination device that the data exchange is complete, ensuring that training of the next layer is entered only after the data exchange has succeeded. The second notice is likewise a means of synchronization by the layers of the neural network model.
After the coordination device has received the second notices of all the computing devices, the processing of the currently designated layer has ended and the processing of the next layer needs to start; the coordination device then sends to each computing device a first notice to process the next layer of the neural network model, so that each computing device takes the next layer as the designated layer and performs steps 802 to 808 above. The notice loop on the coordination device is sketched below.
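The first-notice/second-notice loop on the coordination device might be sketched as follows (the message names and the send/recv primitives are placeholders for whatever transport the system actually uses):

```python
def coordinate_sample(devices, layers, send, recv):
    """Drive one training sample through the layers, in learning order.

    devices: iterable of computing-device ids
    layers:  layer ids in learning order (the forward pass followed
             by the backward pass)
    send(device, msg) and recv() -> (device, msg): messaging primitives
    """
    for layer in layers:
        for dev in devices:
            send(dev, ("first_notice", layer))   # start this layer
        pending = set(devices)
        while pending:                           # wait for every second
            dev, msg = recv()                    # notice before moving on
            if msg == ("second_notice", layer):
                pending.discard(dev)
```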
If, in the backward computation stage, the weights of all the edges in the neural network model have been adjusted after the first hidden layer has been processed, training on the input training sample ends. The coordination device can control the computing devices to continue by inputting the next training sample to the neural network model for training, and the training work ends after training has finished on all the training samples of the training sample set.
In the above neural network model training method, the coordination device synchronizes the computing devices by means of the first notice and the second notice; under the synchronization of the first notice and the second notice, the computing devices train the neural network model layer by layer according to the input training samples, so the training is carried out in order, efficiently and accurately.
In one embodiment, in the forward computation stage of the neural network model, step 804 includes: each computing device, according to the first notice and for the nodes of the designated layer assigned to the current computing device, obtains the sum of the products of the activation values of all connected nodes of the previous layer and the corresponding edge weights, and computes the activation values of the nodes of the current computing device according to the obtained sums of products.
Specifically, the nodes of the designated layer that are assigned to the current computing device are called the nodes of the current computing device. Each computing device obtains the sum of the products of the activation values of all previous-layer nodes connected to a node of the current computing device and the corresponding edge weights; the sum means that the activation value of each connected previous-layer node is multiplied by the corresponding edge weight and the products are accumulated. Computing the activation value of a node of the current computing device requires integrating the activation values of all connected previous-layer nodes and the corresponding edge weights.
The nodes of the designated layer are divided over different computing devices, and the corresponding edge weights are also divided over different computing devices. An edge connects two nodes of different layers, and the weight of the edge can be stored on the computing device where either of the two nodes resides. Edges are directional; for convenience of lookup and computation, the weights of the edges can be uniformly stored on the computing devices where the nodes at the same end of each edge reside.
For example, as shown in Fig. 1, the edge pointing from node A0 to node B0 has the corresponding weight W0,0; the weight W0,0 can be stored on the computing device where the start node A0 of the edge resides, or on the computing device where the end node B0 of the edge resides. Every edge of the neural network model points from a lower layer to a higher layer, so the weights of the edges can be uniformly stored on the computing devices where the lower-layer nodes reside, or on the computing devices where the higher-layer nodes reside.
In one embodiment, in the forward computation stage of the neural network model, the step in step 806 of sending the data generated by processing nodes to the computing devices corresponding to the nodes of the next layer connected to the nodes of the current computing device includes: for the computing device corresponding to a node of the next layer that is connected to a single node of the current computing device, sending the product of the activation value of the node of the current computing device and the corresponding edge weight; for the computing device corresponding to a node of the next layer that is connected to multiple nodes of the current computing device, sending the sum of the products of the activation values of those nodes and the corresponding edge weights.
In this embodiment, it is taken into account that, in the forward computation stage of the neural network model, when a computing device computes the activation value of a node of the next layer it uses the products of the activation values of the corresponding nodes of the currently designated layer and the edge weights, and it needs to first compute the sum of these products. If a node of the next layer is connected to only a single node of this designated layer, only the product of the activation value of that single node and the corresponding edge weight is sent to the computing device where the node connected to that single node resides. If a node of the next layer is connected to multiple nodes of this designated layer, the products of the activation values of those nodes and the corresponding edge weights may be sent directly to the computing device where the node connected to those nodes resides, or the products may be summed before being sent to that computing device, to save network bandwidth. The receiving side is sketched below.
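On the receiving side, a minimal sketch of step 804 in the forward stage under the same sigmoid assumption as the earlier sketches: partial sums received from other devices are accumulated per node before the activation function is applied (all names and values are illustrative):

```python
import math

def forward_local_nodes(my_nodes, received):
    """Compute activation values for this device's designated-layer nodes.

    my_nodes: node ids of the designated layer held by this device
    received: list of (dst_node, partial_sum) messages, where each
              partial_sum is an activation*weight product, or a sum of
              such products already aggregated by the sending device
    """
    totals = {n: 0.0 for n in my_nodes}
    for dst, partial in received:
        if dst in totals:
            totals[dst] += partial
    # Assumed activation function (sigmoid), as in the earlier sketches.
    return {n: 1.0 / (1.0 + math.exp(-mid)) for n, mid in totals.items()}

acts = forward_local_nodes(["C0"], [("C0", 0.16), ("C0", -0.05)])
print(acts)  # activation value of node C0 from the accumulated sum
```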
In one embodiment, in the backward computation stage of the neural network model, step 804 includes: each computing device, according to the first notice and for the nodes of the designated layer assigned to the current computing device, obtains the residual values of all connected nodes of the previous layer and the corresponding edge weights; computes the residual values of the nodes of the current computing device according to the obtained residual values and corresponding edge weights; and adjusts the corresponding edge weights according to the obtained residual values and the activation values of the nodes of the current computing device.
Specifically, the computation of the residual value of a node of the current computing device depends on the residual values of all previous-layer nodes connected to the node and the corresponding edge weights: the residual values of those previous-layer nodes are multiplied by the respective corresponding edge weights, the products are added, and the sum is substituted into the residual computation function to obtain the residual value of the node of the current computing device.
When an edge weight is adjusted, the updated edge weight is the corresponding edge weight minus an intermediate parameter, where the intermediate parameter is the preset step size multiplied by the obtained residual value and by the activation value of the node of the current computing device. Reference can be made specifically to the computation of residual values and the adjustment of edge weights in step 5 above.
In one embodiment, in the backward computation stage of the neural network model, the step in step 806 of sending the data generated by processing nodes to the computing devices corresponding to the nodes of the next layer connected to the nodes of the current computing device includes: sending the residual values of the current computing device and the adjusted edge weights to the computing devices corresponding to the nodes of the next layer connected to the nodes of the current computing device.
In this embodiment, the residual values of the current nodes and the adjusted edge weights are sent to the computing devices where the corresponding nodes of the next layer reside, so that the other computing devices can perform backward computation on the nodes of the next layer.
In one embodiment, the control flow and the data flow between the coordination device and the computing devices use separate transmission channels. This avoids mutual interference between the control flow and the data flow and improves the reliability of neural network model training.
Referring to Fig. 9, the principle of the above neural network model training method is described below with a specific application scenario; in this embodiment the method is applied to the advertising field. The neural network model training system used in this application scenario includes 4 computing devices, numbered 0 to 3, and computing device 0 also serves as the coordination device. In this application scenario, the fully trained neural network model is used to make predictions for input advertisements. The output layer of the neural network model has one node, and the number of nodes of the input layer equals the dimensionality of the features of the input advertisements. An input advertisement has feature values of multiple dimensions, including click-through-rate features, price features, purchasing-user attribute features, transaction volume features, and so on, which are not enumerated one by one here. To train this neural network model, an advertisement sample set needs to be prepared. The advertisement sample set includes a large number of advertisement samples; each advertisement sample has a feature value for each dimension along with a calibrated output value, and the neural network model is trained with the advertisement samples of the advertisement sample set. The training specifically includes the following steps:
Step S901 (1-3): the coordination device, i.e. computing device 0, sends a read-sample notice to the other computing devices 1-3.
Step S902 (1-4): each computing device obtains the feature values of the input advertisement sample for the input-layer nodes that are assigned to the current computing device.
Step S903 (1-3): the coordination device, i.e. computing device 0, sends to the other computing devices 1-3 a first notice to process the first hidden layer.
Step S904 (1-4): each computing device processes the nodes belonging to the first hidden layer on the current computing device.
Step S905: the computing devices exchange the data generated by processing their respective nodes.
Step S906 (1-3): each computing device sends a second notice to the coordination device, i.e. computing device 0.
Step S907 (1-3): after the coordination device, i.e. computing device 0, has received the second notices of computing devices 1-3, the first hidden layer is replaced by the second hidden layer and steps S903 (1-3) to S907 (1-3) above are performed again; this is repeated until training on the input advertisement sample ends, after which the process returns to step S901 (1-3) to process the next advertisement sample, until the advertisement samples of the advertisement sample set have all been used for training.
Referring to Fig. 2 and Fig. 4, in one embodiment a neural network model training system is provided to implement the functions of the neural network model training method of any of the above embodiments. The system includes a coordination device 202 and a predetermined number of computing devices 204.
The coordination device 202 is configured to synchronize the computing devices 204 layer by layer through the neural network model. Specifically, after all computing devices 204 have finished processing one layer of nodes of the neural network model, the coordination device 202 can control the computing devices 204 to process the next layer of nodes of the neural network model.
Each computing device 204 is configured to, under the layer-by-layer synchronization of the coordination device 202, process, according to the training samples input to the neural network model and in the learning order of the neural network model, the nodes of each layer of the neural network model that are assigned to the computing device 204, and to send the data generated by processing its nodes to the model storage device 206 or to the computing device 204 holding the connected node of the next layer, until training on the input training samples ends.
The learning order of the neural network model refers to the order in which the layers are processed when the model is trained: in the forward computation stage it is the layer-by-layer order from the input layer to the output layer, and in the backward computation stage it is the order from the output layer to the input layer.
The nodes of each layer of the neural network model are divided over the computing devices 204 and stored in the memory of the corresponding computing devices 204, and the node division scheme is known to the coordination device 202 and to every computing device 204. A computing device 204 processing nodes here refers to the various necessary operations performed on the nodes of each layer of the neural network model during training, such as computing activation values, computing residual values, and adjusting edge weights.
Training samples may be input to the neural network model one by one, i.e., one training sample is input to the neural network model and fully processed before the next training sample is input; training samples may also be input to the neural network model in batches, i.e., a batch of training samples is input to the neural network model and fully processed before the next batch of training samples is input. The difference is that, when training samples are input in batches, the amount of computation and the amount of exchanged data become larger each time a layer of nodes is processed.
A computing device 204 may send the data generated by processing its nodes to the computing device 204 holding the connected nodes of the next layer. Specifically, this device's nodes are the nodes on the current computing device 204, and the nodes of the next layer that are connected to this device's nodes need the data generated by processing this device's nodes when they are processed. Correspondingly, the other computing devices 204 also send the data generated by processing their nodes to this computing device 204, so the computing devices 204 exchange data with one another. In this embodiment, exchanging data ensures that the data required for processing the next layer of nodes can be obtained in time, which guarantees training efficiency.
A computing device 204 may also send the data generated by processing its nodes to the model storage device 206. In this embodiment, referring to Fig. 4, the neural network model training system further includes a model storage device 206, which is specifically used to store the data generated by processing each node and to provide those data to the computing devices 204 that need them, so the other computing devices 204 can request the data generated by processing a node from the model storage device 206 when they need to use them. In this embodiment, storing the data generated by processing nodes in the model storage device 206 facilitates centralized management of the data.
In one embodiment, a computing device 204 may aggregate the data generated by processing its nodes according to the way the nodes of the next layer use the data during processing, and send the aggregated data to the model storage device 206 or to the computing device 204 holding the connected nodes of the next layer. Specifically, a computing device 204 may hold multiple nodes; if the data generated by processing these nodes all need to be sent to one computing device 204 corresponding to one or more nodes of the next layer, and the next layer uses the sum of these data when processing the corresponding node, the data can be summed before being sent to the corresponding computing device 204 or the model storage device 206, which saves network bandwidth.
In the above neural network model training system, the nodes of each layer of the neural network model are split and distributed over multiple computing devices 204, so each computing device 204 handles only part of the computation tasks of the neural network model, and the number of computing devices 204 can be configured according to the scale of the neural network model. The system is therefore applicable to training all kinds of large-scale neural network models and solves the problem that the limitations of a single physical device restrict the scale of the neural network model. Moreover, the coordination device 202 synchronizes the computing devices 204 layer by layer through the neural network model, which ensures that the computing devices 204 process their respective nodes synchronously by layer, prevents possible computation errors, and improves the reliability of neural network model training.
In one embodiment, the coordination device 202 is configured to send to each computing device 204 a first notice to process a designated layer of the neural network model.
The computing device 204 is configured to receive the first notice and, according to it, obtain the data needed to process the nodes of the designated layer that are assigned to the current computing device 204, and process the nodes of the current computing device 204; to determine the next layer according to the learning order of the neural network model and send the data generated by processing its nodes to the computing devices 204 corresponding to the nodes of the next layer that are connected to the nodes of the current computing device 204; and to send to the coordination device 202 a second notice that the nodes of the designated layer assigned to the current computing device 204 have been processed.
The coordination device 202 is further configured to, after receiving the second notices of all the computing devices 204, send to each computing device 204 a first notice to process the next layer of the neural network model, until training on the input training samples ends.
In one embodiment, in the forward computation stage of the neural network model, the computing device 204 is further configured to, according to the first notice and for the nodes of the designated layer assigned to the current computing device 204, obtain the sum of the products of the activation values of all connected nodes of the previous layer and the corresponding edge weights, and compute the activation values of the nodes of the current computing device 204 according to the obtained sums of products.
The computing device 204 is further configured to send, to the computing device 204 corresponding to a node of the next layer that is connected to a single node of the current computing device 204, the product of the activation value of the node of the current computing device 204 and the corresponding edge weight.
The computing device 204 is further configured to send, to the computing device 204 corresponding to a node of the next layer that is connected to multiple nodes of the current computing device 204, the sum of the products of the activation values of those nodes and the corresponding edge weights.
In one embodiment, in the backward computation stage of the neural network model, the computing device 204 is further configured to, according to the first notice and for the nodes of the designated layer assigned to the current computing device 204, obtain the residual values of all connected nodes of the previous layer and the corresponding edge weights; compute the residual values of the nodes of the current computing device 204 according to the obtained residual values and corresponding edge weights; and adjust the corresponding edge weights according to the obtained residual values and the activation values of the nodes of the current computing device 204.
In one embodiment, the nodes of each layer of the neural network model are assigned to the corresponding computing devices 204 in any of the following ways (a code sketch of all three follows this list):
After the nodes of each layer of the neural network model are numbered sequentially, each node number is taken modulo the predetermined number of devices, and nodes with the same modulus result are assigned to the same computing device 204.
The nodes of each layer of the neural network model are divided into a predetermined number of groups, each group containing a specified quantity of consecutive adjacent nodes, and each group is assigned to a different computing device 204.
The nodes of each layer of the neural network model are assigned to the computing devices 204 at random.
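The three assignment schemes can be sketched as follows; the function names and the node-to-device mapping they return are illustrative assumptions.

```python
import random

def partition_modulo(layer_nodes, num_devices):
    # Sequential node number modulo the device count; equal results
    # share a device.
    return {n: i % num_devices for i, n in enumerate(layer_nodes)}

def partition_groups(layer_nodes, num_devices):
    # Groups of consecutive adjacent nodes, one group per device.
    group_size = (len(layer_nodes) + num_devices - 1) // num_devices
    return {n: i // group_size for i, n in enumerate(layer_nodes)}

def partition_random(layer_nodes, num_devices):
    # Each node goes to a randomly chosen device.
    return {n: random.randrange(num_devices) for n in layer_nodes}
```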
In one embodiment, the computing device 204 is further configured to carry a sequence number when interacting with the coordination device 202, and to increment the sequence number by a predetermined amount each time an interaction with the coordination device 202 completes.
The coordination device 202 is further configured to detect repeated or failed interactions from the sequence numbers carried in the interactions of the computing devices 204, and to perform a predetermined operation according to the detection result.
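A minimal sketch of how the coordination device might use these sequence numbers to detect repeated or failed interactions; the class name and the returned labels are assumptions, and the patent leaves the predetermined operation itself open.

```python
# Illustrative duplicate/failure detection from per-device sequence
# numbers; `step` is the predetermined increment amount.

class InteractionTracker:
    def __init__(self, step=1):
        self.step = step
        self.last_seq = {}               # device id -> last sequence number

    def on_interaction(self, device_id, seq):
        prev = self.last_seq.get(device_id)
        if prev is not None:
            if seq == prev:
                return "duplicate"       # same sequence number seen again
            if seq > prev + self.step:
                return "failure"         # a gap implies a lost interaction
        self.last_seq[device_id] = seq
        return "ok"
```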
In one embodiment, the coordination device 202 is further configured to, after a preset event is detected, control the computing devices 204, synchronized by the layers of the neural network model, to train the neural network model again from the input training samples. The preset events include: the coordination device 202 being restarted; and/or the communication between the coordination device 202 and a computing device 204 being disconnected for longer than a preset duration; and/or the coordination device 202 receiving an interaction request from a computing device 204 whose carried sequence number, incremented with each completed interaction, differs from the sequence numbers carried in the interaction requests of the other computing devices 204 by more than a preset threshold.
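For illustration, a hedged sketch of the preset-event check that triggers retraining; the attribute names (just_restarted, seconds_since_contact, last_seq) and both numeric thresholds are assumptions, since the patent only defines the three conditions.

```python
# Illustrative check for the preset events that trigger retraining.

def should_retrain(coordinator, devices, max_silence=30.0, max_seq_gap=100):
    if coordinator.just_restarted:
        return True                      # coordination device restarted
    if any(d.seconds_since_contact() > max_silence for d in devices):
        return True                      # communication lost for too long
    # A device whose sequence number lags or leads the others by more
    # than the threshold indicates a desynchronized device.
    seqs = [d.last_seq for d in devices]
    return max(seqs) - min(seqs) > max_seq_gap
```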
In one embodiment, the coordination device 202 is one of the predetermined number of computing devices 204.
Those of ordinary skill in the art will appreciate that all or part of the flows in the above embodiment methods can be implemented by a computer program instructing the relevant hardware. The program can be stored in a computer-readable storage medium and, when executed, may include the flows of the embodiments of the above methods. The storage medium may be a non-volatile storage medium such as a magnetic disk, an optical disc, or a read-only memory (ROM), or may be a random access memory (RAM), or the like.
The technical features of the above embodiments can be combined arbitrarily. For brevity, not all possible combinations of the technical features in the above embodiments are described; however, as long as a combination of these technical features contains no contradiction, it shall be considered to fall within the scope of this specification.
The above embodiments express only several implementations of the present invention, and their description is relatively specific and detailed, but they shall not therefore be construed as limiting the scope of the patent. It should be noted that those of ordinary skill in the art can make various modifications and improvements without departing from the concept of the present invention, and these all fall within the protection scope of the present invention. Therefore, the protection scope of this patent shall be subject to the appended claims.

Claims (15)

1. A neural network model training system, characterized in that the system comprises a coordination device and a predetermined number of computing devices;
the coordination device is configured to synchronize the computing devices by the layers of the neural network model; the nodes of each layer of the neural network model are split and then distributed across the computing devices;
each computing device is configured to, under the layer-by-layer synchronization of the coordination device, process the nodes of the neural network model assigned to it in the corresponding layer according to the training samples input to the neural network model and the learning sequence of the neural network model, and to send the data generated by processing the nodes to a model storage device or to the computing device where the connected nodes of the next layer reside, until training on the input training samples is finished.
2. The system according to claim 1, characterized in that the coordination device is configured to send to each computing device a first notification to process a designated layer of the neural network model;
the computing device is configured to receive the first notification and, in response, obtain the data required to process the nodes of the designated layer assigned to the current computing device, and process those nodes; determine the next layer according to the learning sequence of the neural network model, and send the data generated by processing the nodes to the computing devices corresponding to the connected nodes in the next layer; and send to the coordination device a second notification indicating that the nodes of the designated layer assigned to the current computing device have been processed;
the coordination device is further configured to, after receiving the second notifications from the computing devices, send to each computing device a first notification to process the next layer of the neural network model, until training on the input training samples is finished.
3. The system according to claim 2, characterized in that, in the forward computation stage of the neural network model, the computing device is further configured to, according to the first notification and for its nodes assigned to the designated layer, obtain the sum of the products of the activation values of all connected nodes in the previous layer and the corresponding edge weights, and compute the activation values of the nodes of the current computing device from the obtained sums of products;
the computing device is further configured to send, to the computing device corresponding to a node in the next layer connected to a single node of the current computing device, the product of that node's activation value and the corresponding edge weight; and to send, to the computing device corresponding to a node in the next layer connected to multiple nodes of the current computing device, the sum of the products of those nodes' activation values and the corresponding edge weights.
4. The system according to claim 2, characterized in that, in the backward computation stage of the neural network model, the computing device is further configured to, according to the first notification and for its nodes assigned to the designated layer, obtain the residual values of all connected nodes of the previously processed layer together with the corresponding edge weights; compute the residual values of the current computing device from the obtained residual values and edge weights; and adjust the corresponding edge weights according to the obtained residual values and the activation values of the nodes of the current computing device.
5. The system according to claim 1, characterized in that the nodes of each layer of the neural network model are assigned to the corresponding computing devices in any of the following ways:
after the nodes of each layer of the neural network model are numbered sequentially, each node number is taken modulo the predetermined number, and nodes with the same modulus result are assigned to the same computing device; or
the nodes of each layer of the neural network model are divided into a predetermined number of groups, each group containing a specified quantity of consecutive adjacent nodes, and each group is assigned to a different computing device; or
the nodes of each layer of the neural network model are assigned to the computing devices at random.
6. The system according to claim 1, characterized in that the computing device is further configured to carry a sequence number when interacting with the coordination device, and to increment the sequence number by a predetermined amount each time an interaction with the coordination device completes;
the coordination device is further configured to detect repeated or failed interactions from the sequence numbers carried in the interactions of the computing devices, and to perform a predetermined operation according to the detection result.
7. The system according to claim 1, characterized in that the coordination device is further configured to, after a preset event is detected, control the computing devices, synchronized by the layers of the neural network model, to train the neural network model again from the input training samples; the preset event comprises:
the coordination device being restarted; and/or
the communication between the coordination device and a computing device being disconnected for longer than a preset duration; and/or
the coordination device receiving an interaction request from a computing device whose carried sequence number, incremented with each completed interaction, differs from the sequence numbers carried in the interaction requests of the other computing devices by more than a preset threshold.
8. The system according to claim 1, characterized in that the coordination device is one of the predetermined number of computing devices.
9. A neural network model training method, the method comprising:
a predetermined number of computing devices, under the layer-by-layer synchronization of a coordination device, processing the nodes of the neural network model assigned to each computing device in the corresponding layer according to the training samples input to the neural network model and the learning sequence of the neural network model, and sending the data generated by processing the nodes to a model storage device or to the computing device where the connected nodes of the next layer reside, until training on the input training samples is finished; the nodes of each layer of the neural network model being split and then distributed across the computing devices.
10. The method according to claim 9, characterized in that the step, performed by the predetermined number of computing devices under the layer-by-layer synchronization of the coordination device, of processing the nodes of the neural network model assigned to each computing device in the corresponding layer according to the training samples input to the neural network model and the learning sequence of the neural network model, and sending the data generated by processing the nodes to a model storage device or to the computing device where the connected nodes of the next layer reside, until training on the input training samples is finished, comprises:
each of the predetermined number of computing devices receiving a first notification, sent by the coordination device according to the input training samples, to process a designated layer of the neural network model;
each computing device, according to the first notification, obtaining the data required to process the nodes of the designated layer assigned to the current computing device, and processing the nodes of the current computing device;
each computing device determining the next layer according to the learning sequence of the neural network model, and sending the data generated by processing the nodes to the computing devices corresponding to the connected nodes in the next layer;
each computing device sending to the coordination device a second notification indicating that the nodes of the designated layer assigned to the current computing device have been processed, such that the coordination device, after receiving the second notifications from the computing devices, sends to each computing device a first notification to process the next layer of the neural network model, until training on the input training samples is finished.
11. The method according to claim 10, characterized in that, in the forward computation stage of the neural network model, the step of each computing device, according to the first notification, obtaining the data required to process the nodes of the designated layer assigned to the current computing device and processing the nodes of the current computing device comprises:
each computing device, according to the first notification and for its nodes assigned to the designated layer, obtaining the sum of the products of the activation values of all connected nodes in the previous layer and the corresponding edge weights, and computing the activation values of the nodes of the current computing device from the obtained sums of products;
and the step of sending the data generated by processing the nodes to the computing devices corresponding to the connected nodes in the next layer comprises:
sending, to the computing device corresponding to a node in the next layer connected to a single node of the current computing device, the product of that node's activation value and the corresponding edge weight;
sending, to the computing device corresponding to a node in the next layer connected to multiple nodes of the current computing device, the sum of the products of those nodes' activation values and the corresponding edge weights.
12. The method according to claim 10, characterized in that, in the backward computation stage of the neural network model, the step of each computing device, according to the first notification, obtaining the data required to process the nodes of the designated layer assigned to the current computing device and processing the nodes of the current computing device comprises:
each computing device, according to the first notification and for its nodes assigned to the designated layer, obtaining the residual values of all connected nodes of the previously processed layer together with the corresponding edge weights;
computing the residual values of the current computing device from the obtained residual values and edge weights, and adjusting the corresponding edge weights according to the obtained residual values and the activation values of the nodes of the current computing device.
13. The method according to claim 9, characterized in that the nodes of each layer of the neural network model are assigned to the corresponding computing devices in any of the following ways:
after the nodes of each layer of the neural network model are numbered sequentially, each node number is taken modulo the predetermined number, and nodes with the same modulus result are assigned to the same computing device; or
the nodes of each layer of the neural network model are divided into a predetermined number of groups, each group containing a specified quantity of consecutive adjacent nodes, and each group is assigned to a different computing device; or
the nodes of each layer of the neural network model are assigned to the computing devices at random.
14. The method according to claim 9, characterized in that the method further comprises:
each computing device carrying a sequence number when interacting with the coordination device, and incrementing the sequence number by a predetermined amount each time an interaction with the coordination device completes;
the coordination device detecting repeated or failed interactions from the sequence numbers carried in the interactions of the computing devices, and performing a predetermined operation according to the detection result.
15. The method according to claim 9, characterized in that the method further comprises:
the coordination device, after a preset event is detected, controlling the computing devices, synchronized by the layers of the neural network model, to train the neural network model again from the input training samples; the preset event comprising:
the coordination device being restarted; and/or
the communication between the coordination device and a computing device being disconnected for longer than a preset duration; and/or
the coordination device receiving an interaction request from a computing device whose carried sequence number, incremented with each completed interaction, differs from the sequence numbers carried in the interaction requests of the other computing devices by more than a preset threshold.
CN201510368328.3A 2015-06-26 2015-06-26 neural network model training system and method Active CN104978601B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201510368328.3A CN104978601B (en) 2015-06-26 2015-06-26 neural network model training system and method


Publications (2)

Publication Number Publication Date
CN104978601A (en) 2015-10-14
CN104978601B (en) 2017-08-25

Family

ID=54275083

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201510368328.3A Active CN104978601B (en) 2015-06-26 2015-06-26 neural network model training system and method

Country Status (1)

Country Link
CN (1) CN104978601B (en)

Families Citing this family (24)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP6601569B2 (en) * 2016-03-31 2019-11-06 富士通株式会社 Neural network model training method, apparatus, and electronic apparatus
CN107341546B (en) * 2016-04-29 2021-06-08 中科寒武纪科技股份有限公司 Device and method for executing batch normalization operation
CN105956658A (en) * 2016-04-29 2016-09-21 北京比特大陆科技有限公司 Data processing method, data processing device and chip
WO2017214970A1 (en) * 2016-06-17 2017-12-21 Nokia Technologies Oy Building convolutional neural network
CN106169094A (en) * 2016-07-07 2016-11-30 江苏大学 A kind of RNNLM system based on distributed neuron and method for designing thereof
CN111310893B (en) * 2016-08-05 2023-11-21 中科寒武纪科技股份有限公司 Device and method for executing neural network operation
CN108122032B (en) * 2016-11-29 2020-02-14 华为技术有限公司 Neural network model training method, device, chip and system
WO2018121472A1 (en) * 2016-12-28 2018-07-05 上海寒武纪信息科技有限公司 Computation method
CN107729158B (en) * 2017-09-20 2022-01-11 惠州Tcl移动通信有限公司 Method for simplifying application program register, storage medium and electronic equipment
US11164079B2 (en) 2017-12-15 2021-11-02 International Business Machines Corporation Multi-GPU deep learning using CPUs
CN108196479A (en) * 2017-12-29 2018-06-22 贵州航天南海科技有限责任公司 A kind of three-dimensional garage control system
CN108197704A (en) * 2017-12-29 2018-06-22 贵州航天南海科技有限责任公司 A kind of stereo garage control method
CN108196882A (en) * 2017-12-29 2018-06-22 普强信息技术(北京)有限公司 A kind of accelerating method and device for neural computing
CN108153200A (en) * 2017-12-29 2018-06-12 贵州航天南海科技有限责任公司 A kind of stereo garage control method of three-layer neural network path planning
CN110533178A (en) * 2018-05-25 2019-12-03 杭州海康威视数字技术股份有限公司 A kind of neural network model training method, apparatus and system
CN111078623B (en) * 2018-10-18 2022-03-29 上海寒武纪信息科技有限公司 Network-on-chip processing system and network-on-chip data processing method
CN109685204B (en) * 2018-12-24 2021-10-01 北京旷视科技有限公司 Image processing method and device, storage medium and electronic equipment
CN109978140B (en) * 2019-03-27 2021-02-26 腾讯科技(深圳)有限公司 Neural network training method and device, readable storage medium and computer equipment
CN110058943B (en) * 2019-04-12 2021-09-21 三星(中国)半导体有限公司 Memory optimization method and device for electronic device
CN110490316B (en) * 2019-08-21 2023-01-06 腾讯科技(深圳)有限公司 Training processing method and training system based on neural network model training system
CN110796245B (en) * 2019-10-25 2022-03-22 浪潮电子信息产业股份有限公司 Method and device for calculating convolutional neural network model
CN113958107B (en) * 2020-11-25 2023-03-21 广州三叠纪元智能科技有限公司 Climbing frame control method, electric box, server and storage medium
CN115203126B (en) * 2022-09-15 2023-04-18 太初(无锡)电子科技有限公司 Operator fusion processing method, device, equipment and storage medium
CN117112145B (en) * 2023-10-16 2024-02-13 之江实验室 Training model distribution method, training model distribution device, computer equipment and storage medium


Patent Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN201464865U (en) * 2009-07-15 2010-05-12 河南天擎机电技术有限公司 Public place environmental monitoring control system based on Zigbee
CN101957602A (en) * 2009-07-15 2011-01-26 河南天擎机电技术有限公司 Method and system thereof for monitoring and controlling environments of public place based on Zigbee
CN101782771A (en) * 2010-03-17 2010-07-21 东华大学 Spinning process intelligent optimized design method based on immune neural network
CN102739755A (en) * 2011-11-07 2012-10-17 李宗诚 Computation technology foundation of intelligent integrated network computer

Also Published As

Publication number Publication date
CN104978601A (en) 2015-10-14

Similar Documents

Publication Publication Date Title
CN104978601B (en) neural network model training system and method
CN111553484B (en) Federal learning method, device and system
CN103853786B (en) The optimization method and system of database parameter
CN103761309B (en) Operation data processing method and system
US20200065673A1 (en) Pre-training system for self-learning agent in virtualized environment
TW201717071A (en) Recommendation method and device
CN110363286A (en) The generation method and device of neural network model
CN106056529A (en) Method and equipment for training convolutional neural network used for image recognition
CN105095961A (en) Mixing system with artificial neural network and impulsive neural network
CN105095967A (en) Multi-mode neural morphological network core
CN107622709A (en) Knowledge point Grasping level evaluation method, medium and electronic equipment
CN104391868B (en) The device and method of dynamic page static
CN109840533A (en) A kind of applied topology figure recognition methods and device
CN105659262A (en) Implementing synaptic learning using replay in spiking neural networks
CN105095965A (en) Hybrid communication method of artificial neural network and impulsive neural network
CN108920948A (en) A kind of anti-fraud streaming computing device and method
CN104199912B (en) A kind of method and device of task processing
CN112488826A (en) Method and device for optimizing bank risk pricing based on deep reinforcement learning
CN107391378A (en) The generation method and device of a kind of test script
CN106373112A (en) Image processing method, image processing device and electronic equipment
CN109039959A (en) A kind of the consistency judgment method and relevant apparatus of SDN network rule
CN105183585B (en) Data backup method and device
CN103927530A (en) Acquiring method, application method and application system of final classifier
CN109522179A (en) Monitoring method, device, processor and the server of operation condition of server
CN110490598A (en) Method for detecting abnormality, device, equipment and storage medium

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
GR01 Patent grant