CN109740738A - Neural network model training method, apparatus, device and medium

Neural network model training method, apparatus, device and medium

Info

Publication number
CN109740738A
Authority
CN
China
Prior art keywords: training, neural network, network model, learning object, sample
Prior art date
Legal status
Granted
Application number
CN201811645093.8A
Other languages
Chinese (zh)
Other versions
CN109740738B (en)
Inventor
申俊峰
周大军
张力柯
荆彦青
Current Assignee
Tencent Technology Shenzhen Co Ltd
Original Assignee
Tencent Technology Shenzhen Co Ltd
Priority date
Filing date
Publication date
Application filed by Tencent Technology Shenzhen Co Ltd
Priority to CN201811645093.8A
Publication of CN109740738A
Application granted
Publication of CN109740738B
Legal status: Active


Landscapes

  • Management, Administration, Business Operations System, And Electronic Commerce (AREA)

Abstract

The embodiments of the present application disclose a neural network model training method and apparatus. For a learning object that requires reinforcement learning, an artificial sample set generated by the learning object according to user operations can first be obtained, together with a machine sample set obtained by autonomous learning of the neural network model for that learning object within the learning object. When training the neural network model, both the artificial sample set and the machine sample set serve as the training basis. Because the training sample set used for training contains artificially generated artificial samples, which are of higher quality than the machine samples obtained in the early stage of machine learning, better serve the purpose of advancing the completion progress of the learning object, and are mostly meaningful interactions with the learning object compared with machine samples, the convergence time of the model parameters in the early stage of training can be shortened and the time required to train the neural network model reduced.

Description

Neural network model training method, apparatus, device and medium
Technical field
This application relates to the field of neural networks, and more particularly to a neural network model training method, apparatus, device and computer-readable storage medium.
Background art
Reinforcement learning, also known as trial-and-error learning, is a class of machine learning algorithms in which an agent interacts continuously with the environment of a learning object and learns from the reward feedback provided by the environment. Such algorithms rely on no prior knowledge and can learn fully autonomously. The agent differs according to the learning object; for example, when the learning object is a game, the agent may be a character or a participant in the game.
Traditional reinforcement learning methods such as Deep Q Network (DQN) train their neural network models entirely on data obtained by the machine through autonomous learning.
In this scenario all training data come from the machine's autonomous trial and error. In the early stage of training the machine's autonomous trial and error is slow and produces many meaningless interactions, so the model parameters converge slowly in the early stage of training, the cost is high, and the time needed to train the neural network model is prolonged.
Summary of the invention
In order to solve the above technical problem, this application provides a neural network model training method and apparatus, so as to shorten the convergence time of the model parameters in the early stage of training and reduce the time required to train the neural network model.
The embodiments of the present application disclose the following technical solutions:
In a first aspect, an embodiment of the present application provides a neural network model training method, the method comprising:
obtaining an artificial sample set generated by a learning object according to user operations;
obtaining a machine sample set obtained by autonomous learning, within the learning object, of a neural network model for the learning object;
training the neural network model according to the artificial sample set and the machine sample set.
In a second aspect, an embodiment of the present application provides a neural network model training apparatus, the apparatus comprising a first acquisition unit, a second acquisition unit and a training unit:
the first acquisition unit is configured to obtain an artificial sample set generated by a learning object according to user operations;
the second acquisition unit is configured to obtain a machine sample set obtained by autonomous learning, within the learning object, of a neural network model for the learning object;
the training unit is configured to train the neural network model according to the artificial sample set and the machine sample set.
In a third aspect, an embodiment of the present application provides a device for neural network model training, the device comprising a processor and a memory:
the memory is configured to store program code and transfer the program code to the processor;
the processor is configured to execute, according to instructions in the program code, the neural network model training method of the first aspect.
In a fourth aspect, an embodiment of the present application provides a computer-readable storage medium configured to store program code, the program code being used to execute the neural network model training method described in the first aspect.
It can be seen from the above technical solutions that, for a learning object requiring reinforcement learning, an artificial sample set generated by the learning object according to user operations can first be obtained, together with a machine sample set obtained by autonomous learning of the neural network model for the learning object within the learning object. When training the neural network model, the artificial sample set and the machine sample set both serve as the training basis. Because the training sample set used for training includes artificially generated artificial samples, which are of higher quality than the machine samples obtained in the early stage of machine learning, better serve the purpose of advancing the completion progress of the learning object, and are mostly meaningful interactions with the learning object compared with machine samples, the convergence time of the model parameters in the early stage of training can be shortened and the time required to train the neural network model reduced.
Brief description of the drawings
In order to more clearly illustrate the technical solutions in the embodiments of the present application or in the prior art, the drawings needed in the description of the embodiments or the prior art are briefly introduced below. Obviously, the drawings in the following description are only some embodiments of this application; those of ordinary skill in the art may obtain other drawings from these drawings without creative effort.
Fig. 1 is a schematic diagram of an application scenario of the neural network model training method provided by an embodiment of the present application;
Fig. 2 is a flowchart of a neural network model training method provided by an embodiment of the present application;
Fig. 3 is an example diagram of the QQ racing game as a learning object, provided by an embodiment of the present application;
Fig. 4 is a schematic flow diagram of a neural network model training method provided by an embodiment of the present application;
Fig. 5 is a schematic flow diagram of neural network model pre-training provided by an embodiment of the present application;
Fig. 6 is a flowchart of a neural network model training method provided by an embodiment of the present application;
Fig. 7a is a structural diagram of a neural network model training apparatus provided by an embodiment of the present application;
Fig. 7b is a structural diagram of a neural network model training apparatus provided by an embodiment of the present application;
Fig. 8 is a structural diagram of a device for neural network model training provided by an embodiment of the present application;
Fig. 9 is a structural diagram of a device for neural network model training provided by an embodiment of the present application.
Detailed description of the embodiments
In order to make those skilled in the art better understand the solutions of this application, the technical solutions in the embodiments of the present application are described clearly and completely below in conjunction with the accompanying drawings. Obviously, the described embodiments are only a part of the embodiments of this application, not all of them. All other embodiments obtained by those of ordinary skill in the art based on the embodiments in this application without creative effort shall fall within the protection scope of this application.
When traditional reinforcement learning trains its neural network model, the training data are obtained entirely through the machine's autonomous trial and error. In the early stage of training the machine's autonomous trial and error is slow, meaningless interactions are many and the sample quality is relatively poor, so the model parameters converge slowly in the early stage of training, the cost is high, and the time needed to train the neural network model is prolonged.
In order to solve the above technical problem, an embodiment of the present application provides a neural network model training method. Artificially generated artificial samples are of higher quality than the machine samples obtained in the early stage of machine learning, better serve the purpose of advancing the completion progress of the learning object, and are mostly meaningful interactions with the learning object compared with machine samples. Therefore, in the training method provided by the embodiments of this application, artificially generated artificial samples are introduced into the training sample set used for training the model, to assist in training the neural network model.
The neural network model training method provided by the embodiments of the present application can be applied to a data processing device such as a terminal device or a server. The terminal device may specifically be a smartphone, a computer, a personal digital assistant (PDA), a tablet computer, etc.; the server may specifically be a standalone server or a cluster server.
To facilitate understanding of the technical solution of this application, the neural network model training method provided by the embodiments of the present application is introduced below with reference to a practical application scenario, taking a server as the data processing device.
Referring to Fig. 1, Fig. 1 is a schematic diagram of an application scenario of the neural network model training method provided by an embodiment of the present application. The application scenario may include at least one terminal device 101 and a server 102. The learning object can be deployed on the terminal device 101, and the artificial sample set and the machine sample set can be generated by operating the terminal device 101. The server 102 can obtain, from the terminal device 101, the artificial sample set generated by the learning object according to user operations and the machine sample set obtained by autonomous learning of the neural network model for the learning object within the learning object, so as to train the neural network model.
The neural network model may be one whose training has been completed, or one that continues to be trained during use. The learning object can be an object to which deep learning is applied; the learning object can have a given task, and that task is completed or its progress advanced by implementing actions under certain rules. The learning object may, for example, be a game, a sports event, and so on. If a game is used as the learning object, what the neural network model for that learning object learns through reinforcement learning is the skill of playing the game; if a sports event such as the high jump is used as the learning object, what the neural network model for that learning object learns through reinforcement learning is the skill of high jumping under various circumstances.
Samples are used to train the neural network model. In this embodiment, samples may include artificial samples, machine samples and training samples. The data types contained in artificial samples, machine samples and training samples are consistent.
Each artificial sample is generated by the learning object according to user operations, and multiple artificial samples constitute the artificial sample set. Each machine sample is obtained by autonomous learning, within the learning object, of the neural network model for the learning object, and multiple machine samples constitute the machine sample set. A training sample may be a sample selected from the artificial sample set and the machine sample set when training the neural network model, and multiple training samples constitute the training sample set.
The artificial sample set consists of data generated artificially for the learning object on the basis of human prior knowledge. The artificial sample set may be arranged in advance, or may be generated continuously by a user performing operations on the learning object on the terminal device 101. In the initial stage of autonomous learning the neural network model has not yet been trained and its model parameters may still be initial values, so the machine samples obtained by autonomous learning of the neural network model at this time are mostly meaningless interactions with the learning object. Therefore, artificial samples are of higher quality than the machine samples obtained in the early stage of machine learning, and better serve the purpose of advancing the completion progress of the learning object.
The server 102 therefore trains the neural network model for the learning object according to the obtained artificial sample set and machine sample set. Because the higher-quality artificial sample set is added, training the neural network model better serves the purpose of advancing the completion progress of the learning object, so the convergence time of the model parameters in the early stage of training can be shortened and the time required to train the neural network model reduced.
The neural network model training method provided by the embodiments of the present application can be applied to application scenarios for training artificial intelligence (AI), for example games, robotics, industrial automation, education, medicine, finance, etc. For ease of introduction, the subsequent embodiments are introduced using the application scenario of training a game AI (with the game as the learning object).
Next, the neural network model training method provided by the embodiments of the present application is introduced with reference to the drawings.
Referring to Fig. 2, Fig. 2 shows a flowchart of a neural network model training method, the method comprising:
S201: obtain an artificial sample set generated by the learning object according to user operations.
In the application scenario of training a game AI, with the game as the learning object, the user plays the game according to its rules, and the artificial sample set is generated according to the user operations. Although the rules of different games may differ, the data types contained in the artificial sample set are generally consistent.
The generated artificial sample set can be saved, for example in an artificial sample pool; in this way, when the artificial sample set is needed to assist in training the neural network model, it can be obtained from the artificial sample pool.
S202: obtain a machine sample set obtained by autonomous learning, within the learning object, of the neural network model for the learning object.
The machine sample set generated in the learning object by the neural network model can be saved, for example in a replay memory; in this way, when the machine sample set is needed to train the neural network model, it can be obtained from the replay memory. The data types contained in the machine sample set can be consistent with those of the artificial sample set.
In one possible implementation, both the machine sample set and the artificial sample set include the action implemented in the learning object, the environmental parameter of the learning object when the action is implemented, and the feedback parameter of the learning object after the action is implemented.
The action implemented in the learning object may be an action permitted by the learning object and implemented in the learning object in different ways, for example an action implemented in the learning object through a user operation, or an action implemented in the learning object through the neural network model.
The environmental parameter of the learning object when the action is implemented can be used to identify the environment present when the action is implemented; for example, when a game is the learning object, the environmental parameter can be the game frame at the time the action is implemented.
The feedback parameter of the learning object after the action is implemented can be the feedback generated by the learning object when the implemented action interacts with the learning object. The feedback parameter can serve to evaluate the influence of the action implemented in the learning object on advancing the task completion progress. In general, the higher the feedback parameter the better: a higher feedback parameter means the action implemented in the learning object is more conducive to advancing the completion progress of the learning object.
For example, when the learning object is a parkour (endless-runner) game, the given task is to avoid the obstruction of obstacles under certain rules and move toward the finish. The user observes that the current environment of the learning object contains a low wall. If the action the user implements based on the current environment is a jump, so that the game character jumps over the wall and keeps running, this action allows the game character to avoid being blocked and to continue moving forward, which is conducive to advancing the task progress; the feedback parameter generated by the learning object for this action is therefore relatively high. If the action the user implements is to run straight into the wall, so that the game character hits the wall and stalls, forward movement is hindered, which is unfavorable to advancing the task progress; the feedback parameter generated by the learning object for this action is therefore relatively low.
The artificial sample set includes the correspondence among three items: the action implemented by the user in the learning object, the environmental parameter of the learning object when the action is implemented, and the feedback parameter of the learning object after the action is implemented. The correspondence contained in the artificial sample set can reflect which action the user chooses to implement under any environment of the learning object, and the influence of the action implemented by the user on advancing the task completion progress.
The machine sample set includes the correspondence among three items: the action implemented in the learning object through the neural network model, the environmental parameter of the learning object when the action is implemented, and the feedback parameter of the learning object after the action is implemented. The correspondence contained in the machine sample set can reflect which action is chosen for implementation through the neural network model under any environment of the learning object, and the influence of the action implemented through the neural network model on advancing the task completion progress.
The neural network model is used to autonomously learn the skill in the learning object. Specifically, the neural network model can determine, according to the environmental parameter, the action to be implemented in the learning object; each candidate action has a corresponding feedback parameter, and based on the feedback parameters the model can choose which action to implement under that environmental parameter so as to be more conducive to advancing the completion progress of the learning object.
It should be noted that, because the neural network model has not yet been trained in the initial stage of autonomous learning, its model parameters may still be initial values, and the model at this time has no knowledge of which actions are conducive to advancing the completion progress of the learning object. The actions it selects in the initial stage of autonomous learning are therefore more likely to be meaningless, for example actions unrelated to advancing the completion progress of the learning object, or actions that prevent the progress of the learning object from being completed, so that in the initial stage of autonomous learning there are many meaningless interactions with the learning object. For example, when the learning object is a parkour game, the action that should be implemented in the learning object to advance its completion progress is to go forward; however, because the neural network model has not yet been trained, the action it chooses to implement during autonomous learning may be to move left, which in practice is meaningless for advancing the completion progress of the learning object. At this point, a machine sample that includes the action "move left" implemented in the learning object through the neural network model is a meaningless interaction.
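As a rough illustration of how such autonomous learning can produce machine samples, the following Python sketch (in which model, env.observe and env.step are assumed interfaces, not part of this application) picks the action with the highest estimated feedback value under the current environmental parameter and records the resulting transition; because an untrained model's estimates are close to their initial values, many of the transitions recorded this way early on are meaningless interactions.

import random

import numpy as np


def collect_machine_sample(model, env, epsilon=0.1):
    # One autonomous-learning step yielding a machine sample (s, a, r, s').
    # `model(state)` is assumed to return one estimated value per candidate
    # action; `env.step(action)` is assumed to return the reward parameter
    # and the next environmental parameter.
    state = env.observe()                       # environmental parameter s
    values = model(state)                       # estimated value of each action under s
    if random.random() < epsilon:               # occasional random exploration
        action = random.randrange(len(values))
    else:
        action = int(np.argmax(values))         # action believed to best advance progress
    reward, next_state = env.step(action)       # feedback after the action is implemented
    return (state, action, reward, next_state)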
Therefore, during subsequent training of the neural network model, the neural network model is trained by combining the artificial sample set and the machine sample set.
The samples in the artificial sample set or the machine sample set can be represented in a certain data format, for example the following: (environmental parameter of the learning object when the action is implemented, action implemented in the learning object, feedback parameter of the learning object after the action is implemented).
It should be noted that the purpose of implementing an action in the learning object is to advance the completion progress of the learning object, and the feedback parameter of the learning object after the action is implemented can serve to evaluate the influence of the action implemented in the learning object on advancing the task completion progress. The feedback parameter may include at least one of two aspects: the reward parameter obtained by implementing the target action, and the environmental parameter obtained in the learning object as a result of implementing the target action. The target action may be an action included in any target sample of the artificial sample set or the machine sample set.
Either of these two aspects can serve to evaluate the influence of the action implemented in the learning object on advancing the task completion progress.
In one implementation, the feedback parameter includes the reward parameter (reward) obtained by implementing the target action.
In many cases different target actions yield different rewards. The better the reward obtained by implementing the target action, the more the implemented target action can be considered conducive to advancing the completion progress of the learning object. The quality of the reward can be embodied by the reward parameter.
In another implementation, the feedback parameter includes the environmental parameter obtained in the learning object as a result of implementing the target action.
In many cases different target actions lead to different environments after implementation. The environment after the target action is implemented can be embodied by the environmental parameter obtained in the learning object as a result of implementing the target action, and the quality of this environmental parameter can in turn reflect the influence on advancing the completion progress of the learning object.
Taking the QQ racing game shown in Fig. 3 as an example, when the implemented action is a drift, the racing car may hit the guardrail at the side after the action is implemented, ending the game. The racing car hitting the guardrail is then the environment after the drift action is implemented, and from the environmental parameter corresponding to that environment it can be determined that the drift action is unfavorable to advancing the completion progress of the learning object.
Based on this, in another implementation of this embodiment, to avoid excessively pursuing a good value of one aspect, such as the reward parameter, while ignoring the other aspect, such as the environmental parameter, which would make an evaluation based on the reward parameter or the environmental parameter alone of the target action's influence on advancing the completion progress of the learning object insufficiently accurate, the two aspects can be considered together for an implemented action such as the target action. That is, the reward parameter obtained after the target action is implemented and the environmental parameter obtained in the learning object as a result of implementing the target action are considered jointly to evaluate the target action's influence on advancing the completion progress of the learning object. In other words, in the artificial sample set or the machine sample set, for any target action, the feedback parameter of the learning object after the target action is implemented includes both the reward parameter obtained by implementing the target action and the environmental parameter obtained in the learning object as a result of implementing the target action. Taking the two aspects together as the feedback parameter allows the influence of the target action on advancing the completion progress of the learning object to be evaluated more accurately.
On the basis that the feedback parameter includes the reward parameter obtained by implementing the target action and the environmental parameter obtained in the learning object as a result of implementing the target action, the samples in the artificial sample set or the machine sample set can use the following data format: (s, a, r, s'). Here s denotes the environmental parameter of the learning object when the target action is implemented, a denotes the target action implemented in the learning object, r denotes the reward parameter obtained by implementing the target action, and s' denotes the environmental parameter obtained in the learning object as a result of implementing the target action.
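As a minimal sketch of this shared format (the field values below are hypothetical placeholders, not data from this application), an artificial sample recorded from a user operation and a machine sample recorded from the model's autonomous learning can be represented identically in Python:

from collections import namedtuple

# One sample in either the artificial sample set or the machine sample set:
#   s  - environmental parameter of the learning object when the action is implemented
#   a  - action implemented in the learning object
#   r  - reward parameter obtained by implementing the action
#   s_ - environmental parameter obtained in the learning object after the action
Sample = namedtuple("Sample", ["s", "a", "r", "s_"])

artificial_sample = Sample(s="frame_0012", a="jump", r=1.0, s_="frame_0013")
machine_sample = Sample(s="frame_0040", a="move_left", r=-0.1, s_="frame_0041")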
S203: train the neural network model according to the artificial sample set and the machine sample set.
When training the neural network model, samples can be selected from the artificial sample set and the machine sample set to obtain the training sample set used for training the neural network model, and the neural network model is trained with the training sample set over N rounds. The training sample set used in each round, or in any one of the first M rounds, may include some artificial samples from the artificial sample set and some machine samples from the machine sample set.
In one possible implementation, if the samples in the artificial sample set or the machine sample set use the data format (s, a, r, s'), the neural network model can be trained during the training process according to a loss function using a preset algorithm, such as a gradient-descent optimization method, until the trained neural network model achieves a satisfactory effect. The loss function can be as shown in formula (1):

y = r + γ * max_a Q(s', a)
loss = (y - Q(s, a))^2        (1)

Here s denotes the environmental parameter of the learning object when the target action is implemented; a denotes the target action implemented in the learning object; r denotes the reward parameter obtained by implementing the target action; s' denotes the environmental parameter obtained in the learning object as a result of implementing the target action; Q(s, a) denotes the actual value, output by the neural network model, of implementing action a in the learning object under environmental parameter s; Q(s', a) denotes the actual value, output by the neural network model, of implementing action a in the learning object under environmental parameter s'; γ is the discount factor for the value Q(s', a), typically set to 0.99; y denotes the theoretical value of implementing action a in the learning object under environmental parameter s, and loss denotes the loss of the actual value relative to the theoretical value.
The model parameters of the neural network model can be adjusted according to the loss until the adjusted model parameters make the neural network model perform well.
If the samples in the artificial sample set or the machine sample set use the data format (s, a, r, s'), the input of the neural network model is the environmental parameter s of the learning object when the target action is implemented, and the output of the neural network model is the action a implemented in the learning object. The action a can be determined according to the values of r and s'; under normal circumstances, the action a corresponding to relatively good r and s' is selected as the output of the neural network model. The neural network model obtained by training can be used for a machine to play the game, allowing the detection of game failures to be realized by having the machine play the game.
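The loss of formula (1) can be computed per sample roughly as follows; this is a sketch under the assumption that q_net maps an environmental parameter to one Q value per action, and the model parameters would then be adjusted, for example by gradient descent, to reduce this loss.

import numpy as np


def dqn_loss(q_net, sample, gamma=0.99):
    # Loss of formula (1) for one (s, a, r, s') sample; gamma is the
    # discount factor, set to 0.99 as in the description above.
    s, a, r, s_ = sample
    y = r + gamma * np.max(q_net(s_))    # theoretical value: r + gamma * max_a Q(s', a)
    q_sa = q_net(s)[a]                   # actual value Q(s, a) output by the model
    return (y - q_sa) ** 2               # loss = (y - Q(s, a))^2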
Because the neural network model obtained by training can provide, for any environmental parameter, an action that accordingly advances the completion progress of the learning object, when the neural network model is used and some environmental parameter is input into it, the action determined by the neural network model will also be an action that advances the completion progress of the learning object.
It can be seen from the above technical solutions that, for a learning object requiring reinforcement learning, an artificial sample set generated by the learning object according to user operations can first be obtained, together with a machine sample set obtained by autonomous learning of the neural network model for the learning object within the learning object. When training the neural network model, the artificial sample set and the machine sample set both serve as the training basis. Because the training sample set includes artificially generated artificial samples, which are of higher quality than the machine samples obtained in the early stage of machine learning, better serve the purpose of advancing the completion progress of the learning object, and are mostly meaningful interactions with the learning object compared with machine samples, the convergence time of the model parameters in the early stage of training can be shortened and the time required to train the neural network model reduced.
It is understood that the obtained machine sample set is mainly stored in the replay memory, and the quality of the machine sample set in the replay memory has a significant influence on the quality of the machine samples used to train the neural network model. Next, the machine sample set in the replay memory is introduced.
In this embodiment, the neural network model can be trained while the neural network model obtained by training is simultaneously used for autonomous learning to obtain data. Over time the trained neural network model becomes better and better, and the quality of the data obtained by autonomous learning with the trained neural network model also becomes higher and higher. In this case, to improve the quality of the machine sample set used to train the neural network model, the data obtained by autonomous learning of the trained neural network model can be added, during training of the neural network model, to the machine sample set in the replay memory as machine samples, as shown in Fig. 4, where (s, a, r, s') is the data obtained by autonomous learning of the trained neural network model. When the machine sample set in the replay memory reaches the capacity of the replay memory, the machine sample that was saved into the replay memory earliest can be deleted.
Because the trained neural network model becomes better and better as training proceeds, the quality of the data obtained by autonomous learning of the neural network model also becomes higher and higher. Adding these higher-quality data to the machine sample set as machine samples can improve the quality of the machine sample set in the replay memory, so that the machine sample set is more conducive to advancing the completion progress of the learning object, the convergence time of the model parameters can be shortened, and the time required to train the neural network model reduced.
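A bounded replay memory with this first-in-first-out eviction behaviour could be sketched as follows; the capacity value and method names are illustrative assumptions rather than part of this application.

import random
from collections import deque


class ReplayMemory:
    # Bounded machine-sample store; the earliest sample is dropped when full.
    def __init__(self, capacity=100_000):
        self.samples = deque(maxlen=capacity)   # deque evicts the oldest entry automatically

    def add(self, sample):
        # Samples produced by the progressively better model are appended,
        # gradually raising the overall quality of the stored machine sample set.
        self.samples.append(sample)

    def draw(self, k, rng=random):
        # Draw k machine samples for a training batch.
        return rng.sample(list(self.samples), min(k, len(self.samples)))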
It should be noted that the embodiment corresponding to Fig. 2 describes a training method of a neural network model, and the training sample set used for training the neural network model can be composed of samples selected from the artificial sample set obtained in S201 and the machine sample set obtained in S202. To guarantee the quality of the training sample set, the training sample set needs to include artificial samples. However, given the application scenario of autonomous learning, it is difficult to train the neural network model entirely on artificial samples throughout the whole training process, and the artificial samples in the training sample set cannot be too many either. Therefore, the training sample set needs to include both artificial samples and machine samples.
Accordingly, one possible implementation of S203 is: select samples from the artificial sample set and the machine sample set respectively to obtain the training sample set used for training the neural network model, where the quantity of artificial samples and the quantity of machine samples in the training sample set satisfy a preset ratio; then train the neural network model according to the training sample set. The samples contained in the training sample set may be called training samples.
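One way to compose such a training sample set, pairing with the ReplayMemory sketch above, is shown below; the batch size and the 0.25 default ratio are placeholder assumptions, not values taken from this application.

import random


def build_training_batch(artificial_pool, replay_memory, batch_size=32,
                         artificial_ratio=0.25, rng=random):
    # artificial_ratio is the preset proportion of artificial samples in the
    # batch; the remaining slots are filled with machine samples drawn from
    # the replay memory.
    n_artificial = int(batch_size * artificial_ratio)
    batch = rng.sample(artificial_pool, min(n_artificial, len(artificial_pool)))
    batch += replay_memory.draw(batch_size - len(batch), rng)
    rng.shuffle(batch)
    return batch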
Next, how to determine the preset ratio between the quantity of artificial samples and the quantity of machine samples in the training sample set is introduced.
In this embodiment there may be multiple ways of determining the preset ratio; this embodiment mainly introduces two of them.
The first way of determining the preset ratio is to determine, according to the training result of the previous round of training, the preset ratio of the two kinds of samples in the training sample set for the current round of training. That is, the preset ratio between the quantity of artificial samples and the quantity of machine samples in the training sample set used for the N-th round of training is determined according to the training result of the (N-1)-th round of training of the neural network model.
The training result of the (N-1)-th round of training can be the neural network model obtained by the (N-1)-th round of training. When the neural network model obtained by the (N-1)-th round of training is used, it can output actions, and the preset ratio can be adjusted according to the influence of the actions output by the neural network model on advancing the completion progress of the learning object (the game).
For example, if, when playing the game with the neural network model obtained by the (N-1)-th round of training, the actions output according to the neural network model are detrimental to advancing the completion progress of the learning object (the game), so that the completion progress of the learning object is low, the preset ratio between the quantity of artificial samples and the quantity of machine samples in the training sample set can be adjusted accordingly. Since artificial samples are conducive to advancing the completion progress of the learning object, then, in order that the neural network model obtained by training can output actions more conducive to advancing the completion progress of the learning object, the artificial samples in the training sample set used for training the neural network model can be increased, i.e. the preset ratio is adjusted to increase the proportion of artificial samples in the training sample set.
Conversely, the artificial samples in the training sample set used for training the neural network model can be reduced, i.e. the preset ratio is adjusted to reduce the proportion of artificial samples in the training sample set.
The second way of determining the preset ratio is to gradually decrease the proportion of artificial samples in the training sample set, i.e. the proportion of artificial samples in the training sample set used for the N-th round of training of the neural network model is less than the proportion of artificial samples in the training sample set used for the (N-1)-th round of training of the neural network model.
It is understood that, when the learning object is a game, the purpose of playing the game with the neural network model includes not only completing the game progress but also detecting game failures. To detect game failures, various different actions need to be implemented through the neural network model for a given environmental parameter, so as to obtain the environmental parameters (game frames) of the game after the various actions are implemented and then inspect them, for example to detect whether, after an action is implemented, there are places in the game frame where rendering has not appeared.
However, because the artificial sample set is generated on the basis of human prior knowledge, a user facing a given environmental parameter will essentially select the common action according to prior knowledge and will rarely try other actions, which makes the artificial sample set rather monotonous and poor in diversity. For example, in a parkour game, when a crossbar appears on the track indicated by the environmental parameter, most people, based on human prior knowledge, will jump over the crossbar when encountering it, so the user will generally choose the jump action to pass the crossbar. However, the crossbar could also be passed by sliding underneath it; because of human prior knowledge, or because the sliding action is more complicated to perform, the user will rarely or essentially never choose to pass the crossbar by sliding underneath it.
For the neural network model to be able to implement different actions for a given environmental parameter when playing the game, so as to realize the detection of game failures, the training samples in the training sample set need to have good diversity. Therefore, as the number of training rounds of the neural network model increases, the proportion of artificial samples in the training sample set can be gradually reduced and the proportion of machine samples increased, so that the training samples in the training sample set have good diversity.
It can be seen that gradually reducing the proportion of artificial samples in the training sample set can improve the diversity of the training samples in the training sample set, so that, when the neural network model obtained by training plays the game, multiple different actions can be implemented for a given environmental parameter, realizing the detection of game failures.
Next, how the artificial sample set and the machine sample set are used during the neural network model training process to train the neural network model for the learning object is introduced. According to the characteristics of the artificial sample set and the machine sample set, they can be used appropriately in different periods of neural network model training.
In one implementation, before S203 is executed, the neural network model can be pre-trained according to the artificial sample set. In the early stage of training the neural network model, the convergence of the model parameters is slow compared with the later stage, the model parameters change greatly, and the resulting neural network model is very unstable. In order to obtain a stable neural network model quickly in the early stage of training, only the higher-quality artificial sample set can be used in the early stage to pre-train the neural network model, yielding a neural network model that has completed pre-training. The neural network model that has completed pre-training is a relatively stable model whose model parameters change little; S203 is then executed, i.e. the neural network model that has completed pre-training is trained according to the artificial sample set and the machine sample set.
The schematic flow diagram of neural network model pre-training can be seen in Fig. 5. The artificial samples used for pre-training the neural network model can be selected from the artificial sample pool that stores the artificial sample set, and the neural network model is pre-trained according to the selected artificial samples to obtain the neural network model that has completed pre-training.
Because the artificial sample set better serves the purpose of advancing the completion progress of the learning object, the time for the model parameters to stabilize can be shortened and a stable pre-trained neural network model obtained quickly. In this way, when the pre-trained neural network model is subsequently trained according to the artificial sample set and the machine sample set, the time required to train the neural network model can also be reduced.
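Pre-training on the artificial sample pool alone might be sketched as follows, where optimizer_step(q_net, batch) and play_progress(q_net) are assumed helpers (the first applies one parameter update for a batch of samples, the second reports how far the model gets in the learning object); the stop conditions correspond to the two judgment methods described below.

def pretrain(q_net, optimizer_step, artificial_pool, preset_progress,
             play_progress, epochs=1):
    # Pre-train only on artificial samples until the artificial sample set
    # has been trained through or the preset progress is reached.
    for _ in range(epochs):
        for i in range(0, len(artificial_pool), 32):
            optimizer_step(q_net, artificial_pool[i:i + 32])
        if play_progress(q_net) >= preset_progress:
            break                      # model is considered stable enough
    return q_net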
The pre-training of the neural network model cannot continue indefinitely; at some point the pre-training is considered complete and the pre-trained neural network model continues to be trained according to the artificial sample set and the machine sample set. How to judge that the neural network model has completed pre-training is introduced below.
This embodiment mainly introduces two judgment methods. The first judgment method is that the artificial sample set has been fully used for training. The pre-training of the neural network model uses the artificial sample set; when the artificial sample set has been fully used for training, there are no more samples with which to continue pre-training the neural network model, and at this point the pre-training can be considered complete.
The second judgment method is that a preset progress of the learning object is completed by the neural network model. It is understood that the purpose of pre-training the neural network model is to obtain a relatively stable neural network model; once a stable neural network model is obtained, the neural network model can be considered to have completed pre-training. Whether the neural network model is stable can be embodied by the model parameters: the less the model parameters change, the more stable the neural network model, and conversely the more unstable it is. How, then, to measure whether the neural network model is stable? This can be determined by the progress of the learning object completed by the neural network model: the greater the progress of the learning object completed by the neural network model, the less the model parameters change and the more stable the neural network model; conversely, the more the model parameters change, the more unstable the neural network model.
For example, suppose the learning object is a parkour game whose track is 1000 meters long. If the game ends after 200 meters have been completed on the track by the neural network model, the 200 meters completed are the progress of the learning object completed by the neural network model; the greater this progress, the more stable the neural network model can be considered. If the game ends after only 5 meters have been completed on the track by the neural network model, the 5 meters completed are the progress of the learning object completed by the neural network model; the smaller this progress, the more unstable the neural network model can be considered.
Therefore, how much progress of the learning object must be completed by the neural network model for it to be considered stable can be preset, i.e. a preset progress of the learning object to be completed by the neural network model is set in advance, and when that preset progress is reached the neural network model is considered stable. In this way, when the preset progress of the learning object is completed by the neural network model, it is determined that the neural network model has completed pre-training.
Next, the neural network model training method is introduced in conjunction with a concrete application scenario. In this application scenario, a neural network model is obtained by training to play a game, so as to detect game failures; in training the neural network model, the game is the learning object. In this application scenario, referring to Fig. 6, the neural network model training method includes:
S601: the user plays the game on the terminal device to obtain the artificial sample set.
S602: obtain the artificial sample set in the artificial sample pool.
The artificial sample pool is constructed from the artificial sample set collected while the user plays the game.
S603: pre-train the neural network model according to the obtained artificial sample set.
S604: judge whether the artificial sample set has been fully used for training, or whether the preset progress of the game has been completed by the neural network model; if so, execute S605, otherwise execute S603.
S605: perform autonomous learning on the game progress through the pre-trained neural network model to obtain the machine sample set.
S606: obtain the artificial sample set in the artificial sample pool and the machine sample set in the replay memory.
The replay memory is constructed from the collected machine sample set.
S607: train the pre-trained neural network model according to the artificial sample set and the machine sample set.
The neural network model training process mainly includes two stages: the pre-training stage and the stage of training the neural network model that has completed pre-training. S601-S604 can serve as the pre-training stage, and S605-S607 can serve as the stage of training the neural network model that has completed pre-training.
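Tying the pieces together, the two-stage flow of Fig. 6 might look roughly as follows; this sketch reuses the helper sketches above (ReplayMemory, pretrain, collect_machine_sample, build_training_batch, artificial_ratio_for_round), and optimizer_step, run_progress and the numeric constants are assumptions for illustration only.

def train_model(q_net, game_env, artificial_pool, optimizer_step, run_progress,
                preset_progress=200, total_rounds=10_000):
    # Stage 1 (S601-S604): pre-train on the artificial sample pool only.
    q_net = pretrain(q_net, optimizer_step, artificial_pool,
                     preset_progress, run_progress)

    # Stage 2 (S605-S607): autonomous learning plus joint training.
    memory = ReplayMemory()
    for n in range(1, total_rounds + 1):
        memory.add(collect_machine_sample(q_net, game_env))                # S605
        batch = build_training_batch(artificial_pool, memory,              # S606
                                     artificial_ratio=artificial_ratio_for_round(n))
        optimizer_step(q_net, batch)                                       # S607
    return q_net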
It can be seen from the above technical solutions that, for a learning object requiring reinforcement learning, an artificial sample set generated by the learning object according to user operations can first be obtained, together with a machine sample set obtained by autonomous learning of the neural network model for the learning object within the learning object. When training the neural network model, the artificial sample set and the machine sample set both serve as the training basis. Because the training sample set includes artificially generated artificial samples, which are of higher quality than the machine samples obtained in the early stage of machine learning, better serve the purpose of advancing the completion progress of the learning object, and are mostly meaningful interactions with the learning object compared with machine samples, the convergence time of the model parameters in the early stage of training can be shortened and the time required to train the neural network model reduced.
Based on the neural network model training method provided by the foregoing embodiments, this embodiment provides a neural network model training apparatus 700. Referring to Fig. 7a, the apparatus 700 includes a first acquisition unit 701, a second acquisition unit 702 and a training unit 703:
the first acquisition unit 701 is configured to obtain an artificial sample set generated by a learning object according to user operations;
the second acquisition unit 702 is configured to obtain a machine sample set obtained by autonomous learning, within the learning object, of a neural network model for the learning object;
the training unit 703 is configured to train the neural network model according to the artificial sample set and the machine sample set.
In one implementation, referring to Fig. 7b, the apparatus 700 further includes a pre-training unit 704:
the pre-training unit 704 is configured to pre-train the neural network model according to the artificial sample set;
the training unit 703 is specifically configured to:
train the neural network model that has completed pre-training according to the artificial sample set and the machine sample set.
In one implementation, when the artificial sample set has been fully used for training or the preset progress of the learning object has been completed by the neural network model, the pre-training unit 704 determines that the neural network model has completed pre-training.
In one implementation, the artificial sample set includes the correspondence among the action implemented in the learning object through a user operation, the environmental parameter of the learning object when the action is implemented, and the feedback parameter of the learning object after the action is implemented;
the machine sample set includes the correspondence among the action implemented in the learning object through the neural network model, the environmental parameter of the learning object when the action is implemented, and the feedback parameter of the learning object after the action is implemented.
In one implementation, the training unit 703 is specifically configured to:
select samples from the artificial sample set and the machine sample set respectively to obtain a training sample set used for training the neural network model, where the quantity of artificial samples and the quantity of machine samples in the training sample set satisfy a preset ratio;
train the neural network model according to the training sample set.
In one implementation, the preset ratio between the quantity of artificial samples and the quantity of machine samples in the training sample set used for the N-th round of training of the neural network model is determined according to the training result of the (N-1)-th round of training of the neural network model.
In one implementation, the proportion of artificial samples in the training sample set used for the N-th round of training of the neural network model is less than the proportion of artificial samples in the training sample set used for the (N-1)-th round of training of the neural network model.
In one implementation, the second acquisition unit 702 is specifically configured to:
during training of the neural network model, add the data obtained by autonomous learning of the neural network model in the learning object to the machine sample set as machine samples.
In one implementation, for any target sample in the artificial sample set or the machine sample set, the target sample includes the feedback parameter of the learning object after the target action is implemented, and the feedback parameter of the learning object after the target action is implemented includes the reward parameter obtained by implementing the target action and/or the environmental parameter obtained in the learning object as a result of implementing the target action.
An embodiment of the present application also provides a device for neural network model training, which is introduced below with reference to the drawings. Referring to Fig. 8, an embodiment of the present application provides a device 800 for neural network model training. The device 800 can be a server and may vary considerably depending on configuration or performance; it may include one or more central processing units (CPUs) 822 (for example, one or more processors), a memory 832, and one or more storage media 830 (for example one or more mass storage devices) storing application programs 842 or data 844. The memory 832 and the storage medium 830 may provide transient storage or persistent storage. The programs stored in the storage medium 830 may include one or more modules (not shown in the figure), and each module may include a series of instruction operations on the server. Further, the central processing unit 822 can be configured to communicate with the storage medium 830 and execute, on the device 800 for neural network model training, the series of instruction operations in the storage medium 830.
The device 800 for neural network model training may also include one or more power supplies 826, one or more wired or wireless network interfaces 850, one or more input/output interfaces 858, and/or one or more operating systems 841, such as Windows ServerTM, Mac OS XTM, UnixTM, LinuxTM, FreeBSDTM, etc.
The steps performed by the server in the above embodiments can be based on the server structure shown in Fig. 8.
The CPU 822 is configured to execute the following steps:
obtain an artificial sample set generated by a learning object according to user operations;
obtain a machine sample set obtained by autonomous learning, within the learning object, of a neural network model for the learning object;
train the neural network model according to the artificial sample set and the machine sample set.
Shown in Figure 9, the embodiment of the present application provides a kind of equipment 900 for neural network model training, should Equipment 900 can also be terminal device, the terminal device can be include mobile phone, tablet computer, personal digital assistant (Personal Digital Assistant, abbreviation PDA), point-of-sale terminal (Point of Sales, abbreviation POS), vehicle mounted electric Any terminal device such as brain, by taking terminal device is mobile phone as an example:
Fig. 9 shows the block diagram of the part-structure of mobile phone relevant to terminal device provided by the embodiments of the present application.Ginseng Fig. 9 is examined, mobile phone includes: radio frequency (Radio Frequency, abbreviation RF) circuit 910, memory 920, input unit 930, display Unit 940, sensor 950, voicefrequency circuit 960, Wireless Fidelity (wireless fidelity, abbreviation WiFi) module 970, place Manage the components such as device 980 and power supply 990.It will be understood by those skilled in the art that handset structure shown in Fig. 9 is not constituted Restriction to mobile phone may include perhaps combining certain components or different component cloth than illustrating more or fewer components It sets.
Each component of the mobile phone is described below with reference to Fig. 9:
The RF circuit 910 may be used to receive and send signals during information transmission and reception or during a call. In particular, after receiving downlink information from a base station, the RF circuit 910 delivers it to the processor 980 for processing, and in addition sends uplink data to the base station. Generally, the RF circuit 910 includes, but is not limited to, an antenna, at least one amplifier, a transceiver, a coupler, a low noise amplifier (LNA), a duplexer, and the like. The RF circuit 910 may also communicate with networks and other devices through wireless communication. The wireless communication may use any communication standard or protocol, including but not limited to Global System for Mobile Communications (GSM), General Packet Radio Service (GPRS), Code Division Multiple Access (CDMA), Wideband Code Division Multiple Access (WCDMA), Long Term Evolution (LTE), e-mail, Short Messaging Service (SMS), and the like.
The memory 920 may be used to store software programs and modules, and the processor 980 executes the various function applications and data processing of the mobile phone by running the software programs and modules stored in the memory 920. The memory 920 may mainly include a program storage area and a data storage area, where the program storage area may store an operating system, an application program required by at least one function (such as a sound playing function or an image playing function), and the like, and the data storage area may store data created according to the use of the mobile phone (such as audio data and a phone book). In addition, the memory 920 may include a high-speed random access memory, and may further include a non-volatile memory, for example, at least one magnetic disk storage device, a flash memory device, or other volatile solid-state storage devices.
The input unit 930 may be used to receive input numeric or character information and to generate key signal inputs related to user settings and function control of the mobile phone. Specifically, the input unit 930 may include a touch panel 931 and other input devices 932. The touch panel 931, also referred to as a touch screen, may collect touch operations performed by the user on or near it (for example, operations performed by the user on or near the touch panel 931 with a finger, a stylus, or any other suitable object or accessory) and drive a corresponding connection apparatus according to a preset program. Optionally, the touch panel 931 may include two parts: a touch detection apparatus and a touch controller. The touch detection apparatus detects the touch orientation of the user, detects the signal brought by the touch operation, and transmits the signal to the touch controller; the touch controller receives the touch information from the touch detection apparatus, converts it into touch point coordinates, sends the coordinates to the processor 980, and can receive and execute commands sent by the processor 980. The touch panel 931 may be implemented in multiple types, such as resistive, capacitive, infrared, and surface acoustic wave. In addition to the touch panel 931, the input unit 930 may further include other input devices 932, which may include, but are not limited to, one or more of a physical keyboard, function keys (such as volume control keys and a power switch key), a trackball, a mouse, and a joystick.
The display unit 940 may be used to display information input by the user or provided to the user, as well as various menus of the mobile phone. The display unit 940 may include a display panel 941, which may optionally be configured in the form of a liquid crystal display (LCD), an organic light-emitting diode (OLED), or the like. Further, the touch panel 931 may cover the display panel 941. After detecting a touch operation on or near it, the touch panel 931 transmits the operation to the processor 980 to determine the type of the touch event, and the processor 980 then provides a corresponding visual output on the display panel 941 according to the type of the touch event. Although in Fig. 9 the touch panel 931 and the display panel 941 are shown as two independent components implementing the input and output functions of the mobile phone, in some embodiments the touch panel 931 and the display panel 941 may be integrated to implement the input and output functions of the mobile phone.
The mobile phone may further include at least one sensor 950, such as an optical sensor, a motion sensor, and other sensors. Specifically, the optical sensor may include an ambient light sensor and a proximity sensor, where the ambient light sensor may adjust the brightness of the display panel 941 according to the brightness of ambient light, and the proximity sensor may turn off the display panel 941 and/or the backlight when the mobile phone is moved to the ear. As one kind of motion sensor, an accelerometer sensor can detect the magnitude of acceleration in each direction (generally on three axes), can detect the magnitude and direction of gravity when static, and can be used in applications that recognize the posture of the mobile phone (such as landscape/portrait switching, related games, and magnetometer pose calibration) and in vibration-recognition-related functions (such as a pedometer and tapping). Other sensors that may be configured on the mobile phone, such as a gyroscope, a barometer, a hygrometer, a thermometer, and an infrared sensor, are not described herein.
The audio circuit 960, a loudspeaker 961, and a microphone 962 may provide an audio interface between the user and the mobile phone. The audio circuit 960 may transmit an electrical signal converted from received audio data to the loudspeaker 961, and the loudspeaker 961 converts it into a sound signal for output. Conversely, the microphone 962 converts a collected sound signal into an electrical signal, which is received by the audio circuit 960 and converted into audio data; the audio data is then output to the processor 980 for processing and subsequently, for example, sent to another mobile phone through the RF circuit 910, or output to the memory 920 for further processing.
WiFi is a short-range wireless transmission technology. Through the WiFi module 970, the mobile phone can help the user send and receive e-mails, browse web pages, access streaming media, and the like, providing the user with wireless broadband Internet access. Although Fig. 9 shows the WiFi module 970, it can be understood that the module is not a necessary component of the mobile phone and may be omitted as needed without changing the essence of the invention.
The processor 980 is the control center of the mobile phone. It connects all parts of the entire mobile phone through various interfaces and lines, and performs the various functions of the mobile phone and processes data by running or executing the software programs and/or modules stored in the memory 920 and invoking the data stored in the memory 920, thereby monitoring the mobile phone as a whole. Optionally, the processor 980 may include one or more processing units. Preferably, the processor 980 may integrate an application processor and a modem processor, where the application processor mainly handles the operating system, user interfaces, application programs, and the like, and the modem processor mainly handles wireless communication. It can be understood that the modem processor may alternatively not be integrated into the processor 980.
The mobile phone further includes a power supply 990 (such as a battery) that supplies power to all components. Preferably, the power supply may be logically connected to the processor 980 through a power management system, so as to implement functions such as charging management, discharging management, and power consumption management through the power management system.
Although not shown, the mobile phone may further include a camera, a Bluetooth module, and the like; details are not described herein.
In this embodiment, the processor 980 included in the terminal device further has the following functions:
obtaining an artificial sample set generated for a learning object according to user operations;
obtaining a machine sample set obtained by a neural network model for the learning object through autonomous learning in the learning object;
training the neural network model according to the artificial sample set and the machine sample set.
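In some embodiments (see claims 5 to 7 below), the proportion of artificial samples in the training sample set used for the N-th round of training is smaller than the proportion used for the (N-1)-th round. A minimal illustrative sketch of such a ratio schedule is given below; the names (human_ratio_schedule, initial_ratio, final_ratio, decay_rounds) and the linear decay are hypothetical and only one possible choice.

```python
def human_ratio_schedule(round_index: int,
                         initial_ratio: float = 0.8,
                         final_ratio: float = 0.1,
                         decay_rounds: int = 50) -> float:
    """Illustrative sketch: proportion of artificial samples in the training
    sample set used for the given training round, decreasing linearly from
    initial_ratio toward final_ratio so that later rounds rely more heavily on
    machine samples obtained through autonomous learning."""
    if round_index >= decay_rounds:
        return final_ratio
    step = (initial_ratio - final_ratio) / decay_rounds
    return initial_ratio - step * round_index
```

Other embodiments may instead derive the ratio for the N-th round from the training result of the (N-1)-th round, for example from its loss value or from the progress achieved in the learning object, as recited in claim 6.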
An embodiment of the present application further provides a computer-readable storage medium. The computer-readable storage medium is used for storing program code, and the program code is used for executing any one of the neural network model training methods described in the embodiments corresponding to Fig. 1 to Fig. 6.
The terms "first", "second", "third", "fourth", and the like (if any) in the description of the present application and in the above drawings are used to distinguish similar objects and are not necessarily used to describe a particular order or sequence. It should be understood that data used in this way are interchangeable under appropriate circumstances, so that the embodiments of the present application described herein can be implemented, for example, in an order other than those illustrated or described herein. In addition, the terms "include" and "have" and any variants thereof are intended to cover non-exclusive inclusion; for example, a process, method, system, product, or device that includes a series of steps or units is not necessarily limited to the steps or units expressly listed, but may include other steps or units that are not expressly listed or that are inherent to such a process, method, product, or device.
It should be understood that in this application, "at least one (item)" means one or more, and "multiple" means two or more. "And/or" describes an association relationship between associated objects and indicates that three relationships may exist; for example, "A and/or B" may indicate the three cases of: only A exists, only B exists, and both A and B exist, where A and B may be singular or plural. The character "/" generally indicates an "or" relationship between the associated objects. "At least one of the following" or a similar expression refers to any combination of these items, including any combination of single items or plural items. For example, "at least one of a, b, or c" may indicate: a, b, c, "a and b", "a and c", "b and c", or "a and b and c", where each of a, b, and c may be singular or plural.
In the several embodiments provided in this application, it should be understood that the disclosed system, apparatus, and method may be implemented in other ways. For example, the apparatus embodiments described above are merely exemplary. For example, the division into units is merely a logical function division, and there may be other division manners in actual implementation; for example, multiple units or components may be combined or integrated into another system, or some features may be ignored or not executed. In addition, the mutual couplings, direct couplings, or communication connections shown or discussed may be implemented through some interfaces, and the indirect couplings or communication connections between apparatuses or units may be electrical, mechanical, or in other forms.
The units described as separate components may or may not be physically separated, and the components shown as units may or may not be physical units; they may be located in one place or distributed over multiple network units. Some or all of the units may be selected according to actual needs to achieve the objectives of the solutions of the embodiments.
In addition, the functional units in the embodiments of the present application may be integrated into one processing unit, or each unit may exist alone physically, or two or more units may be integrated into one unit. The integrated unit may be implemented in the form of hardware or in the form of a software functional unit.
If the integrated unit is implemented in the form of a software functional unit and sold or used as an independent product, it may be stored in a computer-readable storage medium. Based on this understanding, the technical solutions of the present application, in essence, or the part contributing to the prior art, or all or part of the technical solutions, may be embodied in the form of a software product. The computer software product is stored in a storage medium and includes several instructions for causing a computer device (which may be a personal computer, a server, a network device, or the like) to perform all or some of the steps of the methods described in the embodiments of the present application. The aforementioned storage medium includes various media capable of storing program code, such as a USB flash drive, a removable hard disk, a read-only memory (ROM), a random access memory (RAM), a magnetic disk, or an optical disc.
The above embodiments are merely intended to illustrate the technical solutions of the present application, not to limit them. Although the present application has been described in detail with reference to the foregoing embodiments, those of ordinary skill in the art should understand that they may still modify the technical solutions described in the foregoing embodiments or replace some of the technical features with equivalents, and that such modifications or replacements do not cause the essence of the corresponding technical solutions to depart from the spirit and scope of the technical solutions of the embodiments of the present application.

Claims (14)

1. A neural network model training method, characterized in that the method includes:
obtaining an artificial sample set generated for a learning object according to user operations;
obtaining a machine sample set obtained by a neural network model for the learning object through autonomous learning in the learning object; and
training the neural network model according to the artificial sample set and the machine sample set.
2. The method according to claim 1, characterized in that before the training of the neural network model according to the artificial sample set and the machine sample set, the method further includes:
performing pre-training on the neural network model according to the artificial sample set;
and the training of the neural network model according to the artificial sample set and the machine sample set includes:
training, according to the artificial sample set and the machine sample set, the neural network model that has completed the pre-training.
3. The method according to claim 2, characterized in that when the artificial sample set has been fully used for training, or when a preset progress of the learning object has been completed by the neural network model, it is determined that the neural network model has completed the pre-training.
4. The method according to any one of claims 1 to 3, characterized in that the artificial sample set includes correspondences among three items: an action performed in the learning object through a user operation, an environment parameter of the learning object when the action is performed, and a feedback parameter of the learning object after the action is performed; and
the machine sample set includes correspondences among three items: an action performed in the learning object by the neural network model, an environment parameter of the learning object when the action is performed, and a feedback parameter of the learning object after the action is performed.
5. The method according to claim 1, characterized in that the step of training the neural network model according to the artificial sample set and the machine sample set includes:
selecting samples from the artificial sample set and the machine sample set respectively to obtain a training sample set for training the neural network model, wherein the quantity of artificial samples and the quantity of machine samples in the training sample set satisfy a preset ratio; and training the neural network model according to the training sample set.
6. The method according to claim 5, characterized in that the preset ratio between the quantity of artificial samples and the quantity of machine samples in the training sample set used for the N-th training of the neural network model is determined according to a training result of the (N-1)-th training of the neural network model.
7. The method according to claim 5, characterized in that the proportion of artificial samples in the training sample set used for the N-th training of the neural network model is smaller than the proportion of artificial samples in the training sample set used for the (N-1)-th training of the neural network model.
8. The method according to claim 1, characterized in that the step of obtaining the machine sample set obtained by the neural network model for the learning object through autonomous learning in the learning object includes:
in the process of training the neural network model, adding, to the machine sample set as machine samples, data obtained through autonomous learning in the learning object according to the neural network model.
9. The method according to claim 4, characterized in that for any target sample in the artificial sample set or the machine sample set, the target sample includes a feedback parameter of the learning object after a target action is performed, and the feedback parameter of the learning object after the target action is performed includes a reward parameter obtained by performing the target action and/or an environment parameter obtained from the learning object as a result of performing the target action.
10. A neural network model training apparatus, characterized in that the apparatus includes a first acquisition unit, a second acquisition unit, and a training unit:
the first acquisition unit is configured to obtain an artificial sample set generated for a learning object according to user operations; the second acquisition unit is configured to obtain a machine sample set obtained by a neural network model for the learning object through autonomous learning in the learning object; and
the training unit is configured to train the neural network model according to the artificial sample set and the machine sample set.
11. The apparatus according to claim 10, characterized in that the apparatus further includes a pre-training unit:
the pre-training unit is configured to perform pre-training on the neural network model according to the artificial sample set; and
the training unit is specifically configured to:
train, according to the artificial sample set and the machine sample set, the neural network model that has completed the pre-training.
12. The apparatus according to claim 11, characterized in that when the artificial sample set has been fully used for training, or when a preset progress of the learning object has been completed by the neural network model, it is determined that the neural network model has completed the pre-training.
13. A device for neural network model training, characterized in that the device includes a processor and a memory:
the memory is configured to store program code and to transmit the program code to the processor; and
the processor is configured to execute, according to instructions in the program code, the neural network model training method according to any one of claims 1 to 9.
14. A computer-readable storage medium, characterized in that the computer-readable storage medium is configured to store program code, and the program code is used to perform the neural network model training method according to any one of claims 1 to 9.
CN201811645093.8A 2018-12-29 2018-12-29 Neural network model training method, device, equipment and medium Active CN109740738B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201811645093.8A CN109740738B (en) 2018-12-29 2018-12-29 Neural network model training method, device, equipment and medium

Publications (2)

Publication Number Publication Date
CN109740738A true CN109740738A (en) 2019-05-10
CN109740738B CN109740738B (en) 2022-12-16

Family

ID=66362719

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201811645093.8A Active CN109740738B (en) 2018-12-29 2018-12-29 Neural network model training method, device, equipment and medium

Country Status (1)

Country Link
CN (1) CN109740738B (en)

Patent Citations (18)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20100063948A1 (en) * 2008-09-10 2010-03-11 Digital Infuzion, Inc. Machine learning methods and systems for identifying patterns in data
CN102637172A (en) * 2011-02-10 2012-08-15 北京百度网讯科技有限公司 Webpage blocking marking method and system
US20120288186A1 (en) * 2011-05-12 2012-11-15 Microsoft Corporation Synthesizing training samples for object recognition
CN104966097A (en) * 2015-06-12 2015-10-07 成都数联铭品科技有限公司 Complex character recognition method based on deep learning
CN107729908A (en) * 2016-08-10 2018-02-23 阿里巴巴集团控股有限公司 A kind of method for building up, the apparatus and system of machine learning classification model
CN108205802A (en) * 2016-12-23 2018-06-26 北京市商汤科技开发有限公司 Deep neural network model training, image processing method and device and equipment
CN107004141A (en) * 2017-03-03 2017-08-01 香港应用科技研究院有限公司 To the efficient mark of large sample group
US20180253660A1 (en) * 2017-03-03 2018-09-06 Hong Kong Applied Science and Technology Research Institute Company Limited Efficient Annotation of Large Sample Group
CN107688856A (en) * 2017-07-24 2018-02-13 清华大学 Indoor Robot scene active identification method based on deeply study
CN107622276A (en) * 2017-08-21 2018-01-23 北京精密机电控制设备研究所 A kind of deep learning training method combined based on robot simulation and physics sampling
CN107729854A (en) * 2017-10-25 2018-02-23 南京阿凡达机器人科技有限公司 A kind of gesture identification method of robot, system and robot
CN108021931A (en) * 2017-11-20 2018-05-11 阿里巴巴集团控股有限公司 A kind of data sample label processing method and device
CN108073902A (en) * 2017-12-19 2018-05-25 深圳先进技术研究院 Video summary method, apparatus and terminal device based on deep learning
CN108171280A (en) * 2018-01-31 2018-06-15 国信优易数据有限公司 A kind of grader construction method and the method for prediction classification
CN108491389A (en) * 2018-03-23 2018-09-04 杭州朗和科技有限公司 Click bait title language material identification model training method and device
CN108563204A (en) * 2018-04-11 2018-09-21 北京木业邦科技有限公司 Control method, device, electronic equipment and computer readable storage medium
CN108564030A (en) * 2018-04-12 2018-09-21 广州飒特红外股份有限公司 Classifier training method and apparatus towards vehicle-mounted thermal imaging pedestrian detection
CN109034397A (en) * 2018-08-10 2018-12-18 腾讯科技(深圳)有限公司 Model training method, device, computer equipment and storage medium

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
YANG TIDONG et al.: "Online Service Reputation Measurement Based on Multi-dimensional Evaluation Information", Journal of Chinese Computer Systems *
CHEN JIANPING et al.: "Enhanced Deep Deterministic Policy Gradient Algorithm", Journal on Communications *

Cited By (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114556377A (en) * 2019-10-15 2022-05-27 尤帕斯公司 Reinforcement learning in robotic process automation
US11775860B2 (en) 2019-10-15 2023-10-03 UiPath, Inc. Reinforcement learning in robotic process automation
CN111142378A (en) * 2020-01-07 2020-05-12 四川省桑瑞光辉标识系统股份有限公司 Neural network optimization method of biped robot neural network controller
CN111258909A (en) * 2020-02-07 2020-06-09 中国信息安全测评中心 Test sample generation method and device
CN111258909B (en) * 2020-02-07 2024-03-15 中国信息安全测评中心 Test sample generation method and device
CN113010653A (en) * 2021-03-16 2021-06-22 支付宝(杭州)信息技术有限公司 Method and system for training and conversing conversation strategy model
CN117664117A (en) * 2024-01-31 2024-03-08 西安晟昕科技股份有限公司 Drift data analysis and optimization compensation method for fiber optic gyroscope
CN117664117B (en) * 2024-01-31 2024-04-23 西安晟昕科技股份有限公司 Drift data analysis and optimization compensation method for fiber optic gyroscope

Also Published As

Publication number Publication date
CN109740738B (en) 2022-12-16

Similar Documents

Publication Publication Date Title
CN109740738A (en) A kind of neural network model training method, device, equipment and medium
CN109107161B (en) Game object control method, device, medium and equipment
CN107273011A (en) Application program fast switch over method and mobile terminal
CN110348524B (en) Human body key point detection method and device, electronic equipment and storage medium
CN113018848B (en) Game picture display method, related device, equipment and storage medium
CN107005585B (en) Method and system for event mode guided mobile content services
CN107249074A (en) Application program quick start method, mobile terminal and computer-readable recording medium
CN107885346A (en) A kind of candidate's words recommending method, terminal and computer-readable recording medium
CN107817897A (en) A kind of information intelligent display methods and mobile terminal
CN105894555A (en) Method and device for simulating body motions of animation model
CN105117008B (en) Guiding method of operating and device, electronic equipment
CN109550249A (en) A kind of control method and relevant apparatus of target object
CN106502425A (en) A kind of construction method of virtual reality scenario and device
CN109145809A (en) A kind of note spectrum processing method and device and computer readable storage medium
CN107219972A (en) A kind of method of application management, equipment and computer-readable recording medium
CN107493426A (en) A kind of information collecting method, equipment and computer-readable recording medium
CN110124307A (en) Method of controlling operation thereof and device, storage medium and electronic device
CN108881635A (en) Screen luminance adjustment method, mobile terminal and computer readable storage medium
CN114998491B (en) Digital human driving method, device, equipment and storage medium
CN107682035A (en) Antenna switching method, multi-antenna terminal and computer-readable recording medium
CN107423238A (en) A kind of method, apparatus and computer-readable recording medium of screen prjection connection
CN110162603A (en) A kind of Intelligent dialogue method, dynamic storage method and device
CN109409235A (en) Image-recognizing method and device, electronic equipment, computer readable storage medium
CN111598169A (en) Model training method, game testing method, simulation operation method and simulation operation device
CN110211572A (en) Dialog control method and device based on intensified learning

Legal Events

Date Code Title Description
PB01 Publication
PB01 Publication
SE01 Entry into force of request for substantive examination
SE01 Entry into force of request for substantive examination
GR01 Patent grant
GR01 Patent grant