CN109550252A

CN109550252A - Game AI training method, apparatus and system

Info

Publication number: CN109550252A
Application number: CN201811323771.9A
Authority: CN
Inventors: Xu Bo (徐波)
Original assignee: GUANGDONG LIWEI NETWORK TECHNOLOGY CO LTD; Multi Benefit Network Co Ltd; Guangzhou Duoyi Network Co Ltd
Current assignee: GUANGDONG LIWEI NETWORK TECHNOLOGY CO LTD; Multi Benefit Network Co Ltd; Guangzhou Duoyi Network Co Ltd
Priority date: 2018-11-07
Filing date: 2018-11-07
Publication date: 2019-04-02

Abstract

The invention discloses a game AI training method, apparatus and system. The method comprises: while a game process is running, transmitting the collected current status data to a server side; controlling the execution of a delay action; obtaining the decision data corresponding to the current status data and terminating the execution of the delay action; controlling, according to the decision data, the game AI role being trained to execute a decision action, and generating reward data and succeeding state data; and organizing the current status data, the decision data, the reward data and the succeeding state data into a training sample and transmitting it to the server side, so that the server side trains a training network on the training samples and updates a decision network from the data of the training network, until the network converges. Because the delay action is executed while the client waits for the server side to return decision data, the game keeps running, so game AI training can proceed normally even for games that have no built-in pause function.

Description

Game AI training method, apparatus and system

Technical field

The present invention relates to the technical field of game artificial intelligence, and in particular to a game AI training method, apparatus and system.

Background art

Artificial intelligence (AI) is the technology of enabling computers to simulate certain human thought processes (such as learning and planning) and to exhibit intelligent behavior. Game AI, as applied in the gaming field, mainly uses a series of algorithms to make non-player characters produce reactive, adaptive or intelligent behavior. Good game AI can respond appropriately to changes in its environment, making games richer and more varied.

To let game AI respond to its environment spontaneously, game AI is currently trained with reinforcement learning: through the program's own exploration and imitation of human behavior, the AI learns to respond to the environment by itself, eliminating hand-designed game rules. At present, to keep sampling synchronized with training, the game must be paused during game AI training and resumed only after the game AI has made its decision; that is, the game and the game AI run alternately. The game runs and produces a batch of data, then pauses; the game AI takes the data and makes a decision while the game waits; and the game resumes once the decision is obtained. In this way sampling and training are kept synchronized.

However, many games cannot be paused during game AI training, for reasons such as the game's internal clock. If the game is not paused, sampling falls out of sync with game AI training, and game AI training cannot proceed normally.

Summary of the invention

Embodiments of the present invention provide a game AI training method, apparatus and system, to solve the technical problem in the prior art that game AI training cannot proceed normally for games without a pause function.

To solve the above technical problem, the present invention provides a game AI training method, applicable to a client, the method comprising:

When running a game process, transmitting the collected current status data to a server side, wherein the game process is allocated by the server side;

Controlling the execution of a delay action, wherein the delay action is an action executed by the game AI role being trained while waiting for the server side to return the decision data corresponding to the current status data;

Obtaining the decision data corresponding to the current status data, and terminating the execution of the delay action;

Controlling, according to the decision data corresponding to the current status data, the game AI role being trained to execute a decision action, and generating reward data and succeeding state data;

Organizing the current status data, the decision data, the reward data and the succeeding state data into a training sample, and transmitting the training sample to the server side, so that the server side trains its training network on the training samples and updates its decision network from the data of the training network, until the network converges.

Further, the game process being allocated by the server side comprises:

The server side divides, in advance, the entire game data to be trained into several portions of game data and allocates the portions to corresponding clients, wherein each portion of game data includes at least one game process to be run.

Further, obtaining the decision data corresponding to the current status data comprises:

Detecting the execution time of the delay action;

When the execution time of the delay action exceeds a threshold, randomly selecting corresponding data from the decision data previously returned by the server side as the decision data corresponding to the current status data.

Further, there are several game processes; then, transmitting the collected current status data to the server side when running a game process specifically comprises:

When running the game processes, collecting the current status data corresponding to each game process;

After the current status data of every game process has been collected, transmitting the current status data of all game processes together to the server side as one batch of current status data.

Further, obtaining the decision data corresponding to the current status data is specifically:

Uniformly obtaining the decision data corresponding to all of the current status data in the same batch.

Further, data association information is added to the current status data of each game process according to the order of collection, so that each piece of decision data is associated with its corresponding current status data, wherein each piece of decision data carries the same data association information as its corresponding current status data.

The present invention also provides a game AI training device, comprising:

A first status data module, configured to transmit the collected current status data to the server side when running a game process, wherein the game process is allocated by the server side;

A delay action execution module, configured to control the execution of a delay action, wherein the delay action is an action executed by the game AI role being trained while waiting for the server side to return the decision data corresponding to the current status data;

A first decision data module, configured to obtain the decision data corresponding to the current status data and terminate the execution of the delay action;

A decision action execution module, configured to control, according to the decision data corresponding to the current status data, the game AI role being trained to execute a decision action, and to generate reward data and succeeding state data;

A first training sample module, configured to organize the current status data, the decision data, the reward data and the succeeding state data into a training sample and transmit the training sample to the server side, so that the server side trains its training network on the training samples and updates its decision network from the data of the training network, until the network converges.

The present invention also provides a game AI training method, applicable to a server side, the method comprising:

Receiving current status data sent by a client, wherein the current status data is generated by a game process run by the client;

Generating corresponding decision data according to the current status data and feeding it back to the client, wherein the decision data is generated while the client controls the game AI role being trained to execute a delay action;

Receiving a training sample sent by the client, wherein the training sample is organized by the client from the decision data, the current status data, and the reward data and succeeding state data generated after the decision action is executed according to the decision data;

Training the training network according to the training samples and updating the decision network from the data of the training network, until the network converges.

Further, before receiving the current status data sent by the client, the method further comprises:

Dividing the entire game data to be trained into several portions of game data and allocating the portions to corresponding clients, wherein each portion of game data includes at least one game process to be run.

Further, training the training network according to the training samples and updating the decision network from the data of the training network specifically comprises:

Randomly selecting several training samples, inputting them into the training network, and training the game AI using stochastic gradient descent;

Counting the number of times the training network has been trained on the selected training samples;

When the number of training iterations exceeds a preset number, updating the data of the decision network with the data of the training network.

Further, after generating the corresponding decision data according to the current status data and feeding it back to the client, the method further comprises:

Deleting the current status data corresponding to the decision data that has been fed back to the client.

The present invention also provides a game AI training device, comprising:

A second status data module, configured to receive the current status data sent by the client, wherein the current status data is generated by a game process run by the client;

A second decision data module, configured to generate corresponding decision data according to the current status data and feed it back to the client, wherein the decision data is generated while the client controls the game AI role being trained to execute a delay action;

A second training sample module, configured to receive the training sample sent by the client, wherein the training sample is organized by the client from the decision data, the current status data, and the reward data and succeeding state data generated after the decision action is executed according to the decision data;

A training module, configured to train the training network according to the training samples and update the decision network from the data of the training network, until the network converges.

The present invention also provides a game AI training system, comprising a server side and N clients, N ≥ 1, wherein each client executes the following steps:

When running a game process, transmitting the collected current status data to the server side, wherein the game process is allocated by the server side;

Controlling the execution of a delay action, wherein the delay action is an action executed by the game AI role being trained while waiting for the server side to return the decision data corresponding to the current status data;

Organizing the current status data, the decision data, the reward data and the succeeding state data into a training sample, and transmitting the training sample to the server side, so that the server side trains its training network on the training samples and updates its decision network from the data of the training network, until the network converges;

After dividing the entire game data to be trained into several portions of game data and allocating the portions to corresponding clients, the server side executes the following steps:

Receiving the current status data sent by the client, wherein the current status data is generated by the game process run by the client;

Receiving the training sample sent by the client, wherein the training sample is organized by the client from the decision data, the current status data, and the reward data and succeeding state data generated after the decision action is executed according to the decision data;

With the game AI training method, apparatus and system provided above, the client controls the game AI role being trained to execute a delay action while it waits for the server side to return decision data, so the client's game keeps running and does not need to be paused to wait for the server side's decision data. As a result, game AI training can proceed normally even for games without a built-in pause function.

Brief description of the drawings

Fig. 1 is a schematic structural diagram of a game AI training platform provided by an embodiment of the present invention;

Fig. 2 is a flow chart of a game AI training method provided by Embodiment 1 of the present invention;

Fig. 3 is a flow chart of one implementation of step S100 in the embodiment shown in Fig. 2;

Fig. 4 is a flow chart of one implementation of step S300 in the embodiment shown in Fig. 2;

Fig. 5 is a schematic structural diagram of the deep neural network in an embodiment of the present invention;

Fig. 6 is a schematic structural diagram of a game AI training device provided by Embodiment 2 of the present invention;

Fig. 7 is a flow chart of a game AI training method provided by Embodiment 3 of the present invention;

Fig. 8 is a flow chart of one implementation of step S704 in the embodiment shown in Fig. 7;

Fig. 9 is a schematic structural diagram of a game AI training device provided by Embodiment 4 of the present invention.

Detailed description of the embodiments

The technical solutions in the embodiments of the present invention are described clearly and completely below with reference to the accompanying drawings. The described embodiments are obviously only some, not all, of the embodiments of the present invention. All other embodiments obtained by a person of ordinary skill in the art based on these embodiments without creative effort fall within the protection scope of the present invention.

Embodiment 1

The game AI training method provided in this embodiment is executed by a game AI training platform during the game development stage; the platform collects game running data in order to train and customize the game AI. Specifically, the method is executed by the part of the platform that runs the game. For example, the game AI training platform may be divided into a client that samples game data and a server side that trains the game AI (see Fig. 1, a schematic structural diagram of the game AI training platform provided by an embodiment of the present invention). The game AI training method of this embodiment is then executed by the client, and the embodiment is described taking a platform divided into a client and a server side as an example.

Referring to Fig. 2, a flow chart of the game AI training method provided by Embodiment 1 of the present invention, the present invention provides a game AI training method applicable to a client, the method comprising:

S100: When running a game process, transmit the collected current status data to the server side, wherein the game process is allocated by the server side.

The current status data represents the current state of several game roles in the current game. As the game runs, the states of the game roles may change, and the current status data represents each role's state as of its most recent action. It may include, but is not limited to, status data of the current game such as the current running state and current action state of several roles. For example, if a game contains role A, role B and role C, their current states may be attacking, escaping, and waiting at a specific position, respectively. In general, the current states of the game roles reflect the surroundings of the game AI role within the game.

Specifically, the client runs the game process allocated by the server side, which generates raw running data. After preprocessing and recognition, the raw running data is converted into the current status data used for decision-making, and the current status data is transmitted to the server side over a connection established with it.

Optionally, the game processes run by the client may be all of the game processes in the game data allocated to that client by the server side, or only the game processes of part of that game data.

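Preferably, the game process being allocated by the server side comprises: the server side divides, in advance, the entire game data to be trained into several portions of game data and allocates the portions to corresponding clients, each portion including at least one game process to be run.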

Specifically, the entire game data is divided into several portions and allocated to corresponding clients, and multiple clients run their game data in parallel. This speeds up data generation and realizes distributed running of the game data, giving high sampling efficiency and a shorter game development cycle; data generation can be accelerated even for games without a built-in speed-up function. At the same time, the current status data from several clients is sent to the server side for processing, making full use of the parallel processing capability of the server side's GPU (graphics processing unit) and reducing development cost.
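As a rough illustration of the allocation step, the sketch below splits a list of game-process identifiers across clients round-robin. The round-robin policy, the function name `allocate_processes` and the string identifiers are assumptions for illustration; the text only requires that each client receive a portion containing at least one game process.

```python
from typing import Dict, List


def allocate_processes(process_ids: List[str], client_ids: List[str]) -> Dict[str, List[str]]:
    """Split the game processes to be trained into one portion per client (round-robin)."""
    if not client_ids:
        raise ValueError("at least one client is required")
    if len(process_ids) < len(client_ids):
        # each portion must contain at least one game process
        raise ValueError("fewer game processes than clients")
    allocation: Dict[str, List[str]] = {cid: [] for cid in client_ids}
    for i, pid in enumerate(process_ids):
        allocation[client_ids[i % len(client_ids)]].append(pid)
    return allocation


print(allocate_processes(["p0", "p1", "p2", "p3", "p4"], ["c0", "c1"]))
# {'c0': ['p0', 'p2', 'p4'], 'c1': ['p1', 'p3']}
```

Round-robin keeps portion sizes within one process of each other; a real platform might instead weight portions by each host's capacity.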

Referring to Fig. 3, a flow chart of one implementation of step S100 in the embodiment shown in Fig. 2: preferably, there are several game processes; then, in step S100, transmitting the collected current status data to the server side when running a game process specifically comprises:

S101: When running the game processes, collect the current status data corresponding to each game process;

S102: After the current status data of every game process has been collected, transmit the current status data of all game processes together to the server side as one batch of current status data.

Specifically, the client runs several game processes, each of which generates its own current status data; once the current status data of every game process has been collected, it is sent to the server side as a single batch. This reduces the number of transmissions. Moreover, a game environment usually contains multiple game roles, each of which may be controlled by a different game process; sending the current status data of all game processes together gives the server side a more complete view of the whole game environment, so the returned decision data better fits the current game situation.
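The batching in S101 and S102 can be sketched as a small collector that holds one state per game process and releases them only when every process has reported. The class name `StateBatcher` and the dictionary-shaped state records are hypothetical; the sketch only shows the "wait until all processes are collected, then send once" logic.

```python
from typing import Dict, List, Optional


class StateBatcher:
    """Holds one current-state record per game process until every process has reported."""

    def __init__(self, process_ids: List[str]):
        self.expected = set(process_ids)
        self.pending: Dict[str, dict] = {}

    def add(self, process_id: str, state: dict) -> Optional[List[dict]]:
        if process_id not in self.expected:
            raise KeyError(f"unknown game process: {process_id}")
        self.pending[process_id] = state  # a newer state from the same process replaces the older one
        if set(self.pending) == self.expected:
            # all processes collected: release one batch (in first-report order) and start a new round
            batch = list(self.pending.values())
            self.pending = {}
            return batch
        return None  # still waiting for other processes; nothing is sent yet


batcher = StateBatcher(["p0", "p1"])
first = batcher.add("p1", {"role": "B", "status": "escaping"})   # None: p0 has not reported
batch = batcher.add("p0", {"role": "A", "status": "attacking"})  # both reported: one batch
```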

Preferably, data association information is added to the current status data of each game process according to the order of collection, so that each piece of decision data is associated with its corresponding current status data, wherein each piece of decision data carries the same data association information as its corresponding current status data.

The data association information links each piece of decision data to its corresponding current status data. Because the association information carried by a piece of decision data is identical to that of the corresponding current status data, the client can retrieve the decision data that matches each current status data.

Optionally, the data association information is SID information, which may include the process ID of the game process that produced the current status data and the client's current timestamp at the time the current status data was collected.

Optionally, when multiple clients run game data in a distributed manner, corresponding client information may be added to the data association information; the client information may be the client's MAC address.
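A minimal sketch of the association information, assuming an SID made of the client MAC address, the game-process ID and the collection timestamp (the three fields named above); the `SID` class and `match_decisions` helper are hypothetical names. The server side is assumed to echo each SID back with its decision, so the client can pair them by equality regardless of reply order.

```python
import time
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple


@dataclass(frozen=True)
class SID:
    """Hypothetical association tag; frozen so it can be used as a dictionary key."""
    mac: str          # client information (MAC address) for distributed runs
    process_id: int   # ID of the game process that produced the state
    timestamp: float  # client time at which the state was collected


def make_sid(mac: str, process_id: int, timestamp: Optional[float] = None) -> SID:
    return SID(mac, process_id, time.time() if timestamp is None else timestamp)


def match_decisions(sent: Dict[SID, dict],
                    returned: List[Tuple[SID, int]]) -> Dict[SID, Tuple[dict, int]]:
    """Pair each returned decision with the sent state carrying an identical SID."""
    matched: Dict[SID, Tuple[dict, int]] = {}
    for sid, decision in returned:
        if sid in sent:  # replies whose SID is unknown (e.g. stale ones) are ignored
            matched[sid] = (sent[sid], decision)
    return matched


a = make_sid("00:1a:2b:3c:4d:5e", 101, 1.0)
b = make_sid("00:1a:2b:3c:4d:5e", 102, 1.0)
stale = make_sid("00:1a:2b:3c:4d:5e", 103, 0.5)
pairs = match_decisions({a: {"hp": 30}, b: {"hp": 80}}, [(b, 2), (stale, 1), (a, 0)])
```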

S200: Control the execution of a delay action, wherein the delay action is an action executed by the game AI role being trained while waiting for the server side to return the decision data corresponding to the current status data.

Decision data is data obtained by the server side through a series of computations, analyses and simulations on the current status data, and is used to control the game AI role to perform behavior that responds to the game environment it is in. The decision data is computed by the server side's decision network. Specifically, the decision network is a deep neural network that evaluates the value of executing each decision in each state, from which the decision data is computed and generated.

Specifically, after the client transmits the current status data to the server side, it takes some time for the server side to respond and return the corresponding decision data. While waiting for the server side to return the decision data, the client's game keeps running, and the game AI role being trained is controlled to execute the delay action. Thus the client's game does not need to be paused while waiting for the decision data; the game runs continuously, with the game AI role executing the delay action during the wait, so game AI training can proceed normally even for games without a pause function.

Optionally, the delay action may be the game AI role staying where it is, or some other random action.
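The wait in S200 can be sketched as a per-frame loop that polls for the server's reply without blocking and performs the delay action on every frame in which no decision has arrived. The `poll` and `delay_action` callables stand in for the game's networking and character-control code and are assumptions, not an engine API.

```python
from typing import Callable, List, Optional, Tuple


def run_until_decision(poll: Callable[[], Optional[int]],
                       delay_action: Callable[[], str],
                       max_frames: int) -> Tuple[Optional[int], List[str]]:
    """Advance the game frame by frame instead of pausing it while the decision is pending."""
    performed: List[str] = []
    for _ in range(max_frames):
        decision = poll()  # non-blocking check; None means no reply yet
        if decision is not None:
            return decision, performed  # reply arrived: the delay action ends here
        performed.append(delay_action())  # game keeps running; AI role performs the delay action
    return None, performed


# Usage with a stub that answers on the third frame:
replies = iter([None, None, 4])
decision, actions = run_until_decision(lambda: next(replies), lambda: "hold_position", 10)
```

The check is `is not None` rather than a truthiness test, so a decision encoded as `0` is still recognized as a reply.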

S300: Obtain the decision data corresponding to the current status data, and terminate the execution of the delay action.

Referring to Fig. 4, a flow chart of one implementation of step S300 in the embodiment shown in Fig. 2: preferably, obtaining the decision data corresponding to the current status data comprises:

S301: Detect the execution time of the delay action;

S302: When the execution time of the delay action exceeds a threshold, randomly select corresponding data from the decision data previously returned by the server side as the decision data corresponding to the current status data.

Specifically, the execution time of the delay action is monitored, and when it exceeds the threshold, corresponding data is randomly selected from the decision data previously returned by the server side and used as the decision data for the current status data. This prevents the situation where the server side's decision data returns too slowly (for example, because of an unstable network), leaving the game AI role stuck executing the delay action and lowering sampling efficiency. Randomly selecting from previously returned decision data further ensures that the game keeps running without being paused.

Optionally, the execution-time threshold may be set to 100 ms.
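The timeout rule of S301 and S302 can be expressed as a small decision function. The sketch below assumes the 100 ms threshold mentioned above and a list of decisions previously returned by the server side; the function name `resolve_decision` is hypothetical.

```python
import random
from typing import Optional, Sequence

THRESHOLD_S = 0.100  # 100 ms, the optional threshold given in the text


def resolve_decision(reply: Optional[int], elapsed_s: float,
                     returned: Sequence[int], rng: random.Random) -> Optional[int]:
    """Return the decision to act on, or None to keep executing the delay action."""
    if reply is not None:
        return reply  # normal case: the server side answered
    if elapsed_s > THRESHOLD_S and returned:
        # timeout: reuse a randomly chosen decision the server side returned earlier
        return rng.choice(returned)
    return None  # still within the threshold (or nothing to fall back on)


rng = random.Random(0)
fallback = resolve_decision(None, 0.25, [5, 6], rng)  # 250 ms without a reply
```

Passing an explicit `random.Random` keeps the fallback reproducible in tests.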

Preferably, obtaining the decision data corresponding to the current status data is specifically: uniformly obtaining the decision data corresponding to all of the current status data in the same batch.

Specifically, when there are several game processes, the client collects the current status data of every game process and sends it to the server side as one batch; accordingly, when obtaining decision data, the client obtains from the server side, in one go, the decision data corresponding to all of the current status data in that batch.

S400: Control, according to the decision data corresponding to the current status data, the game AI role being trained to execute a decision action, and generate reward data and succeeding state data.

The decision action is the action executed according to the decision data, as a response to the current state. For example, when role B is known to be escaping and the decision action is to pursue role B, the game AI role being trained executes that decision action, responding to B's escape by chasing it. The reward data is the result of executing the decision action and may be positive or negative; generally, positive reward data means the result of the action was "good", i.e. it helps reach the final goal. The reward data is determined by a pre-designed reward function. Through long-term training, the game AI role being trained learns a sequence of actions by interacting with its environment and observing the reward in each state. The succeeding state data is the status data after the decision action has been executed; it is also transmitted to the server side as the next current status data, so the client keeps sampling data continuously.

S500: Organize the current status data, the decision data, the reward data and the succeeding state data into a training sample, and transmit the training sample to the server side, so that the server side trains its training network on the training samples and updates its decision network from the data of the training network, until the network converges.

The current status data, decision data, reward data and succeeding state data in a training sample correspond to one another; the server side's training network and decision network are both deep neural networks.
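A sketch of how the four pieces of data can be organized into training samples, assuming discrete decisions and tuple-valued states (both assumptions). It also shows how the succeeding state of one step is the current state of the next, as described under S400; `TrainingSample` and `build_samples` are hypothetical names.

```python
from dataclasses import dataclass
from typing import List, Tuple

State = Tuple[float, ...]


@dataclass
class TrainingSample:
    state: State        # current status data
    decision: int       # decision data (index of the chosen decision)
    reward: float       # reward data from the reward function
    next_state: State   # succeeding state data


def build_samples(states: List[State], decisions: List[int],
                  rewards: List[float]) -> List[TrainingSample]:
    """Turn a run of states/decisions/rewards into samples; states[i + 1] is both
    the succeeding state of step i and the current state of step i + 1."""
    if not (len(states) == len(decisions) + 1 == len(rewards) + 1):
        raise ValueError("expected one more state than decisions and rewards")
    return [TrainingSample(states[i], decisions[i], rewards[i], states[i + 1])
            for i in range(len(decisions))]


samples = build_samples([(0.0,), (1.0,), (2.0,)], decisions=[1, 0], rewards=[0.5, -1.0])
```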

Optionally, the training network and the decision network may be initialized with identical data.

Referring to Fig. 5, a schematic structural diagram of the deep neural network in an embodiment of the present invention: optionally, the deep neural network may comprise 2 convolutional layers and 2 fully connected layers, namely a first convolutional layer 51, a second convolutional layer 52, a first fully connected layer 53 and a second fully connected layer 54. The convolutional layers perform feature extraction, and the fully connected layers act as a classifier that computes the distribution over the decision space. Each convolutional layer uses batch normalization and a ReLU activation function, and each fully connected layer uses a ReLU activation function.

Optionally, to balance the game AI's learning capacity against computational cost, the parameters of the convolutional and fully connected layers can be adjusted for the specific game. In general, the kernel of the first convolutional layer 51 can be relatively large, the stride is generally 1 to 4, and the number of channels is a power of 2 (2^n); more channels give the model greater learning capacity but also greater computational cost. Illustratively, the first convolutional layer 51 has a 6 × 6 kernel, 8 channels and a stride of 3; the second convolutional layer 52 has a 3 × 3 kernel, 8 channels and a stride of 2; the first fully connected layer 53 has 128 channels; and the number of channels of the second fully connected layer 54 equals the size of the decision space.
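To see how the example hyperparameters translate into layer sizes, the sketch below computes each layer's output shape and parameter count. The input size, the number of input channels, the absence of padding, and the inclusion of a convolution bias alongside the batch-normalization scale and shift are all assumptions; the text specifies only kernel sizes, strides and channel counts.

```python
from typing import List, Tuple

Layer = Tuple[str, Tuple[int, ...], int]  # (name, output shape, parameter count)


def conv_out(size: int, kernel: int, stride: int) -> int:
    # output size of a convolution with no padding (assumption: padding is not specified)
    return (size - kernel) // stride + 1


def describe_network(height: int, width: int, in_channels: int, num_decisions: int) -> List[Layer]:
    layers: List[Layer] = []
    h, w, c = height, width, in_channels
    # (name, kernel, stride, output channels) from the illustrative settings in the text
    for name, k, s, out_c in [("conv1", 6, 3, 8), ("conv2", 3, 2, 8)]:
        h, w = conv_out(h, k, s), conv_out(w, k, s)
        params = k * k * c * out_c + out_c + 2 * out_c  # weights + bias + batch-norm scale/shift
        c = out_c
        layers.append((name, (c, h, w), params))
    flat = c * h * w
    layers.append(("fc1", (128,), flat * 128 + 128))
    layers.append(("fc2", (num_decisions,), 128 * num_decisions + num_decisions))
    return layers


for layer in describe_network(84, 84, in_channels=4, num_decisions=6):
    print(layer)
# ('conv1', (8, 27, 27), 1176)
# ('conv2', (8, 13, 13), 600)
# ('fc1', (128,), 173184)
# ('fc2', (6,), 774)
```

For this assumed 84 × 84, 4-channel input with 6 decisions, the first fully connected layer holds the vast majority of the parameters, which suggests why the convolutional channel counts can stay small without dominating the cost.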

It should be noted that the training network is trained a preset number of times and the parameters of the decision network are updated periodically; the preset number is set according to the specific game data.

In a specific implementation, when the client runs a game process allocated by the server side, it transmits the collected current status data to the server side; while waiting for the server side to return the decision data corresponding to the current status data, it controls the execution of the delay action, so the game does not need to be paused and keeps running throughout the wait; it obtains the decision data corresponding to the current status data and terminates the delay action; it controls, according to the decision data, the game AI role being trained to execute the decision action, and generates reward data and succeeding state data; and it organizes the current status data, the decision data, the reward data and the succeeding state data into a training sample and transmits it to the server side, so that the server side trains its training network on the training samples and updates its decision network from the data of the training network until the network converges. In this way, training of the game AI is completed normally while the game runs continuously.

By implementing the game AI training method provided in this embodiment, the client controls the game AI role being trained to execute a delay action while waiting for the server side to return decision data, so the client's game keeps running and does not need to be paused to wait for the server side's decision data. As a result, game AI training can proceed normally even for games without a built-in pause function.

For ease of understanding, the implementation of the game AI training method provided in this embodiment is further described below:

Referring to Fig. 1, a schematic structural diagram of the game AI training platform provided by an embodiment of the present invention: a server side and a client are built on the game AI training platform, and the client may execute the game AI training method on one or more hosts. The client creates a first data exchange pool, a game management module and a training sample organization module; the server side creates a second data exchange pool, a decision network and a training network. The first and second data exchange pools exchange data over a network connection; the game management module runs the game data and generates running data, and the training sample organization module organizes the relevant data into training samples for training on the server side.

Further, client first exchange data pool with the second of server-side exchange data pool establish it is stateful Pond, decision pond and training pool；Decision networks and training network are deep neural network, contain function model.

The game data that client is distributed at game management module operation service end, the operation data of generation is located in advance Current status data is generated after reason, identification, and by the state pool of current status data deposit client, when the game data of operation It, can be after the current status data for having collected each game process, by working as each game process when containing several game process Preceding status data is transmitted to the state pool of server-side as a batch.

At this point, the client controls the game AI character being trained to execute the delay action.

Under normal circumstances, the client obtains the decision data corresponding to the current state data from the server's decision pool within a certain period of time and stores it in the client's decision pool. The server generates the decision data as follows: after its state pool receives the current state data, the server preprocesses that data and passes it to the server's decision network. The decision network then analyzes and computes on the current state data to produce the decision data. In abnormal circumstances, the client may not receive the corresponding decision data from the server within a time threshold. In that case, the client randomly selects previously obtained decision data stored in its decision pool and uses it as the decision data corresponding to the current state data.
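The waiting behaviour can be sketched as a polling loop. The client keeps executing the delay action until the decision arrives. Once the wait exceeds a threshold, it falls back to a randomly chosen decision already stored in its decision pool. The callbacks `poll_decision` and `perform_delay_action` are hypothetical stand-ins for the network transport and the in-game delay action, which the patent does not detail.

```python
import random
import time
from typing import Callable, List, Optional


def await_decision(
    poll_decision: Callable[[], Optional[int]],  # hypothetical: returns None until the server replies
    perform_delay_action: Callable[[], None],    # hypothetical: one step of the in-game delay action
    decision_pool: List[int],                    # decisions previously received by this client
    threshold_s: float,
    rng: random.Random,
) -> int:
    """Keep the game running with the delay action until a decision arrives.
    After threshold_s seconds, fall back to a random stored decision."""
    start = time.monotonic()
    while True:
        decision = poll_decision()
        if decision is not None:
            decision_pool.append(decision)  # keep it for possible later fallback
            return decision                 # returning ends the delay action
        if time.monotonic() - start > threshold_s:
            if not decision_pool:
                raise RuntimeError("no stored decision to fall back on")
            return rng.choice(decision_pool)
        perform_delay_action()              # game keeps running; no pause needed
```

In a real game loop the delay action would typically be driven per frame rather than in a blocking loop. The blocking form is used here only to keep the sketch short.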

After the client obtains the decision data, the delay action stops. The client then executes the decision action according to the decision data, which generates reward data and subsequent state data.

The client's training sample organization module retrieves the current state data from the client's state pool and the decision data from the client's decision pool. It organizes the current state data, the decision data, and the corresponding reward data and subsequent state data into a training sample, and stores that sample in the client's training pool.
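The following is a minimal sketch of the sample organization step. It assumes that the client's state pool and decision pool are dictionaries keyed by the same identifier and that a decision is an integer action index. Both are assumptions, not details from the patent.

```python
from dataclasses import dataclass
from typing import Dict, Hashable, List


@dataclass(frozen=True)
class TrainingSample:
    state: List[float]       # current state data
    decision: int            # decision data (assumed: an action index)
    reward: float            # reward data from executing the decision action
    next_state: List[float]  # subsequent state data


def organize_sample(
    state_pool: Dict[Hashable, List[float]],
    decision_pool: Dict[Hashable, int],
    key: Hashable,
    reward: float,
    next_state: List[float],
) -> TrainingSample:
    """Combine the state and decision stored under the same key with the
    reward and subsequent state observed after the decision action."""
    return TrainingSample(state_pool[key], decision_pool[key], reward, next_state)
```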

The training samples stored in the client's training pool are transmitted to the server's training pool for storage. The server's training pool passes randomly selected training samples to the training network. The training network trains the game AI character and periodically updates the decision network until the networks converge.

Embodiment two

Referring to Fig. 6, Fig. 6 is a schematic structural diagram of a game AI training device provided by Embodiment 2 of the present invention. Embodiment 2 of the present invention further provides a game AI training device, comprising:

A first state data module 11, configured to transmit the collected current state data to the server when a game process is running, wherein the game process is distributed by the server;

A delay action execution module 12, configured to control the execution of a delay action, wherein the delay action is the action that the game AI character being trained executes while waiting for the server to return the decision data corresponding to the current state data;

A first decision data module 13, configured to obtain the decision data corresponding to the current state data and to end the execution of the delay action;

A decision action execution module 14, configured to control the game AI character being trained to execute a decision action according to the decision data corresponding to the current state data, and to generate reward data and subsequent state data;

A first training sample module 15, configured to organize the current state data, the decision data, the reward data, and the subsequent state data into a training sample and to transmit the training sample to the server. The server then trains its training network on the training sample and updates its decision network according to the data of the training network until the networks converge.

Preferably, the first decision data module 13 further comprises:

A detection unit, configured to detect the execution time of the delay action;

A random decision data acquisition unit, configured to randomly select corresponding data from the decision data returned by the server as the decision data for the current state data when the execution time of the delay action exceeds a threshold.

Preferably, there are several game processes, and the first state data module 11 further comprises:

A current state data collection unit, configured to collect the current state data corresponding to each game process while the game processes are running;

A unified current state data transmission unit, configured to transmit the current state data of all game processes to the server together as one batch after the current state data of every game process has been collected.

Preferably, the first decision data module 13 further comprises:

A unified decision data acquisition unit, configured to obtain, in one operation, the decision data corresponding to all of the current state data in the same batch.

Preferably, the game AI training device further comprises:

A data association information writing unit, configured to add data association information to the current state data of each game process in the order of collection, so that each piece of decision data can be associated with its corresponding current state data. Each piece of decision data carries the same data association information as its corresponding current state data.
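One hypothetical way to implement the association information is a sequence number assigned in collection order. Decisions returned with the same number can then be paired with their states even if they arrive out of order. The function names and the integer-ID scheme below are illustrative assumptions.

```python
from typing import Dict, List, Sequence, Tuple


def tag_states(states: Sequence[List[float]], first_id: int = 0) -> List[Tuple[int, List[float]]]:
    """Attach association ids to states in the order they were collected."""
    return [(first_id + i, s) for i, s in enumerate(states)]


def match_decisions(
    tagged_states: Sequence[Tuple[int, List[float]]],
    tagged_decisions: Sequence[Tuple[int, int]],
) -> List[Tuple[List[float], int]]:
    """Pair each returned decision with the state that carries the same id,
    whatever order the decisions arrive in."""
    by_id: Dict[int, List[float]] = dict(tagged_states)
    return [(by_id[assoc_id], decision) for assoc_id, decision in tagged_decisions]
```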

In the technical solution of this embodiment, the modules work as follows:

- When the client runs a game process, the first state data module 11 transmits the collected current state data to the server. The game process was distributed by the server.
- The delay action execution module 12 controls the execution of the delay action. This is the action that the game AI character being trained executes while waiting for the server to return the decision data corresponding to the current state data.
- The first decision data module 13 obtains that decision data and ends the delay action.
- The decision action execution module 14 controls the game AI character being trained to execute a decision action according to the decision data, which generates reward data and subsequent state data.
- The first training sample module 15 organizes the current state data, decision data, reward data, and subsequent state data into a training sample and transmits it to the server. The server trains its training network on the sample and updates its decision network according to the data of the training network until the networks converge.

With this arrangement, the game AI character being trained executes a delay action while the client waits for the server's decision data. The client's game therefore keeps running instead of pausing, so game AI training can proceed normally even for games without a built-in pause function.

It should be noted that the game AI training device provided by Embodiment 2 of the present invention executes the steps of the game AI training method described in any implementation of Embodiment 1. The device and the method correspond in working principle and beneficial effects, so the details are not repeated here.

Those skilled in the art will understand that the schematic diagram is only an example of the game AI training device and does not limit it. The device may include more or fewer components than shown, combine certain components, or use different components. For example, the game AI training device may also include input/output devices, network access devices, buses, and so on.

Embodiment three

The game AI training method provided in this embodiment is executed by the server. In this embodiment, the game AI training platform is described as being divided into a client and a server. Referring to Fig. 7, Fig. 7 is a flowchart of a game AI training method provided by Embodiment 3 of the present invention.

Embodiment 3 of the present invention further provides a game AI training method, applicable to the server. The method comprises:

S701: Receive the current state data sent by a client, wherein the current state data is generated by the client running a game process;

S702: Generate corresponding decision data according to the current state data and feed it back to the client, wherein the decision data is generated while the client controls the game AI character being trained to execute the delay action;

Specifically, the server receives the current state data while the client controls the game AI character being trained to execute the delay action. The server then performs a series of processing steps on the current state data, such as analysis and computation, generates the corresponding decision data, and returns it to the corresponding client. The decision data is produced by the data processing of the server's decision network.
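The server-side step can be sketched as follows. The patent does not state how an action is chosen from the decision network's output, so this sketch assumes greedy selection of the highest-scoring action. `decision_net` and `preprocess` are placeholder callables. The association ID of each state is copied to its decision, in line with the optional association-information handling described for the server.

```python
from typing import Callable, List, Sequence, Tuple


def generate_decisions(
    batch: Sequence[Tuple[int, List[float]]],         # (association id, current state)
    decision_net: Callable[[List[float]], List[float]],  # placeholder: one score per action
    preprocess: Callable[[List[float]], List[float]] = lambda s: s,
) -> List[Tuple[int, int]]:
    """Preprocess each current state, score the actions with the decision
    network, and return the best action tagged with the state's id."""
    decisions = []
    for assoc_id, state in batch:
        scores = decision_net(preprocess(state))
        best = max(range(len(scores)), key=scores.__getitem__)  # greedy choice (assumed)
        decisions.append((assoc_id, best))
    return decisions
```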

S703: Receive the training sample sent by the client, wherein the client organizes the training sample from the decision data, the current state data, and the reward data and subsequent state data generated after the decision action is executed according to the decision data;

The current state data, decision data, reward data, and subsequent state data in a training sample all correspond to one another.

S704: Train the training network according to the training samples and update the decision network according to the data of the training network until the networks converge.

The server's training network and decision network are both deep neural networks.

Referring to Fig. 5, Fig. 5 is a schematic structural diagram of the deep neural network in an embodiment of the present invention. Optionally, the deep neural network may comprise 2 convolutional layers and 2 fully connected layers: a first convolutional layer 51, a second convolutional layer 52, a first fully connected layer 53, and a second fully connected layer 54. The convolutional layers perform feature extraction. The fully connected layers act as a classifier that computes the decision distribution over the decision space. Each convolutional layer uses batch normalization and a ReLU activation function, and each fully connected layer uses a ReLU activation function.
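To make the layer arrangement concrete, the sketch below propagates tensor shapes through two convolutional layers and two fully connected layers and counts trainable parameters. Each convolutional layer is followed by batch normalization and ReLU. The input size (1x84x84), kernel sizes, strides, channel counts, hidden width of 256, and 6 output actions are all hypothetical values chosen for illustration. The patent specifies only the layer types.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Conv:
    out_ch: int
    kernel: int
    stride: int


def describe_network(in_shape: Tuple[int, int, int], convs: List[Conv], fc_sizes: List[int]) -> List[str]:
    """Shape and parameter summary for conv(+BN+ReLU) x N followed by FC(+ReLU) x M."""
    c, h, w = in_shape
    lines: List[str] = []
    params = 0
    for i, layer in enumerate(convs, 1):
        h = (h - layer.kernel) // layer.stride + 1  # no padding assumed
        w = (w - layer.kernel) // layer.stride + 1
        params += layer.out_ch * c * layer.kernel ** 2 + layer.out_ch  # weights + biases
        params += 2 * layer.out_ch                                     # batch-norm scale and shift
        c = layer.out_ch
        lines.append(f"conv{i}+BN+ReLU -> {c}x{h}x{w}")
    n_in = c * h * w  # flatten before the fully connected layers
    for j, n_out in enumerate(fc_sizes, 1):
        params += n_in * n_out + n_out
        lines.append(f"fc{j}+ReLU -> {n_out}")
        n_in = n_out
    lines.append(f"trainable params: {params}")
    return lines
```

With these assumed values, `describe_network((1, 84, 84), [Conv(32, 8, 4), Conv(64, 4, 2)], [256, 6])` yields feature maps of 32x20x20 and 64x9x9 and about 1.36 million trainable parameters. Most of those parameters are in the first fully connected layer.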

It should be noted that the training network must be trained a preset number of times, and the parameters of the decision network are updated periodically.

Preferably, before receiving the current state data sent by the client, the method further comprises:

Specifically, the server establishes connections with several clients. Each client runs its corresponding game data and communicates with the server. The server may distribute identical game data to all clients, or it may give different clients different game data so that each handles a different task. Running the complete game data in this distributed manner speeds up data generation, improves sampling efficiency, and shortens the development cycle. Sending the current state data from several clients to the server for decision data generation also makes full use of the parallel processing capability of the server's graphics processing unit (GPU), which saves development cost.
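A simple way to split the complete game data across clients so that each receives at least one game process is a round-robin assignment, sketched below. Round-robin is an assumed policy. The patent allows identical or different game data per client without prescribing how the data is divided.

```python
from typing import List, Sequence


def distribute(processes: Sequence[str], n_clients: int) -> List[List[str]]:
    """Round-robin split of the game processes to be trained across n_clients,
    so that every client receives at least one process."""
    if not 1 <= n_clients <= len(processes):
        raise ValueError("need 1 <= n_clients <= number of game processes")
    parts: List[List[str]] = [[] for _ in range(n_clients)]
    for i, proc in enumerate(processes):
        parts[i % n_clients].append(proc)
    return parts
```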

Optionally, when the received current state data carries data association information, the server writes data association information into the generated decision data. Specifically, the data association information written into the decision data is identical to that of the corresponding current state data.

Referring to Fig. 8, Fig. 8 is a flowchart of one implementation of step S704 in the embodiment shown in Fig. 7. Preferably, training the training network according to the training samples and updating the decision network according to the data of the training network specifically comprises:

S801: Randomly select several training samples, input them into the training network, and train the game AI using stochastic gradient descent;

Specifically, the server stores the training samples obtained from the clients. When training the game AI, it randomly selects several training samples from the stored samples and trains on them using stochastic gradient descent. Because the set of training samples is large, stochastic gradient descent avoids feeding in all training samples at once. This reduces the demand on computing and memory resources and speeds up training.
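The following toy example illustrates minibatch stochastic gradient descent as described: each update uses only a small random subset of the stored samples rather than the whole set. A one-variable linear model with squared-error loss stands in for the training network. The model, learning rate, and batch size are illustrative assumptions.

```python
import random
from typing import List, Tuple


def sgd_fit(
    samples: List[Tuple[float, float]],
    batch_size: int,
    lr: float,
    steps: int,
    seed: int = 0,
) -> Tuple[float, float]:
    """Minibatch SGD on a toy model y = w*x + b with mean squared-error loss;
    each step uses only batch_size randomly drawn samples."""
    rng = random.Random(seed)
    w, b = 0.0, 0.0
    for _ in range(steps):
        batch = rng.sample(samples, batch_size)  # random subset, not the full set
        grad_w = grad_b = 0.0
        for x, y in batch:
            err = (w * x + b) - y
            grad_w += 2 * err * x / batch_size
            grad_b += 2 * err / batch_size
        w -= lr * grad_w
        b -= lr * grad_b
    return w, b
```

For example, fitting 41 noise-free points from y = 2x + 1 with batches of 8 recovers w close to 2 and b close to 1, even though no step ever sees all of the points.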

Optionally, when the number of training samples stored on the server exceeds a preset value, some training samples are deleted. Specifically, the server may be configured to delete samples in the order in which they were stored, starting with the earliest.
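The oldest-first deletion policy behaves like a bounded FIFO buffer. The sketch below uses Python's `collections.deque` with `maxlen`, which discards the earliest item when a new one is appended to a full deque. The class name and capacity are hypothetical.

```python
import random
from collections import deque
from typing import Any, List


class TrainingPool:
    """Hypothetical server-side training pool: holds at most `capacity`
    samples and drops the earliest-stored sample first when full."""

    def __init__(self, capacity: int, seed: int = 0) -> None:
        self.samples: deque = deque(maxlen=capacity)  # oldest item evicted on overflow
        self.rng = random.Random(seed)

    def add(self, sample: Any) -> None:
        self.samples.append(sample)

    def sample(self, k: int) -> List[Any]:
        """Random selection of k stored samples for one training step."""
        return self.rng.sample(list(self.samples), k)
```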

S802: Count the number of times the training network has been trained on the selected training samples;

S803: When the training count exceeds a preset number, update the data of the decision network according to the data of the training network.
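Steps S801 to S803 amount to counting training iterations and copying the training network's parameters into the decision network once a preset count is reached. The sketch below models parameters as a plain dictionary of floats purely for illustration. The text does not specify whether the comparison with the preset count is strict, and this sketch syncs when the count is reached.

```python
from typing import Dict


class NetworkPair:
    """Hypothetical training/decision network pair with periodic syncing."""

    def __init__(self, params: Dict[str, float], sync_every: int) -> None:
        self.train_params = dict(params)     # updated on every training step
        self.decision_params = dict(params)  # used to serve decisions
        self.sync_every = sync_every         # the "preset number" of trainings
        self.train_count = 0

    def train_step(self, grads: Dict[str, float], lr: float) -> bool:
        """One gradient update of the training network (S801). The step is
        counted (S802), and the decision network is synced when the preset
        count is reached (S803). Returns True if a sync happened."""
        for name, g in grads.items():
            self.train_params[name] -= lr * g
        self.train_count += 1
        if self.train_count >= self.sync_every:
            self.decision_params = dict(self.train_params)  # copy, not alias
            self.train_count = 0
            return True
        return False
```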

Preferably, after generating the corresponding decision data according to the current state data and feeding it back to the client, the method further comprises:

In this preferred embodiment, the current state data corresponding to decision data that has already been fed back to the client is deleted to avoid wasting memory. The decision data that has been fed back to the client may also be deleted.
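The memory cleanup can be sketched as deleting entries keyed by association ID once the corresponding decisions have been sent. The dictionary-based pools and the `drop_decisions` flag are assumptions. The flag reflects that deleting the decision data is described as an optional further step.

```python
from typing import Dict, Iterable, List


def release_fed_back(
    state_pool: Dict[int, List[float]],
    decision_pool: Dict[int, int],
    fed_back_ids: Iterable[int],
    drop_decisions: bool = True,
) -> None:
    """After decisions have been sent to the client, delete the matching
    current states (and optionally the decisions) to free server memory."""
    for assoc_id in fed_back_ids:
        state_pool.pop(assoc_id, None)
        if drop_decisions:
            decision_pool.pop(assoc_id, None)
```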

In a specific implementation, the server performs the following steps:

1. Receive the current state data sent by the client.
2. Generate the corresponding decision data according to the current state data and feed it back to the client. The decision data is generated while the client controls the game AI character being trained to execute the delay action.
3. Receive the training samples sent by the client.
4. Train the training network according to the training samples and update the decision network according to the data of the training network until the networks converge.

With the game AI training method provided by Embodiment 3 of the present invention, decision data is generated while the client's game keeps running, in particular while the client controls the game AI character being trained to execute the delay action. As a result, game AI training can proceed normally even for games without a built-in pause function.

Embodiment four

Referring to Fig. 9, Fig. 9 is a schematic structural diagram of a game AI training device provided by Embodiment 4 of the present invention. The present invention further provides a game AI training device, comprising:

A second state data module 21, configured to receive the current state data sent by the client, wherein the current state data is generated by the client running a game process;

A second decision data module 22, configured to generate corresponding decision data according to the current state data and feed it back to the client, wherein the decision data is generated while the client controls the game AI character being trained to execute the delay action;

A second training sample module 23, configured to receive the training sample sent by the client, wherein the client organizes the training sample from the decision data, the current state data, and the reward data and subsequent state data generated after the decision action is executed according to the decision data;

A training module 24, configured to train the training network according to the training samples and update the decision network according to the data of the training network until the networks converge.

Preferably, the game AI training device further comprises:

A distribution module, configured to divide the complete game data to be trained into several portions and allocate each portion to a corresponding client, wherein each portion of game data includes at least one game process to be run.

Preferably, the training module 24 further comprises:

A training unit, configured to randomly select several training samples, input them into the training network, and train the game AI using stochastic gradient descent;

A training count detection unit, configured to count the number of times the training network has been trained on the selected training samples;

An updating unit, configured to update the data of the decision network according to the data of the training network when the training count exceeds a preset number.

In the technical solution of this embodiment, the modules work as follows:

- The second state data module 21 receives the current state data sent by the client.
- The second decision data module 22 generates the corresponding decision data according to the current state data and feeds it back to the client. The decision data is generated while the client controls the game AI character being trained to execute the delay action.
- The second training sample module 23 receives the training samples sent by the client.
- The training module 24 trains the training network according to the training samples and updates the decision network according to the data of the training network until the networks converge.

With this arrangement, decision data is generated while the client's game keeps running, in particular while the client controls the game AI character being trained to execute the delay action. As a result, game AI training can proceed normally even for games without a built-in pause function.

It should be noted that the game AI training device provided by Embodiment 4 of the present invention executes the steps of the game AI training method described in any implementation of Embodiment 3. The device and the method correspond in working principle and beneficial effects, so the details are not repeated here.

Embodiment five

Embodiment 5 of the present invention further provides a game AI training system comprising a server and N clients, where N ≥ 1. Each client executes the following steps:

It should be noted that the steps executed by each client and by the server in the game AI training system of Embodiment 5 correspond, respectively, to the steps of the game AI training methods of Embodiment 1 and Embodiment 3. Their working principles and beneficial effects correspond, so the details are not repeated here.

Optionally, the steps executed by each client in the game AI training system may also include the steps of the game AI training method described in any implementation of Embodiment 1. Likewise, the steps executed by the server may also include the steps of the game AI training method described in any implementation of Embodiment 3. The working principles and beneficial effects correspond, so the details are not repeated here.

It should be noted that the device embodiments described above are merely illustrative. Units described as separate components may or may not be physically separate. Components shown as units may or may not be physical units; they may be located in one place or distributed across multiple network units. Some or all of the modules may be selected according to actual needs to achieve the purpose of the solution of this embodiment. In addition, in the drawings of the device embodiments provided by the present invention, a connection between modules indicates a communication connection between them, which may be implemented as one or more communication buses or signal lines. Those of ordinary skill in the art can understand and implement this without creative effort.

The above is a preferred embodiment of the present invention. It should be noted that those skilled in the art may make various improvements and modifications without departing from the principle of the present invention, and these improvements and modifications are also considered to fall within the protection scope of the present invention.

Claims

1. A game AI training method, characterized in that it is applicable to a client, the method comprising:

When a game process is running, transmitting the collected current state data to a server, wherein the game process is distributed by the server;

Controlling the execution of a delay action, wherein the delay action is the action that the game AI character being trained executes while waiting for the server to return the decision data corresponding to the current state data;

Organizing the current state data, the decision data, the reward data, and the subsequent state data into a training sample, and transmitting the training sample to the server, so that the server trains its training network based on the training sample and updates its decision network according to the data of the training network until the networks converge.

2. The game AI training method according to claim 1, characterized in that the process by which the server distributes the game process comprises:

The server divides the complete game data to be trained into several portions in advance and allocates each portion to a corresponding client, wherein each portion of game data includes at least one game process to be run.

3. The game AI training method according to claim 1, characterized in that obtaining the decision data corresponding to the current state data comprises:

Detecting the execution time of the delay action;

When the execution time of the delay action exceeds a threshold, randomly selecting corresponding data from the decision data returned by the server as the decision data corresponding to the current state data.

4. The game AI training method according to claim 1, characterized in that there are several game processes, and transmitting the collected current state data to the server when a game process is running specifically comprises:

After the current state data of every game process has been collected, transmitting the current state data of all game processes to the server together as one batch.

5. The game AI training method according to claim 4, characterized in that obtaining the decision data corresponding to the current state data is specifically:

6. The game AI training method according to claim 4, characterized in that data association information is added to the current state data of each game process in the order of collection, so that each piece of decision data is associated with its corresponding current state data, wherein each piece of decision data also carries the same data association information as its corresponding current state data.

7. A game AI training device, characterized by comprising:

A first state data module, configured to transmit the collected current state data to a server when a game process is running, wherein the game process is distributed by the server;

A delay action execution module, configured to control the execution of a delay action, wherein the delay action is the action that the game AI character being trained executes while waiting for the server to return the decision data corresponding to the current state data;

A first decision data module, configured to obtain the decision data corresponding to the current state data and to end the execution of the delay action;

A decision action execution module, configured to control the game AI character being trained to execute a decision action according to the decision data corresponding to the current state data, and to generate reward data and subsequent state data;

A first training sample module, configured to organize the current state data, the decision data, the reward data, and the subsequent state data into a training sample and to transmit the training sample to the server, so that the server trains its training network based on the training sample and updates its decision network according to the data of the training network until the networks converge.

8. A game AI training method, characterized in that it is applicable to a server, the method comprising:

Receiving the current state data sent by a client, wherein the current state data is generated by the client running a game process;

Generating corresponding decision data according to the current state data and feeding it back to the client, wherein the decision data is generated while the client controls the game AI character being trained to execute a delay action;

Receiving the training sample sent by the client, wherein the client organizes the training sample from the decision data, the current state data, and the reward data and subsequent state data generated after the decision action is executed according to the decision data;

Training the training network according to the training samples and updating the decision network according to the data of the training network until the networks converge.

9. The game AI training method according to claim 8, characterized in that, before receiving the current state data sent by the client, the method further comprises:

Dividing the complete game data to be trained into several portions and allocating each portion to a corresponding client, wherein each portion of game data includes at least one game process to be run.

10. The game AI training method according to claim 8, characterized in that training the training network according to the training samples and updating the decision network according to the data of the training network specifically comprises:

Randomly selecting several training samples, inputting them into the training network, and training the game AI using stochastic gradient descent;

When the training count exceeds a preset number, updating the data of the decision network according to the data of the training network.

11. The game AI training method according to claim 8, characterized in that, after generating the corresponding decision data according to the current state data and feeding it back to the client, the method further comprises:

12. A game AI training device, characterized by comprising:

A second state data module, configured to receive the current state data sent by a client, wherein the current state data is generated by the client running a game process;

A second decision data module, configured to generate corresponding decision data according to the current state data and feed it back to the client, wherein the decision data is generated while the client controls the game AI character being trained to execute a delay action;

A second training sample module, configured to receive the training sample sent by the client, wherein the client organizes the training sample from the decision data, the current state data, and the reward data and subsequent state data generated after the decision action is executed according to the decision data;

A training module, configured to train the training network according to the training samples and update the decision network according to the data of the training network until the networks converge.

13. A game AI training system, characterized by comprising a server and N clients, N ≥ 1, wherein each client executes the following steps:

When a game process is running, transmitting the collected current state data to the server, wherein the game process is distributed by the server;

Controlling the execution of a delay action, wherein the delay action is the action that the game AI character being trained executes while waiting for the server to return the decision data corresponding to the current state data;

Organizing the current state data, the decision data, the reward data, and the subsequent state data into a training sample, and transmitting the training sample to the server, so that the server trains its training network based on the training sample and updates its decision network according to the data of the training network until the networks converge;

After dividing the complete game data to be trained into several portions and allocating each portion to a corresponding client, the server executes the following steps:

Receiving the current state data sent by a client, wherein the current state data is generated by the client running the game process;

Receiving the training sample sent by the client, wherein the client organizes the training sample from the decision data, the current state data, and the reward data and subsequent state data generated after the decision action is executed according to the decision data;