CN111569430B - Game decision model training method and device, electronic equipment and storage medium


Info

Publication number: CN111569430B
Authority: CN (China)
Prior art keywords: sample, samples, sequence, label, target
Legal status: Active
Application number: CN202010507435.0A
Other languages: Chinese (zh)
Other versions: CN111569430A
Inventor: 蔡康
Current Assignee: Netease Hangzhou Network Co Ltd
Original Assignee: Netease Hangzhou Network Co Ltd
Application filed by Netease Hangzhou Network Co Ltd
Priority to CN202010507435.0A
Publication of CN111569430A
Application granted
Publication of CN111569430B


Classifications

    • A - HUMAN NECESSITIES
    • A63 - SPORTS; GAMES; AMUSEMENTS
    • A63F - CARD, BOARD, OR ROULETTE GAMES; INDOOR GAMES USING SMALL MOVING PLAYING BODIES; VIDEO GAMES; GAMES NOT OTHERWISE PROVIDED FOR
    • A63F13/00 - Video games, i.e. games using an electronically generated display having two or more dimensions
    • A63F13/60 - Generating or modifying game content before or while executing the game program, e.g. authoring tools specially adapted for game development or game-integrated level editor
    • A63F13/67 - Generating or modifying game content adaptively or by learning from player actions, e.g. skill level adjustment or by storing successful combat sequences for re-use
    • A63F13/80 - Special adaptations for executing a specific game genre or game mode
    • A63F13/822 - Strategy games; Role-playing games
    • A63F2300/00 - Features of games using an electronically generated display having two or more dimensions, e.g. on a television screen, showing representations related to the game
    • A63F2300/60 - Methods for processing data by generating or executing the game program
    • A63F2300/6027 - Methods for processing data using adaptive systems learning from user actions, e.g. for skill level adjustment
    • A63F2300/80 - Features specially adapted for executing a specific type of game
    • A63F2300/807 - Role playing or strategy games


Abstract

The embodiment of the invention provides a training method and a training apparatus for a game decision model, an electronic device, and a storage medium. A sample sequence corresponding to video data of a game match is obtained, the sample sequence comprising samples and situation data corresponding to the samples; the samples in the sample sequence are traversed, and the sample label corresponding to each sample is determined according to the match state data corresponding to the sample; when the sample label corresponding to a sample cannot be determined according to the match state data corresponding to the sample, the sample is taken as a target sample; and the sample label corresponding to a sample subsequent to the target sample in the sample sequence is obtained as the target sample label corresponding to the target sample. The embodiment of the invention can improve the accuracy of decision semantic information extraction; training the AI's game decision model with this decision semantic information can raise the intelligence level of the game AI, thereby improving the game experience of players playing with the AI.

Description

Game decision model training method and device, electronic equipment and storage medium
Technical Field
The invention relates to the technical field of artificial intelligence, and in particular to a training method for a game decision model, a training apparatus for a game decision model, an electronic device, and a storage medium.
Background
Nowadays, game production is inseparable from the application of AI (Artificial Intelligence), so in-game AI design is an important component of game production.
For example, in a Moba (Multiplayer Online Battle Arena) game, there is usually an AI that controls virtual objects in the game as if it were a player, competing or cooperating with human players. An AI that simulates a player in this way must have a certain competitive ability and needs to make long-term, predictive, global decisions, and is therefore difficult to write directly as rules.
Currently, one approach to game AI development is to adopt supervised learning from machine learning and train the AI on video data of players in game matches, so that the AI learns the players' decision-making patterns. Video data of players in game matches is a raw data set, and the raw data set must be converted into a training data set usable by a game decision model in machine learning. The training data set needs to include the players' decision semantic information, i.e., the strategies with which the players control game characters in the match. However, the players' decision semantic information cannot be obtained directly from the match video data, while the accuracy of the decision semantic information directly affects the accuracy of AI model training, which in turn determines the intelligence level of the AI and affects the players' game experience.
Disclosure of Invention
In view of the above, embodiments of the present invention are proposed to provide a training method for a game decision model and a corresponding training apparatus, electronic device, and storage medium that overcome, or at least partially solve, the above problems.
In order to solve the above problems, an embodiment of the present invention discloses a method for training a game decision model, where the method includes:
acquiring a sample sequence corresponding to video data of a game match, wherein the sample sequence comprises samples and situation data corresponding to the samples; a sample is a sampling point obtained by sampling a virtual object in the video data at a preset time interval, and the situation data is the match state data corresponding to the sampling point;
traversing the samples in the sample sequence, and determining sample labels corresponding to the samples according to the match state data corresponding to the samples; a sample label is used for representing decision semantic information for a target virtual object in the video data;
when the sample label corresponding to a sample cannot be determined according to the match state data corresponding to the sample, taking the sample as a target sample;
obtaining the sample label corresponding to a sample subsequent to the target sample in the sample sequence as the target sample label corresponding to the target sample; the samples and their corresponding sample labels, together with the target samples and their corresponding target sample labels, are used for training a game decision model to be trained.
Optionally, the traversing the samples in the sample sequence and determining the sample labels corresponding to the samples according to the match state data corresponding to the samples includes:
acquiring a judgment condition and a judgment label corresponding to the judgment condition;
traversing the samples in the sample sequence in back-to-front order, and determining whether the match state data corresponding to each sample satisfies the judgment condition;
and when the match state data corresponding to the sample satisfies the judgment condition, setting the sample label corresponding to the sample as the judgment label corresponding to the judgment condition.
Optionally, the judgment condition corresponds to a priority, and the determining whether the match state data corresponding to the sample satisfies the judgment condition includes:
determining whether the match state data corresponding to the sample satisfies the judgment conditions in order of priority from high to low;
and the setting, when the match state data corresponding to the sample satisfies the judgment condition, the sample label corresponding to the sample as the judgment label corresponding to the judgment condition includes:
when the match state data corresponding to the sample satisfies the judgment condition, setting the sample label corresponding to the sample as the judgment label corresponding to the satisfied judgment condition with the highest priority.
Optionally, the traversing the samples in the sample sequence and determining the sample label corresponding to each sample according to the match state data corresponding to the sample further includes:
when the match state data corresponding to the sample does not satisfy the judgment condition, determining that the sample label corresponding to the sample cannot be determined according to the match state data corresponding to the sample.
Optionally, the sample sequence is provided with a special sample, the sample label corresponding to the special sample being a designated label; before the obtaining the sample label corresponding to a sample subsequent to the target sample in the sample sequence as the target sample label corresponding to the target sample, the method further includes:
adding the special sample to the sample sequence as the last sample in the sample sequence when the target sample is the last sample in the sample sequence.
The embodiment of the invention further discloses a training method of a game decision model, which comprises the following steps:
acquiring a sample sequence corresponding to video data of a game match, wherein the sample sequence comprises samples and situation data corresponding to the samples; a sample is a sampling point obtained by sampling a virtual object in the video data at a preset time interval, and the situation data is the match state data corresponding to the sampling point;
traversing the samples in the sample sequence, and determining sample labels corresponding to the samples according to the match state data corresponding to the samples; a sample label is used for representing decision semantic information for a target virtual object in the video data;
taking the samples for which no sample label has been determined as target samples, and traversing the samples in the sample sequence again;
obtaining the sample label corresponding to a sample subsequent to the target sample in the sample sequence as the target sample label corresponding to the target sample; the samples and their corresponding sample labels, together with the target samples and their corresponding target sample labels, are used for training a game decision model to be trained.
Optionally, the traversing the samples in the sample sequence and determining the sample labels corresponding to the samples according to the match state data corresponding to the samples includes:
acquiring a judgment condition and a judgment label corresponding to the judgment condition;
traversing the samples in the sample sequence, and determining whether the match state data corresponding to each sample satisfies the judgment condition;
and when the match state data satisfies the judgment condition, setting the sample label of the sample as the judgment label corresponding to the judgment condition.
Optionally, the judgment condition corresponds to a priority, and the determining whether the match state data corresponding to the sample satisfies the judgment condition includes:
determining whether the match state data corresponding to the sample satisfies the judgment conditions in order of priority from high to low;
and the setting, when the match state data corresponding to the sample satisfies the judgment condition, the sample label corresponding to the sample as the judgment label corresponding to the judgment condition includes:
when the match state data corresponding to the sample satisfies the judgment condition, setting the sample label corresponding to the sample as the judgment label corresponding to the satisfied judgment condition with the highest priority.
Optionally, the taking the samples for which no sample label has been determined as target samples and traversing the samples in the sample sequence again includes:
taking the samples for which no sample label has been determined as target samples, and traversing the samples in the sample sequence again in back-to-front order.
Optionally, the obtaining the sample label corresponding to a sample subsequent to the target sample in the sample sequence as the target sample label corresponding to the target sample includes:
setting the sample label of the target sample to a designated label when the target sample is the last sample in the sample sequence.
The embodiment of the invention discloses a training device of a game decision model, which comprises:
the game system comprises a first sample sequence acquisition module, a second sample sequence acquisition module and a game playing module, wherein the first sample sequence acquisition module is used for acquiring a sample sequence corresponding to video data of game playing, and the sample sequence comprises samples and situation data corresponding to the samples; the sample is a sampling point obtained by sampling a virtual object in the video data at a preset time interval, and the situation data is counterpart state data corresponding to the sampling point;
the first traversal module is used for traversing the samples in the sample sequence and determining sample labels corresponding to the samples according to the match state data corresponding to the samples; the sample label is used for representing decision semantic information aiming at a target virtual object in the video data;
the target sample determining module is used for taking the sample as a target sample when the sample label corresponding to the sample cannot be determined according to the local alignment state data corresponding to the sample;
a first target sample label determining module, configured to obtain a sample label corresponding to a sample subsequent to the target sample in the sample sequence, as a target sample label corresponding to the target sample; the sample and the sample label corresponding to the sample, and the target sample and the sample label corresponding to the target sample are used for training a game decision model to be trained.
The embodiment of the invention further discloses a training device of a game decision model, which comprises:
the second sample sequence acquisition module is used for acquiring a sample sequence corresponding to video data of game match, wherein the sample sequence comprises samples and situation data corresponding to the samples; the sample is a sampling point obtained by sampling a virtual object in the video data at a preset time interval, and the situation data is counterpart state data corresponding to the sampling point;
the second traversal module is used for traversing the samples in the sample sequence and determining sample labels corresponding to the samples according to the local alignment state data corresponding to the samples; the sample label is used for representing decision semantic information aiming at a target virtual object in the video data;
the third traversal module is used for taking the sample of which the sample label is not determined as the target sample and traversing the samples in the sample sequence again;
a second target label determining module, configured to obtain a sample label corresponding to a sample subsequent to the target sample in the sample sequence, as a target sample label corresponding to the target sample; the sample and the sample label corresponding to the sample, and the target sample and the sample label corresponding to the target sample are used for training a game decision model to be trained.
The embodiment of the invention discloses an electronic device, comprising a processor, a memory, and a computer program stored on the memory and executable on the processor, wherein the computer program, when executed by the processor, implements the steps of the above training method of the game decision model.
The embodiment of the invention discloses a computer-readable storage medium having a computer program stored thereon, wherein the computer program, when executed by a processor, implements the steps of the above training method of the game decision model.
The embodiment of the invention has the following advantages:
In the embodiment of the invention, a sample sequence corresponding to video data of a game match is obtained; the samples in the sample sequence are traversed, and the sample label corresponding to each sample is determined according to the match state data corresponding to the sample, the sample label being used for representing decision semantic information for a virtual object. When the sample label corresponding to a sample cannot be determined according to the match state data corresponding to the sample, the sample is taken as a target sample, and the sample label corresponding to the sample subsequent to the target sample in the sample sequence is obtained as the target sample label corresponding to the target sample. The labeled samples and their corresponding sample labels, together with the target samples and their corresponding target sample labels, can be used as a training data set to train a game decision model to be trained. In the embodiment of the invention, the decision semantic information of a sample can be determined according to the match state data, and for a target sample whose decision semantic information cannot be determined from the match state data, it can be determined according to the decision semantic information of the sample subsequent to the target sample in the sample sequence.
Drawings
FIG. 1 is a flow chart of the steps of an embodiment of a method of training a game decision model of the present invention;
FIG. 2 is a flow chart of decision semantic information extraction provided by an embodiment of a training method of a game decision model according to the present invention;
FIG. 3 is a flow chart of steps in another embodiment of a method of training a game decision model of the present invention;
FIG. 4 is a sample sequence diagram provided by another embodiment of a method for training a game decision model of the present invention;
FIG. 5 is a block diagram of an embodiment of a training apparatus for a game decision model according to the present invention;
FIG. 6 is a block diagram of another embodiment of a training apparatus for a game decision model according to the present invention.
Detailed Description
In order to make the aforementioned objects, features, and advantages of the present invention more comprehensible, embodiments of the present invention are described in further detail below with reference to the accompanying figures.
Referring to fig. 1, a flowchart illustrating steps of an embodiment of a method for training a game decision model according to the present invention is shown, and the embodiment of the present invention may specifically include the following steps:
step 101, obtaining a sample sequence corresponding to video data of game-to-game, wherein the sample sequence comprises samples and situation data corresponding to the samples; the samples are sampling points obtained by sampling virtual objects in the video data at preset time intervals, and the situation data are corresponding situation state data corresponding to the sampling points.
The video data refers to the video content of a game match. Specifically, the video data may be video of a live game broadcast, or video recorded while a player plays a match. Specifically, the game match in the video data may be a Moba game match.
For convenience of understanding, the Moba game is taken as an example in describing the embodiments of the present invention. In a Moba game, players generally buy equipment during the match and are divided into two enemy camps that compete with each other across the game map. Besides the game characters selected by the two sides, the map usually contains non-player game units such as soldiers, defense towers, small wild monsters, and special wild monsters; each player controls the selected game character to kill enemy or neutral game units on the map, thereby obtaining resources, destroying the enemy base, and achieving final victory. Of course, besides Moba games, embodiments of the present invention are equally applicable to other games or applications involving decisions and competition.
After the video data is acquired, the video data is sampled at a preset time interval to obtain samples (sampling points). For example, the virtual object in the video data may be sampled at a time interval of 1 s (second), thereby obtaining the sample sequence corresponding to the video data.
The sample sequence comprises samples and the situation data corresponding to the samples. Specifically, a sample may be a sampling point obtained by sampling a virtual object in the video data at a preset time interval, and the situation data is the match state data corresponding to the sampling point. For example, the virtual object is a virtual object manipulated by a player in the game match, such as a game character controlled by a player in a Moba game, and the match state data is the state data of that virtual object in the match, such as state data produced by the position of the player-controlled game character or by skills released against the game characters or defense towers of the two sides; illustratively, the player controls the game character to attack an enemy game character, causing the enemy's health to decrease, or the player controls the game character to move toward the base.
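As a concrete illustration (a minimal Python sketch, not part of the patent text), the sample sequence can be represented with a simple structure; the names Sample and build_sample_sequence and the input video_states (an iterable of (timestamp, match state) pairs extracted upstream) are hypothetical:

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class Sample:
    """One sampling point taken from the match video at a preset interval."""
    time_s: float                 # timestamp of the sampling point, in seconds
    match_state: dict             # match state data at this point (positions, damage, health, ...)
    label: Optional[str] = None   # decision semantic label; None until it is determined

def build_sample_sequence(video_states, interval_s=1.0):
    """Sample the per-timestamp match states at a preset interval (1 s here).

    video_states is assumed to be an iterable of (timestamp, match_state)
    pairs extracted from the match video, in chronological order.
    """
    sequence, next_t = [], 0.0
    for t, state in video_states:
        if t >= next_t:
            sequence.append(Sample(time_s=t, match_state=state))
            next_t += interval_s
    return sequence
```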
Step 102, traversing the samples in the sample sequence, and determining the sample labels corresponding to the samples according to the match state data corresponding to the samples; a sample label is used for representing decision semantic information for a target virtual object in the video data.
A sample label reflects the decision semantic information that the player expresses in the video data for a target virtual object (such as a wild monster, a friendly game character, or an enemy game character) through operations on the controlled virtual object, such as moving or attacking. For example, the decision semantic information may be that the player controls the game character to lane in the middle lane, fight the big dragon, kill the red-buff wild monster in our jungle near the top lane, and the like.
In one embodiment, for the sample sequence of certain video data, the samples in the sample sequence are traversed to determine the sample label of each sample according to the match state data corresponding to that sample.
Step 103, when the sample label corresponding to a sample cannot be determined according to the match state data corresponding to the sample, taking the sample as a target sample.
Step 104, obtaining the sample label corresponding to a sample subsequent to the target sample in the sample sequence as the target sample label corresponding to the target sample; the samples and their corresponding sample labels, together with the target samples and their corresponding target sample labels, are used for training a game decision model to be trained.
In some cases the sample label corresponding to a sample can be determined according to the match state data corresponding to the sample, but in other cases it cannot; in the latter case, the sample can be taken as a target sample whose sample label is determined next. Specifically, the sample label corresponding to the sample subsequent to the target sample in the sample sequence is obtained, and that sample label is used as the target sample label corresponding to the target sample.
For example, assume that the sample label corresponding to sample 3 has been determined while traversing the sample sequence. When sample 2 is traversed, because its corresponding sample label cannot be determined from its match state data, sample 2 may be taken as a target sample, and the sample label corresponding to the sample subsequent to sample 2 in the sample sequence, i.e., the sample label corresponding to sample 3, is obtained as the sample label of sample 2.
In a specific implementation, the training data set for training the game decision model consists of samples and sample labels. For a game application scenario, the samples are obtained by sampling the virtual object controlled by the player in the video data at a preset time interval, e.g., the game character controlled by the player at a given moment; the sample labels are the decision semantic information produced by the player in the video data, e.g., laning in the middle lane, fighting the big dragon, or killing the red-buff wild monster in our jungle near the top lane in a Moba game. Such decision semantic information cannot be obtained directly from the video data.
The training method of the game decision model in the embodiment of the invention can determine the decision semantic information of a sample according to the match state data, and, for a target sample whose decision semantic information cannot be determined from the match state data, can determine it according to the decision semantic information of the sample subsequent to the target sample in the sample sequence.
It should be noted that, in practical applications, games generally record video data of players' matches and are strategic to some degree, so the method provided by the embodiment of the present invention can be applied to extract players' decision semantic information from the video data of such games, for applications such as model training based on the decision semantic information.
In an optional embodiment of the present invention, step 102 of traversing the samples in the sample sequence and determining the sample labels corresponding to the samples according to the match state data corresponding to the samples may specifically include the following steps:
acquiring a judgment condition and a judgment label corresponding to the judgment condition;
traversing the samples in the sample sequence in back-to-front order, and determining whether the match state data corresponding to each sample satisfies the judgment condition;
and when the match state data corresponding to the sample meets the judgment condition, setting the sample label corresponding to the sample as the judgment label corresponding to the judgment condition.
In the embodiment of the invention, at least one judgment condition and the judgment label corresponding to the judgment condition are preset. The judgment conditions and their corresponding judgment labels can be obtained when the sample sequence is to be traversed; the samples in the sample sequence are then traversed in back-to-front order, and it is determined whether the match state data corresponding to each sample satisfies a judgment condition. If the match state data corresponding to a sample satisfies one or more judgment conditions, the judgment label corresponding to the satisfied judgment condition can be set as the sample label of the sample.
In an optional embodiment of the present invention, the determining whether the match state data corresponding to the sample satisfies the judgment condition may specifically include the following steps:
determining whether the match state data corresponding to the sample satisfies the judgment conditions in order of priority from high to low.
The judgment conditions have preset priorities; when there are two or more judgment conditions, whether the match state data corresponding to each sample satisfies the judgment conditions can be determined in order of priority from high to low.
In an optional embodiment of the present invention, when the match state data corresponding to the sample satisfies the judgment condition, the setting the sample label corresponding to the sample as the judgment label corresponding to the judgment condition may specifically include the following steps:
when the match state data corresponding to the sample satisfies the judgment condition, setting the sample label corresponding to the sample as the judgment label corresponding to the satisfied judgment condition with the highest priority.
Specifically, for the sample sequence of certain video data, each sample of the sample sequence may be checked individually against the judgment conditions in order of priority from high to low; if the match state data corresponding to the sample satisfies several judgment conditions, the judgment condition with the highest priority prevails, that is, the judgment label corresponding to the judgment condition with the highest priority is used as the sample label of the sample. Optionally, when a sample is checked against the judgment conditions in order of priority from high to low and its match state data is determined to satisfy a judgment condition, the remaining judgment conditions need not be checked, and the judgment label corresponding to the satisfied judgment condition is directly used as the sample label of the sample, thereby reducing unnecessary data processing and improving processing efficiency.
In a specific example, the judgment labels are used to represent the player's decision semantic information, such as laning in the middle lane or killing a wild monster, and each judgment label corresponds to a judgment condition with a specified priority. The judgment conditions, the priorities corresponding to the judgment conditions, and the judgment labels corresponding to the judgment conditions may all be set by developers according to the actual situation of the game, which is not limited in the embodiment of the present invention.
Taking a Moba game as an example, the judgment labels may be divided into the following 7 types:
L1, the target is a BOSS;
L2, the target is an enemy hero;
L3, the target is a wild monster;
L4, the target is an enemy defense tower or base;
L5, the target is laning;
L6, the target is defending our defense towers and base;
L7, the target is returning to the spring to recover.
The 7 judgment conditions corresponding to the 7 judgment labels L1 to L7 are C1 to C7, specifically:
C1, a BOSS (such as the big dragon) takes damage from the player-controlled game character;
C2, an enemy player's game character takes damage from the player-controlled game character;
C3, a wild monster takes damage from the player-controlled game character;
C4, a defense tower takes damage from the player-controlled game character;
C5, the player-controlled game character is on a lane;
C6, the player-controlled game character is located around our defense towers and base;
C7, the player-controlled game character is in our spring (fountain).
Since the skills released by the player-controlled game character may cause area damage (for example, injuring an enemy player's game character and a wild monster at the same time), or the character may be in a position that satisfies several judgment conditions at once (for example, the player-controlled game character stands at the boundary between two regions), the match state data corresponding to the game character may satisfy several judgment conditions at the same time. For example, when judgment conditions C1 and C2 are satisfied simultaneously, the judgment labels L1 and L2 corresponding to C1 and C2 respectively are both candidates for the sample label, and the priority mechanism described above determines which one is used.
It should be noted that the above detailed descriptions of the judgment labels L1 to L7 and the judgment conditions C1 to C7 are only examples; in practical applications, developers may set them differently according to the actual situation of the game, such as adding or modifying them after a game version update.
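To make the priority mechanism concrete, the following minimal Python sketch checks hypothetical predicates standing in for C1 to C7 in priority order; the predicate names and the match-state keys they read are assumptions, not part of the patent:

```python
# Judgment conditions in descending priority order, each paired with its judgment label.
# The predicates are hypothetical; a real implementation would inspect the match
# state data (damage events, positions, ...) recorded for the sampling point.
JUDGMENTS = [
    (lambda s: s.get("boss_damaged", False),       "L1: target is BOSS"),
    (lambda s: s.get("enemy_hero_damaged", False), "L2: target is enemy hero"),
    (lambda s: s.get("monster_damaged", False),    "L3: target is wild monster"),
    (lambda s: s.get("tower_damaged", False),      "L4: target is enemy tower or base"),
    (lambda s: s.get("on_lane", False),            "L5: target is laning"),
    (lambda s: s.get("near_own_tower", False),     "L6: target is defending"),
    (lambda s: s.get("in_fountain", False),        "L7: target is returning to spring"),
]

def judge(match_state):
    """Return the judgment label of the highest-priority satisfied condition.

    Conditions are checked from highest to lowest priority and checking stops
    at the first hit, so a sample satisfying both C1 and C2 gets label L1.
    Returns None when no condition is satisfied (the sample becomes a target
    sample whose label is filled in later).
    """
    for condition, label in JUDGMENTS:
        if condition(match_state):
            return label
    return None
```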
In an optional embodiment of the present invention, the traversing the samples in the sample sequence and determining the sample labels corresponding to the samples according to the match state data corresponding to the samples may further include the following steps:
when the match state data corresponding to the sample does not satisfy the judgment condition, determining that the sample label corresponding to the sample cannot be determined according to the match state data corresponding to the sample.
When the sample sequence is traversed and each sample is checked against the judgment conditions, the match state data corresponding to some samples may satisfy no judgment condition; for such a sample it is determined that the sample label cannot be determined from the match state data. A sample whose match state data satisfies no judgment condition may be taken as a target sample, and the sample label of the sample subsequent to the target sample in the sample sequence may be taken as the target sample label of the target sample.
In an optional embodiment of the present invention, the sample sequence is provided with a special sample, the sample label corresponding to the special sample being a designated label; before the sample label corresponding to a sample subsequent to the target sample in the sample sequence is obtained as the target sample label corresponding to the target sample, the method may further include the following steps:
adding the special sample to the sample sequence as the last sample in the sample sequence when the target sample is the last sample in the sample sequence.
If the target sample is the last sample in the sample sequence, no sample subsequent to the target sample exists in the sample sequence. In this case a special sample can be obtained, the sample label of the special sample being the designated label, where the designated label is a default value meaning that no decision is made. After the special sample is obtained, it can be added to the sample sequence as the last sample in the sample sequence; the special sample is then the sample subsequent to the target sample, and the sample label corresponding to it can be used as the target sample label corresponding to the target sample. In this way, a sample whose sample label cannot be determined takes its label from a sample whose label has been determined, so all samples in the sample sequence complete label setting.
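A minimal sketch of this special-sample handling, reusing the hypothetical Sample structure from the earlier sketch; the value of DEFAULT_LABEL is an assumption standing in for the designated label:

```python
DEFAULT_LABEL = "no decision made"   # the designated (default) label; the exact value is an assumption

def append_special_sample(sequence):
    """Append a special sample carrying the designated label, so that a target
    sample at the end of the sequence has a subsequent sample to copy from."""
    special = Sample(time_s=sequence[-1].time_s + 1.0, match_state={}, label=DEFAULT_LABEL)
    sequence.append(special)
    return sequence
```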
Because a sample whose sample label cannot be determined can, in the embodiment of the present invention, take its label from a sample whose label has been determined, it is not necessary to set enough judgment conditions that a sample label can be determined from the match state data of every sample; that is, only some high-confidence judgment conditions need to be set, which reduces the workload of developers and improves the accuracy of the sample labels.
Referring to fig. 2, a flow chart of decision semantic information extraction provided by an embodiment of the training method of a game decision model of the present invention is shown. To set labels for the samples in a sample sequence more efficiently, the sample sequence of the video data is traversed once from back to front. Assume there are M kinds of judgment labels in total and N samples; the steps are as follows:
1) Start the traversal at the end of the sample sequence, initializing the index i of the current sample to N+1, and perform step 2;
2) Judge whether i is equal to 1; if i is equal to 1, end the process; if i is not equal to 1 (in fact i cannot be less than 1), perform step 3;
3) Judge whether i is equal to N+1; if i is equal to N+1, set the value of X to the default value, meaning that no decision is made; if i is not equal to N+1 (in fact i cannot be greater than N+1), set the value of X to the sample label of sample i; then perform step 4. Here X is a temporary variable, not a sample itself; it records the sample label of the sample that follows the current sample during the back-to-front traversal of the sample sequence;
4) Set i = i - 1, and perform step 5;
5) For the i-th sample, consider judgment condition Cj, initializing j to 1, and perform step 6;
6) Judge whether the i-th sample satisfies judgment condition Cj; if it does, set the sample label of the sample to the judgment label Lj corresponding to judgment condition Cj, and jump to step 2; if it does not satisfy judgment condition Cj, perform step 7;
7) Judge whether j is equal to M; if j is equal to M, set the sample label of sample i to the value of X, and return to step 2; if j is not equal to M (in fact j cannot be greater than M), set j = j + 1 and proceed to step 6.
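Reusing the hypothetical judge() helper and DEFAULT_LABEL from the sketches above, the whole back-to-front single pass of FIG. 2 can be sketched in a few lines; this is an illustration of the steps, not the patent's reference implementation:

```python
def label_sequence_single_pass(sequence):
    """One back-to-front pass over the sample sequence (cf. steps 1-7 above).

    x plays the role of the temporary variable X: it always holds the sample
    label of the sample behind the current one, starting from the default
    label when the current sample is the last one (the i == N+1 case).
    """
    x = DEFAULT_LABEL                       # steps 1 and 3: no sample behind sample N
    for sample in reversed(sequence):       # steps 2 and 4: i = N, N-1, ..., 1
        label = judge(sample.match_state)   # steps 5-7: priority-ordered check of C1..CM
        sample.label = label if label is not None else x
        x = sample.label                    # becomes X for the next (earlier) sample
    return sequence
```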
In an actual game, a player's decisions usually have time continuity. For samples whose decision semantic information cannot be determined, it can be determined by means of the decision semantic information of a determinable sample that follows them, because such samples may be the preparation phase of the subsequent decision. This is also why the embodiment of the invention traverses the sample sequence from back to front; the sample sequence only needs to be traversed once, so the labels (the decision semantic information) can be determined efficiently.
Referring to fig. 3, a flowchart illustrating steps of another embodiment of a method for training a game decision model according to the present invention is shown, and the embodiment of the present invention may specifically include the following steps:
step 301, obtaining a sample sequence corresponding to video data of a game-to-game, wherein the sample sequence comprises samples and situation data corresponding to the samples; the samples are sampling points obtained by sampling virtual objects in the video data at preset time intervals, and the situation data are corresponding situation state data corresponding to the sampling points.
Step 302, traversing the samples in the sample sequence, and determining the sample labels corresponding to the samples according to the match state data corresponding to the samples; a sample label is used for representing decision semantic information for a target virtual object in the video data.
For the sample sequence of certain video data, the samples in the sample sequence are traversed to determine the sample label of each sample according to the match state data corresponding to that sample. If the corresponding sample label cannot be determined according to the match state data corresponding to a sample, the sample may be left unprocessed for the moment, that is, its label setting may be deferred and handled when the sample sequence is traversed again.
It should be noted that in this traversal the sample sequence may be traversed in front-to-back or back-to-front order, or even in no particular order; it is only necessary to ensure that a label determination is attempted once for each sample in the sample sequence.
Step 303, taking the samples for which no sample label has been determined as target samples, and traversing the samples in the sample sequence again.
After the first traversal of the sample sequence is completed, if samples without a determined sample label exist in the sample sequence, those samples may be taken as target samples, and the sample sequence is traversed again, i.e., a second traversal is performed.
Step 304, obtaining the sample label corresponding to a sample subsequent to the target sample in the sample sequence as the target sample label corresponding to the target sample; the samples and their corresponding sample labels, together with the target samples and their corresponding target sample labels, are used for training a game decision model to be trained.
For a target sample, the sample label corresponding to the sample subsequent to the target sample in the sample sequence may be obtained as the target sample label corresponding to the target sample. It should be noted that, when the sample sequence is traversed again, only the target samples need their target sample labels determined; samples whose sample labels were already determined need not be determined again.
The training method of the game decision model in the embodiment of the invention can determine, on the first traversal of the sample sequence, the sample label corresponding to each sample according to the match state data corresponding to the sample, where a sample whose sample label cannot be so determined is taken as a target sample; the sample sequence is then traversed a second time to obtain the sample label corresponding to the sample subsequent to the target sample in the sample sequence as the target sample label corresponding to the target sample.
In an optional embodiment of the present invention, the traversing the samples in the sample sequence and determining the sample label corresponding to each sample according to the match state data corresponding to the sample includes:
acquiring a judgment condition and a judgment label corresponding to the judgment condition;
traversing the samples in the sample sequence, and determining whether the match state data corresponding to each sample satisfies the judgment condition;
and when the match state data satisfies the judgment condition, setting the sample label of the sample as the judgment label corresponding to the judgment condition.
In the embodiment of the invention, at least one judgment condition and the judgment label corresponding to the judgment condition are preset. The judgment conditions and their corresponding judgment labels can be obtained when the sample sequence is to be traversed; the samples in the sample sequence are then traversed, and it is determined whether the match state data corresponding to each sample satisfies a judgment condition. If the match state data satisfies one or more judgment conditions, the judgment label corresponding to the satisfied judgment condition can be set as the sample label of the sample.
In an optional embodiment of the present invention, the determining whether the match state data corresponding to the sample satisfies the judgment condition may specifically include the following steps:
determining whether the match state data corresponding to the sample satisfies the judgment conditions in order of priority from high to low.
the determination conditions have preset priorities, and when the determination conditions are two or more, whether the game state data corresponding to each sample meets the determination conditions can be determined according to the sequence from high priority to low priority.
In an optional embodiment of the present invention, when the match state data corresponding to the sample satisfies the judgment condition, the setting the sample label corresponding to the sample as the judgment label corresponding to the judgment condition may specifically include the following steps:
when the match state data corresponding to the sample satisfies the judgment condition, setting the sample label corresponding to the sample as the judgment label corresponding to the satisfied judgment condition with the highest priority.
Specifically, for the sample sequence of certain video data, each sample of the sample sequence may be checked individually against the judgment conditions in order of priority from high to low; if the match state data corresponding to the sample satisfies several judgment conditions, the judgment condition with the highest priority prevails, that is, the judgment label corresponding to the judgment condition with the highest priority is used as the sample label of the sample. Optionally, when a sample is checked against the judgment conditions in order of priority from high to low and its match state data is determined to satisfy a judgment condition, the remaining judgment conditions need not be checked, and the judgment label corresponding to the satisfied judgment condition is directly used as the sample label of the sample, thereby reducing unnecessary data processing and improving processing efficiency.
In an optional embodiment of the present invention, step 303 of taking the samples for which no sample label has been determined as target samples and traversing the samples in the sample sequence again may specifically include the following steps:
taking the samples for which no sample label has been determined as target samples, and traversing the samples in the sample sequence again in back-to-front order.
The sample sequence is traversed from back to front, starting from the last sample. If a sample was not processed in the first traversal, that is, its sample label setting was not completed, the sample is a target sample, and its sample label can be set to the sample label of the sample subsequent to it in the sample sequence; if the sample completed its sample label setting in the first traversal, it can be skipped. Each sample is processed in this way until all samples in the sample sequence have been traversed.
In an optional embodiment of the present invention, step 304 of obtaining the sample label corresponding to a sample subsequent to the target sample in the sample sequence as the target sample label corresponding to the target sample may specifically include the following steps:
setting the sample label of the target sample to a designated label when the target sample is the last sample in the sample sequence.
In the second traversal, processing starts with the last sample of the sample sequence; in particular, if that sample is unprocessed, it is a target sample, and its sample label is set to the designated label, where the designated label is a default value indicating that no decision is made.
The embodiment of the invention completes the label setting of every sample in the sample sequence by traversing the sample sequence twice: the first traversal determines sample labels according to the samples' match state data, and the second traversal determines the labels of the samples (target samples) whose sample labels could not be determined from the match state data, based on the sample labels determined in the first traversal and on the time continuity of a player's decisions.
The embodiment of the invention can determine the sample labels of some samples according to the judgment conditions during the first traversal, and determine the sample labels of the remaining samples (target samples) according to the samples with determined labels during the second traversal, thereby extracting the sample label, i.e., the player's decision semantic information, for every sampling point (sample) in the video data. In other words, not every sample in the embodiment of the present invention determines its sample label from a judgment condition, so the judgment conditions set in the embodiment of the present invention may be only high-confidence judgment conditions; since decisions usually have time continuity, samples with determined sample labels can be consulted to determine the target samples whose labels remain undetermined.
The processing for determining the decision semantic information of a sample sequence in the embodiment of the invention may comprise the following three steps, where step 1 is the first traversal and steps 2 and 3 form the second traversal, specifically:
1) For the sample sequence of certain video data, check each sample of the sample sequence against the judgment conditions individually; if a sample satisfies several judgment conditions at once, the judgment condition with the higher priority prevails, where the priorities of the judgment conditions are set reasonably by developers according to the actual situation; samples that satisfy no judgment condition are left unprocessed and reserved for the subsequent steps;
2) Start with the last sample of the sample sequence; in particular, if that sample is unprocessed, set its sample label to the default value;
3) Traverse the sample sequence from back to front, starting with the second-to-last sample; if a sample was not processed in step 1, set its sample label to the sample label of the sample subsequent to it in the sample sequence; if it was processed, skip it and continue with the next sample. Repeat step 3 until all samples have been traversed.
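A minimal Python sketch of this two-pass variant, again built on the hypothetical judge() helper and DEFAULT_LABEL from the earlier sketches; both variants produce the same labels, so the choice between one pass and two is an implementation preference:

```python
def label_sequence_two_pass(sequence):
    """Two traversals over the sample sequence (cf. FIG. 3 and FIG. 4).

    Pass 1 may visit the samples in any order: each sample independently gets
    the label of the highest-priority judgment condition it satisfies, or
    stays unlabeled. Pass 2 runs back to front and fills each remaining
    (target) sample from the sample behind it; an unlabeled last sample
    falls back to the designated default label.
    """
    for sample in sequence:                       # first traversal (step 1)
        sample.label = judge(sample.match_state)

    if sequence and sequence[-1].label is None:   # step 2: the last sample
        sequence[-1].label = DEFAULT_LABEL
    for i in range(len(sequence) - 2, -1, -1):    # step 3: back to front
        if sequence[i].label is None:
            sequence[i].label = sequence[i + 1].label
    return sequence
```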
Referring to fig. 4, a sample sequence diagram according to the present invention is shown, where a large rectangle indicates a sample sequence and each small square indicates a sample in the sample sequence; a white small square indicates that no sample label has been set for the sample, different colors indicate different sample labels, and the same color indicates the same sample label. In fig. 4, a darker color may indicate that the corresponding judgment condition has a higher priority, e.g., black indicates the big-dragon judgment condition (condition 1), dark gray indicates the judgment condition for attacking an enemy game character (condition 2), and light gray indicates the wild-monster judgment condition (condition 3).
First, the sample sequence shown on the left of fig. 4 is obtained; at this point no sample label has been set for any sample of the sample sequence, so all small squares are white. The sample sequence is traversed for the first time: for each sample in the sample sequence, it is first judged whether condition 1, i.e., the judgment condition with the highest priority, is satisfied; if so, the sample is set to black; if not, it is judged whether condition 2, the judgment condition with the second-highest priority, is satisfied, and so on until all judgment conditions (condition 1, condition 2, and condition 3) have been checked for the sample. If none of the judgment conditions is satisfied, the sample is regarded as a target sample and is not processed for the moment, i.e., it remains white. After the first traversal of the sample sequence is completed, the sample sequence shown in fig. 4 may be obtained.
Subsequently, if target samples whose sample labels could not be determined remain in the sample sequence, i.e., the small squares that are still white in fig. 4, the sample sequence is traversed a second time; for each target sample, the sample label of the sample subsequent to it can be obtained as its sample label; for example, if the subsequent sample is black, the target sample is also set to black. When the second traversal of the sample sequence is completed, the sample sequence shown on the right of fig. 4 is obtained, and label determination for the entire sample sequence is complete.
In the embodiment of the invention, after the second traversal is completed, sample labels have been determined for all samples in the entire sample sequence, that is, the decision semantic information of all samples has been determined; this decision semantic information is used to train the game decision model, improving the accuracy of the model.
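Putting the hypothetical sketches together, an end-to-end flow for producing the training data set might look like the following; video_states is an assumed input, and pairing each match state with its decision label is one plausible shape for the training set:

```python
# Hypothetical end-to-end flow: sample the match video, label the samples,
# and emit (match state, decision label) pairs as the training data set.
sequence = build_sample_sequence(video_states)    # video_states: assumed to be extracted upstream
sequence = label_sequence_single_pass(sequence)   # or label_sequence_two_pass(sequence)
training_set = [(s.match_state, s.label) for s in sequence]
```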
It should be noted that, for simplicity of description, the method embodiments are described as a series of acts or combination of acts, but those skilled in the art will recognize that the present invention is not limited by the illustrated order of acts, as some steps may occur in other orders or concurrently in accordance with the embodiments of the present invention. Further, those skilled in the art will appreciate that the embodiments described in the specification are presently preferred and that no particular act is required to implement the invention.
Referring to fig. 5, a block diagram of a training apparatus of a game decision model according to an embodiment of the present invention is shown, where the embodiment of the present invention may specifically include the following modules:
a first sample sequence obtaining module 501, configured to obtain a sample sequence corresponding to video data of a game match, where the sample sequence includes a sample and situation data corresponding to the sample; the samples are sampling points obtained by sampling virtual objects in the video data at preset time intervals, and the situation data are corresponding situation state data corresponding to the sampling points.
A first traversal module 502, configured to traverse the samples in the sample sequence, and determine the sample label corresponding to each sample according to the match state data corresponding to the sample; the sample label is used for representing decision semantic information for a target virtual object in the video data.
A target sample determining module 503, configured to use the sample as a target sample when the sample label corresponding to the sample cannot be determined according to the match state data corresponding to the sample.
A first target sample label determining module 504, configured to obtain a sample label corresponding to a sample subsequent to the target sample in the sample sequence, as a target sample label corresponding to the target sample; the sample and the sample label corresponding to the sample, and the target sample and the sample label corresponding to the target sample are used for training a game decision model to be trained.
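As referenced above, the sample-sequence acquisition of module 501 can be pictured as follows. This is a minimal sketch assuming a hypothetical `extract_match_state(video_data, t)` helper; the patent does not fix the sampling interval or the layout of the match state data:

```python
def build_sample_sequence(video_data, duration, interval, extract_match_state):
    """Sample the match video every `interval` seconds and pair each
    sampling point with its match state data."""
    sequence = []
    t = 0.0
    while t <= duration:
        sequence.append({"time": t, "state": extract_match_state(video_data, t)})
        t += interval
    return sequence
```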
In an optional embodiment of the present invention, the first traversal module 502 is configured to obtain determination conditions and the determination labels corresponding to the determination conditions; traverse the samples in the sample sequence in back-to-front order, and determine whether the match state data corresponding to each sample satisfies a determination condition; and when the match state data corresponding to the sample satisfies a determination condition, set the sample label corresponding to the sample to the determination label corresponding to that determination condition.
In an optional embodiment of the present invention, each determination condition has a corresponding priority, and the first traversal module 502 is configured to determine whether the match state data corresponding to the sample satisfies the determination conditions in order of priority from high to low, and, when the match state data corresponding to the sample satisfies a determination condition, set the sample label corresponding to the sample to the determination label corresponding to the satisfied determination condition with the highest priority.
In an optional embodiment of the present invention, the first traversal module 502 is configured to determine, when the match state data corresponding to the sample satisfies none of the determination conditions, that the sample label corresponding to the sample cannot be determined according to the match state data corresponding to the sample.
In an optional embodiment of the present invention, the sample sequence is provided with a special sample whose corresponding sample label is a designated label, and the apparatus further includes:
a special sample adding module, configured to add the special sample to the sample sequence as the last sample in the sample sequence when the target sample is the last sample in the sample sequence.
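Taken together, modules 502 to 504 and the special sample amount to a single back-to-front traversal in which every target sample can immediately take the label of its already-processed successor. Below is a minimal sketch under the same illustrative assumptions as above (predicate-based conditions, a designated label for the special sample); appending the special sample up front is an implementation convenience equivalent to adding it only when the last sample turns out to be a target sample:

```python
def label_samples_single_pass(states, conditions, designated_label="none"):
    # The special sample sits conceptually after the real sequence, so a
    # trailing target sample always has a labeled successor.
    labels = [None] * len(states) + [designated_label]
    for i in range(len(states) - 1, -1, -1):  # back-to-front traversal
        for predicate, label in conditions:   # highest priority first
            if predicate(states[i]):
                labels[i] = label
                break
        if labels[i] is None:          # target sample: no condition satisfied
            labels[i] = labels[i + 1]  # take the following sample's label
    return labels[:-1]  # the special sample's label is not part of the output
```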
Referring to fig. 6, a block diagram of another training apparatus for a game decision model according to another embodiment of the present invention is shown; the apparatus may specifically include the following modules:
a second sample sequence obtaining module 601, configured to obtain a sample sequence corresponding to video data of a game match, where the sample sequence includes samples and situation data corresponding to the samples; the samples are sampling points obtained by sampling a virtual object in the video data at preset time intervals, and the situation data are the match state data corresponding to the sampling points;
A second traversal module 602, configured to traverse the samples in the sample sequence, and determine the sample label corresponding to each sample according to the match state data corresponding to the sample; the sample label is used for representing decision semantic information for a target virtual object in the video data;
A third traversal module 603, configured to take the samples for which no sample label has been determined as target samples, and traverse the samples in the sample sequence again;
a second target label determining module 604, configured to obtain a sample label corresponding to a sample subsequent to the target sample in the sample sequence, as a target sample label corresponding to the target sample; the sample and the sample label corresponding to the sample, and the target sample and the sample label corresponding to the target sample are used for training a game decision model to be trained.
In an optional embodiment of the present invention, the second traversal module 602 is configured to obtain determination conditions and the determination labels corresponding to the determination conditions; traverse the samples in the sample sequence, and determine whether the match state data corresponding to each sample satisfies a determination condition; and when the match state data satisfies a determination condition, set the sample label of the sample to the determination label corresponding to that determination condition.
In an optional embodiment of the present invention, each determination condition has a corresponding priority, and the second traversal module 602 is configured to determine whether the match state data corresponding to the sample satisfies the determination conditions in order of priority from high to low, and, when the match state data corresponding to the sample satisfies a determination condition, set the sample label corresponding to the sample to the determination label corresponding to the satisfied determination condition with the highest priority.
In an optional embodiment of the present invention, the third traversal module 603 is configured to take the samples for which no sample label has been determined as target samples, and traverse the samples in the sample sequence again in back-to-front order.
In an optional embodiment of the invention, the second target label determining module 604 is configured to set the sample label of the target sample to a designated label when the target sample is the last sample in the sample sequence.
Since the apparatus embodiments are basically similar to the method embodiments, their description is relatively brief; for relevant details, refer to the corresponding parts of the description of the method embodiments.
An embodiment of the invention further discloses an electronic device, comprising a processor, a memory, and a computer program stored on the memory and executable on the processor, where the computer program, when executed by the processor, implements the steps of the above embodiments of the game decision model training method.
An embodiment of the invention further discloses a computer-readable storage medium on which a computer program is stored, where the computer program, when executed by a processor, implements the steps of the above embodiments of the game decision model training method.
The embodiments in the present specification are described in a progressive manner, each embodiment focuses on differences from other embodiments, and the same and similar parts among the embodiments are referred to each other.
As will be appreciated by one skilled in the art, embodiments of the present invention may be provided as a method, apparatus, or computer program product. Accordingly, embodiments of the present invention may take the form of an entirely hardware embodiment, an entirely software embodiment or an embodiment combining software and hardware aspects. Furthermore, embodiments of the present invention may take the form of a computer program product embodied on one or more computer-usable storage media (including, but not limited to, disk storage, CD-ROM, optical storage, and the like) having computer-usable program code embodied therein.
Embodiments of the present invention are described with reference to flowchart illustrations and/or block diagrams of methods, terminal devices (systems), and computer program products according to embodiments of the invention. It will be understood that each flow and/or block of the flow diagrams and/or block diagrams, and combinations of flows and/or blocks in the flow diagrams and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to a processor of a general purpose computer, special purpose computer, embedded processor, or other programmable data processing terminal to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing terminal, create means for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be stored in a computer-readable memory that can direct a computer or other programmable data processing apparatus to function in a particular manner, such that the instructions stored in the computer-readable memory produce an article of manufacture including instruction means which implement the function specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be loaded onto a computer or other programmable data processing terminal to cause a series of operational steps to be performed on the computer or other programmable terminal to produce a computer implemented process such that the instructions which execute on the computer or other programmable terminal provide steps for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
While preferred embodiments of the present invention have been described, additional variations and modifications of these embodiments may occur to those skilled in the art once they learn of the basic inventive concepts. Therefore, it is intended that the appended claims be interpreted as including preferred embodiments and all such alterations and modifications as fall within the scope of the embodiments of the invention.
Finally, it should also be noted that, herein, relational terms such as first and second are used solely to distinguish one entity or action from another, without necessarily requiring or implying any actual such relationship or order between such entities or actions. Moreover, the terms "comprises," "comprising," or any other variation thereof are intended to cover a non-exclusive inclusion, such that a process, method, article, or terminal device that comprises a list of elements includes not only those elements but may also include other elements not expressly listed or inherent to such process, method, article, or terminal device. Without further limitation, an element defined by the phrase "comprising a …" does not exclude the presence of additional identical elements in the process, method, article, or terminal device that comprises the element.
The game decision model training method and apparatus, electronic device, and storage medium provided by the present invention have been introduced in detail above. Specific examples are used herein to explain the principle and implementation of the invention, and the description of the above embodiments is only intended to help understand the method and its core idea. Meanwhile, for those skilled in the art, there may be variations in the specific implementation and application scope according to the idea of the present invention. In summary, the content of this specification should not be construed as limiting the present invention.

Claims (14)

1. A method of training a game decision model, the method comprising:
acquiring a sample sequence corresponding to video data of a game match, wherein the sample sequence comprises samples and situation data corresponding to the samples; the samples are sampling points obtained by sampling a virtual object in the video data at preset time intervals, and the situation data are match state data corresponding to the sampling points;
traversing the samples in the sample sequence, and determining sample labels corresponding to the samples according to the match state data corresponding to the samples; the sample labels are used for representing decision semantic information for a target virtual object in the video data;
when the sample label corresponding to a sample cannot be determined according to the match state data corresponding to the sample, taking the sample as a target sample; and
obtaining a sample label corresponding to a sample subsequent to the target sample in the sample sequence, and taking the sample label as a target sample label corresponding to the target sample; wherein the samples and the sample labels corresponding to the samples, and the target sample and the target sample label corresponding to the target sample, are used for training a game decision model to be trained.
2. The method of claim 1, wherein traversing the samples in the sample sequence and determining the sample labels corresponding to the samples according to the match state data corresponding to the samples comprises:
acquiring determination conditions and determination labels corresponding to the determination conditions;
traversing the samples in the sample sequence in back-to-front order, and determining whether the match state data corresponding to each sample satisfies a determination condition; and
when the match state data corresponding to the sample satisfies a determination condition, setting the sample label corresponding to the sample to the determination label corresponding to the determination condition.
3. The method of claim 2, wherein each determination condition has a corresponding priority, and determining whether the match state data corresponding to the sample satisfies the determination condition comprises:
determining whether the match state data corresponding to the sample satisfies the determination conditions in order of priority from high to low;
and setting the sample label corresponding to the sample to the determination label corresponding to the determination condition comprises:
when the match state data corresponding to the sample satisfies a determination condition, setting the sample label corresponding to the sample to the determination label corresponding to the satisfied determination condition with the highest priority.
4. The method of claim 2, wherein traversing the samples in the sample sequence and determining the sample labels corresponding to the samples according to the match state data corresponding to the samples further comprises:
when the match state data corresponding to the sample does not satisfy any determination condition, determining that the sample label corresponding to the sample cannot be determined according to the match state data corresponding to the sample.
5. The method according to claim 1, wherein the sample sequence is provided with a special sample whose corresponding sample label is a designated label, and before obtaining the sample label corresponding to the sample subsequent to the target sample in the sample sequence as the target sample label corresponding to the target sample, the method further comprises:
adding the special sample to the sample sequence as the last sample in the sample sequence when the target sample is the last sample in the sample sequence.
6. A method of training a game decision model, the method comprising:
acquiring a sample sequence corresponding to video data of a game match, wherein the sample sequence comprises samples and situation data corresponding to the samples; the samples are sampling points obtained by sampling a virtual object in the video data at preset time intervals, and the situation data are match state data corresponding to the sampling points;
traversing the samples in the sample sequence, and determining sample labels corresponding to the samples according to the match state data corresponding to the samples; the sample labels are used for representing decision semantic information for a target virtual object in the video data;
taking the samples for which no sample label has been determined as target samples, and traversing the samples in the sample sequence again; and
obtaining a sample label corresponding to a sample subsequent to the target sample in the sample sequence, and taking the sample label as a target sample label corresponding to the target sample; wherein the samples and the sample labels corresponding to the samples, and the target sample and the target sample label corresponding to the target sample, are used for training a game decision model to be trained.
7. The method of claim 6, wherein traversing the samples in the sample sequence and determining the sample labels corresponding to the samples according to the match state data corresponding to the samples comprises:
acquiring determination conditions and determination labels corresponding to the determination conditions;
traversing the samples in the sample sequence, and determining whether the match state data corresponding to each sample satisfies a determination condition; and
when the match state data satisfies a determination condition, setting the sample label of the sample to the determination label corresponding to the determination condition.
8. The method of claim 7, wherein each determination condition has a corresponding priority, and determining whether the match state data corresponding to the sample satisfies the determination condition comprises:
determining whether the match state data corresponding to the sample satisfies the determination conditions in order of priority from high to low;
and setting the sample label corresponding to the sample to the determination label corresponding to the determination condition comprises:
when the match state data corresponding to the sample satisfies a determination condition, setting the sample label corresponding to the sample to the determination label corresponding to the satisfied determination condition with the highest priority.
9. The method of claim 6, wherein taking the samples for which no sample label has been determined as target samples and traversing the samples in the sample sequence again comprises:
taking the samples for which no sample label has been determined as target samples, and traversing the samples in the sample sequence again in back-to-front order.
10. The method according to claim 9, wherein obtaining the sample label corresponding to the sample subsequent to the target sample in the sample sequence as the target sample label corresponding to the target sample comprises:
setting the sample label of the target sample to a designated label when the target sample is the last sample in the sample sequence.
11. An apparatus for training a game decision model, the apparatus comprising:
a first sample sequence obtaining module, configured to obtain a sample sequence corresponding to video data of a game match, wherein the sample sequence comprises samples and situation data corresponding to the samples; the samples are sampling points obtained by sampling a virtual object in the video data at preset time intervals, and the situation data are match state data corresponding to the sampling points;
a first traversal module, configured to traverse the samples in the sample sequence and determine sample labels corresponding to the samples according to the match state data corresponding to the samples; the sample labels are used for representing decision semantic information for a target virtual object in the video data;
a target sample determining module, configured to take a sample as a target sample when the sample label corresponding to the sample cannot be determined according to the match state data corresponding to the sample; and
a first target sample label determining module, configured to obtain a sample label corresponding to a sample subsequent to the target sample in the sample sequence as a target sample label corresponding to the target sample; wherein the samples and the sample labels corresponding to the samples, and the target sample and the target sample label corresponding to the target sample, are used for training a game decision model to be trained.
12. An apparatus for training a game decision model, the apparatus comprising:
a second sample sequence obtaining module, configured to obtain a sample sequence corresponding to video data of a game match, wherein the sample sequence comprises samples and situation data corresponding to the samples; the samples are sampling points obtained by sampling a virtual object in the video data at preset time intervals, and the situation data are match state data corresponding to the sampling points;
a second traversal module, configured to traverse the samples in the sample sequence and determine sample labels corresponding to the samples according to the match state data corresponding to the samples; the sample labels are used for representing decision semantic information for a target virtual object in the video data;
a third traversal module, configured to take the samples for which no sample label has been determined as target samples and traverse the samples in the sample sequence again; and
a second target label determining module, configured to obtain a sample label corresponding to a sample subsequent to the target sample in the sample sequence as a target sample label corresponding to the target sample; wherein the samples and the sample labels corresponding to the samples, and the target sample and the target sample label corresponding to the target sample, are used for training a game decision model to be trained.
13. An electronic device comprising a processor, a memory, and a computer program stored on the memory and executable on the processor, wherein the computer program, when executed by the processor, implements the steps of the method of training a game decision model according to any one of claims 1 to 10.
14. A computer-readable storage medium on which a computer program is stored, wherein the computer program, when executed by a processor, implements the steps of the method of training a game decision model according to any one of claims 1 to 10.
CN202010507435.0A 2020-06-05 2020-06-05 Game decision model training method and device, electronic equipment and storage medium Active CN111569430B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010507435.0A CN111569430B (en) 2020-06-05 2020-06-05 Game decision model training method and device, electronic equipment and storage medium

Publications (2)

Publication Number Publication Date
CN111569430A CN111569430A (en) 2020-08-25
CN111569430B true CN111569430B (en) 2023-04-07

Family

ID=72109879

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010507435.0A Active CN111569430B (en) 2020-06-05 2020-06-05 Game decision model training method and device, electronic equipment and storage medium

Country Status (1)

Country Link
CN (1) CN111569430B (en)

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110251942A (en) * 2019-06-04 2019-09-20 腾讯科技(成都)有限公司 Control the method and device of virtual role in scene of game
CN110339569A (en) * 2019-07-08 2019-10-18 深圳市腾讯网域计算机网络有限公司 Control the method and device of virtual role in scene of game
CN111111220A (en) * 2020-03-26 2020-05-08 腾讯科技(深圳)有限公司 Self-chess-playing model training method and device for multiplayer battle game and computer equipment

Also Published As

Publication number Publication date
CN111569430A (en) 2020-08-25

Similar Documents

Publication Publication Date Title
CN109499068B (en) Object control method and device, storage medium and electronic device
Lu et al. Fighting game artificial intelligence competition platform
CN111111204B (en) Interactive model training method and device, computer equipment and storage medium
US7636701B2 (en) Query controlled behavior models as components of intelligent agents
Andrade et al. Extending reinforcement learning to provide dynamic game balancing
Ponsen et al. Improving adaptive game AI with evolutionary learning
Andrade et al. Challenge-sensitive action selection: an application to game balancing
CN110119547B (en) Method, device and control equipment for predicting group war victory or defeat
US20200324206A1 (en) Method and system for assisting game-play of a user using artificial intelligence (ai)
CN112791394A (en) Game model training method and device, electronic equipment and storage medium
CN111494959A (en) Game control method and device, electronic equipment and computer readable storage medium
CN112704882B (en) Method, system, medium, and apparatus for model-based chess and card game strategy update
CN111569429A (en) Model training method, model using method, computer device and storage medium
CN111841018A (en) Model training method, model using method, computer device and storage medium
CN113262488A (en) Control method, device and equipment for virtual object in virtual scene and storage medium
Pocius et al. Strategic tasks for explainable reinforcement learning
CN112685921B (en) Mahjong intelligent decision method, system and equipment for efficient and accurate search
CN114404975A (en) Method, device, equipment, storage medium and program product for training decision model
CN111569430B (en) Game decision model training method and device, electronic equipment and storage medium
CN109999497B (en) Control method and device of virtual object, storage medium and electronic device
Marius et al. Combining scripted behavior with game tree search for stronger, more robust game AI
CN112712179A (en) Model training method, server and terminal equipment
US11704980B2 (en) Method, apparatus, and computer storage medium for outputting virtual application object
CN111437605B (en) Method for determining virtual object behaviors and hosting virtual object behaviors
CN115501600B (en) Method, system, device and medium for controlling man-machine roles in game

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant