JP4804226B2 - Shooting game processing method, apparatus thereof, program thereof, and recording medium thereof

Info

Publication number: JP4804226B2
Application number: JP2006147033A
Authority: JP
Inventors: Hiroshi Kawano (川野 洋)
Original assignee: Nippon Telegraph and Telephone Corp
Current assignee: Nippon Telegraph and Telephone Corp
Priority date: 2006-05-26
Filing date: 2006-05-26
Publication date: 2011-11-02
Anticipated expiration: 2026-05-26
Also published as: JP2007313104A

Description

The present invention relates to a shooting game processing method in which an own aircraft and enemy aircraft are displayed on a screen and the user operates the own aircraft to fight enemy aircraft that attack it, and to an apparatus, a program, and a recording medium therefor.

Conventional methods of processing a shooting game, and in particular of controlling the movement of enemy aircraft, include algorithms that ignore the position of the own aircraft: for example, an algorithm in which the enemy aircraft keeps descending in a fixed direction regardless of where the own aircraft is, or one in which the enemy aircraft remains stationary regardless of where the own aircraft is. Algorithms that do take the position of the own aircraft into account include, for example, one in which the enemy aircraft moves toward the own aircraft and one in which the enemy aircraft moves so as to always keep a fixed distance and direction from the own aircraft.
For non-action thinking games such as chess, Othello, and shogi, algorithms incorporating artificial intelligence techniques have been studied (see, for example, Non-Patent Document 1).
Non-Patent Document 1: Jonathan Schaeffer, H. Jaap van den Herik, "Games, computers, and artificial intelligence", Artificial Intelligence, 2002, Vol. 134, pp. 1-7
Non-Patent Document 2: Hiroshi Kawano, "Operation Planning and Control of Underactuated Underwater Robot Considering Navigation in Unknown and Uneven Currents", JSAI2005, Japanese Society for Artificial Intelligence (19th), 1D1-04, 2005

However, the enemy behavior control algorithms of the prior art are fixed in advance by the program, and none of them behaves intelligently, for example by deliberately evading the own aircraft's attacks or by steering the enemy aircraft to a position that is advantageous for attacking the own aircraft.
Nor was the enemy aircraft's bullet-firing algorithm updated automatically to reflect each player's habits in operating the own aircraft. As a result, once a user had mastered the game, the user quickly became bored with it.
Introducing artificial intelligence techniques is considered effective for solving these problems. However, current game research in artificial intelligence centers on non-action thinking games such as chess, Othello, and shogi, and there has been no attempt to apply artificial intelligence techniques to action games such as shooting games.

Moreover, research on artificial intelligence techniques for thinking games such as chess, Othello, and shogi has concentrated on methods for performing deep search efficiently (see, for example, Non-Patent Document 1). Search techniques are of course also important for developing intelligent enemy behavior algorithms in shooting games. In a shooting game, however, the required search depth is small; what matters instead is achieving the level of real-time performance that an action game demands. No existing technique met that requirement.

According to the present invention, in a shooting game processing method that selects an action of an enemy aircraft using a Markov state transition model in which the position of the enemy aircraft relative to the own aircraft is a state variable and the moving speed of the enemy aircraft is an action variable, displacement amount calculating means obtains the displacement of the relative position for each combination of an enemy aircraft action and an own aircraft action. First state transition probability calculating means takes the grid cell that represents a given state of the Markov state transition model, and that has the same dimensions as the relative position constituting that state, translates it by the displacement obtained for each combination of an enemy aircraft action and an own aircraft action, and obtains, as the state transition probability for that combination, a probability proportional to the area of its common part with other grid cells. Multiplication means multiplies the state transition probability for each combination of an enemy aircraft action and an own aircraft action by 1 divided by the number of action types of the own aircraft.
Second state transition probability calculating means sums the values obtained in the multiplication over all actions of the own aircraft to obtain the state transition probability for each action of the enemy aircraft. Reward determining means determines a reward for each combination of state, action, and transition destination state. Motion planning means obtains enemy aircraft action policy data from those state transition probabilities and rewards by a motion planning method for the Markov state transition model. State acquisition means acquires the current state of the enemy aircraft. Enemy aircraft action selection means selects an action of the enemy aircraft based on the acquired current state and the enemy aircraft action policy data.

According to the present invention, enemy aircraft in a shooting game behave more intelligently, yielding a shooting game with greater play value than before. The enemy behavior algorithm can also be updated to reflect the player's habits in operating the own aircraft, so that a game which players tire of less easily can be realized.

[Theoretical background]
<Shooting game>
FIG. 3 illustrates a shooting game to which the processing of the present invention is applied. The screen shows the own aircraft 1 and its attack bullets 3, and the enemy aircraft 2 and its attack bullets 4. There may be a plurality of attack bullets 3, enemy aircraft 2, and attack bullets 4. The own aircraft 1, enemy aircraft 2, attack bullets 3, and attack bullets 4 are located on a two-dimensional plane defined by an X axis and a Y axis.
The user operates the own aircraft 1 to fight the enemy aircraft 2. With a single operation, the user can move the own aircraft for one action unit time T at any of W velocities (direction and speed). The W velocities (Vsx(w), Vsy(w)) available to the own aircraft 1 are determined in advance, and each is assigned a moving speed number w (w = 1, ..., W). Hereinafter, the action of selecting moving speed number w is referred to as action (w). While moving at one of the W velocities, the user can also fire an attack bullet 3 at the enemy aircraft 2.

In each action unit time T, the enemy aircraft 2 can move at any of K velocities (direction and speed). The K velocities (Vex(k), Vey(k)) available to the enemy aircraft 2 are determined in advance, and each is assigned a moving speed number k (k = 1, ..., K). The enemy aircraft 2 can also fire an attack bullet 4 while moving. The attack bullet firing flag is denoted lau: the enemy aircraft 2 fires an attack bullet 4 when lau = 1 and does not fire when lau = 0.
When the own aircraft 1 collides with an enemy aircraft 2 or with an attack bullet 4 fired by an enemy aircraft 2, the own aircraft 1 is destroyed and the game is over. Conversely, when an attack bullet 3 of the own aircraft 1 hits an enemy aircraft 2, that enemy aircraft 2 is destroyed. When all enemy aircraft 2 have been destroyed by the attack bullets 3 of the own aircraft 1, the user wins the game.
In the present invention, the movement of the enemy aircraft 2 in such a shooting game is controlled by a motion planning method that uses the Markov state transition model described below.

<Markov state transition model>
Next, the Markov state transition model and the motion planning method based on it, which are prerequisite knowledge for the present invention, are described.
A Markov state transition model represents the environment as follows. Let S = {s_1, s_2, ..., s_n} be the set of discrete states the environment can take, and let A = {a_1, a_2, ..., a_m} be the set of actions the agent can take. When the agent executes an action a in a state s ∈ S, the environment transitions stochastically to a state s' ∈ S. The transition probability is written
P(s, s', a) = Pr{s_{t+1} = s' | s_t = s, a_t = a}
At this time the environment gives the agent a reward r stochastically; its expected value is
R(s, s', a) = E{r_t | s_t = s, a_t = a, s_{t+1} = s'}
The agent's decision at each time step is expressed by the policy function
π(s, a) = Pr{a_t = a | s_t = s}
π(s, a) is defined for every state s and every action a. The policy function π(s, a) is also simply called the policy π.
If the transition probabilities P(s, s', a) and rewards R(s, s', a) are known for all combinations of state s, action a, and transition destination state s', the policy π can be computed by the dynamic programming method (see, for example, Reference 1 below). Dynamic programming is a well-known technique, so its description is omitted.
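As an illustration only, the planning step can be sketched with value iteration, which is one common dynamic programming method; the text does not say which variant is intended. The discount factor `gamma`, the tolerance `tol`, the dictionary layout of `P` and `R`, and the toy two-state model are all assumptions made for this sketch.

```python
def value_iteration(states, actions, P, R, gamma=0.9, tol=1e-8):
    """Greedy policy for a finite Markov model (illustrative sketch).

    P[(s, a)] maps each successor s2 to P(s, s2, a).
    R[(s, s2, a)] is the expected reward R(s, s2, a); missing keys mean 0.
    gamma and tol are assumed parameters, not taken from the source.
    """
    def q(s, a, V):
        # Expected return of taking action a in state s under values V.
        return sum(p * (R.get((s, s2, a), 0.0) + gamma * V[s2])
                   for s2, p in P[(s, a)].items())

    V = {s: 0.0 for s in states}
    while True:
        delta = 0.0
        for s in states:
            best = max(q(s, a, V) for a in actions)
            delta = max(delta, abs(best - V[s]))
            V[s] = best
        if delta < tol:
            break
    # Deterministic policy: pick the highest-valued action in each state.
    policy = {s: max(actions, key=lambda a: q(s, a, V)) for s in states}
    return V, policy


# Hypothetical toy model: the enemy is "far" from or "near" the own aircraft.
states = ["far", "near"]
actions = ["stay", "approach"]
P = {
    ("far", "stay"): {"far": 1.0},
    ("far", "approach"): {"near": 1.0},
    ("near", "stay"): {"near": 1.0},
    ("near", "approach"): {"near": 1.0},
}
R = {(s, "near", a): 1.0 for s in states for a in actions}
V, policy = value_iteration(states, actions, P, R)
print(policy["far"])
```

Here the policy is deterministic (one action per state), which is a special case of the stochastic policy function π(s, a) defined above.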

From the policy π computed by the dynamic programming method, the optimal action a in each state s can be determined. This concludes the outline of the motion planning method using the Markov state transition model.
(Reference 1) R. S. Sutton and A. G. Barto, "Reinforcement Learning" (Japanese edition 「強化学習」, translated by Sadayoshi Mikami and Masaaki Minagawa), Morikita Publishing, 1998, pp. 94-118
To apply this motion planning method to the shooting game described above, the embodiment below constructs the Markov space of the Markov state transition model from two variables: the position (X, Y) of the enemy aircraft relative to the own aircraft, and the number B of attack bullets the enemy aircraft has left. That is, a state s is expressed discretely by (X, Y) and B. The Markov space may instead be constructed from the relative position (X, Y) alone. If the enemy character is something that travels along its own heading, such as an automobile, the Markov space may also be constructed using the azimuth angle of the enemy aircraft. In other words, variables other than the position relative to the own aircraft may be used to construct the Markov space.

In the embodiment described below, an action a in the Markov space consists of two variables: the moving speed number k of the enemy aircraft and the attack bullet firing flag lau. When the Markov space is constructed from the relative position (X, Y) alone, the action a may consist of the moving speed number k alone.

[Embodiment]
A shooting game processing apparatus 100 according to the present invention is described with reference to FIGS. 1 and 2. FIG. 1 illustrates the functional configuration of the shooting game processing apparatus 100. FIG. 2 illustrates its processing flow.
The shooting game processing apparatus 100 comprises, for example, a storage unit 10, a state transition probability calculation unit 20, a reward determination unit 30, a motion planning unit 40, enemy aircraft action policy data 50, an action determination unit 60, and a display unit 70.
The state transition probability calculation unit 20 comprises, for example, a displacement amount calculation unit 21, a first state transition probability calculation unit 22, a multiplication unit 23, and a second state transition probability calculation unit 24. The reward determination unit 30 comprises, for example, a distance calculation unit 31, a hit rate calculation unit 32, a first line-of-fire determination unit 33, a second line-of-fire determination unit 34, a setting state detection unit 35, a temporary storage unit 38, and an input unit 39. The action determination unit 60 comprises, for example, a state acquisition unit 61, an enemy aircraft action selection unit 62, and an enemy aircraft position calculation unit 63.

<Step S1>
The displacement amount calculation unit 21 of the state transition probability calculation unit 20 obtains, for each combination of an enemy aircraft action (k, lau) and an own aircraft action (w), the displacement (dX, dY) of the enemy aircraft's position relative to the own aircraft. Specifically, the displacement amount calculation unit 21 reads from the storage unit 10 the velocity (Vex(k), Vey(k)) of the enemy aircraft under action (k, lau), the velocity (Vsx(w), Vsy(w)) of the own aircraft under action (w), and the action unit time T common to both aircraft, and calculates
dX = (Vex(k) − Vsx(w)) × T
dY = (Vey(k) − Vsy(w)) × T
for all combinations of enemy aircraft action (k, lau) and own aircraft action (w). The calculated displacements (dX, dY) are output to the first state transition probability calculation unit 22.
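The displacement calculation of this step can be sketched as follows. This is a minimal illustration, assuming the velocity tables are given as Python lists of (vx, vy) pairs; the function name and the sample velocities are hypothetical. Since the firing flag lau does not affect the velocity, the result is keyed by (k, w) only.

```python
def relative_displacements(enemy_velocities, own_velocities, T):
    """Return {(k, w): (dX, dY)} for every enemy/own action pair.

    enemy_velocities[k-1] = (Vex(k), Vey(k)), own_velocities[w-1] = (Vsx(w), Vsy(w)).
    dX = (Vex(k) - Vsx(w)) * T and dY = (Vey(k) - Vsy(w)) * T.
    """
    return {
        (k, w): ((vex - vsx) * T, (vey - vsy) * T)
        for k, (vex, vey) in enumerate(enemy_velocities, start=1)
        for w, (vsx, vsy) in enumerate(own_velocities, start=1)
    }


# Hypothetical velocity tables (pixels per unit time) and action unit time.
enemy_v = [(0.0, -2.0), (1.0, 0.0)]   # K = 2
own_v = [(1.0, 0.0), (0.0, 0.0)]      # W = 2
d = relative_displacements(enemy_v, own_v, T=0.5)
print(d[(1, 1)])  # displacement when the enemy takes k = 1 and the own aircraft takes w = 1
```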

<Step S2>
The first state transition probability calculation unit 22 calculates the state transition probability Pe(w, k, lau, s, s') for every combination of state s(X, Y, B), actions (w, k, lau), and transition destination state s'(X', Y', B').
As shown in FIG. 4, each state s(X, Y, B) is represented by a two-dimensional grid cell spanned by the X and Y coordinates of the enemy aircraft's position relative to the own aircraft. Assume that an enemy aircraft in state s(X, Y, B) is located at every point within the corresponding cell with equal probability. Let g be the cell representing state s(X, Y, B), and let gd be the cell obtained by translating g by the displacement (dX, dY). Under this assumption, the state transition probability Pe(w, k, lau, s, s') when the enemy aircraft in state s takes action (k, lau) and the own aircraft takes action (w) can be taken as proportional to the area in which the translated cell gd overlaps each grid cell.

That is, letting n be an integer from 1 to 4, gn' the cell representing state sn', and D(gd, gn') the area of the overlap between cell gd and cell gn', the first state transition probability calculation unit 22 obtains the probability of a transition to state sn' by calculating
D(gd, gn') / Σ_n D(gd, gn') ... (1)
When cell gd overlaps two cells gn', expression (1) is evaluated with n running from 1 to 2 to obtain the state transition probability Pe(w, k, lau, s, s'). When cell gd overlaps only one cell gn', expression (1) is evaluated with n = 1.

For example, suppose that translating the cell g representing state s by the displacement (dX, dY), which results when the enemy aircraft in state s takes action (k, lau) and the own aircraft takes action (w), yields a cell gd that overlaps four cells g1' to g4', with overlap areas D(gd, g1') = 6, D(gd, g2') = 6, D(gd, g3') = 1, and D(gd, g4') = 1. Then Σ_n D(gd, gn') = 14, and the probabilities that the enemy aircraft transitions to states s1', s2', s3', and s4' are 6/14 = 0.428..., 6/14 = 0.428..., 1/14 = 0.071..., and 1/14 = 0.071..., respectively.

For each state sn' whose cell gn' overlaps the cell gd translated by (dX, dY), the first state transition probability calculation unit 22 obtains the state transition probability Pe(w, k, lau, s, s') by evaluating expression (1). For every other state s', the probability Pe(w, k, lau, s, s') is set to 0.
The state transition probabilities Pe(w, k, lau, s, s') calculated in this way are output to the multiplication unit 23.
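The overlap-area rule of expression (1) can be sketched as below. This is a minimal illustration that assumes a uniform grid in which cell (i, j) covers [i·cell_w, (i+1)·cell_w] × [j·cell_h, (j+1)·cell_h]; the grid layout, the function name, and the cell size are assumptions. Only the (X, Y) part of the state is handled; the remaining-bullet count B is left out. The cell size and displacement in the example are chosen so that the overlap areas are 6, 6, 1, and 1, matching the worked example above.

```python
import math


def overlap_transition_probs(i, j, dX, dY, cell_w=1.0, cell_h=1.0):
    """Return {(ci, cj): probability} for cell (i, j) translated by (dX, dY).

    Each probability is the overlap area with cell (ci, cj) divided by the
    total overlap area, as in expression (1). Cells with no overlap are
    omitted, which corresponds to a transition probability of 0.
    """
    x0, y0 = i * cell_w + dX, j * cell_h + dY      # translated cell gd
    x1, y1 = x0 + cell_w, y0 + cell_h
    areas = {}
    for ci in range(math.floor(x0 / cell_w), math.ceil(x1 / cell_w)):
        for cj in range(math.floor(y0 / cell_h), math.ceil(y1 / cell_h)):
            ox = min(x1, (ci + 1) * cell_w) - max(x0, ci * cell_w)
            oy = min(y1, (cj + 1) * cell_h) - max(y0, cj * cell_h)
            if ox > 0 and oy > 0:
                areas[(ci, cj)] = ox * oy
    total = sum(areas.values())
    return {cell: area / total for cell, area in areas.items()}


# Cell of 4 x 3.5 units shifted by (2, 0.5): overlap areas are 6, 6, 1, 1.
probs = overlap_transition_probs(0, 0, dX=2.0, dY=0.5, cell_w=4.0, cell_h=3.5)
for cell, p in sorted(probs.items()):
    print(cell, round(p, 3))
```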

<Step S3>
The multiplication unit 23 multiplies the state transition probability Pe(w, k, lau, s, s') by the probability Pv(s, w) that the own aircraft selects action (w). The product Pv(s, w)·Pe(w, k, lau, s, s') is output to the second state transition probability calculation unit 24.
Absent any particular data, Pv(s, w) can be set to 1 divided by the number W of moving speed types of the own aircraft, that is, Pv(s, w) = 1/W.

<Step S4>
The second state transition probability calculation unit 24 sums Pv(s, w)·Pe(w, k, lau, s, s') over the own aircraft actions (w) to obtain the state transition probability P(s, s', k, lau) that an enemy aircraft in state s transitions to state s' when it takes action (k, lau). That is, the second state transition probability calculation unit 24 calculates
P(s, s', k, lau) = Σ_{w=1}^{W} Pv(s, w)·Pe(w, k, lau, s, s')
The calculated state transition probabilities P(s, s', k, lau) are output to the motion planning unit 40.
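Steps S3 and S4 together amount to averaging out the unknown action of the own aircraft. A minimal sketch for one fixed state s and one fixed enemy action (k, lau) follows; the nested-dictionary layout and the function name are assumptions, and the default Pv(s, w) = 1/W is the uniform choice the text suggests when no data on the player is available.

```python
def marginalize_over_own_actions(Pe, Pv=None):
    """Combine per-own-action transition distributions into P(s, s', k, lau).

    Pe[w] maps each successor state s' to Pe(w, k, lau, s, s') for a fixed
    state s and enemy action (k, lau). Pv[w] is the probability that the own
    aircraft selects action w in state s; it defaults to 1/W.
    """
    if Pv is None:
        Pv = {w: 1.0 / len(Pe) for w in Pe}
    P = {}
    for w, dist in Pe.items():
        for s_next, p in dist.items():
            P[s_next] = P.get(s_next, 0.0) + Pv[w] * p
    return P


# Hypothetical example with W = 2 own-aircraft actions.
Pe = {1: {"s1": 1.0}, 2: {"s1": 0.5, "s2": 0.5}}
print(marginalize_over_own_actions(Pe))
```

Replacing the uniform `Pv` with frequencies observed from a particular player is one way the model could reflect that player's habits, as mentioned earlier in the description.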

<Step S5>
The reward determination unit 30 determines, for every combination of state s, action (k, lau), and transition destination state s', the reward r(s, s', k, lau) given when the enemy aircraft in state s takes action (k, lau) and transitions to state s'.
The reward r(s, s', k, lau) can be set freely by the user. For example, it can be set as follows.

≪How to determine reward r: method 1≫
The first line-of-fire determination unit 33 determines, from the relative position (X', Y') of an enemy aircraft that has transitioned to state s'(X', Y', B') by selecting action (k, lau), whether the own aircraft is on the enemy aircraft's line of fire. Letting (cos θi, sin θi) be the flight direction of attack bullet i of the enemy aircraft, this can be determined by whether the grid cell at relative position (X', Y') contains part of the line (t × cos θi, t × sin θi), where t is a parameter along the line.

The first line-of-fire determination unit 33 reads the preset flight direction (cos θi, sin θi) of attack bullet i of the enemy aircraft from the storage unit 10 and determines whether the grid cell at relative position (X', Y') contains part of the line (t × cos θi, t × sin θi). When the own aircraft is determined to be on the line of fire of the enemy aircraft located at relative position (X', Y') after the transition, the first line-of-fire determination unit 33 sets the reward r(s, s', k, lau) for that combination of state s, action (k, lau), and transition destination state s' higher than when the own aircraft is not on that line of fire.
Setting the reward in this way yields a policy π under which the enemy aircraft acts so as to place the own aircraft on its line of fire.

≪How to determine reward r: method 2≫
The second line-of-fire determination unit 34 determines whether the relative position (X', Y') of an enemy aircraft that has transitioned to state s'(X', Y', B') by selecting action (k, lau) is on the own aircraft's line of fire. Letting (cos θj, sin θj) be the flight direction of attack bullet j of the own aircraft, this can be determined by whether the grid cell at relative position (X', Y') contains part of the line (t × cos θj, t × sin θj), where t is a parameter along the line.

The second line-of-fire determination unit 34 reads the preset flight direction (cos θj, sin θj) of attack bullet j of the own aircraft from the storage unit 10 and determines whether the grid cell at relative position (X', Y') contains part of the line (t × cos θj, t × sin θj). When the relative position (X', Y') of the enemy aircraft after the transition is determined to be on the own aircraft's line of fire, the second line-of-fire determination unit 34 sets the reward r(s, s', k, lau) for that combination of state s, action (k, lau), and transition destination state s' lower than when the position is not on that line of fire.
Setting the reward in this way yields a policy π under which the enemy aircraft acts so as to stay out of the own aircraft's line of fire.
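The line-of-fire tests in methods 1 and 2 both reduce to asking whether a grid cell contains some point (t × cos θ, t × sin θ). A minimal sketch follows, assuming the cell is an axis-aligned rectangle in relative coordinates, the line starts at the origin, and t ≥ 0; the text does not fix these conventions, so they are assumptions, as are the function names and the reward values 1.0 and 0.0.

```python
import math


def ray_hits_cell(theta, x_min, x_max, y_min, y_max, eps=1e-12):
    """True if the ray (t*cos(theta), t*sin(theta)), t >= 0, meets the cell.

    Uses the standard slab test for a ray against an axis-aligned rectangle.
    """
    t_lo, t_hi = 0.0, math.inf
    for d, lo, hi in ((math.cos(theta), x_min, x_max),
                      (math.sin(theta), y_min, y_max)):
        if abs(d) < eps:
            # Ray is parallel to this axis: its fixed coordinate 0 must lie in the slab.
            if not (lo <= 0.0 <= hi):
                return False
        else:
            t1, t2 = sorted((lo / d, hi / d))
            t_lo, t_hi = max(t_lo, t1), min(t_hi, t2)
            if t_lo > t_hi:
                return False
    return True


def line_of_fire_reward(theta, cell, on_line=1.0, off_line=0.0):
    """Method 1 style reward: higher when the cell lies on the line of fire.

    For method 2, pass on_line lower than off_line instead.
    """
    return on_line if ray_hits_cell(theta, *cell) else off_line


# Bullets fly straight along +Y (theta = pi/2); the cell is (x_min, x_max, y_min, y_max).
print(line_of_fire_reward(math.pi / 2, (-1.0, 1.0, 3.0, 5.0)))
print(line_of_fire_reward(math.pi / 2, (2.0, 4.0, 3.0, 5.0)))
```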

≪How to determine reward r: method 3≫
The distance calculation unit 31 calculates the distance r between the enemy aircraft and the own aircraft from the relative position (X', Y') of the enemy aircraft in the transition destination state s'. The smaller this distance r, the higher the reward r(s, s', k, lau) for that state s, action (k, lau), and transition destination state s' can be set.
Setting the reward in this way yields a policy π under which the enemy aircraft selects actions that bring it closer to the own aircraft.

≪How to determine reward r: method 4≫
The hit rate calculation unit 32 calculates, from the relative position (X, Y) of the enemy aircraft in state s, the probability that an attack bullet of the enemy aircraft hits the own aircraft. The result can be used as the reward r(s, s', k, lau) for the case where the enemy aircraft in state s takes an action that fires an attack bullet (k, lau = 1).

Here, when the distance between the own aircraft and the enemy aircraft is r, the potential probability Ps that the enemy aircraft shoots down the own aircraft with an attack bullet is considered to be given by
Ps = A exp(−a r) … (2)
where A and a are positive constants. For example, let S be half the sum of the size of the own aircraft and the size of the enemy's attack bullet, L the length of the screen, and U an arbitrary real number from 0.1 to 0.5. Setting A to a value from 0.8 to 1.0 and
a = U × (L / S)^(−1.5)
has been found empirically to give an accurate estimate of the probability Ps of the own aircraft being shot down. More specifically, when the game screen is about 600 pixels high and 400 pixels wide, the own aircraft measures 32 × 32 pixels, and the enemy aircraft's attack bullet measures 4 pixels, A = 0.8 to 1.0 and a = 0.001 to 0.01 are suitable. According to equation (2), the potential probability Ps that the enemy aircraft shoots down the own aircraft with an attack bullet rises exponentially as the distance r between the enemy aircraft and the own aircraft decreases, as shown in FIG. 5.
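Equation (2) and the expression for the constant a can be evaluated directly. The following is a small sketch, assuming that the "length of the screen" L is the 600-pixel dimension, that the "size" of the own aircraft is 32 pixels, and that U = 0.3 is chosen from the stated range; these choices and the function names are illustrative only.

```python
import math

def decay_constant(U, L, S):
    """Decay constant a = U * (L / S) ** -1.5 from the text."""
    return U * (L / S) ** -1.5

def hit_probability(r, A, a):
    """Equation (2): Ps = A * exp(-a * r)."""
    return A * math.exp(-a * r)

# Assumed example values: 600-pixel screen length, 32-pixel own aircraft,
# 4-pixel attack bullet, and U picked from the stated 0.1 to 0.5 range.
S = (32 + 4) / 2
a = decay_constant(U=0.3, L=600, S=S)

for r in (0, 100, 300, 600):
    print(f"r={r:3d}  Ps={hit_probability(r, A=1.0, a=a):.3f}")
```

At r = 0 the probability equals A, and it falls monotonically as the enemy aircraft moves away, which is what makes Ps usable as a reward for firing at close range.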

The distance calculation unit 31 calculates the distance r between the enemy aircraft and the own aircraft in state s from the relative position (X, Y) of the enemy aircraft with respect to the own aircraft in state s.
The hit rate calculation unit 32 evaluates equation (2) using the distance r calculated by the distance calculation unit 31 and the constants A and a read from the storage unit 10, thereby obtaining the hit rate Ps of the enemy aircraft's attack bullet against the own aircraft. The hit rate calculation unit 32 sets this Ps as the reward r(s, s′, k, lau) for the case where the enemy aircraft selects the action (k, lau = 1) in state s and transitions to state s′.
Determining the reward in this way yields a policy π that selects the action of firing an attack bullet only after the enemy aircraft has come sufficiently close to the own aircraft.

≪Reward r determination method 5≫
By input through the input unit 39, the user can freely set at which relative position (X, Y) the enemy aircraft fires an attack bullet for each value of its remaining attack bullet count B. The setting state detection unit 35 of the reward determination unit 30 detects the relative position (X, Y) predetermined by the user setting for each remaining attack bullet count B, and sets the reward r(s, s′, k, lau) for the combination of that state s(X, Y, B), the action of firing an attack bullet (k, lau = 1), and the transition destination state s′(X′, Y′, B′) higher than the reward r(s, s′, k, lau) for other combinations of state s, action (k, lau), and transition destination state s′.

For example, the reward r(s, s′, k, lau) is set to 1 for the case where, in state s1 (X = 3, Y = 1, B = 2), the enemy aircraft takes the action of firing an attack bullet (k, lau = 1) and transitions to state s′, and is set to 0 for all other combinations of state s, action (k, lau), and transition destination state s′ in which the remaining attack bullet count is B = 2. Likewise, the reward r(s, s′, k, lau) is set to 1 for the case where, in state s2 (X = −3, Y = 1, B = 1), the enemy aircraft takes the action of firing an attack bullet (k, lau = 1) and transitions to state s′, and is set to 0 for all other combinations of state s, action (k, lau), and transition destination state s′ in which the remaining attack bullet count is B = 1.

With the reward set in this way, a policy π is obtained under which the enemy aircraft, starting with B = 2 remaining attack bullets, first moves to the relative position (X = 3, Y = 1) and fires an attack bullet there, and then moves to the relative position (X = −3, Y = 1) and fires an attack bullet there.
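The user-configured reward of method 5 amounts to a lookup keyed by the remaining attack bullet count B. A minimal sketch, assuming states are tuples (X, Y, B), actions are tuples (k, lau), and the reward does not depend on the transition destination state; `make_firing_reward` and `firing_points` are hypothetical names:

```python
def make_firing_reward(firing_points):
    """Build a reward function from user settings.

    firing_points maps the remaining attack bullet count B to the relative
    position (X, Y) at which firing should be rewarded.
    """
    def reward(state, action):
        x, y, b = state
        _k, lau = action
        # Reward 1 only for firing (lau == 1) at the position configured for B.
        if lau == 1 and firing_points.get(b) == (x, y):
            return 1.0
        return 0.0
    return reward

# Settings from the example in the text: fire at (3, 1) with B = 2,
# then at (-3, 1) with B = 1.
reward = make_firing_reward({2: (3, 1), 1: (-3, 1)})
print(reward((3, 1, 2), (0, 1)))   # firing at the configured position
print(reward((3, 1, 2), (0, 0)))   # same position, not firing
```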

≪Reward determination method 6≫
The reward may also be determined by combining the methods described in reward determination methods 1 to 5.
For example, the second ray determination unit 34 first uses the method described in reward determination method 2 to set the reward r(s, s′, k, lau) to −1 for combinations of state s, action (k, lau), and transition destination state s′ in which the relative position (X′, Y′) of the enemy aircraft after the transition is on the own aircraft's line of fire. The hit rate calculation unit 32 can then use the method described in reward determination method 4 to determine the reward r(s, s′, k, lau) for the remaining combinations of state s, action (k, lau), and transition destination state s′.
Alternatively, suppose that a reward r(s, s′, k, lau) determined by any one of the reward determination unit 30, the distance calculation unit 31, the hit rate calculation unit 32, the first ray determination unit 33, the second ray determination unit 34, and the setting state detection unit 35 is stored in the temporary storage unit 38. In that case, any of these units other than the one that made the determination may read the reward r(s, s′, k, lau) from the temporary storage unit 38 and correct it.

For example, when the second ray determination unit 34 determines that the relative position (X′, Y′) of the enemy aircraft after the transition is on the own aircraft's line of fire, it can set the reward r(s, s′, k, lau) for that combination of state s, action (k, lau), and transition destination state s′ to the reward r(s, s′, k, lau) read from the temporary storage unit 38 multiplied by 0.5. When it determines that the position is not on the own aircraft's line of fire, it can set the reward to the value read from the temporary storage unit 38 multiplied by 1.
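The correction scheme can be sketched as a base reward followed by a multiplicative factor. In this sketch the base reward is the hit probability of method 4, and the line-of-fire test is passed in as a predicate; the default predicate (own aircraft firing straight up along X = 0) and the default constants A and a are assumptions made only for illustration.

```python
import math

def combined_reward(state, action, next_state, A=1.0, a=0.005,
                    in_line_of_fire=lambda x, y: x == 0 and y > 0):
    """Base reward from method 4, then corrected as in the 0.5 / 1 example."""
    x, y, _b = state
    _k, lau = action
    x_next, y_next, _b_next = next_state
    # Method 4: the hit probability Ps = A * exp(-a * r) is the reward for firing.
    base = A * math.exp(-a * math.hypot(x, y)) if lau == 1 else 0.0
    # Correction: halve the stored reward if the destination is on the line of fire.
    factor = 0.5 if in_line_of_fire(x_next, y_next) else 1.0
    return base * factor

print(combined_reward((3, 4, 1), (0, 1), (3, 3, 0)))  # destination off the line of fire
print(combined_reward((3, 4, 1), (0, 1), (0, 3, 0)))  # destination on the line of fire
```

Passing the predicate as a parameter keeps the correction step independent of how the line of fire is actually computed.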

<Step S6>
The motion planning unit 40 obtains the enemy aircraft action policy π(s, s′, k, lau) by dynamic programming from the state transition probability P(s, s′, k, lau) and the reward r(s, s′, k, lau) for each combination of state s(X, Y, B), action (k, lau), and transition destination state s′(X′, Y′, B′). Dynamic programming is a well-known technique, so its description is omitted (see, for example, Reference 1 above).
The calculated enemy aircraft action policy π(s, s′, k, lau) is stored as enemy aircraft action policy data 50. The enemy aircraft action policy data 50 is, for example, a database in which every enemy aircraft state s(X, Y, B) is associated with the action (k, lau) that maximizes the enemy aircraft's reward r.
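The text leaves the dynamic programming procedure to the cited reference. One common choice is value iteration, sketched below on a toy problem. This is an assumption about which dynamic programming variant could be used, not a statement of the patent's method; the discount factor `gamma`, the convergence tolerance, and the three-cell example are all hypothetical.

```python
def value_iteration(states, actions, P, r, gamma=0.9, tol=1e-6, max_iter=1000):
    """P[(s, a)] is a dict {s_next: probability}; r(s, a, s_next) is the reward.
    Returns the value table V and a policy mapping each state to its best action."""
    V = {s: 0.0 for s in states}

    def q(s, a):
        # Expected reward plus discounted value of the destination state.
        return sum(p * (r(s, a, s2) + gamma * V[s2]) for s2, p in P[(s, a)].items())

    for _ in range(max_iter):
        delta = 0.0
        for s in states:
            best = max(q(s, a) for a in actions)
            delta = max(delta, abs(best - V[s]))
            V[s] = best
        if delta < tol:
            break
    policy = {s: max(actions, key=lambda a: q(s, a)) for s in states}
    return V, policy

# Toy example: three cells on a line, the own aircraft next to cell 0.
states = [0, 1, 2]
actions = [-1, +1]
P = {(s, a): {min(max(s + a, 0), 2): 1.0} for s in states for a in actions}
V, policy = value_iteration(states, actions, P,
                            r=lambda s, a, s2: 1.0 if s2 == 0 else 0.0)
print(policy)
```

The resulting `policy` dictionary plays the role of the enemy aircraft action policy data 50: a table from each state to the action to take.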

<Step S7>
The action determination unit 60 selects the enemy aircraft's action (k, lau) and updates the enemy aircraft's position according to that action. Specifically, the enemy aircraft action selection unit 62 obtains the action (k, lau) by searching the enemy aircraft action policy data 50 using the enemy aircraft state s(X, Y, B) acquired by the state acquisition unit 61 as the key. The selected action (k, lau) is output to the enemy aircraft position calculation unit 63. The enemy aircraft position calculation unit 63 reads from the storage unit 10 the movement speed (Vex(k), Vey(k)) for the case where the enemy aircraft takes the action (k, lau) and the action unit time T, and obtains the updated position of the enemy aircraft by calculating (T × Vex(k), T × Vey(k)). The updated position of the enemy aircraft is output to the display unit 70.
Performing steps S1 to S7 at fixed time intervals τ makes the enemy aircraft behave more intelligently, so that a shooting game with greater playability than before can be realized.
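Step S7 reduces to a table lookup followed by a position update. A minimal sketch, assuming the policy data is a dictionary keyed by (X, Y, B), the movement speeds are stored per action index k, and the displacement (T × Vex(k), T × Vey(k)) is added to the current screen position; all names and values here are hypothetical.

```python
def select_and_move(policy, state, enemy_pos, velocities, T):
    """Look up the action for the current state and apply one movement step."""
    k, lau = policy[state]                    # search policy data by state key
    vx, vy = velocities[k]                    # (Vex(k), Vey(k)) for action k
    new_pos = (enemy_pos[0] + T * vx, enemy_pos[1] + T * vy)
    return (k, lau), new_pos

# Hypothetical data: two movement actions and a single-entry policy table.
velocities = {0: (0.0, 0.0), 1: (2.0, -1.0)}
policy = {(3, 1, 2): (1, 0)}
action, pos = select_and_move(policy, (3, 1, 2), (100.0, 50.0), velocities, T=0.5)
print(action, pos)
```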

[Modifications, etc.]
In the above embodiment, the probability Pv(s, w) that the own aircraft selects the action (w) in step S3 was set to 1 divided by the number W of types of movement speed of the own aircraft. However, the probability Pv(s, w) that the own aircraft selects the action (w) may instead be calculated from the player's operation history.

The operation history acquisition unit 25 of the state transition probability calculation unit 20 counts, for each relative position (X, Y), the number of times Dw(X, Y) that each action (w) was selected at that relative position. The operation history Dw(X, Y) for each relative position (X, Y) of the enemy aircraft with respect to the own aircraft is stored in the operation history database 26.
The probability calculation unit 27 reads Dw(X, Y) from the operation history database 26 and calculates
Pv(s, w) = Dw(X, Y) / Σ_{w∈W} Dw(X, Y)
to obtain the probability Pv(s, w) that the own aircraft selects the action (w) in each state s. The multiplication unit 23 then calculates Pv(s, w) · Pe(w, k, lau, s, s′) using the Pv(s, w) calculated by the probability calculation unit 27.
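The count-based estimate of Pv(s, w) can be sketched as follows. The class and method names are hypothetical, and the fallback to the uniform value 1/W for positions with no recorded history is an assumption added here so that the estimate is always defined; the text does not specify this case.

```python
from collections import defaultdict

class OperationHistory:
    """Counts Dw(X, Y): how often the player chose action w at relative position (X, Y)."""

    def __init__(self, num_actions):
        self.W = num_actions
        self.counts = defaultdict(lambda: [0] * num_actions)

    def record(self, x, y, w):
        self.counts[(x, y)][w] += 1

    def probability(self, x, y, w):
        """Pv(s, w) = Dw(X, Y) / sum over all w of Dw(X, Y)."""
        row = self.counts.get((x, y))
        total = sum(row) if row else 0
        if total == 0:
            return 1.0 / self.W   # assumed fallback: uniform 1/W as in step S3
        return row[w] / total

history = OperationHistory(num_actions=4)
for w in (0, 0, 0, 3):
    history.record(1, 2, w)
print(history.probability(1, 2, 0), history.probability(1, 2, 3))
```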

In this way, by recording the operation history during game play, reflecting it in the calculation of the enemy aircraft's state transition probabilities in the Markov space, and performing the motion planning calculation online during the game, the enemy aircraft's action policy π can be updated to reflect the player's habits in operating the own aircraft.
The difficulty of the game can also be adjusted by adjusting the length of the time interval τ at which steps S1 to S7 of the above embodiment are performed. In general, a longer τ lowers the frequency at which the enemy aircraft selects actions, so the enemy aircraft's behavior becomes simpler and its evasion of the own aircraft's attacks is delayed, which lowers the difficulty of the game.

The processing functions of the shooting game processing device 100 can also be realized by a computer. In this case, the content of the processing functions of the shooting game processing device 100 is described by a program. By executing this program on a computer such as that shown in FIG. 6, each processing function of the shooting game processing device 100 is realized on the computer.
The program describing this processing content can be recorded on a computer-readable recording medium. The computer-readable recording medium may be of any kind, for example a magnetic recording device, an optical disc, a magneto-optical recording medium, or a semiconductor memory. Specifically, for example, a hard disk device, a flexible disk, or a magnetic tape can be used as the magnetic recording device; a DVD (Digital Versatile Disc), DVD-RAM (Random Access Memory), CD-ROM (Compact Disc Read Only Memory), or CD-R (Recordable) / RW (ReWritable) as the optical disc; an MO (Magneto-Optical disc) as the magneto-optical recording medium; and an EEP-ROM (Electronically Erasable and Programmable-Read Only Memory) as the semiconductor memory.

The program is distributed, for example, by selling, transferring, or lending a portable recording medium such as a DVD or CD-ROM on which the program is recorded. The program may also be distributed by storing it in a storage device of a server computer and transferring it from the server computer to other computers over a network.
A computer that executes such a program first stores, for example, the program recorded on the portable recording medium or the program transferred from the server computer in its own storage device. When executing processing, the computer reads the program stored in its own recording medium and executes processing according to the read program. As another form of execution, the computer may read the program directly from the portable recording medium and execute processing according to it, or it may execute processing according to the received program each time the program is transferred to it from the server computer. The above processing may also be executed by a so-called ASP (Application Service Provider) type service, in which the program is not transferred from the server computer to the computer and the processing functions are realized only through execution instructions and acquisition of results. The program in this embodiment includes information that is provided for processing by an electronic computer and is equivalent to a program (such as data that is not a direct command to the computer but has the property of defining the computer's processing).

In this embodiment, the shooting game processing device and the like are configured by executing a predetermined program on a computer, but at least part of the processing content may instead be realized in hardware.
The shooting game processing method, device, program, and recording medium of the present invention are not limited to the embodiment described above and may be modified as appropriate without departing from the gist of the present invention.

FIG. 1 is a diagram illustrating the functional configuration of the shooting game processing device 100.
FIG. 2 is a diagram illustrating the processing flow of the shooting game processing device 100.
FIG. 3 is a schematic diagram of a shooting game.
FIG. 4 is a diagram to assist the explanation of the calculation of the state transition probability Pe.
FIG. 5 is a diagram illustrating the relationship between the distance r between the own aircraft and the enemy aircraft and the probability P of the own aircraft being shot down.
FIG. 6 is a diagram illustrating the functional configuration when the shooting game processing device according to the present invention is implemented by a computer.

Explanation of symbols

1 Own aircraft
2 Enemy aircraft
3 Attack bullet
4 Attack bullet
5 CPU
6 RAM
7 Output unit
8 Auxiliary storage unit
9 Input unit
9′ Bus
10 Storage unit
20 State transition probability calculation unit
21 Displacement amount calculation unit
22 State transition probability calculation unit
23 Multiplication unit
24 State transition probability calculation unit
25 Operation history acquisition unit
26 Operation history database
27 Probability calculation unit
30 Reward determination unit
31 Distance calculation unit
32 Hit rate calculation unit
33 First ray determination unit
34 Second ray determination unit
35 Setting state detection unit
38 Temporary storage unit
39 Input unit
40 Motion planning unit
50 Enemy aircraft action policy data
60 Action determination unit
61 State acquisition unit
62 Enemy aircraft action selection unit
63 Enemy aircraft position calculation unit
70 Display unit
100 Shooting game processing device

Claims

A shooting game processing method in which a computer comprising storage means, displacement amount calculation means, first state transition probability calculation means, multiplication means, second state transition probability calculation means, reward determination means, motion planning means, state acquisition means, and enemy aircraft action selection means selects the action of an enemy aircraft using a Markov state transition model in which the relative position of the enemy aircraft with respect to the position of the own aircraft is a state variable and the movement speed of the enemy aircraft is an action variable, wherein
the storage means stores the movement speed for each action taken by the enemy aircraft and the movement speed for each action taken by the own aircraft, the method comprising:
a displacement amount calculation process in which the displacement amount calculation means obtains, for each combination of an action of the enemy aircraft and an action of the own aircraft, the displacement amount of the relative position using the corresponding movement speeds of the enemy aircraft and the own aircraft read from the storage means;
a first state transition probability calculation process in which the first state transition probability calculation means, for a given state of the Markov state transition model, translates a grid of the same dimension as the relative position constituting that state by the displacement amount obtained for each combination of an action of the enemy aircraft and an action of the own aircraft, and obtains a probability proportional to the area of the part in common with each other grid as the state transition probability for each combination of an action of the enemy aircraft and an action of the own aircraft;
a multiplication process in which the multiplication means multiplies the state transition probability for each combination of an action of the enemy aircraft and an action of the own aircraft by the value obtained by dividing 1 by the number of types of action of the own aircraft;
a second state transition probability calculation process in which the second state transition probability calculation means obtains the state transition probability for each action taken by the enemy aircraft by summing the values obtained in the multiplication process over all actions of the own aircraft;
a reward determination process in which the reward determination means determines a reward for each combination of state, action, and transition destination state;
a motion planning process in which the motion planning means obtains enemy aircraft action policy data based on a motion planning method in the Markov state transition model, using the state transition probability for each action taken by the enemy aircraft and the reward;
a state acquisition process in which the state acquisition means acquires the current state of the enemy aircraft; and
an enemy aircraft action selection process in which the enemy aircraft action selection means selects an action of the enemy aircraft based on the acquired current state of the enemy aircraft and the enemy aircraft action policy data.

The shooting game processing method according to claim 1, wherein
the reward determination process further includes a process of setting a high reward for the combination of state, action, and transition destination state when the own aircraft after the state transition is on the line of fire of the enemy aircraft.

The shooting game processing method according to claim 1, wherein
the reward determination process further includes a process of setting a low reward for the combination of state, action, and transition destination state when the enemy aircraft after the state transition is on the line of fire of the own aircraft.

The shooting game processing method according to any one of claims 1 to 3, further comprising
an own aircraft action selection probability calculation process in which own aircraft action selection probability calculation means obtains, from operation history data of the player, the probability that the own aircraft selects each action, wherein
the multiplication process is a process of multiplying the state transition probability for each combination of an action of the enemy aircraft and an action of the own aircraft by the probability that the own aircraft selects each action, as obtained in the own aircraft action selection probability calculation process.

The shooting game processing method according to any one of claims 1 to 4, wherein
the number of attack bullets that the enemy aircraft can fire is further included as a state variable constituting the Markov state transition model, and whether or not the enemy aircraft fires an attack bullet is further included as an action variable.

The shooting game processing method according to any one of claims 1 to 5, wherein
the reward determination process further includes a process of setting a high reward for the combination of state, action, and transition destination state when the distance between the enemy aircraft after the state transition and the own aircraft after the state transition is short.

The shooting game processing method according to claim 5, wherein
the reward determination process includes a process of setting, when the enemy aircraft takes the action of firing an attack bullet, a higher reward for the combination of state, action, and transition destination state the shorter the distance between the enemy aircraft before the state transition and the own aircraft before the state transition.

The shooting game processing method according to claim 5 or 7, wherein
the reward determination process includes a process of setting a high reward for the combination of state, action, and transition destination state when the enemy aircraft takes the action of firing an attack bullet at a relative position that is predetermined separately for each number of attack bullets that the enemy aircraft can fire.

A shooting game processing device that selects the action of an enemy aircraft using a Markov state transition model in which the relative position of the enemy aircraft with respect to the position of the own aircraft is a state variable and the movement speed of the enemy aircraft is an action variable, the device comprising:
storage means for storing the movement speed for each action taken by the enemy aircraft and the movement speed for each action taken by the own aircraft;
displacement amount calculation means for obtaining, for each combination of an action of the enemy aircraft and an action of the own aircraft, the displacement amount of the relative position using the corresponding movement speeds of the enemy aircraft and the own aircraft read from the storage means;
first state transition probability calculation means for translating, for a given state of the Markov state transition model, a grid of the same dimension as the relative position constituting that state by the displacement amount obtained for each combination of an action of the enemy aircraft and an action of the own aircraft, and obtaining a probability proportional to the area of the part in common with each other grid as the state transition probability for each combination of an action of the enemy aircraft and an action of the own aircraft;
multiplication means for multiplying the state transition probability for each combination of an action of the enemy aircraft and an action of the own aircraft by the value obtained by dividing 1 by the number of types of action of the own aircraft;
second state transition probability calculation means for obtaining the state transition probability for each action taken by the enemy aircraft by summing the values obtained by the multiplication means over all actions of the own aircraft;
reward determination means for determining a reward for each combination of state, action, and transition destination state;
motion planning means for obtaining enemy aircraft action policy data based on a motion planning method in the Markov state transition model, using the state transition probability for each action taken by the enemy aircraft and the reward;
state acquisition means for acquiring the current state of the enemy aircraft; and
enemy aircraft action selection means for selecting an action of the enemy aircraft based on the acquired current state of the enemy aircraft and the enemy aircraft action policy data.

A shooting game processing program that selects the action of an enemy aircraft using a Markov state transition model in which the relative position of the enemy aircraft with respect to the position of the own aircraft is a state variable and the movement speed of the enemy aircraft is an action variable, wherein
storage means stores the movement speed for each action taken by the enemy aircraft and the movement speed for each action taken by the own aircraft, the program causing a computer to execute:
a displacement amount calculation process of obtaining, for each combination of an action of the enemy aircraft and an action of the own aircraft, the displacement amount of the relative position using the corresponding movement speeds of the enemy aircraft and the own aircraft read from the storage means;
a first state transition probability calculation process of translating, for a given state of the Markov state transition model, a grid of the same dimension as the relative position constituting that state by the displacement amount obtained for each combination of an action of the enemy aircraft and an action of the own aircraft, and obtaining a probability proportional to the area of the part in common with each other grid as the state transition probability for each combination of an action of the enemy aircraft and an action of the own aircraft;
a multiplication process of multiplying the state transition probability for each combination of an action of the enemy aircraft and an action of the own aircraft by the value obtained by dividing 1 by the number of types of action of the own aircraft;
a second state transition probability calculation process of obtaining the state transition probability for each action taken by the enemy aircraft by summing the values obtained in the multiplication process over all actions of the own aircraft;
a reward determination process of determining a reward for each combination of state, action, and transition destination state;
a motion planning process of obtaining enemy aircraft action policy data based on a motion planning method in the Markov state transition model, using the state transition probability for each action taken by the enemy aircraft and the reward;
a state acquisition process of acquiring the current state of the enemy aircraft; and
an enemy aircraft action selection process of selecting an action of the enemy aircraft based on the acquired current state of the enemy aircraft and the enemy aircraft action policy data.

A computer-readable recording medium on which the shooting game processing program according to claim 10 is recorded.