JP2022150063A - Terminal device, program to be executed by computer, and computer-readable recording medium having program recorded therein

Info

Publication number: JP2022150063A
Application number: JP2021052479A
Authority: JP
Inventors: 真衣太田; Mai Ota; 眞太郎丸; Makoto Taroumaru; 崇詞今中; Takashi Imanaka; 一人矢野; Kazuto Yano
Original assignee: ATR Advanced Telecommunications Research Institute International; Fukuoka University
Current assignee: ATR Advanced Telecommunications Research Institute International; Fukuoka University
Priority date: 2021-03-25
Filing date: 2021-03-25
Publication date: 2022-10-07
Anticipated expiration: 2041-03-25
Also published as: JP7370018B2

Abstract

PROBLEM TO BE SOLVED: To provide a terminal device that performs wireless communication in coexistence with a terminal device that performs wireless communication using a different wireless communication system.
SOLUTION: A learning unit 4 repeatedly performs learning based on a packet communication result, an idle period of wireless communication, and the state of a transmission channel during an observation period, which is a period for observing the presence or absence of wireless communication by other terminal devices. It selects, with a given probability, the channel that yields the maximum average reward as the transmission channel, and selects, with a given probability, the packet length that yields the maximum average reward according to the state of the transmission channel during the observation period. Control means 3 generates a packet containing transmission data and outputs the generated packet to transmission means 5 when the transmission channel received from the learning unit 4 is idle. The transmission means 5 transmits the packet received from the control means 3 with the packet length received from the learning unit 4.
SELECTED DRAWING: Figure 2

Description

An application for the exception to loss of novelty has been filed.

The present invention relates to a terminal device, a program to be executed by a computer, and a computer-readable recording medium on which the program is recorded.

In the CSMA/CA (Carrier Sense Multiple Access/Collision Avoidance) scheme, typified by wireless LAN (Local Area Network), when packets collide or are lost because of simultaneous transmission or the like, the probability of packet collision is reduced by lengthening the backoff time, that is, the intentional waiting time during which a station, after detecting that radio transmission from other stations has stopped, refrains from transmitting immediately (Patent Document 1).

Patent Document 1: JP 2006-013894 A

However, when wireless communications by a plurality of different wireless communication systems coexist, packet collisions occur if a terminal device A that communicates using one wireless communication system transmits freely without taking into account the wireless communication of a terminal device B that communicates using another wireless communication system. As a result, there is a problem that it is difficult for terminal device A to perform wireless communication in coexistence with terminal device B.

Therefore, according to an embodiment of the present invention, there is provided a terminal device that performs wireless communication in coexistence with a terminal device that performs wireless communication using a different wireless communication system.

Further, according to an embodiment of the present invention, there is provided a program for causing a computer to perform wireless communication in coexistence with a terminal device that performs wireless communication using a different wireless communication system.

Furthermore, according to an embodiment of the present invention, there is provided a computer-readable recording medium on which is recorded a program for causing a computer to perform wireless communication in coexistence with a terminal device that performs wireless communication using a different wireless communication system.

(Configuration 1)
According to an embodiment of the present invention, a terminal device comprises communication means, first detection means, second detection means, and a learner. The communication means transmits a packet, in a first operation period, using a transmission channel, which is a channel for transmitting packets. Each time a packet is transmitted by the communication means, the first detection means detects, in the first operation period, the communication result of the packet transmission and the idle period of wireless communication after the packet is transmitted. Each time it receives a transmission channel, the second detection means detects, in the first operation period, the state of the transmission channel in an observation period, which is a period for observing the presence or absence of wireless communication by other terminal devices. The learner receives the communication result, the idle period, and the state of the transmission channel in the observation period that were detected in the first operation period, together with a candidate channel that is a candidate for the channel used for packet transmission. Each time it receives the state of the transmission channel in the observation period, the communication result, and the idle period, the learner executes a first process, a second process, and a third process. The first process calculates, based on the communication result and the idle period, an immediate reward, which is the reward obtained when a packet is transmitted on the transmission channel in the first operation period. The second process calculates, using the immediate reward calculated in the first process, an average reward, which is the reward obtained by averaging the cumulative value of the immediate rewards on one transmission channel by the number of times one packet length corresponding to the state of the transmission channel in the observation period has been selected, and which is the reward in a second operation period that follows the first operation period. The third process creates or updates a correspondence table that associates the candidate channel, the state of the transmission channel in the observation period, the packet length of the packet, and the average reward; selects, with a predetermined probability and based on the created or updated correspondence table, the channel that yields the maximum average reward as the transmission channel; selects, with a predetermined probability, the packet length that yields the maximum average reward according to the state of the transmission channel in the observation period; and outputs the selected transmission channel and packet length. Further, each time the communication means receives from the learner the transmission channel and packet length selected in the third process, it transmits, in the second operation period, a packet having the packet length received from the learner when the received transmission channel is idle.

(Configuration 2)
In Configuration 1, in the first process, the learner calculates the immediate reward as zero when the communication result is a packet transmission failure, and, when the communication result is a packet transmission success, calculates as the immediate reward the reciprocal of the sum of the idle period and a predetermined period.
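The reward rule of Configuration 2 can be illustrated with a minimal Python sketch. The function name `immediate_reward`, the parameter name `offset` (standing in for the "predetermined period"), its default value of 1.0, and the time unit are all assumptions made for illustration; the text above does not specify them.

```python
def immediate_reward(success: bool, idle_period: float, offset: float = 1.0) -> float:
    """Immediate reward for one transmitted packet (illustrative sketch).

    success     -- True if the packet transmission succeeded (ACK received)
    idle_period -- idle period N of wireless communication after the packet
    offset      -- the "predetermined period" added to N; the value 1.0 is
                   an assumption, not taken from the source
    """
    if idle_period < 0 or offset <= 0:
        raise ValueError("idle_period must be >= 0 and offset must be > 0")
    if not success:
        # A failed transmission earns no reward.
        return 0.0
    # A successful transmission earns the reciprocal of (idle period + offset).
    return 1.0 / (idle_period + offset)
```

Under this rule, a successful transmission followed by a shorter idle period receives a larger reward, and the positive offset keeps the reward finite when the idle period is zero.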

(Configuration 3)
In Configuration 1 or Configuration 2, in the second process, the learner calculates the average reward in the second operation period based on the immediate reward in the first operation period, the average reward in the first operation period, and the number of times one packet length corresponding to the state of the transmission channel in the observation period has been selected, and thereby updates the average reward.

(Configuration 4)
In Configuration 3, in the second process, where R_t is the immediate reward in the first operation period, V_t is the average reward in the first operation period, V_{t+1} is the average reward in the second operation period, and n (n is an integer equal to or greater than 1) is the number of times one packet length corresponding to the state of the transmission channel in the observation period has been selected, the learner updates the average reward by calculating the average reward V_{t+1} according to the following equation (1).

V_{t+1} = V_t + (R_t - V_t)/n ... (1)
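Equation (1) is the standard incremental form of a sample mean: after n updates, V equals the arithmetic mean of the n immediate rewards seen so far, without storing them. The following Python sketch illustrates this; the function name `update_average_reward` and the sample reward values are hypothetical.

```python
def update_average_reward(v_t: float, r_t: float, n: int) -> float:
    """Apply equation (1): V_{t+1} = V_t + (R_t - V_t) / n."""
    if n < 1:
        raise ValueError("n must be an integer >= 1")
    return v_t + (r_t - v_t) / n


# Hypothetical immediate rewards observed for one table entry.
rewards = [0.25, 0.0, 0.5, 0.25]

v = 0.0  # initial average reward (assumed to start at zero)
for n, r in enumerate(rewards, start=1):
    v = update_average_reward(v, r, n)

# v now matches the plain arithmetic mean of the rewards.
mean = sum(rewards) / len(rewards)
```

Because the first update uses n = 1, the initial value of V is overwritten by the first reward, so the choice of starting value does not affect the result.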
(Configuration 5)
In Configuration 3 or Configuration 4, in the third process, the learner selects from the candidate channel, with probability (1-ε) (ε is a real number in the range of 0 to 1), the channel whose average reward in the second operation period is the maximum as the transmission channel, and selects from the candidate channel, with probability ε, an arbitrary channel as the transmission channel.
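The selection rule of Configuration 5 corresponds to what is commonly called an ε-greedy policy. A minimal Python sketch follows. It assumes that a single average reward per channel is already available as a dictionary; how the per-channel value is derived from the correspondence table is not specified here, and the names `select_channel` and `avg_reward` are hypothetical.

```python
import random


def select_channel(avg_reward: dict, epsilon: float, rng=random):
    """Pick a transmission channel from the candidates (epsilon-greedy sketch).

    avg_reward -- maps each candidate channel to its average reward
                  (assumed layout, for illustration only)
    epsilon    -- exploration probability, a real number in [0, 1]
    rng        -- object providing random() and choice(), e.g. random.Random
    """
    if not 0.0 <= epsilon <= 1.0:
        raise ValueError("epsilon must lie in [0, 1]")
    if not avg_reward:
        raise ValueError("at least one candidate channel is required")
    if rng.random() < epsilon:
        # With probability epsilon: explore an arbitrary candidate channel.
        return rng.choice(list(avg_reward))
    # With probability 1 - epsilon: exploit the best channel seen so far.
    return max(avg_reward, key=avg_reward.get)
```

With `epsilon=0.0` the function always returns the channel with the largest average reward, and with `epsilon=1.0` it always picks uniformly at random among the candidates.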

(Configuration 6)
In any one of Configurations 3 to 6, in the third process, the learner selects, for the state of the transmission channel in the observation period, the packet length whose average reward in the second operation period is the maximum.
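As a rough illustration of Configuration 6, the correspondence table can be modelled as a mapping from (channel, state, packet length) to average reward. This layout, the state labels `"idle"` and `"busy"`, and the function name `select_packet_length` are assumptions for this sketch only; it covers only the greedy choice of packet length for a given channel and observed state.

```python
from typing import Dict, Tuple

# (channel, observed state, packet length) -> average reward (assumed layout)
Table = Dict[Tuple[int, str, int], float]


def select_packet_length(table: Table, channel: int, state: str) -> int:
    """Return the packet length with the largest average reward for the
    given channel and observed channel state."""
    candidates = {
        length: reward
        for (ch, st, length), reward in table.items()
        if ch == channel and st == state
    }
    if not candidates:
        raise KeyError(f"no entries for channel={channel}, state={state!r}")
    return max(candidates, key=candidates.get)


# Hypothetical table contents.
example_table: Table = {
    (1, "idle", 100): 0.20,
    (1, "idle", 500): 0.45,
    (1, "busy", 100): 0.30,
    (1, "busy", 500): 0.05,
}
best_when_idle = select_packet_length(example_table, 1, "idle")
best_when_busy = select_packet_length(example_table, 1, "busy")
```

In this example the longer packet is preferred when the channel was observed to be idle and the shorter one when it was busy, purely because of the hypothetical reward values chosen.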

(Configuration 7)
In any one of Configurations 1 to 6, the terminal device further comprises control means. When the transmission success rate, which is the probability that packet transmission has succeeded, is equal to or less than a threshold, the control means selects a channel in a band different from the band of the candidate channel as a new candidate channel and controls the learner to use the selected new candidate channel. The learner executes the first process, the second process, and the third process using the new candidate channel each time it receives the state of the transmission channel in the observation period, the communication result, and the idle period.

(Configuration 8)
Further, according to an embodiment of the present invention, a program causes a computer to execute:
a first step in which communication means transmits a packet, in a first operation period, using a transmission channel, which is a channel for transmitting packets;
a second step in which first detection means, each time a packet is transmitted in the first step, detects, in the first operation period, the communication result of the packet transmission and the idle period of wireless communication after the packet is transmitted;
a third step in which second detection means, each time it receives a transmission channel, detects, in the first operation period, the state of the transmission channel in an observation period, which is a period for observing the presence or absence of wireless communication by other terminal devices; and
a fourth step in which a learner receives the communication result, the idle period, and the state of the transmission channel in the observation period that were detected in the first operation period, together with a candidate channel that is a candidate for the channel used for packet transmission, and executes, each time it receives the state of the transmission channel in the observation period, the communication result, and the idle period: a first process of calculating, based on the communication result and the idle period, an immediate reward, which is the reward obtained when a packet is transmitted on the transmission channel in the first operation period; a second process of calculating, using the immediate reward calculated in the first process, an average reward, which is the reward obtained by averaging the cumulative value of the immediate rewards on one transmission channel by the number of times one packet length corresponding to the state of the transmission channel in the observation period has been selected, and which is the reward in a second operation period that follows the first operation period; and a third process of creating or updating a correspondence table that associates the candidate channel, the state of the transmission channel in the observation period, the packet length of the packet, and the average reward, selecting, with a predetermined probability and based on the created or updated correspondence table, the channel that yields the maximum average reward as the transmission channel, selecting, with a predetermined probability, the packet length that yields the maximum average reward according to the state of the transmission channel in the observation period, and outputting the selected transmission channel and packet length,
wherein, in the first step, the communication means further, each time it receives from the learner the transmission channel and packet length selected in the third process, transmits, in the second operation period, a packet having the packet length received from the learner when the received transmission channel is idle.

(Configuration 9)
In Configuration 8, in the first process of the fourth step, the learner calculates the immediate reward as zero when the communication result is a packet transmission failure, and, when the communication result is a packet transmission success, calculates as the immediate reward the reciprocal of the sum of the idle period and a predetermined period.

(Configuration 10)
In Configuration 8 or Configuration 9, in the second process of the fourth step, the learner calculates the average reward in the second operation period based on the immediate reward in the first operation period, the average reward in the first operation period, and the number of times one packet length corresponding to the state of the transmission channel in the observation period has been selected, and thereby updates the average reward.

(Configuration 11)
In Configuration 10, in the second process of the fourth step, where R_t is the immediate reward in the first operation period, V_t is the average reward in the first operation period, V_{t+1} is the average reward in the second operation period, and n (n is an integer equal to or greater than 1) is the number of times one packet length corresponding to the state of the transmission channel in the observation period has been selected, the learner updates the average reward by calculating the average reward V_{t+1} according to the following equation (1).

V_{t+1} = V_t + (R_t - V_t)/n ... (1)
(Configuration 12)
In Configuration 10 or Configuration 11, in the third process of the fourth step, the learner selects from the candidate channel, with probability (1-ε) (ε is a real number in the range of 0 to 1), the channel whose average reward in the second operation period is the maximum as the transmission channel, and selects from the candidate channel, with probability ε, an arbitrary channel as the transmission channel.

(Configuration 13)
In any one of Configurations 10 to 12, in the third process of the fourth step, the learner selects, for the state of the transmission channel in the observation period, the packet length whose average reward in the second operation period is the maximum.

(Configuration 14)
In any one of Configurations 8 to 13, the program further causes the computer to execute a fifth step in which control means, when the transmission success rate, which is the probability that packet transmission has succeeded, is equal to or less than a threshold, selects a channel in a band different from the band of the candidate channel as a new candidate channel and controls the learner to use the selected new candidate channel,
and the learner executes the first process, the second process, and the third process using the new candidate channel each time it receives the state of the transmission channel in the observation period, the communication result, and the idle period.

(Configuration 15)
Furthermore, according to an embodiment of the present invention, a recording medium is a computer-readable recording medium on which the program described in any one of Configurations 8 to 14 is recorded.

Wireless communication can be performed in coexistence with a terminal device that performs wireless communication using a different wireless communication system.

FIG. 1 is a schematic diagram of a communication system according to an embodiment of the present invention.
FIG. 2 is a schematic diagram of the terminal device shown in FIG. 1.
FIG. 3 is a conceptual diagram of a received power spectrum.
FIG. 4 is a diagram for explaining the observation period and the idle period.
FIG. 5 is a schematic diagram of the correspondence table in the learner.
FIG. 6 is a diagram for explaining the operation of the learner shown in FIG. 2.
FIG. 7 is a timing chart for explaining the operation of the terminal device shown in FIG. 2.
FIG. 8 is a diagram for explaining the operation of the terminal device shown in FIG. 2 in each operation period.
FIG. 9 is a first schematic diagram showing the transition of the correspondence table TBL1.
FIG. 10 is a second schematic diagram showing the transition of the correspondence table TBL1.
FIG. 11 is a third schematic diagram showing the transition of the correspondence table TBL1.
FIG. 12 is a fourth schematic diagram showing the transition of the correspondence table TBL1.
FIG. 13 is a fifth schematic diagram showing the transition of the correspondence table TBL1.
FIG. 14 is a flowchart for explaining the operation of the terminal device shown in FIG. 2.
FIG. 15 is a first flowchart for explaining the operation of the learner shown in FIG. 2.
FIG. 16 is a second flowchart for explaining the operation of the learner shown in FIG. 2.
FIG. 17 is a diagram for explaining a different method of determining the packet length m.

An embodiment of the present invention will be described in detail with reference to the drawings. The same or corresponding parts in the drawings are denoted by the same reference numerals, and their description will not be repeated.

FIG. 1 is a schematic diagram of a communication system according to an embodiment of the present invention. Referring to FIG. 1, communication system 100 includes base station BS1 and terminal device TM1. Base station BS1 and terminal device TM1 are arranged in a wireless communication space.

Base station BS1 has a communication range REG1. Terminal device TM1 is located within the communication range REG1.

Base station BS1 transmits packets to terminal device TM1 and receives packets from terminal device TM1 using a wireless communication system RF1.

Base station BS2 has a communication range REG2. Base station BS2 is arranged such that the communication range REG2 partially overlaps the communication range REG1 of base station BS1. Base station BS2 transmits packets to terminal device TM2 and receives packets from terminal device TM2 using a wireless communication system RF2 that is different from the wireless communication system RF1.

Terminal device TM1 selects a transmission channel for transmitting a packet by a method described later, and transmits the packet to base station BS1 on the selected transmission channel so as to coexist with the wireless communication by terminal device TM2. After that, when terminal device TM1 receives from base station BS1, on the transmission channel, an ACK (Acknowledgement) packet indicating that the packet has been received, it detects that the packet transmission has succeeded; when it does not receive an ACK packet from base station BS1, it detects that the packet transmission has failed.

Although FIG. 1 shows one terminal device TM1 within the communication range REG1 of base station BS1, a plurality of terminal devices TM1 are actually present within the communication range REG1 of base station BS1.

In the following, terminal device TM1 is referred to as "terminal device 10".

FIG. 2 is a schematic diagram of the terminal device shown in FIG. 1. Referring to FIG. 2, terminal device 10 includes antenna 1, receiving means 2, control means 3, learner 4, transmitting means 5, and application 6.

Upon receiving from control means 3 a signal S_carrier_L for performing carrier sensing and a selected channel CH_Select, receiving means 2 performs carrier sensing via antenna 1 on the selected channel CH_Select during an observation period L, which is a period for observing the presence or absence of wireless communication by other terminal devices. It acquires a received power spectrum PW_carrier_L indicating the time dependence of the received power, and outputs the acquired received power spectrum PW_carrier_L to control means 3.

Further, upon receiving an ACK packet from base station BS1 via antenna 1 on the selected channel CH_Select, receiving means 2 outputs the received ACK packet to control means 3. After that, receiving means 2 performs carrier sensing via antenna 1 on the selected channel CH_Select and acquires a received power spectrum PW_chn on the selected channel CH_Select. Receiving means 2 then outputs the received power spectrum PW_chn to control means 3.

Control means 3 holds in advance candidate channels CH_cdt_1 and CH_cdt_2, which are candidates for the channel used for wireless communication. Candidate channel CH_cdt_1 consists of, for example, channels 1, 6, and 11 of the 2.4 GHz band, and candidate channel CH_cdt_2 consists of, for example, channels 128, 132, and 136 of the 5 GHz band. Control means 3 outputs candidate channel CH_cdt_1 or candidate channel CH_cdt_2 to learner 4.

Further, upon receiving the selected channel CH_Select from learner 4, control means 3 generates the signal S_carrier_L and outputs the selected channel CH_Select and the signal S_carrier_L to receiving means 2. After that, control means 3 receives the received power spectrum PW_carrier_L from receiving means 2 and, based on the received power spectrum PW_carrier_L, detects the state S of the selected channel CH_Select in the observation period L by a method described later. Control means 3 then outputs the state S of the selected channel CH_Select in the observation period L to learner 4.

Further, upon receiving transmission data D_TR from application 6 and a packet length m from learner 4, control means 3 detects, from the transmission data D_TR, transmission data D_m having the data amount AOD at which the packet length L_PKT of a transmission packet PKT equals the packet length m, and generates a transmission packet PKT containing the detected transmission data D_m. Upon receiving the result of carrier sensing on the selected channel CH_Select from receiving means 2, control means 3 determines, based on that result, whether the selected channel CH_Select is idle. When control means 3 determines that the selected channel CH_Select is idle, it outputs the selected channel CH_Select and the transmission packet PKT to transmitting means 5. When control means 3 determines, based on the result of carrier sensing, that the selected channel CH_Select is not idle, it waits for the selected channel CH_Select to become idle and then outputs the selected channel CH_Select and the transmission packet PKT to transmitting means 5.

Further, when control means 3 receives an ACK packet from receiving means 2 within a fixed period after outputting the selected channel CH_Select and the transmission packet PKT to transmitting means 5, it detects that the transmission of the transmission packet PKT has succeeded. Control means 3 then generates a signal S_success indicating that the transmission of the transmission packet PKT has succeeded, and outputs the generated signal S_success to learner 4. On the other hand, when control means 3 does not receive an ACK packet from receiving means 2 within the fixed period after outputting the transmission packet PKT to transmitting means 5, it detects that the transmission of the transmission packet PKT has failed. Control means 3 then generates a signal S_failure indicating that the transmission of the transmission packet PKT has failed, and outputs the generated signal S_failure to learner 4. In other words, after outputting the transmission packet PKT to transmitting means 5, control means 3 determines whether the transmission of the transmission packet PKT has succeeded or failed.

Further, when the control means 3 receives the received power spectrum PW_chn from the receiving means 2 after determining whether the transmission of the transmission packet PKT has succeeded or failed, it detects the idle period N from the received power spectrum PW_chn by a method described later. The control means 3 then outputs the idle period N to the learning device 4.

Further, the control means 3 counts, over a certain period, the number N_PKT of transmission packets PKT output to the transmission means 5 and the number N_ACK of ACK packets received from the receiving means 2, and divides the number N_ACK by the number N_PKT to calculate the packet transmission success rate R_SUCCESS. The control means 3 changes the candidate channel CH_cdt when the transmission success rate R_SUCCESS is equal to or less than a threshold R_th, and does not change the candidate channel CH_cdt when the transmission success rate R_SUCCESS is greater than the threshold R_th. The threshold R_th is set to 50%, for example.
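As an illustrative sketch only, and not part of the disclosed embodiment, the success-rate test described above could be written as follows. The function name and the handling of the case N_PKT = 0 are assumptions made for the example; the description does not state what happens when no packet was sent in the period.

```python
def should_change_candidates(n_pkt: int, n_ack: int, r_th: float = 0.5) -> bool:
    """Return True when R_SUCCESS = N_ACK / N_PKT is at or below R_th."""
    if n_pkt == 0:
        # Assumption: with no packets sent there is no evidence, so keep the candidates.
        return False
    r_success = n_ack / n_pkt
    return r_success <= r_th


# With R_th = 50%, 5 ACKs for 10 packets triggers a change; 6 ACKs does not.
change_at_half = should_change_candidates(10, 5)
change_above_half = should_change_candidates(10, 6)
```

The comparison is "equal to or less than", so a success rate of exactly 50% changes the candidate channel.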

The learning device 4 receives, from the control means 3, the candidate channel CH_cdt, the state S of the selected channel CH_Select in the observation period L, the signal S_success or the signal S_failure, and the idle period N. Based on these inputs, the learning device 4 performs learning by a multi-armed bandit algorithm, selects the selected channel CH_Select from the candidate channel CH_cdt, and selects the packet length m of the transmission packet PKT. The learning device 4 then outputs the selected channel CH_Select and the packet length m to the control means 3.

Upon receiving the selected channel CH_Select and the transmission packet PKT from the control means 3, the transmission means 5 transmits the transmission packet PKT on the selected channel CH_Select via the antenna 1.

The application 6 generates transmission data and outputs the generated transmission data to the control means 3.

FIG. 3 is a conceptual diagram of a received power spectrum. In FIG. 3, the vertical axis represents received power and the horizontal axis represents time.

Referring to FIG. 3, in the received power spectrum SP_RSSI, the received power changes over time. The control means 3 holds in advance, as a threshold RSSI_th, the received power value in the no-signal state, that is, the state in which no wireless communication system is communicating.

The control means 3 squares the amplitude value of the received power spectrum SP_RSSI received on the selected channel CH_Select to convert it into a received power value. When the converted received power value is greater than the threshold RSSI_th, the control means 3 determines that the selected channel CH_Select is in the busy state; when the received power value is equal to or less than the threshold RSSI_th, it determines that the selected channel CH_Select is in the idle state.

[Learning in the learning device]
Learning in the learning device 4 will now be described. FIG. 4 is a diagram for explaining the observation period and the idle period. Referring to FIG. 4, in the embodiment of the present invention, a slot SL is set as the minimum time over which the channel state changes. The slot SL has a length of 10 μs, for example.

During the observation period L (slots SL1 and SL2), the receiving means 2 performs carrier sense on the selected channel CH_Select to detect the received power spectrum PW_carrier_L, and outputs the detected received power spectrum PW_carrier_L to the control means 3.

Upon receiving the received power spectrum PW_carrier_L from the receiving means 2, the control means 3 squares the amplitude value of the received power spectrum PW_carrier_L to convert it into a received power value. For each of the slots SL1 and SL2, the control means 3 compares the received power value with the threshold RSSI_th, determines that the selected channel CH_Select is in the busy state when the received power value is greater than the threshold RSSI_th, and determines that the selected channel CH_Select is in the idle state when the received power value is equal to or less than the threshold RSSI_th.

In the embodiment of the present invention, the busy state is represented by "1" and the idle state by "0".

Since the observation period L consists of the two slots SL1 and SL2, the state S of the selected channel CH_Select in the observation period L is represented by two bits and takes one of the values "00", "01", "10" and "11".
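The per-slot busy/idle decision and the two-bit state S can be sketched as follows. This is a minimal illustration under stated assumptions: one amplitude sample per slot is assumed, and the function names and numeric values are hypothetical rather than taken from the embodiment.

```python
def slot_bit(amplitude: float, rssi_th: float) -> str:
    """Square the amplitude to get received power; "1" is busy, "0" is idle."""
    power = amplitude ** 2
    return "1" if power > rssi_th else "0"


def observation_state(amplitudes, rssi_th: float) -> str:
    """Concatenate the per-slot bits (SL1 first, then SL2) into the state S."""
    return "".join(slot_bit(a, rssi_th) for a in amplitudes)


# SL1 is busy (power 4.0 > 1.0) and SL2 is idle (power 0.25 <= 1.0).
state = observation_state([2.0, 0.5], rssi_th=1.0)
```

A power exactly equal to RSSI_th is classified as idle, matching the "equal to or less than" wording of the description.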

When the transmission means 5 transmits a packet in slots SL3 and SL4 following the observation period L, the control means 3 determines, in slot SL5, whether the transmission of the packet has succeeded or failed by determining whether or not an ACK packet has been received.

Thereafter, in slots SL6 to SL8, the receiving means 2 performs carrier sense on the selected channel CH_Select to detect the received power spectrum PW_chn, and outputs the detected received power spectrum PW_chn to the control means 3.

Upon receiving the received power spectrum PW_chn from the receiving means 2, the control means 3 squares the amplitude value of the received power spectrum PW_chn to convert it into a received power value. For each of the slots SL6 to SL8, the control means 3 compares the received power value with the threshold RSSI_th, determines that the selected channel CH_Select is in the busy state when the received power value is greater than the threshold RSSI_th, and determines that the selected channel CH_Select is in the idle state when the received power value is equal to or less than the threshold RSSI_th.

The control means 3 detects that the idle period N is "0" when slot SL6 is in the busy state; that the idle period N is "1" when slot SL6 is in the idle state and slot SL7 is in the busy state; that the idle period N is "2" when slots SL6 and SL7 are in the idle state and slot SL8 is in the busy state; and that the idle period N is "3" when slots SL6 to SL8 are all in the idle state. That is, taking the idle state of slot SL6 as the starting point, the control means 3 detects, as the idle period N, the period from the idle slot SL6 through the last slot SL of the consecutive run of idle slots. In other words, the idle period N is the period, after packet transmission, during which the state in which no wireless communication is performed continues.
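The detection of the idle period N amounts to counting the leading idle slots after the ACK slot. A minimal sketch, assuming the busy/idle results for slots SL6 to SL8 are already available as a string of "1" and "0" characters (the function name is hypothetical):

```python
def idle_period(slot_bits: str) -> int:
    """Count consecutive idle ("0") slots starting from the first slot (SL6)."""
    n = 0
    for bit in slot_bits:
        if bit != "0":
            break  # the first busy slot ends the idle run
        n += 1
    return n


# SL6 and SL7 idle, SL8 busy: the idle period N is 2.
n_example = idle_period("001")
```

Idle slots that follow a busy slot are not counted, so "101" gives N = 0, consistent with the rule that a busy slot SL6 yields N = "0".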

Upon detecting the state S of the selected channel CH_Select in the observation period L and the idle period N, the control means 3 outputs the detected state S and the detected idle period N to the learning device 4. In addition, the control means 3 generates the signal S_success and outputs it to the learning device 4 when it has received an ACK packet from the receiving means 2, and generates the signal S_failure and outputs it to the learning device 4 when it has not received an ACK packet from the receiving means 2.

FIG. 5 is a schematic diagram of the correspondence table in the learning device 4. Referring to FIG. 5, the correspondence table TBL1 includes the channel number, the state S of the selected channel CH_Select in the observation period L, the packet length m, and the average reward V. The channel number, the state S of the selected channel CH_Select in the observation period L, the packet length m, and the average reward V are associated with one another.

The channel numbers consist of 1, ..., a, ..., A (A is the total number of channels included in one candidate channel CH_cdt, and a is an integer from 1 to A). The state S of the selected channel CH_Select in the observation period L consists of "00", "01", "10" and "11". The states "00", "01", "10" and "11" of the selected channel CH_Select in the observation period L are associated with each of the channels 1, ..., a, ..., A.

The packet length m consists of 1, 2, ..., M. M is the total number of packet lengths m and is an integer of 2 or more. Packet length m=1, packet length m=2, ..., and packet length m=M each represent a different packet length; for example, packet length m=1 represents the shortest packet length and packet length m=M represents the longest packet length. Packet length m=M is set, for example, to the length of the DIFS (Distributed Inter Frame Space) in the wireless communication system, packet length m=1 is set to a reference packet length, and each time m increases by "1" the packet length m becomes longer by, for example, 10 μs. The reference packet length is set to 10 μs, for example.

Packet lengths m=1 to M are associated with each of the states "00", "01", "10" and "11" of the selected channel CH_Select in the observation period L for one channel number. In FIG. 5, the packet length m column is blank for the states "01", "10" and "11" of the selected channel CH_Select in the observation period L, but in practice the packet lengths m=1 to M are stored in the packet length m column for those states as well.

The average reward V is associated with each of the M packet lengths m=1 to M that are associated, for each channel number, with each of the states "00", "01", "10" and "11" of the selected channel CH_Select in the observation period L. The average reward V is calculated by the following equation.
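The image of equation (1) is not reproduced in this text. As a reconstruction only, and assuming an incremental-mean form, the following is consistent with the symbol definitions given for equation (1) and with the value m/{n·(N+1)} stated for V_{t+1} when V_t is zero; the original typesetting may differ.

```latex
V_{t+1} = V_t + \frac{1}{n}\left(R_t - V_t\right) \tag{1}
```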

In equation (1), V_{t+1} is the average reward in operation period T+1, V_t is the average reward obtained in operation period T, R_t is the immediate reward obtained in operation period T, and n is the number of times that one packet length m corresponding to one state S of the selected channel CH_Select in the observation period L has been selected.

Equation (1) expresses that the average reward V_{t+1} obtained in operation period T+1 is calculated from the average reward V_t in operation period T, the immediate reward R_t in operation period T, and the number of times n that one packet length m corresponding to the state S of the selected channel CH_Select in the observation period L has been selected. The average reward V_{t+1} is calculated for each of the M packet lengths m corresponding to the state S of the selected channel CH_Select in the observation period L.

The immediate reward R_t in equation (1) is expressed by the following equation.
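The image of equation (2) is not reproduced in this text. The following reconstruction follows the verbal description of equation (2) and of its two cases (2A) and (2B) in this description; the layout is an assumption.

```latex
R_t =
\begin{cases}
\dfrac{m}{N+1} & \text{(Success)} \quad \text{(2A)} \\[6pt]
0 & \text{(Failure)} \quad \text{(2B)}
\end{cases}
\tag{2}
```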

In equation (2), when the packet transmission succeeds (Success), the immediate reward R_t is the product of the packet length m and the reciprocal of N+1, where N+1 is the result of adding "1" to the idle period N; when the packet transmission fails (Failure), the immediate reward R_t is zero (=0).

In equation (2A), the reciprocal of N+1 is used so that the immediate reward R_t can be calculated even when the idle period N is zero (=0).

According to equation (2), when the packet transmission succeeds (Success), the immediate reward R_t is larger for a shorter idle period N and smaller for a longer idle period N, and is larger for a longer packet length m and smaller for a shorter packet length m.

In equation (1), the initial value of the average reward V_t is set to zero (=0). As a result, when packet transmission fails in operation period T, the immediate reward R_t is zero (=0) (see equation (2B)), so the average reward V_{t+1} is zero (=0). On the other hand, when packet transmission succeeds in operation period T, the immediate reward R_t is m/(N+1) (see equation (2A)), so the average reward V_{t+1} is m/{n·(N+1)}.
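The two values stated above can be checked numerically. The sketch below assumes that equation (1) has the incremental-mean form V_{t+1} = V_t + (R_t - V_t)/n, which reproduces m/{n·(N+1)} when V_t is zero; the function names and the example numbers are hypothetical.

```python
def immediate_reward(success: bool, m: int, idle_period: int) -> float:
    """Equation (2): m / (N + 1) on success (2A), 0 on failure (2B)."""
    if not success:
        return 0.0
    return m / (idle_period + 1)


def update_average(v_t: float, r_t: float, n: int) -> float:
    """Assumed form of equation (1): V_{t+1} = V_t + (R_t - V_t) / n."""
    return v_t + (r_t - v_t) / n


# Starting from V_t = 0, a success with m = 4, N = 1 and n = 2 gives 4 / (2 * 2).
v_after_success = update_average(0.0, immediate_reward(True, 4, 1), 2)
# A failure from the same starting point leaves the average reward at zero.
v_after_failure = update_average(0.0, immediate_reward(False, 4, 1), 2)
```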

Therefore, if packet transmission continues to fail after learning in the learning device 4 has started, the average reward V_{t+1} does not increase.

According to equations (1) and (2), when the immediate reward R_t is greater than the average reward V_t, the average reward V_{t+1} increases as the number of times n that one packet length m corresponding to the state S (one of "00", "01", "10" and "11") of the selected channel CH_Select in the observation period L has been selected increases. That is, the average reward V_{t+1} increases as the same packet length m corresponding to one state S of the selected channel CH_Select in the observation period L continues to be selected. Therefore, once the average reward V corresponding to any one of the M packet lengths 1 to M (see FIG. 5) corresponding to the state S (one of "00", "01", "10" and "11") of the selected channel CH_Select in the observation period L first becomes greater than zero (=0), the same packet length m may continue to be selected thereafter as long as the state S of the selected channel CH_Select in the observation period L remains the same.

On the other hand, when the immediate reward R_t is smaller than the average reward V_t, the average reward V_{t+1} decreases as the number of times n that one packet length m corresponding to the state S (one of "00", "01", "10" and "11") of the selected channel CH_Select in the observation period L has been selected increases. This can occur when the idle period N becomes longer. Therefore, from the viewpoint of obtaining a larger average reward V_{t+1}, it is preferable to continue learning by the learning device 4 so as to find, for each state S (one of "00", "01", "10" and "11") of the selected channel CH_Select in the observation period L, a longer packet length m with a higher probability of successful packet transmission, and to select the found packet length m according to that state S. This allows the terminal device 10 to perform wireless communication while avoiding collisions with wireless communication by other terminal devices (that is, while coexisting with other terminal devices).

FIG. 6 is a diagram for explaining the operation of the learning device 4 shown in FIG. 2. Referring to FIG. 6, the learning device 4 holds the correspondence table TBL1. Upon receiving the candidate channel CH_cdt from the control means 3, the learning device 4 selects a channel from the candidate channel CH_cdt by the ε-greedy method in operation period T. More specifically, the learning device 4 fixes a certain small number ε (for example, 0.3) in advance and generates a random number p that is a real number in the range of 0 to 1. When the generated random number p is equal to or less than ε, the learning device 4 randomly selects a channel from the candidate channel CH_cdt as the selected channel CH_Select_t; when the generated random number p is not equal to or less than ε, it selects from the candidate channel CH_cdt, as the selected channel CH_Select_t, the channel that yields the maximum average reward V_t in operation period T.

The learning device 4 then outputs the selected channel CH_Select_t selected from the candidate channel CH_cdt to the control means 3.

Thereafter, upon receiving from the control means 3 the state S_t of the selected channel CH_Select_t in the observation period L, the learning device 4 selects a packet length m for that state S_t by the ε-greedy method. More specifically, the learning device 4 generates a random number p that is a real number in the range of 0 to 1. When the generated random number p is equal to or less than ε, the learning device 4 randomly selects a packet length m_t from the packet lengths 1 to M; when the generated random number p is not equal to or less than ε, it selects from the packet lengths 1 to M the packet length m_t that yields the maximum average reward V_t in operation period T.

When the generated random number p is not equal to or less than ε and no maximum average reward V_t exists, the learning device 4 randomly selects a packet length m_t from the packet lengths 1 to M.
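The ε-greedy selection of the packet length, including the random fallback, can be sketched as below. This is an illustration under assumptions: the average rewards for one channel and one state S are held in a dictionary keyed by m, and "no maximum average reward exists" is modelled as every entry still holding its initial value of zero. The function name and the `rng` parameter are hypothetical.

```python
import random


def select_packet_length(avg_rewards: dict, epsilon: float = 0.3, rng=random) -> int:
    """Pick a packet length index m (1..M) by the epsilon-greedy rule."""
    lengths = sorted(avg_rewards)
    p = rng.random()  # real number in [0, 1)
    if p <= epsilon:
        return rng.choice(lengths)  # explore: random packet length
    if max(avg_rewards.values()) <= 0.0:
        # Assumption: no maximum "exists" while all entries are still zero.
        return rng.choice(lengths)
    # Exploit: the packet length with the largest average reward V_t.
    return max(lengths, key=lambda m: avg_rewards[m])


# Example call with the standard random module; the result depends on the draw.
chosen = select_packet_length({1: 0.5, 2: 2.0, 3: 1.0})
```

Passing the random source as a parameter keeps the rule testable with a fixed draw; the same structure applies to the channel selection, with the candidate channels in place of the packet lengths.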

After selecting the packet length m_t, the learning device 4 outputs the selected packet length m_t to the control means 3.

Thereafter, upon receiving from the control means 3 the communication result for the transmitted packet (success or failure of the packet transmission) and the idle period N, the learning device 4 calculates the immediate reward R_t in operation period T based on the communication result and the idle period N. More specifically, when the learning device 4 receives a successful packet transmission and the idle period N, it calculates the immediate reward R_t by equation (2A). When the learning device 4 receives a failed packet transmission and the idle period N, it calculates the immediate reward R_t by equation (2B). After calculating the immediate reward R_t, the learning device 4 stores the calculated immediate reward R_t.

After calculating the immediate reward R_t, the learning device 4 calculates, by equation (1) using the immediate reward R_t and the average reward V_t, the average reward V_{t+1} for the case where the same channel as the selected channel CH_Select_t used when the immediate reward R_t was calculated is selected as the selected channel CH_Select_T+1.

The learning device 4 then stores, in the correspondence table TBL1, the average reward V_{t+1} in association with the packet length m (the packet length m used when the immediate reward R_t was calculated) that is associated with the state S_t of the selected channel CH_Select_t in the observation period L.
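One possible in-memory layout for the correspondence table TBL1 is sketched below. It is an assumption-laden illustration: the class name is hypothetical, the table is keyed by (channel number, state S, packet length m), the selection count n is stored alongside V, and the update uses an incremental-mean form of equation (1).

```python
from collections import defaultdict


class RewardTable:
    """Average reward V and selection count n per (channel, state S, m)."""

    def __init__(self):
        self.v = defaultdict(float)  # average reward, initial value 0
        self.n = defaultdict(int)    # number of times the entry was selected

    def update(self, channel: int, state: str, m: int, r_t: float) -> float:
        """Store V_{t+1} for the entry that produced the immediate reward r_t."""
        key = (channel, state, m)
        self.n[key] += 1
        self.v[key] += (r_t - self.v[key]) / self.n[key]
        return self.v[key]


table = RewardTable()
first = table.update(2, "10", 3, 3.0)   # first selection: V becomes 3.0
second = table.update(2, "10", 3, 1.0)  # second selection: V becomes 2.0
```

Only the entry for the packet length actually used is updated; entries for other states or packet lengths keep their stored values.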

Then, in operation period T+1, the learning device 4 selects the selected channel CH_Select_T+1 by the method described above and outputs the selected channel CH_Select_T+1 to the control means 3.

Thereafter, upon receiving from the control means 3 the state S_{t+1} of the selected channel CH_Select_T+1 in the observation period L, the learning device 4 selects a packet length m_{t+1} for the state S_{t+1} of the selected channel CH_Select_T+1 by the method described above. The learning device 4 then outputs the selected packet length m_{t+1} to the control means 3.

Thereafter, the learning device 4 repeatedly executes the operations described above.

[Operations of the terminal device other than the learning device]
The control means 3 outputs the candidate channel CH_cdt to the learning device 4 and thereafter receives the selected channel CH_Select from the learning device 4.

The control means 3 then generates a signal S_carrier_L and outputs the selected channel CH_Select and the signal S_carrier_L to the receiving means 2.

Thereafter, the control means 3 receives the received power spectrum PW_carrier_L from the receiving means 2 and, based on the received power spectrum PW_carrier_L, detects by the method described above the state S (one of "00", "01", "10" and "11") of the selected channel CH_Select in the observation period L. The control means 3 then outputs the state S (one of "00", "01", "10" and "11") of the selected channel CH_Select in the observation period L to the learning device 4.

Subsequently, upon receiving the transmission data from the application 6 and the packet length m from the learning device 4, the control means 3 generates, by the method described above, a transmission packet PKT having the packet length m.

The control means 3 then receives the carrier sense result from the receiving means 2 and, when it determines based on that result that the selected channel CH_Select is free, outputs the selected channel CH_Select and the transmission packet PKT to the transmission means 5.

The transmission means 5 receives the selected channel CH_Select and the transmission packet PKT from the control means 3, and transmits the transmission packet PKT using the selected channel CH_Select. In this case, the transmission means 5 transmits the transmission packet PKT at a fixed transmission rate.

Thereafter, when the receiving means 2 receives an ACK packet on the selected channel CH_Select within a certain period, it outputs the received ACK packet to the control means 3. The receiving means 2 then performs carrier sense on the selected channel CH_Select, detects the received power spectrum PW_chn, and outputs the detected received power spectrum PW_chn to the control means 3.

The control means 3 detects that the packet transmission has succeeded when it receives an ACK packet from the receiving means 2, and detects that the packet transmission has failed when it does not receive an ACK packet from the receiving means 2. When the control means 3 detects that the packet transmission has succeeded, it generates the signal S_success and outputs it to the learning device 4; when it detects that the packet transmission has failed, it generates the signal S_failure and outputs it to the learning device 4.

Upon receiving the received power spectrum PW_chn from the receiving means 2, the control means 3 detects the idle period N from the received power spectrum PW_chn by the method described above and outputs the detected idle period N to the learning device 4.

In addition, the control means 3 counts the number N_PKT of transmitted packets and the number N_ACK of ACK packets received from the receiving means 2, and divides the number N_ACK by the number N_PKT to calculate the transmission success rate R_SUCCESS.

The control means 3 then determines whether or not the transmission success rate R_SUCCESS is equal to or less than the threshold R_th.

When the control means 3 determines that the transmission success rate R_SUCCESS is equal to or less than the threshold R_th, it outputs to the learning device 4 a candidate channel consisting of channels in a different band (a candidate channel different from the candidate channel already output to the learning device 4).

On the other hand, when the control means 3 determines that the transmission success rate R_SUCCESS is greater than the threshold R_th, it maintains the candidate channel already output to the learning device 4 (candidate channel CH_cdt_1 or candidate channel CH_cdt_2). That is, the control means 3 does not change the candidate channel CH_cdt.

Thereafter, the operations described above are repeatedly executed.

FIG. 7 is a timing chart for explaining the operation of the terminal device 10 shown in FIG. 2. FIG. 7 illustrates the operation timing of the terminal device 10 for the case where channel 2 is selected as the selected channel CH_Select from the candidate channel CH_cdt consisting of channel 1, channel 2 and channel 3. The period from the timing of arrow AR1 to the timing of arrow AR6 is taken as the operation period T.

Referring to FIG. 7, the control means 3 outputs the candidate channel (channels 1 to 3) to the learning device 4 at a timing before the timing of arrow AR1. At the timing of arrow AR1, the control means 3 generates a transmission packet PKT containing the transmission data D_m. Also at the timing of arrow AR1, the learning device 4 selects channel 2 from the candidate channel (channels 1 to 3) as the selected channel CH_Select and outputs the selected channel CH_Select (=channel 2) to the control means 3.

そして、制御手段３は、学習器４から選択チャネルＣＨ＿Ｓｅｌｅｃｔ（＝チャネル２）を受けると、信号Ｓ＿ｃａｒｒｉｅｒ＿Ｌを生成し、選択チャネルＣＨ＿Ｓｅｌｅｃｔ（＝チャネル２）および信号Ｓ＿ｃａｒｒｉｅｒ＿Ｌを受信手段２へ出力する。 Upon receiving the selected channel CH_Select (=channel 2) from the learning device 4, the control means 3 generates a signal S_carrier_L and outputs the selected channel CH_Select (=channel 2) and the signal S_carrier_L to the receiving means 2.

受信手段２は、選択チャネルＣＨ＿Ｓｅｌｅｃｔ（＝チャネル２）および信号Ｓ＿ｃａｒｒｉｅｒ＿Ｌを制御手段３から受けると、矢印ＡＲ１のタイミングから矢印ＡＲ２のタイミングまでの観測期間Ｌにおいて、選択チャネルＣＨ＿Ｓｅｌｅｃｔ（＝チャネル２）においてキャリアセンスを行って受信電力スペクトルＰＷ＿ｃａｒｒｉｅｒ＿Ｌを検出し、その検出した受信電力スペクトルＰＷ＿ｃａｒｒｉｅｒ＿Ｌを制御手段３へ出力する。 Upon receiving the selected channel CH_Select (= channel 2) and the signal S_carrier_L from the control means 3, the receiving means 2 performs carrier sensing on the selected channel CH_Select (= channel 2) during the observation period L from the timing of arrow AR1 to the timing of arrow AR2, detects the received power spectrum PW_carrier_L, and outputs the detected received power spectrum PW_carrier_L to the control means 3.

制御手段３は、受信電力スペクトルＰＷ＿ｃａｒｒｉｅｒ＿Ｌを受信手段２から受けると、受信電力スペクトルＰＷ＿ｃａｒｒｉｅｒ＿Ｌの振幅値を２乗して受信電力値に変換する。そして、制御手段３は、受信電力値をしきい値ＲＳＳＩ＿ｔｈと比較し、受信電力値がしきい値ＲＳＳＩ＿ｔｈよりも大きいとき、選択チャネルＣＨ＿Ｓｅｌｅｃｔ（＝チャネル２）の状態がビジー状態であると判定し、受信電力値がしきい値ＲＳＳＩ＿ｔｈ以下であるとき、選択チャネルＣＨ＿Ｓｅｌｅｃｔ（＝チャネル２）の状態がアイドル状態であると判定する。 Upon receiving the received power spectrum PW_carrier_L from the receiving means 2, the control means 3 squares the amplitude values of the received power spectrum PW_carrier_L to convert them into received power values. The control means 3 then compares the received power value with the threshold RSSI_th. When the received power value is greater than the threshold RSSI_th, it determines that the selected channel CH_Select (= channel 2) is busy; when the received power value is equal to or less than the threshold RSSI_th, it determines that the selected channel CH_Select (= channel 2) is idle.
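The busy/idle decision above can be sketched in a few lines of Python. This is only an illustration: the function names are hypothetical, and the text does not say how the per-sample power values are reduced to a single value before the comparison, so taking the peak is an assumption of this sketch.

```python
def received_power(amplitudes):
    """Square each amplitude sample of the sensed spectrum to get a power value."""
    return [a * a for a in amplitudes]


def channel_is_busy(amplitudes, rssi_th):
    """Return True (busy) if the power exceeds rssi_th, otherwise False (idle).

    Reducing the samples to their peak power is an assumption of this sketch.
    A power exactly equal to rssi_th counts as idle, matching the text.
    """
    return max(received_power(amplitudes)) > rssi_th


print(channel_is_busy([0.1, 0.5], rssi_th=0.2))  # peak power 0.25 vs. 0.2
print(channel_is_busy([2.0], rssi_th=4.0))       # power equal to the threshold
```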

そして、制御手段３は、矢印ＡＲ２のタイミングにおいて、観測期間Ｌにおける選択チャネルＣＨ＿Ｓｅｌｅｃｔ（＝チャネル２）の状態Ｓが“００”，“０１”，“１０”，“１１”のいずれかからなることを検出する。 Then, at the timing of arrow AR2, the control means 3 detects which of "00", "01", "10", and "11" is the state S of the selected channel CH_Select (= channel 2) in the observation period L.

図７においては、観測期間Ｌの１番目のスロットＳＬにおいてパケットが送信されており、観測期間Ｌの２番目のスロットＳＬが空いているので、制御手段３は、矢印ＡＲ２のタイミングにおいて、受信電力スペクトルＰＷ＿ｃａｒｒｉｅｒ＿Ｌに基づいて、上述した方法によって、観測期間Ｌにおける選択チャネルＣＨ＿Ｓｅｌｅｃｔ（＝チャネル２）の状態Ｓが“１０”であることを検出する。そして、制御手段３は、観測期間Ｌにおける選択チャネルＣＨ＿Ｓｅｌｅｃｔ（＝チャネル２）の状態Ｓ（＝“１０”）を学習器４へ出力する。 In FIG. 7, a packet is being transmitted in the first slot SL of the observation period L and the second slot SL of the observation period L is vacant. Therefore, at the timing of arrow AR2, the control means 3 detects, based on the received power spectrum PW_carrier_L and by the method described above, that the state S of the selected channel CH_Select (= channel 2) in the observation period L is "10". The control means 3 then outputs the state S (= "10") of the selected channel CH_Select (= channel 2) in the observation period L to the learning device 4.
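As a small illustration of the state encoding, the sketch below maps per-slot busy flags to the two-character state S, with "1" for a busy slot and "0" for a vacant one, as in the FIG. 7 example. The function name and the list-of-booleans input are assumptions made for the example.

```python
def observe_state(slot_busy):
    """Encode per-slot busy flags as a bit string: '1' = busy, '0' = vacant."""
    return "".join("1" if busy else "0" for busy in slot_busy)


# FIG. 7 example: first slot carries a packet, second slot is vacant.
state = observe_state([True, False])
print(state)
```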

学習器４は、観測期間Ｌにおける選択チャネルＣＨ＿Ｓｅｌｅｃｔ（＝チャネル２）の状態Ｓ（＝“１０”）を制御手段３から受けると、矢印ＡＲ２のタイミングにおいて、上述した方法によって、観測期間Ｌにおける選択チャネルＣＨ＿Ｓｅｌｅｃｔ（＝チャネル２）の状態Ｓ（＝“１０”）に応じたパケット長ｍを選択し、その選択したパケット長ｍを制御手段３へ出力する。 When the learning device 4 receives the state S (=“10”) of the selected channel CH_Select (=channel 2) in the observation period L from the control means 3, at the timing of the arrow AR2, the selection in the observation period L is performed by the method described above. A packet length m corresponding to the state S (="10") of the channel CH_Select (=channel 2) is selected, and the selected packet length m is output to the control means 3.

また、制御手段３は、矢印ＡＲ２のタイミングにおいて、アプリケーション６から送信データＤ＿ＴＲを受け、学習器４からパケット長ｍを受けると、その受けた送信データＤ＿ＴＲおよびパケット長ｍに基づいて、上述した方法によって、送信データＤ＿ｍを含む送信用パケットＰＫＴを生成し、キャリアセンスを行うように受信手段２を制御する。そして、制御手段３は、キャリアセンスの結果を受信手段２から受け、その受けたキャリアセンスの結果に基づいて、選択チャネルＣＨ＿Ｓｅｌｅｃｔが空いていると判定したとき、選択チャネルＣＨ＿Ｓｅｌｅｃｔ（＝チャネル２）および送信用パケットＰＫＴを送信手段５へ出力する。 Further, at the timing of arrow AR2, upon receiving the transmission data D_TR from the application 6 and the packet length m from the learning device 4, the control means 3 generates, by the method described above, a transmission packet PKT containing transmission data D_m based on the received transmission data D_TR and packet length m, and controls the receiving means 2 to perform carrier sensing. The control means 3 then receives the carrier sensing result from the receiving means 2 and, when it determines from that result that the selected channel CH_Select is vacant, outputs the selected channel CH_Select (= channel 2) and the transmission packet PKT to the transmission means 5.

なお、制御手段３は、選択チャネルＣＨ＿Ｓｅｌｅｃｔが空いていないと判定したとき、選択チャネルＣＨ＿Ｓｅｌｅｃｔが空くのを待って、選択チャネルＣＨ＿Ｓｅｌｅｃｔ（＝チャネル２）および送信用パケットＰＫＴを送信手段５へ出力する。 When the control means 3 determines that the selection channel CH_Select is not available, the control means 3 waits for the selection channel CH_Select to become available and outputs the selection channel CH_Select (=channel 2) and the transmission packet PKT to the transmission means 5 .

送信手段５は、矢印ＡＲ２のタイミングにおいて、選択チャネルＣＨ＿Ｓｅｌｅｃｔ（＝チャネル２）および送信用パケットＰＫＴを制御手段３から受け、選択チャネルＣＨ＿Ｓｅｌｅｃｔを用いて送信用パケットＰＫＴを送信する。 The transmission means 5 receives the selection channel CH_Select (=channel 2) and the transmission packet PKT from the control means 3 at the timing of the arrow AR2, and transmits the transmission packet PKT using the selection channel CH_Select.

そして、矢印ＡＲ３のタイミングでパケットの送信が完了する。その後、受信手段２は、矢印ＡＲ４のタイミングでＡＣＫパケットを受信すると、その受信したＡＣＫパケットを制御手段３へ出力する。 Then, the transmission of the packet is completed at the timing of arrow AR3. After that, when receiving an ACK packet at the timing of arrow AR4, receiving means 2 outputs the received ACK packet to control means 3 .

制御手段３は、ＡＣＫパケットを受信手段２から受けると、パケットの送信が成功したことを検知する。一方、制御手段３は、ＡＣＫパケットを受信手段２から受けなかったとき、パケットの送信が失敗したことを検知する。従って、制御手段３は、矢印ＡＲ４のタイミングにおいて、パケットの送信の成功または失敗を学習器４へ出力する。 Upon receiving the ACK packet from the receiving means 2, the control means 3 detects successful transmission of the packet. On the other hand, when the control means 3 does not receive an ACK packet from the receiving means 2, it detects that the packet transmission has failed. Therefore, the control means 3 outputs success or failure of packet transmission to the learning device 4 at the timing of the arrow AR4.

その後、受信手段２は、矢印ＡＲ５のタイミングから矢印ＡＲ６のタイミングまでの期間において選択チャネルＣＨ＿Ｓｅｌｅｃｔでキャリアセンスを行って受信電力スペクトルＰＷ＿ｃｈｎを検出し、その検出した受信電力スペクトルＰＷ＿ｃｈｎを制御手段３へ出力する。 After that, the receiving means 2 detects the reception power spectrum PW_chn by performing carrier sense in the selection channel CH_Select during the period from the timing of the arrow AR5 to the timing of the arrow AR6, and outputs the detected reception power spectrum PW_chn to the control means 3. do.

制御手段３は、受信電力スペクトルＰＷ＿ｃｈｎを受信手段２から受けると、矢印ＡＲ６のタイミングにおいて、受信電力スペクトルＰＷ＿ｃｈｎに基づいて、上述した方法によって、空き期間Ｎを検出し、その検出した空き期間Ｎを学習器４へ出力する。 Upon receiving the received power spectrum PW_chn from the receiving means 2, the control means 3 detects, at the timing of arrow AR6, the idle period N based on the received power spectrum PW_chn by the method described above, and outputs the detected idle period N to the learning device 4.

学習器４は、矢印ＡＲ６のタイミングにおいて、空き期間Ｎを制御手段３から受けると、その受けた空き期間Ｎと、矢印ＡＲ４のタイミングで制御手段３から受けたパケットの送信の成功または失敗とに基づいて、動作期間Ｔにおける即時報酬Ｒ_ｔを算出し、その算出した即時報酬Ｒ_ｔを記憶する。この即時報酬Ｒ_ｔは、動作期間Ｔの後の動作期間（矢印ＡＲ１のタイミングから矢印ＡＲ６のタイミングまでの期間からなる動作期間）において、動作期間Ｔにおける選択チャネルＣＨ＿Ｓｅｌｅｃｔと同じチャネルが選択されたときに得られる平均報酬Ｖ_ｔ＋１を式（１）によって算出するために用いられる。 Upon receiving the idle period N from the control means 3 at the timing of arrow AR6, the learning device 4 calculates the immediate reward R_t for the operation period T based on the received idle period N and on the success or failure of packet transmission received from the control means 3 at the timing of arrow AR4, and stores the calculated immediate reward R_t. This immediate reward R_t is used to calculate, by equation (1), the average reward V_{t+1} obtained when the same channel as the selected channel CH_Select of the operation period T is selected in an operation period after the operation period T (an operation period consisting of the period from the timing of arrow AR1 to the timing of arrow AR6).

学習器４は、即時報酬Ｒ_ｔを記憶すると、選択チャネルＣＨ＿Ｓｅｌｅｃｔと同じチャネルが選択されたときに算出された動作期間Ｔにおける即時報酬Ｒ_ｔを用いて、動作期間Ｔよりも後の動作期間（矢印ＡＲ１のタイミングから矢印ＡＲ６のタイミングまでの期間からなる動作期間）における平均報酬Ｖ_ｔ＋１を式（１）によって算出する。そして、学習器４は、対応表ＴＢＬ１において、観測期間Ｌにおける選択チャネルＣＨ＿Ｓｅｌｅｃｔの状態Ｓに対応付けられたパケット長ｍ（即時報酬Ｒ_ｔを算出したときのパケット長ｍ）に対応付けて平均報酬Ｖ_ｔ＋１を格納する。 After storing the immediate reward R_t, the learning device 4 uses the immediate reward R_t of the operation period T, calculated when the same channel as the selected channel CH_Select was selected, to calculate by equation (1) the average reward V_{t+1} for an operation period later than the operation period T (an operation period consisting of the period from the timing of arrow AR1 to the timing of arrow AR6). The learning device 4 then stores the average reward V_{t+1} in the correspondence table TBL1 in association with the packet length m (the packet length m used when the immediate reward R_t was calculated) that is associated with the state S of the selected channel CH_Select in the observation period L.

その後、制御手段３は、上述した送信成功率Ｒ_{ＳＵＣＣＥＳＳ}を算出する。そして、制御手段３は、その算出した送信成功率Ｒ_{ＳＵＣＣＥＳＳ}がしきい値Ｒ＿ｔｈ以下であるとき、別の候補チャネルＣＨ＿ｃｄｔを学習器４へ出力して別の候補チャネルＣＨ＿ｃｄｔを用いるように学習器４を制御する。一方、制御手段３は、送信成功率Ｒ_{ＳＵＣＣＥＳＳ}がしきい値Ｒ＿ｔｈよりも大きいとき、矢印ＡＲ１のタイミングよりも前のタイミングにおいて、学習器４へ既に出力した候補チャネルＣＨ＿ｃｄｔを維持するので、学習器４へ候補チャネルＣＨ＿ｃｄｔを出力しない。 After that, the control means 3 calculates the transmission success rate R_SUCCESS described above. When the calculated transmission success rate R_SUCCESS is equal to or less than the threshold R_th, the control means 3 outputs different candidate channels CH_cdt to the learning device 4 and controls the learning device 4 to use them. On the other hand, when the transmission success rate R_SUCCESS is greater than the threshold R_th, the control means 3 maintains the candidate channels CH_cdt that it already output to the learning device 4 before the timing of arrow AR1, and therefore does not output candidate channels CH_cdt to the learning device 4.

学習器４は、制御手段３から既に受けた候補チャネルＣＨ＿ｃｄｔと異なる候補チャネルＣＨ＿ｃｄｔを制御手段３から受けると、その受けた候補チャネルＣＨ＿ｃｄｔに基づいて上述した方法によって選択チャネルＣＨ＿Ｓｅｌｅｃｔを選択する。 When the learning device 4 receives from the control means 3 a candidate channel CH_cdt different from the candidate channel CH_cdt already received from the control means 3, the learning device 4 selects the selection channel CH_Select based on the received candidate channel CH_cdt by the method described above.
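The candidate-channel switching rule can be sketched as follows. The definition of R_SUCCESS is given earlier in the description and is not reproduced in this passage, so the ratio of acknowledged packets to transmitted packets used here is an assumption, as are the function names and the alternative channel set in the usage lines.

```python
def success_rate(successes, attempts):
    """Assumed form of R_SUCCESS: acknowledged packets / transmitted packets."""
    return successes / attempts if attempts else 0.0


def choose_candidates(current, alternative, successes, attempts, r_th):
    """Switch to the alternative candidate set when R_SUCCESS <= R_th.

    Otherwise the current candidate set is kept unchanged.
    """
    if success_rate(successes, attempts) <= r_th:
        return alternative
    return current


current = ["1ch", "6ch", "11ch"]
alternative = ["2ch", "7ch", "12ch"]  # hypothetical second candidate set
print(choose_candidates(current, alternative, successes=3, attempts=10, r_th=0.5))
print(choose_candidates(current, alternative, successes=8, attempts=10, r_th=0.5))
```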

以後、端末装置１０は、動作期間Ｔ毎に上述した動作を繰り返し実行する。 Thereafter, the terminal device 10 repeats the above-described operations in every operation period T.

図８は、図２に示す端末装置１０の各動作期間における動作を説明するための図である。 FIG. 8 is a diagram for explaining the operation of the terminal device 10 shown in FIG. 2 during each operation period.

図８を参照して、Ｔ（Ｔは、正の整数である。）番目の動作期間、（Ｔ＋１）番目の動作期間および（Ｔ＋２）番目の動作期間の各々は、図７に示す矢印ＡＲ１のタイミングから矢印ＡＲ６のタイミングまでの期間からなる。 Referring to FIG. 8, each of the T (T is a positive integer)-th operation period, the (T+1)-th operation period and the (T+2)-th operation period corresponds to the arrow AR1 shown in FIG. It consists of a period from the timing to the timing of the arrow AR6.

制御手段３は、Ｔ番目の動作期間の矢印ＡＲ１のタイミングにおいて学習器４から選択チャネルＣＨ＿Ｓｅｌｅｃｔ＿ｔを受ける。そして、受信手段２および制御手段３は、次の（Ｉ）～（ＩＩＩ）を実行する。
（Ｉ）Ｔ番目の動作期間の矢印ＡＲ２のタイミングにおいて、観測期間Ｌにおける選択チャネルＣＨ＿Ｓｅｌｅｃｔ＿ｔの状態Ｓ_ｔを検出する。
（ＩＩ）Ｔ番目の動作期間の矢印ＡＲ４のタイミングにおいて、パケットを送信したときの通信結果ＣＭ＿ｒｓｔ＿ｔ（パケットの送信の成功または失敗）を取得する。
（ＩＩＩ）Ｔ番目の動作期間の矢印ＡＲ５のタイミングから矢印ＡＲ６のタイミングまでの期間において、パケットの送信完了後の空き期間Ｎ_ｔを検出する。
The control means 3 receives the selected channel CH_Select_t from the learning device 4 at the timing of arrow AR1 in the T-th operation period. Then, the receiving means 2 and the control means 3 execute the following (I) to (III).
(I) At the timing of arrow AR2 in the T-th operation period, detect the state S_t of the selected channel CH_Select_t in the observation period L.
(II) At the timing of arrow AR4 in the T-th operation period, obtain the communication result CM_rst_t (success or failure of packet transmission) for the transmitted packet.
(III) In the period from the timing of arrow AR5 to the timing of arrow AR6 in the T-th operation period, detect the idle period N_t after completion of packet transmission.

そうすると、制御手段３は、（Ｉ）～（ＩＩＩ）における観測期間Ｌにおける選択チャネルＣＨ＿Ｓｅｌｅｃｔ＿ｔの状態Ｓ、パケットを送信したときの通信結果ＣＭ＿ｒｓｔ＿ｔ（パケットの送信の成功または失敗）、および空き期間Ｎを学習器４へ出力する。 The control means 3 then outputs to the learning device 4 the results of (I) to (III): the state S of the selected channel CH_Select_t in the observation period L, the communication result CM_rst_t (success or failure of packet transmission) for the transmitted packet, and the idle period N.

学習器４は、Ｔ番目の動作期間において、次の（Ａ）～（Ｄ）を実行する。
（Ａ）Ｔ番目の動作期間において、選択チャネルＣＨ＿Ｓｅｌｅｃｔ＿ｔを選択する。
（Ｂ）Ｔ番目の動作期間において、観測期間Ｌにおける選択チャネルＣＨ＿Ｓｅｌｅｃｔ＿ｔの状態Ｓ_ｔに応じてパケット長ｍ_ｔを選択する。
（Ｃ）Ｔ番目の動作期間における通信結果ＣＭ＿ｒｓｔ＿ｔ、空き期間Ｎ_ｔおよびパケット長ｍ_ｔに基づいて式（２）によって即時報酬Ｒ_ｔを算出する。
（Ｄ）即時報酬Ｒ_ｔを用いて（Ｔ＋１）番目の動作期間における平均報酬Ｖ_ｔ＋１を式（１）によって算出する。
The learning device 4 performs the following (A) to (D) in the T-th operation period.
(A) In the T-th operation period, select the selected channel CH_Select_t.
(B) In the T-th operation period, select the packet length m_t according to the state S_t of the selected channel CH_Select_t in the observation period L.
(C) Calculate the immediate reward R_t by equation (2) based on the communication result CM_rst_t, the idle period N_t, and the packet length m_t in the T-th operation period.
(D) Using the immediate reward R_t, calculate the average reward V_{t+1} for the (T+1)-th operation period by equation (1).

次に、（Ｔ＋１）番目の動作期間において、受信手段２および制御手段３は、上記の（Ｉ）～（ＩＩＩ）を実行する。この場合、受信手段２および制御手段３は、（Ｉ）において、観測期間Ｌにおける選択チャネルＣＨ＿Ｓｅｌｅｃｔ＿ｔ＋１の状態Ｓ_ｔ＋１を検出し、（ＩＩ）において、パケットを送信したときの通信結果ＣＭ＿ｒｓｔ＿ｔ＋１を取得し、（ＩＩＩ）において、パケットの送信完了後の空き期間Ｎ_ｔ＋１を検出する。 Next, in the (T+1)-th operation period, the receiving means 2 and the control means 3 execute (I) to (III) above. In this case, the receiving means 2 and the control means 3 detect, in (I), the state S_{t+1} of the selected channel CH_Select_t+1 in the observation period L; obtain, in (II), the communication result CM_rst_t+1 for the transmitted packet; and detect, in (III), the idle period N_{t+1} after completion of packet transmission.

一方、学習器４は、（Ｔ＋１）番目の動作期間において、上記の（Ａ）～（Ｄ）を実行する（図８の（Ｅ）参照）。この場合、学習器４は、（Ａ）において、選択チャネルＣＨ＿Ｓｅｌｅｃｔ＿ｔ＋１を選択し、（Ｂ）において、観測期間Ｌにおける選択チャネルＣＨ＿Ｓｅｌｅｃｔ＿ｔ＋１の状態Ｓ_ｔ＋１に応じてパケット長ｍ_ｔ＋１を選択し、（Ｃ）において、通信結果ＣＭ＿ｒｓｔ＿ｔ＋１、空き期間Ｎ_ｔ＋１およびパケット長ｍ_ｔ＋１に基づいて式（２）によって即時報酬Ｒ_ｔ＋１を算出し、（Ｄ）において、即時報酬Ｒ_ｔ＋１を用いて（Ｔ＋２）番目の動作期間における平均報酬Ｖ_ｔ＋２を式（１）によって算出する。 Meanwhile, the learning device 4 executes (A) to (D) above in the (T+1)-th operation period (see (E) in FIG. 8). In this case, the learning device 4 selects, in (A), the selected channel CH_Select_t+1; selects, in (B), the packet length m_{t+1} according to the state S_{t+1} of the selected channel CH_Select_t+1 in the observation period L; calculates, in (C), the immediate reward R_{t+1} by equation (2) based on the communication result CM_rst_t+1, the idle period N_{t+1}, and the packet length m_{t+1}; and calculates, in (D), the average reward V_{t+2} for the (T+2)-th operation period by equation (1) using the immediate reward R_{t+1}.

更に、（Ｔ＋２）番目の動作期間において、受信手段２および制御手段３は、上記の（Ｉ）～（ＩＩＩ）を実行する。この場合、受信手段２および制御手段３は、（Ｉ）において、観測期間Ｌにおける選択チャネルＣＨ＿Ｓｅｌｅｃｔ＿ｔ＋２の状態Ｓ_ｔ＋２を検出し、（ＩＩ）において、パケットを送信したときの通信結果ＣＭ＿ｒｓｔ＿ｔ＋２を取得し、（ＩＩＩ）において、パケットの送信完了後の空き期間Ｎ_ｔ＋２を検出する。 Furthermore, in the (T+2)-th operation period, the receiving means 2 and the control means 3 execute (I) to (III) above. In this case, the receiving means 2 and the control means 3 detect, in (I), the state S_{t+2} of the selected channel CH_Select_t+2 in the observation period L; obtain, in (II), the communication result CM_rst_t+2 for the transmitted packet; and detect, in (III), the idle period N_{t+2} after completion of packet transmission.

一方、学習器４は、（Ｔ＋２）番目の動作期間において、上記の（Ａ）～（Ｄ）を実行する（図８の（Ｆ）参照）。この場合、学習器４は、（Ａ）において、選択チャネルＣＨ＿Ｓｅｌｅｃｔ＿ｔ＋２を選択し、（Ｂ）において、観測期間Ｌにおける選択チャネルＣＨ＿Ｓｅｌｅｃｔ＿ｔ＋２の状態Ｓ_ｔ＋２に応じてパケット長ｍ_ｔ＋２を選択し、（Ｃ）において、通信結果ＣＭ＿ｒｓｔ＿ｔ＋２、空き期間Ｎ_ｔ＋２およびパケット長ｍ_ｔ＋２に基づいて式（２）によって即時報酬Ｒ_ｔ＋２を算出し、（Ｄ）において、即時報酬Ｒ_ｔ＋２を用いて（Ｔ＋３）番目の動作期間における平均報酬Ｖ_ｔ＋３を式（１）によって算出する。 Meanwhile, the learning device 4 executes (A) to (D) above in the (T+2)-th operation period (see (F) in FIG. 8). In this case, the learning device 4 selects, in (A), the selected channel CH_Select_t+2; selects, in (B), the packet length m_{t+2} according to the state S_{t+2} of the selected channel CH_Select_t+2 in the observation period L; calculates, in (C), the immediate reward R_{t+2} by equation (2) based on the communication result CM_rst_t+2, the idle period N_{t+2}, and the packet length m_{t+2}; and calculates, in (D), the average reward V_{t+3} for the (T+3)-th operation period by equation (1) using the immediate reward R_{t+2}.

なお、学習器４は、即時報酬Ｒ_ｔおよび平均報酬Ｖ_ｔ＋１を算出し、選択チャネルＣＨ＿Ｓｅｌｅｃｔ＿Ｔ＋１を選択することを（Ｔ＋１）番目の動作期間の矢印ＡＲ１のタイミングまでに行い、即時報酬Ｒ_ｔおよび平均報酬Ｖ_ｔ＋１を算出し、パケット長ｍ_ｔ＋１を選択することを（Ｔ＋１）番目の動作期間の矢印ＡＲ２のタイミングまでであれば、Ｔ番目の動作期間において行ってもよく、（Ｔ＋１）番目の動作期間において行ってもよい。 Note that the learning device 4 calculates the immediate reward R_t and the average reward V_{t+1} and selects the selected channel CH_Select_T+1 by the timing of arrow AR1 in the (T+1)-th operation period. Calculating the immediate reward R_t and the average reward V_{t+1} and selecting the packet length m_{t+1} may be performed either in the T-th operation period or in the (T+1)-th operation period, as long as this is done by the timing of arrow AR2 in the (T+1)-th operation period.

また、学習器４は、即時報酬Ｒ_ｔ＋１および平均報酬Ｖ_ｔ＋２を算出し、選択チャネルＣＨ＿Ｓｅｌｅｃｔ＿Ｔ＋２を選択することを（Ｔ＋２）番目の動作期間の矢印ＡＲ１のタイミングまでに行い、即時報酬Ｒ_ｔ＋１および平均報酬Ｖ_ｔ＋２を算出し、パケット長ｍ_ｔ＋２を選択することを（Ｔ＋２）番目の動作期間の矢印ＡＲ２のタイミングまでであれば、（Ｔ＋１）番目の動作期間において行ってもよく、（Ｔ＋２）番目の動作期間において行ってもよい。 Likewise, the learning device 4 calculates the immediate reward R_{t+1} and the average reward V_{t+2} and selects the selected channel CH_Select_T+2 by the timing of arrow AR1 in the (T+2)-th operation period. Calculating the immediate reward R_{t+1} and the average reward V_{t+2} and selecting the packet length m_{t+2} may be performed either in the (T+1)-th operation period or in the (T+2)-th operation period, as long as this is done by the timing of arrow AR2 in the (T+2)-th operation period.

そして、受信手段２、制御手段３および学習器４は、上述した動作を繰り返し実行する。 Then, the receiving means 2, the control means 3 and the learning device 4 repeatedly execute the operations described above.

Ｔ番目の動作期間は、「第１の動作期間」を構成し、（Ｔ＋１）番目の動作期間は、「第２の動作期間」を構成する。 The T-th operating period constitutes a "first operating period", and the (T+1)-th operating period constitutes a "second operating period".

そして、Ｔ番目の動作期間および（Ｔ＋１）番目の動作期間において、上述した動作が終了した後、（Ｔ＋１）番目の動作期間および（Ｔ＋２）番目の動作期間において、上述した動作が繰り返し実行される。この場合、（Ｔ＋１）番目の動作期間は、「第１の動作期間」を構成し、（Ｔ＋２）番目の動作期間は、「第２の動作期間」を構成する。以後、同様にして、２つの動作期間において、上述した動作が繰り返し実行される。この場合、２つの動作期間は、２つの動作期間において同じ選択チャネルＣＨ＿Ｓｅｌｅｃｔが連続して選択されるとき、隣接しており、２つの動作期間において同じ選択チャネルＣＨ＿Ｓｅｌｅｃｔが連続して選択されないとき、離れている。 After the above-described operations are completed in the T-th and (T+1)-th operation periods, they are repeated in the (T+1)-th and (T+2)-th operation periods. In this case, the (T+1)-th operation period constitutes the "first operation period" and the (T+2)-th operation period constitutes the "second operation period". Thereafter, the above-described operations are repeated in the same manner over pairs of operation periods. The two operation periods are adjacent when the same selected channel CH_Select is selected consecutively in both, and are separated from each other when it is not.

この発明の実施の形態においては、即時報酬Ｒ_ｔは、１つの動作期間においてパケットが選択チャネルＣＨ＿Ｓｅｌｅｃｔで送信されたときに得られる報酬であり、平均報酬Ｖ_ｔ＋１は、観測期間Ｌにおける選択チャネルＣＨ＿Ｓｅｌｅｃｔの状態Ｓに対応する１つのパケット長ｍを選択した回数ｎ（累積値）によって１つの選択チャネルＣＨ＿Ｓｅｌｅｃｔにおける即時報酬Ｒ_ｔの累積値を平均した報酬であり、かつ、１つの動作期間の後の動作期間において得られる報酬である。 In the embodiment of the present invention, the immediate reward R_t is the reward obtained when a packet is transmitted on the selected channel CH_Select in one operation period. The average reward V_{t+1} is the cumulative value of the immediate rewards R_t on one selected channel CH_Select averaged over the number of times n (a cumulative count) that one packet length m corresponding to the state S of the selected channel CH_Select in the observation period L has been selected, and is the reward obtained in an operation period after that one operation period.
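The averaging described above can be written as an incremental update. Equation (1) itself is not reproduced in this passage; the form V_{t+1} = V_t + (R_t - V_t)/n used below is taken from the worked substitutions later in the description (for example, V_2 = 0 + (10/3 - 0)/1), and the function name is hypothetical. The sketch checks that n such updates reproduce the plain arithmetic mean of the rewards.

```python
from fractions import Fraction


def update_average(v_t, r_t, n):
    """Incremental mean: V_{t+1} = V_t + (R_t - V_t) / n.

    n is the cumulative number of times this packet length has been
    selected for this channel and state, counting the current selection.
    """
    return v_t + (r_t - v_t) / n


rewards = [Fraction(3), Fraction(5), Fraction(1)]  # arbitrary example values
v = Fraction(0)
for n, r in enumerate(rewards, start=1):
    v = update_average(v, r, n)

# The running value equals the arithmetic mean of all rewards seen so far.
print(v == sum(rewards) / len(rewards))
```

Keeping only the running average and the count n avoids storing the full reward history for every table entry.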

図９から図１３は、それぞれ、対応表ＴＢＬ１の変遷を示す第１の概略図から第５の概略図である。 9 to 13 are first to fifth schematic diagrams, respectively, showing changes in the correspondence table TBL1.

図９から図１３は、候補チャネルが１ｃｈ，６ｃｈ，１１ｃｈからなり、パケット長ｍが１０μｓ、２０μｓおよび３０μｓである場合について対応表ＴＢＬ１の変遷を示す。 FIGS. 9 to 13 show changes in the correspondence table TBL1 for cases where candidate channels are 1ch, 6ch, and 11ch, and packet lengths m are 10 μs, 20 μs, and 30 μs.

図９を参照して、対応表ＴＢＬ１（Ａ）は、観測期間Ｌにおけるチャネルの状態“００”，“０１”，“１０”，“１１”が１ｃｈ，６ｃｈ，１１ｃｈの各々に対応付けられ、パケット長１０μｓ，２０μｓ，３０μｓが観測期間Ｌにおけるチャネルの状態“００”，“０１”，“１０”，“１１”の各々に対応付けられ、平均報酬Ｖがそれぞれのパケット長１０μｓ，２０μｓ，３０μｓに対応付けられた構成からなる。そして、対応表ＴＢＬ１（Ａ）は、初期状態の対応表であるため、平均報酬Ｖは、全て、初期値（＝０）からなる。そして、学習器４は、候補チャネル（＝１ｃｈ，６ｃｈ，１１ｃｈ）を制御手段３から受ける。 Referring to FIG. 9, the correspondence table TBL1(A) is structured so that the channel states "00", "01", "10", and "11" in the observation period L are associated with each of 1ch, 6ch, and 11ch; the packet lengths 10 μs, 20 μs, and 30 μs are associated with each of the channel states "00", "01", "10", and "11" in the observation period L; and an average reward V is associated with each of the packet lengths 10 μs, 20 μs, and 30 μs. Since the correspondence table TBL1(A) is the table in its initial state, all of the average rewards V have the initial value (= 0). The learning device 4 then receives the candidate channels (= 1ch, 6ch, 11ch) from the control means 3.
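One possible in-memory layout for the initial table TBL1(A) is a nested mapping from channel to state to packet length to average reward. The nested-dictionary representation and the names below are assumptions for illustration; the description only specifies the associations, not a data structure.

```python
CHANNELS = ("1ch", "6ch", "11ch")
STATES = ("00", "01", "10", "11")
PACKET_LENGTHS_US = (10, 20, 30)


def make_table():
    """Build channel -> state -> packet length (us) -> average reward V, all 0."""
    return {
        ch: {s: {m: 0 for m in PACKET_LENGTHS_US} for s in STATES}
        for ch in CHANNELS
    }


tbl1 = make_table()
print(tbl1["6ch"]["01"])  # the three packet-length entries for channel 6ch, state "01"
```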

次に、図１０を参照して、学習器４は、１番目の動作期間において、乱数ｐを発生させ、その発生させた乱数ｐがε以下であるので、候補チャネル（＝１ｃｈ，６ｃｈ，１１ｃｈ）からランダムにチャネル６ｃｈを選択し（図１０の対応表ＴＢＬ１（Ｂ）参照）、チャネル６ｃｈを選択チャネルＣＨ＿Ｓｅｌｅｃｔとして制御手段３へ出力する。 Next, referring to FIG. 10, in the first operation period the learning device 4 generates a random number p. Since the generated random number p is equal to or less than ε, the learning device 4 randomly selects channel 6ch from the candidate channels (= 1ch, 6ch, 11ch) (see the correspondence table TBL1(B) in FIG. 10) and outputs channel 6ch to the control means 3 as the selected channel CH_Select.

その後、学習器４は、観測期間Ｌにおけるチャネル６ｃｈの状態“０１”を制御手段３から受ける。そして、学習器４は、ε－ｇｒｅｅｄｙ法によって、観測期間Ｌにおけるチャネル６ｃｈの状態“０１”に応じたパケット長ｍを選択する。ε－ｇｒｅｅｄｙ法によってパケット長ｍを選択する場合、発生した乱数ｐがε以下であるとき、ランダムにパケット長ｍを選択し、発生した乱数ｐがε以下でないとき、最大の平均報酬Ｖが得られるときのパケット長ｍを選択することになる。 After that, the learning device 4 receives the state “01” of the channel 6ch during the observation period L from the control means 3 . Then, the learning device 4 selects the packet length m according to the state "01" of the channel 6ch in the observation period L by the ε-greedy method. When the packet length m is selected by the ε-greedy method, when the generated random number p is ε or less, the packet length m is randomly selected, and when the generated random number p is not ε or less, the maximum average reward V is obtained. will choose the packet length m when it is available.

この時点で、観測期間Ｌにおけるチャネル６ｃｈの状態“０１”に対応する平均報酬Ｖは、全て、零（０）であり（図９の対応表ＴＢＬ１（Ａ）参照）、最大の平均報酬Ｖが存在しないので、発生した乱数ｐがε以下でないとき、ランダムにパケット長ｍを選択することになる。一方、発生した乱数ｐがε以下であるとき、ε－ｇｒｅｅｄｙ法によれば、ランダムにパケット長ｍを選択することになる。 At this point, all the average rewards V corresponding to the state “01” of channel 6ch in observation period L are zero (0) (see correspondence table TBL1(A) in FIG. 9), and the maximum average reward V is Since it does not exist, when the generated random number p is not equal to or less than ε, the packet length m is randomly selected. On the other hand, when the generated random number p is less than or equal to ε, the packet length m is randomly selected according to the ε-greedy method.

従って、学習器４は、パケット長ｍ＝１０μｓをランダムに選択し（図１０の対応表ＴＢＬ１（Ｂ）参照）、パケット長ｍ＝１０μｓを制御手段３へ出力する。 Therefore, the learning device 4 randomly selects the packet length m=10 μs (see correspondence table TBL1(B) in FIG. 10) and outputs the packet length m=10 μs to the control means 3 .
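The ε-greedy selection, including the fallback to a random choice when no entry stands out, can be sketched as below. Drawing p uniformly from [0, 1) and breaking ties uniformly at random are assumptions of this sketch, and the function name is hypothetical.

```python
import random


def epsilon_greedy(avg_rewards, epsilon, rng=random):
    """Pick a key of avg_rewards by the epsilon-greedy rule.

    p <= epsilon : explore, choose uniformly at random.
    otherwise    : exploit, choose the entry with the largest average reward;
                   ties (e.g. an all-zero initial table) are broken at random.
    """
    options = list(avg_rewards)
    p = rng.random()  # assumed to be uniform on [0, 1)
    if p <= epsilon:
        return rng.choice(options)
    best = max(avg_rewards.values())
    return rng.choice([o for o in options if avg_rewards[o] == best])


rng = random.Random(0)  # fixed seed so the example is repeatable
print(epsilon_greedy({10: 0, 20: 0, 30: 0}, epsilon=0.1, rng=rng))
print(epsilon_greedy({10: 25 / 6, 20: 0, 30: 0}, epsilon=0.0, rng=rng))
```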

引き続いて、学習器４は、パケットの送信が成功したことを示す信号Ｓ＿ｓｕｃｃｅｓｓを制御手段３から受け、その後、空き期間Ｎ（＝２）を制御手段３から受ける。 Subsequently, the learning device 4 receives from the control means 3 a signal S_success indicating that the packet has been successfully transmitted, and then receives an idle period N (=2) from the control means 3 .

そして、学習器４は、信号Ｓ＿ｓｕｃｃｅｓｓおよび空き期間Ｎ（＝２）に基づいて式（２Ａ）によって１番目の動作期間における即時報酬Ｒ_１（＝１０／３）を算出する。この時点において、観測期間Ｌにおけるチャネル６ｃｈの状態“０１”に対応する平均報酬Ｖ_１は、全て、零（＝０）であるので（図９の対応表ＴＢＬ１（Ａ）参照）、学習器４は、即時報酬Ｒ_１（＝１０／３）と平均報酬Ｖ_１（＝０）と、ｎ＝１とを式（１）に代入して、平均報酬Ｖ_２＝０＋（１０／３－０）／１＝１０／３を算出し、その算出した平均報酬Ｖ_２（＝１０／３）をパケット長ｍ＝１０μｓに対応付けて対応表ＴＢＬ１（Ｂ）に格納する。 Then, the learning device 4 calculates the immediate reward R_1 (= 10/3) for the first operation period by equation (2A) based on the signal S_success and the idle period N (= 2). At this point, all of the average rewards V_1 corresponding to state "01" of channel 6ch in the observation period L are zero (see the correspondence table TBL1(A) in FIG. 9). The learning device 4 therefore substitutes the immediate reward R_1 (= 10/3), the average reward V_1 (= 0), and n = 1 into equation (1) to calculate the average reward V_2 = 0 + (10/3 - 0)/1 = 10/3, and stores the calculated average reward V_2 (= 10/3) in the correspondence table TBL1(B) in association with the packet length m = 10 μs.
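Equations (2A) and (2B) are defined earlier in the description and are not reproduced in this passage. The sketch below uses the form R = m/(N+1) on success and R = 0 on failure, which is only an inference from the worked values in this walkthrough (m = 10 μs with N = 2, 1, 3 giving 10/3, 10/2, 10/4, and 0 for the failed transmission); the actual equations may differ, and the function name is hypothetical.

```python
from fractions import Fraction


def immediate_reward(success, packet_len_us, idle_n):
    """Inferred reward: m / (N + 1) on success, 0 on failure (assumption)."""
    if not success:
        return Fraction(0)
    return Fraction(packet_len_us, idle_n + 1)


# First operation period: success, m = 10 us, idle period N = 2.
r1 = immediate_reward(True, 10, 2)
# Update from the text: V_2 = V_1 + (R_1 - V_1) / n with V_1 = 0 and n = 1.
v2 = Fraction(0) + (r1 - Fraction(0)) / 1
print(r1, v2)
```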

図１１を参照して、学習器４は、２番目の動作期間において、乱数ｐを発生させ、その発生させた乱数ｐがε以下であるので、候補チャネル（＝１ｃｈ，６ｃｈ，１１ｃｈ）からランダムにチャネル６ｃｈを選択し、チャネル６ｃｈを選択チャネルＣＨ＿Ｓｅｌｅｃｔとして制御手段３へ出力する。 Referring to FIG. 11, in the second operation period the learning device 4 generates a random number p. Since the generated random number p is equal to or less than ε, the learning device 4 randomly selects channel 6ch from the candidate channels (= 1ch, 6ch, 11ch) and outputs channel 6ch to the control means 3 as the selected channel CH_Select.

その後、学習器４は、観測期間Ｌにおけるチャネル６ｃｈの状態“０１”を制御手段３から受ける。この時点で、観測期間Ｌにおけるチャネル６ｃｈの状態“０１”に対応する平均報酬Ｖの欄には、１０μｓのパケット長ｍに対応付けて平均報酬Ｖ_２（＝１０／３）が格納されている（図１０の対応表ＴＢＬ１（Ｂ）参照）。学習器４は、乱数ｐを発生させ、その発生させた乱数ｐがε以下でないので、最大の平均報酬Ｖ_２が得られるときのパケット長ｍ＝１０μｓを選択し、その選択したパケット長ｍ＝１０μｓを制御手段３へ出力する。 After that, the learning device 4 receives the state "01" of channel 6ch in the observation period L from the control means 3. At this point, the average reward V_2 (= 10/3) is stored, in association with the packet length m of 10 μs, in the average reward V column corresponding to state "01" of channel 6ch in the observation period L (see the correspondence table TBL1(B) in FIG. 10). The learning device 4 generates a random number p and, since the generated random number p is not equal to or less than ε, selects the packet length m = 10 μs that yields the maximum average reward V_2, and outputs the selected packet length m = 10 μs to the control means 3.

引き続いて、学習器４は、パケットの送信が成功したことを示す信号Ｓ＿ｓｕｃｃｅｓｓを制御手段３から受け、その後、空き期間Ｎ（＝１）を制御手段３から受ける。 Subsequently, the learning device 4 receives from the control means 3 a signal S_success indicating that the packet has been successfully transmitted, and then receives an idle period N (=1) from the control means 3 .

そうすると、学習器４は、信号Ｓ＿ｓｕｃｃｅｓｓおよび空き期間Ｎ（＝１）に基づいて式（２Ａ）によって即時報酬Ｒ_２（＝１０／２）を算出し、その算出した即時報酬Ｒ_２（＝１０／２）を記憶する。 Then, the learning device 4 calculates the immediate reward R ₂ (=10/2) by Equation (2A) based on the signal S_success and the idle period N (=1), and the calculated immediate reward R ₂ (=10/ 2) is stored.

その後、学習器４は、即時報酬Ｒ_２（＝１０／２）と平均報酬Ｖ_２（＝１０／３）とｎ＝２とを式（１）に代入して、平均報酬Ｖ_３＝１０／３＋（１０／２－１０／３）／２＝２５／６を算出し、その算出した平均報酬Ｖ_３（＝２５／６）をパケット長ｍ＝１０μｓに対応付けて対応表ＴＢＬ１（Ｂ）に格納する。 After that, the learning device 4 substitutes the immediate reward R ₂ (=10/2), the average reward V ₂ (=10/3), and n=2 into Equation (1) to obtain the average reward V ₃ =10/ 3+(10/2−10/3)/2=25/6 is calculated, and the calculated average reward V ₃ (=25/6) is associated with the packet length m=10 μs, and is shown in the correspondence table TBL1 (B). Store.

図１２を参照して、学習器４は、３番目の動作期間において、乱数ｐを発生させ、その発生させた乱数ｐがε以下であるので、チャネル１ｃｈをランダムに選択し、その選択したチャネル１ｃｈを選択チャネルＣＨ＿Ｓｅｌｅｃｔとして制御手段３へ出力する。 Referring to FIG. 12, learning device 4 generates a random number p in the third operation period, and since the generated random number p is equal to or less than ε, it randomly selects channel 1ch, and selects channel 1ch. 1ch is output to the control means 3 as the selected channel CH_Select.

その後、学習器４は、観測期間Ｌにおけるチャネル１ｃｈの状態“００”を制御手段３から受ける。この時点で、観測期間Ｌにおけるチャネル１ｃｈの状態“００”に対応する平均報酬Ｖ_３は、全て、零（０）である（図１１の対応表ＴＢＬ１（Ｃ）参照）。学習器４は、乱数ｐを発生させ、その発生させた乱数ｐがε以下であるので、パケット長ｍ＝２０μｓをランダムに選択し、パケット長ｍ＝２０μｓを制御手段３へ出力する。 After that, the learning device 4 receives the state "00" of channel 1ch in the observation period L from the control means 3. At this point, all of the average rewards V_3 corresponding to state "00" of channel 1ch in the observation period L are zero (see the correspondence table TBL1(C) in FIG. 11). The learning device 4 generates a random number p and, since the generated random number p is equal to or less than ε, randomly selects the packet length m = 20 μs and outputs the packet length m = 20 μs to the control means 3.

その後、学習器４は、パケットの送信が失敗したことを示す信号Ｓ＿ｆａｉｌｕｒｅを制御手段３から受ける。 After that, the learning device 4 receives from the control means 3 a signal S_failure indicating that the transmission of the packet has failed.

そして、学習器４は、信号Ｓ＿ｆａｉｌｕｒｅに基づいて式（２Ｂ）によって即時報酬Ｒ_３（＝０）を算出し、その算出した即時報酬Ｒ_３（＝０）を記憶する。そうすると、学習器４は、即時報酬Ｒ_３（＝０）および平均報酬Ｖ_３（＝０）に基づいて、平均報酬Ｖ_４（＝０）を算出し、その算出した平均報酬Ｖ_４（＝０）を対応表ＴＢＬ１（Ｄ）のチャネル１ｃｈの状態“００”に対応する２０μｓのパケット長ｍに対応付けて平均報酬Ｖの欄に格納する。 Then, the learning device 4 calculates the immediate reward R ₃ (=0) by Equation (2B) based on the signal S_failure, and stores the calculated immediate reward R ₃ (=0). Then, the learning device 4 calculates the average reward V ₄ (=0) based on the immediate reward R ₃ (=0) and the average reward V ₃ (=0), and the calculated average reward V ₄ (=0 ) is associated with the packet length m of 20 μs corresponding to the state “00” of channel 1ch in the correspondence table TBL1(D) and stored in the average reward V column.

図１３を参照して、学習器４は、４番目の動作期間において、乱数ｐを発生させ、その発生させた乱数ｐがεよりも大きいので、最大の平均報酬Ｖ_３が得られるときのチャネル６ｃｈを選択する。対応表ＴＢＬ１（Ｄ）においては、平均報酬Ｖ_３＝２５／６であり、平均報酬Ｖ_４＝０であるので、平均報酬Ｖ_３が最大である。 Referring to FIG. 13, learning device 4 generates a random number p in the fourth operation period, and since the generated random number p is greater than ε, the channel when the maximum average reward V ₃ is obtained is Select 6ch. In the correspondence table TBL1(D), the average reward V ₃ =25/6 and the average reward V ₄ =0, so the average reward V ₃ is the maximum.

学習器４は、チャネル６ｃｈを選択すると、その選択したチャネル６ｃｈを選択チャネルＣＨ＿Ｓｅｌｅｃｔとして制御手段３へ出力する。 When the learning device 4 selects the channel 6ch, it outputs the selected channel 6ch to the control means 3 as the selected channel CH_Select.

その後、学習器４は、観測期間Ｌにおけるチャネル６ｃｈの状態“０１”を制御手段３から受ける。この時点で、観測期間Ｌにおけるチャネル６ｃｈの状態“０１”に対応する平均報酬Ｖは、Ｖ_３（＝２５／６）が最大である（図１２の対応表ＴＢＬ１（Ｄ）参照）。学習器４は、乱数ｐを発生させ、その発生させた乱数ｐがε以下でないので、最大の平均報酬Ｖ_３（＝２５／６）が得られるときのパケット長ｍ＝１０μｓを選択し、パケット長ｍ＝１０μｓを制御手段３へ出力する。 After that, the learning device 4 receives the state “01” of the channel 6ch during the observation period L from the control means 3 . At this point, V ₃ (=25/6) is the maximum average reward V corresponding to state "01" of channel 6ch in observation period L (see correspondence table TBL1(D) in FIG. 12). The learning device 4 generates a random number p, and since the generated random number p is not equal to or less than ε, the learning device 4 selects a packet length m=10 μs when the maximum average reward V ₃ (=25/6) is obtained, and a packet Output the length m=10 μs to the control means 3 .

引き続いて、学習器４は、パケットの送信が成功したことを示す信号Ｓ＿ｓｕｃｃｅｓｓと、空き期間Ｎ（＝３）とを制御手段３から受ける。 Subsequently, the learning device 4 receives from the control means 3 a signal S_success indicating successful transmission of the packet and an idle period N (=3).

そうすると、学習器４は、信号Ｓ＿ｓｕｃｃｅｓｓおよび空き期間Ｎ（＝３）に基づいて式（２Ａ）によって即時報酬Ｒ_４（＝１０／４）を算出し、その算出した即時報酬Ｒ_４（＝１０／４）を記憶する。 Then, the learning device 4 calculates the immediate reward R ₄ (=10/4) by Equation (2A) based on the signal S_success and the idle period N (=3), and the calculated immediate reward R ₄ (=10/ 4) is stored.

そして、学習器４は、即時報酬Ｒ_４（＝１０／４）と、平均報酬Ｖ_３（＝２５／６）と、ｎ＝３とを式（１）に代入して平均報酬Ｖ_５＝２５／６＋（１０／４－２５／６）／３＝６５／１８を算出する。ここで、平均報酬Ｖ_３（＝２５／６）を式（１）に代入して平均報酬Ｖ_５を算出するのは、観測期間Ｌにおけるチャネル６ｃｈの状態Ｓ（＝“０１”）に対応するパケット長ｍ＝１０μｓに対して算出された平均報酬Ｖ_ｔが図１１の対応表ＴＢＬ１（Ｃ）に格納されたＶ_３（＝２５／６）であるからである。従って、観測期間Ｌにおけるチャネル６ｃｈの状態Ｓ（＝“０１”）に対応するパケット長ｍ＝１０μｓに対して平均報酬Ｖ_ｔ＋１を動作期間Ｔ（＝４）において式（１）によって算出するとき、動作期間Ｔ（＝４）よりも前の動作期間Ｔ（＝２）において算出された平均報酬Ｖ_３（＝２５／６）を平均報酬Ｖ_ｔとして用いる。 Then, the learning device 4 substitutes the immediate reward R_4 (= 10/4), the average reward V_3 (= 25/6), and n = 3 into equation (1) to calculate the average reward V_5 = 25/6 + (10/4 - 25/6)/3 = 65/18. Here, the average reward V_3 (= 25/6) is substituted into equation (1) to calculate the average reward V_5 because the average reward V_t calculated for the packet length m = 10 μs corresponding to the state S (= "01") of channel 6ch in the observation period L is the value V_3 (= 25/6) stored in the correspondence table TBL1(C) of FIG. 11. Accordingly, when the average reward V_{t+1} for the packet length m = 10 μs corresponding to the state S (= "01") of channel 6ch in the observation period L is calculated by equation (1) in the operation period T (= 4), the average reward V_3 (= 25/6) calculated in the earlier operation period T (= 2) is used as the average reward V_t.
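The three updates for channel 6ch, state "01", packet length 10 μs can be replayed with exact fractions. The immediate rewards 10/3, 10/2, and 10/4 are the values stated for the first, second, and fourth operation periods; the update uses the form of equation (1) shown in the substitutions above. The variable names are assumptions for the example. Evaluated exactly, the running average takes the values 10/3, 25/6, and 65/18.

```python
from fractions import Fraction

# Immediate rewards R_1, R_2, R_4 for (6ch, "01", m = 10 us) as stated in the text.
rewards = [Fraction(10, 3), Fraction(10, 2), Fraction(10, 4)]

v = Fraction(0)   # initial average reward from TBL1(A)
history = []
for n, r in enumerate(rewards, start=1):
    v = v + (r - v) / n  # equation (1): V_{t+1} = V_t + (R_t - V_t) / n
    history.append(v)

print([str(x) for x in history])
```

The third operation period does not appear in the loop because it selected channel 1ch, so it neither increments n nor changes this table entry.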

As described with reference to FIGS. 9 to 13, the average reward V_3 (=25/6) is stored in one of the three average-reward fields associated with the state "01" of channel 6ch in the observation period L (see correspondence table TBL1(C) in FIG. 11). Therefore, in the fourth operation period, when the random number p is greater than ε and a packet length m is to be selected for the state "01" of channel 6ch in the observation period L, the packet length (=10 μs) corresponding to the maximum average reward V_3 (=25/6) can be selected (see correspondence table TBL1(C) in FIG. 11 and correspondence table TBL1(E) in FIG. 13).

In the first operation period shown in FIG. 10, the packet length m (=10 μs) is selected for the state "01" of channel 6ch in the observation period L, the packet is transmitted successfully, and the immediate reward R_1 (=10/3) is obtained. In the second operation period shown in FIG. 11, the packet length m (=10 μs) is selected for the state "01" of channel 6ch in the observation period L, and the immediate reward R_2 (=10/2) and the average reward V_3 (=25/6) are obtained. Furthermore, in the fourth operation period shown in FIG. 13, the packet length m (=10 μs) is selected for the state "01" of channel 6ch in the observation period L, and the average reward V_5 (=65/18) is obtained (see correspondence table TBL1(E) in FIG. 13).

As a result, when the state S of channel 6ch in the observation period L is "01", a packet transmitted after the observation period L has elapsed is likely to be transmitted successfully. It can therefore be estimated that, if the state S of channel 6ch in the observation period L is "01", the slots following the observation period L are likely to be idle. The same reasoning applies to each of the states "00", "01", "10", and "11" of each channel in the observation period L.

Therefore, by repeatedly performing the learning described with reference to FIGS. 9 to 13, the learning device 4 can select a packet length m according to the state S (one of "00", "01", "10", and "11") of each channel in the observation period L.

Furthermore, by repeatedly performing the learning described with reference to FIGS. 9 to 13, the learning device 4 may learn, for example, that for the state "00" of channel 11ch in the observation period L, selecting the packet length m=30 μs makes packet transmission likely to fail, whereas selecting the packet length m=10 μs makes packet transmission likely to succeed.

As a result, for the state "00" of channel 11ch in the observation period L, the learning device 4 selects the packet length m=10 μs when the random number p is greater than ε. Transmitting a packet with the packet length m=10 μs means transmitting the packet in a short idle period. Rather than selfishly promoting only its own packet transmission, each terminal device can thus transmit packets in periods left idle by the wireless communication of other terminal devices (that is, with consideration for the wireless communication of other terminal devices). Therefore, each terminal device can perform wireless communication while coexisting with other terminal devices, without acquiring from them information about the presence or absence of their wireless communication.

Furthermore, since the learning device 4 selects the packet length m by the ε-greedy method, it selects the packet length m at random when the random number p is less than or equal to ε. This keeps the learning device 4 from continuing to select whichever packet length m first yielded an average reward V greater than zero (=0), and allows it to search for a packet length m that yields a larger average reward V.

When the selected channel CH_Select is selected by the ε-greedy method and the random number p is greater than ε, the learning device 4 refers to the average-reward fields associated with all of the states "00", "01", "10", and "11" of each channel (12 average-reward fields in the correspondence tables TBL1(A) to TBL1(E) shown in FIGS. 9 to 13) and selects the channel for which the maximum average reward V is obtained as the selected channel CH_Select.

When the packet length m is selected by the ε-greedy method and the random number p is greater than ε, the learning device 4 refers to the average-reward fields associated with the state S (one of "00", "01", "10", and "11") of the selected channel CH_Select in the observation period L (three average-reward fields in the correspondence tables TBL1(A) to TBL1(E) shown in FIGS. 9 to 13) and selects the packet length for which the maximum average reward V is obtained as the packet length m.
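The two ε-greedy selections described above can be sketched in Python as follows. This is a minimal sketch under stated assumptions, not the implementation of the embodiment: the dictionary layout `table[(channel, state, packet_length)]`, the function names, and the reading of "no maximum average reward exists" as "every referenced field is still zero" are all introduced here for illustration.

```python
import random

def make_table(channels, states, lengths):
    """Correspondence table with every average reward initialised to zero."""
    return {(ch, s, m): 0.0 for ch in channels for s in states for m in lengths}

def epsilon_greedy(values, epsilon, rng):
    """Pick a key of `values`; explore if p <= epsilon or nothing is learned yet."""
    p = rng.random()                     # random number p in [0, 1)
    best = max(values.values())
    if p <= epsilon or best <= 0.0:      # assumption: all-zero means "no maximum exists"
        return rng.choice(sorted(values))
    # Several entries may share the maximum; any one of them may be chosen.
    return rng.choice(sorted(k for k, v in values.items() if v == best))

def select_channel(table, channels, epsilon, rng=random):
    """Refer to every field of each candidate channel (all states, all lengths)."""
    per_channel = {
        ch: max(v for (c, _s, _m), v in table.items() if c == ch)
        for ch in channels
    }
    return epsilon_greedy(per_channel, epsilon, rng)

def select_packet_length(table, channel, state, lengths, epsilon, rng=random):
    """Refer only to the fields for the observed state of the selected channel."""
    row = {m: table[(channel, state, m)] for m in lengths}
    return epsilon_greedy(row, epsilon, rng)
```

With one channel, four states, and three packet lengths, `select_channel` scans 12 fields and `select_packet_length` scans 3, matching the field counts given above.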

FIG. 14 is a flowchart for explaining the operation of the terminal device 10 shown in FIG. 2. Referring to FIG. 14, when the operation of the terminal device 10 starts, the control means 3 determines whether there is transmission data D_TR (step S1). Here, the control means 3 determines that there is transmission data when it has received transmission data D_TR from the application 6, and determines that there is no transmission data when it has not.

When it is determined in step S1 that there is transmission data D_TR, the control means 3 generates a signal S_req_m requesting the packet length m and outputs it to the learning device 4.

In parallel with the operation of the parts of the terminal device 10 other than the learning device 4, the learning device 4 performs learning based on the state S of the selected channel in the observation period L, the communication result obtained when a packet is transmitted, and the idle period N after completion of the packet transmission. It selects, with a predetermined probability, the channel for which the maximum average reward V_{t+1} is obtained as the selected channel and, according to the state S of the selected channel in the observation period L, selects, with a predetermined probability, the packet length for which the maximum average reward V_{t+1} is obtained as the packet length m (step S2). Here, the average reward V_{t+1} is the average reward in operation period T+1.

Upon receiving the signal S_req_m from the control means 3, the learning device 4 outputs the selected packet length m to the control means 3.

Upon receiving the packet length m from the learning device 4, the control means 3 extracts from the transmission data D_TR the transmission data D_m whose data amount AOD makes the packet length L_PKT of the transmission packet PKT equal to the packet length m, and generates a transmission packet PKT containing the extracted transmission data D_m (step S3). The control means 3 then generates a signal S_req_CH requesting the selected channel and outputs it to the learning device 4.

Upon receiving the signal S_req_CH from the control means 3, the learning device 4 outputs the selected channel to the control means 3. Upon receiving the state S of the selected channel in the observation period L from the control means 3, it outputs the packet length m to the control means 3.

After step S3, upon receiving the selected channel from the learning device 4, the control means 3 detects the state S of the selected channel in the observation period L by the method described above, based on the received power spectrum PW_carrier_L received from the receiving means 2 (step S4), and outputs the detected state S to the learning device 4.

After step S4, the control means 3 determines whether the selected channel is idle based on the carrier sense result received from the receiving means 2 (the carrier sense result for the selected channel) (step S5).

When it is determined in step S5 that the selected channel is idle, the control means 3 outputs the selected channel CH_Select and the packet (transmission packet PKT) to the transmission means 5, and the transmission means 5 transmits the packet received from the control means 3 with the packet length m using the selected channel CH_Select (step S6).

After that, the control means 3 detects the communication result of the packet transmission based on the presence or absence of an ACK packet (step S7) and outputs the detected communication result to the learning device 4. The control means 3 then detects the idle period N after completion of the packet transmission by the method described above, based on the received power spectrum PW_chn received from the receiving means 2 (step S8), and outputs the detected idle period N to the learning device 4.

The control means 3 then determines whether the packet transmission success rate is less than or equal to a threshold (step S9).

When it is determined in step S9 that the packet transmission success rate is less than or equal to the threshold, the control means 3 changes the candidate channels to other candidate channels (step S10) and outputs the changed candidate channels to the learning device 4.

When it is determined in step S9 that the packet transmission success rate exceeds the threshold, or after step S10, the series of operations returns to step S1.

In the flowchart shown in FIG. 14, steps S1 to S10 are executed repeatedly as long as the terminal device is running.

Also, in the flowchart shown in FIG. 14, when the operation moves from step S10 to step S1, the selected channel used for packet transmission is selected from the other candidate channels (see step S2).

FIGS. 15 and 16 are first and second flowcharts, respectively, for explaining the operation of the learning device 4 shown in FIG. 2. Referring to FIG. 15, when the operation of the learning device 4 starts, the learning device 4 receives candidate channels from the control means 3 (step S21).

The learning device 4 then initializes the average rewards by setting every average reward in the correspondence table TBL1 to zero (=0) (step S22).

After that, the learning device 4 generates a random number p in the range 0 to 1 (step S23) and determines whether the random number p is less than or equal to ε (step S24).

When it is determined in step S24 that the random number p is greater than ε, the learning device 4 determines whether a maximum average reward V_{t+1} exists in the correspondence table TBL1 (step S25).

When it is determined in step S25 that a maximum average reward V_{t+1} exists in the correspondence table TBL1, the learning device 4 selects from the candidate channels the channel for which the maximum average reward V_{t+1} is obtained (step S26). When several entries share the maximum average reward V_{t+1}, the learning device 4 selects from the candidate channels the channel corresponding to any one of them.

On the other hand, when it is determined in step S24 that the random number p is less than or equal to ε, or when it is determined in step S25 that no maximum average reward V_{t+1} exists in the correspondence table TBL1, the learning device 4 selects a channel at random from the candidate channels (step S27).

After step S26 or step S27, the learning device 4 outputs the selected channel to the control means 3 as the selected channel (step S28).

After that, the learning device 4 receives the state S of the selected channel in the observation period L from the control means 3 (step S29).

The learning device 4 then generates a random number p in the range 0 to 1 (step S30) and determines whether the generated random number p is less than or equal to ε (step S31).

When it is determined in step S31 that the random number p is greater than ε, the learning device 4 determines whether a maximum average reward V_{t+1} exists in the correspondence table TBL1 (step S32).

When it is determined in step S32 that a maximum average reward V_{t+1} exists in the correspondence table TBL1, the learning device 4 selects, for the state S of the selected channel in the observation period L, the packet length m for which the maximum average reward V_{t+1} is obtained (step S33). When several entries share the maximum average reward V_{t+1}, the learning device 4 selects the packet length m corresponding to any one of them.

On the other hand, when it is determined in step S31 that the random number p is less than or equal to ε, or when it is determined in step S32 that no maximum average reward V_{t+1} exists in the correspondence table TBL1, the learning device 4 selects the packet length m at random (step S34).

After step S33 or step S34, the learning device 4 outputs the selected packet length m to the control means 3 (step S35). The series of operations then proceeds to step S36 in FIG. 16.

Referring to FIG. 16, after step S35 in FIG. 15, the learning device 4 receives the packet transmission result from the control means 3 (step S36). Subsequently, the learning device 4 receives the idle period N after completion of the packet transmission from the control means 3 (step S37).

The learning device 4 then calculates the immediate reward R_t by equation (2) using the packet transmission result, the idle period N, and the packet length m (step S38), and stores the calculated immediate reward R_t. The immediate reward R_t is the immediate reward in operation period T.

After that, the learning device 4 calculates the average reward V_{t+1} by equation (1) using the immediate reward R_t (step S39), and stores the average reward V_{t+1} in the correspondence table TBL1 for the state S of the selected channel in the observation period L (step S40).
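Steps S38 to S40 can be sketched as follows. This is an illustrative sketch only: the reward form m/(N+1) on success is inferred from the worked value R_4 = 10/4 for m = 10 and N = 3, the zero reward on failure is an assumption (the failure case of equation (2) is not reproduced here), and the class and function names are hypothetical.

```python
from fractions import Fraction

def immediate_reward(success, idle_period, packet_len):
    """Assumed reward: m / (N + 1) on success, 0 on failure (hypothetical)."""
    if not success:
        return Fraction(0)
    return Fraction(packet_len, idle_period + 1)

class AverageRewardTable:
    """Keeps the average reward V and the selection count n per table entry."""

    def __init__(self):
        self.avg = {}    # (channel, state, packet_len) -> average reward V
        self.count = {}  # (channel, state, packet_len) -> times selected, n

    def update(self, channel, state, packet_len, reward):
        key = (channel, state, packet_len)
        n = self.count.get(key, 0) + 1
        v_t = self.avg.get(key, Fraction(0))
        v_next = v_t + (reward - v_t) / n   # incremental mean, as in equation (1)
        self.count[key], self.avg[key] = n, v_next
        return v_next

table = AverageRewardTable()
# Idle periods chosen so that the rewards are 10/3, 10/2 and 10/4.
for idle in (2, 1, 3):
    r = immediate_reward(True, idle, 10)
    v = table.update("6ch", "01", 10, r)
print(v)  # 65/18
```

Storing the count n next to each average avoids keeping the full history of immediate rewards.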

The series of operations then proceeds to step S41 in FIG. 15, where the learning device 4 determines whether it has received different candidate channels from the control means 3 (step S41).

When it is determined in step S41 that no different candidate channels have been received from the control means 3, the series of operations proceeds to step S23.

On the other hand, when it is determined in step S41 that different candidate channels have been received from the control means 3, the series of operations proceeds to step S22.

In parallel with the operation, in the flowchart shown in FIG. 14, of the parts of the terminal device 10 other than the learning device 4, the learning device 4 repeatedly executes steps S21 to S41 of the flowcharts shown in FIGS. 15 and 16.

When the learning device 4 receives the signal S_req_m from the control means 3 after step S1 shown in FIG. 14, it outputs the packet length m to the control means 3 in step S2 shown in FIG. 14 (see step S35). When it receives the signal S_req_CH from the control means 3 after step S3 shown in FIG. 14, it outputs the selected channel to the control means 3 in step S2 shown in FIG. 14 (see step S28).

According to the flowcharts shown in FIGS. 15 and 16, the learning device 4 selects from the candidate channels, with probability 1-ε, the channel for which the maximum average reward V_{t+1} is obtained (see step S26), and selects a channel at random from the candidate channels with probability ε (see step S27). Whether the channel with the maximum average reward V_{t+1} or a random channel is selected is determined by the generated random number p (see steps S23 and S24).

Therefore, a packet can be transmitted on the channel for which the maximum average reward V_{t+1} is obtained with probability 1-ε, and on a randomly selected channel with probability ε. Compared with transmitting packets continuously on a single channel, this avoids collisions with wireless communication by other terminal devices and raises the probability of successful packet transmission. As a result, the terminal device 10 can perform wireless communication while coexisting with other terminal devices.

Also, according to the flowcharts shown in FIGS. 15 and 16, if an average reward V_{t+1} is stored in at least one of the average-reward fields of the correspondence table TBL1 that correspond to the state S of the selected channel in the observation period L, the learning device 4 selects, with probability 1-ε, the packet length m for which the maximum average reward V_{t+1} is obtained ("YES" in step S32; see step S33).

Because the maximum average reward V_{t+1} is obtained, packets have been transmitted successfully with the selected packet length m. By continuing the learning, a packet length m with which packet transmission succeeds is determined for each of the states "00", "01", "10", and "11" of the selected channel in the observation period L. Therefore, the probability of successful packet transmission can be raised by changing the packet length m according to the state "00", "01", "10", or "11" of the selected channel in the observation period L.

Selecting, for the state S of the selected channel in the observation period L, the packet length m for which the maximum average reward V_{t+1} is obtained amounts to selecting a packet length m suited to the state S of the selected channel in the observation period L.

In this case, for example, a first packet length m_1 is selected for the state S (="00") of the selected channel in the observation period L, a second packet length m_2 longer than the first packet length is selected for the state S (="01"), a third packet length m_3 shorter than the first packet length is selected for the state S (="10"), and a fourth packet length m_4 longer than the second packet length is selected for the state S (="11") (m_3 < m_1 < m_2 < m_4).

Since the maximum average reward V_{t+1} has been obtained, when a packet is transmitted with any one of the packet lengths m_1 to m_4, the packet transmission succeeds and an immediate reward R that is proportional to the packet length m and inversely proportional to the idle period N is obtained. As a result, the average reward V_{t+1} increases.

In this case, a shorter idle period N after completion of the packet transmission means that other terminal devices are performing wireless communication during the period in which the idle period N is observed, so the terminal device 10 can perform wireless communication while coexisting with other terminal devices.

Furthermore, by repeating the learning according to the flowcharts shown in FIGS. 15 and 16, it can be found that a certain tendency exists between the state "00", "01", "10", or "11" of the selected channel in the observation period L and the length of the idle state after the observation period L has elapsed.

In the above description, the observation period L consists of two slots SL and the channel state S in the observation period L is represented by "00", "01", "10", or "11". Embodiments of the present invention are not limited to this. The observation period L may consist of three slots SL, with the channel state S in the observation period L represented by "000", "001", "010", "011", "100", "101", "110", or "111", or the observation period L may consist of four or more slots SL, with the channel state S in the observation period L represented by four or more bits.
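The relationship between the number of slots in the observation period and the set of possible states can be sketched as follows. The bit convention used here ("1" for a busy slot, "0" for an idle slot) is an assumption made for illustration, and the function names are hypothetical.

```python
from itertools import product

def encode_state(busy_flags):
    """Bit string for one observation period.

    Assumes '1' marks a busy slot and '0' an idle slot (hypothetical convention).
    """
    return "".join("1" if busy else "0" for busy in busy_flags)

def all_states(num_slots):
    """Every state a channel can take over an observation period of num_slots slots."""
    return ["".join(bits) for bits in product("01", repeat=num_slots)]

print(all_states(2))  # ['00', '01', '10', '11']
```

An observation period of k slots gives 2^k states, so the number of rows in the correspondence table doubles with each added slot.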

The longer the observation period L, the easier it becomes to obtain a correlation between the channel state S in the observation period L and the idle slots SL, and the easier it becomes to select the optimum packet length m for the channel state S in the observation period L.

In the above description, packets are transmitted in units of slots SL. Embodiments of the present invention are not limited to this, and packets need not be transmitted in units of slots SL.

FIG. 17 is a diagram for explaining a different method of determining the packet length m. Referring to FIG. 17, the control means 3 receives transmission data from the application 6. Then, for example, the length of the observation period L is set to 200 μs, and whether the channel is busy or idle is determined every 10 μs by the method described above, so that the state S of each channel in the observation period L is expressed with 20 bits. In addition, for example, 10 μs, 20 μs, 30 μs, ..., 100 μs are set as the selectable packet lengths m. The candidate channels, the state S of each channel in the observation period L, and the selectable packet lengths 1 to M are then associated with one another to create a correspondence table with the same structure as the correspondence table TBL1.

When a packet length of 10 μs is selected for the state S of the selected channel in the observation period L, the control means 3 extracts transmission data D1 with a length of 10 μs from the transmission data and generates a packet. When a packet with a packet length of 20 μs is to be transmitted at the next timing, the control means 3 extracts transmission data D2 with a length of 20 μs from the portion following the transmission data D1 and generates a packet. When a packet with a packet length of 30 μs is to be transmitted at the timing after that, the control means 3 extracts transmission data D3 with a length of 30 μs from the portion following the transmission data D2 and generates a packet. The control means 3 continues in the same way, extracting transmission data whose length matches the selected packet length m and generating a packet.
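The successive extraction of D1, D2, D3 from the transmission data can be sketched as follows. The conversion factor `bytes_per_us` (how many bytes occupy 1 μs of air time) and the function name are hypothetical, and packet headers are ignored for simplicity.

```python
def split_transmission_data(data, lengths_us, bytes_per_us):
    """Cut consecutive chunks from `data`, one per selected packet length.

    `bytes_per_us` is a hypothetical conversion from air time to payload size.
    Returns the chunks and whatever data is still waiting to be sent.
    """
    chunks, offset = [], 0
    for m in lengths_us:
        size = m * bytes_per_us
        chunks.append(data[offset:offset + size])  # next chunk starts where the last ended
        offset += size
    return chunks, data[offset:]

data = bytes(range(100))
chunks, remaining = split_transmission_data(data, [10, 20, 30], bytes_per_us=1)
print([len(c) for c in chunks], len(remaining))  # [10, 20, 30] 40
```

Because each chunk starts at the end of the previous one, the concatenation of all chunks and the remainder reproduces the original transmission data.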

When slots SL are not used as the unit, the idle period N used to calculate the immediate reward R_t is detected by determining, in units of 10 μs, whether the channel is busy or idle. So that the immediate reward R_t can be calculated even when the idle period N is zero (=0), a predetermined time length (for example, 10 μs) is added to the idle period N, and the reciprocal of the sum multiplied by the packet length m is calculated as the immediate reward.
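Written as a formula, and using Δ as an assumed symbol for the predetermined time length (the symbol does not appear in the original), the reward described above for a successfully transmitted packet would be:

```latex
\[
R_t = \frac{m}{N + \Delta}, \qquad \Delta = 10\ \mu\mathrm{s}
\]
```

For example, with m = 30 μs this gives R_t = 30/(0 + 10) = 3 when N = 0, and R_t = 30/(20 + 10) = 1 when N = 20 μs, so the reward stays finite for N = 0 and falls as the idle period grows.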

The operation of the terminal device 10 may be implemented in software. In this case, the terminal device 10 includes a CPU (Central Processing Unit), a ROM (Read Only Memory), and a RAM (Random Access Memory). The ROM stores a program Prog_A consisting of the steps of the flowchart shown in FIG. 14 (including the flowcharts shown in FIGS. 15 and 16).

The CPU reads the program Prog_A from the ROM, executes it, selects a packet length m suited to the state S of the selected channel in the observation period L, and transmits a packet. The RAM temporarily stores the calculated immediate reward R and other values.

The program Prog_A may also be recorded on a recording medium such as a CD or DVD and distributed. When a recording medium on which the program Prog_A is recorded is loaded into a computer, the computer reads the program Prog_A from the recording medium, executes it, selects a packet length m suited to the state S of the selected channel in the observation period L, and transmits a packet.

Therefore, the recording medium on which the program Prog_A is recorded is a computer-readable recording medium.

According to the embodiments described above, a terminal device according to an embodiment of the present invention need only include:
communication means for transmitting a packet, in a first operation period, using a transmission channel, which is the channel on which the packet is transmitted;
first detection means for detecting, in the first operation period and each time a packet is transmitted by the communication means, the communication result of that transmission and the idle period of wireless communication after the packet is transmitted;
second detection means for detecting, in the first operation period and each time it receives the transmission channel, the state of the transmission channel in an observation period, which is a period for observing the presence or absence of wireless communication by other terminal devices; and
a learner that receives the communication result, the idle period, and the state of the transmission channel in the observation period detected in the first operation period, together with candidate channels that are candidates for the channel used for packet transmission, and that executes the following first to third processes each time it receives the state of the transmission channel in the observation period, the communication result, and the idle period:
a first process of calculating, based on the communication result and the idle period, an immediate reward, which is the reward obtained when the packet is transmitted on the transmission channel in the first operation period;
a second process of calculating, using the immediate reward calculated in the first process, an average reward, which is the cumulative value of the immediate reward on one transmission channel averaged over the number of times one packet length corresponding to the state of the transmission channel in the observation period has been selected, and which is the reward in a second operation period following the first operation period; and
a third process of creating or updating a correspondence table that associates the candidate channel, the state of the transmission channel in the observation period, the packet length, and the average reward, selecting, with a predetermined probability and based on the created or updated correspondence table, the channel that yields the maximum average reward as the transmission channel, selecting, with a predetermined probability, the packet length that yields the maximum average reward for the state of the transmission channel in the observation period, and outputting the selected transmission channel and packet length.
Furthermore, each time the communication means receives from the learner the transmission channel and packet length selected in the third process, it transmits, in the second operation period, a packet having the packet length received from the learner when the received transmission channel is idle.

This is because a terminal device with this configuration selects a packet length suited to the state of the transmission channel in the observation period and, after the observation period has elapsed, can transmit packets with that packet length while still allowing wireless communication by other terminal devices, so that it can perform wireless communication in coexistence with other terminal devices.
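The correspondence table and the second process can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the implementation of the embodiment: the class name `RewardTable`, the method `update`, and the use of a (channel, state, packet length) tuple as the table key are hypothetical names introduced here. The running-average update follows formula (1) recited in the claims, V_{t+1} = V_t + (R_t − V_t)/n.

```python
from collections import defaultdict


class RewardTable:
    """Hypothetical correspondence table:
    (channel, state, packet_length) -> average reward."""

    def __init__(self):
        self.avg = defaultdict(float)  # average reward V per table entry
        self.count = defaultdict(int)  # selection count n per table entry

    def update(self, channel, state, packet_length, immediate_reward):
        """Apply V_{t+1} = V_t + (R_t - V_t) / n to one table entry."""
        key = (channel, state, packet_length)
        self.count[key] += 1           # n: times this entry has been selected
        n = self.count[key]
        self.avg[key] += (immediate_reward - self.avg[key]) / n
        return self.avg[key]


table = RewardTable()
for r in (1.0, 0.0, 0.5):
    v = table.update(channel=1, state="busy", packet_length=4, immediate_reward=r)
print(v)
```

Because the update only needs the previous average and the count, the table never has to store the full history of immediate rewards.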

Also, a program according to an embodiment of the present invention need only cause a computer to execute:
a first step in which communication means transmits a packet, in a first operation period, using a transmission channel, which is the channel on which the packet is transmitted;
a second step in which first detection means detects, in the first operation period and each time a packet is transmitted in the first step, the communication result of that transmission and the idle period of wireless communication after the packet is transmitted;
a third step in which second detection means detects, in the first operation period and each time it receives the transmission channel, the state of the transmission channel in an observation period, which is a period for observing the presence or absence of wireless communication by other terminal devices; and
a fourth step in which a learner receives the communication result, the idle period, and the state of the transmission channel in the observation period detected in the first operation period, together with candidate channels that are candidates for the channel used for packet transmission, and executes the following first to third processes each time it receives the state of the transmission channel in the observation period, the communication result, and the idle period:
a first process of calculating, based on the communication result and the idle period, an immediate reward, which is the reward obtained when the packet is transmitted on the transmission channel in the first operation period;
a second process of calculating, using the immediate reward calculated in the first process, an average reward, which is the cumulative value of the immediate reward on one transmission channel averaged over the number of times one packet length corresponding to the state of the transmission channel in the observation period has been selected, and which is the reward in a second operation period following the first operation period; and
a third process of creating or updating a correspondence table that associates the candidate channel, the state of the transmission channel in the observation period, the packet length, and the average reward, selecting, with a predetermined probability and based on the created or updated correspondence table, the channel that yields the maximum average reward as the transmission channel, selecting, with a predetermined probability, the packet length that yields the maximum average reward for the state of the transmission channel in the observation period, and outputting the selected transmission channel and packet length.
Furthermore, in the first step, each time the communication means receives from the learner the transmission channel and packet length selected in the third process, it transmits, in the second operation period, a packet having the packet length received from the learner when the received transmission channel is idle.

This is because, when the program causes the computer to execute the first to fourth steps, a packet length suited to the state of the transmission channel in the observation period is selected and, after the observation period has elapsed, packets can be transmitted with that packet length while still allowing wireless communication by other terminal devices, so that wireless communication can be performed in coexistence with other terminal devices.
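The selection performed in the third process can be sketched as follows, assuming that the "predetermined probability" is realized by the ε-greedy rule recited in the claims (exploit with probability 1 − ε, pick an arbitrary candidate with probability ε). The function name `select_channel_and_length`, the dictionary layout, and the treatment of the state S as an opaque label are hypothetical. Ranking a channel by its best table entry for the current state is also an assumption made for this sketch.

```python
import random


def select_channel_and_length(avg_reward, channels, packet_lengths, state,
                              epsilon=0.1, rng=random):
    """Epsilon-greedy choice of (channel, packet_length).

    avg_reward maps (channel, state, packet_length) -> average reward;
    entries that have never been updated are treated as 0.0.
    """
    def value(ch, m):
        return avg_reward.get((ch, state, m), 0.0)

    # Channel: explore with probability epsilon, otherwise take the channel
    # whose best entry for this state has the largest average reward.
    if rng.random() < epsilon:
        channel = rng.choice(channels)
    else:
        channel = max(channels,
                      key=lambda ch: max(value(ch, m) for m in packet_lengths))

    # Packet length: same rule, restricted to the chosen channel and state.
    if rng.random() < epsilon:
        length = rng.choice(packet_lengths)
    else:
        length = max(packet_lengths, key=lambda m: value(channel, m))
    return channel, length


table = {(1, "busy", 2): 0.2, (2, "busy", 2): 0.1, (2, "busy", 4): 0.9}
print(select_channel_and_length(table, [1, 2], [2, 4], "busy", epsilon=0.0))
```

With `epsilon=0.0` the choice is purely greedy, which is convenient for checking the table lookup. A nonzero ε keeps occasionally trying other channels and packet lengths so that their average rewards continue to be updated.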

In the embodiment of the present invention, the selected channel CH_Select selected from the candidate channels constitutes the "transmission channel".

Also, in the embodiment of the present invention, the receiving means 2, which detects the received power spectrum PW_carrier_L and receives the ACK packet, and the control means 3, which detects the idle period N based on the received power spectrum PW_chn, constitute the "first detection means".

Furthermore, in the embodiment of the present invention, the receiving means 2, which detects the received power spectrum PW_carrier_L, and the control means 3, which detects the state S of the selected channel CH_Select in the observation period L based on the received power spectrum PW_carrier_L, constitute the "second detection means".

Furthermore, in the embodiment of the present invention, the control means 3, which outputs packets to the transmission means 5, and the transmission means 5, which transmits the packets, constitute the "communication means".

Furthermore, in the embodiment of the present invention, when packets are transmitted in units of slots SL, the "1" added to the idle period N when calculating the immediate reward R_t represents one slot SL, so "N+1" in Equation (2A) in effect means adding the time length of one slot SL to the time length of N slots SL. When packets are not transmitted in units of slots SL, a predetermined time length (for example, 10 μs) is added to the idle period N (which consists of the sum of 10 μs idle states) when calculating the immediate reward R_t. In both cases, therefore, a predetermined time length is added to the idle period N when calculating the immediate reward R_t. Accordingly, the "1" added to the idle period N in the slotted case and the predetermined time length (for example, 10 μs) added to the idle period N in the unslotted case each constitute the "predetermined period" added to the idle period N.
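The immediate reward described here can be written as a short Python function. This is a sketch under stated assumptions: the function and parameter names are hypothetical, and the rule itself (zero on failed transmission, otherwise the reciprocal of the idle period plus the predetermined period) is taken from the claims. In the slotted case the idle period N is counted in slots and the predetermined period is 1, which gives 1/(N+1).

```python
def immediate_reward(success, idle_period, predetermined_period=1.0):
    """Immediate reward R_t for one transmitted packet.

    success              -- True if the transmission succeeded (ACK received)
    idle_period          -- idle period N after the packet (slots, or time)
    predetermined_period -- value added to N: 1 slot in the slotted case,
                            or a fixed time length in the unslotted case
                            (same unit as idle_period)
    """
    if not success:
        return 0.0
    return 1.0 / (idle_period + predetermined_period)


print(immediate_reward(True, 3))   # slotted case: 1 / (3 + 1)
print(immediate_reward(False, 3))  # failed transmission
```

Adding the predetermined period keeps the denominator nonzero when the idle period is zero, and a shorter idle period after a successful packet yields a larger reward.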

The embodiments disclosed herein should be considered illustrative in all respects and not restrictive. The scope of the present invention is indicated by the claims rather than by the above description of the embodiments, and is intended to include all modifications within the meaning and scope equivalent to the claims.

The present invention is applied to a terminal device, a program to be executed by a computer, and a computer-readable recording medium on which the program is recorded.

REFERENCE SIGNS LIST: 1 antenna, 2 receiving means, 3 control means, 4 learner, 5 transmission means, 6 application, 10 terminal device, 100 communication system.

Claims

1. A terminal device comprising:
communication means for transmitting a packet, in a first operation period, using a transmission channel, which is the channel on which the packet is transmitted;
first detection means for detecting, in the first operation period and each time the packet is transmitted by the communication means, a communication result of the transmission of the packet and an idle period of wireless communication after the transmission of the packet;
second detection means for detecting, in the first operation period and each time it receives the transmission channel, the state of the transmission channel in an observation period, which is a period for observing the presence or absence of wireless communication by other terminal devices; and
a learner that receives the communication result, the idle period, and the state of the transmission channel in the observation period detected in the first operation period, together with candidate channels that are candidates for the channel used for transmitting the packet, and that executes, each time it receives the state of the transmission channel in the observation period, the communication result, and the idle period: a first process of calculating, based on the communication result and the idle period, an immediate reward, which is the reward obtained when the packet is transmitted on the transmission channel in the first operation period; a second process of calculating, using the immediate reward calculated in the first process, an average reward, which is the cumulative value of the immediate reward on one transmission channel averaged over the number of times one packet length corresponding to the state of the transmission channel in the observation period has been selected, and which is the reward in a second operation period following the first operation period; and a third process of creating or updating a correspondence table that associates the candidate channel, the state of the transmission channel in the observation period, the packet length of the packet, and the average reward, selecting, with a predetermined probability and based on the created or updated correspondence table, the channel that yields the maximum average reward as the transmission channel, selecting, with the predetermined probability, the packet length that yields the maximum average reward for the state of the transmission channel in the observation period, and outputting the selected transmission channel and packet length,
wherein, each time the communication means receives from the learner the transmission channel and packet length selected in the third process, the communication means further transmits, in the second operation period, a packet having the packet length received from the learner when the received transmission channel is idle.

2. The terminal device according to claim 1, wherein, in the first process, the learner calculates the immediate reward as zero when the communication result is a failure of transmission of the packet, and, when the communication result is a success of transmission of the packet, calculates as the immediate reward the reciprocal of the result of adding a predetermined period to the idle period.

3. The terminal device according to claim 1, wherein, in the second process, the learner calculates the average reward in the second operation period based on the immediate reward in the first operation period, the average reward in the first operation period, and the number of times one packet length corresponding to the state of the transmission channel in the observation period has been selected, and updates the average reward.

4. The terminal device according to claim 3, wherein, in the second process, the learner updates the average reward by calculating the average reward V_{t+1} according to the following formula (1), where R_t is the immediate reward in the first operation period, V_t is the average reward in the first operation period, V_{t+1} is the average reward in the second operation period, and n (n is an integer equal to or greater than 1) is the number of times one packet length corresponding to the state of the transmission channel in the observation period has been selected.
V_{t+1} = V_t + (R_t − V_t)/n   (1)

5. The terminal device according to claim 3, wherein, in the third process, the learner selects, with probability (1−ε) (ε is a real number between 0 and 1), the channel having the maximum average reward in the second operation period from the candidate channels as the transmission channel, and selects, with probability ε, an arbitrary channel from the candidate channels as the transmission channel.

6. The terminal device according to any one of claims 3 to 5, wherein, in the third process, the learner selects the packet length that maximizes the average reward in the second operation period for the state of the transmission channel in the observation period.

7. The terminal device according to any one of claims 1 to 6, further comprising control means for, when a transmission success rate, which is the probability that transmission of the packet succeeds, is equal to or less than a threshold, selecting a channel in a band different from the band of the candidate channels as a new candidate channel and controlling the learner to use the selected new candidate channel,
wherein the learner executes the first process, the second process, and the third process using the new candidate channel each time it receives the state of the transmission channel in the observation period, the communication result, and the idle period.

8. A program for causing a computer to execute:
a first step in which communication means transmits a packet, in a first operation period, using a transmission channel, which is the channel on which the packet is transmitted;
a second step in which first detection means detects, in the first operation period and each time the packet is transmitted in the first step, a communication result of the transmission of the packet and an idle period of wireless communication after the transmission of the packet;
a third step in which second detection means detects, in the first operation period and each time it receives the transmission channel, the state of the transmission channel in an observation period, which is a period for observing the presence or absence of wireless communication by other terminal devices; and
a fourth step in which a learner receives the communication result, the idle period, and the state of the transmission channel in the observation period detected in the first operation period, together with candidate channels that are candidates for the channel used for transmitting the packet, and executes, each time it receives the state of the transmission channel in the observation period, the communication result, and the idle period: a first process of calculating, based on the communication result and the idle period, an immediate reward, which is the reward obtained when the packet is transmitted on the transmission channel in the first operation period; a second process of calculating, using the immediate reward calculated in the first process, an average reward, which is the cumulative value of the immediate reward on one transmission channel averaged over the number of times one packet length corresponding to the state of the transmission channel in the observation period has been selected, and which is the reward in a second operation period following the first operation period; and a third process of creating or updating a correspondence table that associates the candidate channel, the state of the transmission channel in the observation period, the packet length of the packet, and the average reward, selecting, with a predetermined probability and based on the created or updated correspondence table, the channel that yields the maximum average reward as the transmission channel, selecting, with the predetermined probability, the packet length that yields the maximum average reward for the state of the transmission channel in the observation period, and outputting the selected transmission channel and packet length,
wherein, in the first step, each time the communication means receives from the learner the transmission channel and packet length selected in the third process, the communication means further transmits, in the second operation period, a packet having the packet length received from the learner when the received transmission channel is idle.

9. The program to be executed by a computer according to claim 8, wherein, in the first process of the fourth step, the learner calculates the immediate reward as zero when the communication result is a failure of transmission of the packet, and, when the communication result is a success of transmission of the packet, calculates as the immediate reward the reciprocal of the result of adding a predetermined period to the idle period.

10. The program to be executed by a computer according to claim 8 or 9, wherein, in the second process of the fourth step, the learner calculates the average reward in the second operation period based on the immediate reward in the first operation period, the average reward in the first operation period, and the number of times one packet length corresponding to the state of the transmission channel in the observation period has been selected, and updates the average reward.

11. The program to be executed by a computer according to claim 10, wherein, in the second process of the fourth step, the learner updates the average reward by calculating the average reward V_{t+1} according to the following formula (1), where R_t is the immediate reward in the first operation period, V_t is the average reward in the first operation period, V_{t+1} is the average reward in the second operation period, and n (n is an integer equal to or greater than 1) is the number of times one packet length corresponding to the state of the transmission channel in the observation period has been selected.
V_{t+1} = V_t + (R_t − V_t)/n   (1)

12. The program to be executed by a computer according to claim 10, wherein, in the third process of the fourth step, the learner selects, with probability (1−ε) (ε is a real number between 0 and 1), the channel having the maximum average reward in the second operation period from the candidate channels as the transmission channel, and selects, with probability ε, an arbitrary channel from the candidate channels as the transmission channel.

13. The program to be executed by a computer according to any one of claims 10 to 12, wherein, in the third process of the fourth step, the learner selects the packet length that maximizes the average reward in the second operation period for the state of the transmission channel in the observation period.

14. The program to be executed by a computer according to any one of claims 8 to 13, further causing the computer to execute a fifth step in which control means, when a transmission success rate, which is the probability that transmission of the packet succeeds, is equal to or less than a threshold, selects a channel in a band different from the band of the candidate channels as a new candidate channel and controls the learner to use the selected new candidate channel,
wherein the learner executes the first process, the second process, and the third process using the new candidate channel each time it receives the state of the transmission channel in the observation period, the communication result, and the idle period.

15. A computer-readable recording medium on which the program according to any one of claims 8 to 14 is recorded.