JP7393765B2 - Wireless communication device, beam direction control device, beam direction control method and program

Info

Publication number: JP7393765B2
Application number: JP2020104391A
Authority: JP
Inventors: 俊翔黄; 裕史白戸; 直樹北; 高至山本; 優介香田
Original assignee: Kyoto University; Nippon Telegraph and Telephone Corp
Current assignee: Kyoto University; Nippon Telegraph and Telephone Corp
Priority date: 2020-06-17
Filing date: 2020-06-17
Publication date: 2023-12-07
Anticipated expiration: 2040-06-17
Also published as: JP2021197674A

Description

The present invention relates to a wireless communication device, a beam direction control device, a beam direction control method, and a program.

In recent years, millimeter wave wireless communication technology, which enables high-speed communication, has attracted attention. International standards have been established for wireless communication systems that adopt the millimeter wave band, including wireless LAN (IEEE 802.11ad, 802.11ay), wireless PAN (IEEE 802.15.3e), and fifth-generation mobile communication systems.

Compared with the microwave band at frequencies of 6 GHz or less, the millimeter wave band at frequencies of several tens of GHz or more is characterized by large propagation attenuation. To compensate for this attenuation, the standardized wireless communication systems mentioned above incorporate techniques that improve wireless communication quality by forming a directional beam with an array antenna. An array antenna beam formed in a specific direction (hereinafter referred to as the beam direction) has made millimeter wave wireless communication systems applicable to high-speed wireless transmission outdoors as well.
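As a rough illustration of this frequency dependence, the sketch below evaluates the standard free-space path loss formula, 20 log10(4πdf/c), at 6 GHz and at 60 GHz. The 100 m link distance and the 60 GHz carrier are illustrative assumptions rather than values from this document, and free-space loss is only one contributor to the attenuation described above.

```python
import math

SPEED_OF_LIGHT_M_PER_S = 299_792_458.0

def free_space_path_loss_db(distance_m, frequency_hz):
    """Free-space path loss in dB for an isotropic link of the given length."""
    return 20 * math.log10(4 * math.pi * distance_m * frequency_hz / SPEED_OF_LIGHT_M_PER_S)

# Assumed example link: 100 m, microwave (6 GHz) versus millimeter wave (60 GHz).
loss_6ghz_db = free_space_path_loss_db(100.0, 6e9)
loss_60ghz_db = free_space_path_loss_db(100.0, 60e9)

# A tenfold increase in frequency adds 20 dB of free-space loss at any distance.
extra_loss_db = loss_60ghz_db - loss_6ghz_db
print(f"6 GHz: {loss_6ghz_db:.1f} dB, 60 GHz: {loss_60ghz_db:.1f} dB, extra: {extra_loss_db:.1f} dB")
```

Under this simplified model, the gap between the two bands must be recovered elsewhere in the link budget, which is the role the directional beam plays.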

One outdoor application of millimeter wave wireless communication is a wireless communication system that connects macro cells and pico cells (see, for example, Non-Patent Document 1). Pico cells are generally installed outdoors at locations close to end users, such as utility poles and overhead wires. In such outdoor environments, external factors such as wind frequently displace the installed transmitter. When the transmitter is displaced, the transmit and receive beam directions that should face each other become misaligned, so propagation attenuation can no longer be compensated and wireless communication quality degrades frequently.

An example in which wind or another external factor causes beam misalignment is described with reference to FIG. 8. The wireless communication device 91 is mounted on an overhead wire 94 between a utility pole 92 and a utility pole 93. The wireless communication device 95, with which the wireless communication device 91 communicates, is installed at a building 96. Beam B91 of the wireless communication device 91 is set to face beam B95 of the wireless communication device 95. In FIG. 8, however, the wind pushes the wireless communication device 91, so the direction of beam B91 deviates from beam B95 of the opposing wireless communication device 95 and wireless communication quality degrades. In this way, in a wireless communication device that uses a miniaturized antenna, an external force such as wind makes the beam direction sway irregularly, causing beam misalignment with the opposing device.

In a typical millimeter wave wireless communication system, the beam directions of both communicating wireless communication devices must be aligned to improve wireless communication quality. One conceivable alignment method is for each of the two devices to search all of the beam directions it can select, identify the beam direction that yields the best wireless communication quality (for example, the direction that maximizes received power), and notify the communication partner of the identified beam direction. In an environment with frequent wind-induced vibration, however, both devices must search for the beam direction frequently in order to maintain wireless communication quality. When such beam direction searches are performed frequently, the resulting reduction in available wireless communication resources (such as time slots) becomes a major problem.
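The cost of such an exhaustive search can be pictured with the following minimal sketch. The beam grids, the `toy_received_power` model, and all function names are hypothetical and chosen only for illustration; the point is that the number of probes grows with the product of the two devices' beam counts.

```python
from itertools import product

def exhaustive_beam_search(tx_beams, rx_beams, measure_power):
    """Probe every (tx, rx) beam pair and return the best pair, its power, and the probe count."""
    best_pair, best_power, probes = None, float("-inf"), 0
    for tx, rx in product(tx_beams, rx_beams):
        power = measure_power(tx, rx)
        probes += 1
        if power > best_power:
            best_pair, best_power = (tx, rx), power
    return best_pair, best_power, probes

def toy_received_power(tx_deg, rx_deg):
    # Hypothetical channel: power peaks when tx points at 10 degrees and rx at -20 degrees.
    return -abs(tx_deg - 10) - abs(rx_deg + 20)

tx_beams = range(-60, 61, 10)  # 13 candidate directions (assumed)
rx_beams = range(-60, 61, 10)  # 13 candidate directions (assumed)
best_pair, best_power, probes = exhaustive_beam_search(tx_beams, rx_beams, toy_received_power)
print(best_pair, probes)
```

With 13 candidate directions on each side, a single search already spends 169 measurement slots, and that cost recurs every time the alignment has to be refreshed.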

As described above, a millimeter wave wireless communication system must keep the beam directions of both communicating sides facing each other to improve wireless communication quality, so in an environment where the beam direction fluctuates frequently, the beam direction must be searched correspondingly often. However, the resources available for wireless communication decrease while a wireless communication device is performing this search.

On the other hand, there is a technique that uses learning to control a millimeter wave wireless communication system whose wireless communication quality can be affected by an external environmental factor, namely shadowing by a human body (see Non-Patent Document 2). The outline of the wireless communication system assumed in this technique, and the external factor that affects its wireless communication quality, are described with reference to FIG. 9. As shown in FIG. 9, two millimeter wave access points (hereinafter referred to as APs) 98 are installed indoors. These two APs 98 are referred to as AP 98-1 and AP 98-2. A terminal station (hereinafter referred to as STA) 99 can communicate wirelessly with AP 98-1 or AP 98-2 by changing its beam direction. A pedestrian 97 moves along a random path W between AP 98-1 and STA 99 or between AP 98-2 and STA 99. The pedestrian 97 can also change the direction and speed of movement. If, at some moment while the pedestrian 97 crosses between AP 98-1 and STA 99, the beam of AP 98-1, the beam of STA 99, or both beams are blocked by the pedestrian 97 (a human body), the resulting shadowing greatly degrades wireless communication quality. Such human body blockage events are an external factor that affects wireless communication quality.

To overcome the influence of an external factor such as the human body blockage described above, Non-Patent Document 2 proposes a control method that determines the position of the pedestrian from camera images and, based on that position, selects the AP with which the STA can communicate best. That is, treating the blocking of the directional beams of the communicating AP and STA by a human body as the external factor, the method acquires the pedestrian's position information from camera images. It then predicts the occurrence of blockage from the acquired position information and controls the STA so that it is handed over to an AP that is not blocked by the human body. For the wireless communication system shown in FIG. 9, the operation is as follows. While communicating with AP 98-1, if STA 99 predicts from the images that the pedestrian 97 crossing between AP 98-1 and STA 99 will cause blockage, it changes its beam direction in advance to point toward AP 98-2 (handover control). This maximizes wireless communication quality observed over the long term (such as cumulative throughput) and avoids the degradation of wireless communication quality caused by human body blockage.

Non-Patent Document 1: S. Hur, et al., "Millimeter Wave Beamforming for Wireless Backhaul and Access in Small Cell Networks", Fig. 1. Multi-tiered cell using wireless backhaul, IEEE Transactions on Communications, Vol. 61, No. 10, Oct. 2013
Non-Patent Document 2: Koda et al., "Application of reinforcement learning to millimeter wave communication handover control using position information of the occluding person", Institute of Electronics, Information and Communication Engineers, IEICE Technical Report SR2017-131, 2018, p. 95-102

The control method of Non-Patent Document 2 described above considers only human body blockage in an indoor environment as the external environmental factor that affects wireless communication quality. It is a control method for a relatively simple external factor, namely the planar movement of a single pedestrian. Because the technique of Non-Patent Document 2 learns about a relatively simple external factor, it can learn the correspondence between environmental states and control methods well without requiring enormous processing resources.

However, in an outdoor environment such as that shown in FIG. 8, many external factors affect wireless communication quality. For wind, conditions such as instantaneous wind speed, wind direction, air density, and the drag coefficient of air must be considered. For the installation of the wireless communication device, conditions such as the length of the overhead wire, the material of the overhead wire, and the height above the ground must be considered. Learning all of these many external factors exhaustively is extremely difficult. When the external factors change frequently, the learning-based control method of Non-Patent Document 2 cannot keep up with new external factors and must learn again. In other words, if an external factor has not been learned, the beam directions can become misaligned. The learning-based control method itself is therefore expected to fail at beam direction control more often in an environment where many external factors exist.

In view of the above circumstances, an object of the present invention is to provide a wireless communication device, a beam direction control device, a beam direction control method, and a program that can reduce beam direction control failures even in an environment where complex external factors fluctuate.

One aspect of the present invention is a wireless communication device capable of controlling a beam direction, comprising: a wireless communication unit that forms a beam and performs wireless communication; a sensor that acquires environmental state information, which is information about the installation environment of the device itself; a wireless communication quality monitoring unit that acquires wireless communication quality information indicating the quality of wireless communication by the wireless communication unit; and a beam direction control unit that outputs a beam direction control instruction to the wireless communication unit. The beam direction control unit comprises: a first learning unit that, using the environmental state information and the wireless communication quality information from before and after the beam direction is controlled, learns a beam control policy indicating a method of controlling the beam direction that improves the quality of wireless communication according to the environmental state information, determines the beam control policy corresponding to the environmental state information based on the learning result, and outputs to the wireless communication unit a beam direction control instruction that follows the determined beam control policy; a second learning unit that, using environmental state information generated according to an information generation policy indicating an operation for generating environmental state information, together with the wireless communication quality information from before and after the beam direction is controlled based on the control instruction that the first learning unit output in response to the generated environmental state information, learns an information generation policy for generating environmental state information that degrades the quality of wireless communication, and generates environmental state information based on the learned information generation policy; and a switching unit that switches which of the environmental state information acquired by the sensor and the environmental state information generated by the second learning unit is input to the first learning unit.
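The opposing objectives of the two learning units can be pictured with the following minimal sketch. It assumes, purely for illustration, that each unit scores a beam change by the difference in received power before and after the change, and that the second learning unit's score is the exact negative of the first learning unit's score. This document does not specify such reward functions, and the function name and the dBm values are hypothetical.

```python
def learner_rewards(power_before_dbm, power_after_dbm):
    """Score one beam direction change from both learning units' points of view (assumed rewards)."""
    quality_change_db = power_after_dbm - power_before_dbm
    return {
        # The first learning unit is rewarded when the beam change improves quality.
        "first_learning_unit": quality_change_db,
        # The second learning unit is rewarded when its generated state leads to worse quality.
        "second_learning_unit": -quality_change_db,
    }

# Example: the generated environmental state led to a beam change that lost 6 dB.
rewards = learner_rewards(power_before_dbm=-60.0, power_after_dbm=-66.0)
print(rewards)
```

Under this assumption, a generated state that the first learning unit handles badly is exactly the kind of state the second learning unit is encouraged to produce again.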

One aspect of the present invention is a beam direction control device comprising: a first learning unit that, using environmental state information, which is information about the installation environment of a wireless communication device capable of controlling a beam direction, and wireless communication quality information indicating the quality of wireless communication before and after the beam direction of the wireless communication device is controlled, learns a beam control policy indicating a method of controlling the beam direction that improves the quality of wireless communication according to the environmental state information, determines the beam control policy corresponding to the environmental state information based on the learning result, and outputs to the wireless communication device a beam direction control instruction that follows the determined beam control policy; a second learning unit that, using environmental state information generated according to an information generation policy indicating an operation for generating environmental state information, together with the wireless communication quality information from before and after the beam direction is controlled based on the control instruction that the first learning unit output in response to the generated environmental state information, learns an information generation policy for generating environmental state information that degrades the quality of wireless communication, and generates environmental state information based on the learned information generation policy; and a switching unit that switches which of the environmental state information acquired by a sensor of the wireless communication device and the environmental state information generated by the second learning unit is input to the first learning unit.

One aspect of the present invention is a beam direction control method executed by a wireless communication device capable of controlling a beam direction, comprising: a communication step in which a wireless communication unit forms a beam and performs wireless communication; an environmental state information acquisition step in which a sensor acquires environmental state information, which is information about the installation environment of the wireless communication device; a wireless communication quality information acquisition step in which a wireless communication quality monitoring unit acquires wireless communication quality information indicating the quality of wireless communication by the wireless communication unit; and a beam direction control step in which a beam direction control unit outputs a beam direction control instruction to the wireless communication unit. The beam direction control step comprises: a first learning step of, using the environmental state information and the wireless communication quality information from before and after the beam direction is controlled, learning a beam control policy indicating a method of controlling the beam direction that improves the quality of wireless communication according to the environmental state information, determining the beam control policy corresponding to the environmental state information based on the learning result, and outputting to the wireless communication unit a beam direction control instruction that follows the determined beam control policy; a second learning step of, using environmental state information generated according to an information generation policy indicating an operation for generating environmental state information, together with the wireless communication quality information from before and after the beam direction is controlled based on the control instruction output in the first learning step in response to the generated environmental state information, learning an information generation policy for generating environmental state information that degrades wireless communication quality, and generating environmental state information based on the learned information generation policy; and a switching step of switching which of the environmental state information acquired in the environmental state information acquisition step and the environmental state information generated in the second learning step is used in the first learning step.

One aspect of the present invention is a program for causing a computer to function as the beam direction control device described above.

The present invention makes it possible to reduce beam direction control failures even in an environment where complex external factors fluctuate.

FIG. 1 is a block diagram showing the configuration of a wireless communication device according to an embodiment of the present invention.
FIG. 2 is a block diagram showing the configuration of the beam direction control unit according to the embodiment.
FIG. 3 is a diagram showing an example of the beam control policy table according to the embodiment.
FIG. 4 is a diagram showing an example of the information generation policy table according to the embodiment.
FIG. 5 is a flow diagram showing an example of processing by the beam direction control unit in the first learning mode according to the embodiment.
FIG. 6 is a flow diagram showing an example of processing by the beam direction control unit in the second learning mode according to the embodiment.
FIG. 7 is a block diagram showing the hardware configuration of the wireless communication device according to the embodiment.
FIG. 8 is a diagram for explaining an example in which beam misalignment occurs.
FIG. 9 is a diagram for explaining an external factor that affects wireless communication quality.

Embodiments of the present invention are described in detail below with reference to the drawings.

<Configuration and functions of the wireless communication device>
FIG. 1 is a block diagram showing the configuration of a wireless communication device 1 according to an embodiment of the present invention. The wireless communication device 1 uses a directional beam to transmit and receive radio waves to and from another wireless communication device, which is its opposing communication partner. The communication partner is located in a fixed direction from the wireless communication device 1. The wireless communication device 1 includes a wireless communication unit 11, a wireless communication quality monitoring unit 12, an environment sensor 13, and a beam direction control unit 15.

The wireless communication unit 11 is composed of devices such as an array antenna that can change the beam direction of a directional beam, a high-frequency circuit for transmitting and receiving radio signals at a predetermined radio frequency, and a signal processing circuit. The wireless communication device 1 communicates wirelessly with other wireless communication devices through the wireless communication unit 11. The wireless communication unit 11 adjusts the weights of the array antenna so that the directivity of the radio waves is formed in the beam direction specified by the control instruction from the beam direction control unit 15. The weights can be adjusted in an analog manner, which adjusts the phase of the radio signal input to and output from each antenna element, or in a digital manner, which adjusts the amplitude and phase of the radio signal input to and output from each antenna element. Alternatively, the analog and digital manners can be combined so that the amplitude or phase of the radio signal input to and output from each antenna element is adjusted in multiple stages.
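As an illustration of the analog, phase-only form of weight adjustment, the following sketch computes steering weights for a uniform linear array and evaluates the resulting normalized gain in two directions. The array geometry (eight elements at half-wavelength spacing), the angles, and the function names are assumptions made for this example; this document does not specify the antenna layout.

```python
import cmath
import math

def steering_weights(num_elements, steer_deg, spacing_wavelengths=0.5):
    """Phase-only weights that point a uniform linear array toward steer_deg (from broadside)."""
    phase_step = 2 * math.pi * spacing_wavelengths * math.sin(math.radians(steer_deg))
    return [cmath.exp(-1j * phase_step * n) for n in range(num_elements)]

def normalized_gain(weights, look_deg, spacing_wavelengths=0.5):
    """Magnitude of the array response toward look_deg, normalized so the peak equals 1."""
    phase_step = 2 * math.pi * spacing_wavelengths * math.sin(math.radians(look_deg))
    response = sum(w * cmath.exp(1j * phase_step * n) for n, w in enumerate(weights))
    return abs(response) / len(weights)

# Assumed example: 8 elements steered to 20 degrees.
weights = steering_weights(num_elements=8, steer_deg=20.0)
on_target = normalized_gain(weights, look_deg=20.0)
off_target = normalized_gain(weights, look_deg=-30.0)
print(f"gain toward 20 deg: {on_target:.3f}, toward -30 deg: {off_target:.3f}")
```

Changing only `steer_deg` re-points the beam without moving the antenna. A digital adjustment would additionally scale the magnitude of each weight.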

The wireless communication quality monitoring unit 12 acquires, from the wireless communication unit 11, the beam direction of the directional beam that the device itself is using for wireless communication and information on the wireless communication quality during the communication period in which that beam direction is used. The information on wireless communication quality is, for example, the received power or the ratio of received power to noise. For convenience, the following description uses received power as the representative example of wireless communication quality, but other indicators may be used. The wireless communication quality monitoring unit 12 outputs beam direction information indicating the acquired beam direction and wireless communication quality information indicating the acquired information on wireless communication quality to the beam direction control unit 15.

The environment sensor 13 is composed of one or more devices capable of sensing. The environment sensor 13 detects or acquires environmental state information, which is information about the environment around the wireless communication device 1. The environmental state information is, for example, wind speed, wind direction, the rotational speed or acceleration of the motion of the wireless communication device 1, or the height of the installation location of the wireless communication device 1. The environment sensor 13 outputs the detected or acquired environmental state information to the beam direction control unit 15.

The beam direction control unit 15 controls the beam direction of the directional beam that the wireless communication device 1 uses for wireless communication. The beam direction control unit 15 acquires, from the wireless communication quality monitoring unit 12, beam direction information indicating the beam direction that the device itself is using for wireless communication and the wireless communication quality information obtained during the communication period in which that beam direction is used. The beam direction control unit 15 also acquires the environmental state information for that communication period from the environment sensor 13. Based on the acquired information, the beam direction control unit 15 outputs to the wireless communication unit 11 a control instruction for making the beam direction face the communication partner in the next communication period. Making the beam direction face the communication partner means adjusting the beam direction of the directional beam so that the maximum received power is obtained.

<Configuration and functions of the beam direction control unit>
This section describes the configuration and functions by which the beam direction control unit 15 of this embodiment learns a beam control policy, taking as input conditions the environmental state information for a given communication period and the beam direction used for wireless communication, and controls the beam direction used in the next communication period based on the learned beam control policy. The beam control policy expresses how the beam direction is to be controlled in order to improve communication quality.
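One way to picture learning of this kind is a small table-based learner whose state is the pair (environmental state, current beam direction) and whose reward is the change in received power after the beam is moved. The sketch below uses tabular Q-learning with a zero discount factor. That algorithm, the toy `received_power` model, the wind bins, and every name in the code are assumptions made for illustration, since this passage does not state which learning algorithm is used.

```python
import random
from collections import defaultdict

ACTIONS = (-1, 0, 1)  # shift the beam index down, keep it, or shift it up (assumed action set)

class BeamPolicyLearner:
    """Tabular learner mapping (wind_bin, beam_index) states to beam shifts (hypothetical)."""

    def __init__(self, num_beams, alpha=0.5, gamma=0.0, seed=0):
        self.num_beams = num_beams
        self.alpha = alpha            # learning rate
        self.gamma = gamma            # 0.0 means only the immediate quality change matters
        self.q = defaultdict(float)   # (state, action) -> estimated value
        self.rng = random.Random(seed)

    def choose(self, state, epsilon):
        # Explore with probability epsilon, otherwise take the best known action.
        if self.rng.random() < epsilon:
            return self.rng.choice(ACTIONS)
        return max(ACTIONS, key=lambda a: self.q[(state, a)])

    def update(self, state, action, reward, next_state):
        best_next = max(self.q[(next_state, a)] for a in ACTIONS)
        target = reward + self.gamma * best_next
        self.q[(state, action)] += self.alpha * (target - self.q[(state, action)])

def received_power(wind_bin, beam):
    # Toy model: stronger wind pushes the best beam index higher.
    best_beam = 2 + wind_bin
    return -abs(beam - best_beam)

def train(learner, steps=3000):
    for _ in range(steps):
        wind_bin = learner.rng.randrange(3)
        beam = learner.rng.randrange(learner.num_beams)
        state = (wind_bin, beam)
        action = learner.choose(state, epsilon=1.0)  # pure exploration while training
        new_beam = min(max(beam + action, 0), learner.num_beams - 1)
        # Reward is the received power after the beam change minus the power before it.
        reward = received_power(wind_bin, new_beam) - received_power(wind_bin, beam)
        learner.update(state, action, reward, (wind_bin, new_beam))

learner = BeamPolicyLearner(num_beams=5)
train(learner)
print(learner.choose((2, 2), epsilon=0.0))  # greedy shift when wind_bin=2 and beam index=2
```

In this toy setting the learned table moves the beam toward the index that the wind bin favors and leaves it in place once it is aligned. Any wind condition that never appears during training has no table entry, which is the coverage gap that the second learning mode described below is meant to address.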

FIG. 2 is a block diagram showing the detailed configuration of the beam direction control unit 15. The beam direction control unit 15 includes a mode setting unit 151, an environmental state information acquisition unit 152, a first learning unit 153, and a second learning unit 154.

The beam direction control unit 15 operates in two modes: a first learning mode and a second learning mode. The mode setting unit 151 sets which of these two modes the beam direction control unit 15 operates in. In the first learning mode, the first learning unit 153 learns the beam control policy using environmental state information actually obtained by the environment sensor 13. In the first learning mode, the mode setting unit 151 keeps the second learning unit 154 from operating. In the second learning mode, the second learning unit 154 generates environmental state information that simulates a surrounding environment that has not actually been observed and inputs it to the first learning unit 153. This gives the first learning unit 153 the opportunity to learn beam control policies for a wide variety of environmental state information. In the following, the environmental state information generated by the second learning unit 154 is referred to as pseudo environmental state information.

モード設定部１５１は、例えば、無線通信装置１に取り付けられたディップスイッチなどの物理的なスイッチである。あるいは、モード設定部１５１は、無線通信装置１に実装されたソフトウェアにより実現されてもよい。この場合、第１の学習部１５３は、外部の制御用パーソナルコンピュータ（ＰＣ）からの指示、又は、ネットワーク経由の遠隔制御を受けてモードを変更してもよい。また、あるいは、モード設定部１５１は、事前に設定されたスケジューラに従って、所定の時間に第１の学習モードから第２の学習モードに切り替え、また別の時間に第２の学習モードから第１の学習モードに切り替えるように、動作モードを変更しても構わない。 The mode setting unit 151 is, for example, a physical switch such as a DIP switch attached to the wireless communication device 1. Alternatively, the mode setting unit 151 may be realized by software installed in the wireless communication device 1. In this case, the first learning unit 153 may change the mode in response to an instruction from an external control personal computer (PC) or to remote control via a network. As a further alternative, the mode setting unit 151 may change the operating mode according to a preset scheduler, switching from the first learning mode to the second learning mode at a predetermined time and from the second learning mode back to the first learning mode at another time.

環境状態情報取得部１５２は、環境センサ１３から環境状態情報を入力する。環境状態情報取得部１５２は、入力した環境状態情報を、第１の学習モードでは第１の学習部１５３に出力し、第２の学習モードでは第２の学習部１５４に出力する。モード設定部１５１及び環境状態情報取得部１５２により、環境センサ１３が取得した環境状態情報と第２の学習部１５４が生成した疑似環境状態情報とのいずれを第１の学習部１５３に入力するかを切り替える切替部としての機能を実現する。 The environmental state information acquisition unit 152 receives environmental state information from the environmental sensor 13. The environmental state information acquisition unit 152 outputs the received environmental state information to the first learning unit 153 in the first learning mode, and to the second learning unit 154 in the second learning mode. Together, the mode setting unit 151 and the environmental state information acquisition unit 152 function as a switching unit that switches whether the environmental state information acquired by the environmental sensor 13 or the pseudo environmental state information generated by the second learning unit 154 is input to the first learning unit 153.
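The switching behavior described above can be illustrated with a minimal Python sketch. It is not the implementation of this embodiment; the names `Mode`, `route_environment_state`, and `generate_pseudo_state` are hypothetical and introduced only for illustration.

```python
from enum import Enum


class Mode(Enum):
    """Hypothetical labels for the two operating modes."""
    FIRST = 1   # first learning mode: use sensed environmental state
    SECOND = 2  # second learning mode: use pseudo environmental state


def route_environment_state(mode, sensor_state, generate_pseudo_state):
    """Return the state that is fed to the first learning unit.

    In the first learning mode the sensed state is passed through.
    In the second learning mode the sensed state is handed to the
    second learning unit (modeled here as a callable), and its
    pseudo state is what reaches the first learning unit.
    """
    if mode is Mode.FIRST:
        return sensor_state
    return generate_pseudo_state(sensor_state)
```

Modeling the second learning unit as a callable keeps the sketch independent of how the pseudo environmental state information is actually produced.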

第１の学習部１５３は、環境状態情報と、ビーム方向が制御された前後の無線通信品質情報とを用いて、環境状態情報に応じて無線通信の品質を向上させるビーム制御方策を学習し、学習結果に基づいてビーム方向の制御指示を無線通信部１１に出力する。第１の学習部１５３は、ビーム制御方策記憶部１５３１及び第１の累積報酬記憶部１５３２を備える。ビーム制御方策記憶部１５３１は、ビーム制御方策テーブルを記憶する。ビーム制御方策テーブルは、環境状態情報に対応したビーム制御方策を示す。本実施形態では、ビーム制御方策は、現在のビーム方向からの補正量により表される。第１の累積報酬記憶部１５３２は、第１の累積報酬を記憶する。第１の累積報酬は、第１の報酬を加算した値である。第１の報酬は、ビーム制御方策によって無線通信品質が改善した程度に応じて付与される値である。本実施形態では、改善の程度が大きいほど大きな値の第１の報酬が付与される。第１の報酬は、段階的な値でもよい。 The first learning unit 153 uses the environmental state information and the wireless communication quality information from before and after the beam direction is controlled to learn a beam control policy that improves the quality of wireless communication according to the environmental state information, and outputs a beam direction control instruction to the wireless communication unit 11 based on the learning result. The first learning unit 153 includes a beam control policy storage unit 1531 and a first cumulative reward storage unit 1532. The beam control policy storage unit 1531 stores a beam control policy table. The beam control policy table shows the beam control policy corresponding to the environmental state information. In this embodiment, the beam control policy is expressed as a correction amount from the current beam direction. The first cumulative reward storage unit 1532 stores the first cumulative reward. The first cumulative reward is the sum of the first rewards. The first reward is a value given according to the degree to which the beam control policy has improved wireless communication quality. In this embodiment, the larger the degree of improvement, the larger the first reward that is given. The first reward may be a stepwise value.

第１の学習部１５３は、第１の学習モードにおいて、環境状態情報取得部１５２から環境状態情報を入力し、無線通信品質監視部１２からビーム方向情報及び無線通信品質情報を入力する。第１の学習部１５３は、環境状態情報及びビーム方向に応じたビーム制御方策をビーム制御方策記憶部１５３１に記憶されているビーム制御方策テーブルから読み出す。第１の学習部１５３は、読み出したビーム制御方策に従ってビーム方向を制御するよう指示する制御指示を無線通信部１１に出力する。 In the first learning mode, the first learning unit 153 inputs environmental state information from the environmental state information acquisition unit 152 and receives beam direction information and wireless communication quality information from the wireless communication quality monitoring unit 12. The first learning unit 153 reads out the beam control policy according to the environmental state information and the beam direction from the beam control policy table stored in the beam control policy storage unit 1531. The first learning unit 153 outputs a control instruction to the wireless communication unit 11 to control the beam direction according to the read beam control policy.

第１の学習部１５３は、ビーム制御方策に基づく制御指示に従って変更されたビーム方向により無線通信が行われている間の無線通信品質情報を無線通信品質監視部１２から入力する。第１の学習部１５３は、このビーム方向が変更された前後の無線通信品質情報が示す通信品質の変化に応じて、ビーム制御方策に第１の報酬を付与する。第１の学習部１５３は、付与した第１の報酬を第１の累積報酬記憶部１５３２に出力する。第１の累積報酬記憶部１５３２は、記憶している第１の累積報酬の値を、入力した第１の報酬を加算した値に更新する。第１の学習部１５３は、第１の報酬が低いビーム制御方策を変更する。これにより、第１の学習部１５３は、一定期間における第１の累積報酬が最大化するように、ビーム制御方策を変更する。 The first learning unit 153 inputs wireless communication quality information from the wireless communication quality monitoring unit 12 while wireless communication is being performed with a beam direction changed according to a control instruction based on a beam control policy. The first learning unit 153 gives a first reward to the beam control policy according to the change in communication quality indicated by the wireless communication quality information before and after the beam direction is changed. The first learning unit 153 outputs the provided first reward to the first cumulative reward storage unit 1532. The first cumulative reward storage unit 1532 updates the stored value of the first cumulative reward to a value obtained by adding the input first reward. The first learning unit 153 changes the beam control policy with a low first reward. Thereby, the first learning unit 153 changes the beam control policy so that the first cumulative reward over a certain period of time is maximized.

第１の学習部１５３は、第２の学習モードにおいて、環境状態情報取得部１５２から環境状態情報を入力する代わりに、第２の学習部１５４から疑似環境状態情報を入力する。第１の学習部１５３は、この疑似環境状態情報を環境状態情報取得部１５２から入力した環境状態情報の代わりに用いて、上記の第１の学習モードと同様の動作を行う。 In the second learning mode, the first learning section 153 inputs pseudo environmental state information from the second learning section 154 instead of inputting the environmental state information from the environmental state information acquisition section 152. The first learning section 153 uses this pseudo environmental state information in place of the environmental state information input from the environmental state information acquisition section 152, and performs the same operation as in the first learning mode described above.

第２の学習部１５４は、第１の学習モードでは動作せず、第２の学習モードにおいて動作する。第２の学習部１５４は、情報生成方策に従って生成された疑似環境状態情報と、疑似環境状態情報に応じて第１の学習部１５３が決定したビーム制御方策に基づいてビーム方向が制御された前後の無線通信品質情報とを用いて、無線通信品質を低下させる疑似環境状態情報を生成する情報生成方策を学習する。第２の学習部１５４は、学習結果の情報生成方策に基づいて生成した疑似環境状態情報を第１の学習部１５３に出力する。 The second learning unit 154 does not operate in the first learning mode, but operates in the second learning mode. The second learning unit 154 learns an information generation policy for generating pseudo environmental state information that degrades wireless communication quality. To do so, it uses the pseudo environmental state information generated according to the information generation policy, together with the wireless communication quality information from before and after the beam direction was controlled based on the beam control policy that the first learning unit 153 determined in response to that pseudo environmental state information. The second learning unit 154 outputs the pseudo environmental state information generated based on the learned information generation policy to the first learning unit 153.

第２の学習部１５４は、情報生成方策記憶部１５４１及び第２の累積報酬記憶部１５４２を備える。情報生成方策記憶部１５４１は、情報生成方策テーブルを記憶する。情報生成方策テーブルは、環境状態情報と、その環境状態情報に基づいて生成された疑似環境状態情報と、疑似環境状態情報の情報生成方策との対応を示す図である。情報生成方策は、例えば、環境状態情報に対して行う演算により表される。本実施形態では、環境状態情報とビーム方向情報との組み合わせごとに情報生成方策が設定されるものとする。第２の累積報酬記憶部１５４２は、第２の累積報酬を記憶する。第２の累積報酬は、第２の報酬を加算した値である。第２の報酬は、疑似環境状態情報を用いて第１の学習部１５３が決定したビーム制御方策に基づく制御指示によって無線通信品質がどの程度低下したかに応じて付与される値である。本実施形態では、低下の程度が大きいほど大きな値の第２の報酬が付与される。第２の報酬は、段階的な値でもよい。 The second learning unit 154 includes an information generation policy storage unit 1541 and a second cumulative reward storage unit 1542. The information generation policy storage unit 1541 stores an information generation policy table. The information generation policy table is a diagram showing the correspondence between environmental state information, pseudo environmental state information generated based on the environmental state information, and information generation policy of the pseudo environmental state information. The information generation policy is expressed, for example, by a calculation performed on the environmental state information. In this embodiment, it is assumed that an information generation policy is set for each combination of environmental state information and beam direction information. The second cumulative reward storage unit 1542 stores the second cumulative reward. The second cumulative reward is the sum of the second rewards. The second reward is a value given depending on how much the wireless communication quality has deteriorated due to the control instruction based on the beam control policy determined by the first learning unit 153 using the pseudo environment state information. In this embodiment, the larger the degree of decline, the larger the second reward is given. The second reward may be a tiered value.

第２の学習部１５４は、第２の学習モードにおいて、環境センサ１３が出力した環境状態情報を環境状態情報取得部１５２から入力し、無線通信品質監視部１２からビーム方向情報と無線通信品質情報を入力する。第２の学習部１５４は、環境状態情報が示す環境情報に関する情報とビーム方向情報が示すビーム方向との組み合わせに応じた情報生成方策を、情報生成方策記憶部１５４１に記憶される情報生成方策テーブルから読み出す。第２の学習部１５４は、環境状態情報が示す環境状態に関する情報に、読み出した情報生成方策が示す演算を行って、疑似環境状態情報を生成する。第２の学習部１５４は、環境状態情報取得部１５２から入力した環境状態情報と、その環境状態情報に基づいて生成された疑似環境状態情報とを対応付けて情報生成方策テーブルに書き込む。 In the second learning mode, the second learning unit 154 receives the environmental state information output by the environmental sensor 13 from the environmental state information acquisition unit 152, and receives beam direction information and wireless communication quality information from the wireless communication quality monitoring unit 12. The second learning unit 154 reads, from the information generation policy table stored in the information generation policy storage unit 1541, the information generation policy corresponding to the combination of the environmental state indicated by the environmental state information and the beam direction indicated by the beam direction information. The second learning unit 154 generates pseudo environmental state information by applying the calculation indicated by the read information generation policy to the information regarding the environmental state indicated by the environmental state information. The second learning unit 154 associates the environmental state information input from the environmental state information acquisition unit 152 with the pseudo environmental state information generated based on it, and writes them into the information generation policy table.

第２の学習部１５４は、生成した疑似環境状態情報を第１の学習部１５３に出力する。第１の学習部１５３は、第２の学習部１５４から入力した疑似環境情報に基づいて決定したビーム制御方策によるビーム方向の制御指示を無線通信部１１に出力する。第２の学習部１５４は、この制御指示に従って変更されたビーム方向により無線通信が行われている通信期間の無線通信品質に関する情報を示す無線通信品質情報を無線通信品質監視部１２から入力する。第２の学習部１５４は、制御指示の前後の通信期間における通信品質の変化に応じて、情報生成方策に第２の報酬を付与する。第２の学習部１５４は、第２の累積報酬記憶部１５４２に記憶されている第２の累積報酬を、付与した第２の報酬を加算した値に更新する。第２の学習部１５４は、第２の報酬が低い情報生成方策を変更する。これにより、第２の学習部１５４は、一定期間における第２の累積報酬が最大化するように、情報生成方策を学習する。 The second learning unit 154 outputs the generated pseudo environment state information to the first learning unit 153. The first learning unit 153 outputs to the wireless communication unit 11 a beam direction control instruction based on the beam control policy determined based on the pseudo environment information input from the second learning unit 154. The second learning section 154 receives from the radio communication quality monitoring section 12 radio communication quality information indicating information regarding the radio communication quality during the communication period during which radio communication is performed using the beam direction changed according to this control instruction. The second learning unit 154 gives a second reward to the information generation policy according to the change in communication quality in the communication period before and after the control instruction. The second learning unit 154 updates the second cumulative reward stored in the second cumulative reward storage unit 1542 to a value obtained by adding the given second reward. The second learning unit 154 changes the information generation policy with a low second reward. Thereby, the second learning unit 154 learns the information generation policy so that the second cumulative reward in a certain period of time is maximized.
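The second reward runs opposite to the first reward: it grows as communication quality falls. The description does not give concrete thresholds for the second reward, so the following Python sketch simply assumes stepwise values keyed to the drop in received power. The function name `second_reward`, the 3 dB threshold, and the values 100, 1, and 0 are all assumptions made for illustration.

```python
def second_reward(power_before_dbm, power_after_dbm):
    """Stepwise second reward based on the drop in received power.

    Assumed tiers (not specified in the description):
      drop of 3 dB or more   -> 100
      drop between 0 and 3 dB -> 1
      no drop, or an increase -> 0
    """
    drop_db = power_before_dbm - power_after_dbm
    if drop_db >= 3.0:
        return 100
    if drop_db > 0.0:
        return 1
    return 0


def accumulate(rewards):
    """Second cumulative reward: the running sum of second rewards."""
    total = 0
    for r in rewards:
        total += r
    return total
```

Under these assumptions, a pseudo environmental state that makes the beam control lose 4 dB of received power earns the information generation policy a reward of 100.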

図３は、ビーム制御方策記憶部１５３１に記憶されるビーム制御方策テーブルの例を示す図である。図３に示すビーム制御方策テーブルは、ビーム方向及び環境状態情報の組み合わせごとのビーム制御方策及び前回取得報酬を示す。図３においては、環境状態情報が風速であり、ビーム制御方策が現在のビーム方向に対する角度補正量である場合を例に示している。前回取得報酬は、対応するビーム制御方策により前回ビーム方向を制御したときに得られた第１の報酬を示す。 FIG. 3 is a diagram showing an example of a beam control policy table stored in the beam control policy storage unit 1531. The beam control policy table shown in FIG. 3 shows the beam control policy and previously acquired reward for each combination of beam direction and environmental state information. In FIG. 3, an example is shown in which the environmental state information is wind speed and the beam control policy is an angle correction amount for the current beam direction. The previously obtained reward indicates the first reward obtained when the beam direction was previously controlled by the corresponding beam control policy.

図４は、情報生成方策記憶部１５４１に記憶される情報生成方策テーブルの例を示す図である。図４に示す情報生成方策テーブルは、ビーム方向と、環境状態情報と、その環境状態情報に基づいて生成された疑似環境状態情報と、前回取得報酬と、疑似環境状態情報の生成に使用した情報生成方策とを対応付けた情報である。図４においては、環境状態情報が風速である場合を例に示している。前回報酬は、対応する疑似環境状態情報及びビーム方向に応じて第１の学習部１５３が決定したビーム制御方策により前回ビーム制御を行ったときに得られた第２の報酬を示す。 FIG. 4 is a diagram showing an example of an information generation policy table stored in the information generation policy storage unit 1541. The information generation policy table shown in FIG. 4 associates the beam direction, the environmental state information, the pseudo environmental state information generated based on that environmental state information, the previously obtained reward, and the information generation policy used to generate the pseudo environmental state information. FIG. 4 shows an example in which the environmental state information is wind speed. The previously obtained reward indicates the second reward obtained the last time beam control was performed using the beam control policy that the first learning unit 153 determined according to the corresponding pseudo environmental state information and beam direction.

続いて、各学習モードにおけるビーム方向制御部１５の動作を説明する。 Next, the operation of the beam direction control section 15 in each learning mode will be explained.

＜第１の学習モード＞ <First learning mode>
環境状態情報取得部１５２は、環境センサ１３から環境状態情報を取得する。第１の学習モードでは、環境状態情報取得部１５２は、取得した環境状態情報を第１の学習部１５３に出力する。 The environmental state information acquisition unit 152 acquires environmental state information from the environmental sensor 13. In the first learning mode, the environmental state information acquisition unit 152 outputs the acquired environmental state information to the first learning unit 153.

第１の学習部１５３は、環境状態情報取得部１５２から環境状態情報を入力し、さらに、無線通信品質監視部１２からビーム方向情報及び無線通信品質情報を入力する。第１の学習部１５３は、ある通信期間において使用したビーム方向を示すビーム方向情報と、その通信期間内の環境状態情報とに応じて、次の通信期間で使用するビーム方向を、以下のように制御する。 The first learning unit 153 receives environmental state information from the environmental state information acquisition unit 152 and further receives beam direction information and wireless communication quality information from the wireless communication quality monitoring unit 12. Based on the beam direction information indicating the beam direction used in a given communication period and the environmental state information within that communication period, the first learning unit 153 controls the beam direction to be used in the next communication period as follows.

第１の学習部１５３は、入力したビーム方向の条件と、入力した環境状態情報の条件との組み合わせについて、次の通信期間において使用するビーム方向を学習する。この第１の学習部１５３に入力される環境状態情報は、例えば、時刻ｔ_０～ｔ_Ｎ（Ｎは１以上の整数）のそれぞれにおける瞬時風速［１０ｍ／ｓ，８ｍ／ｓ，１２ｍ／ｓ，…］といった時系列データの形式である。もしくは、環境状態情報は、時刻ｔ_０～ｔ_Ｎの瞬時風速、風向、と無線通信装置１の設置高さなど、といった複数の要素から構成されるtuple（タプル）であっても構わない。 The first learning unit 153 learns the beam direction to be used in the next communication period for each combination of the input beam direction condition and the input environmental state information condition. The environmental state information input to the first learning unit 153 takes the form of time-series data such as, for example, the instantaneous wind speeds [10 m/s, 8 m/s, 12 m/s, ...] at each of times t_0 to t_N (N is an integer of 1 or more). Alternatively, the environmental state information may be a tuple composed of a plurality of elements, such as the instantaneous wind speed and wind direction at times t_0 to t_N and the installation height of the wireless communication device 1.

ビーム方向を制御する一例として、例えば、第１の学習部１５３は、環境状態情報が示す時刻ｔ_０～ｔ_Ｎの瞬時の風速が［１０ｍ／ｓ，８ｍ／ｓ，１２ｍ／ｓ，…］であるという条件と、入力されたビーム方向情報が示すビーム方向の条件との組み合わせに基づいて、各時刻の環境状態情報に対応する角度補正量が［５度，２度，１１度，…］であるといった制御指示を無線通信部１１に出力する。 As an example of controlling the beam direction, the first learning unit 153 outputs to the wireless communication unit 11 a control instruction specifying that the angle correction amounts corresponding to the environmental state information at each time are [5 degrees, 2 degrees, 11 degrees, ...]. This instruction is based on the combination of the condition that the instantaneous wind speeds at times t_0 to t_N indicated by the environmental state information are [10 m/s, 8 m/s, 12 m/s, ...] and the beam direction condition indicated by the input beam direction information.

上記のように第１の学習部１５３は、入力した環境状態情報及びビーム方向情報に対応して角度補正量の制御指示を出力する。そのため、第１の学習部１５３は、ビーム制御方策記憶部１５３１に記憶されるビーム制御方策テーブルを参照して、現在のビーム方向と、過去に経験した環境状態情報とに対応した角度補正量のうち、現在と同じ環境状態情報に対応する角度補正量を取得する。第１の学習部１５３は、取得した角度補正量を設定した制御指示を無線通信部１１に出力する。 As described above, the first learning unit 153 outputs an instruction to control the angle correction amount in accordance with the input environmental state information and beam direction information. Therefore, the first learning unit 153 refers to the beam control policy table stored in the beam control policy storage unit 1531 and calculates the angle correction amount corresponding to the current beam direction and the environmental state information experienced in the past. Among them, the angle correction amount corresponding to the same environmental state information as the current one is acquired. The first learning unit 153 outputs a control instruction in which the obtained angle correction amount is set to the wireless communication unit 11.

なお、ビーム制御方策テーブルに、現在のビーム方向と、過去に経験した環境状態情報とに対応する角度補正量が存在しない場合、第１の学習部１５３は、無線通信部１１に設定可能な角度範囲内で任意の角度補正量のビーム制御方策を決定することができる。設定できる角度範囲は、無線通信装置１が保有するアレーアンテナの設計構成に依存し、事前に求められている。第１の学習部１５３は、ビーム制御方策記憶部１５３１に記憶されるビーム制御方策テーブルに、ビーム方向及び環境状態情報と、指示した角度補正量を設定したビーム制御方策とを対応付けて書き込む。第１の学習部１５３は、決定したビーム制御方策に基づく制御指示を、無線通信部１１に出力する。 Note that if the beam control policy table contains no angle correction amount corresponding to the current beam direction and previously experienced environmental state information, the first learning unit 153 can determine a beam control policy with an arbitrary angle correction amount within the angle range that can be set in the wireless communication unit 11. The angle range that can be set depends on the design configuration of the array antenna of the wireless communication device 1 and is determined in advance. The first learning unit 153 writes the beam direction and environmental state information, in association with the beam control policy in which the instructed angle correction amount is set, into the beam control policy table stored in the beam control policy storage unit 1531. The first learning unit 153 outputs a control instruction based on the determined beam control policy to the wireless communication unit 11.
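The table lookup with a fallback for unseen conditions can be sketched as follows. This is only an illustrative Python sketch: the dictionary layout, the field names `correction_deg` and `last_reward`, and the use of a uniformly random choice for the "arbitrary" angle correction amount are assumptions, not details given in the description.

```python
import random


def select_angle_correction(table, beam_direction, env_state, angle_range, rng=random):
    """Look up the angle correction for (beam direction, environmental state).

    `table` maps (beam_direction, tuple(env_state)) to a dict holding the
    angle correction and the previously obtained reward. If the combination
    has not been experienced, an arbitrary correction inside the settable
    angle range is chosen (here: uniformly at random, an assumption) and
    written back to the table.
    """
    key = (beam_direction, tuple(env_state))
    entry = table.get(key)
    if entry is None:
        low, high = angle_range
        entry = {"correction_deg": rng.uniform(low, high), "last_reward": None}
        table[key] = entry
    return entry["correction_deg"]
```

For example, with a table that already holds a 5 degree correction for beam direction 30 and wind speeds (10.0, 8.0), the call returns 5.0; for any other wind-speed series it returns a new value within `angle_range` and records it.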

なお、上記のような風速を示す環境状態情報は一例にすぎず、第１の学習部１５３は、風速以外の環境状態情報も取得し、複数の要素からなる環境状態情報を構成した上で、ビーム方向の制御指示を生成することも可能である。また、無線通信部１１のビーム方向を変更させるための情報は、任意の形式で表現することができる。例えば、上記のような角度補正量ではなく、無線通信部１１が電波の指向性を必要な方向で形成できるよう、アレーアンテナの各素子に必要なウェイトを出力してもよい。この場合、第１の学習部１５３は、ビーム方向を使用せずに、環境状態情報に対応したウェイトをビーム制御方策として取得してもよい。 Note that the environmental state information indicating wind speed described above is only an example. The first learning unit 153 may also acquire environmental state information other than wind speed, compose environmental state information consisting of a plurality of elements, and then generate the beam direction control instruction. Furthermore, the information for changing the beam direction of the wireless communication unit 11 can be expressed in any format. For example, instead of the angle correction amount described above, the weights required for each element of the array antenna may be output so that the wireless communication unit 11 can form the directivity of radio waves in the required direction. In this case, the first learning unit 153 may acquire the weights corresponding to the environmental state information as the beam control policy without using the beam direction.

＜第１の報酬の付与方法の一例＞ <An example of a method of giving the first reward>
第１の学習部１５３が出力した制御指示に従って無線通信部１１がビーム方向を変更した結果は、次の通信期間において無線通信品質監視部１２から取得される受信電力などの無線通信品質に反映される。第１の学習部１５３は、上記のビーム方向制御に伴う無線通信品質情報の変化に応じて第１の報酬を決定して、第１の累積報酬記憶部１５３２に記録する。報酬は、事前に決定された制御目的の達成度合いに応じて、任意の方法で付与される数値である。第１の学習部１５３の制御目的は、「ビーム方向制御により受信電力を向上させる」ことである。この制御目的に応じて第１の報酬を付与する方法の一例を説明する。 The result of the wireless communication unit 11 changing the beam direction according to the control instruction output by the first learning unit 153 is reflected in the wireless communication quality, such as the received power, acquired from the wireless communication quality monitoring unit 12 in the next communication period. The first learning unit 153 determines a first reward according to the change in wireless communication quality information accompanying the beam direction control described above, and records it in the first cumulative reward storage unit 1532. The reward is a numerical value given by an arbitrary method according to the degree of achievement of a predetermined control objective. The control objective of the first learning unit 153 is to "improve received power by controlling the beam direction." An example of a method of giving the first reward according to this control objective is described below.

例えば、第１の学習部１５３の制御目的に合わせ、ビーム方向制御前の受信電力とビーム方向制御後の受信電力とを比較して、３ｄＢ以上の増加である場合は報酬を１００とし、０ｄＢ以上３ｄＢ未満の増加である場合は報酬を１とし、０ｄＢ未満の増加である場合は報酬を０とする。ただし、この第１の報酬の付与方法は一例にすぎず、制御目的と合致すれば、例えば機械学習により生成された報酬関数など、他の報酬付与方法を用いても構わない。 For example, in accordance with the control objective of the first learning unit 153, the received power before beam direction control is compared with the received power after beam direction control. If the increase is 3 dB or more, the reward is 100; if the increase is 0 dB or more and less than 3 dB, the reward is 1; and if the increase is less than 0 dB (that is, the received power has decreased), the reward is 0. However, this method of giving the first reward is only an example, and other methods, such as a reward function generated by machine learning, may be used as long as they match the control objective.
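The stepwise first reward in this example (100 for an increase of 3 dB or more, 1 for an increase of 0 dB or more and less than 3 dB, 0 for a decrease) can be written as a short Python function. The function and parameter names, and the use of dBm as the unit of received power, are assumptions made for illustration.

```python
def first_reward(power_before_dbm, power_after_dbm):
    """Stepwise first reward from the change in received power.

    gain >= 3 dB      -> 100
    0 dB <= gain < 3  -> 1
    gain < 0 dB       -> 0
    """
    gain_db = power_after_dbm - power_before_dbm
    if gain_db >= 3.0:
        return 100
    if gain_db >= 0.0:
        return 1
    return 0


def update_cumulative(cumulative, reward):
    """First cumulative reward: add the newly given first reward."""
    return cumulative + reward
```

With this function, a change from -60 dBm to -56 dBm yields a reward of 100, while a change from -60 dBm to -61 dBm yields 0.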

第１の学習部１５３は、所定の報酬付与方法とビーム制御方策に基づくビーム方向制御の結果とに応じて第１の報酬を決定する。第１の学習部１５３は、第１の累積報酬記憶部１５３２に第１の報酬を出力する。第１の累積報酬記憶部１５３２は、現在の第１の累積報酬情報の値を、入力した第１の報酬を加算した値により更新する。このように、第１の学習部１５３は、制御結果を得る度に第１の報酬を決定し、第１の累積報酬記憶部１５３２は、その累積和を計算して記憶する。 The first learning unit 153 determines a first reward according to a predetermined reward giving method and a result of beam direction control based on a beam control policy. The first learning unit 153 outputs the first reward to the first cumulative reward storage unit 1532. The first cumulative reward storage unit 1532 updates the current value of the first cumulative reward information with the value obtained by adding the input first reward. In this way, the first learning unit 153 determines the first reward each time it obtains a control result, and the first cumulative reward storage unit 1532 calculates and stores the cumulative sum.

＜ビーム制御方策テーブルの更新＞ <Update of beam control policy table>
第１の学習部１５３は、新規の環境状態情報が入力された場合に、前述した通り、任意の角度補正量を出力することができる。しかし、その任意の角度補正量では必ずしも受信電力を最大化できないため、複数回の試行錯誤によって受信電力を最大化できる角度補正量を学習する必要がある。そのため、第１の学習部１５３は、環境状態情報に対応した前回取得報酬に、その環境状態情報に対応したビーム制御方策によって前回ビーム方向制御を行った後に付与された第１の報酬を書き込むことで、ビーム制御方策記憶部１５３１に記憶されるビーム制御方策テーブルを更新する。第１の学習部１５３は、前回取得報酬に設定されている第１の報酬が最大値でない場合に、その前回取得報酬に対応したビーム制御方策を変更する。例えば、第１の学習部１５３は、図２示すビーム制御方策テーブルにおいて、最大値ではない前回取得報酬に対応した角度補正量を変更する。そして、第１の学習部１５３は、変更後の角度補正量によるビーム方向制御後に付与した第１の報酬を、ビーム制御方策テーブルに書き込む。このようにして、第１の学習部１５３は、ビーム制御方策テーブルの内容を更新し、ビーム制御方策記憶部１５３１に保持する。 As described above, the first learning unit 153 can output an arbitrary angle correction amount when new environmental state information is input. However, an arbitrary angle correction amount does not necessarily maximize the received power, so the angle correction amount that maximizes the received power must be learned through repeated trial and error. To this end, the first learning unit 153 updates the beam control policy table stored in the beam control policy storage unit 1531 by writing, into the previously obtained reward corresponding to the environmental state information, the first reward given after the last beam direction control performed with the beam control policy corresponding to that environmental state information. When the first reward set as the previously obtained reward is not the maximum value, the first learning unit 153 changes the beam control policy corresponding to that previously obtained reward. For example, in the beam control policy table shown in FIG. 3, the first learning unit 153 changes the angle correction amount corresponding to a previously obtained reward that is not the maximum value. The first learning unit 153 then writes the first reward given after beam direction control using the changed angle correction amount into the beam control policy table. In this way, the first learning unit 153 updates the contents of the beam control policy table and holds it in the beam control policy storage unit 1531.
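The update rule above (record the latest first reward, and change the angle correction amount when that reward is not the maximum) can be sketched in Python as follows. How the correction is changed is not specified in the description; the small random perturbation clamped to the settable range, the step size, and the maximum reward of 100 taken from the earlier stepwise example are all assumptions.

```python
import random

MAX_REWARD = 100  # assumption: maximum of the stepwise first reward example


def update_policy_entry(entry, reward, angle_range, step_deg=1.0, rng=random):
    """Record the latest first reward and, if it is not maximal,
    perturb the stored angle correction for the next trial.

    `entry` is a dict with keys "correction_deg" and "last_reward"
    (a hypothetical layout for one row of the beam control policy table).
    """
    entry["last_reward"] = reward
    if reward < MAX_REWARD:
        low, high = angle_range
        candidate = entry["correction_deg"] + rng.uniform(-step_deg, step_deg)
        # Keep the new correction inside the range the antenna can realize.
        entry["correction_deg"] = min(high, max(low, candidate))
    return entry
```

Repeating this update over successive communication periods is one simple form of the trial-and-error search described above; a practical learner would typically also remember the best correction found so far.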

＜第２の学習モード＞ <Second learning mode>
第２の学習モードでは、第１の学習部１５３及び第２の学習部１５４が動作する。第１の学習モードにおいて、第１の学習部１５３は、環境状態情報取得部１５２から入力される環境状態情報に基づいて、ビーム制御方策の学習が可能である。この環境状態情報は、環境センサ１３によって実際に観測された、無線通信装置１の置かれた環境状態に関する情報を示す。しかし、実際に観測されず、無線通信装置１が経験していない環境状態については、ビーム制御方策を学習できない。特に、複雑な外因のある環境においては、有限な時間内ですべての環境状態を実際の観測で経験しきれない可能性が高い。経験していない環境状態について第１の学習部１５３が決定するビーム制御の方法の多くは、最適化されていないことが想定され、ビーム制御の失敗により通信品質が低下する恐れがある。 In the second learning mode, the first learning unit 153 and the second learning unit 154 operate. In the first learning mode, the first learning unit 153 can learn a beam control policy based on the environmental state information input from the environmental state information acquisition unit 152. This environmental state information indicates information regarding the environmental state in which the wireless communication device 1 is placed, as actually observed by the environmental sensor 13. However, a beam control policy cannot be learned for environmental states that are not actually observed and that the wireless communication device 1 has not experienced. In particular, in environments with complex external factors, it is highly likely that not all environmental states can be experienced through actual observation within a finite amount of time. Many of the beam control methods that the first learning unit 153 determines for unexperienced environmental states are assumed not to be optimized, and communication quality may deteriorate due to beam control failure.

そこで、第２の学習モードでは、環境状態情報取得部１５２が、実際の観測で経験していない疑似的な環境状態に関する情報である疑似環境状態情報を生成し、その生成した疑似環境状態情報を第１の学習部１５３へ入力する。第１の学習部１５３は、疑似環境状態情報を、過去に経験していない環境状態情報として認識し、そのような環境状態情報に対応できるよう、新しいビーム制御方策を学習し始める。つまり、環境状態情報取得部１５２は、第１の学習部１５３の学習を促進させる機能を有している。 Therefore, in the second learning mode, the environmental state information acquisition unit 152 generates pseudo environmental state information, which is information about a pseudo environmental state that has not been experienced in actual observation, and inputs the generated pseudo environmental state information to the first learning unit 153. The first learning unit 153 recognizes the pseudo environmental state information as environmental state information that it has not experienced before, and begins learning a new beam control policy so that it can cope with such environmental state information. In other words, the environmental state information acquisition unit 152 has the function of promoting learning by the first learning unit 153.

無線通信装置１には、数多くの環境状態が存在する。第１の学習モードでは、その存在する環境状態のうち、過去に経験した環境状態でしか学習ができない。第２の学習モードでは、第１の学習部１５３に、これまでに経験した環境状態とは異なる未経験の環境状態を疑似的に経験させる。よって、第１の学習部１５３は、未経験の環境状態でも学習が可能となる。 There are many environmental conditions in the wireless communication device 1. In the first learning mode, learning can only be performed using previously experienced environmental states among existing environmental states. In the second learning mode, the first learning unit 153 is caused to experience in a simulated manner an unexperienced environmental state that is different from the previously experienced environmental states. Therefore, the first learning unit 153 can learn even in an inexperienced environmental state.

第２の学習部１５４の機能を実現するには、第１の学習部１５３の学習を促進させると共に、自機能部においても、複雑な外因環境を模擬できるよう、疑似環境状態情報の情報生成方策について学習する必要がある。通常、複雑な外因環境は、第１の学習部１５３によるビーム方向制御の効果（例えば、受信電力の最大化）を劣化させるため、第１の学習部１５３の学習目的とは逆の学習目的を有していると考えられる。そこで、第２の学習モードでは、複雑な外因環境の影響を模擬する第２の学習部１５４は、第１の学習部１５３の学習目的とは逆に、受信電力を低下させる学習目的を持つ。 To realize the function of the second learning unit 154, it is necessary not only to promote the learning of the first learning unit 153, but also for the second learning unit 154 itself to learn an information generation policy for pseudo environmental state information so that it can simulate a complex external environment. A complex external environment usually degrades the effect of beam direction control by the first learning unit 153 (for example, maximization of received power), and can therefore be regarded as having a learning objective opposite to that of the first learning unit 153. Accordingly, in the second learning mode, the second learning unit 154, which simulates the influence of a complex external environment, has the learning objective of reducing received power, the opposite of the learning objective of the first learning unit 153.

第２の学習モードにおいて、第１の学習部１５３は、環境状態情報取得部１５２から入力した環境状態情報に代えて、第２の学習部１５４が生成した疑似環境状態情報を用いて、第１の学習モードと同様に、次の通信期間において受信電力を増大させるよう学習し、ビーム方向の制御指示を無線通信部１１に出力する。なお、簡潔に説明するため、特段記載のない場合、第１の学習部１５３の動作は上記の第１の学習モードと同じであり、以下ではその詳細を省略する。 In the second learning mode, the first learning unit 153 uses the pseudo environmental state information generated by the second learning unit 154 in place of the environmental state information input from the environmental state information acquisition unit 152. As in the first learning mode, it learns to increase the received power in the next communication period and outputs a beam direction control instruction to the wireless communication unit 11. For brevity, unless otherwise specified, the operation of the first learning unit 153 is the same as in the first learning mode described above, and its details are omitted below.

第２の学習部１５４は、前述のように複雑な外因のある環境状態を模擬するため、第１の学習部１５３とは逆の目的を有しており、次の通信期間における受信電力を減少させるよう疑似環境状態情報の生成方策を学習する。 As described above, in order to simulate an environmental state with complex external factors, the second learning unit 154 has the opposite objective to the first learning unit 153, and learns a policy for generating pseudo environmental state information so as to reduce the received power in the next communication period.

The environmental state information acquisition unit 152 acquires environmental state information from the environmental sensor 13. In the second learning mode, the environmental state information acquisition unit 152 inputs the acquired environmental state information to the second learning unit 154. The beam direction information acquired by the wireless communication quality monitoring unit 12 is also input to the second learning unit 154.

The second learning unit 154 performs an operation on the environmental state information input from the environmental state information acquisition unit 152 to generate pseudo environmental state information. For example, the environmental state information input to the second learning unit 154 is time-series data such as the instantaneous wind speeds [10 m/s, 8 m/s, 12 m/s, ...] observed at each of times t_0 to t_N (N is an integer of 1 or more). Alternatively, the environmental state information may be a tuple composed of multiple elements, such as the instantaneous wind speed and wind direction at times t_0 to t_N and the installation height of the wireless communication device.

The second learning unit 154 learns an information generation policy, that is, how to generate pseudo environmental state information representing a new environmental state, for each combination of an input beam direction condition and an input environmental state information condition. As an example, given the condition that the instantaneous wind speeds at times t_0 to t_N acquired from the environmental state information acquisition unit 152 are [10 m/s, 8 m/s, 12 m/s, ...] and the condition on the beam direction indicated by the input beam direction information, the second learning unit 154 obtains the operation of doubling the wind speed. Using the obtained information generation policy, the second learning unit 154 generates the pseudo environmental state information [20 m/s, 16 m/s, 24 m/s, ...] for each time and outputs it to the first learning unit 153.
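The doubling example above can be sketched in a few lines of Python. This is only an illustration under assumptions: representing an information generation policy as a function over a list of samples, and the names `Policy` and `scale_policy`, are hypothetical and do not come from the specification.

```python
from typing import Callable, List

# Hypothetical representation: an information generation policy is a function
# that maps an observed time series to a pseudo time series.
Policy = Callable[[List[float]], List[float]]


def scale_policy(factor: float) -> Policy:
    """Return a policy that multiplies every sample by `factor`."""
    def apply(wind_speeds: List[float]) -> List[float]:
        return [factor * v for v in wind_speeds]
    return apply


observed = [10.0, 8.0, 12.0]          # instantaneous wind speed in m/s at t_0..t_2
pseudo = scale_policy(2.0)(observed)  # the "double the wind speed" operation
print(pseudo)  # [20.0, 16.0, 24.0]
```

Modeling the policy as a plain function keeps the sketch open to the other operations the description allows, not only scaling.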

The second learning unit 154 associates the beam direction condition and the environmental state information condition with the generated pseudo environmental state information and the information generation policy used to generate it, and writes them to the information generation policy table shown in FIG. 4. If the information generation policy table stored in the information generation policy storage unit 1541 contains no pseudo environmental state information corresponding to the current beam direction and previously experienced environmental state information, the second learning unit 154 can generate pseudo environmental state information by an arbitrary information generation policy (for example, multiplying the value indicated by the environmental state information by a random positive number) and output it to the first learning unit 153.
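A minimal sketch of the table lookup with a random fallback might look as follows. The table layout (a dictionary keyed by beam direction and environmental state), the function name `lookup_or_create`, and the multiplier range of 1.0 to 5.0 are all assumptions made for illustration; the specification only requires some arbitrary policy, such as multiplication by a random positive number.

```python
import random
from typing import Dict, Tuple

# Key: (beam direction in degrees, environmental state as a tuple of samples).
# Value: the multiplier used as the information generation policy (assumed form).
PolicyTable = Dict[Tuple[float, Tuple[float, ...]], float]


def lookup_or_create(table: PolicyTable, beam_deg: float,
                     env: Tuple[float, ...], rng: random.Random) -> float:
    """Return the stored multiplier, creating a random one for unseen keys."""
    key = (beam_deg, env)
    if key not in table:
        # Unseen combination: choose a random positive multiplier.
        # The range 1.0..5.0 is an arbitrary choice for this sketch.
        table[key] = rng.uniform(1.0, 5.0)
    return table[key]


table: PolicyTable = {}
rng = random.Random(0)
m = lookup_or_create(table, 30.0, (10.0, 8.0, 12.0), rng)
pseudo_env = [m * v for v in (10.0, 8.0, 12.0)]
```

A second call with the same beam direction and environmental state returns the stored multiplier rather than drawing a new one, which corresponds to reading an existing table entry.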

The environmental state information and the policy by which the second learning unit 154 generates pseudo environmental state information described above are only examples. Other environmental state information may be acquired, and pseudo environmental state information may be generated from environmental state information by any other information generation policy.

The second learning unit 154 outputs the generated pseudo environmental state information to the first learning unit 153. In the second learning mode, the first learning unit 153 operates as in the first learning mode, except that it uses the pseudo environmental state information input from the second learning unit 154 in place of the environmental state information input from the environmental state information acquisition unit 152.

If the combination of the beam direction and the pseudo environmental state information input from the second learning unit 154 has not been experienced, the first learning unit 153 determines, as described above, a beam control policy with an arbitrary angle correction amount within the angle range that can be set in the wireless communication unit 11. The first learning unit 153 writes the beam direction and the pseudo environmental state information to the beam control policy table in association with the determined beam control policy. In a subsequent first learning mode, when the first learning unit 153 receives environmental state information identical to previously input pseudo environmental state information, together with the same beam direction as when that pseudo environmental state information was input, it can read the corresponding beam control policy from the beam control policy table and control the beam direction immediately. In other words, the first learning unit 153 can learn in advance a beam control policy for an unexperienced environmental state, using that beam control policy as an initial value. If that environmental state occurs in the future, the learned beam control policy can be used to cope with the changing environment, avoiding beam direction control failures caused by a lack of learning.

If the combination of the current beam direction and the pseudo environmental state information input from the second learning unit 154 is already set in the beam control policy table, the first learning unit 153 outputs a beam direction control instruction to the wireless communication unit 11 based on the corresponding beam control policy, as in the first learning mode. In the second learning mode, however, the first learning unit 153 does not assign the first reward to that beam control policy and does not change it. This prevents a correctly learned beam control policy from being altered. Alternatively, the first learning unit 153 may notify the second learning unit 154 of the combinations of beam direction and environmental state information for which a beam control policy has been learned. When the second learning unit 154 determines that the combination of the current beam direction and the generated pseudo environmental state information has already been learned, it changes the information generation policy and generates different pseudo environmental state information.

<Example of a method for assigning the second reward>
The second learning unit 154 can observe the result of beam direction control performed with the generated pseudo environmental state information through the wireless communication quality information acquired from the wireless communication quality monitoring unit 12 in the next communication period. The second learning unit 154 determines the second reward according to the change in wireless communication quality information before and after the beam direction control based on the generated pseudo environmental state information, and records it in the second cumulative reward storage unit 1542. The second reward assigned to the information generation policy after pseudo environmental state information is generated must be set with the opposite objective to that of the first learning unit 153. That is, the control objective of the first learning unit 153 is to increase received power, whereas the objective of the second learning unit 154 is to decrease it, and the second reward must be determined accordingly. For example, if the received power after beam direction control based on the pseudo environmental state information generated by the second learning unit 154 has decreased by 3 dB or more compared with the received power before the control, the second reward is 100; if the decrease is at least 0 dB and less than 3 dB, the second reward is 1; and if the received power has not decreased, the second reward is 0. This method of assigning the second reward is only an example; the reward may be determined by other methods, such as a reward function generated by machine learning.
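The threshold rule in this example can be written as a small function. The sketch below is illustrative only: the function name `second_reward` and the use of dBm inputs are assumptions. The text leaves the case of exactly 0 dB ambiguous (it falls under both "at least 0 dB" and "has not decreased"); this sketch treats no change as "not decreased" and returns 0, which is one possible reading.

```python
def second_reward(power_before_dbm: float, power_after_dbm: float) -> int:
    """Reward for the second learning unit: larger when received power drops more."""
    # The difference of two dBm values is a ratio in dB.
    drop_db = power_before_dbm - power_after_dbm
    if drop_db >= 3.0:
        return 100   # decrease of 3 dB or more
    if drop_db > 0.0:
        return 1     # decrease of more than 0 dB and less than 3 dB (assumed boundary)
    return 0         # no decrease


cumulative = 0
for before, after in [(-60.0, -64.0), (-60.0, -61.5), (-60.0, -58.0)]:
    cumulative += second_reward(before, after)  # running sum of the second reward
print(cumulative)  # 101
```

The running sum at the end shows how the per-step values would accumulate into the second cumulative reward.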

The second learning unit 154 outputs the second reward, determined from the result of beam direction control based on the pseudo environmental state information, to the second cumulative reward storage unit 1542. The second cumulative reward storage unit 1542 adds the input second reward to the current value of the second cumulative reward information and updates the second cumulative reward information with the sum. Thus, each time the second learning unit 154 generates pseudo environmental state information, it obtains the result of the beam direction control based on that information and determines the second reward, and the second cumulative reward storage unit 1542 calculates and stores the cumulative sum.

<Updating the contents of the information generation policy storage unit 1541>
As described above, when new environmental state information is input, the second learning unit 154 can output pseudo environmental state information generated with an arbitrary information generation policy. However, pseudo environmental state information generated by such an arbitrary policy is not necessarily optimal for achieving the objective, so an information generation policy that achieves the objective must be learned through repeated trial and error. The second learning unit 154 therefore writes the second reward received after the previous beam direction control, which was performed based on the generated pseudo environmental state information, to the information generation policy table stored in the information generation policy storage unit 1541 shown in FIG. 4. If, for given pseudo environmental state information, the second reward received after the previous control is not the maximum value, the second learning unit 154 changes the information generation policy or the control range corresponding to that pseudo environmental state information. The second learning unit 154 then updates the information generation policy table stored in the information generation policy storage unit 1541 with the second reward assigned after beam direction control based on pseudo environmental state information generated with the changed information generation policy or control range.

An example in which the environmental state information is instantaneous wind speed, as shown in FIG. 4, is described. Under the initial state information generation policy, the second learning unit 154 multiplies the instantaneous wind speed at a given time by 2 and outputs the result as pseudo environmental state information. If the second learning unit 154 could not obtain the maximum reward with this pseudo environmental state information, then for the second generation it may multiply the instantaneous wind speed by 4, and so on, increasing the multiplier with the number of repetitions. The state information generation policy is not limited to this one; any calculation method or algorithm may be used to achieve the objective of the second learning unit 154.
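One way to read the "multiply by 2, then by 4, and so on" escalation is to double the multiplier each time the maximum reward was not obtained. The sketch below assumes that reading; the text only says the multiplier grows with the number of repetitions, so the doubling rule, the name `next_multiplier`, and the constant `MAX_REWARD = 100` (taken from the reward example) are illustrative assumptions.

```python
MAX_REWARD = 100  # maximum of the second reward in the example above


def next_multiplier(multiplier: float, last_reward: int) -> float:
    """Keep the multiplier if it earned the maximum reward, otherwise double it."""
    if last_reward == MAX_REWARD:
        return multiplier
    return multiplier * 2


m = 2.0                      # initial policy: multiply wind speed by 2
m = next_multiplier(m, 1)    # reward was not the maximum, so escalate
print(m)  # 4.0
```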

<Processing flow>
FIG. 5 is a flowchart showing an example of the operation of the beam direction control unit 15 in the first learning mode. The mode setting unit 151 starts the first learning mode (step S105). The first learning unit 153 receives beam direction information and wireless communication quality information from the wireless communication quality monitoring unit 12 and environmental state information from the environmental sensor 13 (step S110). The first learning unit 153 reads the beam direction control policy corresponding to the input environmental state information and beam direction information from the beam control policy table (step S115). If no such beam direction control policy exists, the first learning unit 153 determines an arbitrary beam direction control policy and writes the environmental state information and beam direction information to the beam control policy table in association with the determined policy. The first learning unit 153 outputs a beam direction control instruction based on the beam direction control policy to the wireless communication unit 11 (step S120). The wireless communication unit 11 changes the beam direction according to the beam direction control policy set in the control instruction and performs wireless communication in the next communication period with the changed beam direction.

The first learning unit 153 receives the post-instruction beam direction information and wireless communication quality information from the wireless communication quality monitoring unit 12, and the post-instruction environmental state information from the environmental state information acquisition unit 152 (step S125). The first learning unit 153 compares the wireless communication quality in the communication period that used the beam direction changed according to the control instruction output in the immediately preceding step S120 with the wireless communication quality in the preceding communication period, before the beam direction was changed. The first learning unit 153 determines the first reward according to the comparison result (step S130). The first learning unit 153 writes the determined first reward to the beam control policy table in association with the beam control policy used when the control instruction was output in the immediately preceding step S120 (step S135). The first learning unit 153 also outputs the determined first reward to the first cumulative reward storage unit 1532. The first cumulative reward storage unit 1532 updates the stored first cumulative reward by adding the input first reward to it (step S140).

The first learning unit 153 determines whether the determined first reward is the maximum value (step S145). If it is (step S145: YES), the first learning unit 153 proceeds to step S155. If it is not (step S145: NO), the first learning unit 153 changes the previous beam control policy set in the beam control policy table (step S150).

If the first learning unit 153 determines that the first learning mode has not ended (step S155: NO), it repeats the processing from step S115. If it determines that the first learning mode has ended (step S155: YES), it ends the processing of FIG. 5.

The processing of steps S145 to S150 may be performed at a timing independent of the processing of steps S105 to S140 and step S155. In this case, the first learning unit 153 detects, from the beam control policy table, a beam control policy whose previous reward is not the maximum value, and performs step S150 on the detected beam control policy.
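One pass of steps S115 to S150 of the first learning mode can be sketched as below. Several parts are assumptions for illustration: the table layout, the `measure` callback standing in for the radio and the wireless communication quality monitoring unit, the first-reward thresholds (taken as the mirror image of the second-reward example), and the re-draw of a random correction as the way to "change" a policy in step S150. The cumulative reward update of step S140 is omitted.

```python
import random
from typing import Callable, Dict, Tuple

MAX_REWARD = 100
State = Tuple[float, float]  # (beam direction in degrees, wind speed in m/s)


def first_reward(before_dbm: float, after_dbm: float) -> int:
    # Assumed thresholds: mirror image of the second-reward example.
    gain_db = after_dbm - before_dbm
    if gain_db >= 3.0:
        return MAX_REWARD
    return 1 if gain_db > 0.0 else 0


def first_mode_step(table: Dict[State, Dict[str, float]], state: State,
                    measure: Callable[[float], float],
                    max_correction_deg: float, rng: random.Random) -> int:
    # S115: read the policy; if none exists, pick an arbitrary correction.
    if state not in table:
        table[state] = {
            "correction_deg": rng.uniform(-max_correction_deg, max_correction_deg),
            "reward": 0,
        }
    entry = table[state]
    before_dbm = measure(0.0)                     # quality before the change
    after_dbm = measure(entry["correction_deg"])  # S120-S125: apply and observe
    reward = first_reward(before_dbm, after_dbm)  # S130
    entry["reward"] = reward                      # S135
    if reward != MAX_REWARD:                      # S145: NO
        # S150: change the policy, here by re-drawing the correction.
        entry["correction_deg"] = rng.uniform(-max_correction_deg, max_correction_deg)
    return reward
```

A caller would invoke `first_mode_step` once per communication period until the mode ends (step S155), passing a `measure` function that returns the received power in dBm for a given angle correction.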

FIG. 6 is a flowchart showing an example of the operation of the beam direction control unit 15 in the second learning mode. The mode setting unit 151 starts the second learning mode (step S205). The first learning unit 153 and the second learning unit 154 receive beam direction information and wireless communication quality information from the wireless communication quality monitoring unit 12, and the second learning unit 154 receives, from the environmental state information acquisition unit 152, the environmental state information output by the environmental sensor 13 (step S210).

The second learning unit 154 reads the information generation policy corresponding to the environmental state information and beam direction information from the information generation policy table stored in the information generation policy storage unit 1541 (step S215). The second learning unit 154 generates pseudo environmental state information from the environmental state information according to the read information generation policy (step S220). If no information generation policy corresponding to the environmental state information and beam direction information is set in the information generation policy table, the second learning unit 154 determines an arbitrary information generation policy. The second learning unit 154 writes the environmental state information and beam direction information, the generated pseudo environmental state information, and the determined information generation policy to the information generation policy table in association with one another. The second learning unit 154 outputs the generated pseudo environmental state information to the first learning unit 153 (step S225). The first learning unit 153 performs steps S115 to S120 shown in FIG. 5 using the pseudo environmental state information in place of the environmental state information (step S230). The wireless communication unit 11 changes the beam direction according to the beam direction control policy set in the control instruction output by the first learning unit 153 in step S230, and performs wireless communication in the next communication period with the changed beam direction.

The first learning unit 153 and the second learning unit 154 receive the post-instruction beam direction information and wireless communication quality information from the wireless communication quality monitoring unit 12, and the second learning unit 154 receives the post-instruction environmental state information from the environmental state information acquisition unit 152 (step S235). The first learning unit 153 performs steps S125 to S155 shown in FIG. 5. In step S125, however, no environmental state information is input to the first learning unit 153. In addition, if the beam control policy corresponding to the beam direction and the pseudo environmental state information has already been learned, the first learning unit 153 does not perform steps S130 to S155.

The second learning unit 154 compares the wireless communication quality in the communication period that used the beam direction changed according to the control instruction output by the first learning unit 153 in the immediately preceding step S230 with the wireless communication quality in the preceding communication period, before the beam direction was changed. The second learning unit 154 determines the second reward according to the comparison result (step S240). The second learning unit 154 writes the determined second reward to the information generation policy table in association with the information generation policy used when the pseudo environmental state information was generated in the immediately preceding step S220 (step S245). The second learning unit 154 also outputs the determined second reward to the second cumulative reward storage unit 1542. The second cumulative reward storage unit 1542 updates the stored second cumulative reward by adding the input second reward to it (step S250).

The second learning unit 154 determines whether the determined second reward is the maximum value (step S255). If it is (step S255: YES), the second learning unit 154 proceeds to step S265. If it is not (step S255: NO), the second learning unit 154 changes the previous state information generation policy set in the information generation policy table (step S260).

If the second learning unit 154 determines that the second learning mode has not ended (step S265: NO), it repeats the processing from step S215. If it determines that the second learning mode has ended (step S265: YES), it ends the processing of FIG. 6.

The processing of steps S255 to S260 may be performed at a timing independent of the processing of steps S205 to S250 and step S265. In this case, the second learning unit 154 detects, from the information generation policy table, an information generation policy whose previous reward is not the maximum value, and changes the detected information generation policy.
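The second-learning-mode loop of FIG. 6 can be sketched from the point of view of the second learning unit. This is a rough illustration under assumptions: the policy is reduced to a single multiplier starting at 2, the first learning unit and the radio are hidden behind a hypothetical `control_and_measure` callback that returns received power before and after the beam change, and the 0 dB boundary of the reward is treated as "no decrease".

```python
from typing import Callable, Dict, List, Tuple

MAX_REWARD = 100
Key = Tuple[float, Tuple[float, ...]]  # (beam direction, environmental state)


def second_mode_step(gen_table: Dict[Key, Dict[str, float]], beam_deg: float,
                     env: List[float],
                     control_and_measure: Callable[[float, List[float]],
                                                   Tuple[float, float]]) -> int:
    """One pass of steps S215 to S260 for the second learning unit."""
    key = (beam_deg, tuple(env))
    # S215: read the policy; an unseen key starts with the multiplier 2 (assumed).
    entry = gen_table.setdefault(key, {"multiplier": 2.0, "reward": 0})
    pseudo_env = [entry["multiplier"] * v for v in env]       # S220
    # S225-S235: the first learning unit steers the beam using pseudo_env and
    # the received power before and after the change is observed.
    before_dbm, after_dbm = control_and_measure(beam_deg, pseudo_env)
    drop_db = before_dbm - after_dbm
    reward = MAX_REWARD if drop_db >= 3.0 else (1 if drop_db > 0.0 else 0)  # S240
    entry["reward"] = reward                                   # S245
    if reward != MAX_REWARD:                                   # S255: NO
        entry["multiplier"] *= 2                               # S260 (assumed rule)
    return reward
```

With a stub callback in which the power drop grows with the pseudo wind speed, the first call escalates the multiplier and a later call reaches the maximum reward, after which the stored policy is left unchanged.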

According to this embodiment, even under conditions where the external factors affecting wireless communication quality become complex, a suitable learning device makes it possible to realize a beam direction control method that responds to fluctuations in those complex external factors (the environmental state around the wireless communication device). Consequently, even when the wireless communication device is placed in an environment where complex external factors fluctuate, the number of failures of learning-based beam direction control can be reduced.

The wireless communication device may include a beam direction control device having the beam direction control unit 15, either internally or externally.

The functions of the beam direction control unit 15 of the wireless communication device 1 in the embodiment described above may be realized by a computer. In that case, a program for realizing these functions may be recorded on a computer-readable recording medium, and the functions may be realized by having a computer system read and execute the program recorded on the medium. The "computer system" here includes an OS and hardware such as peripheral devices. A "computer-readable recording medium" refers to a portable medium such as a flexible disk, magneto-optical disk, ROM, or CD-ROM, or a storage device such as a hard disk built into a computer system. A "computer-readable recording medium" may also include a medium that holds the program dynamically for a short time, such as a communication line used when the program is transmitted over a network such as the Internet or a communication line such as a telephone line, and a medium that holds the program for a certain period, such as volatile memory inside a computer system serving as the server or client in that case. The program may realize only part of the functions described above, or may realize them in combination with a program already recorded in the computer system.

An example hardware configuration of the wireless communication device 1 is described. FIG. 7 is a device configuration diagram showing an example of the hardware configuration of the wireless communication device 1. The wireless communication device 1 includes a processor 71, a storage unit 72, a communication interface 73, a user interface 74, and a sensor 75.

The processor 71 is a central processing unit that performs computation and control; it is, for example, a CPU. The processor 71 reads a program from the storage unit 72 and executes it. The storage unit 72 also provides a work area used when the processor 71 executes various programs. The communication interface 73 connects communicably to other devices. The user interface 74 comprises input devices such as DIP switches and buttons, and display devices such as lamps and displays. Manual operations are input through the user interface 74. The sensor 75 detects or acquires environmental state information.

The functions of the wireless communication quality monitoring unit 12 and the beam direction control unit 15 are realized by the processor 71 reading a program from the storage unit 72 and executing it. All or part of these functions may instead be realized using hardware such as an ASIC (Application Specific Integrated Circuit), a PLD (Programmable Logic Device), or an FPGA (Field Programmable Gate Array). The wireless communication unit 11 is realized by the communication interface 73. The communication interface 73 may also provide communication with a PC or the like over a network. The environmental sensor 13 is realized by one or more sensors 75. Some functions of the wireless communication unit 11 and the environmental sensor 13 may be realized by the processor 71 reading a program from the storage unit 72 and executing it.

According to the embodiment described above, a wireless communication device capable of controlling its beam direction includes a wireless communication unit, a sensor, a wireless communication quality monitoring unit, and a beam direction control unit. The wireless communication unit performs wireless communication by forming a beam. The sensor acquires environmental state information, which is information about the installation environment of the device itself. The sensor is, for example, the environmental sensor of the embodiment. The wireless communication quality monitoring unit acquires wireless communication quality information indicating the quality of the wireless communication performed by the wireless communication unit. The beam direction control unit outputs a beam direction control instruction to the wireless communication unit. The beam direction control unit includes a first learning unit, a second learning unit, and a switching unit. The first learning unit uses the environmental state information and the wireless communication quality information from before and after the beam direction is controlled to learn a beam control policy, which indicates a method of controlling the beam direction that improves the quality of wireless communication according to the environmental state information. Based on the learning result, the first learning unit determines a beam control policy according to the environmental state information and outputs a beam direction control instruction that follows the determined beam control policy to the wireless communication unit. The second learning unit learns an information generation policy that generates environmental state information that degrades the quality of wireless communication. For this learning it uses environmental state information generated according to an information generation policy, which indicates a computation for generating environmental state information, together with the wireless communication quality information from before and after the beam direction is controlled based on the control instruction that the first learning unit output in response to the generated environmental state information. The second learning unit generates environmental state information based on the learned information generation policy. The switching unit switches between the environmental state information acquired by the sensor and the environmental state information generated by the second learning unit as the input to the first learning unit.
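
The role of the switching unit can be illustrated with a minimal Python sketch. It is only an illustration under stated assumptions, not the implementation of the embodiment: the function names `select_input` and `perturb`, the mode strings, and the fixed offset used as a stand-in for an information generation policy are all hypothetical.

```python
from typing import Callable, List

# Environmental state as a list of numbers (contents are assumed, e.g. sway or wind readings).
EnvState = List[float]


def select_input(mode: str, sensed: EnvState,
                 generate: Callable[[EnvState], EnvState]) -> EnvState:
    """Hypothetical switching unit: choose what the first learning unit receives."""
    if mode == "sensor":
        # Pass the sensor reading through unchanged.
        return sensed
    if mode == "generated":
        # Pass the state produced by the second learning unit instead.
        return generate(sensed)
    raise ValueError(f"unknown mode: {mode}")


def perturb(sensed: EnvState) -> EnvState:
    """Stand-in for an information generation policy: add a fixed offset (assumed)."""
    return [x + 0.5 for x in sensed]


sensed = [1.0, 2.0]
normal_input = select_input("sensor", sensed, perturb)
adversarial_input = select_input("generated", sensed, perturb)
```

In this sketch the first learning unit never needs to know which source was selected; it only sees an environmental state of the same shape in both modes.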

The first learning unit may learn a beam control policy that improves the quality of wireless communication according to both the environmental state information and the beam direction. For this learning it uses the input environmental state information, the beam direction currently formed by the wireless communication unit, and the change in wireless communication quality obtained by comparing the wireless communication quality information from before and after the beam direction is changed. In this case, the first learning unit determines, based on the learning result, a beam control policy according to the environmental state information and the beam direction formed by the wireless communication unit, and outputs a beam direction control instruction that follows the determined beam control policy to the wireless communication unit.

The first learning unit may also assign a first reward to a beam control policy according to the change in wireless communication quality between before and after the beam direction is changed by the control instruction output under that beam control policy, and may change the selected beam control policy based on the first reward.
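
As a rough illustration of reward assignment in the first learning unit, the sketch below accumulates a reward per pair of environmental state and beam control policy and selects the policy with the highest accumulated reward. Several points are assumptions rather than details taken from the embodiment: the class name `BeamPolicyLearner`, the policy labels, the purely greedy selection, and the definition of the reward as the raw difference in a quality value (for example received power in dBm).

```python
from collections import defaultdict
from typing import Dict, Hashable, List, Tuple


class BeamPolicyLearner:
    """Hypothetical first learning unit with a table of cumulative rewards."""

    def __init__(self, policies: List[str]) -> None:
        self.policies = policies
        # Cumulative first reward for each (environmental state, policy) pair.
        self.cumulative: Dict[Tuple[Hashable, str], float] = defaultdict(float)

    def choose(self, state: Hashable) -> str:
        # Greedy choice; ties are resolved in favor of the earlier list entry.
        return max(self.policies, key=lambda p: self.cumulative[(state, p)])

    def reward(self, state: Hashable, policy: str,
               quality_before: float, quality_after: float) -> float:
        # Assumed reward: positive when quality improved after the beam change.
        r = quality_after - quality_before
        self.cumulative[(state, policy)] += r
        return r


learner = BeamPolicyLearner(["steer_left", "hold", "steer_right"])
first_choice = learner.choose("windy")
first_reward = learner.reward("windy", first_choice, -60.0, -65.0)
second_choice = learner.choose("windy")
```

Here the first policy tried makes the quality worse, so its accumulated reward becomes negative and the next selection moves to a different policy. A practical learner would normally add exploration and discounting, which are omitted to keep the sketch short.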

The second learning unit may learn an information generation policy that degrades the quality of wireless communication according to the environmental state information and the beam direction. For this learning it uses two inputs. The first is environmental state information generated by applying, to the environmental state information acquired by the sensor, the computation indicated by the information generation policy that corresponds to that environmental state information and the beam direction formed by the wireless communication unit. The second is the change in wireless communication quality obtained by comparing the wireless communication quality information from before and after the beam direction is controlled based on the control instruction that the first learning unit output in response to the generated environmental state information. In this case, the second learning unit determines, based on the learning result, an information generation policy according to the environmental state information and the beam direction formed by the wireless communication unit, and generates the environmental state information to be input to the first learning unit by applying the computation indicated by the determined information generation policy to the environmental state information.

The second learning unit may also assign a second reward to an information generation policy according to the change in wireless communication quality between before and after the beam direction is changed by the control instruction that the first learning unit output in response to the environmental state information generated by the computation indicated by that information generation policy, and may change the selected information generation policy based on the second reward.
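
The second reward can be sketched in the same style. The example below treats each information generation policy as a fixed offset added to the sensed values and rewards a policy when the quality drops after the resulting beam change. This is a simplified sketch under assumptions: the class name `InfoGenerationLearner`, the offset-based policies, the greedy selection, and the sign convention of the reward are hypothetical, and the dependence on the current beam direction is left out.

```python
from typing import Dict, List


class InfoGenerationLearner:
    """Hypothetical second learning unit with one cumulative reward per policy."""

    def __init__(self, offsets: List[float]) -> None:
        self.offsets = offsets
        # Cumulative second reward for each candidate information generation policy.
        self.cumulative: Dict[float, float] = {o: 0.0 for o in offsets}

    def choose(self) -> float:
        # Greedy choice of the policy that has degraded quality the most so far.
        return max(self.offsets, key=lambda o: self.cumulative[o])

    def generate(self, sensed: List[float], offset: float) -> List[float]:
        # The "computation indicated by the policy" is assumed to be a simple offset.
        return [x + offset for x in sensed]

    def reward(self, offset: float,
               quality_before: float, quality_after: float) -> float:
        # Assumed reward: positive when quality fell after the beam change.
        r = quality_before - quality_after
        self.cumulative[offset] += r
        return r


adversary = InfoGenerationLearner([0.0, 0.5, -0.5])
generated = adversary.generate([1.0], 0.5)
second_reward = adversary.reward(0.5, -60.0, -66.0)
preferred = adversary.choose()
```

With this sign convention the two learning units pursue opposite goals on the same quality measurement, which matches the description of the second learning unit as learning to degrade the quality that the first learning unit tries to improve.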

Embodiments of the present invention have been described above in detail with reference to the drawings. The specific configuration is not limited to these embodiments and also covers designs and the like that do not depart from the gist of the invention.

Description of symbols: 1 … wireless communication device, 11 … wireless communication unit, 12 … wireless communication quality monitoring unit, 13 … environmental sensor, 15 … beam direction control unit, 71 … processor, 72 … storage unit, 73 … communication interface, 74 … user interface, 75 … sensor, 91 … wireless communication device, 92 … utility pole, 93 … utility pole, 94 … overhead wire, 95 … wireless communication device, 96 … building, 97 … pedestrian, 98-1, 98-2 … access point, 99 … terminal station, … route, 151 … mode setting unit, 152 … environmental state information acquisition unit, 153 … first learning unit, 154 … second learning unit, 1531 … beam control policy storage unit, 1532 … first cumulative reward storage unit, 1541 … information generation policy storage unit, 1542 … second cumulative reward storage unit

Claims

A wireless communication device capable of controlling a beam direction, comprising:
a wireless communication unit that performs wireless communication by forming a beam;
a sensor that acquires environmental state information, which is information about the installation environment of the device itself;
a wireless communication quality monitoring unit that acquires wireless communication quality information indicating the quality of wireless communication by the wireless communication unit; and
a beam direction control unit that outputs a beam direction control instruction to the wireless communication unit,
wherein the beam direction control unit comprises:
a first learning unit that, using the environmental state information and the wireless communication quality information before and after the beam direction is controlled, learns a beam control policy indicating a method of controlling the beam direction that improves the quality of wireless communication according to the environmental state information, determines the beam control policy according to the environmental state information based on the learning result, and outputs a beam direction control instruction in accordance with the determined beam control policy to the wireless communication unit;
a second learning unit that, using environmental state information generated according to an information generation policy indicating a computation for generating environmental state information, and the wireless communication quality information before and after the beam direction is controlled based on the control instruction output by the first learning unit in response to the generated environmental state information, learns an information generation policy that generates environmental state information that degrades the quality of wireless communication, and generates environmental state information based on the learned information generation policy; and
a switching unit that switches which of the environmental state information acquired by the sensor and the environmental state information generated by the second learning unit is input to the first learning unit.

The wireless communication device according to claim 1, wherein the first learning unit uses the input environmental state information, the beam direction formed by the wireless communication unit, and a change in the quality of wireless communication obtained by comparing the wireless communication quality information before and after the beam direction is controlled, to learn a beam control policy that improves the quality of wireless communication according to the input environmental state information and the beam direction formed by the wireless communication unit, determines, based on the learning result, the beam control policy according to the input environmental state information and the beam direction formed by the wireless communication unit, and outputs a beam direction control instruction in accordance with the determined beam control policy to the wireless communication unit.

The wireless communication device according to claim 2, wherein the first learning unit assigns a first reward to the beam control policy according to a change in the quality of wireless communication before and after the beam direction is changed by the control instruction output in accordance with the beam control policy, and changes the selected beam control policy based on the first reward.

The wireless communication device according to claim 2 or claim 3, wherein the second learning unit uses environmental state information generated by applying, to the environmental state information acquired by the sensor, the computation indicated by the information generation policy corresponding to that environmental state information and the beam direction formed by the wireless communication unit, and a change in the quality of wireless communication obtained by comparing the wireless communication quality information before and after the beam direction is controlled based on the control instruction output by the first learning unit in response to the generated environmental state information, to learn an information generation policy that degrades the quality of wireless communication according to the environmental state information acquired by the sensor and the beam direction formed by the wireless communication unit, determines, based on the learning result, the information generation policy according to the environmental state information acquired by the sensor and the beam direction formed by the wireless communication unit, and generates the environmental state information to be input to the first learning unit by applying the computation indicated by the determined information generation policy to the environmental state information acquired by the sensor.

The wireless communication device according to claim 4, wherein the second learning unit assigns a second reward to the information generation policy according to a change in the quality of wireless communication before and after the beam direction is changed by the control instruction output by the first learning unit in response to the environmental state information generated by the computation indicated by the information generation policy, and changes the selected information generation policy based on the second reward.

A beam direction control device comprising:
a first learning unit that, using environmental state information, which is information about the installation environment of a wireless communication device capable of controlling a beam direction, and wireless communication quality information, which indicates the quality of wireless communication before and after the beam direction of the wireless communication device is controlled, learns a beam control policy indicating a method of controlling the beam direction that improves the quality of wireless communication according to the environmental state information, determines the beam control policy according to the environmental state information based on the learning result, and outputs a beam direction control instruction in accordance with the determined beam control policy to the wireless communication device;
a second learning unit that, using environmental state information generated according to an information generation policy indicating a computation for generating environmental state information, and the wireless communication quality information before and after the beam direction is controlled based on the control instruction output by the first learning unit in response to the generated environmental state information, learns an information generation policy that generates environmental state information that degrades the quality of wireless communication, and generates environmental state information based on the learned information generation policy; and
a switching unit that switches which of the environmental state information acquired by a sensor of the wireless communication device and the environmental state information generated by the second learning unit is input to the first learning unit.

A beam direction control method executed by a wireless communication device capable of controlling a beam direction, the method comprising:
a communication step in which a wireless communication unit performs wireless communication by forming a beam;
an environmental state information acquisition step in which a sensor acquires environmental state information, which is information about the installation environment of the wireless communication device;
a wireless communication quality information acquisition step in which a wireless communication quality monitoring unit acquires wireless communication quality information indicating the quality of wireless communication by the wireless communication unit; and
a beam direction control step in which a beam direction control unit outputs a beam direction control instruction to the wireless communication unit,
wherein the beam direction control step comprises:
a first learning step of, using the environmental state information and the wireless communication quality information before and after the beam direction is controlled, learning a beam control policy indicating a method of controlling the beam direction that improves the quality of wireless communication according to the environmental state information, determining the beam control policy according to the environmental state information based on the learning result, and outputting a beam direction control instruction in accordance with the determined beam control policy to the wireless communication unit;
a second learning step of, using environmental state information generated according to an information generation policy indicating a computation for generating environmental state information, and the wireless communication quality information before and after the beam direction is controlled based on the control instruction output in the first learning step in response to the generated environmental state information, learning an information generation policy that generates environmental state information that degrades the quality of wireless communication, and generating environmental state information based on the learned information generation policy; and
a switching step of switching which of the environmental state information acquired in the environmental state information acquisition step and the environmental state information generated in the second learning step is used in the first learning step.

A program for causing a computer to function as the beam direction control device according to claim 6.