JP2010263442A

JP2010263442A - Optical path setting method, and optical information communication system

Info

Publication number: JP2010263442A
Application number: JP2009113071A
Authority: JP
Inventors: Takuji Tachibana; 拓至橘; Itsumi Koyanagi; 衣津美小柳
Original assignee: Nara Institute of Science and Technology NUC
Current assignee: Nara Institute of Science and Technology NUC
Priority date: 2009-05-07
Filing date: 2009-05-07
Publication date: 2010-11-18

Abstract

PROBLEM TO BE SOLVED: To provide an optical path setting method for multiple on-demand service classes that achieves both service differentiation in optical path setting and efficient use of wavelengths (optical paths), and that can be used in real environments. A further object is to provide an optical information communication system based on this method.

SOLUTION: The optical path setting table or link information specifies, for each service class, a reward obtained when an optical path is set up. It also specifies a cost incurred when optical path setting fails. Each class's optical path setting request is accepted or rejected so as to maximize a reward function computed from the rewards and the cost. A reinforcement learning algorithm based on this reward function finds the optimal action at the transmitting node for moving from the current link state to the next link state.

COPYRIGHT: (C)2011,JPO&INPIT

Description

The present invention relates to an optical path setting method for a WDM (Wavelength Division Multiplexing) network that transmits data over optical paths.

In an optical network using WDM technology, data is transmitted over a wavelength (optical path) set up between a transmitting node and a receiving node. The amount of data each user transmits over the Internet has grown dramatically in recent years, and each user is expected to transmit data at around 1 Gbit/s in the future. Methods that set up optical paths on demand to carry large volumes of data are therefore being studied.

For on-demand optical path setting, researchers have studied methods that fix in advance the maximum number of optical paths each service class may set up. These limits let users be offered multiple optical path setting services priced according to usage charges (see, for example, Patent Document 1 and Patent Document 2).
In these methods, the maximum number of optical paths for a high-priority class is set larger than that for a low-priority class. However, a low-priority class cannot set up more paths than its fixed limit, even when many wavelengths are free. Consequently, when high-priority requests arrive less often than low-priority requests, wavelength utilization drops and wavelength resources are not used effectively.

To solve this problem, a method using a Markov Decision Process (MDP) has been proposed. This method determines in advance which classes' optical path setting requests should be accepted, depending on how many optical paths are currently set. The MDP derives the optimal action each node should take, and each node accepts or rejects optical path setting requests according to that action. The optimal action depends on the reward function of the MDP.

However, the MDP-based method has two drawbacks. First, its reward function does not converge when the number of wavelengths (optical paths) between the transmitting and receiving nodes becomes large, so the method is hard to use in real environments, that is, in large-scale networks. Second, it requires information such as the topology of the entire network and the number of wavelengths on each link, so it cannot be used in a distributed environment.

Patent Document 1: JP 2006-279767 A
Patent Document 2: JP 2008-17256 A

As described above, conventional technologies have problems when users set up optical paths on demand to transmit large volumes of data. They either emphasize service differentiation too heavily or lack flexibility, so wavelength resources (optical path channel resources) cannot be used effectively. In addition, the calculation methods they rely on cannot be applied once the network becomes large.
Furthermore, the conventional methods cannot set up optical paths by exploiting collectable information such as the network topology, the number of usable wavelengths, and traffic information.

In view of the above, an object of the present invention is to provide an optical path setting method for multiple on-demand service classes, and an optical information communication system, that:
- achieve service differentiation in optical path setting,
- use wavelengths (optical paths) efficiently, and
- can be used in real environments.

Another object of the present invention is to provide an optical path setting method that can exploit collectable information such as the network topology and traffic information. A further object is to provide the same benefits in a distributed environment where the network topology and traffic information cannot be obtained.
The present invention is also applicable when the wavelength conversion capability of the nodes in the network is limited.

To achieve the above objects, an optical path setting method according to a first aspect of the present invention sets up optical paths with a plurality of service classes from a transmitting node to a receiving node in a wavelength division multiplexing network.

At each node, a reward obtained when an optical path is set up is specified for each service class, together with a cost incurred when optical path setting fails. The method then proceeds as follows:
- A Q function is updated using a reward function derived from these rewards and the cost.
- Whether to accept each class's optical path setting request is decided by selecting the action whose Q value is largest for the current state.
- The Q function for the transition from the current link state to the next link state at the transmitting node is updated based on the reward function.

With this configuration, an optical network using WDM technology can offer users multiple on-demand optical path setting services with different service qualities while making effective use of limited wavelength resources.
The present invention is characterized by using reinforcement learning to set up optical paths in an optical network using WDM technology. At each node, a reward obtained when an optical path is set up is specified for each service class, together with a cost incurred when optical path setting fails. Each class's optical path setting request is accepted or rejected so that the reward function derived from these values is maximized.

How the optimal optical path setting behavior (optimal action) is acquired depends on the environment:
- In a centralized management environment where network information is available, it can be acquired in advance, for example by running Monte Carlo simulations in a pseudo network environment.
- In a distributed environment, no network information is available; each node knows only the number of wavelengths on its adjacent links and the number of service classes to be provided. Each node then gradually learns the optimal action while actually operating and providing service.

With the acquired optimal action, each node can provide optical path setting services at multiple quality levels and use wavelengths effectively, without relying on global optical path setting information.
Reinforcement learning also avoids a problem of the Markov decision process (MDP) approach, whose reward function fails to converge when the number of wavelengths (optical paths) between the transmitting and receiving nodes becomes large. The method can therefore be used in real environments where the number of usable wavelengths is, for example, 10 or more.

Consider discrete time in which each arrival of an optical path setting request is one time step. The link state is then defined by Equation 1 below using two quantities:
- $N_i(t)$, the number of optical paths set for service class $i$ (out of $M$ classes) at time $t$;
- $I(t-1)$, the acceptance result of the optical path setting request at the previous time $t-1$.

(Equation 1)
$$s_t = \left(N_1(t), N_2(t), \ldots, N_M(t), I(t-1)\right)$$
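As a minimal sketch, not part of the original disclosure, the link state of Equation 1 can be held as a hashable Python tuple so that it can serve as a key of a Q table. The names `make_state` and `num_states` are hypothetical, and class indices are 0-based. The state count assumes a single link with W wavelengths, where each optical path occupies one wavelength.

```python
from math import comb
from typing import Sequence, Tuple

State = Tuple[Tuple[int, ...], int]  # ((N_1(t), ..., N_M(t)), I(t-1))

def make_state(counts: Sequence[int], last_rejected: int) -> State:
    """Build s_t = (N_1(t), ..., N_M(t), I(t-1)); index 0 holds class 1."""
    if last_rejected not in (0, 1):
        raise ValueError("I(t-1) must be 1 (failed) or 0 (succeeded)")
    if any(n < 0 for n in counts):
        raise ValueError("optical path counts must be non-negative")
    return (tuple(counts), last_rejected)

def num_states(num_wavelengths: int, num_classes: int) -> int:
    """Count link states, assuming sum_i N_i(t) <= W on a single link.

    Vectors of M non-negative counts summing to at most W: C(W + M, M);
    the factor 2 accounts for I(t-1) in {0, 1}.
    """
    return comb(num_wavelengths + num_classes, num_classes) * 2

s_t = make_state([2, 1, 0], 1)  # two class-1 paths, one class-2 path, last request failed
print(s_t, num_states(10, 3))   # ((2, 1, 0), 1) 572
```

Under this assumption, 10 wavelengths and three classes give only 572 states, so a tabular Q function stays small at that scale.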

The reward function is defined by Equation 2 below. It is evaluated when the link moves from the current state $s_t$, by taking action $a_t$, to the next state $s_{t+1} = (N_1(t+1), N_2(t+1), \ldots, N_M(t+1), I(t))$. The parameters are constrained as follows:
- $R_i$ is greater than 0 and at most 1, with $R_j < R_i$ whenever $i < j$.
- $C$ is greater than 0.
- $\beta$ lies between 0 and 1 inclusive.

(Equation 2)
$$r_{t+1} = \beta\left(R_1 N_1(t) + R_2 N_2(t) + \cdots + R_M N_M(t)\right) - (1-\beta)\,C\,I(t)$$
where $0 < R_M < R_{M-1} < \cdots < R_i < \cdots < R_2 < R_1 \le 1$, $0 \le \beta \le 1$, and $0 < C$.
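Equation 2 translates directly into a small function. This is a hedged sketch: the function name and the example reward values are hypothetical, and the parameter checks mirror the constraints stated with Equation 2.

```python
from typing import Sequence

def reward(counts: Sequence[int], rejected: int, R: Sequence[float],
           beta: float, C: float) -> float:
    """r_{t+1} = beta * sum_i R_i N_i(t) - (1 - beta) * C * I(t)  (Equation 2).

    counts: N_i(t) per class (index 0 = class 1); rejected: I(t), 1 if request t failed.
    """
    if len(counts) != len(R):
        raise ValueError("need one reward R_i per service class")
    if not (0 < R[-1] and R[0] <= 1 and all(R[i] > R[i + 1] for i in range(len(R) - 1))):
        raise ValueError("require 0 < R_M < ... < R_1 <= 1")
    if (not 0 <= beta <= 1) or C <= 0:
        raise ValueError("require 0 <= beta <= 1 and C > 0")
    occupancy = sum(r * n for r, n in zip(R, counts))  # class-weighted path count
    return beta * occupancy - (1 - beta) * C * rejected

R = (1.0, 0.5, 0.25)  # hypothetical class rewards, class 1 highest
print(reward((2, 1, 0), 0, R, beta=0.5, C=2.0))  # 1.25: request t accepted
print(reward((2, 1, 0), 1, R, beta=0.5, C=2.0))  # 0.25: request t rejected
```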

Based on the reward function, the action value function (Q function) of action $a_t$ in state $s_t$ is updated according to Equation 3 below. The action with the largest Q value for state $s_t$ is taken as the optimal action, which decides whether the optical path setting request is accepted.

(Equation 3)
$$Q(s_t, a_t) \leftarrow Q(s_t, a_t) + \alpha\left[r_{t+1} + \gamma \max_{a} Q(s_{t+1}, a) - Q(s_t, a_t)\right]$$
where $0 < \alpha \le 1$ and $0 \le \gamma < 1$.
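Equation 3 is the standard tabular Q-learning update, and the greedy choice described above is an argmax over the actions. A minimal sketch follows. The function names, state tuples, and action labels are hypothetical, and unseen state-action pairs are assumed to start at a Q value of 0.

```python
from collections import defaultdict

def q_update(Q, s, a, r_next, s_next, actions, alpha, gamma):
    """Apply Equation 3 once and return the new Q(s_t, a_t).

    Q must be a defaultdict(float) keyed by (state, action).
    """
    if not (0 < alpha <= 1 and 0 <= gamma < 1):
        raise ValueError("require 0 < alpha <= 1 and 0 <= gamma < 1")
    best_next = max(Q[(s_next, b)] for b in actions)  # max_a Q(s_{t+1}, a)
    Q[(s, a)] += alpha * (r_next + gamma * best_next - Q[(s, a)])
    return Q[(s, a)]

def greedy_action(Q, s, actions):
    """Optimal action for state s: the action with the largest Q value."""
    return max(actions, key=lambda a: Q[(s, a)])

Q = defaultdict(float)               # unseen (state, action) pairs start at 0
s0, s1 = ((0, 0), 0), ((1, 0), 0)    # hypothetical states, two classes
acts = ["accept 1", "accept 1+2"]    # hypothetical action labels
q_update(Q, s0, "accept 1+2", 1.0, s1, acts, alpha=0.5, gamma=0.9)
print(greedy_action(Q, s0, acts))    # accept 1+2
```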

Defining the link state and the reward function as above lets the ease of setting up optical paths vary by service class, while free wavelengths are used as effectively as possible.
Here, $R_i$ is the reward, specified per service class, that a node obtains when an optical path is set up, and $C$ is the cost incurred when optical path setting fails. The parameter $\beta$ balances service differentiation in optical path setting against efficient use of wavelengths (optical paths).

The parameter $\alpha$ is the learning rate:
- $\alpha = 0$ is excluded because no learning would take place.
- With $\alpha = 1$, only the most recent information is considered.

The parameter $\gamma$ is the discount rate. It indicates how strongly information beyond the current reward (the value estimated for the next state, built up from earlier learning) influences the update:
- With $\gamma = 0$, only the current reward is considered.
- $\gamma = 1$ is excluded because such information would be weighted too heavily for too long.
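The boundary cases of α and γ can be checked numerically. In this sketch the helper names are hypothetical. `past_weight` gives the weight that a target seen k updates ago still carries after repeated updates of the same state-action pair with a constant α, ignoring the contribution of the initial Q value.

```python
def td_target(r_next: float, best_next_q: float, gamma: float) -> float:
    """Bracketed target of Equation 3: r_{t+1} + gamma * max_a Q(s_{t+1}, a)."""
    return r_next + gamma * best_next_q

def blend(q_old: float, target: float, alpha: float) -> float:
    """Move Q(s_t, a_t) a fraction alpha of the way toward the target."""
    if not 0 < alpha <= 1:
        raise ValueError("alpha = 0 would never change Q, so it is excluded")
    return q_old + alpha * (target - q_old)

def past_weight(alpha: float, k: int) -> float:
    """Weight kept by a target seen k updates ago (constant alpha, same pair)."""
    return alpha * (1 - alpha) ** k

print(blend(3.0, 1.0, 1.0))       # 1.0: alpha = 1 keeps only the latest target
print(td_target(1.0, 10.0, 0.0))  # 1.0: gamma = 0 keeps only the immediate reward
print(past_weight(0.5, 2))        # 0.125
```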

In a centralized management environment, the network topology, number of nodes, number of usable wavelengths, traffic information, and so on are known. There, the above reinforcement learning algorithm comprises the following steps:
1) constructing a pseudo network environment given at least the network topology, the number of nodes, the number of usable wavelengths, the mean optical path setting request interval of each class, and the mean optical path usage time of each class;
2) generating, in simulation, optical path setting request events and optical path use end events for each class in the pseudo network environment;
3) updating the Q values while each node takes action $a_t$ according to state $s_t$; and
4) deciding whether to accept each optical path setting request according to the optimal state-action pair $(s_t, a_t)$, which is revised each time.
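Steps 1) to 4) can be sketched as a Monte Carlo simulation. The following assumptions are not taken from the source and are used only for illustration:
- The pseudo network is a single link with W wavelengths.
- Requests arrive as Poisson processes with per-class rates, and usage times are exponential.
- Exploration is ε-greedy.
- The Q update for $(s_t, a_t)$ is applied when the next request reveals $s_{t+1}$.

All function and parameter names are hypothetical, and classes are 0-based.

```python
import heapq
import itertools
import random
from collections import defaultdict

def all_actions(M):
    """Every non-empty set of accepted classes: 2**M - 1 actions (0-based classes)."""
    return [frozenset(c) for k in range(1, M + 1)
            for c in itertools.combinations(range(M), k)]

def simulate(W, arrival_rates, mean_usage, R, beta, C,
             alpha, gamma, epsilon, n_requests, seed=0):
    """Monte Carlo Q-learning on one pseudo link; returns Q and per-class rejection rates."""
    rng = random.Random(seed)
    M = len(arrival_rates)
    actions = all_actions(M)
    Q = defaultdict(float)
    counts = [0] * M                  # N_i(t)
    ends = []                         # heap of (usage end time, class)
    t, last_rejected, prev = 0.0, 0, None
    offered, rejected = [0] * M, [0] * M
    for _ in range(n_requests):
        t += rng.expovariate(sum(arrival_rates))          # next request event
        while ends and ends[0][0] <= t:                   # usage end events free wavelengths
            counts[heapq.heappop(ends)[1]] -= 1
        cls = rng.choices(range(M), weights=arrival_rates)[0]
        s = (tuple(counts), last_rejected)                # s_t (Equation 1)
        if prev is not None:                              # Equation 3 for the previous step
            ps, pa, pr = prev
            best = max(Q[(s, b)] for b in actions)
            Q[(ps, pa)] += alpha * (pr + gamma * best - Q[(ps, pa)])
        if rng.random() < epsilon:                        # explore (assumption)
            a = rng.choice(actions)
        else:                                             # exploit: largest Q value
            a = max(actions, key=lambda b: Q[(s, b)])
        offered[cls] += 1
        if cls in a and sum(counts) < W:                  # accept and occupy one wavelength
            counts[cls] += 1
            heapq.heappush(ends, (t + rng.expovariate(1.0 / mean_usage[cls]), cls))
            last_rejected = 0
        else:
            rejected[cls] += 1
            last_rejected = 1
        r = beta * sum(Ri * n for Ri, n in zip(R, s[0])) - (1 - beta) * C * last_rejected
        prev = (s, a, r)                                  # r_{t+1} (Equation 2)
    return Q, [x / max(o, 1) for x, o in zip(rejected, offered)]

Q, rejection = simulate(W=10, arrival_rates=[0.3, 0.3, 0.3], mean_usage=[10.0] * 3,
                        R=[1.0, 0.6, 0.3], beta=0.7, C=1.0, alpha=0.1, gamma=0.9,
                        epsilon=0.1, n_requests=20000)
print(rejection)  # per-class rejection rates; values depend on parameters and seed
```

The learned table `Q` then plays the role of the optimal state-action pairs in step 4).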

In the pseudo network environment, optical path setting request events and optical path use end events are generated based on the request event interval and the mean optical path usage time of each class. The initial link state $s_0$ of each node is set to $(0, 0, \ldots, 0)$, and an action is assigned at random to each link state in advance.

When an arrival event occurs, each node executes, for that optical path setting request, the action $a_0$ assigned to the link state $s_0 = (0, 0, \ldots, 0)$. Once enough learning has been performed, the optimal action $a_t$ for each link state $s_t$ is derived, achieving both service differentiation and efficient use of wavelengths.
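The random initialization can be sketched by enumerating every link state and drawing a random action for each. This sketch assumes a single link with W wavelengths, so that $\sum_i N_i \le W$. The function names are hypothetical, and classes are 0-based.

```python
import itertools
import random

def enumerate_states(W, M):
    """All link states on one link with W wavelengths: sum_i N_i <= W, I in {0, 1}."""
    for counts in itertools.product(range(W + 1), repeat=M):
        if sum(counts) <= W:
            for last_rejected in (0, 1):
                yield (counts, last_rejected)

def initial_policy(W, M, seed=0):
    """Assign a random initial action (non-empty set of accepted classes) to every state."""
    rng = random.Random(seed)
    actions = [frozenset(c) for k in range(1, M + 1)
               for c in itertools.combinations(range(M), k)]
    return {s: rng.choice(actions) for s in enumerate_states(W, M)}

policy = initial_policy(W=4, M=2)
s0 = ((0, 0), 0)                 # initial link state s_0 = (0, ..., 0)
print(len(policy), policy[s0])   # 30 states; a_0 is drawn at random
```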

Next, an optical path setting method according to a second aspect of the present invention sets up optical paths with a plurality of service classes from a transmitting node to a receiving node in a wavelength division multiplexing network. It comprises the following steps:
1) for each node, specifying for each service class the reward obtained when an optical path is set up, and specifying the cost incurred when optical path setting fails;
2) setting the number of wavelengths on the links adjacent to the node itself and the number of service classes;
3) when an optical path setting request event or an optical path use end event occurs, updating the Q value of the action value function (Q function) while taking actions according to the link state; and
4) when an optical path setting request event occurs, deciding whether to accept it according to the optimal pair of link state and action.
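A node running the second aspect knows only its adjacent-link wavelength count W and the class count M. Such a node can be sketched as an event-driven learner. The following are assumptions of this sketch, not statements from the source:
- The class name and method names are hypothetical, and classes are 0-based.
- Exploration is ε-greedy.
- Following the discrete-time definition of the link state, the Q value is updated at each request arrival; use end events only free wavelengths.

```python
import itertools
import random
from collections import defaultdict

class OnlineAdmissionAgent:
    """Hypothetical per-node learner that knows only W (adjacent-link wavelengths) and M."""

    def __init__(self, W, M, R, beta, C, alpha=0.1, gamma=0.9, epsilon=0.05, seed=0):
        self.W, self.R, self.beta, self.C = W, R, beta, C
        self.alpha, self.gamma, self.epsilon = alpha, gamma, epsilon
        self.rng = random.Random(seed)
        self.actions = [frozenset(c) for k in range(1, M + 1)
                        for c in itertools.combinations(range(M), k)]
        self.Q = defaultdict(float)
        self.counts = [0] * M          # N_i(t), 0-based class index
        self.last_rejected = 0         # I(t-1)
        self.pending = None            # (s_t, a_t, r_{t+1}) awaiting s_{t+1}

    def on_release(self, cls):
        """Optical path use end event: one class-cls path frees its wavelength."""
        if self.counts[cls] == 0:
            raise ValueError("no optical path of this class is set")
        self.counts[cls] -= 1

    def on_request(self, cls):
        """Optical path setting request event; returns True if the path is set up."""
        s = (tuple(self.counts), self.last_rejected)
        if self.pending is not None:   # Equation 3, now that s_{t+1} is known
            ps, pa, pr = self.pending
            best = max(self.Q[(s, b)] for b in self.actions)
            self.Q[(ps, pa)] += self.alpha * (pr + self.gamma * best - self.Q[(ps, pa)])
        if self.rng.random() < self.epsilon:
            a = self.rng.choice(self.actions)          # explore (assumption)
        else:
            a = max(self.actions, key=lambda b: self.Q[(s, b)])
        accepted = cls in a and sum(self.counts) < self.W
        if accepted:
            self.counts[cls] += 1
        self.last_rejected = 0 if accepted else 1
        r = (self.beta * sum(Ri * n for Ri, n in zip(self.R, s[0]))
             - (1 - self.beta) * self.C * self.last_rejected)  # Equation 2
        self.pending = (s, a, r)
        return accepted

agent = OnlineAdmissionAgent(W=1, M=1, R=[1.0], beta=0.5, C=1.0, epsilon=0.0)
print(agent.on_request(0), agent.on_request(0))  # True False: the only wavelength is busy
```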

The optical path setting program of the present invention sets up optical paths with a plurality of service classes from a transmitting node to a receiving node in a wavelength division multiplexing network. It causes the computer of each node to execute the following steps:
1) specifying for each service class the reward obtained when an optical path is set up, and specifying the cost incurred when optical path setting fails;
2) setting the number of wavelengths on the links adjacent to the node itself and the number of service classes;
3) when an optical path setting request event or an optical path use end event occurs, updating the Q value of the action value function (Q function) while taking actions according to the link state; and
4) when an optical path setting request event occurs, deciding whether to accept it according to the optimal pair of link state and action.

With the optical path setting method of the second aspect, or with the above optical path setting program, the optimal action $a_t$ can eventually be found without running any simulation in advance.
Consider an optical switch equipped with the optical path setting program of the present invention and placed in a wavelength division multiplexing network. The switch knows only the number of wavelengths on its adjacent links and the number of classes to be differentiated, that is, only the ranges that link state $s_t$ and action $a_t$ can take. Whenever optical path setting request events and optical path use end events occur, the switch takes action $a_t$ according to state $s_t$ and updates the Q value. The action to be taken in state $s_t$ changes accordingly, and the optimal action $a_t$ is eventually found.
The link state $s_t$ and the Q value are the same as in the optical path setting method of the first aspect, so their description is omitted.

The optical switch of the present invention is equipped with the above program and sets up optical paths with a plurality of service classes from a transmitting node to a receiving node in a wavelength division multiplexing network.
The optical path setting program, and an optical switch equipped with it, require no information from other nodes and can therefore be used in a distributed environment.

When the network topology and similar information are known, the optical path setting program or optical switch of the present invention may start from values partially learned in advance by simulation. In that case, the program or switch continues learning after it is deployed in the network.

In the optical information communication system of the present invention, each node of a wavelength division multiplexing network with a plurality of service classes comprises:
- an optical transmission means;
- an optical reception means;
- an optical-to-electrical conversion means;
- an electrical-to-optical conversion means;
- a computation means; and
- a storage means.

The storage means stores table information of link states and optimal actions, set by the optical path setting method described above. When the computation means receives an optical path setting request of service class $i$, it looks up the optimal action for the current link state in the table and decides whether to reject the request. It also updates the Q value of each action for each state using the Q function.
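The table lookup performed by the computation means can be sketched as follows. The function names are hypothetical, and classes are 0-based. Treating unseen state-action pairs as Q = 0 and raising `KeyError` for a state missing from the table are design choices of this sketch, not behavior stated in the source.

```python
def build_policy_table(Q, states, actions):
    """Table of link state -> optimal action (largest Q value; unseen pairs count as 0)."""
    return {s: max(actions, key=lambda a: Q.get((s, a), 0.0)) for s in states}

def handle_request(policy_table, state, cls, free_wavelengths):
    """Return True to set up a class-cls optical path, False to reject the request.

    Raises KeyError for a state missing from the table (a choice of this sketch).
    """
    action = policy_table[state]
    return cls in action and free_wavelengths > 0

s = ((0, 0), 0)                                        # hypothetical current link state
acts = [frozenset({0}), frozenset({1}), frozenset({0, 1})]
Qt = {(s, frozenset({0})): 1.0, (s, frozenset({0, 1})): 2.0}
table = build_policy_table(Qt, [s], acts)
print(table[s] == frozenset({0, 1}), handle_request(table, s, 1, free_wavelengths=3))  # True True
```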

The computation means generates the table information online. It uses the network topology, the number of nodes, the number of usable wavelengths, the mean optical path setting request interval of each class, and the mean optical path usage time of each class.

The optical information communication system further comprises a parameter adjustment means. It adjusts the parameters of the reward function, which balances service differentiation against efficient wavelength utilization.

The optical path setting method and optical information communication system of the present invention achieve service differentiation in optical path setting and efficient use of wavelengths (optical paths), and they can be used in real environments.

FIG. 1: Illustration of a WDM network
FIG. 2: Illustration of optical path setting in a WDM network
FIG. 3: Schematic configuration of each node in the WDM network
FIG. 4: Processing flow of the reinforcement learning algorithm
FIG. 5: Illustration of optical path setting in Example 1 (1)
FIG. 6: Illustration of optical path setting in Example 1 (2)
FIG. 7: Graph of the rejection rate for optical path setting in Example 1
FIG. 8: Graph of the rejection rate for optical path setting in Example 1 (parameter γ on the horizontal axis)
FIG. 9: Graph of the rejection rate for optical path setting in Example 1 (parameter α on the horizontal axis)
FIG. 10: Graph of the rejection rate for optical path setting in Example 1 (number of wavelengths on the horizontal axis)

Embodiments of the present invention are described in detail below with reference to the drawings. The scope of the present invention is not limited to the following examples and illustrated examples; many changes and modifications are possible.

As shown in FIG. 1, a conventional optical network transmits data on a single wavelength per optical fiber. A WDM network instead multiplexes multiple wavelengths on a single fiber and can therefore carry several times as much data.

A communicating pair of transmitting and receiving nodes is assigned one or more wavelengths not used by any other node pair. Each node pair thus occupies its transmission wavelength and achieves high-capacity data transmission. As soon as a wavelength is no longer in use, it is made available to other nodes, so the bandwidth of the optical fiber is used effectively.

In a WDM network, each node should select wavelengths (optical paths) autonomously and establish an optical path, a physical communication channel, between the transmitting and receiving nodes. This is considered desirable both for using wavelengths efficiently and for allocating optical paths fairly.

FIG. 2 shows optical path setting in a WDM network when large-volume data is sent from a transmitting computer to a receiving computer. The transmitting computer (user) first issues an optical path setting request. An optical path is then set up using a free wavelength in the WDM network, and data is transmitted. If no free wavelength is available, optical path setting fails and no data is transmitted. During data transmission, the bandwidth of one wavelength is occupied between the transmitting and receiving computers.
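The text states only that a free wavelength is used. The sketch below additionally assumes:
- no wavelength conversion, so the same wavelength must be free on every link of the route (wavelength continuity);
- first-fit selection of the lowest free wavelength index.

All names, links, and values are hypothetical.

```python
from typing import Dict, Hashable, List, Optional, Set

def first_fit(route: List[Hashable], in_use: Dict[Hashable, Set[int]], W: int) -> Optional[int]:
    """Lowest-index wavelength free on every link of the route, or None."""
    for w in range(W):
        if all(w not in in_use[link] for link in route):
            return w
    return None

def set_up_path(route, in_use, W):
    """Reserve one wavelength end to end; None means setup failed and no data is sent."""
    w = first_fit(route, in_use, W)
    if w is not None:
        for link in route:
            in_use[link].add(w)
    return w

def release_path(route, in_use, w):
    """Optical path use end: make wavelength w available to other nodes again."""
    for link in route:
        in_use[link].discard(w)

route = [("A", "B"), ("B", "C")]
in_use = {("A", "B"): {0}, ("B", "C"): set()}
w = set_up_path(route, in_use, W=2)         # wavelength 0 is busy on A-B, so 1 is chosen
print(w, set_up_path(route, in_use, W=2))   # 1 None
```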

The schematic configuration of each node in FIG. 2 is described with reference to FIG. 3:
- Signals on the optical fiber 11 of the WDM network are demultiplexed or multiplexed by the demultiplexer/multiplexer (12, 13).
- For the demultiplexed light, the optical switch 14 switches the physical channel along which the optical signal propagates.
- The optical switch 14 is controlled by the controller 15. The controller 15 uses the transceiver 18 to receive node control signals sent over the optical or electrical channel 8.
- The controller 15 sets up optical paths by referring to the optical path setting table, link information, and other data stored in the memory 16.

Next, the optical path setting table and link information are described. They are stored in the memory 16 and referenced by the controller 15 of each node.
In the present invention, the optical path setting table and link information specify, for each service class, the reward obtained when an optical path is set up, together with the cost incurred when optical path setting fails. Each class's optical path setting request is accepted or rejected so as to maximize the reward function derived from the rewards and the cost. A reinforcement learning algorithm based on the reward function derives the optimal action at the transmitting node for moving from the current link state to the next link state.

The link state $s_t$ of each node at time $t$ is defined as $s_t = (N_1(t), N_2(t), \ldots, N_M(t), I(t-1))$.
Here:
- $M$ is the number of service classes.
- $N_i(t)$ is the number of optical paths set for service class $i$ at time $t$.
- $I(t-1)$ is the result of the previous optical path setting request: 1 if it failed, 0 if it succeeded.

$I$ represents the cost of a failed optical path setting. The reward function described below subtracts $I$ multiplied by a weighting factor, so the reward decreases when an optical path setting request fails.
When action $a_t$ is taken in the current link state $s_t$ and the link moves to state $s_{t+1}$ at the next time $t+1$, the new state is $s_{t+1} = (N_1(t+1), N_2(t+1), \ldots, N_M(t+1), I(t))$.
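The transition from $s_t$ to $s_{t+1}$ can be sketched as a pure function. The function name and arguments are hypothetical, and classes are 0-based. `released` lists the classes whose optical paths ended (use end events) before request $t+1$ arrived; the source does not spell out this bookkeeping, so it is an assumption here.

```python
def next_state(s_t, cls, accepted, released=()):
    """s_{t+1} = (N_1(t+1), ..., N_M(t+1), I(t)) after deciding request t.

    cls: 0-based class of request t; released: classes whose paths ended
    before request t+1 arrived.
    """
    counts = list(s_t[0])
    if accepted:
        counts[cls] += 1
    for k in released:
        if counts[k] == 0:
            raise ValueError(f"no class-{k + 1} path to release")
        counts[k] -= 1
    return (tuple(counts), 0 if accepted else 1)  # I(t): 1 = failed, 0 = succeeded

s_t = ((2, 1, 0), 0)
print(next_state(s_t, cls=2, accepted=True, released=[0]))  # ((1, 1, 1), 0)
print(next_state(s_t, cls=2, accepted=False))               # ((2, 1, 0), 1)
```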

Here, action $a_t$ specifies which of the $M$ service classes' optical path requests are accepted, with an optical path set up, and which are rejected, with no optical path set up.
With $M$ service classes there are $2^M - 1$ possible actions $a_t$. For example, with three service classes the following seven actions are possible:

1) Accept only optical path setting requests of service class 1.
2) Accept only optical path setting requests of service class 2.
3) Accept only optical path setting requests of service class 3.
4) Accept only optical path setting requests of service classes 1 and 2.
5) Accept only optical path setting requests of service classes 2 and 3.
6) Accept only optical path setting requests of service classes 1 and 3.
7) Accept optical path setting requests of all service classes 1, 2 and 3.
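The action space can be enumerated mechanically. The following is a minimal Python sketch, assuming an action is represented as a tuple of the (1-based) service classes whose requests it accepts; `enumerate_actions` is a hypothetical helper, not part of the described system.

```python
from itertools import combinations


def enumerate_actions(m):
    """Return every non-empty subset of classes 1..m as a sorted tuple.

    Each tuple is one action: the service classes whose optical path
    setting requests are accepted. There are 2**m - 1 such actions
    (the empty subset, "reject everything", is excluded).
    """
    return [combo
            for k in range(1, m + 1)
            for combo in combinations(range(1, m + 1), k)]


if __name__ == "__main__":
    for action in enumerate_actions(3):
        print(action)
```

For $M = 3$ this yields the same seven actions as the list above, though it lists action 6) (classes 1 and 3) before action 5) (classes 2 and 3).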

The reward function is defined by Equation 4 below. With it, the reward obtained when action $a_t$ is taken in state $s_t$ and the state transitions to $s_{t+1}$ can be calculated.
Here, the parameter $R_i$ is the reward, set for each service class, obtained at each node when an optical path is set, and the parameter $C$ is the cost incurred when optical path setting fails. The parameter $\beta$ balances service differentiation in optical path setting against effective use of wavelengths (optical paths). As $\beta$ decreases, the weight on effective wavelength use increases: wavelengths are used more effectively, but service differentiation becomes harder to provide. $R_i$ and $\beta$ take positive values no greater than 1, and $C$ takes a positive value.

(Equation 4)
$$r_{t+1} = \beta\left(R_1 N_1(t) + R_2 N_2(t) + \cdots + R_M N_M(t)\right) - (1-\beta)\, C\, I(t)$$
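Equation 4 transcribes directly into a small function. This is a sketch; the function and argument names are assumptions, and the example values reuse $R_1 = 0.8$, $R_2 = 0.4$, $R_3 = 0.2$, $\beta = 0.1$, $C = 1.0$ from the evaluations later in this section, applied to the occupancy $(2, 4, 1)$ of FIG. 5.

```python
def reward(n, fail, R, beta, C):
    """Equation 4: r_{t+1} = beta * sum_i R_i * N_i(t) - (1 - beta) * C * I(t).

    n:    (N_1(t), ..., N_M(t)), optical paths currently set per class
    fail: I(t), 1 if the current request was rejected, 0 if accepted
    R:    per-class rewards R_1 > R_2 > ... > R_M, each in (0, 1]
    beta: weight between service differentiation and wavelength use
    C:    cost of a failed optical path setting (> 0)
    """
    gain = sum(r_i * n_i for r_i, n_i in zip(R, n))
    return beta * gain - (1 - beta) * C * fail


if __name__ == "__main__":
    R = (0.8, 0.4, 0.2)
    print(reward((2, 4, 1), 0, R, beta=0.1, C=1.0))  # accepted
    print(reward((2, 4, 1), 1, R, beta=0.1, C=1.0))  # rejected
```

With $\beta = 0.1$ the occupancy term is $0.1 \times 3.4 = 0.34$ while a failure costs $0.9$, so a rejection makes the reward negative ($-0.56$); with $\beta = 0.9$ the same rejection costs only $0.1$. A small $\beta$ therefore pushes the learned policy toward accepting requests whenever possible, consistent with the trade-off described above.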

As shown in the flow diagram of FIG. 4, the reinforcement learning algorithm consists of the following steps:
(Step 1) Construct a pseudo network environment given at least the network topology, the number of nodes, the number of usable wavelengths, the average optical path setting request interval of each class, and the average optical path usage time of each class.
(Step 2) In the pseudo network environment, generate simulated optical path setting request events and optical path use end events for each class.
(Step 3) Update the Q values while each node takes action $a_t$ according to the link state $s_t$.
(Step 4) Decide whether to accept each optical path setting request according to the optimal state-action pair $(s_t, a_t)$, which is updated each time.

On the controller of each node, or on a computer independent of the nodes, a simulated network environment is constructed from the network topology of the optical network, the number of nodes, the number of usable wavelengths, the average optical path setting request interval of each class, and the average optical path usage time of each class; a Monte Carlo simulation is then run and the values of the reward function are calculated.
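Steps 1 to 4 can be sketched as a tabular Q-learning loop. This is a simplified illustration, not the inventors' simulator: it models a single link with $W$ wavelengths, Poisson request arrivals and exponentially distributed usage times per class, and an epsilon-greedy exploration rule, all of which are assumptions, as are the function names. The reward follows Equation 4 and the update follows the Q-function rule $Q(s_t,a_t) \leftarrow Q(s_t,a_t) + \alpha[r_{t+1} + \gamma \max_a Q(s_{t+1},a) - Q(s_t,a_t)]$ stated in the claims.

```python
import itertools
import random
from collections import defaultdict


def all_actions(m):
    """Every non-empty subset of classes 0..m-1 (the 2**m - 1 actions)."""
    return [frozenset(c) for k in range(1, m + 1)
            for c in itertools.combinations(range(m), k)]


def reward(n, fail, R, beta, C):
    """Equation 4 with N_i(t) = n[i] and I(t) = fail."""
    return beta * sum(r * x for r, x in zip(R, n)) - (1 - beta) * C * fail


def train_single_link(W, lam, hold, R, beta=0.1, C=1.0, alpha=0.1,
                      gamma=0.95, eps=0.1, n_requests=50_000, seed=0):
    """Tabular Q-learning for one link with W wavelengths (simplified model).

    lam[i]:  Poisson arrival rate of class-i requests (assumed)
    hold[i]: mean exponential usage time of a class-i path (assumed)
    """
    rng = random.Random(seed)
    m = len(lam)
    actions = all_actions(m)
    Q = defaultdict(float)           # Q[(state, action)], initially 0
    n = [0] * m                      # N_i(t): paths currently set per class
    active = []                      # (release_time, class) of set paths
    now, last_fail, prev = 0.0, 0, None
    for _ in range(n_requests):
        # Discrete time t advances once per request arrival.
        now += rng.expovariate(sum(lam))
        # Optical path use end events that occurred before this arrival.
        for end, cls in active:
            if end <= now:
                n[cls] -= 1
        active = [p for p in active if p[0] > now]
        state = tuple(n) + (last_fail,)          # s_t = (N_1..N_M, I(t-1))
        # Q update for the previous decision, now that its successor is known.
        if prev is not None:
            ps, pa, pr = prev
            best = max(Q[(state, a)] for a in actions)
            Q[(ps, pa)] += alpha * (pr + gamma * best - Q[(ps, pa)])
        # Epsilon-greedy choice of a_t (the exploration rule is an assumption).
        if rng.random() < eps:
            action = rng.choice(actions)
        else:
            action = max(actions, key=lambda a: Q[(state, a)])
        c = rng.choices(range(m), weights=lam)[0]  # class of this request
        accepted = c in action and sum(n) < W
        if accepted:
            n[c] += 1
            active.append((now + rng.expovariate(1.0 / hold[c]), c))
        last_fail = 0 if accepted else 1           # I(t)
        prev = (state, action, reward(state[:m], last_fail, R, beta, C))
    return Q, actions
```

For example, `train_single_link(W=6, lam=[1.0, 1.0, 1.0], hold=[1.0, 1.0, 1.0], R=[0.8, 0.4, 0.2])` uses the wavelength count and reward parameters of FIG. 7 and FIG. 8; the arrival rates and usage times are placeholders, since the section does not give them. Each update is deferred until the next arrival so that $s_{t+1}$ already reflects any paths released in between.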

Finally, in Step 5, optical path setting table information mapping link states to optimal actions is generated from the optimal action pairs $(s_t, a_t)$ that maximize the reward function, and is stored in the memory 16 of each node.
When a node receives an optical path setting request of service class $i$, it uses the optical path setting table information stored in the memory 16 to decide, according to the optimal action for the current link state, whether to reject the request.
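Step 5 and the per-request decision reduce to an argmax over a table followed by a membership test. A minimal sketch, assuming Q values are stored in a dict keyed by `(state, action)` and actions are tuples of accepted classes; the function names and the Q values in the example are hypothetical, chosen to reproduce the decisions of FIG. 5. A state missing from the table raises `KeyError`; the section does not say how unseen states are handled.

```python
def build_table(Q, actions):
    """Step 5: for every state seen in Q, keep the action with the largest Q value."""
    states = {s for (s, _a) in Q}
    return {s: max(actions, key=lambda a: Q.get((s, a), 0.0)) for s in states}


def should_accept(table, state, request_class):
    """Accept a class-i request iff the optimal action for this state accepts class i."""
    return request_class in table[state]


if __name__ == "__main__":
    # Hypothetical Q values for the two link states of FIG. 5.
    Q = {((2, 3, 1, 0), (1, 2)): 0.5, ((2, 3, 1, 0), (1,)): 0.2,
         ((2, 4, 1, 0), (1,)): 0.4, ((2, 4, 1, 0), (1, 2)): 0.1}
    table = build_table(Q, [(1,), (1, 2), (1, 2, 3)])
    print(should_accept(table, (2, 3, 1, 0), 2))  # class 2 accepted
    print(should_accept(table, (2, 4, 1, 0), 2))  # class 2 rejected
```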

With the optical path setting method and optical information communication system described above, both service differentiation in optical path setting and effective use of wavelengths (optical paths) can be achieved, in a form usable in real environments. Specific examples are described below.

FIG. 5-1 and FIG. 5-2 illustrate optical path setting in an optical network topology with three nodes, assuming three service classes and eight wavelengths.
In FIG. 5-1, the state of node B at time $t$ is $s_t = (2, 3, 1, 0)$, so $8 - (2 + 3 + 1) = 2$ wavelengths are free. Suppose a data transmission request of service class 2 arises from node A to node C in this state. If the optimal action $a_t$ for $s_t = (2, 3, 1, 0)$ is "accept only optical path setting requests of service classes 1 and 2", the class-2 request is accepted and the state transitions to $s_{t+1} = (2, 4, 1, 0)$ (optical path setting succeeds).

Next, in FIG. 5-2, suppose another class-2 data transmission request from node A to node C arises in state $s_{t+1} = (2, 4, 1, 0)$. If the optimal action $a_{t+1}$ for that state is "accept only optical path setting requests of service class 1", the class-2 request is rejected even though a wavelength is still free, and the state transitions to $s_{t+2} = (2, 4, 1, 1)$ (optical path setting fails).
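The two transitions of FIG. 5-1 and FIG. 5-2 can be replayed with a small state-update function. This sketch assumes a single link with $W = 8$ wavelengths and models only request arrivals, not path releases; `on_request` is a hypothetical name.

```python
def on_request(state, request_class, accepted_classes, W):
    """Apply one optical path setting request to a link state.

    state:            (N_1, ..., N_M, I), I being the previous acceptance result
    request_class:    1-based service class of the arriving request
    accepted_classes: classes accepted by the current optimal action
    W:                number of wavelengths on the link
    Returns the next state; its last element is 1 on failure, 0 on success.
    """
    n = list(state[:-1])
    ok = request_class in accepted_classes and sum(n) < W
    if ok:
        n[request_class - 1] += 1
    return tuple(n) + (0 if ok else 1,)


if __name__ == "__main__":
    s1 = on_request((2, 3, 1, 0), 2, {1, 2}, W=8)  # FIG. 5-1: accepted
    s2 = on_request(s1, 2, {1}, W=8)               # FIG. 5-2: rejected
    print(s1, s2)
```

The first call returns `(2, 4, 1, 0)` and the second `(2, 4, 1, 1)`, matching the states in the figures.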

Next, for optical path setting from node A to node C, the rejection rate at relay node B for optical path setting requests from node A is compared between a conventional optical path setting method and the method of the present invention. The conventional method fixes the number of optical paths that each class may set. The number of wavelengths is 8 and the number of service classes is 3.
For the Q function, $\alpha = 0.1$ and $\gamma = 0.95$; for the reward function, $C = 1.0$, and $R_1$, $R_2$ and $R_3$ are each varied from 0.05 to 1.0 in steps of 0.05, subject to $R_3 < R_2 < R_1$, with $\beta = 0.1$.
As the user requirement, the failure probability of service class $i$ must be at most $\xi_i$, with $\xi_1 = 0.4$, $\xi_2 = 0.9$ and $\xi_3 = 1.0$.
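The parameter sweep and the user requirement can be written down concretely. A sketch, assuming the grid is exactly the 20 values 0.05, 0.10, ..., 1.00; the section does not say how a configuration is selected from the sweep, so `meets_targets` is only a hypothetical feasibility check against $\xi_1 = 0.4$, $\xi_2 = 0.9$, $\xi_3 = 1.0$.

```python
from itertools import combinations

# Candidate reward values 0.05, 0.10, ..., 1.00 (rounded to avoid float drift).
GRID = [round(0.05 * k, 2) for k in range(1, 21)]


def reward_triples():
    """All (R_1, R_2, R_3) on the grid with R_3 < R_2 < R_1."""
    # combinations() yields increasing tuples a < b < c; reverse them.
    return [(c, b, a) for a, b, c in combinations(GRID, 3)]


def meets_targets(rejection, xi=(0.4, 0.9, 1.0)):
    """True if every class's failure probability is at most its xi_i."""
    return all(p <= x for p, x in zip(rejection, xi))
```

The sweep contains $\binom{20}{3} = 1140$ strictly decreasing triples; a measured rejection vector such as `(0.3, 0.5, 0.9)` satisfies the requirement, whereas `(0.5, 0.1, 0.1)` violates it for class 1.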

FIG. 6 is a graph of the rejection rate for optical path setting. Compared with the conventional method, the method of the present invention lowers the rejection rate for both service class 1 and service class 2, indicating that service differentiation and effective wavelength use are both further improved.

FIG. 7 shows the rejection rate of each class under the method of the present invention, with the parameter $\gamma$ on the horizontal axis, for 6 wavelengths, 3 classes, Q-function $\alpha = 0.1$, reward-function $C = 1.0$, $R_1 = 0.8$, $R_2 = 0.4$, $R_3 = 0.2$ and $\beta = 0.1$.
As FIG. 7 shows, service differentiation is achieved through learning once $\gamma$ exceeds 0.65. The performance of the method therefore depends on the parameter $\gamma$.

FIG. 8 shows the rejection rate of each class under the method of the present invention, with the parameter $\alpha$ on the horizontal axis, for 6 wavelengths, 3 classes, Q-function $\gamma = 0.95$, reward-function $C = 1.0$, $R_1 = 0.8$, $R_2 = 0.4$, $R_3 = 0.2$ and $\beta = 0.1$.
As FIG. 8 shows, service differentiation becomes more effective as $\alpha$ increases from 0 to 0.1, while the difference in rejection rates between classes shrinks as $\alpha$ approaches 1.0. The performance of the method therefore also depends on the parameter $\alpha$.

FIG. 9 shows the rejection rate of each class under the method of the present invention, with the number of wavelengths on the horizontal axis, for 3 classes, Q-function $\gamma = 0.95$ and $\alpha = 0.1$, reward-function $C = 1.0$, $R_1 = 0.8$, $R_2 = 0.4$, $R_3 = 0.2$ and $\beta = 0.1$.
As FIG. 9 shows, the method remains usable even with 100 wavelengths, which makes it more effective than the Markov decision process approach of prior research.

The present invention is expected to find use in telemedicine, high-definition video streaming, and grid computing.

DESCRIPTION OF SYMBOLS
1 Optical fiber
2 Transmitting computer
3 Receiving computer
4 Data
5 Optical path
7 Optical network
8 Node
11 Optical fiber
12, 13 Demultiplexer/multiplexer
14 Optical switch
15 Controller
16 Memory
17 Transceiver
18 Optical or electrical channel
21 Optical path setting information table
22 Link state information table

Claims

An optical path setting method for setting optical paths having a plurality of service classes from a transmitting node to a receiving node in a wavelength division multiplexing network, wherein:
a reward obtained at each node when an optical path is set is defined for each service class, and a cost incurred when optical path setting fails is defined;
whether to accept the optical path setting request of each class is decided so that the reward function derived from the reward and the cost is maximized; and
the optimal action from the current link state to the next link state at the transmitting node is derived using a reinforcement learning algorithm based on the reward function.

The optical path setting method according to claim 1, wherein:
the link state is defined by Equation 1 below, introducing a discrete time $t$ that advances with each optical path setting request arrival event and using the number $N_i(t)$ of optical paths set for service class $i$ (of $M$ service classes) at time $t$ and the acceptance result $I(t-1)$ of the optical path setting request at the previous time $t-1$;
the reward obtained when action $a_t$ is taken in the current link state $s_t$ and the state transitions to the next link state $s_{t+1}$ is defined by Equation 2 below; and
on the basis of the reward function, using the Q value of each action $a_t$ for state $s_t$ obtained from the action-value function (Q function) defined by Equation 3 below, the action $a_t$ with the largest Q value for state $s_t$ is taken as the optimal action, by which acceptance of optical path setting is determined.
(Equation 1)
$$s_t = (N_1(t), N_2(t), \ldots, N_M(t), I(t-1))$$
(Equation 2)
$$r_{t+1} = \beta\left(R_1 N_1(t) + R_2 N_2(t) + \cdots + R_M N_M(t)\right) - (1-\beta)\, C\, I(t)$$
where $0 < R_M < R_{M-1} < \cdots < R_i < \cdots < R_2 < R_1 \le 1$, $0 \le \beta \le 1$, $0 < C$.
(Equation 3)
$$Q(s_t, a_t) \leftarrow Q(s_t, a_t) + \alpha\left[r_{t+1} + \gamma \max_a Q(s_{t+1}, a) - Q(s_t, a_t)\right]$$
where $0 < \alpha \le 1$, $0 \le \gamma < 1$.

The optical path setting method according to claim 2, wherein the reinforcement learning algorithm comprises:
a step of constructing a pseudo network environment given at least the network topology, the number of nodes, the number of usable wavelengths, the average optical path setting request interval of each class, and the average optical path usage time of each class;
a step of generating, in the pseudo network environment, simulated optical path setting request events and optical path use end events for each class;
a step of updating the Q values while each node takes action $a_t$ according to the link state $s_t$; and
a step of determining whether to accept each optical path setting request according to the optimal pair $(s_t, a_t)$ of link state and action, which changes each time.

An optical path setting method for setting an optical path having a plurality of service classes from a transmitting node to a receiving node in a wavelength division multiplexing network, comprising, for each node:
a step of setting a reward obtained when an optical path is set for each service class, and setting a cost incurred when optical path setting fails;
a step of setting the number of wavelengths of the links adjacent to the node itself and the number of service classes;
a step of updating the Q value of an action-value function (Q function) while taking an action according to the link state whenever an optical path setting request event or an optical path use end event occurs; and
a step of determining, when an optical path setting request event occurs, whether to accept it according to the optimal pair of link state and action.

The optical path setting method according to claim 4, wherein:
the link state is defined by Equation 4 below, introducing a discrete time $t$ that advances with each optical path setting request arrival event and using the number $N_i(t)$ of optical paths set for service class $i$ (of $M$ service classes) at time $t$ and the acceptance result $I(t-1)$ of the optical path setting request at the previous time $t-1$;
the reward obtained when action $a_t$ is taken in the current link state $s_t$ and the state transitions to the next link state $s_{t+1}$ is defined by Equation 5 below; and
on the basis of the reward function, using the Q value of each action $a_t$ for state $s_t$ obtained from the Q function defined by Equation 6 below, the action $a_t$ with the largest Q value for state $s_t$ is taken as the optimal action, by which acceptance of optical path setting is determined.
(Equation 4)
$$s_t = (N_1(t), N_2(t), \ldots, N_M(t), I(t-1))$$
(Equation 5)
$$r_{t+1} = \beta\left(R_1 N_1(t) + R_2 N_2(t) + \cdots + R_M N_M(t)\right) - (1-\beta)\, C\, I(t)$$
where $0 < R_M < R_{M-1} < \cdots < R_i < \cdots < R_2 < R_1 \le 1$, $0 \le \beta \le 1$, $0 < C$.
(Equation 6)
$$Q(s_t, a_t) \leftarrow Q(s_t, a_t) + \alpha\left[r_{t+1} + \gamma \max_a Q(s_{t+1}, a) - Q(s_t, a_t)\right]$$
where $0 < \alpha \le 1$, $0 \le \gamma < 1$.

An optical path setting program for setting an optical path having a plurality of service classes from a transmitting node to a receiving node in a wavelength division multiplexing network, the program causing the computer of each node to execute:
a step of setting a reward obtained when an optical path is set for each service class, and setting a cost incurred when optical path setting fails;
a step of setting the number of wavelengths of the links adjacent to the node itself and the number of service classes;
a step of updating the Q value of an action-value function (Q function) while taking an action according to the link state whenever an optical path setting request event or an optical path use end event occurs; and
a step of determining, when an optical path setting request event occurs, whether to accept it according to the optimal pair of link state and action.

The optical path setting program according to claim 6, wherein:
the link state is defined by Equation 7 below, introducing a discrete time $t$ that advances with each optical path setting request arrival event and using the number $N_i(t)$ of optical paths set for service class $i$ (of $M$ service classes) at time $t$ and the acceptance result $I(t-1)$ of the optical path setting request at the previous time $t-1$;
the reward obtained when action $a_t$ is taken in the current link state $s_t$ and the state transitions to the next link state $s_{t+1}$ is defined by Equation 8 below; and
on the basis of the reward function, using the Q value of each action $a_t$ for state $s_t$ obtained from the Q function defined by Equation 9 below, the action $a_t$ with the largest Q value for state $s_t$ is taken as the optimal action, by which acceptance of optical path setting is determined.
(Equation 7)
$$s_t = (N_1(t), N_2(t), \ldots, N_M(t), I(t-1))$$
(Equation 8)
$$r_{t+1} = \beta\left(R_1 N_1(t) + R_2 N_2(t) + \cdots + R_M N_M(t)\right) - (1-\beta)\, C\, I(t)$$
where $0 < R_M < R_{M-1} < \cdots < R_i < \cdots < R_2 < R_1 \le 1$, $0 \le \beta \le 1$, $0 < C$.
(Equation 9)
$$Q(s_t, a_t) \leftarrow Q(s_t, a_t) + \alpha\left[r_{t+1} + \gamma \max_a Q(s_{t+1}, a) - Q(s_t, a_t)\right]$$
where $0 < \alpha \le 1$, $0 \le \gamma < 1$.

The optical path setting program, further comprising a step of setting, as initial values, the optimal action pairs $(s_t, a_t)$ obtained by the reinforcement-learning optical path setting method according to claim 3.

An optical switch for performing optical path setting having a plurality of service classes from a transmitting node to a receiving node in a wavelength division multiplexing network, the optical switch being equipped with the program according to claim 6.

An optical information communication system, wherein:
each node in the wavelength division multiplexing network having a plurality of service classes comprises optical transmission means, optical reception means, optical/electrical conversion means, electrical/optical conversion means, computing means, and storage means;
the storage means stores table information of link states and optimal actions set by the optical path setting method according to any one of claims 1 to 5; and
upon receiving an optical path setting request of service class $i$, the computing means uses the table information to determine whether to reject the request according to the optimal action corresponding to the current link state, and updates the Q value of each action for each state using the Q function.

The optical information communication system according to claim 10, wherein the computing means generates the table information online based on the network topology, the number of nodes, the number of usable wavelengths, the average optical path setting request interval of each class, and the average optical path usage time of each class.

The optical information communication system according to claim 10, further comprising parameter adjusting means capable of adjusting a parameter of the reward function to balance service differentiation against wavelength utilization efficiency.