JP4400837B1 - Bond portfolio control device, bond portfolio control program, and bond portfolio control method

Info

Publication number: JP4400837B1
Application number: JP2009065320A
Authority: JP
Inventors: 康成前田; 正清鈴木; 淳中垣; 耕史桂; 俊機門井; 吉晴加室
Original assignee: Kitami Institute of Technology NUC
Current assignee: Kitami Institute of Technology NUC
Priority date: 2009-03-17
Filing date: 2009-03-17
Publication date: 2010-01-20
Anticipated expiration: 2029-03-17
Also published as: JP2010218319A

Abstract

PROBLEM TO BE SOLVED: To provide a bond portfolio control device and related means, used to select an optimal policy for controlling bonds, that can reflect the size of the set loan limit and the state of external factors such as the economic environment in the transition probabilities of future credit ratings.
SOLUTION: Given the initial state of a bond, the initial state of an external factor, and a control period length, an optimal policy calculation unit works together with an action determination unit and, using the expected gain under each action in each bond state and each external-factor state at each time point, outputs an optimal policy that is guaranteed to maximize the expected total gain over the control period. Given a bond state, an external-factor state, and a time point, the action determination unit outputs the optimal action that maximizes the expected total gain from that time point onward in that state, together with the maximum value of that expected total gain.
SELECTED DRAWING: Figure 1

Description

The present invention relates to a bond portfolio control device, a bond portfolio control program, and a bond portfolio control method used to select an optimal policy for controlling bonds.

In recent years, various probabilistic models have also come into use in the field of financial engineering, for purposes such as risk management of financial assets (see, for example, Patent Document 1).

For example, information on each company's credit rating is used as one input when a financial institution decides whether to lend to a client company. The credit rating is treated as a state, and changes in the rating are modeled and evaluated with a Markov chain (see, for example, Patent Document 2). There are also loan decision methods that use the expected utility, calculated from the utility obtained in each credit rating state and the probability of each future credit rating state as evaluated by the Markov chain, together with measures such as the variance of the utility.
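The Markov-chain evaluation described above can be illustrated with a minimal sketch. The three rating states, the transition matrix `P`, and the utility values below are hypothetical numbers chosen for illustration; they do not come from the cited documents.

```python
def step(dist, P):
    """Advance a probability distribution over rating states by one period."""
    n = len(P)
    return [sum(dist[i] * P[i][j] for i in range(n)) for j in range(n)]


def utility_mean_and_variance(start, P, utility, periods):
    """Expected utility and variance of utility after `periods` transitions."""
    dist = list(start)
    for _ in range(periods):
        dist = step(dist, P)
    mean = sum(p * u for p, u in zip(dist, utility))
    second_moment = sum(p * u * u for p, u in zip(dist, utility))
    return mean, second_moment - mean * mean


# Hypothetical rating states: index 0 = "high", 1 = "low", 2 = "default".
P = [
    [0.75, 0.25, 0.0],   # from "high"
    [0.25, 0.5, 0.25],   # from "low"
    [0.0, 0.0, 1.0],     # "default" is absorbing in this toy example
]
utility = [8.0, 4.0, -16.0]  # assumed utility obtained in each state
start = [1.0, 0.0, 0.0]      # the borrower starts in "high"

mean, variance = utility_mean_and_variance(start, P, utility, periods=2)
```

Note that `P` here depends on nothing but the current rating, which is exactly the limitation of the conventional approach that the following paragraphs discuss.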

Japanese Unexamined Patent Application Publication No. 2002-230280, p. 18

Ono, "Financial Risk Management", Toyo Keizai Inc., June 2002, pp. 137-175

The conventional loan decision methods described above are used when a financial institution decides on lending to companies; they are not used to set loan limits for individuals.

When a loan limit is set for an individual, the transition probabilities to that individual's future credit rating states can differ depending on the size of the limit. For example, a large limit may lead to excessive debt and a lower credit rating. The conventional loan decision methods for companies, however, merely assume a Markov chain for credit rating transitions and do not consider a model in which the transition probabilities to future credit rating states depend on the size of the loan limit.

Likewise, the transition probabilities of an individual's credit rating state can differ depending on the state of external factors such as the economic environment. For example, credit ratings tend to fall more readily when the economic environment has deteriorated. The conventional loan decision methods for companies do not model transition probabilities that vary with the state of such external factors.
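One way to picture the two dependencies described above is a transition table keyed by the external-factor state and the chosen action in addition to the current rating. The following is a minimal sketch; the state names, action names, and probabilities are hypothetical and serve only to show the shape of such a table.

```python
# Hypothetical table:
#   (external-factor state, action, current rating) -> distribution over next rating
BOND_TRANSITIONS = {
    ("stable", "low_limit", "good"): {"good": 0.875, "poor": 0.125},
    ("stable", "high_limit", "good"): {"good": 0.75, "poor": 0.25},
    ("recession", "low_limit", "good"): {"good": 0.75, "poor": 0.25},
    ("recession", "high_limit", "good"): {"good": 0.5, "poor": 0.5},
}


def downgrade_probability(external, action, rating="good"):
    """Probability of moving to "poor" under the given conditions."""
    return BOND_TRANSITIONS[(external, action, rating)]["poor"]


def rows_are_distributions(table, tol=1e-9):
    """Check that every row of the table sums to 1."""
    return all(abs(sum(row.values()) - 1.0) <= tol for row in table.values())
```

In this toy table a higher limit and a worse economy each raise the downgrade probability, which a single action-independent Markov chain cannot express.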

To address these problems, which arise when a conventional loan decision method for companies is applied to setting loan limits for individuals, the object of the present invention is to provide a bond portfolio control device, a bond portfolio control program, and a bond portfolio control method that are used to select an optimal policy, such as the setting of a loan limit, in controlling bonds such as loan receivables, and that can reflect the size of the set loan limit and the state of external factors such as the economic environment in the transition probabilities of future credit ratings.

The present invention is a bond portfolio control device used to select an optimal policy for controlling a bond, comprising:

initial condition receiving means that receives input of an initial bond state x_1, an initial external-factor state z_1, and a control period length T;

DP graph creation means that creates a DP graph in which, starting from the combination of the initial bond state x_1 and the initial external-factor state z_1 received by the initial condition receiving means, the transitions of the combination of bond state x_t and external-factor state z_t at each time point t (1 ≤ t ≤ T) up to the received control period length T are expanded into individual nodes;

transition probability storage means that stores, as a bond state transition probability for each combination of bond state x_t, bond state x_{t+1}, external-factor state z_t, and action y_t, the probability that the bond state x_t at time point t transitions to bond state x_{t+1} at the next time point t+1 when action y_t is selected under external-factor state z_t at time point t, and that stores, as an external factor transition probability for each combination of external-factor state z_t and external-factor state z_{t+1}, the probability that the external-factor state z_t at time point t transitions to external-factor state z_{t+1} at the next time point t+1;

expected gain storage means that stores, for each combination of bond state x_t, bond state x_{t+1}, and action y_t, the expected gain obtained when action y_t is selected at time point t and the bond state x_t at time point t transitions to bond state x_{t+1} at the next time point t+1;

optimal action determination means that, for the combination of bond state x_t and external-factor state z_t at each node of the DP graph created by the DP graph creation means, reads the corresponding bond state transition probability and external factor transition probability from the transition probability storage means and the corresponding expected gain from the expected gain storage means, then, working in order from the terminal nodes (t = T) of the DP graph, calculates the expected total gain of each action selectable at each node from the transition probabilities and expected gains read for the combinations corresponding to that action, and determines the action with the maximum expected total gain to be the optimal action; and

optimal policy output means that outputs the optimal actions determined by the optimal action determination means for all nodes of the DP graph as the optimal policy for controlling the bond.

For each node with t = T, the optimal action determination means, for each selectable action y_t and each reachable bond state x_{t+1}, multiplies the expected gain read for the corresponding combination of action and bond states by the bond state transition probability read for the corresponding combination of external-factor state, action, and bond states, sums these products to obtain the expected total gain, and determines the action y_t with the maximum expected total gain to be the optimal action.

For each node with 1 ≤ t ≤ T−1, the optimal action determination means, for each selectable action y_t and each reachable bond state x_{t+1}, adds the expected gain read for the corresponding combination of action and bond states to the sum, over each reachable external-factor state z_{t+1}, of the maximum expected total gain already calculated for the corresponding node multiplied by the external factor transition probability that was read; multiplies the result by the bond state transition probability read for the corresponding combination of external-factor state, action, and bond states; sums these products to obtain the expected total gain; and determines the action y_t with the maximum expected total gain to be the optimal action. This is repeated until the nodes with t = 1 are reached.
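The backward recursion set out in the preceding paragraph can be written compactly as follows. The symbols V, p, q, and r are notation assumed here for illustration: V_t is the maximum expected total gain at a node, p is the bond state transition probability, q is the external factor transition probability, and r is the stored expected gain.

```latex
\begin{align}
V_T(x_T, z_T) &= \max_{y_T} \sum_{x_{T+1}}
  p(x_{T+1} \mid x_T, z_T, y_T)\, r(x_T, x_{T+1}, y_T), \\
V_t(x_t, z_t) &= \max_{y_t} \sum_{x_{t+1}}
  p(x_{t+1} \mid x_t, z_t, y_t)
  \Bigl[ r(x_t, x_{t+1}, y_t)
    + \sum_{z_{t+1}} q(z_{t+1} \mid z_t)\, V_{t+1}(x_{t+1}, z_{t+1}) \Bigr],
  \qquad 1 \le t \le T-1.
\end{align}
```

The optimal action at each node is the action y_t that attains the maximum, and the optimal policy is the collection of these actions over all nodes.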

In the present invention, the optimal action is determined using the transition probability of each pattern in which the bond state transitions according to the action selected at a given time point. Actions such as the setting of a loan limit are thereby reflected in the transition probabilities of the bond state, such as the future credit rating, and an optimal policy for controlling the bond so that the expected total gain is maximized can be output.

In the present invention, the optimal action is determined using the transition probability of each pattern in which the bond state transitions from one time point to the next according to the state of the external factor. The state of external factors such as the economic environment is thereby reflected in the transition probabilities of the bond state, such as the future credit rating, and an optimal policy for controlling the bond so that the expected total gain is maximized can be output.

With this configuration, both actions such as the setting of a loan limit and the state of external factors such as the economic environment are reflected in the transition probabilities of the bond state, such as the future credit rating, and an optimal policy for controlling the bond so that the expected total gain is maximized can be output.

With this configuration, the probability that the state of external factors such as the economic environment changes is reflected, and an optimal policy for controlling the bond so that the expected total gain is maximized can be output.
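The optimal-policy calculation described in this section can be sketched as a short backward-induction routine. This is a minimal illustration, not the device's actual implementation: the function name, the dictionary layouts, and all numbers are assumptions, and for simplicity the sketch evaluates every (bond state, external-factor state) pair at every time point instead of only the nodes reachable from (x_1, z_1) in the DP graph.

```python
from itertools import product


def solve_policy(T, bond_states, ext_states, actions, p_bond, p_ext, gain):
    """Backward induction over t = T, ..., 1.

    p_bond[(x, z, y)]    -> {x_next: probability}  (bond state transition)
    p_ext[z]             -> {z_next: probability}  (external factor transition)
    gain[(x, x_next, y)] -> expected gain of that transition
    Returns (value, policy), both keyed by (t, x, z).
    """
    value, policy = {}, {}
    for t in range(T, 0, -1):
        for x, z in product(bond_states, ext_states):
            best_action, best_total = None, float("-inf")
            for y in actions:
                total = 0.0
                for x_next, px in p_bond[(x, z, y)].items():
                    term = gain[(x, x_next, y)]
                    if t < T:
                        # Expected maximum total gain from t + 1 onward.
                        term += sum(pz * value[(t + 1, x_next, z_next)]
                                    for z_next, pz in p_ext[z].items())
                    total += px * term
                if total > best_total:
                    best_action, best_total = y, total
            value[(t, x, z)] = best_total
            policy[(t, x, z)] = best_action
    return value, policy


# Hypothetical toy data: two bond states, two external states, two actions.
BOND_STATES = ["good", "bad"]
EXT_STATES = ["up", "down"]
ACTIONS = ["raise", "hold"]
P_BOND = {
    ("good", "up", "raise"): {"good": 0.75, "bad": 0.25},
    ("good", "up", "hold"): {"good": 0.875, "bad": 0.125},
    ("good", "down", "raise"): {"good": 0.5, "bad": 0.5},
    ("good", "down", "hold"): {"good": 0.75, "bad": 0.25},
}
for z, y in product(EXT_STATES, ACTIONS):
    P_BOND[("bad", z, y)] = {"bad": 1.0}  # "bad" is absorbing in this example
P_EXT = {"up": {"up": 0.5, "down": 0.5}, "down": {"up": 0.5, "down": 0.5}}
GAIN = {
    ("good", "good", "raise"): 10.0, ("good", "bad", "raise"): -20.0,
    ("good", "good", "hold"): 2.0, ("good", "bad", "hold"): -8.0,
    ("bad", "bad", "raise"): 0.0, ("bad", "bad", "hold"): 0.0,
}

value, policy = solve_policy(2, BOND_STATES, EXT_STATES, ACTIONS,
                             P_BOND, P_EXT, GAIN)
```

With these toy numbers the policy raises the limit for a "good" bond when the external state is "up" and holds it when the state is "down", showing how the external factor changes the chosen action.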

Furthermore, the present invention may be characterized as follows. The initial condition receiving means receives input of bond identification information that identifies whether the target bond is a new bond or an existing bond. The expected gain storage means stores, separately from the expected gains for existing bonds and for each combination of bond state x_1, bond state x_2, and action y_1, the expected gain obtained for a new bond when action y_1 is selected at the first time point 1 and the bond state x_1 at time point 1 transitions to bond state x_2 at the next time point 2. When the initial condition receiving means has received bond identification information indicating a new bond, the optimal action determination means reads the expected gain for new bonds from the expected gain storage means for the node with t = 1, applies that expected gain to calculate the expected total gain, and determines the optimal action.

With this configuration, an optimal policy for controlling the bond so that the expected total gain is maximized can be output while also taking into account the effects specific to new bonds, which have no past track record.
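The new-bond variation amounts to swapping the expected gain table for the first transition only. A minimal sketch follows, with a hypothetical function name, hypothetical table layouts, and made-up gain values.

```python
def select_gain(t, x, x_next, y, is_new_bond, gain_existing, gain_new):
    """Return the expected gain for one transition.

    For a new bond, the separately stored table is used only at t = 1;
    every later transition uses the table for existing bonds.
    """
    if is_new_bond and t == 1:
        return gain_new[(x, x_next, y)]
    return gain_existing[(x, x_next, y)]


# Hypothetical tables keyed by (bond state, next bond state, action).
GAIN_EXISTING = {("good", "good", "raise"): 10.0}
GAIN_NEW = {("good", "good", "raise"): 6.0}

first_new = select_gain(1, "good", "good", "raise", True, GAIN_EXISTING, GAIN_NEW)
later_new = select_gain(2, "good", "good", "raise", True, GAIN_EXISTING, GAIN_NEW)
first_old = select_gain(1, "good", "good", "raise", False, GAIN_EXISTING, GAIN_NEW)
```

Only `first_new` is drawn from the new-bond table; the other two lookups fall back to the table for existing bonds.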

The present invention can also be specified as a bond portfolio control program provided in the bond portfolio control device according to the present invention.

The bond portfolio control program corresponding to the present invention is a bond portfolio control program used to select an optimal policy for controlling a bond. It runs on a bond portfolio control device comprising:

transition probability storage means that stores, as a bond state transition probability for each combination of bond state x_t, bond state x_{t+1}, external-factor state z_t, and action y_t, the probability that the bond state x_t at time point t transitions to bond state x_{t+1} at the next time point t+1 when action y_t is selected under external-factor state z_t at time point t, and that stores, as an external factor transition probability for each combination of external-factor state z_t and external-factor state z_{t+1}, the probability that the external-factor state z_t at time point t transitions to external-factor state z_{t+1} at the next time point t+1; and

expected gain storage means that stores, for each combination of bond state x_t, bond state x_{t+1}, and action y_t, the expected gain obtained when action y_t is selected at time point t and the bond state x_t at time point t transitions to bond state x_{t+1} at the next time point t+1.

The program causes the bond portfolio control device to execute:

an initial condition receiving step of receiving input of an initial bond state x_1, an initial external-factor state z_1, and a control period length T;

a DP graph creation step of creating a DP graph in which, starting from the combination of the initial bond state x_1 and the initial external-factor state z_1 received in the initial condition receiving step, the transitions of the combination of bond state x_t and external-factor state z_t at each time point t (1 ≤ t ≤ T) up to the received control period length T are expanded into individual nodes;

an optimal action determination step of, for the combination of bond state x_t and external-factor state z_t at each node of the DP graph created in the DP graph creation step, reading the corresponding bond state transition probability and external factor transition probability from the transition probability storage means and the corresponding expected gain from the expected gain storage means, then, working in order from the terminal nodes (t = T) of the DP graph, calculating the expected total gain of each action selectable at each node from the transition probabilities and expected gains read for the combinations corresponding to that action, and determining the action with the maximum expected total gain to be the optimal action; and

an optimal policy output step of outputting the optimal actions determined in the optimal action determination step for all nodes of the DP graph as the optimal policy for controlling the bond.

In the optimal action determination step, for each node with t = T, for each selectable action y_t and each reachable bond state x_{t+1}, the expected gain read for the corresponding combination of action and bond states is multiplied by the bond state transition probability read for the corresponding combination of external-factor state, action, and bond states; these products are summed to obtain the expected total gain; and the action y_t with the maximum expected total gain is determined to be the optimal action.

For each node with 1 ≤ t ≤ T−1, for each selectable action y_t and each reachable bond state x_{t+1}, the expected gain read for the corresponding combination of action and bond states is added to the sum, over each reachable external-factor state z_{t+1}, of the maximum expected total gain already calculated for the corresponding node multiplied by the external factor transition probability that was read; the result is multiplied by the bond state transition probability read for the corresponding combination of external-factor state, action, and bond states; these products are summed to obtain the expected total gain; and the action y_t with the maximum expected total gain is determined to be the optimal action. This is repeated until the nodes with t = 1 are reached.

Furthermore, the bond portfolio control program corresponding to the present invention may be characterized as follows. In the initial condition receiving step, input of bond identification information that identifies whether the target bond is a new bond or an existing bond is received. The expected gain storage means stores, separately from the expected gains for existing bonds and for each combination of bond state x_1, bond state x_2, and action y_1, the expected gain obtained for a new bond when action y_1 is selected at the first time point 1 and the bond state x_1 at time point 1 transitions to bond state x_2 at the next time point 2. In the optimal action determination step, when bond identification information indicating a new bond was received in the initial condition receiving step, the expected gain for new bonds is read from the expected gain storage means for the node with t = 1, that expected gain is applied to calculate the expected total gain, and the optimal action is determined.

The present invention can also be specified as a bond portfolio control method executed by the bond portfolio control device according to the present invention.

The bond portfolio control method corresponding to the present invention is a bond portfolio control method used to select an optimal policy for controlling a bond. It is performed by a bond portfolio control device comprising:

transition probability storage means that stores, as a bond state transition probability for each combination of bond state x_t, bond state x_{t+1}, external-factor state z_t, and action y_t, the probability that the bond state x_t at time point t transitions to bond state x_{t+1} at the next time point t+1 when action y_t is selected under external-factor state z_t at time point t, and that stores, as an external factor transition probability for each combination of external-factor state z_t and external-factor state z_{t+1}, the probability that the external-factor state z_t at time point t transitions to external-factor state z_{t+1} at the next time point t+1; and

expected gain storage means that stores, for each combination of bond state x_t, bond state x_{t+1}, and action y_t, the expected gain obtained when action y_t is selected at time point t and the bond state x_t at time point t transitions to bond state x_{t+1} at the next time point t+1.

The method comprises:

an initial condition receiving step in which the bond portfolio control device receives input of an initial bond state x_1, an initial external-factor state z_1, and a control period length T;

a DP graph creation step in which the bond portfolio control device creates a DP graph in which, starting from the combination of the initial bond state x_1 and the initial external-factor state z_1 received in the initial condition receiving step, the transitions of the combination of bond state x_t and external-factor state z_t at each time point t (1 ≤ t ≤ T) up to the received control period length T are expanded into individual nodes;

an optimal action determination step in which the bond portfolio control device, for the combination of bond state x_t and external-factor state z_t at each node of the DP graph created in the DP graph creation step, reads the corresponding bond state transition probability and external factor transition probability from the transition probability storage means and the corresponding expected gain from the expected gain storage means, then, working in order from the terminal nodes (t = T) of the DP graph, calculates the expected total gain of each action selectable at each node from the transition probabilities and expected gains read for the combinations corresponding to that action, and determines the action with the maximum expected total gain to be the optimal action; and

an optimal policy output step in which the bond portfolio control device outputs the optimal actions determined in the optimal action determination step for all nodes of the DP graph as the optimal policy for controlling the bond.

In the optimal action determination step, for each node with t = T, for each selectable action y_t and each reachable bond state x_{t+1}, the expected gain read for the corresponding combination of action and bond states is multiplied by the bond state transition probability read for the corresponding combination of external-factor state, action, and bond states; these products are summed to obtain the expected total gain; and the action y_t with the maximum expected total gain is determined to be the optimal action.

For each node with 1 ≤ t ≤ T−1, for each selectable action y_t and each reachable bond state x_{t+1}, the expected gain read for the corresponding combination of action and bond states is added to the sum, over each reachable external-factor state z_{t+1}, of the maximum expected total gain already calculated for the corresponding node multiplied by the external factor transition probability that was read; the result is multiplied by the bond state transition probability read for the corresponding combination of external-factor state, action, and bond states; these products are summed to obtain the expected total gain; and the action y_t with the maximum expected total gain is determined to be the optimal action. This is repeated until the nodes with t = 1 are reached.

Furthermore, the bond portfolio control method corresponding to the present invention may be characterized as follows. In the initial condition receiving step, the bond portfolio control device receives input of bond identification information that identifies whether the target bond is a new bond or an existing bond. The expected gain storage means stores, separately from the expected gains for existing bonds and for each combination of bond state x_1, bond state x_2, and action y_1, the expected gain obtained for a new bond when action y_1 is selected at the first time point 1 and the bond state x_1 at time point 1 transitions to bond state x_2 at the next time point 2. In the optimal action determination step, when bond identification information indicating a new bond was received in the initial condition receiving step, the bond portfolio control device reads the expected gain for new bonds from the expected gain storage means for the node with t = 1, applies that expected gain to calculate the expected total gain, and determines the optimal action.

According to the present invention, when selecting an optimal policy such as the setting of a credit limit in the control of bonds such as loan receivables, it becomes possible to select actions that reflect, in the transition probabilities of future credit ratings, both the size of the credit limit that has been set and the state of external factors such as the economic environment. As a result, the evaluation method based on a Markov chain model, conventionally used in lending decisions for companies, can be applied in a form that is also suitable for lending to individuals.

FIG. 1 is a diagram showing the principle configuration of the bond portfolio control device according to the present invention.
FIG. 2 is a flowchart explaining the principle of optimal policy output by the bond portfolio control device according to the present invention.
FIG. 3 is a block diagram showing an example of the configuration of the bond portfolio control device according to the present invention.
FIG. 4 is a flowchart showing the operation of the optimal policy calculation unit of the bond portfolio control device according to the present invention.
FIG. 5 is a flowchart showing the operation of the action determination unit of the bond portfolio control device according to the present invention.
FIG. 6 is a diagram showing an example of a DP graph for an existing bond as created in conventional bond portfolio control.
FIG. 7 is a diagram showing an example of a DP graph for an existing bond created by the bond portfolio control device according to the present invention.
FIG. 8 is a diagram showing an example of a DP graph for a new bond created by the bond portfolio control device according to the present invention.
FIG. 9 is a diagram showing an example of the bond state transition probabilities stored in a transition probability table in conventional bond portfolio control.
FIG. 10 is a diagram showing an example of the bond state transition probabilities stored in the transition probability table of the action determination unit of the bond portfolio control device according to the present invention.
FIG. 11 is a diagram showing an example of the external factor state transition probabilities stored in the transition probability table of the action determination unit of the bond portfolio control device according to the present invention.
FIG. 12 is a diagram showing an example of the expected gain information stored in the expected gain table of the action determination unit of the bond portfolio control device according to the present invention.
FIG. 13 is a diagram showing an example of the optimal actions selected for each node of the DP graph by the bond portfolio control device according to the present invention.

Embodiments for carrying out the present invention are described in detail below with reference to the drawings. The following description shows one example of an embodiment, and the present invention is not limited to it. For example, the embodiment below uses state transition probabilities defined for each combination of external factor state and selected action, so that the optimal action is determined in a way that reflects both how external factors such as the economic environment and how actions such as the credit limit that is set affect the bond state transition probabilities. It is also possible to use state transition probabilities that differ only by the external factor state, or only by the selected action, and thereby determine an optimal action that reflects only the effect of external factors or only the effect of the action, respectively.

The present invention is well suited to setting credit limits for the individual customers of a financial institution, taking the credit rating of the individual borrower as the bond state, the condition of the economic environment and the like as the external factor state, the setting of the credit limit for the individual customer as the action, and the revenue that the financial institution obtains from lending to the individual customer as the expected gain.

FIG. 1 shows the principle configuration of the bond portfolio control device according to the present invention. Given the initial bond state, the initial external factor state, and the control period length, the device uses the expected gain under each action in each bond state and each external factor state at each time point to output an optimal policy that is guaranteed to maximize the expected total gain over the control period.

When the initial bond state, the initial external factor state, and the control period length are input from the input device, the optimal policy calculation unit accepts them and, in cooperation with the action determination unit, outputs the policy that maximizes the expected total gain over the control period. Given a bond state, an external factor state, and a time point, the action determination unit outputs the optimal action that maximizes the expected total gain from that time point onward in that state, together with the maximum value of that expected total gain.

The flowchart of FIG. 2 shows the principle of optimal policy output by the bond portfolio control device according to the present invention. The optimal policy output method comprises: a step of inputting the initial bond state, the initial external factor state, and the control period length (S10); a step of creating, as a DP graph, the dynamic programming (DP) problem that must be solved to find the policy maximizing the expected total gain over the control period (S20); a step of solving the DP problem by working backward from the final year of the control period along the DP graph created in S20 (S30); a step of outputting, for each bond state and each external factor state at each time point supplied from S30, the optimal action that maximizes the expected total gain from that time point onward together with that maximum value (S40); a step of judging whether the DP problem of the DP graph created in S20 has been completely solved (S50); and a step of outputting the optimal policy that is guaranteed to maximize the expected total gain over the control period (S60).

As described above, the optimal policy output method of the bond portfolio control device according to the present invention uses dynamic programming (DP) to maximize, for each bond state and each external factor state at each time point, the expected total gain from that time point onward, and is thereby guaranteed to ultimately maximize the expected total gain over the control period.

FIG. 3 shows an example of the system configuration of the bond portfolio control device according to the present invention. The bond portfolio control device 100 includes an optimal policy calculation unit 110 and an action determination unit 120. The optimal policy calculation unit 110 consists of a DP graph creator 111 and a DP implementer 112; it accepts data input from an input device 200 and outputs the calculation results to an output device 300. The action determination unit 120 consists of an action determiner 121, a transition probability table 122, and an expected gain table 123.

The hardware configuration of the bond portfolio control device 100 is not particularly limited. It is a computer equipped with a CPU, ROM, RAM, and HDD. To execute the prescribed processing with an application program stored on the HDD, the basic programs for hardware control, such as input control and output control, stored in the ROM are started, and the CPU performs the computation while the RAM serves as the work area of the application program.

The optimal policy calculation unit 110 and the action determination unit 120, that is, the DP graph creator 111 and DP implementer 112 of the optimal policy calculation unit 110 and the action determiner 121 of the action determination unit 120, are all defined functionally. Each function is realized when the corresponding application program is read from the HDD into the RAM and executed by the CPU.

The transition probability table 122 and the expected gain table 123 of the action determination unit 120 are each allocated a predetermined storage area on the HDD. A keyboard, mouse, or the like is used as the input device 200, and a display or printer is used as the output device 300.

The bond portfolio control device 100 may consist of a single computer or of several computers connected over a network. For example, the transition probability table 122 and the expected gain table 123 may be held on a database server and accessed from a terminal that has the optimal policy calculation unit 110, so that the two together operate as the bond portfolio control device 100.

The operation of the optimal policy calculation unit 110 of the bond portfolio control device 100 is described using the flowchart of FIG. 4. First, the initial bond state x_1, the initial external factor state z_1, and the control period length T are input from the input device 200 to the DP graph creator 111 (S70).

Regarding the values input here: when the target bond is an existing bond (an existing customer), x_1 is an element of the state set S of credit ratings for existing bonds (with elements s_1, s_2, ...). When the target bond is a new bond (a new customer), x_1 is an element of the state set of credit ratings for new bonds (with elements s'_1, s'_2, ...). z_1 is an element of the state set Θ of external factors, such as an index representing the economic environment (with elements θ_1, θ_2, ...).

When the initial bond state x_1, the initial external factor state z_1, and the control period length T have been input, the DP graph creator 111 creates a DP graph for solving the dynamic programming (DP) problem of maximizing the expected total gain over T years (S71). For example, for the control of an existing bond with x_1 = s_1, z_1 = θ_1, |S| = 2, and |Θ| = 2, it creates a DP graph like the one shown in FIG. 7.

In the DP graph created here, year 1 is represented by a single node consisting of the initial state of the existing bond and the initial state of the external factor. For years 2 through T, every combination of an existing-bond state and an external factor state is expanded into its own node, so that the graph represents the transitions of these combinations from year 1 to year T. This graph is the preparation for finding the optimal policy that maximizes the expected total gain over T years, which is done by solving the T-year Markov decision process problem with dynamic programming (DP), working backward from the terminal nodes of year T.
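The layered structure described above can be sketched in Python as follows. This is a minimal illustration, not the patent's implementation: the function name `build_dp_graph`, the list-of-layers representation, and the string labels `"s1"`, `"th1"` (standing in for s_1 and θ_1) are all assumptions made for this example.

```python
from itertools import product


def build_dp_graph(x1, z1, bond_states, ext_states, T):
    """Return a list of layers; layers[t - 1] holds the (x_t, z_t) nodes of year t."""
    if T < 1:
        raise ValueError("T must be at least 1")
    # Year 1: only the given initial combination of bond state and external factor state.
    layers = [[(x1, z1)]]
    for _ in range(2, T + 1):
        # Years 2..T: every combination of bond state and external factor state.
        layers.append(list(product(bond_states, ext_states)))
    return layers


# Example matching the text: x_1 = s_1, z_1 = theta_1, |S| = 2, |Theta| = 2, with T = 3.
graph = build_dp_graph("s1", "th1", ["s1", "s2"], ["th1", "th2"], T=3)
```

For a new bond, the same sketch applies with the year-1 node labeled by the new-bond state (for example `"s'1"`), since only the first layer differs.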

A characteristic of this DP graph is that each expanded node is a combination of a bond state and an external factor state, so that every node reflects the external factor state. In the conventional approach to bond portfolio control used for lending decisions for companies, credit rating transition probabilities are not modeled as varying with external factors such as the economic environment, so the DP graph created under that approach would look like the example in FIG. 6.

When a new bond is the target, with x_1 = s'_1, z_1 = θ_1, |S| = 2, and |Θ| = 2, a DP graph like the one shown in FIG. 8 is created. A new bond in its first year, issued to a new customer, is often handled differently from an existing bond, for example by keeping the credit limit low, and it is desirable to use values different from those of existing bonds in the expected gain table described later. For this reason, the year-1 node is a combination of a new-bond state, distinct from the existing-bond states, and an external factor state. From year 2 onward a new bond is treated as an existing bond, so the nodes are combinations of existing-bond states and external factor states, as in FIG. 7.

Next, the DP implementer 112 finds the optimal policy that maximizes the expected total gain over T years by solving the T-year Markov decision process problem with dynamic programming. Specifically, starting from the nodes at the end of the DP graph (year T) and proceeding in order, it obtains, in cooperation with the action determiner 121, the optimal action (the credit limit to be set) at each node and the maximum expected total gain from that node onward. To do so, it outputs to the action determiner 121, one node at a time, the node's time point t (a natural number indicating the year), bond state x_t (the bond state in year t), and external factor state z_t (the external factor state in year t) (S72).

When the DP implementer 112 outputs the time point t, bond state x_t, and external factor state z_t of a node to the action determiner 121, the action determiner 121 determines the optimal action (the credit limit to be set) at that node. The DP implementer 112 then receives the determined optimal action together with the maximum expected total gain from that node onward (S73).

Each time the optimal action and the maximum expected total gain for one node have been received, the DP implementer 112 judges whether processing has finished for all nodes, back to the year-1 node of the DP graph (S74). If not, it repeats the same processing for the next node. If so, it outputs, as the optimal policy, the optimal action (the credit limit to be set) at every node of the DP graph together with the maximum expected total gain from that node onward (S75).
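The loop over steps S72 to S75 can be sketched as a backward sweep over the layers of the DP graph. This is a hedged illustration only: `solve_backward` and the callback signature `decide(t, x, z, next_values)` are hypothetical names standing in for the DP implementer 112 and the action determiner 121, and `toy_decide` is a placeholder rule, not the calculation described in this document.

```python
def solve_backward(layers, decide):
    """Visit nodes from year T back to year 1 and collect the policy.

    layers[t - 1] lists the (x_t, z_t) nodes of year t.
    decide(t, x, z, next_values) returns (best_action, best_value) for one node,
    where next_values maps each year t+1 node to its maximum expected total gain.
    """
    T = len(layers)
    policy = {}
    next_values = {}  # empty for the terminal year T
    for t in range(T, 0, -1):
        current = {}
        for x, z in layers[t - 1]:
            action, value = decide(t, x, z, next_values)   # S72, S73
            policy[(t, x, z)] = (action, value)
            current[(x, z)] = value
        next_values = current                              # reused by year t-1
    return policy                                          # S75


def toy_decide(t, x, z, next_values):
    # Placeholder rule for demonstration: gain of 1 per year plus the best later value.
    return "g1", 1.0 + max(next_values.values(), default=0.0)


layers = [[("s1", "th1")], [("s1", "th1"), ("s2", "th1")]]
policy = solve_backward(layers, toy_decide)
```

Keeping only the values of the immediately following year is sufficient here, because each node's calculation depends only on the nodes of year t+1.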

The output format of the optimal policy is not particularly limited. The information to be output is, as in the example of FIG. 13, the optimal action (the credit limit to be set) determined for each node and the maximum expected total gain from that node onward. In the example of FIG. 13, the set of actions (credit limits to be set) is Y, with elements g_1 and g_2, so that |Y| = 2, and y_1 = g_1. By outputting this information in a chart or similar format that presents it at a glance, a financial institution that sets credit limits can see how the credit limit should be determined as the optimal action for the customer's credit rating and the economic conditions at each time point.

The operation of the action determination unit 120 of the bond portfolio control device 100 is described using the flowchart of FIG. 5. First, the action determiner 121 of the action determination unit 120 accepts the time point t (a natural number indicating the year), the bond state x_t (the bond state in year t), and the external factor state z_t (the external factor state in year t) that the DP implementer 112 of the optimal policy calculation unit 110 output in S72 of FIG. 4 (S80).

Next, it reads from the transition probability table 122 the state transition probability that the bond state in the next year (time point) becomes x_{t+1} when action y_t is selected under the condition that the bond state is x_t and the external factor state is z_t (S81).

FIG. 10 shows an example of the transition probability table 122 used here. It gives the probability that bond state s_1 transitions to s_1 or s_2 in the next year (time point). In the conventional method, as in the example of FIG. 9, differences in the state transition probability due to the external factor state or the selected action were not reflected. In the example of FIG. 10, by contrast, a state transition probability is stored for each combination of external factor state (θ_1 or θ_2) and selected action (g_1 or g_2). The optimal action can therefore be determined in a way that reflects how the bond state transition probabilities differ with actions such as the credit limit that is set and with external factors such as the economic environment.
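One way to picture such a table is as a mapping from (bond state, external factor state, action) to a distribution over next bond states. The sketch below is an assumption for illustration: the dictionary layout, the name `check_rows`, and every probability value are hypothetical and are not taken from FIG. 10.

```python
import math

# Hypothetical values: P(x_{t+1} | x_t, z_t, y_t), keyed by (x_t, z_t, y_t).
bond_transition = {
    ("s1", "th1", "g1"): {"s1": 0.9, "s2": 0.1},
    ("s1", "th1", "g2"): {"s1": 0.8, "s2": 0.2},
    ("s1", "th2", "g1"): {"s1": 0.7, "s2": 0.3},
    ("s1", "th2", "g2"): {"s1": 0.6, "s2": 0.4},
}


def check_rows(table):
    """Raise ValueError if any conditional distribution does not sum to 1."""
    for key, row in table.items():
        total = sum(row.values())
        if not math.isclose(total, 1.0, abs_tol=1e-9):
            raise ValueError(f"row {key} sums to {total}")


check_rows(bond_transition)
```

A conventional table in the sense of FIG. 9 would correspond to keying on the bond state alone, which is why it cannot express the dependence on θ or g.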

The transition probability table 122 also stores the transition probabilities of the external factor state, as in the example of FIG. 11. Because these are used when reusing partial optimal solutions, as described later, they are also read from the transition probability table 122 at this point.

Next, it reads from the expected gain table 123 the expected gain for the case in which action y_t is selected in bond state x_t and the bond state in the next year (time point) becomes x_{t+1} (S82).

FIG. 12 shows an example of the expected gain table 123 used here. For each pattern of bond state transition, it stores the gain expected when g_1 or g_2 is selected as the action. Multiplying each expected gain by the corresponding bond state transition probability weights that gain by the probability that it actually arises.

Next, the expected total gain is calculated for each selectable action (g_1 or g_2) (S83), and the action with the maximum expected total gain is determined as the optimal action (S84). The maximum expected total gain is computed as follows.

First, when the accepted time point t is the final time point of the period, t = T, the maximum expected total gain from that node onward is obtained by the following equation.
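The equation itself does not survive in this text. Based on the term definitions in this description and the wording of the claims, it can plausibly be written as below. The symbols V, p, and r are notation assumed here (the maximum expected total gain, the bond state transition probability, and the expected gain, respectively), not necessarily the notation of the original equation.

```latex
V_T(x_T, z_T) \;=\; \max_{y_T \in Y} \; \sum_{x_{T+1} \in S}
  p(x_{T+1} \mid x_T, z_T, y_T)\, r(x_T, y_T, x_{T+1})
```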

Here, y_T denotes the action selected in year T (the credit limit to be set) and, as explained above, is an element of the action set Y. The bond state transition probability in this equation is the probability that, when action y_T is selected with bond state x_T and external factor state z_T, the bond state in the following year becomes x_{T+1}; it is read from the transition probability table 122. The expected gain in this equation is the gain expected when action y_T is selected in bond state x_T and the bond state in the following year becomes x_{T+1}; it is read from the expected gain table 123.

Next, when the accepted time point t satisfies 1 ≤ t ≤ T−1, the maximum expected total gain from that node onward is obtained by the following equation.
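This equation is also missing from the text. A reconstruction consistent with the claims, which add the expected gain to the probability-weighted maximum of the following year before weighting by the bond state transition probability, is given below. The symbols are assumed notation: V_t is the maximum expected total gain at a year-t node, p(x_{t+1} | x_t, z_t, y_t) the bond state transition probability, p(z_{t+1} | z_t) the external factor transition probability, and r the expected gain.

```latex
V_t(x_t, z_t) \;=\; \max_{y_t \in Y} \; \sum_{x_{t+1} \in S}
  p(x_{t+1} \mid x_t, z_t, y_t)
  \Bigl[\, r(x_t, y_t, x_{t+1})
    + \sum_{z_{t+1} \in \Theta} p(z_{t+1} \mid z_t)\, V_{t+1}(x_{t+1}, z_{t+1}) \Bigr],
  \qquad 1 \le t \le T-1
```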

Here, the external factor transition probability is the state transition probability between external factor states, read from the transition probability table 122. The remaining term is the maximum expected total gain at the node of time point t+1 whose bond state is x_{t+1} and whose external factor state is z_{t+1}, which was obtained before the node at time point t is processed. Because the present invention uses dynamic programming (DP), partial optimal solutions are reused in this way.

The optimal action at the node when t = T is then defined by the following equation.

Likewise, the optimal action at the node when 1 ≤ t ≤ T−1 is defined by the following equation.
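Neither optimal-action equation is reproduced in this text. Assuming each simply takes the maximizing action of the corresponding expected total gain, they can be sketched as follows. The symbol d*_t for the optimal action and the symbols V, p, and r (maximum expected total gain, transition probability, expected gain) are assumed notation.

```latex
d^{*}_{T}(x_T, z_T) \;=\; \operatorname*{arg\,max}_{y_T \in Y} \; \sum_{x_{T+1} \in S}
  p(x_{T+1} \mid x_T, z_T, y_T)\, r(x_T, y_T, x_{T+1})

d^{*}_{t}(x_t, z_t) \;=\; \operatorname*{arg\,max}_{y_t \in Y} \; \sum_{x_{t+1} \in S}
  p(x_{t+1} \mid x_t, z_t, y_t)
  \Bigl[\, r(x_t, y_t, x_{t+1})
    + \sum_{z_{t+1} \in \Theta} p(z_{t+1} \mid z_t)\, V_{t+1}(x_{t+1}, z_{t+1}) \Bigr],
  \qquad 1 \le t \le T-1
```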

Once the optimal action that maximizes the expected total gain, and that maximum value, have been determined in this way for the time point t (a natural number indicating the year), bond state x_t (the bond state in year t), and external factor state z_t (the external factor state in year t) supplied by the DP implementer 112, they are output to the DP implementer 112 of the optimal policy calculation unit 110 (S85). The DP implementer 112 receives them in S73 of FIG. 4.
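The calculation performed by the action determiner in S80 to S85 can be sketched as follows. This is an illustrative reading of the description and claims, not the patented implementation: the function name `decide`, the dictionary layouts, and all probabilities and gains are hypothetical, with `"th1"` and `"th2"` standing in for θ_1 and θ_2.

```python
def decide(t, T, x, z, actions, p_bond, p_ext, gain, next_values):
    """Return (best_action, best_value) for the node (t, x, z).

    p_bond[(x, z, y)][x2] : bond state transition probability
    p_ext[z][z2]          : external factor transition probability
    gain[(x, y, x2)]      : expected gain
    next_values[(x2, z2)] : maximum expected total gain of the year t+1 nodes
    """
    best_action, best_value = None, float("-inf")
    for y in actions:
        total = 0.0
        for x2, px in p_bond[(x, z, y)].items():
            future = 0.0
            if t < T:  # reuse the partial optimal solutions of year t+1
                future = sum(pz * next_values[(x2, z2)] for z2, pz in p_ext[z].items())
            total += px * (gain[(x, y, x2)] + future)
        if total > best_value:
            best_action, best_value = y, total
    return best_action, best_value


# Hypothetical tables for two bond states, two external factor states, two actions.
ACTIONS = ("g1", "g2")
P_BOND = {
    ("s1", "th1", "g1"): {"s1": 0.9, "s2": 0.1},
    ("s1", "th1", "g2"): {"s1": 0.8, "s2": 0.2},
    ("s1", "th2", "g1"): {"s1": 0.7, "s2": 0.3},
    ("s1", "th2", "g2"): {"s1": 0.5, "s2": 0.5},
    ("s2", "th1", "g1"): {"s1": 0.4, "s2": 0.6},
    ("s2", "th1", "g2"): {"s1": 0.2, "s2": 0.8},
    ("s2", "th2", "g1"): {"s1": 0.2, "s2": 0.8},
    ("s2", "th2", "g2"): {"s1": 0.1, "s2": 0.9},
}
P_EXT = {"th1": {"th1": 0.6, "th2": 0.4}, "th2": {"th1": 0.3, "th2": 0.7}}
GAIN = {
    ("s1", "g1", "s1"): 10.0, ("s1", "g1", "s2"): -20.0,
    ("s1", "g2", "s1"): 30.0, ("s1", "g2", "s2"): -60.0,
    ("s2", "g1", "s1"): 5.0, ("s2", "g1", "s2"): -10.0,
    ("s2", "g2", "s1"): 10.0, ("s2", "g2", "s2"): -40.0,
}

T = 2
# Terminal year first, then the year-1 node using the stored year-2 maxima.
year2 = {(x, z): decide(2, T, x, z, ACTIONS, P_BOND, P_EXT, GAIN, {})
         for x in ("s1", "s2") for z in ("th1", "th2")}
values2 = {node: value for node, (_, value) in year2.items()}
best1 = decide(1, T, "s1", "th1", ACTIONS, P_BOND, P_EXT, GAIN, values2)
```

With these made-up numbers, the larger limit g2 is chosen at the year-2 node (s1, th1) but the smaller limit g1 at (s1, th2), which illustrates how the external factor state can change the optimal action for the same bond state.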

100 Bond portfolio control device
110 Optimal policy calculation unit
111 DP graph creator
112 DP implementer
120 Action determination unit
121 Action determiner
122 Transition probability table
123 Expected gain table
200 Input device
300 Output device

Claims

A bond portfolio control device used to select an optimal policy for the control of bonds, comprising:
initial condition accepting means for accepting input of an initial bond state x_1, an initial external factor state z_1, and a control period length T;
DP graph creating means for creating a DP graph that starts from the combination of the initial bond state x_1 and the initial external factor state z_1 accepted by the initial condition accepting means, and in which the transitions of the combinations of bond state x_t and external factor state z_t at each time point t (1 ≤ t ≤ T) up to the accepted control period length T are expanded into nodes;
transition probability storage means for storing, as bond state transition probabilities, for each combination of bond state x_t, bond state x_{t+1}, external factor state z_t, and action y_t, the probability that the bond state x_t at a time point t transitions to the bond state x_{t+1} at the next time point t+1 when action y_t is selected under external factor state z_t at time point t, and for storing, as external factor transition probabilities, for each combination of external factor state z_t and external factor state z_{t+1}, the probability that the external factor state z_t at a time point t transitions to the external factor state z_{t+1} at the next time point t+1;
expected gain storage means for storing, for each combination of bond state x_t, bond state x_{t+1}, and action y_t, the expected gain obtained when action y_t is selected at a time point t and the bond state x_t at time point t transitions to the bond state x_{t+1} at the next time point t+1;
optimal action determining means for, with respect to the combination of bond state x_t and external factor state z_t at each node of the DP graph created by the DP graph creating means, reading from the transition probability storage means the bond state transition probability with which the bond state x_t at time point t transitions to the bond state x_{t+1} at the next time point t+1 when action y_t is selected under external factor state z_t at time point t, and the external factor transition probability with which the external factor state z_t at time point t transitions to the external factor state z_{t+1} at the next time point t+1, reading from the expected gain storage means the expected gain obtained when action y_t is selected at time point t and the bond state x_t at time point t transitions to the bond state x_{t+1} at the next time point t+1, computing, in order from the terminal nodes (t = T) of the DP graph, the expected total gain of each selectable action at each node from the transition probabilities read from the transition probability storage means for the combinations corresponding to that action and the expected gains read from the expected gain storage means for the combinations corresponding to that action, and determining the action with the maximum expected total gain as the optimal action; and
optimal policy output means for outputting, as the optimal policy for controlling the bond, the optimal actions determined by the optimal action determining means for all nodes of the DP graph,
wherein the optimal action determining means:
for each node at t = T, for each selectable action y_t,
multiplies, for each reachable bond state x_{t+1},
the expected gain read for the corresponding combination of action and bond states by the bond state transition probability read for the corresponding combination of external factor state, action, and bond states, computes the expected total gain as the sum of these products, and determines the action y_t with the maximum expected total gain as the optimal action; and
for each node at 1 ≤ t ≤ T−1, for each selectable action y_t,
multiplies, for each reachable bond state x_{t+1},
the sum of the expected gain read for the corresponding combination of action and bond states and the total, over each reachable external factor state z_{t+1}, of the maximum expected total gain computed for the corresponding node multiplied by the external factor transition probability read, by the bond state transition probability read for the corresponding combination of external factor state, action, and bond states, computes the expected total gain as the sum of these products, and determines the action y_t with the maximum expected total gain as the optimal action, repeating this until the node at t = 1 is reached.

The bond portfolio control device according to claim 1, wherein:
The initial condition accepting means accepts input of bond identification information identifying whether the target bond is a new bond or an existing bond,
The expected gain storage means stores, for each combination of bond state x_1, bond state x_2, and action y_1 and separately from the expected gain for an existing bond, the expected gain obtained for a new bond when action y_1 is selected at the first time point 1 and the bond state x_1 at time point 1 transitions to bond state x_2 at the next time point 2,
The optimum action determining means, when the initial condition accepting means has accepted bond identification information indicating a new bond, reads the expected gain for a new bond from the expected gain storage means for the nodes at t = 1, calculates the expected total gain using that expected gain, and determines the optimum action.
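
The new-bond handling in the claim above amounts to switching the expected-gain table at the first time point only. A minimal Python sketch follows; the function name `select_gain`, the table layout, and the numbers are hypothetical and are not taken from the patent.

```python
def select_gain(t, x, x2, y, r_existing, r_new, is_new_bond):
    """Return the expected gain for the transition x -> x2 under action y.

    For a new bond, the separately stored table r_new is used at t = 1 only;
    every other case falls back to the table for existing bonds.
    Both tables are assumed to be keyed by (x, x2, y).
    """
    if is_new_bond and t == 1:
        return r_new[(x, x2, y)]
    return r_existing[(x, x2, y)]


# Hypothetical tables: originating a new loan carries an extra up-front cost.
r_existing = {("A", "A", "lend"): 10.0}
r_new = {("A", "A", "lend"): 7.0}

print(select_gain(1, "A", "A", "lend", r_existing, r_new, True))
print(select_gain(2, "A", "A", "lend", r_existing, r_new, True))
```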

A bond portfolio control program used to select an optimal policy for controlling a bond,
The program causing a bond portfolio control device, which has a transition probability storage means that stores, as a bond state transition probability for each combination of bond state x_t, bond state x_{t+1}, external-factor state z_t, and action y_t, the probability that, when action y_t is selected under external-factor state z_t at a time t, the bond state x_t at time t transitions to bond state x_{t+1} at the next time t+1, and stores, as an external-factor transition probability for each combination of external-factor state z_t and external-factor state z_{t+1}, the probability that the external-factor state z_t at a time t transitions to external-factor state z_{t+1} at the next time t+1, and an expected gain storage means that stores, for each combination of bond state x_t, bond state x_{t+1}, and action y_t, the expected gain obtained when action y_t is selected at a time t and the bond state x_t at time t transitions to bond state x_{t+1} at the next time t+1, to execute:
An initial condition accepting step of accepting input of the initial bond state x_1, the initial external-factor state z_1, and the control period length T,
A DP graph creation step of creating a DP graph that starts from the combination of the initial bond state x_1 and the initial external-factor state z_1 accepted in the initial condition accepting step, and that expands into nodes the transitions of the combination of bond state x_t and external-factor state z_t at each time t (1 ≦ t ≦ T) up to the control period length T accepted in the initial condition accepting step;
An optimum action determining step of, for the combination of bond state x_t and external-factor state z_t at each node of the DP graph created in the DP graph creation step, reading from the transition probability storage means the bond state transition probability that, when action y_t is selected under external-factor state z_t at time t, the bond state x_t at time t transitions to bond state x_{t+1} at the next time t+1, and the external-factor transition probability that the external-factor state z_t at time t transitions to external-factor state z_{t+1} at the next time t+1; reading from the expected gain storage means the expected gain obtained when action y_t is selected at time t and the bond state x_t at time t transitions to bond state x_{t+1} at the next time t+1; calculating, in order from the terminal nodes (t = T) of the DP graph, the expected total gain of each action selectable at each node from the transition probabilities of the corresponding combination read from the transition probability storage means and the expected gains of the corresponding combination read from the expected gain storage means; and determining the action with the maximum expected total gain as the optimum action;
An optimal policy output step of outputting, as the optimal policy for controlling the bond, the optimum actions determined in the optimum action determining step for all nodes of the DP graph,
In the optimum action determining step,
For each node at t = T and each selectable action y_t,
For each bond state x_{t+1} to which a transition is possible,
The expected gain read for the corresponding combination of action and bond states is multiplied by the bond state transition probability read for the corresponding combination of external-factor state, action, and bond states; the products are summed to give the expected total gain; and the action y_t with the maximum expected total gain is determined as the optimum action,
For each node at 1 ≦ t ≦ T-1 and each selectable action y_t,
For each bond state x_{t+1} to which a transition is possible,
The expected gain read for the corresponding combination of action and bond states is added to the sum, over each external-factor state z_{t+1} to which a transition is possible, of the maximum expected total gain calculated for the corresponding node multiplied by the external-factor transition probability; the result is multiplied by the bond state transition probability read for the corresponding combination of external-factor state, action, and bond states; the products are summed to give the expected total gain; and the action y_t with the maximum expected total gain is determined as the optimum action, this determination being repeated back to the nodes at t = 1; a bond portfolio control program characterized by the above.
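
Written as a recursion, the two cases of the optimum action determining step read as follows. The symbols V, p, q, and r are notation assumed here for illustration (maximum expected total gain, bond state transition probability, external-factor transition probability, and expected gain); the claim text itself names no such symbols.

```latex
% Terminal nodes (t = T)
V_T(x_T, z_T) = \max_{y_T} \sum_{x_{T+1}}
    p(x_{T+1} \mid x_T, z_T, y_T)\, r(x_T, x_{T+1}, y_T)

% Earlier nodes (1 \le t \le T-1)
V_t(x_t, z_t) = \max_{y_t} \sum_{x_{t+1}}
    p(x_{t+1} \mid x_t, z_t, y_t)
    \Bigl[ r(x_t, x_{t+1}, y_t)
         + \sum_{z_{t+1}} q(z_{t+1} \mid z_t)\, V_{t+1}(x_{t+1}, z_{t+1}) \Bigr]
```

The optimum action at each node is the y_t that attains the maximum, and the optimal policy is the collection of these actions over all nodes.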

The bond portfolio control program according to claim 3, wherein:
In the initial condition accepting step, input of bond identification information identifying whether the target bond is a new bond or an existing bond is accepted,
The expected gain storage means stores, for each combination of bond state x_1, bond state x_2, and action y_1 and separately from the expected gain for an existing bond, the expected gain obtained for a new bond when action y_1 is selected at the first time point 1 and the bond state x_1 at time point 1 transitions to bond state x_2 at the next time point 2,
In the optimum action determining step, when bond identification information indicating a new bond has been accepted in the initial condition accepting step, the expected gain for a new bond is read from the expected gain storage means for the nodes at t = 1, the expected total gain is calculated using that expected gain, and the optimum action is determined.

A bond portfolio control method used to select an optimal policy for controlling a bond, comprising:
An initial condition accepting step in which a bond portfolio control device, which has a transition probability storage means that stores, as a bond state transition probability for each combination of bond state x_t, bond state x_{t+1}, external-factor state z_t, and action y_t, the probability that, when action y_t is selected under external-factor state z_t at a time t, the bond state x_t at time t transitions to bond state x_{t+1} at the next time t+1, and stores, as an external-factor transition probability for each combination of external-factor state z_t and external-factor state z_{t+1}, the probability that the external-factor state z_t at a time t transitions to external-factor state z_{t+1} at the next time t+1, and an expected gain storage means that stores, for each combination of bond state x_t, bond state x_{t+1}, and action y_t, the expected gain obtained when action y_t is selected at a time t and the bond state x_t at time t transitions to bond state x_{t+1} at the next time t+1, accepts input of the initial bond state x_1, the initial external-factor state z_1, and the control period length T,
A DP graph creation step in which the bond portfolio control device creates a DP graph that starts from the combination of the initial bond state x_1 and the initial external-factor state z_1 accepted in the initial condition accepting step, and that expands into nodes the transitions of the combination of bond state x_t and external-factor state z_t at each time t (1 ≦ t ≦ T) up to the control period length T accepted in the initial condition accepting step;
An optimum action determining step in which the bond portfolio control device, for the combination of bond state x_t and external-factor state z_t at each node of the DP graph created in the DP graph creation step, reads from the transition probability storage means the bond state transition probability that, when action y_t is selected under external-factor state z_t at time t, the bond state x_t at time t transitions to bond state x_{t+1} at the next time t+1, and the external-factor transition probability that the external-factor state z_t at time t transitions to external-factor state z_{t+1} at the next time t+1; reads from the expected gain storage means the expected gain obtained when action y_t is selected at time t and the bond state x_t at time t transitions to bond state x_{t+1} at the next time t+1; calculates, in order from the terminal nodes (t = T) of the DP graph, the expected total gain of each action selectable at each node from the transition probabilities of the corresponding combination read from the transition probability storage means and the expected gains of the corresponding combination read from the expected gain storage means; and determines the action with the maximum expected total gain as the optimum action; and
An optimal policy output step in which the bond portfolio control device outputs, as the optimal policy for controlling the bond, the optimum actions determined in the optimum action determining step for all nodes of the DP graph,
In the optimum action determining step,
For each node at t = T and each selectable action y_t,
For each bond state x_{t+1} to which a transition is possible,
The expected gain read for the corresponding combination of action and bond states is multiplied by the bond state transition probability read for the corresponding combination of external-factor state, action, and bond states; the products are summed to give the expected total gain; and the action y_t with the maximum expected total gain is determined as the optimum action,
For each node at 1 ≦ t ≦ T-1 and each selectable action y_t,
For each bond state x_{t+1} to which a transition is possible,
The expected gain read for the corresponding combination of action and bond states is added to the sum, over each external-factor state z_{t+1} to which a transition is possible, of the maximum expected total gain calculated for the corresponding node multiplied by the external-factor transition probability; the result is multiplied by the bond state transition probability read for the corresponding combination of external-factor state, action, and bond states; the products are summed to give the expected total gain; and the action y_t with the maximum expected total gain is determined as the optimum action, this determination being repeated back to the nodes at t = 1; a bond portfolio control method characterized by the above.

The bond portfolio control method according to claim 5, wherein:
In the initial condition accepting step, the bond portfolio control device accepts input of bond identification information identifying whether the target bond is a new bond or an existing bond,
The expected gain storage means stores, for each combination of bond state x_1, bond state x_2, and action y_1 and separately from the expected gain for an existing bond, the expected gain obtained for a new bond when action y_1 is selected at the first time point 1 and the bond state x_1 at time point 1 transitions to bond state x_2 at the next time point 2,
In the optimum action determining step, when bond identification information indicating a new bond has been accepted in the initial condition accepting step, the bond portfolio control device reads the expected gain for a new bond from the expected gain storage means for the nodes at t = 1, calculates the expected total gain using that expected gain, and determines the optimum action.