JP2008299641A

JP2008299641A - Parallel solving method of simultaneous linear equations and node sequencing method

Info

Publication number: JP2008299641A
Application number: JP2007145639A
Authority: JP
Inventors: Tatsuo Horiuchi; 龍男堀内
Original assignee: Mitsubishi Electric Corp
Current assignee: Mitsubishi Electric Corp
Priority date: 2007-05-31
Filing date: 2007-05-31
Publication date: 2008-12-11

Abstract

PROBLEM TO BE SOLVED: To provide a node ordering method for the parallel computation of simultaneous linear equations, and a parallel solving method for simultaneous linear equations, that can solve the simultaneous linear equations arising in network calculations for a large-scale power system with real-time performance.

SOLUTION: For the radial part of the system, the node to be ordered first is chosen arbitrarily from the nodes with the fewest connected branches. Subsequent nodes are selected in ascending order of the number of branches connected to each node, with priority given to a candidate node when the candidate node and its counterpart node (the node at the other end of a branch) do not coincide with the counterpart nodes of already ordered nodes. For the looped part of the system, the number of new non-zero elements generated when a node is reduced is simulated by parallel processing, and nodes are selected in ascending order of that number, with priority given to a candidate node when the candidate node and its counterpart node coincide with the counterpart nodes of already ordered nodes.

COPYRIGHT: (C)2009, JPO&INPIT

Description

The present invention relates to the parallel computation of simultaneous linear equations that arise in power system calculations, and in particular to a node ordering method that allows the parallel computation to run at high speed and to the parallel computation of forward elimination and backward substitution.

A direct method based on triangular decomposition is a well-known way of solving simultaneous linear equations. With this method, solving the equations involves three processes: (1) node ordering, (2) triangular decomposition (LDU decomposition) of the coefficient matrix, and (3) calculation of the solution by forward/backward substitution. Parallel computation is also used to solve simultaneous linear equations quickly. In parallel computation, the processing is distributed across a plurality of CPUs, and data is exchanged between CPUs through communication processing.

Conventional parallel computation of simultaneous linear equations is based on building a tree of the forward/backward substitution process, and the node ordering method is critical to it. The conventional method follows the Tinney 2 scheme and selects nodes in ascending order of the number of fill-ins (new non-zero elements) generated by eliminating the node. If a node with two or fewer connected branches is eliminated in a given stage (each step of the parallel processing is called a stage), nodes with three or more connected branches are not eliminated in that stage. In addition, a node is not eliminated if another node eliminated in the same stage lies within two branches of it (see Non-Patent Document 1).

Non-Patent Document 1: Masayuki Nagata and Naoyuki Uchida, "Development of a Parallel Processing Algorithm for Power System Calculation to Accelerate Transient Stability Calculation," IEEJ Transactions B, Vol. 120, No. 2, 2000.
Non-Patent Document 2: Hisao Taoka and Shigeru Abe, "A High-Speed Solution Method for Simultaneous Linear Equations Suitable for Pipeline Processing and Its Application to Power System Analysis," IEEJ Transactions B, Vol. 105, No. 5, 1985.

The conventional node ordering method for the parallel computation of simultaneous linear equations selects nodes in ascending order of the number of fill-ins generated by eliminating the node. If a node with two or fewer connected branches is eliminated in a given stage, nodes with three or more connected branches are not eliminated in that stage, and a node is not eliminated if another node eliminated in the same stage lies within two branches of it. Furthermore, when the processing obtained from the node ordering is assigned to the CPUs, a balance is struck between reducing the number of communications and equalizing the computational load among the CPUs.

The conventional method applies the Tinney 2 scheme to node ordering as described above. This gives an optimal ordering for a radial system but not for a looped system, where the number of fill-ins increases and processing performance degrades. In addition, assigning the parallel computations to the CPUs takes time, which increases the overhead of parallel processing, and because the benefit of parallel processing is determined by balancing the amount of inter-CPU communication against the processing load of each CPU, CPU resources are wasted. Because of these problems, and because applications other than offline system analysis, such as training simulators and online power dispatch automation systems, require real-time performance, the conventional method is difficult to apply in those fields.

The present invention has been made in view of the above. Its object is to obtain a node ordering method for the parallel computation of simultaneous linear equations, and a parallel solving method for simultaneous linear equations, that can solve the simultaneous linear equations arising in network calculations for a large-scale system with real-time performance.

To solve the problems described above and achieve this object, the node ordering method according to the present invention is used when the solution of simultaneous linear equations in power system analysis is computed in parallel, based on triangular decomposition of the coefficient matrix, forward elimination, and backward substitution, on a parallel computing device of symmetric multi-CPU configuration that has a plurality of CPUs and a shared memory accessible to all of the CPUs. It is a node ordering method for the case where the structure of the coefficient matrix and the procedures of the forward elimination and the backward substitution are represented by a tree consisting of nodes and of branches connecting the nodes. The method includes a first step of ordering the nodes that belong to the radial system part, which is the part of the tree that contains no loop, and a second step of ordering the nodes that belong to the looped system part, which is the part of the tree other than the radial system part.

In the first step, the node to be ordered first is selected arbitrarily from the nodes with the smallest number of connected branches. Thereafter, candidate nodes are selected in ascending order of the number of branches connected to each node, and a candidate node is selected preferentially when the candidate node and its counterpart node, that is, the adjacent node connected to it through a branch, do not coincide with the counterpart node of any already ordered node. Node ordering is performed in units of the plurality of CPUs on the basis of this selection criterion.

In the second step, the number of new non-zero elements generated when a node is reduced in the forward elimination is simulated in parallel using the plurality of CPUs. Nodes are selected in ascending order of the number of new non-zero elements obtained by the simulation, and a candidate node is selected preferentially when the candidate node and its counterpart node do not coincide with the counterpart node of any already ordered node. Node ordering is performed in units of the plurality of CPUs on the basis of this selection criterion.

The node ordering method according to the present invention improves the execution efficiency of parallel processing in the forward elimination and backward substitution and thereby speeds up processing, which makes it possible to solve the network equations (simultaneous linear equations) of a large-scale system with real-time performance.

Embodiments of the parallel solving method for simultaneous linear equations and of the node ordering method according to the present invention are described below in detail with reference to the drawings. The present invention is not limited by these embodiments.

Embodiment 1.
FIG. 1 is a configuration diagram of a training simulator to which the present embodiment is applied. In FIG. 1, a trainer first uses the training management server group 1 and the trainer console 4 to create and register a training scenario, which defines the training problem in terms of the system configuration state, generator and load conditions, fault occurrence conditions, and so on. During training, the trainer selects a registered scenario from the trainer console 4 and runs it in real time. A system simulation is then executed on the system simulation server group 2, and the results are displayed on the trainer console 4, the trainee console 5, and the large-screen system board 6. The trainee uses the automation simulation server group 3, the trainee console 5, and the large-screen system board 6 to check and understand the fault, issue restoration commands and perform restoration operations, and restore the loads that have lost power. The system simulation during training runs on the system simulation server group 2, and the software that simulates the static and dynamic characteristics of the system, to which the present embodiment is applied, is installed and runs on this server group. Information is exchanged among the training management server group 1, the system simulation server group 2, and the automation simulation server group 3, and data is sent to the trainer console 4, the trainee console 5, and the large-screen system board 6, over the system LAN 7.

FIG. 2 shows an example of the computer configuration used for the parallel solution of simultaneous linear equations according to the present embodiment. As shown in FIG. 2, the computer in the present embodiment is, for example, a computer of symmetric multi-CPU configuration comprising a plurality of CPUs, namely CPU1 (8), CPU2 (9), CPU3 (10), ..., CPUn (11), and a shared memory 12 that all of the CPUs can access. Here, n is any integer of 2 or more. Each CPU is equipped with a cache memory, a local memory, and an external storage device, and is connected to the system LAN 7.

FIG. 3 is a flowchart showing an example of the dynamic stability calculation that is implemented on the system simulation server group 2 and performs the system simulation. At every integration step (typically 10 ms), the dynamic stability calculation alternately solves the network equation and the generator differential equations to compute the bus voltages, phase angles, frequency, active power, and reactive power of the power system, as well as the internal phase angles of the generators.

In FIG. 3, initial processing is performed first: the computation time is initialized (step S13), and the Y matrix is created and triangularly decomposed (step S14). Here, the Y matrix is the coefficient matrix of the network equation. The network equation is then solved. First, it is determined whether the system configuration has changed (step S15); if not, processing branches to step S17. If it has changed, the Y matrix is modified and triangularly decomposed again (step S16). Next, the equivalent current sources of the generators, which are the current terms of the network equation, are calculated (step S17), as are the equivalent current sources of the nonlinear loads (step S18). Using the results of steps S16, S17, and S18, the network equation is solved (step S19) to obtain the bus voltages. Convergence of the resulting network bus voltages (V) is then checked (step S20). If they have not converged, processing branches to step S18, the equivalent current sources of the nonlinear loads are recalculated (step S18) using the computed bus voltages, and the solution of the network equation (step S19) is repeated. Once the solution of the network equation has converged, the generator differential equations are solved.

First, the swing equations of the generators are solved (step S21) to compute the internal phase angles of the generators. Next, the generator differential equations are solved (step S22) for the generator control systems, such as the AVR (automatic voltage regulator) and governor, and for the generator armature circuits, to compute the generator internal voltages and related quantities, and convergence of the generator internal voltages (V) is checked (step S23). If they have not converged, processing branches to step S17, and the solution of the network equation and of the generator differential equations is repeated. When both the network voltages and the generator internal voltages have converged, the computation time of the dynamic stability calculation is updated (step S24) and the end of the calculation is checked (step S25). If the end time has not been reached, processing branches to step S15, and the solution of the network equation and of the generator differential equations is repeated. If the end time has been reached, processing ends. The parallel computation method for simultaneous linear equations of the present embodiment is applied to speed up the solution of the network equation (step S19).

Next, to make clear how the present embodiment differs from the prior art, the conventional parallel processing algorithm for system calculation described in Non-Patent Document 1 is explained.

The conventional method also assumes a parallel computer (multi-CPU) in which data is exchanged between CPUs by communication processing, and it uses a direct method based on triangular decomposition to solve the simultaneous linear equations. With this method, the simultaneous linear equations
y = A · x   ... (1)
are solved by the following three processes.
(1) Node ordering
(2) Triangular decomposition (LDU decomposition) of the coefficient matrix A
(3) Calculation of x by forward/backward substitution
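As an illustration of processes (2) and (3), the following is a minimal dense-matrix sketch in Python. It is not the sparse, node-based implementation the document discusses: the function name `ldu_solve`, the example matrix, and the absence of pivoting are assumptions made for illustration, and process (1) is skipped by using the natural row order.

```python
def ldu_solve(A, y):
    """Solve y = A x by LDU decomposition, then forward/backward substitution.

    Illustrative sketch only: dense storage, natural ordering, no pivoting
    (every pivot d[k] is assumed to be non-zero).
    """
    n = len(A)
    W = [[float(v) for v in row] for row in A]  # working copy of A
    L = [[0.0] * n for _ in range(n)]  # strictly lower part (unit diagonal implied)
    U = [[0.0] * n for _ in range(n)]  # strictly upper part (unit diagonal implied)
    d = [0.0] * n                      # diagonal factor D

    # Process (2): triangular (LDU) decomposition.
    for k in range(n):
        d[k] = W[k][k]
        for i in range(k + 1, n):
            L[i][k] = W[i][k] / d[k]
            U[k][i] = W[k][i] / d[k]
        for i in range(k + 1, n):
            for j in range(k + 1, n):
                W[i][j] -= L[i][k] * d[k] * U[k][j]

    # Process (3a): forward substitution, L z = y.
    z = [float(v) for v in y]
    for i in range(n):
        for k in range(i):
            z[i] -= L[i][k] * z[k]

    # Diagonal scaling, D w = z.
    w = [z[i] / d[i] for i in range(n)]

    # Process (3b): backward substitution, U x = w.
    x = w[:]
    for i in reversed(range(n)):
        for j in range(i + 1, n):
            x[i] -= U[i][j] * x[j]
    return x


solution = ldu_solve([[4, 2], [2, 3]], [10, 8])
```

In a time-stepping simulation the decomposition would typically be reused across many right-hand sides, which is why the substitution phase dominates the cost.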

The stability calculation consists of the system calculation (network calculation) and the solution of the generator differential equations. The system calculation accounts for about 40% of the total, and within it the forward/backward substitution accounts for about 60% and is executed many times. Because the triangular decomposition and the forward/backward substitution are determined automatically once the node ordering is fixed, how the nodes are ordered is the key to speeding up the direct method in sequential processing. The forward substitution in the direct solution of simultaneous linear equations is a repetition of the calculation
x_j = x_j + x_i · D_ij   ... (2)
Equation (2) eliminates the branch D_ij connecting node i and node j and updates the value of node j (D_ij in equation (2) is a diagonal element in the triangular decomposition (LDU decomposition)). Once the branches D_ij of a node i have been eliminated for all j, the value x_i of node i is no longer needed in subsequent processing. This corresponds to the elimination of node i. Backward substitution works the same way. For the simple network of FIG. 21-1, for example, the processing of equation (2) is repeated six times each for forward and backward substitution. The dotted branches in the figure correspond to fill-ins (non-zero elements newly generated by the triangular decomposition).
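The branch-by-branch view of equation (2) can be sketched as follows. The network, the coefficient values, and the names `forward_substitution`, `D`, and `order` are hypothetical and are not taken from FIG. 21-1; the coefficients are assumed to carry their sign already, so the update is written with a plus sign as in equation (2).

```python
def forward_substitution(x, D, order):
    """Apply x_j = x_j + x_i * D_ij once per branch (i, j), node by node.

    x     : dict node -> value (right-hand side on entry)
    D     : dict (i, j) -> coefficient of the branch from node i to node j
    order : elimination order of the nodes
    Returns the updated values and the number of branch updates performed.
    """
    x = dict(x)  # do not modify the caller's data
    position = {node: k for k, node in enumerate(order)}
    updates = 0
    for i in order:
        # Eliminating node i: process every branch leading to a later node j.
        for (a, j), d in D.items():
            if a == i and position[j] > position[i]:
                x[j] = x[j] + x[i] * d
                updates += 1
        # From here on x[i] is not read again by forward substitution.
    return x, updates


# Hypothetical 3-node example: nodes 1 and 2 both update node 3.
values, n_updates = forward_substitution(
    {1: 1.0, 2: 3.0, 3: 0.5},
    {(1, 3): 2.0, (2, 3): -1.0},
    [1, 2, 3],
)
```

The update count equals the number of branches, which is how the repetition count of equation (2) scales with fill-in.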

The forward/backward substitution can be represented as a tree-structured process, as shown in FIG. 21-2. The horizontal arrows in FIG. 21-2 represent node eliminations in forward substitution. The direction of the arrows in FIG. 21-2 (top to bottom) is the flow of processing in forward substitution; in backward substitution the tree is traversed from the root toward the leaves (bottom to top).

In the figure, a boxed item means that the node holds more than one piece of data. For example, eliminating node 1 changes the values of both node 3 and node 5, but the data is passed to node 3, whose node number is closer to that of node 1. Node 3 therefore holds its own data as well as the data for updating the value of node 5. This representation makes it easy to see what data moves between nodes.

Note that the elimination of node 1 and the elimination of node 2 can be swapped in order, or processed in parallel at the same time, without affecting the result, because the two calculations are independent of each other (there is no dependency between them). In other words, horizontal arrows (node eliminations) that are lined up side by side in the tree can be executed in parallel. If one node elimination is taken as one unit of processing, the forward substitution in the example of FIG. 21-2 takes four steps in sequential processing but only three steps in parallel processing. Hereinafter, each step of the parallel processing (the simultaneous elimination of several nodes) is called a stage.
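The grouping of independent eliminations into stages can be sketched as a level computation over the update dependencies. The four-node dependency list below is a hypothetical example chosen to give four sequential steps and three stages; it is not a transcription of FIG. 21-2, and `assign_stages` is an illustrative name.

```python
def assign_stages(order, updates):
    """Assign each node the earliest stage in which it can be eliminated.

    order   : elimination order (every updating node precedes the node it updates)
    updates : list of (i, j) pairs meaning "eliminating node i changes node j"
    A node must wait until every node that updates it has been eliminated.
    """
    stage = {}
    for node in order:
        predecessors = [i for (i, j) in updates if j == node]
        stage[node] = 1 + max((stage[i] for i in predecessors), default=0)
    return stage


# Hypothetical example: nodes 1 and 2 update node 3, and node 3 updates node 4.
stages = assign_stages([1, 2, 3, 4], [(1, 3), (2, 3), (3, 4)])
n_stages = max(stages.values())
```

Nodes 1 and 2 land in the same stage because neither updates the other, which is the independence property described above.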

Tree representations of the forward/backward substitution such as the one in FIG. 21-2 have been proposed before, but the tree representation of FIG. 21-2 was devised to make it easier than before to extract the processing that can be executed in parallel in the forward substitution. It has the following characteristics.
(1) The processing of each stage of the tree always traverses the tree from left to right in forward substitution (from right to left in backward substitution); processing in the opposite direction never occurs.
(2) All node processing within the same stage (the horizontal arrows) can be performed in parallel.
(3) In parallel processing, for example in forward substitution, communication occurs when the node being eliminated and the node whose value changes are assigned to different CPUs. In the tree this appears as a horizontal arrow that crosses between CPUs, so the number of communications required in parallel processing is easy to see.
(4) The tree shows explicitly what data is passed between nodes, so the data exchanged by communication in parallel processing is easy to see.

Non-Patent Document 1 devises a new node ordering method for parallel processing based on the Tinney 2 scheme. To parallelize the forward/backward substitution efficiently, the nodes must be ordered so that as many calculations as possible can be executed in parallel, which means ordering the nodes so that the number of stages is as small as possible. The nodes are therefore ordered so that as many node eliminations as possible are performed in each stage, to the extent that they do not hinder communication processing, and so that as few fill-ins as possible occur in the LDU decomposition.

Specifically, the following two selection criteria are applied to the nodes eliminated in each stage.

(1) Nodes are selected in ascending order of the number of fill-ins generated by eliminating them. However, if a node with two or fewer connected branches is eliminated in a given stage, nodes with three or more connected branches are not eliminated in that stage but in a later stage.
(2) A node is not eliminated if another node eliminated in the same stage lies within two branches of it.

The number of fill-ins generated by eliminating a node is determined by the number of branches connected to that node, so criterion (1) can be satisfied by selecting nodes with few connected branches first. Furthermore, eliminating a node with many connected branches at an early stage increases the number of fill-ins, so nodes with three or more connected branches are not eliminated in a stage in which a node with two or fewer connected branches is eliminated.
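The relationship between connected branches and fill-ins can be sketched as follows. This is a simplified greedy minimum-fill-in ordering written for illustration, not the algorithm of FIG. 22: the adjacency-set representation, the tie-breaking by degree and node number, and the example network (a four-node loop with one leaf attached) are assumptions.

```python
from itertools import combinations


def fill_in_count(adj, node):
    """Number of new branches (fill-ins) created if `node` is eliminated."""
    return sum(1 for a, b in combinations(sorted(adj[node]), 2) if b not in adj[a])


def eliminate(adj, node):
    """Remove `node` and connect all of its neighbours pairwise."""
    neighbours = adj.pop(node)
    for n in neighbours:
        adj[n].discard(node)
    for a, b in combinations(sorted(neighbours), 2):
        adj[a].add(b)
        adj[b].add(a)


def greedy_order(adj):
    """Order nodes by fewest fill-ins, then fewest branches, then node number."""
    adj = {n: set(v) for n, v in adj.items()}  # work on a copy
    order = []
    while adj:
        node = min(adj, key=lambda n: (fill_in_count(adj, n), len(adj[n]), n))
        order.append(node)
        eliminate(adj, node)
    return order


# Hypothetical network: loop 1-2-3-4-1 with leaf node 5 attached to node 1.
network = {1: {2, 4, 5}, 2: {1, 3}, 3: {2, 4}, 4: {1, 3}, 5: {1}}
order = greedy_order(network)
```

The leaf is eliminated first at no cost, whereas every node on the loop creates one fill-in when it is the first loop node to be removed, which illustrates why looped parts are harder to order well.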

Criterion (2) is imposed for the following reason. When a node is eliminated, the values of its adjacent nodes are rewritten. If another node eliminated in the same stage lies within two branches of the eliminated node, the same value may be rewritten a second time. When both rewrites involve communication, it is impossible to determine the order of the two communications. If the order of the two communications assumed in scheduling differs from their actual order, the communication time becomes unnecessarily long, and if the CPUs of the computer communicate over a common bus, bus congestion may occur and impose a large penalty on communication. Criterion (2) is imposed to avoid this loss of communication efficiency.
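Criterion (2) amounts to a distance check on the network graph, which might be sketched as below. The function names, the first-come selection policy, and the six-node chain are assumptions for illustration; the source does not specify how candidates are scanned within a stage.

```python
def within_two_branches(adj, node):
    """All nodes reachable from `node` by one or two branches."""
    near = set(adj[node])
    for n in adj[node]:
        near |= adj[n]
    near.discard(node)
    return near


def select_stage(adj, candidates):
    """Pick candidates in the given order, skipping any candidate that lies
    within two branches of a node already selected for this stage."""
    selected = []
    for node in candidates:
        if not (within_two_branches(adj, node) & set(selected)):
            selected.append(node)
    return selected


# Hypothetical chain 1-2-3-4-5-6.
chain = {1: {2}, 2: {1, 3}, 3: {2, 4}, 4: {3, 5}, 5: {4, 6}, 6: {5}}
stage_nodes = select_stage(chain, [1, 3, 4, 6])
```

Node 3 is rejected because it shares neighbour 2 with node 1, so eliminating both in one stage would rewrite node 2 twice.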

FIG. 22 shows the node ordering algorithm described above (see Non-Patent Document 1). In it, a node with three or more connected branches can be selected only when there is no node with two or fewer connected branches.

Node ordering as described above is a process that extracts, from the forward/backward substitution, the processing that can be performed in parallel; the processing obtained from the ordering must then be assigned to the CPUs. Because most system analysis calculations are essentially node-by-node processing, Non-Patent Document 1 assigns processing to the CPUs in units of nodes.

To achieve efficient parallel processing, the following two points must be considered.
- Reducing the number of communications
- Equalizing the computational load among the CPUs

These two goals are in a trade-off relationship. A single parameter (hereinafter called the grouping coefficient) is therefore used to adjust the size of the node groups when nodes are assigned to CPUs, so that the two goals can be balanced.

Communication takes longer than computation, so keeping the number of communications as small as possible is essential for speeding up processing through parallelism. In the processing tree, communication appears as horizontal arrows that cross between CPUs, so communication is performed at high-numbered stages, where there are few horizontal arrows.

The algorithm for assigning processing to each CPU is as follows.
(1) Cut the forward/backward substitution processing tree at a high-numbered stage, such that the number of nodes in each severed branch-and-leaf portion does not exceed (total number of nodes) / (grouping coefficient). Each severed portion forms one group.
(2) From the group boundary lines obtained in (1), select (number of CPUs - 1) boundary lines as the boundaries for assignment to the CPUs, choosing the combination of boundary lines that minimizes
Σ_i |n_i - n_Total / n_CPU|   ... (3)
where n_i is the number of nodes assigned to CPU i, n_Total is the total number of nodes, and n_CPU is the number of CPUs.
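Step (2) can be sketched as an exhaustive search over boundary combinations that evaluates expression (3). The sketch assumes that the groups from step (1) are arranged in a fixed linear order, so that each CPU receives a contiguous run of groups; that layout, the function names, and the group sizes are hypothetical.

```python
from itertools import combinations


def imbalance(counts):
    """Expression (3): sum over CPUs of |n_i - n_Total / n_CPU|."""
    n_total = sum(counts)
    n_cpu = len(counts)
    return sum(abs(n - n_total / n_cpu) for n in counts)


def best_boundaries(group_sizes, n_cpu):
    """Choose (n_cpu - 1) boundaries between consecutive groups that minimise
    expression (3). Returns (score, boundary positions, nodes per CPU)."""
    best = None
    for cuts in combinations(range(1, len(group_sizes)), n_cpu - 1):
        edges = (0, *cuts, len(group_sizes))
        counts = [sum(group_sizes[a:b]) for a, b in zip(edges, edges[1:])]
        score = imbalance(counts)
        if best is None or score < best[0]:  # keep the first of equal scores
            best = (score, cuts, counts)
    return best


# Hypothetical group sizes (nodes per group) assigned to two CPUs.
result = best_boundaries([4, 3, 2, 5, 2], 2)
```

Exhaustive search is practical only for a small number of groups, which is one way to read the role of the grouping coefficient: it bounds how many boundary lines exist to choose from.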

The method described in Non-Patent Document 1 applies the Tinney 2 method to node ordering. This is optimal for a radial system but not for a loop system, where fill-in increases and processing performance deteriorates. In addition, allocating the parallel computation to the CPUs takes time, which increases the overhead of parallel processing, and because the benefit of parallel processing is determined by balancing the inter-CPU communication volume against the processing load of each CPU, CPU resources are wasted. Here, a radial system is a system that contains no loop structure, and a loop system is a system that contains a loop structure. In an actual power system, power plants and the like are located at the ends of the system, so the tree always contains a radial system part. Because of these problems, and because applications other than offline system analysis, such as training simulators and online power supply automation systems, require real-time performance, the method described in Non-Patent Document 1 is difficult to apply in those fields.

Therefore, to achieve real-time performance, the present embodiment discloses below a node ordering method that improves the parallelization of the forward elimination and backward substitution processing, based on the method for solving simultaneous linear equations described in Non-Patent Document 2.

FIGS. 11-1 and 11-2 are flowcharts showing the node ordering method of the present embodiment; FIG. 11-2 shows the processing that follows FIG. 11-1, and the two figures are connected at the point marked '1'. In FIG. 11-1, initialization processing (1) (step S40) first sets the initial values of the flags. Next, a node table is created that holds, for each node, the numbers of its counterpart nodes, the number of counterpart nodes, and so on (step S41), and the node table is sorted in ascending order of the number of counterpart nodes (step S42). Nodes with the same number of counterpart nodes are sorted in the order in which they were processed. The counterpart nodes of a node are the adjacent nodes connected to it through a single branch. For example, in FIG. 13-3 the counterpart node of node 1 is node 5.
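A minimal sketch of steps S41 and S42 in Python follows. The function name `build_node_table`, the branch-list input format, and every branch in the example other than the one between node 1 and node 5 are assumptions made for illustration; the actual table layout of the embodiment is not specified here.

```python
def build_node_table(branches):
    """Return [(node, counterpart nodes)] sorted by ascending counterpart count."""
    neighbours = {}
    for a, b in branches:  # dicts keep first-seen order, i.e. processing order
        neighbours.setdefault(a, set()).add(b)
        neighbours.setdefault(b, set()).add(a)
    table = [(node, sorted(adj)) for node, adj in neighbours.items()]
    # list.sort is stable, so nodes with equal counts keep their processing order
    table.sort(key=lambda row: len(row[1]))
    return table

# Hypothetical branch list; only the branch (1, 5) is mentioned in the text.
for node, counterparts in build_node_table([(1, 5), (2, 5), (5, 7), (3, 7)]):
    print(node, counterparts)
```

Relying on a stable sort is one simple way to obtain the tie-breaking rule stated above without storing an explicit processing index.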

Node ordering is performed first for the radial system part. A radial system part contains nodes to which only one branch is connected, such as nodes 1, 2, 3, 4, and 6 in the radial system of FIG. 13-3; such nodes therefore also have exactly one counterpart node. If, for example, node 1 is arbitrarily selected and ordered, then once node 1 and the branch between node 1 and node 5 are eliminated, node 5 is left with one connected branch and the same structure arises again. It is known that ordering the nodes of the radial system part one after another in this way gives an optimal node ordering (the idea behind the Tinney 2 method). Node ordering is performed below on the basis of this property.
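The repeated removal of nodes with the fewest connected branches can be sketched as follows. This is a simplified illustration that assumes the input contains no loops (so that eliminating a node never creates a new branch); the function name `order_radial`, the tie-breaking by node number, and the example network are hypothetical.

```python
def order_radial(branches):
    """Order the nodes of a loop-free network by repeatedly taking a node
    with the fewest remaining branches (the idea behind the Tinney 2 method)."""
    adj = {}
    for a, b in branches:
        adj.setdefault(a, set()).add(b)
        adj.setdefault(b, set()).add(a)
    order = []
    while adj:
        # Fewest remaining branches first; ties are broken by node number.
        node = min(adj, key=lambda n: (len(adj[n]), n))
        for other in adj.pop(node):
            adj[other].discard(node)  # the branch disappears with the node
        order.append(node)
    return order

# Hypothetical radial network: nodes 1 and 2 hang off node 5, which connects to node 6.
print(order_radial([(1, 5), (2, 5), (5, 6)]))
```

In this example, node 5 becomes a single-branch node only after nodes 1 and 2 have been removed, which is the recurring structure described above.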

When forward elimination and backward substitution are processed in parallel, each CPU normally processes one node, so as many elements as the number of CPUs used for parallel processing (hereinafter, the number of parallel CPUs) are handled at once, and node ordering must be performed in units of these elements. Step S43 therefore first determines whether the number of nodes being ordered together is less than or equal to the number of parallel CPUs. If No, initialization processing (2) (step S44) resets the element count. If Yes, nothing is done and the process branches to step S45.

Step S45 determines whether node i, the node to be ordered, belongs to the radial system. If Yes, step S46 determines whether node i can be processed in parallel (this determination is described later). If the result of step S46 is Yes, node i is ordered (step S47), the various arrays are counted up (count-up (1), step S48), and the process branches to step S43. If the result of step S46 is No, node i is excluded from ordering in this pass (step S49), the various arrays are counted up (count-up (2), step S50), and the process branches to step S43.

If the result of step S45 is No, the nodes that have not yet been ordered are re-sorted in the node table in ascending order of the number of counterpart nodes (step S51), in order to check whether any radial system part that was set aside as excluded nodes remains, and initialization processing (3) is performed (step S52). For the nodes in the re-sorted table, it is then determined whether node i belongs to the radial system part (step S53). If Yes, it is further determined whether the radial system part consists entirely of excluded nodes (step S54); if the result of step S54 is No, the process branches to step S46. If the result of step S54 is Yes, all the excluded nodes are ordered in sequence (step S55) and the process proceeds to the step marked '1' in FIG. 11-2. If the result of step S53 is No, the process likewise proceeds to the step marked '1' in FIG. 11-2.

The processing so far completes the node ordering of the radial system part, so the nodes of the loop system part are ordered next. No optimal node ordering method exists for the loop system part. Instead, the generation of new non-zero elements is simulated for each node, the node that generates the fewest new non-zero elements is ordered, and this is repeated; the result is regarded as quasi-optimal (the idea behind the Tinney 3 method). Based on this idea, node ordering is performed here by the following procedure.
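The simulation of new non-zero element generation can be illustrated with a small graph model, in which eliminating a node connects every pair of its neighbours that is not yet connected and each such new connection is one new non-zero element. The sketch below is a plain serial minimum-fill ordering under that model; the function names, the tie-breaking by node number, and the example network are assumptions, and the parallel-processing check of the embodiment is deliberately left out.

```python
from itertools import combinations

def count_fill_in(adj, node):
    """New non-zero elements created if `node` were eliminated: the number of
    pairs of its neighbours that are not yet connected to each other."""
    return sum(1 for a, b in combinations(sorted(adj[node]), 2) if b not in adj[a])

def order_min_fill(branches):
    """Repeatedly order the node whose elimination creates the fewest new
    non-zero elements (the idea behind the Tinney 3 method)."""
    adj = {}
    for a, b in branches:
        adj.setdefault(a, set()).add(b)
        adj.setdefault(b, set()).add(a)
    order = []
    while adj:
        node = min(adj, key=lambda n: (count_fill_in(adj, n), n))
        nbrs = adj.pop(node)
        for a in nbrs:
            adj[a].discard(node)
            adj[a] |= nbrs - {a}  # fill-in: remaining neighbours become connected
        order.append(node)
    return order

# Hypothetical network: a four-node loop 1-2-3-4 with node 5 hanging off node 1.
print(order_min_fill([(1, 2), (2, 3), (3, 4), (4, 1), (1, 5)]))
```

Because the fill-in count of every remaining node is recomputed at each step, the cost of this loop grows quickly with network size, which motivates the parallel simulation described next.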

As shown in FIG. 11-2, initialization processing (4) of the various flags and the like is first performed in step S56. The generation of new non-zero elements is then simulated for all remaining nodes (step S57), and the nodes are sorted in ascending order of the number of new non-zero elements generated (step S58). Next, it is determined whether the number of ordered nodes is greater than or equal to the number of parallel CPUs (step S59). If Yes, initialization processing (2) of the ordering array count and the like is performed (step S60) and the process branches to step S61; if No, the process branches to step S61 without doing anything. Step S61 determines whether node i can be processed in parallel. If Yes, node i is ordered (step S62) and the various arrays are counted up (count-up (3), step S63); step S64 then determines whether all nodes have been ordered. If No, the various arrays are counted up (count-up (4), step S65) and the process branches to step S57; if Yes, the processing ends. If the result of step S61 is No, node i is excluded in this pass (step S66) and the various arrays are counted up (count-up (5), step S67), after which it is determined whether all the remaining nodes are excluded nodes (step S68). If No, the various arrays are counted up (count-up (6), step S69) and the process branches to step S59. If the result of step S68 is Yes, the generation of new non-zero elements is simulated for the remaining nodes (step S70), the nodes are sorted in ascending order of the number of new non-zero elements generated (step S71), and node i with the smallest number is ordered (step S72); when all nodes have been ordered, the processing ends.

FIG. 12 is a flowchart showing the parallel processing of the simulation of new non-zero element generation in FIG. 11-2. In the loop system part, to optimize the node ordering, a simulation is run for every remaining node to find how many new non-zero elements would be generated if that node were ordered. The node with the fewest is taken as the next ordering candidate, it is determined whether that node can be processed in parallel, and if it can, the node is ordered. This processing is repeated until the end. Node ordering of the loop system part therefore takes a long time, so the simulation of new non-zero element generation is processed in parallel to speed it up.

FIG. 12 considers sharing the work by dividing all nodes of the loop system part evenly among the CPUs that perform parallel processing. First, the average processing amount per CPU is calculated (step S73), together with the remainder of that calculation (step S74). The remainder is handled by CPU1; that is, CPU1 handles the average processing amount plus the remainder. Next, it is determined whether the executing CPU is CPU1 (step S75). If Yes, its processing range (k1, k2) is calculated (step S76), the flags used in the subsequent processing are initialized (step S77), and the process branches to step S79. For any CPU other than CPU1, the processing range (k1, k2) is calculated (step S78) and the process branches to step S79. In step S79, each CPU determines whether it may start processing by checking whether its own CPU processing flag is zero; if No, it keeps checking. If the result of step S79 is Yes, the CPU simulates new non-zero element generation for its share of the work (step S80) and stores the result in shared memory. Each CPU then writes its own CPU number into its own CPU processing flag to show CPU1 that its processing has finished (step S81).
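One possible reading of the range calculation in steps S73 to S78 is sketched below. The text states only that CPU1 takes the average amount plus the remainder and that the other CPUs take the average amount; the 1-based inclusive form of (k1, k2) and the function name `element_range` are assumptions.

```python
def element_range(cpu, n_cpu, n_items):
    """Inclusive 1-based range (k1, k2) of the items simulated by CPU `cpu`.
    CPU1 takes the average amount plus the remainder; the others take the average."""
    average, remainder = divmod(n_items, n_cpu)  # steps S73 and S74
    if cpu == 1:                                 # steps S75 and S76
        return 1, average + remainder
    k1 = remainder + (cpu - 1) * average + 1     # step S78
    return k1, k1 + average - 1

# Hypothetical example: 10 remaining nodes shared among 4 CPUs.
print([element_range(cpu, 4, 10) for cpu in range(1, 5)])
```

Giving the whole remainder to CPU1 keeps the calculation simple at the price of a slightly heavier load on the CPU that also coordinates the others.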

Next, it is again determined whether the CPU is CPU1 (step S82). CPU1 determines whether the processing flags of all CPUs are positive (step S83) and keeps checking while the result is No. When the result of step S83 is Yes, the numbers of new non-zero elements calculated by the CPUs are sorted in ascending order and the first node, which has the smallest number, is selected (step S84). The number of CPU1 is then set in the processing end flag (step S85) and the processing ends. A CPU other than CPU1 determines whether the processing end flag is positive (step S86); if No, it keeps checking, and if Yes, it ends its processing.

Next, the operation of the present embodiment is described. To realize real-time solution of the simultaneous linear equations described in Non-Patent Document 2, the present embodiment makes the forward elimination and backward substitution of the simultaneous linear equations parallelizable and raises the execution efficiency of the parallel processing, thereby greatly improving processing performance. As shown in FIG. 2, the computer that performs the parallel calculation is assumed to be a symmetric multi-CPU machine coupled through shared memory. On such a computer, placing the data that the CPUs read and write in common in the shared memory eliminates the need for inter-computer communication and allows fast data access, so the performance of each CPU can be fully used.

Non-Patent Document 2 uses a node ordering method based on the Tinney 2 method. FIG. 4 shows the example system described in Non-Patent Document 2; the numbers in the circles are node numbers. FIG. 5 shows the network equations (simultaneous linear equations) of the example system of FIG. 4. In Non-Patent Document 2, the triangular decomposition of the coefficient matrix follows the processing flow of FIG. 6, and the result is stored in the table format shown in FIG. 7. The processing in FIG. 6 consists of a triple loop over i, j, and k: loop i performs steps S26 and S27, loop j performs steps S28 and S29, and loop k performs step S30. NNmax in FIG. 6 is the total number of nodes. In FIG. 7, elements marked with a prime (') are elements changed during the triangular decomposition, P(k) and Q(k) are vectors giving the order in which nodes are reduced, and Nmax is the number of off-diagonal non-zero elements after the triangular decomposition of the coefficient matrix. The node ordering indicates that the node given by Q(k) is eliminated and reduced into the node given by P(k). The forward elimination and backward substitution performed after the triangular decomposition follow the processing flow of FIG. 8, in which the backward substitution is divided into two stages. The forward elimination contains an obstacle to parallel processing, which is explained with reference to FIG. 9.

FIG. 9 shows parallel processing with, for example, four CPUs. In the first round of parallel processing, CPU1, CPU2, CPU3, and CPU4 handle k = 1, 2, 3, and 4, respectively. For CPU2, however, Q(2) = 2 matches P(1) = 2 of CPU1, which is processed ahead of it, so the two cannot run in parallel and must be processed serially: CPU2 has to run after CPU1 has finished. Likewise, for CPU4, P(4) = 4 matches P(3) = 4 of CPU3, so CPU4 has to run after CPU3 has finished. Similarly, the processing of CPU2 cannot be parallelized in the second round, nor in the third round.
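The conflict described for FIG. 9 can be expressed as a simple check, sketched here in Python. The rule used (a step must wait when its Q(k) or P(k) equals the P(j) of an earlier step in the same round) is inferred from the two cases given in the text. Only P(1) = 2, Q(2) = 2, P(3) = 4, and P(4) = 4 come from the text; the remaining table entries are placeholders and are not the values of FIG. 7.

```python
def serialized_steps(P, Q, batch):
    """Return the steps of one parallel round that must wait for an earlier
    step of the same round during forward elimination."""
    updated = set()  # nodes written by earlier steps of this round
    waiting = []
    for k in batch:
        if Q[k] in updated or P[k] in updated:
            waiting.append(k)
        updated.add(P[k])
    return waiting

# P(1), Q(2), P(3), P(4) follow the text; all other entries are placeholders.
P = {1: 2, 2: 7, 3: 4, 4: 4}
Q = {1: 1, 2: 2, 3: 8, 4: 9}
print(serialized_steps(P, Q, [1, 2, 3, 4]))
```

With these values the check flags steps 2 and 4, which correspond to the CPU2 and CPU4 cases described above.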

The obstacle to parallel processing in the backward substitution is explained with reference to FIG. 10. Backward substitution proceeds from larger element numbers to smaller ones. In the first round of parallel processing, CPU1, CPU2, CPU3, and CPU4 handle k = 10, 9, 8, and 7, respectively. For CPU2, P(9) = 10 matches Q(10) = 10 of CPU1, so the two cannot run in parallel and must be processed serially: CPU2 has to run after CPU1 has finished. In the second round, for CPU2, P(5) = 3 matches Q(6) = 3 of CPU1, so CPU2 again has to run after CPU1 has finished; and for CPU4, P(3) = 4 matches P(4) = 4 of CPU3, so CPU4 has to run after CPU3 has finished. In the third round, for CPU2, P(1) = 2 matches Q(2) = 2 of CPU1, so CPU2 once more has to run after CPU1 has finished. The obstacle to parallel processing in the backward substitution is thus the same as in the forward elimination.

The node ordering of the coefficient matrix affects the triangular decomposition, the forward elimination, and the backward substitution alike. What is required, therefore, is a node ordering method that raises the parallel execution efficiency of the forward elimination and backward substitution without adversely affecting the triangular decomposition of the coefficient matrix.

Accordingly, in the present embodiment, node ordering is performed on the basis of the following basic ideas.

(1) Node ordering starts from the node with the smallest fill-in. As long as this condition is maintained, the triangular decomposition of the coefficient matrix follows the same idea as the conventional method, and its processing performance does not deteriorate.
(2) Subject to (1), nodes that can be processed in parallel in the forward elimination are ordered preferentially. This improves the parallel execution efficiency of the forward elimination and, with it, that of the backward substitution.

A node ordering method for improving the parallel execution efficiency of the forward elimination is described below. The obstacle to parallel processing in the forward elimination was explained with reference to FIG. 9. To remove this obstacle, node ordering is performed according to the following ideas.
(1) Node ordering starts from the node with the smallest fill-in.
(2) The first node is selected arbitrarily from among the nodes with the smallest fill-in obtained in (1). For each subsequent node to be processed in parallel, a candidate is ordered preferentially if the candidate node and its counterpart nodes do not match any counterpart node of the nodes already ordered. Checking this condition is the determination, mentioned above, of whether parallel processing is possible.
(3) When no nodes remain that can be ordered preferentially, the remaining nodes are ordered in ascending order of fill-in.
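The parallel-processing check in (2) can be sketched for a single round as follows. The sketch assumes that the already-ordered nodes considered are those of the current round, that the candidates arrive already sorted by ascending fill-in, and that the network does not change within the round; the function names and the example network are hypothetical.

```python
def is_parallel_ok(node, adj, batch):
    """True when neither `node` nor any of its counterpart nodes matches a
    counterpart node of a node already ordered in this round."""
    taken = set()
    for ordered in batch:
        taken |= adj[ordered]
    return node not in taken and not (adj[node] & taken)

def pick_batch(candidates, adj, n_cpu):
    """Choose up to n_cpu nodes for one round from `candidates` (sorted by
    ascending fill-in). Returns (batch, deferred): the nodes ordered now and
    the nodes set aside because they would have to run serially."""
    batch, deferred = [], []
    for node in candidates:
        if len(batch) == n_cpu:
            break
        if not batch or is_parallel_ok(node, adj, batch):
            batch.append(node)     # the first node is taken unconditionally
        else:
            deferred.append(node)  # excluded in this pass, ordered later
    return batch, deferred

# Hypothetical radial network; nodes 1 and 2 share the counterpart node 5.
adj = {1: {5}, 2: {5}, 3: {7}, 4: {8},
       5: {1, 2, 9}, 7: {3, 9}, 8: {4, 9}, 9: {5, 7, 8}}
print(pick_batch([1, 2, 3, 4], adj, 2))
```

In the example, node 2 is set aside because it shares counterpart node 5 with node 1, so node 3 fills the second slot of the round instead.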

FIGS. 11-1 and 11-2 express the above ideas as a processing flow. In FIG. 11-1, the nodes of the radial system part, for which fill-in is minimal, are ordered first, and in FIG. 11-2 the nodes of the loop system part are then ordered. The Tinney 2 method is applied to the radial system part and the Tinney 3 method to the loop system part, so that fill-in is minimized.

In the calculation using the Tinney 3 method, new non-zero element generation is simulated for all nodes and the node with the smallest fill-in is selected, so this simulation is processed in parallel to speed it up. FIG. 12 shows the processing flow. So that each CPU can determine the elements it is to process, the average processing amount per CPU and the remainder are first calculated; CPU1 takes (average processing amount + remainder) elements from the beginning, and CPU2 onward each take the next block of elements equal to the average processing amount. Each CPU then simulates the number of new non-zero elements generated for all elements assigned to it and, when finished, stores the simulation results in shared memory and sets its own CPU number in its own CPU processing flag. CPU1, which manages the whole process, knows that all processing is complete once its own processing has finished and the CPU processing flags of all CPUs are set. It then reads the simulation results calculated by the CPUs from shared memory, sorts them in ascending order of the number of new non-zero elements generated, and selects the first node as the node ordering candidate. Finally, it sets the processing end flag and ends its processing. The CPUs other than CPU1 check the processing end flag and end their processing once it is set. In this way, the simulation of the number of new non-zero elements generated can be processed in parallel.

Next, the effect of this node ordering method is described with reference to FIGS. 13-1 to 13-4. FIG. 13-1 shows the node numbers given to the example system by the conventional node ordering, and FIG. 13-2 shows the result of the triangular decomposition under that ordering. FIG. 13-3 shows the node numbers given by the node ordering of the present embodiment, and FIG. 13-4 shows the result of the triangular decomposition under that ordering. FIGS. 13-1 and 13-3 represent the structure of the coefficient matrix of the simultaneous linear equations of FIG. 5 as a tree made up of nodes and the branches connecting them. In FIGS. 13-1 and 13-2, as the arrows indicate, processing that cannot be parallelized occurs four times. In FIGS. 13-3 and 13-4, which show the result of the node ordering method of Embodiment 1, it occurs only twice, which demonstrates the effect of the method. In particular, one of the node ordering criteria of the present embodiment, namely ordering a candidate preferentially when the candidate node and its counterpart nodes do not match any counterpart node of the nodes already ordered, is seen to remove obstacles to parallel processing of the kind shown in FIG. 13-2.

As described above, the node ordering method for the parallel calculation of simultaneous linear equations in the present embodiment orders nodes starting from the node with the smallest fill-in. The first node is selected arbitrarily from among the nodes with the smallest fill-in; each subsequent node to be processed in parallel is ordered preferentially if the node and its counterpart nodes do not match any counterpart node of the nodes already ordered. When no such nodes remain, the remaining nodes are ordered in ascending order of fill-in. According to the present embodiment, the parallel execution efficiency of the forward elimination and backward substitution is improved and processing is accelerated, so the network equations (simultaneous linear equations) of a large-scale system can be solved with real-time performance.

Embodiment 2.
FIG. 11-1 is a flowchart showing the node ordering method of Embodiment 2, and FIG. 11-2 is the flowchart that follows FIG. 11-1. That is, the present embodiment follows the same node ordering method as Embodiment 1. In Embodiment 1, however, the simulation of new non-zero element generation in the node ordering of the loop system part is processed in parallel, whereas in Embodiment 2 this simulation is not parallelized and is processed by a single CPU. Processing is then not accelerated, but the node ordering can still be optimized, and a training simulator can be built inexpensively with fewer CPUs.

Embodiment 3.
FIGS. 14-1 and 14-2 are flowcharts showing the node ordering method of Embodiment 3. FIG. 14-2 shows the processing that follows FIG. 14-1, and the two figures are connected at the point marked '2'. In FIG. 14-1, initialization processing (1) (step S90) first sets the initial values of the flags. Next, a node table is created that holds, for each node, the numbers of its counterpart nodes, the number of counterpart nodes, and so on (step S91), and the node table is sorted in ascending order of the number of counterpart nodes (step S92). Nodes with the same number of counterpart nodes are sorted in the order in which they were processed.

When forward elimination and backward substitution are run in parallel, each CPU processes one node, so as many elements as there are parallel CPUs are handled at one time, and nodes must be ordered in units of that many elements. Step S93 therefore first determines whether the number of nodes being ordered at the same time is less than or equal to the number of parallel CPUs. If No, initialization (2) (step S94) resets the element count. If Yes, nothing is done and processing branches to step S95.

Step S95 determines whether node i, the node to be ordered, belongs to the radial system portion. If Yes, step S96 determines whether node i can be processed in parallel. If the result of step S96 is Yes, node i is ordered (step S97), the various arrays are counted up (count-up (1), step S98), and processing branches to step S93.
If the result of step S96 is No, node i is excluded from ordering in this pass (step S99), the various arrays are counted up (count-up (2), step S100), and processing branches to step S93.

If the result of step S95 is No, the nodes not yet ordered are re-sorted in the node table in ascending order of counterpart-end node count (step S101), in order to check whether any radial-system nodes that were excluded still remain, and initialization (3) (step S102) is performed. Next, for the re-sorted nodes, it is determined whether node i belongs to the radial system portion (step S103). If Yes, it is further determined whether all remaining radial-system nodes are excluded nodes (step S104); if No, processing branches to step S96. If the result of step S104 is Yes, all the excluded nodes are ordered one after another (step S105), and processing proceeds to FIG. 14-2. If the result of step S103 is No, processing likewise proceeds to FIG. 14-2.

The processing up to this point completes the node ordering of the radial system portion; the loop-shaped system portion is ordered next. The loop-shaped portion is ordered on the same principle as the radial portion.

As shown in FIG. 14-2, step S106 first performs initialization (4) of the various flags and the like. Next, all remaining nodes are re-sorted in the node table in ascending order of the number of node connections (step S107). It is then determined whether the number of ordered nodes is greater than or equal to the number of parallel CPUs (step S108). If Yes, initialization (2) of the ordering array count and the like is performed (step S109), and processing branches to step S110. If the result of step S108 is No, processing branches to step S110 with nothing done. Step S110 determines whether node i can be processed in parallel. If Yes, node i is ordered (step S111), the various arrays are counted up (count-up (3), step S112), and step S113 determines whether all nodes have been ordered. If the result of step S113 is No, the various arrays are counted up (count-up (4), step S114), and processing branches to step S107. If it is Yes, processing ends.

If the result of step S110 is No, node i is excluded in this pass (step S115), the various arrays are counted up (count-up (5), step S116), and it is determined whether all remaining nodes are excluded nodes (step S117). If No, the various arrays are counted up (count-up (6), step S118), and processing branches to step S108. If the result of step S117 is Yes, the remaining nodes are re-sorted in the node table in ascending order of the number of node connections (step S119), and the node i with the smallest number of connections is ordered (step S120); once all nodes have been ordered, processing ends.

In Embodiment 1, the simulation of the number of new non-zero elements is processed in parallel when ordering the nodes of the loop-shaped system portion. In Embodiment 3, that simulation is replaced by the rule used for the radial portion: nodes become ordering candidates in ascending order of the number of counterpart-end nodes connected to them. The resulting ordering is less optimal than in Embodiment 1, but the ordering itself runs fast even on a single CPU. When the system configuration changes during the training execution phase of a training simulator, the node ordering can therefore be recomputed quickly, and the network calculation in that phase can be kept near-optimal. As a result, the simultaneous linear equations can be solved at high speed and the training simulator can be built at low cost. The method also becomes applicable to online systems such as power dispatch automation systems.
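The degree-based rule of Embodiment 3 can be sketched as a greedy loop. This is a simplified reading, not the flowchart of FIGS. 14-1 and 14-2: it treats radial and loop-shaped portions alike, ignores fill-in edges created by elimination, and interprets "can be processed in parallel" as "neither the candidate nor any of its remaining neighbors is a neighbor of a node already chosen in the current group". The function name `order_by_degree`, the grouping by `n_cpus`, and the tie-break by node number are assumptions made for illustration.

```python
def order_by_degree(adj, n_cpus):
    """Return (order, batches) for a network given as {node: set_of_neighbors}."""
    remaining = set(adj)
    order, batches = [], []
    while remaining:
        # Candidates in ascending order of remaining counterpart-end nodes.
        candidates = sorted(remaining, key=lambda v: (len(adj[v] & remaining), v))
        batch, blocked = [], set()
        for v in candidates:
            if len(batch) == n_cpus:
                break
            neighbors = adj[v] & remaining
            # Exclude v for this pass if it, or one of its neighbors, is a
            # neighbor of a node already placed in the current batch.
            if v in blocked or neighbors & blocked:
                continue
            batch.append(v)
            blocked |= neighbors
        order.extend(batch)
        batches.append(batch)
        remaining -= set(batch)
    return order, batches

# Hypothetical radial feeder 1-2-3-4-5 ordered for two CPUs.
path = {1: {2}, 2: {1, 3}, 3: {2, 4}, 4: {3, 5}, 5: {4}}
order, batches = order_by_degree(path, n_cpus=2)
```

The first candidate of every pass is always accepted, so the loop terminates even when every other candidate is excluded.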

Embodiment 4.
This embodiment describes a parallel method for solving simultaneous linear equations. The nodes are ordered with any one of the node ordering methods of Embodiments 1 to 3, the coefficient matrix is triangular-factorized by the method of Non-Patent Document 2, and forward elimination and backward substitution are performed on the factorization result. FIGS. 15-1 and 15-2 are flowcharts showing the parallel solving method of this embodiment, that is, the parallel processing by multiple CPUs of the forward elimination and backward substitution steps to which the node ordering method of Embodiment 1, 2, or 3 has been applied. FIG. 15-2 shows the processing that follows FIG. 15-1; the two are joined at the point marked '3'.

In FIG. 15-1, the forward elimination step is processed in parallel first. It is first determined whether the CPU is CPU1 (step S121). If Yes, the flags are initialized (step S122), the parallel processing flags that indicate whether each CPU can run in parallel are created (step S123), and processing proceeds to step S124. If the result of step S121 is No, processing branches to step S124. Step S124 sets the processing count L, which counts the rounds of parallel processing by the CPUs, and the processing order k, and the bj element to be processed is then selected (step S125). Here bj denotes the j-th component of the coefficient vector b. Next, it is determined whether the bj element can be processed in parallel (step S126). If Yes, processing branches to step S128. If No, it is determined whether the related bi element, the element required to process bj, has already been processed (step S127); while the result of step S127 is No the check is repeated, and when it becomes Yes processing proceeds to step S128.
Step S128 performs the forward elimination calculation, the CPU processing flag is then set (step S129), and it is determined whether the forward elimination step has finished (step S130). If No, the CPU processing flag is set to zero (step S131) and processing branches to step S124. If Yes, the forward elimination step ends and processing moves on to backward substitution step 1.

As shown in FIG. 15-2, backward substitution step 1 first determines whether the CPU is CPU1 (step S132); if Yes, the flags are initialized (step S133), and if No, this is skipped. Next, the group of xi elements to be processed is set (step S134), the calculation of backward substitution step 1 is executed (step S135), the CPU processing flag is set when the calculation finishes (step S136), and it is determined whether backward substitution step 1 has finished (step S137). Here xi denotes the i-th component of the solution x, and in step S135 aii is the i-th diagonal element of the coefficient matrix, bi is the i-th component of the coefficient vector, and C(i) = 1/aii (see FIG. 6). While the result of step S137 is No the check is repeated; when it is Yes, backward substitution step 1 ends and processing moves on to backward substitution step 2.

Backward substitution step 2 first determines whether the CPU is CPU1 (step S138). If Yes, the flags are initialized (step S139) and the parallel processing flags that indicate whether each CPU can run in parallel are created (step S140); if No, processing branches to step S141. Step S141 sets the processing count L and the processing order k, the xi element to be processed is selected (step S142), and it is determined whether the xi element can be processed in parallel (step S143). If the result of step S143 is No, it is determined whether the related xj element, the element required to process xi, has already been processed (step S144); while the result of step S144 is No the check is repeated, and when it becomes Yes processing proceeds to step S145. If the result of step S143 is Yes, processing branches directly to step S145.

Step S145 performs the calculation of backward substitution step 2, the CPU processing flag is then set (step S146), and finally it is determined whether backward substitution step 2 has finished (step S147). If No, the CPU processing flag is set to zero (step S148) and processing branches to step S141. If Yes, backward substitution step 2 ends.
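The three stages (forward elimination, backward substitution step 1, backward substitution step 2) can be illustrated with a serial sketch. It assumes that the factorization has the form A = L D U with unit lower and upper triangular factors and a separate diagonal, which the definition C(i) = 1/aii suggests but the text does not state; the factor layout of Non-Patent Document 2 may differ. Dense Python lists and the name `ldu_solve` are used only for readability.

```python
def ldu_solve(L, d, U, b):
    """Solve (L * diag(d) * U) x = b with unit-triangular L and U (assumed layout)."""
    n = len(b)
    y = list(b)
    # Forward elimination: b_j <- b_j - L[j][i] * b_i, skipping zero factors.
    for i in range(n):
        for j in range(i + 1, n):
            if L[j][i] != 0.0:
                y[j] -= L[j][i] * y[i]
    # Backward substitution step 1: x_i = C(i) * b_i with C(i) = 1 / d_i.
    # Every i is independent here, so this stage needs no ordering between CPUs.
    x = [y[i] / d[i] for i in range(n)]
    # Backward substitution step 2: from the last element toward the first.
    for i in range(n - 1, -1, -1):
        for j in range(i + 1, n):
            if U[i][j] != 0.0:
                x[i] -= U[i][j] * x[j]
    return x

# Hypothetical 2x2 system: A = [[2, 1], [4, 5]], b = [4, 14].
L = [[1.0, 0.0], [2.0, 1.0]]
d = [2.0, 3.0]
U = [[1.0, 0.5], [0.0, 1.0]]
x = ldu_solve(L, d, U, [4.0, 14.0])
```

The parallel version described in this embodiment distributes the inner updates of the first and third stages over the CPUs; the flags decide which updates may run at the same time.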

FIGS. 16-1 and 16-2 are flowcharts showing how the parallel processing flags are created for the forward elimination step in this embodiment. FIG. 16-2 continues from FIG. 16-1; the two are joined at the point marked '4'. In FIG. 16-1, the targets for parallel processing flag creation are set first (step S150), and it is then determined whether the CPU is CPU1 (step S151). If Yes, the flags are initialized (step S152), including the CPU execution flags, the CPU reference flags, the CPU processing flags, and the end flag for parallel processing flag creation; if No, processing branches to step S153. Next, the initial values of the processing element index i and the processing CPU numbers km and kn are set (step S153). In FIG. 16-2, the elements to be processed in parallel in forward elimination are computed (step S154), and each element is then tested to see whether it can be processed in parallel. If it can, the CPU number is set in the CPU execution flag; if it cannot, the number of the CPU whose completion must be awaited is set in the CPU reference flag. The parallel processing flags are created in this way.

First, the array Q of the selected element is checked against the array P of the elements already selected. If they match, parallel processing is not possible; if they do not match, parallel processing is judged possible. This check consists of an m loop and an n loop. The m loop covers all elements to be processed in parallel, and step S155 counts up the processing CPU number of the m loop. The n loop always includes the first element and runs up to element m-1. For example, the second element is compared with the first; the third with the first and second; and the fourth with the first, second, and third.

First, step S156 counts up the processing CPU number of the n loop. Here kn is the number of the CPU that processes an element selected earlier, and km is the number of the CPU that runs in parallel with respect to that earlier element.

Next, it is determined whether km is 1 (step S157). If Yes, the element is the first of the elements to be processed in parallel, so it is treated as unconditionally parallelizable: the CPU execution flag is set and the CPU reference flag is cleared (step S162), and processing branches to the point after step S161 (the m loop is advanced). If the result of step S157 is No, it is determined whether Q(m) and P(n) match (step S159). If Yes, parallel processing is not possible, so the CPU reference flag is set and the CPU execution flag is cleared (step S159); if No, nothing is done. After the n loop finishes, it is determined whether Q(m) differs from every P(n) (step S160). If Yes, parallel processing is possible, so the CPU execution flag is set and the CPU reference flag is cleared (step S161). If the result of step S160 is No, nothing is done.

Next, as a further test of whether parallel processing is possible, the array P of an element selected earlier is checked against the array P of each element selected after it. If they do not match, parallel processing is possible; if they match, it is not, and the CPU execution flag and CPU reference flag are set accordingly.

First, the processing CPU number of the n loop is set to its initial value (step S163). This processing also consists of an m loop and an n loop. Because the first element is unconditionally parallelizable, the m loop starts from the second element, and the n loop runs from the first element up to element m-1. Thus the second element is compared with the first; the third with the first and second; and the fourth with the first, second, and third.

First, step S164 counts up the processing CPU number kn, and it is determined whether P(m) and P(n) match (step S165). If they match, parallel processing is not possible, so the CPU reference flag is set and the CPU execution flag is cleared (step S166); if No, nothing is done. When the m and n loops have finished, it is determined whether all elements have been processed (step S167). If No, the index i is increased by the number of CPUs running in parallel (CPUmax) (step S168), the processing CPU numbers km and kn are cleared (step S169), and processing branches to step S154. If Yes, the number of the CPU that performed the processing is set in its own CPU processing flag (step S170), and it is determined whether the CPU is CPU1 (step S171). If Yes, it is determined whether all CPU processing flags are positive (step S172); while the result of step S172 is No the check is repeated, and when it is Yes the end flag for parallel processing flag creation is set (step S173) and CPU1 finishes. For a CPU other than CPU1 in step S171, the end flag for parallel processing flag creation is tested (step S174); while the result is No the check is repeated, and when it is Yes processing ends.
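The two checks described for FIG. 16-2 (Q(m) against every earlier P(n), then P(m) against every earlier P(n)) can be sketched for a single group of elements as follows. The function name `make_flags`, the list-based flag storage, and the use of 0 for "flag cleared" are illustrative assumptions; slot m of the group is taken to be handled by CPU m+1.

```python
def make_flags(P, Q):
    """Build CPU execution and reference flags for one group of parallel elements.

    P[m] is the index written by element m, Q[m] the index it reads (assumed roles).
    Returns (exec_flag, ref_flag); CPU numbers are 1-based, 0 means cleared.
    """
    n = len(P)
    exec_flag = [0] * n
    ref_flag = [0] * n
    # First check: does element m read something an earlier element writes?
    for m in range(n):
        if m == 0:
            exec_flag[m] = 1          # first element: unconditionally parallel
            continue
        matched = False
        for k in range(m):
            if Q[m] == P[k]:
                ref_flag[m], exec_flag[m] = k + 1, 0
                matched = True
        if not matched:
            exec_flag[m], ref_flag[m] = m + 1, 0
    # Second check: does element m write the same index as an earlier element?
    for m in range(1, n):
        for k in range(m):
            if P[m] == P[k]:
                ref_flag[m], exec_flag[m] = k + 1, 0
    return exec_flag, ref_flag

# Hypothetical group for four CPUs.
exec_flag, ref_flag = make_flags(P=[3, 4, 4, 6], Q=[1, 3, 2, 5])
```

Here the second element must wait for CPU 1 (it reads index 3, which the first element writes), and the third must wait for CPU 2 (both write index 4); the first and fourth can run immediately.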

FIGS. 17-1 and 17-2 are flowcharts showing how the parallel processing flags are created for the backward substitution step in this embodiment. FIG. 17-2 continues from FIG. 17-1; the two are joined at the point marked '5'. Forward elimination proceeds from element 1 toward the largest element, whereas backward substitution proceeds in the opposite direction, from the largest element toward element 1. This is the main difference between the two.

In FIG. 17-1, the targets for parallel processing flag creation are set first (step S150), and it is then determined whether the CPU is CPU1 (step S151). If Yes, the flags are initialized (step S152), including the CPU execution flags, the CPU reference flags, the CPU processing flags, and the end flag for parallel processing flag creation; if No, processing branches to step S153. Next, the initial values of the processing element index i and the processing CPU numbers km and kn are set (step S153). In FIG. 17-2, the elements to be processed in parallel in backward substitution are computed (step S154), and each element is then tested to see whether it can be processed in parallel. If it can, the CPU number is set in the CPU execution flag; if it cannot, the number of the CPU whose completion must be awaited is set in the CPU reference flag. The parallel processing flags are created in this way.

First, the array Q of the selected element is checked against the array P of the elements already selected. If they match, parallel processing is not possible; if they do not match, parallel processing is judged possible. This check consists of an m loop and an n loop. The m loop covers all elements to be processed in parallel, and step S155 counts up the processing CPU number of the m loop. The n loop always includes the first element and runs up to element m-1. Thus the second element is compared with the first; the third with the first and second; and the fourth with the first, second, and third.

Next, it is determined whether km is 1 (step S157). If Yes, the element is the first of the elements to be processed in parallel, so it is treated as unconditionally parallelizable: the CPU execution flag is set and the CPU reference flag is cleared (step S162), and processing branches to the point after step S162 (the m loop is advanced). If the result of step S157 is No, it is determined whether P(m) and Q(n) match (step S159). If Yes, parallel processing is not possible, so the CPU reference flag is set and the CPU execution flag is cleared (step S159); if No, nothing is done. After the n loop finishes, it is determined whether P(m) differs from every Q(n) (step S160). If Yes, parallel processing is possible, so the CPU execution flag is set and the CPU reference flag is cleared (step S161). If the result of step S160 is No, nothing is done.

Next, as a further test of whether parallel processing is possible, the array P of an element selected earlier is checked against the array P of each element selected after it. If they do not match, parallel processing is possible; if they match, it is not, and the CPU execution flag and CPU reference flag are set accordingly.

First, step S164 counts up the processing CPU number kn, and it is determined whether P(m) and P(n) match (step S165). If they match, parallel processing is not possible, so the CPU reference flag is set and the CPU execution flag is cleared (step S166); if No, nothing is done. When the m and n loops have finished, it is determined whether all elements have been processed (step S167). If No, the index i is decreased by the number of CPUs running in parallel (CPUmax) (step S168), the processing CPU numbers km and kn are cleared (step S169), and processing branches to step S154. If Yes, each CPU sets its own CPU number in its CPU processing flag (step S170), and it is determined whether the CPU is CPU1 (step S171). If Yes, it is determined whether all CPU processing flags are positive (step S172); while the result of step S172 is No the check is repeated, and when it is Yes the end flag for parallel processing flag creation is set (step S173) and CPU1 finishes. If the result of step S171 is No, it is determined whether the end flag for parallel processing flag creation is positive (step S174); while the result is No the check is repeated, and when it is Yes processing ends.

Embodiment 1 described a node ordering method that raises the execution efficiency of parallel forward elimination and backward substitution; Embodiment 4 has described how that parallel processing is implemented when the node ordering is applied. Combining Embodiment 1 with Embodiment 4 thus makes it possible to solve simultaneous linear equations in parallel and at high speed.

Next, the operation will be described, starting with the parallel processing of the forward elimination process. When multiple CPUs perform this processing in parallel, the following must be defined in advance: a check of whether each bi element can be processed in parallel, with the resulting parallel processing flags (CPU execution flag and CPU reference flag) created in parallel by the CPUs; the bi and bj elements that each CPU processes; a method for detecting that every CPU has finished its work in each parallel calculation step; and a method for detecting that all rows have been processed.
(A) Creation of the parallel processing flags (CPU execution flag, CPU reference flag)
(1) Each CPU sets the elements for which it is responsible for creating parallel processing flags.
(2) CPU1 first initializes the CPU execution flags and CPU reference flags of all CPUs to 0.
(3) Next, for the elements handled in the current parallel step, each CPU checks whether Q(m) of a later element matches P(n) of any preceding element. If a match is found, the element cannot be processed in parallel, so the CPU reference flag is set. If no match is found, the element can be processed in parallel, so the CPU execution flag is set.
・ The first element of the parallel step is unconditionally treated as processable in parallel, and the number of the CPU in charge is set in that CPU's execution flag.
・ If the later element matches a preceding element, the number of the CPU in charge of the matched P(n) is set in the CPU reference flag of the CPU concerned.
・ If the later element matches none of the preceding elements, the CPU number is set in the CPU execution flag of the CPU in charge of element Q(m).
(4) Next, for the elements handled in the current parallel step, each CPU checks whether P(m) of a later element matches P(n) of any preceding element. If a match is found, the element cannot be processed in parallel, so the CPU reference flag is set. If no match is found, nothing is done.
・ If they match, the number of the CPU in charge of the matched P(n) is set in the CPU reference flag of the CPU concerned.
・ If they do not match, nothing is done.
(5) When a CPU finishes its own processing, it sets its own CPU number in its CPU processing flag.
(6) CPU1 checks from the CPU processing flags whether all CPUs have finished. If they have, it sets the end flag for parallel processing flag creation and ends the processing.
(7) Each CPU other than CPU1 also ends its processing once the end flag for parallel processing flag creation has been set.
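The checks in steps (3) and (4) can be sketched as follows. This is a minimal single-threaded illustration, not the patented implementation: the function name `build_flags`, the 0-based lists, and the sample P/Q values are hypothetical. Two choices are assumptions of the sketch because the text does not specify them: when several preceding elements match, the closest one is referenced, and the execution flag is cleared when step (4) finds a conflict.

```python
def build_flags(P, Q):
    """Create execution/reference flags for the elements of one parallel step.

    P[m], Q[m] are the indices of element m; element m is handled by CPU m + 1.
    A flag value of 0 means "not set".
    """
    n = len(P)
    exec_flag = [0] * n
    ref_flag = [0] * n
    for m in range(n):
        cpu = m + 1
        if m == 0:
            # The first element is unconditionally processable in parallel.
            exec_flag[m] = cpu
            continue
        # Step (3): compare Q(m) with P(n) of every preceding element.
        hits = [k for k in range(m) if P[k] == Q[m]]
        if hits:
            ref_flag[m] = hits[-1] + 1   # CPU in charge of the matched P(n)
        else:
            exec_flag[m] = cpu
        # Step (4): compare P(m) with P(n) of every preceding element.
        hits = [k for k in range(m) if P[k] == P[m]]
        if hits:
            ref_flag[m] = hits[-1] + 1
            exec_flag[m] = 0             # assumption: a conflict overrides step (3)
    return exec_flag, ref_flag


# Hypothetical step with four CPUs.
exec_flag, ref_flag = build_flags(P=[5, 6, 7, 6], Q=[1, 2, 5, 3])
print(exec_flag, ref_flag)
```

In this sample, the third element reads the value written by the first (Q = 5 equals an earlier P), and the fourth writes to the same target as the second (P = 6), so both receive a reference flag instead of an execution flag.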
(B) Determination of the bi and bj elements to be processed
(1) The bi and bj elements processed by each CPU are calculated by equation (4) below. FIG. 18 shows how processing is allocated to four CPUs and managed by the various flags in the parallel processing of the forward elimination process.
k = (L - 1) * CPUmax + CPUi ... (4)
j = P(k)
i = Q(k)
where
L: calculation count of each CPU (initialized by CPU1; initial value 1)
CPUmax: number of CPUs operating in parallel
CPUi: CPU number (CPU1 = 1, ..., CPUn = CPUmax)
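As a worked illustration of equation (4), the short sketch below maps a calculation count L and a CPU number to the element index k and then to j and i. The function names and the contents of the P and Q tables are hypothetical; only the index formula comes from the text.

```python
def forward_element_index(L, cpu_i, cpu_max):
    """Equation (4): 1-based element index k for CPU cpu_i at calculation count L."""
    return (L - 1) * cpu_max + cpu_i


# Hypothetical 1-based tables; position 0 is an unused placeholder.
P = [None, 5, 6, 7, 6, 8, 9, 9, 10]
Q = [None, 1, 2, 5, 3, 4, 7, 8, 9]


def forward_targets(L, cpu_i, cpu_max):
    """Return (k, j, i) with j = P(k) and i = Q(k)."""
    k = forward_element_index(L, cpu_i, cpu_max)
    return k, P[k], Q[k]


# With four CPUs, step L = 1 covers k = 1..4 and step L = 2 covers k = 5..8.
for cpu in range(1, 5):
    print(cpu, forward_targets(2, cpu, 4))
```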
(C) Management of the CPU processing flag
Before CPU1 can update the calculation count L, it must confirm that every CPU has finished its processing for the current step. The flag used for this confirmation is defined as the CPU processing flag.
(1) Initialization
CPU1 initializes the CPU processing flags of all CPUs to 0.
(2) Setting the CPU processing flag
Each CPU writes its own CPU number to its CPU processing flag when it finishes its assigned processing.
(3) Counting up the calculation count L
When the CPU processing flags of all CPUs have become positive, CPU1 increments L by 1 and resets the CPU processing flags of all CPUs.
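The count-up rule amounts to a simple barrier. The sketch below shows the check that CPU1 would perform; the function name `advance_if_all_done` and the list representation of the flags are assumptions made for illustration, and the shared-memory access and polling of a real multi-CPU system are not modeled.

```python
def advance_if_all_done(L, processing_flags):
    """If every CPU processing flag is positive, reset the flags and return L + 1.

    Otherwise leave the flags untouched and return L unchanged.
    """
    if all(flag > 0 for flag in processing_flags):
        for c in range(len(processing_flags)):
            processing_flags[c] = 0
        return L + 1
    return L


flags = [0, 0, 0, 0]        # initialized by CPU1
L = 1
flags[0], flags[2] = 1, 3   # CPU1 and CPU3 have finished
L = advance_if_all_done(L, flags)   # CPU2 and CPU4 still running: L stays 1
flags[1], flags[3] = 2, 4   # CPU2 and CPU4 finish
L = advance_if_all_done(L, flags)   # all positive: L becomes 2, flags reset
print(L, flags)
```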
(D) Management of the forward elimination end flag
Each CPU determines for itself when the forward elimination process has ended.
(1) Initialization
CPU1 initializes the forward elimination end flag to 0.
(2) Detection of the last row
When the row number of the bj element processed by a CPU equals the row size nmax, that CPU judges that the last row has been processed and writes its own CPU number to the forward elimination end flag.
(3) End of the forward elimination process
When the forward elimination end flag is positive, each CPU judges that the forward elimination process has ended and proceeds to backward substitution process 1.
(E) Parallel processing procedure of the forward elimination process
(1) Each CPU sets the elements for which it is responsible for creating parallel processing flags.
(2) CPU1 initializes the CPU execution flag, CPU reference flag, and CPU processing flag of each CPU, and the forward elimination end flag.
(3) The CPUs create all CPU execution flags and CPU reference flags in parallel.
(4) Each CPU determines the bi and bj elements it is to process.
(5) Each CPU checks its CPU execution flag.
・ If the flag matches its own CPU number, the CPU calculates the bj element and sets its CPU processing flag. If the D element to be multiplied by the bi element is zero, the calculation is skipped.
・ If the flag does not match its own CPU number, the CPU checks the CPU processing flag of the CPU indicated by its CPU reference flag.
(i) If that CPU processing flag is positive, the CPU processes the bj element and sets its own CPU processing flag.
(ii) If that CPU processing flag is not positive, the CPU keeps checking (it waits for the preceding CPU to finish its calculation).
(6) CPU1 confirms that all CPU processing flags are positive, increments L by 1, resets the CPU processing flags of all CPUs, and returns to (4).
(7) A CPU that has processed the last row sets the forward elimination end flag.
(8) When the forward elimination end flag is positive, each CPU ends forward elimination and proceeds to backward substitution process 1.
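Step (5) of this procedure can be illustrated for a single parallel step with Python threads. This is a sketch under stated assumptions, not the implementation described in the patent: `run_round` is a hypothetical name, `threading.Event` objects stand in for the CPU processing flags, the update `b[j] -= D * b[i]` is an assumed form of the bj calculation (the text only states that the D element is multiplied by the bi element), and the flag values are supplied by hand.

```python
import threading


def run_round(P, Q, D, b, exec_flag, ref_flag):
    """Run one parallel step; worker c plays the role of CPU number c + 1."""
    n = len(P)
    done = [threading.Event() for _ in range(n)]   # stand-ins for CPU processing flags

    def worker(c):
        if exec_flag[c] != c + 1 and ref_flag[c] > 0:
            # Not executable right away: wait for the CPU named in the reference flag.
            done[ref_flag[c] - 1].wait()
        if D[c] != 0:                               # skip when the D element is zero
            b[P[c]] -= D[c] * b[Q[c]]               # assumed form of the bj update
        done[c].set()                               # "set the CPU processing flag"

    threads = [threading.Thread(target=worker, args=(c,)) for c in range(n)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return b


# Hypothetical data: element 3 reads b[5] written by element 1,
# element 4 writes b[6], which is also the target of element 2.
b = [0.0, 1.0, 2.0, 3.0, 4.0, 10.0, 20.0, 30.0]
result = run_round(P=[5, 6, 7, 6], Q=[1, 2, 5, 3], D=[2.0, 0.0, 1.0, 3.0], b=b,
                   exec_flag=[1, 2, 0, 0], ref_flag=[0, 0, 1, 2])
print(result)
```

Because each waiting worker references only a lower-numbered worker, the waits cannot form a cycle, and the result matches what sequential processing of the same four elements would give.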

Next, the parallel processing of backward substitution process 1 will be described. In backward substitution process 1, the whole workload can be divided among the CPUs and processed in parallel. When multiple CPUs perform this processing in parallel, the following must be defined: the xi elements that each CPU processes, a method for detecting that each CPU has finished its processing, and a method for detecting that all rows have been processed.
(A) Determination of the xi elements to be processed
(1) The xi elements processed by each CPU are calculated by the following formulas.
・ Average amount of processing per CPU
k0 = integer part of (nmax / CPUmax)
・ Remainder
k1 = nmax - k0 * CPUmax
・ Range processed by CPU1
i = 1 to k0 + k1
・ Range processed by CPU2 to CPUmax
i = (k0 + k1) + (CPUi - 2) * k0 + 1 to (k0 + k1) + (CPUi - 1) * k0
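The partitioning formulas can be checked with a short sketch. The function name `xi_range` is hypothetical; the arithmetic follows the formulas above, with CPU1 taking the remainder k1 in addition to its average share k0.

```python
def xi_range(cpu_i, nmax, cpu_max):
    """Return the inclusive 1-based range (first, last) of xi handled by CPU cpu_i."""
    k0 = nmax // cpu_max            # average amount per CPU (integer part)
    k1 = nmax - k0 * cpu_max        # remainder, assigned to CPU1
    if cpu_i == 1:
        return 1, k0 + k1
    first = (k0 + k1) + (cpu_i - 2) * k0 + 1
    last = (k0 + k1) + (cpu_i - 1) * k0
    return first, last


# Example: 10 elements on 4 CPUs gives k0 = 2 and k1 = 2.
ranges = [xi_range(c, nmax=10, cpu_max=4) for c in range(1, 5)]
print(ranges)
```

The ranges are contiguous and cover 1 to nmax exactly once, so no element is processed twice or left out.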
(B) Management of the CPU processing flag and the backward substitution process 1 end flag
To check that backward substitution process 1 has ended, it is necessary to confirm that every CPU has finished its processing. FIG. 19 shows how processing is allocated to four CPUs and managed by the CPU processing flags in the parallel processing of backward substitution process 1.
(1) Initialization
CPU1 initializes the CPU processing flags of all CPUs and the backward substitution process 1 end flag to 0.
(2) Setting the CPU processing flag
Each CPU writes its own CPU number to its CPU processing flag when it finishes its assigned processing.
(3) Setting the backward substitution process 1 end flag
After finishing its own processing, CPU1 detects that the CPU processing flags of the other CPUs have become positive and then sets the backward substitution process 1 end flag.
(C) Parallel processing procedure of backward substitution process 1
(1) Initialization
CPU1 initializes the CPU processing flags of all CPUs and the backward substitution process 1 end flag to 0.
(2) Determination of the xi elements to be processed
Each CPU calculates the range of xi elements it is in charge of.
(3) Setting the CPU processing flag
Each CPU writes its own CPU number to its CPU processing flag when it finishes its assigned processing.
(4) Setting the backward substitution process 1 end flag
After finishing its own processing, CPU1 detects that the CPU processing flags of the other CPUs are positive and then sets the backward substitution process 1 end flag.
(5) When the backward substitution process 1 end flag is positive, each CPU recognizes that backward substitution process 1 has ended and proceeds to backward substitution process 2.

Finally, the parallel processing of backward substitution process 2 will be described. When multiple CPUs perform this processing in parallel, the following must be defined in advance: a check of whether the xi and xj elements can be processed in parallel, with the resulting parallel processing flags (CPU execution flag and CPU reference flag) created in parallel by the CPUs; the xi and xj elements that each CPU processes; a method for detecting that every CPU has finished its work in each parallel calculation step; and a method for detecting that all rows have been processed.
(A) Creation of the parallel processing flags (CPU execution flag, CPU reference flag)
(1) Each CPU sets the elements for which it is responsible for creating parallel processing flags.
(2) CPU1 first initializes the CPU execution flags and CPU reference flags of all CPUs to 0.
(3) Next, for the elements handled in the current parallel step, each CPU checks whether P(m) of a later element matches Q(n) of any preceding element. If a match is found, the element cannot be processed in parallel, so the CPU reference flag is set. If no match is found, the element can be processed in parallel, so the CPU execution flag is set.
・ The first element of the parallel step is unconditionally treated as processable in parallel, and the number of the CPU in charge is set in that CPU's execution flag.
・ If the later element matches a preceding element, the number of the CPU in charge of the matched Q(n) is set in the CPU reference flag of the CPU concerned.
・ If the later element matches none of the preceding elements, the CPU number is set in the CPU execution flag of the CPU in charge of element P(m).
(4) Next, for the elements handled in the current parallel step, each CPU checks whether P(m) of a later element matches P(n) of any preceding element. If a match is found, the element cannot be processed in parallel, so the CPU reference flag is set. If no match is found, nothing is done.
・ If they match, the number of the CPU in charge of the matched P(n) is set in the CPU reference flag of the CPU concerned.
・ If they do not match, nothing is done.
(5) When a CPU finishes the processing it is in charge of, it sets its own CPU number in its CPU processing flag.
(6) CPU1 determines from the CPU processing flags whether all CPUs have finished. If they have, it sets the end flag for parallel processing flag creation and ends the processing.
(7) Each CPU other than CPU1 checks the end flag for parallel processing flag creation and ends its processing if the flag has been set.
(B) Determination of the xi and xj elements to be processed
(1) The xi element processed by each CPU is calculated by equation (5) below. FIG. 20 shows how processing is allocated to four CPUs and managed by the various flags in the parallel processing of backward substitution process 2.
k = nmax - (L - 1) * CPUmax - CPUi + 1 ... (5)
i = Q(k)
j = P(k)
where
nmax: maximum number of xi elements
L: calculation count of each CPU (initialized by CPU1; initial value 1)
CPUmax: number of CPUs operating in parallel
CPUi: CPU number (CPU1 = 1, ..., CPUn = CPUmax)
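Equation (5) walks the element index downward from nmax, in contrast to the upward walk of equation (4). The sketch below evaluates the formula for a hypothetical case with nmax = 10 and four CPUs; the function name is an assumption, and treating indices below 1 as "no work for this CPU" is an interpretation that the text does not state explicitly.

```python
def backward_element_index(L, cpu_i, cpu_max, nmax):
    """Equation (5): 1-based element index k for CPU cpu_i at calculation count L."""
    return nmax - (L - 1) * cpu_max - cpu_i + 1


# Index handled by each of four CPUs in successive steps, for nmax = 10.
schedule = {
    L: [backward_element_index(L, c, 4, 10) for c in range(1, 5)]
    for L in (1, 2, 3)
}
print(schedule)
```

In the third step only CPU1 and CPU2 receive valid indices (2 and 1); the other two values fall below 1 and correspond to no element in this example.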
(C) Management of the CPU processing flag
Before CPU1 can update the calculation count L, it must confirm that every CPU has finished its processing for the current step. The flag used for this confirmation is defined as the CPU processing flag.
(1) Initialization
CPU1 initializes the CPU processing flags of all CPUs to 0.
(2) Setting the CPU processing flag
Each CPU writes its own CPU number to its CPU processing flag when it finishes its assigned processing.
(3) Counting up the calculation count L
When the CPU processing flags of all CPUs have become positive, CPU1 increments L by 1 and resets the CPU processing flags of all CPUs.
(D) Management of the backward substitution process 2 end flag
Each CPU determines for itself when backward substitution process 2 has ended.
(1) Initialization
CPU1 initializes the backward substitution process 2 end flag to 0.
(2) Detection of the last row
When the row number of the xi element processed by a CPU equals 1, that CPU judges that the last row has been processed and writes its own CPU number to the backward substitution process 2 end flag.
(3) End of backward substitution process 2
When the backward substitution process 2 end flag is positive, each CPU judges that backward substitution has ended, and all processing is complete.
(E) Parallel processing procedure of backward substitution process 2
(1) Each CPU sets the elements for which it is responsible for creating parallel processing flags.
(2) CPU1 initializes the CPU execution flag, CPU reference flag, and CPU processing flag of each CPU, and the backward substitution process 2 end flag.
(3) The CPUs create all CPU execution flags and CPU reference flags in parallel.
(4) Each CPU determines the xi and xj elements it is to process.
(5) Each CPU checks its CPU execution flag.
・ If the flag matches its own CPU number, the CPU calculates the xi element and sets its CPU processing flag. If the D element to be multiplied by the xj element is zero, the calculation is skipped.
・ If the flag does not match its own CPU number, the CPU checks the CPU processing flag of the CPU indicated by its CPU reference flag.
(i) If that CPU processing flag is positive, the CPU processes the xi element and sets its own CPU processing flag.
(ii) If that CPU processing flag is not positive, the CPU keeps checking (it waits for the preceding CPU to finish its processing).
(6) CPU1 confirms that all CPU processing flags are positive, increments L by 1, resets the CPU processing flags of all CPUs, and returns to (4).
(7) A CPU that has processed the first row sets the backward substitution process 2 end flag.
(8) When the backward substitution process 2 end flag is positive, each CPU ends backward substitution process 2.

According to the present embodiment, whether the forward elimination and backward substitution processes can be executed in parallel is determined before the processing is carried out, based on the node ordering result of any one of Embodiments 1 to 3. Where parallel processing is possible, the CPU execution flag is set; where it is not, the number of the CPU to be referred to is set in the CPU reference flag. During forward elimination and backward substitution, an element whose CPU execution flag is set is processed in parallel. For an element whose flag is not set, the CPU number to be referred to is read from the CPU reference flag, completion of that CPU's processing is confirmed, and only then is the element processed. As a result, forward elimination and backward substitution, which were conventionally processed entirely in series, can be processed in parallel, and the execution efficiency of the parallel processing is improved. Furthermore, by applying the parallel solving method for simultaneous linear equations according to the present embodiment on a computer with a symmetric multi-CPU configuration such as that shown in FIG. 2, a parallel solving apparatus that solves simultaneous linear equations in parallel can be configured.

A block diagram showing the configuration of a training simulator to which Embodiment 1 is applied.
A diagram showing an example of the computer configuration used for the parallel solution of simultaneous linear equations according to Embodiment 1.
A flowchart showing an example of the dynamic stability calculation that is implemented in the system simulation server group and performs system simulation in Embodiment 1.
A diagram showing the system example described in Non-Patent Document 2.
A diagram showing the network equations (simultaneous linear equations) of the system example of FIG. 4.
A flowchart showing the basic processing of triangular decomposition.
A diagram showing the structure of the table that stores the various data of the triangular decomposition result.
A flowchart showing the forward elimination and backward substitution processing.
A diagram showing the problem with parallel processing in forward elimination.
A diagram showing the problem with parallel processing in backward substitution.
A flowchart showing the node ordering method of Embodiment 1.
A flowchart continuing from FIG. 11-1.
A flowchart showing the parallel processing of the simulation of new non-zero element generation in FIG. 11-2.
A diagram for explaining the effect of the node ordering of Embodiment 1.
A diagram for explaining the effect of the node ordering of Embodiment 1.
A diagram for explaining the effect of the node ordering of Embodiment 1.
A diagram for explaining the effect of the node ordering of Embodiment 1.
A flowchart showing the node ordering method of Embodiment 3.
A flowchart continuing from FIG. 14-1.
A flowchart showing the parallel solving method for simultaneous linear equations in Embodiment 4.
A flowchart continuing from FIG. 15-1.
A flowchart showing the parallel processing flag creation in the backward substitution process of Embodiment 4.
A flowchart continuing from FIG. 16-1.
A flowchart showing the parallel processing flag creation in the backward substitution process of Embodiment 4.
A flowchart continuing from FIG. 17-1.
A diagram showing how processing is allocated to four CPUs and managed by the various flags in the parallel processing of the forward elimination process.
A diagram showing how processing is allocated to four CPUs and managed by the CPU processing flags in the parallel processing of backward substitution process 1.
A diagram showing how processing is allocated to four CPUs and managed by the various flags in the parallel processing of backward substitution process 2.
A diagram showing the simple network described in Non-Patent Document 1.
A diagram showing the processing sequence of the tree structure described in Non-Patent Document 1.
A flowchart showing the node ordering algorithm described in Non-Patent Document 1.

Explanation of symbols

1 Training management server group
2 System simulation server group
3 Automation simulation server group
4 Trainer console
5 Trainee console
6 Large-screen system board
7 System LAN
8 CPU1
9 CPU2
10 CPU3
11 CPUn
12 Shared memory

Claims

A node ordering method used when the solution of simultaneous linear equations in power system analysis is calculated in parallel, based on triangular decomposition of a coefficient matrix, forward elimination processing, and backward substitution processing, on a parallel computer having a symmetric multi-CPU configuration with a plurality of CPUs and a shared memory commonly accessible by the CPUs, and when the structure of the coefficient matrix and the procedure of the forward elimination processing and the backward substitution processing are represented by a tree composed of nodes and branches connecting the nodes, the method comprising:
a first step of performing node ordering for nodes belonging to a radial system part, which is a system part of the tree that contains no loop; and
a second step of performing node ordering for nodes belonging to a loop system part, which is the system part of the tree other than the radial system part,
wherein,
in the first step, the node to be ordered first is selected arbitrarily from the nodes having the smallest number of connected branches, subsequent nodes are selected from the node ordering candidates in ascending order of the number of branches connected to each node, and node ordering is performed for each of the plurality of CPUs based on a selection criterion that a node ordering candidate is selected preferentially when the candidate and its partner node, which is an adjacent node connected to it via a branch, do not match the partner node of any already ordered node, and
in the second step, a simulation of the number of new non-zero elements generated when nodes are contracted in the forward elimination process is performed in parallel using the plurality of CPUs, nodes are selected in ascending order of the number of new non-zero elements obtained by the simulation, and node ordering is performed for each of the plurality of CPUs based on a selection criterion that a node ordering candidate is selected preferentially when the candidate and its partner node connected to it via a branch do not match the partner node of any already ordered node.

A node ordering method used when the solution of simultaneous linear equations in power system analysis is calculated in parallel, based on triangular decomposition of a coefficient matrix, forward elimination processing, and backward substitution processing, on a parallel computer having a symmetric multi-CPU configuration with a plurality of CPUs and a shared memory commonly accessible by the CPUs, and when the structure of the coefficient matrix and the procedure of the forward elimination processing and the backward substitution processing are represented by a tree composed of nodes and branches connecting the nodes, the method comprising:
a first step of performing node ordering for nodes belonging to a radial system part, which is a system part of the tree that contains no loop; and
a second step of performing node ordering for nodes belonging to a loop system part, which is the system part of the tree other than the radial system part,
wherein,
in the first step, the node to be ordered first is selected arbitrarily from the nodes having the smallest number of connected branches, subsequent nodes are selected from the node ordering candidates in ascending order of the number of branches connected to each node, and node ordering is performed for each of the plurality of CPUs based on a selection criterion that a node ordering candidate is selected preferentially when the candidate and its partner node, which is an adjacent node connected to it via a branch, do not match the partner node of any already ordered node, and
in the second step, a simulation of the number of new non-zero elements generated when nodes are contracted in the forward elimination process is performed using a single CPU of the plurality of CPUs, nodes are selected in ascending order of the number of new non-zero elements obtained by the simulation, and node ordering is performed for each of the plurality of CPUs based on a selection criterion that a node ordering candidate is selected preferentially when the candidate and its adjacent node connected to it via a branch do not match the partner node of any already ordered node.

A node ordering method used when the solution of simultaneous linear equations in power system analysis is calculated in parallel, based on triangular decomposition of a coefficient matrix, forward elimination processing, and backward substitution processing, on a parallel computer having a symmetric multi-CPU configuration with a plurality of CPUs and a shared memory commonly accessible by the CPUs, and when the structure of the coefficient matrix and the procedure of the forward elimination processing and the backward substitution processing are represented by a tree composed of nodes and branches connecting the nodes, the method comprising:
a first step of performing node ordering for nodes belonging to a radial system part, which is a system part of the tree that contains no loop; and
a second step of performing node ordering for nodes belonging to a loop system part, which is the system part of the tree other than the radial system part,
wherein,
in the first step, the node to be ordered first is selected arbitrarily from the nodes having the smallest number of connected branches, subsequent nodes are selected from the node ordering candidates in ascending order of the number of branches connected to each node, and node ordering is performed for each of the plurality of CPUs based on a selection criterion that a node ordering candidate is selected preferentially when the candidate and its partner node, which is an adjacent node connected to it via a branch, do not match the partner node of any already ordered node, and
in the second step, processing is performed using a single CPU of the plurality of CPUs, nodes are selected from the node ordering candidates in ascending order of the number of branches connected to each node, and node ordering is performed for each of the plurality of CPUs based on a selection criterion that a node ordering candidate is selected preferentially when the candidate and its adjacent node connected to it via a branch do not match the partner node of any already ordered node.

A parallel solving method for simultaneous linear equations, in which the solution of simultaneous linear equations in power system analysis is calculated in parallel on a parallel computer having a symmetric multi-CPU configuration with a plurality of CPUs and a shared memory commonly accessible by the CPUs, based on a processing procedure that includes a node ordering step applying the node ordering method according to any one of claims 1 to 3, a triangular decomposition step of triangularly decomposing the coefficient matrix, a forward elimination processing step, and a backward substitution processing step, wherein
in the forward elimination processing step, parallel processing flags for determining whether parallel processing is possible are created in parallel by the plurality of CPUs, and forward elimination is executed in parallel by referring to the parallel processing flags, processing in parallel the elements for which parallel processing is possible and processing each element for which it is not possible after confirming that processing of the related elements required by that element has finished,
the backward substitution processing step includes a backward substitution first step, which is a process of assigning to each solution vector component to be processed the product of the corresponding component of the coefficient vector of the simultaneous linear equations and the reciprocal of the corresponding diagonal component of the coefficient matrix, and a backward substitution second step, which is the processing that follows the backward substitution first step,
in the backward substitution first step, the processing targets are distributed among the plurality of CPUs and processed in parallel, and
in the backward substitution second step, parallel processing flags for determining whether parallel processing is possible are created in parallel by the plurality of CPUs, and backward substitution is executed in parallel by referring to the parallel processing flags, processing in parallel the elements for which parallel processing is possible and processing each element for which it is not possible after confirming that processing of the related elements required by that element has finished.