JP7240691B1

JP7240691B1 - Data-driven method and system for detecting abnormal states in an active power distribution network

Info

Publication number: JP7240691B1
Application number: JP2022071358A
Authority: JP
Inventors: 天光呂; 邵瑞王; 建陳; 文博李; 東磊孫; ▲千▼ 艾
Original assignee: Shandong University
Current assignee: Shandong University
Priority date: 2021-09-08
Filing date: 2022-04-25
Publication date: 2023-03-16
Anticipated expiration: 2042-04-25
Also published as: CN113496262B; CN113496262A; JP2023042527A

Abstract

Problem: To provide a data-driven method and system for detecting abnormal states in an active power distribution network that improve the utilization efficiency of the data set, improve the accuracy of abnormal state prediction, and reduce prediction error.

Solution: The method acquires node parameter data after a distribution network abnormal state occurs, inputs the data into a trained distribution network abnormal state prediction model, and outputs a distribution network abnormal state prediction result. To train the model, historical abnormal state node parameter data of the distribution network are clustered, an association rule algorithm finds the data samples that are strongly correlated with the abnormal state type to form a training sample data set, and the prediction model is trained on that sample data set. A three-layer data mining structure uses data classification and association rules to extract the data samples that are strongly correlated with the corresponding fault type.

Representative drawing: Fig. 1

Description

The present invention relates to the technical field of classifying and predicting abnormal states in power distribution networks, and in particular to a data-driven method and system for detecting abnormal states in an active power distribution network.

This section merely provides background information related to the present invention and does not necessarily constitute prior art.

As intelligent real-time monitoring of active distribution networks has deepened, more and more monitoring and protection devices have been installed in them. These devices collect operating state data from the network, and analysis of those data determines whether the active distribution network is operating normally. However, this kind of diagnostic scheme incurs high costs for installing the monitoring devices and also suffers from low diagnostic accuracy.

To avoid these problems of conventional fault diagnosis schemes, artificial intelligence, with its advantages of self-learning and self-optimization, is used to diagnose abnormal states of active distribution networks such as short-circuit faults, disconnection faults, and overloads. Common fault diagnosis methods include expert systems, fuzzy theory, artificial neural networks, Bayesian networks, and Petri nets. Fault diagnosis based on an expert system uses data such as the operating state of the active distribution network, combined with the empirical knowledge of experts, to build an expert model and infer the fault type. Diagnosis based on fuzzy theory uses fuzzy control to approximate an arbitrary nonlinear continuous function. Diagnosis based on artificial neural networks imitates the human brain and natural neural networks, models existing historical data, completes the training of the relevant parameters, and obtains a solution set for the problem. Diagnosis based on Petri nets does not require a complex information base and can effectively handle discrete, dynamic faults in active distribution networks, but it suffers from low fault tolerance.

When an active distribution network is very large and there is a great deal of fault data, the effectiveness of fault data analysis depends heavily on the choice of algorithm. Moreover, current AI-based fault diagnosis schemes typically adopt only a single AI algorithm or model. Their diagnostic accuracy leaves room for improvement, and they are limited by low fault tolerance and a strong dependence on expert experience.

To solve the above problems, the present invention provides a data-driven method and system for detecting abnormal states in an active distribution network. It uses three-layer data mining to classify and predict abnormal states of the active distribution network, and thereby addresses the low accuracy and long computation time of abnormal state classification and prediction.

To achieve the above object, some embodiments adopt the following technical solution.

A data-driven method for detecting abnormal states in an active distribution network, comprising:
acquiring node parameter data after a distribution network abnormal state occurs, inputting the data into a trained distribution network abnormal state prediction model, and outputting a distribution network abnormal state prediction result,
wherein the training process of the distribution network abnormal state prediction model comprises: clustering historical abnormal state node parameter data of the distribution network, using an association rule algorithm to find the data samples that are strongly correlated with the distribution network abnormal state type, and forming a sample data set for training; and training the distribution network abnormal state prediction model on the sample data set.

As a further solution, the step of clustering the historical abnormal state node parameter data of the distribution network specifically comprises:
classifying the data samples based on Euclidean distance;
using a discriminant function to determine whether clustering of the data samples is complete; and
calculating the sum of squared errors (SSE), determining a reference number of clusters from the SSE-K graph, and determining the final number of clusters based on the reference number of clusters.

As a further solution, after the historical abnormal state node parameter data of the distribution network are clustered, the method further comprises:
self-encoding the clustered data samples to form a first sample data set, wherein the elements of the first sample data set are vectors and each vector consists of TV and F_i (where TV = {N0C_1, N0C_2, ..., N0C_a} is the self-encoded data, the value of C lies in the range [1, K], a is the number of the node at which data are collected, and F_i is the value of the abnormal state type),
wherein the self-encoded data contain at least the position of the node at which the data are located and the clustering category to which the data belong.

As a further solution, the step of using the association rule algorithm to find the data samples that are strongly correlated with the distribution network abnormal state type and forming the sample data set for training specifically comprises:
merging the two kinds of elements, TV and F_i, of each vector in the first sample data set into a single kind of element to form a new data set M: {Z_1, Z_2, ..., Z_i} (where Z_i denotes a new vector consisting of one piece of self-encoded data and the corresponding abnormal state type, and i is the number of new vectors); and
quantifying the relationship between two or more elements of the new vectors using the support and confidence of frequent itemsets, and extracting the non-zero vectors that meet the support and confidence requirements to form a second sample data set for training the distribution network abnormal state prediction model.

As a further solution, the step of training the distribution network abnormal state prediction model on the sample data set specifically comprises:
performing regression training with a stochastic gradient descent algorithm on the obtained sample data set to obtain the optimal parameter solution of the distribution network abnormal state prediction model.

As a further solution, the distribution network abnormal state prediction model is specifically:

$$f(CV_j) = w^{T}(CV_j) + b$$

where $w$ is the weight vector of the prediction model function, $b$ is the intercept of the prediction model function, $w^{T}(CV_j)$ is the scalar product of the vector $w$ and the vector $CV_j$, and the vector $CV_j$ is a training sample.

As a further solution, the method further comprises measuring the fit of the distribution network abnormal state prediction model by optimizing an objective function, the objective function being specifically:

$$E(w, b) = \frac{1}{n} \sum_{j=1}^{n} L\big(F_i, f(CV_j)\big) + \alpha R(w)$$

where $L(F_i, f(CV_j))$ is the loss function, $\alpha$ is a hyperparameter with a preset value, $R(w)$ is the regularization term, $n$ is the total size of the sample data set, and $F_i$ is the value of the abnormal state type paired with the training sample $CV_j$.
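To make the roles of w, b, α, and R(w) concrete, the following is a minimal Python sketch of evaluating such an objective. The source does not state which loss function or regularization term is used, so the squared loss and the L2 penalty below are assumptions chosen only for illustration, as are the function names `predict` and `objective`.

```python
def predict(w, b, cv):
    """Linear prediction: scalar product of w and the sample cv, plus intercept b."""
    return sum(wk * xk for wk, xk in zip(w, cv)) + b


def objective(w, b, samples, labels, alpha=0.0001):
    """Average loss over the data set plus alpha times a regularization term.

    Assumptions (not stated in the source): squared loss 0.5 * (F - f)^2
    and L2 regularization R(w) = 0.5 * ||w||^2.
    """
    n = len(samples)
    average_loss = sum(
        0.5 * (f_i - predict(w, b, cv)) ** 2 for cv, f_i in zip(samples, labels)
    ) / n
    regularization = 0.5 * sum(wk * wk for wk in w)
    return average_loss + alpha * regularization


# Illustrative values only: two samples with two features each.
value = objective([1.0, 0.0], 0.0, [[1.0, 2.0], [3.0, 4.0]], [1.0, 5.0], alpha=0.1)
```

A smaller objective value indicates a better fit on the sample data set, with α controlling how strongly large weights are penalized.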

Some other embodiments adopt the following technical solution.

A data-driven system for sensing abnormal states in an active distribution network, comprising:
a data acquisition module for acquiring node parameter data after a distribution network abnormal state occurs; and
an abnormal state prediction module that inputs the data into a trained distribution network abnormal state prediction model and outputs a distribution network abnormal state prediction result,
wherein the training process of the distribution network short-circuit abnormal state model comprises: clustering historical fault node parameter data of the distribution network, using an association rule algorithm to find the data samples that are strongly correlated with the distribution network short-circuit abnormal state, and forming a sample data set for training; and training the distribution network abnormal state prediction model on the sample data set.

Compared with the prior art, the present invention has the following beneficial effects.
(1) The abnormal state prediction of the present invention, for distribution networks containing distributed generation, uses a three-layer data mining structure. Data classification and association rules extract the data samples that are strongly correlated with the corresponding fault type, which improves the utilization efficiency of the data set. The three-layer structure also improves the prediction accuracy for abnormal states, shortens computation time, and reduces prediction error.
(2) The present invention proposes a method of simplifying the data format by self-encoding and assigns distinct abnormal state type labels. This effectively avoids the problem of unclear data relevance, allows the data strongly correlated with the abnormal state type to be extracted accurately, and strengthens the effectiveness of the data set.

FIG. 1 is a flowchart of the first-layer data mining process in an embodiment of the present invention.
FIG. 2 is a flowchart of the third-layer data mining process in an embodiment of the present invention.
FIG. 3 is a curve of the sum of squared errors (SSE) against K, showing the reference K value, in an embodiment of the present invention.

It should be noted that the following detailed description is exemplary and is intended to provide further explanation of the present application. Unless otherwise specified, all technical and scientific terms used herein have the same meaning as commonly understood by one of ordinary skill in the art to which this application belongs.

It should also be noted that the terminology used herein is only for describing specific embodiments and is not intended to limit the exemplary embodiments of the present application. As used herein, the singular forms also include the plural unless the context clearly indicates otherwise. It should further be understood that the terms "contain" and/or "include", when used in this specification, indicate the presence of features, steps, operations, devices, components, and/or combinations thereof.

Example 1

One or more embodiments disclose a data-driven method for detecting abnormal states in an active distribution network, comprising the following steps.
(1) Node parameter data are acquired after a distribution network abnormal state occurs. The node parameter data include, but are not limited to, node voltage data or node current data.
(2) The data are input into a trained distribution network abnormal state prediction model, which outputs a distribution network abnormal state prediction result.

The training process of the distribution network abnormal state prediction model comprises: clustering historical abnormal state node parameter data of the distribution network, using an association rule algorithm to find the data samples that are strongly correlated with the distribution network abnormal state type, and forming a sample data set for training; and training the distribution network abnormal state prediction model on the resulting sample data set.

The abnormal state prediction of this embodiment mainly targets four types of short-circuit fault in active distribution networks (single-phase-to-ground, two-phase phase-to-phase, two-phase-to-ground, and three-phase-to-ground short circuits), as well as load abnormalities and disconnection faults.

In this embodiment, a separate prediction model is built for each abnormal state. Taking the four short-circuit fault types, load abnormality, and disconnection fault as examples, a prediction model is built for each of them; the modeling method and approach are the same in every case. The overall prediction system may cover all of these abnormal states: data collected after an abnormal state occurs are input, the models compute their predictions, and a more reliable prediction result is finally obtained.

This embodiment uses a three-layer data mining method to train the distribution network abnormal state prediction model.

The first data mining layer classifies the raw abnormal state data with a K-means clustering algorithm and then simplifies the format of the raw abnormal state data with a self-encoding rule.

The second data mining layer operates on the classified abnormal state data from the first layer. It uses association rules to cull the data that have little influence on the abnormal state prediction result and extracts the data that are strongly correlated with the distribution network abnormal state type as the data set for regression training of the prediction model parameters.

A data sample that is strongly correlated with the distribution network abnormal state type is one that satisfies both the configured minimum support and the configured minimum confidence.

The third data mining layer uses a stochastic gradient descent algorithm, with the data obtained from the second layer as the training set, to obtain a prediction model for each abnormal state type.

Each of the three data mining layers is described in detail below.

1. First-layer data mining

First, node voltage data are collected after an abnormal state occurs, and these data are then classified by the K-means clustering algorithm. This embodiment proposes a self-encoding rule that encodes the classified data in order to simplify the data format. The K-means clustering algorithm and the self-encoding together constitute the first-layer data mining process, and the data it produces constitute the first sample data set.

Referring to FIG. 1, the K-means clustering algorithm involves three main aspects: the Euclidean distance is used to classify data samples; a discriminant function is used to determine whether sample clustering is complete; and the elbow method is used to determine a reference number of clusters K, from which the final K value is then determined.

Specifically, the K-means clustering algorithm decides which cluster a data sample belongs to based on the Euclidean distance between the sample point and the center sample point of each cluster. A data sample belongs to the cluster whose center sample point is at the smallest Euclidean distance from it. The Euclidean distance is calculated as

$$d(P, Q^j) = \sqrt{\sum_{i=1}^{n} \left(p_i - q_i^j\right)^2} \tag{1}$$

where $P = (p_1, p_2, \dots, p_n)$ is one data sample in $n$-dimensional space. For example, in a three-machine nine-node system, $n = 9$ and $P = (p_1, p_2, \dots, p_9)$ is the voltage data of the nine nodes at a given time. $Q^j = (q_1^j, q_2^j, \dots, q_n^j)$ is the center sample point of the $j$-th cluster; both $P$ and $Q^j$ are vectors. At the start of the iterative distance calculation, the center sample point of each cluster can be chosen arbitrarily.
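The assignment rule described above can be sketched in a few lines of Python. This is a minimal illustration rather than the patent's implementation: the function names and the three-node voltage values are hypothetical.

```python
import math


def euclidean_distance(p, q):
    """Euclidean distance between two vectors of equal dimension."""
    if len(p) != len(q):
        raise ValueError("vectors must have the same dimension")
    return math.sqrt(sum((pi - qi) ** 2 for pi, qi in zip(p, q)))


def nearest_center(sample, centers):
    """Return the index of the cluster center closest to the sample."""
    distances = [euclidean_distance(sample, center) for center in centers]
    return distances.index(min(distances))


# Hypothetical per-unit voltages at three nodes (illustrative values only).
centers = [[1.0, 1.0, 1.0], [0.5, 0.6, 0.4]]
sample = [0.55, 0.62, 0.45]
cluster = nearest_center(sample, centers)
```

Here the sample lies much closer to the second center, so it is assigned to the cluster with index 1.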

If classification is not yet complete, the center point of each cluster must be updated. The mean of all sample data in each cluster is taken as the new center point for the next iteration, and a discriminant function is used to decide whether the iterative update should stop, that is, whether classification is complete. The discriminant function is the sum, over all clusters, of the squared differences between each sample and its cluster center, which the iteration drives to a minimum. Writing $J$ for this function and $C_j$ for the set of samples in the $j$-th cluster, it is calculated as

$$J = \sum_{j=1}^{K} \sum_{P^j \in C_j} \sum_{i=1}^{n} \left(p_i^j - q_i^j\right)^2 \tag{2}$$

where $K$ is the number of clusters, $Q^j = (q_1^j, q_2^j, \dots, q_n^j)$ is the sample center of the $j$-th cluster, $P^j = (p_1^j, p_2^j, \dots, p_n^j)$ is any one sample in the $j$-th cluster, and $p_i^j$ and $q_i^j$ are each one element of the $n$-dimensional sample data.

When the discriminant function of equation (2) converges to its minimum, the sample center of each cluster no longer changes appreciably. The iteration then stops, the classification process is complete, and all sample points have been classified into K clusters.
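The iterate-until-converged loop described above might look like the following Python sketch. It is written under stated assumptions: the discriminant function is taken to be the total squared distance of samples to their cluster centers, and the tolerance `tol`, the iteration cap `max_iter`, and all function names are hypothetical.

```python
import math


def squared_distance(p, q):
    """Squared Euclidean distance between two vectors."""
    return sum((pi - qi) ** 2 for pi, qi in zip(p, q))


def k_means(samples, initial_centers, tol=1e-9, max_iter=100):
    """Cluster samples, stopping when the discriminant value stops decreasing."""
    centers = [list(c) for c in initial_centers]
    previous_value = math.inf
    labels, value = [], math.inf
    for _ in range(max_iter):
        # Assign every sample to its nearest center.
        labels = [
            min(range(len(centers)), key=lambda j: squared_distance(s, centers[j]))
            for s in samples
        ]
        # Discriminant value: total squared distance to the assigned centers.
        value = sum(squared_distance(s, centers[l]) for s, l in zip(samples, labels))
        if previous_value - value <= tol:
            break  # converged: the centers no longer change appreciably
        previous_value = value
        # Move each center to the mean of its members (empty clusters keep theirs).
        for j in range(len(centers)):
            members = [s for s, l in zip(samples, labels) if l == j]
            if members:
                centers[j] = [sum(col) / len(members) for col in zip(*members)]
    return centers, labels, value


samples = [[0.0], [0.2], [10.0], [10.2]]
centers, labels, value = k_means(samples, [[0.0], [10.0]])
```

If `max_iter` is exhausted before convergence, the returned labels predate the last center update, so a caller that needs strict consistency would run one more assignment pass.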

One drawback of the K-means clustering algorithm is that the number of clusters cannot be known in advance, which makes it hard for the clustering process to proceed smoothly and lowers clustering quality. The reference number of clusters K is therefore the key point of this clustering algorithm.

This embodiment uses the elbow method, which calculates the sum of squared errors (SSE), to obtain the reference K value. As K increases, the raw data are partitioned more and more finely, each cluster becomes more compact, and the SSE becomes smaller. Up to a certain value of K the SSE drops sharply. On the SSE-K graph this appears as the point where the slope of the SSE curve flattens abruptly, that is, where the slope changes fastest; as K increases further, the SSE changes only slowly. Because the resulting curve resembles a bent elbow, the method is called the elbow method. The K value at the elbow is the reference number of clusters.

The sum of squared errors is calculated as

$$SSE = \sum_{i=1}^{K} \sum_{p \in C_i} \left| p - m_i \right|^2 \tag{3}$$

In the iterative calculation of the SSE, K is a manually chosen value that is typically increased from 1 up to a suitable upper limit (for example 15, so that the trend of the curve can be observed clearly), and the reference K value is finally determined from the SSE-K curve. C_i denotes the i-th cluster, p denotes any one data sample in C_i, and m_i denotes the mean of all sample data in C_i, that is, its sample center.
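As a rough illustration of reading the elbow off the SSE-K curve, the Python sketch below computes the SSE of a given clustering and picks the K at which the curve bends most sharply. The source reads the elbow from a graph and does not give a numerical rule, so the "largest change in slope" heuristic, the function names, and the SSE values are all assumptions.

```python
def cluster_sse(clusters):
    """Sum of squared errors: squared distance of every sample to its cluster mean."""
    total = 0.0
    for members in clusters:
        center = [sum(col) / len(members) for col in zip(*members)]
        total += sum(
            sum((x - m) ** 2 for x, m in zip(p, center)) for p in members
        )
    return total


def elbow_k(sse_by_k):
    """Pick the K where the SSE decrease slows down the most (assumed heuristic)."""
    ks = sorted(sse_by_k)
    best_k, best_bend = None, float("-inf")
    for prev_k, k, next_k in zip(ks, ks[1:], ks[2:]):
        drop_before = sse_by_k[prev_k] - sse_by_k[k]
        drop_after = sse_by_k[k] - sse_by_k[next_k]
        bend = drop_before - drop_after
        if bend > best_bend:
            best_k, best_bend = k, bend
    return best_k


# Hypothetical SSE values for K = 1..5 (illustrative only).
sse_by_k = {1: 100.0, 2: 60.0, 3: 12.0, 4: 10.0, 5: 9.0}
reference_k = elbow_k(sse_by_k)
```

With these values the SSE falls steeply up to K = 3 and only slightly afterwards, so 3 is returned as the reference K; in practice the curve would still be inspected visually.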

Starting from the reference K value, the final K value is determined by taking into account the data sample size and the complexity of the data format. The final K value is greater than or equal to the reference K value.

After the K-means clustering algorithm has run, the data samples in each cluster share a certain similarity, but the data format is still complicated and unsuitable for further processing. This embodiment therefore uses self-encoding to simplify the classified data. To preserve the key information of the raw data, namely the node at which the data are located and the cluster to which the data belong, the data are self-encoded according to the following encoding rule:

N0C    (4)

Here N is the position of the node at which the data are located and C is the cluster to which the data belong. The 0 in the middle is only a separator and has no meaning of its own. For example, if a piece of data has the format 406 after self-encoding, this means that the voltage data of node 4 at a given time belong to cluster 6.
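The N0C rule can be illustrated with a small encoder and decoder in Python. This is a sketch under one explicit assumption: the node number is a single digit from 1 to 9 (as in a nine-node system), because a code such as 1006 would otherwise be ambiguous. The function names are hypothetical.

```python
def encode_n0c(node, cluster):
    """Encode a node number and cluster number as the string N0C."""
    if not 1 <= node <= 9:
        raise ValueError("this sketch assumes a single-digit node number (1-9)")
    if cluster < 1:
        raise ValueError("cluster numbers start at 1")
    return f"{node}0{cluster}"


def decode_n0c(code):
    """Split an N0C code back into (node, cluster)."""
    if len(code) < 3 or not code.isdigit() or code[1] != "0":
        raise ValueError(f"not a valid N0C code: {code!r}")
    return int(code[0]), int(code[2:])


code = encode_n0c(4, 6)
node, cluster = decode_n0c(code)
```

Encoding node 4 in cluster 6 gives "406", matching the example in the text, and decoding recovers the pair (4, 6).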

The self-encoding rule above applies only to node data. The data labels for the abnormal state types are set manually. In this embodiment, for example, the abnormal state type value F_i ∈ {110, 120, 130, 100, 140, 150} has the following meaning: F_i = 110 is a single-phase-to-ground fault; F_i = 120 is a two-phase phase-to-phase fault; F_i = 130 is a two-phase-to-ground fault; F_i = 100 is a three-phase short-circuit fault; F_i = 140 is a load abnormality; and F_i = 150 is a disconnection fault.

After the K-means clustering and self-encoding described above, the initial data set {NV, F_i} becomes the first sample data set. The elements of the first sample data set are vectors, each consisting of TV and F_i, where TV = {N0C_1, N0C_2, ..., N0C_a} is the self-encoded data, the value of C lies in the range [1, K], a is the number of the node at which data are collected (faulty nodes are not included), and F_i is the value of the abnormal state type.
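As an illustration of what one element of the first sample data set might look like, the Python sketch below pairs a list of N0C codes (TV) with an abnormal state type value (F_i). The label values come from the text; the dictionary name, the function name, and the input format (a mapping from node number to cluster number) are assumptions made for the example.

```python
# Abnormal state type values as listed in this embodiment.
ABNORMAL_STATE_TYPES = {
    110: "single-phase-to-ground fault",
    120: "two-phase phase-to-phase fault",
    130: "two-phase-to-ground fault",
    100: "three-phase short-circuit fault",
    140: "load abnormality",
    150: "disconnection fault",
}


def build_first_sample(cluster_by_node, state_type):
    """Return one (TV, F_i) pair: N0C codes ordered by node, plus the type value.

    cluster_by_node is a hypothetical input mapping node number -> cluster number.
    """
    if state_type not in ABNORMAL_STATE_TYPES:
        raise ValueError(f"unknown abnormal state type value: {state_type}")
    tv = [f"{node}0{cluster}" for node, cluster in sorted(cluster_by_node.items())]
    return tv, state_type


tv, f_i = build_first_sample({4: 6, 1: 2, 7: 3}, 110)
```

Each such pair keeps only the node position, the cluster membership, and the label, which is the simplified format the later layers work on.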
2. Second-layer data mining

In an active distribution network there is a definite relationship between the abnormal state type and the node voltages: when an abnormal state occurs, the node voltage waveforms change correspondingly. This embodiment uses an association rule algorithm in the second data mining layer to find the data samples that are strongly correlated with the corresponding abnormal state type. Training the abnormal state prediction model on these data improves the accuracy of the model.

The Apriori algorithm is an association rule algorithm that mines frequent itemsets. First, the two kinds of elements in the first sample data set, TV and F_i, are merged into a single kind of element; that is, each abnormal state type and its node voltage data are combined into one vector. This forms a new data set M: {Z_1, Z_2, ..., Z_i}, where Z_i denotes a new vector consisting of one piece of self-encoded data and the corresponding abnormal state type, and i is the number of new vectors.

The relationship between two or more elements of the new vectors is quantified using the support and confidence of frequent itemsets, and the non-zero vectors that meet the support and confidence requirements are extracted to form the second sample data set used to train the distribution network abnormal state prediction model.

An association rule over frequent itemsets is a relationship between two or more elements of a vector. This relationship is quantified by the support and confidence of the frequent itemset, which measure how strongly the elements are associated. The non-zero vectors that meet the support and confidence requirements are extracted to form the second sample data set.

Let $Z_x$ and $Z_y$ be two non-zero vectors in the data set M. Support and confidence are then calculated as follows.
Support is the probability that $Z_x$ and $Z_y$ appear together:

$$\mathrm{Support}(Z_x \Rightarrow Z_y) = P(Z_x, Z_y) \tag{5}$$

Confidence is the probability that $Z_y$ also appears when $Z_x$ appears:

$$\mathrm{Confidence}(Z_x \Rightarrow Z_y) = P(Z_y \mid Z_x) = \frac{P(Z_x, Z_y)}{P(Z_x)} \tag{6}$$
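Support and confidence can be computed by simple counting, as in the Python sketch below. It is an illustration under assumptions: each record is modeled as a set of items (N0C codes plus a label such as "F110"), which is one possible reading of the vectors in data set M, and the thresholds mentioned afterwards are not values given in the source.

```python
def support(records, items):
    """Fraction of records that contain every item in `items`."""
    wanted = set(items)
    return sum(1 for record in records if wanted <= set(record)) / len(records)


def confidence(records, antecedent, consequent):
    """Fraction of records containing the antecedent that also contain the consequent."""
    base = support(records, antecedent)
    if base == 0:
        return 0.0
    return support(records, set(antecedent) | set(consequent)) / base


# Hypothetical records: self-encoded node data together with a state type label.
records = [
    {"406", "F110"},
    {"406", "F110"},
    {"406", "F120"},
    {"102", "F110"},
]
rule_support = support(records, {"406", "F110"})
rule_confidence = confidence(records, {"406"}, {"F110"})
```

In this toy data set the rule 406 => F110 has support 0.5 and confidence 2/3; a record would be kept for the second sample data set only if both values reach the configured minimum thresholds.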

The non-zero vectors in data set M that meet the above support and confidence requirements are extracted. Each of these elements is then split, according to node data and abnormal state type, into the form [CV, F_i] to form the second sample data set, where CV denotes the extracted sample data and corresponds to the abnormal state type value F_i.

The raw data set, the first sample data set and the second sample data set differ as follows.

The raw data set and the first sample data set have the same dimension and number of samples; the first sample data set is generated from the raw data set by K-means clustering and self-encoding. The second sample data set is generated from the first sample data set by the association rule algorithm. It contains only data samples strongly correlated with the corresponding abnormal state type, so it has far fewer samples than the first sample data set.

3. Third layer data mining

After the first two layers of data mining, the raw data have been classified, and the data samples strongly correlated with the active distribution network abnormal state types have been mined to constitute the second sample data set.

As a specific example, the raw data set contained 40,000 sets of data, and the second sample data set after processing contained 25,000 to 30,000 sets, a substantial reduction in data volume.

Referring to FIG. 2, the third layer of data mining takes the second sample data set as the training set and uses the stochastic gradient descent (SGD) algorithm to obtain the optimal parameters of the abnormal state prediction model through regression training.

The stochastic gradient descent algorithm is an iterative optimization algorithm commonly used to solve model parameter optimization problems in machine learning. It is an improved form of the gradient descent algorithm and has been successfully applied to large-scale sparse machine learning problems such as text classification and natural language processing. The gradient is the vector formed by the partial derivatives of a multivariate function with respect to its parameters. When all partial derivatives in the gradient are zero, an optimal solution for the model parameters is obtained. In each iteration, the stochastic gradient descent algorithm uses only one randomly chosen sample, so when the total number of samples is large, only some of the samples are used in the iterative computation, which shortens the model training time.

This embodiment takes the second sample data set as the training set and assumes that the voltage data of each node are weighted linearly, which gives the following linear model function:

$$f(CV_j) = w^T CV_j + b$$

where w is the weight vector of the prediction model function, b is the intercept of the prediction model function, and w^T CV_j is the scalar product of the vector w and the vector CV_j.

The loss function measures the difference between the actual abnormal state type value F_i and the model prediction f(CV_j), and is denoted L(F_i, f(CV_j)). The present invention adopts the logistic regression loss function:

$$L(F_i, f(CV_j)) = \log\bigl(1 + \exp(-F_i \, f(CV_j))\bigr)$$

The risk function is the expected value of the loss function, denoted Er:

$$Er = \frac{1}{n} \sum_{j=1}^{n} L(F_i, f(CV_j))$$

The objective is therefore to minimize the risk function. However, because the historical data are numerous and the function is complex, this easily leads to overfitting of the prediction results. To avoid this, the present invention introduces a structural risk function, denoted Sr:

$$Sr = \alpha R(w)$$

where α is a preset hyperparameter, for example 0.1. Setting α narrows the range of the parameters, which simplifies the model and gives it good generalization ability. The regularization term R(w) measures complexity and constrains the parameters. This embodiment adopts L2 regularization, that is,

$$R(w) = \frac{1}{2} \sum_{k} w_k^2$$

In this embodiment, the fit of the prediction model is measured by the optimization objective function: the smaller the risk function and the structural risk function, the better the model fit. The final optimization objective function is

$$E(w, b) = \frac{1}{n} \sum_{j=1}^{n} L(F_i, f(CV_j)) + \alpha R(w)$$

where w is the weight vector of the prediction model function, b is the intercept of the prediction model function, n is the total number of samples in the second sample data set, and F_i is the abnormal state type value.

In each iteration, the stochastic gradient descent algorithm uses only part of the sample set, and the model parameters are updated as

$$w \leftarrow w - \eta \left[ \alpha \frac{\partial R(w)}{\partial w} + \frac{\partial L(F_i, f(CV_j))}{\partial w} \right]$$

where η is the time-dependent learning rate, calculated as

$$\eta = \frac{1}{\alpha (t_0 + t)}$$

where t is the time step and t_0 is the start time.
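The training loop described in this section can be sketched roughly as follows. This is a minimal illustration, not the patented implementation: it assumes a logistic loss with labels encoded as +1 and -1, an L2 penalty of one half the squared norm of w, a learning rate that decays as 1/(α(t_0 + t)), and an unregularized intercept update. The value t_0 = 100, the function names and the six toy samples are all hypothetical.

```python
import math
import random
from typing import List, Sequence, Tuple


def learning_rate(t: int, alpha: float, t0: float) -> float:
    """Decaying step size: 1 / (alpha * (t0 + t))."""
    return 1.0 / (alpha * (t0 + t))


def predict(w: Sequence[float], b: float, x: Sequence[float]) -> float:
    """Linear model f(x) = w . x + b."""
    return sum(wk * xk for wk, xk in zip(w, x)) + b


def sgd_logistic(samples: List[Sequence[float]], labels: List[int],
                 alpha: float = 0.1, n_iter: int = 500,
                 t0: float = 100.0, seed: int = 0) -> Tuple[List[float], float]:
    """One randomly chosen sample per iteration; labels must be +1 or -1."""
    rng = random.Random(seed)
    w = [0.0] * len(samples[0])
    b = 0.0
    for t in range(n_iter):
        j = rng.randrange(len(samples))
        x, y = samples[j], labels[j]
        eta = learning_rate(t, alpha, t0)
        # Derivative of log(1 + exp(-y f)) with respect to f.
        g = -y / (1.0 + math.exp(y * predict(w, b, x)))
        # Gradient of the L2 term (alpha * w) plus gradient of the loss.
        w = [wk - eta * (alpha * wk + g * xk) for wk, xk in zip(w, x)]
        b -= eta * g  # intercept is left unregularized (assumption)
    return w, b


# Hypothetical, already-scaled toy samples; not data from the patent.
X = [[1.0, 0.2], [0.8, 0.1], [0.9, -0.1],
     [-1.0, 0.1], [-0.7, -0.2], [-0.9, 0.0]]
F = [1, 1, 1, -1, -1, -1]

w, b = sgd_logistic(X, F, alpha=0.1, n_iter=500)
signs = [1 if predict(w, b, x) > 0 else -1 for x in X]
```

The values α = 0.1 and 500 iterations mirror the settings reported for the third round of data mining in the embodiment; everything else is chosen only to keep the example small.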

In this embodiment, the three-layer data mining structure processes the raw data successively by the clustering algorithm, association rules and stochastic gradient descent to train the prediction model for active distribution network abnormal states. This addresses the low accuracy and long duration of abnormal state classification and prediction, and helps countermeasures to be taken in time so as to shorten the duration of abnormal states and reduce outage losses.

The sensing and prediction methods for other abnormal states are similar to the above steps.

To verify the effect of the method of this embodiment, the IEEE three-machine nine-node system is taken as an example. On this model, four common types of short-circuit fault, a load abnormality and a disconnection fault are simulated, and node voltage data are collected. For the short-circuit faults, the fault node is set to node 8; the load abnormality is load shedding at node 8; the disconnection fault is the opening of line 7-8. Voltage data are collected from all nodes other than node 8.

(1) First layer data mining

In the K-means clustering algorithm, a suitable K value is first identified as the number of clusters by the elbow method. The short-circuit voltage amplitude data of node 2 are extracted for the four types of short-circuit fault, and the resulting curve of the sum of squared errors (SSE) against K is shown in FIG. 3. As FIG. 3 shows, the slope falls fastest at K = 5 and the SSE changes only slowly for K > 5, so the reference K value at the elbow is 5, and the final K value may be 5 or more (with a final K below 5, the classification is not fine enough and the data are not aggregated tightly enough). This narrows the range of K values and helps a more suitable K to be found quickly.
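The SSE-versus-K curve used by the elbow method can be produced along the following lines. This is a sketch under simplifying assumptions: the data are one-dimensional, the K-means initialization is a deterministic quantile-based choice made for reproducibility, and the nine voltage values are invented so that three groups are obvious. The function name and the data are not from the patent.

```python
from typing import List, Sequence


def kmeans_1d_sse(values: Sequence[float], k: int, n_iter: int = 50) -> float:
    """Run plain Lloyd iterations on 1-D data and return the final SSE."""
    data = sorted(values)
    n = len(data)
    # Deterministic start: centers at evenly spaced quantiles (assumption).
    centers: List[float] = [data[(2 * i + 1) * n // (2 * k)] for i in range(k)]
    for _ in range(n_iter):
        groups: List[List[float]] = [[] for _ in range(k)]
        for v in data:
            nearest = min(range(k), key=lambda i: (v - centers[i]) ** 2)
            groups[nearest].append(v)
        updated = [sum(g) / len(g) if g else centers[i]
                   for i, g in enumerate(groups)]
        if updated == centers:
            break
        centers = updated
    # Sum of squared distances from each point to its nearest center.
    return sum(min((v - c) ** 2 for c in centers) for v in data)


# Hypothetical voltage amplitudes with three visible groups.
voltages = [-10.2, -10.0, -9.8, -0.1, 0.0, 0.1, 9.9, 10.0, 10.1]

sse = {k: kmeans_1d_sse(voltages, k) for k in range(1, 6)}
for k in sorted(sse):
    print(f"K={k}: SSE={sse[k]:.3f}")
```

On this toy data the SSE falls steeply up to K = 3 and barely changes afterwards, which is the shape the elbow method looks for; in the embodiment the corresponding reference value read from FIG. 3 is K = 5.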

When the final K value is 5 or more, the similarity features among the data are not lost, but too large a value again produces too many clusters. Taking into account the data sample size of this embodiment, K was chosen so that, in the subsequent self-encoding step, the data formats neither repeat too often (as happens with too few clusters) nor become unwieldy (as happens with too many), allowing the second layer of data mining to proceed smoothly. In this experiment, starting from K = 5, successively larger K values were tried and K = 8 was finally chosen. All data samples are thus divided into 8 classes, and the data within each cluster are more similar to one another.

After the data are classified, the data format is simplified according to the self-encoding rule.

Take the voltage data of node 2 after a three-phase short-circuit fault as an example. The voltage lies in the interval [-16.01 kV, 15.85 kV]. Divided into 8 classes, the classes are [-16.01 kV, -12.03 kV], [-12.03 kV, -8.04 kV], [-8.04 kV, -4.06 kV], [-4.06 kV, -0.08 kV], [-0.08 kV, 3.90 kV], [3.90 kV, 7.88 kV], [7.88 kV, 11.87 kV] and [11.87 kV, 15.85 kV]. When the voltage at some instant is 5.36 kV, its data format under the self-encoding rule is 206.

Take the voltage data of node 4 after a load abnormality as an example. The voltage lies in the interval [-196.582 kV, 206.754 kV]. Divided into 8 classes, the classes are [-196.582 kV, -146.164 kV], [-146.164 kV, -95.747 kV], [-95.747 kV, -45.330 kV], [-45.330 kV, 5.086 kV], [5.086 kV, 55.503 kV], [55.503 kV, 105.920 kV], [105.920 kV, 156.337 kV] and [156.337 kV, 206.754 kV]. When the voltage at some instant is 20.048 kV, its data format under the self-encoding rule is 405.

Take the voltage data of node 1 after a disconnection fault as an example. The voltage lies in the interval [-14.454 kV, 14.468 kV]. Divided into 8 classes, the classes are [-14.454 kV, -10.839 kV], [-10.839 kV, -7.224 kV], [-7.224 kV, -3.608 kV], [-3.608 kV, 0.006 kV], [0.006 kV, 3.621 kV], [3.621 kV, 7.237 kV], [7.237 kV, 10.852 kV] and [10.852 kV, 14.468 kV]. When the voltage at some instant is 9.581 kV, its data format under the self-encoding rule is 107.

The voltage data of the remaining nodes are processed in the same way, and the final results are shown in Table 1.
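The three worked examples above can be reproduced with a short routine. The sketch assumes that the eight classes of a node are equal-width intervals between the minimum and maximum voltage, which is what the listed interval boundaries suggest, and that the code is the node number, a literal 0, and the class number. The function name `self_encode` is hypothetical.

```python
def self_encode(node: int, voltage: float, v_min: float, v_max: float,
                k: int = 8) -> str:
    """Return the N0C code: node number, '0', then class number in 1..k."""
    width = (v_max - v_min) / k
    cls = int((voltage - v_min) // width) + 1
    cls = min(max(cls, 1), k)  # keep boundary values inside 1..k
    return f"{node}0{cls}"


# The three examples from the text above.
code_node2 = self_encode(2, 5.36, -16.01, 15.85)       # three-phase short circuit
code_node4 = self_encode(4, 20.048, -196.582, 206.754)  # load abnormality
code_node1 = self_encode(1, 9.581, -14.454, 14.468)     # disconnection fault

print(code_node2, code_node4, code_node1)
```

If the class boundaries came from K-means cluster assignments rather than equal-width intervals, the lookup would use the stored boundaries instead of the computed width.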

Table 1: Part of the self-encoded data in Database I

(2) Second layer data mining

After the first round of data mining, the data in Database I are input into the Apriori algorithm for strong-correlation data mining. The minimum support is preset to 0.2 and the minimum confidence to 0.8. Data samples in Database I that satisfy the minimum support are taken as frequent itemsets, and data in the frequent itemsets that satisfy the minimum confidence are taken as strongly correlated data. The results of the second round of data mining are shown in Table 2.

Table 2: Results of the second round of data mining

Taking association rule No. 8, {205, 407, 603} → {130}, as an example: when the voltage of node 2 is in the fifth cluster, the voltage of node 4 is in the seventh cluster and the voltage of node 6 is in the third cluster, the abnormal state type is very likely a two-phase-to-ground short circuit. Taking association rule No. 11, {508} → {140}, as an example: when the voltage of node 5 is in the eighth cluster, the abnormal state type is very likely a load abnormality. Taking association rule No. 14, {105, 405, 603} → {150}, as an example: when the voltage of node 1 is in the fifth cluster, the voltage of node 4 is in the fifth cluster and the voltage of node 6 is in the third cluster, the abnormal state type is very likely a disconnection fault. Database II, obtained by the association rule algorithm, is shown in Table 3.
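Reading such a rule back into words is mechanical, as the sketch below shows. It assumes three-digit item codes of the form node, 0, cluster, and it maps only the three abnormal-state codes named in this paragraph; the function names and the dictionary are illustrative rather than part of the patent.

```python
from typing import Iterable, Tuple

# Only the type codes explained in the text above are listed.
FAULT_TYPES = {
    130: "two-phase-to-ground short circuit",
    140: "load abnormality",
    150: "disconnection fault",
}


def decode_item(code: int) -> Tuple[int, int]:
    """Split a code such as 205 into (node 2, cluster 5)."""
    text = str(code)
    if len(text) != 3 or text[1] != "0":
        raise ValueError(f"unexpected item code: {code}")
    return int(text[0]), int(text[2])


def describe_rule(antecedent: Iterable[int], consequent: int) -> str:
    """Render a rule as readable text."""
    parts = [f"node {n} in cluster {c}"
             for n, c in (decode_item(item) for item in antecedent)]
    return " and ".join(parts) + " -> " + FAULT_TYPES[consequent]


rule_8 = describe_rule([205, 407, 603], 130)
rule_11 = describe_rule([508], 140)
print(rule_8)
print(rule_11)
```

The single-digit layout is enough for the nine-node example; a system with ten or more nodes or clusters would need an unambiguous separator.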

Table 3: Part of the data in Database II

(3) Third round of data mining

In the third round of data mining, the hyperparameter α is set to 0.1 and the number of iterations to 500. The Database II data samples are input into the stochastic gradient descent algorithm, and the trained model parameters are shown in Table 4.

Table 4: Model parameter data

(4) Result analysis

To measure the accuracy of the above classification and prediction models, 10,000 data samples are used as the test set. The accuracy of abnormal state classification and prediction is high: 85.30% for the three-phase short-circuit fault prediction model, 74.80% for the single-phase-to-ground short-circuit fault model, 78.20% for the two-phase (phase-to-phase) short-circuit model, 87% for the two-phase-to-ground short-circuit prediction model, 93.29% for the load abnormality prediction model, and 89.25% for the disconnection fault prediction model.

In the three-layer data mining of this method, clustering and self-encoding first simplify the format of all data samples, so that the data of any node take only K fixed formats. This effectively removes the difference between normal data and any abnormal data or missing points that may arise (for example, sudden amplitude changes or data dropouts), while fully mining the similarity among the data. For example, during a load abnormality, 15 abnormal data samples appeared in the voltage data of node 4. After clustering and self-encoding, these abnormal data points all belong to the eighth cluster, that is, their data format is 408 in every case. If the eighth cluster contains a large amount of data, self-encoding assimilates these abnormal points into the normal data, reducing the influence of abnormal data and missing points. If the eighth cluster contains only these 15 samples, the strong-correlation extraction of the second layer can filter them out.

The second layer of data mining extracts data samples strongly correlated with the abnormal states according to the two quantitative criteria of support and confidence, which improves the fault tolerance of model training and increases its accuracy.

Compared with conventional methods, this method does not work on the raw data directly but processes the data first. This improves fault tolerance, reduces the number of data samples and shortens training: model training takes about 350 seconds, and prediction takes less than 20 seconds.

The abnormal state detection method of this embodiment is compared with a conventional method that applies a support vector machine directly to abnormal state sensing. Tables 5, 6 and 7 give the comparison results for prediction accuracy, prediction time and fault tolerance, respectively.

Table 5: Comparison of prediction accuracy

Table 6: Comparison of prediction time

Table 7: Comparison of fault tolerance (quantified by prediction accuracy)

The above comparison shows that the method of this embodiment is clearly superior to the conventional support vector machine method in prediction time, prediction accuracy and fault tolerance.

Example 2

One or more embodiments disclose a data-driven active distribution network abnormal state detection system, including:
a data acquisition module for acquiring node parameter data after a distribution network abnormal state occurs; and
a fault prediction module that inputs the data into a trained distribution network abnormal state prediction model and outputs a distribution network short-circuit abnormal state prediction result;
wherein the training process of the distribution network abnormal state prediction model includes a step of clustering historical distribution network abnormal state node parameter data and finding, by an association rule algorithm, data samples strongly correlated with the distribution network abnormal state type to form a sample data set for training, and a step of training the distribution network abnormal state prediction model based on the sample data set.

Note that the specific implementation of the above modules has already been described in detail in Example 1 and is not repeated here.

Although specific embodiments of the present invention have been described above with reference to the drawings, they do not limit the scope of protection of the present invention. Those skilled in the art should understand that various modifications or variations that can be made on the basis of the technical solution of the present invention without creative effort still fall within the scope of protection of the present invention.

Claims

A data-driven active distribution network abnormal state detection method performed by an active distribution network abnormal state sensing system, wherein a data acquisition module acquires, as node parameter data, node voltage data of a plurality of nodes after a distribution network abnormal state occurs, and an abnormal state prediction module inputs the data into a trained distribution network abnormal state prediction model and outputs a distribution network abnormal state prediction result;
the training of the distribution network abnormal state prediction model by the active distribution network abnormal state sensing system comprises:
a step of looking up the node voltage data of the plurality of nodes, acquired after historical distribution network abnormal states occurred, in a plurality of voltage intervals predetermined for each node for classifying the node voltage data, thereby obtaining for each node the number C of the class to which the acquired node voltage data corresponds and so clustering the node voltage data of each node; self-encoding the clustered data samples; and forming a first sample data set composed of N0C, in which the node number N and the class number C of the node voltage data are arranged for each node, and a value F_i indicating the currently occurring abnormal state type among a plurality of predetermined abnormal state types, wherein the elements of the first sample data set are vectors, each vector consists of TV and F_i, TV = {N0C_1, N0C_2, ..., N0C_a}, N represents the position of each node from which data was collected, the value range of C is [1, K], K is the number of clusters, a is the number of nodes from which data was collected, and F_i is a value indicating the currently occurring abnormal state type among the plurality of predetermined abnormal state types;
a step of finding, by an association rule algorithm, data samples strongly correlated with the distribution network abnormal state type to form a sample data set for training, specifically: merging the two elements TV and F_i of each vector in the first sample data set into one element to form a new data set M = {Z_1, Z_2, ..., Z_i}, where i is the number of new vectors; quantifying the connections between two or more elements of the new vectors using the support and confidence of frequent itemsets; and extracting the non-zero vectors that satisfy the support and confidence requirements to form a second sample data set, which is the training sample data set;
a step of training the distribution network abnormal state prediction model based on the training sample data set,
wherein, specifically, the distribution network abnormal state prediction model is

$$f(CV_j) = w^T CV_j + b$$

where w is the weight vector of the prediction model function, b is the intercept of the prediction model function, w^T CV_j is the scalar product of the vector w and the vector CV_j, and the vector CV_j is a training sample;
a step of measuring the fit of the distribution network abnormal state prediction model by an optimization objective function, wherein the optimization objective function is specifically

$$E(w, b) = \frac{1}{n} \sum_{j=1}^{n} L(F_i, f(CV_j)) + \alpha R(w)$$

where L(F_i, f(CV_j)) is the loss function, α is a hyperparameter with a set value, R(w) is the regularization term, n is the total number of samples in the sample data set, and F_i is the abnormal state type value; and
a step of performing regression training with a stochastic gradient descent algorithm based on the obtained sample data set to obtain the optimal parameter solution of the distribution network abnormal state prediction model, wherein the stochastic gradient descent algorithm uses a partial test set in each iteration, and in each iteration the model parameters are updated as

$$w \leftarrow w - \eta \left[ \alpha \frac{\partial R(w)}{\partial w} + \frac{\partial L(F_i, f(CV_j))}{\partial w} \right]$$

where η is the time-dependent learning rate, calculated as

$$\eta = \frac{1}{\alpha (t_0 + t)}$$

where t is the time step and t_0 is the start time.

The data-driven active distribution network abnormal state detection method according to claim 1, wherein the step of clustering the historical distribution network abnormal state node parameter data specifically includes:
classifying the data samples based on Euclidean distance;
determining by a discriminant function whether the clustering of the data samples is complete; and
calculating the sum of squared errors, identifying a reference number of clusters from the SSE-K curve, and identifying the final number of clusters based on the reference number of clusters.

A data-driven active distribution network abnormal state sensing system, comprising:
a data acquisition module for acquiring node voltage data as node parameter data after a distribution network abnormal state occurs; and
an abnormal state prediction module that inputs the data into a trained distribution network abnormal state prediction model and outputs a distribution network abnormal state prediction result;
wherein the training of the distribution network abnormal state prediction model includes:
a step of looking up the node voltage data, acquired after historical distribution network abnormal states occurred, in a plurality of voltage intervals predetermined for each node for classifying the node voltage data, thereby obtaining for each node the number C of the class to which the acquired node voltage data corresponds and so clustering the node voltage data of each node; self-encoding the clustered data samples; and forming a first sample data set composed of N0C, in which the node number N and the class number C of the node voltage data are arranged for each node, and a value F_i indicating the currently occurring abnormal state type among a plurality of predetermined abnormal state types, wherein the elements of the first sample data set are vectors, each vector consists of TV and F_i, TV = {N0C_1, N0C_2, ..., N0C_a}, N represents the position of each node from which data was collected, the value range of C is [1, K], K is the number of clusters, a is the number of nodes from which data was collected, and F_i is a value indicating the currently occurring abnormal state type among the plurality of predetermined abnormal state types;
a step of finding, by an association rule algorithm, data samples strongly correlated with the distribution network abnormal state type to form a sample data set for training, specifically: merging the two elements TV and F_i of each vector in the first sample data set into one element to form a new data set M = {Z_1, Z_2, ..., Z_i}, where i is the number of new vectors; quantifying the connections between two or more elements of the new vectors using the support and confidence of frequent itemsets; and extracting the non-zero vectors that satisfy the support and confidence requirements to form a second sample data set, which is the training sample data set;
a step of training the distribution network abnormal state prediction model based on the training sample data set,
wherein, specifically, the distribution network abnormal state prediction model is

$$f(CV_j) = w^T CV_j + b$$

where η is the time-dependent learning rate, calculated as

$$\eta = \frac{1}{\alpha (t_0 + t)}$$

where t is the time step and t_0 is the start time.