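WO2023273249A1 - TSVM model-based anomaly detection method for an automated smart electric energy meter verification system - Google Patents

TSVM model-based anomaly detection method for an automated smart electric energy meter verification system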

Info

Publication number
WO2023273249A1
Authority
WO
WIPO (PCT)
Prior art keywords
model
tsvm
samples
data
verification system
Prior art date
Application number
PCT/CN2021/141547
Other languages
English (en)
French (fr)
Inventor
庄葛巍
顾臻
贺青
周磊
张静月
冯秀庆
苏鹏涛
潘晔
Original Assignee
国网上海市电力公司
上海欣能信息科技发展有限公司
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 国网上海市电力公司, 上海欣能信息科技发展有限公司
Priority to AU2021335237A (AU2021335237A1)
Publication of WO2023273249A1

Classifications

    • G PHYSICS
    • G01 MEASURING; TESTING
    • G01R MEASURING ELECTRIC VARIABLES; MEASURING MAGNETIC VARIABLES
    • G01R35/00 Testing or calibrating of apparatus covered by the other groups of this subclass
    • G01R35/04 Testing or calibrating of apparatus covered by the other groups of this subclass of instruments for measuring time integral of power or current
    • G PHYSICS
    • G01 MEASURING; TESTING
    • G01R MEASURING ELECTRIC VARIABLES; MEASURING MAGNETIC VARIABLES
    • G01R1/00 Details of instruments or arrangements of the types included in groups G01R5/00 - G01R13/00 and G01R31/00
    • G01R1/02 General constructional details
    • G01R1/04 Housings; Supporting members; Arrangements of terminals

Definitions

  • the invention relates to an anomaly detection method for an automated verification system for smart electric energy meters, and in particular to such a method based on a Transductive Support Vector Machine (TSVM) model.
  • TSVM Transductive Support Vector Machine
  • at present, the metrology center periodically shuts down the assembly line of the automated verification system and carries out manual inspections to ensure that each verification unit is in a healthy operating state. Until the next manual inspection, however, the system keeps serving test projects, which creates a risk of large-scale deviation in test results.
  • shortening the interval between manual inspections reduces the likelihood of this situation to some extent, but it greatly reduces the verification efficiency of the assembly line and increases labor and operation and maintenance costs.
  • therefore, online evaluation of the mechanical performance of the connection stage at each verification position of the automated verification system is of great significance for improving the reliability of the system.
  • the object of the present invention is to provide a TSVM model-based abnormality detection method for an automatic verification system of a smart electric energy meter in order to overcome the above-mentioned defects in the prior art.
  • a TSVM model-based anomaly detection method for an automated smart electric energy meter verification system, comprising the following steps:
  • S1 Perform feature extraction on the error test data of the verification positions under test, which contain a small amount of abnormal data, construct feature vectors, and preprocess them to form data samples;
  • S2 Manually label some of the samples;
  • S3 Train with labeled samples and unlabeled samples in a semi-supervised manner to obtain a TSVM-based anomaly detection model;
  • S4 Use the TSVM-based anomaly detection model to dynamically predict the abnormal state of the verification positions.
  • the feature vector in step S1 is constructed as follows: obtain the historical error test data of each verification position under the different verification test items, extract feature values from the historical error test data of each test item separately, and combine the feature values of all test items into the feature vector of that verification position.
  • the feature values include the maximum, minimum, expectation, variance, skewness and kurtosis of the historical error test data.
  • the preprocessing in step S1 includes standardization and dimensionality reduction of the feature vector of each verification position.
  • the standardization method is z = (x - u) / S, where:
  • x is a feature value in the feature vector to be processed
  • u is the expectation of the feature values in the feature vector to be processed
  • S is the standard deviation of the feature values in the feature vector to be processed
  • z is the standardized feature value.
  • the dimension reduction process includes principal component analysis.
  • step S2 is specifically:
  • based on the data samples, an unsupervised anomaly detection algorithm is used to preliminarily screen out "abnormal positions"
  • the preliminarily screened "abnormal positions" are manually inspected and labeled: normal and abnormal positions are determined from the inspection results, and the data samples of the inspected verification positions are labeled to form the labeled samples.
  • the unsupervised anomaly detection algorithm includes the isolation forest algorithm, the local outlier factor algorithm and the one-class support vector machine algorithm.
  • the number of labeled samples is smaller than the number of unlabeled samples when performing model training in step S3.
  • the method also includes optimizing the TSVM-based anomaly detection model, specifically: use the model to predict the abnormal data in the samples to be detected, inspect and label them manually, and build a labeled sample library from all manually labeled samples; select from it the data points close to the classification boundary as new labeled samples, and retrain the model with them and the unlabeled samples in a semi-supervised manner to complete the optimization; use the optimized model to predict the data points in the labeled sample library and compute the ratio of labeled samples whose predicted state differs from the true state.
  • when this ratio is below a manually set threshold, the model is deemed to meet the prediction accuracy requirement and can be used directly to predict the data set to be detected.
  • compared with the prior art, the present invention has the following advantages:
  • the present invention uses a small number of labeled samples and a large number of unlabeled samples to construct a TSVM-based anomaly detection model in a semi-supervised manner, which effectively reduces the cost of manual inspection compared with other methods;
  • based on the historical error test data produced at the same verification position, the present invention takes the maximum and minimum of the data of each verification test item and computes its expectation, variance, skewness and kurtosis. These describe the average level, dispersion, asymmetry and proportion of extreme outliers of the data distribution at that position, so that an abnormal position state is converted into an anomaly of the data distribution. This makes it possible to analyze position states through data and to evaluate abnormal position states online, which reduces the impact on the assembly line and improves the efficiency of the verification work;
  • PCA principal component analysis
  • the present invention can continuously acquire new labeled samples and unlabeled samples during the working process and continue to expand and optimize the TSVM-based anomaly detection model according to the semi-supervised training method, continuously improving the accuracy of the model.
  • Fig. 1 is a flow chart of the TSVM model-based anomaly detection method for an automated smart electric energy meter verification system according to the present invention;
  • Fig. 2 shows the proportion of sample feature information retained at different dimensions in the embodiment of the present invention;
  • Fig. 3 is a schematic flow chart of anomaly detection for the automated smart electric energy meter verification system when the present invention is applied in practice.
  • the present embodiment provides a method for detecting anomalies in the automatic verification system of smart electric energy meters based on the TSVM model, and the method includes the following steps:
  • S1 Perform feature extraction on the error test data of the verification positions under test, which contain a small amount of abnormal data, construct feature vectors, and preprocess them to form data samples.
  • suppose one assembly line of the automated smart electric energy meter verification system contains 30 verification units, and the test data set of each verification unit contains 60 verification position samples.
  • in each verification task, smart electric energy meters from the same batch are randomly assigned to different positions and undergo several different error tests.
  • the resulting error test data reflect not only quality problems of the smart electric energy meters themselves but also, indirectly, problems of the verification device itself.
  • to extract distribution features from the massive error test data, statistics are computed on the data generated at the same verification position: for each test item, the maximum and minimum are taken and the expectation, variance, skewness and kurtosis are computed. These describe the average level, dispersion, asymmetry and proportion of extreme outliers of the data distribution at that position, so that an abnormal position state is converted into an anomaly of the data distribution.
  • the feature vector in step S1 is therefore constructed as follows: obtain the historical error test data of each verification position under the different verification test items, extract feature values from the historical error test data of each test item separately, and combine the feature values of all test items into the feature vector of that verification position. The feature values include the maximum, minimum, expectation, variance, skewness and kurtosis of the historical error test data.
  • one assembly line of the verification system contains 30 verification units, and the test data set of each verification unit contains 60 verification position samples, i.e. {X1, X2, ..., X60}. For each position, the maximum, minimum, expectation, variance, skewness and kurtosis of the data of each error test item are computed to construct the feature vector of that position sample. With m error test items, each sample contains 6m feature values, i.e. 6m dimensions.
  • x is a feature value in the feature vector to be processed
  • u is the expectation of the feature values in the feature vector to be processed
  • S is the standard deviation of the feature values in the feature vector to be processed
  • z is the standardized feature value. Standardization gives every feature of the samples a mean of 0 and a variance of 1.
  • each verification position sample has as many as 60 dimensions.
  • in this case the data samples are sparse and distances are hard to compute, which makes anomaly detection more difficult. The feature vectors therefore need dimensionality reduction, and Principal Component Analysis (PCA) is the most commonly used dimensionality reduction method, specifically:
  • Xi denotes the individual samples, with i an integer from 1 to 60;
  • the dimension d' after dimensionality reduction is specified by the user.
  • different dimensions retain different proportions of the feature information of the data.
  • the user can determine the value of d' by setting the proportion of feature information to be retained.
  • Figure 2 shows the proportion of feature information retained by the data samples of the automated smart electric energy meter verification system at different values of d'. For the standardized sample data, retaining close to 99.9% of the feature information requires at least 40 dimensions, so the effective data dimension used for anomaly detection is 40.
  • based on the data samples, an unsupervised anomaly detection algorithm is used to preliminarily screen out "abnormal positions"
  • the unsupervised anomaly detection algorithms include Isolation Forest (Iforest), Local Outlier Factor (LOF) and One-Class Support Vector Machine (OCSVM).
  • Iforest Isolation Forest
  • LOF Local Outlier Factor
  • OCSVM One-Class Support Vector Machine
  • the Iforest algorithm performs well at global anomaly detection and is suitable for continuous, relatively high-dimensional data.
  • the Iforest algorithm repeatedly partitions the data in the manner of a binary tree. Each time, a feature of the data set is chosen at random and a random value of that feature is used as the split point. The splitting is repeated until the samples are isolated, forming one isolation tree of the forest.
  • sample points that are isolated at a shallow depth, i.e. close to the root of the tree, are more likely to be judged abnormal.
  • the LOF algorithm is not as good as Iforest at detecting global outliers, but it is better at detecting local anomalies in data sets whose distribution is relatively concentrated and whose proportion of anomalies is small.
  • the LOF algorithm is a density-based outlier detection method. It determines the local reachability density of a sample point from its k-nearest neighborhood (not the whole data set) and judges whether the sample is an outlier by comparing its local reachability density with that of its neighbors; the lower the density of a sample point, the more likely it is an outlier.
  • OCSVM is a modified type of support vector machine, suitable for novelty detection and imbalanced-sample scenarios, and performs well on high-dimensional, large-sample data.
  • the training samples of the OCSVM model belong to a single class.
  • by building a model that represents this class, the shape of the data distribution is obtained, so that during detection the model can judge whether a sample to be predicted belongs to the same class as the training samples.
  • the principle for selecting samples to label is to minimize the labeling cost: the samples most likely to be abnormal data points are selected for labeling. While position faults are being eliminated, this also helps to discover new anomaly types quickly.
  • to choose an unsupervised anomaly detection algorithm suited to the data of the automated smart electric energy meter verification system, the accuracy of the three algorithms is tested on the high-dimensional Letter anomaly data set from a machine learning library, whose dimension and degree of anomaly are similar to those of the verification system data after PCA dimensionality reduction. The Letter data set has 32 dimensions and 1600 samples, of which 100 are abnormal.
  • the parameters of the algorithms are optimized by cross-validation. The experimental results are shown in Table 1:
  • S3 Use labeled samples and unlabeled samples to train in a semi-supervised manner to obtain a TSVM-based anomaly detection model, and the number of labeled samples is smaller than the number of unlabeled samples during model training.
  • TSVM is a representative of the semi-supervised support vector machine model. Like the standard binary classifier SVM, TSVM is an algorithm for solving binary classification problems. The algorithm will try all combinations of unlabeled samples as normal data points or abnormal data points, trying to find a hyperplane that maximizes the separation between all samples including labeled samples and unlabeled samples.
  • TSVM finds the approximate solution of the above formula through multiple iterations.
  • the method also includes optimizing the TSVM-based anomaly detection model, specifically: use the model to predict the abnormal data in the samples to be detected, inspect and label them manually, and build a labeled sample library from all manually labeled samples; select from it the data points close to the classification boundary as new labeled samples, and retrain the model with them and the unlabeled samples in a semi-supervised manner to complete the optimization; use the optimized model to predict the data points in the labeled sample library and compute the ratio of labeled samples whose predicted state differs from the true state.
  • when this ratio is below a manually set threshold, the model is deemed to meet the prediction accuracy requirement and can be used directly to predict the data set to be detected.
  • Step 1 Data feature extraction and dimension reduction processing.
  • each verification unit contains 60 verification position samples. The feature vector of each verification position is constructed from the data of the ten error test items produced at that position.
  • the feature vector contains 60 feature values. Taking verification position No. 1 of verification unit No. 1 as an example, its feature values are shown in Table 2:
  • Step 2 Screen out "abnormal positions" with an unsupervised anomaly detection algorithm and hand them over for manual inspection, obtaining labeled samples while troubleshooting;
  • the position samples of the same verification unit are used as the data set under test. The LOF anomaly detection algorithm computes the anomaly factor of each position in the unit from its feature data (indicating how abnormal each sample is), and the box plot method is then applied to the anomaly factors of the 60 position samples of the unit to screen out the samples most likely to be abnormal data points, which are then checked manually.
  • the box plot method was applied to the above anomaly factor values, with the upper threshold of 1.39758 taken as the decision value.
  • the positions judged abnormal in verification unit No. 1 were Nos. 11, 32, 34, 35, 51, 52 and 53. Manual inspection found that Nos. 11, 51 and 53 were faulty, while Nos. 32, 34, 35 and 52 were not.
  • the same unsupervised anomaly detection algorithm was applied to the data of the entire assembly line, and 322 positions were judged abnormal. Manual verification showed that 230 of them were not faulty, i.e. about 71% of the flagged positions were false alarms, so unsupervised anomaly detection alone has a high misjudgment rate in this application.
  • Step 3 Use the TSVM model to predict the results
  • TSVM first trains an initial SVM on the small labeled sample set obtained through unsupervised anomaly screening and manual inspection, then uses this learner to label the unlabeled samples so that every sample has a label, retrains the SVM on all of these labeled samples, and then repeatedly looks for error-prone samples and adjusts their labels.
  • to evaluate model performance, the present invention adopts the machine learning practice of randomly dividing samples into a training set and a test set, but applies it differently from a direct random split of the samples.
  • the error test data of the verification positions on the assembly line are randomly divided into a "training set" and a "test set", which simulate the verification data sets obtained in two different working runs of the assembly line; training samples and test samples are then obtained through feature extraction, standardization and dimensionality reduction.
  • the training samples include labeled samples and unlabeled samples.
  • taking unit No. 1 as an example, the manually inspected sample data of positions Nos. 11, 32, 34, 35, 51, 52 and 53 can be used as labeled samples Xi, with -1 denoting a faulty position and +1 a normal position:
  • the TSVM model is trained in a semi-supervised manner by using labeled samples and unlabeled samples.
  • the model predicts the "test set”.
  • the comparison between the predicted results and the results of the unsupervised anomaly detection algorithm is shown in Table 5:
  • the TSVM model constructed by the present invention has a higher accuracy rate.
  • once the abnormal position predictions are obtained, the method of the present invention can be used to help professionals carry out targeted re-checks of the verification positions and find the positions that are actually abnormal. This reduces the operation and maintenance cost of the automated verification system, safeguards the verification accuracy of the assembly line, and allows anomalies to be located and eliminated precisely.
  • the present invention proposes a method for constructing an anomaly detection model based on the TSVM model: faced with impure verification position samples, the most suspicious samples are first screened out in an unsupervised manner and handed over for manual labeling. While position faults are being eliminated, some labeled sample data are obtained, and the TSVM model is then constructed from the labeled and unlabeled samples.
  • experimental results show that the anomaly detection model constructed by the present invention can detect position anomalies on the assembly line online, reduce the workload caused by shutdown maintenance, and improve the working efficiency of the assembly line.
  • compared with the unsupervised anomaly detection method, the TSVM model based on semi-supervised learning is more accurate, and the model can select useful labeled samples for training through active learning so as to improve its performance.
  • this provides an approach for continuously optimizing and improving the performance of the TSVM model during the future operation of the automated smart electric energy meter verification system.

Landscapes

  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Complex Calculations (AREA)
  • Testing And Monitoring For Control Systems (AREA)
  • Management, Administration, Business Operations System, And Electronic Commerce (AREA)

Abstract

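A TSVM model-based anomaly detection method for an automated smart electric energy meter verification system, comprising the following steps: S1: performing feature extraction on the error test data of the verification positions under test, which contain a small amount of abnormal data, constructing feature vectors, and preprocessing them to form data samples; S2: manually labeling some of the samples; S3: training with labeled samples and unlabeled samples in a semi-supervised manner to obtain a TSVM-based anomaly detection model; S4: using the TSVM-based anomaly detection model to dynamically predict the abnormal state of the verification positions. The method offers high accuracy, online detection and reduced inspection cost.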

Description

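TSVM model-based anomaly detection method for an automated smart electric energy meter verification system
Technical Field
The present invention relates to an anomaly detection method for an automated verification system for smart electric energy meters, and in particular to such a method based on a Transductive Support Vector Machine (TSVM) model.
Background Art
Electric energy meters provide the basis for trade settlement in electricity transactions, so the verification of electric energy meters is increasingly important. As smart grid construction advances, demand for smart electric energy meters keeps growing, and automated verification systems with high verification efficiency have emerged to cope with the surge in verification demand. During long-term uninterrupted operation, however, the connection stage of such a system may suffer mechanical fatigue or even aging, causing abnormal verification results.
At present, the metrology center periodically shuts down the assembly line of the automated verification system and carries out manual inspections to ensure that every verification unit is in a healthy operating state. This approach cannot deliver timely risk information about the assembly line monitored by the automated verification system, so the system keeps serving test projects until the next manual inspection, which creates a risk of large-scale deviation in test results. Shortening the interval between manual inspections reduces the likelihood of this situation to some extent, but it greatly reduces the verification efficiency of the assembly line and increases labor and operation and maintenance costs. Online evaluation of the mechanical performance of the connection stage at each verification position of the automated verification system is therefore of great significance for improving the reliability of the system.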
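Summary of the Invention
The object of the present invention is to overcome the above defects of the prior art by providing a TSVM model-based anomaly detection method for an automated smart electric energy meter verification system.
The object of the present invention can be achieved by the following technical solution:
A TSVM model-based anomaly detection method for an automated smart electric energy meter verification system, comprising the following steps:
S1: Perform feature extraction on the error test data of the verification positions under test, which contain a small amount of abnormal data, construct feature vectors, and preprocess them to form data samples;
S2: Manually label some of the samples;
S3: Train with labeled samples and unlabeled samples in a semi-supervised manner to obtain a TSVM-based anomaly detection model;
S4: Use the TSVM-based anomaly detection model to dynamically predict the abnormal state of the verification positions.
Preferably, the feature vector in step S1 is constructed as follows: obtain the historical error test data of each verification position under the different verification test items, extract feature values from the historical error test data of each test item separately, and combine the feature values of all test items into the feature vector of that verification position.
Preferably, the feature values include the maximum, minimum, expectation, variance, skewness and kurtosis of the historical error test data.
Preferably, the preprocessing in step S1 includes standardization and dimensionality reduction of the feature vector of each verification position.
Preferably, the standardization is:
$$z = \frac{x - u}{S}$$
where x is a feature value in the feature vector to be processed, u is the expectation of the feature values in the feature vector to be processed, S is the standard deviation of the feature values in the feature vector to be processed, and z is the standardized feature value.
Preferably, the dimensionality reduction includes principal component analysis.
Preferably, step S2 is specifically:
Based on the data samples, an unsupervised anomaly detection algorithm is used to preliminarily screen out "abnormal positions";
The preliminarily screened "abnormal positions" are manually inspected and labeled: normal and abnormal positions are determined from the inspection results, and the data samples of the inspected verification positions are labeled to form the labeled samples.
Preferably, the unsupervised anomaly detection algorithm includes the isolation forest algorithm, the local outlier factor algorithm and the one-class support vector machine algorithm.
Preferably, the number of labeled samples is smaller than the number of unlabeled samples during model training in step S3.
Preferably, the method further includes optimizing the TSVM-based anomaly detection model, specifically: use the model to predict the abnormal data in the samples to be detected, inspect and label them manually, and build a labeled sample library from all manually labeled samples; select from it the data points close to the classification boundary as new labeled samples, and retrain the model with them and the unlabeled samples in a semi-supervised manner to complete the optimization; use the optimized model to predict the data points in the labeled sample library and compute the ratio of labeled samples whose predicted state differs from the true state; when this ratio is below a manually set threshold, the model is deemed to meet the prediction accuracy requirement and can be used directly to predict the data set to be detected.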
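Compared with the prior art, the present invention has the following advantages:
(1) The present invention uses a small number of labeled samples and a large number of unlabeled samples to construct a TSVM-based anomaly detection model in a semi-supervised manner, which effectively reduces the cost of manual inspection compared with other methods;
(2) Based on the historical error test data produced at the same verification position, the present invention takes the maximum and minimum of the data of each verification test item and computes its expectation, variance, skewness and kurtosis. These describe the average level, dispersion, asymmetry and proportion of extreme outliers of the data distribution at that position, so that an abnormal position state is converted into an anomaly of the data distribution. This makes it possible to analyze position states through data and to evaluate abnormal position states online, which reduces the impact on the assembly line and improves the efficiency of the verification work;
(3) The principal component analysis (PCA) method adopted in the present invention effectively reduces the dimensionality of the verification position sample data, resolves the sparsity of data samples and the difficulty of distance computation in high dimensions, and lowers the difficulty of anomaly detection;
(4) During operation the present invention can keep acquiring new labeled and unlabeled samples and continue to extend and optimize the TSVM-based anomaly detection model by semi-supervised training, continuously improving the accuracy of the model.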
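Brief Description of the Drawings
Fig. 1 is a flow chart of the TSVM model-based anomaly detection method for an automated smart electric energy meter verification system according to the present invention;
Fig. 2 shows the proportion of sample feature information retained at different dimensions in the embodiment of the present invention;
Fig. 3 is a schematic flow chart of anomaly detection for the automated smart electric energy meter verification system when the present invention is applied in practice.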
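Detailed Description
The present invention is described in detail below with reference to the drawings and a specific embodiment. Note that the following description of the embodiment is essentially illustrative; the present invention is not intended to limit its applications or uses, and is not limited to the following embodiment.
Embodiment
As shown in Fig. 1, this embodiment provides a TSVM model-based anomaly detection method for an automated smart electric energy meter verification system, comprising the following steps:
S1: Perform feature extraction on the error test data of the verification positions under test, which contain a small amount of abnormal data, construct feature vectors, and preprocess them to form data samples.
Specifically: suppose one assembly line of the automated smart electric energy meter verification system contains 30 verification units, and the test data set of each verification unit contains 60 verification position samples. In each verification task, smart electric energy meters from the same batch are randomly assigned to different positions and undergo several different error tests. The resulting error test data reflect not only quality problems of the smart electric energy meters themselves but also, indirectly, problems of the verification device itself.
Assume that the metering performance of smart electric energy meters from the same batch follows the same distribution. When all verification positions are in a normal and consistent state, the error test data of the 60 verification positions in the same verification unit should also follow the same distribution. When a verification position develops a fault such as corrosion or deformation, its distribution will differ from that of the other positions and it will appear as an "abnormal" data point. To extract distribution features from the massive error test data, statistics are computed on the data generated at the same verification position: for each test item, the maximum and minimum are taken and the expectation, variance, skewness and kurtosis are computed. These describe the average level, dispersion, asymmetry and proportion of extreme outliers of the data distribution at that position, so that an abnormal position state is converted into an anomaly of the data distribution.
The feature vector in step S1 is therefore constructed as follows: obtain the historical error test data of each verification position under the different verification test items, extract feature values from the historical error test data of each test item separately, and combine the feature values of all test items into the feature vector of that verification position. The feature values include the maximum, minimum, expectation, variance, skewness and kurtosis of the historical error test data.
One assembly line of the verification system contains 30 verification units, and the test data set of each verification unit contains 60 verification position samples, i.e. {X1, X2, ..., X60}. For each position, the maximum, minimum, expectation, variance, skewness and kurtosis of the data of each error test item are computed to construct the feature vector of that position sample. With m error test items, each sample contains 6m feature values, i.e. 6m dimensions.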
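The feature construction described above can be sketched as follows. This is a minimal illustration, not the applicants' implementation: the function names, the use of population variance, the excess-kurtosis convention and the synthetic readings are all assumptions made for the example.

```python
import numpy as np

def item_features(errors):
    """Six summary statistics for one error-test item at one verification position."""
    x = np.asarray(errors, dtype=float)
    mean = x.mean()
    var = x.var()                      # population variance (assumed convention)
    std = np.sqrt(var)
    if std == 0.0:                     # constant data: shape statistics are undefined, use 0
        skew, kurt = 0.0, 0.0
    else:
        z = (x - mean) / std
        skew = float(np.mean(z ** 3))          # third standardized moment
        kurt = float(np.mean(z ** 4) - 3.0)    # excess kurtosis (assumed convention)
    return [float(x.max()), float(x.min()), float(mean), float(var), skew, kurt]

def position_vector(items):
    """Concatenate the six statistics of each of the m test items into 6m values."""
    vec = []
    for errors in items:
        vec.extend(item_features(errors))
    return np.array(vec)

# Hypothetical data: m = 10 error-test items with 50 readings each.
rng = np.random.default_rng(0)
items = [rng.normal(0.0, 0.1, size=50) for _ in range(10)]
vec = position_vector(items)           # 6 * 10 = 60 feature values
```

With ten test items the vector has 60 entries, which matches the 60-dimensional samples used later in the embodiment.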
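To prevent large-scale features from weakening the influence of the other features and degrading the prediction performance of the anomaly factor algorithm, the feature values of each sample are scaled to a common scale using standardization:
$$z = \frac{x - u}{S}$$
where x is a feature value in the feature vector to be processed, u is the expectation of the feature values in the feature vector to be processed, S is the standard deviation of the feature values in the feature vector to be processed, and z is the standardized feature value. Standardization gives every feature of the samples a mean of 0 and a variance of 1.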
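As a small illustration of the standardization z = (x - u) / S, the sketch below scales each feature column of a sample matrix. The function name, the guard for constant features and the toy matrix are assumptions added for the example.

```python
import numpy as np

def standardize(X):
    """Column-wise z-score: z = (x - u) / S for every feature."""
    X = np.asarray(X, dtype=float)
    u = X.mean(axis=0)                 # expectation of each feature
    S = X.std(axis=0)                  # standard deviation of each feature
    S = np.where(S == 0.0, 1.0, S)     # guard: a constant feature maps to 0 instead of dividing by 0
    return (X - u) / S

# Two features on very different scales (hypothetical values).
X = np.array([[1.0, 100.0], [2.0, 300.0], [3.0, 500.0]])
Z = standardize(X)
```

After scaling, both columns have mean 0 and variance 1, so neither dominates a distance computation.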
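Each verification position sample has as many as 60 dimensions. In this case the data samples are sparse and distances are hard to compute, which makes anomaly detection more difficult, so the feature vectors need dimensionality reduction. Principal Component Analysis (PCA) is the most commonly used dimensionality reduction method. Specifically:
Input sample set: D = {X_1, X_2, ..., X_59, X_60}. In the formulas below, X_i denotes the individual samples, with i an integer from 1 to 60;
Centre all samples:
$$X_i \leftarrow X_i - \frac{1}{60}\sum_{j=1}^{60} X_j$$
Compute the covariance matrix XX^T of the samples;
Perform eigenvalue decomposition of the covariance matrix XX^T;
Take the eigenvectors W_1, W_2, ..., W_{d'} corresponding to the d' largest eigenvalues.
The dimension d' after dimensionality reduction is specified by the user. Different dimensions retain different proportions of the feature information of the data, and the user can determine d' by setting the proportion of feature information to be retained. Fig. 2 shows the proportion of feature information retained by the data samples of the automated smart electric energy meter verification system at different values of d'. For the standardized sample data, retaining close to 99.9% of the feature information requires at least 40 dimensions, so the effective data dimension used for anomaly detection is 40.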
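The PCA steps above (centring, covariance, eigendecomposition, choosing d' from a retained-information ratio) can be sketched with NumPy as below. The function name, the `keep` parameter and the synthetic data are assumptions. The sketch stores samples as rows, so the covariance is formed as X^T X rather than XX^T.

```python
import numpy as np

def pca_reduce(X, keep=0.999):
    """Project the rows of X onto the fewest principal components whose
    cumulative explained-variance ratio reaches `keep`."""
    X = np.asarray(X, dtype=float)
    Xc = X - X.mean(axis=0)                    # centre every feature
    cov = Xc.T @ Xc / (len(Xc) - 1)            # sample covariance matrix
    eigvals, eigvecs = np.linalg.eigh(cov)     # eigh returns eigenvalues in ascending order
    order = np.argsort(eigvals)[::-1]          # reorder to descending
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]
    eigvals = np.clip(eigvals, 0.0, None)      # remove tiny negative round-off
    ratio = np.cumsum(eigvals) / eigvals.sum()
    d = int(np.searchsorted(ratio, keep) + 1)  # smallest d' with ratio >= keep
    d = min(d, len(eigvals))
    return Xc @ eigvecs[:, :d], ratio

# Hypothetical data: 60 samples in 5 dimensions that really live on a 2-D subspace.
rng = np.random.default_rng(0)
X = rng.normal(size=(60, 2)) @ rng.normal(size=(2, 5))
Y, ratio = pca_reduce(X, keep=0.999)
```

The returned `ratio` array plays the role of the curve in Fig. 2: reading off the first index where it reaches the target proportion gives d'.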
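S2: Manually label some of the samples, specifically:
Based on the data samples, an unsupervised anomaly detection algorithm is used to preliminarily screen out "abnormal positions";
The preliminarily screened "abnormal positions" are manually inspected and labeled: normal and abnormal positions are determined from the inspection results, and the data samples of the inspected verification positions are labeled to form the labeled samples. The unsupervised anomaly detection algorithms include the isolation forest algorithm (Isolation Forest, Iforest), the local outlier factor algorithm (Local Outlier Factor, LOF) and the one-class support vector machine algorithm (One-Class Support Vector Machine, OCSVM). The Iforest algorithm performs well at global anomaly detection and is suitable for continuous, relatively high-dimensional data. It repeatedly partitions the data in the manner of a binary tree: each time, a feature of the data set is chosen at random and a random value of that feature is used as the split point, and the splitting is repeated until the samples are isolated, forming one isolation tree of the forest. Sample points that are isolated at a shallow depth, i.e. close to the root of the tree, are more likely to be judged abnormal. The LOF algorithm is not as good as Iforest at detecting global outliers, but it is better at detecting local anomalies in data sets whose distribution is relatively concentrated and whose proportion of anomalies is small. LOF is a density-based outlier detection method: it determines the local reachability density of a sample point from its k-nearest neighborhood (not the whole data set) and judges whether the sample is an outlier by comparing its local reachability density with that of its neighbors; the lower the density of a sample point, the more likely it is an outlier. OCSVM is a modified type of support vector machine, suitable for novelty detection and imbalanced-sample scenarios, and performs well on high-dimensional, large-sample data. The training samples of the OCSVM model belong to a single class; by building a model that represents this class, the shape of the data distribution is obtained, so that during detection the model can judge whether a sample to be predicted belongs to the same class as the training samples.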
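To make the density comparison behind LOF concrete, here is a compact NumPy sketch of the textbook local outlier factor. It is a simplification written for illustration: it takes exactly k neighbours and ignores distance ties, assumes no duplicate points, and the value k=5 and the synthetic data are assumptions rather than settings from the source.

```python
import numpy as np

def lof_scores(X, k=5):
    """Local outlier factor of every row of X (distance ties ignored, no duplicate rows)."""
    X = np.asarray(X, dtype=float)
    n = len(X)
    dist = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=2)
    np.fill_diagonal(dist, np.inf)                  # a point is not its own neighbour
    nbrs = np.argsort(dist, axis=1)[:, :k]          # indices of the k nearest neighbours
    k_dist = dist[np.arange(n), nbrs[:, -1]]        # distance to the k-th neighbour
    # reachability distance from a to neighbour b: max(k_dist(b), d(a, b))
    reach = np.maximum(k_dist[nbrs], dist[np.arange(n)[:, None], nbrs])
    lrd = 1.0 / reach.mean(axis=1)                  # local reachability density
    return lrd[nbrs].mean(axis=1) / lrd             # about 1 inside a cluster, larger for outliers

# Hypothetical data: 59 clustered points plus one point far from the cluster.
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0.0, 1.0, size=(59, 2)), [[10.0, 10.0]]])
scores = lof_scores(X, k=5)
```

A score near 1 means a point is about as dense as its neighbours; the isolated last point receives a much larger score, which is the quantity the embodiment later calls the anomaly factor.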
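The principle for selecting samples to label is to minimize the labeling cost: the samples most likely to be abnormal data points are selected for labeling, which, while position faults are being eliminated, also helps to discover new anomaly types quickly. To choose an unsupervised anomaly detection algorithm suited to the data of the automated smart electric energy meter verification system, the accuracy of the three algorithms is tested on the high-dimensional Letter anomaly data set from a machine learning library, whose dimension and degree of anomaly are similar to those of the verification system data after PCA dimensionality reduction. The Letter data set has 32 dimensions and 1600 samples, of which 100 are abnormal. The parameters of the algorithms are optimized by cross-validation. The experimental results are shown in Table 1:
Table 1. Average accuracy of unsupervised anomaly detection
Anomaly detection algorithm    Iforest    LOF    OCSVM
Average accuracy               89%        91%    67%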
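S3: Train with labeled samples and unlabeled samples in a semi-supervised manner to obtain a TSVM-based anomaly detection model; during model training the number of labeled samples is smaller than the number of unlabeled samples.
TSVM is a representative semi-supervised support vector machine model. Like the standard binary classifier SVM, TSVM is an algorithm for binary classification problems. Conceptually, it tries every assignment of the unlabeled samples to normal or abnormal data points and looks for the hyperplane that maximizes the margin over all samples, labeled and unlabeled.
Given labeled samples of known type D_l = {(x_1, y_1), (x_2, y_2), ..., (x_l, y_l)} and unlabeled samples D_u = {x_{l+1}, x_{l+2}, ..., x_m}, where y_i ∈ {-1, +1}, with -1 denoting an abnormal sample and +1 a normal sample, and with fewer samples in D_l than in D_u, the goal of the TSVM algorithm is to find the most suitable labels for the unlabeled samples:
$$\hat{y} = (\hat{y}_{l+1}, \hat{y}_{l+2}, \ldots, \hat{y}_m)$$
where
$$\hat{y}_i \in \{-1, +1\}$$
such that:
$$\min_{w, b, \hat{y}, \varepsilon} \; \frac{1}{2}\|w\|_2^2 + C_l \sum_{i=1}^{l} \varepsilon_i + C_u \sum_{i=l+1}^{m} \varepsilon_i$$
$$\text{s.t.} \quad y_i (w^T x_i + b) \ge 1 - \varepsilon_i, \quad i = 1, 2, \ldots, l$$
$$\hat{y}_i (w^T x_i + b) \ge 1 - \varepsilon_i, \quad i = l+1, l+2, \ldots, m$$
$$\varepsilon_i \ge 0, \quad i = 1, 2, \ldots, m$$
Here (w, b) defines a hyperplane; ε_i are the slack variables, one for each sample; C_l and C_u are the trade-off parameters that weight the labeled samples and the unlabeled samples respectively. TSVM finds an approximate solution of the above problem through repeated iteration.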
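For a fixed hyperplane and a fixed guess of the unlabeled samples' labels, the quantity TSVM tries to minimize can be evaluated directly. The sketch below assumes the usual soft-margin form (half the squared norm of w, plus C_l times the slack on labeled samples, plus C_u times the slack on unlabeled samples); the function name and the toy numbers are hypothetical.

```python
import numpy as np

def tsvm_objective(w, b, X_l, y_l, X_u, y_u_hat, C_l=1.0, C_u=0.1):
    """0.5 * ||w||^2 + C_l * sum(slack on labeled) + C_u * sum(slack on unlabeled)."""
    slack_l = np.maximum(0.0, 1.0 - y_l * (X_l @ w + b))      # slack of labeled samples
    slack_u = np.maximum(0.0, 1.0 - y_u_hat * (X_u @ w + b))  # slack of unlabeled samples
    return 0.5 * float(w @ w) + C_l * slack_l.sum() + C_u * slack_u.sum()

# Toy example: the hyperplane x1 = 0 in two dimensions.
w, b = np.array([1.0, 0.0]), 0.0
X_l = np.array([[2.0, 0.0], [-2.0, 0.0]])
y_l = np.array([1.0, -1.0])
X_u = np.array([[0.5, 0.0], [-3.0, 0.0]])

good = tsvm_objective(w, b, X_l, y_l, X_u, np.array([1.0, -1.0]))   # labels agree with the side
bad = tsvm_objective(w, b, X_l, y_l, X_u, np.array([-1.0, -1.0]))   # first label flipped
```

Flipping the first pseudo-label to the wrong side raises the objective, which is why the search over label assignments favours labelings that leave a wide margin.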
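S4: Use the TSVM-based anomaly detection model to dynamically predict the abnormal state of the verification positions.
The method further includes optimizing the TSVM-based anomaly detection model, specifically: use the model to predict the abnormal data in the samples to be detected, inspect and label them manually, and build a labeled sample library from all manually labeled samples; select from it the data points close to the classification boundary as new labeled samples, and retrain the model with them and the unlabeled samples in a semi-supervised manner to complete the optimization; use the optimized model to predict the data points in the labeled sample library and compute the ratio of labeled samples whose predicted state differs from the true state; when this ratio is below a manually set threshold, the model is deemed to meet the prediction accuracy requirement and can be used directly to predict the data set to be detected.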
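Two pieces of this optimization loop are easy to sketch for a linear decision function w·x + b: picking the library samples nearest the classification boundary, and checking the mismatch ratio against a threshold. The function names, the linear form and the example threshold of 0.05 are assumptions; the source only says that the threshold is set manually.

```python
import numpy as np

def nearest_to_boundary(w, b, X, n_select):
    """Indices of the n_select samples with the smallest |w.x + b|."""
    margin = np.abs(np.asarray(X, dtype=float) @ w + b)
    return np.argsort(margin)[:n_select]

def mismatch_ratio(y_true, y_pred):
    """Fraction of labeled samples whose predicted state differs from the true state."""
    y_true, y_pred = np.asarray(y_true), np.asarray(y_pred)
    return float(np.mean(y_true != y_pred))

def model_accepted(y_true, y_pred, threshold=0.05):
    """True when the mismatch ratio is below the manually chosen threshold."""
    return mismatch_ratio(y_true, y_pred) < threshold

# Toy example with the hyperplane x1 = 0.
w, b = np.array([1.0, 0.0]), 0.0
X = np.array([[3.0, 0.0], [0.2, 1.0], [-0.1, 2.0], [-4.0, 0.0]])
picked = nearest_to_boundary(w, b, X, n_select=2)
ratio = mismatch_ratio([1, 1, -1, -1], [1, -1, -1, -1])
```

Samples close to the boundary are the ones the current model is least sure about, so confirming their labels manually is the most informative use of inspection effort.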
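This embodiment uses the verification data of the automated smart electric energy meter verification system for batch JYL20002, collected from 10 November 2020 to 13 November 2020, as follows:
Step 1: Data feature extraction and dimensionality reduction.
The assembly line of this verification system has 30 verification units, and the data set of each verification unit contains 60 verification position samples. The feature vector of each verification position is constructed from the data of the ten error test items produced at that position, so the feature vector of each sample contains 60 feature values. Taking verification position No. 1 of verification unit No. 1 as an example, its feature values are shown in Table 2:
Table 2. Feature values of a verification position sample (sample No. 1 as an example)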
Figure PCTCN2021141547-appb-000008
Figure PCTCN2021141547-appb-000009
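The feature vectors of the 60 samples of verification unit No. 1 are standardized and reduced by PCA from the original 60 dimensions to 40 dimensions. The data features after dimensionality reduction are shown in Table 3:
Table 3. Feature data after PCA dimensionality reduction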
Figure PCTCN2021141547-appb-000010
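Step 2: Screen out "abnormal positions" with an unsupervised anomaly detection algorithm and hand them over for manual inspection, obtaining labeled samples while troubleshooting;
Because verification units may also differ in reference meter error or have faults in their electrical circuits, the labeled samples are obtained unit by unit. The position samples of the same verification unit are used as the data set under test. The LOF anomaly detection algorithm computes the anomaly factor of each position in the unit from its feature data (indicating how abnormal each sample is), and the box plot method is then applied to the anomaly factors of the 60 position samples of the unit to screen out the samples most likely to be abnormal data points, which are then checked manually. Applying the unsupervised anomaly detection algorithm to the 30 verification units of this batch (JYL20002) gives the anomaly factors of 1800 verification positions. The anomaly factors of the 60 verification positions of verification unit No. 1 are shown in Table 4:
Table 4. Results of the unsupervised anomaly detection algorithm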
Figure PCTCN2021141547-appb-000011
Figure PCTCN2021141547-appb-000012
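The box plot method was applied to the above anomaly factor values, with the upper threshold of 1.39758 taken as the decision value. The positions judged abnormal in verification unit No. 1 were Nos. 11, 32, 34, 35, 51, 52 and 53. Manual inspection found that Nos. 11, 51 and 53 were faulty, while Nos. 32, 34, 35 and 52 were not. The same unsupervised anomaly detection algorithm was applied to the data of the entire assembly line, and 322 positions were judged abnormal. Manual verification showed that 230 of them were not faulty, i.e. about 71% of the flagged positions were false alarms. Unsupervised anomaly detection alone therefore has a high misjudgment rate when applied to smart electric energy meter anomaly detection.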
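The box plot screening can be sketched as an upper-fence test on the anomaly factors. The source does not state how its threshold of 1.39758 was derived; the common Q3 + 1.5 × IQR rule used below is an assumption, and the scores are hypothetical. The last lines only redo the arithmetic on the counts reported for the whole assembly line.

```python
import numpy as np

def upper_fence(scores, whisker=1.5):
    """Upper box-plot fence Q3 + whisker * IQR (whisker=1.5 is an assumed convention)."""
    q1, q3 = np.percentile(scores, [25, 75])
    return q3 + whisker * (q3 - q1)

def flag_positions(scores):
    """1-based position numbers whose anomaly factor exceeds the upper fence."""
    fence = upper_fence(scores)
    return [i + 1 for i, s in enumerate(scores) if s > fence]

# Hypothetical anomaly factors for eight positions; only position 7 stands out.
scores = [1.0, 1.1, 0.9, 1.05, 0.95, 1.0, 2.5, 1.02]
flagged = flag_positions(scores)

# Counts reported in the text for the whole assembly line.
flagged_total, not_faulty = 322, 230
confirmed_faulty = flagged_total - not_faulty        # positions that really were faulty
false_alarm_rate = not_faulty / flagged_total        # share of flagged positions without a fault
```

Only 92 of the 322 flagged positions were confirmed faulty, which is the motivation for adding the semi-supervised TSVM stage.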
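Step 3: Predict with the TSVM model;
TSVM first trains an initial SVM on the small labeled sample set obtained through unsupervised anomaly screening and manual inspection, then uses this learner to label the unlabeled samples so that every sample has a label. The SVM is retrained on all of these labeled samples, after which error-prone samples are repeatedly sought out and their labels adjusted.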
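The iterative procedure just described can be sketched as the classic label-switching loop. Everything below is an illustrative assumption rather than the applicants' code: the linear SVM is fitted by a crude fixed-step subgradient descent, the weight on unlabeled samples starts small and doubles until it reaches the labeled weight, and a pair of opposite pseudo-labels is swapped when both samples have positive slack that sums to more than 2. The data are synthetic, with +1 for normal and -1 for abnormal.

```python
import numpy as np

def fit_linear_svm(X, y, c, epochs=500, lr=0.01):
    """Weighted soft-margin linear SVM fitted by plain subgradient descent (crude solver)."""
    w, b = np.zeros(X.shape[1]), 0.0
    for _ in range(epochs):
        active = y * (X @ w + b) < 1.0           # samples inside or beyond the margin
        w -= lr * (w - (c * y * active) @ X)
        b -= lr * (-np.sum(c * y * active))
    return w, b

def find_swap(y_u, slack):
    """Return one (positive, negative) pseudo-label pair that is likely mislabeled, if any."""
    pos, neg = np.where(y_u > 0)[0], np.where(y_u < 0)[0]
    if len(pos) == 0 or len(neg) == 0:
        return None
    i, j = pos[np.argmax(slack[pos])], neg[np.argmax(slack[neg])]
    if slack[i] > 0 and slack[j] > 0 and slack[i] + slack[j] > 2:
        return i, j
    return None

def tsvm_fit(X_l, y_l, X_u, C_l=1.0, C_u_init=0.01, max_swaps=20):
    w, b = fit_linear_svm(X_l, y_l, np.full(len(y_l), C_l))   # initial SVM on labeled data only
    y_u = np.where(X_u @ w + b >= 0, 1.0, -1.0)               # pseudo-labels for unlabeled data
    X, C_u = np.vstack([X_l, X_u]), C_u_init
    while True:
        c = np.concatenate([np.full(len(y_l), C_l), np.full(len(y_u), C_u)])
        for _ in range(max_swaps):
            w, b = fit_linear_svm(X, np.concatenate([y_l, y_u]), c)
            slack = np.maximum(0.0, 1.0 - y_u * (X_u @ w + b))
            pair = find_swap(y_u, slack)
            if pair is None:
                break
            i, j = pair
            y_u[i], y_u[j] = -y_u[i], -y_u[j]                 # switch the suspicious pair
        if C_u >= C_l:
            return w, b, y_u
        C_u = min(2.0 * C_u, C_l)                             # trust unlabeled data more each round

# Synthetic demo: two well-separated groups, three labeled samples per group.
rng = np.random.default_rng(0)
normal = rng.normal([2.0, 2.0], 0.5, size=(30, 2))
faulty = rng.normal([-2.0, -2.0], 0.5, size=(30, 2))
X_l = np.vstack([normal[:3], faulty[:3]])
y_l = np.array([1.0, 1.0, 1.0, -1.0, -1.0, -1.0])
X_u = np.vstack([normal[3:], faulty[3:]])
w, b, y_u = tsvm_fit(X_l, y_l, X_u)
```

On data this clean no labels need switching; the loop matters when the initial SVM, trained on very few labeled positions, mislabels part of the unlabeled set. A production implementation would replace the toy solver with a proper SVM library.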
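To evaluate model performance, the present invention adopts the machine learning practice of randomly dividing samples into a training set and a test set, but applies it differently from a direct random split of the samples: the error test data of the verification positions on the assembly line are randomly divided into a "training set" and a "test set", which simulate the verification data sets obtained in two different working runs of the assembly line, and training samples and test samples are then obtained through feature extraction, standardization and dimensionality reduction.
The training samples include labeled samples and unlabeled samples. Taking unit No. 1 as an example, the manually inspected sample data of positions Nos. 11, 32, 34, 35, 51, 52 and 53 can be used as labeled samples X_i, with -1 denoting a faulty position and +1 a normal position, consistent with the label convention of step S3:
D_l = {(X_11, -1), (X_32, +1), (X_34, +1), (X_35, +1), (X_51, -1), (X_52, +1), (X_53, -1)}
The other positions, which were not manually checked, form the unlabeled sample set:
D_u = {X_1, X_2, ..., X_10, X_12, ..., X_31, X_33, X_36, ..., X_50, X_54, ..., X_60}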
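As a quick check of the two sets for unit No. 1, the snippet below builds them from the inspection result reported in the text (positions 11, 51 and 53 faulty; 32, 34, 35 and 52 not faulty), using -1 for faulty and +1 for normal as in the listed D_l. The variable names are illustrative only.

```python
# Manual inspection result for verification unit No. 1 (position number -> label).
inspected = {11: -1, 32: +1, 34: +1, 35: +1, 51: -1, 52: +1, 53: -1}

all_positions = range(1, 61)                       # 60 verification positions per unit
labeled = sorted(inspected)                        # indices of D_l
unlabeled = [p for p in all_positions if p not in inspected]   # indices of D_u
faulty = [p for p in labeled if inspected[p] == -1]
```

This leaves 7 labeled and 53 unlabeled samples for the unit, so the labeled set is much smaller than the unlabeled set, as step S3 requires.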
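The TSVM model is trained in a semi-supervised manner with the labeled and unlabeled samples and is then used to predict the "test set". Its predictions are compared with the results of the unsupervised anomaly detection algorithm in Table 5:
Table 5. Comparison of TSVM and LOF anomaly detection results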
Figure PCTCN2021141547-appb-000013
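The prediction results show that the TSVM model constructed by the present invention is more accurate than the unsupervised anomaly detection model.
As shown in Fig. 3, once the abnormal position predictions are obtained, the method of the present invention can be used to help professionals carry out targeted re-checks of the verification positions and find the positions that are actually abnormal. This reduces the operation and maintenance cost of the automated verification system, safeguards the verification accuracy of the assembly line, and allows anomalies to be located and eliminated precisely.
The present invention proposes a method for constructing an anomaly detection model based on the TSVM model: faced with impure verification position samples, the most suspicious samples are first screened out in an unsupervised manner and handed over for manual labeling; while position faults are being eliminated, some labeled sample data are obtained, and the TSVM model is then constructed from the labeled and unlabeled samples. Experimental results show that the anomaly detection model constructed by the present invention can detect position anomalies on the assembly line online, reduce the workload caused by shutdown maintenance, and improve the working efficiency of the assembly line. Compared with the unsupervised anomaly detection method, the TSVM model based on semi-supervised learning is more accurate, and the model can select useful labeled samples for training through active learning so as to improve its performance. This provides an approach for continuously optimizing and improving the performance of the TSVM model during the future operation of the automated smart electric energy meter verification system.
The above embodiment is only an example and does not limit the scope of the present invention. It can be implemented in various other ways, and various omissions, substitutions and changes can be made without departing from the technical idea of the present invention.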

Claims (10)

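  1. A TSVM model-based anomaly detection method for an automated smart electric energy meter verification system, characterized in that the method comprises the following steps:
    S1: performing feature extraction on the error test data of the verification positions under test, which contain a small amount of abnormal data, constructing feature vectors, and preprocessing them to form data samples;
    S2: manually labeling some of the samples;
    S3: training with labeled samples and unlabeled samples in a semi-supervised manner to obtain a TSVM-based anomaly detection model;
    S4: using the TSVM-based anomaly detection model to dynamically predict the abnormal state of the verification positions.
  2. The TSVM model-based anomaly detection method for an automated smart electric energy meter verification system according to claim 1, characterized in that the feature vector in step S1 is constructed as follows: obtaining the historical error test data of each verification position under the different verification test items, extracting feature values from the historical error test data of each test item separately, and combining the feature values of all test items into the feature vector of that verification position.
  3. The TSVM model-based anomaly detection method for an automated smart electric energy meter verification system according to claim 2, characterized in that the feature values include the maximum, minimum, expectation, variance, skewness and kurtosis of the historical error test data.
  4. The TSVM model-based anomaly detection method for an automated smart electric energy meter verification system according to claim 1, characterized in that the preprocessing in step S1 includes standardization and dimensionality reduction of the feature vector of each verification position.
  5. The TSVM model-based anomaly detection method for an automated smart electric energy meter verification system according to claim 4, characterized in that the standardization is:
    $$z = \frac{x - u}{S}$$
    where x is a feature value in the feature vector to be processed, u is the expectation of the feature values in the feature vector to be processed, S is the standard deviation of the feature values in the feature vector to be processed, and z is the standardized feature value.
  6. The TSVM model-based anomaly detection method for an automated smart electric energy meter verification system according to claim 4, characterized in that the dimensionality reduction includes principal component analysis.
  7. The TSVM model-based anomaly detection method for an automated smart electric energy meter verification system according to claim 1, characterized in that step S2 is specifically:
    based on the data samples, using an unsupervised anomaly detection algorithm to preliminarily screen out "abnormal positions";
    manually inspecting and labeling the preliminarily screened "abnormal positions", determining normal and abnormal positions from the inspection results, and labeling the data samples of the inspected verification positions to form the labeled samples.
  8. The TSVM model-based anomaly detection method for an automated smart electric energy meter verification system according to claim 7, characterized in that the unsupervised anomaly detection algorithm includes the isolation forest algorithm, the local outlier factor algorithm and the one-class support vector machine algorithm.
  9. The TSVM model-based anomaly detection method for an automated smart electric energy meter verification system according to claim 1, characterized in that the number of labeled samples is smaller than the number of unlabeled samples during model training in step S3.
  10. The TSVM model-based anomaly detection method for an automated smart electric energy meter verification system according to claim 1, characterized in that the method further includes optimizing the TSVM-based anomaly detection model, specifically: using the model to predict the abnormal data in the samples to be detected, inspecting and labeling them manually, and building a labeled sample library from all manually labeled samples; selecting from it the data points close to the classification boundary as new labeled samples, and retraining the model with them and the unlabeled samples in a semi-supervised manner to complete the optimization; using the optimized model to predict the data points in the labeled sample library and computing the ratio of labeled samples whose predicted state differs from the true state; when this ratio is below a manually set threshold, the model is deemed to meet the prediction accuracy requirement and can be used directly to predict the data set to be detected.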
PCT/CN2021/141547 2021-06-30 2021-12-27 TSVM model-based anomaly detection method for an automated smart electric energy meter verification system WO2023273249A1 (zh)

Priority Applications (1)

Application Number Priority Date Filing Date Title
AU2021335237A AU2021335237A1 (en) 2021-06-30 2021-12-27 Method for detecting abnormality of automatic verification system of smart watt-hour meter based on transductive support vector machine (TSVM) model

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN202110732174.7 2021-06-30
CN202110732174.7A CN113484817A (zh) 2021-06-30 2021-06-30 Anomaly detection method for an automatic verification system for smart electric energy meters based on a TSVM model

Publications (1)

Publication Number Publication Date
WO2023273249A1 (zh) 2023-01-05

Family

ID=77936778

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2021/141547 WO2023273249A1 (zh) 2021-06-30 2021-12-27 Anomaly detection method for an automatic verification system for smart electric energy meters based on a TSVM model

Country Status (3)

Country Link
CN (1) CN113484817A (zh)
AU (1) AU2021335237A1 (zh)
WO (1) WO2023273249A1 (zh)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN118655362A (zh) * 2024-08-12 2024-09-17 山东德源电力科技股份有限公司 Integrated fusion terminal with power quality analysis function

Families Citing this family (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
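CN113484817A (zh) * 2021-06-30 2021-10-08 国网上海市电力公司 Anomaly detection method for an automatic verification system for smart electric energy meters based on a TSVM model
CN116702078B (zh) * 2023-06-02 2024-03-26 中国电信股份有限公司浙江分公司 State detection method based on a modular expandable cabinet power distribution unit
CN118131117B (zh) * 2024-05-07 2024-08-09 南京电力自动化设备三厂有限公司 Pipeline-type automatic aging method and system for electric energy meters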

Citations (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
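CN107590262A (zh) * 2017-09-21 2018-01-16 黄国华 Semi-supervised learning method for big data analysis
CN108985632A (zh) * 2018-07-16 2018-12-11 国网上海市电力公司 Electricity consumption data anomaly detection model based on the isolation forest algorithm
CN109828230A (zh) * 2019-04-02 2019-05-31 国网新疆电力有限公司电力科学研究院 Method for locating meter-position faults on an automatic electric energy meter verification pipeline
CN110933102A (zh) * 2019-12-11 2020-03-27 支付宝(杭州)信息技术有限公司 Method and device for training an abnormal traffic detection model based on semi-supervised learning
CN111259937A (zh) * 2020-01-09 2020-06-09 中国人民解放军国防科技大学 Semi-supervised individual identification method for communication emitters based on improved TSVM
CN111398886A (zh) * 2020-04-09 2020-07-10 国网山东省电力公司电力科学研究院 Method and system for detecting online anomalies of meter positions on an automatic verification pipeline
CN112115467A (zh) * 2020-09-04 2020-12-22 长沙理工大学 Intrusion detection method using semi-supervised classification based on ensemble learning
US20210035024A1 (en) * 2018-02-02 2021-02-04 Visa International Service Association Efficient method for semi-supervised machine learning
CN113484817A (zh) * 2021-06-30 2021-10-08 国网上海市电力公司 Anomaly detection method for an automatic verification system for smart electric energy meters based on a TSVM model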

Family Cites Families (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
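CN107976992B (zh) * 2017-11-29 2020-01-21 东北大学 Industrial process big data fault monitoring method based on a graph semi-supervised support vector machine
CN111740991B (zh) * 2020-06-19 2022-08-09 上海仪电(集团)有限公司中央研究院 Anomaly detection method and system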

Also Published As

Publication number Publication date
CN113484817A (zh) 2021-10-08
AU2021335237A1 (en) 2023-02-02

Legal Events

Date Code Title Description
ENP Entry into the national phase. Ref document number: 2021335237; Country of ref document: AU; Date of ref document: 20211227; Kind code of ref document: A
121 EP: the EPO has been informed by WIPO that EP was designated in this application. Ref document number: 21948174; Country of ref document: EP; Kind code of ref document: A1
NENP Non-entry into the national phase. Ref country code: DE
122 EP: PCT application non-entry in European phase. Ref document number: 21948174; Country of ref document: EP; Kind code of ref document: A1