CN113344742A - Automatic meter reading success rate influence factor analysis method based on clustering and time sequence analysis - Google Patents

Info

Publication number
CN113344742A
Authority
CN
China
Prior art keywords
clustering
meter reading
automatic meter
success rate
data
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202110723230.0A
Other languages
Chinese (zh)
Inventor
李尔园
宋先慧
傅洋
鞠永乾
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Integrated Electronic Systems Lab Co Ltd
Original Assignee
Integrated Electronic Systems Lab Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Integrated Electronic Systems Lab Co Ltd
Priority to CN202110723230.0A
Publication of CN113344742A
Current legal status: Pending

Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06Q: INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q50/00: Information and communication technology [ICT] specially adapted for implementation of business processes of specific business sectors, e.g. utilities or tourism
    • G06Q50/06: Energy or water supply
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00: Pattern recognition
    • G06F18/20: Analysing
    • G06F18/23: Clustering techniques
    • G06F18/232: Non-hierarchical techniques
    • G06F18/2321: Non-hierarchical techniques using statistics or function optimisation, e.g. modelling of probability density functions
    • G06F18/23213: Non-hierarchical techniques using statistics or function optimisation, e.g. modelling of probability density functions with fixed number of clusters, e.g. K-means clustering

Landscapes

  • Engineering & Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Business, Economics & Management (AREA)
  • Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Economics (AREA)
  • Health & Medical Sciences (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Public Health (AREA)
  • Evolutionary Computation (AREA)
  • General Engineering & Computer Science (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Artificial Intelligence (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Probability & Statistics with Applications (AREA)
  • Evolutionary Biology (AREA)
  • Water Supply & Treatment (AREA)
  • General Health & Medical Sciences (AREA)
  • Human Resources & Organizations (AREA)
  • Marketing (AREA)
  • Primary Health Care (AREA)
  • Strategic Management (AREA)
  • Tourism & Hospitality (AREA)
  • General Business, Economics & Management (AREA)
  • Testing Or Calibration Of Command Recording Devices (AREA)

Abstract

The invention provides a method for analyzing the factors that influence the automatic meter reading success rate, based on clustering and time sequence analysis. By combining a big data clustering algorithm with a time-sequence-based analysis method, the method can analyze the influence of multiple factors on the automatic meter reading result at the same time and, by extending the time sequence under analysis, can identify the long-term factors and short-term factors that influence the automatic meter reading success rate, thereby providing a reference for improving that rate. Because the analysis is carried out by a big data algorithm, labor, material, and time costs are greatly reduced.

Description

Automatic meter reading success rate influence factor analysis method based on clustering and time sequence analysis
Technical Field
The invention relates to the technical field of power consumption information acquisition of a power system, in particular to an automatic meter reading success rate influence factor analysis method based on clustering and time sequence analysis.
Background
With the development of the smart grid, a wide range of informatized and digital equipment has been applied to the power consumption information acquisition system, enabling automatic acquisition of users' power consumption information and greatly improving the efficiency of power consumption management. In the power consumption information acquisition system, the meter reading success rate is the basis and prerequisite for analyzing and managing line loss in the transformer area and for refined operation decisions. In actual use, however, many factors affect the success rate of automatic meter reading: system equipment factors, such as the meter reading communication module, communication mode, and communication parameters; environmental factors, such as poor GPRS signal caused by signal interference or obstruction by high mountains, and communication abnormalities caused by extreme environments; and human factors, such as loose wiring during installation, or an excessive distance between the equipment installation area and the control area that causes signal confusion and signal attenuation. Under the influence of one or more of these factors during information acquisition, the automatic meter reading success rate may be unsatisfactory and difficult to improve. Therefore, if the factors influencing the automatic meter reading success rate can be identified in a complex environment, a targeted treatment strategy can be formulated and the success rate effectively improved.
Disclosure of Invention
The invention aims to provide an automatic meter reading success rate influence factor analysis method based on clustering and time sequence analysis, so as to solve the problem in the prior art that the factors influencing the automatic meter reading success rate cannot be obtained, to identify those factors in a complex environment, and to improve the automatic meter reading success rate.
In order to achieve the technical purpose, the invention provides an automatic meter reading success rate influence factor analysis method based on clustering and time sequence analysis, which comprises the following operations:
S1, acquiring the relevant data of the transformer area for automatic meter reading in the area to be analyzed, to form a matrix D;
S2, dividing the relevant data of the transformer area into discrete feature data, continuous feature data, and result identification data, preprocessing each type separately, and combining the feature data into a matrix:
Figure BDA0003137411340000011
Figure BDA0003137411340000021
wherein $I$ is the column vector formed by the electric energy meter IDs, $L_E$ is the column vector formed by the measuring-point daily frozen electric energy reading time and quality codes, and the remaining columns are the column vectors formed by the other 36 data items respectively;
S3, performing K-means clustering on the samples based on the matrix D', marking the clustering results by using the result identification to obtain success cluster centers and failure cluster centers, calculating the attribute differences from each failure cluster center to its neighboring success cluster centers and neighboring failure cluster centers, screening the influence factors that distinguish the success classes from the failure classes according to the attribute differences, obtaining the attributes and values of the influence factors according to a reverse mapping rule, and counting the screened influence factors and sorting them by number of occurrences;
S4, selecting a date sequence, executing steps S1 to S3 for each date in the sequence, counting the number of occurrences of each influence factor, comparing it with a set threshold, and dividing the influence factors into long-term factors and short-term factors of the automatic meter reading success rate.
Preferably, the data related to the distribution area comprises archive data, operation data, and geographic data and meteorological data of the distribution area.
Preferably, the preprocessing of the discrete feature data comprises value de-duplication and remapping, followed by feature dimension raising and elimination of differences in value influence;
the value de-duplication and remapping are specifically:
for each discrete feature $X_k$ ($k = 1, 2, \dots, 24$), the values in the column vector are de-duplicated, and the de-duplicated values are denoted as:
$V_k = [v_{k1}\ v_{k2}\ \cdots\ v_{km}]$, ($m > 0$, where $m$ is the number of distinct values in the column vector $X_k$)
the various encoding values are then remapped to consecutive integers starting from 1;
that is, when $m \ge 2$, based on
$m = f_1(v_{km})$, ($m = 1, 2, \dots, m$)
a mapping from $v_{km}$ to $m$ is established, each value $v_{km}$ of an element $x_{kn}$ in $X_k$ ($k = 1, 2, \dots, 24$) is replaced by $m$, and the result after replacement is denoted $X'_k$ ($k = 1, 2, \dots, 24$);
the feature dimension raising and elimination of differences in value influence are specifically:
each $X'_k$ ($k = 1, 2, \dots, 24$) obtained in the previous step is converted, based on
$E_{m-1} = f_2(m)$
into a matrix of $m-1$ columns, denoted
Figure BDA0003137411340000031
where $E_{m-1}$ is a row vector of $m-1$ columns in which the $(m-1)$-th column is 1 and the remaining columns are 0, all columns are 0 when $m = 1$, and $m$ equals the number of columns of $V_k$ from the previous step.
Preferably, marking the clustering result by using the result identification to obtain the success cluster centers and the failure cluster centers specifically comprises:
the clustering yields $m$ classes, denoted $C_i$ ($i = 1, 2, \dots, m$), with cluster centers denoted $u_{c(i)}$ ($i = 1, 2, \dots, m$);
using $I_i$ as the index, $L_E$ in $D'$ is associated with $C_i$, and the proportion of automatic meter reading failures in $C_i$ is calculated and denoted as:
Figure BDA0003137411340000032
($r_i \in [0, 1]$, $i = 1, 2, \dots, m$)
the proportion of automatic meter reading failures in $D'$ is calculated and denoted as:
Figure BDA0003137411340000033
a multiplying factor $\theta$ ($\theta \in [1, 3]$) is set and each $C_i$ ($i = 1, 2, \dots, m$) is labeled: if
$r_i \cdot \theta \le r_{avg}$
then $C_i$ is marked as a success class, denoted
Figure BDA0003137411340000034
($i = 1, 2, \dots, m$), with its cluster center denoted
Figure BDA0003137411340000035
($i = 1, 2, \dots, m$); otherwise, $C_i$ is marked as a failure class, denoted
Figure BDA0003137411340000036
($i = 1, 2, \dots, m$), with its cluster center denoted
Figure BDA0003137411340000037
($i = 1, 2, \dots, m$).
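The labeling rule above can be illustrated with a minimal Python sketch. It assumes that $r_i$ is the number of failed readings in $C_i$ divided by the size of $C_i$, and that $r_{avg}$ is the number of failed readings divided by the number of meters; the exact formulas are given only as figures in the text. The function name `label_clusters` and its arguments are hypothetical.

```python
from collections import defaultdict

def label_clusters(cluster_ids, success_flags, theta=2.0):
    """Label each cluster 'success' or 'failure' from per-meter results.

    cluster_ids[i]   -- cluster index assigned to meter i
    success_flags[i] -- 1 if the automatic reading succeeded, else 0 (L_E)
    theta            -- multiplying factor, expected in [1, 3]
    """
    totals = defaultdict(int)
    failures = defaultdict(int)
    for cid, ok in zip(cluster_ids, success_flags):
        totals[cid] += 1
        failures[cid] += 1 - ok

    # Assumed definition: overall failure proportion over all meters in D'.
    r_avg = sum(failures.values()) / len(cluster_ids)

    labels = {}
    for cid in totals:
        r_i = failures[cid] / totals[cid]  # assumed: failure proportion in C_i
        labels[cid] = "success" if r_i * theta <= r_avg else "failure"
    return labels, r_avg
```

With $\theta > 1$, a cluster is labeled a success class only when its failure proportion is clearly below the overall proportion, so borderline clusters fall into the failure group and remain candidates for factor screening.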
Preferably, calculating the attribute differences from each failure cluster center to its neighboring success cluster centers and neighboring failure cluster centers is specifically as follows:
for a given failure cluster center
Figure BDA0003137411340000038
calculate its Euclidean distance to every other cluster center, that is, to all
Figure BDA0003137411340000039
($i = 1, 2, \dots, m$) and
Figure BDA00031374113400000310
($i = 1, 2, \dots, m$) other than itself, denoted $\Delta_j$ ($j = 1, 2, \dots, m-1$);
determine the maximum and minimum of $\Delta_j$ ($j = 1, 2, \dots, m-1$), denoted $\Delta_{max}$ and $\Delta_{min}$ respectively;
set a parameter $\lambda$ ($\lambda \in [1, 5]$) and find the $\Delta_j$ ($j = 1, 2, \dots, m-1$) that satisfy:
Figure BDA0003137411340000041
the success cluster centers corresponding to these $\Delta_j$ are recorded as the neighboring success cluster centers
Figure BDA0003137411340000042
($j = 1, 2, \dots, n$), and the failure cluster centers corresponding to them are recorded as the neighboring failure cluster centers
Figure BDA0003137411340000043
($j = 1, 2, \dots, m-n-1$);
for the selected
Figure BDA0003137411340000044
calculate the attribute difference between it and each
Figure BDA0003137411340000045
denoted as
Figure BDA0003137411340000046
Figure BDA0003137411340000047
where ($j = 1, 2, \dots, m-n-1$), ($k = 1, 2, \dots, n_F - 1$);
for the selected
Figure BDA0003137411340000048
calculate the attribute difference between it and each
Figure BDA0003137411340000049
denoted as
Figure BDA00031374113400000410
Figure BDA00031374113400000411
where ($j = 1, 2, \dots, n$), ($k = 1, 2, \dots, n_F - 1$).
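The distance computation that opens this step can be sketched in Python as follows. The sketch covers only the Euclidean distances $\Delta_j$ and their extremes $\Delta_{max}$ and $\Delta_{min}$; the neighbor-selection condition involving $\lambda$ is given only as a figure in the text and is not reproduced. The function name `distances_to_other_centers` and the toy centers are hypothetical.

```python
import math

def distances_to_other_centers(centers, failure_index):
    """Return {j: Euclidean distance} from one failure center to every other center."""
    origin = centers[failure_index]
    return {
        j: math.dist(origin, c)
        for j, c in enumerate(centers)
        if j != failure_index
    }

# Toy cluster centers in a 2-dimensional feature space (illustrative values only).
centers = [(0.0, 0.0), (3.0, 4.0), (0.0, 1.0)]
deltas = distances_to_other_centers(centers, failure_index=0)
delta_max = max(deltas.values())
delta_min = min(deltas.values())
```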
Preferably, screening the influence factors that distinguish the success classes from the failure classes according to the attribute differences specifically includes:
set an attribute difference threshold $\gamma$ ($\gamma \in (0, 1)$), and count, in
Figure BDA00031374113400000412
the attributes that satisfy
Figure BDA00031374113400000413
($j = 1, 2, \dots, m-n-1$), ($k = 1, 2, \dots, n_F - 1$)
the result is denoted as:
Figure BDA00031374113400000414
($j = 1, 2, \dots, m-n-1$), ($k = 1, 2, \dots, n_F - 1$);
in the same way, count, in
Figure BDA00031374113400000415
the attributes that satisfy:
Figure BDA00031374113400000416
($j = 1, 2, \dots, n$), ($k = 1, 2, \dots, n_F - 1$)
the result is denoted as:
Figure BDA0003137411340000051
($j = 1, 2, \dots, n$), ($k = 1, 2, \dots, n_F - 1$);
according to the rule:
Figure BDA0003137411340000052
($k = 1, 2, \dots, n_F - 1$),
calculate $\phi_k$ for each column, and summarize the final result as:
$\Phi_i = [\phi_1\ \phi_2\ \cdots\ \phi_k]$, ($k = 1, 2, \dots, n_F - 1$)
traverse $\Phi_i$; if $\phi_k > 0$, the attribute corresponding to $\phi_k$ is screened out as an influence factor.
Preferably, obtaining the attributes and values of the influence factors according to the reverse mapping rule specifically includes:
for discrete attributes, applying the inverse of the feature dimension-raising function and the inverse of the remapping;
for continuous attributes, applying the inverse of the normalization;
and combining the attributes and values of the influence factors screened out from the discrete attributes and the continuous attributes.
Preferably, applying the inverse of the feature dimension-raising function and the inverse of the remapping is specifically:
if
Figure BDA0003137411340000053
then $\phi_k$ corresponds to a discrete attribute; further, if
Figure BDA0003137411340000054
($i = 1, 2, \dots, 24$), it can be determined that the attribute in $D$ corresponding to $\phi_k$ is
Figure BDA0003137411340000055
From the equation:
Figure BDA0003137411340000056
$m$ can be calculated; combining this with:
$m = f_1(v_{jm})$, ($m = 1, 2, \dots, m$)
the value $v_{jm}$ corresponding to the attribute can be obtained.
Preferably, the inverse normalization is specifically:
if
Figure BDA0003137411340000061
then $\phi_k$ corresponds to the continuous attribute
Figure BDA0003137411340000062
Further, take from
Figure BDA0003137411340000063
the value in column $k$; combining it with the normalization formula:
Figure BDA0003137411340000064
and the recorded $E_{max}$, $E_{min}$, $max$, and $min$, the value $e_i$ of the attribute can be obtained by calculation.
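As an illustration of this inverse step, the Python sketch below assumes the forward normalization is the usual min-max rescaling, $Rescaled(e_i) = (e_i - E_{min}) / (E_{max} - E_{min}) \cdot (max - min) + min$. The formula itself appears only as a figure in the text, so this form is an assumption. The function name `inverse_rescale` and the parameter names `lo` and `hi` (standing for $min$ and $max$) are hypothetical.

```python
def inverse_rescale(r, e_min, e_max, lo=0.0, hi=1.0):
    """Map a rescaled value r in [lo, hi] back to the original scale.

    Assumes the forward step was min-max rescaling with the recorded
    E_min, E_max and target range [lo, hi].
    """
    if e_max == e_min:
        # All original values were identical, so that value is the only candidate.
        return e_min
    return (r - lo) / (hi - lo) * (e_max - e_min) + e_min
```

For example, with recorded $E_{min} = 10$ and $E_{max} = 50$, a cluster-center coordinate of 0.25 maps back to an original attribute value of 20.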
Preferably, step S4 is specifically:
select a date sequence:
$T = [t_1\ t_2\ \cdots\ t_j]$
for each $t_j$ in the sequence $T$, repeat the analysis process of S1 to S3, and merge the screened influence factors and their value sets
Figure BDA0003137411340000065
to obtain
Figure BDA0003137411340000066
while counting the number of occurrences $k_n$ of each factor;
set an occurrence threshold $\sigma$ ($\sigma \in [1, j]$) and compare the number of occurrences $k_n$ of each factor in the set $K$ with $\sigma$: all factors with $k_n \ge \sigma$, together with their values, form the set $K_1$; all factors with $k_n < \sigma$, together with their values, form the set $K_2$;
the elements of $K_1$ are the long-term factors influencing the automatic meter reading success rate; the elements of $K_2$ are the short-term factors influencing the automatic meter reading success rate.
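The split into long-term and short-term factors can be sketched in Python as below. The sketch assumes each date contributes a collection of screened (attribute, value) pairs and that a factor is counted at most once per date; both are assumptions, as is the function name `split_factors`.

```python
from collections import Counter

def split_factors(daily_factors, sigma):
    """Split factors into long-term (count >= sigma) and short-term (count < sigma).

    daily_factors -- one collection of screened (attribute, value) factors per date
    sigma         -- occurrence threshold, expected in [1, number of dates]
    """
    counts = Counter()
    for factors in daily_factors:
        counts.update(set(factors))  # assumed: count each factor once per date
    long_term = {f for f, k in counts.items() if k >= sigma}   # K_1
    short_term = {f for f, k in counts.items() if k < sigma}   # K_2
    return long_term, short_term, counts
```

A larger $\sigma$ makes the long-term set stricter: a factor must recur on more of the analyzed dates before it is treated as persistent rather than temporary.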
The effects described in this summary are only those of the embodiments, not all the effects of the invention. One of the above technical solutions has the following advantages or beneficial effects:
compared with the prior art, the method combines a big data clustering algorithm with a time-sequence-based analysis method. It can analyze the influence of multiple factors on the automatic meter reading result at the same time and, by extending the time sequence under analysis, can identify the long-term factors and the short-term factors (generally temporary factors that cause the automatic meter reading success rate to fluctuate) influencing the automatic meter reading success rate, providing a reference for improving that rate. Because the analysis is carried out by a big data algorithm, labor, material, and time costs are greatly reduced.
Drawings
Fig. 1 is a flowchart of an automatic meter reading success rate influence factor analysis method based on clustering and time sequence analysis provided in an embodiment of the present invention.
Detailed Description
In order to clearly explain the technical features of the present invention, the following detailed description of the present invention is provided with reference to the accompanying drawings. The following disclosure provides many different embodiments, or examples, for implementing different features of the invention. To simplify the disclosure of the present invention, the components and arrangements of specific examples are described below. Furthermore, the present invention may repeat reference numerals and/or letters in the various examples. This repetition is for the purpose of simplicity and clarity and does not in itself dictate a relationship between the various embodiments and/or configurations discussed. It should be noted that the components illustrated in the figures are not necessarily drawn to scale. Descriptions of well-known components and processing techniques and procedures are omitted so as to not unnecessarily limit the invention.
The method for analyzing influence factors of automatic meter reading success rate based on clustering and time sequence analysis provided by the embodiment of the invention is described in detail below with reference to the accompanying drawings.
As shown in fig. 1, an embodiment of the present invention discloses an automatic meter reading success rate influence factor analysis method based on clustering and time sequence analysis, including the following steps:
S1, acquiring the relevant data of the transformer area for automatic meter reading in the area to be analyzed, and forming a matrix D.
The embodiment of the invention analyzes and processes the existing archive data, operation data, basic geographical data of a transformer area, meteorological data of the transformer area and the like by utilizing the factor screening based on clustering and the factor classification based on time sequence analysis, and finds out the factors influencing the success rate of automatic meter reading.
Select a date to be analyzed, denoted $t'$, and acquire, for that date, the archive data and operation data of the smart meters and transformer areas subject to automatic meter reading in the area to be analyzed, together with the geographic data and meteorological data of the transformer areas.
The archive data is mainly acquired from the power consumption information acquisition system and covers the basic archives of the electric energy meter, operation terminal, metering point, electricity customer, and transformer area. It mainly comprises 30 data items in total: electric energy meter unique identifier (usually the ID), electric energy meter communication mode, electric energy meter manufacturer, electric energy meter communication baud rate, electric energy meter installation date, electric energy meter last inspection date, electric energy meter version, electric energy meter type, phase sequence, client type, terminal manufacturer, terminal acquisition mode, terminal priority power supply mode, terminal communication protocol, terminal commissioning date, terminal latest parameter modification time, metering point wiring mode, metering point metering mode, metering point commissioning date, metering point update time, client subtype, line ID, electricity customer classification, electricity customer industry classification, electricity customer electricity utilization category, electricity customer operating capacity, electricity customer last inspection date, electricity customer urban and rural category, and transformer area public/private transformer identifier.
The operation data is mainly acquired from the power consumption information acquisition system and relates to the daily frozen electric energy readings of the measuring points and the operation data of the terminals. It mainly comprises 4 data items in total: the measuring-point daily frozen electric energy reading time and quality code, the terminal online state, the terminal time synchronization success identifier, and the clock difference between the terminal and the system.
The geographic data of the transformer area is mainly the area of the lowest-level administrative region to which the transformer area belongs, which can be obtained from provincial/city/county government websites; 1 data item in total.
The meteorological data of the transformer area is mainly that of the administrative region to which the transformer area belongs, comprising weather, wind power, and temperature; 3 data items in total.
All the data are then integrated: taking the electric energy meter ID as the unique identifier and the electric energy meter as the analysis object, and using the relationships among the electric energy meter, operation terminal, metering point, transformer area, and administrative region, the data are integrated into a matrix with n rows and 38 columns, where n is the number of electric energy meters in the selected analysis area and n > 0.
$$D = \begin{bmatrix} I & X_1 & X_2 & \cdots & X_{36} & L \end{bmatrix}$$
where $I = [i_1\ i_2\ \cdots\ i_n]^T$ ($n > 0$) is the column vector formed by the electric energy meter IDs; $X_j = [x_{j1}\ x_{j2}\ \cdots\ x_{jn}]^T$ ($n > 0$, $j = 1, 2, \dots, 36$) are the column vectors formed by the other 36 data items, excluding the electric energy meter unique identifier and the measuring-point daily frozen electric energy reading time and quality code; and $L = [l_1\ l_2\ \cdots\ l_n]^T$ ($n > 0$) is the column vector formed by the measuring-point daily frozen electric energy reading time and quality codes.
S2, dividing the relevant data of the transformer area into discrete feature data, continuous feature data, and result identification data, preprocessing each type separately, and combining the feature data into a matrix D'.
The data are divided into three types for preprocessing: discrete feature data, continuous feature data, and result identification data.
For discrete feature data, the differences in influence that different values of a discrete feature have on the K-means clustering result must be eliminated while keeping the feature dimension as small as possible. For example, suppose there are three electric energy meter manufacturers A, B, and C, coded as 1, 2, and 3. The distances between the values 1, 2, and 3 are not all equal, so if the spatial distance between samples is used to judge their closeness for clustering, manufacturers A, B, and C, which should be on an equal footing, would influence the clustering result to different degrees. If A, B, and C are instead coded as [0 0], [0 1], and [1 0], each nonzero code lies at distance 1 from [0 0] and the distances no longer depend on the magnitude of an arbitrary integer code, which evens out the manufacturers' influence on the K-means clustering result.
Among $X_j$ ($j = 1, 2, \dots, 36$), the features that take discrete values need to be processed. They comprise 24 items in total: electric energy meter communication mode, electric energy meter manufacturer, electric energy meter communication baud rate, electric energy meter version, electric energy meter category, phase sequence, client type, terminal manufacturer, terminal acquisition mode, terminal priority power supply mode, terminal communication protocol, metering point wiring mode, metering point metering mode, client subtype, line ID, electricity customer classification, electricity customer industry classification, electricity customer electricity utilization category, electricity customer urban and rural category, transformer area public/private transformer identifier, terminal online state, terminal time synchronization success identifier, and weather, denoted $X_k$ ($k = 1, 2, \dots, 24$).
The preprocessing of the discrete feature data comprises value de-duplication and remapping, followed by feature dimension raising and elimination of differences in value influence.
The value de-duplication and remapping proceed as follows:
the values of each discrete feature (e.g., manufacturer code, type code) are de-duplicated. That is, for a given $X_k$ ($k = 1, 2, \dots, 24$), the values in the column vector are de-duplicated (null values and invalid values are also treated as separate valid values), and the de-duplicated values are denoted as:
$V_k = [v_{k1}\ v_{k2}\ \cdots\ v_{km}]$, ($m > 0$, where $m$ is the number of distinct values in the column vector $X_k$)
The various encoding values are then remapped to consecutive integers starting from 1;
that is, when $m \ge 2$, based on
$m = f_1(v_{km})$, ($m = 1, 2, \dots, m$)
a mapping from $v_{km}$ to $m$ is established, each value $v_{km}$ of an element $x_{kn}$ in $X_k$ ($k = 1, 2, \dots, 24$) is replaced by $m$, and the result after replacement is denoted $X'_k$ ($k = 1, 2, \dots, 24$).
The feature dimension raising and elimination of differences in value influence proceed as follows:
each $X'_k$ ($k = 1, 2, \dots, 24$) obtained in the previous step is converted, based on
$E_{m-1} = f_2(m)$
into a matrix of $m-1$ columns, denoted
Figure BDA0003137411340000091
($k = 1, 2, \dots, 24$), where $E_{m-1}$ is a row vector of $m-1$ columns in which the $(m-1)$-th column is 1 and the remaining columns are 0, all columns are 0 when $m = 1$, and $m$ equals the number of columns of $V_k$ from the previous step.
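The two discrete preprocessing steps can be sketched in Python for a single feature column as follows. This is a minimal illustration, not the patented implementation: the function name `encode_discrete` is hypothetical, distinct values are numbered in first-seen order, and the position of the 1 within each row vector (column index minus one, counted from the left) is a choice made here, since the text leaves the column order open.

```python
def encode_discrete(column):
    """De-duplicate, remap to 1..m, and expand each value into m-1 indicator columns."""
    # Distinct values in first-seen order; None (null) counts as its own value.
    distinct = list(dict.fromkeys(column))
    index_of = {v: i for i, v in enumerate(distinct, start=1)}  # f1: value -> 1..m
    width = len(distinct) - 1

    def expand(idx):
        # f2: index -> row vector with m-1 columns (assumed left-to-right layout).
        row = [0] * width
        if idx > 1:
            row[idx - 2] = 1  # index 1 stays all zeros
        return row

    remapped = [index_of[v] for v in column]
    return index_of, [expand(i) for i in remapped]
```

Keeping the first value as the all-zero vector is what holds the expanded width at $m-1$ columns instead of $m$, in line with the stated goal of keeping the feature dimension as small as possible.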
For continuous feature data, the main steps are outlier handling and normalization, so that the values of all continuous features fall within a fixed range. That is, among $X_j$ ($j = 1, 2, \dots, 36$), the features that take continuous values are processed. They comprise 12 items in total: electric energy meter installation date, electric energy meter last inspection date, terminal commissioning date, terminal latest parameter modification time, metering point commissioning date, metering point update time, electricity customer operating capacity, electricity customer last inspection date, clock difference between terminal and system, area of the lowest-level administrative region of the transformer area, wind power, and temperature, denoted $X_g$ ($g = 1, 2, \dots, 12$).
The preprocessing of continuous feature data falls into two categories: preprocessing of date/time features and preprocessing of the other features.
The date/time features comprise 7 items in total: electric energy meter installation date, electric energy meter last inspection date, terminal commissioning date, terminal latest parameter modification time, metering point commissioning date, metering point update time, and electricity customer last inspection date, denoted $X_g$ ($g = 1, 2, \dots, 7$). The preprocessing steps are as follows:
Outlier handling: for a given $X_g$, null values, invalid values, and the like in $X_g$ ($g = 1, 2, \dots, 7$) are uniformly replaced with the minimum value occurring in $X_g$ (for date/time, the earliest date/time) minus 6 months; the result is denoted $X'_g$ ($g = 1, 2, \dots, 7$);
Data conversion: the filled values are subtracted from the current system time $t'$ and the difference is converted into months; the result is denoted $X''_g$ ($g = 1, 2, \dots, 7$);
Normalization: $X''_g$ ($g = 1, 2, \dots, 7$) obtained in the previous step is normalized based on the formula:
$$Rescaled(e_i) = \frac{e_i - E_{min}}{E_{max} - E_{min}} \cdot (max - min) + min$$
($max = 1.0$, $min = 0.0$, $E_{max}$ is the maximum value of the feature, and $E_{min}$ is the minimum value of the feature)
When $E_{max}$ and $E_{min}$ are equal:
$$Rescaled(e_i) = 0.5 \cdot (max + min)$$
The normalized result is denoted
Figure BDA0003137411340000102
($g = 1, 2, \dots, 7$), and the $E_{max}$, $E_{min}$, $max$, and $min$ corresponding to $X''_g$ are recorded for later use.
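The three date/time steps (filling, conversion to months, normalization) can be sketched in Python for one feature column as below. Several details are assumptions: months are counted as whole calendar months, the fill date is pinned to the first day of its month, the normalization is taken to be ordinary min-max rescaling, and the column is assumed to contain at least one valid date. All function names are hypothetical.

```python
from datetime import date

def months_between(earlier, later):
    """Whole-month difference between two dates (day of month ignored)."""
    return (later.year - earlier.year) * 12 + (later.month - earlier.month)

def shift_months(d, months):
    """Move a date by a number of months, pinned to the 1st to avoid invalid days."""
    total = d.year * 12 + (d.month - 1) + months
    return date(total // 12, total % 12 + 1, 1)

def preprocess_dates(values, t_ref, lo=0.0, hi=1.0):
    """Fill missing dates, convert to age in months at t_ref, then min-max rescale."""
    valid = [v for v in values if v is not None]
    fill = shift_months(min(valid), -6)  # earliest date minus 6 months
    ages = [months_between(v if v is not None else fill, t_ref) for v in values]
    e_min, e_max = min(ages), max(ages)
    if e_max == e_min:
        scaled = [0.5 * (hi + lo) for _ in ages]
    else:
        scaled = [(a - e_min) / (e_max - e_min) * (hi - lo) + lo for a in ages]
    # Return the recorded parameters so the rescaling can be inverted later.
    return scaled, (e_min, e_max, lo, hi)
```

Because missing dates are filled with a value older than any observed date, they end up at the top of the rescaled range after conversion to months, which keeps them distinguishable from genuine records.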
The other features comprise 5 items in total: electricity customer operating capacity, clock difference between terminal and system, area of the lowest-level administrative region of the transformer area, wind power, and temperature, denoted $X_g$ ($g = 8, 9, \dots, 12$). The preprocessing steps are as follows:
Outlier handling: for a given $X_g$, null values, invalid values, and the like in $X_g$ ($g = 8, 9, \dots, 12$) are uniformly replaced with the maximum value occurring in $X_g$ plus 1; the result is denoted $X'_g$ ($g = 8, 9, \dots, 12$);
Normalization: the data are normalized with the same formula used for the date/time features, and the result is denoted
Figure BDA0003137411340000111
($g = 8, 9, \dots, 12$).
For the result identification data, the measuring-point daily frozen electric energy reading time and quality code must be processed. The quality code is parsed according to its format so that the value is 1 only when the automatic meter reading succeeded and 0 otherwise. The result is denoted $L_E$ and is used to label the clustering results.
The matrix after all data preprocessing is recorded as:
Figure BDA0003137411340000112
$D'$ has $n$ rows ($n > 0$), and its number of columns is
Figure BDA0003137411340000113
where $v_j$ is the number of distinct values of the discrete feature $X_j$.
S3, performing K-means clustering on the samples based on the matrix D', marking clustering results by using result identification to obtain successful cluster centers and failed cluster centers, calculating attribute differences from each failed cluster center to adjacent successful cluster centers and adjacent failed cluster centers, screening and distinguishing the influence factors of the successful clusters and the failed clusters according to the attribute differences, obtaining the attributes and values of the influence factors according to a reverse mapping rule, counting the screened influence factors and sequencing according to the occurrence times.
Clustering samples based on the characteristic data obtained by the processing, marking the clustering result by using a result identifier, marking the clustering result as a 'meter reading success class' and a 'meter reading failure class', and further screening and distinguishing factors of 'success' and 'failure' by analyzing the attribute difference between a 'meter reading failure class' clustering center and a 'meter reading success class' clustering center adjacent to the clustering center, wherein the specific process is as follows:
Based on how closely or sparsely the sample feature data are distributed, the samples are divided into m classes by K-means clustering, and the overall characteristics of each class are represented by the attributes of its cluster center.
The part of matrix D' other than L_E is extracted and denoted:
Figure BDA0003137411340000121
X is a matrix with n (n > 0) rows and
Figure BDA0003137411340000122
columns (the column count is denoted n_F);
Set the number of clusters m (m ∈ [15, 30]) and use the cost function:
Figure BDA0003137411340000123
where u_c(i) denotes the cluster center nearest to x^(i).
The K-means clustering algorithm clusters the samples using I in X as the ID and the remaining columns as features, yielding m classes denoted C_i (i = 1, 2, …, m), with cluster centers denoted u_c(i) (i = 1, 2, …, m).
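A minimal sketch of the clustering step, written as plain Lloyd iterations over feature rows (the ID column is assumed to have been set aside already). The cost shown is the usual mean squared distance to the assigned center; since the document's cost function is in a figure that is not reproduced here, treating it as this standard form is an assumption. The names `k_means`, `squared_distance`, and `cost` are hypothetical, and a production system would more likely call an existing library.

```python
import random


def squared_distance(a, b):
    """Squared Euclidean distance between two feature rows."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def k_means(samples, m, max_iterations=100, seed=0):
    """Cluster samples into m classes; return (assignment, centers)."""
    rng = random.Random(seed)
    # Initialise the centers with m distinct samples chosen at random.
    centers = [list(s) for s in rng.sample(samples, m)]
    assignment = None
    for _ in range(max_iterations):
        # Assignment step: each sample goes to its nearest center.
        new_assignment = [
            min(range(m), key=lambda c: squared_distance(s, centers[c]))
            for s in samples
        ]
        if new_assignment == assignment:
            break  # no sample changed class, so the centers are stable
        assignment = new_assignment
        # Update step: each center becomes the mean of its members.
        for c in range(m):
            members = [s for s, a in zip(samples, assignment) if a == c]
            if members:
                centers[c] = [sum(col) / len(members) for col in zip(*members)]
    return assignment, centers


def cost(samples, assignment, centers):
    """Mean squared distance from each sample to its assigned center."""
    total = sum(squared_distance(s, centers[a])
                for s, a in zip(samples, assignment))
    return total / len(samples)
```

With the document's range m ∈ [15, 30], the same call would simply be made with a larger m on the full matrix X.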
Respectively marking the clustering results (m classes) obtained in the last step as a 'meter reading failure class' or a 'meter reading success class', wherein the processing process is as follows:
Using I_i as the index, associate L_E in D' with C_i and compute the proportion of automatic meter reading failures in C_i, denoted:
Figure BDA0003137411340000124
Compute the proportion of automatic meter reading failures in D', denoted:
Figure BDA0003137411340000125
Set a multiplier θ (θ ∈ [1, 3]) and label each C_i (i = 1, 2, …, m). If:
r_i · θ ≤ r_avg
then C_i is labeled as an automatic meter reading success class, denoted
Figure BDA0003137411340000126
(i = 1, 2, …, m), with its cluster center denoted
Figure BDA0003137411340000127
(i = 1, 2, …, m); otherwise, C_i is labeled as an automatic meter reading failure class, denoted
Figure BDA0003137411340000128
(i = 1, 2, …, m), with its cluster center denoted
Figure BDA0003137411340000129
(i = 1, 2, …, m).
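The labeling rule r_i · θ ≤ r_avg can be illustrated with a short sketch. It assumes the result identification L_E is available as a list of flags (1 for success, 0 for failure) aligned with the cluster assignment of each sample; the function name `label_clusters` and the string labels are hypothetical.

```python
def label_clusters(assignment, success_flags, m, theta=2.0):
    """Label each cluster as 'success' or 'failure' by its failure ratio.

    assignment[i] is the cluster index of sample i, and success_flags[i]
    is 1 if automatic meter reading succeeded for sample i, else 0.
    """
    # Overall failure proportion r_avg across all samples.
    r_avg = success_flags.count(0) / len(success_flags)
    labels = {}
    for c in range(m):
        flags = [f for a, f in zip(assignment, success_flags) if a == c]
        if not flags:
            continue  # an empty cluster cannot be labeled
        # Failure proportion r_i inside cluster c.
        r_i = flags.count(0) / len(flags)
        labels[c] = "success" if r_i * theta <= r_avg else "failure"
    return labels
```

A larger θ makes the success label harder to earn, because a cluster's failure ratio must then sit further below the overall average.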
Screening out the attribute with large difference by analyzing the attribute difference between the clustering center of the meter reading failure class and the clustering center of the meter reading success class, wherein the processing process is as follows:
For a given failure-class center
Figure BDA00031374113400001210
compute its Euclidean distance to every
Figure BDA00031374113400001211
(i = 1, 2, …, m) and every
Figure BDA00031374113400001212
(i = 1, 2, …, m) cluster center other than itself, denoted Δ_j (j = 1, 2, …, m-1);
Determine the maximum and minimum of Δ_j (j = 1, 2, …, m-1), denoted Δ_max and Δ_min respectively;
Set a parameter λ (λ ∈ [1, 5]) and find the Δ_j (j = 1, 2, …, m-1) that satisfy the condition:
Figure BDA0003137411340000131
The corresponding success cluster centers are recorded as the neighboring success cluster centers
Figure BDA0003137411340000132
(j = 1, 2, …, n), and the corresponding failure cluster centers are recorded as the neighboring failure cluster centers
Figure BDA0003137411340000133
(j = 1, 2, …, m-n-1);
For the selected
Figure BDA0003137411340000134
compute the attribute difference between it and each
Figure BDA0003137411340000135
denoted
Figure BDA0003137411340000136
Figure BDA0003137411340000137
where j = 1, 2, …, m-n-1 and k = 1, 2, …, n_F-1;
For the selected
Figure BDA0003137411340000138
compute the attribute difference between it and each
Figure BDA0003137411340000139
denoted
Figure BDA00031374113400001310
Figure BDA00031374113400001311
where j = 1, 2, …, n and k = 1, 2, …, n_F-1;
Set an attribute difference threshold γ (γ ∈ (0, 1)) and count, in
Figure BDA00031374113400001312
the attributes that satisfy
Figure BDA00031374113400001313
(j = 1, 2, …, m-n-1), (k = 1, 2, …, n_F-1)
The result is denoted:
Figure BDA00031374113400001314
(j = 1, 2, …, m-n-1), (k = 1, 2, …, n_F-1);
Likewise, count in
Figure BDA00031374113400001315
the attributes that satisfy:
Figure BDA0003137411340000141
(j = 1, 2, …, n), (k = 1, 2, …, n_F-1)
The result is denoted:
Figure BDA0003137411340000142
(j = 1, 2, …, n), (k = 1, 2, …, n_F-1);
According to the rule:
Figure BDA0003137411340000143
(k = 1, 2, …, n_F-1),
compute φ_k for each given column; the final result is summarized as:
Φ_i = [φ_1 φ_2 … φ_k], (k = 1, 2, …, n_F-1)
Traverse Φ_i; if φ_k > 0, the attribute corresponding to φ_k is selected as an influence factor.
The preprocessing is then reversed, using the remapping rule, the feature dimension-raising function and its parameters, and the standardization function and its parameters, to compute the attribute name and attribute value corresponding to each factor.
Depending on which preprocessing steps were applied, there are two cases:
For discrete attributes, only the feature dimension-raising function and the remapping need to be reversed. The specific process is as follows:
If
Figure BDA0003137411340000144
then φ_k corresponds to a discrete attribute. Further, if
Figure BDA0003137411340000145
(i = 1, 2, …, 24)
then the attribute in D corresponding to φ_k can be determined to be
Figure BDA0003137411340000146
From the equation:
Figure BDA0003137411340000147
m can be calculated; combined with:
m = f_1(v_jm), (m = 1, 2, …, m)
the value v_jm corresponding to the attribute can be obtained.
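To make the discrete case concrete, the sketch below encodes a column with f_1 and f_2 as the document describes them (consecutive integers starting from 1, then a row of m-1 columns that is all zeros for code 1) and then decodes an encoded row back to the original value. The ordering of distinct values by first appearance is an assumption, as are the names `build_remap`, `raise_dimension`, and `recover_value`. The sketch only decodes an exact 0/1 row; how a column is picked out of a cluster center is defined by the figures above and is not reproduced.

```python
def build_remap(column):
    """f_1: map each distinct value to a consecutive integer starting from 1."""
    mapping = {}
    for value in column:
        if value not in mapping:
            # Assumption: codes follow the order of first appearance.
            mapping[value] = len(mapping) + 1
    return mapping


def raise_dimension(code, num_values):
    """f_2: encode a code as a row of num_values - 1 columns.

    Code 1 becomes all zeros; code c > 1 sets the (c-1)-th column to 1.
    """
    row = [0] * (num_values - 1)
    if code > 1:
        row[code - 2] = 1  # the (c-1)-th column, counted from 1
    return row


def recover_value(row, mapping):
    """Reverse f_2 and then f_1 to get the original attribute value."""
    code = row.index(1) + 2 if 1 in row else 1
    inverse = {code_: value for value, code_ in mapping.items()}
    return inverse[code]
```

Dropping one column relative to a full one-hot encoding keeps the encoded columns from being linearly dependent, which is one common reason for an m-1 column layout.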
For continuous attributes, the standardization is reversed. The specific process is as follows:
If
Figure BDA0003137411340000151
then φ_k corresponds to the continuous attribute
Figure BDA0003137411340000152
Further, take the value in column k of
Figure BDA0003137411340000153
and, combining the standardization formula:
Figure BDA0003137411340000154
with the recorded E_max, E_min, Max, and min, compute the value e_i of the attribute.
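As a hedged illustration of the continuous case, the sketch below inverts a min-max scaling. The standardization formula is in a figure that is not reproduced here, so the linear form used is an assumption; it treats E_min and E_max as the recorded extremes of the original data and min and Max as the bounds of the scaled range. The function name `inverse_min_max` is hypothetical.

```python
def inverse_min_max(scaled, e_min, e_max, lo=0.0, hi=1.0):
    """Recover the original value from a value scaled onto [lo, hi].

    Assumes the forward scaling was
    scaled = lo + (e - e_min) / (e_max - e_min) * (hi - lo).
    """
    if hi == lo:
        raise ValueError("target range must have non-zero width")
    return e_min + (scaled - lo) / (hi - lo) * (e_max - e_min)
```

For example, a scaled cluster-center value of 0.5 with recorded extremes 10 and 30 maps back to 20 under this assumed formula.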
The influence factors (attributes) and corresponding values screened from the discrete attributes and the continuous attributes are combined and denoted:
Figure BDA0003137411340000155
S35: sort the results.
For each
Figure BDA0003137411340000156
(i = 1, 2, …, m) obtained by the labeling, repeat steps S33, S34, and S35, merge the screened factors, and count how many times each factor occurs across the iterations.
Set an occurrence-count threshold τ (τ ∈ [1, m]), filter the merged factors so that only those occurring more than τ times are kept, and sort them by occurrence count from high to low. The merged, filtered, and sorted influence factors and their values are recorded as the set
Figure BDA0003137411340000157
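The merge, filter, and sort step can be sketched with a counter. The sketch assumes each failure-class center contributes one list of screened factors, represented here as (attribute, value) pairs; the function name `rank_factors` and that representation are hypothetical.

```python
from collections import Counter


def rank_factors(factor_lists, tau=1):
    """Merge per-center factor lists, keep factors seen more than tau times,
    and sort them by occurrence count from high to low."""
    counts = Counter()
    for factors in factor_lists:
        # Count each factor at most once per failure-class center.
        counts.update(set(factors))
    kept = [(factor, n) for factor, n in counts.items() if n > tau]
    kept.sort(key=lambda item: -item[1])
    return kept
```

Note the strict comparison: a factor that occurs exactly τ times is dropped, matching "larger than τ" in the text.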
S4: select a date sequence, execute steps S1-S3 for each date in the sequence, count the occurrences of each influence factor, compare each count with a set threshold, and divide the influence factors into long-term factors and short-term factors of the automatic meter reading success rate.
Based on time sequence analysis, the results of factor screening are classified into long-term factors and short-term fluctuation factors, and the analysis steps are as follows:
Select a date sequence:
T = [t_1 t_2 … t_j]
For each t_j in the sequence T, repeat the analysis of S1-S3 and merge the screened influence factors and their value sets
Figure BDA0003137411340000161
to obtain
Figure BDA0003137411340000162
while counting the number of occurrences k_n of each factor.
Set an occurrence-count threshold σ (σ ∈ [1, j]) and compare the occurrence count k_n of each factor in the set K with σ: all factors with k_n ≥ σ, together with their values, form the set K_1; all factors with k_n < σ, together with their values, form the set K_2.
The elements of K_1 are long-term factors influencing the automatic meter reading success rate; the elements of K_2 are short-term factors influencing it. If K_1 and/or K_2 is empty, no corresponding factor influencing the automatic meter reading success rate has been found.
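A short sketch of the split into K_1 and K_2, assuming the per-date results of S1-S3 are available as a mapping from date to the list of factors screened on that date. The function name `split_factors` and the data layout are hypothetical.

```python
from collections import Counter


def split_factors(factors_by_date, sigma):
    """Split factors into long-term (k_n >= sigma) and short-term (k_n < sigma).

    factors_by_date maps each date in the sequence T to the factors
    screened on that date; k_n is the number of dates a factor appears on.
    """
    counts = Counter()
    for factors in factors_by_date.values():
        counts.update(set(factors))  # one count per date
    long_term = {f: k for f, k in counts.items() if k >= sigma}
    short_term = {f: k for f, k in counts.items() if k < sigma}
    return long_term, short_term
```

Lengthening the date sequence gives each factor more chances to recur, so the same σ separates persistent factors from transient ones more reliably.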
According to the embodiment of the invention, by utilizing a big data clustering algorithm and a time sequence-based analysis method, the influence of multiple factors on an automatic meter reading result can be simultaneously analyzed, and long-term factors and short-term factors (generally temporary factors causing fluctuation of the automatic meter reading success rate) influencing the automatic meter reading success rate can be found out by prolonging the time sequence of analysis, so that reference is provided for improving the automatic meter reading success rate; and the analysis is carried out by a big data algorithm, so that the labor, material and time costs are greatly saved.
The above description is only for the purpose of illustrating the preferred embodiments of the present invention and is not to be construed as limiting the invention, and any modifications, equivalents and improvements made within the spirit and principle of the present invention are intended to be included within the scope of the present invention.

Claims (10)

1. The automatic meter reading success rate influence factor analysis method based on clustering and time sequence analysis is characterized by comprising the following operations:
S1, acquiring the transformer-area data relevant to automatic meter reading in the region to be analyzed to form a matrix D;
S2, dividing the relevant data of the transformer area into discrete feature data, continuous feature data, and result identification data, preprocessing each separately, and combining the feature data into a matrix:
Figure FDA0003137411330000011
Figure FDA0003137411330000012
wherein I is the column vector formed by the electric energy meter IDs, L_E is the column vector formed by the quality codes of the measuring points' daily frozen electric energy reading meter reading time, and the remaining items are the column vectors formed by each of the 36 data items;
s3, performing K-means clustering on the samples based on the matrix D', marking clustering results by using result identification to obtain successful cluster centers and failed cluster centers, calculating attribute differences from each failed cluster center to adjacent successful cluster centers and adjacent failed cluster centers, screening and distinguishing influence factors of the successful clusters and the failed clusters according to the attribute differences, obtaining attributes and values of the influence factors according to a reverse mapping rule, counting the screened influence factors and sequencing according to occurrence times;
s4 selects a date sequence, executes steps S1-S3 for each date in the sequence, counts the occurrence frequency of each influence factor, compares the occurrence frequency with a set threshold value, and divides the influence factors into a long-term factor of the automatic meter reading success rate and a short-term factor of the automatic meter reading success rate.
2. The method for analyzing influence factors of automatic meter reading success rate based on clustering and time sequence analysis according to claim 1, wherein the relevant data of the distribution room comprises archive data, operation data, and geographic data and meteorological data of the distribution room.
3. The method for analyzing the influence factors of the automatic meter reading success rate based on the clustering and the time sequence analysis as claimed in claim 1, wherein the preprocessing process of the discrete feature data comprises value duplication removal and remapping, feature dimension increasing and value influence degree difference elimination;
the value deduplication and remapping are specifically:
for each discrete feature X_k (k = 1, 2, …, 24), the values in the column vector are deduplicated, and the deduplicated values are denoted:
V_k = [v_k1 v_k2 … v_km] (m > 0, m being the number of distinct values in the column vector X_k)
the various coded values are remapped to consecutive integers starting from 1;
that is, when m ≥ 2, based on
m = f_1(v_km), (m = 1, 2, …, m)
a mapping from v_km to m is established, and each value v_km of x_kn in X_k (k = 1, 2, …, 24) is replaced with m; the result is denoted X'_k (k = 1, 2, …, 24);
the specific process of feature dimension raising and elimination of differences in value influence is:
X'_k (k = 1, 2, …, 24) obtained in the previous step is, based on
E_{m-1} = f_2(m)
converted into a matrix of m-1 columns, denoted
Figure FDA0003137411330000021
wherein E_{m-1} is a row vector of m-1 columns whose (m-1)-th column is 1 and whose remaining columns are 0 (all columns are 0 when m = 1), and m equals the number of columns of V_k in the previous step.
4. The method for analyzing influence factors of automatic meter reading success rate based on clustering and time sequence analysis according to claim 1, wherein the marking of clustering results by using result identification to obtain successful cluster centers and failed cluster centers specifically comprises:
the clustering yields m classes, denoted C_i (i = 1, 2, …, m), with cluster centers denoted u_c(i) (i = 1, 2, …, m);
using I_i as the index, L_E in D' is associated with C_i, and the proportion of automatic meter reading failures in C_i is computed and denoted:
Figure FDA0003137411330000022
the proportion of automatic meter reading failures in D' is computed and denoted:
Figure FDA0003137411330000023
a multiplier θ (θ ∈ [1, 3]) is set and each C_i (i = 1, 2, …, m) is labeled; if:
r_i · θ ≤ r_avg
then C_i is labeled as a success class, denoted
Figure FDA0003137411330000024
with its cluster center denoted
Figure FDA0003137411330000025
Figure FDA0003137411330000026
otherwise, C_i is labeled as a failure class, denoted
Figure FDA0003137411330000027
with its cluster center denoted
Figure FDA0003137411330000028
5. The method for analyzing influence factors of automatic meter reading success rate based on clustering and time sequence analysis according to claim 1, wherein the calculating of the attribute difference from each failed clustering center to the adjacent successful clustering center and the adjacent failed clustering center specifically comprises:
for a given failure-class center
Figure FDA0003137411330000031
computing its Euclidean distance to every
Figure FDA0003137411330000032
and every
Figure FDA0003137411330000033
cluster center other than itself, denoted Δ_j (j = 1, 2, …, m-1);
determining the maximum and minimum of Δ_j (j = 1, 2, …, m-1), denoted Δ_max and Δ_min respectively;
setting a parameter λ (λ ∈ [1, 5]) and finding the Δ_j (j = 1, 2, …, m-1) that satisfy the condition:
Figure FDA0003137411330000034
the corresponding success cluster centers being recorded as the neighboring success cluster centers
Figure FDA0003137411330000035
and the corresponding failure cluster centers being recorded as the neighboring failure cluster centers
Figure FDA0003137411330000036
Figure FDA0003137411330000037
for the selected
Figure FDA0003137411330000038
computing the attribute difference between it and each
Figure FDA0003137411330000039
denoted
Figure FDA00031374113300000310
Figure FDA00031374113300000311
wherein j = 1, 2, …, m-n-1 and k = 1, 2, …, n_F-1;
for the selected
Figure FDA00031374113300000312
computing the attribute difference between it and each
Figure FDA00031374113300000313
denoted
Figure FDA00031374113300000314
Figure FDA00031374113300000315
wherein j = 1, 2, …, n and k = 1, 2, …, n_F-1.
6. The method for analyzing the influence factors of the automatic meter reading success rate based on the clustering and the time sequence analysis according to claim 5, wherein the screening of the influence factors for distinguishing the success class from the failure class according to the attribute difference specifically comprises:
setting an attribute difference threshold γ (γ ∈ (0, 1)) and counting, in
Figure FDA00031374113300000316
the attributes that satisfy
Figure FDA0003137411330000041
the result being denoted:
Figure FDA0003137411330000042
likewise counting, in
Figure FDA0003137411330000043
the attributes that satisfy:
Figure FDA0003137411330000044
the result being denoted:
Figure FDA0003137411330000045
according to the rule:
Figure FDA0003137411330000046
computing φ_k for each given column, the final result being summarized as:
Φ_i = [φ_1 φ_2 … φ_k], (k = 1, 2, …, n_F-1)
traversing Φ_i and, if φ_k > 0, selecting the attribute corresponding to φ_k as an influence factor.
7. The method for analyzing the influence factors of the automatic meter reading success rate based on the clustering and the time sequence analysis according to claim 1, wherein the obtaining of the attributes and the values of the influence factors according to the inverse mapping rule specifically comprises:
for discrete attributes, reversing the feature dimension-raising function and the remapping;
for continuous attributes, reversing the standardization;
and combining the attributes and values of the influence factors screened from the discrete attributes and the continuous attributes.
8. The method of claim 7, wherein the reversely executing the feature dimension-increasing function and the reversely remapping are specifically:
if
Figure FDA0003137411330000051
then φ_k corresponds to a discrete attribute; further, if
Figure FDA0003137411330000052
then the attribute in D corresponding to φ_k can be determined to be
Figure FDA0003137411330000053
from the equation:
Figure FDA0003137411330000054
m can be calculated; combined with:
m = f_1(v_jm), (m = 1, 2, …, m)
the value v_jm corresponding to the attribute can be obtained.
9. The method for analyzing influence factors of automatic meter reading success rate based on clustering and time sequence analysis according to claim 7, wherein the reverse standardization process specifically comprises:
if
Figure FDA0003137411330000055
then φ_k corresponds to the continuous attribute
Figure FDA0003137411330000056
further, taking the value in column k of
Figure FDA0003137411330000057
and, combining the standardization formula:
Figure FDA0003137411330000058
with the recorded E_max, E_min, Max, and min, computing the value e_i of the attribute.
10. The method for analyzing influence factors of automatic meter reading success rate based on clustering and time sequence analysis according to claim 1, wherein the step S4 specifically comprises:
selecting a date sequence:
T = [t_1 t_2 … t_j]
for each t_j in the sequence T, repeating the analysis of S1-S3 and merging the screened influence factors and their value sets
Figure FDA0003137411330000059
to obtain
Figure FDA00031374113300000510
while counting the number of occurrences k_n of each factor;
setting an occurrence-count threshold σ (σ ∈ [1, j]) and comparing the occurrence count k_n of each factor in the set K with σ, all factors with k_n ≥ σ, together with their values, forming the set K_1, and all factors with k_n < σ, together with their values, forming the set K_2;
the elements of K_1 being long-term factors influencing the automatic meter reading success rate, and the elements of K_2 being short-term factors influencing the automatic meter reading success rate.
CN202110723230.0A 2021-06-29 2021-06-29 Automatic meter reading success rate influence factor analysis method based on clustering and time sequence analysis Pending CN113344742A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110723230.0A CN113344742A (en) 2021-06-29 2021-06-29 Automatic meter reading success rate influence factor analysis method based on clustering and time sequence analysis

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202110723230.0A CN113344742A (en) 2021-06-29 2021-06-29 Automatic meter reading success rate influence factor analysis method based on clustering and time sequence analysis

Publications (1)

Publication Number Publication Date
CN113344742A true CN113344742A (en) 2021-09-03

Family

ID=77481173

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110723230.0A Pending CN113344742A (en) 2021-06-29 2021-06-29 Automatic meter reading success rate influence factor analysis method based on clustering and time sequence analysis

Country Status (1)

Country Link
CN (1) CN113344742A (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116881372A (en) * 2023-09-08 2023-10-13 清华大学 Water meter metering big data optimization processing method and system based on Internet of things
CN116881372B (en) * 2023-09-08 2023-12-05 清华大学 Water meter metering big data optimization processing method and system based on Internet of things

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication

Application publication date: 20210903