CN113344742A - Automatic meter reading success rate influence factor analysis method based on clustering and time sequence analysis - Google Patents
- Publication number
- CN113344742A (application CN202110723230.0A)
- Authority
- CN
- China
- Prior art keywords
- clustering
- meter reading
- automatic meter
- success rate
- data
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06Q—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
- G06Q50/00—Information and communication technology [ICT] specially adapted for implementation of business processes of specific business sectors, e.g. utilities or tourism
- G06Q50/06—Energy or water supply
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/23—Clustering techniques
- G06F18/232—Non-hierarchical techniques
- G06F18/2321—Non-hierarchical techniques using statistics or function optimisation, e.g. modelling of probability density functions
- G06F18/23213—Non-hierarchical techniques using statistics or function optimisation, e.g. modelling of probability density functions with fixed number of clusters, e.g. K-means clustering
Abstract
The invention provides a method, based on clustering and time sequence analysis, for analyzing the factors that influence the automatic meter reading success rate. Using a big-data clustering algorithm together with a time-sequence-based analysis, the method can analyze the influence of multiple factors on the automatic meter reading result at the same time and, by extending the analyzed time sequence, identify the long-term and short-term factors that affect the success rate, thereby providing a reference for improving it. Because the analysis is carried out by big-data algorithms, labor, material and time costs are greatly reduced.
Description
Technical Field
The invention relates to the technical field of power consumption information acquisition of a power system, in particular to an automatic meter reading success rate influence factor analysis method based on clustering and time sequence analysis.
Background
With the development of the smart grid, a wide range of information-based and digital equipment has been deployed in electricity consumption information acquisition systems. User consumption data is now collected automatically, which has greatly improved the efficiency of electricity management. In such a system, the meter reading success rate is the basis and precondition for analyzing and reducing transformer-area line loss and for making fine-grained operating decisions. In practice, however, many causes affect the success rate of automatic meter reading. They include system equipment factors, such as the meter reading communication module, the communication mode and the communication parameters; environmental factors, such as poor GPRS signals caused by interference or obstruction by high mountains, and communication faults caused by extreme environments; and human factors, such as loose wiring during installation, or signal confusion and attenuation caused by a long distance between the equipment installation area and the control area. One or more of these factors can act during data acquisition, leaving the automatic meter reading success rate unsatisfactory and hard to improve. If the factors affecting the success rate can be identified in such a complex environment, targeted remedial strategies can be formulated and the success rate can be improved effectively.
Disclosure of Invention
The invention aims to provide an automatic meter reading success rate influence factor analysis method based on clustering and time sequence analysis. It addresses the problem that the prior art cannot determine the factors influencing the automatic meter reading success rate, so that those factors can be found in a complex environment and the success rate can be improved.
To achieve this technical purpose, the invention provides an automatic meter reading success rate influence factor analysis method based on clustering and time sequence analysis, comprising the following operations:
S1: acquire the transformer-area data relevant to automatic meter reading in the area to be analyzed, forming a matrix D;
S2: divide the transformer-area data into discrete feature data, continuous feature data and result identification data, preprocess each type separately, and combine the preprocessed data into a matrix D′,
where $I$ is the column vector formed by the electric energy meter IDs, $L_E$ is the column vector formed from the daily frozen energy reading time quality codes of the metering points, and the remaining entries are the column vectors formed from the other 36 data items;
S3: perform K-means clustering on the samples based on the matrix D′; label the clustering results with the result identification to obtain success cluster centers and failure cluster centers; calculate the attribute differences from each failure cluster center to its neighboring success cluster centers and neighboring failure cluster centers; screen, according to the attribute differences, the influence factors that distinguish success classes from failure classes; obtain the attributes and values of the influence factors by the inverse mapping rules; and count the screened influence factors and sort them by number of occurrences;
S4: select a date sequence, execute steps S1-S3 for each date in the sequence, count the number of occurrences of each influence factor, compare it with a set threshold, and divide the influence factors into long-term factors and short-term factors of the automatic meter reading success rate.
Preferably, the transformer-area data comprises archive data, operation data, and the geographic data and meteorological data of the transformer area.
Preferably, the preprocessing of the discrete feature data includes value deduplication and remapping, followed by feature dimension raising and elimination of differences in value influence;
the value deduplication and remapping are specifically:
for each discrete feature $X_k$ ($k = 1, 2, \dots, 24$), deduplicate the values in the column vector and record the deduplicated values as:
$V_k = [v_{k1}\ v_{k2}\ \dots\ v_{km}]$, where $m > 0$ is the number of distinct values in the column vector $X_k$;
remap the coded values to consecutive integers starting from 1;
that is, when $m \ge 2$, based on
$m = f_1(v_{km}),\ (m = 1, 2, \dots, m)$
establish the mapping from $v_{km}$ to $m$, replace each value $v_{km}$ of $x_{kn}$ in $X_k$ ($k = 1, 2, \dots, 24$) with $m$, and record the result as $X'_k$ ($k = 1, 2, \dots, 24$);
the feature dimension raising and elimination of differences in value influence are specifically:
for $X'_k$ ($k = 1, 2, \dots, 24$) obtained in the previous step, based on
$E_{m-1} = f_2(m)$
convert it into a matrix of $m-1$ columns, where $E_{m-1}$ is a row vector of $m-1$ columns whose $(m-1)$-th column is 1 and whose remaining columns are 0; when $m = 1$ all columns are 0; the number of columns is determined by the number of columns of $V_k$ in the previous step.
Preferably, labeling the clustering results with the result identification to obtain the success cluster centers and failure cluster centers is specifically:
the clustering yields $m$ classes, denoted $C_i$ ($i = 1, 2, \dots, m$), with cluster centers denoted $u_{c(i)}$ ($i = 1, 2, \dots, m$);
using $I_i$ as the index, associate $L_E$ in D′ with $C_i$ and calculate the proportion of automatic meter reading failures in $C_i$, denoted $r_i$;
calculate the proportion of automatic meter reading failures in D′, denoted $r_{avg}$;
set a multiplier $\theta$ ($\theta \in [1, 3]$) and label each $C_i$ ($i = 1, 2, \dots, m$): if
$r_i \cdot \theta \le r_{avg}$
then $C_i$ is marked as a success class and its cluster center is recorded as a success cluster center; otherwise $C_i$ is marked as a failure class and its cluster center is recorded as a failure cluster center.
Preferably, calculating the attribute differences from each failure cluster center to its neighboring success cluster centers and neighboring failure cluster centers is specifically:
for a given failure cluster center, calculate its Euclidean distance to every other cluster center (success and failure centers alike, excluding itself), denoted $\Delta_j$ ($j = 1, 2, \dots, m-1$);
determine the maximum and minimum of $\Delta_j$ ($j = 1, 2, \dots, m-1$), denoted $\Delta_{max}$ and $\Delta_{min}$ respectively;
set a parameter $\lambda$ ($\lambda \in [1, 5]$) and find the $\Delta_j$ ($j = 1, 2, \dots, m-1$) that satisfy the selection condition set by $\lambda$;
the success cluster centers corresponding to these $\Delta_j$ are recorded as neighboring success cluster centers ($j = 1, 2, \dots, n$), and the failure cluster centers as neighboring failure cluster centers ($j = 1, 2, \dots, m-n-1$);
the attribute differences to the neighboring failure cluster centers are indexed by $(j = 1, 2, \dots, m-n-1),\ (k = 1, 2, \dots, n_F-1)$;
the attribute differences to the neighboring success cluster centers are indexed by $(j = 1, 2, \dots, n),\ (k = 1, 2, \dots, n_F-1)$.
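The distance step above can be sketched as follows. This is only an illustration: the function name `center_distances` and the sample centers are hypothetical, and the selection of neighbors by λ is left out because its condition is not reproduced in this text.

```python
from math import dist

def center_distances(failure_center, other_centers):
    """Euclidean distances from one failure cluster center to all other centers.

    Returns the list of distances (Delta_j) plus Delta_max and Delta_min.
    """
    deltas = [dist(failure_center, center) for center in other_centers]
    return deltas, max(deltas), min(deltas)

# Hypothetical 2-attribute cluster centers, for illustration only.
failure_center = [0.0, 0.0]
other_centers = [[3.0, 4.0], [6.0, 8.0], [0.0, 1.0]]

deltas, delta_max, delta_min = center_distances(failure_center, other_centers)
```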
Preferably, screening the influence factors that distinguish success classes from failure classes according to the attribute differences is specifically:
set an attribute difference threshold $\gamma$ ($\gamma \in (0, 1)$) and count, for each attribute, the attribute differences that satisfy the threshold condition;
for the neighboring failure cluster centers the results are recorded over
$(j = 1, 2, \dots, m-n-1),\ (k = 1, 2, \dots, n_F-1)$;
for the neighboring success cluster centers the results are recorded over
$(j = 1, 2, \dots, n),\ (k = 1, 2, \dots, n_F-1)$;
according to the screening rule,
$\phi_k$ is calculated for each given column, and the final result is summarized as:
$\Phi_i = [\phi_1\ \phi_2\ \dots\ \phi_k],\ (k = 1, 2, \dots, n_F-1)$
traverse $\Phi_i$; if $\phi_k > 0$, the attribute corresponding to $\phi_k$ is screened as an influence factor.
Preferably, the obtaining of the attribute and the value of the impact factor according to the inverse mapping rule specifically includes:
for discrete attributes, apply the inverse of the feature dimension-raising function and the inverse of the remapping;
for continuous attributes, apply inverse normalization;
then combine the attributes and values of the influence factors screened from the discrete and continuous attributes.
Preferably, applying the inverse of the feature dimension-raising function and the inverse of the remapping is specifically:
from the dimension-raising equation
$E_{m-1} = f_2(m)$
$m$ can be calculated; combining this with
$m = f_1(v_{jm}),\ (m = 1, 2, \dots, m)$
the value $v_{jm}$ corresponding to the attribute can be obtained.
Preferably, the inverse normalization is specifically:
using the normalization formula together with the recorded $E_{max}$, $E_{min}$, max and min, the attribute value $e_i$ can be calculated.
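A minimal sketch of the inverse mapping, under stated assumptions. The normalization is assumed to be the usual min-max rescaling to a target range [min, max], and the dimension-raised row is assumed to contain exact 0/1 entries, with position c-1 set for code c and all zeros for code 1. The function names are hypothetical and the patent's own inverse equations may differ.

```python
def inverse_rescale(r, e_min, e_max, lo=0.0, hi=1.0):
    """Undo min-max rescaling (assumed formula); lo/hi are the target-range bounds."""
    if e_max == e_min or hi == lo:
        return e_min  # degenerate case: every original value was identical
    return (r - lo) / (hi - lo) * (e_max - e_min) + e_min

def inverse_dimension(row):
    """Recover the integer code from an (m-1)-column 0/1 row; all zeros means code 1."""
    return row.index(1) + 2 if 1 in row else 1

def inverse_remap(code, mapping):
    """Look up the original value for an integer code; mapping is value -> code."""
    inverse = {c: value for value, c in mapping.items()}
    return inverse[code]

mapping = {"A": 1, "B": 2, "C": 3}            # hypothetical manufacturer codes
manufacturer = inverse_remap(inverse_dimension([0, 1]), mapping)
age_in_months = inverse_rescale(0.75, e_min=6, e_max=30)
```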
Preferably, step S4 is specifically:
select a date sequence:
$T = [t_1\ t_2\ \dots\ t_j]$
for each $t_j$ in the sequence $T$, repeat the analysis of S1-S3 and merge the screened influence factors and their value sets into a set $K$, while counting the number of occurrences $k_n$ of each factor;
set a count threshold $\sigma$ ($\sigma \in [1, j]$) and compare the number of occurrences $k_n$ of each factor in the set $K$ with $\sigma$: all factors with $k_n \ge \sigma$, together with their values, form the set $K_1$; all factors with $k_n < \sigma$, together with their values, form the set $K_2$;
the elements of $K_1$ are the long-term factors affecting the automatic meter reading success rate; the elements of $K_2$ are the short-term factors affecting it.
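The split into long-term and short-term factors can be sketched as below. This is an illustration only: the factor tuples, the dates they stand for and the name `split_factors` are hypothetical, and each factor is counted at most once per date.

```python
from collections import Counter

def split_factors(daily_factors, sigma):
    """Split factors into long-term (k_n >= sigma) and short-term (k_n < sigma).

    daily_factors holds one collection of (attribute, value) pairs per analyzed
    date, i.e. the output of steps S1-S3 for that date.
    """
    counts = Counter(f for day in daily_factors for f in set(day))
    long_term = {f: k for f, k in counts.items() if k >= sigma}
    short_term = {f: k for f, k in counts.items() if k < sigma}
    return long_term, short_term

# Hypothetical screening results for three dates.
daily_factors = [
    [("communication mode", "GPRS"), ("weather", "storm")],
    [("communication mode", "GPRS")],
    [("communication mode", "GPRS"), ("terminal online state", "offline")],
]
k1, k2 = split_factors(daily_factors, sigma=2)
```

With these inputs the communication mode appears on all three dates and lands in `k1`, while the two one-off factors land in `k2`.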
The effects described in this summary are only those of the embodiments, not all effects of the invention. One of the above technical solutions has the following advantages or beneficial effects:
Compared with the prior art, the method uses a big-data clustering algorithm and a time-sequence-based analysis. It can analyze the influence of multiple factors on the automatic meter reading result at the same time and, by extending the analyzed time sequence, identify the long-term factors and the short-term factors (generally temporary factors that cause the success rate to fluctuate) affecting the automatic meter reading success rate, providing a reference for improving it. Because the analysis is carried out by big-data algorithms, labor, material and time costs are greatly reduced.
Drawings
Fig. 1 is a flowchart of an automatic meter reading success rate influence factor analysis method based on clustering and time sequence analysis provided in an embodiment of the present invention.
Detailed Description
In order to clearly explain the technical features of the present invention, the following detailed description of the present invention is provided with reference to the accompanying drawings. The following disclosure provides many different embodiments, or examples, for implementing different features of the invention. To simplify the disclosure of the present invention, the components and arrangements of specific examples are described below. Furthermore, the present invention may repeat reference numerals and/or letters in the various examples. This repetition is for the purpose of simplicity and clarity and does not in itself dictate a relationship between the various embodiments and/or configurations discussed. It should be noted that the components illustrated in the figures are not necessarily drawn to scale. Descriptions of well-known components and processing techniques and procedures are omitted so as to not unnecessarily limit the invention.
The method for analyzing influence factors of automatic meter reading success rate based on clustering and time sequence analysis provided by the embodiment of the invention is described in detail below with reference to the accompanying drawings.
As shown in fig. 1, an embodiment of the present invention discloses an automatic meter reading success rate influence factor analysis method based on clustering and time sequence analysis, including the following steps:
S1: Acquire the transformer-area data relevant to automatic meter reading in the area to be analyzed, forming a matrix D.
The embodiment analyzes existing archive data, operation data, basic geographic data of the transformer area, meteorological data of the transformer area and the like, using clustering-based factor screening and time-sequence-based factor classification, to find the factors that affect the automatic meter reading success rate.
Select the date to be analyzed, denoted t′, and acquire for that date the archive data and operation data of the smart meters and transformer areas under automatic meter reading in the area to be analyzed, together with the geographic data and meteorological data of the transformer areas.
The archive data is mainly acquired from the electricity consumption information acquisition system. It covers the basic archive data of the electric energy meter, the operation terminal, the metering point, the electricity customer and the transformer area, 30 items in total, mainly comprising: the unique identifier of the electric energy meter (usually its ID), electric energy meter communication mode, electric energy meter manufacturer, electric energy meter communication baud rate, electric energy meter installation date, electric energy meter last inspection date, electric energy meter version, electric energy meter type, phase sequence, customer type, terminal manufacturer, terminal acquisition mode, terminal priority power supply mode, terminal communication protocol, terminal commissioning date, terminal latest parameter modification time, metering point wiring mode, metering point metering mode, metering point commissioning date, metering point update time, customer subtype, line ID, electricity customer classification, electricity customer industry classification, electricity customer electricity-use category, electricity customer operating capacity, electricity customer last inspection date, electricity customer urban/rural category, and the public/dedicated transformer identifier of the transformer area.
The operation data is mainly acquired from the electricity consumption information acquisition system. It covers the daily frozen energy readings of the metering points and the operation data of the terminals, 4 items in total: the daily frozen energy reading time quality code of the metering point, the terminal online state, the terminal time synchronization success flag, and the clock difference between the terminal and the system.
The geographic data of the transformer area is mainly the area of the lowest-level administrative region to which the transformer area belongs. It can be obtained from provincial, city or county government websites and counts as 1 item.
The meteorological data of the transformer area is mainly the meteorological data of the administrative region to which the transformer area belongs: weather, wind force and temperature, 3 items in total.
All of the above data is then integrated. Taking the electric energy meter ID as the unique identifier and the electric energy meter as the object of analysis, and using the relationships among the electric energy meter, operation terminal, metering point, transformer area and administrative region, the data is combined into a matrix D of n rows and 38 columns, where n (n > 0) is the number of electric energy meters in the selected analysis area.
Here $I = [i_1\ i_2\ \dots\ i_n]^T$ ($n > 0$) is the column vector formed by the electric energy meter IDs; $X_j = [x_{j1}\ x_{j2}\ \dots\ x_{jn}]^T$ ($n > 0$, $j = 1, 2, \dots, 36$) are the column vectors formed by the other 36 data items, excluding the unique identifier of the electric energy meter and the daily frozen energy reading time quality code of the metering point; and $L = [l_1\ l_2\ \dots\ l_n]^T$ ($n > 0$) is the column vector formed by the daily frozen energy reading time quality codes of the metering points.
S2: Divide the transformer-area data into discrete feature data, continuous feature data and result identification data, preprocess each type separately, and combine the preprocessed data into a matrix D′.
For preprocessing, the data is divided into three types: discrete feature data, continuous feature data, and result identification data.
For discrete feature data, the goal is to remove the differences in how strongly different values of a discrete feature influence the K-means clustering result, while keeping the feature dimension as small as possible. For example, suppose the electric energy meters come from three manufacturers A, B and C, coded as 1, 2 and 3. The pairwise distances between the values 1, 2 and 3 are not equal, so if similarity between samples is judged by spatial distance, three manufacturers that ought to be treated equally influence the clustering result to different degrees. If the manufacturers are instead coded as [0 0], [0 1] and [1 0], the code no longer imposes an order on them: the pairwise Euclidean distances are 1, 1 and √2 rather than 1, 1 and 2, so their influence on the K-means clustering result is far more even.
The discrete-valued features among $X_j$ ($j = 1, 2, \dots, 36$) are processed in this way. There are 24 such items, including the electric energy meter communication mode, electric energy meter manufacturer, electric energy meter communication baud rate, electric energy meter version, electric energy meter type, phase sequence, customer type, terminal manufacturer, terminal acquisition mode, terminal priority power supply mode, terminal communication protocol, metering point wiring mode, metering point metering mode, customer subtype, line ID, electricity customer classification, electricity customer industry classification, electricity customer electricity-use category, electricity customer urban/rural category, public/dedicated transformer identifier of the transformer area, terminal online state, terminal time synchronization success flag and weather. They are denoted $X_k$ ($k = 1, 2, \dots, 24$).
The preprocessing of the discrete feature data comprises two steps: value deduplication and remapping, then feature dimension raising and elimination of differences in value influence.
Value deduplication and remapping proceed as follows:
The values of each discrete feature (for example manufacturer code or type code) are deduplicated. That is, for a given $X_k$ ($k = 1, 2, \dots, 24$), the values in the column vector are deduplicated (null values and invalid values are each treated as a separate valid value), and the deduplicated values are recorded as:
$V_k = [v_{k1}\ v_{k2}\ \dots\ v_{km}]$, where $m > 0$ is the number of distinct values in the column vector $X_k$.
The coded values are then remapped to consecutive integers starting from 1;
that is, when $m \ge 2$, based on
$m = f_1(v_{km}),\ (m = 1, 2, \dots, m)$
the mapping from $v_{km}$ to $m$ is established, each value $v_{km}$ of $x_{kn}$ in $X_k$ ($k = 1, 2, \dots, 24$) is replaced with $m$, and the result is recorded as $X'_k$ ($k = 1, 2, \dots, 24$).
Feature dimension raising and elimination of differences in value influence proceed as follows:
for $X'_k$ ($k = 1, 2, \dots, 24$) obtained in the previous step, based on
$E_{m-1} = f_2(m)$
each feature is converted into a matrix of $m-1$ columns ($k = 1, 2, \dots, 24$), where $E_{m-1}$ is a row vector of $m-1$ columns whose $(m-1)$-th column is 1 and whose remaining columns are 0; when $m = 1$ all columns are 0; the number of columns is determined by the number of columns of $V_k$ in the previous step.
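The two discrete preprocessing steps can be sketched as follows. This is a minimal illustration, not the patented implementation: the function names are hypothetical, codes are assigned in order of first appearance, and the position of the 1 in each encoded row (position c-1 for code c, all zeros for code 1) is an assumed convention.

```python
def remap(column):
    """Deduplicate values and map each distinct value to 1..m.

    Null values (None here) are kept as a separate valid value, as in the text.
    """
    mapping = {}
    for value in column:
        if value not in mapping:
            mapping[value] = len(mapping) + 1
    return [mapping[v] for v in column], mapping

def raise_dimension(code, m):
    """Encode an integer code in 1..m as a row of m-1 zeros and ones."""
    row = [0] * (m - 1)
    if code > 1:
        row[code - 2] = 1  # assumed position convention
    return row

manufacturers = ["A", "B", None, "C", "A"]   # hypothetical meter manufacturers
codes, mapping = remap(manufacturers)
m = len(mapping)
encoded = [raise_dimension(c, m) for c in codes]
```

Using m-1 columns instead of m keeps the feature dimension as small as possible, which is the stated goal of this step.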
For continuous feature data, the main steps are abnormal value handling and standardization, so that the values of all continuous features fall within a fixed range. The continuous-valued features among $X_j$ ($j = 1, 2, \dots, 36$) are processed in this way. There are 12 such items: the electric energy meter installation date, electric energy meter last inspection date, terminal commissioning date, terminal latest parameter modification time, metering point commissioning date, metering point update time, electricity customer operating capacity, electricity customer last inspection date, clock difference between the terminal and the system, area of the lowest-level administrative region of the transformer area, wind force and temperature. They are denoted $X_g$ ($g = 1, 2, \dots, 12$).
The preprocessing of continuous feature data falls into two categories: preprocessing of date/time feature data and preprocessing of other feature data.
The date/time feature data comprises 7 items: the electric energy meter installation date, electric energy meter last inspection date, terminal commissioning date, terminal latest parameter modification time, metering point commissioning date, metering point update time and electricity customer last inspection date. They are denoted $X_g$ ($g = 1, 2, \dots, 7$), and are preprocessed as follows:
Abnormal value handling: for a given $X_g$, null values, invalid values and the like in $X_g$ ($g = 1, 2, \dots, 7$) are all replaced with the minimum value occurring in $X_g$ (for date/time data, the earliest date/time) minus 6 months; the result is recorded as $X'_g$ ($g = 1, 2, \dots, 7$);
Data conversion: the filled value is subtracted from the system's current time t′ and the difference is converted to months; the result is recorded as $X''_g$ ($g = 1, 2, \dots, 7$);
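Standardization: for $X''_g$ ($g = 1, 2, \dots, 7$) obtained in the previous step, apply the formula
$\mathrm{Rescaled}(e_i) = \dfrac{e_i - E_{min}}{E_{max} - E_{min}} \cdot (max - min) + min$
($E_{max}$ is the maximum value of the feature, $E_{min}$ is the minimum value of the feature, and max and min are the bounds of the target range).
When $E_{max}$ and $E_{min}$ are equal:
$\mathrm{Rescaled}(e_i) = 0.5 \cdot (max + min)$
The standardized result is recorded ($g = 1, 2, \dots, 7$), and the $E_{max}$, $E_{min}$, max and min corresponding to $X''_g$ are recorded at the same time for later use.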
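The date/time pipeline above (fill, convert to months, rescale) can be sketched as follows. This is a sketch under assumptions: the rescaling is taken to be the usual min-max formula with a target range of [0, 1], month differences ignore the day of the month, and the helper names and sample dates are hypothetical.

```python
from datetime import date

def months_between(earlier, later):
    """Whole-month difference, ignoring the day of the month."""
    return (later.year - earlier.year) * 12 + (later.month - earlier.month)

def shift_months(d, months):
    """Shift a date by whole months, clamping the day to 28 so it stays valid."""
    total = d.year * 12 + (d.month - 1) + months
    return date(total // 12, total % 12 + 1, min(d.day, 28))

def rescale(values, lo=0.0, hi=1.0):
    """Min-max rescale to [lo, hi]; returns the parameters needed to invert it."""
    e_min, e_max = min(values), max(values)
    if e_max == e_min:
        return [0.5 * (hi + lo)] * len(values), (e_min, e_max, lo, hi)
    scaled = [(e - e_min) / (e_max - e_min) * (hi - lo) + lo for e in values]
    return scaled, (e_min, e_max, lo, hi)

t_prime = date(2021, 6, 1)                                   # analysis date t'
install_dates = [date(2019, 6, 15), None, date(2020, 12, 1)]  # None = null value

# Abnormal value handling: earliest observed date minus 6 months.
earliest = min(d for d in install_dates if d is not None)
filled = [d if d is not None else shift_months(earliest, -6) for d in install_dates]

# Data conversion: age in months relative to t', then standardization.
ages = [months_between(d, t_prime) for d in filled]
scaled, params = rescale(ages)
```

Keeping `params` alongside the scaled values is what later allows the screened factor values to be mapped back to real dates.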
The other feature data comprises 5 items: the electricity customer operating capacity, the clock difference between the terminal and the system, the area of the lowest-level administrative region of the transformer area, wind force and temperature. They are denoted $X_g$ ($g = 8, 9, \dots, 12$), and are preprocessed as follows:
Abnormal value handling: for a given $X_g$, null values, invalid values and the like in $X_g$ ($g = 8, 9, \dots, 12$) are all replaced with the maximum value occurring in $X_g$ plus 1; the result is recorded as $X'_g$ ($g = 8, 9, \dots, 12$);
Standardization: the data is standardized with the same formula used for the date/time data, and the standardized result is recorded ($g = 8, 9, \dots, 12$).
For the result identification data, the daily frozen energy reading time quality code of the metering point must be processed. The quality code is parsed according to its format so that the value is 1 only when automatic meter reading succeeded and 0 in all other cases. The processed result is recorded as $L_E$ and is used to label the clustering results.
The matrix obtained after all data has been preprocessed is recorded as D′.
D′ has n rows (n > 0); its number of columns depends on $v_j$, the number of deduplicated values of each discrete feature $X_j$.
S3, performing K-means clustering on the samples based on the matrix D', marking clustering results by using result identification to obtain successful cluster centers and failed cluster centers, calculating attribute differences from each failed cluster center to adjacent successful cluster centers and adjacent failed cluster centers, screening and distinguishing the influence factors of the successful clusters and the failed clusters according to the attribute differences, obtaining the attributes and values of the influence factors according to a reverse mapping rule, counting the screened influence factors and sequencing according to the occurrence times.
Clustering samples based on the characteristic data obtained by the processing, marking the clustering result by using a result identifier, marking the clustering result as a 'meter reading success class' and a 'meter reading failure class', and further screening and distinguishing factors of 'success' and 'failure' by analyzing the attribute difference between a 'meter reading failure class' clustering center and a 'meter reading success class' clustering center adjacent to the clustering center, wherein the specific process is as follows:
Based on how close or distant the samples' feature data are, K-means clustering divides the samples into m classes, and the overall characteristics of each class are represented by the attributes of its cluster center.
The part of matrix D' other than $L_E$ is extracted and recorded as X.
Set the number of clusters m (m ∈ [15, 30]). The cost function is defined in terms of $u_{c(i)}$, the cluster center nearest to sample $x^{(i)}$.
The K-means algorithm clusters X using I as the sample ID and the remaining columns as features, yielding m classes denoted $C_i$ (i = 1, 2, …, m), with cluster centers denoted $u_{c(i)}$ (i = 1, 2, …, m).
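The clustering step can be illustrated with a minimal sketch of Lloyd's algorithm in plain Python. The function names `sq_dist`, `nearest`, and `kmeans`, the random initialization, and the mean squared distance used as the cost are assumptions made for illustration; the source does not reproduce its cost formula, so this is the usual K-means distortion rather than the patent's exact expression.

```python
import random

def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def nearest(sample, centers):
    """Index of the center closest to the sample (first index wins ties)."""
    return min(range(len(centers)), key=lambda c: sq_dist(sample, centers[c]))

def kmeans(samples, m, n_iter=100, seed=0):
    """Lloyd's algorithm. Returns (labels, centers, cost)."""
    rng = random.Random(seed)
    # Assumption: initial centers are m distinct samples drawn at random.
    centers = [list(s) for s in rng.sample(samples, m)]
    for _ in range(n_iter):
        labels = [nearest(s, centers) for s in samples]
        new_centers = []
        for c in range(m):
            members = [s for s, lab in zip(samples, labels) if lab == c]
            if members:
                # Move the center to the mean of its members.
                new_centers.append([sum(col) / len(members) for col in zip(*members)])
            else:
                # An empty cluster keeps its previous center.
                new_centers.append(centers[c])
        if new_centers == centers:
            break
        centers = new_centers
    # Recompute labels so they always match the returned centers.
    labels = [nearest(s, centers) for s in samples]
    cost = sum(sq_dist(s, centers[lab]) for s, lab in zip(samples, labels)) / len(samples)
    return labels, centers, cost
```

In the patent's setting, `samples` would hold the feature columns of X (without the ID column I), and m would be chosen in [15, 30]; the ID column is only needed afterwards to join each sample's label back to $L_E$.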
Each of the m classes obtained in the previous step is labeled as a 'meter reading failure class' or a 'meter reading success class', as follows:
Using $I_i$ as the index, associate $L_E$ in D' with $C_i$ and compute the proportion of automatic meter reading failures in $C_i$, recorded as $r_i$.
Compute the proportion of automatic meter reading failures in D', recorded as $r_{avg}$.
Set a multiplier θ (θ ∈ [1, 3]) and label each $C_i$ (i = 1, 2, …, m). If
$r_i \cdot \theta \le r_{avg}$
then $C_i$ is labeled an automatic meter reading success class (i = 1, 2, …, m) and its cluster center is recorded as a success-class center; otherwise $C_i$ is labeled an automatic meter reading failure class (i = 1, 2, …, m) and its cluster center is recorded as a failure-class center.
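The labeling rule above can be sketched as follows. The function name `label_clusters` and the dictionary-based inputs are hypothetical; only the comparison $r_i \cdot \theta \le r_{avg}$ comes from the text.

```python
def label_clusters(cluster_of, success_flag, theta=2.0):
    """Label each cluster as 'success' or 'failure'.

    cluster_of:   meter ID -> cluster index (from the clustering step)
    success_flag: meter ID -> 1 if automatic meter reading succeeded, else 0
    theta:        multiplier, expected in [1, 3]
    """
    total = {}
    failed = {}
    for meter_id, c in cluster_of.items():
        total[c] = total.get(c, 0) + 1
        failed[c] = failed.get(c, 0) + (1 - success_flag[meter_id])
    # Overall failure proportion across all samples.
    r_avg = sum(failed.values()) / sum(total.values())
    labels = {}
    for c in total:
        r_i = failed[c] / total[c]  # failure proportion inside cluster c
        labels[c] = "success" if r_i * theta <= r_avg else "failure"
    return labels, r_avg
```

Because θ is at least 1, a cluster counts as a success class only when its failure proportion is no more than $r_{avg} / \theta$, so larger θ values push more clusters into the failure group.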
Attributes with large differences are screened out by analyzing the attribute differences between the failure-class and success-class cluster centers, as follows:
For a given failure-class center, compute its Euclidean distance to every other success-class and failure-class cluster center (i = 1, 2, …, m), denoted $\Delta_j$ (j = 1, 2, …, m-1);
Determine the maximum and minimum of $\Delta_j$ (j = 1, 2, …, m-1), denoted $\Delta_{max}$ and $\Delta_{min}$ respectively;
Set a parameter λ (λ ∈ [1, 5]) and find the $\Delta_j$ (j = 1, 2, …, m-1) that satisfy the λ-dependent neighbor condition. The corresponding success-class centers are recorded as neighboring success cluster centers (j = 1, 2, …, n), and the corresponding failure-class centers as neighboring failure cluster centers (j = 1, 2, …, m-n-1);
The attribute differences are then taken to the neighboring failure cluster centers (j = 1, 2, …, m-n-1; k = 1, 2, …, $n_F$-1)
and to the neighboring success cluster centers (j = 1, 2, …, n; k = 1, 2, …, $n_F$-1).
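The distance step can be sketched as below. The text does not reproduce the λ condition that selects neighboring centers, so the sketch takes the selection rule as a caller-supplied predicate instead of guessing it; `center_distances` and `select_neighbors` are hypothetical names.

```python
import math

def center_distances(centers, given):
    """Euclidean distance from centers[given] to every other center.

    Returns ({other_index: distance}, delta_max, delta_min).
    """
    origin = centers[given]
    deltas = {j: math.dist(origin, c) for j, c in enumerate(centers) if j != given}
    return deltas, max(deltas.values()), min(deltas.values())

def select_neighbors(deltas, is_neighbor):
    """Indices of centers whose distance passes the supplied predicate.

    is_neighbor stands in for the lambda-based condition, which is not
    reproduced in the source text.
    """
    return sorted(j for j, d in deltas.items() if is_neighbor(d))
```

The selected indices can then be split into neighboring success and neighboring failure centers using the cluster labels from the previous step.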
Set an attribute difference threshold γ (γ ∈ (0, 1)) and count, for each attribute, the differences that satisfy the threshold condition.
The results are recorded separately for the neighboring failure cluster centers
(j = 1, 2, …, m-n-1), (k = 1, 2, …, $n_F$-1);
and for the neighboring success cluster centers
(j = 1, 2, …, n), (k = 1, 2, …, $n_F$-1);
According to the screening rule,
compute $\phi_k$ for each column; the final result is summarized as:
$\Phi_i = [\phi_1\ \phi_2\ \dots\ \phi_k]$, (k = 1, 2, …, $n_F$-1)
Traverse $\Phi_i$; if $\phi_k > 0$, the attribute corresponding to $\phi_k$ is screened as an influence factor.
The attribute name and attribute value corresponding to each factor are recovered by inverting the remapping rule, the feature dimension-raising function and its parameters, and the standardization function and its parameters.
Depending on the preprocessing applied, there are two cases:
For discrete attributes, only the feature dimension-raising function and the remapping need to be inverted, as follows:
Solving the dimension-raising equation
yields m; combined with
$m = f_1(v_{jm})$, (m = 1, 2, …, m)
the value $v_{jm}$ corresponding to the attribute is obtained.
For continuous attributes, the standardization is inverted, as follows:
Using the recorded $E_{max}$, $E_{min}$, max, and min, the value $e_i$ of the attribute is obtained by calculation.
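The two inversions can be sketched as follows. Both parts rest on assumptions: the remapping $f_1$ is taken to assign integers in first-seen order, and the standardization is taken to be min-max scaling because the text records a maximum and a minimum. Neither formula is reproduced in the source, and the dimension-raising function $f_2$ is not modeled at all. The names `build_remap` and `minmax_inverse` are hypothetical.

```python
def build_remap(values):
    """Map each distinct value to a consecutive integer starting from 1.

    Returns (forward, inverse); first-seen ordering is an assumption.
    """
    forward = {}
    for v in values:
        if v not in forward:
            forward[v] = len(forward) + 1
    inverse = {m: v for v, m in forward.items()}
    return forward, inverse

def minmax_inverse(e_scaled, e_min, e_max):
    """Undo e_scaled = (e - e_min) / (e_max - e_min).

    Min-max scaling is an assumption, not the patent's stated formula.
    """
    return e_scaled * (e_max - e_min) + e_min
```

Storing `inverse` and the scaling parameters at preprocessing time is what makes the screened factors reportable in their original units and codes.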
The influence factors (attributes) and corresponding values screened from the discrete and continuous attributes are integrated and recorded as a single result set.
S35. Sort the results.
For each failure-class center obtained by the labeling (i = 1, 2, …, m), repeat steps S33, S34, and S35, merge the screened factors, and count how many times each factor occurs across the iterations.
Set an occurrence threshold τ (τ ∈ [1, m]) and filter the merged factors, keeping only those whose occurrence count is greater than τ, sorted from most to least frequent. The merged, filtered, and sorted influence factors and their values are recorded as a set.
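The merge, filter, and sort step can be sketched as follows. The function name `merge_and_rank` is hypothetical, and counting a factor at most once per failure-class center and breaking ties alphabetically are assumptions.

```python
from collections import Counter

def merge_and_rank(factor_lists, tau):
    """Merge per-center factor lists, keep factors seen more than tau times,
    and sort them from most to least frequent.

    factor_lists: one list of factor names per failure-class center
    """
    counts = Counter()
    for factors in factor_lists:
        counts.update(set(factors))  # count each factor once per center
    kept = [(name, n) for name, n in counts.items() if n > tau]
    kept.sort(key=lambda item: (-item[1], item[0]))  # ties broken by name
    return kept
```

For example, with three centers yielding `["a", "b"]`, `["a", "c"]`, and `["a", "b"]` and τ = 1, factor `a` (3 occurrences) and factor `b` (2 occurrences) are kept, while `c` is dropped.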
S4. Select a date sequence, execute steps S1-S3 for each date in the sequence, count the number of occurrences of each influence factor, compare it with a set threshold, and divide the influence factors into long-term factors and short-term factors of the automatic meter reading success rate.
Based on time sequence analysis, the screened factors are classified into long-term factors and short-term fluctuation factors. The analysis steps are as follows:
Select a date sequence:
$T = [t_1\ t_2\ \dots\ t_j]$
For each $t_j$ in the sequence T, repeat the analysis of S1-S3 and merge the screened influence factors and their value sets to obtain the set K, while counting the number of occurrences $k_n$ of each factor.
Set an occurrence threshold σ (σ ∈ [1, j]) and compare the occurrence count $k_n$ of each factor in K with σ. All factors with $k_n \ge \sigma$, together with their values, form the set $K_1$; all factors with $k_n < \sigma$, together with their values, form the set $K_2$.
The elements of $K_1$ are long-term factors influencing the automatic meter reading success rate; the elements of $K_2$ are short-term factors influencing the automatic meter reading success rate. If $K_1$ and/or $K_2$ is empty, no factor of the corresponding kind has been found.
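The long-term and short-term split can be sketched as follows. The function name `split_factors`, the date keys, and the factor names in the usage note are hypothetical; counting a factor once per date is an assumption.

```python
from collections import Counter

def split_factors(factors_by_date, sigma):
    """Split factors into long-term (K1) and short-term (K2) sets.

    factors_by_date: date -> list of factor names screened on that date
    sigma:           occurrence threshold, expected in [1, number of dates]
    """
    counts = Counter()
    for factors in factors_by_date.values():
        counts.update(set(factors))  # count each factor once per date
    k1 = {name: n for name, n in counts.items() if n >= sigma}
    k2 = {name: n for name, n in counts.items() if n < sigma}
    return k1, k2
```

With three dates on which the screened factors are `["weather", "firmware"]`, `["firmware"]`, and `["firmware", "signal"]`, and σ = 2, `firmware` lands in $K_1$ and the other two land in $K_2$.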
According to the embodiment of the invention, a big data clustering algorithm combined with time-sequence-based analysis can analyze the influence of multiple factors on automatic meter reading results simultaneously. By extending the analyzed time sequence, both long-term factors and short-term factors (generally temporary factors that cause the automatic meter reading success rate to fluctuate) can be identified, providing a reference for improving the success rate. Because the analysis is performed algorithmically, it greatly reduces labor, material, and time costs.
The above description is only for the purpose of illustrating the preferred embodiments of the present invention and is not to be construed as limiting the invention, and any modifications, equivalents and improvements made within the spirit and principle of the present invention are intended to be included within the scope of the present invention.
Claims (10)
1. An automatic meter reading success rate influence factor analysis method based on clustering and time sequence analysis, characterized by comprising the following operations:
S1. acquiring the transformer-area data related to automatic meter reading in the area to be analyzed to form a matrix D;
S2. dividing the transformer-area data into discrete feature data, continuous feature data, and result identification data, preprocessing each separately, and combining the feature data into a matrix D',
wherein I is the column vector formed by the electric energy meter IDs, $L_E$ is the column vector formed by the measuring-point daily frozen energy reading-time quality codes, and the remaining columns are the column vectors formed by each of the 36 data items;
S3. performing K-means clustering on the samples based on the matrix D'; labeling the clustering results with the result identification to obtain success-class and failure-class cluster centers; computing the attribute differences from each failure-class cluster center to its neighboring success-class and failure-class cluster centers; screening out, from those differences, the influence factors that distinguish the success classes from the failure classes; recovering the attributes and values of the influence factors through the inverse mapping rules; and counting the screened influence factors and sorting them by number of occurrences;
S4. selecting a date sequence, executing steps S1-S3 for each date in the sequence, counting the number of occurrences of each influence factor, comparing it with a set threshold, and dividing the influence factors into long-term factors and short-term factors of the automatic meter reading success rate.
2. The method for analyzing influence factors of automatic meter reading success rate based on clustering and time sequence analysis according to claim 1, wherein the transformer-area data comprise archive data, operation data, and the geographic data and meteorological data of the transformer area.
3. The method for analyzing influence factors of automatic meter reading success rate based on clustering and time sequence analysis according to claim 1, wherein the preprocessing of the discrete feature data comprises value deduplication and remapping, feature dimension raising, and elimination of differences in value influence;
the value deduplication and remapping specifically comprises:
for each discrete feature $X_k$ (k = 1, 2, …, 24), deduplicating the values in the column vector and recording the deduplicated values as:
$V_k = [v_{k1}\ v_{k2}\ \dots\ v_{km}]$ (m > 0, m being the number of distinct values in column vector $X_k$)
remapping the various encoded values to consecutive integers starting from 1;
that is, when m ≥ 2, based on
$m = f_1(v_{km})$, (m = 1, 2, …, m)
establishing the mapping from $v_{km}$ to m, replacing each value $v_{km}$ of $x_{kn}$ in $X_k$ (k = 1, 2, …, 24) with m, and recording the result as $X'_k$ (k = 1, 2, …, 24);
the feature dimension raising and elimination of differences in value influence specifically comprises:
for $X'_k$ (k = 1, 2, …, 24) obtained in the previous step, based on
$E_{m-1} = f_2(m)$
4. The method for analyzing influence factors of automatic meter reading success rate based on clustering and time sequence analysis according to claim 1, wherein labeling the clustering results with the result identification to obtain success-class and failure-class cluster centers specifically comprises:
the clustering yields m classes, denoted $C_i$ (i = 1, 2, …, m), with cluster centers denoted $u_{c(i)}$ (i = 1, 2, …, m);
using $I_i$ as the index, associating $L_E$ in D' with $C_i$ and computing the proportion of automatic meter reading failures in $C_i$, recorded as $r_i$;
computing the proportion of automatic meter reading failures in D', recorded as $r_{avg}$;
setting a multiplier θ (θ ∈ [1, 3]) and labeling each $C_i$ (i = 1, 2, …, m), if:
$r_i \cdot \theta \le r_{avg}$
5. The method for analyzing influence factors of automatic meter reading success rate based on clustering and time sequence analysis according to claim 1, wherein computing the attribute differences from each failure-class cluster center to its neighboring success-class and failure-class cluster centers specifically comprises:
for a given failure-class center, computing its Euclidean distance to every other success-class and failure-class cluster center, denoted $\Delta_j$ (j = 1, 2, …, m-1);
determining the maximum and minimum of $\Delta_j$ (j = 1, 2, …, m-1), denoted $\Delta_{max}$ and $\Delta_{min}$ respectively;
setting a parameter λ (λ ∈ [1, 5]) and finding the $\Delta_j$ (j = 1, 2, …, m-1) that satisfy the λ-dependent neighbor condition, the corresponding success-class centers being recorded as neighboring success cluster centers and the corresponding failure-class centers as neighboring failure cluster centers;
taking the attribute differences to the neighboring failure cluster centers (j = 1, 2, …, m-n-1; k = 1, 2, …, $n_F$-1)
and to the neighboring success cluster centers (j = 1, 2, …, n; k = 1, 2, …, $n_F$-1).
6. The method for analyzing influence factors of automatic meter reading success rate based on clustering and time sequence analysis according to claim 5, wherein screening the influence factors that distinguish the success classes from the failure classes according to the attribute differences specifically comprises:
setting an attribute difference threshold γ (γ ∈ (0, 1)) and counting, for each attribute, the differences that satisfy the threshold condition;
recording the results for the neighboring failure cluster centers;
recording the results for the neighboring success cluster centers;
according to the screening rule,
computing $\phi_k$ for each column, the final result being summarized as:
$\Phi_i = [\phi_1\ \phi_2\ \dots\ \phi_k]$, (k = 1, 2, …, $n_F$-1)
traversing $\Phi_i$ and, if $\phi_k > 0$, screening the attribute corresponding to $\phi_k$ as an influence factor.
7. The method for analyzing influence factors of automatic meter reading success rate based on clustering and time sequence analysis according to claim 1, wherein obtaining the attributes and values of the influence factors according to the inverse mapping rules specifically comprises:
for discrete attributes, inverting the feature dimension-raising function and the remapping;
for continuous attributes, inverting the standardization;
and integrating the attributes and values of the influence factors screened from the discrete and continuous attributes.
8. The method of claim 7, wherein inverting the feature dimension-raising function and the remapping specifically comprises:
solving the dimension-raising equation
for m, and combining the result with:
$m = f_1(v_{jm})$, (m = 1, 2, …, m)
to obtain the value $v_{jm}$ corresponding to the attribute.
9. The method for analyzing influence factors of automatic meter reading success rate based on clustering and time sequence analysis according to claim 7, wherein inverting the standardization specifically comprises:
using the recorded $E_{max}$, $E_{min}$, max, and min to obtain the value $e_i$ of the attribute by calculation.
10. The method for analyzing influence factors of automatic meter reading success rate based on clustering and time sequence analysis according to claim 1, wherein step S4 specifically comprises:
selecting a date sequence:
$T = [t_1\ t_2\ \dots\ t_j]$
for each $t_j$ in the sequence T, repeating the analysis of S1-S3 and merging the screened influence factors and their value sets to obtain the set K, while counting the number of occurrences $k_n$ of each factor;
setting an occurrence threshold σ (σ ∈ [1, j]) and comparing the occurrence count $k_n$ of each factor in K with σ, all factors with $k_n \ge \sigma$ and their values forming the set $K_1$, and all factors with $k_n < \sigma$ and their values forming the set $K_2$;
the elements of $K_1$ being long-term factors influencing the automatic meter reading success rate, and the elements of $K_2$ being short-term factors influencing the automatic meter reading success rate.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202110723230.0A CN113344742A (en) | 2021-06-29 | 2021-06-29 | Automatic meter reading success rate influence factor analysis method based on clustering and time sequence analysis |
Publications (1)
Publication Number | Publication Date |
---|---|
CN113344742A true CN113344742A (en) | 2021-09-03 |
Family
ID=77481173
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202110723230.0A Pending CN113344742A (en) | 2021-06-29 | 2021-06-29 | Automatic meter reading success rate influence factor analysis method based on clustering and time sequence analysis |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN113344742A (en) |
Cited By (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN116881372A (en) * | 2023-09-08 | 2023-10-13 | 清华大学 | Water meter metering big data optimization processing method and system based on Internet of things |
CN116881372B (en) * | 2023-09-08 | 2023-12-05 | 清华大学 | Water meter metering big data optimization processing method and system based on Internet of things |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN110097297B (en) | Multi-dimensional electricity stealing situation intelligent sensing method, system, equipment and medium | |
CN110223196B (en) | Anti-electricity-stealing analysis method based on typical industry feature library and anti-electricity-stealing sample library | |
CN110634080B (en) | Abnormal electricity utilization detection method, device, equipment and computer readable storage medium | |
CN107609783B (en) | Method and system for evaluating comprehensive performance of intelligent electric energy meter based on data mining | |
CN110610121B (en) | Small-scale source load power abnormal data identification and restoration method based on curve clustering | |
CN111160401A (en) | Abnormal electricity utilization judging method based on mean shift and XGboost | |
CN111177216B (en) | Association rule generation method and device for comprehensive energy consumer behavior characteristics | |
CN111177208A (en) | Power consumption abnormity detection method based on big data analysis | |
CN110738232A (en) | grid voltage out-of-limit cause diagnosis method based on data mining technology | |
CN110889441A (en) | Distance and point density based substation equipment data anomaly identification method | |
CN116148753A (en) | Intelligent electric energy meter operation error monitoring system | |
CN115952429A (en) | Self-adaptive DBSCAN abnormal battery identification method based on Euclidean distance without prior weight | |
CN115130578A (en) | Incremental rough clustering-based online evaluation method for state of power distribution equipment | |
CN113344742A (en) | Automatic meter reading success rate influence factor analysis method based on clustering and time sequence analysis | |
CN108596227A (en) | A kind of leading influence factor method for digging of user power utilization behavior | |
CN111126445A (en) | Multi-step aggregation load prediction method for mass data of intelligent electric meter | |
CN111612054B (en) | User electricity stealing behavior identification method based on nonnegative matrix factorization and density clustering | |
CN110781959A (en) | Power customer clustering method based on BIRCH algorithm and random forest algorithm | |
CN111324790A (en) | Load type identification method based on support vector machine classification | |
CN115733258A (en) | Control method of all-indoor intelligent substation system based on Internet of things technology | |
CN115358355A (en) | Method and device for judging main transformer oil temperature gauge and top layer oil temperature abnormity | |
CN111861141B (en) | Power distribution network reliability assessment method based on fuzzy fault rate prediction | |
CN113902485A (en) | Special power user industry identification method, device and equipment | |
CN113869601A (en) | Power consumer load prediction method, device and equipment | |
CN113723671A (en) | Data clustering analysis method based on big data of power utilization condition |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
RJ01 | Rejection of invention patent application after publication | ||
Application publication date: 20210903 |