CN113344742A - Automatic meter reading success rate influence factor analysis method based on clustering and time sequence analysis

Info

Publication number: CN113344742A
Application number: CN202110723230.0A
Authority: CN
Inventors: 李尔园; 宋先慧; 傅洋; 鞠永乾
Original assignee: Integrated Electronic Systems Lab Co Ltd
Current assignee: Integrated Electronic Systems Lab Co Ltd
Priority date: 2021-06-29
Filing date: 2021-06-29
Publication date: 2021-09-03

Abstract

The invention provides an automatic meter reading success rate influence factor analysis method based on clustering and time sequence analysis. By combining a big data clustering algorithm with a time-sequence-based analysis method, the method can analyze the influence of multiple factors on the automatic meter reading result simultaneously and, by extending the analyzed time sequence, can identify the long-term factors and short-term factors that influence the automatic meter reading success rate, thereby providing a reference for improving that rate. Because the analysis is carried out by a big data algorithm, labor, material and time costs are greatly reduced.

Description

Automatic meter reading success rate influence factor analysis method based on clustering and time sequence analysis

Technical Field

The invention relates to the technical field of power consumption information acquisition of a power system, in particular to an automatic meter reading success rate influence factor analysis method based on clustering and time sequence analysis.

Background

With the development of the smart grid, a wide range of informatized and digitalized equipment has been applied to the electricity consumption information acquisition system, enabling automatic acquisition of users' electricity consumption information and greatly improving the efficiency of electricity management. In this system, the meter reading success rate is the basis and precondition for analyzing and managing line loss in the transformer area and for refined operational decisions. In practice, however, many causes affect the automatic meter reading success rate: system equipment factors, such as the meter reading communication module, communication mode and communication parameters; environmental factors, such as poor GPRS signals caused by interference or obstruction by high mountains, and communication abnormalities caused by extreme environments; and human factors, such as loose wiring during installation, or signal confusion and attenuation caused by a long distance between the equipment installation area and the control area. During information acquisition, one or more of these factors can make the automatic meter reading success rate unsatisfactory and difficult to improve. Therefore, if the factors affecting the automatic meter reading success rate can be identified in a complex environment, a targeted remediation strategy can be formulated and the success rate effectively improved.

Disclosure of Invention

The invention aims to provide an automatic meter reading success rate influence factor analysis method based on clustering and time sequence analysis, so as to solve the problem in the prior art that the factors influencing the automatic meter reading success rate cannot be determined, to identify those factors in a complex environment, and to improve the automatic meter reading success rate.

In order to achieve the technical purpose, the invention provides an automatic meter reading success rate influence factor analysis method based on clustering and time sequence analysis, which comprises the following operations:

S1, acquiring the data related to the transformer area for automatic meter reading in the region to be analyzed, to form a matrix D;

S2, dividing the data related to the transformer area into discrete feature data, continuous feature data and result identification data, preprocessing each separately, and combining the preprocessed data into a matrix D',

wherein I is the column vector formed by the electric energy meter IDs, $L^E$ is the column vector formed by the processed daily frozen electric energy reading time and quality code of the metering point, and the remaining columns are the column vectors formed by the other 36 data items respectively;

S3, performing K-means clustering on the samples based on the matrix D', labeling the clustering results with the result identification to obtain success cluster centers and failure cluster centers, calculating the attribute differences from each failure cluster center to its neighboring success cluster centers and neighboring failure cluster centers, screening the influence factors that distinguish the success classes from the failure classes according to the attribute differences, obtaining the attributes and values of the influence factors according to a reverse mapping rule, and counting the screened influence factors and sorting them by number of occurrences;

S4, selecting a date sequence, executing steps S1-S3 for each date in the sequence, counting the number of occurrences of each influence factor, comparing it with a set threshold, and dividing the influence factors into long-term factors and short-term factors of the automatic meter reading success rate.

Preferably, the data related to the distribution area comprises archive data, operation data, and geographic data and meteorological data of the distribution area.

Preferably, the preprocessing process of the discrete feature data includes value de-duplication, remapping, feature dimension raising and value influence degree difference elimination;

the value deduplication and remapping is specifically:

for each discrete feature $X_k$ ($k = 1, 2, \dots, 24$), deduplicating the values in the column vector, and recording the deduplicated values as:

$$V_k = [v_{k1}\ v_{k2}\ \dots\ v_{km}]$$

($m > 0$, where m is the number of distinct values in the column vector $X_k$)

remapping the various encoded values to consecutive integers starting from 1;

i.e., when $m \ge 2$, based on

$$m = f_1(v_{km}),\quad (m = 1, 2, \dots, m)$$

establishing the mapping from $v_{km}$ to m, replacing each value $v_{km}$ of $x_{kn}$ in $X_k$ ($k = 1, 2, \dots, 24$) with m, and recording the result as $X'_k$ ($k = 1, 2, \dots, 24$);

The specific process of feature dimension raising and elimination of value influence-degree differences is as follows:

converting the $X'_k$ ($k = 1, 2, \dots, 24$) obtained in the previous step, based on

$$E_{m-1} = f_2(m)$$

into a matrix of m-1 columns,

wherein $E_{m-1}$ is a row vector of m-1 columns whose (m-1)-th column is 1 and whose remaining columns are 0, all columns being 0 when m = 1, and m is equal to the number of columns of $V_k$ in the previous step.

Preferably, the marking the clustering result by using the result identifier to obtain the successful clustering center and the failed clustering center specifically comprises:

the clustering yields m classes, recorded as $C_i$ ($i = 1, 2, \dots, m$), with cluster centers recorded as $u_{c(i)}$ ($i = 1, 2, \dots, m$);

with $I_i$ as the index, associating $L^E$ in D' with $C_i$, and calculating the proportion of automatic meter reading failures in $C_i$, recorded as $r_i$ ($r_i \in [0,1]$, $i = 1, 2, \dots, m$);

calculating the proportion of automatic meter reading failures in D', recorded as $r_{avg}$;

setting a multiplier $\theta$ ($\theta \in [1,3]$) and labeling each $C_i$ ($i = 1, 2, \dots, m$): if

$$r_i \cdot \theta \le r_{avg}$$

then $C_i$ is marked as a success class and its cluster center is recorded as a success cluster center; otherwise, $C_i$ is marked as a failure class and its cluster center is recorded as a failure cluster center.

Preferably, the calculating the attribute difference from each failure class center to the adjacent success class center and the adjacent failure class center is specifically as follows:

for a given failure cluster center, calculating its Euclidean distance to all the other success cluster centers and failure cluster centers, recorded as $\Delta_j$ ($j = 1, 2, \dots, m-1$);

determining the maximum value and the minimum value of $\Delta_j$ ($j = 1, 2, \dots, m-1$), recorded as $\Delta_{max}$ and $\Delta_{min}$ respectively;

setting a parameter $\lambda$ ($\lambda \in [1,5]$) and finding the $\Delta_j$ ($j = 1, 2, \dots, m-1$) that satisfy the proximity condition; the corresponding success cluster centers are recorded as neighboring success cluster centers ($j = 1, 2, \dots, n$), and the corresponding failure cluster centers are recorded as neighboring failure cluster centers ($j = 1, 2, \dots, m-n-1$);

for the selected failure cluster center, calculating the attribute difference between it and each neighboring failure cluster center, wherein $j = 1, 2, \dots, m-n-1$ and $k = 1, 2, \dots, n_F-1$;

for the selected failure cluster center, calculating the attribute difference between it and each neighboring success cluster center, wherein $j = 1, 2, \dots, n$ and $k = 1, 2, \dots, n_F-1$.

Preferably, the screening of the impact factors for distinguishing the success class from the failure class according to the attribute difference specifically includes:

setting an attribute difference threshold $\gamma$ ($\gamma \in (0,1)$) and, among the attribute differences to the neighboring failure cluster centers, counting each attribute that satisfies the threshold condition ($j = 1, 2, \dots, m-n-1$; $k = 1, 2, \dots, n_F-1$), and recording the results;

in the same way, among the attribute differences to the neighboring success cluster centers, counting each attribute that satisfies the threshold condition ($j = 1, 2, \dots, n$; $k = 1, 2, \dots, n_F-1$), and recording the results;

according to the rule, calculating $\phi_k$ for each given column k ($k = 1, 2, \dots, n_F-1$), and summarizing the final result as:

$$\Phi_i = [\phi_1\ \phi_2\ \dots\ \phi_k],\quad (k = 1, 2, \dots, n_F-1)$$

traversing $\Phi_i$: if $\phi_k > 0$, the attribute corresponding to $\phi_k$ is screened as an influence factor.

Preferably, the obtaining of the attribute and the value of the impact factor according to the inverse mapping rule specifically includes:

for discrete attributes, applying the inverse of the feature dimension-raising function and the inverse of the remapping;

for continuous attributes, performing a reverse normalization process;

and integrating the attributes and values of the influence factors screened out by the discrete attributes and the continuous attributes.

Preferably, applying the inverse of the feature dimension-raising function and the inverse of the remapping is specifically:

if $\phi_k$ is determined to correspond to a discrete attribute and, further, to the i-th discrete feature ($i = 1, 2, \dots, 24$), the attribute in D to which $\phi_k$ corresponds can be determined;

m can then be calculated and, combined with:

$$m = f_1(v_{jm}),\quad (m = 1, 2, \dots, m)$$

the value $v_{jm}$ corresponding to the attribute can be obtained by calculation.

Preferably, the reverse normalization process is specifically:

if $\phi_k$ is determined to correspond to a continuous attribute, the value in column k is taken and, by combining the normalization formula with the recorded $E_{max}$, $E_{min}$, max and min, the value $e_i$ of the attribute can be obtained by calculation.

Preferably, the step S4 is specifically:

selecting a date sequence:

$$T = [t_1\ t_2\ \dots\ t_j]$$

for each $t_j$ in the sequence T, repeating the analysis process of S1-S3, merging the screened influence factors and their value sets to obtain the set K, and simultaneously counting the number of occurrences $k_n$ of each factor;

setting a count threshold $\sigma$ ($\sigma \in [1, j]$) and comparing the number of occurrences $k_n$ of each factor in the set K with $\sigma$: all factors with $k_n \ge \sigma$, together with their values, form the set $K_1$; all factors with $k_n < \sigma$, together with their values, form the set $K_2$;

the elements of $K_1$ are long-term factors influencing the automatic meter reading success rate; the elements of $K_2$ are short-term factors influencing the automatic meter reading success rate.

The effects described in this summary are only the effects of the embodiments, not all of the effects of the invention. One of the above technical solutions has the following advantages or beneficial effects:

compared with the prior art, the method uses a big data clustering algorithm and a time-sequence-based analysis method. It can analyze the influence of multiple factors on the automatic meter reading result simultaneously and, by extending the analyzed time sequence, can identify the long-term factors and the short-term factors (generally temporary factors that cause the automatic meter reading success rate to fluctuate) influencing the automatic meter reading success rate, providing a reference for improving that rate. Because the analysis is carried out by a big data algorithm, labor, material and time costs are greatly reduced.

Drawings

Fig. 1 is a flowchart of an automatic meter reading success rate influence factor analysis method based on clustering and time sequence analysis provided in an embodiment of the present invention.

Detailed Description

In order to clearly explain the technical features of the present invention, the following detailed description of the present invention is provided with reference to the accompanying drawings. The following disclosure provides many different embodiments, or examples, for implementing different features of the invention. To simplify the disclosure of the present invention, the components and arrangements of specific examples are described below. Furthermore, the present invention may repeat reference numerals and/or letters in the various examples. This repetition is for the purpose of simplicity and clarity and does not in itself dictate a relationship between the various embodiments and/or configurations discussed. It should be noted that the components illustrated in the figures are not necessarily drawn to scale. Descriptions of well-known components and processing techniques and procedures are omitted so as to not unnecessarily limit the invention.

The method for analyzing influence factors of automatic meter reading success rate based on clustering and time sequence analysis provided by the embodiment of the invention is described in detail below with reference to the accompanying drawings.

As shown in fig. 1, an embodiment of the present invention discloses an automatic meter reading success rate influence factor analysis method based on clustering and time sequence analysis, including the following steps:

S1, acquiring the data related to the transformer area for automatic meter reading in the region to be analyzed, and forming a matrix D.

The embodiment of the invention analyzes and processes the existing archive data, operation data, basic geographical data of a transformer area, meteorological data of the transformer area and the like by utilizing the factor screening based on clustering and the factor classification based on time sequence analysis, and finds out the factors influencing the success rate of automatic meter reading.

A date to be analyzed is selected and recorded as t'. For that date, the archive data and operation data of the smart meters and transformer areas subject to automatic meter reading in the region to be analyzed are acquired, together with the geographic data and meteorological data of the transformer areas.

The archive data is mainly obtained from the electricity consumption information acquisition system and covers the basic archive data of the electric energy meter, the operation terminal, the metering point, the electricity customer and the transformer area. It mainly comprises the electric energy meter unique identifier (usually the ID), electric energy meter communication mode, electric energy meter manufacturer, electric energy meter communication baud rate, electric energy meter installation date, electric energy meter last inspection date, electric energy meter version, electric energy meter type, phase sequence, client type, terminal manufacturer, terminal acquisition mode, terminal priority power supply mode, terminal communication protocol, terminal commissioning date, terminal latest parameter modification time, metering point wiring mode, metering point metering mode, metering point commissioning date, metering point update time, client sub-type, line ID, electricity customer classification, electricity customer industry classification, electricity customer electricity-use category, electricity customer operating capacity, electricity customer last inspection date, electricity customer urban/rural category, and the public/private transformer identifier of the transformer area, 30 items in total.

The operation data is mainly obtained from the electricity consumption information acquisition system and covers the daily frozen electric energy reading of the metering point and the operation data of the terminal. It mainly comprises 4 items: the daily frozen electric energy reading time and quality code of the metering point, the terminal online state, the terminal time synchronization success identifier, and the clock difference between the terminal and the system.

The geographic data of the transformer area is mainly the area of the lowest-level administrative region to which the transformer area belongs, which can be obtained from provincial, municipal or county government websites, 1 item in total.

The meteorological data of the transformer area is mainly the meteorological data of the administrative region to which the transformer area belongs, comprising weather, wind force and temperature, 3 items in total.

All of the above data are integrated, with the electric energy meter ID as the unique identifier and the electric energy meter as the analysis object. Using the relationships among the electric energy meter, operation terminal, metering point, transformer area and administrative region, the data are integrated into a matrix D with n rows and 38 columns, where n is the number of electric energy meters in the selected analysis region and n > 0:

$$D = [I\ X_1\ X_2\ \dots\ X_{36}\ L]$$

where $I = [i_1\ i_2\ \dots\ i_n]^T$ ($n > 0$) is the column vector formed by the electric energy meter IDs; $X_j = [x_{j1}\ x_{j2}\ \dots\ x_{jn}]^T$ ($n > 0$, $j = 1, 2, \dots, 36$) are the column vectors formed by the other 36 data items, excluding the electric energy meter unique identifier and the daily frozen electric energy reading time and quality code of the metering point; and $L = [l_1\ l_2\ \dots\ l_n]^T$ ($n > 0$) is the column vector formed by the daily frozen electric energy reading time and quality code of the metering point.

S2, dividing the data related to the transformer area into discrete feature data, continuous feature data and result identification data, preprocessing each separately, and combining the preprocessed data into a matrix D'.

The data are preprocessed in three categories: discrete feature data, continuous feature data, and result identification data.

For discrete feature data, the differences in influence that different values of a discrete feature have on the K-means clustering result must be eliminated while keeping the feature dimension as small as possible. For example, suppose the electric energy meters come from three manufacturers A, B and C, coded as 1, 2 and 3 respectively. The distances between the values 1, 2 and 3 differ from pair to pair, so if the spatial distance between samples is used to judge closeness for clustering, the manufacturers A, B and C, which are of equal standing, would influence the clustering result to different degrees. If A, B and C are instead coded as [0 0], [0 1] and [1 0], the codes are much more evenly spaced (each of the other two codes is at distance 1 from the all-zero code), so the manufacturers influence the K-means clustering result to a nearly equal degree.
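The effect of the two codings in the manufacturer example can be checked numerically. The following Python sketch is illustrative only: the helper name `pairwise_distances` is hypothetical, and the Euclidean distance is assumed because it is the distance the later clustering step uses. It shows that the integer codes place A and C twice as far apart as A and B, while under the vector codes A is at distance 1 from both B and C and the B-C distance is √2.

```python
from itertools import combinations
from math import dist

def pairwise_distances(codes):
    """Return {(a, b): Euclidean distance} for every pair of coded values."""
    return {(a, b): dist(codes[a], codes[b]) for a, b in combinations(codes, 2)}

# Integer coding from the example: A=1, B=2, C=3.
integer_codes = {"A": (1,), "B": (2,), "C": (3,)}
# Vector coding from the example: A=[0 0], B=[0 1], C=[1 0].
vector_codes = {"A": (0, 0), "B": (0, 1), "C": (1, 0)}

integer_d = pairwise_distances(integer_codes)
vector_d = pairwise_distances(vector_codes)

for pair in integer_d:
    print(pair, integer_d[pair], round(vector_d[pair], 3))
```

The vector coding removes the artificial ordering among manufacturers, although the spacing is not perfectly uniform.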

Among $X_j$ ($j = 1, 2, \dots, 36$), the discrete-valued features need to be processed. There are 24 such items, including the electric energy meter communication mode, electric energy meter manufacturer, electric energy meter communication baud rate, electric energy meter version, electric energy meter category, phase sequence, client type, terminal manufacturer, terminal acquisition mode, terminal priority power supply mode, terminal communication protocol, metering point wiring mode, metering point metering mode, client sub-type, line ID, electricity customer classification, electricity customer industry classification, electricity customer electricity-use category, electricity customer urban/rural category, public/private transformer identifier of the transformer area, terminal online state, terminal time synchronization success identifier, and weather. They are recorded as $X_k$ ($k = 1, 2, \dots, 24$).

The preprocessing of the discrete feature data comprises: value deduplication and remapping; and feature dimension raising with elimination of value influence-degree differences.

The specific process of the value deduplication and remapping is as follows:

The values of the discrete features (e.g., manufacturer code, type code, etc.) are deduplicated. That is, for a given $X_k$ ($k = 1, 2, \dots, 24$), the values in the column vector are deduplicated (null values and invalid values are also treated as separate valid values), and the deduplicated values are recorded as:

$$V_k = [v_{k1}\ v_{k2}\ \dots\ v_{km}]$$

($m > 0$, where m is the number of distinct values in the column vector $X_k$)

Then the various encoded values are remapped to consecutive integers starting from 1;

i.e., when $m \ge 2$, based on

$$m = f_1(v_{km}),\quad (m = 1, 2, \dots, m)$$

the mapping from $v_{km}$ to m is established, each value $v_{km}$ of $x_{kn}$ in $X_k$ ($k = 1, 2, \dots, 24$) is replaced with m, and the result is recorded as $X'_k$ ($k = 1, 2, \dots, 24$).
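The deduplication and remapping step can be sketched in Python as follows. This is a minimal illustration, not the patented implementation: the function names `build_mapping` and `remap` are hypothetical, and assigning integers in first-seen order is an assumption, since the text only requires consecutive integers starting from 1. Null values are treated as an ordinary distinct value, as described above.

```python
def build_mapping(column):
    """f1: map each distinct value (None counts as a value) to 1..m.

    Integers are assigned in first-seen order (an assumption)."""
    mapping = {}
    for value in column:
        if value not in mapping:
            mapping[value] = len(mapping) + 1
    return mapping

def remap(column, mapping):
    """Replace every value in the column by its integer code."""
    return [mapping[value] for value in column]

# Hypothetical manufacturer column with one missing entry.
manufacturer = ["A", "C", None, "A", "B", "C"]
f1 = build_mapping(manufacturer)      # distinct values V_k -> 1..m
remapped = remap(manufacturer, f1)    # X'_k

print(f1)
print(remapped)
```

Keeping the mapping `f1` is what later allows a screened factor to be traced back to its original value.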

The specific process of feature dimension raising and elimination of value influence-degree differences is as follows:

The $X'_k$ ($k = 1, 2, \dots, 24$) obtained in the previous step is converted, based on

$$E_{m-1} = f_2(m)$$

into a matrix of m-1 columns ($k = 1, 2, \dots, 24$), where $E_{m-1}$ is a row vector of m-1 columns whose (m-1)-th column is 1 and whose remaining columns are 0, all columns being 0 when m = 1, and m is equal to the number of columns of $V_k$ in the previous step.
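The dimension-raising function can be sketched as below. The names `f2` and `encode_column` and the argument split into "mapped value j" and "number of distinct values m" are illustrative assumptions (the text uses m for both). The column carrying the 1 is counted from the right so that the sketch reproduces the A, B, C example ([0 0], [0 1], [1 0]); that ordering is also an assumption and does not affect distances between rows.

```python
def f2(j, m):
    """Encode mapped value j (1..m) as a row of m-1 zeros and ones.

    j == 1 gives all zeros; otherwise the (j-1)-th column, counted
    from the right, is set to 1 (ordering assumed from the example)."""
    row = [0] * (m - 1)
    if j > 1:
        row[(m - 1) - (j - 1)] = 1
    return row

def encode_column(remapped, m):
    """Turn a remapped column X'_k into a matrix with m-1 columns."""
    return [f2(j, m) for j in remapped]

# Three distinct values (A=1, B=2, C=3) expand into two columns.
encoded = encode_column([1, 3, 2], m=3)
print(encoded)
```

A feature with m distinct values therefore contributes m-1 columns to the preprocessed matrix.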

For continuous feature data, outlier processing and normalization are mainly performed so that the values of all continuous features fall within a fixed range. That is, among $X_j$ ($j = 1, 2, \dots, 36$), the continuous-valued features are processed. There are 12 such items: the installation date of the electric energy meter, the last inspection date of the electric energy meter, the terminal commissioning date, the latest parameter modification time of the terminal, the commissioning date of the metering point, the update time of the metering point, the operating capacity of the electricity customer, the last inspection date of the electricity customer, the clock difference between the terminal and the system, the area of the lowest-level administrative region of the transformer area, the wind force and the temperature. They are recorded as $X_g$ ($g = 1, 2, \dots, 12$).

The preprocessing of continuous feature data falls into two categories: preprocessing of date/time feature data and preprocessing of other feature data.

The date/time feature data comprises 7 items: the installation date of the electric energy meter, the last inspection date of the electric energy meter, the commissioning date of the terminal, the latest parameter modification time of the terminal, the commissioning date of the metering point, the update time of the metering point and the last inspection date of the electricity customer. They are recorded as $X_g$ ($g = 1, 2, \dots, 7$), and the preprocessing comprises the following specific steps:

Outlier processing: for a given $X_g$ ($g = 1, 2, \dots, 7$), null values, invalid values and the like are uniformly replaced with the minimum value occurring in $X_g$ (for dates/times, the earliest date/time) minus 6 months; the result is recorded as $X'_g$ ($g = 1, 2, \dots, 7$);

Data conversion: each value after outlier filling is subtracted from the current system time t' and the difference is converted into months; the result is recorded as $X''_g$ ($g = 1, 2, \dots, 7$);
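The outlier filling and month conversion for date features can be sketched as follows. This is a simplified illustration under stated assumptions: the function names are hypothetical, differences are taken at whole-month granularity (the day of month is ignored), missing entries are represented by `None`, and at least one valid date is assumed to exist in the column.

```python
from datetime import date

def month_index(d):
    """Number of months since year 0, ignoring the day of month."""
    return d.year * 12 + d.month

def months_before(dates, t_prime, gap_months=6):
    """Fill missing dates with (earliest valid date - 6 months),
    then return the age of every entry in months relative to t'."""
    valid = [month_index(d) for d in dates if d is not None]
    fill = min(valid) - gap_months
    now = month_index(t_prime)
    return [now - (month_index(d) if d is not None else fill) for d in dates]

# Hypothetical installation dates with one missing entry.
install_dates = [date(2019, 3, 15), None, date(2020, 11, 2)]
ages = months_before(install_dates, t_prime=date(2021, 6, 29))
print(ages)  # [27, 33, 7]
```

Filling with a value earlier than any observed date keeps missing entries distinguishable as the "oldest" group after normalization.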

Normalization: the $X''_g$ ($g = 1, 2, \dots, 7$) obtained in the previous step is normalized based on the formula:

$$Rescaled(e_i) = \frac{e_i - E_{min}}{E_{max} - E_{min}} \cdot (max - min) + min$$

(max = 1.0, min = 0.0, $E_{max}$ is the maximum value of the feature, $E_{min}$ is the minimum value of the feature)

When $E_{max}$ and $E_{min}$ are equal:

$$Rescaled(e_i) = 0.5 \cdot (max + min)$$

The normalized result is recorded ($g = 1, 2, \dots, 7$), and the $E_{max}$, $E_{min}$, max and min corresponding to $X''_g$ are recorded at the same time for later use.
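The normalization step can be sketched in Python as follows. The sketch assumes the usual min-max form, Rescaled(e) = (e - E_min) / (E_max - E_min) * (max - min) + min, which is consistent with the parameters and the degenerate case stated above; the function names `fit_min_max` and `rescale` are illustrative only.

```python
def fit_min_max(values):
    """Return (E_min, E_max) for one feature column; keep these for later."""
    return min(values), max(values)

def rescale(e, e_min, e_max, lo=0.0, hi=1.0):
    """Min-max rescaling to [lo, hi]; constant columns map to the midpoint."""
    if e_max == e_min:
        return 0.5 * (hi + lo)
    return (e - e_min) / (e_max - e_min) * (hi - lo) + lo

# Example: ages in months of a date feature after outlier filling.
ages = [27, 33, 7]
e_min, e_max = fit_min_max(ages)
scaled = [rescale(e, e_min, e_max) for e in ages]
print(scaled)
```

Storing `e_min` and `e_max` per feature is what makes the later inverse normalization possible.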

The other feature data comprises 5 items: the operating capacity of the electricity customer, the clock difference between the terminal and the system, the area of the lowest-level administrative region of the transformer area, the wind force and the temperature. They are recorded as $X_g$ ($g = 8, 9, \dots, 12$), and the preprocessing comprises the following specific steps:

Outlier processing: for a given $X_g$ ($g = 8, 9, \dots, 12$), null values, invalid values and the like are uniformly replaced with the maximum value occurring in $X_g$ plus 1; the result is recorded as $X'_g$ ($g = 8, 9, \dots, 12$);

Normalization: the data are normalized with the same formula as used for the date/time data, and the normalized result is recorded ($g = 8, 9, \dots, 12$).

For the result identification data, the daily frozen electric energy reading time and quality code of the metering point needs to be processed. The quality code is parsed according to its format so that the value is 1 only when automatic meter reading succeeded and 0 otherwise. The processed result is recorded as $L^E$ and is used to label the clustering results.

After all data are preprocessed, the resulting matrix is recorded as D'.

The number of rows of D' is n ($n > 0$), and the number of columns is

$$1 + \sum_{j=1}^{24} (v_j - 1) + 12 + 1$$

where $v_j$ is the number of distinct values of the discrete feature $X_j$.

S3, performing K-means clustering on the samples based on the matrix D', marking clustering results by using result identification to obtain successful cluster centers and failed cluster centers, calculating attribute differences from each failed cluster center to adjacent successful cluster centers and adjacent failed cluster centers, screening and distinguishing the influence factors of the successful clusters and the failed clusters according to the attribute differences, obtaining the attributes and values of the influence factors according to a reverse mapping rule, counting the screened influence factors and sequencing according to the occurrence times.

The samples are clustered based on the feature data obtained above, and the clustering results are labeled with the result identification as "meter reading success classes" and "meter reading failure classes". The factors that distinguish "success" from "failure" are then screened by analyzing the attribute differences between each "meter reading failure class" cluster center and its neighboring "meter reading success class" cluster centers. The specific process is as follows:

Based on the closeness of the sample feature data, the samples are divided into m classes by K-means clustering, and the overall characteristics of each class are represented by the attributes of its cluster center.

The part of matrix D' other than $L^E$ is extracted and recorded as X; X is a matrix of n ($n > 0$) rows and $n_F$ columns;

The number of clusters m is set ($m \in [15, 30]$), and the cost function is based on the distance between each sample $x^{(i)}$ and $u_{c(i)}$, where $u_{c(i)}$ denotes the cluster center nearest to $x^{(i)}$.

The K-means clustering algorithm clusters with I in X as the ID and the remaining columns as features, yielding m classes recorded as $C_i$ ($i = 1, 2, \dots, m$), with cluster centers recorded as $u_{c(i)}$ ($i = 1, 2, \dots, m$).
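For illustration, the clustering step can be sketched with a plain Lloyd-style K-means in pure Python. This is not the implementation used in the embodiment: the function names, the random initialization with a fixed seed, and the sum-of-squared-distances form of `cost` are assumptions, and the toy data uses m = 2 only because it has six samples (the text suggests m between 15 and 30 for real data). The ID column I is assumed to have been set aside so that `points` holds feature columns only.

```python
import random
from math import dist

def kmeans(points, m, iterations=100, seed=0):
    """Plain Lloyd's algorithm; returns (centers, assignment)."""
    rng = random.Random(seed)
    centers = rng.sample(points, m)           # initial centers (assumed strategy)
    assignment = None
    for _ in range(iterations):
        # Assign every sample to its nearest center.
        new_assignment = [min(range(m), key=lambda c: dist(p, centers[c]))
                          for p in points]
        if new_assignment == assignment:      # converged
            break
        assignment = new_assignment
        # Move every center to the mean of its members.
        for c in range(m):
            members = [p for p, a in zip(points, assignment) if a == c]
            if members:                       # keep old center if cluster is empty
                centers[c] = tuple(sum(col) / len(members) for col in zip(*members))
    return centers, assignment

def cost(points, centers, assignment):
    """Sum of squared distances from each sample to its assigned center."""
    return sum(dist(p, centers[a]) ** 2 for p, a in zip(points, assignment))

points = [(0.0, 0.0), (0.1, 0.0), (0.0, 0.1),
          (1.0, 1.0), (0.9, 1.0), (1.0, 0.9)]
centers, assignment = kmeans(points, m=2)
print(assignment, round(cost(points, centers, assignment), 4))
```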

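The clustering results (m classes) obtained in the previous step are each labeled as a "meter reading failure class" or a "meter reading success class". The processing is as follows:

With $I_i$ as the index, $L^E$ in D' is associated with $C_i$, and the proportion of automatic meter reading failures in $C_i$ is calculated and recorded as $r_i$ ($r_i \in [0,1]$, $i = 1, 2, \dots, m$);

The proportion of automatic meter reading failures in D' is calculated and recorded as $r_{avg}$;

A multiplier $\theta$ ($\theta \in [1,3]$) is set and each $C_i$ ($i = 1, 2, \dots, m$) is labeled: if

$$r_i \cdot \theta \le r_{avg}$$

then $C_i$ is marked as an automatic meter reading success class and its cluster center is recorded as a success cluster center; otherwise, $C_i$ is marked as an automatic meter reading failure class and its cluster center is recorded as a failure cluster center.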
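The labeling rule can be sketched as follows, taking r_i as the proportion of failed readings (flag 0) in cluster i and r_avg as the proportion of failed readings over all samples. The function name `label_clusters`, the default θ = 1.5, and the treatment of an empty cluster as having a failure proportion of 0 are assumptions made for the example.

```python
def label_clusters(assignment, success_flags, m, theta=1.5):
    """Label each cluster 'success' if r_i * theta <= r_avg, else 'failure'.

    assignment[s] is the cluster of sample s; success_flags[s] is 1 when
    the automatic reading succeeded and 0 otherwise (the L^E column)."""
    r_avg = sum(1 for f in success_flags if f == 0) / len(success_flags)
    labels = {}
    for c in range(m):
        flags = [f for f, a in zip(success_flags, assignment) if a == c]
        r_c = sum(1 for f in flags if f == 0) / len(flags) if flags else 0.0
        labels[c] = "success" if r_c * theta <= r_avg else "failure"
    return labels, r_avg

assignment = [0, 0, 0, 1, 1, 1]
success_flags = [1, 1, 1, 0, 1, 0]
labels, r_avg = label_clusters(assignment, success_flags, m=2)
print(labels, round(r_avg, 3))
```

A larger θ makes the success label harder to earn, so more clusters are examined as failure classes.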

Screening out the attribute with large difference by analyzing the attribute difference between the clustering center of the meter reading failure class and the clustering center of the meter reading success class, wherein the processing process is as follows:

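For a given failure cluster center, its Euclidean distance to all the other success cluster centers and failure cluster centers is calculated and recorded as $\Delta_j$ ($j = 1, 2, \dots, m-1$);

The maximum value and the minimum value of $\Delta_j$ ($j = 1, 2, \dots, m-1$) are determined and recorded as $\Delta_{max}$ and $\Delta_{min}$ respectively;

A parameter $\lambda$ ($\lambda \in [1,5]$) is set and the $\Delta_j$ that satisfy the proximity condition are found. The corresponding success cluster centers are recorded as neighboring success cluster centers ($j = 1, 2, \dots, n$), and the corresponding failure cluster centers are recorded as neighboring failure cluster centers ($j = 1, 2, \dots, m-n-1$);

For the selected failure cluster center, the attribute difference between it and each neighboring failure cluster center is calculated, where $j = 1, 2, \dots, m-n-1$ and $k = 1, 2, \dots, n_F-1$;

For the selected failure cluster center, the attribute difference between it and each neighboring success cluster center is calculated, where $j = 1, 2, \dots, n$ and $k = 1, 2, \dots, n_F-1$;

An attribute difference threshold $\gamma$ ($\gamma \in (0,1)$) is set. Among the attribute differences to the neighboring failure cluster centers, each attribute that satisfies the threshold condition is counted ($j = 1, 2, \dots, m-n-1$; $k = 1, 2, \dots, n_F-1$) and the results are recorded;

In the same way, among the attribute differences to the neighboring success cluster centers, each attribute that satisfies the threshold condition is counted ($j = 1, 2, \dots, n$; $k = 1, 2, \dots, n_F-1$) and the results are recorded;

According to the rule, $\phi_k$ is calculated for each given column k ($k = 1, 2, \dots, n_F-1$), and the final result is summarized as:

$$\Phi_i = [\phi_1\ \phi_2\ \dots\ \phi_k],\quad (k = 1, 2, \dots, n_F-1)$$

$\Phi_i$ is traversed: if $\phi_k > 0$, the attribute corresponding to $\phi_k$ is screened as an influence factor.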
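The distance and attribute-difference computations in this step can be illustrated with the sketch below. It is deliberately partial and hedged: the exact proximity condition involving λ and the rule that produces φ_k are given as formulas in the original publication and are not reproduced here, so the sketch only assumes that an "attribute difference" is the per-attribute absolute difference between two centers and that an attribute is flagged when that difference exceeds γ. All names and sample values are hypothetical.

```python
from math import dist

def center_distances(failure_center, other_centers):
    """Euclidean distance from one failure center to every other center."""
    return [dist(failure_center, c) for c in other_centers]

def attribute_differences(failure_center, other_center):
    """Per-attribute absolute difference (assumed form of the attribute difference)."""
    return [abs(a - b) for a, b in zip(failure_center, other_center)]

failure = (0.9, 0.1, 0.5)                        # one failure cluster center
others = [(0.1, 0.1, 0.5), (0.8, 0.2, 0.4)]      # remaining cluster centers

deltas = center_distances(failure, others)
delta_min, delta_max = min(deltas), max(deltas)

gamma = 0.5                                      # attribute difference threshold
diffs = attribute_differences(failure, others[0])
flagged = [k for k, d in enumerate(diffs) if d > gamma]
print(deltas, flagged)
```

In this toy case only the first attribute separates the two centers strongly enough to be flagged.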

The attribute name and attribute value corresponding to each factor are then calculated by reversing the remapping rule, the feature dimension-raising function and its parameters, and the normalization function and its parameters.

According to the differences in the data preprocessing steps, two cases are distinguished:

For discrete attributes, only the inverse of the feature dimension-raising function and the inverse of the remapping need to be applied. The specific process is as follows:

If $\phi_k$ is determined to correspond to a discrete attribute and, further, to the i-th discrete feature ($i = 1, 2, \dots, 24$), the attribute in D to which $\phi_k$ corresponds can be determined;

m can then be calculated and, combined with:

$$m = f_1(v_{jm}),\quad (m = 1, 2, \dots, m)$$

the value $v_{jm}$ corresponding to the attribute can be obtained by calculation.
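One way to picture the reverse mapping for discrete attributes is sketched below. It rests on several assumptions that are not stated in the text: the encoded discrete columns are assumed to come first among the feature columns, in feature order; each feature with v distinct values occupies v-1 columns; and the column order inside a block follows the A, B, C example ([0 0], [0 1], [1 0]). The function names and sample data are hypothetical.

```python
def locate_column(k, distinct_counts):
    """Map feature column k (1-based) to (feature index i, mapped value j).

    distinct_counts[i-1] is the number of distinct values of discrete
    feature i. Returns None when k lies beyond the discrete columns."""
    start = 0
    for i, v in enumerate(distinct_counts, start=1):
        width = v - 1                      # columns produced by feature i
        if start < k <= start + width:
            position = k - start           # 1-based column inside the block
            return i, v - position + 1     # undo the assumed column order
        start += width
    return None

def inverse_f1(j, mapping):
    """Recover the original value from its integer code."""
    inverse = {code: value for value, code in mapping.items()}
    return inverse[j]

distinct_counts = [3, 2]                   # feature 1: 3 values, feature 2: 2 values
f1_feature1 = {"A": 1, "B": 2, "C": 3}     # remapping recorded for feature 1

feature, code = locate_column(1, distinct_counts)
value = inverse_f1(code, f1_feature1)
print(feature, code, value)
```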

For continuous attributes, an inverse normalization is performed. The specific process is as follows:

If $\phi_k$ is determined to correspond to a continuous attribute, the value in column k is taken and, by combining the normalization formula with the recorded $E_{max}$, $E_{min}$, max and min, the value $e_i$ of the attribute can be obtained by calculation.
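The inverse normalization can be sketched as follows, assuming the forward step is the usual min-max rescaling to [min, max] = [0.0, 1.0] with recorded E_min and E_max. The function name `inverse_rescale` is illustrative; for a constant feature (E_max equal to E_min) the only possible original value is returned.

```python
def inverse_rescale(s, e_min, e_max, lo=0.0, hi=1.0):
    """Recover the original value e from a rescaled value s in [lo, hi]."""
    if e_max == e_min:
        return e_min                       # constant feature: only one value
    return (s - lo) / (hi - lo) * (e_max - e_min) + e_min

# Example: a center value of 0.5 on a feature whose recorded range is 7..33 months.
months = inverse_rescale(0.5, e_min=7, e_max=33)
print(months)  # 20.0
```

For date/time features the recovered value is an age in months relative to t', so a further subtraction from t' would be needed to express it as a date.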

Integrating the influence factors (attributes) and corresponding values screened out by the discrete attributes and the continuous attributes, and recording as follows:

and S35, sorting the result.

Obtained by making a mark

And (i-1, 2, …, m), repeating the processes of the steps S33, S34 and S35, merging the screened factors, and counting the times of occurrence of the factors in the repeated iteration process.

A threshold τ on the number of occurrences is set (τ ∈ [1, m]), and the merged factors are filtered: only factors whose number of occurrences is greater than τ are kept, and they are sorted by number of occurrences from high to low. The merged, filtered and sorted influence factors and their values are recorded as a set.
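The merge, count, filter and sort of step S35 can be sketched as follows. This is only an illustration under stated assumptions: each factor is represented as an (attribute, value) pair, a factor is counted at most once per iteration, and ties are broken alphabetically. The function name `merge_and_filter` and the sample attribute names are hypothetical.

```python
from collections import Counter


def merge_and_filter(factor_lists, tau):
    """Merge the factors screened in each iteration and keep frequent ones.

    factor_lists: one list of (attribute, value) pairs per iteration.
    tau: occurrence threshold; only factors with count > tau are kept.
    Returns (factor, count) pairs sorted by count, high to low.
    """
    counts = Counter()
    for factors in factor_lists:
        # Assumption: a factor counts once per iteration, even if repeated.
        counts.update(set(factors))
    kept = [(f, c) for f, c in counts.items() if c > tau]
    # Sort by descending count; alphabetical tie-break is an assumption.
    kept.sort(key=lambda fc: (-fc[1], fc[0]))
    return kept


# Hypothetical screening results for three iterations.
per_center = [
    [("comm_mode", "GPRS"), ("signal", "weak")],
    [("comm_mode", "GPRS")],
    [("comm_mode", "GPRS"), ("signal", "weak"), ("vendor", "A")],
]

result = merge_and_filter(per_center, tau=1)
```

With τ = 1 the factor that appears in only one iteration is dropped, and the remaining factors are ordered by how often they were screened out.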

Based on time sequence analysis, the results of factor screening are classified into long-term factors and short-term fluctuation factors, and the analysis steps are as follows:

selecting a date sequence:

T＝[t₁ t₂…t_j]

To obtain

At the same time, the number of occurrences k_n of each factor is counted.

A count threshold σ is set (σ ∈ [1, j]). The number of occurrences k_n of each factor in the set K is compared with σ: all factors with k_n ≥ σ, together with their values, form the set K₁; all factors with k_n < σ, together with their values, form the set K₂.

The elements of K₁ are long-term factors influencing the automatic meter reading success rate; the elements of K₂ are short-term factors influencing the automatic meter reading success rate. If K₁ and/or K₂ is empty, no corresponding factor influencing the automatic meter reading success rate has been found.
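The split into long-term and short-term factors can be sketched in a few lines of Python. The sketch assumes the per-date screening results are already available as lists of (attribute, value) pairs and that a factor is counted once per date. The function name `split_by_persistence`, the dates and the attribute names are hypothetical.

```python
from collections import Counter


def split_by_persistence(daily_factors, sigma):
    """Classify factors by how many dates of the sequence T they appear on.

    daily_factors: dict mapping each date t_j to its screened factors.
    sigma: count threshold; k_n >= sigma means long-term (K1),
           k_n < sigma means short-term (K2).
    Returns two dicts mapping factor -> k_n.
    """
    counts = Counter()
    for factors in daily_factors.values():
        counts.update(set(factors))  # assumption: once per date
    k1 = {f: n for f, n in counts.items() if n >= sigma}
    k2 = {f: n for f, n in counts.items() if n < sigma}
    return k1, k2


# Hypothetical results for a three-day date sequence.
daily = {
    "2021-06-01": [("comm_mode", "GPRS"), ("rainfall", "heavy")],
    "2021-06-02": [("comm_mode", "GPRS")],
    "2021-06-03": [("comm_mode", "GPRS")],
}

long_term, short_term = split_by_persistence(daily, sigma=2)
```

In this example the factor present on all three dates lands in K₁, while the one seen on a single date lands in K₂. Lengthening the date sequence gives the counts k_n more room to separate persistent factors from temporary ones.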

According to the embodiment of the invention, by utilizing a big data clustering algorithm and a time sequence-based analysis method, the influence of multiple factors on an automatic meter reading result can be simultaneously analyzed, and long-term factors and short-term factors (generally temporary factors causing fluctuation of the automatic meter reading success rate) influencing the automatic meter reading success rate can be found out by prolonging the time sequence of analysis, so that reference is provided for improving the automatic meter reading success rate; and the analysis is carried out by a big data algorithm, so that the labor, material and time costs are greatly saved.

The above description is only for the purpose of illustrating the preferred embodiments of the present invention and is not to be construed as limiting the invention, and any modifications, equivalents and improvements made within the spirit and principle of the present invention are intended to be included within the scope of the present invention.

Claims

1. The automatic meter reading success rate influence factor analysis method based on clustering and time sequence analysis is characterized by comprising the following operations:

wherein I is a column vector formed by the electric energy meter IDs, L^E is a column vector formed by the reading time and quality code of the daily frozen electric energy at the measuring point, and the rest are column vectors each formed by one of the 36 data items;

2. The method for analyzing influence factors of automatic meter reading success rate based on clustering and time sequence analysis according to claim 1, wherein the relevant data of the distribution room comprises archive data, operation data, and geographic data and meteorological data of the distribution room.

3. The method for analyzing the influence factors of the automatic meter reading success rate based on the clustering and the time sequence analysis as claimed in claim 1, wherein the preprocessing process of the discrete feature data comprises value duplication removal and remapping, feature dimension increasing and value influence degree difference elimination;

the value deduplication and remapping is specifically:

for each discrete feature X_k (k = 1, 2, …, 24), the values in the column vector are deduplicated, and the deduplicated values are recorded as:

V_k = [v_k1 v_k2 … v_km] (m > 0, where m is the number of values in the column vector X_k after deduplication)

that is, when m ≥ 2, based on

m = f₁(v_km), (m = 1, 2, …, m)

a mapping from v_km to m is established, and each value v_km taken by the elements x_kn of X_k (k = 1, 2, …, 24) is replaced by m; the result after replacement is recorded as X'_k (k = 1, 2, …, 24);

the X'_k (k = 1, 2, …, 24) obtained in the previous step is, based on

E_m-1 = f₂(m)

converted into a matrix of m−1 columns, denoted

4. The method for analyzing influence factors of automatic meter reading success rate based on clustering and time sequence analysis according to claim 1, wherein the marking of clustering results by using result identification to obtain successful cluster centers and failed cluster centers specifically comprises:

the clustering yields m classes, denoted C_i (i = 1, 2, …, m); the cluster centers are denoted u_c(i) (i = 1, 2, …, m);

r_i*θ≤r_avg

then C_i is marked as a success class, denoted

and its cluster center is denoted

otherwise, C_i is marked as a failure class, denoted

and its cluster center is denoted

5. The method for analyzing influence factors of automatic meter reading success rate based on clustering and time sequence analysis according to claim 1, wherein the calculating of the attribute difference from each failed clustering center to the adjacent successful clustering center and the adjacent failed clustering center specifically comprises:

for a given failure class center

its Euclidean distance to all

and

cluster centers other than itself is calculated and denoted δ_j (j = 1, 2, …, m−1);

and a failure cluster center, which is recorded as the neighboring failure cluster center

for the selected

the attribute difference between it and each

is calculated and recorded as

where j = 1, 2, …, m−n−1 and k = 1, 2, …, n_F−1;

for the selected

the attribute difference between it and each

is calculated and recorded as

where j = 1, 2, …, n and k = 1, 2, …, n_F−1.

6. The method for analyzing the influence factors of the automatic meter reading success rate based on the clustering and the time sequence analysis according to claim 5, wherein the screening of the influence factors for distinguishing the success class from the failure class according to the attribute difference specifically comprises:

each of which satisfies

the results are recorded as:

in the same way, count, in

each of the attributes that satisfies:

the results are recorded as:

according to the rule:

φ_k is calculated for each given column k, and the final result is summarized as:

Φ_i = [φ₁ φ₂ … φ_k], (k = 1, 2, …, n_F−1)

Φ_i is traversed, and if φ_k is greater than 0, the attribute corresponding to φ_k is screened out as an influence factor.

7. The method for analyzing the influence factors of the automatic meter reading success rate based on the clustering and the time sequence analysis according to claim 1, wherein the obtaining of the attributes and the values of the influence factors according to the inverse mapping rule specifically comprises:

for continuous attributes, performing a reverse normalization process;

8. The method of claim 7, wherein the reversely executing the feature dimension-increasing function and the reversely remapping are specifically:

if

then φ_k corresponds to a discrete attribute; further, if

then it can be determined that the attribute corresponding to φ_k is in D

by the equation:

m can be calculated; combining this with:

m = f₁(v_jm), (m = 1, 2, …, m)

the value v_jm corresponding to the attribute can be obtained.

9. The method for analyzing influence factors of automatic meter reading success rate based on clustering and time sequence analysis according to claim 7, wherein the reverse standardization process specifically comprises:

if

then φ_k corresponds to a continuous attribute;

further, take, from

the value in column k, and combine it with the normalization formula:

10. The method for analyzing influence factors of automatic meter reading success rate based on clustering and time sequence analysis according to claim 1, wherein the step S4 specifically comprises:

selecting a date sequence:

T＝[t₁ t₂ … t_j]

for each t_j in the sequence T, the analysis process of S1 to S3 is repeated, and the screened influence factors and their value sets are merged

to obtain

at the same time, the number of occurrences k_n of each factor is counted;

a count threshold σ is set (σ ∈ [1, j]), and the number of occurrences k_n of each factor in the set K is compared with σ: all factors with k_n ≥ σ, together with their values, form the set K₁; all factors with k_n < σ, together with their values, form the set K₂;