CN109447490B

CN109447490B - Method for identifying abnormal user-transformer relationships based on user addresses

Info

Publication number: CN109447490B
Application number: CN201811307364.9A
Authority: CN
Inventors: 曾嵘; 邵航建; 傅航天
Original assignee: Hangzhou Zhicheng Electronic Technology Co ltd
Current assignee: Hangzhou Zhicheng Electronic Technology Co ltd
Priority date: 2018-11-05
Filing date: 2018-11-05
Publication date: 2022-05-27
Anticipated expiration: 2038-11-05
Also published as: CN109447490A

Abstract

The invention discloses a method for identifying abnormal user-transformer relationships based on user addresses, comprising the following steps: inputting a user data set T; selecting the valid user data set T; selecting k₁ address codes as large addresses and constructing a k₁-layer decision tree, each layer of which contains k₂ user data sets; traversing every user data set in the k₁-layer decision tree and, if the number of users in a user data set is less than l₁, judging all users in that set to be suspected users, that is, users suspected of having an abnormal user-transformer relationship; processing the remaining T_u user data sets to obtain p cluster sets {Q₁, Q₂, ..., Q_p}; and traversing the p cluster sets Q_k and, if the number of users in a cluster set Q_k is less than l₂, outputting all users in Q_k as suspected users. The invention saves time, labor and capital costs and improves the input-output ratio.

Description

Method for identifying abnormal user-transformer relationships based on user addresses

Technical Field

The invention relates to the technical field of user address information acquisition, and in particular to a method for identifying abnormal user-transformer relationships based on user addresses.

Background

With the rapid development of the power grid, grid management has shifted from a coarse approach to a lean one. In recent years, owing to rapid urban development and legacy problems in grid management, distribution network management still has room for improvement. The user-transformer relationship, that is, the relationship between an electricity consumer and its distribution transformer, has a large influence on distribution network management, affecting many marketing services such as outage notification to households, synchronous line-loss calculation for transformer districts, equipment management and business expansion. Errors in the user-transformer relationship are identified mainly by manual on-site inspection or by installing additional hardware in batches, both of which require a large amount of time and capital. Power supply enterprises have already carried out several rounds of centralized manual investigation, and the accuracy of the user-transformer relationship has reached a fairly high level. Because the remaining users with an abnormal user-transformer relationship are relatively few and their electricity-use behavior is hard to detect, screening them by manpower or additional equipment requires a large amount of time, labor and capital, and the input-output ratio is extremely low.

Disclosure of Invention

To address the shortcomings of the prior art, the invention provides a method for identifying abnormal user-transformer relationships based on user addresses, which saves time, labor and capital costs and improves the input-output ratio.

To achieve this purpose, the invention adopts the following technical scheme: a method for identifying abnormal user-transformer relationships based on user addresses, comprising the following steps:

1) inputting a user set T, wherein the user set T comprises n users {H₁, H₂, ..., H_n} and each user has 7 address codes;

2) selecting the valid user set T by processing each user H_i whose address is missing or abnormal;

3) selecting k₁ address codes as large addresses and constructing a k₁-layer decision tree, where 1 ≤ k₁ ≤ 7; each layer of the decision tree then contains k₂ user sets {T₁, T₂, ..., T_k2}, and each user set T_j contains k₃ users, T_j = {H₁, H₂, ..., H_k3}, where 1 ≤ j ≤ k₂ and 1 ≤ k₃ ≤ n;

4) taking each user set T_j as a node set of the decision tree and defining a node threshold l₁; traversing the number of users k₃ in every node set of the k₁-layer decision tree; if k₃ < l₁, outputting all users in the user set T_j as suspected users and deleting them; otherwise, all users in the user set T_j are normal users and are not processed;

5) T_u user sets remain, in which each user has 7 - k₁ address codes; performing Z-score normalization on these address codes;

6) defining the minimum number of classes as p and the cluster-set threshold as l₂;

7) performing hierarchical clustering analysis using the weighted minimum distance to obtain p cluster sets {Q₁, Q₂, ..., Q_p};

8) traversing the p cluster sets Q_k; if the number of users in a cluster set Q_k is less than l₂, outputting all users in Q_k as suspected users; otherwise, all users in Q_k are normal users and are not processed;

9) combining the results of step 4 and step 8 and outputting all suspected users.
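Steps 3 and 4 can be sketched as follows. This is a minimal illustration under stated assumptions, not the patented implementation: the function name `flag_by_address_tree`, the `large_idx` parameter (positions of the k₁ codes chosen as large addresses) and the dictionary layout are all hypothetical, and each tree node is assumed to be the group of users sharing the same values for the large-address codes down to that layer.

```python
from collections import defaultdict

def flag_by_address_tree(users, large_idx, l1):
    """Flag users that fall into sparsely populated nodes of the address tree.

    users:     dict user_id -> tuple of 7 address codes (hypothetical layout)
    large_idx: positions of the k1 codes used as large addresses, top layer first
    l1:        node threshold; nodes with fewer than l1 users are suspicious
    Returns (suspects, remaining).
    """
    remaining = dict(users)
    suspects = set()
    for depth in range(1, len(large_idx) + 1):
        # One node per distinct combination of the first `depth` large-address codes.
        nodes = defaultdict(list)
        for uid, codes in remaining.items():
            key = tuple(codes[i] for i in large_idx[:depth])
            nodes[key].append(uid)
        # Step 4: a node with fewer than l1 users makes all its users suspects.
        for members in nodes.values():
            if len(members) < l1:
                suspects.update(members)
        # Suspects are deleted before the next layer is examined.
        for uid in suspects:
            remaining.pop(uid, None)
    return suspects, remaining
```

With `large_idx=[0, 1]` the tree has two layers (for example province, then city); passing `[0, 1, 3]` would mirror a province/city/street choice.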

The invention is further configured as follows: in step 1, the 7 address codes are the province code, city code, district code, street code, residents' committee code, road code and residential-community code.

The invention is further configured as follows: in step 2, the valid user set T is selected as follows: if the number of users n is greater than 10000, users with missing address code information or abnormal address codes are removed from the user set T directly; if n is less than 10000, users with abnormal address codes are removed, and a user whose address code information is missing is handled as follows: if the address codes of the two adjacent users are the same, the missing address codes are filled in with those codes; if the address codes of the two adjacent users differ, the address codes of the users in the same meter box or at the same access point are checked: if the meter box or access point contains only this user, the user is removed, and if it also contains other users, the missing address codes are filled in from the users in the same meter box or at the same access point.
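The selection rule of step 2 can be sketched as below. Several points are assumptions made only for illustration: the record layout (`id`, `codes`, `box`), the caller-supplied `is_abnormal` predicate (the text does not define what makes an address abnormal), treating "adjacent users" as the previous and next records in the list, and treating n equal to 10000 as the small case, which the text leaves unspecified.

```python
def select_valid_users(users, is_abnormal, big_n=10000):
    """Return the valid user records after the step-2 cleaning rule.

    users: list of dicts {'id', 'codes', 'box'}; 'codes' is None when missing
           and 'box' identifies the meter box or access point (hypothetical layout).
    """
    def usable(codes):
        return codes is not None and not is_abnormal(codes)

    # Large data set: simply drop every incomplete or abnormal record.
    if len(users) > big_n:
        return [u for u in users if usable(u["codes"])]

    valid = []
    for i, u in enumerate(users):
        if u["codes"] is not None:
            if not is_abnormal(u["codes"]):
                valid.append(u)          # complete and normal: keep as is
            continue                     # abnormal: removed
        prev_c = users[i - 1]["codes"] if i > 0 else None
        next_c = users[i + 1]["codes"] if i + 1 < len(users) else None
        if usable(prev_c) and prev_c == next_c:
            # Both neighbours agree: borrow their address codes.
            valid.append({**u, "codes": prev_c})
            continue
        # Neighbours disagree: look at other users in the same meter box.
        box_codes = [v["codes"] for v in users
                     if v is not u and v["box"] == u["box"] and usable(v["codes"])]
        if box_codes:
            valid.append({**u, "codes": box_codes[0]})
        # Otherwise the user is alone in the box and is removed.
    return valid
```

Taking the first usable address found in the meter box is a simplification; a real data set might need a majority vote when box mates disagree.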

The invention is further configured as follows: in step 3, the number k₂ of user sets contained in each layer of the decision tree is different.

The invention is further configured as follows: in step 5, the Z-score normalization is calculated as:

$$z = \frac{x - \mu}{\sigma}$$

where σ is the standard deviation of the data, μ is the sample mean, and x is a user's address code; normalization converts data of different magnitudes to a common scale.
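A minimal sketch of the Z-score normalization, applied column by column to the remaining address codes. The function name and row layout are illustrative; using the population standard deviation and mapping constant columns to 0 are assumptions, since the text does not say which variant of σ is intended or how a zero σ is handled.

```python
from statistics import mean, pstdev

def zscore_columns(rows):
    """Normalize each address-code column to zero mean and unit deviation.

    rows: list of equal-length numeric sequences, one per user (hypothetical layout).
    """
    cols = list(zip(*rows))
    mus = [mean(c) for c in cols]
    sigmas = [pstdev(c) for c in cols]   # population standard deviation (assumed)
    return [
        # A constant column has sigma == 0; map it to 0.0 instead of dividing by zero.
        [(x - mu) / sd if sd > 0 else 0.0 for x, mu, sd in zip(row, mus, sigmas)]
        for row in rows
    ]
```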

The invention is further configured as follows: in step 7, the p cluster sets {Q₁, Q₂, ..., Q_p} are obtained by the following algorithm: the weight vector [W₁, W₂, …, W_m-k1] is input and hierarchical clustering analysis is performed; initially each user forms its own class; in each iteration the minimum distance between every pair of classes is calculated and the two closest classes are merged into one; the process terminates when the number of clusters is reduced to p, yielding the p cluster sets {Q₁, Q₂, ..., Q_p}.

The invention is further configured as follows: the minimum distance between any two classes is calculated as a Euclidean distance.
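The clustering of steps 6 to 8 can be sketched as a plain agglomerative procedure. This is one possible reading rather than the patented algorithm: "minimum distance" is interpreted as single linkage (the distance between two classes is the smallest distance between their members), and the weights are assumed to multiply the squared coordinate differences, because the text does not state how the weight vector enters the distance. All function and parameter names are hypothetical.

```python
from math import sqrt

def weighted_euclidean(a, b, w):
    """Euclidean distance with per-coordinate weights (assumed weighting scheme)."""
    return sqrt(sum(wi * (ai - bi) ** 2 for ai, bi, wi in zip(a, b, w)))

def cluster_and_flag(points, weights, p, l2):
    """Merge users into p clusters and flag clusters smaller than l2.

    points: dict user_id -> normalized address codes (hypothetical layout)
    Returns (clusters, suspects).
    """
    clusters = [[uid] for uid in points]      # every user starts as its own class
    while len(clusters) > p:
        best = None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                # Single linkage: closest pair of members across the two classes.
                d = min(weighted_euclidean(points[a], points[b], weights)
                        for a in clusters[i] for b in clusters[j])
                if best is None or d < best[0]:
                    best = (d, i, j)
        _, i, j = best
        clusters[i].extend(clusters.pop(j))   # merge the two closest classes
    # Step 8: every user in a cluster with fewer than l2 members is a suspect.
    suspects = {uid for c in clusters if len(c) < l2 for uid in c}
    return clusters, suspects
```

The nested loops recompute all pairwise distances on every merge, which is acceptable for a sketch but slow for large user sets; a library routine for hierarchical clustering would normally be preferred there.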

The invention has the following beneficial effects: because power supply equipment such as transformers and meter boxes is installed and managed by the power supply station according to user addresses, user addresses follow a certain regularity. The method uses the addresses of users in the same region to pinpoint individual abnormal users and thereby judge the user-transformer relationship. It responds quickly to anomalies in large addresses (such as city and county codes), requires only data that is simple to collect, and has a simple algorithmic framework with few parameters to control, which improves algorithm efficiency, speeds up queries, reduces the error rate, saves time and capital costs, and improves the input-output ratio.

Drawings

FIG. 1 is a flow chart of the present invention.

Detailed Description

The preferred embodiments of the present invention will be described in further detail with reference to the accompanying drawings.

As shown in FIG. 1, a method for identifying abnormal user-transformer relationships based on user addresses comprises the following steps: 1) inputting a user set T, wherein the user set T comprises n users {H₁, H₂, ..., H_n} and each user has 7 address codes; the 7 address codes are the province code, city code, district code, street code, residents' committee code, road code and residential-community code;

2) selecting the valid user set T by processing each user H_i whose address is missing or abnormal; the valid user set T is selected as follows: if the number of users n is greater than 10000, users with missing address code information or abnormal address codes are removed from the user set T directly; if n is less than 10000, users with abnormal address codes are removed, and a user whose address code information is missing is handled as follows: if the address codes of the two adjacent users are the same, the missing address codes are filled in with those codes; if the address codes of the two adjacent users differ, the address codes of the users in the same meter box or at the same access point are checked: if the meter box or access point contains only this user, the user is removed, and if it also contains other users, the missing address codes are filled in from the users in the same meter box or at the same access point;

3) selecting k₁ address codes as large addresses and constructing a k₁-layer decision tree, where 1 ≤ k₁ ≤ 7; each layer of the decision tree then contains k₂ user sets {T₁, T₂, ..., T_k2}, and each user set T_j contains k₃ users, T_j = {H₁, H₂, ..., H_k3}, where 1 ≤ j ≤ k₂ and 1 ≤ k₃ ≤ n; the number k₂ of user sets contained in each layer of the decision tree is different; for example, the three address codes province code, city code and street code are taken as large addresses to construct a three-layer decision tree; the first layer, the province code, includes 3301, 3302 and 3303, so the total number k₂ of user sets in the first layer of the decision tree is 3; the three user sets are T₁, T₂ and T₃, and the number of users in each user set is compared with l₁;

4) taking each user set T_j as a node set of the decision tree and defining a node threshold l₁ (l₁ is chosen according to actual on-site conditions); traversing the number of users k₃ in every node set of the k₁-layer decision tree; if k₃ < l₁, outputting all users in the user set T_j as suspected users and deleting them; otherwise, all users in the user set T_j are normal users and are not processed;

5) T_u user sets remain, in which each user has 7 - k₁ address codes; Z-score normalization is performed on these address codes, calculated as:

$$z = \frac{x - \mu}{\sigma}$$

where σ is the standard deviation of the data, μ is the sample mean, and x is a user's address code; normalization converts data of different magnitudes to a common scale;

6) defining the minimum number of classes as p and the cluster-set threshold as l₂ (l₂ and p are chosen according to actual on-site conditions);

7) performing hierarchical clustering analysis using the weighted minimum distance: the weight vector [W₁, W₂, …, W_m-k1] is input (the weight vector is chosen according to actual on-site conditions, e.g. [9, 7, 5, 3, 1]); initially each user forms its own class; in each iteration the minimum distance between every pair of classes is calculated as a Euclidean distance; for example, for two classes Q₁(x₁₁, x₁₂, …, x_1n) and Q₂(x₂₁, x₂₂, …, x_2n), the Euclidean distance is calculated as:

$$d(Q_1, Q_2) = \sqrt{\sum_{i=1}^{n} (x_{1i} - x_{2i})^2}$$

where x denotes a user address code; the two closest classes are merged into one, and the clustering process terminates when the number of clusters is reduced to p, yielding p cluster sets {Q₁, Q₂, ..., Q_p};
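As an illustrative check of the unweighted Euclidean distance between two classes, take the made-up vectors Q₁ = (1, 2, 2) and Q₂ = (4, 6, 2); these values are chosen only for easy arithmetic and do not come from the patent:

```latex
d(Q_1, Q_2) = \sqrt{(1-4)^2 + (2-6)^2 + (2-2)^2} = \sqrt{9 + 16 + 0} = 5
```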

The above embodiments merely illustrate the inventive concept of the present invention and do not limit the scope of protection of its claims; any insubstantial modification of the invention made using this concept falls within the scope of protection of the present invention.

Claims

1. A method for identifying abnormal user-transformer relationships based on user addresses, characterized in that the method comprises the following steps:

3) selecting k₁ address codes as large addresses and constructing a k₁-layer decision tree, where 1 ≤ k₁ ≤ 7; each layer of the decision tree then contains k₂ user sets {T₁, T₂, ..., T_k2}, and each user set T_j contains k₃ users, T_j = {H₁, H₂, ..., H_k3}, where 1 ≤ j ≤ k₂ and 1 ≤ k₃ ≤ n;

5) T_u user sets remain, in which each user has 7 - k₁ address codes; performing Z-score normalization on the T_u user sets;

2. The method for identifying abnormal user-transformer relationships based on user addresses according to claim 1, characterized in that: in step 1, the 7 address codes are the province code, city code, district code, street code, residents' committee code, road code and residential-community code.

3. The method for identifying abnormal user-transformer relationships based on user addresses according to claim 1, characterized in that: in step 2, the valid user set T is selected as follows: if the number of users n is greater than 10000, users with missing address code information or abnormal address codes are removed from the user set T directly; if n is less than 10000, users with abnormal address codes are removed, and a user whose address code information is missing is handled as follows: if the address codes of the two adjacent users are the same, the missing address codes are filled in with those codes; if the address codes of the two adjacent users differ, the address codes of the users in the same meter box or at the same access point are checked: if the meter box or access point contains only this user, the user is removed, and if it also contains other users, the missing address codes are filled in from the users in the same meter box or at the same access point.

4. The method for identifying abnormal user-transformer relationships based on user addresses according to claim 1, characterized in that: in step 3, the number k₂ of user sets contained in each layer of the decision tree is different.

5. The method for identifying abnormal user-transformer relationships based on user addresses according to claim 1, characterized in that: in step 5, the Z-score normalization is calculated as:

$$z = \frac{x - \mu}{\sigma}$$

where σ is the standard deviation of the data, μ is the sample mean, and x is a user's address code; normalization converts data of different magnitudes to a common scale.

6. The method for identifying abnormal user-transformer relationships based on user addresses according to claim 1, characterized in that: in step 7, the p cluster sets {Q₁, Q₂, ..., Q_p} are obtained by the following algorithm: the weight vector [W₁, W₂, …, W_m-k1] is input and hierarchical clustering analysis is performed; initially each user forms its own class; in each iteration the minimum distance between every pair of classes is calculated and the two closest classes are merged into one; the process terminates when the number of clusters is reduced to p, yielding the p cluster sets {Q₁, Q₂, ..., Q_p}.

7. The method for identifying abnormal user-transformer relationships based on user addresses according to claim 6, characterized in that: the minimum distance between any two classes is calculated as a Euclidean distance.