CN109447490B - User address-based abnormal change relation discrimination method - Google Patents

User address-based abnormal change relation discrimination method

Info

Publication number
CN109447490B
Authority
CN
China
Prior art keywords
user
users
address
code
cluster
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201811307364.9A
Other languages
Chinese (zh)
Other versions
CN109447490A (en)
Inventor
曾嵘
邵航建
傅航天
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Hangzhou Zhicheng Electronic Technology Co ltd
Original Assignee
Hangzhou Zhicheng Electronic Technology Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Hangzhou Zhicheng Electronic Technology Co ltd
Priority to CN201811307364.9A
Publication of CN109447490A
Application granted
Publication of CN109447490B
Active
Anticipated expiration

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q10/00 Administration; Management
    • G06Q10/06 Resources, workflows, human or project management; Enterprise or organisation planning; Enterprise or organisation modelling
    • G06Q10/063 Operations research, analysis or management
    • G06Q10/0639 Performance analysis of employees; Performance analysis of enterprise or organisation operations
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/23 Clustering techniques
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q50/00 Information and communication technology [ICT] specially adapted for implementation of business processes of specific business sectors, e.g. utilities or tourism
    • G06Q50/06 Energy or water supply

Landscapes

  • Engineering & Computer Science (AREA)
  • Business, Economics & Management (AREA)
  • Human Resources & Organizations (AREA)
  • Theoretical Computer Science (AREA)
  • Economics (AREA)
  • Strategic Management (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • Development Economics (AREA)
  • General Business, Economics & Management (AREA)
  • Tourism & Hospitality (AREA)
  • Educational Administration (AREA)
  • Marketing (AREA)
  • Entrepreneurship & Innovation (AREA)
  • Health & Medical Sciences (AREA)
  • Evolutionary Computation (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • General Health & Medical Sciences (AREA)
  • Public Health (AREA)
  • General Engineering & Computer Science (AREA)
  • Primary Health Care (AREA)
  • Evolutionary Biology (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Water Supply & Treatment (AREA)
  • Artificial Intelligence (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Game Theory and Decision Science (AREA)
  • Operations Research (AREA)
  • Quality & Reliability (AREA)
  • Small-Scale Networks (AREA)
  • Management, Administration, Business Operations System, And Electronic Commerce (AREA)

Abstract

The invention discloses a method, based on user addresses, for identifying abnormal user-transformer relationships. The method comprises: inputting a user data set T and selecting the valid user data set T; selecting $k_1$ address codes as the large address and constructing a $k_1$-layer decision tree, each layer of which contains $k_2$ user data sets; traversing every user data set in the $k_1$-layer decision tree and, if the number of users in a user data set is less than $l_1$, flagging all users in that set as suspected of an abnormal user-transformer relationship; processing the remaining $T_u$ user data sets to obtain p cluster sets $\{Q_1, Q_2, \ldots, Q_p\}$; and traversing the p cluster sets $Q_k$ and, if the number of users in a cluster set $Q_k$ is less than $l_2$, outputting all users in $Q_k$ as suspected users. The invention saves time, labor, and capital costs and improves the input-output ratio.

Description

User address-based abnormal change relation discrimination method
Technical Field
The invention relates to the technical field of user address information acquisition, and in particular to a method for identifying abnormal user-transformer relationships based on user addresses.
Background
With the rapid development of the power grid, grid management has shifted from a coarse style to a lean one. In recent years, owing to rapid urban development and legacy problems in grid management, distribution network management still has room for improvement. The user-transformer relationship, that is, the relationship between an electricity consumer and the distribution transformer that supplies it, strongly affects distribution network management, including a large number of marketing services such as power outage notification to households, synchronous line loss calculation for transformer areas, equipment management, and business expansion. Errors in the user-transformer relationship are mainly identified by manual on-site inspection or by installing additional hardware in batches, both of which require a great deal of time and capital. Power supply enterprises have already carried out several rounds of centralized manual investigation, and the accuracy of the user-transformer relationship has reached a fairly high level. Because the remaining users with an abnormal user-transformer relationship are relatively few and their electricity consumption behavior is hard to detect, continuing to screen them manually or with additional equipment requires a great deal of time, labor, and capital, and the input-output ratio is extremely low.
Disclosure of Invention
To address the shortcomings of the prior art, the invention provides a method for identifying abnormal user-transformer relationships based on user addresses, which saves time, labor, and capital costs and improves the input-output ratio.
To achieve this purpose, the invention adopts the following technical scheme: a method for identifying abnormal user-transformer relationships based on user addresses, comprising the following steps:
1) Input a user set T containing n users $\{H_1, H_2, \ldots, H_n\}$, each user having 7 address codes;
2) Select the valid user set T by processing the users $H_i$ whose addresses are missing or abnormal;
3) Select $k_1$ address codes as the large address and construct a $k_1$-layer decision tree, where $1 \le k_1 \le 7$; each layer of the decision tree contains $k_2$ user sets $\{T_1, T_2, \ldots, T_{k_2}\}$, and each user set $T_j$ contains $k_3$ users, $T_j = \{H_1, H_2, \ldots, H_{k_3}\}$, where $1 \le j \le k_2$ and $1 \le k_3 \le n$;
4) Take each user set $T_j$ as a node set of the decision tree, define a node threshold $l_1$, and traverse the number of users $k_3$ in every node set of the $k_1$-layer decision tree: if $k_3 < l_1$, output all users in $T_j$ as suspected users and delete them; otherwise all users in $T_j$ are normal users and are not processed;
5) For the remaining $T_u$ user sets, in which each user has $7 - k_1$ address codes, apply Z-score normalization to the address codes;
6) Define the minimum number of classes as p and the cluster-set threshold as $l_2$;
7) Perform hierarchical clustering analysis using the weighted minimum distance to obtain p cluster sets $\{Q_1, Q_2, \ldots, Q_p\}$;
8) Traverse the p cluster sets $Q_k$: if the number of users in a cluster set $Q_k$ is less than $l_2$, output all users in $Q_k$ as suspected users; otherwise all users in $Q_k$ are normal users and are not processed;
9) Combine the results of step 4 and step 8 and output all suspected users.
The invention is further configured as follows: in step 1, the 7 address codes are the province code, city code, district code, street code, neighborhood committee code, road code, and residential community code.
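As a rough illustration of how a user record with 7 address codes might be represented and split into a large address and the remaining codes, here is a minimal Python sketch. The class name `UserAddress`, its field names, the helper `split_codes`, and the sample values are all assumptions made for illustration; the patent does not prescribe a data layout.

```python
from typing import NamedTuple, Optional, Sequence, Tuple


class UserAddress(NamedTuple):
    """One user H_i with 7 address codes (field names are illustrative)."""
    province: Optional[int]
    city: Optional[int]
    district: Optional[int]
    street: Optional[int]
    committee: Optional[int]
    road: Optional[int]
    community: Optional[int]


def split_codes(user: UserAddress, large_fields: Sequence[str]) -> Tuple[tuple, tuple]:
    """Split a user's codes into the k1 large-address codes and the other 7 - k1 codes."""
    large = tuple(getattr(user, f) for f in large_fields)
    rest = tuple(v for f, v in zip(user._fields, user) if f not in large_fields)
    return large, rest


# Hypothetical user; the code values are made up.
u = UserAddress(33, 3301, 330106, 7, 12, 45, 3)
large, rest = split_codes(u, ["province", "city", "street"])
print(large)  # (33, 3301, 7)
print(rest)   # (330106, 12, 45, 3)
```

With three fields chosen as the large address, $k_1 = 3$ and four codes remain for the later normalization and clustering steps.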
The invention is further configured as follows: in step 2, the valid user set T is selected as follows. If the number of users n is greater than 10000, users whose address code information is missing or whose address codes are abnormal are removed from the user set T directly. If n is less than 10000, users with abnormal address codes are removed, and users with missing address code information are handled as follows: if the address codes of the two adjacent users are the same, the missing code is filled in with that address code; if the address codes of the two adjacent users differ, the address codes of the users in the same meter box or at the same access point are examined; if the user is the only one in that meter box or at that access point, the user is removed, and if other users share the meter box or access point, the missing code is filled in with their address.
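The cleaning rule can be sketched in Python as follows. This is a simplified reading under several assumptions: a record is a hypothetical `(user_id, address, meter_box_id)` tuple, "adjacent users" means the previous and next records in list order, the abnormality check is passed in as a caller-supplied predicate, and the case n = 10000 (which the text leaves open) is treated like the small-data case.

```python
from typing import Callable, List, Optional, Tuple

# Hypothetical record: (user_id, address codes or None if missing, meter_box_id)
Record = Tuple[str, Optional[tuple], str]


def clean_users(users: List[Record], is_abnormal: Callable[[tuple], bool],
                large_n: int = 10000) -> List[Record]:
    """Return the valid user set following the rule described for step 2."""
    if len(users) > large_n:
        # Large data set: drop users with missing or abnormal addresses.
        return [u for u in users if u[1] is not None and not is_abnormal(u[1])]

    cleaned: List[Record] = []
    for i, (uid, addr, box) in enumerate(users):
        if addr is not None:
            if not is_abnormal(addr):
                cleaned.append((uid, addr, box))
            continue
        # Missing address: check the two neighbouring users first.
        prev_addr = users[i - 1][1] if i > 0 else None
        next_addr = users[i + 1][1] if i + 1 < len(users) else None
        if prev_addr is not None and prev_addr == next_addr:
            cleaned.append((uid, prev_addr, box))
            continue
        # Neighbours disagree: fall back to other users in the same meter box.
        mates = [a for (other, a, b) in users
                 if b == box and other != uid and a is not None]
        if mates:
            cleaned.append((uid, mates[0], box))
        # Otherwise the user is alone in the box and is dropped.
    return cleaned


sample = [("a", (1, 2), "b1"), ("b", None, "b1"), ("c", (1, 2), "b2"),
          ("d", None, "b3"), ("e", (1, 3), "b3"), ("f", None, "b4")]
cleaned = clean_users(sample, is_abnormal=lambda addr: False)
```

In this sample, user `b` inherits the address shared by both neighbours, user `d` inherits the address of its meter-box mate `e`, and user `f` is dropped because it is alone in its meter box.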
The invention is further configured as follows: in step 3, the number $k_2$ of user sets differs from layer to layer of the decision tree.
The invention is further configured as follows: in step 5, the Z-score normalization is calculated as:
$$z = \frac{x - \mu}{\sigma}$$
where $\sigma$ is the standard deviation of the data, $\mu$ is the sample mean, and $x$ is the user's address code. The normalization converts data of different magnitudes to the same magnitude.
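A minimal Python sketch of column-wise Z-score normalization is shown below. It assumes the population standard deviation (the text does not say whether the population or sample form is meant) and maps a constant column to 0.0 to avoid division by zero; both choices, the function name, and the sample values are assumptions.

```python
from statistics import mean, pstdev
from typing import List, Sequence


def z_score_columns(rows: Sequence[Sequence[float]]) -> List[List[float]]:
    """Normalize each address-code column to zero mean and unit standard deviation."""
    columns = list(zip(*rows))
    mus = [mean(col) for col in columns]
    sigmas = [pstdev(col) for col in columns]  # population standard deviation (assumed)
    return [
        [(x - mu) / sigma if sigma > 0 else 0.0
         for x, mu, sigma in zip(row, mus, sigmas)]
        for row in rows
    ]


# Three hypothetical users with two remaining address codes each.
codes = [[101, 5], [103, 5], [105, 5]]
normalized = z_score_columns(codes)
```

Here the first column becomes roughly -1.22, 0, 1.22, while the constant second column becomes all zeros.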
The invention is further configured as follows: in step 7, the p cluster sets $\{Q_1, Q_2, \ldots, Q_p\}$ are obtained by the following algorithm: input the weight vector $[W_1, W_2, \ldots, W_{m-k_1}]$ and perform hierarchical clustering analysis. Initially each user forms its own class; in each iteration the minimum distance between every pair of classes is calculated and the two closest classes are merged into one; the process terminates when the number of clusters has been reduced to p, yielding the p cluster sets $\{Q_1, Q_2, \ldots, Q_p\}$.
The invention is further configured as follows: the minimum distance between any two classes is calculated as a Euclidean distance.
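The clustering and the subsequent small-cluster check can be sketched in Python as below. The text does not state how the weight vector enters the distance, so the sketch assumes each squared coordinate difference is multiplied by its weight; it also uses a naive pairwise search rather than an optimized library routine. All function names and sample points are illustrative.

```python
import math
from typing import List, Sequence


def weighted_distance(a: Sequence[float], b: Sequence[float], w: Sequence[float]) -> float:
    """Euclidean distance with each squared difference scaled by its weight (assumed form)."""
    return math.sqrt(sum(wi * (x - y) ** 2 for x, y, wi in zip(a, b, w)))


def cluster_min_distance(points: Sequence[Sequence[float]], w: Sequence[float],
                         p: int) -> List[List[int]]:
    """Start with one class per user and merge the two closest classes until p remain."""
    clusters = [[i] for i in range(len(points))]
    while len(clusters) > max(p, 1):
        best = None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                # Minimum distance between any member of class i and any member of class j.
                d = min(weighted_distance(points[a], points[b], w)
                        for a in clusters[i] for b in clusters[j])
                if best is None or d < best[0]:
                    best = (d, i, j)
        _, i, j = best
        clusters[i].extend(clusters[j])
        del clusters[j]
    return clusters


def suspected(clusters: List[List[int]], l2: int) -> List[int]:
    """Users in clusters with fewer than l2 members are reported as suspected."""
    return sorted(u for c in clusters if len(c) < l2 for u in c)


# Four hypothetical users with two normalized address codes each.
pts = [[0.0, 0.0], [0.1, 0.0], [0.0, 0.2], [5.0, 5.0]]
clusters = cluster_min_distance(pts, w=[9, 7], p=2)
flagged = suspected(clusters, l2=2)
print(clusters)  # [[0, 1, 2], [3]]
print(flagged)   # [3]
```

The isolated fourth user ends up alone in its cluster, so with $l_2 = 2$ it is the only user flagged.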
The invention has the following beneficial effects: because power supply equipment such as transformers and meter boxes installed by a power supply station is allocated and managed according to user addresses, user addresses exhibit a certain regularity. Using the addresses of users in the same region, individual abnormal users can be identified accurately and the user-transformer relationship can be judged. The method responds quickly to anomalies in large addresses (such as city and county codes), requires only easily collected data, and has a simple algorithmic framework with few parameters to control. It thereby improves algorithm efficiency, speeds up queries, reduces the error rate, saves time and capital costs, and improves the input-output ratio.
Drawings
FIG. 1 is a flow chart of the present invention.
Detailed Description
The preferred embodiments of the present invention will be described in further detail with reference to the accompanying drawings.
As shown in FIG. 1, a method for identifying abnormal user-transformer relationships based on user addresses comprises the following steps: 1) Input a user set T containing n users $\{H_1, H_2, \ldots, H_n\}$, each user having 7 address codes; the 7 address codes are the province code, city code, district code, street code, neighborhood committee code, road code, and residential community code;
2) Select the valid user set T by processing the users $H_i$ whose addresses are missing or abnormal. The valid user set T is selected as follows: if the number of users n is greater than 10000, users whose address code information is missing or whose address codes are abnormal are removed from the user set T directly; if n is less than 10000, users with abnormal address codes are removed, and users with missing address code information are handled as follows: if the address codes of the two adjacent users are the same, the missing code is filled in with that address code; if the address codes of the two adjacent users differ, the address codes of the users in the same meter box or at the same access point are examined; if the user is the only one in that meter box or at that access point, the user is removed, and if other users share the meter box or access point, the missing code is filled in with their address;
3) Select $k_1$ address codes as the large address and construct a $k_1$-layer decision tree, where $1 \le k_1 \le 7$; each layer of the decision tree contains $k_2$ user sets $\{T_1, T_2, \ldots, T_{k_2}\}$, and each user set $T_j$ contains $k_3$ users, $T_j = \{H_1, H_2, \ldots, H_{k_3}\}$, where $1 \le j \le k_2$ and $1 \le k_3 \le n$; the number $k_2$ of user sets differs from layer to layer. For example, if the province code, city code, and street code are used as the large address, a three-layer decision tree is constructed; if the first-layer province codes are 3301, 3302, and 3303, the total number $k_2$ of user sets in the first layer is 3, the three user sets are $T_1$, $T_2$, $T_3$, and the number of users in each user set is compared with $l_1$;
4) Take each user set $T_j$ as a node set of the decision tree, define a node threshold $l_1$ ($l_1$ is selected according to actual conditions in the field), and traverse the number of users $k_3$ in every node set of the $k_1$-layer decision tree: if $k_3 < l_1$, output all users in $T_j$ as suspected users and delete them; otherwise all users in $T_j$ are normal users and are not processed;
5) For the remaining $T_u$ user sets, in which each user has $7 - k_1$ address codes, apply Z-score normalization to the address codes, calculated as:
$$z = \frac{x - \mu}{\sigma}$$
where $\sigma$ is the standard deviation of the data, $\mu$ is the sample mean, and $x$ is the user's address code; the normalization converts data of different magnitudes to the same magnitude;
6) Define the minimum number of classes as p and the cluster-set threshold as $l_2$ ($l_2$ and p are selected according to actual conditions in the field);
7) Perform hierarchical clustering analysis using the weighted minimum distance: input the weight vector $[W_1, W_2, \ldots, W_{m-k_1}]$ (the weight vector is chosen according to actual conditions in the field, e.g. [9,7,5,3,1]) and perform hierarchical clustering analysis. Initially each user forms its own class; in each iteration the minimum distance between every pair of classes is calculated as a Euclidean distance. For example, the Euclidean distance between two classes $Q_1(x_{11}, x_{12}, \ldots, x_{1n})$ and $Q_2(x_{21}, x_{22}, \ldots, x_{2n})$ is calculated as:
$$d(Q_1, Q_2) = \sqrt{\sum_{i=1}^{n} (x_{1i} - x_{2i})^2}$$
where $x$ denotes a user address code. The two closest classes are merged into one, and the clustering terminates when the number of clusters has been reduced to p, yielding p cluster sets $\{Q_1, Q_2, \ldots, Q_p\}$;
8) Traverse the p cluster sets $Q_k$: if the number of users in a cluster set $Q_k$ is less than $l_2$, output all users in $Q_k$ as suspected users; otherwise all users in $Q_k$ are normal users and are not processed;
9) Combine the results of step 4 and step 8 and output all suspected users.
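The decision-tree screening of steps 3 and 4 can be sketched in Python as follows. The sketch reads a "$k_1$-layer decision tree" as grouping users by the first 1, 2, ..., $k_1$ large-address codes in turn, which is one plausible interpretation rather than a detail the text confirms. The function name, user ids, and second-layer codes are hypothetical; the first-layer codes 3301, 3302, 3303 follow the example in step 3.

```python
from collections import defaultdict
from typing import Dict, List, Sequence, Tuple


def tree_suspects(users: Dict[str, Sequence[int]], k1: int,
                  l1: int) -> Tuple[List[str], List[str]]:
    """Walk a k1-layer tree of large-address codes and flag users in nodes smaller than l1.

    `users` maps a hypothetical user id to its selected large-address codes,
    coarsest first. Returns (suspected ids, remaining ids).
    """
    remaining = dict(users)
    suspects: List[str] = []
    for depth in range(1, k1 + 1):
        # Node sets T_j of this layer: users sharing the first `depth` codes.
        nodes: Dict[tuple, List[str]] = defaultdict(list)
        for uid, codes in remaining.items():
            nodes[tuple(codes[:depth])].append(uid)
        for members in nodes.values():
            if len(members) < l1:  # k3 < l1
                suspects.extend(members)
                for uid in members:
                    del remaining[uid]  # flagged users are deleted
    return sorted(suspects), sorted(remaining)


sample_users = {
    "u1": (3301, 10), "u2": (3301, 10), "u3": (3301, 10),
    "u4": (3302, 20), "u5": (3302, 20), "u6": (3302, 21),
    "u7": (3303, 30),
}
suspects, remaining = tree_suspects(sample_users, k1=2, l1=2)
print(suspects)   # ['u6', 'u7']
print(remaining)  # ['u1', 'u2', 'u3', 'u4', 'u5']
```

User `u7` is flagged in the first layer because it is alone under code 3303, and `u6` is flagged in the second layer because it is alone under (3302, 21). The remaining users would then go on to normalization and clustering, and the final output is the union of the suspects from both stages.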
The above-mentioned embodiments are only used for explaining the inventive concept of the present invention, and do not limit the protection of the claims of the present invention, and any insubstantial modifications of the present invention using this concept shall fall within the protection scope of the present invention.

Claims (7)

1. A method for identifying abnormal user-transformer relationships based on user addresses, characterized in that the method comprises the following steps:
1) Input a user set T containing n users $\{H_1, H_2, \ldots, H_n\}$, each user having 7 address codes;
2) Select the valid user set T by processing the users $H_i$ whose addresses are missing or abnormal;
3) Select $k_1$ address codes as the large address and construct a $k_1$-layer decision tree, where $1 \le k_1 \le 7$; each layer of the decision tree contains $k_2$ user sets $\{T_1, T_2, \ldots, T_{k_2}\}$, and each user set $T_j$ contains $k_3$ users, $T_j = \{H_1, H_2, \ldots, H_{k_3}\}$, where $1 \le j \le k_2$ and $1 \le k_3 \le n$;
4) Take each user set $T_j$ as a node set of the decision tree, define a node threshold $l_1$, and traverse the number of users $k_3$ in every node set of the $k_1$-layer decision tree: if $k_3 < l_1$, output all users in $T_j$ as suspected users and delete them; otherwise all users in $T_j$ are normal users and are not processed;
5) $T_u$ user sets remain, in which each user has $7 - k_1$ address codes; apply Z-score normalization to the $T_u$ user sets;
6) Define the minimum number of classes as p and the cluster-set threshold as $l_2$;
7) Perform hierarchical clustering analysis using the weighted minimum distance to obtain p cluster sets $\{Q_1, Q_2, \ldots, Q_p\}$;
8) Traverse the p cluster sets $Q_k$: if the number of users in a cluster set $Q_k$ is less than $l_2$, output all users in $Q_k$ as suspected users; otherwise all users in $Q_k$ are normal users and are not processed;
9) Combine the results of step 4 and step 8 and output all suspected users.
2. The method for identifying abnormal user-transformer relationships based on user addresses according to claim 1, characterized in that: in step 1, the 7 address codes are the province code, city code, district code, street code, neighborhood committee code, road code, and residential community code.
3. The method for identifying abnormal user-transformer relationships based on user addresses according to claim 1, characterized in that: in step 2, the valid user set T is selected as follows: if the number of users n is greater than 10000, users whose address code information is missing or whose address codes are abnormal are removed from the user set T directly; if n is less than 10000, users with abnormal address codes are removed, and users with missing address code information are handled as follows: if the address codes of the two adjacent users are the same, the missing code is filled in with that address code; if the address codes of the two adjacent users differ, the address codes of the users in the same meter box or at the same access point are examined; if the user is the only one in that meter box or at that access point, the user is removed, and if other users share the meter box or access point, the missing code is filled in with their address.
4. The method for identifying abnormal user-transformer relationships based on user addresses according to claim 1, characterized in that: in step 3, the number $k_2$ of user sets differs from layer to layer of the decision tree.
5. The method for identifying abnormal user-transformer relationships based on user addresses according to claim 1, characterized in that: in step 5, the Z-score normalization is calculated as:
$$z = \frac{x - \mu}{\sigma}$$
where $\sigma$ is the standard deviation of the data, $\mu$ is the sample mean, and $x$ is the user's address code; the normalization converts data of different magnitudes to the same magnitude.
6. The method for identifying abnormal user-transformer relationships based on user addresses according to claim 1, characterized in that: in step 7, the p cluster sets $\{Q_1, Q_2, \ldots, Q_p\}$ are obtained by the following algorithm: input the weight vector $[W_1, W_2, \ldots, W_{m-k_1}]$ and perform hierarchical clustering analysis; initially each user forms its own class; in each iteration the minimum distance between every pair of classes is calculated and the two closest classes are merged into one; the process terminates when the number of clusters has been reduced to p, yielding the p cluster sets $\{Q_1, Q_2, \ldots, Q_p\}$.
7. The method for identifying abnormal user-transformer relationships based on user addresses according to claim 6, characterized in that: the minimum distance between any two classes is calculated as a Euclidean distance.
CN201811307364.9A 2018-11-05 2018-11-05 User address-based abnormal change relation discrimination method Active CN109447490B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201811307364.9A CN109447490B (en) 2018-11-05 2018-11-05 User address-based abnormal change relation discrimination method

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201811307364.9A CN109447490B (en) 2018-11-05 2018-11-05 User address-based abnormal change relation discrimination method

Publications (2)

Publication Number Publication Date
CN109447490A CN109447490A (en) 2019-03-08
CN109447490B true CN109447490B (en) 2022-05-27

Family

ID=65550545

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201811307364.9A Active CN109447490B (en) 2018-11-05 2018-11-05 User address-based abnormal change relation discrimination method

Country Status (1)

Country Link
CN (1) CN109447490B (en)

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110231528B (en) * 2019-06-17 2021-05-28 国网重庆市电力公司电力科学研究院 Transformer household variation common knowledge identification method and device based on load characteristic model library

Citations (1)

Publication number Priority date Publication date Assignee Title
CN108318759A (en) * 2018-01-25 2018-07-24 国网浙江海宁市供电有限公司 A kind of various dimensions taiwan area family becomes relation recognition method

Family Cites Families (5)

Publication number Priority date Publication date Assignee Title
US7389209B2 (en) * 2002-05-03 2008-06-17 Sungard Energy Systems Inc. Valuing and optimizing scheduling of generation assets for a group of facilities
CN105160416A (en) * 2015-07-31 2015-12-16 国家电网公司 Transformer area reasonable line loss prediction method based on principal component analysis and neural network
CN107292765A (en) * 2017-06-21 2017-10-24 国网辽宁省电力有限公司 A kind of user power utilization behavior monitoring method based on clustering algorithm
CN107958395B (en) * 2017-12-13 2021-11-26 美林数据技术股份有限公司 Method for identifying abnormal users of power system
CN207637296U (en) * 2017-12-14 2018-07-20 国网北京市电力公司 Family becomes the test device of relationship identification system

Non-Patent Citations (3)

Title
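Hongtao Xie, Fuhua Shang. The study of methods for post-pruning decision trees based on comprehensive evaluation standard. 《2014 11th International Conference on Fuzzy Systems and Knowledge Discovery (FSKD)》. 2014. *
Ding Xiao et al. Short-term load forecasting based on power distribution and consumption big data. 《Electric Power Engineering Technology》. 2018, (No. 03). *
Wang Shouxiang et al. Estimation method for the reasonable line loss rate of transformer areas based on the random forest algorithm. 《Electric Power Automation Equipment》. 2017, (No. 11). *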

Also Published As

Publication number Publication date
CN109447490A (en) 2019-03-08

Similar Documents

Publication Publication Date Title
CN107169628B (en) Power distribution network reliability assessment method based on big data mutual information attribute reduction
CN110991786B (en) 10kV static load model parameter identification method based on similar daily load curve
CN106372747B (en) Random forest-based reasonable line loss rate estimation method for transformer area
CN111612053B (en) Calculation method for reasonable interval of line loss rate
CN112149967B (en) Power communication network vulnerability assessment method and system based on complex system theory
CN107909208A (en) Damage method drops in a kind of taiwan area distribution
Li et al. Intelligent anti-money laundering solution based upon novel community detection in massive transaction networks on spark
CN114519514B (en) Low-voltage transformer area reasonable line loss value measuring and calculating method, system and computer equipment
CN110705887A (en) Low-voltage transformer area operation state comprehensive evaluation method based on neural network model
CN110738232A (en) grid voltage out-of-limit cause diagnosis method based on data mining technology
CN112001441A (en) Power distribution network line loss anomaly detection method based on Kmeans-AHC hybrid clustering algorithm
CN109951499A (en) A kind of method for detecting abnormality based on network structure feature
CN113657678A (en) Power grid power data prediction method based on information freshness
El Mrabet et al. A performance comparison of data mining algorithms based intrusion detection system for smart grid
CN109447490B (en) User address-based abnormal change relation discrimination method
CN112508254B (en) Method for determining investment prediction data of transformer substation engineering project
CN111612054B (en) User electricity stealing behavior identification method based on nonnegative matrix factorization and density clustering
CN111651448A (en) Low-voltage topology identification method based on noise reduction differential evolution
CN114676931B (en) Electric quantity prediction system based on data center technology
CN116307844A (en) Low-voltage transformer area line loss evaluation analysis method
CN112241812B (en) Topology identification method for low-voltage distribution network based on single-side optimization and genetic algorithm cooperation
CN108123436B (en) Voltage out-of-limit prediction model based on principal component analysis and multiple regression algorithm
CN110995465A (en) Communication point panoramic view information operation and maintenance method and system
Jie et al. The study for data mining of distribution network based on particle swarm optimization with clustering algorithm method
CN113327047B (en) Power marketing service channel decision method and system based on fuzzy comprehensive model

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant
PE01 Entry into force of the registration of the contract for pledge of patent right

Denomination of invention: A Method for Distinguishing Abnormal User Change Relationship Based on User Address

Effective date of registration: 20230815

Granted publication date: 20220527

Pledgee: Bank of Jiangsu Limited by Share Ltd. Hangzhou branch

Pledgor: HANGZHOU ZHICHENG ELECTRONIC TECHNOLOGY Co.,Ltd.

Registration number: Y2023980052259