CN112330136A - Relevance mining method and device for abnormal electricity utilization analysis data set of large user - Google Patents
- Publication number
- CN112330136A (application number CN202011203430.5A)
- Authority
- CN
- China
- Prior art keywords
- item
- analysis data
- user
- frequent
- data set
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06Q—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
- G06Q10/00—Administration; Management
- G06Q10/06—Resources, workflows, human or project management; Enterprise or organisation planning; Enterprise or organisation modelling
- G06Q10/063—Operations research, analysis or management
- G06Q10/0635—Risk analysis of enterprise or organisation activities
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/20—Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
- G06F16/24—Querying
- G06F16/245—Query processing
- G06F16/2458—Special types of queries, e.g. statistical queries, fuzzy queries or distributed queries
- G06F16/2465—Query processing support for facilitating data mining operations in structured databases
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06Q—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
- G06Q50/00—Information and communication technology [ICT] specially adapted for implementation of business processes of specific business sectors, e.g. utilities or tourism
- G06Q50/06—Energy or water supply
-
- Y—GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
- Y02—TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
- Y02D—CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
- Y02D10/00—Energy efficient computing, e.g. low power processors, power management or thermal management
-
- Y—GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
- Y02—TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
- Y02P—CLIMATE CHANGE MITIGATION TECHNOLOGIES IN THE PRODUCTION OR PROCESSING OF GOODS
- Y02P90/00—Enabling technologies with a potential contribution to greenhouse gas [GHG] emissions mitigation
- Y02P90/80—Management or planning
- Y02P90/82—Energy audits or management systems therefor
Abstract
The invention discloses a method and a device for mining the relevance of a large-user abnormal electricity utilization analysis data set. The method comprises the following steps: carrying out support degree judgment on the large-user abnormal electricity utilization analysis data set to obtain a plurality of screened frequent item sets; and performing confidence judgment on the abnormal electricity utilization analysis data set through the screened frequent item sets to obtain a large-user abnormal electricity utilization analysis data set with relevance. By judging the support degree and the confidence, the invention mines the relevance among the item sets of the large-user abnormal electricity utilization analysis data set, reduces the volume of data processed by the abnormal electricity utilization analysis algorithm, effectively improves the analysis efficiency of the system, and has a good application prospect.
Description
Technical Field
The invention relates to a method and a device for mining the relevance of a large-user abnormal electricity utilization analysis data set, and belongs to the technical field of power systems.
Background
As inspection and penalties for abnormal electricity utilization have intensified, electricity theft has become increasingly covert and the technical means involved increasingly sophisticated; abnormal electricity utilization behavior is now specialized, intelligent, and concealed. Conventional statistical analysis alone is no longer sufficient, and the application of new-generation artificial intelligence technology is becoming mainstream. A standardized data set is the basis of big-data machine learning analysis, but the large-user abnormal electricity utilization analysis data set contains many kinds of data items and a large data volume, so analyzing it requires an extremely large amount of calculation. Data mining therefore needs to be performed on the data set to determine the association relations among the data items and to prune uselessly associated data. This reduces the subsequent amount of calculation, meets the needs of applying artificial intelligence analysis technology to user abnormal electricity utilization analysis, and provides strong support for intelligent data analysis and deeper data applications.
Disclosure of Invention
To overcome the above defects, the invention discloses a method and a device for mining the relevance of a large-user abnormal electricity utilization analysis data set. It addresses the problem that the large data volume of existing large-user abnormal electricity utilization analysis data sets leads to a large subsequent amount of calculation, and it reduces the volume of data processed by the abnormal electricity utilization analysis algorithm by mining the relevance among multi-source data.
To achieve this purpose, the invention adopts the following technical scheme: a relevance mining method for a large-user abnormal electricity utilization analysis data set, comprising the following steps:
carrying out support degree judgment on the large-user abnormal electricity utilization analysis data set to obtain a plurality of screened frequent item sets;
and performing confidence judgment on the abnormal electricity utilization analysis data set through the screened frequent item sets to obtain a large-user abnormal electricity utilization analysis data set with relevance.
Further, the large-user abnormal electricity utilization analysis data set $S$ consists of the information of $n$ users:
$$S=\{S_1,S_2,\ldots,S_x,\ldots,S_n\} \tag{1}$$
where $x=1,2,\ldots,n$, and the $x$-th user information $S_x$ is:
$$S_x=\{L_{id},L_{cl},L_{cal},C_{tv}(t),T_{ag}\} \tag{2}$$
where $L_{id}$ is the positioning information module, $L_{cl}$ is the category information module, $L_{cal}$ is the calculation information module, $C_{tv}(t)$ is the time-varying information module, $t$ is time, and $T_{ag}$ is the calibration information module.
Further, the positioning information module $L_{id}$ is used for marking the various types of positioning information of the user and comprises the following items:
$$L_{id}=\{n_{an},n_{cn},n_{id},n_{ea},n_{lo},n_{oi}\} \tag{3}$$
where $n_{an}$ is the electricity meter bureau number element, $n_{cn}$ is the user number element, $n_{id}$ is the user name element, $n_{ea}$ is the electricity utilization address element, $n_{lo}$ is the line number element, and $n_{oi}$ is the acquisition object identification element;
the category information module $L_{cl}$ comprises the following items:
$$L_{cl}=\{n_{csc},n_{wm},n_{sc},n_{tc},n_{on},n_{etc},n_{ml},n_{mm}\} \tag{4}$$
where $n_{csc}$ is the user type element, $n_{wm}$ is the wiring mode element, $n_{sc}$ is the electricity meter category element, $n_{tc}$ is the industry classification element, $n_{on}$ is the organization code element, $n_{etc}$ is the electrical property element, $n_{ml}$ is the metering point level element, and $n_{mm}$ is the metering mode element;
the calculation information module $L_{cal}$ comprises the following items:
$$L_{cal}=\{n_{cc},n_{mc},n_{vc},n_{mvc},n_{mrc},n_{tf},n_{ctf},n_{vtf}\} \tag{5}$$
where $n_{cc}$ is the contract capacity element, $n_{mc}$ is the metering point capacity element, $n_{vc}$ is the user voltage class element, $n_{mvc}$ is the electricity meter rated voltage element, $n_{mrc}$ is the electricity meter rated current element, $n_{tf}$ is the comprehensive multiplier element, $n_{ctf}$ is the current transformer ratio element, and $n_{vtf}$ is the voltage transformer ratio element;
the time-varying information module $C_{tv}(t)$ comprises the following items:
$$C_{tv}(t)=\{I(t),U(t),P(t),E(t),L(t)\} \tag{6}$$
where $I(t)$ is the current curve element, $U(t)$ is the voltage curve element, $P(t)$ is the power curve element, $E(t)$ is the daily frozen electric quantity curve element, and $L(t)$ is the line loss curve element;
the calibration information module $T_{ag}$ comprises the abnormal category element $T_{type}$:
$$T_{ag}=\{T_{type}\} \tag{7}$$
The values of the abnormal category element include: no abnormality, voltage loss, current loss, series connection, and three-phase imbalance.
Further, carrying out support degree judgment on the large-user abnormal electricity utilization analysis data set to obtain a plurality of screened frequent item sets comprises the following steps:
extracting the candidate 1-item set $C_1$ from the large-user abnormal electricity utilization analysis data set $S$ and calculating the support degree:
$$\mathrm{Sup}(s_x^m)=\frac{N(s_x^m)}{N(S_m)} \tag{8}$$
where the candidate 1-item set $C_1$ is the set of item sets $\{s_x^m\}$ containing only 1 element; $s_x^m$ is the $m$-th element in the $x$-th user information in $S$; $N(S_m)$ is the total number of occurrences of all values of the $m$-th information element in the abnormal electricity utilization analysis data set $S$; and $N(s_x^m)$ is the number of occurrences of $s_x^m$ in the candidate 1-item set $C_1$;
the item sets in the candidate 1-item set $C_1$ whose support degree is greater than the minimum frequency threshold $S_{min}$ are retained to obtain the frequent 1-item set $F_1$:
$$F_1=\{\{s_x^m\}\in C_1 \mid \mathrm{Sup}(s_x^m)>S_{min}\} \tag{9}$$
the candidate 2-item set $C_2$ is the set of item sets $\{s_x^m,s_x^l\}$, where $s_x^l$ is the $l$-th element in the $x$-th user information in $S$, and $N(s_x^m,s_x^l)$ is the number of times $s_x^m$ and $s_x^l$ occur simultaneously in the candidate 2-item set $C_2$; the support degree of a 2-item set is:
$$\mathrm{Sup}(s_x^m,s_x^l)=\frac{N(s_x^m,s_x^l)}{N(S_m,S_l)} \tag{10}$$
where $N(S_m,S_l)$ is the total number of occurrences of all value pairs of the $m$-th and $l$-th information elements in $S$; the screened frequent 2-item set $F_2$ must satisfy:
$$\mathrm{Sup}(s_x^m,s_x^l)>S_{min} \tag{11}$$
and, at the same time:
$$\{s_x^m\}\in F_1,\quad \{s_x^l\}\in F_1 \tag{12}$$
that is, $F_2$ consists of the candidate 2-item sets whose support degree is greater than the minimum frequency threshold $S_{min}$, and formula (12) excludes the 2-item sets containing an element that is not in the frequent 1-item set;
calculating in sequence until the frequent $k$-item set $F_k$ is obtained, where the $(k+1)$-item sets no longer satisfy the support threshold condition.
Further, performing confidence judgment on the abnormal electricity utilization analysis data set through the screened frequent item sets to obtain the large-user abnormal electricity utilization analysis data set with relevance comprises the following steps:
the confidence judgment starts from the frequent 2-item set $F_2$: for a 2-item set $\{s_x^m,s_x^l\}$ in $F_2$, the probability that the second element $s_x^l$ occurs given that the first element $s_x^m$ occurs is the confidence of the 2-item set, and the confidence function of the 2-item set is expressed as:
$$\mathrm{Conf}(s_x^m\Rightarrow s_x^l)=\frac{\mathrm{Sup}(s_x^m,s_x^l)}{\mathrm{Sup}(s_x^m)} \tag{13}$$
where $\mathrm{Sup}(s_x^m,s_x^l)$ is the support degree of the 2-item set $\{s_x^m,s_x^l\}$ in $F_2$, and $\mathrm{Sup}(s_x^m)$ is the support degree of the first element $s_x^m$;
judging whether the minimum confidence threshold $C_{min}$ is met determines the strongly correlated frequent 2-item set $R_2$:
$$R_2=\{\{s_x^m,s_x^l\}\in F_2 \mid \mathrm{Conf}(s_x^m\Rightarrow s_x^l)\ge C_{min}\} \tag{14}$$
judging in sequence yields the strongly correlated frequent $k$-item set $R_k$, and thereby the large-user abnormal electricity utilization analysis data set with relevance.
A relevance mining device for a large-user abnormal electricity utilization analysis data set comprises:
a frequent item set screening module, used for carrying out support degree judgment on the large-user abnormal electricity utilization analysis data set to obtain a plurality of screened frequent item sets;
and a relevance mining module, used for performing confidence judgment on the abnormal electricity utilization analysis data set through the screened frequent item sets to obtain a large-user abnormal electricity utilization analysis data set with relevance.
Further, the large-user abnormal electricity utilization analysis data set $S$ consists of the information of $n$ users:
$$S=\{S_1,S_2,\ldots,S_x,\ldots,S_n\} \tag{1}$$
where $x=1,2,\ldots,n$, and the $x$-th user information $S_x$ is:
$$S_x=\{L_{id},L_{cl},L_{cal},C_{tv}(t),T_{ag}\} \tag{2}$$
where $L_{id}$ is the positioning information module, $L_{cl}$ is the category information module, $L_{cal}$ is the calculation information module, $C_{tv}(t)$ is the time-varying information module, $t$ is time, and $T_{ag}$ is the calibration information module.
Further, the positioning information module $L_{id}$ is used for marking the various types of positioning information of the user and comprises the following items:
$$L_{id}=\{n_{an},n_{cn},n_{id},n_{ea},n_{lo},n_{oi}\} \tag{3}$$
where $n_{an}$ is the electricity meter bureau number element, $n_{cn}$ is the user number element, $n_{id}$ is the user name element, $n_{ea}$ is the electricity utilization address element, $n_{lo}$ is the line number element, and $n_{oi}$ is the acquisition object identification element;
the category information module $L_{cl}$ comprises the following items:
$$L_{cl}=\{n_{csc},n_{wm},n_{sc},n_{tc},n_{on},n_{etc},n_{ml},n_{mm}\} \tag{4}$$
where $n_{csc}$ is the user type element, $n_{wm}$ is the wiring mode element, $n_{sc}$ is the electricity meter category element, $n_{tc}$ is the industry classification element, $n_{on}$ is the organization code element, $n_{etc}$ is the electrical property element, $n_{ml}$ is the metering point level element, and $n_{mm}$ is the metering mode element;
the calculation information module $L_{cal}$ comprises the following items:
$$L_{cal}=\{n_{cc},n_{mc},n_{vc},n_{mvc},n_{mrc},n_{tf},n_{ctf},n_{vtf}\} \tag{5}$$
where $n_{cc}$ is the contract capacity element, $n_{mc}$ is the metering point capacity element, $n_{vc}$ is the user voltage class element, $n_{mvc}$ is the electricity meter rated voltage element, $n_{mrc}$ is the electricity meter rated current element, $n_{tf}$ is the comprehensive multiplier element, $n_{ctf}$ is the current transformer ratio element, and $n_{vtf}$ is the voltage transformer ratio element;
the time-varying information module $C_{tv}(t)$ comprises the following items:
$$C_{tv}(t)=\{I(t),U(t),P(t),E(t),L(t)\} \tag{6}$$
where $I(t)$ is the current curve element, $U(t)$ is the voltage curve element, $P(t)$ is the power curve element, $E(t)$ is the daily frozen electric quantity curve element, and $L(t)$ is the line loss curve element;
the calibration information module $T_{ag}$ comprises the abnormal category element $T_{type}$:
$$T_{ag}=\{T_{type}\} \tag{7}$$
The values of the abnormal category element include: no abnormality, voltage loss, current loss, series connection, and three-phase imbalance.
Further, carrying out support degree judgment on the large-user abnormal electricity utilization analysis data set to obtain a plurality of screened frequent item sets comprises the following steps:
extracting the candidate 1-item set $C_1$ from the large-user abnormal electricity utilization analysis data set $S$ and calculating the support degree:
$$\mathrm{Sup}(s_x^m)=\frac{N(s_x^m)}{N(S_m)} \tag{8}$$
where the candidate 1-item set $C_1$ is the set of item sets $\{s_x^m\}$ containing only 1 element; $s_x^m$ is the $m$-th element in the $x$-th user information in $S$; $N(S_m)$ is the total number of occurrences of all values of the $m$-th information element in the abnormal electricity utilization analysis data set $S$; and $N(s_x^m)$ is the number of occurrences of $s_x^m$ in the candidate 1-item set $C_1$;
the item sets in the candidate 1-item set $C_1$ whose support degree is greater than the minimum frequency threshold $S_{min}$ are retained to obtain the frequent 1-item set $F_1$:
$$F_1=\{\{s_x^m\}\in C_1 \mid \mathrm{Sup}(s_x^m)>S_{min}\} \tag{9}$$
the candidate 2-item set $C_2$ is the set of item sets $\{s_x^m,s_x^l\}$, where $s_x^l$ is the $l$-th element in the $x$-th user information in $S$, and $N(s_x^m,s_x^l)$ is the number of times $s_x^m$ and $s_x^l$ occur simultaneously in the candidate 2-item set $C_2$; the support degree of a 2-item set is:
$$\mathrm{Sup}(s_x^m,s_x^l)=\frac{N(s_x^m,s_x^l)}{N(S_m,S_l)} \tag{10}$$
where $N(S_m,S_l)$ is the total number of occurrences of all value pairs of the $m$-th and $l$-th information elements in $S$; the screened frequent 2-item set $F_2$ must satisfy:
$$\mathrm{Sup}(s_x^m,s_x^l)>S_{min} \tag{11}$$
and, at the same time:
$$\{s_x^m\}\in F_1,\quad \{s_x^l\}\in F_1 \tag{12}$$
that is, $F_2$ consists of the candidate 2-item sets whose support degree is greater than the minimum frequency threshold $S_{min}$, and formula (12) excludes the 2-item sets containing an element that is not in the frequent 1-item set;
calculating in sequence until the frequent $k$-item set $F_k$ is obtained, where the $(k+1)$-item sets no longer satisfy the support threshold condition.
Further, performing confidence judgment on the abnormal electricity utilization analysis data set through the screened frequent item sets to obtain the large-user abnormal electricity utilization analysis data set with relevance comprises the following steps:
the confidence judgment starts from the frequent 2-item set $F_2$: for a 2-item set $\{s_x^m,s_x^l\}$ in $F_2$, the probability that the second element $s_x^l$ occurs given that the first element $s_x^m$ occurs is the confidence of the 2-item set, and the confidence function of the 2-item set is expressed as:
$$\mathrm{Conf}(s_x^m\Rightarrow s_x^l)=\frac{\mathrm{Sup}(s_x^m,s_x^l)}{\mathrm{Sup}(s_x^m)} \tag{13}$$
where $\mathrm{Sup}(s_x^m,s_x^l)$ is the support degree of the 2-item set $\{s_x^m,s_x^l\}$ in $F_2$, and $\mathrm{Sup}(s_x^m)$ is the support degree of the first element $s_x^m$;
judging whether the minimum confidence threshold $C_{min}$ is met determines the strongly correlated frequent 2-item set $R_2$:
$$R_2=\{\{s_x^m,s_x^l\}\in F_2 \mid \mathrm{Conf}(s_x^m\Rightarrow s_x^l)\ge C_{min}\} \tag{14}$$
judging in sequence yields the strongly correlated frequent $k$-item set $R_k$, and thereby the large-user abnormal electricity utilization analysis data set with relevance.
The beneficial effects of the invention are as follows: by judging the support degree and the confidence, the invention mines the relevance among the item sets of the large-user abnormal electricity utilization analysis data set, reduces the volume of data processed by the abnormal electricity utilization analysis algorithm, effectively improves the analysis efficiency of the system, and has a good application prospect.
Detailed Description
The present invention is further described below.
At present, the analysis data for abnormal users is mainly distributed across the electricity utilization information acquisition system, the marketing system, and similar systems. To promote data sharing and improve data interaction efficiency in a unified manner, a marketing basic data platform needs to be constructed. The platform includes electricity utilization acquisition data, marketing data, load control data, event data, and other data related to abnormal electricity utilization analysis and electricity utilization behavior diagnosis, so as to meet the sample data requirements of deep neural networks and comprehensively reflect the electric quantity, power consumption, and other characteristics of abnormal electricity utilization behavior.
Example 1:
A relevance mining method for a large-user abnormal electricity utilization analysis data set performs relevance analysis and pruning on the large-user abnormal electricity utilization analysis data set, and comprises the following steps:
Step one: scanning the large-user abnormal electricity utilization analysis data set $S$, where $S$ consists of the information of $n$ users:
$$S=\{S_1,S_2,\ldots,S_x,\ldots,S_n\} \tag{1}$$
where $x=1,2,\ldots,n$, and the $x$-th user information $S_x$ includes the information of several modules:
$$S_x=\{L_{id},L_{cl},L_{cal},C_{tv}(t),T_{ag}\} \tag{2}$$
where $L_{id}$ is the positioning information module, $L_{cl}$ is the category information module, $L_{cal}$ is the calculation information module, $C_{tv}(t)$ is the time-varying information module, $t$ is time, and $T_{ag}$ is the calibration information module.
The positioning information module $L_{id}$ consists of several encrypted, desensitized information elements used for marking the various types of positioning information of the user, and comprises the following items:
$$L_{id}=\{n_{an},n_{cn},n_{id},n_{ea},n_{lo},n_{oi}\} \tag{3}$$
where $n_{an}$ is the electricity meter bureau number element, $n_{cn}$ is the user number element, $n_{id}$ is the user name element, $n_{ea}$ is the electricity utilization address element, $n_{lo}$ is the line number element, and $n_{oi}$ is the acquisition object identification element.
The category information module $L_{cl}$ comprises the following items:
$$L_{cl}=\{n_{csc},n_{wm},n_{sc},n_{tc},n_{on},n_{etc},n_{ml},n_{mm}\} \tag{4}$$
where $n_{csc}$ is the user type element, $n_{wm}$ is the wiring mode element, $n_{sc}$ is the electricity meter category element, $n_{tc}$ is the industry classification element, $n_{on}$ is the organization code element, $n_{etc}$ is the electrical property element, $n_{ml}$ is the metering point level element, and $n_{mm}$ is the metering mode element.
The calculation information module $L_{cal}$ comprises the following items:
$$L_{cal}=\{n_{cc},n_{mc},n_{vc},n_{mvc},n_{mrc},n_{tf},n_{ctf},n_{vtf}\} \tag{5}$$
where $n_{cc}$ is the contract capacity element, $n_{mc}$ is the metering point capacity element, $n_{vc}$ is the user voltage class element, $n_{mvc}$ is the electricity meter rated voltage element, $n_{mrc}$ is the electricity meter rated current element, $n_{tf}$ is the comprehensive multiplier element, $n_{ctf}$ is the current transformer ratio element, and $n_{vtf}$ is the voltage transformer ratio element.
The time-varying information module $C_{tv}(t)$ comprises the following items:
$$C_{tv}(t)=\{I(t),U(t),P(t),E(t),L(t)\} \tag{6}$$
where $I(t)$ is the current curve element, $U(t)$ is the voltage curve element, $P(t)$ is the power curve element, $E(t)$ is the daily frozen electric quantity curve element, and $L(t)$ is the line loss curve element.
The calibration information module $T_{ag}$ comprises the abnormal category element $T_{type}$:
$$T_{ag}=\{T_{type}\} \tag{7}$$
The values of the abnormal category element include: no abnormality, voltage loss, current loss, series connection, three-phase imbalance, etc.
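The record layout described in step one can be sketched as a small data structure. This is only an illustrative sketch: the class name `UserInfo`, the helper `discrete_items`, and the sample values are hypothetical, and the time-varying curves are left out of the item list on the assumption that they would need some discretization that the text does not describe.

```python
from dataclasses import dataclass
from typing import Dict, List, Set, Tuple


@dataclass
class UserInfo:
    """One user record S_x, split into the five information modules."""
    l_id: Dict[str, str]          # positioning module L_id (desensitized identifiers)
    l_cl: Dict[str, str]          # category module L_cl
    l_cal: Dict[str, str]         # calculation module L_cal
    c_tv: Dict[str, List[float]]  # time-varying module C_tv(t): curve name -> samples
    t_type: str                   # calibration module T_ag: abnormal category T_type


def discrete_items(user: UserInfo) -> Set[Tuple[str, str]]:
    """Flatten the discrete-valued modules into (element name, value) items.

    Assumption: the curves in c_tv are skipped because the text does not say
    how they are discretized before mining.
    """
    items: Set[Tuple[str, str]] = set()
    for module in (user.l_id, user.l_cl, user.l_cal):
        items.update(module.items())
    items.add(("T_type", user.t_type))
    return items


# Hypothetical sample record; all values are made up for illustration.
example_user = UserInfo(
    l_id={"n_cn": "U001"},
    l_cl={"n_wm": "three-phase four-wire"},
    l_cal={"n_ctf": "50/5"},
    c_tv={"I": [1.0, 1.2, 0.0]},
    t_type="current loss",
)
print(sorted(discrete_items(example_user)))
```

Representing each element as a (name, value) pair keeps values from different elements distinct when they are later counted as items.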
Step two: carrying out support degree judgment on the abnormal electricity utilization analysis data set to obtain a plurality of screened frequent item sets.
According to all the items contained in the positioning information module $L_{id}$, the category information module $L_{cl}$, the calculation information module $L_{cal}$, the time-varying information module $C_{tv}(t)$, and the calibration information module $T_{ag}$ of each user in the abnormal electricity utilization analysis data set $S$, the candidate 1-item set $C_1$ is extracted and the support degree is calculated:
$$\mathrm{Sup}(s_x^m)=\frac{N(s_x^m)}{N(S_m)} \tag{8}$$
where $s_x^m$ is the $m$-th element in the $x$-th user information in $S$, and the maximum value of $m$ is the total number of element types in the five information modules; the candidate 1-item set $C_1$ is the set of item sets $\{s_x^m\}$ containing only 1 element, for example the current transformer ratio element set {10/5, 50/5, 100/5, 50/5, 100/5};
$N(S_m)$ is the total number of occurrences of all values of the $m$-th information element in the abnormal electricity utilization analysis data set $S$ (for example, the total number of occurrences of the current transformer ratio element is 5), and $N(s_x^m)$ is the number of occurrences of $s_x^m$ in the candidate 1-item set $C_1$; for the current transformer ratio 50/5, which occurs twice, the support degree is 2/5. The item sets in the candidate 1-item set $C_1$ whose support degree is greater than the minimum frequency threshold $S_{min}$ are retained to form the frequent 1-item set $F_1$, so that meaningless item set rules are filtered out:
$$F_1=\{\{s_x^m\}\in C_1 \mid \mathrm{Sup}(s_x^m)>S_{min}\} \tag{9}$$
where $S_{min}$ is the minimum frequency threshold.
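The 1-item support calculation can be checked against the current transformer ratio example given in the text. The following is a minimal sketch, not the patent's implementation: the function names are hypothetical, and the threshold value 1/5 is an assumption chosen only to show the filtering.

```python
from collections import Counter
from fractions import Fraction


def one_item_support(values):
    """Support of each distinct value: its occurrences / total occurrences."""
    total = len(values)
    return {v: Fraction(c, total) for v, c in Counter(values).items()}


def frequent_one_items(values, s_min):
    """Keep the values whose support is strictly greater than S_min."""
    return {v for v, sup in one_item_support(values).items() if sup > s_min}


# Current transformer ratio values from the example in the text.
ct_ratios = ["10/5", "50/5", "100/5", "50/5", "100/5"]
supports = one_item_support(ct_ratios)
print(supports["50/5"])  # 2/5

# Assumed threshold S_min = 1/5: the ratio 10/5 (support 1/5) is filtered out.
frequent = frequent_one_items(ct_ratios, Fraction(1, 5))
print(sorted(frequent))  # ['100/5', '50/5']
```

Exact fractions are used instead of floats so that the strict comparison against the threshold is not affected by rounding.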
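Scanning the large-user abnormal electricity utilization analysis data set $S$, the candidate 2-item set $C_2$ is extracted and its support degree is calculated:
$$\mathrm{Sup}(s_x^m,s_x^l)=\frac{N(s_x^m,s_x^l)}{N(S_m,S_l)} \tag{10}$$
The candidate 2-item set $C_2$ is the set of item sets $\{s_x^m,s_x^l\}$, where $s_x^l$ is the $l$-th element in the $x$-th user information in $S$; $N(s_x^m,s_x^l)$ is the number of times $s_x^m$ and $s_x^l$ occur simultaneously in the candidate 2-item set $C_2$, and $N(S_m,S_l)$ is the total number of occurrences of all value pairs of the $m$-th and $l$-th information elements in $S$. For example, if $S_m$ is the current transformer ratio element set {10/5, 50/5, 100/5, 50/5, 100/5}, $S_l$ is the abnormal category element set {no abnormality, current loss, voltage loss}, and {50/5, current loss} occurs 2 times, the support degree is 2/5. The frequent 2-item set $F_2$ must satisfy:
$$\mathrm{Sup}(s_x^m,s_x^l)>S_{min} \tag{11}$$
At the same time, the following must be satisfied:
$$\{s_x^m\}\in F_1,\quad \{s_x^l\}\in F_1 \tag{12}$$
The 2-item sets containing an element that is not in the frequent 1-item set are excluded by formula (12). For example, the frequent 2-item set after screening is $F_2=\{\{50/5,\ \text{current loss}\}\}$.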
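A short sketch of the 2-item screening follows. The five records are hypothetical: the text gives the ratio values and the count of the pair {50/5, current loss} but not the full pairing, so the remaining pairs and the threshold 1/5 are assumptions made for illustration.

```python
from collections import Counter
from fractions import Fraction


def frequent_two_items(records, s_min):
    """Return {(m_value, l_value): support} for the frequent 2-item sets.

    A pair is kept only if its own support exceeds S_min and both of its
    elements are frequent 1-items, mirroring the condition of formula (12).
    """
    total = len(records)
    count_m = Counter(m for m, _ in records)
    count_l = Counter(l for _, l in records)
    frequent_m = {v for v, c in count_m.items() if Fraction(c, total) > s_min}
    frequent_l = {v for v, c in count_l.items() if Fraction(c, total) > s_min}

    result = {}
    for (m, l), c in Counter(records).items():
        sup = Fraction(c, total)
        if sup > s_min and m in frequent_m and l in frequent_l:
            result[(m, l)] = sup
    return result


# Hypothetical (current transformer ratio, abnormal category) records.
records = [
    ("10/5", "no abnormality"),
    ("50/5", "current loss"),
    ("100/5", "voltage loss"),
    ("50/5", "current loss"),
    ("100/5", "no abnormality"),
]
f2 = frequent_two_items(records, Fraction(1, 5))
print(f2)
```

With these assumed records only the pair ("50/5", "current loss") survives, with support 2/5, which agrees with the screened result in the example.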
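Scanning the large-user abnormal electricity utilization analysis data set $S$, the candidate 3-item set $C_3$ is extracted and its support degree is calculated:
$$\mathrm{Sup}(s_x^m,s_x^l,s_x^p)=\frac{N(s_x^m,s_x^l,s_x^p)}{N(S_m,S_l,S_p)} \tag{13}$$
where $s_x^p$ is the $p$-th element in the $x$-th user information in $S$. The frequent 3-item set $F_3$ must satisfy:
$$\mathrm{Sup}(s_x^m,s_x^l,s_x^p)>S_{min} \tag{14}$$
At the same time, the following must be satisfied:
$$\{s_x^m,s_x^l\}\in F_2,\quad \{s_x^m,s_x^p\}\in F_2,\quad \{s_x^l,s_x^p\}\in F_2 \tag{15}$$
The frequent 4-item set $F_4$, the frequent 5-item set $F_5$, and so on are calculated according to the above steps, in sequence up to the frequent $k$-item set $F_k$, where the $(k+1)$-item sets no longer satisfy the support threshold condition.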
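The level-by-level procedure resembles the well-known Apriori scheme, and a generic version can be sketched as below. This is an illustrative sketch under assumptions, not the patent's code: the join-and-prune candidate generation is the standard Apriori technique swapped in here, and the element names (`ct`, `type`, `wiring`), the records, and the threshold 1/5 are hypothetical.

```python
from fractions import Fraction
from itertools import combinations


def support(itemset, transactions):
    """Fraction of transactions that contain every item of `itemset`."""
    hits = sum(1 for t in transactions if itemset <= t)
    return Fraction(hits, len(transactions))


def frequent_item_sets(transactions, s_min):
    """Return {k: set of frequent k-item sets}, stopping at the first empty level."""
    items = {item for t in transactions for item in t}
    level = {frozenset([i]) for i in items
             if support(frozenset([i]), transactions) > s_min}
    result, k = {}, 1
    while level:
        result[k] = level
        # Join step: unions of two frequent k-item sets that have k + 1 items.
        candidates = {a | b for a in level for b in level if len(a | b) == k + 1}
        # Prune step: every k-item subset of a candidate must itself be frequent.
        candidates = {c for c in candidates
                      if all(frozenset(sub) in level for sub in combinations(c, k))}
        level = {c for c in candidates if support(c, transactions) > s_min}
        k += 1
    return result


# Hypothetical records: (current transformer ratio, abnormal category, wiring mode).
rows = [
    ("10/5", "no abnormality", "3P4W"),
    ("50/5", "current loss", "3P3W"),
    ("100/5", "voltage loss", "3P4W"),
    ("50/5", "current loss", "3P3W"),
    ("100/5", "no abnormality", "3P4W"),
]
names = ("ct", "type", "wiring")
transactions = [frozenset(zip(names, row)) for row in rows]
levels = frequent_item_sets(transactions, Fraction(1, 5))
print({k: len(v) for k, v in levels.items()})  # {1: 6, 2: 5, 3: 1}
```

For these assumed records the iteration stops at k = 3, because no 4-item candidate can be formed from the single frequent 3-item set.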
Step three: performing confidence judgment on the abnormal electricity utilization analysis data set through the screened frequent item sets to obtain an abnormal electricity utilization analysis data set with relevance.
Based on the support degrees of the abnormal electricity utilization analysis data set, the probability that the elements of each screened frequent item set occur together is evaluated, which measures the accuracy of the information mined from the item sets.
The confidence judgment starts from the frequent 2-item set $F_2$: for a 2-item set $\{s_x^m,s_x^l\}$ in $F_2$, the probability that the second element $s_x^l$ occurs given that the first element $s_x^m$ occurs is the confidence of this 2-item set, and the confidence function of the 2-item set can be expressed as:
$$\mathrm{Conf}(s_x^m\Rightarrow s_x^l)=\frac{\mathrm{Sup}(s_x^m,s_x^l)}{\mathrm{Sup}(s_x^m)} \tag{16}$$
where $\mathrm{Sup}(s_x^m,s_x^l)$ is the support degree of the 2-item set $\{s_x^m,s_x^l\}$ in $F_2$, and $\mathrm{Sup}(s_x^m)$ is the support degree of the first element $s_x^m$.
Judging whether the minimum confidence threshold $C_{min}$ is met determines the strongly correlated frequent 2-item set $R_2$:
$$R_2=\{\{s_x^m,s_x^l\}\in F_2 \mid \mathrm{Conf}(s_x^m\Rightarrow s_x^l)\ge C_{min}\} \tag{17}$$
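The 2-item confidence judgment reduces to a ratio of two support degrees followed by a threshold test. The sketch below assumes precomputed supports; the function names, the sample values, the threshold 4/5, and the use of "greater than or equal to" for meeting the threshold are all assumptions made for illustration.

```python
from fractions import Fraction


def confidence(pair_support, first_support):
    """Conf(first => second) = Sup(first, second) / Sup(first)."""
    return pair_support / first_support


def strongly_correlated_pairs(pair_supports, item_supports, c_min):
    """Keep the frequent pairs whose confidence reaches C_min (assumed: >=)."""
    strong = {}
    for (first, second), sup in pair_supports.items():
        conf = confidence(sup, item_supports[first])
        if conf >= c_min:
            strong[(first, second)] = conf
    return strong


# Hypothetical supports: 50/5 occurs in 2 of 5 records, always with current loss;
# wiring mode 3P4W occurs in 3 of 5 records, 2 of them with no abnormality.
item_supports = {"50/5": Fraction(2, 5), "3P4W": Fraction(3, 5)}
pair_supports = {
    ("50/5", "current loss"): Fraction(2, 5),
    ("3P4W", "no abnormality"): Fraction(2, 5),
}
strong = strongly_correlated_pairs(pair_supports, item_supports, Fraction(4, 5))
print(strong)
```

Here the first pair has confidence 1 and is kept, while the second has confidence 2/3 and falls below the assumed threshold.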
For a 3-item set $\{s_x^m,s_x^l,s_x^p\}$ in the frequent 3-item set $F_3$, the probability that the second and third elements $s_x^l$ and $s_x^p$ occur simultaneously given that the first element $s_x^m$ occurs is the confidence of the 3-item set, and the confidence function of the frequent 3-item set can be expressed as:
$$\mathrm{Conf}(s_x^m\Rightarrow s_x^l,s_x^p)=\frac{\mathrm{Sup}(s_x^m,s_x^l,s_x^p)}{\mathrm{Sup}(s_x^m)} \tag{18}$$
where $\mathrm{Sup}(s_x^m,s_x^l,s_x^p)$ is the support degree of the 3-item set $\{s_x^m,s_x^l,s_x^p\}$ in $F_3$, and $\mathrm{Sup}(s_x^m)$ is the support degree of the element $s_x^m$. Judging whether the minimum confidence threshold $C_{min}$ is met determines the strongly correlated frequent 3-item set $R_3$:
$$R_3=\{\{s_x^m,s_x^l,s_x^p\}\in F_3 \mid \mathrm{Conf}(s_x^m\Rightarrow s_x^l,s_x^p)\ge C_{min}\} \tag{19}$$
The strongly correlated frequent 4-item set $R_4$, the strongly correlated frequent 5-item set $R_5$, and so on are obtained according to the above steps, in sequence up to the strongly correlated frequent $k$-item set $R_k$.
In the strongly correlated frequent 2-item set $R_2$, the strongly correlated frequent 3-item set $R_3$, and so on up to the strongly correlated frequent $k$-item set $R_k$ obtained by this method, the elements of each item set are strongly correlated, and the data items that are meaningless for abnormal electricity utilization analysis have been removed, so these item sets can be used as the basic data for abnormal electricity utilization analysis.
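The confidence test generalizes to item sets of any size by dividing the support of the whole item set by the support of its first element, and the element names that appear in strongly correlated item sets can then be collected as the retained data items. The sketch below is one possible reading under assumptions: the helper names, the records, the threshold 4/5, and the idea of collecting element names as the pruned column list are illustrative and not stated in the text.

```python
from fractions import Fraction


def support(itemset, transactions):
    """Fraction of transactions that contain every item of `itemset`."""
    hits = sum(1 for t in transactions if itemset <= t)
    return Fraction(hits, len(transactions))


def rule_confidence(first, rest, transactions):
    """Conf(first => rest) = Sup(first and rest) / Sup(first)."""
    whole = frozenset([first]) | frozenset(rest)
    return support(whole, transactions) / support(frozenset([first]), transactions)


def retained_elements(ordered_item_sets, transactions, c_min):
    """Element names appearing in item sets whose confidence reaches C_min."""
    kept = set()
    for first, *rest in ordered_item_sets:
        if rule_confidence(first, rest, transactions) >= c_min:
            kept.update(name for name, _ in (first, *rest))
    return kept


# Hypothetical records: (CT ratio, abnormal category, wiring mode, industry).
rows = [
    ("10/5", "no abnormality", "3P4W", "steel"),
    ("50/5", "current loss", "3P3W", "textile"),
    ("100/5", "voltage loss", "3P4W", "steel"),
    ("50/5", "current loss", "3P3W", "steel"),
    ("100/5", "no abnormality", "3P4W", "textile"),
]
names = ("ct", "type", "wiring", "industry")
transactions = [frozenset(zip(names, row)) for row in rows]

# Frequent item sets written with their first element first (assumed input).
ordered_item_sets = [
    (("ct", "50/5"), ("type", "current loss"), ("wiring", "3P3W")),
    (("wiring", "3P4W"), ("industry", "steel")),
]
kept = retained_elements(ordered_item_sets, transactions, Fraction(4, 5))
print(sorted(kept))  # ['ct', 'type', 'wiring']
```

In this example the 3-item set has confidence 1 and keeps its three elements, while the pair involving the industry element has confidence 2/3 and is dropped.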
In conclusion, the invention constructs a relevance mining method for the large-user abnormal electricity utilization analysis data set. Through support degree and confidence judgment, it mines the association relations among the internal elements of the abnormal electricity utilization analysis data set and removes meaningless association rules, which effectively reduces the amount of calculation of subsequent data analysis algorithms and gives the method a good application prospect.
Example 2:
A relevance mining device for a large-user abnormal electricity utilization analysis data set comprises:
a frequent item set screening module, used for carrying out support degree judgment on the large-user abnormal electricity utilization analysis data set to obtain a plurality of screened frequent item sets;
and a relevance mining module, used for performing confidence judgment on the abnormal electricity utilization analysis data set through the screened frequent item sets to obtain a large-user abnormal electricity utilization analysis data set with relevance.
Further, the large user abnormal electricity utilization analysis data set S is composed of n user information:
S={S1,S2,...,Sx,...,Sn} (76)
where x is 1,2, …, n, the xth user information SxComprises the following steps:
Sx={Lid,Lcl,Lcal,Ctv(t),Tag} (77)
wherein L isidFor locating information modules, LclIs a category information module, LcalFor calculating information modules, Ctv(T) is a time-varying information module, T is time, TagIs a calibration information module.
Further, the positioning information module $L_{id}$ is used to mark the various types of positioning information of the user, and comprises:
$$L_{id}=\{n_{an},n_{cn},n_{id},n_{ea},n_{lo},n_{oi}\} \tag{78}$$
where $n_{an}$ is the meter bureau number element, $n_{cn}$ is the user number element, $n_{id}$ is the user name element, $n_{ea}$ is the electricity utilization address element, $n_{lo}$ is the line number element, and $n_{oi}$ is the identity information element of the collected object;
the category information module $L_{cl}$ comprises:
$$L_{cl}=\{n_{csc},n_{wm},n_{sc},n_{tc},n_{on},n_{etc},n_{ml},n_{mm}\} \tag{79}$$
where $n_{csc}$ is the user type element, $n_{wm}$ is the wiring mode element, $n_{sc}$ is the electric meter category element, $n_{tc}$ is the industry classification element, $n_{on}$ is the organization code element, $n_{etc}$ is the electricity utilization property element, $n_{ml}$ is the metering point level element, and $n_{mm}$ is the metering mode element;
the calculation information module $L_{cal}$ comprises:
$$L_{cal}=\{n_{cc},n_{mc},n_{vc},n_{mvc},n_{mrc},n_{tf},n_{ctf},n_{vtf}\} \tag{80}$$
where $n_{cc}$ is the contract capacity element, $n_{mc}$ is the metering point capacity element, $n_{vc}$ is the user voltage class element, $n_{mvc}$ is the meter rated voltage element, $n_{mrc}$ is the meter rated current element, $n_{tf}$ is the comprehensive multiplying power element, $n_{ctf}$ is the current transformer multiplying power element, and $n_{vtf}$ is the voltage transformer multiplying power element;
the time-varying information module $C_{tv}(t)$ comprises:
$$C_{tv}(t)=\{I(t),U(t),P(t),E(t),L(t)\} \tag{81}$$
where $I(t)$ is the current curve element, $U(t)$ is the voltage curve element, $P(t)$ is the power curve element, $E(t)$ is the daily frozen electric quantity curve element, and $L(t)$ is the line loss curve element;
the calibration information module $T_{ag}$ contains the abnormality category element $T_{type}$:
$$T_{ag}=\{T_{type}\} \tag{82}$$
The abnormality category elements include: no abnormality, voltage loss, current loss, series connection, and three-phase imbalance.
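The five-module structure of one user information entry can be sketched as a small data type. This is only an illustrative layout: the class name `UserRecord`, its field names, and the helper `static_items` are hypothetical and do not appear in the filing; the keys (`n_cn`, `n_wm`, `T_type`, and so on) mirror the element symbols defined above.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class UserRecord:
    """One user information entry S_x, grouped into the five modules."""
    location: Dict[str, str]              # L_id: n_an, n_cn, n_id, n_ea, n_lo, n_oi
    category: Dict[str, str]              # L_cl: n_csc, n_wm, n_sc, n_tc, ...
    calculation: Dict[str, float]         # L_cal: n_cc, n_mc, n_vc, ...
    time_varying: Dict[str, List[float]]  # C_tv(t): sampled curves I, U, P, E, L
    tag: str                              # T_ag = {T_type}

    def static_items(self) -> Dict[str, object]:
        """Flatten the non-time-varying elements into one name -> value map."""
        items: Dict[str, object] = {}
        items.update(self.location)
        items.update(self.category)
        items.update(self.calculation)
        items["T_type"] = self.tag
        return items


# Hypothetical example record (all values invented for illustration).
example = UserRecord(
    location={"n_cn": "U001"},
    category={"n_wm": "three-phase three-wire"},
    calculation={"n_cc": 630.0},
    time_varying={"I": [1.0, 1.2, 0.9]},
    tag="voltage loss",
)
```

Flattening the static elements into name/value pairs is one convenient way to feed such records into item-set counting; how the curve elements of $C_{tv}(t)$ would be discretized into items is not specified here and is left out of the sketch.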
Further, performing support degree judgment on the large-user abnormal electricity utilization analysis data set to obtain a plurality of screened frequent item sets comprises the following steps:
extracting the candidate 1-item set from the large-user abnormal electricity utilization analysis data set $S$ and calculating its support degree, wherein the candidate 1-item set is the set of item sets containing only one element, each element being the $m$-th element in the $x$-th user information in $S$; the support degree is calculated from the total number of occurrences of all values of the $m$-th information element in the abnormal electricity utilization analysis data set $S$ and the number of times the element occurs in the candidate 1-item set;
retaining the elements of the candidate 1-item set whose support degree is greater than the minimum frequency threshold $S_{min}$ to obtain the frequent 1-item set;
forming the candidate 2-item set, which is the set of item sets pairing the $m$-th element with the $l$-th element in the $x$-th user information in $S$, and counting the number of times the two elements occur simultaneously in the candidate 2-item set; the screened frequent 2-item set consists of the candidate 2-item sets whose support degree is greater than the minimum frequency threshold $S_{min}$, and formula (12) additionally excludes 2-item sets whose elements do not belong to the frequent 1-item set;
calculating in sequence until the frequent $k$-item set is obtained, where no $(k+1)$-item set satisfies the support threshold condition.
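The support screening steps above can be sketched for the 1-item and 2-item levels as follows. This is a minimal sketch, not the filing's implementation: the function name `frequent_itemsets` and the record layout (one dict of element name to value per user) are hypothetical, and it assumes every record carries a value for every element, so that the total number of occurrences of all values of an element equals the number of records.

```python
from collections import Counter
from itertools import combinations


def frequent_itemsets(records, s_min):
    """Return (frequent_1, frequent_2), each mapping an item set to its support.

    records: list of dicts mapping element name -> value (hypothetical layout).
    s_min:   minimum frequency threshold S_min; support must be strictly greater.
    """
    n = len(records)

    # Candidate 1-item sets: every (element name, value) pair seen in S.
    count_1 = Counter(item for rec in records for item in rec.items())
    frequent_1 = {item: c / n for item, c in count_1.items() if c / n > s_min}

    # Candidate 2-item sets are built only from frequent 1-item elements,
    # which excludes pairs containing an infrequent element.
    count_2 = Counter()
    for rec in records:
        kept = sorted(item for item in rec.items() if item in frequent_1)
        for pair in combinations(kept, 2):
            count_2[pair] += 1
    frequent_2 = {pair: c / n for pair, c in count_2.items() if c / n > s_min}
    return frequent_1, frequent_2


# Hypothetical data: wiring mode n_wm and abnormality category T_type.
records = [
    {"n_wm": "3P3W", "T_type": "voltage loss"},
    {"n_wm": "3P3W", "T_type": "voltage loss"},
    {"n_wm": "3P4W", "T_type": "no abnormality"},
    {"n_wm": "3P3W", "T_type": "no abnormality"},
]
frequent_1, frequent_2 = frequent_itemsets(records, s_min=0.4)
```

Extending the same loop to 3-item sets and beyond follows the usual level-wise pattern: candidates at level k+1 are generated only from item sets that survived level k, and the process stops when no candidate exceeds the threshold.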
Further, performing confidence judgment on the abnormal electricity utilization analysis data set through the screened frequent item sets to obtain the large-user abnormal electricity utilization analysis data set with relevance comprises the following steps:
starting confidence discrimination from the frequent 2-item set: for a given 2-item set in the frequent 2-item set, the probability that the second item of the item set takes its element value when the first item takes its element value is the confidence of that 2-item set, and the confidence function of the 2-item set is expressed in terms of the support degree of the given 2-item set in the frequent 2-item set and the support degree of its first-item element;
judging whether the confidence meets the minimum confidence threshold $C_{min}$ to determine the strongly correlated frequent 2-item set;
judging in sequence to obtain the strongly correlated frequent $k$-item set, and thereby obtaining the large-user abnormal electricity utilization analysis data set with relevance.
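The confidence judgment for 2-item sets can be sketched in the same style. The function name `strongly_correlated_pairs` and the input dictionaries are hypothetical; the sketch assumes the confidence is the support degree of the 2-item set divided by the support degree of its first-item element, and it treats "meets the threshold" as greater than or equal to $C_{min}$, which the text does not state explicitly.

```python
def strongly_correlated_pairs(frequent_1, frequent_2, c_min):
    """Keep the frequent 2-item sets whose confidence meets c_min.

    frequent_1: dict mapping an element to its support degree.
    frequent_2: dict mapping (first, second) to the pair's support degree.
    c_min:      minimum confidence threshold C_min.
    """
    strong = {}
    for (first, second), support_pair in frequent_2.items():
        # Confidence: probability of the second item given the first item.
        confidence = support_pair / frequent_1[first]
        if confidence >= c_min:
            strong[(first, second)] = confidence
    return strong


# Hypothetical supports, invented for illustration only.
supports_1 = {("n_wm", "3P3W"): 0.75, ("T_type", "voltage loss"): 0.5}
supports_2 = {(("n_wm", "3P3W"), ("T_type", "voltage loss")): 0.5}

strong = strongly_correlated_pairs(supports_1, supports_2, c_min=0.6)
```

With these numbers the confidence is 0.5 / 0.75, about 0.667, so the pair is retained at `c_min=0.6` and would be dropped at `c_min=0.7`.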
The foregoing illustrates and describes the principles, general features, and characteristics of the present invention. It will be understood by those skilled in the art that the present invention is not limited to the embodiments described above, which are described in the specification and illustrated only to illustrate the principle of the present invention, but that various changes and modifications may be made therein without departing from the spirit and scope of the present invention, which fall within the scope of the invention as claimed. The scope of the invention is defined by the appended claims and equivalents thereof.
Claims (10)
1. A relevance mining method for a large-user abnormal electricity utilization analysis data set is characterized by comprising the following steps:
carrying out support degree judgment on the abnormal electricity utilization analysis data set of the large user to obtain a plurality of screened frequent item sets;
and performing confidence judgment on the abnormal electricity utilization analysis data set through the screened frequent item sets to obtain a large-user abnormal electricity utilization analysis data set with relevance.
2. The relevance mining method for the large-user abnormal electricity utilization analysis data set according to claim 1, characterized in that: the large-user abnormal electricity utilization analysis data set $S$ is composed of $n$ pieces of user information:
$$S=\{S_1,S_2,\ldots,S_x,\ldots,S_n\} \tag{1}$$
where $x=1,2,\ldots,n$, and the $x$-th user information $S_x$ is:
$$S_x=\{L_{id},L_{cl},L_{cal},C_{tv}(t),T_{ag}\} \tag{2}$$
where $L_{id}$ is the positioning information module, $L_{cl}$ is the category information module, $L_{cal}$ is the calculation information module, $C_{tv}(t)$ is the time-varying information module, $t$ is time, and $T_{ag}$ is the calibration information module.
3. The relevance mining method for the large-user abnormal electricity utilization analysis data set according to claim 2, characterized in that: the positioning information module $L_{id}$ is used to mark the various types of positioning information of the user, and comprises:
$$L_{id}=\{n_{an},n_{cn},n_{id},n_{ea},n_{lo},n_{oi}\} \tag{3}$$
where $n_{an}$ is the meter bureau number element, $n_{cn}$ is the user number element, $n_{id}$ is the user name element, $n_{ea}$ is the electricity utilization address element, $n_{lo}$ is the line number element, and $n_{oi}$ is the identity information element of the collected object;
the category information module $L_{cl}$ comprises:
$$L_{cl}=\{n_{csc},n_{wm},n_{sc},n_{tc},n_{on},n_{etc},n_{ml},n_{mm}\} \tag{4}$$
where $n_{csc}$ is the user type element, $n_{wm}$ is the wiring mode element, $n_{sc}$ is the electric meter category element, $n_{tc}$ is the industry classification element, $n_{on}$ is the organization code element, $n_{etc}$ is the electricity utilization property element, $n_{ml}$ is the metering point level element, and $n_{mm}$ is the metering mode element;
the calculation information module $L_{cal}$ comprises:
$$L_{cal}=\{n_{cc},n_{mc},n_{vc},n_{mvc},n_{mrc},n_{tf},n_{ctf},n_{vtf}\} \tag{5}$$
where $n_{cc}$ is the contract capacity element, $n_{mc}$ is the metering point capacity element, $n_{vc}$ is the user voltage class element, $n_{mvc}$ is the meter rated voltage element, $n_{mrc}$ is the meter rated current element, $n_{tf}$ is the comprehensive multiplying power element, $n_{ctf}$ is the current transformer multiplying power element, and $n_{vtf}$ is the voltage transformer multiplying power element;
the time-varying information module $C_{tv}(t)$ comprises:
$$C_{tv}(t)=\{I(t),U(t),P(t),E(t),L(t)\} \tag{6}$$
where $I(t)$ is the current curve element, $U(t)$ is the voltage curve element, $P(t)$ is the power curve element, $E(t)$ is the daily frozen electric quantity curve element, and $L(t)$ is the line loss curve element;
the calibration information module $T_{ag}$ contains the abnormality category element $T_{type}$:
$$T_{ag}=\{T_{type}\} \tag{7}$$
The abnormality category elements include: no abnormality, voltage loss, current loss, series connection, and three-phase imbalance.
4. The relevance mining method for the large-user abnormal electricity utilization analysis data set according to claim 2, characterized in that: performing support degree judgment on the large-user abnormal electricity utilization analysis data set to obtain a plurality of screened frequent item sets comprises the following steps:
extracting the candidate 1-item set from the large-user abnormal electricity utilization analysis data set $S$ and calculating its support degree, wherein the candidate 1-item set is the set of item sets containing only one element, each element being the $m$-th element in the $x$-th user information in $S$; the support degree is calculated from the total number of occurrences of all values of the $m$-th information element in the abnormal electricity utilization analysis data set $S$ and the number of times the element occurs in the candidate 1-item set;
retaining the elements of the candidate 1-item set whose support degree is greater than the minimum frequency threshold $S_{min}$ to obtain the frequent 1-item set;
forming the candidate 2-item set, which is the set of item sets pairing the $m$-th element with the $l$-th element in the $x$-th user information in $S$, and counting the number of times the two elements occur simultaneously in the candidate 2-item set; the screened frequent 2-item set consists of the candidate 2-item sets whose support degree is greater than the minimum frequency threshold $S_{min}$, and formula (12) additionally excludes 2-item sets whose elements do not belong to the frequent 1-item set.
5. The relevance mining method for the large-user abnormal electricity utilization analysis data set according to claim 1, characterized in that: performing confidence judgment on the abnormal electricity utilization analysis data set through the screened frequent item sets to obtain the large-user abnormal electricity utilization analysis data set with relevance comprises the following steps:
starting confidence discrimination from the frequent 2-item set: for a given 2-item set in the frequent 2-item set, the probability that the second item of the item set takes its element value when the first item takes its element value is the confidence of that 2-item set, and the confidence function of the 2-item set is expressed in terms of the support degree of the given 2-item set in the frequent 2-item set and the support degree of its first-item element;
judging whether the confidence meets the minimum confidence threshold $C_{min}$ to determine the strongly correlated frequent 2-item set.
6. A relevance mining device for a large-user abnormal electricity utilization analysis data set, characterized by comprising:
the frequent item set screening module is used for judging the support degree of the abnormal electricity utilization analysis data set of the large user to obtain a plurality of screened frequent item sets;
and the relevance mining module is used for carrying out confidence judgment on the abnormal electricity utilization analysis data set through the screened multiple frequent item sets to obtain a large-user abnormal electricity utilization analysis data set with relevance.
7. The relevance mining device for the large-user abnormal electricity utilization analysis data set according to claim 6, characterized in that: the large-user abnormal electricity utilization analysis data set $S$ is composed of $n$ pieces of user information:
$$S=\{S_1,S_2,\ldots,S_x,\ldots,S_n\} \tag{15}$$
where $x=1,2,\ldots,n$, and the $x$-th user information $S_x$ is:
$$S_x=\{L_{id},L_{cl},L_{cal},C_{tv}(t),T_{ag}\} \tag{16}$$
where $L_{id}$ is the positioning information module, $L_{cl}$ is the category information module, $L_{cal}$ is the calculation information module, $C_{tv}(t)$ is the time-varying information module, $t$ is time, and $T_{ag}$ is the calibration information module.
8. The relevance mining device for the large-user abnormal electricity utilization analysis data set according to claim 7, characterized in that: the positioning information module $L_{id}$ is used to mark the various types of positioning information of the user, and comprises:
$$L_{id}=\{n_{an},n_{cn},n_{id},n_{ea},n_{lo},n_{oi}\} \tag{17}$$
where $n_{an}$ is the meter bureau number element, $n_{cn}$ is the user number element, $n_{id}$ is the user name element, $n_{ea}$ is the electricity utilization address element, $n_{lo}$ is the line number element, and $n_{oi}$ is the identity information element of the collected object;
the category information module $L_{cl}$ comprises:
$$L_{cl}=\{n_{csc},n_{wm},n_{sc},n_{tc},n_{on},n_{etc},n_{ml},n_{mm}\} \tag{18}$$
where $n_{csc}$ is the user type element, $n_{wm}$ is the wiring mode element, $n_{sc}$ is the electric meter category element, $n_{tc}$ is the industry classification element, $n_{on}$ is the organization code element, $n_{etc}$ is the electricity utilization property element, $n_{ml}$ is the metering point level element, and $n_{mm}$ is the metering mode element;
the calculation information module $L_{cal}$ comprises:
$$L_{cal}=\{n_{cc},n_{mc},n_{vc},n_{mvc},n_{mrc},n_{tf},n_{ctf},n_{vtf}\} \tag{19}$$
where $n_{cc}$ is the contract capacity element, $n_{mc}$ is the metering point capacity element, $n_{vc}$ is the user voltage class element, $n_{mvc}$ is the meter rated voltage element, $n_{mrc}$ is the meter rated current element, $n_{tf}$ is the comprehensive multiplying power element, $n_{ctf}$ is the current transformer multiplying power element, and $n_{vtf}$ is the voltage transformer multiplying power element;
the time-varying information module $C_{tv}(t)$ comprises:
$$C_{tv}(t)=\{I(t),U(t),P(t),E(t),L(t)\} \tag{20}$$
where $I(t)$ is the current curve element, $U(t)$ is the voltage curve element, $P(t)$ is the power curve element, $E(t)$ is the daily frozen electric quantity curve element, and $L(t)$ is the line loss curve element;
the calibration information module $T_{ag}$ contains the abnormality category element $T_{type}$:
$$T_{ag}=\{T_{type}\} \tag{21}$$
The abnormality category elements include: no abnormality, voltage loss, current loss, series connection, and three-phase imbalance.
9. The relevance mining device for the large-user abnormal electricity utilization analysis data set according to claim 7, characterized in that: performing support degree judgment on the large-user abnormal electricity utilization analysis data set to obtain a plurality of screened frequent item sets comprises the following steps:
extracting the candidate 1-item set from the large-user abnormal electricity utilization analysis data set $S$ and calculating its support degree, wherein the candidate 1-item set is the set of item sets containing only one element, each element being the $m$-th element in the $x$-th user information in $S$; the support degree is calculated from the total number of occurrences of all values of the $m$-th information element in the abnormal electricity utilization analysis data set $S$ and the number of times the element occurs in the candidate 1-item set;
retaining the elements of the candidate 1-item set whose support degree is greater than the minimum frequency threshold $S_{min}$ to obtain the frequent 1-item set;
forming the candidate 2-item set, which is the set of item sets pairing the $m$-th element with the $l$-th element in the $x$-th user information in $S$, and counting the number of times the two elements occur simultaneously in the candidate 2-item set; the screened frequent 2-item set consists of the candidate 2-item sets whose support degree is greater than the minimum frequency threshold $S_{min}$, and formula (12) additionally excludes 2-item sets whose elements do not belong to the frequent 1-item set.
10. The relevance mining device for the large-user abnormal electricity utilization analysis data set according to claim 6, characterized in that: performing confidence judgment on the abnormal electricity utilization analysis data set through the screened frequent item sets to obtain the large-user abnormal electricity utilization analysis data set with relevance comprises the following steps:
starting confidence discrimination from the frequent 2-item set: for a given 2-item set in the frequent 2-item set, the probability that the second item of the item set takes its element value when the first item takes its element value is the confidence of that 2-item set, and the confidence function of the 2-item set is expressed in terms of the support degree of the given 2-item set in the frequent 2-item set and the support degree of its first-item element;
judging whether the confidence meets the minimum confidence threshold $C_{min}$ to determine the strongly correlated frequent 2-item set.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202011203430.5A CN112330136A (en) | 2020-11-02 | 2020-11-02 | Relevance mining method and device for abnormal electricity utilization analysis data set of large user |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202011203430.5A CN112330136A (en) | 2020-11-02 | 2020-11-02 | Relevance mining method and device for abnormal electricity utilization analysis data set of large user |
Publications (1)
Publication Number | Publication Date |
---|---|
CN112330136A true CN112330136A (en) | 2021-02-05 |
Family
ID=74324244
Family Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202011203430.5A Pending CN112330136A (en) | 2020-11-02 | 2020-11-02 | Relevance mining method and device for abnormal electricity utilization analysis data set of large user |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN112330136A (en) |
-
2020
- 2020-11-02 CN CN202011203430.5A patent/CN112330136A/en active Pending
Non-Patent Citations (2)
Title |
---|
李智雄等: ""高速公路用电行为异常检测算法及预警模型研究"", 《交通世界》 * |
段晓萌等: ""基于FP-growth算法的用电异常数据挖掘方法"", 《电子技术应用》 * |
Cited By (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN115878908A (en) * | 2023-01-09 | 2023-03-31 | 华南理工大学 | Social network influence maximization method and system based on graph attention machine mechanism |
CN115878908B (en) * | 2023-01-09 | 2023-06-02 | 华南理工大学 | Social network influence maximization method and system of graph annotation meaning force mechanism |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN110223196B (en) | Anti-electricity-stealing analysis method based on typical industry feature library and anti-electricity-stealing sample library | |
CN110097297B (en) | Multi-dimensional electricity stealing situation intelligent sensing method, system, equipment and medium | |
CN110781332A (en) | Electric power resident user daily load curve clustering method based on composite clustering algorithm | |
CN111738462B (en) | Fault first-aid repair active service early warning method for electric power metering device | |
CN107230013B (en) | Method for identifying abnormal power consumption and time positioning of distribution network users under unsupervised learning | |
CN110930198A (en) | Electric energy substitution potential prediction method and system based on random forest, storage medium and computer equipment | |
CN109409444B (en) | Multivariate power grid fault type discrimination method based on prior probability | |
CN114519514B (en) | Low-voltage transformer area reasonable line loss value measuring and calculating method, system and computer equipment | |
CN109947815B (en) | Power theft identification method based on outlier algorithm | |
CN114611738A (en) | Load prediction method based on user electricity consumption behavior analysis | |
CN112184489A (en) | Power consumer grouping management system and method | |
CN111191909A (en) | Electricity stealing identification system based on data analysis of typical electricity stealing industry and historical electricity stealing sample library | |
CN111914942A (en) | Multi-table-combined one-use energy anomaly analysis method | |
CN106846170B (en) | Generator set trip monitoring method and monitoring device thereof | |
CN108596227A (en) | A kind of leading influence factor method for digging of user power utilization behavior | |
CN112330136A (en) | Relevance mining method and device for abnormal electricity utilization analysis data set of large user | |
Grigoras et al. | Processing of smart meters data for peak load estimation of consumers | |
CN117277566B (en) | Power grid data analysis power dispatching system and method based on big data | |
CN111861587A (en) | System and method for analyzing residential electricity consumption behavior based on hidden Markov model and forward algorithm | |
CN111639792A (en) | Method for intelligently adding bank ATM (automatic teller machine) money based on artificial intelligence | |
CN114372835B (en) | Comprehensive energy service potential customer identification method, system and computer equipment | |
CN112924743B (en) | Instrument state detection method based on current data | |
CN112487991B (en) | High-precision load identification method and system based on characteristic self-learning | |
CN115147242A (en) | Power grid data management system based on data mining | |
CN114066219A (en) | Electricity stealing analysis method for intelligently identifying electricity utilization abnormal points under incidence matrix |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
RJ01 | Rejection of invention patent application after publication | Application publication date: 20210205 |