CN105373894A - Inspection data-based power marketing service diagnosis model establishing method and system - Google Patents

Inspection data-based power marketing service diagnosis model establishing method and system

Info

Publication number
CN105373894A
Authority
CN
China
Prior art keywords
abnormal
information
diagnosis
model
determining
Legal status
Pending
Application number
CN201510817672.6A
Other languages
Chinese (zh)
Inventor
吴峰
武华
余飞鸥
吕浩晖
刘飞
潘炜
伍笑颜
陈碧仪
陈敬红
吴疆
Current Assignee
Guangzhou Power Supply Bureau Co Ltd
Original Assignee
Guangzhou Power Supply Bureau Co Ltd
Application filed by Guangzhou Power Supply Bureau Co Ltd
Priority to CN201510817672.6A
Publication of CN105373894A

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q10/00 Administration; Management
    • G06Q10/06 Resources, workflows, human or project management; Enterprise or organisation planning; Enterprise or organisation modelling
    • G06Q10/063 Operations research, analysis or management
    • G06Q10/0639 Performance analysis of employees; Performance analysis of enterprise or organisation operations
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q50/00 Information and communication technology [ICT] specially adapted for implementation of business processes of specific business sectors, e.g. utilities or tourism
    • G06Q50/06 Energy or water supply

Landscapes

  • Business, Economics & Management (AREA)
  • Human Resources & Organizations (AREA)
  • Engineering & Computer Science (AREA)
  • Economics (AREA)
  • Strategic Management (AREA)
  • Theoretical Computer Science (AREA)
  • Entrepreneurship & Innovation (AREA)
  • Educational Administration (AREA)
  • Marketing (AREA)
  • Development Economics (AREA)
  • Health & Medical Sciences (AREA)
  • Tourism & Hospitality (AREA)
  • Physics & Mathematics (AREA)
  • General Business, Economics & Management (AREA)
  • General Physics & Mathematics (AREA)
  • Public Health (AREA)
  • Primary Health Care (AREA)
  • Water Supply & Treatment (AREA)
  • General Health & Medical Sciences (AREA)
  • Game Theory and Decision Science (AREA)
  • Operations Research (AREA)
  • Quality & Reliability (AREA)
  • Complex Calculations (AREA)

Abstract

The invention relates to a method and system for establishing a power marketing service diagnosis model based on inspection data. The method includes the following steps: abnormal data information is acquired according to acquisition conditions; physical examination analysis is performed on the abnormal data information to determine association rules; an expert sample library is established according to the association rules; and a diagnosis model of inspection abnormalities is established according to the expert sample library. With this method and system, the abnormality type and abnormality degree of abnormalities in power marketing data can be analyzed, which provides support for power inspection work.

Description

Method and system for establishing electric power marketing service diagnosis model based on inspection data
Technical Field
The invention relates to the field of electric power marketing monitoring systems, in particular to a method and a system for establishing an electric power marketing service diagnosis model based on inspection data.
Background
The electric power marketing inspection is the whole process check, management and supervision of the electric power marketing working quality and the service quality by a power supply enterprise according to national laws and regulations and enterprise regulations. The marketing inspection and monitoring is to supervise and check the behaviors of the units and personnel engaged in the electric power marketing work of the power supply enterprise in the electric power marketing process according to relevant national policies, laws, regulations and management regulations related to the marketing of the power supply enterprise.
To carry out electric power marketing inspection work and comprehensively improve the informatization level and application efficiency of the electric power marketing system, multi-dimensional analysis and in-depth mining need to be performed on the abnormal information found in monitoring and inspection, and a complete electric power marketing inspection diagnosis model needs to be built so that dead data becomes useful information supporting marketing decisions. This improves the management of historical marketing inspection data and provides strong decision support for marketing inspection. Mining the association relations between inspection objects in the historical inspection data yields reasonable rules, which provide a basis for marketing management decisions, help prevent marketing risks comprehensively, improve marketing operation capability, customer service capability and management control capability, and offer guidance to inspection personnel in carrying out inspection work.
Disclosure of Invention
In view of the foregoing, there is a need for a method and system for establishing a diagnostic model that supports power marketing inspection work.
A method for establishing a power marketing service diagnosis model based on inspection data comprises the following steps:
acquiring abnormal data information according to the acquisition condition;
performing physical examination analysis on the abnormal data information to determine an association rule;
establishing an expert sample library according to the association rule;
and establishing a diagnosis model for inspecting the abnormity according to the expert sample library.
According to the method for establishing the inspection data-based electric power marketing service diagnosis model, abnormal data information is acquired according to acquisition conditions, then physical examination analysis is carried out on the abnormal data information to determine association rules, an expert sample library is established according to the association rules, and finally the inspection abnormal diagnosis model is established according to the expert sample library. Therefore, the abnormity type and the abnormity degree of the abnormity in the power marketing data can be diagnosed, and support is provided for power marketing inspection work.
A system for establishing a power marketing service diagnosis model based on inspection data comprises:
the abnormal acquisition module is used for acquiring abnormal data information according to the acquisition condition;
the rule determining module is used for performing physical examination analysis on the abnormal data information to determine an association rule;
the sample determining module is used for establishing an expert sample library according to the association rule;
and the model establishing module is used for establishing a diagnosis model for inspecting the abnormity according to the expert sample library.
According to the system for establishing the inspection data-based electric power marketing service diagnosis model, the abnormal data information is firstly acquired by the abnormal acquisition module according to the acquisition condition, then the association rule is determined by the rule determination module through physical examination analysis on the abnormal data information, then the expert sample base is established by the sample determination module according to the association rule, and finally the diagnosis model for inspecting the abnormal data is established by the model establishment module according to the expert sample base. Therefore, the abnormity type and the abnormity degree of the abnormity in the power marketing data can be diagnosed, and support is provided for power marketing inspection work.
Drawings
FIG. 1 is a flow chart of a method for establishing a diagnosis model of electric marketing service based on audit data according to an embodiment;
FIG. 2 is a flow chart of a method for establishing a diagnosis model of electric marketing service based on audit data according to another embodiment;
FIG. 3 is a detailed flow chart of a step of FIG. 1;
FIG. 4 is a detailed flow chart of another step of FIG. 1;
FIG. 5 is a block diagram of a system for building a diagnosis model of electric marketing service based on audit data according to one embodiment;
FIG. 6 is a block diagram of a system for building a diagnosis model of electric marketing service based on audit data according to another embodiment;
FIG. 7 is a block diagram of one of the modules of FIG. 5;
FIG. 8 is a unit configuration diagram of another module of FIG. 5.
Detailed Description
To facilitate an understanding of the invention, the invention will now be described more fully with reference to the accompanying drawings. Preferred embodiments of the present invention are shown in the drawings. This invention may, however, be embodied in many different forms and should not be construed as limited to the embodiments set forth herein. Rather, these embodiments are provided so that this disclosure will be thorough and complete.
Unless defined otherwise, all technical and scientific terms used herein have the same meaning as commonly understood by one of ordinary skill in the art to which this invention belongs. The terminology used in the description of the invention herein is for the purpose of describing particular embodiments only and is not intended to be limiting of the invention. As used herein, the term "or/and" includes any and all combinations of one or more of the associated listed items.
As shown in fig. 1, a method for establishing a power marketing service diagnosis model based on audit data includes the steps of:
S100: Acquire abnormal data information according to the acquisition condition.
The acquisition condition may be a user-defined query condition, or a query condition preset in the system that implements the method for establishing the power marketing service diagnosis model based on inspection data. The abnormal data information is historical abnormal electric power marketing data found when electric power marketing data was inspected on the existing inspection platform.
In one embodiment, before the step of acquiring abnormal data information according to the acquisition condition, the method further includes the step of acquiring the user-defined acquisition condition.
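Step S100 can be pictured as filtering historical records with a predicate. The following is only a sketch under stated assumptions: the record fields (`id`, `category`, `found_on`), the function `collect_abnormal`, and both conditions are hypothetical names chosen for illustration, since the source does not specify a data format.

```python
from datetime import date
from typing import Callable, Dict, List

Record = Dict[str, object]
Condition = Callable[[Record], bool]


def collect_abnormal(records: List[Record], condition: Condition) -> List[Record]:
    """S100: keep only the abnormal records that satisfy the acquisition condition."""
    return [record for record in records if condition(record)]


# Hypothetical historical abnormal records from an inspection platform.
history = [
    {"id": 1, "category": "billing", "found_on": date(2014, 11, 3)},
    {"id": 2, "category": "metering", "found_on": date(2015, 2, 17)},
    {"id": 3, "category": "billing", "found_on": date(2015, 6, 9)},
]


def preset_condition(record: Record) -> bool:
    """Assumed system default: records found since the start of 2015."""
    return record["found_on"] >= date(2015, 1, 1)


def user_condition(record: Record) -> bool:
    """Assumed user-defined condition: recent billing abnormalities only."""
    return record["category"] == "billing" and preset_condition(record)


preset_ids = [r["id"] for r in collect_abnormal(history, preset_condition)]
user_ids = [r["id"] for r in collect_abnormal(history, user_condition)]
```

Passing the condition as a callable keeps the preset and user-defined cases on the same code path.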
S200: Perform physical examination analysis on the abnormal data information to determine an association rule.
S300: Establish an expert sample library according to the association rule.
S400: Establish a diagnosis model for inspection abnormalities according to the expert sample library.
In this way, historical abnormal data information is fully used and turned into useful information that supports marketing decisions, and a diagnosis model capable of diagnosing inspection abnormalities is established. The diagnosis model diagnoses monitored real-time or historical electric power marketing data and determines the abnormality type and abnormality degree of the abnormalities in that data, thereby supporting electric power marketing inspection work; the diagnosed abnormal services and difficult customers can then be tracked with priority.
According to the method for establishing the inspection data-based electric power marketing service diagnosis model, abnormal data information is acquired according to acquisition conditions, then physical examination analysis is carried out on the abnormal data information to determine association rules, an expert sample library is established according to the association rules, and finally the inspection abnormal diagnosis model is established according to the expert sample library. Therefore, the abnormity type and the abnormity degree of the abnormity in the power marketing data can be diagnosed, and support is provided for power marketing inspection work.
In order to further improve the accuracy of the diagnosis model, in one embodiment, as shown in fig. 2, after step S400, the method further includes the steps of:
S500: Diagnose the real-time abnormal information monitored in real time with the diagnosis model, and determine the diagnosed abnormality type and the diagnosed abnormality degree.
The diagnosed abnormality type and the diagnosed abnormality degree together form a diagnosis result; those obtained from the diagnosis model are the model diagnosis result.
S600: Receive diagnosis result judgment information indicating whether the diagnosed abnormality type and the diagnosed abnormality degree are accurate.
The diagnosis result judgment information is generally determined manually and entered into the system that implements the method for establishing the electric power marketing service diagnosis model based on inspection data. Specifically, it is determined by whether the manual diagnosis result and the model diagnosis result agree. In this embodiment, the manual diagnosis result is regarded as the accurate judgment.
The diagnosed abnormality type and the diagnosed abnormality degree obtained through manual diagnosis are the manual diagnosis result.
The accuracy of manual diagnosis can be improved by having several people judge, which improves the accuracy of the diagnosis result judgment information and ultimately the accuracy of the diagnosis model.
S700: Update the abnormal data information according to the diagnosis result judgment information, and update the association rule, the expert sample library and the diagnosis model.
When the diagnosis result judgment information indicates that the manual diagnosis result agrees with the model diagnosis result, the diagnosis model is judged to be accurate, no re-establishment is needed, and the diagnosis model is kept unchanged.
When the diagnosis result judgment information indicates that the manual diagnosis result disagrees with the model diagnosis result, the diagnosis model is not accurate enough and needs to be re-established, so the association rule, the expert sample library and the diagnosis model are updated.
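The feedback loop of steps S500 to S700 can be sketched as follows. This is an illustration under assumptions, not the patented implementation: the names `Diagnosis`, `FeedbackState`, `majority` and `apply_feedback` are hypothetical, multi-person judgment is modelled as a simple majority vote, and "updating the abnormal data information" is assumed to mean appending the manually diagnosed record and flagging the model for re-establishment.

```python
from collections import Counter
from dataclasses import dataclass, field
from typing import List, Tuple


@dataclass(frozen=True)
class Diagnosis:
    anomaly_type: str
    degree: str


@dataclass
class FeedbackState:
    records: List[Tuple[dict, Diagnosis]] = field(default_factory=list)
    needs_rebuild: bool = False


def majority(opinions: List[Diagnosis]) -> Diagnosis:
    """Assumed multi-person judgment: the most common manual diagnosis wins."""
    return Counter(opinions).most_common(1)[0][0]


def apply_feedback(state: FeedbackState, observation: dict,
                   model_result: Diagnosis, opinions: List[Diagnosis]) -> bool:
    """S600/S700: return True if the model agrees with the manual diagnosis."""
    manual_result = majority(opinions)
    if manual_result == model_result:
        return True  # consistent: keep the diagnosis model unchanged
    # Inconsistent: record the manual result and mark the rules, sample
    # library and model for re-establishment.
    state.records.append((observation, manual_result))
    state.needs_rebuild = True
    return False


state = FeedbackState()
agreed = apply_feedback(
    state, {"id": 1}, Diagnosis("billing", "high"),
    [Diagnosis("billing", "high"), Diagnosis("billing", "high")],
)
disagreed = apply_feedback(
    state, {"id": 2}, Diagnosis("billing", "low"),
    [Diagnosis("metering", "high"), Diagnosis("metering", "high"),
     Diagnosis("billing", "low")],
)
```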
As shown in fig. 3, in one embodiment, step S200 specifically includes:
S210: Determine the support and the confidence among the abnormal data information item sets according to the abnormal data information.
Abnormal data information with n columns of different attributes is recorded as an n-item abnormal information item set; that is, the n-item set contains the attribute values of n different attributes in the abnormal data information. The original abnormal data information has no fewer than n attribute columns. The n-item abnormal information item set is written $\{A_1, A_2, \ldots, A_{n-1}, A_n\}$: the attribute value of the first column is $A_1$, that of the second column is $A_2$, ..., that of the (n-1)-th item is $A_{n-1}$, and that of the n-th item is $A_n$. The support of the n-item abnormal information item set is:
$$\mathrm{Support}(A_1, A_2, \ldots, A_{n-1} \Rightarrow A_n) = P(A_1 \cup A_2 \cup \cdots \cup A_{n-1} \cup A_n)$$
The confidence of the n-item abnormal information item set $\{A_1, A_2, \ldots, A_{n-1}, A_n\}$ is:
$$\mathrm{Confidence}(A_1, A_2, \ldots, A_{n-1} \Rightarrow A_n) = P(A_n \mid A_1 \cup A_2 \cup \cdots \cup A_{n-1})$$
$$P(A_n \mid A_1 \cup A_2 \cup \cdots \cup A_{n-1}) = \frac{P(A_1 \cup A_2 \cup \cdots \cup A_{n-1} \cup A_n)}{P(A_1 \cup A_2 \cup \cdots \cup A_{n-1})}.$$
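The two measures can be illustrated with a short sketch. It assumes each abnormal record is a set of hypothetical "attribute=value" strings and reads $P(A_1 \cup \cdots \cup A_n)$ as the fraction of records that contain all of the listed values, which is the usual convention in association rule mining. The function names and sample data are illustrative only.

```python
from typing import FrozenSet, Iterable, List

Record = FrozenSet[str]


def support(records: List[Record], items: Iterable[str]) -> float:
    """Fraction of records that contain every item in `items`."""
    wanted = frozenset(items)
    if not records:
        return 0.0
    return sum(1 for record in records if wanted <= record) / len(records)


def confidence(records: List[Record], antecedent: Iterable[str],
               consequent: Iterable[str]) -> float:
    """support(antecedent and consequent together) / support(antecedent)."""
    base = support(records, antecedent)
    if base == 0.0:
        return 0.0
    joint = frozenset(antecedent) | frozenset(consequent)
    return support(records, joint) / base


# Hypothetical abnormal records; every item is an "attribute=value" string.
records = [
    frozenset({"category=billing", "cause=meter_fault", "severity=high"}),
    frozenset({"category=billing", "cause=meter_fault", "severity=high"}),
    frozenset({"category=billing", "cause=data_entry", "severity=low"}),
    frozenset({"category=metering", "cause=meter_fault", "severity=high"}),
]

s = support(records, {"category=billing", "cause=meter_fault", "severity=high"})
c = confidence(records, {"category=billing", "cause=meter_fault"}, {"severity=high"})
```

Here two of the four records contain all three values, so the support is 0.5; every record with a billing meter fault is also high severity, so the confidence is 1.0.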
S220: Determine the minimum support and the minimum confidence according to the support and the confidence.
The minimum support and the minimum confidence serve as thresholds against which all supports and all confidences are measured; they represent, respectively, the lowest statistical importance and the lowest reliability accepted for an abnormal data information item set.
S230: Determine the maximum abnormal information frequent item set according to the minimum support.
All abnormal information frequent item sets are found using the minimum support; that is, the abnormal information item sets whose support is greater than or equal to the minimum support threshold are the abnormal information frequent item sets.
In this embodiment, the abnormal data information is joined according to the calculated minimum support threshold. First, the abnormal information item sets whose support is below the threshold are removed from the candidate 1-item sets $C_1$, giving the frequent 1-item sets $L_1$. Next, $L_1$ is joined with itself to produce the candidate 2-item sets $C_2$; the item sets in $C_2$ that satisfy the constraint are kept as the frequent 2-item sets $L_2$. Then $L_2$ is joined with $L_1$ to produce the candidate 3-item sets $C_3$; the item sets in $C_3$ that satisfy the constraint are kept as the frequent 3-item sets $L_3$. The process repeats until the maximum abnormal information frequent item set $L_k$ is obtained.
In one embodiment, a pruning operation is also performed during the join operation, which reduces the search space when the candidate abnormal information item sets $C_k$ are generated. $C_k$ is generated by joining the frequent item sets $L_{k-1}$ and $L_1$. By the Apriori property, every non-empty subset of a frequent item set must itself be frequent, so any item set that does not satisfy this property is excluded from $C_k$. This exclusion is the pruning.
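The join and prune steps can be sketched as below. This is a minimal illustration, assuming records are sets of hypothetical "attribute=value" strings and that the "constraint" is the minimum support threshold; `frequent_itemsets` and the sample data are illustrative names. Following the text, level k candidates are produced by joining $L_{k-1}$ with $L_1$.

```python
from itertools import combinations
from typing import FrozenSet, List, Set

Record = FrozenSet[str]


def frequent_itemsets(records: List[Record], min_support: float) -> List[Set[FrozenSet[str]]]:
    """Return [L1, L2, ..., Lk], the frequent item sets of each size."""
    def sup(itemset: FrozenSet[str]) -> float:
        return sum(1 for record in records if itemset <= record) / len(records)

    items = {item for record in records for item in record}
    l1 = {frozenset([item]) for item in items if sup(frozenset([item])) >= min_support}
    levels = [l1]
    current = l1
    k = 2
    while current:
        # Join step: extend each frequent (k-1)-item set with one frequent item.
        candidates = {a | b for a in current for b in l1 if len(a | b) == k}
        # Prune step (Apriori property): every (k-1)-subset must be frequent.
        candidates = {
            cand for cand in candidates
            if all(frozenset(sub) in current for sub in combinations(cand, k - 1))
        }
        current = {cand for cand in candidates if sup(cand) >= min_support}
        if current:
            levels.append(current)
        k += 1
    return levels


# Hypothetical abnormal records; every item is an "attribute=value" string.
records = [
    frozenset({"category=billing", "cause=meter_fault", "severity=high"}),
    frozenset({"category=billing", "cause=meter_fault", "severity=high"}),
    frozenset({"category=billing", "cause=data_entry", "severity=low"}),
    frozenset({"category=metering", "cause=meter_fault", "severity=high"}),
]
levels = frequent_itemsets(records, min_support=0.5)
largest = levels[-1]
```

With a threshold of 0.5 the sample data yields three frequent 1-item sets, three frequent 2-item sets, and one frequent 3-item set, which is the maximum frequent item set.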
S240: Determine the pending association rules according to the maximum abnormal information frequent item set.
The maximum abnormal information frequent item set $L_k$ satisfies the minimum support threshold, so the pending association rules can be determined from $L_k$.
S250: Determine the association rule according to the pending association rules and the minimum confidence.
Because the abnormal information item sets that do not reach the minimum support threshold were removed in step S230, the pending association rules that also meet the minimum confidence threshold are determined as the association rules; that is, the association rules satisfy both the minimum support threshold and the minimum confidence threshold.
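Steps S240 and S250 can be sketched as follows. The sketch assumes that each pending rule has a single attribute value as its consequent, matching the form $A_1, \ldots, A_{n-1} \Rightarrow A_n$ used earlier; the function name `rules_from_itemset` and the sample data are hypothetical.

```python
from typing import FrozenSet, List, Tuple

Record = FrozenSet[str]
Rule = Tuple[FrozenSet[str], str, float]  # (antecedent, consequent, confidence)


def rules_from_itemset(records: List[Record], itemset: FrozenSet[str],
                       min_confidence: float) -> List[Rule]:
    """Keep the pending rules of a frequent item set that reach min_confidence."""
    if not records:
        return []

    def sup(items: FrozenSet[str]) -> float:
        return sum(1 for record in records if items <= record) / len(records)

    whole = sup(itemset)
    rules: List[Rule] = []
    for consequent in sorted(itemset):
        antecedent = itemset - {consequent}
        base = sup(antecedent) if antecedent else 0.0
        if base == 0.0:
            continue  # no usable antecedent for this pending rule
        conf = whole / base
        if conf >= min_confidence:
            rules.append((antecedent, consequent, conf))
    return rules


# Hypothetical abnormal records; every item is an "attribute=value" string.
records = [
    frozenset({"category=billing", "cause=meter_fault", "severity=high"}),
    frozenset({"category=billing", "cause=meter_fault", "severity=high"}),
    frozenset({"category=billing", "cause=data_entry", "severity=low"}),
    frozenset({"category=metering", "cause=meter_fault", "severity=high"}),
]
itemset = frozenset({"category=billing", "cause=meter_fault", "severity=high"})
rules = rules_from_itemset(records, itemset, min_confidence=0.8)
consequents = [consequent for _, consequent, _ in rules]
```

In the sample data the rule ending in "category=billing" has confidence 2/3 and is dropped, while the other two rules have confidence 1.0 and are kept.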
In one embodiment, the ID3 algorithm is used: the rate at which each column of abnormal attributes in the expert sample library reduces the information entropy is the criterion for the order in which nodes of the decision tree model are selected, and construction continues until the generated decision tree model classifies the training samples perfectly. The abnormal attributes in the expert sample library are the attributes corresponding to the attribute values contained in the abnormal data information item sets determined according to the association rule.
Specifically, as shown in fig. 4, step S400 includes:
S410: Acquire each column of abnormal attributes in the expert sample library, classify and count the records by the abnormal attribute values of each column, and determine the information gain value of each column of abnormal attributes from the statistics.
In one embodiment, the abnormal attribute of the expert sample library is an attribute corresponding to an attribute value included in a maximum abnormal information frequent item set which is determined according to the association rule and meets a minimum support threshold.
The expert sample library comprises a plurality of abnormal data information records, and each abnormal data information record comprises a plurality of columns of abnormal attributes.
(I) Suppose a column of abnormal attributes A has t mutually unrelated abnormal attribute values $A_1, A_2, \ldots, A_t$, that is, t unrelated pieces of abnormality category information. Their average information amount, that is, the average information amount of the abnormal attribute A, is:
$$I(A_1, A_2, \ldots, A_t) = \sum_{j=1}^{t} I(A_j) = \sum_{j=1}^{t} p(A_j) \log_2 \frac{1}{p(A_j)},$$
where $p(A_j)$ is the probability that the abnormal attribute A takes the value $A_j$.
(II) Let S be the expert sample library, that is, the sample set of all abnormal data information determined according to the association rule, and let $|S|$ be the number of samples in it. According to the abnormal attribute value of each column of abnormal attributes, the abnormal data information samples are divided into m different abnormal information categories $C_1, C_2, \ldots, C_m$. The sizes of these categories, that is, the numbers of abnormal data information records whose attribute values are $C_1, C_2, \ldots, C_m$, are written $|C_1|, |C_2|, \ldots, |C_m|$. The probability that a sample in the expert sample library S belongs to category $C_j$ is:
$$p(S_j) = \frac{|C_j|}{|S|}.$$
The abnormal attribute A has several abnormal attribute values. The subset of samples in which A takes the value v is written $S_v$. On the branch node reached after the abnormal attribute A is selected, the entropy of the sample subset $S_v$ of that node is $E(S_v)$. To obtain the expected entropy caused by the abnormal attribute A, the weighted sum of the entropies of all sample subsets $S_v$ is calculated, where the weight is the proportion of $S_v$ in the expert sample library S, $p(S_v) = |S_v|/|S|$. The expected entropy of the abnormal attribute A is therefore:
$$E(S, A) = \sum_{v} p(S_v)\, E(S_v) = \sum_{v} \frac{|S_v|}{|S|}\, E(S_v).$$
Then the information gain value $G(S, A)$ of the abnormal attribute A for the expert sample library S is:
$$G(S, A) = E(S) - E(S, A),$$
where $E(S)$ equals the average information amount $I(A_1, A_2, \ldots, A_t)$ of the abnormal attribute A.
Thus, the information gain value of each abnormality attribute to the expert sample library S is determined.
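The entropy and information gain calculation can be sketched as below. One assumption is made explicit: as in the standard ID3 algorithm, $E(S)$ and $E(S_v)$ are measured here over a designated abnormality category column (`target`), which the text does not spell out. The function names and the sample rows are hypothetical.

```python
import math
from collections import Counter
from typing import Dict, List

Row = Dict[str, str]


def entropy(labels: List[str]) -> float:
    """Average information amount: sum of p * log2(1 / p) over the label values."""
    total = len(labels)
    return sum((n / total) * math.log2(total / n) for n in Counter(labels).values())


def information_gain(rows: List[Row], attribute: str, target: str) -> float:
    """G(S, A) = E(S) - sum over v of |Sv| / |S| * E(Sv)."""
    base = entropy([row[target] for row in rows])
    expected = 0.0
    for value in {row[attribute] for row in rows}:
        subset = [row[target] for row in rows if row[attribute] == value]
        expected += len(subset) / len(rows) * entropy(subset)
    return base - expected


# Hypothetical expert sample library; "severity" plays the role of the category.
samples = [
    {"category": "billing", "cause": "meter_fault", "severity": "high"},
    {"category": "billing", "cause": "data_entry", "severity": "low"},
    {"category": "metering", "cause": "meter_fault", "severity": "high"},
    {"category": "metering", "cause": "data_entry", "severity": "low"},
]
gain_cause = information_gain(samples, "cause", "severity")
gain_category = information_gain(samples, "category", "severity")
```

In the sample data "cause" separates the two severity levels completely (gain 1.0), while "category" tells nothing about severity (gain 0.0).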
S430: Determine the node position of each column of abnormal attributes in the decision tree model according to the information gain value.
The larger the information gain value $G(S, A)$, the more information the abnormal attribute A provides for classification. The attribute with the largest information gain value is therefore selected as the root node of the decision tree model, and the remaining attributes are placed at successively lower levels in descending order of information gain until the abnormality categories form the leaf nodes, which yields the complete decision tree model.
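A compact sketch of the tree construction follows. It is an ID3-style illustration under assumptions rather than the patented implementation: the tree is a nested dictionary, the abnormality category is a designated `target` column, the rows are assumed non-empty, and all names and sample rows are hypothetical. The `classify` helper raises `KeyError` for an attribute value not seen during construction.

```python
import math
from collections import Counter
from typing import Dict, List, Union

Row = Dict[str, str]
Tree = Union[str, Dict[str, Dict[str, "Tree"]]]


def entropy(labels: List[str]) -> float:
    total = len(labels)
    return sum((n / total) * math.log2(total / n) for n in Counter(labels).values())


def gain(rows: List[Row], attribute: str, target: str) -> float:
    expected = 0.0
    for value in {row[attribute] for row in rows}:
        subset = [row[target] for row in rows if row[attribute] == value]
        expected += len(subset) / len(rows) * entropy(subset)
    return entropy([row[target] for row in rows]) - expected


def build_tree(rows: List[Row], attributes: List[str], target: str) -> Tree:
    labels = [row[target] for row in rows]
    if len(set(labels)) == 1:
        return labels[0]  # leaf: a single abnormality category remains
    if not attributes:
        return Counter(labels).most_common(1)[0][0]  # leaf: majority category
    # The attribute with the largest information gain becomes this node.
    best = max(attributes, key=lambda a: gain(rows, a, target))
    remaining = [a for a in attributes if a != best]
    branches = {}
    for value in sorted({row[best] for row in rows}):
        subset = [row for row in rows if row[best] == value]
        branches[value] = build_tree(subset, remaining, target)
    return {best: branches}


def classify(tree: Tree, row: Row) -> str:
    while isinstance(tree, dict):
        attribute, branches = next(iter(tree.items()))
        tree = branches[row[attribute]]
    return tree


# Hypothetical expert sample library; "severity" plays the role of the category.
samples = [
    {"category": "billing", "cause": "meter_fault", "severity": "high"},
    {"category": "billing", "cause": "data_entry", "severity": "low"},
    {"category": "metering", "cause": "meter_fault", "severity": "high"},
    {"category": "metering", "cause": "data_entry", "severity": "low"},
]
tree = build_tree(samples, ["category", "cause"], "severity")
predicted = classify(tree, {"category": "metering", "cause": "meter_fault"})
```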
As shown in fig. 5, a system for establishing a power marketing service diagnosis model based on audit data includes:
an anomaly acquisition module 100, configured to acquire abnormal data information according to an acquisition condition.
The acquisition condition may be a user-defined query condition, or a query condition preset in the system for establishing the power marketing service diagnosis model based on inspection data. The abnormal data information is historical abnormal electric power marketing data found when electric power marketing data was inspected on the existing inspection platform.
In one embodiment, the system further includes:
a condition acquisition module (not shown), configured to acquire the user-defined acquisition condition;
a rule determining module 200, configured to perform physical examination analysis on the abnormal data information to determine an association rule;
a sample determining module 300, configured to establish an expert sample library according to the association rule; and
a model establishing module 400, configured to establish a diagnosis model for inspection abnormalities according to the expert sample library.
In this way, historical abnormal data information is fully used and turned into useful information that supports marketing decisions, and a diagnosis model capable of diagnosing inspection abnormalities is established. The diagnosis model diagnoses monitored real-time or historical electric power marketing data and determines the abnormality type and abnormality degree of the abnormalities in that data, thereby supporting electric power marketing inspection work; the diagnosed abnormal services and difficult customers can then be tracked with priority.
In the above system for establishing a diagnosis model of power marketing service based on audit data, the anomaly acquisition module 100 firstly acquires anomaly data information according to acquisition conditions, then the rule determination module 200 performs physical examination analysis on the anomaly data information to determine association rules, then the sample determination module 300 establishes an expert sample library according to the association rules, and finally the model establishment module 400 establishes a diagnosis model of audit anomaly according to the expert sample library. Therefore, the abnormity type and the abnormity degree of the abnormity in the power marketing data can be diagnosed, and support is provided for power marketing inspection work.
In order to further improve the accuracy of the diagnosis model, in one embodiment, as shown in fig. 6, the system for establishing the electric marketing service diagnosis model based on the audit data may further include:
a model diagnosis module 500, configured to diagnose the real-time abnormal information monitored in real time with the diagnosis model, and to determine the diagnosed abnormality type and the diagnosed abnormality degree.
The diagnosis abnormal type and the diagnosis abnormal degree form a diagnosis result, and the diagnosis abnormal type and the diagnosis abnormal degree obtained through model diagnosis are the model diagnosis result.
a result receiving module 600, configured to receive diagnosis result judgment information indicating whether the diagnosed abnormality type and the diagnosed abnormality degree are accurate.
The diagnosis result judgment information is generally determined manually and entered into the system for establishing the electric power marketing service diagnosis model based on inspection data. Specifically, it is determined by whether the manual diagnosis result and the model diagnosis result agree. In this embodiment, the manual diagnosis result is regarded as the accurate judgment.
The diagnosis abnormity type and the diagnosis abnormity degree obtained through manual diagnosis are the results of manual diagnosis.
The accuracy of manual diagnosis can be improved in a multi-person judgment mode, so that the accuracy of the judgment information of the diagnosis result is improved, and the accuracy of the diagnosis model is finally improved.
a model updating module 700, configured to update the abnormal data information according to the diagnosis result judgment information, and to update the association rule, the expert sample library and the diagnosis model.
When the diagnosis result judgment information is that the manual judgment result is consistent with the model judgment result, the diagnosis model is judged to be accurate without reestablishing, and the diagnosis model is kept unchanged.
And when the diagnosis result judgment information is that the manual judgment result is inconsistent with the model judgment result, the diagnosis model judgment is not accurate enough and needs to be reestablished, so that the association rule, the expert sample library and the diagnosis model are updated again.
As shown in fig. 7, in one embodiment, the rule determining module 200 specifically includes:
a feature determining unit 210, configured to determine the support and the confidence among the abnormal data information item sets according to the abnormal data information.
Abnormal data information with n columns of attributes is recorded as an n-item abnormal information item set; that is, the n-item set contains the attribute values of n different attributes in the abnormal data information. The original abnormal data information has no fewer than n attribute columns. The n-item abnormal information item set is written $\{A_1, A_2, \ldots, A_{n-1}, A_n\}$: the attribute value of the first column is $A_1$, that of the second column is $A_2$, ..., that of the (n-1)-th item is $A_{n-1}$, and that of the n-th item is $A_n$. The support of the n-item abnormal information item set is:
$$\mathrm{Support}(A_1, A_2, \ldots, A_{n-1} \Rightarrow A_n) = P(A_1 \cup A_2 \cup \cdots \cup A_{n-1} \cup A_n)$$
The confidence of the n-item abnormal information item set $\{A_1, A_2, \ldots, A_{n-1}, A_n\}$ is:
$$\mathrm{Confidence}(A_1, A_2, \ldots, A_{n-1} \Rightarrow A_n) = P(A_n \mid A_1 \cup A_2 \cup \cdots \cup A_{n-1})$$
$$P(A_n \mid A_1 \cup A_2 \cup \cdots \cup A_{n-1}) = \frac{P(A_1 \cup A_2 \cup \cdots \cup A_{n-1} \cup A_n)}{P(A_1 \cup A_2 \cup \cdots \cup A_{n-1})}.$$
a minimum feature determining unit 220, configured to determine a minimum support degree and a minimum confidence degree according to the support degree and the confidence degree.
The minimum support degree and the minimum confidence degree serve as thresholds against which all support degrees and confidence degrees are measured; they represent, respectively, the lowest statistical importance and the lowest statistical reliability required of an abnormal data information item set.
A frequent item set determining unit 230, configured to determine the maximum abnormal information frequent item set according to the minimum support degree.
All abnormal information frequent item sets are found using the minimum support degree; that is, the abnormal information item sets whose support degree is greater than or equal to the minimum support threshold are taken as the abnormal information frequent item sets.
In this embodiment, the abnormal data information is joined according to the calculated minimum support threshold. First, the candidate 1-item abnormal information item sets $C_1$ are formed, and the item sets whose support is below the threshold are eliminated to obtain the frequent 1-item abnormal information item sets $L_1$. Next, $L_1$ is joined with itself to produce the candidate 2-item abnormal information item sets $C_2$, and the item sets in $C_2$ that meet the constraint are retained as the frequent 2-item abnormal information item sets $L_2$. Then $L_2$ is joined with $L_1$ to produce the candidate 3-item abnormal information item sets $C_3$, and the item sets in $C_3$ that meet the constraint are retained as the frequent 3-item abnormal information item sets $L_3$. These steps are repeated until the maximum abnormal information frequent item set $L_k$ is obtained.
In one embodiment, a pruning operation is performed together with the join operation when the candidate abnormal information item sets $C_k$ are generated, which reduces the search space. The candidate set $C_k$ is generated by joining the abnormal information frequent item sets $L_{k-1}$ and $L_1$. By the Apriori property, every non-empty subset of an abnormal information frequent item set must itself be an abnormal information frequent item set, so any item set that does not satisfy this property is excluded from the candidate set $C_k$. This process is pruning.
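The join and pruning steps described above can be sketched as follows. This is a minimal illustration, assuming that records are sets of items and that candidates are formed by extending each frequent (k-1)-item set with one frequent single item, as in this embodiment; the function name `frequent_item_sets` and the sample data are hypothetical.

```python
from itertools import combinations
from typing import Dict, FrozenSet, List, Set


def frequent_item_sets(
    records: List[FrozenSet[str]], min_support: float
) -> Dict[int, Set[FrozenSet[str]]]:
    """Return {k: set of frequent k-item sets}, found level by level."""
    n = len(records)

    def support(items: FrozenSet[str]) -> float:
        return sum(1 for r in records if items <= r) / n

    singles = {frozenset([item]) for r in records for item in r}
    l1 = {s for s in singles if support(s) >= min_support}  # L1
    level = l1
    result: Dict[int, Set[FrozenSet[str]]] = {}
    k = 1
    while level:
        result[k] = level
        k += 1
        # Join step: extend each frequent (k-1)-item set with one frequent item.
        candidates = {a | b for a in level for b in l1 if not b <= a}
        # Prune step: every (k-1)-subset of a candidate must itself be frequent.
        candidates = {
            c for c in candidates
            if all(frozenset(sub) in level for sub in combinations(c, k - 1))
        }
        # Keep only the candidates that reach the minimum support.
        level = {c for c in candidates if support(c) >= min_support}
    return result


# Hypothetical records over three items a, b and c.
records = [frozenset("abc"), frozenset("ab"), frozenset("ac"),
           frozenset("bc"), frozenset("abc")]
levels = frequent_item_sets(records, min_support=0.6)
largest = levels[max(levels)]  # the maximum frequent item sets L_k
print(max(levels), sorted(sorted(s) for s in largest))
```

With these records every pair occurs in three of five records and is kept, while the triple occurs in only two and is discarded, so the search stops at k = 2.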
A pending rule determining unit 240, configured to determine a pending association rule according to the maximum frequent abnormal information item set.
The maximum abnormal information frequent item set $L_k$ meets the minimum support threshold, and thus the pending association rules can be determined from the maximum abnormal information frequent item set $L_k$.
An association rule determining unit 250, configured to determine the association rule according to the association rule to be determined and the minimum confidence.
In the frequent item set determining unit 230, the abnormal information item sets whose support is below the minimum support threshold have already been eliminated. The association rule determining unit 250 takes, from among the pending association rules, those rules that satisfy the minimum confidence threshold as the association rules. That is, the association rules are the rules that satisfy both the minimum support threshold and the minimum confidence threshold.
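By way of example only, the filtering of pending association rules by the minimum confidence threshold might look like the following Python sketch. It assumes that each pending rule has a single attribute value as its consequent, following the form A1, ..., An-1 ⇒ An used above; the function name `rules_from_item_set` and the sample records are hypothetical.

```python
from typing import FrozenSet, List, Tuple

Rule = Tuple[FrozenSet[str], str, float]  # (antecedent, consequent, confidence)


def rules_from_item_set(
    records: List[FrozenSet[str]],
    item_set: FrozenSet[str],
    min_confidence: float,
) -> List[Rule]:
    """Derive single-consequent rules from one frequent item set and keep
    those whose confidence reaches the minimum confidence threshold."""

    def count(items: FrozenSet[str]) -> int:
        return sum(1 for r in records if items <= r)

    whole = count(item_set)
    kept: List[Rule] = []
    for consequent in sorted(item_set):
        antecedent = item_set - {consequent}
        if not antecedent:
            continue  # a rule needs a non-empty left-hand side
        base = count(antecedent)
        conf = whole / base if base else 0.0
        if conf >= min_confidence:
            kept.append((antecedent, consequent, conf))
    return kept


# Hypothetical records, for illustration only.
records = [
    frozenset({"cat=com", "svc=bill", "anom=X"}),
    frozenset({"cat=com", "svc=bill", "anom=X"}),
    frozenset({"cat=res", "svc=bill", "anom=Y"}),
    frozenset({"cat=res", "svc=bill", "anom=X"}),
]
item_set = frozenset({"cat=com", "svc=bill", "anom=X"})
rules = rules_from_item_set(records, item_set, min_confidence=0.7)
for antecedent, consequent, conf in rules:
    print(sorted(antecedent), "=>", consequent, round(conf, 3))
```

Because the item set passed in is already frequent, every rule kept here satisfies both thresholds, which matches the definition of an association rule given above.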
In one embodiment, the ID3 algorithm is adopted: the rate at which each column of abnormal attributes in the expert sample library reduces the information entropy is used as the criterion for ordering the nodes of the decision tree model, until the generated decision tree model classifies the training samples perfectly. The abnormal attributes in the expert sample library are the attributes corresponding to the attribute values contained in the abnormal data information item sets determined according to the association rules.
Specifically, as shown in fig. 8, the model building module 400 includes:
a gain determining unit 410, configured to obtain each column of the abnormal attributes of the expert sample library, perform abnormal classification according to the abnormal attribute values of each column of the abnormal attributes, perform statistics, and determine the information gain value of each column of the abnormal attributes according to the statistical result.
In one embodiment, the abnormal attribute of the expert sample library is an attribute corresponding to an attribute value included in a maximum abnormal information frequent item set which is determined according to the association rule and meets a minimum support threshold.
The expert sample library comprises a plurality of abnormal data information records, and each abnormal data information record comprises a plurality of columns of abnormal attributes.
Suppose a column of abnormal attributes A has t mutually unrelated abnormal attribute values $A_1, A_2, \ldots, A_t$, i.e. t mutually unrelated items of abnormality class information $A_1, A_2, \ldots, A_t$. Then their average information amount, i.e. the average information amount of the abnormal attribute A, is:
$$I(A_1, A_2, \ldots, A_t) = \sum_{j=1}^{t} I(A_j) = \sum_{j=1}^{t} p(A_j) \log_2 \frac{1}{p(A_j)},$$
wherein $p(A_j)$ is the probability that the abnormal attribute A takes the value $A_j$.
(II) Assume that S is the expert sample library, i.e. the sample set of all abnormal data information determined according to the association rules, and that $|S|$ is the number of samples in this sample set. According to the abnormal attribute value of each column of abnormal attributes, the abnormal data information samples are divided into m different abnormal information categories $C_1, C_2, \ldots, C_m$. The sizes of these categories, i.e. the numbers of abnormal data information records whose attribute values are $C_1, C_2, \ldots, C_m$, are denoted $|C_1|, |C_2|, \ldots, |C_m|$. The probability that a sample of the expert sample library S belongs to category $C_j$ is then:
$$p(S_j) = \frac{|C_j|}{|S|}.$$
The abnormal attribute A has a plurality of abnormal attribute values. The subset of samples whose abnormal attribute value is v is denoted $S_v$. On the branch node created after the abnormal attribute A is selected, the entropy of the sample subset $S_v$ of that node is $E(S_v)$. To obtain the expected entropy resulting from the abnormal attribute A, the weighted sum of the entropies of all sample subsets $S_v$ is calculated, where the weight is the proportion of the sample subset $S_v$ in the expert sample library S, namely $p(S_v) = |S_v| / |S|$. The average expected information entropy of the abnormal attribute A is therefore:
$$E(S, A) = \sum_{v} p(S_v)\, E(S_v) = \sum_{v} \frac{|S_v|}{|S|}\, E(S_v).$$
Then, the information gain value G (S, a) of the abnormality attribute a for the expert sample library S is:
$$G(S, A) = E(S) - E(S, A),$$
wherein $E(S)$ equals the average information amount $I(A_1, A_2, \ldots, A_t)$ of the abnormal attribute A.
Thus, the information gain value of each abnormality attribute to the expert sample library S is determined.
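The gain calculation above can be sketched in Python as follows. This is an illustration under stated assumptions: the expert sample library is taken to be a list of dictionaries, and E(S) is taken as the entropy of the abnormality category column, as in the worked example later in this embodiment. The names `entropy` and `information_gain`, the column names and the sample rows are hypothetical.

```python
import math
from collections import Counter
from typing import Dict, List


def entropy(labels: List[str]) -> float:
    """Average information amount of a list of class labels, in bits."""
    total = len(labels)
    return -sum((n / total) * math.log2(n / total) for n in Counter(labels).values())


def information_gain(samples: List[Dict[str, str]], attribute: str, target: str) -> float:
    """G(S, A) = E(S) - E(S, A) for one attribute column."""
    groups: Dict[str, List[str]] = {}
    for s in samples:
        groups.setdefault(s[attribute], []).append(s[target])
    # E(S, A): entropy of each subset S_v weighted by |S_v| / |S|.
    expected = sum(len(g) / len(samples) * entropy(g) for g in groups.values())
    return entropy([s[target] for s in samples]) - expected


# Hypothetical expert sample library, for illustration only.
samples = [
    {"category": "commercial", "service": "billing", "anomaly": "X"},
    {"category": "commercial", "service": "billing", "anomaly": "X"},
    {"category": "residential", "service": "billing", "anomaly": "Y"},
    {"category": "residential", "service": "billing", "anomaly": "Y"},
]
gain_category = information_gain(samples, "category", "anomaly")
gain_service = information_gain(samples, "service", "anomaly")
print(gain_category, gain_service)
```

In this toy library the category column separates the two abnormality categories completely, so it carries the full one bit of information, while the service column is constant and carries none.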
A node determining unit 430, configured to determine a node position of each column of abnormal attributes in the decision tree model according to the information gain value.
The larger the information gain value G(S, A), the more information the abnormal attribute A provides for classification. The attribute with the largest information gain value G(S, A) is therefore selected as the root node of the decision tree model, and the remaining attributes are placed at successively lower levels in order of decreasing information gain value, until the abnormality category is reached as a leaf node, thereby forming the complete decision tree model.
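The node ordering described above can be illustrated with a compact recursive sketch in the style of ID3. It assumes categorical attributes stored in dictionaries and returns the tree as nested dictionaries; the names `build_tree` and `gain`, the column names and the sample rows are hypothetical, and refinements such as pruning of the tree or handling of unseen values are omitted.

```python
import math
from collections import Counter


def entropy(labels):
    total = len(labels)
    return -sum((n / total) * math.log2(n / total) for n in Counter(labels).values())


def gain(samples, attribute, target):
    expected = 0.0
    for value in {s[attribute] for s in samples}:
        subset = [s[target] for s in samples if s[attribute] == value]
        expected += len(subset) / len(samples) * entropy(subset)
    return entropy([s[target] for s in samples]) - expected


def build_tree(samples, attributes, target):
    labels = [s[target] for s in samples]
    # Leaf: all samples share one abnormality category, or no attribute is left.
    if len(set(labels)) == 1 or not attributes:
        return Counter(labels).most_common(1)[0][0]
    # Internal node: the attribute with the largest information gain.
    best = max(attributes, key=lambda a: gain(samples, a, target))
    remaining = [a for a in attributes if a != best]
    return {
        best: {
            value: build_tree([s for s in samples if s[best] == value], remaining, target)
            for value in sorted({s[best] for s in samples})
        }
    }


# Hypothetical expert sample library, for illustration only.
rows = [
    ("industry", "s1", "X"), ("industry", "s2", "X"),
    ("commercial", "s1", "Y"), ("commercial", "s2", "Y"),
    ("residential", "s1", "X"), ("residential", "s2", "Y"),
]
samples = [dict(zip(("category", "service", "anomaly"), r)) for r in rows]
tree = build_tree(samples, ["category", "service"], "anomaly")
print(tree)
```

Here the category column has the larger gain and becomes the root; only the residential branch still mixes both abnormality categories, so it alone is split further on the service column.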
The following description takes the classification and identification of power consumption abnormalities of large users in the Baiyun District of Guangzhou as an example.
Data information related to abnormal conditions is collected through a user-defined query, as shown in Table 1:
Table 1. Original abnormal data information list
Because the sampling specialty and the sampling service are fixed as the meter reading, verification and collection specialty and the same-month meter reading, verification and collection service, respectively, these two items are not considered when the degree of association is calculated for each abnormal data information item set. Because the user number is essentially consistent with the master account number, only the user number is used in the calculation. The support degree and confidence degree of the abnormal data information item sets obtained in this way are shown in Tables 2 and 3:
Table 2. Support degree of each abnormal data information item set
Table 3. Confidence degree of each abnormal data information item set
Because they have no research significance, the abnormal data information items whose support degree and confidence degree are 0 are removed first. From the remaining abnormal information items, the minimum support degree and the minimum confidence degree are then determined to be 0.00008 and 0.00013, respectively. After the maximum abnormal information frequent item set is generated using the minimum support degree and the minimum confidence degree and the association rules are derived from it, the expert sample library is constructed according to the association rules. Part of its data is shown in Table 4:
Table 4. Expert sample library
To keep the example calculation simple, only the electricity usage category, the sampling service, and the abnormality categories 3705990 and 3705979 are used to calculate the average information amount of the abnormality category among the abnormal attributes of the decision tree model.
The numbers of samples with the different abnormal attributes are counted as shown in Table 5:
Table 5. Statistics of sample counts by abnormal attribute
The abnormality category in the final abnormal attributes takes two values, 3705990 and 3705979, whose sample counts are denoted $A_1$ and $A_2$:
$$A_1 = 641,\quad A_2 = 383;\quad A = A_1 + A_2 = 1024$$
The probabilities of belonging to each category are:
$$P_1 = \frac{641}{1024} = 0.626;\quad P_2 = \frac{383}{1024} = 0.374$$
The average information amount is:
$$I(A_1, A_2) = I(641, 383) = -P_1 \log_2 P_1 - P_2 \log_2 P_2 = 0.9537$$
Among the large-scale industry samples, the sample counts for abnormality categories 3705990 and 3705979 are $A_1 = 256$ and $A_2 = 0$, so the probabilities of belonging to each category are:
$$P_1 = \frac{256}{256} = 1;\quad P_2 = \frac{0}{256} = 0$$
The average information amount (taking $0 \cdot \log_2 0 = 0$) is:
$$I(S_1, S_2) = I(256, 0) = -P_1 \log_2 P_1 - P_2 \log_2 P_2 = 0$$
Among the residential samples, the sample counts for abnormality categories 3705990 and 3705979 are $A_1 = 257$ and $A_2 = 127$, so the probabilities of belonging to each category are:
$$P_1 = \frac{257}{384};\quad P_2 = \frac{127}{384}$$
The average information amount is:
$$I(S_1, S_2) = I(257, 127) = -P_1 \log_2 P_1 - P_2 \log_2 P_2 = 0.9157$$
Among the commercial samples, the sample counts for abnormality categories 3705990 and 3705979 are $A_1 = 128$ and $A_2 = 256$, so the probabilities of belonging to each category are:
$$P_1 = \frac{128}{384};\quad P_2 = \frac{256}{384}$$
The average information amount is:
$$I(S_1, S_2) = I(128, 256) = -P_1 \log_2 P_1 - P_2 \log_2 P_2 = 0.9183$$
The proportion of each group in the sampling service is:
Large-scale industry: 256/1024 = 0.25;
Residential: 384/1024 = 0.375;
Commercial: 384/1024 = 0.375.
The expected average information of the electricity usage category is then:
$$E(\text{electricity usage category}) = 0.375 \times 0.9183 + 0.25 \times 0 + 0.375 \times 0.9157 = 0.6877$$
The information gain value of the electricity usage category is therefore:
$$G(\text{electricity usage category}) = 0.9537 - 0.6877 = 0.266$$
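The figures in this worked example can be checked numerically. The Python sketch below uses only the sample counts quoted above; the helper name `average_information` and the variable names are hypothetical.

```python
import math


def average_information(counts):
    """Entropy in bits of a class distribution given as counts (0 * log2 0 taken as 0)."""
    total = sum(counts)
    return -sum((c / total) * math.log2(c / total) for c in counts if c > 0)


# Sample counts per group for abnormality categories (3705990, 3705979).
counts = {
    "large-scale industry": (256, 0),
    "residential": (257, 127),
    "commercial": (128, 256),
}
totals = [sum(col) for col in zip(*counts.values())]  # [641, 383]
n = sum(totals)                                        # 1024

overall = average_information(totals)
expected = sum(sum(c) / n * average_information(c) for c in counts.values())
gain = overall - expected
print(round(overall, 4), round(expected, 4), round(gain, 3))
```

Rounded to the precision used in the text, the three printed values correspond to the average information amount, the expected average information and the information gain value stated above.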
The information gain value of each abnormal attribute is calculated in the same way. The gain value of the electricity usage category is the largest, so the electricity usage category is selected as the root node, the sampling service is the internal node, and the abnormality category is the final leaf node.
1000 groups of sample data are extracted as verification data. The results of diagnosing the abnormality category with the decision tree model are as follows:
Table 6. Automatic diagnosis results of the model
Among the 1000 groups of data, the abnormality category was predicted correctly for 782 groups, i.e. the prediction accuracy of the diagnosis model reached 78.2%. The method thus has relatively high accuracy and practicability. Services and difficult customers that are prone to abnormalities can be identified according to the diagnosis model and tracked closely, so that abnormalities are found and corrected in time, which saves manpower, material and financial resources, improves working efficiency, and provides solid technical support for marketing inspection work.
The above examples show only some embodiments of the present invention. Their description is relatively specific and detailed, but it is not to be construed as limiting the scope of the present invention. It should be noted that those skilled in the art can make various changes and modifications without departing from the spirit of the invention, and these changes and modifications all fall within the scope of the invention. Therefore, the protection scope of the present patent shall be subject to the appended claims.

Claims (10)

1. A method for establishing a power marketing service diagnosis model based on inspection data is characterized by comprising the following steps:
acquiring abnormal data information according to the acquisition condition;
performing physical examination analysis on the abnormal data information to determine an association rule;
establishing an expert sample library according to the association rule;
and establishing a diagnosis model for inspecting the abnormity according to the expert sample library.
2. The method for establishing a power marketing service diagnosis model based on inspection data according to claim 1, wherein after the step of establishing a diagnosis model for inspecting the abnormity according to the expert sample library, the method further comprises the steps of:
diagnosing real-time abnormal information monitored in real time through the diagnosis model, and determining the type and degree of the abnormal diagnosis;
receiving diagnosis result judgment information of whether the diagnosis abnormity type and the diagnosis abnormity degree are accurate or not;
and updating abnormal data information according to the diagnosis result judgment information, and updating the association rule, the expert sample library and the diagnosis model.
3. The method for establishing a power marketing service diagnosis model based on inspection data according to claim 1, wherein the step of performing physical examination analysis on the abnormal data information to determine the association rule specifically comprises:
determining support degree and confidence degree among abnormal data information item sets according to the abnormal data information;
determining a minimum support degree and a minimum confidence degree according to the support degree and the confidence degree;
determining a maximum abnormal information frequent item set according to the minimum support degree;
determining a to-be-determined association rule according to the maximum abnormal information frequent item set;
and determining the association rule according to the pending association rule and the minimum confidence.
4. The method for establishing a power marketing service diagnosis model based on inspection data according to claim 1, wherein
the diagnostic model is a decision tree model, and the step of establishing the diagnostic model for inspecting the abnormality according to the expert sample library specifically comprises the following steps:
acquiring each column of abnormal attributes of the expert sample library, performing abnormal classification according to the abnormal attribute values of each column of abnormal attributes, performing statistics, and determining the information gain value of each column of abnormal attributes according to the statistical result;
and determining the node position of each column of abnormal attributes in the decision tree model according to the information gain value.
5. The method for establishing the inspection data-based electric marketing service diagnosis model according to claim 1, wherein before the step of collecting abnormal data information according to collection conditions, the method further comprises the steps of: and acquiring the user-defined acquisition conditions.
6. A power marketing service diagnosis model establishing system based on inspection data, characterized by comprising:
the abnormal acquisition module is used for acquiring abnormal data information according to the acquisition condition;
the rule determining module is used for performing physical examination analysis on the abnormal data information to determine an association rule;
the sample determining module is used for establishing an expert sample library according to the association rule;
and the model establishing module is used for establishing a diagnosis model for inspecting the abnormity according to the expert sample library.
7. The inspection data-based power marketing service diagnosis model building system according to claim 6, further comprising:
the model diagnosis module is used for diagnosing real-time abnormal information monitored in real time through the diagnosis model and determining the type and degree of the abnormal diagnosis;
the result receiving module is used for receiving diagnosis result judgment information of whether the diagnosis abnormity type and the diagnosis abnormity degree are accurate or not;
and the model updating module is used for updating abnormal data information according to the diagnosis result judgment information and updating the association rule, the expert sample library and the diagnosis model.
8. The inspection data-based power marketing service diagnosis model establishing system according to claim 6, wherein the rule determining module specifically comprises:
the feature determining unit is used for determining the support degree and the confidence degree among abnormal data information item sets according to the abnormal data information;
a minimum feature determination unit, configured to determine a minimum support degree and a minimum confidence degree according to the support degree and the confidence degree;
a frequent item set determining unit, configured to determine a maximum abnormal information frequent item set according to the minimum support degree;
the undetermined rule determining unit is used for determining the undetermined association rule according to the maximum abnormal information frequent item set;
and the association rule determining unit is used for determining the association rule according to the pending association rule and the minimum confidence coefficient.
9. The inspection data-based power marketing service diagnosis model establishment system of claim 6, wherein the diagnosis model is a decision tree model, and the model establishment module specifically comprises:
the gain determining unit is used for acquiring each column of abnormal attributes of the expert sample library, performing abnormal classification according to the abnormal attribute values of each column of abnormal attributes, performing statistics, and determining the information gain value of each column of abnormal attributes according to the statistical result;
and the node determining unit is used for determining the node position of each column of abnormal attribute in the decision tree model according to the information gain value.
10. The inspection data-based power marketing service diagnosis model establishing system according to claim 6, further comprising:
and the condition acquisition module is used for acquiring the user-defined acquisition conditions.
CN201510817672.6A 2015-11-20 2015-11-20 Inspection data-based power marketing service diagnosis model establishing method and system Pending CN105373894A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201510817672.6A CN105373894A (en) 2015-11-20 2015-11-20 Inspection data-based power marketing service diagnosis model establishing method and system

Publications (1)

Publication Number Publication Date
CN105373894A true CN105373894A (en) 2016-03-02

Family

ID=55376073

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201510817672.6A Pending CN105373894A (en) 2015-11-20 2015-11-20 Inspection data-based power marketing service diagnosis model establishing method and system

Country Status (1)

Country Link
CN (1) CN105373894A (en)

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication

Application publication date: 20160302