CN111177130A - Relay protection data integrity checking method and system based on correlation algorithm - Google Patents

Info

Publication number
CN111177130A
Authority
CN
China
Prior art keywords
item
association rule
determining
incomplete
frequent item
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201911309372.1A
Other languages
Chinese (zh)
Inventor
郭鹏
王文焕
杨国生
詹荣荣
张烈
康逸群
闫周天
李妍霏
张瀚方
王丽敏
姜宏丽
申华
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
State Grid Corp of China SGCC
China Electric Power Research Institute Co Ltd CEPRI
Original Assignee
State Grid Corp of China SGCC
China Electric Power Research Institute Co Ltd CEPRI
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by State Grid Corp of China SGCC, China Electric Power Research Institute Co Ltd CEPRI filed Critical State Grid Corp of China SGCC
Priority to CN201911309372.1A priority Critical patent/CN111177130A/en
Publication of CN111177130A publication Critical patent/CN111177130A/en
Pending legal-status Critical Current

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/21Design, administration or maintenance of databases
    • G06F16/215Improving data quality; Data cleansing, e.g. de-duplication, removing invalid entries or correcting typographical errors

Landscapes

  • Engineering & Computer Science (AREA)
  • Databases & Information Systems (AREA)
  • Theoretical Computer Science (AREA)
  • Quality & Reliability (AREA)
  • Data Mining & Analysis (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The invention relates to a relay protection data integrity checking method and system based on a correlation algorithm. The method comprises: determining an item set from the acquired historical records and constructing a transaction set; mining frequent item sets from the item set and the transaction set based on different attribute information; determining association rules from the multi-item frequent item sets among the frequent item sets and establishing an association rule base; acquiring current relay protection data and determining incomplete records according to a preset incomplete-record determination strategy; and searching the association rule base according to the attribute values of the determined attributes of an incomplete record to find the association rules that match it, and using those rules to determine the actual values of its uncertain attributes. The invention replaces preset values with inferred values, so that the checked data conform better to the association relationships in the big data and can provide data support for research based on relay protection big data.

Description

Relay protection data integrity checking method and system based on correlation algorithm
Technical Field
The invention relates to the technical field of relay protection data processing, in particular to a relay protection data integrity checking method and system based on an association algorithm.
Background
For relay protection big data, ensuring data integrity is an important goal of data cleaning. The integrity of the data is therefore checked first, and the missing attribute values of incomplete data are then predicted by some suitable method.
Common methods for checking data integrity include: (1) checking integrity with a coding rule, such as a parity check, checksum, or CRC check; and (2) using an integrity data set, in which the data to be verified are compared with the entries of the integrity data set to judge whether they are complete. For example, the integrity check data set for relay protection action information includes the equipment, the substation, the data set to which an information point belongs, the information name, the standard semantics, the information value, and the time. Missing attribute values in incomplete data are usually filled with the most probable value, obtained for example by regression prediction or interpolation.
These integrity checking methods and methods for predicting missing attribute values are suitable only for specific situations and are not general.
Disclosure of Invention
The invention provides a relay protection data integrity checking method and system based on a correlation algorithm, and aims to solve the problem of checking the integrity of relay protection data.
In order to solve the above problem, according to an aspect of the present invention, there is provided a method for checking integrity of relay protection data based on a correlation algorithm, the method including:
determining an item set according to the attribute value sets of different attributes in the acquired historical records, and constructing a transaction set by using the acquired historical records;
respectively mining a frequent item set by utilizing the item set and the transaction set based on different attribute information;
determining association rules according to the multi-item frequent item sets among the frequent item sets, and establishing an association rule base;
acquiring current relay protection data, and determining incomplete records according to a preset incomplete-record determination strategy;
searching the association rule base according to the attribute value of the determined attribute of the determined incomplete record to determine an association rule matched with the incomplete record, and determining the actual value of the uncertain attribute of the incomplete record by using the association rule matched with the incomplete record.
Preferably, the mining of frequent item sets using the item sets and the transaction sets based on different attribute information includes:
step 21, comparing the support of each item in the item set with a preset support threshold, and retaining the items whose support is greater than or equal to the threshold to obtain the 1-item frequent item sets;
step 22, setting k to 2;
step 23, among the (k-1)-item frequent item sets, taking the union of any two item sets that differ only in their last element, and judging whether all (k-1)-item subsets of each union are among the (k-1)-item frequent item sets;
step 24, if all such subsets of a union are among the (k-1)-item frequent item sets, calculating the support of that union, and retaining the unions whose support is greater than or equal to the preset support threshold to obtain the k-item frequent item sets;
step 25, judging whether the number of item sets among the (k-1)-item frequent item sets is greater than or equal to 2; if so, updating k to k+1 and returning to step 23; otherwise, ending the process.
Preferably, the determining an association rule according to a plurality of frequent item sets in the frequent item set and establishing an association rule base includes:
for any one of the multi-item frequent item sets, determining a plurality of corresponding antecedents and consequents from the elements of that item set, so as to determine a plurality of initial association rules;
retaining, as strong association rules, the initial association rules whose confidence is greater than or equal to a preset confidence threshold, and establishing an association rule base from the strong association rules.
Preferably, the screening of the initial association rules with the confidence level greater than or equal to a preset confidence level threshold in the plurality of initial association rules includes:
step 31, selecting one multi-item frequent item set;
step 32, setting g to 2;
step 33, forming the initial association rules of the selected item set that have a 1-item consequent, comparing the confidence of each initial association rule with the confidence threshold, and determining the initial association rules whose confidence is greater than or equal to the confidence threshold as strong association rules;
step 34, collecting the consequents of the strong association rules of the selected item set that have (g-1)-item consequents into a (g-1)-consequent set, taking the union of any two consequents in that set that differ in exactly one element, and judging whether every (g-1)-item subset contained in the union is in the (g-1)-consequent set;
step 35, if so, using the union as a consequent to form an association rule of the selected item set, judging whether the confidence of that association rule is greater than or equal to the confidence threshold, and if so, determining the association rule as a strong association rule;
step 36, judging whether the current g is smaller than the number of items in the selected frequent item set minus 1; if so, updating g to g+1 and returning to step 34; otherwise, ending the process.
Preferably, the determining an incomplete record according to a preset incomplete record determining policy includes:
and if the attribute value of certain attribute of a certain record in the obtained current relay protection data is a preset filling value or a null value, determining that the record is an incomplete record.
Preferably, the searching the association rule base according to the attribute value of the determined attribute of the determined incomplete record to determine the association rule matching the incomplete record, and determining the actual value of the uncertain attribute of the incomplete record by using the association rule matching the incomplete record comprises:
matching the attribute values of the determined attributes of the incomplete record against the antecedent of each association rule in the association rule base to determine the matching association rules;
and taking the attribute value given by the consequent of a matching association rule as the actual value of the uncertain attribute of the incomplete record, and filling it in.
According to another aspect of the present invention, there is provided a system for checking integrity of relay protection data based on a correlation algorithm, the system including:
the transaction set construction unit is used for determining an item set according to the attribute value sets of different attributes in the acquired historical records and constructing a transaction set by using the acquired historical records;
the frequent item set determining unit is used for respectively mining frequent item sets by utilizing the item set and the transaction set based on different attribute information;
the association rule base establishing unit is used for determining association rules according to the multi-item frequent item sets among the frequent item sets and establishing an association rule base;
the incomplete record determining unit is used for acquiring current relay protection data and determining an incomplete record according to a preset incomplete record determining strategy;
and the data checking unit is used for searching the association rule base according to the attribute value of the determined attribute of the determined incomplete record to determine an association rule matched with the incomplete record, and determining the actual value of the uncertain attribute of the incomplete record by using the association rule matched with the incomplete record.
Preferably, the mining of frequent item sets by the frequent item set determining unit, using the item set and the transaction set based on different attribute information, includes:
step 21, comparing the support of each item in the item set with a preset support threshold, and retaining the items whose support is greater than or equal to the threshold to obtain the 1-item frequent item sets;
step 22, setting k to 2;
step 23, among the (k-1)-item frequent item sets, taking the union of any two item sets that differ only in their last element, and judging whether all (k-1)-item subsets of each union are among the (k-1)-item frequent item sets;
step 24, if all such subsets of a union are among the (k-1)-item frequent item sets, calculating the support of that union, and retaining the unions whose support is greater than or equal to the preset support threshold to obtain the k-item frequent item sets;
step 25, judging whether the number of item sets among the (k-1)-item frequent item sets is greater than or equal to 2; if so, updating k to k+1 and returning to step 23; otherwise, ending the process.
Preferably, the association rule base establishing unit determines the association rule according to a plurality of frequent item sets in the frequent item set, and establishes the association rule base, including:
the initial association rule determining module is used for determining, for any one of the multi-item frequent item sets, a plurality of corresponding antecedents and consequents from the elements of that item set, so as to determine a plurality of initial association rules;
and the association rule base establishing module is used for keeping the initial association rules with the confidence degrees larger than or equal to a preset confidence degree threshold value in the plurality of initial association rules as strong association rules and establishing an association rule base by using the strong association rules.
Preferably, the screening, by the association rule base establishing module, of the initial association rules with the confidence level greater than or equal to a preset confidence level threshold includes:
step 31, selecting one multi-item frequent item set;
step 32, setting g to 2;
step 33, forming the initial association rules of the selected item set that have a 1-item consequent, comparing the confidence of each initial association rule with the confidence threshold, and determining the initial association rules whose confidence is greater than or equal to the confidence threshold as strong association rules;
step 34, collecting the consequents of the strong association rules of the selected item set that have (g-1)-item consequents into a (g-1)-consequent set, taking the union of any two consequents in that set that differ in exactly one element, and judging whether every (g-1)-item subset contained in the union is in the (g-1)-consequent set;
step 35, if so, using the union as a consequent to form an association rule of the selected item set, judging whether the confidence of that association rule is greater than or equal to the confidence threshold, and if so, determining the association rule as a strong association rule;
step 36, judging whether the current g is smaller than the number of items in the selected frequent item set minus 1; if so, updating g to g+1 and returning to step 34; otherwise, ending the process.
Preferably, the incomplete record determining unit determines an incomplete record according to a preset incomplete record determining strategy, and includes:
and if the attribute value of certain attribute of a certain record in the obtained current relay protection data is a preset filling value or a null value, determining that the record is an incomplete record.
Preferably, the data checking unit searches the association rule base according to the attribute value of the determined attribute of the determined incomplete record to determine an association rule matching the incomplete record, and determines the actual value of the uncertain attribute of the incomplete record by using the association rule matching the incomplete record, including:
matching the attribute values of the determined attributes of the incomplete record against the antecedent of each association rule in the association rule base to determine the matching association rules;
and taking the attribute value given by the consequent of a matching association rule as the actual value of the uncertain attribute of the incomplete record, and filling it in.
The invention provides a method and a system for checking the integrity of relay protection data based on an association algorithm, in which the integrity of the relay protection data is checked by association analysis. The association relationships in the relay protection big data are mined first, and suspected incomplete records are then selected. For each such record, the actual value of an uncertain attribute that currently holds a preset value is inferred from the determined attributes of the record and the big data association rules, and the inferred value replaces the preset value. The checked data therefore conform better to the association relationships in the big data, which benefits the management and application of relay protection big data and provides data support for research based on it.
Drawings
A more complete understanding of exemplary embodiments of the present invention may be had by reference to the following drawings in which:
fig. 1 is a flowchart of an integrity checking method 100 for relay protection data based on a correlation algorithm according to an embodiment of the present invention;
FIG. 2 is a flow chart of the Apriori algorithm; and
fig. 3 is a schematic structural diagram of an integrity checking system 300 for relay protection data based on a correlation algorithm according to an embodiment of the present invention.
Detailed Description
The exemplary embodiments of the present invention will now be described with reference to the accompanying drawings. The present invention may, however, be embodied in many different forms and is not limited to the embodiments described herein, which are provided so that this disclosure is thorough and complete and fully conveys the scope of the invention to those skilled in the art. The terminology used in the exemplary embodiments illustrated in the accompanying drawings is not intended to limit the invention. In the drawings, the same units/elements are denoted by the same reference numerals.
Unless otherwise defined, terms (including technical and scientific terms) used herein have the same meaning as commonly understood by one of ordinary skill in the art to which this invention belongs. Further, it will be understood that terms, such as those defined in commonly used dictionaries, should be interpreted as having a meaning that is consistent with their meaning in the context of the relevant art and will not be interpreted in an idealized or overly formal sense.
Fig. 1 is a flowchart of an integrity checking method 100 for relay protection data based on a correlation algorithm according to an embodiment of the present invention. As shown in fig. 1, the method checks the integrity of relay protection data by association analysis. The association relationships in the relay protection big data are mined first, and suspected incomplete records are then selected. For each such record, the actual value of an uncertain attribute that currently holds a preset value is inferred from the determined attributes of the record and the big data association rules, and the inferred value replaces the preset value. The checked data therefore conform better to the association relationships in the big data, which benefits the management and application of relay protection big data and provides data support for research based on it.
The Apriori algorithm is the most basic and most commonly used algorithm in association rule mining and is mainly used for quickly mining big data association rules. In big data association rule mining, an indivisible minimum unit of information $i$ is called an item, and a set $I_k = \{i_1, i_2, \ldots, i_k\}$ is called a k-item set. Let $I$ be the set of all items, and let $T = \{t_1, t_2, \ldots, t_n\}$ denote a set of data transactions, where the items contained in each transaction $t_i$ form a subset of $I$. The number of transactions that contain an item set is called the frequency of that item set. An association rule has the form
$A \Rightarrow B$,
where $A$ and $B$ are both proper subsets of $I$ and
$A \cap B = \emptyset$.
The support
$\mathrm{support}(A \Rightarrow B)$
reflects the probability that the items contained in $A$ and $B$ occur together in the transaction set, and is calculated as:
$\mathrm{support}(A \Rightarrow B) = P(A \cup B) = \dfrac{\mathrm{frequency}(A \cup B)}{n}$
The confidence
$\mathrm{confidence}(A \Rightarrow B)$
reflects the conditional probability that $B$ occurs in a transaction containing $A$, and is calculated as:
$\mathrm{confidence}(A \Rightarrow B) = P(B \mid A) = \dfrac{\mathrm{support}(A \cup B)}{\mathrm{support}(A)}$
If the support of an item set $I_k$ satisfies $\mathrm{support}(I_k) \ge \mathrm{min\_sup}$, where min_sup is the minimum support threshold, then $I_k$ is a frequent item set.
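The definitions above can be illustrated with a minimal Python sketch. The function names and the four toy transactions are assumptions made only for illustration; they are not taken from the embodiment's data.

```python
from typing import FrozenSet, Iterable, List

Transaction = FrozenSet[str]


def frequency(itemset: Iterable[str], transactions: List[Transaction]) -> int:
    """Number of transactions that contain every item of `itemset`."""
    items = frozenset(itemset)
    return sum(1 for t in transactions if items <= t)


def support(itemset: Iterable[str], transactions: List[Transaction]) -> float:
    """Fraction of transactions that contain the whole item set."""
    return frequency(itemset, transactions) / len(transactions)


def confidence(antecedent: Iterable[str], consequent: Iterable[str],
               transactions: List[Transaction]) -> float:
    """Estimate of P(consequent | antecedent) for the rule antecedent => consequent."""
    a = frozenset(antecedent)
    freq_a = frequency(a, transactions)
    if freq_a == 0:
        return 0.0  # the antecedent never occurs, so the rule has no evidence
    return frequency(a | frozenset(consequent), transactions) / freq_a


# Hypothetical toy transactions (item codes follow the A/B/C naming style only).
toy = [
    frozenset({"A5", "B2", "C2"}),
    frozenset({"A5", "B2", "C1"}),
    frozenset({"A5", "B1", "C2"}),
    frozenset({"A12", "B3", "C1"}),
]

sup_a5_b2 = support({"A5", "B2"}, toy)         # 2 of 4 transactions
conf_a5_b2 = confidence({"A5"}, {"B2"}, toy)   # 2 of the 3 transactions with A5
```

Dividing the two frequencies directly gives the same confidence as dividing the two supports, because the common factor n cancels.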
The Apriori algorithm finds frequent item sets from candidate sets by a level-wise iterative search. The basic idea is as follows: first find all 1-item frequent item sets $L_1$; use $L_1$ to generate the candidate item sets $C_2$, and test $C_2$ to mine $L_2$, the 2-item frequent item sets; then use $L_2$ to find $C_3$ and $L_3$; and so on, until no more k-item frequent item sets $L_k$ can be found. The algorithm exploits the Apriori property: if a candidate $C_k$ has a subset $C_{k-1}$ that is not in the previously determined $L_{k-1}$, then $C_k$ cannot be a frequent item set and is deleted directly. This screening reduces the number of candidates and thus speeds up association rule mining. The Apriori algorithm flow is shown in fig. 2. Here a candidate item set $C_k$ corresponds to the union, in claim 2, of any two (k-1)-item frequent item sets with different last elements.
The integrity checking method 100 for relay protection data based on the association algorithm provided by the embodiment of the invention starts from step 101, determines an item set according to the attribute value sets of different attributes in the acquired history record in step 101, and constructs a transaction set by using the acquired history record.
For example, in an embodiment of the present invention, there are 22529 defect records, and the attributes of each record are: protection type, defect severity, whether the protection has exited operation, defect location, defect cause, and equipment manufacturer. For convenience of program implementation, the information in these 6 dimensions is first coded. There are 20 protection types, denoted A1 to A20; 3 defect severity levels, denoted B1 to B3; 2 cases for whether the protection has exited operation, denoted C1 and C2; 91 defect locations, denoted D1 to D91; 55 defect causes, denoted E1 to E55; and 77 equipment manufacturers, denoted F1 to F77. The item set is therefore I = {A1 to A20, B1 to B3, C1, C2, D1 to D91, E1 to E55, F1 to F77}.
Each data record covers the 6 attributes of protection type, defect severity, whether the protection has exited operation, defect location, defect cause, and equipment manufacturer, so each record forms one transaction. For example, the 1st and 2nd defect records form the first 2 transactions: {A5, B2, C2, D83, E23, F22} and {A5, B2, C2, D64, F39}. The defect cause of record 2 is "other", which matches no item in the item set I, so transaction 2 contains only 5 items.
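The coding step can be sketched as follows, as one possible reading rather than the embodiment's actual program. The attribute names, the raw labels such as "location-83", and the `CODEBOOK` fragment are all hypothetical placeholders; only the item codes and the meaning "line protection" for A5 come from the text. A value with no entry in the codebook (here "other") simply contributes no item, so the second transaction ends up with 5 items.

```python
from typing import Dict, FrozenSet, Optional

# Hypothetical fragment of a codebook: attribute -> {raw value -> item code}.
CODEBOOK: Dict[str, Dict[str, str]] = {
    "protection_type": {"line protection": "A5"},
    "severity": {"severity-2": "B2"},
    "exited_operation": {"exit-state-2": "C2"},
    "defect_location": {"location-83": "D83", "location-64": "D64"},
    "defect_cause": {"cause-23": "E23"},
    "manufacturer": {"maker-22": "F22", "maker-39": "F39"},
}


def encode_record(record: Dict[str, Optional[str]],
                  codebook: Dict[str, Dict[str, str]]) -> FrozenSet[str]:
    """Turn one defect record into a transaction (a set of item codes).

    Values that match no item in the codebook are skipped.
    """
    items = set()
    for attribute, raw_value in record.items():
        code = codebook.get(attribute, {}).get(raw_value)
        if code is not None:
            items.add(code)
    return frozenset(items)


record_1 = {"protection_type": "line protection", "severity": "severity-2",
            "exited_operation": "exit-state-2", "defect_location": "location-83",
            "defect_cause": "cause-23", "manufacturer": "maker-22"}
record_2 = {"protection_type": "line protection", "severity": "severity-2",
            "exited_operation": "exit-state-2", "defect_location": "location-64",
            "defect_cause": "other", "manufacturer": "maker-39"}

transactions = [encode_record(r, CODEBOOK) for r in (record_1, record_2)]
```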
In step 102, based on different attribute information, the item set and the transaction set are utilized to respectively mine a frequent item set.
Preferably, the mining of frequent item sets using the item sets and the transaction sets based on different attribute information includes:
step 21, comparing the support of each item in the item set with a preset support threshold, and retaining the items whose support is greater than or equal to the threshold to obtain the 1-item frequent item sets;
step 22, setting k to 2;
step 23, among the (k-1)-item frequent item sets, taking the union of any two item sets that differ only in their last element, and judging whether all (k-1)-item subsets of each union are among the (k-1)-item frequent item sets;
step 24, if all such subsets of a union are among the (k-1)-item frequent item sets, calculating the support of that union, and retaining the unions whose support is greater than or equal to the preset support threshold to obtain the k-item frequent item sets;
step 25, judging whether the number of item sets among the (k-1)-item frequent item sets is greater than or equal to 2; if so, updating k to k+1 and returning to step 23; otherwise, ending the process.
In an embodiment of the present invention, the minimum support degree is set to be 0.5%, and the process of determining the frequent item set includes:
(1) The 1-item frequent item sets are generated first, on the basis of the item set. Specifically, with a minimum support of 0.5%, the items in the item set {A1 to A20, B1 to B3, C1, C2, D1 to D91, E1 to E55, F1 to F77} that do not meet the minimum support are deleted, and the remaining items form the 1-item frequent item sets. For example, among the protection types A1 to A20, the items that are not in the 1-item frequent item sets are: A6 (intelligent terminal), A16 (short lead protection), A18 (generator protection), A19 (generator protection), and A20 (fault location device).
(2) The 2-item frequent item sets are generated. Among the 1-item frequent item sets, any 2 item sets with different last elements are found and their union is taken. Because all subsets of each union are already among the 1-item frequent item sets, the support of the union is calculated directly; if it is greater than or equal to the minimum support, the union becomes a 2-item frequent item set, and otherwise it is discarded. For example, the support of the union {A13, A5} of A13 (transformer protection) and A5 (line protection) is less than the minimum support, so it is not a 2-item frequent item set, whereas the support of the union {A5, B1} of A5 (line protection) and B1 (critical defect) is greater than the minimum support, so it is a 2-item frequent item set.
(3) The 3-item frequent item sets are generated. Among the 2-item frequent item sets, any 2 item sets with different last elements are found and their union is taken, and it is judged whether all subsets of the union are among the 2-item frequent item sets. If some subset is not, the union is not a 3-item frequent item set. For a union whose subsets are all 2-item frequent item sets, the support is calculated, and if it is greater than or equal to the minimum support, the union becomes a 3-item frequent item set. For example, the union of {A5, B1} and {A5, B2} from the 2-item frequent item sets is {A5, B1, B2}; since its subset {B1, B2} is not a 2-item frequent item set, {A5, B1, B2} is not a 3-item frequent item set.
(4) On the basis of the k-item frequent item sets, the (k+1)-item frequent item sets are generated by the same method, and the procedure is repeated; in this example it terminates once the 6-item frequent item sets are obtained.
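The level-wise procedure of steps (1) to (4) can be sketched in Python as below. This is a minimal illustration, not the embodiment's implementation: the function name `apriori`, the representation of item sets as sorted tuples, and the five toy transactions with a 40% threshold are assumptions. The loop stops when fewer than two item sets remain at the current level, which is one reading of the termination test in step 25.

```python
from itertools import combinations
from typing import Dict, FrozenSet, List, Tuple

Itemset = Tuple[str, ...]  # items kept in sorted order


def apriori(transactions: List[FrozenSet[str]], min_sup: float) -> Dict[Itemset, float]:
    """Return every frequent item set together with its support."""
    n = len(transactions)

    def sup(items: Itemset) -> float:
        s = frozenset(items)
        return sum(1 for t in transactions if s <= t) / n

    # Level 1: single items that meet the minimum support.
    level: Dict[Itemset, float] = {}
    for item in sorted({i for t in transactions for i in t}):
        s = sup((item,))
        if s >= min_sup:
            level[(item,)] = s
    frequent = dict(level)

    while len(level) >= 2:  # at least two item sets are needed to form a union
        next_level: Dict[Itemset, float] = {}
        for a, b in combinations(sorted(level), 2):
            # Join: same leading items, different last element.
            if a[:-1] != b[:-1]:
                continue
            candidate = a + (b[-1],)
            # Prune: every subset one item shorter must already be frequent.
            if not all(sub in level
                       for sub in combinations(candidate, len(candidate) - 1)):
                continue
            s = sup(candidate)
            if s >= min_sup:
                next_level[candidate] = s
        frequent.update(next_level)
        level = next_level
    return frequent


# Hypothetical toy data; the embodiment uses 22529 records and min_sup = 0.5%.
toy = [frozenset(t) for t in (
    {"A5", "B1", "C2"}, {"A5", "B1", "C2"}, {"A5", "B2", "C2"},
    {"A12", "B3", "C1"}, {"A12", "B3", "C1"},
)]
frequent_sets = apriori(toy, min_sup=0.4)
```

Keeping each item set as a sorted tuple makes the "different last element" join a simple prefix comparison and guarantees that each candidate is generated only once.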
In step 103, association rules are determined from the multi-item frequent item sets among the frequent item sets, and an association rule base is established.
Preferably, the determining of association rules according to the multi-item frequent item sets among the frequent item sets and the establishing of an association rule base include:
for any one of the multi-item frequent item sets, determining a plurality of corresponding antecedents and consequents from the elements of that item set, so as to determine a plurality of initial association rules;
retaining, as strong association rules, the initial association rules whose confidence is greater than or equal to a preset confidence threshold, and establishing an association rule base from the strong association rules.
Preferably, the screening of the initial association rules with the confidence level greater than or equal to a preset confidence level threshold in the plurality of initial association rules includes:
step 31, selecting one multi-item frequent item set;
step 32, setting g to 2;
step 33, forming the initial association rules of the selected item set that have a 1-item consequent, comparing the confidence of each initial association rule with the confidence threshold, and determining the initial association rules whose confidence is greater than or equal to the confidence threshold as strong association rules;
step 34, collecting the consequents of the strong association rules of the selected item set that have (g-1)-item consequents into a (g-1)-consequent set, taking the union of any two consequents in that set that differ in exactly one element, and judging whether every (g-1)-item subset contained in the union is in the (g-1)-consequent set;
step 35, if so, using the union as a consequent to form an association rule of the selected item set, judging whether the confidence of that association rule is greater than or equal to the confidence threshold, and if so, determining the association rule as a strong association rule;
step 36, judging whether the current g is smaller than the number of items in the selected frequent item set minus 1; if so, updating g to g+1 and returning to step 34; otherwise, ending the process.
In the embodiment of the invention, association rules are generated from the 2-item, 3-item, 4-item, 5-item and 6-item frequent item sets. Generation from a 3-item frequent item set is taken as an example. For the 3-item frequent item set {A12, B3, C1} (A12 denotes the protection fault information system substation, B3 denotes a general defect, and C1 denotes one of the two cases of whether the protection has exited operation), the association rules with a 1-item consequent are generated first:
A12, B3 → C1: the confidence is 93.78% > 85%, so the rule is a strong association rule;
A12, C1 → B3: the confidence is 86.48% > 85%, so the rule is a strong association rule;
B3, C1 → A12: the confidence does not meet the requirement, so the rule is not a strong association rule.
The 1-consequent set {{C1}, {B3}} is thus obtained, and the union {C1, B3} is then taken as a consequent:
A12 → C1, B3: the confidence does not meet the requirement, so the rule is not a strong association rule.
Therefore, the strong association rules generated from the 3-item frequent item set {A12, B3, C1} are A12, B3 → C1 and A12, C1 → B3.
And finally, establishing an association rule base according to the strong association rule obtained by each frequent item set.
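Steps 31 to 36 can be sketched in Python as follows. This is a minimal illustration, not the patented implementation: the function name `strong_rules`, the `support` dictionary layout and the support counts in the demo are all hypothetical, and the check in step 34 is read here as "every (g-1)-element subset of the union is itself a strong consequent". Confidence is computed with the standard formula, support of the whole item set divided by support of the antecedent.

```python
from itertools import combinations

def strong_rules(itemset, support, min_conf):
    """Return (antecedent, consequent, confidence) for the strong rules of one item set.

    `support` maps frozenset -> support (or count) and must contain `itemset`
    and every non-empty proper subset of it. Hypothetical helper for illustration.
    """
    itemset = frozenset(itemset)
    rules = []

    def keep_strong(consequents):
        # Keep only the consequents whose rule reaches the confidence threshold.
        strong = set()
        for cons in consequents:
            ante = itemset - cons
            conf = support[itemset] / support[ante]
            if conf >= min_conf:
                rules.append((ante, cons, conf))
                strong.add(cons)
        return strong

    # Step 33: rules with a single item as consequent.
    prev = keep_strong({frozenset([item]) for item in itemset})
    g = 2  # step 32
    while g < len(itemset):  # same range of g as step 36: 2 .. (number of items - 1)
        # Step 34: merge two (g-1)-consequents that differ in exactly one element.
        candidates = set()
        for a, b in combinations(prev, 2):
            union = a | b
            if len(union) == g and all(
                frozenset(sub) in prev for sub in combinations(union, g - 1)
            ):
                candidates.add(union)
        prev = keep_strong(candidates)  # step 35
        g += 1
    return rules

# Hypothetical counts (not the patent's data); ratios of counts equal ratios of supports.
demo_support = {
    frozenset({"A12", "B3", "C1"}): 90,
    frozenset({"A12", "B3"}): 100,
    frozenset({"A12", "C1"}): 100,
    frozenset({"B3", "C1"}): 200,
    frozenset({"A12"}): 300,
    frozenset({"B3"}): 400,
    frozenset({"C1"}): 500,
}
demo_rules = strong_rules({"A12", "B3", "C1"}, demo_support, min_conf=0.85)
```

With these made-up counts the sketch keeps A12, B3 → C1 and A12, C1 → B3 and rejects the other two candidates, mirroring the shape of the worked example.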
In step 104, current relay protection data is acquired, and incomplete records are determined according to a preset incomplete record determination strategy.
Preferably, determining an incomplete record according to the preset incomplete record determination strategy includes:
if the value of an attribute of a record in the acquired current relay protection data is a preset filling value or a null value, determining that the record is an incomplete record.
In an embodiment of the present invention, considering that the items in the item set I cannot cover all situations in the field, "other" or "NULL" is allowed to stand for the remaining situations. However, because of careless on-site data entry and similar causes, some attribute fields that could have been described by an item of the item set I with a definite meaning are instead filled roughly with a preset value, which results in incomplete data. Thus, an incomplete record can be identified by a preset filling value or a null value.
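The check described above can be sketched as a small Python helper. The names `FILL_VALUES`, `uncertain_attributes` and `is_incomplete` are hypothetical, and the record is assumed to be a dictionary keyed by the attribute letters; only "other" and "NULL" come from the text, and a Python `None` is assumed to represent a null value.

```python
# Assumed preset filling values; the text names "other" and "NULL".
FILL_VALUES = {"other", "NULL"}

def uncertain_attributes(record, fill_values=FILL_VALUES):
    """Return the attribute names whose value is null or a preset filling value."""
    return [
        name
        for name, value in record.items()
        if value is None or value in fill_values
    ]

def is_incomplete(record, fill_values=FILL_VALUES):
    """A record is incomplete if at least one attribute is uncertain."""
    return bool(uncertain_attributes(record, fill_values))

# Illustrative record; the attribute values are placeholders.
demo_record = {
    "A": "bus protection",
    "B": "severe",
    "C": "yes",
    "D": "output plug-in",
    "E": "other",
    "F": "manufacturer 1",
}
demo_uncertain = uncertain_attributes(demo_record)
```

Keeping the filling values in a parameter lets a deployment add its own placeholder strings without touching the check itself.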
In step 105, the association rule base is searched according to the attribute values of the determined attributes of the determined incomplete records to determine the association rule matching the incomplete records, and the actual values of the uncertain attributes of the incomplete records are determined by using the association rule matching the incomplete records.
Preferably, the searching the association rule base according to the attribute value of the determined attribute of the determined incomplete record to determine the association rule matching the incomplete record, and determining the actual value of the uncertain attribute of the incomplete record by using the association rule matching the incomplete record comprises:
matching the attribute values of the determined attributes of the incomplete record against the antecedent of each association rule in the association rule base to determine the matching association rule;
and taking the attribute value given by the consequent of the matching association rule as the actual value of the uncertain attribute of the incomplete record, and filling it in.
In this implementation of the invention, defects are characterized mainly in 6 dimensions: protection type, defect severity, whether the protection exited operation, defect location, defect cause and equipment manufacturer, denoted A to F respectively. Listing the possible values in each dimension constitutes the item set I. Considering that the items in the item set I cannot cover all situations in the field, "other" or "NULL" is allowed to stand for the remaining situations. However, because of careless on-site data entry and similar causes, some attribute fields that could have been described by an item of the item set I with a definite meaning are instead filled roughly with a preset value, which results in incomplete data.
According to the embodiment of the invention, big data association analysis was performed on 22529 defect records collected from 2012 to April 2019, with the minimum confidence set to 85% and the minimum support set to 0.5%; 146 association rules were mined from 248 items in the 6 attribute dimensions, and 16 representative association rules are shown in Table 1.
TABLE 1 Association rules between relay protection defect data attributes
[Table 1 is reproduced as an image in the original publication: Figure BDA0002324090560000131]
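For reference, the two thresholds can be read against the standard Apriori definitions of support and confidence. The notation below (T for the transaction set, X and Y for item sets) is an assumption for illustration and does not appear in this section; the second line simply converts the 0.5% minimum support into a record count for the 22529 records.

```latex
\[
\operatorname{supp}(X) = \frac{\bigl|\{\, t \in T : X \subseteq t \,\}\bigr|}{|T|},
\qquad
\operatorname{conf}(X \Rightarrow Y) = \frac{\operatorname{supp}(X \cup Y)}{\operatorname{supp}(X)}
\]
\[
0.5\% \times 22529 = 112.645
\;\Longrightarrow\;
\text{an item set must occur in at least } 113 \text{ records to be frequent.}
\]
```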
When the relay protection defect data contain a suspected incomplete record, for example {A = bus protection; B = severe; C = yes; D = output plug-in; E = other; F = manufacturer 1}, the defect cause field E is suspected to be an incomplete attribute. According to association rule 9 in Table 1, when "C = yes & D = output plug-in & F = manufacturer 1", the probability of "E = plug-in damage" is 92.21%, so it can be inferred that "E = other" in the original record is an incomplete attribute and should be replaced with "E = plug-in damage". As another example, in the record {A = other protection; B = severe; C = yes; D = AC circuit; E = external force damage; F = manufacturer 1}, the protection type field A is suspected to be an incomplete attribute. According to association rule 5 in Table 1, when "E = external force damage", the probability of "A = line protection" is 95.95%, so it can be inferred that "A = other protection" in the original record is an incomplete attribute and should be replaced with "A = line protection".
Relay protection big data creates favorable conditions for improving professional applications, and data integrity is an important aspect of data quality. Exploiting the strong correlation within relay protection big data, the embodiment of the invention applies the Apriori algorithm to mine the associations in the data, generates association rules, checks the integrity of the relay protection data against those rules, and predicts the values of null attributes in incomplete data. Taking relay protection defect data as an example, the associations among 248 items in the 6 dimensions of protection type, defect severity, whether the protection exited operation, defect location, defect cause and equipment manufacturer were mined from 22529 defect records, incomplete data were processed accordingly, and the application effect was good.
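The two examples above can be replayed with a short Python sketch of the matching and filling step. Everything structural here is an assumption: the function name `fill_record`, the rule layout with `"if"`, `"then"` and `"confidence"` keys, the exact value strings, and the choice of the highest-confidence rule when several rules match (the text does not say how ties are resolved). Only the two rules and their confidences are taken from the examples.

```python
def fill_record(record, rules, fill_values):
    """Replace uncertain attribute values using the matching association rules."""
    filled = dict(record)
    # Determined attributes: not null and not a preset filling value.
    known = {
        name: value
        for name, value in record.items()
        if value is not None and value not in fill_values
    }
    for name in record:
        if name in known:
            continue
        # A rule matches if its antecedent agrees with the determined attributes
        # and its consequent provides a value for the uncertain attribute.
        matches = [
            rule
            for rule in rules
            if name in rule["then"]
            and all(known.get(k) == v for k, v in rule["if"].items())
        ]
        if matches:
            best = max(matches, key=lambda rule: rule["confidence"])  # assumed tie-break
            filled[name] = best["then"][name]
    return filled

# Rules 9 and 5 as described in the text; the string values are illustrative.
demo_rules = [
    {"if": {"C": "yes", "D": "output plug-in", "F": "manufacturer 1"},
     "then": {"E": "plug-in damage"}, "confidence": 0.9221},
    {"if": {"E": "external force damage"},
     "then": {"A": "line protection"}, "confidence": 0.9595},
]
demo_fill_values = {"other", "other protection", "NULL"}

record_1 = {"A": "bus protection", "B": "severe", "C": "yes",
            "D": "output plug-in", "E": "other", "F": "manufacturer 1"}
record_2 = {"A": "other protection", "B": "severe", "C": "yes",
            "D": "AC circuit", "E": "external force damage", "F": "manufacturer 1"}

filled_1 = fill_record(record_1, demo_rules, demo_fill_values)
filled_2 = fill_record(record_2, demo_rules, demo_fill_values)
```

An attribute with no matching rule is left unchanged, so the sketch never overwrites a value without a supporting rule.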
Fig. 3 is a schematic structural diagram of an integrity checking system 300 for relay protection data based on a correlation algorithm according to an embodiment of the present invention. As shown in fig. 3, an integrity checking system 300 for relay protection data based on a correlation algorithm according to an embodiment of the present invention includes: a transaction set construction unit 301, a frequent item set determination unit 302, an association rule base establishment unit 303, an incomplete record determination unit 304, and a data check unit 305.
Preferably, the transaction set constructing unit 301 is configured to determine an item set according to attribute value sets of different attributes in the obtained history record, and construct a transaction set by using the obtained history record.
Preferably, the frequent item set determining unit 302 is configured to mine a frequent item set by using the item set and the transaction set respectively based on different attribute information.
Preferably, the frequent item set determining unit 302 mining frequent item sets by using the item set and the transaction set respectively, based on different attribute information, includes:
step 21, comparing the support of each item in the item set with a preset support threshold, and retaining the items whose support is greater than or equal to the preset support threshold to obtain the 1-item frequent item set;
step 22, setting k to 2;
step 23, in the (k-1)-item frequent item set, taking the union of any two item sets whose last elements differ, and judging whether all subsets of each union are in the (k-1)-item frequent item set;
step 24, if all subsets of a union are in the (k-1)-item frequent item set, calculating the support of the union, and retaining the item sets whose support is greater than or equal to the preset support threshold to obtain the k-item frequent item set;
step 25, judging whether the number of items of the (k-1)-item frequent item set is greater than or equal to 2; if so, updating k to k+1 and returning to step 23; otherwise, ending the process.
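Steps 21 to 25 follow the usual level-wise Apriori pattern, which can be sketched in Python as below. This is an illustration under assumptions rather than the unit's actual implementation: the function name and demo data are hypothetical, step 23 is read as the standard Apriori join (two sorted (k-1)-item sets that share all elements except the last), the subset check is applied to the (k-1)-element subsets, and step 25 is read as "the previous level still holds at least two item sets".

```python
from itertools import combinations

def mine_frequent_itemsets(transactions, min_support):
    """Return {frozenset: support} for every frequent item set (Apriori sketch)."""
    transactions = [frozenset(t) for t in transactions]
    n = len(transactions)

    def support(itemset):
        # Fraction of transactions that contain every item of `itemset`.
        return sum(itemset <= t for t in transactions) / n

    def keep_frequent(candidates):
        kept = {}
        for cand in candidates:
            s = support(cand)
            if s >= min_support:
                kept[cand] = s
        return kept

    # Step 21: frequent 1-item sets.
    items = {item for t in transactions for item in t}
    current = keep_frequent({frozenset([item]) for item in items})
    frequent = dict(current)

    k = 2  # step 22
    while len(current) >= 2:  # step 25, as interpreted above
        # Step 23: join sorted (k-1)-item sets that differ only in the last element.
        ordered = sorted(tuple(sorted(s)) for s in current)
        candidates = set()
        for a, b in combinations(ordered, 2):
            if a[:-1] == b[:-1]:
                cand = frozenset(a) | frozenset(b)
                if all(frozenset(sub) in current
                       for sub in combinations(cand, k - 1)):
                    candidates.add(cand)
        # Step 24: keep the candidates that reach the support threshold.
        current = keep_frequent(candidates)
        frequent.update(current)
        k += 1
    return frequent

# Hypothetical transactions, one item per attribute dimension.
demo_transactions = [
    {"A1", "B1", "C1"},
    {"A1", "B1", "C1"},
    {"A1", "B1"},
    {"A1", "C2"},
    {"A2", "B1", "C1"},
]
demo_frequent = mine_frequent_itemsets(demo_transactions, min_support=0.4)
```

The sketch rescans all transactions for each candidate, which is adequate for an illustration; a production system would typically count candidates in a single pass per level.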
Preferably, the association rule base establishing unit 303 is configured to determine an association rule according to a plurality of frequent item sets in the frequent item set, and establish an association rule base.
Preferably, the association rule base establishing unit 303 determines an association rule according to a plurality of frequent item sets in the frequent item set, and establishes an association rule base, including:
the initial association rule determining module is used for determining, for any one of the multiple frequent item sets, a plurality of corresponding antecedents and consequents according to the elements of that frequent item set, so as to determine a plurality of initial association rules respectively;
and the association rule base establishing module is used for screening the initial association rules with the confidence degrees larger than or equal to a preset confidence degree threshold value in the plurality of initial association rules as strong association rules and establishing an association rule base by using the strong association rules.
Preferably, the screening, by the association rule base establishing module, of the initial association rules with the confidence level greater than or equal to a preset confidence level threshold includes:
step 31, selecting a multi-item frequent item set;
step 32, setting g to 2;
step 33, generating the 1-consequent initial association rules from the multi-item frequent item set, comparing the confidence of each initial association rule with the confidence threshold, and determining the initial association rules whose confidence is greater than or equal to the confidence threshold as strong association rules;
step 34, forming the consequents of the (g-1)-consequent strong association rules of the multi-item frequent item set into a (g-1)-consequent set, taking the union of each pair of consequents in the (g-1)-consequent set that differ in only 1 element, and judging whether all items contained in the union are in the (g-1)-consequent set;
step 35, if all items contained in a union are in the (g-1)-consequent set, taking the union as a consequent to form an association rule of the multi-item frequent item set, judging whether the confidence of the association rule is greater than or equal to the confidence threshold, and if so, determining the association rule as a strong association rule;
step 36, judging whether the current g is smaller than the number of items in the frequent item set minus 1; if so, updating g to g+1 and returning to step 34; otherwise, ending the process.
Preferably, the incomplete record determining unit 304 is configured to obtain current relay protection data, and determine an incomplete record according to a preset incomplete record determining policy.
Preferably, the incomplete record determining unit 304, determining an incomplete record according to a preset incomplete record determining strategy, includes:
if the value of an attribute of a record in the acquired current relay protection data is a preset filling value or a null value, determining that the record is an incomplete record.
Preferably, the data checking unit 305 is configured to search the association rule base according to the attribute value of the determined attribute of the determined incomplete record to determine an association rule matching the incomplete record, and determine the actual value of the uncertain attribute of the incomplete record by using the association rule matching the incomplete record.
Preferably, the data checking unit 305, searching the association rule base according to the attribute value of the determined attribute of the determined incomplete record to determine the association rule matching the incomplete record, and determining the actual value of the uncertain attribute of the incomplete record by using the association rule matching the incomplete record, includes:
matching the attribute values of the determined attributes of the incomplete record against the antecedent of each association rule in the association rule base to determine the matching association rule;
and taking the attribute value given by the consequent of the matching association rule as the actual value of the uncertain attribute of the incomplete record, and filling it in.
The integrity checking system 300 for relay protection data based on the correlation algorithm in this embodiment of the invention corresponds to the integrity checking method 100 for relay protection data based on the correlation algorithm in another embodiment of the invention, and is not described again here.
The invention has been described with reference to a few embodiments. However, other embodiments of the invention than the one disclosed above are equally possible within the scope of the invention, as would be apparent to a person skilled in the art from the appended patent claims.
Generally, all terms used in the claims are to be interpreted according to their ordinary meaning in the technical field, unless explicitly defined otherwise herein. All references to "a/an/the [ device, component, etc ]" are to be interpreted openly as referring to at least one instance of said device, component, etc., unless explicitly stated otherwise. The steps of any method disclosed herein do not have to be performed in the exact order disclosed, unless explicitly stated.
As will be appreciated by one skilled in the art, embodiments of the present application may be provided as a method, system, or computer program product. Accordingly, the present application may take the form of an entirely hardware embodiment, an entirely software embodiment or an embodiment combining software and hardware aspects. Furthermore, the present application may take the form of a computer program product embodied on one or more computer-usable storage media (including, but not limited to, disk storage, CD-ROM, optical storage, and the like) having computer-usable program code embodied therein.
The present application is described with reference to flowchart illustrations and/or block diagrams of methods, apparatus (systems), and computer program products according to embodiments of the application. It will be understood that each flow and/or block of the flow diagrams and/or block diagrams, and combinations of flows and/or blocks in the flow diagrams and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to a processor of a general purpose computer, special purpose computer, embedded processor, or other programmable data processing apparatus to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing apparatus, create means for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be stored in a computer-readable memory that can direct a computer or other programmable data processing apparatus to function in a particular manner, such that the instructions stored in the computer-readable memory produce an article of manufacture including instruction means which implement the function specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be loaded onto a computer or other programmable data processing apparatus to cause a series of operational steps to be performed on the computer or other programmable apparatus to produce a computer implemented process such that the instructions which execute on the computer or other programmable apparatus provide steps for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
Finally, it should be noted that: the above embodiments are only for illustrating the technical solutions of the present invention and not for limiting the same, and although the present invention is described in detail with reference to the above embodiments, those of ordinary skill in the art should understand that: modifications and equivalents may be made to the embodiments of the invention without departing from the spirit and scope of the invention, which is to be covered by the claims.

Claims (12)

1. A relay protection data integrity checking method based on a correlation algorithm is characterized by comprising the following steps:
determining an item set according to the attribute value sets of different attributes in the acquired historical records, and constructing a transaction set by using the acquired historical records;
respectively mining a frequent item set by utilizing the item set and the transaction set based on different attribute information;
determining an association rule according to a plurality of frequent item sets in the frequent item set, and establishing an association rule base;
acquiring current relay protection data, and determining an incomplete record according to a preset incomplete record determination strategy;
searching the association rule base according to the attribute value of the determined attribute of the determined incomplete record to determine an association rule matched with the incomplete record, and determining the actual value of the uncertain attribute of the incomplete record by using the association rule matched with the incomplete record.
2. The method of claim 1, wherein the mining a frequent item set using the item set and the transaction set based on different attribute information comprises:
step 21, comparing the support of each item in the item set with a preset support threshold, and retaining the items whose support is greater than or equal to the preset support threshold to obtain the 1-item frequent item set;
step 22, setting k to 2;
step 23, in the (k-1)-item frequent item set, taking the union of any two item sets whose last elements differ, and judging whether all subsets of each union are in the (k-1)-item frequent item set;
step 24, if all subsets of a union are in the (k-1)-item frequent item set, calculating the support of the union, and retaining the item sets whose support is greater than or equal to the preset support threshold to obtain the k-item frequent item set;
step 25, judging whether the number of items of the (k-1)-item frequent item set is greater than or equal to 2; if so, updating k to k+1 and returning to step 23; otherwise, ending the process.
3. The method of claim 1, wherein determining association rules based on a plurality of frequent itemsets in the frequent itemsets and building an association rule base comprises:
for any one of the multiple frequent item sets, determining a plurality of corresponding antecedents and consequents according to the elements of that frequent item set, so as to determine a plurality of initial association rules respectively;
and screening, from the plurality of initial association rules, the initial association rules whose confidence is greater than or equal to a preset confidence threshold as strong association rules, and establishing an association rule base by using the strong association rules.
4. The method of claim 1, wherein the screening of the initial association rules with a confidence level greater than or equal to a preset confidence level threshold from among the plurality of initial association rules comprises:
step 31, selecting a multi-item frequent item set;
step 32, setting g to 2;
step 33, generating the 1-consequent initial association rules from the multi-item frequent item set, comparing the confidence of each initial association rule with the confidence threshold, and determining the initial association rules whose confidence is greater than or equal to the confidence threshold as strong association rules;
step 34, forming the consequents of the (g-1)-consequent strong association rules of the multi-item frequent item set into a (g-1)-consequent set, taking the union of each pair of consequents in the (g-1)-consequent set that differ in only 1 element, and judging whether all items contained in the union are in the (g-1)-consequent set;
step 35, if all items contained in a union are in the (g-1)-consequent set, taking the union as a consequent to form an association rule of the multi-item frequent item set, judging whether the confidence of the association rule is greater than or equal to the confidence threshold, and if so, determining the association rule as a strong association rule;
step 36, judging whether the current g is smaller than the number of items in the frequent item set minus 1; if so, updating g to g+1 and returning to step 34; otherwise, ending the process.
5. The method of claim 1, wherein determining incomplete records according to a preset incomplete record determination strategy comprises:
if the value of an attribute of a record in the acquired current relay protection data is a preset filling value or a null value, determining that the record is an incomplete record.
6. The method of claim 1, wherein searching the association rule base according to the attribute values of the determined attributes of the determined incomplete records to determine the association rule matching the incomplete records, and determining the actual values of the uncertain attributes of the incomplete records using the association rule matching the incomplete records comprises:
matching the attribute values of the determined attributes of the incomplete record against the antecedent of each association rule in the association rule base to determine the matching association rule;
and taking the attribute value given by the consequent of the matching association rule as the actual value of the uncertain attribute of the incomplete record, and filling it in.
7. An integrity checking system of relay protection data based on a correlation algorithm, the system comprising:
the transaction set construction unit is used for determining an item set according to the attribute value sets of different attributes in the acquired historical records and constructing a transaction set by using the acquired historical records;
the frequent item set determining unit is used for respectively mining frequent item sets by utilizing the item set and the transaction set based on different attribute information;
the association rule base establishing unit is used for determining association rules according to a plurality of frequent item sets in the frequent item set and establishing an association rule base;
the incomplete record determining unit is used for acquiring current relay protection data and determining an incomplete record according to a preset incomplete record determining strategy;
and the data checking unit is used for searching the association rule base according to the attribute value of the determined attribute of the determined incomplete record to determine an association rule matched with the incomplete record, and determining the actual value of the uncertain attribute of the incomplete record by using the association rule matched with the incomplete record.
8. The system of claim 7, wherein the frequent item set determining unit mining frequent item sets by using the item set and the transaction set respectively, based on different attribute information, comprises:
step 21, comparing the support of each item in the item set with a preset support threshold, and retaining the items whose support is greater than or equal to the preset support threshold to obtain the 1-item frequent item set;
step 22, setting k to 2;
step 23, in the (k-1)-item frequent item set, taking the union of any two item sets whose last elements differ, and judging whether all subsets of each union are in the (k-1)-item frequent item set;
step 24, if all subsets of a union are in the (k-1)-item frequent item set, calculating the support of the union, and retaining the item sets whose support is greater than or equal to the preset support threshold to obtain the k-item frequent item set;
step 25, judging whether the number of items of the (k-1)-item frequent item set is greater than or equal to 2; if so, updating k to k+1 and returning to step 23; otherwise, ending the process.
9. The system according to claim 7, wherein the association rule base establishing unit determines the association rule according to a plurality of frequent item sets in the frequent item set, and establishes the association rule base, including:
the initial association rule determining module is used for determining, for any one of the multiple frequent item sets, a plurality of corresponding antecedents and consequents according to the elements of that frequent item set, so as to determine a plurality of initial association rules respectively;
and the association rule base establishing module is used for screening the initial association rules with the confidence degrees larger than or equal to a preset confidence degree threshold value in the plurality of initial association rules as strong association rules and establishing an association rule base by using the strong association rules.
10. The system according to claim 9, wherein the association rule base establishing module for screening the initial association rules with confidence levels greater than or equal to a preset confidence level threshold from among the plurality of initial association rules comprises:
step 31, selecting a multi-item frequent item set;
step 32, setting g to 2;
step 33, generating the 1-consequent initial association rules from the multi-item frequent item set, comparing the confidence of each initial association rule with the confidence threshold, and determining the initial association rules whose confidence is greater than or equal to the confidence threshold as strong association rules;
step 34, forming the consequents of the (g-1)-consequent strong association rules of the multi-item frequent item set into a (g-1)-consequent set, taking the union of each pair of consequents in the (g-1)-consequent set that differ in only 1 element, and judging whether all items contained in the union are in the (g-1)-consequent set;
step 35, if all items contained in a union are in the (g-1)-consequent set, taking the union as a consequent to form an association rule of the multi-item frequent item set, judging whether the confidence of the association rule is greater than or equal to the confidence threshold, and if so, determining the association rule as a strong association rule;
step 36, judging whether the current g is smaller than the number of items in the frequent item set minus 1; if so, updating g to g+1 and returning to step 34; otherwise, ending the process.
11. The system of claim 7, wherein the incomplete record determining unit determines the incomplete record according to a preset incomplete record determining strategy, comprising:
if the value of an attribute of a record in the acquired current relay protection data is a preset filling value or a null value, determining that the record is an incomplete record.
12. The system of claim 7, wherein the data checking unit searches the association rule base according to the attribute values of the determined attributes of the incomplete records to determine the association rule matching the incomplete records, and determines the actual values of the uncertain attributes of the incomplete records by using the association rule matching the incomplete records, comprising:
matching the attribute values of the determined attributes of the incomplete record against the antecedent of each association rule in the association rule base to determine the matching association rule;
and taking the attribute value given by the consequent of the matching association rule as the actual value of the uncertain attribute of the incomplete record, and filling it in.
CN201911309372.1A 2019-12-18 2019-12-18 Relay protection data integrity checking method and system based on correlation algorithm Pending CN111177130A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201911309372.1A CN111177130A (en) 2019-12-18 2019-12-18 Relay protection data integrity checking method and system based on correlation algorithm

Publications (1)

Publication Number Publication Date
CN111177130A true CN111177130A (en) 2020-05-19

Family

ID=70650216

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201911309372.1A Pending CN111177130A (en) 2019-12-18 2019-12-18 Relay protection data integrity checking method and system based on correlation algorithm

Country Status (1)

Country Link
CN (1) CN111177130A (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112307086A (en) * 2020-10-30 2021-02-02 湖北烽火平安智能消防科技有限公司 Automatic data verification method and device in fire service
CN112395605A (en) * 2020-11-23 2021-02-23 国网四川省电力公司信息通信公司 Electric power Internet of things data fusion method based on association rules

Cited By (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112307086A (en) * 2020-10-30 2021-02-02 湖北烽火平安智能消防科技有限公司 Automatic data verification method and device in fire service
CN112307086B (en) * 2020-10-30 2024-05-24 湖北烽火平安智能消防科技有限公司 Automatic data verification method and device in fire service
CN112395605A (en) * 2020-11-23 2021-02-23 国网四川省电力公司信息通信公司 Electric power Internet of things data fusion method based on association rules
CN112395605B (en) * 2020-11-23 2022-10-11 国网四川省电力公司信息通信公司 Electric power Internet of things data fusion method based on association rules

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination