CN107169628B - Power distribution network reliability assessment method based on big data mutual information attribute reduction - Google Patents


Publication number
CN107169628B
Authority
CN
China
Prior art keywords
attribute
decision
condition
entropy
attributes
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201710244420.8A
Other languages
Chinese (zh)
Other versions
CN107169628A (en
Inventor
李妍
盛梦雨
刘婉兵
杜明秋
杨秉臻
杨晨光
王少荣
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Huazhong University of Science and Technology
Original Assignee
Huazhong University of Science and Technology
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Huazhong University of Science and Technology filed Critical Huazhong University of Science and Technology
Priority to CN201710244420.8A priority Critical patent/CN107169628B/en
Publication of CN107169628A publication Critical patent/CN107169628A/en
Application granted granted Critical
Publication of CN107169628B publication Critical patent/CN107169628B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q10/00 Administration; Management
    • G06Q10/06 Resources, workflows, human or project management; Enterprise or organisation planning; Enterprise or organisation modelling
    • G06Q10/063 Operations research, analysis or management
    • G06Q10/0639 Performance analysis of employees; Performance analysis of enterprise or organisation operations
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/08 Learning methods
    • G06N3/084 Backpropagation, e.g. using gradient descent
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q50/00 Information and communication technology [ICT] specially adapted for implementation of business processes of specific business sectors, e.g. utilities or tourism
    • G06Q50/06 Energy or water supply
    • Y GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y04 INFORMATION OR COMMUNICATION TECHNOLOGIES HAVING AN IMPACT ON OTHER TECHNOLOGY AREAS
    • Y04S SYSTEMS INTEGRATING TECHNOLOGIES RELATED TO POWER NETWORK OPERATION, COMMUNICATION OR INFORMATION TECHNOLOGIES FOR IMPROVING THE ELECTRICAL POWER GENERATION, TRANSMISSION, DISTRIBUTION, MANAGEMENT OR USAGE, i.e. SMART GRIDS
    • Y04S10/00 Systems supporting electrical power generation, transmission or distribution
    • Y04S10/50 Systems or methods supporting the power network operation or management, involving a certain degree of interaction with the load-side end user applications

Abstract

The invention relates to the field of power distribution network planning and provides a power distribution network reliability assessment method based on big data mutual information attribute reduction. The method overcomes the limitations of the traditional Monte Carlo simulation method and the analytical method and, targeting electric power big data, achieves reliability assessment of the power distribution network based on mutual information attribute reduction.

Description

Power distribution network reliability assessment method based on big data mutual information attribute reduction
Technical Field
The invention relates to the field of power distribution network planning, in particular to a power distribution network reliability assessment method based on big data mutual information attribute reduction.
Background
With the development of technologies such as the internet and databases and the automation of production environments, fields such as finance, electric power and meteorology generate massive, varied and rapidly growing data, known as big data. Big data has now penetrated many fields, has become an important factor of production and, because of its great utilization value, is becoming a new engine of industrial change. The value of big data is realized by mining and analyzing it, extracting its main information and applying that information appropriately. The reliability of a power distribution network is a technical index strongly related to many factors, and the associated data cover many aspects such as air temperature, wind speed, electricity sales and line loss rate. Traditionally, reliability is evaluated through modeling or sampling simulation using several indexes such as load point indexes, outage time indexes and outage economic indexes. However, the analytical method is severely limited when handling complex power systems, and the Monte Carlo sampling method is time-consuming because of state redundancy. Big data technology offers a new approach to reliability assessment of power distribution networks.
Disclosure of Invention
The invention aims to overcome the shortcomings of the prior art and provides a power distribution network reliability assessment method based on big data mutual information attribute reduction. The method overcomes the limitations of the traditional Monte Carlo simulation method and the analytical method and, targeting electric power big data, achieves reliability assessment of the power distribution network based on mutual information attribute reduction.
The object of the invention is achieved by the following technical measures.
In a power distribution network reliability assessment method based on big data mutual information attribute reduction, the indexes related to power distribution network reliability are first preprocessed, including discretization of the continuous indexes. Mutual information values among the indexes are calculated on the basis of information entropy and then made dimensionless to obtain the entropy correlation coefficients among the indexes. These coefficients are used to judge the correlation between each index and the reliability index and the correlation among the indexes themselves, and the indexes are reduced accordingly. A BP neural network is then used to fit the nonlinear relationship between the reliability index and the reduced indexes, which are strongly correlated with the reliability index and mutually independent, and the optimizing capability of a genetic algorithm is used to compensate for the shortcomings of the neural network method. The method specifically comprises the following steps:
step 1: collecting a large amount of data related to the reliability of the power distribution network from academic, meteorological or statistical websites;
step 2: extracting from the collected data the index values related to the reliability of the power distribution network and compiling a decision table that represents the correspondence between the reliability index and the related indexes, wherein the decision table comprises 1 decision attribute (namely, the reliability index) representing the final reliability of the power distribution network and a plurality of condition attributes representing factors related to the reliability;
step 3: preprocessing the data in the decision table: judging from all values of each attribute whether the attribute is continuous or discrete, calculating the number of partitions for each continuous attribute using mathematical statistics, and discretizing the continuous attributes by the equidistant discretization method;
step 4: calculating the probability that each attribute takes each specific discrete value, then calculating the information entropy of each attribute and the conditional entropy between condition attributes and the decision attribute, and from these the mutual information between each condition attribute and the decision attribute and between every two condition attributes;
step 5: normalizing the mutual information between each condition attribute and the decision attribute calculated in step 4 and, in combination with the information entropy, obtaining the entropy correlation coefficient between the condition attribute and the decision attribute, which is used to judge their correlation, a smaller entropy correlation coefficient indicating a weaker correlation; setting a suitable critical value to measure the correlation between attributes, and removing the condition attributes that are weakly correlated with the decision attribute;
step 6: in a manner similar to step 5, calculating the entropy correlation coefficients between every two condition attributes remaining after step 5, and screening out and deleting the redundant condition attributes that are strongly correlated with other condition attributes but more weakly correlated with the decision attribute, to obtain a set of condition attributes that are strongly correlated with the reliability index and mutually independent, thereby achieving attribute reduction;
step 7: constructing a three-layer BP neural network to train on the reduced attribute set, taking the condition attributes obtained in step 6 that are strongly related to the reliability index as input and the decision attribute data as output, and solving for the connection weights between the nodes of each layer and the thresholds of the hidden layer and the output layer that minimize the fitting error, to obtain an optimal BP neural network model; to improve the training precision, the optimal initial weights and thresholds can be obtained by a genetic algorithm.
In the above technical solution, the step 2 includes the following steps:
step 2.1: establishing an m × n distribution network reliability assessment decision table from the large amount of collected data related to the reliability of the distribution network of a given city, wherein n is the total number of decision and condition attributes, each decision attribute value together with its corresponding condition attribute values forms one group of attribute data, and m is the total number of groups of attribute data (namely, the number of samples);
step 2.2: taking the index in the decision table that directly represents or determines the reliability of the power distribution network as the decision attribute, for example the power supply reliability, and taking the other indexes related to reliability as condition attributes, for example the month, the air temperature and the comprehensive voltage qualification rate.
In the above technical solution, the step 3 includes the following steps:
step 3.1: judging from the values of each attribute in the decision table whether the attribute data are continuous or discrete; for example, attributes such as year and month take only fixed integer values and are discrete data, whereas attributes such as total social electricity consumption, load rate and comprehensive voltage qualification rate can take any value within an interval and are continuous data;
step 3.2: calculating the number of partitions into which each continuous attribute is to be divided according to the data distribution characteristics of all factors, the relevant objective factors and formula (1);
$$k = 1.87 \times (m-1)^{2/5} \tag{1}$$
wherein m is the number of samples of the attribute data, and k is the number of partitions of the value range of the continuous attribute;
step 3.3: calculating the interval length of each continuous attribute from the number of partitions obtained in step 3.2, dividing the value range of the continuous attribute into k intervals by the equidistant discretization method, assigning a discrete integer value to each interval, and computing the discretized value of the continuous attribute, thereby completing the discretization of the continuous data.
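The discretization of step 3 can be sketched in Python as follows. This is a minimal illustration rather than the patented implementation: the function names, the sample values and the choice to truncate k to an integer and to clip the maximum value into the last interval are assumptions made for the example.

```python
def partition_count(m: int) -> float:
    """Formula (1): k = 1.87 * (m - 1) ** (2 / 5)."""
    return 1.87 * (m - 1) ** (2 / 5)


def equal_width_discretize(values, k):
    """Map each continuous value to an integer label in 1..k (equal-width intervals)."""
    lo, hi = min(values), max(values)
    if hi == lo:                       # constant attribute: everything falls in interval 1
        return [1] * len(values)
    width = (hi - lo) / k              # interval length
    labels = []
    for v in values:
        # Assumption: floor-based binning, with the maximum clipped into interval k.
        idx = int((v - lo) // width) + 1
        labels.append(min(idx, k))
    return labels


k_raw = partition_count(108)           # about 12.12 for 108 samples
k = int(k_raw)                         # truncated to 12 partitions
load_rate = [61.2, 64.8, 70.1, 58.3, 75.9, 66.4]   # made-up illustrative values
print(round(k_raw, 2), k, equal_width_discretize(load_rate, k))
```

Equal-width binning keeps the interval boundaries independent of how the samples are distributed, which is why only the minimum, the maximum and k are needed.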
In the above technical solution, the step 4 includes the following steps:
step 4.1: counting, for each attribute, the number of samples taking each discrete integer value, and calculating the probability that the attribute takes a specific discrete value according to formula (2);
$$p(X_i) = \frac{c(X_i)}{c(U)}, \quad i = 1, 2, \ldots, k \tag{2}$$
in the formula, k is the number of discretized partitions of attribute X, $X_i$ is the i-th value of attribute X, $c(X_i)$ is the number of samples for which attribute X takes the value $X_i$, U is the full sample set, i.e. the universe of discourse, $c(U)$ is the total number of samples, and $p(X_i)$ is the probability that attribute X takes the value $X_i$;
step 4.2: calculating, according to formulas (3) and (4), the information entropy of each attribute, the conditional entropy between each condition attribute and the decision attribute, and the conditional entropy between one condition attribute and another, wherein the information entropy measures the amount of information provided by an attribute and also characterizes the degree of order of the attribute sequence, and the conditional entropy represents the amount of information of one attribute given that another attribute is completely known;
$$H(x) = -\sum_{i} p(X_i) \log p(X_i) \tag{3}$$
where H(x) is the information entropy of attribute x;
$$H(y|x) = -\sum_{i} p(X_i) \sum_{j} p(Y_j|X_i) \log p(Y_j|X_i) \tag{4}$$
in the formula, $p(Y_j|X_i)$ is the probability that $Y_j$ occurs given that $X_i$ has occurred, and H(y|x) is the conditional entropy of attribute y given x;
step 4.3: using the results of step 4.2, calculating according to formula (5) the mutual information between each condition attribute and the decision attribute and between every two condition attributes, which represents the amount of information shared between the attributes;
$$I(x,y) = H(y) - H(y|x) \tag{5}$$
in the formula, H(y) is the information entropy of attribute y, and I(x,y) is the mutual information of attributes x and y, which can be regarded as the amount of information common to y and x.
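Step 4 can be illustrated with a short Python sketch that estimates the probabilities from sample counts and then applies formula (5). It is a sketch under assumptions: the function names are hypothetical, and base-2 logarithms are assumed because the text does not state the logarithm base.

```python
import math
from collections import Counter


def entropy(xs):
    """Information entropy of a discrete sequence, with p(X_i) = c(X_i) / c(U)."""
    n = len(xs)
    # Assumption: base-2 logarithm (entropy in bits).
    return -sum((c / n) * math.log2(c / n) for c in Counter(xs).values())


def conditional_entropy(ys, xs):
    """H(y|x): entropy of y within each group of equal x, weighted by p(X_i)."""
    n = len(xs)
    groups = {}
    for x, y in zip(xs, ys):
        groups.setdefault(x, []).append(y)
    return sum((len(g) / n) * entropy(g) for g in groups.values())


def mutual_information(xs, ys):
    """Formula (5): I(x, y) = H(y) - H(y|x)."""
    return entropy(ys) - conditional_entropy(ys, xs)


x = [1, 1, 2, 2]            # made-up discretized condition attribute
y_same = [1, 1, 2, 2]       # fully determined by x
y_indep = [1, 2, 1, 2]      # carries no information about x
print(mutual_information(x, y_same), mutual_information(x, y_indep))
```

When y is fully determined by x the conditional entropy vanishes and the mutual information equals H(y); when the two are independent the mutual information is zero.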
In the above technical solution, the step 5 includes the following steps:
step 5.1: to eliminate the influence of dimension, normalizing the mutual information between each condition attribute and the decision attribute calculated in step 4.3 by formula (6) to obtain the entropy correlation coefficient, which is used to judge the correlation between the condition attribute and the decision attribute; the smaller the entropy correlation coefficient, the weaker the correlation and the smaller the contribution of the condition attribute to the reliability assessment of the power distribution network;
[Formula (6): equation image not reproduced in the text]
in the formula, $\rho_{xy}$ is the entropy correlation coefficient of attributes x and y and represents the degree of correlation between x and y;
step 5.2: setting a critical value e1 according to the entropy correlation coefficients calculated in step 5.1; when the entropy correlation coefficient between a condition attribute and the decision attribute is smaller than the critical value, the condition attribute is considered to have little influence on the reliability of the power distribution network and is removed from the decision table.
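Because formula (6) is not reproduced in this text, the following Python sketch uses one common normalization of mutual information, $\rho_{xy} = 2I(x,y)/(H(x)+H(y))$, purely as an assumption; the patent's own formula may differ. Function names and the base-2 logarithm are likewise assumptions.

```python
import math
from collections import Counter


def entropy(xs):
    """Information entropy of a discrete sequence (base-2 logarithm assumed)."""
    n = len(xs)
    return -sum((c / n) * math.log2(c / n) for c in Counter(xs).values())


def entropy_correlation(xs, ys):
    """Assumed normalization: rho = 2 * I(x, y) / (H(x) + H(y)), a value in [0, 1]."""
    hx, hy = entropy(xs), entropy(ys)
    if hx + hy == 0:                     # both attributes constant: no information
        return 0.0
    joint = entropy(list(zip(xs, ys)))   # joint entropy H(x, y)
    mi = hx + hy - joint                 # identical to H(y) - H(y|x)
    return 2 * mi / (hx + hy)


print(entropy_correlation([1, 1, 2, 2], [1, 1, 2, 2]))   # identical attributes
print(entropy_correlation([1, 1, 2, 2], [1, 2, 1, 2]))   # independent attributes
```

Dividing by the entropies makes the coefficient dimensionless, so coefficients of attributes with different numbers of discrete values can be compared against a single critical value such as e1.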
In the above technical solution, the step 6 includes the following steps:
step 6.1: in a manner similar to step 5, calculating the entropy correlation coefficients between the condition attributes remaining after the removal in step 5.2;
step 6.2: setting a critical value e2 according to the entropy correlation coefficients calculated in step 6.1; when the entropy correlation coefficient of two condition attributes exceeds the critical value, the two attributes are considered strongly correlated and able to represent each other, that is, they have approximately the same influence on the reliability of the power distribution network; in this case, the entropy correlation coefficients between each of the two condition attributes and the decision attribute are compared and the condition attribute more weakly correlated with the decision attribute is deleted, which reduces the redundancy of the attribute set and yields a set of condition attributes that are strongly correlated with the reliability index and mutually independent.
In the above technical solution, the step 7 includes the following steps:
step 7.1: constructing a three-layer BP neural network to train on the reduced attribute data, taking the condition attributes obtained in step 6.2 that are strongly related to the reliability index as input and the decision attribute as the final output; assuming the reduced decision table has p condition attributes, the input layer has p nodes and the output layer has 1 node; randomly selecting b groups from the m groups of attribute data as test samples and using the remaining groups as training samples for the neural network, each sample containing the condition attribute values and the decision attribute value, and normalizing the sample data;
step 7.2: randomly generating, by computer, h groups of initial connection weights between the nodes of each layer of the BP neural network together with the thresholds of the hidden layer and the output layer, rewriting them in binary-coded form to form an initial solution space, and calculating the fitness of each solution in the solution space with the neural network; selecting the c solutions with the highest fitness as parents and applying crossover and mutation to them to obtain a child solution space; judging from the fitness of the child solutions whether the search has converged; if so, stopping the optimization and outputting the optimal initial weights and thresholds; otherwise, continuing the selection, crossover and mutation operations;
step 7.3: decoding the initial weights and thresholds obtained in step 7.2 and training the BP neural network on the normalized samples to obtain the error between the estimated and true values of the decision attribute; judging whether the error meets the convergence condition; if not, adjusting the weights and thresholds and continuing to train the network; if so, stopping the loop and outputting the weights and thresholds that minimize the error.
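Steps 7.2 and 7.3 can be sketched in plain Python as a genetic search for initial weights followed by backpropagation. Several simplifications are assumptions of this sketch and not taken from the source: chromosomes are real-valued rather than binary-coded, the genetic search runs for a fixed number of generations instead of testing convergence, thresholds are treated as additive bias terms, the output node is linear, and all names, hyperparameters and data are illustrative.

```python
import math
import random


def unpack(theta, p, q):
    """Split a flat parameter list into W1 (q x p), b1 (q), w2 (q) and b2."""
    W1 = [theta[j * p:(j + 1) * p] for j in range(q)]
    b1 = theta[q * p:q * p + q]
    w2 = theta[q * p + q:q * p + 2 * q]
    return W1, b1, w2, theta[-1]


def forward(theta, x, p, q):
    """Sigmoid hidden layer, linear output node; returns (output, hidden activations)."""
    W1, b1, w2, b2 = unpack(theta, p, q)
    hid = [1 / (1 + math.exp(-(sum(w * v for w, v in zip(W1[j], x)) + b1[j])))
           for j in range(q)]
    return sum(w2[j] * hid[j] for j in range(q)) + b2, hid


def mse(theta, X, y, p, q):
    return sum((forward(theta, x, p, q)[0] - t) ** 2 for x, t in zip(X, y)) / len(X)


def ga_init(X, y, p, q, pop=20, keep=6, gens=30, seed=0):
    """Genetic search for initial weights and thresholds (lower MSE = fitter)."""
    rng = random.Random(seed)
    n = q * p + 2 * q + 1
    population = [[rng.uniform(-1, 1) for _ in range(n)] for _ in range(pop)]
    for _ in range(gens):
        population.sort(key=lambda th: mse(th, X, y, p, q))
        parents = population[:keep]                    # selection of the fittest
        children = []
        while len(children) < pop - keep:
            a, b = rng.sample(parents, 2)
            cut = rng.randrange(1, n)                  # single-point crossover
            child = a[:cut] + b[cut:]
            child = [g + rng.gauss(0, 0.1) if rng.random() < 0.1 else g
                     for g in child]                   # mutation
            children.append(child)
        population = parents + children
    return min(population, key=lambda th: mse(th, X, y, p, q))


def bp_train(theta, X, y, p, q, lr=0.05, epochs=500):
    """Full-batch gradient descent (backpropagation) on the mean squared error."""
    theta = list(theta)
    for _ in range(epochs):
        grad = [0.0] * len(theta)
        w2 = unpack(theta, p, q)[2]
        for x, t in zip(X, y):
            out, hid = forward(theta, x, p, q)
            err = 2 * (out - t) / len(X)
            for j in range(q):
                dh = err * w2[j] * hid[j] * (1 - hid[j])
                for i in range(p):
                    grad[j * p + i] += dh * x[i]       # hidden-layer weights
                grad[q * p + j] += dh                  # hidden-layer thresholds
                grad[q * p + q + j] += err * hid[j]    # output-layer weights
            grad[-1] += err                            # output-layer threshold
        theta = [w - lr * g for w, g in zip(theta, grad)]
    return theta


rng = random.Random(1)
X = [[rng.random(), rng.random()] for _ in range(20)]  # made-up normalized inputs
y = [0.3 * a + 0.5 * b for a, b in X]                  # made-up target values
theta0 = ga_init(X, y, p=2, q=3)
theta = bp_train(theta0, X, y, p=2, q=3)
print(mse(theta0, X, y, 2, 3), mse(theta, X, y, 2, 3))
```

The genetic stage only supplies a starting point; the gradient-based stage then refines it, which reflects the division of labor described in steps 7.2 and 7.3.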
Compared with the prior art, the invention has the beneficial effects that:
The invention provides a power distribution network reliability assessment method based on mutual information and an improved BP neural network. Addressing the large volume of varied data related to power distribution network reliability that arises in the big data context, the method starts from information entropy, uses the concept of mutual information and a dimensionless normalization to obtain entropy correlation coefficients, and screens out the indexes strongly related to power distribution network reliability. It then models these indexes with a BP neural network and uses the optimizing capability of a genetic algorithm to compensate for the fact that the initial weights and thresholds of the neural network cannot otherwise be determined, thereby achieving comprehensive, accurate and rapid assessment of power distribution network reliability.
Drawings
FIG. 1 is a flow chart of power distribution network reliability assessment based on big data mutual information attribute reduction;
FIG. 2 is a flow chart of the reduction of power distribution network reliability related indexes based on mutual information.
Detailed Description
The present invention is further described below so that its technical means, inventive features and objects are easy to understand.
Referring to FIG. 1 and FIG. 2, an embodiment of the present invention provides a power distribution network reliability assessment method based on big data mutual information attribute reduction, which is performed sequentially according to the following steps:
step 1: acquiring a large amount of power distribution and consumption data of a given city from within a power enterprise, and acquiring data on various aspects related to the reliability of the city's power distribution network from meteorological, statistical and similar websites;
step 2: compiling, from the large amount of data collected in step 1, a 108 × 15 distribution network reliability assessment decision table containing 108 groups of attribute data; the table includes 1 decision attribute, namely the power supply reliability (Y, %), and 14 condition attributes, namely year (X1), month (X2), total social electricity consumption (X3, kWh), electricity sales (X4, kWh), line loss rate at 220 kV and below (X5, %), load rate (X6, %), maximum load (X7, kW), comprehensive voltage qualification rate (X8, %), monthly precipitation (X9, mm), monthly average air temperature (X10, °C), monthly sunshine hours (X11, h), monthly average wind speed (X12, m/s), monthly number of windy days (X13, days) and monthly number of rainy days (X14, days);
step 3: judging from the values of each attribute in the decision table whether the attribute data are continuous or discrete; for example, attributes such as year and month take only fixed integer values and are discrete data, whereas attributes such as total social electricity consumption, load rate and comprehensive voltage qualification rate take values from a continuous interval and are continuous data; to facilitate the subsequent correlation analysis, the continuous data must be discretized, as follows:
calculating the number of partitions into which each continuous attribute is to be divided according to the data distribution characteristics of all factors, the relevant objective factors and formula (1);
$$k = 1.87 \times (m-1)^{2/5} \tag{1}$$
in the formula, m is the total number of samples, and k is the number of partitions of the continuous attribute.
The number of partitions calculated from formula (1) is $k = 1.87 \times (108-1)^{2/5} \approx 12.12$, so all attributes are divided into 12 classes; see Table 1 for the results;
dividing the value range of the continuous attribute x into k intervals by the equidistant discretization method, calculating the interval length $l_x$ of the continuous attribute by formula (2), and assigning a discrete integer value to each interval, so that after discretization the continuous data take only the discrete integer values 1, 2, ..., k. The discretized result corresponding to each original value of the attribute is calculated according to formula (3) to complete the discretization; the results are shown in Table 1.
$$l_x = \frac{\max(x) - \min(x)}{k} \tag{2}$$
In the formula, max(x) and min(x) are respectively the maximum and minimum of all values of attribute x, and k is the chosen number of discretization intervals.
[Formula (3): equation image not reproduced in the text]
In the formula, $x_i$ is the i-th value of attribute x before discretization, $X_i$ is the i-th value of attribute x after discretization, corresponding to $x_i$, and [x] denotes rounding down, i.e. the largest integer not exceeding x.
TABLE 1 Discretization results
[Table 1 is an image in the original publication and is not reproduced in the text]
step 4: using the discretization results of step 3, counting for each attribute the number of samples taking each discrete integer value, and calculating the probability that the attribute takes a specific discrete value according to formula (4);
$$p(X_i) = \frac{c(X_i)}{c(U)}, \quad i = 1, 2, \ldots, k \tag{4}$$
in the formula, k is the number of discretized partitions of attribute X, $X_i$ is the i-th value of attribute X, $c(X_i)$ is the number of samples for which attribute X takes the value $X_i$, U is the full sample set, i.e. the universe of discourse, $c(U)$ is the total number of samples, and $p(X_i)$ is the probability that attribute X takes the value $X_i$.
Using the probability distributions obtained above, the information entropy of each attribute, the conditional entropy between each condition attribute and the decision attribute, and the conditional entropy between one condition attribute and another are calculated according to formulas (5) and (6), wherein the information entropy measures the amount of information provided by an attribute and also characterizes the degree of order of the attribute sequence, and the conditional entropy represents the amount of information of one attribute given that another attribute is completely known;
$$H(x) = -\sum_{i} p(X_i) \log p(X_i) \tag{5}$$
in the formula, H(x) is the information entropy of attribute x.
$$H(y|x) = -\sum_{i} p(X_i) \sum_{j} p(Y_j|X_i) \log p(Y_j|X_i) \tag{6}$$
In the formula, $p(Y_j|X_i)$ is the probability that $Y_j$ occurs given that $X_i$ has occurred, and H(y|x) is the conditional entropy of attribute y given x.
Using these results, the mutual information between each condition attribute and the decision attribute and between every two condition attributes is calculated according to formula (7) to measure the amount of information shared between the attributes.
$$I(x,y) = H(y) - H(y|x) \tag{7}$$
In the formula, H(y) is the information entropy of attribute y, and I(x,y) is the mutual information of attributes x and y, which can be regarded as the amount of information common to y and x.
step 5: to eliminate the influence of dimension, normalizing the mutual information between each condition attribute and the decision attribute calculated in step 4 by formula (8) to obtain the entropy correlation coefficient, which is used to judge the correlation between the condition attribute and the decision attribute; the smaller the entropy correlation coefficient, the weaker the correlation and the smaller the contribution of the condition attribute to the reliability assessment of the power distribution network; the entropy correlation coefficient between each condition attribute $x_i$ (i = 1, 2, ..., 14) and the decision attribute y is shown in Table 2;
[Formula (8): equation image not reproduced in the text]
in the formula, $\rho_{xy}$ is the entropy correlation coefficient of attributes x and y and represents the degree of correlation between x and y.
TABLE 2 Entropy correlation coefficients between the condition attributes and the decision attribute
Condition attribute              X1      X2      X3      X4      X5      X6      X7
Entropy correlation coefficient  0.2770  0.1488  0.1859  0.2027  0.1513  0.1578  0.1636
Condition attribute              X8      X9      X10     X11     X12     X13     X14
Entropy correlation coefficient  0.2874  0.1353  0.1112  0.1645  0.1569  0.0947  0.1652
A critical value e1 is set according to the calculated entropy correlation coefficients; when the entropy correlation coefficient between a condition attribute and the decision attribute is smaller than the critical value, the condition attribute is considered to have little influence on the reliability of the power distribution network and is removed from the decision table. As shown in Table 2, the largest entropy correlation coefficient between a condition attribute and the decision attribute does not exceed 0.3; e1 is therefore chosen as 0.15, and the condition attributes whose entropy correlation coefficient does not exceed e1 are removed, namely month X2, monthly precipitation X9, monthly average air temperature X10 and monthly number of windy days X13.
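The screening against e1 can be checked directly with the values of Table 2. The short Python sketch below uses those published coefficients; the variable names are chosen only for illustration.

```python
# Entropy correlation coefficients between each condition attribute and Y (Table 2).
ecc = {
    "X1": 0.2770, "X2": 0.1488, "X3": 0.1859, "X4": 0.2027, "X5": 0.1513,
    "X6": 0.1578, "X7": 0.1636, "X8": 0.2874, "X9": 0.1353, "X10": 0.1112,
    "X11": 0.1645, "X12": 0.1569, "X13": 0.0947, "X14": 0.1652,
}
e1 = 0.15  # critical value chosen in the embodiment

kept = [name for name, rho in ecc.items() if rho > e1]
removed = [name for name, rho in ecc.items() if rho <= e1]
print("removed:", removed)   # ['X2', 'X9', 'X10', 'X13']
print("kept:", kept)
```

Note that X5 (0.1513) lies only just above the critical value, so the outcome of this step is sensitive to the choice of e1.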
step 6: in a manner similar to step 5, calculating the entropy correlation coefficients among the condition attributes remaining after the removal in step 5 and establishing a correlation matrix; the results are shown in Table 3;
TABLE 3 Entropy correlation coefficients between the main condition attributes
[Table 3 is an image in the original publication and is not reproduced in the text]
A critical value e2 is set according to the values of the entropy correlation coefficients in the correlation matrix. When the entropy correlation coefficient of two condition attributes exceeds the critical value, the two attributes are considered strongly correlated and able to represent each other, that is, they have approximately the same influence on the reliability of the power distribution network. The entropy correlation coefficients of the two condition attributes with the decision attribute are then compared, and the condition attribute that is more weakly correlated with the decision attribute is deleted. This yields a set of condition attributes that are strongly correlated with the reliability index and independent of each other, which achieves the purpose of attribute reduction.
As can be seen from Table 3, the entropy correlation coefficients between X1 and X8, between X3 and X4, and between X3 and X7 all exceed 0.5, with the critical value e2 chosen as 0.5. The entropy correlation coefficients of these five condition attributes with the decision attribute rank as X8 > X1 > X4 > X3 > X7, so the relatively redundant condition attributes, year X1 and total social power consumption X3, are eliminated.
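The pairwise redundancy check of step 6 can be sketched as below. The numeric values are placeholders that only respect the ordering X8 > X1 > X4 > X3 > X7 and the three pairs said to exceed 0.5; they are not the entries of Table 3. The function name remove_redundant is hypothetical, and the rule of skipping a pair once one of its members has been removed is an assumption of this sketch (it reproduces the stated outcome, in which X1 and X3 are eliminated and X7 is retained).

```python
def remove_redundant(rho_decision, rho_pairs, e2):
    """Drop the weaker attribute of each strongly correlated pair.

    rho_decision: attribute -> coefficient with the decision attribute.
    rho_pairs: (attr_a, attr_b) -> coefficient between the two attributes.
    Pairs are visited in the order given; a pair is skipped when one of
    its members has already been removed (an assumption of this sketch).
    """
    removed = []
    for (a, b), rho in rho_pairs.items():
        if rho <= e2 or a in removed or b in removed:
            continue
        weaker = a if rho_decision[a] < rho_decision[b] else b
        removed.append(weaker)
    kept = [a for a in rho_decision if a not in removed]
    return kept, removed


# Placeholder coefficients (NOT the values of Tables 2 and 3).
rho_decision = {"X1": 0.26, "X3": 0.21, "X4": 0.24, "X7": 0.18, "X8": 0.29}
rho_pairs = {
    ("X1", "X8"): 0.62,
    ("X3", "X4"): 0.58,
    ("X3", "X7"): 0.55,
    ("X4", "X8"): 0.31,
}
kept, removed = remove_redundant(rho_decision, rho_pairs, e2=0.5)
print("removed:", removed)
print("kept:", kept)
```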
Step 7: a three-layer BP neural network is constructed to train on the reduced attribute data. The condition attributes obtained in step 6, which are strongly related to the reliability index, are used as input, and the decision attribute data are used as the final output. Assuming that the reduced decision table contains p condition attributes, the input layer and the output layer have p nodes and 1 node respectively. In this calculation example there are 108 groups of sample data in total, of which 8 groups are randomly selected as test samples and the remaining 100 groups are used as training samples. Each sample comprises the condition attribute values and the decision attribute value, and the sample data are normalized.
h groups of initial connection weights between the nodes of each layer of the BP neural network, together with the thresholds of the hidden layer and the output layer, are randomly generated by computer and rewritten in binary-coded form to constitute the initial solution space, and the fitness of each solution in the solution space is calculated in combination with the neural network. The c solutions with the highest fitness are selected as parent solutions, and crossover and mutation operations are applied to them to obtain the child solution space. Whether convergence has been reached is judged from the fitness of the child solutions: if so, the optimization stops and the optimal initial weights and thresholds are output; otherwise the selection, crossover and mutation operations continue.
The initial weights and thresholds obtained in the previous step are decoded and input into the neural network, and the BP neural network is trained on the 100 normalized training samples to obtain the error between the estimated and true values of the decision attribute. Whether the error satisfies the convergence condition is then judged: if not, the weights and thresholds are adjusted and training continues; if so, the loop stops and the weights and thresholds that minimize the error are output, giving the optimal BP network model.
The reliability of the 8 groups of test samples is evaluated with the trained BP neural network model, and the evaluation results are compared with the true values in Table 4. As Table 4 shows, the evaluated values are very close to the actual values, with a maximum absolute error of 0.004, so the evaluation method performs well.
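The procedure of step 7 (sample split, normalization, genetic search for initial weights, then back-propagation) can be sketched with NumPy as follows. This is a minimal sketch under stated assumptions, not the implementation used in the example: the data are synthetic, and the number of inputs p = 4, the 5 hidden nodes, the 8-bit encoding over [-1, 1], the population sizes, the mutation rate, the learning rate and the fixed generation count (used in place of a fitness-based convergence test) are all choices made here for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
p, hidden, bits = 4, 5, 8            # inputs, hidden nodes, bits per parameter (assumed)

# Synthetic stand-in for the reduced decision table: 108 samples, p attributes.
X = rng.random((108, p))
y = X @ np.array([[0.4], [0.3], [0.2], [0.1]])

def minmax(a):
    """Min-max normalization of each column to [0, 1]."""
    return (a - a.min(axis=0)) / (a.max(axis=0) - a.min(axis=0))

X, y = minmax(X), minmax(y)
idx = rng.permutation(108)
test, train = idx[:8], idx[8:]       # 8 random test samples, 100 training samples

n_params = p * hidden + hidden + hidden + 1   # weights plus hidden/output thresholds

def decode(chrom):
    """Map a binary chromosome to real-valued parameters in [-1, 1]."""
    ints = chrom.reshape(n_params, bits) @ (2 ** np.arange(bits)[::-1])
    return ints / (2 ** bits - 1) * 2 - 1

def unpack(theta):
    """Split a flat parameter vector into (w1, b1, w2, b2)."""
    w1 = theta[:p * hidden].reshape(p, hidden)
    b1 = theta[p * hidden:p * hidden + hidden]
    w2 = theta[p * hidden + hidden:-1].reshape(hidden, 1)
    return w1, b1, w2, theta[-1:]

def forward(params, inputs):
    w1, b1, w2, b2 = params
    h = 1 / (1 + np.exp(-(inputs @ w1 + b1)))   # sigmoid hidden layer
    return h, h @ w2 + b2                        # linear output node

def mse(theta):
    """Training error of the network encoded by theta."""
    return float(np.mean((forward(unpack(theta), X[train])[1] - y[train]) ** 2))

# Genetic search for initial weights: selection, crossover, mutation.
h_pop, c_parents = 30, 10
pop = rng.integers(0, 2, size=(h_pop, n_params * bits))
initial_best = min(mse(decode(c)) for c in pop)
for _ in range(40):                  # fixed generation count instead of a convergence test
    errors = np.array([mse(decode(c)) for c in pop])
    parents = pop[np.argsort(errors)[:c_parents]]   # lowest error = highest fitness
    children = []
    while len(children) < h_pop - c_parents:
        a, b = parents[rng.integers(c_parents, size=2)]
        cut = rng.integers(1, a.size)               # single-point crossover
        child = np.concatenate([a[:cut], b[cut:]])
        flip = rng.random(child.size) < 0.01        # bit-flip mutation
        children.append(np.where(flip, 1 - child, child))
    pop = np.vstack([parents] + children)
theta0 = decode(min(pop, key=lambda c: mse(decode(c))))
ga_error = mse(theta0)

# Back-propagation (plain gradient descent) starting from the GA solution.
params = [a.copy() for a in unpack(theta0)]
for _ in range(2000):
    h, out = forward(params, X[train])
    err = out - y[train]
    if np.mean(err ** 2) < 1e-5:     # convergence condition (assumed)
        break
    g_out = 2 * err / len(train)
    g_h = (g_out @ params[2].T) * h * (1 - h)
    grads = (X[train].T @ g_h, g_h.sum(axis=0), h.T @ g_out, g_out.sum(axis=0))
    for prm, g in zip(params, grads):
        prm -= 0.05 * g

final_error = float(np.mean((forward(params, X[train])[1] - y[train]) ** 2))
test_abs_error = np.abs(forward(params, X[test])[1] - y[test]).ravel()
print(initial_best, ga_error, final_error, test_abs_error.max())
```

Keeping the c best parents in every generation means the best training error in the population never gets worse, so the genetic stage hands back-propagation a starting point at least as good as the best random initialization.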
Table 4 Prediction results
Serial number   True value   Predicted value   Absolute error
1               99.989       99.990            0.001
2               99.973       99.973            0.000
3               99.974       99.975            0.001
4               99.989       99.985            0.004
5               99.994       99.992            0.002
6               99.980       99.981            0.001
7               99.988       99.987            0.001
8               99.987       99.987            0.000
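The absolute error column and the stated maximum of 0.004 can be checked directly from the true and predicted values in Table 4. The short script below uses only the numbers given in the table; the variable names are illustrative.

```python
# (serial number, true value, predicted value) as listed in Table 4
rows = [
    (1, 99.989, 99.990),
    (2, 99.973, 99.973),
    (3, 99.974, 99.975),
    (4, 99.989, 99.985),
    (5, 99.994, 99.992),
    (6, 99.980, 99.981),
    (7, 99.988, 99.987),
    (8, 99.987, 99.987),
]

# Round to three decimals to match the precision of the table.
abs_err = [round(abs(true - pred), 3) for _, true, pred in rows]
max_err = max(abs_err)
worst = rows[abs_err.index(max_err)][0]
print(abs_err)
print("maximum absolute error:", max_err, "at sample", worst)
```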
Details not described in the present specification belong to the prior art known to those skilled in the art.

Claims (1)

1. A power distribution network reliability assessment method based on big data mutual information attribute reduction is characterized by comprising the following steps:
(1) collecting a large amount of data related to the reliability of the power distribution network from academic, meteorological or statistical websites;
(2) a decision table for representing the corresponding relation between the reliability indexes and the related indexes is sorted out from a plurality of data, wherein the decision table comprises 1 decision attribute for representing the reliability of the final power distribution network, namely the reliability index and a plurality of condition attributes for representing factors related to the reliability; the specific mode is as follows:
step one, establishing an m × n distribution network reliability evaluation decision table according to a large amount of collected data related to the reliability of a certain city's distribution network, wherein n represents the total number of decision attributes and condition attributes, the corresponding decision attribute and condition attributes form a group of attribute data, and m represents the total number of groups of attribute data, namely the number of samples;
step two, taking an index which directly expresses or determines the reliability of the power distribution network in the decision table as a decision attribute, and taking other indexes related to the reliability as condition attributes;
(3) preprocessing data in the decision table: judging whether each attribute is continuous or discrete according to all of its values, calculating the number of partitions into which each continuous attribute is to be divided, and discretizing the continuous attributes by an equal-width discretization method; the specific mode is as follows:
step one, judging whether attribute data is continuous or discrete according to values of all attributes in a decision table;
step two, calculating the number of partitions into which the continuous attribute is to be divided according to the data distribution characteristics of all factors and relevant objective factors and the following formula;
$k = 1.87 \times (m-1)^{2/5}$
wherein m is the number of samples of the attribute data, and k is the number of partitions of the continuous attribute value range;
calculating the interval length of the continuous attribute according to the calculated partition number, assigning a discrete integer value to each interval, and calculating the discretization result of the continuous attribute to complete the discretization of the continuous data;
(4) calculating the probability of each attribute when the attribute takes a specific discrete value, then calculating the respective information entropy of each attribute and the conditional entropy of the conditional attribute to the decision attribute, further calculating the mutual information between various conditional attributes and the decision attribute and the mutual information between one conditional attribute and another conditional attribute; the specific mode is as follows:
step one, counting the number of samples of each discrete integer value taken by each attribute, calculating the probability of the attribute taking a specific discrete value according to the following formula,
Figure FDA0002968625480000011
in the formula, k represents the number of discretized partitions of the attribute X, X_i represents the i-th value of the attribute X, c(X_i) represents the number of samples in which the attribute X takes the value X_i, U represents the total sample set, i.e. the universe of discourse, c(U) represents the total number of samples, and p(X_i) represents the probability that the attribute X takes the value X_i;
step two, solving the respective information entropy of each attribute, the conditional entropy of the conditional attribute to the decision attribute and the conditional entropy of one conditional attribute to another conditional attribute according to the following formulas;
Figure FDA0002968625480000021
where H(x) represents the information entropy of attribute x;
Figure FDA0002968625480000022
in the formula, p(Y_j|X_i) represents the probability that Y_j occurs given that X_i has occurred, and H(y|x) represents the conditional entropy of attribute y given x;
step three, using the calculation results of step two, obtaining the mutual information between each condition attribute and the decision attribute and the mutual information between one condition attribute and another condition attribute according to the following formula,
I(x,y)=H(y)-H(y|x)
in the formula, H(y) represents the information entropy of the attribute y, and I(x,y) represents the mutual information of the attributes x and y, that is, the amount of information shared by the attributes y and x;
(5) normalizing the mutual information between the condition attribute and the decision attribute calculated in the step (4), solving the entropy correlation coefficient between the condition attribute and the decision attribute by combining the information entropy, setting a critical value e1 according to the calculation result of the entropy correlation coefficient, and removing a condition attribute from the decision table when the entropy correlation coefficient between that condition attribute and the decision attribute is smaller than the critical value; the specific mode is as follows:
step one, normalizing the mutual information of the calculated condition attribute and the decision attribute by using the following formula to obtain the entropy correlation coefficient value,
Figure FDA0002968625480000023
in the formula, ρ_xy is the entropy correlation coefficient of the attributes x and y and represents the degree of correlation between x and y;
step two, setting a critical value e1 according to the calculation result of the entropy correlation coefficient, and removing a condition attribute from the decision table when the entropy correlation coefficient between that condition attribute and the decision attribute is smaller than the critical value;
(6) calculating entropy correlation coefficients among the remaining condition attributes after being removed in the step (5), setting a critical value e2 according to the value of the entropy correlation coefficients, judging that the two condition attributes have the same influence on the reliability of the power distribution network when the entropy correlation coefficients of the two condition attributes exceed the critical value, comparing the entropy correlation coefficients between the two condition attributes and the decision attribute at the moment, deleting the condition attribute with small entropy correlation coefficient between the two condition attributes and the decision attribute, and obtaining a reduced condition attribute set;
(7) constructing a three-layer BP neural network to train the reduced condition attribute set, taking the condition attribute obtained in the step (6) as input, taking decision attribute data as output, and solving the connection weight between nodes of each layer in the network which enables the fitting error to be minimum and the thresholds of a hidden layer and an output layer to obtain an optimal BP neural network model; the specific mode is as follows:
constructing a three-layer BP neural network to train the reduced attribute data, taking the obtained condition attribute as input, and taking the decision attribute as final output; assuming that the reduced decision table has p conditional attributes, the number of nodes of the input layer and the output layer is p and 1 respectively; randomly selecting b test samples from the m groups of attribute data, taking the rest samples as training samples of the neural network, wherein the samples comprise condition attributes and decision attribute values, and carrying out normalization processing on the data in the samples;
randomly generating, by computer, h groups of initial connection weights between the nodes of each layer of the BP neural network and thresholds of the hidden layer and the output layer, rewriting the initial connection weights and the thresholds into a binary-coded form to form an initial solution space, and calculating the fitness of the solution data in the solution space in combination with the neural network; selecting the first c solution data with the highest fitness as parent solution data, performing crossover and mutation operations on the parent solution data to obtain a child solution space, and judging whether convergence has occurred according to the fitness of the child solution data; if so, stopping the optimization and outputting the optimal initial weights and thresholds, otherwise continuing the selection, crossover and mutation operations;
decoding the initial weight and the threshold value calculated in the previous step, training the normalized sample by using a BP neural network to obtain the error of the estimated value and the true value of the decision attribute, judging whether the error meets the convergence condition, if not, adjusting the weight and the threshold value, and continuing to train the network; if so, the loop is stopped and the weight and threshold that minimizes the error are output.
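Steps (3) to (5) of the claim (equal-width discretization, probabilities, entropies, mutual information and the entropy correlation coefficient) can be illustrated with the following sketch. Several points are assumptions made here because the corresponding formulas appear only as figures: the logarithm is taken to base 2, the partition count k is rounded to the nearest integer, and the mutual information is normalized as I(x,y) / sqrt(H(x)·H(y)), which may differ from the normalization in the claimed formula. All function names are hypothetical.

```python
import math
from collections import Counter


def partition_count(m):
    """k = 1.87 * (m - 1)^(2/5); rounding to an integer is an assumption."""
    return max(1, round(1.87 * (m - 1) ** 0.4))


def discretize(values, k):
    """Equal-width discretization into integer labels 1..k."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / k
    if width == 0:
        return [1] * len(values)
    return [min(int((v - lo) / width) + 1, k) for v in values]


def entropy(xs):
    """H(x) = -sum_i p(X_i) * log2 p(X_i), with p(X_i) = c(X_i) / c(U)."""
    n = len(xs)
    return -sum(c / n * math.log2(c / n) for c in Counter(xs).values())


def conditional_entropy(ys, xs):
    """H(y|x) = sum_i p(X_i) * H(y | x = X_i)."""
    groups = {}
    for x, y in zip(xs, ys):
        groups.setdefault(x, []).append(y)
    return sum(len(g) / len(xs) * entropy(g) for g in groups.values())


def mutual_information(xs, ys):
    """I(x, y) = H(y) - H(y|x)."""
    return entropy(ys) - conditional_entropy(ys, xs)


def entropy_correlation(xs, ys):
    """Assumed normalization: I(x, y) / sqrt(H(x) * H(y))."""
    hx, hy = entropy(xs), entropy(ys)
    if hx == 0 or hy == 0:
        return 0.0
    return mutual_information(xs, ys) / math.sqrt(hx * hy)


# Usage on toy data: a continuous condition attribute and a discrete decision.
condition = discretize([0.0, 1.0, 2.0, 3.0], k=2)
decision = [1, 1, 2, 2]
print(partition_count(108), condition, entropy_correlation(condition, decision))
```

With m = 108 samples, as in the worked example of the description, the formula gives a partition count of about 12.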

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201710244420.8A CN107169628B (en) 2017-04-14 2017-04-14 Power distribution network reliability assessment method based on big data mutual information attribute reduction

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201710244420.8A CN107169628B (en) 2017-04-14 2017-04-14 Power distribution network reliability assessment method based on big data mutual information attribute reduction

Publications (2)

Publication Number Publication Date
CN107169628A CN107169628A (en) 2017-09-15
CN107169628B true CN107169628B (en) 2021-05-07

Family

ID=59849026

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201710244420.8A Active CN107169628B (en) 2017-04-14 2017-04-14 Power distribution network reliability assessment method based on big data mutual information attribute reduction

Country Status (1)

Country Link
CN (1) CN107169628B (en)

Families Citing this family (15)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108197822B (en) * 2018-01-24 2022-06-21 贵州电网有限责任公司 Power distribution network fault line selection adaptability evaluation decision method
CN108665181A (en) * 2018-05-18 2018-10-16 中国电力科学研究院有限公司 A kind of appraisal procedure and device of distribution network reliability
CN108664752A (en) * 2018-05-23 2018-10-16 同济大学 A kind of process parameter optimizing method based on process rule and big data analysis technology
CN108846422B (en) * 2018-05-28 2021-08-31 中国人民公安大学 Account number association method and system across social networks
CN109165819B (en) * 2018-08-03 2021-09-14 国网山东省电力公司聊城供电公司 Active power distribution network reliability rapid evaluation method based on improved AdaBoost. M1-SVM
CN109242150A (en) * 2018-08-15 2019-01-18 中国南方电网有限责任公司超高压输电公司南宁监控中心 A kind of electric network reliability prediction technique
CN109636660A (en) * 2018-10-22 2019-04-16 广东精点数据科技股份有限公司 A kind of agricultural weather data redundancy removing method and system based on comentropy
CN109343367A (en) * 2018-10-26 2019-02-15 齐鲁工业大学 A method of based on network response surface flue gas desulfurization
CN109615246B (en) * 2018-12-14 2020-10-23 内蒙古电力(集团)有限责任公司内蒙古电力科学研究院分公司 Method for determining economic operation state of active power distribution network
CN110142803B (en) * 2019-05-28 2023-02-10 上海电力学院 Method and device for detecting working state of mobile welding robot system
CN110443320A (en) * 2019-08-13 2019-11-12 北京明略软件系统有限公司 The determination method and device of event similarity
CN113221442B (en) * 2020-12-24 2022-08-30 山东鲁能软件技术有限公司 Method and device for constructing health assessment model of power plant equipment
CN113326655A (en) * 2021-05-25 2021-08-31 广西电网有限责任公司电力科学研究院 Comprehensive evaluation method and device for reliability and economy of radiation type power distribution network
CN113220751A (en) * 2021-06-03 2021-08-06 国网江苏省电力有限公司营销服务中心 Metering system and evaluation method for multi-source data state quantity
CN113537734B (en) * 2021-06-28 2023-02-03 国网福建省电力有限公司经济技术研究院 Energy data application catalog extraction method based on maximum correlation minimum redundancy

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102879677A (en) * 2012-09-24 2013-01-16 西北工业大学 Intelligent fault diagnosis method based on rough Bayesian network classifier
CN103488802A (en) * 2013-10-16 2014-01-01 国家电网公司 EHV (Extra-High Voltage) power grid fault rule mining method based on rough set association rule
CN106503802A (en) * 2016-10-20 2017-03-15 上海电机学院 A kind of method of utilization genetic algorithm optimization BP neural network system
KR102059472B1 (en) * 2018-11-29 2019-12-30 대한민국 A System and Method for Prediction of Geomagnetic Disturbance Strength based on Solar Coronal Hole Information

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
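Reliability evaluation of distribution networks based on rough set theory; Huang Hai; China Master's Theses Full-text Database, Engineering Science and Technology II; 2014-03-31; full text *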

Also Published As

Publication number Publication date
CN107169628A (en) 2017-09-15

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant