CN111310785A - National power grid mechanical external damage prediction method - Google Patents


Info

Publication number
CN111310785A
Authority
CN
China
Prior art keywords
information
sub
historical
sample
power transmission
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202010041704.9A
Other languages
Chinese (zh)
Inventor
吴和俊
熊志刚
王敏康
陆宇宁
程田宝
胡驰远
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Hangzhou Huawang Information Technology Co ltd
Original Assignee
Hangzhou Huawang Information Technology Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Hangzhou Huawang Information Technology Co ltd filed Critical Hangzhou Huawang Information Technology Co ltd
Priority to CN202010041704.9A priority Critical patent/CN111310785A/en
Publication of CN111310785A publication Critical patent/CN111310785A/en
Pending legal-status Critical Current

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/24 Classification techniques
    • G06F18/241 Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
    • G06F18/2413 Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches based on distances to training or reference patterns
    • G06F18/24147 Distances to closest patterns, e.g. nearest neighbour classification
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/24 Classification techniques
    • G06F18/243 Classification techniques relating to the number of classes
    • G06F18/24317 Piecewise classification, i.e. whereby each classification requires several discriminant rules
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q10/00 Administration; Management
    • G06Q10/04 Forecasting or optimisation specially adapted for administrative or management purposes, e.g. linear programming or "cutting stock problem"
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q50/00 Information and communication technology [ICT] specially adapted for implementation of business processes of specific business sectors, e.g. utilities or tourism
    • G06Q50/06 Energy or water supply

Landscapes

  • Engineering & Computer Science (AREA)
  • Business, Economics & Management (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • Economics (AREA)
  • General Physics & Mathematics (AREA)
  • Strategic Management (AREA)
  • Human Resources & Organizations (AREA)
  • Health & Medical Sciences (AREA)
  • Marketing (AREA)
  • Evolutionary Computation (AREA)
  • Evolutionary Biology (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • General Business, Economics & Management (AREA)
  • Tourism & Hospitality (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Artificial Intelligence (AREA)
  • General Engineering & Computer Science (AREA)
  • Primary Health Care (AREA)
  • General Health & Medical Sciences (AREA)
  • Water Supply & Treatment (AREA)
  • Public Health (AREA)
  • Development Economics (AREA)
  • Game Theory and Decision Science (AREA)
  • Entrepreneurship & Innovation (AREA)
  • Operations Research (AREA)
  • Quality & Reliability (AREA)
  • Management, Administration, Business Operations System, And Electronic Commerce (AREA)

Abstract

The invention discloses a method for predicting mechanical external damage of a national power grid, which relates to the field of machine learning and is used for predicting mechanical external damage risks of a power tower pole, a power transmission line and a power transmission channel of the national power grid, and comprises the following steps: collecting historical information to form complete historical data, resampling the complete historical data, training a Catboost model, collecting current information to form complete current data, and predicting the mechanical external damage risk by using the trained Catboost model. The method provided by the invention can predict the probability and risk level of mechanical external damage risks of the power tower pole, the power transmission line and the power transmission channel, and deploy response measures in advance.

Description

National power grid mechanical external damage prediction method
[ technical field ]
The invention relates to the field of machine learning, and in particular to a method for predicting mechanical external damage to a national power grid.
[ background of the invention ]
With economic development, infrastructure construction is under way in many regions. On construction sites, large engineering machines such as cranes and excavators move back and forth and pose a considerable risk to nearby power towers and power transmission lines. The risk that construction machinery damages power towers and power transmission lines is called the mechanical external damage risk. Because some construction personnel lack safety knowledge and safety awareness, the safety distance between large engineering machinery and high-voltage lines is difficult to control effectively, so discharge and line-contact accidents occur easily and can cause large-area power failures. Such accidents not only injure those responsible and cause losses to power enterprises, but also disrupt the electricity supply of factories and residents around the construction site. At present, however, the main measures against mechanical external damage to the power towers and power transmission lines of the national power grid remain reinforced protection and emergency response after a problem has occurred; no advance warning is given of the likelihood of mechanical external damage to the power towers, power transmission lines and power transmission channels of the national power grid.
[ summary of the invention ]
In order to solve the problems, the invention provides a method for predicting mechanical external damage of a national power grid, which is used for predicting the probability and the risk level of mechanical external damage risks of a power tower pole, a power transmission line and a power transmission channel.
In order to achieve the purpose, the invention adopts the following technical scheme:
a national power grid mechanical external damage prediction method is used for predicting mechanical external damage risks of a power tower pole, a power transmission line and a power transmission channel of a national power grid, and comprises the following steps:
collecting historical information, and sorting the historical information to form complete historical data, wherein the complete historical data has a plurality of dimensions, and the dimensions are data characteristics;
resampling the complete historical data by adopting an SMOTE + Tomek Links algorithm to form a training data set;
training a Catboost model by using a training data set;
collecting current information, and sorting the current information to form complete current data;
and, based on the complete current data, performing mechanical external damage risk prediction with the trained Catboost model.
Optionally, the collected historical information includes deployment historical information of a field maintenance department, tower pole and line ledger historical information and meteorological historical information;
the on-site maintenance department deployment historical information comprises line defect sub-information, hidden danger sub-information and fault sub-information;
the weather history information comprises weather condition sub-information, temperature sub-information, humidity sub-information, wind speed sub-information and wind direction sub-information.
Optionally, sorting the historical information to form complete historical data specifically includes:
based on a power transmission line of a national power grid, taking historical information and sub-information thereof related to the same power transmission line as complete historical data, and taking different historical information and sub-information thereof as different dimensions of the complete historical data under the complete historical data;
sorting the historical information, and determining numerical value sub-information and/or non-numerical value sub-information under each piece of historical information;
filling in the missing numerical sub-information under each piece of historical information;
one-hot encoding the non-numerical sub-information under each piece of historical information;
after the one-hot encoding, constructing weather sub-information from the meteorological historical information, wherein the constructed weather sub-information comprises average, maximum and minimum statistics of daily temperature, daily humidity, daily wind speed and daily air pressure, together with average monthly rainfall frequency statistics and average monthly snowfall frequency statistics.
Optionally, when filling in the missing numerical sub-information under each piece of historical information, if more than half of the numerical sub-information under a given piece of historical information is missing, that historical information and its numerical sub-information are deleted; if no more than half is missing, the gaps are filled with the mean, the median or the row/column mode of the numerical sub-information under that historical information.
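As a minimal sketch of the missing-value rule above, the following Python function drops any numerical column in which more than half of the values are missing and fills the remaining gaps with the mean, median or mode. The function name `impute_columns`, the dictionary-of-lists layout and the use of `None` to mark a missing value are illustrative assumptions, not part of the described method.

```python
from statistics import mean, median, mode


def impute_columns(table, strategy="median"):
    """Apply the half-missing rule to a table of numerical columns.

    table: dict mapping a column name to a list of numbers, with None
           marking a missing value (hypothetical layout for this sketch).
    strategy: "mean", "median" or "mode".
    """
    fill = {"mean": mean, "median": median, "mode": mode}[strategy]
    cleaned = {}
    for name, values in table.items():
        present = [v for v in values if v is not None]
        missing = len(values) - len(present)
        # More than half of the values are missing: delete the column.
        if missing * 2 > len(values):
            continue
        # Otherwise fill each gap with the chosen statistic.
        filler = fill(present) if missing else None
        cleaned[name] = [filler if v is None else v for v in values]
    return cleaned
```

A column with exactly half of its values missing is kept and filled, since the rule only deletes when the missing share exceeds one half.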
Optionally, when the non-numerical sub-information is one-hot encoded, each item of non-numerical sub-information under the same historical information is treated as a state value; the number of bits of a state value equals the number of items of non-numerical sub-information under that historical information, exactly one bit of each state value is 1, and the remaining bits are 0.
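The one-hot rule can be illustrated with a short Python sketch. The function name `one_hot_encode` and the choice to order the states alphabetically are assumptions made for this example; the text only requires that each state value has exactly one bit set.

```python
def one_hot_encode(values):
    """One-hot encode a list of non-numerical values.

    Returns the ordered list of distinct states and, for every input
    value, a bit vector with exactly one 1.
    """
    # Alphabetical ordering of states is an assumption of this sketch.
    categories = sorted(set(values))
    index = {category: i for i, category in enumerate(categories)}
    encoded = []
    for value in values:
        bits = [0] * len(categories)  # one bit per distinct state
        bits[index[value]] = 1        # only the matching bit is 1
        encoded.append(bits)
    return categories, encoded
```

For example, `one_hot_encode(["rain", "sunny", "rain", "snow"])` yields three states and four 3-bit vectors.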
Optionally, resampling the complete historical data by using the SMOTE + Tomek Links algorithm specifically includes:
screening positive-class samples from the complete historical data, wherein a positive-class sample is a sample related to mechanical external damage in the complete historical data;
for the i-th positive-class sample $x_i$, using the K-nearest-neighbour algorithm to obtain the k positive-class samples nearest to $x_i$, where distance is the Euclidean distance between positive-class samples in the n-dimensional feature space, and then randomly selecting one of these k nearest positive-class samples to generate new data:
$$x_{new} = x_i + \delta \cdot (\hat{x}_i - x_i)$$
wherein $x_{new}$ is the new sample, $\hat{x}_i$ is the sample randomly selected from the k positive-class samples nearest to $x_i$, and $\delta \in [0, 1]$ is a random number;
generating new positive-class samples with the SMOTE algorithm to obtain an expanded data set, and finding the samples that form Tomek Link pairs in the expanded data set, wherein a Tomek Link pair satisfies the following condition: in the sample set, samples $x_j$ and $x_k$ belong to different classes and the distance between them is $d(x_j, x_k)$; if there is no sample $x_l$ such that $d(x_l, x_j) < d(x_j, x_k)$ or $d(x_l, x_k) < d(x_j, x_k)$, then $x_j$ and $x_k$ form a Tomek Link pair;
all samples that make up the Tomek Link pair are deleted.
Optionally, the parameters of the Catboost model to be trained include the learning rate learning_rate, the maximum tree depth max_depth, the maximum number of decision trees iterations, the L2 regularization coefficient l2_leaf_reg, the loss function loss_function, the number of splits for numerical features border_count, and the number of splits for categorical features ctr_border_count.
Optionally, an AUC value output by the Catboost model is used as a fitness value of the Catboost model.
Optionally, the trained Catboost model is used for predicting mechanical external damage risks, including power tower pole prediction, power transmission line prediction and power transmission channel prediction;
dividing the power transmission lines of the national power grid in the region to be predicted into first sections; in any first section, if the prediction result for any power tower pole is that a mechanical external damage risk exists, the mechanical external damage risk state of the power tower poles in that first section is at risk, and the risk occurrence probability of the section is the maximum of the at-risk probabilities predicted for all power tower poles in that first section;
in any first section, if the prediction results for all power tower poles are that no mechanical external damage risk exists, the mechanical external damage risk state of the power tower poles in that first section is risk-free;
the prediction for a power transmission line is the risk state that occurs most frequently among the mechanical external damage risk states of all first sections in the region to be predicted, and the risk occurrence probability of the power transmission line is the maximum of the mechanical external damage risk occurrence probabilities of all first sections in the region to be predicted;
dividing the power transmission channels of the national power grid in the region to be predicted into second sections; the prediction for a power transmission channel is the risk state that occurs most frequently among the mechanical external damage risk states of all second sections in the region to be predicted, and the risk occurrence probability of the power transmission channel is the maximum of the mechanical external damage risk occurrence probabilities of all second sections in the region to be predicted.
Optionally, the category of the collected current information is the same as that of the collected historical information; the steps of sorting the current information to form complete current data are the same as the steps of sorting the historical information to form complete historical data.
The invention has the following beneficial effects:
1. Because mechanical external damage data for the national power grid come from different sources, contain much dirty data and are class-imbalanced, the collected data are processed: missing data are filled in, non-numerical data are encoded, and the data are resampled with the SMOTE + Tomek Links algorithm. This avoids the negative effect of positive/negative sample imbalance on the algorithm and helps ensure the accuracy of the data and of the prediction results;
2. The Catboost model helps avoid overfitting to the training data set, reduces the noise obtained from low-frequency categories, offers excellent performance and robustness, and is easy to use. This further supports accurate and stable prediction results, so that risks can be found and avoided in time, treatment measures can be chosen according to risk level, losses of manpower and material resources are reduced, and staff efficiency is improved.
These features and advantages of the present invention will be disclosed in more detail in the following detailed description and the accompanying drawings. The best mode or means of the present invention will be described in detail with reference to the accompanying drawings, but the present invention is not limited thereto. In addition, the features, elements and components appearing in each of the following and in the drawings are plural and different symbols or numerals are labeled for convenience of representation, but all represent components of the same or similar construction or function.
[ description of the drawings ]
The invention will be further described with reference to the accompanying drawings in which:
FIG. 1 is a flow chart of an embodiment of the present invention.
[ detailed description ]
The technical solutions of the embodiments of the present invention are explained and illustrated below with reference to the drawings of the embodiments of the present invention, but the following embodiments are only preferred embodiments of the present invention, and not all embodiments. Based on the embodiments in the implementation, other embodiments obtained by those skilled in the art without any creative effort belong to the protection scope of the present invention.
Reference in the specification to "one embodiment" or "an example" means that a particular feature, structure or characteristic described in connection with the embodiment itself may be included in at least one embodiment of the patent disclosure. The appearances of the phrase "in one embodiment" in various places in the specification are not necessarily all referring to the same embodiment.
Example:
As shown in fig. 1, this embodiment provides a method for predicting mechanical external damage of a national power grid, which is used for predicting the mechanical external damage risk of the power transmission lines and power towers of the national power grid, and includes the following steps:
Historical information is acquired. The historical information comprises on-site maintenance department deployment historical information, tower pole and line ledger historical information and meteorological historical information. The on-site maintenance department deployment historical information comprises line defect sub-information, hidden danger sub-information and fault sub-information; the meteorological historical information comprises weather condition sub-information, temperature sub-information, humidity sub-information, wind speed sub-information and wind direction sub-information.
The collected historical information is sorted to form complete historical data, which has a plurality of dimensions; the dimensions are the data features. Because mechanical external damage data for the national power grid come from different sources and contain much dirty data, the collected historical information must be processed: missing data are filled in so that dirty and missing data do not degrade the subsequent algorithm, and non-numerical data are encoded so that a computer can read and process them. Sorting the historical information to form complete historical data comprises the following sub-steps:
Based on the power transmission lines of the national power grid, the historical information and sub-information related to the same power transmission line are taken as one piece of complete historical data. Within one piece of complete historical data, the different historical information and its sub-information serve as different dimensions.
The historical information is sorted, and the numerical and/or non-numerical sub-information under each piece of historical information is determined.
The missing numerical sub-information under each piece of historical information is filled in: if more than half of the numerical sub-information under a given piece of historical information is missing, that historical information and its numerical sub-information are deleted; if no more than half is missing, the gaps are filled reasonably from similar numerical sub-information, for example with the mean, the median or the row/column mode of the numerical sub-information under that historical information.
The non-numerical sub-information under each piece of historical information is one-hot encoded: each item of non-numerical sub-information under the same historical information is treated as a state value, the number of bits of a state value equals the number of items of non-numerical sub-information under that historical information, exactly one bit of each state value is 1, and the remaining bits are 0. After one-hot encoding, the non-numerical sub-information under each piece of historical information can be read, identified and computed on by a computer.
After the one-hot encoding, weather sub-information is constructed from the meteorological historical information. The constructed weather sub-information comprises average, maximum and minimum statistics of daily temperature, daily humidity, daily wind speed and daily air pressure, together with average monthly rainfall frequency statistics and average monthly snowfall frequency statistics.
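The constructed weather statistics can be sketched in a few lines of Python. The function names, the `(date, value)` tuple layout and the `YYYY-MM-DD` date strings are assumptions for illustration; the embodiment does not specify a data format.

```python
from collections import defaultdict
from statistics import mean


def daily_statistics(readings):
    """Mean, max and min per day for one weather quantity.

    readings: iterable of (date_string, value) pairs, e.g. several
              temperature readings per day (hypothetical layout).
    """
    by_day = defaultdict(list)
    for day, value in readings:
        by_day[day].append(value)
    return {
        day: {"mean": mean(vals), "max": max(vals), "min": min(vals)}
        for day, vals in by_day.items()
    }


def average_monthly_frequency(event_days, months):
    """Average number of event days (e.g. rainfall days) per month.

    event_days: 'YYYY-MM-DD' strings on which the event occurred.
    months: the 'YYYY-MM' months in the observation window; months
            without any event count as zero.
    """
    counts = defaultdict(int)
    for day in set(event_days):  # count each day at most once
        counts[day[:7]] += 1
    return mean([counts.get(month, 0) for month in months])
```

Passing the full list of months in the observation window matters: averaging only over months that contain an event would overstate the frequency.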
Once the historical information has been sorted and completed into complete historical data, the data are imbalanced, because mechanical external damage events are rare in the real environment. To avoid the effect of imbalanced data on the subsequent algorithm and to help ensure the accuracy of the data and of the prediction results, this embodiment resamples the complete historical data with the SMOTE + Tomek Links algorithm to form a training data set, through the following sub-steps:
Positive-class samples are screened from the complete historical data, a positive-class sample being a sample related to mechanical external damage in the complete historical data.
For the i-th positive-class sample $x_i$, the K-nearest-neighbour algorithm is used to obtain the k positive-class samples nearest to $x_i$, where distance is the Euclidean distance between positive-class samples in the n-dimensional feature space; one of these k nearest positive-class samples is then selected at random to generate a new sample:
$$x_{new} = x_i + \delta \cdot (\hat{x}_i - x_i)$$
wherein $x_{new}$ is the new sample, $\hat{x}_i$ is the sample randomly selected from the k positive-class samples nearest to $x_i$, and $\delta \in [0, 1]$ is a random number.
The main idea of the SMOTE algorithm is to generate new samples between positive-class samples that lie close together, in order to balance the classes. Because SMOTE does not simply copy positive-class samples, overfitting to the positive-class samples can be avoided to some extent.
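A minimal Python sketch of the SMOTE interpolation step follows. It places each new sample on the line segment between a positive-class sample and one of its k nearest positive-class neighbours. The function names, the list-of-lists sample layout, the `n_new` argument and the seeded random generator are assumptions of this sketch, and it uses a brute-force neighbour search rather than an optimized K-nearest-neighbour structure.

```python
import math
import random


def euclidean(a, b):
    """Euclidean distance between two points in n-dimensional space."""
    return math.sqrt(sum((p - q) ** 2 for p, q in zip(a, b)))


def smote(positives, k, n_new, rng=None):
    """Generate n_new synthetic positive-class samples.

    positives: list of feature vectors (at least two samples).
    k: number of nearest positive-class neighbours to consider.
    """
    rng = rng or random.Random(0)  # fixed seed only for reproducibility
    synthetic = []
    for _ in range(n_new):
        i = rng.randrange(len(positives))
        x_i = positives[i]
        # k nearest positive-class neighbours of x_i (excluding x_i itself).
        others = [p for j, p in enumerate(positives) if j != i]
        neighbours = sorted(others, key=lambda p: euclidean(x_i, p))[:k]
        x_hat = rng.choice(neighbours)
        delta = rng.random()  # random number in [0, 1)
        # x_new = x_i + delta * (x_hat - x_i)
        synthetic.append([a + delta * (b - a) for a, b in zip(x_i, x_hat)])
    return synthetic
```

Because every synthetic point is a convex combination of two existing positive-class samples, it always lies inside the region spanned by the positive class.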
The pairs of samples that form Tomek Link pairs are then found in the expanded sample set obtained by adding the new samples. A Tomek Link pair satisfies the following condition: in the sample set, samples $x_j$ and $x_k$ belong to different classes and the distance between them is $d(x_j, x_k)$; if there is no sample $x_l$ such that $d(x_l, x_j) < d(x_j, x_k)$ or $d(x_l, x_k) < d(x_j, x_k)$, then $x_j$ and $x_k$ form a Tomek Link pair;
all samples that make up the Tomek Link pair are deleted.
The Tomek Links algorithm is mainly used for resampling and cleaning complete historical data.
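The Tomek Link condition and the deletion step can be sketched in Python as follows. The function names and the parallel `samples`/`labels` lists are assumptions; the pairwise search is quadratic in the number of samples and is meant only to mirror the definition, not to be efficient.

```python
import math


def euclidean(a, b):
    """Euclidean distance between two points in n-dimensional space."""
    return math.sqrt(sum((p - q) ** 2 for p, q in zip(a, b)))


def tomek_link_pairs(samples, labels):
    """Return index pairs (j, k) that form Tomek Links."""
    n = len(samples)
    pairs = []
    for j in range(n):
        for k in range(j + 1, n):
            if labels[j] == labels[k]:
                continue  # a Tomek Link joins samples of different classes
            d_jk = euclidean(samples[j], samples[k])
            # Is any third sample closer to x_j or to x_k than they are
            # to each other?
            closer = any(
                euclidean(samples[l], samples[j]) < d_jk
                or euclidean(samples[l], samples[k]) < d_jk
                for l in range(n)
                if l not in (j, k)
            )
            if not closer:
                pairs.append((j, k))
    return pairs


def remove_tomek_links(samples, labels):
    """Delete every sample that takes part in a Tomek Link pair."""
    drop = {i for pair in tomek_link_pairs(samples, labels) for i in pair}
    kept = [(s, y) for i, (s, y) in enumerate(zip(samples, labels)) if i not in drop]
    return [s for s, _ in kept], [y for _, y in kept]
```

This sketch removes both members of each pair, as the text describes; some implementations of Tomek Links cleaning remove only the majority-class member instead.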
The SMOTE + Tomek Links algorithm combines the SMOTE up-sampling method with the Tomek Links down-sampling method. New positive-class samples are first generated with SMOTE, and the Tomek Link pairs are then deleted from the resulting expanded sample set. The minority-class samples generated by plain SMOTE are obtained by linear interpolation, so while the class distribution is balanced the sample space of the positive class is also expanded; the positive class may thereby intrude into space that originally belonged to the other class, which easily leads to model overfitting. Tomek Links are then used to find noise points or boundary points and so address this intrusion of the positive class into the space of the other class.
At this point the conversion from historical information to the training data set is complete, and the resulting training data set is used to train the Catboost model. The parameters of the Catboost model to be trained include the learning rate learning_rate, the maximum tree depth max_depth, the maximum number of decision trees iterations, the L2 regularization coefficient l2_leaf_reg, the loss function loss_function, the number of splits for numerical features border_count, and the number of splits for categorical features ctr_border_count; the AUC value output by the Catboost model is used as the fitness value of the model.
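To make the parameter list and the AUC fitness value concrete, the sketch below collects the named parameters in a dictionary and computes AUC from labels and predicted scores in plain Python. All parameter values are illustrative assumptions, since the text gives none; whether a given CatBoost release accepts every name, in particular `ctr_border_count`, should be checked against that release's documentation. The `auc` function is a straightforward pairwise implementation written for this example, not CatBoost's own metric code.

```python
# Parameter names come from the text; every value is a placeholder
# chosen for illustration only.
catboost_params = {
    "learning_rate": 0.05,
    "max_depth": 6,
    "iterations": 500,
    "l2_leaf_reg": 3.0,
    "loss_function": "Logloss",
    "border_count": 128,
    "ctr_border_count": 50,  # may not be accepted by every release
}


def auc(labels, scores):
    """Area under the ROC curve via pairwise comparison.

    Equals the probability that a randomly chosen positive sample gets
    a higher score than a randomly chosen negative one (ties count 0.5).
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative")
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p in pos
        for n in neg
    )
    return wins / (len(pos) * len(neg))
```

Using AUC as the fitness value suits this task because it is insensitive to the class ratio, so a parameter search would compare candidate settings by the AUC each achieves on held-out data.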
The Catboost model is a recent open-source gradient boosting framework, and has the following advantages:
Automatic handling of categorical features: traditional ensemble learning algorithms convert categorical features directly into numerical features, for example by one-hot encoding, although categories have no inherent ordering. Another way to handle categorical features is to compute statistics from the labels, but this tends to overfit. To avoid overfitting, the Catboost model uses a more effective strategy: it randomly permutes the input set to generate several random permutations, and adds a prior value, which helps reduce the noise obtained from low-frequency categories.
Feature combinations: the Catboost model uses symmetric complete binary trees, splitting into two branches each time in a random order. A feature is not discarded after it has been used for a split; instead, a feature used for a split can be combined with another categorical feature to form a new feature, and the best of all possible combinations is selected.
Overcoming gradient bias: like all standard gradient boosting algorithms, the Catboost model fits the gradient of the current model by building a new tree. However, all classical boosting algorithms suffer from overfitting caused by biased pointwise gradient estimates. Many GBDT-based algorithms (e.g., XGBoost, LightGBM) construct a tree in two stages: selecting the tree structure, and computing the leaf values once the structure is fixed. To select the best tree structure, the algorithm enumerates candidate splits, builds a tree for each, computes the values in the resulting leaves, scores the resulting tree, and selects the best split. In both stages the leaf values are computed as approximations of a gradient step or a Newton step. In this embodiment, the first stage of CatBoost uses an unbiased estimate of the gradient step, and the second stage follows the conventional GBDT scheme.
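The permutation-plus-prior strategy described under the handling of categorical features can be illustrated with a simplified ordered target statistic. In this sketch each sample's category is encoded using only the labels of samples that precede it in a random permutation, smoothed by a prior. The function name, the single permutation, and the `prior` and `weight` defaults are assumptions; CatBoost's actual implementation differs in detail (for instance, it uses several permutations).

```python
import random
from collections import defaultdict


def ordered_target_statistics(categories, targets, prior=0.5, weight=1.0, seed=0):
    """Encode a categorical feature with an ordered target statistic.

    For each sample, in a random order, the encoding is
        (sum of earlier targets in the same category + weight * prior)
        / (number of earlier samples in the same category + weight)
    so a sample's own label never contributes to its encoding.
    """
    order = list(range(len(categories)))
    random.Random(seed).shuffle(order)  # one random permutation
    sums = defaultdict(float)
    counts = defaultdict(int)
    encoded = [0.0] * len(categories)
    for idx in order:
        c = categories[idx]
        encoded[idx] = (sums[c] + weight * prior) / (counts[c] + weight)
        # Only now does this sample's label become visible to later ones.
        sums[c] += targets[idx]
        counts[c] += 1
    return encoded
```

A category seen for the first time is encoded as the prior itself, which is how the prior damps noise from low-frequency categories.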
The Catboost model has excellent performance and robustness, is easy to use, further ensures the accuracy and stability of a prediction result so as to find risks in time and avoid the risks, determines treatment measures in an auxiliary manner according to risk grades, reduces the loss of manpower and material resources and improves the working efficiency of staff.
Collecting current information, and sorting the current information to form complete current data, wherein the category of the collected current information is the same as that of the collected historical information; the steps of sorting the current information to form complete current data are the same as the steps of sorting the historical information to form complete historical data, and are not repeated here.
Based on the complete current data, mechanical external damage risk prediction is performed with the trained Catboost model, and comprises power tower pole prediction, power transmission line prediction and power transmission channel prediction. The modeling object is the power tower, whereas the final objects of the service estimate include the power transmission line and the power transmission channel. This embodiment therefore estimates the risk states of the power transmission line and the power transmission channel with a divide-and-conquer approach: the national power grid of the region to be predicted is divided into sections, the power transmission line and the power transmission channel are each regarded as made up of several sections, and each section is regarded as made up of several consecutive power towers. The risk state of a whole power transmission line or power transmission channel can thus be estimated by predicting the risk states of its power towers. Because the national standard regulates power transmission channels and specifies how they are divided, the division of a power transmission channel into sections follows the national standard; the division of a power transmission line can be chosen flexibly according to the actual prediction requirement, and this embodiment uses a 3 km by 3 km grid.
Because power transmission lines and power transmission channels are divided on different bases, the sections of one cannot be reused for the other. In this embodiment, a section of a power transmission line is called a first section and a section of a power transmission channel is called a second section to mark the difference:
in any first section, if the prediction result for any power tower pole is that a mechanical external damage risk exists, the mechanical external damage risk state of that first section is at risk; the trained Catboost model directly outputs the probability that each power tower pole is at risk, and in this embodiment the mechanical external damage risk probability of a first section is the maximum of the at-risk probabilities of all power tower poles in that first section;
in any first section, if the prediction results for all power tower poles are that no mechanical external damage risk exists, the mechanical external damage risk state of that first section is no risk;
the prediction for the power transmission line is the risk state that occurs most often among the mechanical external damage risk states of all first sections in the region to be predicted, and the risk probability of the power transmission line is the maximum of the mechanical external damage risk probabilities of all first sections in the region to be predicted;
the prediction for the power transmission channel is the risk state that occurs most often among the mechanical external damage risk states of all second sections in the region to be predicted, and the risk probability of the power transmission channel is the maximum of the mechanical external damage risk probabilities of all second sections in the region to be predicted.
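The aggregation rules above can be sketched in Python as follows. This is an illustration under stated assumptions, not the applicant's implementation: the 0.5 decision threshold, the function names `section_risk` and `line_risk`, and the tie-breaking behaviour are all assumptions, since the description does not specify them.

```python
from collections import Counter


def section_risk(tower_probs, threshold=0.5):
    """Aggregate tower-level risk probabilities into one section result.

    The section is at risk if any tower is at risk; its probability is the
    maximum tower probability. The threshold value is an assumption.
    """
    at_risk = any(p >= threshold for p in tower_probs)
    return at_risk, max(tower_probs)


def line_risk(sections, threshold=0.5):
    """Aggregate section results into a line (or channel) result.

    The state is the most frequent section state; the probability is the
    maximum section probability. Ties go to the state seen first (assumed).
    """
    results = [section_risk(probs, threshold) for probs in sections]
    states = Counter(state for state, _ in results)
    most_common_state = states.most_common(1)[0][0]
    return most_common_state, max(prob for _, prob in results)


# Hypothetical tower probabilities for three first sections of one line
example_sections = [[0.1, 0.7], [0.2, 0.3], [0.05]]
example_result = line_risk(example_sections)
```

The same two functions apply to a power transmission channel when the input lists describe second sections. Note that the most frequent state and the maximum probability can disagree, as in the example, where the line is labelled as not at risk while its reported probability is 0.7.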
The prediction results are generally presented as in the following table:
[Table image not reproduced: Figure BDA0002367989310000071]
While the present invention has been described with reference to the preferred embodiments, it will be understood by those skilled in the art that various changes in form and details may be made therein without departing from the spirit and scope of the invention as defined by the appended claims. Any modification which does not depart from the functional and structural principles of the present invention is intended to be included within the scope of the claims.

Claims (10)

1. A method for predicting mechanical external damage to the national power grid, used to predict the mechanical external damage risk of power tower poles, power transmission lines and power transmission channels of the national power grid, comprising the following steps:
collecting historical information, and sorting the historical information to form complete historical data, wherein the complete historical data has a plurality of dimensions, and the dimensions are data characteristics;
resampling the complete historical data by adopting an SMOTE + Tomek Links algorithm to form a training data set;
training a Catboost model by using a training data set;
collecting current information, and sorting the current information to form complete current data;
and performing mechanical external damage risk prediction based on the complete current data by using the trained Catboost model.
2. The method for predicting the mechanical damage of the national power grid according to claim 1, wherein the collected historical information comprises on-site maintenance department deployment historical information, tower pole and line ledger historical information and meteorological historical information;
the on-site maintenance department deployment historical information comprises line defect sub-information, hidden danger sub-information and fault sub-information;
the weather history information comprises weather condition sub-information, temperature sub-information, humidity sub-information, wind speed sub-information and wind direction sub-information.
3. The method for predicting the mechanical damage of the national grid according to claim 2, wherein the step of sorting the historical information to form complete historical data specifically comprises the steps of:
taking the power transmission line of the national power grid as the unit, treating the historical information and its sub-information related to the same power transmission line as one piece of complete historical data, and treating the different kinds of historical information and their sub-information as different dimensions of that complete historical data;
sorting the historical information, and determining the numerical sub-information and/or non-numerical sub-information under each piece of historical information;
filling in the missing numerical sub-information under each piece of historical information;
carrying out one-hot encoding on the non-numerical sub-information under each piece of historical information;
after the one-hot encoding, constructing weather sub-information for the weather historical information, the constructed weather sub-information comprising average, maximum and minimum statistics of daily temperature, daily humidity, daily wind speed and daily air pressure, as well as average monthly rainfall frequency statistics and average monthly snowfall frequency statistics.
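One possible reading of the weather statistics step is sketched below in Python, using only the standard library. The `(day, value)` input layout, the function name `daily_stats` and the sample readings are assumptions for illustration and are not taken from the claim.

```python
from statistics import mean


def daily_stats(readings):
    """Return the mean, max and min per day for one weather variable.

    `readings` is an iterable of (day, value) pairs (assumed layout).
    """
    by_day = {}
    for day, value in readings:
        by_day.setdefault(day, []).append(value)
    return {
        day: {"mean": mean(vals), "max": max(vals), "min": min(vals)}
        for day, vals in by_day.items()
    }


# Hypothetical temperature readings in degrees Celsius
temps = [("2020-01-01", 2.0), ("2020-01-01", 6.0), ("2020-01-02", -1.0)]
stats = daily_stats(temps)
```

The same helper could be applied to humidity, wind speed and air pressure; the monthly rainfall and snowfall frequency statistics would need a separate count per month.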
4. The method for predicting the mechanical damage of the national grid according to claim 3, wherein, when the missing numerical sub-information under each piece of historical information is filled in, if more than half of the numerical sub-information under a piece of historical information is missing, that historical information and its numerical sub-information are deleted; and if no more than half is missing, the missing values are filled in with the mean, the median, or the row/column mode of the numerical sub-information under that historical information.
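The rule in claim 4 can be sketched in Python as follows. The representation of a missing value as `None`, the function name `impute_column` and the `strategy` argument are assumptions made for this illustration.

```python
from statistics import mean, median, mode


def impute_column(values, strategy="mean"):
    """Fill missing entries (None) in one numeric column, or drop the column.

    Returns None when more than half of the entries are missing, which
    signals that the column should be deleted.
    """
    if not values:
        return []
    present = [v for v in values if v is not None]
    missing = len(values) - len(present)
    if missing * 2 > len(values):  # strictly more than half is missing
        return None
    fill = {"mean": mean, "median": median, "mode": mode}[strategy](present)
    return [fill if v is None else v for v in values]
```

For example, `impute_column([1.0, None, 3.0])` fills the gap with the mean of the observed values, while `impute_column([None, None, 1.0])` returns `None` because two of three entries are missing.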
5. The method for predicting the mechanical damage of the national grid according to claim 3, wherein, when the non-numerical sub-information is one-hot encoded, each non-numerical sub-information item within the same historical information is treated as a state value, the number of bits of each state value equals the number of non-numerical sub-information items in that historical information, and exactly one bit of each state value is 1 while the remaining bits are 0.
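A minimal Python sketch of the one-hot encoding described in claim 5 is given below. The function name `one_hot_encode`, the alphabetical ordering of the categories and the sample weather labels are assumptions for illustration.

```python
def one_hot_encode(values):
    """One-hot encode a list of categorical values.

    Returns the ordered category list and one bit vector per input value.
    Each vector has as many bits as there are distinct categories, with
    exactly one bit set to 1.
    """
    categories = sorted(set(values))
    index = {cat: i for i, cat in enumerate(categories)}
    encoded = []
    for v in values:
        bits = [0] * len(categories)
        bits[index[v]] = 1
        encoded.append(bits)
    return categories, encoded


# Hypothetical weather condition labels
cats, vectors = one_hot_encode(["rain", "sunny", "rain", "snow"])
```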
6. The method for predicting the mechanical damage of the national grid according to claim 1, wherein the resampling of the complete historical data by adopting the SMOTE + Tomek Links algorithm specifically comprises:
screening positive-class samples from the complete historical data, wherein a positive-class sample is data related to mechanical external damage in the complete historical data;
for the i-th positive-class sample $x_i$, using the K-nearest-neighbor algorithm to obtain the k positive-class samples nearest to $x_i$, where the distance between positive-class samples is the Euclidean distance in the n-dimensional feature space, and then randomly selecting one of these k nearest positive-class samples to generate a new sample:
$$x_{new} = x_i + \delta \, (\hat{x}_i - x_i)$$
wherein $x_{new}$ is the new sample, $\hat{x}_i$ is the sample randomly selected from the k positive-class samples nearest to $x_i$, and $\delta \in [0, 1]$ is a random number;
generating new positive-class samples with the SMOTE algorithm to obtain an expanded data set, and finding the samples that form Tomek Link pairs in the expanded data set, where a Tomek Link pair satisfies the following condition: in the sample set, samples $x_j$ and $x_k$ belong to different classes and the distance between them is $d(x_j, x_k)$; if no sample $x_l$ exists such that $d(x_l, x_j) < d(x_j, x_k)$ or $d(x_l, x_k) < d(x_j, x_k)$, then $x_j$ and $x_k$ form a Tomek Link pair;
all samples that make up the Tomek Link pair are deleted.
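The resampling steps of claim 6 can be sketched in plain Python as follows. This is a small illustrative version under stated assumptions, not the applicant's code: samples are tuples of floats, label 1 marks the positive class, at least two positive samples exist, and the names `smote`, `is_tomek_link`, `remove_tomek_links` and `resample` are hypothetical. The brute-force search is only suitable for small data sets.

```python
import math
import random


def smote(positives, k, per_sample, rng):
    """Create `per_sample` synthetic samples for every positive sample."""
    synthetic = []
    for i, x_i in enumerate(positives):
        # k nearest positive neighbours of x_i by Euclidean distance
        others = [p for j, p in enumerate(positives) if j != i]
        neighbours = sorted(others, key=lambda p: math.dist(p, x_i))[:k]
        for _ in range(per_sample):
            x_hat = rng.choice(neighbours)
            delta = rng.random()  # random number in [0, 1)
            # x_new = x_i + delta * (x_hat - x_i)
            synthetic.append(
                tuple(a + delta * (b - a) for a, b in zip(x_i, x_hat))
            )
    return synthetic


def is_tomek_link(samples, labels, j, k):
    """Check the Tomek Link condition for the pair (j, k)."""
    if labels[j] == labels[k]:
        return False
    d_jk = math.dist(samples[j], samples[k])
    return not any(
        math.dist(samples[m], samples[j]) < d_jk
        or math.dist(samples[m], samples[k]) < d_jk
        for m in range(len(samples))
        if m not in (j, k)
    )


def remove_tomek_links(samples, labels):
    """Delete both members of every Tomek Link pair."""
    n = len(samples)
    drop = set()
    for j in range(n):
        for k in range(j + 1, n):
            if is_tomek_link(samples, labels, j, k):
                drop.update((j, k))
    keep = [i for i in range(n) if i not in drop]
    return [samples[i] for i in keep], [labels[i] for i in keep]


def resample(samples, labels, k=5, per_sample=1, seed=0):
    """Oversample the positive class with SMOTE, then remove Tomek Links."""
    rng = random.Random(seed)
    positives = [s for s, y in zip(samples, labels) if y == 1]
    new = smote(positives, k, per_sample, rng)
    return remove_tomek_links(samples + new, labels + [1] * len(new))
```

In practice a maintained library implementation would normally be preferred over this hand-written version; the sketch only makes the two stages and their order explicit.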
7. The national grid mechanical damage prediction method of claim 1, wherein the training parameters of the Catboost model comprise the learning rate learning_rate, the maximum tree depth max_depth, the maximum number of decision trees iterations, the L2 regularization coefficient l2_leaf_reg, the loss function loss_function, the number of splits for numerical features border_count, and the number of splits for categorical features ctr_border_count.
8. The method for predicting the mechanical damage of the national grid according to claim 1, wherein an AUC value output by the Catboost model is used as a fitness value of the Catboost model.
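For reference, the AUC used as the fitness value can be computed from labels and predicted probabilities as in the following Python sketch. It uses the pairwise-comparison definition of AUC; the function name `auc` and the sample data are assumptions, and the claim does not say how the value is computed or which search procedure consumes it.

```python
def auc(labels, scores):
    """Area under the ROC curve via pairwise comparison of scores.

    Returns the fraction of (positive, negative) pairs in which the
    positive sample receives the higher score; ties count as one half.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative")
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p in pos
        for n in neg
    )
    return wins / (len(pos) * len(neg))


# Hypothetical labels and predicted at-risk probabilities
fitness = auc([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8])
```

A value of 0.5 corresponds to random ranking and 1.0 to perfect separation, so a higher AUC indicates a better parameter setting.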
9. The national power grid mechanical external damage prediction method according to claim 1, wherein the mechanical external damage risk prediction performed with the trained Catboost model comprises power tower pole prediction, power transmission line prediction and power transmission channel prediction;
the power transmission line of the national power grid in the region to be predicted is divided into first sections; in any first section, if the prediction result for any power tower pole is that a mechanical external damage risk exists, the mechanical external damage risk state of that first section is at risk, and the mechanical external damage risk probability of the first section is the maximum of the at-risk probabilities of all power tower poles in that first section;
in any first section, if the prediction results for all power tower poles are that no mechanical external damage risk exists, the mechanical external damage risk state of that first section is no risk;
the prediction for the power transmission line is the risk state that occurs most often among the mechanical external damage risk states of all first sections in the region to be predicted, and the risk probability of the power transmission line is the maximum of the mechanical external damage risk probabilities of all first sections in the region to be predicted;
the power transmission channel of the national power grid in the region to be predicted is divided into second sections; the prediction for the power transmission channel is the risk state that occurs most often among the mechanical external damage risk states of all second sections in the region to be predicted, and the risk probability of the power transmission channel is the maximum of the mechanical external damage risk probabilities of all second sections in the region to be predicted.
10. The national grid mechanical damage prediction method according to one of claims 1 to 9, wherein the collected current information is of the same category as the collected historical information; the steps of sorting the current information to form complete current data are the same as the steps of sorting the historical information to form complete historical data.
CN202010041704.9A 2020-01-15 2020-01-15 National power grid mechanical external damage prediction method Pending CN111310785A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010041704.9A CN111310785A (en) 2020-01-15 2020-01-15 National power grid mechanical external damage prediction method

Publications (1)

Publication Number Publication Date
CN111310785A true CN111310785A (en) 2020-06-19

Family

ID=71148745

Country Status (1)

Country Link
CN (1) CN111310785A (en)

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication

Application publication date: 20200619
