CN110690701A - Analysis method for influence factors of abnormal line loss - Google Patents
- Publication number
- CN110690701A (application CN201910983595.XA)
- Authority
- CN
- China
- Prior art keywords
- data
- line loss
- value
- wire
- analysis
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
-
- H—ELECTRICITY
- H02—GENERATION; CONVERSION OR DISTRIBUTION OF ELECTRIC POWER
- H02J—CIRCUIT ARRANGEMENTS OR SYSTEMS FOR SUPPLYING OR DISTRIBUTING ELECTRIC POWER; SYSTEMS FOR STORING ELECTRIC ENERGY
- H02J3/00—Circuit arrangements for ac mains or ac distribution networks
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/20—Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
- G06F16/24—Querying
- G06F16/245—Query processing
- G06F16/2458—Special types of queries, e.g. statistical queries, fuzzy queries or distributed queries
- G06F16/2462—Approximate or statistical queries
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06Q—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
- G06Q50/00—Systems or methods specially adapted for specific business sectors, e.g. utilities or tourism
- G06Q50/06—Electricity, gas or water supply
-
- Y—GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
- Y02—TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
- Y02E—REDUCTION OF GREENHOUSE GAS [GHG] EMISSIONS, RELATED TO ENERGY GENERATION, TRANSMISSION OR DISTRIBUTION
- Y02E40/00—Technologies for an efficient electrical power generation, transmission or distribution
- Y02E40/70—Smart grids as climate change mitigation technology in the energy generation sector
-
- Y—GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
- Y04—INFORMATION OR COMMUNICATION TECHNOLOGIES HAVING AN IMPACT ON OTHER TECHNOLOGY AREAS
- Y04S—SYSTEMS INTEGRATING TECHNOLOGIES RELATED TO POWER NETWORK OPERATION, COMMUNICATION OR INFORMATION TECHNOLOGIES FOR IMPROVING THE ELECTRICAL POWER GENERATION, TRANSMISSION, DISTRIBUTION, MANAGEMENT OR USAGE, i.e. SMART GRIDS
- Y04S10/00—Systems supporting electrical power generation, transmission or distribution
- Y04S10/50—Systems or methods supporting the power network operation or management, involving a certain degree of interaction with the load-side end user applications
Abstract
The invention discloses a method for analyzing the influence factors of abnormal line loss, comprising the following steps: S1, selecting a technical route; S2, preparing data; and S3, mining and analyzing the data. On the one hand, the operation and inspection manager of a pilot application unit can set a reasonable threshold based on the current line loss prediction result and carry out an inspection when the line loss exceeds the acceptable range. On the other hand, the model results can be used to analyze how strongly a change in each factor affects the line loss, so that corresponding loss reduction measures can be formulated to reduce the transmission loss of the power grid, in particular the variable loss. In the feature screening process, an algorithm combining feature correlation and feature importance is adopted, which makes the selection of line loss influence factors more reasonable. In addition, because the random forest algorithm randomizes both the training samples and the features, it is suited to high-dimensional data and can give more accurate predictions.
Description
Technical Field
The invention relates to the technical field of electric power, in particular to an analysis method for influence factors of abnormal line loss.
Background
Line loss is the electrical energy lost during transmission in the power grid due to resistance effects, magnetic field effects and management factors. It comprehensively reflects the planning and design of the power grid, the quality of its operating state, and the level of its management and operation. Improper management and technical causes make the line loss rate abnormal and waste energy, so line loss management must be carried out to raise both the level of line loss management and the degree of refinement of power grid operation management. Analyzing abnormal line loss occupies a very important position in this work. Such analysis can classify abnormal line loss effectively, give a deep understanding of the causes, nature and component proportions of a unit's line loss, identify the main factors that affect the loss, and allow targeted measures to be taken, so that a larger loss reduction effect and economic benefit are obtained with less investment.
In the traditional line loss management method, the distribution network is divided into sub-areas, component groups and distribution areas through cross-disciplinary cooperation among multiple systems and the fusion of their data resources; each part is monitored online, and candidate factors are then eliminated one by one. Because this elimination is carried out factor by factor, the difficulty of power grid management increases day by day, and line loss remediation has become a problem that urgently needs to be solved.
Disclosure of Invention
The invention aims to overcome the deficiencies of the prior art by providing a method for analyzing the influence factors of abnormal line loss. Based on line loss prediction and influence factor analysis, the method supports technical loss reduction on lines through measures such as adjusting the power grid operation mode, controlling reactive power and adjusting voltage, provides decision support for technical loss reduction, and helps the power grid operate efficiently and economically.
In order to achieve the purpose, the invention adopts the following technical scheme:
an analysis method for influence factors of abnormal line loss comprises the following steps:
S1, selecting a technical route: the technical route comprises a data storage part, a data calculation and processing part, and a data mining and analysis part;
the data storage part takes into account the project's data collection and storage requirements: an Oracle database is used for storage at the early stage, and Hive (the data warehouse of the big data platform) is used at the later stage, depending on the data volume;
the data calculation and processing part calculates, processes and trains on the data using PyCharm, Anaconda and other related integrated tool environments;
the data mining and analysis part proceeds as follows: first, data preparation and data processing are carried out;
second, patterns in the data are explored through data distribution, trend analysis, feature correlation analysis and similar means, providing a basis for subsequent modeling;
third, line loss influence factors are analyzed based on feature correlation and feature importance;
fourth, a line loss prediction model is constructed based on a random forest regression algorithm, and the results are finally integrated into an online application;
S2, preparing data;
data preprocessing: wire cross-sectional area data that is empty or inconsistent with the wire model is corrected by first obtaining the wire model of each affected record, then deriving the cross-sectional area that corresponds to that model, and finally filling the missing or abnormal cross-sectional area values according to this correspondence; tower span data that is empty or 0 is filled with the median; tower altitude data that is empty is filled with the mean; insulator core rod diameter data that is empty or 0 is filled or replaced with the core rod diameter of the corresponding manufacturer and model; missing voltage and current values are filled using a smoothing method so that a complete 96-point electricity consumption curve is fitted; a value that exceeds the normal reasonable range and shows a sudden rise or fall is treated as a pulse value, which is cleared and then filled in the same way; fields in the wide data table that are irrelevant to line loss analysis, such as component number, component name, line primary key and date, are deleted;
normality test: a normality test is applied to the line loss wide table after the irrelevant fields such as component number have been deleted; the null hypothesis is that the sample follows a normal distribution, and the decision threshold is 0.001; if the test result (the p-value) is less than 0.001, the null hypothesis is rejected, that is, the data do not follow a normal distribution; the test results of all fields in the wide table are below the threshold, so no field follows a normal distribution, and methods that assume normality cannot be used when calculating feature correlation;
S3, data mining analysis: four types of parameter data (weather, equipment, operation and power outage) are integrated into a wide table built around the line loss influence factor business; the line loss influence factors are analyzed from two aspects, feature correlation and feature importance; the data items that rank highest by feature correlation and by feature importance are selected respectively, and the final line loss influence factors are determined from them.
Preferably, in step S2, the data sources are determined from the analysis of the business requirements: the internal power data used for the analysis are obtained from power information systems such as PMS2.0, marketing, synchronous line loss, and electricity consumption information acquisition; the data period runs from 2018 to June 2019, and the total number of data records is about 1.52 million.
Preferably, step S2 further includes: in terms of reliability, the line equipment data come from the PMS2.0 system and the operating parameter data come from the electricity consumption information acquisition system; although some abnormal acquisition data exist, they do not affect subsequent data mining; in terms of completeness, the line equipment data have the following problems, which can be remedied by subsequent data processing: wire cross-sectional area data that is empty or inconsistent with the wire model, tower span data that is empty or 0, tower altitude that is empty, and core rod diameter that is empty or 0.
The invention has the following beneficial effects:
1. On the one hand, the operation and inspection manager of a pilot application unit can set a reasonable threshold based on the current line loss prediction result and carry out an inspection when the line loss exceeds the acceptable range; on the other hand, the model results can be used to analyze how strongly a change in each factor affects the line loss and to formulate corresponding loss reduction measures, thereby reducing the transmission loss of the power grid, in particular the variable loss.
2. In the feature screening process, an algorithm combining feature correlation and feature importance is adopted, which makes the selection of line loss influence factors more reasonable. In addition, because the random forest algorithm randomizes both the training samples and the features, it is suited to high-dimensional data and can give more accurate predictions.
Detailed Description
In order that the above objects, features and advantages of the present invention may be readily understood, numerous specific details are set forth in the following description to provide a thorough understanding of the invention. The invention may, however, be embodied in many different forms and should not be construed as limited to the embodiments set forth herein; those skilled in the art can make modifications in various respects without departing from the spirit and scope of the invention.
Unless defined otherwise, all technical and scientific terms used herein have the same meaning as commonly understood by one of ordinary skill in the art to which this invention belongs. The terminology used in the description of the invention herein is for the purpose of describing particular embodiments only and is not intended to be limiting of the invention. As used herein, the term "and/or" includes any and all combinations of one or more of the associated listed items.
An analysis method for influence factors of abnormal line loss comprises the following steps:
S1, selecting a technical route: the technical route comprises a data storage part, a data calculation and processing part, and a data mining and analysis part;
the data storage part takes into account the project's data collection and storage requirements: an Oracle database is used for storage at the early stage, and Hive (the data warehouse of the big data platform) is used at the later stage, depending on the data volume;
the data calculation and processing part calculates, processes and trains on the data using PyCharm, Anaconda and other related integrated tool environments;
the data mining and analysis part proceeds as follows: first, data preparation and data processing are carried out;
second, patterns in the data are explored through data distribution, trend analysis, feature correlation analysis and similar means, providing a basis for subsequent modeling;
third, line loss influence factors are analyzed based on feature correlation and feature importance;
fourth, a line loss prediction model is constructed based on a random forest regression algorithm, and the results are finally integrated into an online application;
S2, preparing data;
data preprocessing: wire cross-sectional area data that is empty or inconsistent with the wire model is corrected by first obtaining the wire model of each affected record, then deriving the cross-sectional area that corresponds to that model, and finally filling the missing or abnormal cross-sectional area values according to this correspondence; tower span data that is empty or 0 is filled with the median; tower altitude data that is empty is filled with the mean; insulator core rod diameter data that is empty or 0 is filled or replaced with the core rod diameter of the corresponding manufacturer and model; missing voltage and current values are filled using a smoothing method so that a complete 96-point electricity consumption curve is fitted; a value that exceeds the normal reasonable range and shows a sudden rise or fall is treated as a pulse value, which is cleared and then filled in the same way; fields in the wide data table that are irrelevant to line loss analysis, such as component number, component name, line primary key and date, are deleted;
normality test: a normality test is applied to the line loss wide table after the irrelevant fields such as component number have been deleted; the null hypothesis is that the sample follows a normal distribution, and the decision threshold is 0.001; if the test result (the p-value) is less than 0.001, the null hypothesis is rejected, that is, the data do not follow a normal distribution; the test results of all fields in the wide table are below the threshold, so no field follows a normal distribution, and methods that assume normality cannot be used when calculating feature correlation;
in step S2, the data sources are determined from the analysis of the business requirements: the internal power data used for the analysis are obtained from power information systems such as PMS2.0, marketing, synchronous line loss, and electricity consumption information acquisition; the data period runs from 2018 to June 2019, and the total number of data records is about 1.52 million.
Step S2 further includes: in terms of reliability, the line equipment data come from the PMS2.0 system and the operating parameter data come from the electricity consumption information acquisition system; although some abnormal acquisition data exist, they do not affect subsequent data mining; in terms of completeness, the line equipment data have the following problems, which can be remedied by subsequent data processing: wire cross-sectional area data that is empty or inconsistent with the wire model, tower span data that is empty or 0, tower altitude that is empty, and core rod diameter that is empty or 0.
S3, data mining analysis: four types of parameter data (weather, equipment, operation and power outage) are integrated into a wide table built around the line loss influence factor business; the line loss influence factors are analyzed from two aspects, feature correlation and feature importance; the data items that rank highest by feature correlation and by feature importance are selected respectively, and the final line loss influence factors are determined from them.
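The preprocessing rules of step S2 can be sketched in pandas as follows. This is a minimal illustration under stated assumptions, not the patented implementation: the column names (`wire_model`, `wire_area`, `tower_span`, `tower_altitude`), the lookup table `area_by_model` and both function names are hypothetical, the pulse check is simplified to a plain range check, and linear interpolation stands in for the unspecified smoothing method.

```python
import numpy as np
import pandas as pd


def fill_equipment_fields(df: pd.DataFrame, area_by_model: dict) -> pd.DataFrame:
    """Apply the equipment fill rules of step S2 (hypothetical column names)."""
    out = df.copy()

    # Wire cross-section: when empty or inconsistent with the wire model,
    # take the area that corresponds to the model.
    expected = out["wire_model"].map(area_by_model)
    needs_fix = (out["wire_area"].isna() | (out["wire_area"] != expected)) & expected.notna()
    out["wire_area"] = out["wire_area"].where(~needs_fix, expected)

    # Tower span: treat 0 as missing, then fill with the median of valid values.
    span = out["tower_span"].replace(0, np.nan)
    out["tower_span"] = span.fillna(span.median())

    # Tower altitude: fill missing values with the mean.
    out["tower_altitude"] = out["tower_altitude"].fillna(out["tower_altitude"].mean())
    return out


def repair_curve(values, low: float, high: float) -> pd.Series:
    """Clear out-of-range points of a load curve and refill all gaps.

    Assumption: linear interpolation replaces the unspecified smoothing method.
    """
    s = pd.Series(values, dtype="float64")
    s = s.where(s.between(low, high))  # out-of-range and missing points become NaN
    return s.interpolate(limit_direction="both")
```

In practice `repair_curve` would be applied to each day's 96-point voltage or current series, with `low` and `high` chosen from the normal operating range of the line.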
Supplementary explanation:
feature correlation analysis
Line loss classification
The data in the wide data table are classified by line loss. The classification criteria and category names are as follows:
Table 1 Line loss classification
Classification criterion | Category name |
Line loss < -0.8% | Negative loss |
-0.8% <= line loss < 0 | Normal 0 |
0 <= line loss <= 3% | Normal 1 |
Line loss > 3% | High loss |
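The classification in Table 1 can be expressed as a small function. This is a sketch only: the function name is hypothetical, and the assignment of the boundary values -0.8%, 0 and 3% to a category is an assumption, because the comparison operators in the extracted table are partly garbled.

```python
def classify_line_loss(loss_pct: float) -> str:
    """Map a line loss rate (in percent) to a Table 1 category name.

    Assumed boundaries: [-0.8, 0) is "Normal 0" and [0, 3] is "Normal 1".
    """
    if loss_pct < -0.8:
        return "Negative loss"
    if loss_pct < 0:
        return "Normal 0"
    if loss_pct <= 3:
        return "Normal 1"
    return "High loss"


# Example: label a few line loss rates (percent).
labels = [classify_line_loss(v) for v in (-1.2, -0.5, 1.73, 4.0)]
```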
Computing feature correlations
Data exploration shows that the data do not follow a normal distribution, so the Spearman correlation algorithm is used to calculate the feature correlation between fields.
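The link between the normality check and the choice of Spearman correlation can be illustrated as below. The data are synthetic, the field names are hypothetical, and `scipy.stats.normaltest` (the D'Agostino-Pearson test) is an assumption, since the text does not name the normality test that was used; only the 0.001 threshold comes from the description.

```python
import numpy as np
import pandas as pd
from scipy import stats

# Synthetic, deliberately skewed stand-ins for wide-table fields (hypothetical names).
rng = np.random.default_rng(0)
n = 500
load_rate = rng.exponential(scale=20.0, size=n)
df = pd.DataFrame({
    "load_rate": load_rate,
    "input_energy": load_rate ** 2 + rng.normal(0.0, 1.0, size=n),
    "line_length": rng.exponential(scale=10.0, size=n),
})

ALPHA = 0.001  # decision threshold from the description

# Fields whose p-value is below the threshold are treated as non-normal.
non_normal = [col for col in df.columns if stats.normaltest(df[col]).pvalue < ALPHA]

# Spearman works on ranks, so it does not assume normally distributed fields.
spearman = df.corr(method="spearman")
```

Because Spearman correlation is rank-based, it also captures the monotonic but nonlinear relation between the two synthetic fields `load_rate` and `input_energy`.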
Feature importance analysis
Deleting strongly correlated fields
Based on the feature correlation results for each category, the fields whose feature correlation exceeds 0.9 are identified, and only one field from each group of strongly correlated fields is retained, chosen according to business requirements.
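One possible way to apply the 0.9 rule is sketched below. The function name is hypothetical, and two simplifications are assumptions: the absolute value of the correlation is compared with the threshold, and the field that appears first in column order is the one retained, whereas the text leaves that choice to business requirements.

```python
import pandas as pd


def drop_strongly_correlated(df: pd.DataFrame, threshold: float = 0.9) -> pd.DataFrame:
    """Keep one field from each group whose |Spearman correlation| exceeds threshold."""
    corr = df.corr(method="spearman").abs()
    kept = []
    for col in df.columns:
        # Retain the field only if it is not strongly correlated with any kept field.
        if all(corr.loc[col, other] <= threshold for other in kept):
            kept.append(col)
    return df[kept]


# Tiny example: "b" is a rescaled copy of "a", so only one of them survives.
example = pd.DataFrame({
    "a": [1, 2, 3, 4, 5, 6, 7, 8, 9, 10],
    "b": [2, 4, 6, 8, 10, 12, 14, 16, 18, 20],
    "c": [3, 1, 4, 1, 5, 9, 2, 6, 5, 3],
})
reduced = drop_strongly_correlated(example)
```

Reordering the columns before the call is a simple way to express a business preference for which field of a correlated group to keep.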
Algorithm selection
Currently popular methods for calculating feature importance mainly include decision tree, random forest and XGBoost. The decision tree algorithm is prone to overfitting and to falling into a local optimum; the other two algorithms mitigate these defects by building multiple decision trees. Comparing random forest with XGBoost, the random forest algorithm randomly samples not only the training samples but also the features, so its resistance to overfitting is stronger. From a business standpoint, some features in the existing samples may be unrelated to line loss; compared with XGBoost, which trains on all features, the random feature selection of random forest may give better results. Random forest is therefore selected as the algorithm for calculating feature importance.
Computing feature importance
The feature importance is calculated with a random forest algorithm, after the necessary parameter tuning, on the data from which the strongly correlated fields have been deleted.
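A minimal sketch of ranking fields by random forest feature importance with scikit-learn follows. The data are synthetic and the field names are placeholders borrowed from Table 2 for readability; the impurity-based `feature_importances_` attribute is assumed to be the importance measure, which the text does not specify.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor

# Synthetic data: the target depends strongly on the first field,
# weakly on the second, and not at all on the last two.
rng = np.random.default_rng(0)
feature_names = ["input_energy", "load_rate", "line_length", "wind_level"]
X = rng.normal(size=(300, 4))
y = 3.0 * X[:, 0] + 0.5 * X[:, 1] + rng.normal(scale=0.1, size=300)

model = RandomForestRegressor(n_estimators=100, random_state=0)
model.fit(X, y)

# Impurity-based importances sum to 1; sort fields from most to least important.
ranked = sorted(
    zip(feature_names, model.feature_importances_),
    key=lambda pair: pair[1],
    reverse=True,
)
```

Fields at the bottom of `ranked` are the candidates for deletion in the later feature selection step.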
Feature merging
The fields screened by feature correlation and the fields screened by feature importance are merged. Strongly correlated fields are then deleted from the merged set to obtain the feature merging result.
Feature selection
The feature importance is calculated again on the feature merging result; the fields with lower feature importance are deleted, and the remaining fields are taken as the line loss influence factor analysis result for the line.
Table 2 Line loss influence factor analysis results
Category name | Fields |
Overall | Input electric quantity, voltage difference between two sides, average reactive power, load rate, average voltage, line length, line running time |
Negative loss | Input electric quantity, voltage difference between two sides, average reactive power, load rate, average voltage, daytime wind power level, wire average operation time |
Normal 0 | Input electric quantity, voltage difference between two sides, average reactive power, load rate, wire average operation time |
Normal 1 | Input electric quantity, voltage difference between two sides, average reactive power, load rate, average voltage, line length, line running time |
High loss | Input electric quantity, voltage difference between two sides, average reactive power, average voltage, line running time, wire average operation time |
Line loss prediction model based on random forest
Using the data that correspond to the line loss influence factors of each category, the model is built with the random forest algorithm, which was selected by cross-validation from three algorithms: decision tree, random forest and XGBoost. The necessary parameters of the algorithm are tuned by grid search, finally yielding the line loss prediction models of each category.
Sample selection
Because the category of a future line loss value cannot be known in advance, a category-specific prediction model cannot be applied; only a model trained on all the data can be used. The overall data set is therefore used as the sample for the line loss prediction model of the line.
Algorithm selection
Predicting the line loss value is a regression problem. Commonly used regression algorithms include linear regression, SVR, decision tree, random forest and XGBoost. The errors obtained by cross-validation on the overall data set with default parameters, together with the advantages and disadvantages of each algorithm, are as follows:
Table 3 Algorithm selection
Algorithm | Advantages | Disadvantages | Error |
Decision tree | Simple and intuitive; needs essentially no data preprocessing | Prone to overfitting and to local optima | 3.0870 |
Random forest | Handles large, high-dimensional data | Prone to overfitting on very noisy data | 0.2947 |
XGBoost | Strong resistance to overfitting; supports parallelization | Large memory footprint | 0.4994 |
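A comparison of this kind can be sketched with scikit-learn's `cross_val_score`. The example is an illustration on synthetic data, not a reproduction of the reported figures: only the decision tree and random forest are compared (XGBoost is omitted to keep the example within scikit-learn), and mean squared error with 5 folds is an assumption because the text does not state the error metric or fold count.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import cross_val_score
from sklearn.tree import DecisionTreeRegressor

# Synthetic noisy regression data standing in for the overall data set.
rng = np.random.default_rng(0)
X = rng.normal(size=(300, 4))
y = 3.0 * X[:, 0] + 0.5 * X[:, 1] + rng.normal(scale=0.5, size=300)

models = {
    "Decision tree": DecisionTreeRegressor(random_state=0),
    "Random forest": RandomForestRegressor(random_state=0),
}

# scikit-learn reports negated MSE, so flip the sign to get an error value.
cv_error = {
    name: float(-cross_val_score(model, X, y, cv=5,
                                 scoring="neg_mean_squared_error").mean())
    for name, model in models.items()
}
```

The algorithm with the lowest cross-validated error in `cv_error` would be the one carried forward to parameter tuning.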
Parameter training
The n_estimators and min_weight_fraction_leaf parameters are tuned by grid search; default values are used for the other parameters.
After tuning, n_estimators is 100 and min_weight_fraction_leaf is 0; the training set error is 0.2080 and the test set error is 0.2600.
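The grid search over these two parameters might look like the following with scikit-learn's `GridSearchCV`. The candidate values in `param_grid`, the 3-fold setting, the mean squared error scoring and the synthetic data are all assumptions made for illustration; the text gives only the parameter names and the selected values.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import GridSearchCV, train_test_split

# Synthetic data standing in for the line loss wide table.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 4))
y = 3.0 * X[:, 0] + 0.5 * X[:, 1] + rng.normal(scale=0.1, size=200)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0
)

# Only the two parameters named in the text are searched; the rest stay at defaults.
param_grid = {
    "n_estimators": [50, 100],
    "min_weight_fraction_leaf": [0.0, 0.05, 0.1],
}
search = GridSearchCV(
    RandomForestRegressor(random_state=0),
    param_grid,
    scoring="neg_mean_squared_error",
    cv=3,
)
search.fit(X_train, y_train)

# Errors of the refitted best model on the training and held-out test sets.
train_mse = float(np.mean((search.predict(X_train) - y_train) ** 2))
test_mse = float(np.mean((search.predict(X_test) - y_test) ** 2))
```

After fitting, `search.best_params_` holds the selected combination, and comparing `train_mse` with `test_mse` gives a rough check for overfitting.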
Results of the model
The actual prediction results of the model are as follows:
Table 4 Prediction results
Input electric quantity | Voltage difference between two sides | Average reactive power | Load rate | Average voltage | Line length | Line running time | Line loss | Predicted line loss |
14000 | -13.62 | 0.34 | 15.25 | 34686.39 | 20.07 | 35.23 | -0.50 | -0.28 |
29960 | -4.04 | 1.14 | 11.36 | 35407.54 | 6.66 | 11.89 | 0.00 | 0.08 |
108360 | 228.13 | 5.19 | 27.36 | 36451.37 | 4.48 | 76.85 | 0.39 | 0.47 |
38220 | 128.46 | 1.21 | 40.35 | 39311.33 | 8.86 | 9.01 | 0.00 | 0.14 |
113120 | 749.14 | 3.76 | 45.32 | 39666.85 | 20.61 | 7.83 | 1.73 | 1.36 |
21560 | 0.00 | 0.61 | 16.44 | 36065.54 | 28.20 | 1.70 | 1.30 | 1.04 |
201600 | 253.17 | 4.11 | 121.40 | 36174.69 | 3.09 | 16.40 | 0.35 | 0.42 |
12320 | -299.87 | 0.43 | 18.30 | 36858.28 | 20.00 | 28.56 | -0.57 | -0.27 |
135520 | 97.34 | 3.67 | 58.91 | 38040.62 | 2.58 | 18.81 | 0.21 | 0.26 |
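The gap between the actual and predicted line loss in the sample rows above can be summarized with a simple error measure. The sketch below uses the mean absolute error, which is an assumption: the text does not say which metric underlies the reported training and test errors, so the result is not directly comparable with them.

```python
# Last two columns of the nine sample rows in the prediction results table.
actual = [-0.50, 0.00, 0.39, 0.00, 1.73, 1.30, 0.35, -0.57, 0.21]
predicted = [-0.28, 0.08, 0.47, 0.14, 1.36, 1.04, 0.42, -0.27, 0.26]

# Absolute prediction error per row, and its mean over the sample.
abs_errors = [abs(a - p) for a, p in zip(actual, predicted)]
mae = sum(abs_errors) / len(abs_errors)

# Zero-based index of the row with the largest absolute error.
worst_row = max(range(len(abs_errors)), key=abs_errors.__getitem__)
```

Nine rows are too few to characterize the model, so this serves only as a sanity check on the displayed sample.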
On the one hand, the operation and inspection manager of a pilot application unit can set a reasonable threshold based on the current line loss prediction result and carry out an inspection when the line loss exceeds the acceptable range; on the other hand, the model results can be used to analyze how strongly a change in each factor affects the line loss and to formulate corresponding loss reduction measures, thereby reducing the transmission loss of the power grid, in particular the variable loss. In the feature screening process, an algorithm combining feature correlation and feature importance is adopted, which makes the selection of line loss influence factors more reasonable. In addition, because the random forest algorithm randomizes both the training samples and the features, it is suited to high-dimensional data and can give more accurate predictions.
The above is only a preferred embodiment of the present invention, but the scope of protection of the present invention is not limited thereto. Any equivalent substitution or change made by a person skilled in the art, within the technical scope disclosed by the present invention and according to its technical solutions and inventive concept, shall fall within the scope of protection of the present invention.
Claims (3)
1. An analysis method for influence factors of abnormal line loss is characterized by comprising the following steps:
S1, selecting a technical route: the technical route comprises a data storage part, a data calculation and processing part, and a data mining and analysis part;
the data storage part takes into account the project's data collection and storage requirements: an Oracle database is used for storage at the early stage, and Hive of the big data platform is used at the later stage, depending on the data volume;
the data calculation and processing part calculates, processes and trains on the data using PyCharm, Anaconda and other related integrated tool environments;
the data mining and analysis part proceeds as follows: first, data preparation and data processing are carried out;
second, patterns in the data are explored through data distribution, trend analysis, feature correlation analysis and similar means, providing a basis for subsequent modeling;
third, line loss influence factors are analyzed based on feature correlation and feature importance;
fourth, a line loss prediction model is constructed based on a random forest regression algorithm, and the results are finally integrated into an online application;
s2, preparing data;
preprocessing data; the data of the section area of the wire is empty or modified according to the data which does not conform to the type of the wire: firstly, obtaining a model corresponding to data that the wire sectional area data is empty or does not conform to the wire model, then processing the model data to obtain the wire sectional area corresponding to the model, and finally filling a missing value or an abnormal value of the wire sectional area according to the corresponding relation: modifying the tower span data to be null or 0 value data: filling the tower span missing value by using the median; and (3) modifying the tower altitude for null data: filling with an average value: processing the data of the diameter of the insulator core rod to be empty or 0 value: filling or replacing according to the diameter of the core rod of the corresponding manufacturer and model; filling missing values of voltage and current: performing filling-up on the voltage or current data with the deficiency by adopting a smoothing method so as to fit and form a complete 96-point power utilization curve; if the numerical value exceeds the normal reasonable range and sudden increase and sudden decrease exist, the numerical value is regarded as a pulse value, and the pulse value is emptied and then is filled; deleting fields, such as component numbers, component names, line main keys, dates and the like, which are irrelevant to line loss analysis in the data width table;
normality test: a normality test is applied to the data in the line loss wide table from which irrelevant fields such as component number have been deleted; the null hypothesis is that the tested sample follows a normal distribution, and the decision threshold is 0.001; if the test result is less than 0.001, the null hypothesis is rejected, that is, the data do not follow a normal distribution; the test results of all fields in the wide table are below the threshold, so no field follows a normal distribution, and calculation methods that assume a normal distribution cannot be used when calculating feature correlation;
S3, data mining analysis: based on four types of parameter data, namely weather, equipment, operation, and power outage, wide-table data for the line loss influence factor service are generated by integration; line loss influence factors are analyzed from two aspects, feature correlation and feature importance; the data items ranking highest in feature correlation and in feature importance are selected respectively, and the final line loss influence factors are thereby determined.
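The normality check of step S2 and the selection in step S3 can be sketched as follows. This is only an illustrative sketch under several assumptions: the claim does not name the normality test, so a Jarque-Bera test with its asymptotic p-value is assumed here; because no field passes the test, a rank-based (Spearman) coefficient is assumed for feature correlation; and the final factors are assumed to be the fields that appear in the top `k` of both rankings. All function names are hypothetical, and the importance scores are taken as given (for example, from a random forest).

```python
import math


def jarque_bera_pvalue(sample):
    """Asymptotic p-value of the Jarque-Bera normality test (assumed test)."""
    n = len(sample)
    mean = sum(sample) / n
    m2 = sum((x - mean) ** 2 for x in sample) / n
    if m2 == 0:
        raise ValueError("constant sample: normality test is undefined")
    m3 = sum((x - mean) ** 3 for x in sample) / n
    m4 = sum((x - mean) ** 4 for x in sample) / n
    skewness = m3 / m2 ** 1.5
    excess_kurtosis = m4 / m2 ** 2 - 3.0
    jb = n / 6.0 * (skewness ** 2 + excess_kurtosis ** 2 / 4.0)
    # Survival function of a chi-square distribution with 2 degrees of freedom.
    return math.exp(-jb / 2.0)


def rejects_normality(sample, threshold=0.001):
    """Reject the null hypothesis of normality when the p-value is below 0.001."""
    return jarque_bera_pvalue(sample) < threshold


def average_ranks(values):
    """Rank values from 1..n, giving tied values the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2.0 + 1.0
        i = j + 1
    return ranks


def spearman(x, y):
    """Spearman rank correlation: Pearson correlation of the average ranks."""
    rx, ry = average_ranks(x), average_ranks(y)
    n = len(rx)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sx = math.sqrt(sum((a - mx) ** 2 for a in rx))
    sy = math.sqrt(sum((b - my) ** 2 for b in ry))
    if sx == 0 or sy == 0:
        return 0.0  # a constant field carries no rank information
    return cov / (sx * sy)


def top_k(scores, k):
    """Names of the k entries with the largest absolute score."""
    ranked = sorted(scores.items(), key=lambda kv: abs(kv[1]), reverse=True)
    return [name for name, _ in ranked[:k]]


def select_factors(correlations, importances, k):
    """Fields ranked in the top k by both correlation and importance (assumed rule)."""
    important = set(top_k(importances, k))
    return [name for name in top_k(correlations, k) if name in important]
```

Taking the intersection of the two rankings is a conservative choice; taking the union instead would keep any field that ranks highly on either criterion.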
2. The method for analyzing abnormal line loss influence factors according to claim 1, wherein in step S2, based on the analysis and understanding of the business requirements, the data are obtained from power information systems such as PMS2.0, the marketing system, the same-period line loss system, and the electricity consumption information acquisition system; the data cover the period from 2018 to June 2019, and the total number of data records is about 1.52 million.
3. The method for analyzing abnormal line loss influence factors according to claim 1, wherein step S2 further includes the following: in terms of reliability, line equipment data come from the PMS2.0 system and operating parameter data come from the electricity consumption information acquisition system; although some abnormally acquired data exist, subsequent data mining is not affected; in terms of completeness, the line equipment data have the following problems: wire cross-sectional area data that are empty or inconsistent with the wire model, tower span data that are empty or 0, tower altitude data that are empty, and core rod diameter data that are empty or 0; these problems can be remedied through subsequent data processing.
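The fill rules of step S2 that address these completeness problems can be sketched as follows. This is a minimal sketch under stated assumptions: the function names are hypothetical, missing values are represented as `None`, and linear interpolation stands in for the unspecified "smoothing method" used on the 96-point curve. Pulse values are detected here only by a range check, which simplifies the claim's "out of range with sudden increase or decrease" condition.

```python
from statistics import mean, median


def fill_missing(values, strategy="median", treat_zero_as_missing=False):
    """Fill None (and optionally 0) entries with the median or mean of valid ones."""
    def is_missing(v):
        return v is None or (treat_zero_as_missing and v == 0)

    valid = [v for v in values if not is_missing(v)]
    if not valid:
        return list(values)  # nothing to base a fill value on
    fill = median(valid) if strategy == "median" else mean(valid)
    return [fill if is_missing(v) else v for v in values]


def repair_curve(points, low, high):
    """Clear out-of-range readings, then fill all gaps by linear interpolation."""
    cleaned = [p if p is not None and low <= p <= high else None for p in points]
    known = [i for i, p in enumerate(cleaned) if p is not None]
    if not known:
        return cleaned
    repaired = list(cleaned)
    for i, p in enumerate(cleaned):
        if p is not None:
            continue
        left = max((k for k in known if k < i), default=None)
        right = min((k for k in known if k > i), default=None)
        if left is None:
            repaired[i] = cleaned[right]   # gap at the start: copy next reading
        elif right is None:
            repaired[i] = cleaned[left]    # gap at the end: copy previous reading
        else:
            weight = (i - left) / (right - left)
            repaired[i] = cleaned[left] + weight * (cleaned[right] - cleaned[left])
    return repaired


# Tower span: empty or 0 values are filled with the median.
spans = fill_missing([100, None, 0, 300], "median", treat_zero_as_missing=True)
# Tower altitude: empty values are filled with the mean.
altitudes = fill_missing([10, None, 20], "mean")
# Voltage curve: one missing reading and one pulse value are repaired.
voltages = repair_curve([220.0, None, 224.0, 9999.0, 228.0], low=180.0, high=260.0)
```

The same `repair_curve` call applies unchanged to a full 96-point daily curve (one reading every 15 minutes); the bounds `low` and `high` are illustrative and would depend on the voltage level of the line.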
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201910983595.XA CN110690701A (en) | 2019-10-16 | 2019-10-16 | Analysis method for influence factors of abnormal line loss |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201910983595.XA CN110690701A (en) | 2019-10-16 | 2019-10-16 | Analysis method for influence factors of abnormal line loss |
Publications (1)
Publication Number | Publication Date |
---|---|
CN110690701A (en) | 2020-01-14 |
Family
ID=69113093
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201910983595.XA Pending CN110690701A (en) | 2019-10-16 | 2019-10-16 | Analysis method for influence factors of abnormal line loss |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN110690701A (en) |
Cited By (9)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN111553816A (en) * | 2020-04-20 | 2020-08-18 | 北京北大软件工程股份有限公司 | Method and device for analyzing administrative review influence factors |
CN111553816B (en) * | 2020-04-20 | 2023-11-03 | 北京北大软件工程股份有限公司 | Administrative multiple-proposal influence factor analysis method and device |
CN112232892A (en) * | 2020-12-14 | 2021-01-15 | 南京华苏科技有限公司 | Method for mining accessible users based on satisfaction of mobile operators |
CN112862243A (en) * | 2020-12-31 | 2021-05-28 | 易事特集团股份有限公司 | Power distribution network energy-saving loss-reducing system and method based on big data |
CN112862243B (en) * | 2020-12-31 | 2023-10-17 | 易事特集团股份有限公司 | Big data-based power distribution network energy-saving and loss-reducing system and method |
CN113536666A (en) * | 2021-06-21 | 2021-10-22 | 南昌航空大学 | Automatic analysis method for key influence factors of yield in glass insulator production |
CN114721835A (en) * | 2022-06-10 | 2022-07-08 | 湖南工商大学 | Method, system, device and medium for predicting energy consumption of edge data center server |
CN115081747A (en) * | 2022-07-27 | 2022-09-20 | 国网浙江省电力有限公司 | Data processing method based on knowledge graph technology |
CN115081747B (en) * | 2022-07-27 | 2022-11-11 | 国网浙江省电力有限公司 | Data processing method based on knowledge graph technology |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN110690701A (en) | Analysis method for influence factors of abnormal line loss | |
CN107169628B (en) | Power distribution network reliability assessment method based on big data mutual information attribute reduction | |
CN107315884B (en) | Building energy consumption modeling method based on linear regression | |
CN107918830B (en) | Power distribution network running state evaluation method based on big data technology | |
CN112464094B (en) | Information recommendation method and device, electronic equipment and storage medium | |
CN110991786A (en) | 10kV static load model parameter identification method based on similar daily load curve | |
CN106372747B (en) | Random forest-based reasonable line loss rate estimation method for transformer area | |
CN112785108A (en) | Power grid operation data correlation analysis method and system based on regulation cloud | |
CN110705859A (en) | PCA-self-organizing neural network-based method for evaluating running state of medium and low voltage distribution network | |
CN114065605A (en) | Intelligent electric energy meter running state detection and evaluation system and method | |
CN110889565B (en) | Distribution network routing inspection period calculation method based on multi-dimensional matrix decision | |
CN113435627A (en) | Work order track information-based electric power customer complaint prediction method and device | |
CN115953186A (en) | Network appointment demand pattern recognition and short-time demand prediction method | |
CN112200209A (en) | Poor user identification method based on day-to-day power consumption | |
CN116933010A (en) | Load rate analysis and evaluation method and system based on multi-source data fusion and deep learning | |
CN109409780B (en) | Change processing method, device, computer equipment and storage medium | |
CN113689079A (en) | Transformer area line loss prediction method and system based on multivariate linear regression and cluster analysis | |
CN111339167A (en) | Method for analyzing influence factors of transformer area line loss rate based on K-means and principal component linear regression | |
CN109840536A (en) | A kind of power grid power supply reliability horizontal clustering method and system | |
CN112329971A (en) | Modeling method of investment decision model of power transmission and transformation project | |
CN115860797A (en) | Electric quantity demand prediction method suitable for new electricity price reform situation | |
CN115409264A (en) | Power distribution network emergency repair stagnation point position optimization method based on feeder line fault prediction | |
CN109934489B (en) | Power equipment state evaluation method | |
CN113327047A (en) | Power marketing service channel decision method and system based on fuzzy comprehensive model | |
CN113537758A (en) | Manufacturing industry high-quality development comprehensive evaluation method and system based on big data technology |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
WD01 | Invention patent application deemed withdrawn after publication | Application publication date: 20200114 |