CN113591993A - Power material demand prediction method based on space-time clustering - Google Patents

Power material demand prediction method based on space-time clustering

Publication number
CN113591993A
Authority
CN
China
Prior art keywords
data
demand
clustering
materials
power
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202110883120.0A
Other languages
Chinese (zh)
Other versions
CN113591993B (en)
Inventor
向泽江
胡俊
马晓燕
陈竞翔
黄云飞
钱冬
周子岩
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Huaneng Energy Transportation Industry Holding Co ltd
Shanghai Huaneng E Commerce Co ltd
Original Assignee
Huaneng Energy Transportation Industry Holding Co ltd
Shanghai Huaneng E Commerce Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Huaneng Energy Transportation Industry Holding Co ltd, Shanghai Huaneng E Commerce Co ltd
Priority to CN202110883120.0A
Publication of CN113591993A
Application granted
Publication of CN113591993B
Legal status: Active
Anticipated expiration

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/21 Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/214 Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/21 Design, administration or maintenance of databases
    • G06F16/215 Improving data quality; Data cleansing, e.g. de-duplication, removing invalid entries or correcting typographical errors
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/23 Clustering techniques
    • G06F18/232 Non-hierarchical techniques
    • G06F18/2321 Non-hierarchical techniques using statistics or function optimisation, e.g. modelling of probability density functions
    • G06F18/23213 Non-hierarchical techniques using statistics or function optimisation, e.g. modelling of probability density functions with fixed number of clusters, e.g. K-means clustering
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N20/00 Machine learning
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q10/00 Administration; Management
    • G06Q10/04 Forecasting or optimisation specially adapted for administrative or management purposes, e.g. linear programming or "cutting stock problem"
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q10/00 Administration; Management
    • G06Q10/06 Resources, workflows, human or project management; Enterprise or organisation planning; Enterprise or organisation modelling
    • G06Q10/063 Operations research, analysis or management
    • G06Q10/0631 Resource planning, allocation, distributing or scheduling for enterprises or organisations
    • G06Q10/06315 Needs-based resource requirements planning or analysis
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q10/00 Administration; Management
    • G06Q10/08 Logistics, e.g. warehousing, loading or distribution; Inventory or stock management
    • G06Q10/087 Inventory or stock management, e.g. order filling, procurement or balancing against orders
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q50/00 Systems or methods specially adapted for specific business sectors, e.g. utilities or tourism
    • G06Q50/06 Electricity, gas or water supply
    • Y GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y04 INFORMATION OR COMMUNICATION TECHNOLOGIES HAVING AN IMPACT ON OTHER TECHNOLOGY AREAS
    • Y04S SYSTEMS INTEGRATING TECHNOLOGIES RELATED TO POWER NETWORK OPERATION, COMMUNICATION OR INFORMATION TECHNOLOGIES FOR IMPROVING THE ELECTRICAL POWER GENERATION, TRANSMISSION, DISTRIBUTION, MANAGEMENT OR USAGE, i.e. SMART GRIDS
    • Y04S10/00 Systems supporting electrical power generation, transmission or distribution
    • Y04S10/50 Systems or methods supporting the power network operation or management, involving a certain degree of interaction with the load-side end user applications

Abstract

The invention discloses a power material demand prediction method based on space-time clustering, which comprises: retrieving and cleaning historical demand data of power materials; extracting new features of the cleaned data samples in the time dimension; replacing noise in the cleaned historical demand data to smooth the data; clustering the noise-replaced data in the time and space dimensions respectively, and dividing the clustered data into a training set and a test set; using a k-nearest neighbor algorithm to find, according to the new features, reference materials close to a new material, and updating the points within the class; fitting a plurality of training models on the training set and predicting with the test set data; and measuring the prediction accuracy of each training model in the target class with model accuracy evaluation indexes, and selecting the most accurate training model to predict the demand for power materials. The invention can analyze power materials in batches and accurately capture the demand patterns of the materials.

Description

Power material demand prediction method based on space-time clustering
Technical Field
The invention relates to the technical field of material demand prediction, in particular to a power material demand prediction method based on space-time clustering.
Background
Demand forecasting is essential to material planning in supply chain management. Accurately forecasting material demand is important for reducing engineering costs and improving capital utilization. Researchers have carried out a series of studies on power material demand prediction.
Intelligent material management started earlier abroad, and many scholars have applied different methods to the material demand prediction problem. Some scholars proposed a neural network model based on a fuzzy genetic algorithm, used it to study the intermittent material supply problem, and verified on real data that the model performs well.
Although research on material demand prediction started later in China, domestic scholars have also studied the problem extensively in recent years. For example, one scholar applied support vector machine regression to power grid material demand prediction, using an artificial fish swarm algorithm with a chaotic search operator to optimize the support vector machine parameters and kernel function. Other scholars introduced a demand forecasting method based on matrix decomposition, in which the elements of the prediction matrix are obtained by building a matching matrix between projects and materials; however, the interrelations among projects and among materials are not considered.
In summary, although some research and attempts have been made in power material demand prediction, the common problem is poor practicability. First, the data on which the predictions are based is too idealized: it is structured data expressed by only a few attributes. Second, the types of materials that can be predicted effectively are limited. Power engineering requires tens of thousands of material types, all of which are objects to be predicted, so predicting only a few of them has little practical value. In addition, because the materials are so varied and a single power grid project uses only a few of them, prediction from historical material usage suffers from data sparsity. Finally, power material demand patterns share commonality and regularity, so designing a separate prediction model for each material is clearly not ideal.
Disclosure of Invention
This section is for the purpose of summarizing some aspects of embodiments of the invention and to briefly introduce some preferred embodiments. In this section, as well as in the abstract and the title of the invention of this application, simplifications or omissions may be made to avoid obscuring the purpose of the section, the abstract and the title, and such simplifications or omissions are not intended to limit the scope of the invention.
The present invention has been made in view of the above-mentioned conventional problems.
Therefore, the invention provides a power material demand prediction method based on space-time clustering, which addresses two problems: power materials are so varied that batch analysis is difficult, and manual analogy during clustering is too subjective.
To solve the above technical problems, the invention provides the following technical scheme: retrieving historical demand data of power materials from a historical purchase database and cleaning the data; based on the demand time and demand quantity in the historical demand data, sequentially performing an autocorrelation test and a material demand period analysis on the cleaned data to extract new features of the cleaned data samples in the time dimension; replacing noise in the cleaned historical demand data using a Bin strategy, a cluster analysis strategy, a regression strategy and a moving average strategy to smooth the data; clustering the noise-replaced data in the time and space dimensions respectively using a K-Means clustering strategy and a linkage strategy in hierarchical clustering, and dividing the clustered data into a training set and a test set; using a k-nearest neighbor algorithm to find, according to the new features, reference materials close to a new material, assigning the new material to the nearest cluster, and updating the points within the class; fitting a plurality of training models on the training set and using the fitted models to predict on the test set data; and measuring the prediction accuracy of each training model in the target class with model accuracy evaluation indexes, selecting the most accurate training model as the prediction model for the target class, and completing the power material demand prediction with that prediction model.
As a preferred scheme of the power material demand prediction method based on space-time clustering of the invention: the historical demand data of the power materials includes the material code, demand company, demand time, demand quantity, material description and material group.
As a preferred scheme of the power material demand prediction method based on space-time clustering of the invention: the data cleaning includes filling missing values in the historical demand data with the mean, the mode or zero; splitting the material descriptions in the historical demand data to obtain type indexes of the materials; and removing duplicate records from the historical demand data.
As a preferred scheme of the power material demand prediction method based on space-time clustering of the invention: extracting the new features includes performing the autocorrelation test by the following formula:
Figure BDA0003192898230000021
where ρ_l is the autocorrelation coefficient, X_k is the random variable at time point k, X_{k-1} is the random variable at time point k-1, Cov(·) is the covariance, and Var(·) is the variance; the new features of the cleaned historical demand data in the time dimension include volatility, demand interval, and peak-month features.
As a preferred scheme of the power material demand prediction method based on space-time clustering of the invention: the K-Means clustering strategy includes selecting cluster centers for the points to be clustered, where each point to be clustered is the historical demand data of one power material; calculating the distance from each point to each cluster center and assigning each point to the cluster whose center is closest; calculating the coordinate mean of all points in each cluster, taking it as the new cluster center, and continuing to assign each point to the cluster whose new center is closest; and stopping the calculation once the positions of the cluster centers no longer change.
As a preferred scheme of the power material demand prediction method based on space-time clustering of the invention: the linkage strategy includes taking each point as a separate class to obtain N classes, where the distance between classes is the distance between the points they contain; and merging the two closest classes into a new class and recalculating the distances between the new class and all the old classes, until only one class remains, at which point merging and calculation stop.
As a preferred scheme of the power material demand prediction method based on space-time clustering of the invention: updating the points within the class uses the following formula:
Figure BDA0003192898230000031
where k = 0 denotes the new material, k = 1, …, n denote the other materials in the class to which the new material belongs, X_kt is the characteristic data of material k at time point t, D_kt is the demand for material k at time point t, α is a constant term, β is the coefficient of the independent variable X_kt, and the formula also includes a random error term.
As a preferred scheme of the power material demand prediction method based on space-time clustering of the invention: the plurality of training models includes a nearest-neighbor (proximity) algorithm, a random forest classifier, a neural network algorithm, an ARIMA model, and a Prophet model.
As a preferred scheme of the power material demand prediction method based on space-time clustering of the invention: the model accuracy evaluation index includes the following:
Figure BDA0003192898230000032
where n_k is the number of points contained in the target class k, and accuracy_u is the prediction accuracy of point u within class k.
The beneficial effects of the invention are as follows: using historical material demand data and data prediction algorithms, the method predicts the future demand quantity and demand cycle of power materials, providing reference and decision support for material procurement and power plant production maintenance; the power material conditions of each spatio-temporal node are divided by a space-time clustering method that takes into account the temporal periodicity and spatial correlation of material demand, so the class characteristics of the spatio-temporal nodes to be predicted can be distinguished more efficiently and intuitively; in addition, an ensemble learning method is introduced to integrate the prediction results of multiple models, improving demand prediction accuracy.
Drawings
To illustrate the technical solutions of the embodiments of the invention more clearly, the drawings used in the description of the embodiments are briefly introduced below. The drawings described below show only some embodiments of the invention; those of ordinary skill in the art can obtain other drawings from them without inventive effort. Wherein:
FIG. 1 is a schematic flow chart of a power material demand prediction method based on space-time clustering according to a first embodiment of the invention;
FIG. 2 is a schematic comparison of the R2 SCORE model accuracy evaluation results of the power material demand prediction method based on space-time clustering according to a second embodiment of the invention;
FIG. 3 is a schematic comparison of the MSE SCORE model accuracy evaluation results of the power material demand prediction method based on space-time clustering according to the second embodiment of the invention;
FIG. 4 is a schematic comparison of the MAE SCORE model accuracy evaluation results of the power material demand prediction method based on space-time clustering according to the second embodiment of the invention;
FIG. 5 is a schematic comparison of the MAPE SCORE model accuracy evaluation results of the power material demand prediction method based on space-time clustering according to the second embodiment of the invention.
Detailed Description
In order to make the aforementioned objects, features and advantages of the present invention comprehensible, specific embodiments accompanied with figures are described in detail below, and it is apparent that the described embodiments are a part of the embodiments of the present invention, not all of the embodiments. All other embodiments, which can be obtained by a person skilled in the art without making creative efforts based on the embodiments of the present invention, shall fall within the protection scope of the present invention.
In the following description, numerous specific details are set forth in order to provide a thorough understanding of the present invention, but the present invention may be practiced in other ways than those specifically described and will be readily apparent to those of ordinary skill in the art without departing from the spirit of the present invention, and therefore the present invention is not limited to the specific embodiments disclosed below.
Furthermore, reference herein to "one embodiment" or "an embodiment" means that a particular feature, structure, or characteristic described in connection with the embodiment is included in at least one implementation of the invention. The appearances of the phrase "in one embodiment" in various places in the specification are not necessarily all referring to the same embodiment, nor are separate or alternative embodiments mutually exclusive of other embodiments.
The present invention will be described in detail with reference to the drawings, wherein the cross-sectional views illustrating the structure of the device are not enlarged partially in general scale for convenience of illustration, and the drawings are only exemplary and should not be construed as limiting the scope of the present invention. In addition, the three-dimensional dimensions of length, width and depth should be included in the actual fabrication.
Meanwhile, in the description of the present invention, it should be noted that the terms "upper, lower, inner and outer" and the like indicate orientations or positional relationships based on the orientations or positional relationships shown in the drawings, and are only for convenience of describing the present invention and simplifying the description, but do not indicate or imply that the referred device or element must have a specific orientation, be constructed in a specific orientation and operate, and thus, cannot be construed as limiting the present invention. Furthermore, the terms first, second, or third are used for descriptive purposes only and are not to be construed as indicating or implying relative importance.
The terms "mounted, connected and connected" in the present invention are to be understood broadly, unless otherwise explicitly specified or limited, for example: can be fixedly connected, detachably connected or integrally connected; they may be mechanically, electrically, or directly connected, or indirectly connected through intervening media, or may be interconnected between two elements. The specific meanings of the above terms in the present invention can be understood in specific cases to those skilled in the art.
Example 1
Referring to fig. 1, a first embodiment of the present invention provides a method for forecasting demand for electric power materials based on spatio-temporal clustering, including:
S1: Retrieve the historical demand data of the power materials from a historical purchase database, and clean the data.
It should be noted that the historical demand data of the power materials includes the material code, demand company, demand time, demand quantity, material description and material group.
The data cleaning steps are as follows:
(1) fill missing values in the historical demand data of the power materials with the mean, the mode or 0;
(2) split the material descriptions in the historical demand data of the power materials to obtain type indexes of the materials;
the split data is then converted into the required data formats, including text, numeric and date types;
(3) remove duplicate records from the historical demand data of the power materials.
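The three cleaning steps above can be sketched in a few lines of Python. This is a minimal illustration rather than the patent's implementation: the record layout, the field names (`code`, `company`, `date`, `quantity`, `description`), the comma-separated description format, and the choice of mean filling are all assumptions made for the example.

```python
from statistics import mean

def clean_records(records):
    """Fill missing quantities, split descriptions, and drop duplicate rows."""
    # Step (1): fill gaps with the mean of the known quantities
    # (the text also allows the mode or 0; the mean is an arbitrary choice here).
    known = [r["quantity"] for r in records if r["quantity"] is not None]
    fill_value = mean(known) if known else 0
    cleaned, seen = [], set()
    for record in records:
        row = dict(record)
        if row["quantity"] is None:
            row["quantity"] = fill_value
        # Step (2): hypothetical layout "type, specification"; keep the type part.
        row["type_index"] = row["description"].split(",")[0].strip()
        # Step (3): skip rows already seen with the same identifying fields.
        key = (row["code"], row["company"], row["date"], row["quantity"])
        if key not in seen:
            seen.add(key)
            cleaned.append(row)
    return cleaned

records = [
    {"code": "M001", "company": "Plant A", "date": "2020-01", "quantity": 10, "description": "cable, 10kV"},
    {"code": "M001", "company": "Plant A", "date": "2020-01", "quantity": 10, "description": "cable, 10kV"},
    {"code": "M001", "company": "Plant A", "date": "2020-02", "quantity": None, "description": "cable, 10kV"},
    {"code": "M002", "company": "Plant B", "date": "2020-01", "quantity": 20, "description": "valve, DN50"},
]
cleaned = clean_records(records)
```

In this toy input the second row is an exact duplicate and the third row has a missing quantity, so the result has three rows with the gap filled by the mean of the known values.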
S2: Based on the demand time and demand quantity in the historical demand data of the power materials, sequentially perform an autocorrelation test and a material demand period analysis on the cleaned data to extract new features of the cleaned data samples in the time dimension.
Because time-series models require the historical data to be stable (stationary), an autocorrelation test is needed when time-series analysis is applied in the quantification process. Specifically, the autocorrelation test is performed by the following formula:
Figure BDA0003192898230000061
where ρ_l is the autocorrelation coefficient, X_k is the random variable at time point k, X_{k-1} is the random variable at time point k-1, Cov(·) is the covariance, and Var(·) is the variance.
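The formula image is not reproduced in this text, so the following sketch assumes the usual sample estimate of the lag-l autocorrelation, i.e. the covariance between the series and its lagged copy divided by the variance of the series. The function name and the toy demand series are illustrative assumptions.

```python
def autocorrelation(series, lag):
    """Sample autocorrelation of `series` at the given lag (assumed standard estimator)."""
    n = len(series)
    mu = sum(series) / n
    # Unnormalised variance; the 1/n factors cancel in the ratio below.
    var = sum((x - mu) ** 2 for x in series)
    if var == 0:
        raise ValueError("autocorrelation is undefined for a constant series")
    cov = sum((series[k] - mu) * (series[k - lag] - mu) for k in range(lag, n))
    return cov / var

# Toy monthly demand that alternates between two levels.
demand = [5, 9, 5, 9, 5, 9, 5, 9]
rho1 = autocorrelation(demand, 1)  # adjacent months move in opposite directions
rho2 = autocorrelation(demand, 2)  # every second month repeats
```

A strongly negative value at lag 1 and a strongly positive value at lag 2 is what one would expect for a series with a two-step cycle.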
The material demand period analysis is implemented with the Fourier transform, which converts time-domain data into frequency-domain data. Specifically, the time-domain signal is regarded as a superposition of sine waves with different amplitudes and phases (the frequency-domain view), and the time-series data is expanded into a linear combination of trigonometric functions to obtain the coefficient of each expansion term, namely the Fourier coefficient. The larger a Fourier coefficient, the more likely the period of its corresponding sine wave is the period of the data.
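As a rough illustration of this idea, the sketch below computes the discrete Fourier coefficients of a demand series directly and returns the period of the largest one. It is a minimal example using only the standard library; the function name, the mean removal, and the synthetic 12-month series are assumptions for the example.

```python
import cmath
import math

def dominant_period(series):
    """Return the period (in samples) of the largest Fourier coefficient."""
    n = len(series)
    mu = sum(series) / n
    centered = [x - mu for x in series]  # drop the constant (zero-frequency) term
    best_freq, best_amp = None, 0.0
    for f in range(1, n // 2 + 1):
        # Discrete Fourier coefficient at frequency index f.
        coeff = sum(x * cmath.exp(-2j * math.pi * f * t / n)
                    for t, x in enumerate(centered))
        if abs(coeff) > best_amp:
            best_freq, best_amp = f, abs(coeff)
    if best_freq is None:
        return None  # constant series: no periodic component found
    return n / best_freq

# Four years of synthetic monthly demand with a yearly cycle.
demand = [10 + 5 * math.sin(2 * math.pi * t / 12) for t in range(48)]
period = dominant_period(demand)
```

For real data a fast Fourier transform routine would normally replace the direct sum; the direct form is used here only to keep the example self-contained.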
After these operations, the new features of the cleaned historical demand data in the time dimension include volatility, demand interval, and peak-month features.
S3: Replace the noise in the cleaned historical demand data of the power materials using a Bin strategy, a cluster analysis strategy, a regression strategy and a moving average strategy, so as to smooth the data.
It should be noted that the Bin strategy sorts a set of data, distributes the sorted data into a number of buckets (called bins), and smooths each data point using its surrounding points (neighbors); the cluster analysis strategy in this embodiment may be K-Means clustering, spectral clustering, density-based clustering, etc.; the regression strategy may use lasso regression, ridge regression, stepwise regression, etc.; the moving average strategy is a simple smoothing prediction technique whose basic idea is to compute, item by item along the time series, the average of a fixed number of consecutive terms so as to reflect the long-term trend.
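Two of the smoothing strategies named above are easy to show concretely. The sketch below implements smoothing by bin means and a trailing moving average; the function names, the equal-size binning, and the handling of the first few points of the moving average are assumptions chosen for the example, not details given in the text.

```python
def smooth_by_bin_means(values, bin_size):
    """Sort the values, split them into equal-size bins, and replace each value
    by the mean of its bin. Results are returned in the original order."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    smoothed = [0.0] * len(values)
    for start in range(0, len(order), bin_size):
        idx = order[start:start + bin_size]
        bin_mean = sum(values[i] for i in idx) / len(idx)
        for i in idx:
            smoothed[i] = bin_mean
    return smoothed

def moving_average(series, window):
    """Trailing moving average; early points average over the values seen so far."""
    out = []
    for i in range(len(series)):
        chunk = series[max(0, i - window + 1): i + 1]
        out.append(sum(chunk) / len(chunk))
    return out

demand = [4, 8, 15, 21, 21, 24, 25, 28, 34]
binned = smooth_by_bin_means(demand, 3)
averaged = moving_average(demand, 3)
```

Bin smoothing ignores time order and suits value-level noise, while the moving average follows the time order and suits trend extraction, which is presumably why the text lists them as separate strategies.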
S4: Cluster the noise-replaced data in the time and space dimensions respectively using a K-Means clustering strategy and a linkage strategy in hierarchical clustering, and divide the clustered data into a training set and a test set.
(1) The specific implementation steps of the K-Means clustering strategy are as follows:
① select cluster centers for the points to be clustered, where each point to be clustered is the historical demand data of one power material;
② calculate the distance from each point to be clustered to each cluster center, and assign each point to the cluster whose center is closest;
③ calculate the coordinate mean of all points in each cluster, take it as the new cluster center, and continue assigning each point to the cluster whose new center is closest;
④ repeat steps ② and ③ until the cluster centers no longer move appreciably (that is, their positions are unchanged) or the number of clustering iterations reaches the set limit.
The number of clustering iterations can be set as required.
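The K-Means steps above can be sketched as follows. This is a minimal standard-library version under stated assumptions: points are tuples of numeric features, the initial centers are passed in explicitly (the text does not say how they are chosen), and `max_iter` plays the role of the configurable number of clustering iterations.

```python
import math

def k_means(points, initial_centers, max_iter=100):
    """Plain K-Means: assign each point to its nearest center, then move each
    center to the mean of its points, until the centers stop changing."""
    centers = list(initial_centers)
    k = len(centers)
    labels = [0] * len(points)
    for _ in range(max_iter):
        # Assignment step: index of the closest center for every point.
        labels = [min(range(k), key=lambda c: math.dist(p, centers[c]))
                  for p in points]
        # Update step: coordinate-wise mean of the members of each cluster.
        new_centers = []
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                new_centers.append(tuple(sum(col) / len(members)
                                         for col in zip(*members)))
            else:
                new_centers.append(centers[c])  # keep the center of an empty cluster
        if new_centers == centers:
            break
        centers = new_centers
    return centers, labels

# Hypothetical two-feature description of six materials.
points = [(1.0, 1.0), (1.2, 0.8), (0.8, 1.1), (8.0, 8.0), (8.2, 7.9), (7.9, 8.3)]
centers, labels = k_means(points, initial_centers=[points[0], points[3]])
```

Passing the initial centers explicitly keeps the example deterministic; in practice a random or k-means++ style initialisation is common, and the result can depend on it.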
(2) The specific implementation steps of the linkage strategy in hierarchical clustering are as follows:
① take each point as a separate class to obtain N classes, where the distance between classes is the distance between the points they contain;
② merge the two closest classes into a new class, and recalculate the distances between the new class and all the old classes; repeat until only one class remains, then stop merging and calculating.
The distance between classes is equal to the minimum distance between the points of the two classes; the distance is measured by the Ward variance and is calculated as follows:
Figure BDA0003192898230000071
where u is the new cluster formed by merging s and t, s and t are clusters, v is an unused cluster in the clustering forest, T = |v| + |s| + |t|, and |·| is the number of observations in a cluster.
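The formula image itself is not reproduced in this text. The symbol definitions (u formed from s and t, an unused cluster v, and a total T of three cluster sizes) coincide with the Ward update as documented for SciPy's `scipy.cluster.hierarchy.linkage`. Assuming that is the intended formula, it can be written as:

```latex
d(u, v) = \sqrt{\frac{|v| + |s|}{T}\, d(v, s)^2
              + \frac{|v| + |t|}{T}\, d(v, t)^2
              - \frac{|v|}{T}\, d(s, t)^2},
\qquad T = |v| + |s| + |t|
```

Under this reading, the distance from the merged cluster u to any other cluster v is obtained from the three pairwise distances already known before the merge, so no point-level distances need to be recomputed.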
It should be noted that the optimal number of clusters can be determined by manual selection or by the Gap statistic. The basic idea is to try successive values of k and select the one with the smallest within-class sum of squared deviations, whereas the Gap statistic finds the k whose value differs most from the expected value. It is calculated as follows:
Gap(K) = E(log D_k) - log D_k
where D_k is the Euclidean distance between sample points within the class and E(log D_k) is the expected value of log D_k. Specifically, the basic process of the algorithm is to first randomly generate, from a uniform distribution over the region where the samples lie, as many random samples as there are original samples, and then apply K-Means to these random samples.
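The procedure can be sketched for one-dimensional data as below. Several things here are assumptions made for the example: D_k is taken to be the within-class sum of squared deviations, the expectation is estimated by averaging over `n_refs` uniform reference samples, and the small `kmeans_1d` helper with quantile-based starting centers is hypothetical.

```python
import math
import random

def kmeans_1d(xs, k, iters=50):
    """Tiny 1-D K-Means with deterministic quantile-based starting centers."""
    s = sorted(xs)
    centers = [s[(2 * i + 1) * len(s) // (2 * k)] for i in range(k)]
    for _ in range(iters):
        groups = [[] for _ in range(k)]
        for x in xs:
            groups[min(range(k), key=lambda c: abs(x - centers[c]))].append(x)
        new = [sum(g) / len(g) if g else centers[i] for i, g in enumerate(groups)]
        if new == centers:
            break
        centers = new
    return groups

def log_dispersion(groups):
    """log of the within-class sum of squared deviations (assumed form of D_k)."""
    total = 0.0
    for g in groups:
        if g:
            m = sum(g) / len(g)
            total += sum((x - m) ** 2 for x in g)
    return math.log(total)

def gap_statistic(xs, k, n_refs=20, seed=0):
    """Gap(k) = mean over reference samples of log D_k, minus log D_k of the data."""
    rng = random.Random(seed)
    lo, hi = min(xs), max(xs)
    ref_logs = []
    for _ in range(n_refs):
        # Reference sample: same size as the data, uniform over the data range.
        ref = [rng.uniform(lo, hi) for _ in xs]
        ref_logs.append(log_dispersion(kmeans_1d(ref, k)))
    return sum(ref_logs) / n_refs - log_dispersion(kmeans_1d(xs, k))

xs = [1.0, 1.1, 0.9, 1.2, 5.0, 5.1, 4.9, 5.2, 9.0, 9.1, 8.9, 9.2]
gaps = {k: gap_statistic(xs, k) for k in range(1, 5)}
best_k = max(gaps, key=gaps.get)
```

The toy data has three visibly separated groups, so the gap for k = 3 should clearly exceed the gaps for k = 1 and k = 2; the sketch does not guard against a zero dispersion, which would occur if every class collapsed to a single value.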
Preferably, this embodiment divides the power material conditions of each spatio-temporal node by a space-time clustering method and takes into account the temporal periodicity and spatial correlation of material demand, so that the class characteristics of the spatio-temporal nodes to be predicted can be distinguished more efficiently and intuitively, which facilitates observing the material demand over a future period and predicting its class.
S5: Use a k-nearest neighbor algorithm to find, according to the new features, reference materials close to the new material, assign the new material to the nearest cluster, and update the points within the class.
If a brand-new material is to be used but its demand cannot be predicted for lack of historical data, the k-Nearest Neighbor (KNN) algorithm is used to find reference materials close to the new material according to the material features, the new material is assigned to the nearest cluster, historical demand data is constructed artificially for the new material by a covariate method, and the points within the class are updated. The specific formula is as follows:
Figure BDA0003192898230000081
where k = 0 denotes the new material, k = 1, …, n denote the other materials in the class to which the new material belongs, X_kt is the characteristic data of material k at time point t, D_kt is the demand for material k at time point t, α is a constant term, β is the coefficient of the independent variable X_kt, and the formula also includes a random error term.
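Step S5 can be illustrated with a small sketch. Because the formula image is not reproduced here, the code assumes a linear covariate model of the form D_kt = α + β·X_kt plus an error term, fitted by ordinary least squares on the pooled data of the class members; the single scalar covariate, the record layout, the majority vote among k neighbours, and all names are assumptions made for the example.

```python
import math

def nearest_cluster(new_features, references, k=3):
    """Majority cluster label among the k reference materials closest to the new one."""
    ranked = sorted(references,
                    key=lambda r: math.dist(new_features, r["features"]))
    votes = {}
    for ref in ranked[:k]:
        votes[ref["cluster"]] = votes.get(ref["cluster"], 0) + 1
    return max(votes, key=votes.get)

def fit_line(xs, ds):
    """Ordinary least squares for D = alpha + beta * X (assumed model form)."""
    n = len(xs)
    mean_x, mean_d = sum(xs) / n, sum(ds) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    beta = sum((x - mean_x) * (d - mean_d) for x, d in zip(xs, ds)) / sxx
    alpha = mean_d - beta * mean_x
    return alpha, beta

# Hypothetical reference materials: feature vector, cluster label, and history.
references = [
    {"features": (1.0, 0.2), "cluster": "A", "X": [1.0, 2.0, 3.0], "D": [3.0, 5.0, 7.0]},
    {"features": (1.1, 0.3), "cluster": "A", "X": [2.0, 4.0], "D": [5.0, 9.0]},
    {"features": (5.0, 4.0), "cluster": "B", "X": [1.0, 2.0], "D": [20.0, 18.0]},
    {"features": (5.2, 4.1), "cluster": "B", "X": [3.0, 4.0], "D": [16.0, 14.0]},
]
new_features = (0.9, 0.25)   # features of the new material (k = 0)
new_X = [1.5, 2.5]           # its covariate values X_0t at two time points

cluster = nearest_cluster(new_features, references, k=3)
members = [r for r in references if r["cluster"] == cluster]
pooled_X = [x for r in members for x in r["X"]]
pooled_D = [d for r in members for d in r["D"]]
alpha, beta = fit_line(pooled_X, pooled_D)
synthetic_D = [alpha + beta * x for x in new_X]  # constructed history for the new material
```

The constructed demand values can then be added to the class as the new material's history, which is one way to read "updating the points within the class".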
S6: Train a plurality of training models on the training set so as to fit them, and have the fitted training models predict on the test set data.
The most suitable model, i.e. the one with the highest fitting accuracy, is found for the points in each class. The basic process is as follows: first, a plurality of training models (a proximity algorithm, a random forest classifier, a neural network algorithm, an ARIMA model and a Prophet model) are trained on the training set and fitted, and the fitted models are then used to predict on the test set data.
Preferably, the embodiment establishes different prediction models in the class according to the clustering result, so that the prediction problems caused by different material demand modes and incompatible material demands of each power plant can be solved.
S7: Measure the prediction accuracy of each training model in the target class using the model accuracy evaluation indexes, select the training model with the highest accuracy as the prediction model of the target class, and complete the demand prediction of the electric power materials with that prediction model.
The model precision evaluation indexes comprise: MSE SCORE, MAE SCORE, MAPE SCORE and R2 SCORE, comprehensively measuring the prediction precision of each training model in the target class according to the indexes, and selecting the model with the highest precision as the prediction model of the target class; then, the above process is repeated for the next target class until the selection of the prediction model for all classes is completed.
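For reference, the four evaluation indexes named above can be written out as follows. This is a sketch using the standard textbook definitions of MSE, MAE, MAPE and R2; the text does not give its own definitions, and the handling of zero actual demand in `mape` and of constant actual demand in `r2` is an assumption made here.

```python
def mse(actual, predicted):
    """Mean squared error."""
    return sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual)


def mae(actual, predicted):
    """Mean absolute error."""
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)


def mape(actual, predicted):
    """Mean absolute percentage error, as a fraction.

    Assumption: periods with zero actual demand are skipped, because the
    percentage error is undefined there.
    """
    terms = [abs(a - p) / abs(a) for a, p in zip(actual, predicted) if a != 0]
    return sum(terms) / len(terms) if terms else float("nan")


def r2(actual, predicted):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_a = sum(actual) / len(actual)
    ss_tot = sum((a - mean_a) ** 2 for a in actual)
    ss_res = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    if ss_tot == 0:
        return float("nan")  # assumption: undefined for constant actual demand
    return 1.0 - ss_res / ss_tot
```

Lower MSE, MAE and MAPE and a higher R2 indicate a better fit, so any combined ranking has to account for the opposite directions of these indexes.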
The calculation formula of the prediction precision is as follows:
Figure BDA0003192898230000082
where $n_k$ is the number of points contained in the target class k, and $accuracy_u$ is the prediction accuracy of point u within class k.
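The per-class model selection can be sketched as below. Because the formula figure is not reproduced in this text, the sketch assumes that the class-level accuracy is the arithmetic mean of the point accuracies, i.e. the sum of accuracy_u over the n_k points of class k divided by n_k; the names `class_accuracy` and `select_model` are hypothetical.

```python
def class_accuracy(point_accuracies):
    """Assumed class-level accuracy: mean of accuracy_u over the n_k points."""
    return sum(point_accuracies) / len(point_accuracies)


def select_model(accuracies_by_model):
    """Pick the model with the highest class-level accuracy.

    accuracies_by_model: dict mapping a model name to the list of
    per-point accuracies it achieved inside one target class.
    """
    scores = {name: class_accuracy(acc) for name, acc in accuracies_by_model.items()}
    best = max(scores, key=scores.get)
    return best, scores
```

Calling `select_model` once per target class, with one accuracy list per candidate model, mirrors the repetition over classes described in the text.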
Using historical material demand data, the method calls data prediction algorithms to predict the future demand quantity and demand cycle of electric power materials, providing reference and decision support for material purchasing and for power plant production and maintenance. In addition, to improve the demand prediction accuracy, an ensemble learning method is introduced that integrates the prediction results of multiple models; this keeps the prediction results stable and less sensitive to abnormal data, and addresses the interference that events such as emergencies at a power plant would otherwise introduce into the models.
Example 2
To verify the technical effect of the method, this embodiment runs a comparison test between an existing machine learning model and the present method, and compares the test results to verify the actual effect of the method.
The existing machine learning model has difficulty analysing electric power materials in batches, and its prediction accuracy is low.
Compared with the existing machine learning model, the present method predicts electric power material demand more accurately. In this embodiment, the existing machine learning model and the present method are each used for real-time prediction on offline purchasing data provided by a certain group, and the results are compared.
The prediction of the existing machine learning model (random forest) serves as the control group, and the prediction of the model built by the present method (our method) serves as the experimental group. MSE SCORE, MAE SCORE, MAPE SCORE and R2 SCORE are used as the model precision evaluation indexes; the comparison results of the control group and the experimental group are shown in fig. 2, 3, 4 and 5, respectively.
It can be seen from fig. 2, 3, 4 and 5 that the model of the method is superior to the existing machine learning model in prediction accuracy.
It should be noted that the above-mentioned embodiments are only for illustrating the technical solutions of the present invention and not for limiting, and although the present invention has been described in detail with reference to the preferred embodiments, it should be understood by those skilled in the art that modifications or equivalent substitutions may be made on the technical solutions of the present invention without departing from the spirit and scope of the technical solutions of the present invention, which should be covered by the claims of the present invention.

Claims (9)

1. A power material demand prediction method based on space-time clustering, characterized by comprising:
calling historical demand data of the electric power materials through a historical purchase database, and performing data cleaning on the historical demand data of the electric power materials;
sequentially carrying out an autocorrelation test and a material demand period analysis on the cleaned historical demand data of the electric power materials, based on the demand time and demand quantity in the historical demand data of the electric power materials, so as to extract new features of the cleaned data samples in the time dimension;
replacing the noise in the cleaned historical demand data of the electric power materials by using a binning strategy, a cluster analysis strategy, a regression strategy and a moving average strategy, so as to smooth the data noise;
clustering the replaced data in the time and space dimensions respectively by using a K-Means clustering strategy and a linkage strategy in hierarchical clustering, and dividing the clustered data into a training set and a test set;
acquiring reference materials which are close to the new materials by using a k-nearest neighbor algorithm according to the new characteristics, dividing the new materials into the nearest cluster class, and updating the class interior points;
training the training set by utilizing a plurality of training models to fit the training models, and predicting the fitted training models by using test set data;
and measuring the prediction accuracy of the training model in the target class by using the model accuracy evaluation index, selecting the training model with the highest accuracy as the prediction model in the target class, and completing the demand prediction of the electric power materials through the prediction model.
2. The power material demand prediction method based on spatio-temporal clustering of claim 1, characterized in that: the historical demand data of the electric power supplies comprises supply codes, demand companies, demand time, demand quantity, material description and material groups.
3. The power material demand prediction method based on spatio-temporal clustering of claim 2, characterized in that: the data cleaning comprises,
filling missing values in the historical demand data of the power materials according to the mean value, the mode or zero;
splitting material descriptions in historical demand data of the power materials to obtain type indexes of the materials;
and removing repeated data in the historical demand data of the power supplies.
4. The power material demand prediction method based on spatio-temporal clustering according to claim 2 or 3, characterized in that: the extracting of the new features comprises,
performing the autocorrelation test by the following formula:
Figure FDA0003192898220000021
where $\rho_l$ is the autocorrelation coefficient, $X_k$ is the random variable at time point k, $X_{k-1}$ is the random variable at time point k-1, Cov() is the covariance, and Var() is the variance;
the new features of the cleaned historical demand data of the power materials in the time dimension comprise the fluctuation factor, the demand interval, and the month in which the demand peak occurs.
5. The power material demand prediction method based on spatio-temporal clustering according to claim 1 or 2, characterized in that: the K-Means clustering strategy includes,
selecting a clustering center for each point to be clustered, wherein each point to be clustered is historical demand data of each power material;
calculating the distance from each point to be clustered to a clustering center, and clustering each point to be clustered to a cluster closest to the clustering center;
calculating the coordinate average value of all the points in each cluster, taking the coordinate average value as a new cluster center, and continuously clustering each point to be clustered to a new cluster closest to the cluster center;
stopping the calculation when the positions of the cluster centers no longer change.
6. The power material demand prediction method based on spatio-temporal clustering of claim 5, characterized in that: the linkage strategy comprises,
taking each point as a separate class to obtain N classes, wherein the distance between classes is the distance between the points contained in the classes;
and merging the two classes that are nearest to each other into a new class, and recalculating the distances between the new class and all the old classes, until only one class remains, at which point the merging and calculation stop.
7. The method for forecasting the demand of the electric power materials based on the space-time clustering of any one of claims 1, 2, 3 and 6, wherein: updating the class interior points comprises,
Figure FDA0003192898220000022
where k = 0 denotes the new material; k = 1, …, n denote the other materials in the class to which the new material belongs; $X_{kt}$ is the feature data of material k at time point t; $D_{kt}$ is the demand for material k at time point t; α is a constant term; β is the coefficient of the independent variable $X_{kt}$; and ε is a random error term.
8. The power material demand prediction method based on spatio-temporal clustering of claim 7, characterized in that: the plurality of training models includes a proximity algorithm, a random forest classifier, a neural network algorithm, an ARIMA model, and a Prophet model.
9. The power material demand prediction method based on spatio-temporal clustering of claim 1, characterized in that: the model precision evaluation indexes comprise the following indexes,
Figure FDA0003192898220000031
where $n_k$ is the number of points contained in the target class k, and $accuracy_u$ is the prediction accuracy of point u within class k.
CN202110883120.0A 2021-08-02 2021-08-02 Power material demand prediction method based on space-time clustering Active CN113591993B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110883120.0A CN113591993B (en) 2021-08-02 2021-08-02 Power material demand prediction method based on space-time clustering

Publications (2)

Publication Number Publication Date
CN113591993A true CN113591993A (en) 2021-11-02
CN113591993B CN113591993B (en) 2022-08-09

Family

ID=78254040

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110883120.0A Active CN113591993B (en) 2021-08-02 2021-08-02 Power material demand prediction method based on space-time clustering

Country Status (1)

Country Link
CN (1) CN113591993B (en)

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20130166350A1 (en) * 2011-06-28 2013-06-27 Smart Software, Inc. Cluster based processing for forecasting intermittent demand
US20150112900A1 (en) * 2013-10-23 2015-04-23 Honda Motor Co., Ltd. Time-series data prediction device, time-series data prediction method, and program
CN106203701A (en) * 2016-07-06 2016-12-07 吴本刚 A kind of power matching network builds material requirements prognoses system
CN109376924A (en) * 2018-10-18 2019-02-22 广东电网有限责任公司 A kind of method, apparatus, equipment and the readable storage medium storing program for executing of material requirements prediction
CN112614011A (en) * 2020-12-07 2021-04-06 国网北京市电力公司 Power distribution network material demand prediction method and device, storage medium and electronic equipment

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115759885A (en) * 2023-01-09 2023-03-07 佰聆数据股份有限公司 Material sampling inspection method and device based on distributed material supply
CN116028838A (en) * 2023-01-09 2023-04-28 广东电网有限责任公司 Clustering algorithm-based energy data processing method and device and terminal equipment
CN116028838B (en) * 2023-01-09 2023-09-19 广东电网有限责任公司 Clustering algorithm-based energy data processing method and device and terminal equipment

Also Published As

Publication number Publication date
CN113591993B (en) 2022-08-09


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant