CN114064794A

CN114064794A - Business expansion file mining and analyzing method based on big data technology

Info

Publication number: CN114064794A
Application number: CN202111453678.1A
Authority: CN
Inventors: 高宁; 曾玲; 梁海洪; 沈晓舟; 张博; 左越; 张蓓; 董阳; 付临; 周鑫; 刘蕤; 付海东
Original assignee: State Grid Corp of China SGCC; State Grid Liaoning Electric Power Co Ltd
Current assignee: State Grid Corp of China SGCC; State Grid Liaoning Electric Power Co Ltd
Priority date: 2021-12-01
Filing date: 2021-12-01
Publication date: 2022-02-18

Abstract

The invention provides a business expansion archive mining and analysis method based on big data technology. The method comprises establishing a business expansion archive data warehouse and a mining method for processing the business expansion archive data. On the basis of the data warehouse, the archive data are processed using a combined prediction model; k-means clustering is then improved with a genetic algorithm, and data mining is performed to obtain the relationships among the archive data. The data warehouse is established using a dimension table method and comprises a state data layer, a historical data layer, a comprehensive data layer, a special data layer and control management. The mining method uses the combined prediction model to process the data and obtain the deep-level relationships in the archive data. The invention offers higher data processing efficiency, shortens business expansion time and improves the economic benefit of enterprises. The method is suitable for application as a business expansion archive mining and analysis method based on big data technology.

Description

Business expansion file mining and analyzing method based on big data technology

Technical Field

The invention relates to business expansion installation in the field of electric power, in particular to a business expansion installation archive mining analysis method based on a big data technology.

Background

Business expansion installation is the process of accepting a customer's electricity application and drawing up a safe, economical and reasonable power supply scheme according to the customer's electricity requirements and the condition of the power supply network. It further covers determining the investment in power supply projects, organizing their design and implementation, organizing, coordinating and checking the design and implementation of the customer's internal projects, signing power supply and utilization contracts, installing meters, and energizing the connection. It is the general name for the power supply department's whole service flow, from the customer's application for electricity to actual electricity use.

At present, a large amount of data is generated in the course of business expansion installation. Traditional analysis of business expansion files builds only a single data analysis model and does not determine the deep-level relationships among the data, so the file data are used ineffectively.

Disclosure of Invention

In order to solve the problem of data utilization of the business expansion archive, the invention provides a business expansion archive mining and analyzing method based on a big data technology. The technical problem of the utilization of business expansion file data is solved by establishing a business expansion file data warehouse and processing the business expansion file data by using a combined prediction model.

The technical scheme adopted by the invention for solving the technical problems is as follows:

a business expansion filing mining analysis method based on big data technology comprises the establishment of a business expansion filing data warehouse and a business expansion filing data mining processing method; processing the business expansion file data by using a combined prediction model on the basis of establishing a business expansion file data warehouse; after the k-means clustering is improved by using a genetic algorithm, data mining is performed to obtain the relationship between the archived data.

The business expansion file data warehouse is established by adopting a dimension table method and comprises a state data layer, a historical data layer, a comprehensive data layer, a special data layer and control management.

The business expansion filing data mining processing method adopts a combined prediction model to process data and obtain the deep level relation in the filing data.

The method has the advantages that the business expansion filing data are processed by utilizing the combined prediction model on the basis of establishing the business expansion filing data warehouse. After the k-means clustering is improved by using a genetic algorithm, data mining is performed to obtain the relationship between the archived data. The analysis method has high data processing efficiency, and can effectively shorten the business expansion time and improve the economic benefit of enterprises by applying the analysis method to actual power operation. The method is suitable for being applied as a business expansion archive mining and analyzing method based on big data technology.

Drawings

FIG. 1 is a business expansion archive data warehouse organizational chart.

Detailed Description

The technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are only a part of the embodiments of the present invention, and not all of the embodiments. Although embodiments of the present invention have been shown and described, it will be appreciated by those skilled in the art that changes, modifications, substitutions and alterations can be made in these embodiments without departing from the principles and spirit of the invention, the scope of which is defined in the appended claims and their equivalents. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.

As shown in the figure, the business expansion archive mining analysis method based on the big data technology comprises a business expansion archive data warehouse establishment method and a business expansion archive data mining processing method; processing the business expansion file data by using a combined prediction model on the basis of establishing a business expansion file data warehouse; after the k-means clustering is improved by using a genetic algorithm, data mining is performed to obtain the relationship between the archived data.

The business expansion file data warehouse is established by adopting a dimension table method and comprises a state data layer, a historical data layer, a comprehensive data layer, a special data layer and control management;

the state data layer stores the latest detailed data; when data enter the data warehouse from outside, they are placed directly in the state data layer. These data can be processed by a general-purpose database and are also referred to as the basic data of the system.

For the historical data layer, the information stored in the historical data layer and the comprehensive data layer depends on whether it reflects the general trend of the basic data or a trend that changes over time; the basic data are classified, decomposed, summarized and processed to obtain this information. Under a time control mechanism, the basic data generate historical data, which are placed in the historical data layer to be called by the state data layer, the comprehensive data layer and the special data layer.

The comprehensive data layer integrates and extracts basic data under a comprehensive mechanism to generate comprehensive data, and the comprehensive data is put into the comprehensive data layer and comprises various statistical data, indexes, evaluation calculation results and prediction analysis data.

The special data layer stores extended data of the industry, namely special data formed by processing basic data through a stored data analysis technology.

In control management, in addition to the four data layers, external data from the power business departments are needed for decision support, and together these data form the information sources of the data warehouse. An extractor converts data from an information source that affect the data warehouse into the data warehouse schema; when data in an information source change, an integrator filters and aggregates the new information and merges it with other information so that it is integrated into the data warehouse.

The existing business expansion files stored in the power system are converted according to the data warehouse schema, and the data in them are then processed.
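The layered organization described above can be illustrated with a minimal in-memory sketch. All class, method and field names (`ArchiveWarehouse`, `ingest`, `snapshot`, `summarize`, `customer_type`, `capacity_kva`) are hypothetical and chosen only for illustration; the patent does not specify a schema or a storage engine, and the special data layer and the extractor/integrator are left out for brevity.

```python
from collections import defaultdict
from dataclasses import dataclass, field
from statistics import mean


@dataclass
class ArchiveWarehouse:
    """Toy model of the state, historical and comprehensive data layers."""
    state: dict = field(default_factory=dict)          # latest detail per record id
    history: list = field(default_factory=list)        # time-stamped snapshots
    comprehensive: dict = field(default_factory=dict)  # aggregated indicators

    def ingest(self, record_id, record):
        # External data enter the warehouse through the state data layer.
        self.state[record_id] = dict(record)

    def snapshot(self, timestamp):
        # Time control mechanism: copy the current basic data into history.
        for record_id, record in self.state.items():
            self.history.append({"id": record_id, "ts": timestamp, **record})

    def summarize(self, key, value):
        # Integration mechanism: average `value` grouped by `key`.
        groups = defaultdict(list)
        for record in self.state.values():
            groups[record[key]].append(record[value])
        self.comprehensive = {k: mean(v) for k, v in groups.items()}
        return self.comprehensive


# Hypothetical archive records, for illustration only.
wh = ArchiveWarehouse()
wh.ingest("A001", {"customer_type": "industrial", "capacity_kva": 800})
wh.ingest("A002", {"customer_type": "residential", "capacity_kva": 10})
wh.ingest("A003", {"customer_type": "industrial", "capacity_kva": 1200})
wh.snapshot("2021-12-01")
summary = wh.summarize("customer_type", "capacity_kva")
```

Here `summary` holds the average requested capacity per customer type, which plays the role of a statistic in the comprehensive data layer, while `wh.history` keeps one time-stamped copy of each basic record.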

Business expansion installation is an electric power service plan based on the operation of the power grid and aimed at meeting user requirements. During long-term operation of the power system, data errors caused by sensor faults or parameter misdetection are inevitable. To keep such errors from affecting data mining, a combined prediction model is used to process the data. Combined prediction applies two or more different prediction methods to the same problem. It may combine several quantitative methods or several qualitative methods, but in practice a combination of qualitative and quantitative methods is more common. The main purpose of the combination is to make comprehensive use of the information provided by the various methods and to improve prediction accuracy as far as possible.
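As a minimal illustration of combining forecasts, the sketch below weights two forecasts of the same quantity by the inverse of their mean squared errors on past data. This inverse-error weighting is a simple stand-in chosen for illustration, not the neural network combiner of this patent; the function names and the sample numbers are hypothetical.

```python
def mse(actual, predicted):
    """Mean squared error between two equal-length sequences."""
    return sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual)


def combine_forecasts(actual, forecasts):
    """Return inverse-MSE weights (summing to 1) and the combined forecast.

    Assumes every individual forecast has a non-zero error on `actual`.
    """
    inverse_errors = [1.0 / mse(actual, f) for f in forecasts]
    total = sum(inverse_errors)
    weights = [e / total for e in inverse_errors]
    combined = [sum(w * f[t] for w, f in zip(weights, forecasts))
                for t in range(len(actual))]
    return weights, combined


# Hypothetical history: method A overshoots by 1, method B undershoots by 2.
actual = [10.0, 12.0, 11.0, 13.0]
method_a = [11.0, 13.0, 12.0, 14.0]
method_b = [8.0, 10.0, 9.0, 11.0]
weights, combined = combine_forecasts(actual, [method_a, method_b])
```

Because the two methods err in opposite directions, the weighted combination has a smaller error on this data than either method alone, which is the effect a combined prediction aims for.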

The neural network combiner uses a BP network structure with an input layer, a hidden layer and an output layer, with the Sigmoid function as the activation function. An improved BP algorithm with an adaptive learning rate is adopted, and the learning rate is adjusted according to the following formula:

（1）

In formula (1), η(k) is the learning rate at step k and E(k) is the network error at step k. When adjusting the learning rate, it is first checked whether the weight correction reduces the error function. If the error is reduced, the learning rate is too small and should be increased appropriately; otherwise, the learning rate is reduced. The learning rate thus changes with the error, so that the learning step size grows and then stabilizes, which accelerates the convergence of the neural network. The specific calculation steps are as follows:

(1) reading the structural information and the data information of the neural network;

(2) initializing the hidden layer weights V_p, the output layer weights W_p, the hidden layer threshold Ψ and the output layer threshold φ with random values in [-1, 1], and setting the initial values of the learning rate and the momentum factor;

(3) inputting sample data, and normalizing input and output data;

(4) selecting a sample and calculating the actual output of the network in the forward direction; the hidden layer output h_p and the output layer output y_i are calculated as follows:

(2)

in formula (2), f(x) is the sigmoid function; m is the number of hidden layer nodes, which is determined by an empirical formula:

(3)

in formula (3), p is the number of training samples and is the number of nodes of the input layer. The business expansion file data are processed in the hidden layer using a GARCH model; the GARCH(1,1) model is expressed as follows:

（4）

in formula (4), α₀, α₁ and β are constants, with α₀ > 1, α₁ ≥ 1 and β ≥ 0, and h_t is the conditional variance of Z_t;

(5) calculating the error of an output layer, and counting the mean square error of the sample;

(6) correcting the learning rate, and correcting the weights and thresholds by back-propagation. If the error precision requirement is met, the result is output; otherwise, the procedure returns to step (4);
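Steps (1) to (6) can be sketched with a small pure-Python network. This is a minimal sketch under stated assumptions, not the patent's implementation: formulas (1) to (3) are not reproduced in the text, so the growth and shrink factors (1.05 and 0.7), the accept-or-reject form of the learning rate rule, the hidden layer size, the full-batch update and the toy data are all assumptions, and the momentum factor and the GARCH processing are omitted.

```python
import math
import random


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def forward(params, x):
    """Forward pass: hidden outputs h and network output y for one sample."""
    V, psi, W, phi = params
    h = [sigmoid(sum(v * xi for v, xi in zip(row, x)) - b) for row, b in zip(V, psi)]
    y = sigmoid(sum(w * hj for w, hj in zip(W, h)) - phi)
    return h, y


def mean_squared_error(params, samples):
    return sum((forward(params, x)[1] - t) ** 2 for x, t in samples) / len(samples)


def gradient_step(params, samples, lr):
    """Return new parameters after one full-batch gradient-descent step."""
    V, psi, W, phi = params
    gV = [[0.0] * len(V[0]) for _ in V]
    gpsi, gW, gphi = [0.0] * len(psi), [0.0] * len(W), 0.0
    for x, t in samples:
        h, y = forward(params, x)
        delta_out = 2.0 * (y - t) * y * (1.0 - y) / len(samples)
        gphi -= delta_out                  # thresholds enter with a minus sign
        for j, hj in enumerate(h):
            gW[j] += delta_out * hj
            delta_hid = delta_out * W[j] * hj * (1.0 - hj)
            gpsi[j] -= delta_hid
            for i, xi in enumerate(x):
                gV[j][i] += delta_hid * xi
    new_V = [[v - lr * g for v, g in zip(row, grow)] for row, grow in zip(V, gV)]
    new_psi = [b - lr * g for b, g in zip(psi, gpsi)]
    new_W = [w - lr * g for w, g in zip(W, gW)]
    return new_V, new_psi, new_W, phi - lr * gphi


def train(samples, hidden=4, lr=0.5, grow=1.05, shrink=0.7,
          tol=1e-3, max_steps=2000, seed=0):
    rng = random.Random(seed)
    n = len(samples[0][0])
    # Step (2): weights and thresholds start as random values in [-1, 1].
    params = ([[rng.uniform(-1, 1) for _ in range(n)] for _ in range(hidden)],
              [rng.uniform(-1, 1) for _ in range(hidden)],
              [rng.uniform(-1, 1) for _ in range(hidden)],
              rng.uniform(-1, 1))
    errors = [mean_squared_error(params, samples)]
    for _ in range(max_steps):
        if errors[-1] <= tol:              # step (6): error precision met
            break
        trial = gradient_step(params, samples, lr)
        trial_error = mean_squared_error(trial, samples)
        if trial_error < errors[-1]:
            params, lr = trial, lr * grow  # error fell: accept, enlarge the rate
            errors.append(trial_error)
        else:
            lr *= shrink                   # error rose: reject, shrink the rate
    return params, errors, lr


# Toy data already normalized to [0, 1] (step (3)); values are illustrative.
samples = [([0.0, 0.0], 0.0), ([0.0, 1.0], 1.0), ([1.0, 0.0], 1.0), ([1.0, 1.0], 1.0)]
params, errors, final_lr = train(samples)
```

Rejecting a step whenever the error rises keeps the recorded error strictly decreasing, which makes the effect of the adaptive learning rate easy to inspect in `errors`.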

according to the process, the data are processed by adopting a combined prediction model, the processed business expansion data are combined into a data set, and the data are analyzed according to a clustering mining principle.
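For the GARCH(1,1) model named in step (4), the textbook recursion for the conditional variance is h_t = α₀ + α₁·Z_{t-1}² + β·h_{t-1}. The sketch below evaluates that textbook form only; formula (4) itself is not reproduced in the text, so its exact form in this patent is an assumption. The function name, the residual series, the starting value `h0` and the parameter values are hypothetical and chosen so that the arithmetic is easy to follow.

```python
def garch_conditional_variance(z, alpha0, alpha1, beta, h0):
    """Textbook GARCH(1,1): h[t] = alpha0 + alpha1 * z[t-1]**2 + beta * h[t-1]."""
    h = [h0]
    for t in range(1, len(z)):
        h.append(alpha0 + alpha1 * z[t - 1] ** 2 + beta * h[t - 1])
    return h


# Hypothetical residual series; parameter values are illustrative only.
residuals = [0.0, 1.0, -2.0, 0.5]
variances = garch_conditional_variance(
    residuals, alpha0=0.1, alpha1=0.2, beta=0.5, h0=1.0)
```

With these inputs the conditional variance is approximately 1.0, 0.6, 0.6 and 1.2: the large residual of -2.0 at the third position raises the variance estimate for the following step.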

Cluster mining and business expansion archive data analysis:

To address the shortcomings of the classic K-means algorithm, a genetic algorithm is used to improve it, and the improved DGK-means algorithm is used to mine and analyze the business expansion installation file data. The DGK-means algorithm is described in detail as follows:

(1) firstly, encoding business expansion archive data; for the application in this patent, the number of clusters is an integer and 8 bytes are used for encoding;

(2) setting the genetic parameters: population size q, crossover probability P_c, mutation probability P_q, the proportion G of new chromosomes in the new generation, and the maximum number of iterations T;

(3) randomly generating the initial population: an initial population of q individuals is generated using a random function;

(4) calculating the fitness of each individual in the population. The fitness is a parameter for evaluating the quality of an encoded individual, that is, the quality of the cluster number, and is used to judge whether the obtained cluster number matches the distribution characteristics of the data. In previous studies, this cluster number was used as the input to K-means clustering. The purpose of the genetic algorithm here is to find the optimal k value and the initial center j for the K-means clustering algorithm, which is only the first step of K-means; an exact initial center is therefore not needed, and good results are obtained as long as a point close to the center is found. Because the K-means algorithm then optimizes further from the initial center to find the final cluster center, using K-means for this clustering reduces efficiency. To address this deficiency, this patent uses improved density-based clustering. The clustering steps are as follows:

1) performing initial clustering according to the density initial center selection method, wherein the clustering number is q;

2) each core data represents a cluster, and the average value of the data in the cluster is calculated to obtain a new cluster center of the cluster; the calculation formula is as follows:

（5）

in formula (5), x_j is the core data of each cluster and |c_i| is the total number of data points in the cluster;

3) respectively calculating the fitness of the clustering results; the calculation formula is as follows:

（6）

（7）

（8）

in the above formulas, c_i is the center point of the i-th cluster; db_c is the inter-cluster distance of the entire data set; d_c is the intra-cluster distance of the data set; d(c_i, c_j) is the Euclidean distance between two points; k is the number of clusters; num_i is the number of data points in the i-th cluster; and p_i,j is the j-th point of the i-th cluster. The number of clusters is then reduced by merging the two clusters with the smallest inter-cluster distance; the center of the merged cluster is the average of all data objects of the two original clusters.

(5) For each iteration, the fitness of the data cluster results is calculated. Obtaining new individuals through selection, crossing and variation; repeating these steps until a maximum number of iterations is reached; and taking the k value of the maximum fitness in the new generation population as the optimal k value and the initial center, and outputting the result.
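The cluster center update, the fitness calculation and the merging of the two closest clusters can be sketched as follows. Formulas (5) to (8) are not reproduced in the text, so the fitness used here (mean distance between cluster centers divided by mean distance of points to their own center) is only one plausible reading of the definitions of db_c and d_c, and is an assumption. All function names and the sample points are hypothetical.

```python
import math
from itertools import combinations


def centroid(points):
    """Mean of a list of equal-length points (the cluster center update)."""
    return [sum(coord) / len(points) for coord in zip(*points)]


def fitness(clusters):
    """Assumed fitness: mean inter-center distance over mean intra-cluster distance.

    Requires at least two clusters and a non-zero intra-cluster distance.
    """
    centers = [centroid(c) for c in clusters]
    pairs = list(combinations(centers, 2))
    inter = sum(math.dist(a, b) for a, b in pairs) / len(pairs)
    intra = sum(sum(math.dist(p, c) for p in pts) / len(pts)
                for pts, c in zip(clusters, centers)) / len(clusters)
    return inter / intra


def merge_closest(clusters):
    """Merge the two clusters whose centers are closest; k drops by one."""
    centers = [centroid(c) for c in clusters]
    i, j = min(combinations(range(len(clusters)), 2),
               key=lambda ij: math.dist(centers[ij[0]], centers[ij[1]]))
    merged = clusters[i] + clusters[j]
    return [c for idx, c in enumerate(clusters) if idx not in (i, j)] + [merged]


# Hypothetical two-dimensional data: two nearby clusters and one distant cluster.
clusters = [
    [[0.0, 0.0], [0.0, 1.0]],
    [[0.5, 0.0], [0.5, 1.0]],
    [[10.0, 0.0], [10.0, 1.0]],
]
merged = merge_closest(clusters)
```

In this example the two nearby clusters are merged, and the assumed fitness of the two-cluster result is higher than that of the original three clusters, so k = 2 would be preferred over k = 3.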

According to the cluster mining results of the algorithm, the business expansion data are divided by user requirement, user group and the relevance among the data, and the relationship between different power service requirements and the business expansion data is analyzed, providing support for future improvement of the efficiency of related services.
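The genetic search over the cluster number k in steps (1) to (5) can be sketched as below. This is a minimal sketch under several assumptions: the chromosome is 8 bits long to keep the example small (the text specifies an 8-byte code), selection is a size-two tournament, the best individual is carried over unchanged instead of applying the ratio G, and `toy_score` is a hypothetical stand-in for the clustering fitness. The parameter names `q`, `p_c`, `p_q` and `T` follow the symbols of step (2).

```python
import random

BITS = 8  # chromosome length; an assumption made to keep the example small


def decode(chromosome, k_min=2):
    """Map a list of bits to a cluster count k >= k_min."""
    return k_min + int("".join(map(str, chromosome)), 2)


def tournament(rng, population, score):
    """Pick two individuals at random and return the fitter one."""
    a, b = rng.sample(population, 2)
    return a if score(decode(a)) >= score(decode(b)) else b


def evolve_k(score, q=20, p_c=0.8, p_q=0.05, T=30, seed=0):
    """Search for the k that maximizes score(k) by selection, crossover, mutation."""
    rng = random.Random(seed)
    population = [[rng.randint(0, 1) for _ in range(BITS)] for _ in range(q)]
    best = max(population, key=lambda c: score(decode(c)))
    history = [score(decode(best))]
    for _ in range(T):
        children = [best[:]]                  # keep the best individual so far
        while len(children) < q:
            mother = tournament(rng, population, score)
            father = tournament(rng, population, score)
            child = mother[:]
            if rng.random() < p_c:            # single-point crossover
                cut = rng.randint(1, BITS - 1)
                child = mother[:cut] + father[cut:]
            # Bit-flip mutation with probability p_q per bit.
            child = [1 - g if rng.random() < p_q else g for g in child]
            children.append(child)
        population = children
        best = max(population, key=lambda c: score(decode(c)))
        history.append(score(decode(best)))
    return decode(best), history


def toy_score(k):
    """Hypothetical fitness that peaks at k = 6; stands in for cluster fitness."""
    return -abs(k - 6)


best_k, history = evolve_k(toy_score)
```

Because the best individual is always carried into the next generation, the best fitness recorded in `history` never decreases from one iteration to the next. In practice `score` would evaluate the clustering obtained for each candidate k.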

As described above, this patent uses big data technology to mine and analyze the business expansion installation files of power enterprises. The mining analysis method obtains the deep-level relationships in the archive data and provides data and technical support for future improvement of the service level of power enterprises.

Finally, it should be noted that: although the present invention has been described in detail with reference to the foregoing embodiments, it will be apparent to those skilled in the art that modifications may be made to the embodiments or portions thereof without departing from the spirit and scope of the invention.

Claims

1. A business expansion filing mining analysis method based on big data technology comprises the establishment of a business expansion filing data warehouse and a business expansion filing data mining processing method; the method is characterized in that: processing the business expansion file data by using a combined prediction model on the basis of establishing a business expansion file data warehouse; after the k-means clustering is improved by using a genetic algorithm, data mining is performed to obtain the relationship between the archived data.

2. The business expansion archive mining analysis method based on big data technology as claimed in claim 1, wherein the method comprises the following steps:

the state data layer stores the latest detailed data, and when the data enters the data warehouse from the outside, the data is directly put into the state data layer; it can be processed by a general database, such data also being referred to as basic data of the system;

for the historical data layer, the information stored in the historical data layer and the comprehensive data layer depends on whether it reflects the general trend of the basic data or a trend that changes over time; the basic data are classified, decomposed, summarized and processed to obtain this information; under a time control mechanism, the basic data generate historical data, which are placed in the historical data layer to be called by the state data layer, the comprehensive data layer and the special data layer;

the comprehensive data layer integrates and extracts basic data under a comprehensive mechanism to generate comprehensive data, and the comprehensive data is put into the comprehensive data layer and comprises various statistical data, indexes, evaluation calculation results and prediction analysis data;

the special data layer is used for storing extended data of the industry, namely special data formed by processing basic data through a stored data analysis technology;

3. The business expansion archive mining analysis method based on big data technology as claimed in claim 1, wherein the method comprises the following steps:

the business expansion filing data mining processing method adopts a combined prediction model to process data, and obtains a deep level relation in filing data;

（1）

in formula (1), η(k) is the learning rate at step k; E(k) is the network error at step k,

when adjusting the learning rate, first checking whether the correction of the weight can reduce the error function; if the error is reduced, the learning rate is too small and should be properly increased; otherwise, the learning speed is reduced, the learning speed is changed along with the error, the learning step length is increased and tends to be stable, and the convergence speed of the neural network is accelerated;

the specific calculation steps are as follows:

(3) inputting sample data, and normalizing input and output data;

(4) selecting a sample and calculating the actual output of the network in the forward direction; the hidden layer output h_p and the output layer output y_i are calculated as follows:

(2)

(3)

in formula (3), p is the number of training samples, the number of nodes of the input layer,

the business expansion file data are processed in the hidden layer using a GARCH model; the GARCH(1,1) model is expressed as follows:

（4）

in formula (4), α₀, α₁ and β are constants, with α₀ > 1, α₁ ≥ 1 and β ≥ 0, and h_t is the conditional variance of Z_t;

(6) correcting the learning rate, and correcting the weights and thresholds by back-propagation;

if the error precision is met, outputting a result; otherwise, turning to the step (4);

according to the process, the data are processed by adopting a combined prediction model, the processed business expansion filing data form a data set, and the business expansion filing data are analyzed according to a clustering mining principle.

4. The business expansion archive mining analysis method based on big data technology as claimed in claim 3, wherein the method comprises the following steps:

in the business expansion archive data analysis, a genetic algorithm is used to improve the classic K-means algorithm in view of its shortcomings, and the improved DGK-means algorithm is used to mine and analyze the business expansion installation file data;

the DGK-means algorithm is described in detail as follows:

(4) calculating the fitness of each individual in the population; the fitness is a parameter for evaluating the advantages and disadvantages of the coding individuals, namely the advantages and disadvantages of the cluster number, and is used for judging whether the obtained cluster number accords with the distribution characteristics of the data; the clustering steps are as follows:

（5）

（6）

（7）

（8）

in the above formulas, c_i is the center point of the i-th cluster; db_c is the inter-cluster distance of the entire data set; d_c is the intra-cluster distance of the data set; d(c_i, c_j) is the Euclidean distance between two points; k is the number of clusters; num_i is the number of data points in the i-th cluster; and p_i,j is the j-th point of the i-th cluster; the number of clusters is then reduced by merging the two clusters with the smallest inter-cluster distance; the center of the merged cluster is the average of all data objects of the two original clusters;

(5) for each iteration, calculating the fitness of the data cluster result;

obtaining new individuals through selection, crossing and variation; repeating these steps until a maximum number of iterations is reached; taking the k value of the maximum fitness in the new generation group as the optimal k value and an initial center, and outputting a result;