CN114064794A - Business expansion file mining and analyzing method based on big data technology - Google Patents

Info

Publication number
CN114064794A
Authority
CN
China
Prior art keywords
data
layer
business expansion
cluster
mining
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202111453678.1A
Other languages
Chinese (zh)
Inventor
高宁
曾玲
梁海洪
沈晓舟
张博
左越
张蓓
董阳
付临
周鑫
刘蕤
付海东
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
State Grid Corp of China SGCC
State Grid Liaoning Electric Power Co Ltd
Original Assignee
State Grid Corp of China SGCC
State Grid Liaoning Electric Power Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by State Grid Corp of China SGCC, State Grid Liaoning Electric Power Co Ltd
Priority to CN202111453678.1A
Publication of CN114064794A
Legal status: Pending

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/26 Visual data mining; Browsing structured data
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/28 Databases characterised by their database models, e.g. relational or object models
    • G06F16/283 Multi-dimensional databases or data warehouses, e.g. MOLAP or ROLAP
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/23 Clustering techniques
    • G06F18/232 Non-hierarchical techniques
    • G06F18/2321 Non-hierarchical techniques using statistics or function optimisation, e.g. modelling of probability density functions
    • G06F18/23213 Non-hierarchical techniques using statistics or function optimisation, e.g. modelling of probability density functions with fixed number of clusters, e.g. K-means clustering
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/045 Combinations of networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/048 Activation functions
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/08 Learning methods
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/12 Computing arrangements based on biological models using genetic models
    • G06N3/126 Evolutionary algorithms, e.g. genetic algorithms or genetic programming
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q10/00 Administration; Management
    • G06Q10/06 Resources, workflows, human or project management; Enterprise or organisation planning; Enterprise or organisation modelling
    • G06Q10/063 Operations research, analysis or management
    • G06Q10/0631 Resource planning, allocation, distributing or scheduling for enterprises or organisations
    • G06Q10/06315 Needs-based resource requirements planning or analysis
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q50/00 Information and communication technology [ICT] specially adapted for implementation of business processes of specific business sectors, e.g. utilities or tourism
    • G06Q50/06 Energy or water supply

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • Business, Economics & Management (AREA)
  • Data Mining & Analysis (AREA)
  • General Physics & Mathematics (AREA)
  • Health & Medical Sciences (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • General Engineering & Computer Science (AREA)
  • Biophysics (AREA)
  • Human Resources & Organizations (AREA)
  • Artificial Intelligence (AREA)
  • Databases & Information Systems (AREA)
  • General Health & Medical Sciences (AREA)
  • Evolutionary Computation (AREA)
  • Software Systems (AREA)
  • Biomedical Technology (AREA)
  • Computational Linguistics (AREA)
  • Economics (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • Mathematical Physics (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Evolutionary Biology (AREA)
  • Strategic Management (AREA)
  • General Business, Economics & Management (AREA)
  • Tourism & Hospitality (AREA)
  • Entrepreneurship & Innovation (AREA)
  • Marketing (AREA)
  • Probability & Statistics with Applications (AREA)
  • Water Supply & Treatment (AREA)
  • Primary Health Care (AREA)
  • Public Health (AREA)
  • Genetics & Genomics (AREA)
  • Physiology (AREA)
  • Development Economics (AREA)
  • Educational Administration (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Game Theory and Decision Science (AREA)

Abstract

The invention provides a business expansion archive mining and analysis method based on big data technology. The method comprises establishing a business expansion archive data warehouse and mining the business expansion archive data. On the basis of the data warehouse, the archive data is processed with a combined prediction model; k-means clustering improved by a genetic algorithm is then used for data mining to obtain the relationships among the archive data. The data warehouse is established using the dimension table method and comprises a state data layer, a historical data layer, a comprehensive data layer, a special data layer and control management. The data mining processing method uses the combined prediction model to process the data and obtain the deep-level relationships in the archive data. The invention has high data processing efficiency, shortens business expansion time and improves the economic benefit of enterprises. The method is suitable for use as a business expansion archive mining and analysis method based on big data technology.

Description

Business expansion file mining and analyzing method based on big data technology
Technical Field
The invention relates to business expansion installation in the field of electric power, in particular to a business expansion installation archive mining analysis method based on a big data technology.
Background
Business expansion installation refers to accepting a customer's electricity application and formulating a safe, economical and reasonable power supply scheme according to the customer's electricity demand and the condition of the power supply network. It also covers determining the investment in power supply projects; organizing the design and implementation of those projects; organizing, coordinating and checking the design and implementation of the customer's internal projects; signing power supply and use contracts; installing meters; and connecting power. It is the general name for the power supply department's service flow over the whole process from the customer's application to actual power use.
At present, a large amount of data is generated in the business expansion installation process. Traditional analysis of business expansion archives builds only a single data analysis model and does not determine the deep-level relationships among the data, so the archive data is used ineffectively.
Disclosure of Invention
In order to solve the problem of data utilization of the business expansion archive, the invention provides a business expansion archive mining and analyzing method based on a big data technology. The technical problem of the utilization of business expansion file data is solved by establishing a business expansion file data warehouse and processing the business expansion file data by using a combined prediction model.
The technical scheme adopted by the invention for solving the technical problems is as follows:
a business expansion filing mining analysis method based on big data technology comprises the establishment of a business expansion filing data warehouse and a business expansion filing data mining processing method; processing the business expansion file data by using a combined prediction model on the basis of establishing a business expansion file data warehouse; after the k-means clustering is improved by using a genetic algorithm, data mining is performed to obtain the relationship between the archived data.
The business expansion file data warehouse is established by adopting a dimension table method and comprises a state data layer, a historical data layer, a comprehensive data layer, a special data layer and control management.
The business expansion filing data mining processing method adopts a combined prediction model to process data and obtain the deep level relation in the filing data.
The method has the advantages that the business expansion filing data are processed by utilizing the combined prediction model on the basis of establishing the business expansion filing data warehouse. After the k-means clustering is improved by using a genetic algorithm, data mining is performed to obtain the relationship between the archived data. The analysis method has high data processing efficiency, and can effectively shorten the business expansion time and improve the economic benefit of enterprises by applying the analysis method to actual power operation. The method is suitable for being applied as a business expansion archive mining and analyzing method based on big data technology.
Drawings
FIG. 1 is a business expansion archive data warehouse organizational chart.
Detailed Description
The technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are only a part of the embodiments of the present invention, and not all of the embodiments. Although embodiments of the present invention have been shown and described, it will be appreciated by those skilled in the art that changes, modifications, substitutions and alterations can be made in these embodiments without departing from the principles and spirit of the invention, the scope of which is defined in the appended claims and their equivalents. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
As shown in the figure, the business expansion archive mining analysis method based on the big data technology comprises a business expansion archive data warehouse establishment method and a business expansion archive data mining processing method; processing the business expansion file data by using a combined prediction model on the basis of establishing a business expansion file data warehouse; after the k-means clustering is improved by using a genetic algorithm, data mining is performed to obtain the relationship between the archived data.
The business expansion file data warehouse is established by adopting a dimension table method and comprises a state data layer, a historical data layer, a comprehensive data layer, a special data layer and control management;
the state data layer stores the latest detailed data; when data enters the data warehouse from outside, it is placed directly in the state data layer. It can be processed by a general database, and such data is also referred to as the basic data of the system.
The historical data layer and the comprehensive data layer store information that reflects either the general trend of the basic data or its change over time; this information is obtained by classifying, decomposing, summarizing and processing the basic data. Under a time control mechanism, the basic data generates historical data, which is placed in the historical data layer for the state data layer, the comprehensive data layer and the special data layer to call.
The comprehensive data layer integrates and extracts basic data under a comprehensive mechanism to generate comprehensive data, and the comprehensive data is put into the comprehensive data layer and comprises various statistical data, indexes, evaluation calculation results and prediction analysis data.
The special data layer stores extended data of the industry, namely special data formed by processing basic data through a stored data analysis technology.
In control management, in addition to the four data layers, external data from the power business department is needed for decision support; together these data form the information source of the data warehouse. An extractor converts data from the information source that affects the data warehouse into the data warehouse schema. When data in the information source changes, an integrator filters and aggregates the information and merges it with other information, integrating the new information into the data warehouse.
The existing business expansion archives stored in the power system are converted according to the data warehouse schema, and the data in them is then processed.
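The extractor and integrator roles described above can be illustrated with a minimal sketch. Everything named here is hypothetical: the `Warehouse` class, the record fields (`id`, `capacity`, `date`) and the two summary statistics are assumptions chosen for illustration, not structures defined by the patent.

```python
from dataclasses import dataclass, field
from datetime import date


@dataclass
class Warehouse:
    """Hypothetical in-memory stand-in for the four data layers."""
    state: dict = field(default_factory=dict)          # latest detailed (basic) data
    history: list = field(default_factory=list)        # superseded records
    comprehensive: dict = field(default_factory=dict)  # statistics and indexes
    special: dict = field(default_factory=dict)        # analysis-derived data (unused here)


def extract(raw: dict) -> dict:
    """Extractor: convert a source record into the assumed warehouse schema."""
    return {
        "customer_id": str(raw["id"]).strip(),
        "capacity_kva": float(raw["capacity"]),
        "applied_on": date.fromisoformat(raw["date"]),
    }


def integrate(wh: Warehouse, record: dict) -> None:
    """Integrator: merge a new or changed record into the warehouse."""
    key = record["customer_id"]
    previous = wh.state.get(key)
    if previous is not None:
        wh.history.append(previous)   # the old state becomes historical data
    wh.state[key] = record            # new data lands in the state layer
    capacities = [r["capacity_kva"] for r in wh.state.values()]
    wh.comprehensive["total_capacity_kva"] = sum(capacities)
    wh.comprehensive["customer_count"] = len(capacities)


wh = Warehouse()
integrate(wh, extract({"id": " A1 ", "capacity": "315", "date": "2021-11-02"}))
integrate(wh, extract({"id": "A1", "capacity": "400", "date": "2021-12-01"}))
integrate(wh, extract({"id": "B7", "capacity": "100", "date": "2021-12-01"}))
print(wh.comprehensive, len(wh.history))
```

In this sketch the second record for customer `A1` replaces the first in the state layer and pushes the old version into the historical layer, while the comprehensive layer is recomputed from the current state.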
Business expansion installation is a power service plan based on the operation of the power grid and aimed at meeting user requirements. Data errors caused by sensor faults or parameter misdetection are inevitable during long-term operation of the power system. To avoid their influence on data mining, a combined prediction model is used to process the data. Combined prediction applies two or more different prediction methods to the same problem. It can combine several quantitative methods or several qualitative methods, but in practice qualitative and quantitative methods are more often combined. The main purpose of the combination is to make comprehensive use of the information provided by the various methods and improve prediction accuracy as much as possible.
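The idea of combining several forecasts of the same quantity can be sketched as follows. This is only an illustration under stated assumptions: the patent uses a neural network combiner, whereas the sketch below swaps in a much simpler inverse-error weighting, and the function name `combine_forecasts` and the toy series are hypothetical.

```python
def combine_forecasts(forecasts, actual):
    """Weight each method by the inverse of its mean squared error on past data.

    Assumes every method has a non-zero error on the reference data.
    """
    n = len(actual)
    mses = [sum((f - a) ** 2 for f, a in zip(series, actual)) / n for series in forecasts]
    inverse = [1.0 / m for m in mses]
    total = sum(inverse)
    weights = [v / total for v in inverse]
    combined = [
        sum(w * series[t] for w, series in zip(weights, forecasts)) for t in range(n)
    ]
    return weights, combined


actual = [10.0, 12.0, 11.0, 13.0]
method_a = [11.0, 13.0, 12.0, 14.0]   # toy forecast, biased high by 1
method_b = [8.0, 10.0, 9.0, 11.0]     # toy forecast, biased low by 2
weights, combined = combine_forecasts([method_a, method_b], actual)
print(weights, combined)
```

The more accurate method receives the larger weight, and because the two toy methods err in opposite directions the combined series lies closer to the actual values than either input.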
The business expansion filing data mining processing method adopts a combined prediction model to process data and obtain the deep level relation in the filing data.
The neural network combiner uses a BP network structure with an input layer, a hidden layer and an output layer, with the Sigmoid function as the activation function. An improved BP algorithm with an adaptive learning rate is adopted; the learning rate adjustment formula is as follows:
[formula image not reproduced] (1)
In formula (1), η(k) is the learning rate at step k, and E(k) is the network error at step k. When adjusting the learning rate, first check whether the weight correction reduces the error function. If the error is reduced, the learning rate is too small and should be increased appropriately; otherwise, the learning rate is reduced. The learning rate thus changes with the error, the learning step length grows and then stabilizes, and the convergence of the neural network is accelerated. The specific calculation steps are as follows:
(1) reading the structural information and the data information of the neural network;
(2) initializing the hidden layer weights $V_p$, the output layer weights $W_p$, the hidden layer threshold Ψ and the output layer threshold φ of the neural network with random values in [-1, 1], and setting the initial values of the learning rate and the momentum factor;
(3) inputting sample data, and normalizing input and output data;
(4) selecting a sample and calculating the actual output of the network in the forward direction; the hidden layer output $h_p$ and the output layer output $y_i$ are calculated as follows:
[formula image not reproduced] (2)
in formula (2), f(x) is the sigmoid function; m is the number of hidden layer nodes, which is determined by an empirical formula:
[formula image not reproduced] (3)
in formula (3), p is the number of training samples, and [symbol missing in source] is the number of nodes of the input layer. The business expansion archive data is processed in the hidden layer using a GARCH model; the GARCH(1, 1) model is expressed as follows:
$h_t = \alpha_0 + \alpha_1 Z_{t-1}^2 + \beta h_{t-1}$ (4)
in formula (4), $\alpha_0$, $\alpha_1$ and $\beta$ are constants with $\alpha_0 > 1$, $\alpha_1 \geq 1$, $\beta \geq 0$; $h_t$ is the conditional variance of $Z_t$;
(5) calculating the error of an output layer, and counting the mean square error of the sample;
(6) correcting the learning rate, and correcting the weights and thresholds by back-propagation. If the error precision is met, outputting the result; otherwise, returning to step (4);
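The training steps above can be sketched with a small one-hidden-layer network. This is an illustration under explicit assumptions, not the patent's implementation: the increase and decrease factors (`up=1.05`, `down=0.7`) are commonly used defaults rather than values from formula (1), a step that raises the error is discarded (an assumption), the momentum factor is omitted, and the toy data set is logical AND.

```python
import math
import random


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def forward(x, net):
    """Forward pass of a one-hidden-layer network with a single output."""
    V, psi, W, phi = net
    h = [sigmoid(sum(v * xi for v, xi in zip(row, x)) - t) for row, t in zip(V, psi)]
    y = sigmoid(sum(w * hj for w, hj in zip(W, h)) - phi)
    return h, y


def mean_squared_error(samples, net):
    return sum((forward(x, net)[1] - d) ** 2 for x, d in samples) / len(samples)


def gradient_step(samples, net, eta):
    """Return new parameters after one batch back-propagation step."""
    V, psi, W, phi = net
    V2, psi2, W2, phi2 = [row[:] for row in V], psi[:], W[:], phi
    for x, d in samples:
        h, y = forward(x, net)
        delta_out = (y - d) * y * (1.0 - y)
        phi2 += eta * delta_out                  # thresholds enter with a minus sign
        for j, hj in enumerate(h):
            delta_hid = delta_out * W[j] * hj * (1.0 - hj)
            W2[j] -= eta * delta_out * hj
            psi2[j] += eta * delta_hid
            for i, xi in enumerate(x):
                V2[j][i] -= eta * delta_hid * xi
    return V2, psi2, W2, phi2


def train(samples, m=3, eta=0.5, up=1.05, down=0.7, epochs=300, target=1e-3, seed=0):
    rng = random.Random(seed)
    n = len(samples[0][0])
    net = (
        [[rng.uniform(-1, 1) for _ in range(n)] for _ in range(m)],  # hidden weights
        [rng.uniform(-1, 1) for _ in range(m)],                      # hidden thresholds
        [rng.uniform(-1, 1) for _ in range(m)],                      # output weights
        rng.uniform(-1, 1),                                          # output threshold
    )
    errors = [mean_squared_error(samples, net)]
    for _ in range(epochs):
        trial = gradient_step(samples, net, eta)
        trial_error = mean_squared_error(samples, trial)
        if trial_error < errors[-1]:
            net, eta = trial, eta * up   # error fell: keep the step, raise the rate
            errors.append(trial_error)
        else:
            eta *= down                  # error rose: discard the step, lower the rate
        if errors[-1] <= target:
            break
    return net, errors, eta


# Toy data: logical AND, already normalized to [0, 1].
samples = [((0.0, 0.0), 0.0), ((0.0, 1.0), 0.0), ((1.0, 0.0), 0.0), ((1.0, 1.0), 1.0)]
net, errors, final_eta = train(samples)
print(f"MSE {errors[0]:.4f} -> {errors[-1]:.4f}, final learning rate {final_eta:.3f}")
```

Because rejected steps are discarded, the recorded error sequence never increases; the learning rate grows while steps keep helping and shrinks as soon as one overshoots.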
Following this process, the data is processed with the combined prediction model, the processed business expansion data is assembled into a data set, and the data is analyzed according to the cluster mining principle.
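For readers unfamiliar with the GARCH(1, 1) model mentioned in step (4), the standard conditional-variance recursion $h_t = \alpha_0 + \alpha_1 Z_{t-1}^2 + \beta h_{t-1}$ can be sketched as below. The function name `garch_variance`, the starting variance `h0` and the sample series are assumptions for illustration; the parameter values are placeholders picked inside the ranges stated in the text, not fitted values.

```python
def garch_variance(z, alpha0, alpha1, beta, h0):
    """Conditional variances h_t of a series z under a GARCH(1, 1) recursion.

    h0 is an assumed starting variance for t = 0.
    """
    h = [h0]
    for t in range(1, len(z)):
        h.append(alpha0 + alpha1 * z[t - 1] ** 2 + beta * h[t - 1])
    return h


z = [0.5, -1.0, 2.0, 0.0]   # illustrative residual series
h = garch_variance(z, alpha0=1.5, alpha1=1.0, beta=0.5, h0=1.0)
print(h)
```

Each variance depends on the previous squared observation and the previous variance, so a large deviation such as `z[2] = 2.0` raises the variance estimated for the following step.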
Cluster mining and business expansion archive data analysis:
To address the shortcomings of the classic K-means algorithm, a genetic algorithm is used to improve it, and the improved DGK-means algorithm is used to mine and analyze the archive data of power business expansion installation applications. The DGK-means algorithm is described in detail as follows:
(1) firstly, encoding business expansion archive data; for the application in this patent, the number of clusters is an integer and 8 bytes are used for encoding;
(2) setting genetic parameters: the population size q, the crossover probability $P_c$, the mutation probability $P_q$, the ratio G of new chromosomes in the new generation, and the maximum number of iterations T;
(3) randomly generating an initial population of size q using a random function;
(4) calculating the fitness of each individual in the population. Fitness is a parameter for evaluating the quality of an encoded individual, that is, the quality of the cluster number; it is used to judge whether the obtained cluster number matches the distribution characteristics of the data. In previous studies, this cluster number was used as the input to K-means clustering. Because the genetic algorithm is used to find the optimal k value and the initial center j of the k-means algorithm, which is only the first step of k-means, exact initial centers are not needed: finding positions close to the centers is enough to obtain good results. The K-means algorithm then refines the initial centers to find the final cluster centers, so clustering with the K-means algorithm at this stage affects efficiency. To address this deficiency, this patent implements improved density-based clustering. The clustering steps are as follows:
1) performing initial clustering according to the density initial center selection method, wherein the clustering number is q;
2) each core data represents a cluster, and the average value of the data in the cluster is calculated to obtain a new cluster center of the cluster; the calculation formula is as follows:
$c_i = \frac{1}{|c_i|} \sum_{x_j \in c_i} x_j$ (5)
in formula (5), $x_j$ is the core data of each cluster; $|c_i|$ is the total amount of data in the cluster;
3) respectively calculating the fitness of the clustering results; the calculation formula is as follows:
[formula image not reproduced] (6)
[formula image not reproduced] (7)
[formula image not reproduced] (8)
in the above formulas, $c_i$ is the center point of cluster i; $d_{bc}$ represents the inter-cluster distance of the entire data set; $d_c$ is the intra-cluster distance of the data set; $d(c_i, c_j)$ is the Euclidean distance between two points; k is the number of clusters; $num_i$ is the number of data points contained in cluster i; $p_{i,j}$ is point j of cluster i. The number of clusters is then reduced by merging the two clusters with the smallest inter-cluster distance; the center of the newly merged cluster is the average of all the data objects of the original two clusters.
(5) For each iteration, the fitness of the clustering results is calculated. New individuals are obtained through selection, crossover and mutation; these steps are repeated until the maximum number of iterations is reached. The k value with the maximum fitness in the new generation population is taken as the optimal k value and initial center, and the result is output.
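The genetic search over the cluster number in steps (1) to (5) can be sketched as follows. Several parts are assumptions rather than the patent's method: plain k-means with random initial centers stands in for the density-based initial clustering, the fitness is taken as mean inter-center distance divided by mean intra-cluster distance (formulas (6) to (8) are not available in this text), individuals are 8-bit strings rather than the 8-byte encoding stated above, and the closest-cluster merging step is not shown. All function names and the toy data are hypothetical.

```python
import math
import random


def assign(points, centers):
    """Assign every point to its nearest center."""
    clusters = [[] for _ in centers]
    for p in points:
        nearest = min(range(len(centers)), key=lambda i: math.dist(p, centers[i]))
        clusters[nearest].append(p)
    return clusters


def kmeans(points, k, rng, iters=10):
    centers = rng.sample(points, k)
    for _ in range(iters):
        clusters = assign(points, centers)
        centers = [
            tuple(sum(axis) / len(cl) for axis in zip(*cl)) if cl else centers[i]
            for i, cl in enumerate(clusters)
        ]
    return centers, assign(points, centers)


def fitness(centers, clusters):
    """Assumed fitness: mean inter-center distance over mean intra-cluster distance."""
    live = [(c, cl) for c, cl in zip(centers, clusters) if cl]
    if len(live) < 2:
        return 0.0
    pairs = len(live) * (len(live) - 1) / 2
    inter = sum(
        math.dist(a, b) for i, (a, _) in enumerate(live) for b, _ in live[i + 1:]
    ) / pairs
    intra = sum(sum(math.dist(p, c) for p in cl) / len(cl) for c, cl in live) / len(live)
    return inter / intra if intra > 0 else 0.0


def decode(bits, k_min, k_max):
    """Map a bit string to a cluster number in [k_min, k_max]."""
    return k_min + int(bits, 2) % (k_max - k_min + 1)


def search_k(points, k_min=2, k_max=9, q=6, pc=0.8, pm=0.1, T=10, n_bits=8, seed=0):
    rng = random.Random(seed)
    pop = ["".join(rng.choice("01") for _ in range(n_bits)) for _ in range(q)]
    best_fit, best_bits = -1.0, pop[0]
    for _ in range(T):
        scored = [(fitness(*kmeans(points, decode(b, k_min, k_max), rng)), b) for b in pop]
        best_fit, best_bits = max((best_fit, best_bits), max(scored))
        # Tournament selection, single-point crossover, bit-flip mutation.
        parents = [max(rng.sample(scored, 2))[1] for _ in range(q)]
        children = []
        for a, b in zip(parents[::2], parents[1::2]):
            if rng.random() < pc:
                cut = rng.randrange(1, n_bits)
                a, b = a[:cut] + b[cut:], b[:cut] + a[cut:]
            children += [a, b]
        pop = [
            "".join("10"[int(bit)] if rng.random() < pm else bit for bit in child)
            for child in children
        ]
    return decode(best_bits, k_min, k_max), best_fit


# Toy data: three small grids of points around (0, 0), (5, 5) and (10, 0).
offsets = (-0.3, 0.0, 0.3)
points = [
    (cx + dx, cy + dy)
    for cx, cy in ((0.0, 0.0), (5.0, 5.0), (10.0, 0.0))
    for dx in offsets
    for dy in offsets
]
best_k, best_fit = search_k(points)
print(best_k, round(best_fit, 3))
```

The population size `q` must be even in this sketch so that parents pair up for crossover. The fitness here rewards well-separated, compact clusters; a different fitness definition would change which k the search prefers.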
According to the clustering mining result of the algorithm, business expansion data are divided according to different user requirements, user groups and the relevance between the data, the relation between different power service requirements and the business expansion data is analyzed, and support is provided for improvement of related service efficiency in the future.
As described above, this patent uses big data technology to mine and analyze the business expansion installation archives of a power enterprise. The mining analysis method obtains the deep-level relationships in the archive data and provides data and technical support for future improvement of the power enterprise's related service level.
Finally, it should be noted that: although the present invention has been described in detail with reference to the foregoing embodiments, it will be apparent to those skilled in the art that modifications may be made to the embodiments or portions thereof without departing from the spirit and scope of the invention.

Claims (4)

1. A business expansion filing mining analysis method based on big data technology comprises the establishment of a business expansion filing data warehouse and a business expansion filing data mining processing method; the method is characterized in that: processing the business expansion file data by using a combined prediction model on the basis of establishing a business expansion file data warehouse; after the k-means clustering is improved by using a genetic algorithm, data mining is performed to obtain the relationship between the archived data.
2. The business expansion archive mining analysis method based on big data technology as claimed in claim 1, wherein the method comprises the following steps:
the business expansion file data warehouse is established by adopting a dimension table method and comprises a state data layer, a historical data layer, a comprehensive data layer, a special data layer and control management;
the state data layer stores the latest detailed data, and when the data enters the data warehouse from the outside, the data is directly put into the state data layer; it can be processed by a general database, such data also being referred to as basic data of the system;
the historical data layer decides whether the information stored in the historical data layer and the comprehensive data layer is the general trend reflected by the basic data or the trend changing along with time, and classifies, decomposes, summarizes and processes the basic data to acquire the information; generating historical data by the basic data under a time control mechanism, and putting the historical data into a historical data layer for a current data layer, a comprehensive data layer and a special data layer to call;
the comprehensive data layer integrates and extracts basic data under a comprehensive mechanism to generate comprehensive data, and the comprehensive data is put into the comprehensive data layer and comprises various statistical data, indexes, evaluation calculation results and prediction analysis data;
the special data layer is used for storing extended data of the industry, namely special data formed by processing basic data through a stored data analysis technology;
in the control management, in addition to the four data layers, external data of the power business department are needed to be used for decision support, and the data jointly form an information source of a data warehouse; in control management, data information from an information source and influencing a data warehouse is converted into a data warehouse mode by establishing an extractor; when data in the information source changes, the integrator filters, aggregates and merges the information with other information to integrate the new information into the data warehouse.
3. The business expansion archive mining analysis method based on big data technology as claimed in claim 1, wherein the method comprises the following steps:
the business expansion filing data mining processing method adopts a combined prediction model to process data, and obtains a deep level relation in filing data;
the neural network combiner uses a BP network structure with an input layer, a hidden layer and an output layer, with the Sigmoid function as the activation function; an improved BP algorithm with an adaptive learning rate is adopted, and the learning rate adjustment formula is as follows:
[formula image not reproduced] (1)
in formula (1), η(k) is the learning rate at step k; E(k) is the network error at step k;
when adjusting the learning rate, first checking whether the weight correction reduces the error function; if the error is reduced, the learning rate is too small and should be increased appropriately; otherwise, the learning rate is reduced, so that the learning rate changes with the error, the learning step length grows and then stabilizes, and the convergence of the neural network is accelerated;
the specific calculation steps are as follows:
(1) reading the structural information and the data information of the neural network;
(2) initializing the hidden layer weights $V_p$, the output layer weights $W_p$, the hidden layer threshold Ψ and the output layer threshold φ of the neural network with random values in [-1, 1], and setting the initial values of the learning rate and the momentum factor;
(3) inputting sample data, and normalizing input and output data;
(4) selecting a sample and calculating the actual output of the network in the forward direction; the hidden layer output $h_p$ and the output layer output $y_i$ are calculated as follows:
[formula image not reproduced] (2)
in formula (2), f(x) is the sigmoid function; m is the number of hidden layer nodes, which is determined by an empirical formula:
[formula image not reproduced] (3)
in formula (3), p is the number of training samples, and [symbol missing in source] is the number of nodes of the input layer;
processing the business expansion archive data in the hidden layer using a GARCH model; the GARCH(1, 1) model is expressed as follows:
$h_t = \alpha_0 + \alpha_1 Z_{t-1}^2 + \beta h_{t-1}$ (4)
in formula (4), $\alpha_0$, $\alpha_1$ and $\beta$ are constants with $\alpha_0 > 1$, $\alpha_1 \geq 1$, $\beta \geq 0$; $h_t$ is the conditional variance of $Z_t$;
(5) calculating the error of an output layer, and counting the mean square error of the sample;
(6) correcting the learning rate, and correcting the weights and thresholds by back-propagation;
if the error precision is met, outputting a result; otherwise, turning to the step (4);
according to the process, the data are processed by adopting a combined prediction model, the processed business expansion filing data form a data set, and the business expansion filing data are analyzed according to a clustering mining principle.
4. The business expansion archive mining analysis method based on big data technology as claimed in claim 3, wherein the method comprises the following steps:
in the business expansion archive data analysis, to address the shortcomings of the classic K-means algorithm, a genetic algorithm is used to improve it, and the improved DGK-means algorithm is used to mine and analyze the archive data of power business expansion installation applications;
the DGK-means algorithm is described in detail as follows:
(1) firstly, encoding business expansion archive data; for the application in this patent, the number of clusters is an integer and 8 bytes are used for encoding;
(2) setting genetic parameters: the population size q, the crossover probability $P_c$, the mutation probability $P_q$, the ratio G of new chromosomes in the new generation, and the maximum number of iterations T;
(3) randomly generating an initial population of size q using a random function;
(4) calculating the fitness of each individual in the population; the fitness is a parameter for evaluating the advantages and disadvantages of the coding individuals, namely the advantages and disadvantages of the cluster number, and is used for judging whether the obtained cluster number accords with the distribution characteristics of the data; the clustering steps are as follows:
1) performing initial clustering according to the density initial center selection method, wherein the clustering number is q;
2) each core data represents a cluster, and the average value of the data in the cluster is calculated to obtain a new cluster center of the cluster; the calculation formula is as follows:
$$c_i = \frac{1}{|c_i|} \sum_{x_j \in c_i} x_j \tag{5}$$
in formula (5), $x_j$ is the core data of each cluster; $|c_i|$ is the total amount of data in the cluster;
3) respectively calculating the fitness of the clustering results; the calculation formula is as follows:
[Formulas (6), (7) and (8): images in the source, not reproduced]
in the above formulas, $c_i$ is the center point of the i-th cluster; $d_{bc}$ represents the distance between clusters of the entire data set; $d_c$ is the intra-cluster distance of the data set; $d(c_i, c_j)$ is the Euclidean distance between two points; $k$ is the number of clusters; $num_i$ is the number of data points contained in the i-th cluster; $p_{i,j}$ is the j-th point of the i-th cluster; the number of clusters is then reduced by merging the two clusters with the smallest cluster spacing; the center of the newly merged cluster is the average of all data objects of the original two clusters;
(5) for each iteration, calculating the fitness of the data clustering result;
obtaining new individuals through selection, crossover and mutation; repeating these steps until the maximum number of iterations is reached; taking the k value with the maximum fitness in the new-generation population as the optimal k value and the initial centers, and outputting the result;
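The averaging and merging operations described in sub-steps 2) and 3) of step (4) can be illustrated with a minimal Python sketch. It is not the claimed implementation: the function names `cluster_center` and `merge_closest` are hypothetical, and "cluster spacing" is assumed here to mean the Euclidean distance between cluster centers.

```python
import math
from itertools import combinations

def cluster_center(points):
    """Mean of the points in one cluster (the averaging described for formula (5))."""
    n = len(points)
    dim = len(points[0])
    return tuple(sum(p[d] for p in points) / n for d in range(dim))

def merge_closest(clusters):
    """Merge the two clusters whose centers are closest (assumed spacing measure)."""
    if len(clusters) < 2:
        return [list(c) for c in clusters]
    centers = [cluster_center(c) for c in clusters]
    # Pick the pair of cluster indices with the smallest center-to-center distance.
    a, b = min(
        combinations(range(len(clusters)), 2),
        key=lambda ij: math.dist(centers[ij[0]], centers[ij[1]]),
    )
    merged = list(clusters[a]) + list(clusters[b])
    rest = [list(c) for i, c in enumerate(clusters) if i not in (a, b)]
    return rest + [merged]

# Toy data: the first two clusters are close, the third is far away.
clusters = [
    [(0.0, 0.0), (0.0, 2.0)],
    [(1.0, 1.0), (3.0, 1.0)],
    [(10.0, 10.0), (12.0, 10.0)],
]
merged = merge_closest(clusters)
# The center of the merged cluster is the mean of all points of both original clusters.
new_center = cluster_center(merged[-1])
```

Each call to `merge_closest` reduces the cluster count by one, so repeated calls give one candidate clustering per cluster number.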
according to the clustering mining result of the algorithm, the business expansion data are divided by user requirement, user group and the relevance between the data; the relation between different power service requirements and the business expansion data is analyzed, providing support for future improvement of related service efficiency.
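Steps (1) to (5) of the DGK-means description amount to a genetic search over the cluster number k. The Python sketch below shows that general shape only, under several loudly stated assumptions: the chromosome is an 8-bit string chosen for brevity (the claim specifies 8 bytes), plain K-means with random starting centers stands in for the density-based initial center selection, the ratio of new chromosomes G is not modelled, and `fitness` is a hypothetical stand-in score because formulas (6) to (8) are not reproduced in this text. All function names and default values are illustrative.

```python
import math
import random

def kmeans(points, k, rng, iters=20):
    """Plain K-means with random initial centers (stand-in for the density-based start)."""
    centers = rng.sample(points, k)
    for _ in range(iters):
        groups = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda i: math.dist(p, centers[i]))
            groups[nearest].append(p)
        # New center = mean of the group; an empty group keeps its old center.
        centers = [
            tuple(sum(c) / len(g) for c in zip(*g)) if g else centers[i]
            for i, g in enumerate(groups)
        ]
    return centers, groups

def fitness(points, k, rng):
    """Hypothetical score: mean inter-center distance / mean intra-cluster distance."""
    centers, groups = kmeans(points, k, rng)
    pairs = [(a, b) for i, a in enumerate(centers) for b in centers[i + 1:]]
    inter = sum(math.dist(a, b) for a, b in pairs) / len(pairs)
    intra = sum(math.dist(p, centers[i]) for i, g in enumerate(groups) for p in g)
    return inter / (intra / len(points) + 1e-9)

def decode(bits, k_max):
    """Map a bit-string chromosome to a cluster number in [2, k_max]."""
    return 2 + int(bits, 2) % (k_max - 1)

def search_k(points, q=8, p_c=0.8, p_q=0.1, t_max=10, k_max=6, seed=0):
    """Genetic search for k; q is assumed even so parents pair up cleanly."""
    rng = random.Random(seed)
    pop = ["".join(rng.choice("01") for _ in range(8)) for _ in range(q)]
    best_k, best_fit = None, -math.inf
    for _ in range(t_max):
        scores = [fitness(points, decode(c, k_max), rng) for c in pop]
        for c, s in zip(pop, scores):
            if s > best_fit:
                best_k, best_fit = decode(c, k_max), s
        # Selection: fitness-proportional (roulette wheel) sampling.
        parents = rng.choices(pop, weights=scores, k=q)
        nxt = []
        for a, b in zip(parents[::2], parents[1::2]):
            if rng.random() < p_c:  # single-point crossover
                cut = rng.randrange(1, 8)
                a, b = a[:cut] + b[cut:], b[:cut] + a[cut:]
            nxt += [a, b]
        # Mutation: flip each bit with probability p_q.
        pop = [
            "".join(bit if rng.random() >= p_q else "10"[int(bit)] for bit in c)
            for c in nxt
        ]
    return best_k, best_fit

points = [
    (0.0, 0.0), (0.2, 0.1), (0.1, 0.3),
    (5.0, 5.0), (5.2, 5.1), (5.1, 4.8),
    (10.0, 0.0), (10.1, 0.3), (9.8, 0.2),
]
best_k, best_fit = search_k(points)
```

The stand-in score tends to reward larger k because intra-cluster distance shrinks as clusters are added, so the k it selects on this toy data should not be read as what the patented fitness function would choose.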
CN202111453678.1A 2021-12-01 2021-12-01 Business expansion file mining and analyzing method based on big data technology Pending CN114064794A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202111453678.1A CN114064794A (en) 2021-12-01 2021-12-01 Business expansion file mining and analyzing method based on big data technology

Publications (1)

Publication Number Publication Date
CN114064794A (en) 2022-02-18

Family

ID=80228184

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202111453678.1A Pending CN114064794A (en) 2021-12-01 2021-12-01 Business expansion file mining and analyzing method based on big data technology

Country Status (1)

Country Link
CN (1) CN114064794A (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115203344A (en) * 2022-09-14 2022-10-18 广东电网有限责任公司东莞供电局 Method for establishing power grid big data warehouse based on CIM model

Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN104537433A (en) * 2014-12-18 2015-04-22 国网冀北电力有限公司 Sold electricity quantity prediction method based on inventory capacities and business expansion characteristics
CN105809277A (en) * 2016-03-03 2016-07-27 国网浙江省电力公司 Big data based prediction method for the refining and managing of electric power marketing inspection
US20170161606A1 (en) * 2015-12-06 2017-06-08 Beijing University Of Technology Clustering method based on iterations of neural networks
CN109145031A (en) * 2018-08-20 2019-01-04 国网安徽省电力有限公司合肥供电公司 A kind of multi-source data multidimensional reconstructing method of service-oriented market access demand
CN109830303A (en) * 2019-02-01 2019-05-31 上海众恒信息产业股份有限公司 Clinical data mining analysis and aid decision-making method based on internet integration medical platform
CN112612820A (en) * 2020-12-07 2021-04-06 国网北京市电力公司 Data processing method and device, computer readable storage medium and processor
CN113361750A (en) * 2021-05-17 2021-09-07 国网安徽省电力有限公司淮北供电公司 Electricity sales amount prediction method based on business expansion large data

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
HAIHONG LIANG et al.: "Big data technology-based mining and analysis of application and installation in power business expanding", IEEE, 4 October 2021 (2021-10-04), pages 304 - 308 *

Similar Documents

Publication Publication Date Title
CN108876054B (en) Short-term power load prediction method based on improved genetic algorithm optimization extreme learning machine
CN108846517B (en) Integration method for predicating quantile probabilistic short-term power load
CN110674999A (en) Cell load prediction method based on improved clustering and long-short term memory deep learning
CN105023042A (en) User electricity stealing suspicion analyzing device and method based on big data neural network algorithm
CN110766190A (en) Power distribution network load prediction method
CN109861211A (en) A kind of power distribution network dynamic reconfiguration method based on data-driven
CN110929399A (en) Wind power output typical scene generation method based on BIRCH clustering and Wasserstein distance
CN113139596A (en) Optimization algorithm of low-voltage transformer area line loss neural network
CN112580174A (en) Power distribution network line loss rate calculation method based on genetic algorithm optimization neural network
CN115809719A (en) Short-term load prediction correction method based on morphological clustering
CN115470862A (en) Dynamic self-adaptive load prediction model combination method
CN114064794A (en) Business expansion file mining and analyzing method based on big data technology
CN111832839A (en) Energy consumption prediction method based on sufficient incremental learning
CN112990776B (en) Distribution network equipment health degree evaluation method
CN110956304A (en) Distributed photovoltaic power generation capacity short-term prediction method based on GA-RBM
CN112149052A (en) Daily load curve clustering method based on PLR-DTW
CN116561569A (en) Industrial power load identification method based on EO feature selection and AdaBoost algorithm
CN112766537B (en) Short-term electric load prediction method
CN115619028A (en) Clustering algorithm fusion-based power load accurate prediction method
CN115081551A (en) RVM line loss model building method and system based on K-Means clustering and optimization
CN114676931A (en) Electric quantity prediction system based on data relay technology
CN112667394A (en) Computer resource utilization rate optimization method
CN112633565A (en) Photovoltaic power aggregation interval prediction method
CN112508239A (en) Energy storage output prediction method based on VAE-CGAN
CN114781685B (en) Large user electricity load prediction method and system based on big data mining technology

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination