CN114064794A - Business expansion file mining and analyzing method based on big data technology - Google Patents
- Publication number
- CN114064794A (application CN202111453678.1A)
- Authority
- CN
- China
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
- G06F16/26 - Visual data mining; Browsing structured data
- G06F16/283 - Multi-dimensional databases or data warehouses, e.g. MOLAP or ROLAP
- G06F18/23213 - Non-hierarchical techniques using statistics or function optimisation, e.g. modelling of probability density functions with fixed number of clusters, e.g. K-means clustering
- G06N3/045 - Combinations of networks
- G06N3/048 - Activation functions
- G06N3/08 - Learning methods
- G06N3/126 - Evolutionary algorithms, e.g. genetic algorithms or genetic programming
- G06Q10/06315 - Needs-based resource requirements planning or analysis
- G06Q50/06 - Energy or water supply
Abstract
The invention provides a business expansion archive mining and analysis method based on big data technology. The method comprises establishing a business expansion archive data warehouse and mining the business expansion archive data. On the basis of the data warehouse, the archive data are processed with a combined prediction model; k-means clustering is then improved with a genetic algorithm, and data mining is performed to obtain the relationships among the archive data. The data warehouse is built with a dimension table method and comprises a state data layer, a historical data layer, a comprehensive data layer, a special data layer and control management. The data mining processing method uses the combined prediction model to process the data and obtain the deep-level relationships in the archive data. The invention offers high data processing efficiency, shortens business expansion time and improves the economic benefit of enterprises. It is suitable for use as a business expansion archive mining and analysis method based on big data technology.
Description
Technical Field
The invention relates to business expansion installation in the field of electric power, in particular to a business expansion installation archive mining analysis method based on a big data technology.
Background
Business expansion and installation refers to accepting a customer's application for electricity and drawing up a safe, economical and reasonable power supply scheme according to the customer's electricity demand and the condition of the power supply network. It also covers determining the investment in power supply projects, organizing their design and implementation, organizing, coordinating and inspecting the design and implementation of the customer's internal projects, signing power supply and use contracts, installing meters, and connecting power. It is the general name for the power supply department's service process from the customer's application to actual power use.
At present, a large amount of data is generated in the business expansion and installation process. Traditional analysis of business expansion archives builds only a single data analysis model and does not determine the deep-level relationships among the data, so the archive data are used ineffectively.
Disclosure of Invention
To solve the problem of utilizing business expansion archive data, the invention provides a business expansion archive mining and analysis method based on big data technology. The problem is addressed by establishing a business expansion archive data warehouse and processing the archive data with a combined prediction model.
The technical scheme adopted by the invention is as follows:
A business expansion archive mining and analysis method based on big data technology comprises establishing a business expansion archive data warehouse and a business expansion archive data mining processing method. On the basis of the data warehouse, the archive data are processed with a combined prediction model; after k-means clustering is improved with a genetic algorithm, data mining is performed to obtain the relationships among the archive data.
The business expansion archive data warehouse is built with a dimension table method and comprises a state data layer, a historical data layer, a comprehensive data layer, a special data layer and control management.
The business expansion archive data mining processing method uses a combined prediction model to process the data and obtain the deep-level relationships in the archive data.
The beneficial effects are as follows. The business expansion archive data are processed with the combined prediction model on the basis of the data warehouse. After k-means clustering is improved with a genetic algorithm, data mining is performed to obtain the relationships among the archive data. The analysis method has high data processing efficiency; applied to actual power operation, it can effectively shorten business expansion time and improve the economic benefit of enterprises. It is suitable for use as a business expansion archive mining and analysis method based on big data technology.
Drawings
FIG. 1 is a business expansion archive data warehouse organizational chart.
Detailed Description
The technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are only a part of the embodiments of the present invention, and not all of the embodiments. Although embodiments of the present invention have been shown and described, it will be appreciated by those skilled in the art that changes, modifications, substitutions and alterations can be made in these embodiments without departing from the principles and spirit of the invention, the scope of which is defined in the appended claims and their equivalents. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
As shown in the figure, the business expansion archive mining analysis method based on the big data technology comprises a business expansion archive data warehouse establishment method and a business expansion archive data mining processing method; processing the business expansion file data by using a combined prediction model on the basis of establishing a business expansion file data warehouse; after the k-means clustering is improved by using a genetic algorithm, data mining is performed to obtain the relationship between the archived data.
The business expansion archive data warehouse is built with a dimension table method and comprises a state data layer, a historical data layer, a comprehensive data layer, a special data layer and control management.
The state data layer stores the latest detailed data. When data enter the data warehouse from outside, they are placed directly into the state data layer, where they can be processed by a general database; such data are also called the basic data of the system.
For the historical data layer and the comprehensive data layer, it is first decided whether the stored information should reflect the general trend of the basic data or its variation over time; the basic data are then classified, decomposed, summarized and processed to obtain that information. Under a time control mechanism, the basic data generate historical data, which are placed into the historical data layer to be called by the state data layer, the comprehensive data layer and the special data layer.
The comprehensive data layer integrates and extracts the basic data under a synthesis mechanism to generate comprehensive data, which are placed into the comprehensive data layer and include various statistical data, indexes, evaluation and calculation results, and prediction analysis data.
The special data layer stores extended data of the industry, that is, special data formed by processing the basic data with the stored data analysis techniques.
In control management, besides the four data layers, external data from the power business department are also needed for decision support; together these data form the information sources of the data warehouse. An extractor is established to convert data from an information source that affect the data warehouse into the data warehouse schema. When data in an information source change, the integrator filters and aggregates the information and merges it with other information so that the new information is integrated into the data warehouse.
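The layer structure and the extractor and integrator roles described above can be illustrated with a minimal in-memory sketch. It is only an illustration under stated assumptions: the class name `ArchiveWarehouse`, the record fields (`id`, `capacity`, `customer_id`, `capacity_kva`) and the single summary statistic are hypothetical and do not come from the patent, which does not specify a schema.

```python
from dataclasses import dataclass, field
from datetime import date
from statistics import mean


@dataclass
class ArchiveWarehouse:
    """Hypothetical in-memory stand-in for the four data layers."""
    state: list = field(default_factory=list)          # latest detailed (basic) data
    historical: list = field(default_factory=list)     # dated copies of basic data
    comprehensive: dict = field(default_factory=dict)  # statistics and indexes
    special: dict = field(default_factory=dict)        # industry-specific derived data

    def extract(self, source_record: dict) -> dict:
        """Extractor: convert a source record into the warehouse schema (assumed fields)."""
        return {
            "customer_id": str(source_record["id"]),
            "capacity_kva": float(source_record["capacity"]),
        }

    def integrate(self, source_record: dict, as_of: date) -> None:
        """Integrator: load a changed source record into the state layer,
        archive a dated copy, and refresh the summary statistics."""
        record = self.extract(source_record)
        # Replace any older state-layer row for the same customer.
        self.state = [r for r in self.state if r["customer_id"] != record["customer_id"]]
        self.state.append(record)
        self.historical.append({**record, "as_of": as_of})
        self.comprehensive["mean_capacity_kva"] = mean(r["capacity_kva"] for r in self.state)


wh = ArchiveWarehouse()
wh.integrate({"id": 1, "capacity": "100"}, date(2021, 1, 1))
wh.integrate({"id": 2, "capacity": "300"}, date(2021, 2, 1))
wh.integrate({"id": 1, "capacity": "200"}, date(2021, 3, 1))  # customer 1 changes
```

In this sketch the state layer always holds one row per customer, while the historical layer keeps every dated version, which mirrors the split between latest detailed data and time-controlled history.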
The existing business expansion archives stored in the power system are then converted into the data warehouse schema, and the data in them are processed.
Business expansion and installation is an electric power service scheme based on the operation of the power grid and aimed at meeting user requirements. During long-term operation of the power system, data errors caused by sensor faults or mismeasured parameters are inevitable. To keep these errors from affecting data mining, a combined prediction model is used to process the data. Combined prediction applies two or more different prediction methods to the same problem. It may combine several quantitative methods or several qualitative methods, but in practice qualitative and quantitative methods are more often combined. The main purpose of the combination is to make comprehensive use of the information provided by the various methods and to improve prediction accuracy as much as possible.
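The idea of combining several forecasts of the same quantity can be shown with a small sketch. Note that it uses inverse-error weighting, a simple and common combination rule, rather than the neural network combiner described in this patent; the function names and the sample numbers are illustrative assumptions.

```python
def mse(pred, actual):
    """Mean squared error between a forecast and the observed values."""
    return sum((p - a) ** 2 for p, a in zip(pred, actual)) / len(actual)


def inverse_error_weights(forecasts, actual):
    """Weight each method by the inverse of its historical MSE.
    Assumes every method has a non-zero error on the history."""
    inverse = [1.0 / mse(f, actual) for f in forecasts]
    total = sum(inverse)
    return [w / total for w in inverse]


def combine(forecasts, weights):
    """Weighted average of the individual forecasts, point by point."""
    horizon = len(forecasts[0])
    return [sum(w * f[t] for w, f in zip(weights, forecasts)) for t in range(horizon)]


actual = [10.0, 12.0, 14.0]
method_a = [11.0, 13.0, 15.0]  # always 1 too high, MSE = 1
method_b = [8.0, 10.0, 12.0]   # always 2 too low, MSE = 4
weights = inverse_error_weights([method_a, method_b], actual)
combined = combine([method_a, method_b], weights)
```

Because the two methods err in opposite directions, the combined forecast has a smaller error than either method alone, which is the effect the combination is meant to exploit.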
The business expansion filing data mining processing method adopts a combined prediction model to process data and obtain the deep level relation in the filing data.
The neural network combiner uses a BP network structure with an input layer, a hidden layer and an output layer, with the Sigmoid function as the activation function. An improved BP algorithm with an adaptive learning rate is adopted, and the learning rate is adjusted as follows:
In formula (1), η(k) is the learning rate at step k and E(k) is the network error at step k. When adjusting the learning rate, it is first checked whether the weight correction reduces the error function. If the error is reduced, the learning rate is too small and should be increased appropriately; otherwise, the learning rate is reduced. The learning rate thus follows the error, the learning step grows and then stabilizes, and the convergence of the neural network is accelerated. The specific calculation steps are as follows:
(1) reading the structural information and the data information of the neural network;
(2) initializing the hidden layer weights V_p, the output layer weights W_p, the hidden layer threshold Ψ and the output layer threshold φ with random values in [-1, 1], and setting the initial values of the learning rate and the momentum factor;
(3) inputting sample data, and normalizing input and output data;
(4) selecting a sample and calculating the actual output of the network in the forward direction; the hidden layer output h_p and the output layer output y_i are calculated as follows:
In formula (2), f(x) is the sigmoid function; m is the number of hidden layer nodes, which is determined by an empirical formula:
In formula (3), p is the number of training samples and the remaining variable is the number of nodes of the input layer. The business expansion archive data are processed with a GARCH model in the hidden layer; the GARCH(1,1) model expression is as follows:
In formula (4), α_0, α_1 and β are constants with α_0 > 1, α_1 ≥ 1 and β ≥ 0, and h_t is the conditional variance of Z_t;
(5) calculating the error of an output layer, and counting the mean square error of the sample;
(6) correcting the learning rate, and back-propagating the error to correct the weights and thresholds. If the error precision is met, the result is output; otherwise, return to step (4);
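Steps (1) to (6) can be sketched as a small three-layer BP network with an adaptive learning rate. This is a sketch under stated assumptions, not the patent's formula (1): the growth factor 1.05, the shrink factor 0.7 and the choice to discard a weight update that does not reduce the error are assumptions; the momentum factor and the GARCH processing are omitted; and the XOR samples are placeholder data already normalized to [0, 1].

```python
import math
import random


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def forward(x, V, psi, W, phi):
    """Forward pass: hidden outputs h and the single network output y."""
    h = [sigmoid(sum(v * xi for v, xi in zip(row, x)) - t) for row, t in zip(V, psi)]
    y = sigmoid(sum(w * hj for w, hj in zip(W, h)) - phi)
    return h, y


def mean_squared_error(samples, V, psi, W, phi):
    return sum((forward(x, V, psi, W, phi)[1] - d) ** 2 for x, d in samples) / len(samples)


def train(samples, m=3, eta=0.5, steps=200, up=1.05, down=0.7, seed=0):
    rng = random.Random(seed)
    n = len(samples[0][0])
    # Step (2): weights and thresholds drawn from [-1, 1].
    V = [[rng.uniform(-1, 1) for _ in range(n)] for _ in range(m)]
    psi = [rng.uniform(-1, 1) for _ in range(m)]
    W = [rng.uniform(-1, 1) for _ in range(m)]
    phi = rng.uniform(-1, 1)
    error = mean_squared_error(samples, V, psi, W, phi)
    history = [error]
    for _ in range(steps):
        # Steps (4)-(5): accumulate batch gradients of the squared error.
        gV = [[0.0] * n for _ in range(m)]
        gpsi, gW, gphi = [0.0] * m, [0.0] * m, 0.0
        for x, d in samples:
            h, y = forward(x, V, psi, W, phi)
            delta_out = (y - d) * y * (1 - y)
            gphi -= delta_out
            for j in range(m):
                gW[j] += delta_out * h[j]
                delta_h = delta_out * W[j] * h[j] * (1 - h[j])
                gpsi[j] -= delta_h
                for i in range(n):
                    gV[j][i] += delta_h * x[i]
        # Step (6): trial update, then adapt the learning rate to the outcome.
        V2 = [[V[j][i] - eta * gV[j][i] for i in range(n)] for j in range(m)]
        psi2 = [psi[j] - eta * gpsi[j] for j in range(m)]
        W2 = [W[j] - eta * gW[j] for j in range(m)]
        phi2 = phi - eta * gphi
        new_error = mean_squared_error(samples, V2, psi2, W2, phi2)
        if new_error < error:   # error reduced: accept and enlarge the step
            V, psi, W, phi, error = V2, psi2, W2, phi2, new_error
            eta *= up
        else:                   # error not reduced: reject and shrink the step
            eta *= down
        history.append(error)
    return history, eta


xor_samples = [((0.0, 0.0), 0.0), ((0.0, 1.0), 1.0), ((1.0, 0.0), 1.0), ((1.0, 1.0), 0.0)]
history, final_eta = train(xor_samples)
```

Because rejected updates are discarded, the recorded error never increases from one step to the next; a fixed error-precision threshold, as in step (6), could replace the fixed number of steps used here.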
according to the process, the data are processed by adopting a combined prediction model, the processed business expansion data are combined into a data set, and the data are analyzed according to a clustering mining principle.
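The GARCH(1,1) model named in step (4) is usually written as h_t = α_0 + α_1·Z_{t-1}² + β·h_{t-1}. The sketch below assumes this standard recursion, since the expression itself is not reproduced in the text; the function name, the starting value and the parameter values are illustrative assumptions, chosen so that α_1 + β < 1 (the usual stationarity condition) rather than taken from the constraints stated for formula (4).

```python
def garch11_variance(z, alpha0, alpha1, beta, h0=None):
    """Conditional variance series h_t for a residual series z under the
    standard GARCH(1,1) recursion (assumed form, see the note above)."""
    if h0 is None:
        if alpha1 + beta >= 1.0:
            raise ValueError("default start value needs alpha1 + beta < 1")
        # Unconditional variance as the starting value.
        h0 = alpha0 / (1.0 - alpha1 - beta)
    h = [h0]
    for t in range(1, len(z)):
        h.append(alpha0 + alpha1 * z[t - 1] ** 2 + beta * h[t - 1])
    return h


residuals = [0.0, 2.0, 0.0, 0.0]  # one shock at t = 1
h = garch11_variance(residuals, alpha0=0.2, alpha1=0.3, beta=0.5, h0=1.0)
# h is approximately [1.0, 0.7, 1.75, 1.075]
```

The shock at t = 1 raises the conditional variance at t = 2, after which it decays back toward its long-run level; this is the behaviour that lets the model flag unusually volatile stretches of data.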
Cluster mining and business expansion archive data analysis:
To address the shortcomings of the classic K-means algorithm, a genetic algorithm is used to improve it, and the improved DGK-means algorithm is used to mine and analyze the archive data of power business expansion and installation applications. The DGK-means algorithm is described in detail as follows:
(1) firstly, encoding business expansion archive data; for the application in this patent, the number of clusters is an integer and 8 bytes are used for encoding;
(2) setting the genetic parameters: population size q, crossover probability P_c, mutation probability P_q, the proportion G of new chromosomes in each new generation, and the maximum number of iterations T;
(3) randomly generating an initial population: an initial population of size q is generated with a random function;
(4) calculating the fitness of each individual in the population. The fitness evaluates the quality of an encoded individual, that is, the quality of a candidate cluster number, and is used to judge whether that cluster number matches the distribution of the data. In previous studies, the cluster number was used as the input to K-means clustering. Because the genetic algorithm is used to find the optimal k value and the initial centers j for the first step of the K-means algorithm, exact initial centers are not required; centers close to the true ones are sufficient, since the K-means algorithm refines them into the final cluster centers. Clustering with the K-means algorithm at this stage therefore reduces efficiency. To address this deficiency, this patent uses improved density-based clustering. The clustering steps are as follows:
1) performing initial clustering according to the density initial center selection method, wherein the clustering number is q;
2) each core data represents a cluster, and the average value of the data in the cluster is calculated to obtain a new cluster center of the cluster; the calculation formula is as follows:
In formula (5), x_j is the core data of each cluster and |c_i| is the total number of data points in the cluster;
3) respectively calculating the fitness of the clustering results; the calculation formula is as follows:
In the above formula, c_i is the center point of cluster i; d_bc is the inter-cluster distance of the entire data set; d_c is the intra-cluster distance of the data set; d(c_i, c_j) is the Euclidean distance between two points; k is the number of clusters; num_i is the number of data points contained in cluster i; p_{i,j} is the j-th point of cluster i. The number of clusters is then reduced by merging the two clusters with the smallest inter-cluster distance; the center of the newly merged cluster is the average of all data objects of the original two clusters.
(5) For each iteration, the fitness of the clustering results is calculated. New individuals are obtained through selection, crossover and mutation; these steps are repeated until the maximum number of iterations is reached. The k value with the maximum fitness in the new generation population is taken as the optimal k value, together with its initial centers, and the result is output.
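The cluster-center averaging, the fitness evaluation and the merging of the two closest clusters can be sketched as follows. Several parts are assumptions: the fitness is taken as the mean distance between cluster centers (d_bc) divided by the mean distance from points to their own center (d_c), because the fitness expression itself is not reproduced in the text; and the search over k is a plain greedy merge loop standing in for the genetic selection, crossover and mutation. All function names are hypothetical.

```python
import math
from itertools import combinations


def centre(points):
    """Cluster center as the average of the points in the cluster."""
    dim = len(points[0])
    return tuple(sum(p[d] for p in points) / len(points) for d in range(dim))


def fitness(clusters):
    """Assumed fitness: mean inter-center distance over mean intra-cluster distance.
    Requires at least two clusters."""
    centres = [centre(c) for c in clusters]
    pairs = list(combinations(centres, 2))
    d_bc = sum(math.dist(a, b) for a, b in pairs) / len(pairs)
    total_points = sum(len(c) for c in clusters)
    d_c = sum(math.dist(p, ctr) for c, ctr in zip(clusters, centres) for p in c) / total_points
    return d_bc / d_c if d_c > 0 else float("inf")


def merge_closest(clusters):
    """Reduce the cluster count by one by merging the two clusters with the closest centers."""
    centres = [centre(c) for c in clusters]
    i, j = min(
        combinations(range(len(clusters)), 2),
        key=lambda ij: math.dist(centres[ij[0]], centres[ij[1]]),
    )
    merged = clusters[i] + clusters[j]
    return [c for idx, c in enumerate(clusters) if idx not in (i, j)] + [merged]


def best_clustering(clusters):
    """Greedy stand-in for the genetic search: keep the clustering with maximum fitness."""
    best_score, best = fitness(clusters), clusters
    while len(clusters) > 2:
        clusters = merge_closest(clusters)
        score = fitness(clusters)
        if score > best_score:
            best_score, best = score, clusters
    return best_score, best


initial = [
    [(0.0, 0.0), (0.0, 1.0)],
    [(1.0, 0.0), (1.0, 1.0)],
    [(10.0, 10.0), (10.0, 11.0)],
    [(11.0, 10.0), (11.0, 11.0)],
]
score, clusters = best_clustering(initial)
k = len(clusters)
```

For the four small starting clusters above, the two groups around (0.5, 0.5) and (10.5, 10.5) give the highest fitness, so the sketch settles on k = 2; the resulting centers could then serve as initial centers for K-means.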
According to the clustering mining result of the algorithm, business expansion data are divided according to different user requirements, user groups and the relevance between the data, the relation between different power service requirements and the business expansion data is analyzed, and support is provided for improvement of related service efficiency in the future.
As described above, this patent uses big data technology to mine and analyze the business expansion archives of power enterprises. The mining analysis method obtains the deep-level relationships in the archive data and provides data and technical support for future improvement of the service level of power enterprises.
Finally, it should be noted that: although the present invention has been described in detail with reference to the foregoing embodiments, it will be apparent to those skilled in the art that modifications may be made to the embodiments or portions thereof without departing from the spirit and scope of the invention.
Claims (4)
1. A business expansion filing mining analysis method based on big data technology comprises the establishment of a business expansion filing data warehouse and a business expansion filing data mining processing method; the method is characterized in that: processing the business expansion file data by using a combined prediction model on the basis of establishing a business expansion file data warehouse; after the k-means clustering is improved by using a genetic algorithm, data mining is performed to obtain the relationship between the archived data.
2. The business expansion archive mining analysis method based on big data technology as claimed in claim 1, wherein the method comprises the following steps:
the business expansion file data warehouse is established by adopting a dimension table method and comprises a state data layer, a historical data layer, a comprehensive data layer, a special data layer and control management;
the state data layer stores the latest detailed data, and when the data enters the data warehouse from the outside, the data is directly put into the state data layer; it can be processed by a general database, such data also being referred to as basic data of the system;
the historical data layer decides whether the information stored in the historical data layer and the comprehensive data layer is the general trend reflected by the basic data or the trend changing along with time, and classifies, decomposes, summarizes and processes the basic data to acquire the information; generating historical data by the basic data under a time control mechanism, and putting the historical data into a historical data layer for a current data layer, a comprehensive data layer and a special data layer to call;
the comprehensive data layer integrates and extracts basic data under a comprehensive mechanism to generate comprehensive data, and the comprehensive data is put into the comprehensive data layer and comprises various statistical data, indexes, evaluation calculation results and prediction analysis data;
the special data layer is used for storing extended data of the industry, namely special data formed by processing basic data through a stored data analysis technology;
in the control management, in addition to the four data layers, external data of the power business department are needed to be used for decision support, and the data jointly form an information source of a data warehouse; in control management, data information from an information source and influencing a data warehouse is converted into a data warehouse mode by establishing an extractor; when data in the information source changes, the integrator filters, aggregates and merges the information with other information to integrate the new information into the data warehouse.
3. The business expansion archive mining analysis method based on big data technology as claimed in claim 1, wherein the method comprises the following steps:
the business expansion filing data mining processing method adopts a combined prediction model to process data, and obtains a deep level relation in filing data;
the neural network combiner uses a BP network structure with an input layer, a hidden layer and an output layer, with the Sigmoid function as the activation function; an improved BP algorithm with an adaptive learning rate is adopted, and the learning rate is adjusted as follows:
in formula (1), η(k) is the learning rate at step k and E(k) is the network error at step k;
when adjusting the learning rate, it is first checked whether the weight correction reduces the error function; if the error is reduced, the learning rate is too small and should be increased appropriately; otherwise, the learning rate is reduced, so that the learning rate follows the error, the learning step grows and then stabilizes, and the convergence of the neural network is accelerated;
the specific calculation steps are as follows:
(1) reading the structural information and the data information of the neural network;
(2) initializing the hidden layer weights V_p, the output layer weights W_p, the hidden layer threshold Ψ and the output layer threshold φ with random values in [-1, 1], and setting the initial values of the learning rate and the momentum factor;
(3) inputting sample data, and normalizing input and output data;
(4) selecting a sample and calculating the actual output of the network in the forward direction; the hidden layer output h_p and the output layer output y_i are calculated as follows:
in formula (2), f(x) is the sigmoid function; m is the number of hidden layer nodes, which is determined by an empirical formula:
in formula (3), p is the number of training samples and the remaining variable is the number of nodes of the input layer;
processing the business expansion archive data with a GARCH model in the hidden layer; the GARCH(1,1) model expression is as follows:
in formula (4), α_0, α_1 and β are constants,
with α_0 > 1, α_1 ≥ 1 and β ≥ 0, and h_t is the conditional variance of Z_t;
(5) calculating the error of an output layer, and counting the mean square error of the sample;
(6) correcting the learning rate, and back-propagating the error to correct the weights and thresholds;
if the error precision is met, outputting the result; otherwise, returning to step (4);
according to the process, the data are processed by adopting a combined prediction model, the processed business expansion filing data form a data set, and the business expansion filing data are analyzed according to a clustering mining principle.
4. The business expansion archive mining analysis method based on big data technology as claimed in claim 3, wherein the method comprises the following steps:
in the business expansion archive data analysis, to address the shortcomings of the classic K-means algorithm, a genetic algorithm is used to improve it, and the improved DGK-means algorithm is used to mine and analyze the archive data of power business expansion and installation applications;
the DGK-means algorithm is described in detail as follows:
(1) firstly, encoding business expansion archive data; for the application in this patent, the number of clusters is an integer and 8 bytes are used for encoding;
(2) setting the genetic parameters: population size q, crossover probability P_c, mutation probability P_q, the proportion G of new chromosomes in each new generation, and the maximum number of iterations T;
(3) randomly generating an initial population: an initial population of size q is generated with a random function;
(4) calculating the fitness of each individual in the population; the fitness evaluates the quality of an encoded individual, that is, the quality of its cluster number, and is used to judge whether the obtained cluster number matches the distribution characteristics of the data; the clustering steps are as follows:
1) performing initial clustering according to the density-based initial center selection method, with $q$ clusters;
2) each core data point represents a cluster, and the mean of the data in the cluster is calculated to obtain the new center of that cluster; the calculation formula is as follows:
$$c_i = \frac{1}{|c_i|} \sum_{x_j \in c_i} x_j \tag{5}$$
in formula (5), $x_j$ is the core data of each cluster and $|c_i|$ is the total amount of data in the cluster;
3) calculating the fitness of each clustering result; the calculation formula is as follows:
in the above formula, $c_i$ is the center point of cluster $i$; $d_{bc}$ is the between-cluster distance of the entire data set; $d_c$ is the intra-cluster distance of the data set; $d(c_i, c_j)$ is the Euclidean distance between two points; $k$ is the number of clusters; $num_i$ is the number of data points in cluster $i$; $p_{i,j}$ is the $j$-th point of cluster $i$;
the number of clusters is reduced by merging the two clusters with the smallest inter-cluster distance; the center of the merged cluster is the mean of all data objects of the original two clusters;
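The fitness formula itself is not reproduced in the text, so the Python sketch below is only one plausible reading of the symbols defined above. It assumes the fitness is the ratio of between-cluster distance to intra-cluster distance, with $d_{bc}$ taken as the sum of pairwise center distances and $d_c$ as the sum of each cluster's mean point-to-center distance. The function names `centroid`, `fitness` and `merge_closest` are hypothetical.

```python
import math


def centroid(points):
    """Mean of a list of equal-length coordinate tuples (the cluster center)."""
    n = len(points)
    return tuple(sum(p[d] for p in points) / n for d in range(len(points[0])))


def fitness(clusters):
    """ASSUMED form: between-cluster distance d_bc divided by intra-cluster distance d_c."""
    centers = [centroid(c) for c in clusters]
    k = len(clusters)
    # d_bc: sum of Euclidean distances between every pair of cluster centers.
    d_bc = sum(math.dist(centers[i], centers[j])
               for i in range(k) for j in range(i + 1, k))
    # d_c: for each cluster, the mean distance of its num_i points to the center.
    d_c = sum(sum(math.dist(p, centers[i]) for p in clusters[i]) / len(clusters[i])
              for i in range(k))
    return d_bc / d_c if d_c > 0 else float("inf")


def merge_closest(clusters):
    """Merge the two clusters whose centers are closest, giving k - 1 clusters."""
    centers = [centroid(c) for c in clusters]
    k = len(clusters)
    i, j = min(((a, b) for a in range(k) for b in range(a + 1, k)),
               key=lambda ab: math.dist(centers[ab[0]], centers[ab[1]]))
    merged = clusters[i] + clusters[j]
    # The merged cluster goes last; its center is the mean of all its points.
    return [c for idx, c in enumerate(clusters) if idx not in (i, j)] + [merged]


# Illustrative data: two nearby clusters and one distant cluster.
clusters = [[(0, 0), (0, 2)], [(1, 1), (1, 3)], [(10, 10), (10, 12)]]
reduced = merge_closest(clusters)
```

Calling `centroid(reduced[-1])` gives the center of the merged cluster as the mean of all data objects of the two original clusters, as the step describes.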
(5) in each iteration, calculating the fitness of the clustering result;
obtaining new individuals through selection, crossover and mutation; repeating these steps until the maximum number of iterations is reached; taking the k value with the maximum fitness in the new-generation population as the optimal k value and initial center, and outputting the result;
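The genetic search over the cluster number in steps (1) to (5) can be sketched in Python as follows. This is an illustration under stated assumptions, not the patented implementation: the chromosome is an 8-bit string for brevity (the claim specifies 8 bytes), selection is a binary tournament, crossover is single-point, and `fitness_of_k` is a hypothetical callable that clusters the data into k clusters and returns the fitness of that clustering. The parameter names `q`, `p_c`, `G` and `T` follow the claim; `p_mut` stands for the mutation probability, and all default values are assumptions.

```python
import random

BITS = 8  # ASSUMPTION: 8-bit chromosome for brevity; the claim specifies 8 bytes


def decode(chrom):
    """Map a bit string to a cluster number k, forcing k >= 2."""
    return max(2, int(chrom, 2))


def evolve_k(fitness_of_k, q=20, p_c=0.8, p_mut=0.05, G=0.8, T=50, seed=0):
    """Search for the cluster number k with the highest fitness."""
    rng = random.Random(seed)

    def score(chrom):
        return fitness_of_k(decode(chrom))

    # Step (3): random initial population of q chromosomes.
    pop = ["".join(rng.choice("01") for _ in range(BITS)) for _ in range(q)]
    # G is the share of new chromosomes in each new generation.
    n_new = max(1, min(q - 1, round(G * q)))
    for _ in range(T):
        # Steps (4)-(5): rank individuals by fitness and keep the best survivors.
        pop.sort(key=score, reverse=True)
        survivors = pop[:q - n_new]
        children = []
        while len(children) < n_new:
            # Selection: two binary tournaments pick the parents.
            p1, p2 = (max(rng.sample(pop, 2), key=score) for _ in range(2))
            if rng.random() < p_c:  # single-point crossover
                cut = rng.randrange(1, BITS)
                p1 = p1[:cut] + p2[cut:]
            # Mutation: flip each bit independently with probability p_mut.
            children.append("".join(
                "10"[int(bit)] if rng.random() < p_mut else bit for bit in p1))
        pop = survivors + children
    # Output the k value with the maximum fitness in the final generation.
    return decode(max(pop, key=score))


# Toy fitness that peaks at k = 5, standing in for a real clustering fitness.
best_k = evolve_k(lambda k: -(k - 5) ** 2, seed=1)
```

Passing the fitness as a callable keeps the search loop independent of how the clustering and its fitness are computed, so either can be swapped without touching the genetic operators.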
based on the clustering mining results of the algorithm, the business expansion data are partitioned by user requirement, user group and the correlations within the data; the relationship between different power service requirements and the business expansion data is analyzed, providing support for future improvements in the efficiency of related services.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202111453678.1A CN114064794A (en) | 2021-12-01 | 2021-12-01 | Business expansion file mining and analyzing method based on big data technology |
Publications (1)
Publication Number | Publication Date |
---|---|
CN114064794A (en) | 2022-02-18 |
Family
ID=80228184
Family Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202111453678.1A Pending CN114064794A (en) | 2021-12-01 | 2021-12-01 | Business expansion file mining and analyzing method based on big data technology |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN114064794A (en) |
Cited By (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN115203344A (en) * | 2022-09-14 | 2022-10-18 | 广东电网有限责任公司东莞供电局 | Method for establishing power grid big data warehouse based on CIM model |
Citations (7)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN104537433A (en) * | 2014-12-18 | 2015-04-22 | 国网冀北电力有限公司 | Sold electricity quantity prediction method based on inventory capacities and business expansion characteristics |
CN105809277A (en) * | 2016-03-03 | 2016-07-27 | 国网浙江省电力公司 | Big data based prediction method for the refining and managing of electric power marketing inspection |
US20170161606A1 (en) * | 2015-12-06 | 2017-06-08 | Beijing University Of Technology | Clustering method based on iterations of neural networks |
CN109145031A (en) * | 2018-08-20 | 2019-01-04 | 国网安徽省电力有限公司合肥供电公司 | A kind of multi-source data multidimensional reconstructing method of service-oriented market access demand |
CN109830303A (en) * | 2019-02-01 | 2019-05-31 | 上海众恒信息产业股份有限公司 | Clinical data mining analysis and aid decision-making method based on internet integration medical platform |
CN112612820A (en) * | 2020-12-07 | 2021-04-06 | 国网北京市电力公司 | Data processing method and device, computer readable storage medium and processor |
CN113361750A (en) * | 2021-05-17 | 2021-09-07 | 国网安徽省电力有限公司淮北供电公司 | Electricity sales amount prediction method based on business expansion large data |
Non-Patent Citations (1)
Title |
---|
HAIHONG LIANG et al.: "Big data technology-based mining and analysis of application and installation in power business expanding", IEEE, 4 October 2021 (2021-10-04), pages 304 - 308 * |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN108876054B (en) | Short-term power load prediction method based on improved genetic algorithm optimization extreme learning machine | |
CN108846517B (en) | Integration method for predicating quantile probabilistic short-term power load | |
CN110674999A (en) | Cell load prediction method based on improved clustering and long-short term memory deep learning | |
CN105023042A (en) | User electricity stealing suspicion analyzing device and method based on big data neural network algorithm | |
CN110766190A (en) | Power distribution network load prediction method | |
CN109861211A (en) | A kind of power distribution network dynamic reconfiguration method based on data-driven | |
CN110929399A (en) | Wind power output typical scene generation method based on BIRCH clustering and Wasserstein distance | |
CN113139596A (en) | Optimization algorithm of low-voltage transformer area line loss neural network | |
CN112580174A (en) | Power distribution network line loss rate calculation method based on genetic algorithm optimization neural network | |
CN115809719A (en) | Short-term load prediction correction method based on morphological clustering | |
CN115470862A (en) | Dynamic self-adaptive load prediction model combination method | |
CN114064794A (en) | Business expansion file mining and analyzing method based on big data technology | |
CN111832839A (en) | Energy consumption prediction method based on sufficient incremental learning | |
CN112990776B (en) | Distribution network equipment health degree evaluation method | |
CN110956304A (en) | Distributed photovoltaic power generation capacity short-term prediction method based on GA-RBM | |
CN112149052A (en) | Daily load curve clustering method based on PLR-DTW | |
CN116561569A (en) | Industrial power load identification method based on EO feature selection and AdaBoost algorithm | |
CN112766537B (en) | Short-term electric load prediction method | |
CN115619028A (en) | Clustering algorithm fusion-based power load accurate prediction method | |
CN115081551A (en) | RVM line loss model building method and system based on K-Means clustering and optimization | |
CN114676931A (en) | Electric quantity prediction system based on data relay technology | |
CN112667394A (en) | Computer resource utilization rate optimization method | |
CN112633565A (en) | Photovoltaic power aggregation interval prediction method | |
CN112508239A (en) | Energy storage output prediction method based on VAE-CGAN | |
CN114781685B (en) | Large user electricity load prediction method and system based on big data mining technology |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||