CN116340437A - Multi-clustering method for large-scale multi-source heterogeneous data - Google Patents

Multi-clustering method for large-scale multi-source heterogeneous data

Info

Publication number
CN116340437A
Authority
CN
China
Prior art keywords
data
distribution network
power distribution
fused
data set
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202310297924.1A
Other languages
Chinese (zh)
Inventor
张宏俊
李鹏
樊卫北
王汝传
徐鹤
朱枫
程海涛
薛状状
孟凡硕
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Nanjing University of Posts and Telecommunications
Original Assignee
Nanjing University of Posts and Telecommunications
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Nanjing University of Posts and Telecommunications
Priority to CN202310297924.1A
Publication of CN116340437A
Legal status: Pending

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/28 Databases characterised by their database models, e.g. relational or object models
    • G06F16/284 Relational databases
    • G06F16/285 Clustering or classification
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/25 Integrating or interfacing systems involving database management systems
    • G06F16/254 Extract, transform and load [ETL] procedures, e.g. ETL data flows in data warehouses
    • Y GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y04 INFORMATION OR COMMUNICATION TECHNOLOGIES HAVING AN IMPACT ON OTHER TECHNOLOGY AREAS
    • Y04S SYSTEMS INTEGRATING TECHNOLOGIES RELATED TO POWER NETWORK OPERATION, COMMUNICATION OR INFORMATION TECHNOLOGIES FOR IMPROVING THE ELECTRICAL POWER GENERATION, TRANSMISSION, DISTRIBUTION, MANAGEMENT OR USAGE, i.e. SMART GRIDS
    • Y04S10/00 Systems supporting electrical power generation, transmission or distribution
    • Y04S10/50 Systems or methods supporting the power network operation or management, involving a certain degree of interaction with the load-side end user applications

Abstract

The invention discloses a multi-clustering method for large-scale multi-source heterogeneous data in the technical field of data processing. The method comprises the following steps: preprocessing heterogeneous data from different sources with an ETL tool and converting it into a unified target data format; classifying the data by voltage level, equipment type, and acquisition measurement type; constructing a topology analysis engine for the classified multi-source heterogeneous data set according to the correlation between the power distribution network and its network elements; removing, based on the topology analysis, the data sets that fail the checks to obtain the data sets to be fused; performing observation coefficient analysis on each data set to be fused and allocating a corresponding number of processing terminals to fuse it, which improves data fusion efficiency and enables cross-composite in-depth analysis of historical and quasi-real-time data of the power distribution network; and outputting the data fusion result for research and analysis by power distribution network staff, providing guidance for fine-grained energy management and user service, enabling timely fault early warning, and improving power safety.

Description

Multi-clustering method for large-scale multi-source heterogeneous data
Technical Field
The invention relates to the technical field of data processing, in particular to a multi-clustering method for large-scale multi-source heterogeneous data.
Background
With the continued development of smart grid construction, the service systems of the power distribution network differ in professional focus, construction time, and architecture. During operation they generate large amounts of multi-source heterogeneous data, such as measurement data, service form data, and account information data, which are diverse in structure, complex in source, and non-uniform in time and spatial scale. According to statistics, a medium-scale distribution network produces hundreds of TB of data each year. Because the data are isolated in their respective service systems, they cannot be fused effectively and their value cannot be fully mined and exploited; and when power equipment data become abnormal, the related information cannot be pushed to the relevant staff in a timely and accurate manner to provide fault early warning.
Some prior art has addressed these problems. For example, patent application CN109241169A discloses a method for integrating a multi-source heterogeneous data fusion database of power distribution network operation information: it accesses different service subsystems of the power distribution network on demand to obtain a target data set, selects the data sets in the target data set that meet certain conditions based on a topology analysis engine, constructs a data fusion model based on a regularized residual search method, eliminates bad data from the topology-processed target data set, and then performs fusion. Patent application CN114238464A discloses a heterogeneous fusion method for multi-element energy data, which preprocesses heterogeneous data from different sources and then fuses it. However, existing multi-source heterogeneous data cluster analysis systems cannot intelligently allocate the number of terminals used for cluster analysis of the production and operation data of the power distribution network, so resource utilization and data analysis efficiency are low.
Disclosure of Invention
To solve the above technical problems, the invention provides a multi-clustering method for large-scale multi-source heterogeneous data, which analyzes the observation coefficient of each data set to be fused and intelligently allocates the number of processing terminals according to that coefficient, thereby improving data processing efficiency.
The invention discloses a multi-clustering method for large-scale multi-source heterogeneous data, which comprises the following steps:
step one: within a selected time period, continuously accessing different service subsystems of the power distribution network on demand to acquire target data sets online, so as to form a multi-source heterogeneous data set;
step two: preprocessing the heterogeneous data from different sources with an ETL tool to convert the multiple formats of the original data into a unified target data format, wherein the preprocessing comprises data screening and data restoration;
step three: classifying the preprocessed multi-source heterogeneous data set by voltage level, equipment type, and acquisition measurement type; and constructing a topology analysis engine for the classified multi-source heterogeneous data set according to the correlation between the power distribution network and its network elements;
step four: based on the analysis of the topology analysis engine, selecting the data sets in the multi-source heterogeneous data set that, at the same time section, satisfy Kirchhoff's current law (KCL) and have consistent voltage, current, and power, and removing the data sets that do not, so as to obtain the data sets to be fused; wherein each data set to be fused carries its time section;
step five: performing observation coefficient GF analysis on each data set to be fused, and allocating a corresponding number of processing terminals according to the observation coefficient GF to fuse the data set, wherein the fusion is based on the HFCM clustering algorithm;
step six: outputting the data fusion result for research and analysis by power distribution network staff, and providing guidance for fine-grained energy management and user service; wherein the data fusion result carries the time section.
Further, the observation coefficient GF analysis of the data set to be fused comprises the following specific steps:
acquiring the time section corresponding to the data set to be fused, and retrieving the research attraction value YG of that time section;
recording the data size of the data set to be fused as D1; identifying the power distribution network corresponding to the data set to be fused, and retrieving the scale value GM and the fault coefficient GZ of that power distribution network;
the observation coefficient GF of the data set to be fused is calculated by using the formula GF=YG×g1+D1×g2+GM×g3+GZ×g4; wherein g1, g2, g3 and g4 are coefficient factors.
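The weighted sum above can be sketched as follows. This is only an illustration: the source does not give values for the coefficient factors g1 to g4, so the defaults below are assumed, and the names `ObservationInputs` and `observation_coefficient` are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ObservationInputs:
    yg: float  # research attraction value YG of the time section
    d1: float  # data size D1 of the data set to be fused
    gm: float  # scale value GM of the power distribution network
    gz: float  # fault coefficient GZ of the power distribution network


def observation_coefficient(x, g=(0.4, 0.2, 0.2, 0.2)):
    """GF = YG*g1 + D1*g2 + GM*g3 + GZ*g4 (default weights are assumed, not from the source)."""
    g1, g2, g3, g4 = g
    return x.yg * g1 + x.d1 * g2 + x.gm * g3 + x.gz * g4


inputs = ObservationInputs(yg=10.0, d1=5.0, gm=2.0, gz=1.0)
gf_unit_weights = observation_coefficient(inputs, g=(1.0, 1.0, 1.0, 1.0))
gf_default = observation_coefficient(inputs)
```

All four inputs are assumed to be de-dimensioned numbers, so that the terms can be added meaningfully.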
Further, allocating a corresponding number of processing terminals according to the observation coefficient GF to fuse the data set to be fused specifically comprises:
storing in a database a lookup table that maps observation coefficient ranges to allocation quantity thresholds; first determining the observation coefficient range into which the observation coefficient GF falls, then determining the allocation quantity threshold corresponding to that range and denoting it L1, that is, allocating L1 processing terminals to fuse the data set to be fused.
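A range lookup of this kind might be implemented with a sorted list of range boundaries, as in the sketch below. The boundaries and terminal counts are invented for illustration (the source keeps the table in a database and does not list its contents), and `allocation_count` is a hypothetical name.

```python
from bisect import bisect_right

# Hypothetical lookup table: GF ranges [.., 10), [10, 50), [50, 100), [100, ..)
# map to 1, 2, 4 and 8 processing terminals. These values are assumptions.
RANGE_UPPER_BOUNDS = [10.0, 50.0, 100.0]
ALLOCATION_THRESHOLDS = [1, 2, 4, 8]


def allocation_count(gf):
    """Return L1, the number of processing terminals for observation coefficient gf."""
    # bisect_right gives the index of the range that contains gf.
    return ALLOCATION_THRESHOLDS[bisect_right(RANGE_UPPER_BOUNDS, gf)]


l1_small = allocation_count(5.0)
l1_boundary = allocation_count(10.0)
l1_large = allocation_count(250.0)
```

Using half-open ranges means a value that sits exactly on a boundary belongs to the higher range; the source does not say which side a boundary value should fall on.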
Further, the method also comprises: monitoring access to the data fusion result and analyzing the research attraction value YG according to the access records, with the following specific steps:
acquiring the access records of the data fusion result within a preset time, wherein each access record comprises an access start time and an access end time; acquiring the time section corresponding to the data fusion result;
for each time section, recording the number of accesses as C1 and summing the duration of each access to obtain the total access duration ZT; the research attraction value YG of the time section is calculated using the formula YG=C1×a1+ZT×a2, where a1 and a2 are coefficient factors.
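As a rough sketch of this calculation, assuming access records are available as pairs of start and end timestamps, that ZT is measured in seconds, and that a1 and a2 take the illustrative values below (none of which the source specifies):

```python
from datetime import datetime


def research_attraction(access_records, a1=1.0, a2=0.01):
    """YG = C1*a1 + ZT*a2 for one time section.

    access_records: list of (start, end) datetime pairs for that time section.
    The unit of ZT (seconds) and the weights a1, a2 are assumptions.
    """
    c1 = len(access_records)  # number of accesses
    zt = sum((end - start).total_seconds() for start, end in access_records)
    return c1 * a1 + zt * a2


records = [
    (datetime(2023, 3, 1, 9, 0), datetime(2023, 3, 1, 9, 10)),   # 600 s
    (datetime(2023, 3, 1, 14, 0), datetime(2023, 3, 1, 14, 5)),  # 300 s
]
yg = research_attraction(records)
```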
Further, the method also comprises: analyzing the scale value GM of the power distribution network, specifically:
acquiring the power supply area of the power distribution network; recording the length of the power supply lines in the power supply area as DL, the number of power supplies as HL, and the average household power consumption as VL; the scale value GM of the power distribution network is calculated using the formula GM=DL×a3+HL×a4+VL×a5, where a3, a4, and a5 are coefficient factors.
Further, the method also comprises: tracking overhauls of the power distribution network and evaluating the fault coefficient GZ of the power distribution network according to the overhaul information, specifically:
acquiring all overhaul information of the power distribution network within a preset time period, wherein the overhaul information comprises the faulty network elements, the overhaul duration, and the overhaul grade;
recording the number of overhauls of the power distribution network as G1; for each overhaul record, denoting the number of faulty network elements as GL, the overhaul duration as GT, and the overhaul grade as GD; the overhaul value JXi is calculated using the formula JXi=GL×b1+GT×b2+GD×b3, where b1, b2, and b3 are coefficient factors;
comparing each overhaul value JXi with an overhaul threshold; recording the number of times JXi exceeds the overhaul threshold as G2, and, for each JXi that exceeds the overhaul threshold, taking the difference between JXi and the overhaul threshold and summing these differences to obtain the over-detection value CJ; the over-detection coefficient CP is calculated using the formula CP=G2×b4+CJ×b5, where b4 and b5 are coefficient factors; the fault coefficient GZ is then calculated using the formula
[Formula shown only as an image in the original publication (Figure BDA0004143820830000041); not recoverable from the text.]
where f1 and f2 are coefficient factors.
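The intermediate quantities G1, G2, CJ, and CP can be sketched as below. The final step to GZ is deliberately left out, because its formula appears only as an image in the source. The weights b1 to b5, the threshold, and the sample records are made up for illustration, and the function names are hypothetical.

```python
def overhaul_value(gl, gt, gd, b=(1.0, 0.5, 2.0)):
    """JXi = GL*b1 + GT*b2 + GD*b3 for one overhaul record (weights are assumed)."""
    b1, b2, b3 = b
    return gl * b1 + gt * b2 + gd * b3


def over_detection(records, threshold, b4=1.0, b5=0.5):
    """Return (G1, G2, CJ, CP) for a list of (GL, GT, GD) overhaul records."""
    jx = [overhaul_value(*r) for r in records]
    g1 = len(jx)                                        # total number of overhauls
    excess = [v - threshold for v in jx if v > threshold]
    g2 = len(excess)                                    # overhauls above the threshold
    cj = sum(excess)                                    # summed excess over the threshold
    cp = g2 * b4 + cj * b5                              # CP = G2*b4 + CJ*b5
    return g1, g2, cj, cp


# (faulty elements GL, overhaul duration GT, overhaul grade GD): invented records
result = over_detection([(2, 4.0, 1), (5, 8.0, 3), (1, 2.0, 1)], threshold=8.0)
```

With these sample records the overhaul values are 6.0, 15.0, and 4.0, so only the second record exceeds the threshold of 8.0.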
Further, the voltage levels in step three are 35 kV, 20 kV, and 10 kV; the equipment types, divided by service subsystem, are transformers, switch cabinets, and circuits; and the acquisition measurement types are state quantities and analog quantities, and real-time data and non-real-time data.
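One simple way to picture this three-way classification is to group records by a composite key, as in the sketch below. The record layout and the field names `voltage_kv`, `device_type`, and `measurement_type` are assumptions for illustration only.

```python
from collections import defaultdict


def classify(records):
    """Group records by (voltage level, equipment type, measurement type)."""
    groups = defaultdict(list)
    for rec in records:
        key = (rec["voltage_kv"], rec["device_type"], rec["measurement_type"])
        groups[key].append(rec)
    return dict(groups)


records = [
    {"id": 1, "voltage_kv": 10, "device_type": "transformer", "measurement_type": "analog"},
    {"id": 2, "voltage_kv": 10, "device_type": "transformer", "measurement_type": "analog"},
    {"id": 3, "voltage_kv": 35, "device_type": "switch cabinet", "measurement_type": "state"},
]
groups = classify(records)
```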
The beneficial effects of the invention are as follows: the invention monitors access to the data fusion result and analyzes the research attraction value YG according to the access records; analyzes the scale value GM of the power distribution network by acquiring its power supply area and combining the power supply line length, the number of power supplies, and the average household power consumption in that area; tracks overhauls of the power distribution network and evaluates the fault coefficient GZ according to the overhaul information; and combines these values with the data size of the data set to be fused to calculate its observation coefficient GF. A corresponding number of processing terminals is then allocated according to the observation coefficient GF to fuse the data set, that is, observation coefficients in different ranges correspond to different numbers of processing terminals. Data sets to be fused with a high observation coefficient GF can be allocated more processing terminals, which maximizes resource utilization and improves data processing efficiency.
Drawings
FIG. 1 is a schematic block diagram of a multi-clustering method for large-scale multi-source heterogeneous data.
Detailed Description
The technical solutions of the present invention will be clearly and completely described in connection with the embodiments, and it is obvious that the described embodiments are only some embodiments of the present invention, not all embodiments. All other embodiments, which can be made by those skilled in the art based on the embodiments of the invention without making any inventive effort, are intended to be within the scope of the invention.
As shown in fig. 1, a multi-clustering method for large-scale multi-source heterogeneous data includes:
step one: within a selected time period, continuously accessing different service subsystems of the power distribution network on demand to acquire target data sets online, so as to form a multi-source heterogeneous data set;
step two: preprocessing the heterogeneous data from different sources with an ETL tool to convert the multiple formats of the original data into a unified target data format; data screening, data restoration, and the like are carried out with the ETL tool to unify the formats of the multi-source heterogeneous data; the aim is a preliminary arrangement of the data so that it can be mined conveniently and accurately;
step three: classifying the preprocessed multi-source heterogeneous data set by voltage level, equipment type, and acquisition measurement type; the voltage levels are 35 kV, 20 kV, and 10 kV; the equipment types, divided by service subsystem, are transformers, switch cabinets, and circuits; the acquisition measurement types are state quantities and analog quantities, and real-time data and non-real-time data;
step four: constructing a topology analysis engine for the classified multi-source heterogeneous data set according to the correlation between the power distribution network and its network elements; wherein the topology analysis engine is built on the following principles:
1) Dependency relationships are established for the circuit breakers and switches in the power distribution network according to its connection relationships;
2) Information on which network elements depend on each other: the relationship between switch position information and acquired measurements;
3) The dependence of network elements on data, including temporary and fixed dependence under different operation modes, dependence between switch positions and acquired measurements, and dependence on historical data;
4) The dependence of network elements on protection system information, alarm events, and SOE events in the distribution network operation information;
based on the analysis of the topology analysis engine, the data sets in the multi-source heterogeneous data set that, at the same time section, satisfy Kirchhoff's current law (KCL) and have consistent voltage, current, and power are selected, and the data sets that do not are removed, so as to obtain the data sets to be fused; each data set to be fused carries its corresponding time section;
in this embodiment, the operating conditions of the power distribution network vary over the 24 hours of a day and are divided into peak, valley, and steady periods; the usefulness and value of the production and operation data generated by the power distribution network therefore differ between periods;
step five: performing observation coefficient analysis on the data set to be fused, and allocating a corresponding number of processing terminals according to the observation coefficient GF to fuse the data set, wherein the fusion is based on the HFCM clustering algorithm and is used to mine the value of the multi-source heterogeneous data and to realize its interconnection, exchange, and sharing; the specific analysis steps are as follows:
acquiring the time section corresponding to the data set to be fused, and retrieving the research attraction value YG of that time section; recording the data size of the data set to be fused as D1;
identifying the power distribution network corresponding to the data set to be fused; retrieving the scale value GM and the fault coefficient GZ of that power distribution network; the observation coefficient GF of the data set to be fused is calculated using the formula GF=YG×g1+D1×g2+GM×g3+GZ×g4, where g1, g2, g3, and g4 are coefficient factors;
the number L1 of processing terminals to allocate is determined from the observation coefficient GF, specifically: a lookup table that maps observation coefficient ranges to allocation quantity thresholds is stored in a database; the observation coefficient range into which the observation coefficient GF falls is determined first, and the allocation quantity threshold corresponding to that range is then determined and denoted L1; that is, L1 processing terminals are allocated to fuse the data set to be fused;
step six: outputting the data fusion result for research and analysis by power distribution network staff, and providing guidance for fine-grained energy management and user service so as to realize supply and demand interaction between users and the power grid; the data fusion result carries its corresponding time section.
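The format unification in step two, for instance, could look like the following sketch. It is not the ETL tool of the source: the two input formats (a CSV export and a JSON export), their column and key names, and the unified record layout are all hypothetical, and only the data screening part of preprocessing is shown.

```python
import csv
import io
import json


def from_csv(text):
    """Parse a CSV export with columns device,ts,value into unified records."""
    return [{"device": r["device"], "timestamp": r["ts"], "value": float(r["value"])}
            for r in csv.DictReader(io.StringIO(text))]


def from_json(text):
    """Parse a JSON export with keys id,time,val into unified records."""
    return [{"device": o["id"], "timestamp": o["time"], "value": float(o["val"])}
            for o in json.loads(text)]


def screen(records):
    """Data screening: drop records with a missing device or timestamp."""
    return [r for r in records if r["device"] and r["timestamp"]]


csv_src = "device,ts,value\nT1,2023-03-01T09:00,10.5\n,2023-03-01T09:00,3.0\n"
json_src = '[{"id": "S7", "time": "2023-03-01T09:00", "val": "1"}]'
unified = screen(from_csv(csv_src) + from_json(json_src))
```

Each source gets its own small adapter that emits the same record shape, so later steps only ever see the unified format.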
The invention classifies the large amount of multi-source heterogeneous data generated by the power distribution network, performs topology analysis, removes bad data, and then fuses the result. This addresses the extraction, integration, and data quality improvement of multi-source heterogeneous operation information of the power distribution network and enables cross-composite in-depth analysis of its historical and quasi-real-time data. When power equipment data become abnormal, the related information can be pushed to the relevant staff in a timely and accurate manner, providing fault early warning and improving power safety;
in addition, because the operation data of the power distribution network is massive, an efficient data processing framework is needed to complete the data processing task; first, based on the analysis of the topology analysis engine, the data sets in the multi-source heterogeneous data set that, at the same time section, satisfy Kirchhoff's current law (KCL) and have consistent voltage, current, and power are selected, and the data sets that do not are removed to obtain the data sets to be fused; then a corresponding number of processing terminals is allocated according to the observation coefficient GF of each data set to be fused, which maximizes resource utilization and improves data processing efficiency;
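The KCL part of this selection can be illustrated with a much simplified check: at each node, the signed branch currents of one time section should sum to zero within a tolerance. The data layout, the sign convention (positive into the node), the tolerance, and the function names below are assumptions; the voltage and power consistency checks are not shown.

```python
def satisfies_kcl(branch_currents, tol=1e-3):
    """True if the signed currents at a node sum to zero within tol."""
    return abs(sum(branch_currents)) <= tol


def select_for_fusion(data_sets):
    """Keep data sets in which every node passes the KCL check; drop the rest."""
    return [ds for ds in data_sets
            if all(satisfies_kcl(currents) for currents in ds["nodes"].values())]


data_sets = [
    {"time_section": "2023-03-01T09:00", "nodes": {"bus1": [12.0, -7.5, -4.5]}},
    {"time_section": "2023-03-01T09:00", "nodes": {"bus1": [12.0, -7.5, -3.0]}},
]
to_fuse = select_for_fusion(data_sets)
```

The sketch uses real-valued currents for brevity; the same check works unchanged with complex phasors, since `sum` and `abs` both accept complex numbers.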
wherein the method further comprises: monitoring access to the data fusion result and analyzing the research attraction value according to the access records, with the following specific steps:
acquiring the access records of the data fusion result within a preset time, wherein each access record comprises an access start time and an access end time; acquiring the time section corresponding to the data fusion result;
for each time section, recording the number of accesses as C1 and summing the duration of each access to obtain the total access duration ZT; the research attraction value YG of the time section is calculated using the formula YG=C1×a1+ZT×a2, where a1 and a2 are coefficient factors;
wherein the method further comprises: analyzing the scale value GM of the power distribution network, specifically:
acquiring the power supply area of the power distribution network; recording the length of the power supply lines in the power supply area as DL, the number of power supplies as HL, and the average household power consumption as VL; the scale value GM of the power distribution network is calculated using the formula GM=DL×a3+HL×a4+VL×a5, where a3, a4, and a5 are coefficient factors;
wherein the method further comprises: tracking overhauls of the power distribution network, recording the overhaul information whenever an overhaul of the power distribution network is detected, and evaluating the fault coefficient of the power distribution network according to the overhaul information, specifically:
acquiring all overhaul information of the power distribution network within a preset time period, wherein the overhaul information comprises the faulty network elements, the overhaul duration, and the overhaul grade; the overhaul grade is assessed by the overhaul personnel after the overhaul is completed, according to the overhaul resources invested; the more overhaul resources invested, the higher the overhaul grade;
recording the number of overhauls of the power distribution network as G1; for each overhaul record, denoting the number of faulty network elements as GL, the overhaul duration as GT, and the overhaul grade as GD; the overhaul value JXi is calculated using the formula JXi=GL×b1+GT×b2+GD×b3, where b1, b2, and b3 are coefficient factors;
comparing each overhaul value JXi with an overhaul threshold; recording the number of times JXi exceeds the overhaul threshold as G2, and, for each JXi that exceeds the overhaul threshold, taking the difference between JXi and the overhaul threshold and summing these differences to obtain the over-detection value CJ; the over-detection coefficient CP is calculated using the formula CP=G2×b4+CJ×b5, where b4 and b5 are coefficient factors; the fault coefficient GZ is then calculated using the formula
[Formula shown only as an image in the original publication (Figure BDA0004143820830000071); not recoverable from the text.]
where f1 and f2 are coefficient factors.
All of the above formulas operate on de-dimensioned numerical values. Each formula was obtained by collecting a large amount of data and fitting it through software simulation to approximate the actual situation as closely as possible; the preset parameters and preset thresholds in the formulas are set by a person skilled in the art according to the actual situation or obtained by simulation on a large amount of data.
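The source does not say how dimensions are removed. One common option, shown here purely as an assumption, is to divide each raw quantity by a reference value before applying the weights; the scale value GM serves as the example. The reference values, weights, and names `BASE`, `dimensionless`, and `scale_value` are hypothetical.

```python
# Hypothetical reference values used to strip units before weighting:
# line length in km, number of power supplies, consumption in kWh.
BASE = {"DL": 100.0, "HL": 5000.0, "VL": 300.0}


def dimensionless(raw, base=BASE):
    """Divide each raw quantity by its reference value to obtain a unitless number."""
    return {k: raw[k] / base[k] for k in base}


def scale_value(raw, a3=0.5, a4=0.3, a5=0.2):
    """GM = DL*a3 + HL*a4 + VL*a5 on de-dimensioned inputs (weights are assumed)."""
    x = dimensionless(raw)
    return x["DL"] * a3 + x["HL"] * a4 + x["VL"] * a5


gm = scale_value({"DL": 200.0, "HL": 5000.0, "VL": 150.0})
```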
The working principle of the invention is as follows:
in operation, the multi-clustering method for large-scale multi-source heterogeneous data continuously accesses different service subsystems of the power distribution network on demand within a selected time period to acquire target data sets online, so as to form a multi-source heterogeneous data set; the heterogeneous data from different sources is preprocessed with an ETL tool to convert the multiple formats of the original data into a unified target data format; the preprocessed multi-source heterogeneous data set is classified by voltage level, equipment type, and acquisition measurement type; a topology analysis engine is constructed for the classified data set according to the correlation between the power distribution network and its network elements; based on the analysis of the topology analysis engine, the data sets that fail the checks are removed to obtain the data sets to be fused; the data sets to be fused are fused based on the HFCM clustering algorithm and the data fusion result is output, which enables cross-composite in-depth analysis of the historical and quasi-real-time data of the power distribution network, provides guidance for fine-grained energy management and user service, and realizes supply and demand interaction between users and the power grid;
the method further includes observation coefficient analysis of the data set to be fused: the time section corresponding to the data set to be fused is acquired and the research attraction value YG of that time section is retrieved; the data size of the data set to be fused is recorded as D1; the power distribution network corresponding to the data set is identified, and its scale value GM and fault coefficient GZ are retrieved; the observation coefficient GF is calculated using the formula GF=YG×g1+D1×g2+GM×g3+GZ×g4; the number L1 of processing terminals to allocate is determined from the observation coefficient GF, that is, L1 processing terminals are allocated to fuse the data set to be fused, which maximizes resource utilization and thereby improves data processing efficiency.
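How work might be spread over L1 processing terminals is not detailed in the source. The sketch below stands in for it with a local thread pool: the data is split into L1 chunks and each worker handles one chunk. The worker function `fuse_chunk` is only a placeholder that averages values; it is not the HFCM clustering algorithm, and all names here are hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor


def partition(items, n):
    """Split items into n nearly equal contiguous chunks."""
    size, extra = divmod(len(items), n)
    chunks, start = [], 0
    for i in range(n):
        end = start + size + (1 if i < extra else 0)
        chunks.append(items[start:end])
        start = end
    return chunks


def fuse_chunk(chunk):
    """Placeholder for the per-terminal fusion step; here it only averages values."""
    return sum(chunk) / len(chunk) if chunk else None


def fuse_on_terminals(items, l1):
    """Run fuse_chunk on l1 workers, one chunk each, and collect the partial results."""
    with ThreadPoolExecutor(max_workers=l1) as pool:
        return list(pool.map(fuse_chunk, partition(items, l1)))


partials = fuse_on_terminals([1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0], l1=3)
```

In a real deployment the workers would be separate machines and the partial results would still need to be merged; both aspects are outside this sketch.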
In the description of the present specification, the descriptions of the terms "one embodiment," "example," "specific example," and the like, mean that a particular feature, structure, material, or characteristic described in connection with the embodiment or example is included in at least one embodiment or example of the present invention. In this specification, schematic representations of the above terms do not necessarily refer to the same embodiments or examples. Furthermore, the particular features, structures, materials, or characteristics described may be combined in any suitable manner in any one or more embodiments or examples.
The preferred embodiments of the invention disclosed above are intended only to assist in the explanation of the invention. The preferred embodiments are not intended to be exhaustive or to limit the invention to the precise form disclosed. Obviously, many modifications and variations are possible in light of the above teaching. The embodiments were chosen and described in order to best explain the principles of the invention and the practical application, to thereby enable others skilled in the art to best understand and utilize the invention. The invention is limited only by the claims and the full scope and equivalents thereof.

Claims (7)

1. A multi-clustering method for large-scale multi-source heterogeneous data is characterized by comprising the following steps:
step one: within a selected time period, continuously accessing different service subsystems of the power distribution network on demand to acquire target data sets online, so as to form a multi-source heterogeneous data set;
step two: preprocessing the heterogeneous data from different sources with an ETL tool to convert the multiple formats of the original data into a unified target data format, wherein the preprocessing comprises data screening and data restoration;
step three: classifying the preprocessed multi-source heterogeneous data set by voltage level, equipment type, and acquisition measurement type; and constructing a topology analysis engine for the classified multi-source heterogeneous data set according to the correlation between the power distribution network and its network elements;
step four: based on the analysis of the topology analysis engine, selecting the data sets in the multi-source heterogeneous data set that, at the same time section, satisfy Kirchhoff's current law (KCL) and have consistent voltage, current, and power, and removing the data sets that do not, so as to obtain the data sets to be fused; wherein each data set to be fused carries its time section;
step five: performing observation coefficient GF analysis on each data set to be fused, and allocating a corresponding number of processing terminals according to the observation coefficient GF to fuse the data set, wherein the fusion is based on the HFCM clustering algorithm;
step six: outputting the data fusion result for research and analysis by power distribution network staff, and providing guidance for fine-grained energy management and user service; wherein the data fusion result carries the time section.
2. The multi-clustering method for large-scale multi-source heterogeneous data according to claim 1, wherein the observation coefficient GF analysis of the data set to be fused comprises the following specific steps:
acquiring the time section corresponding to the data set to be fused, and retrieving the research attraction value YG of that time section;
recording the data size of the data set to be fused as D1; identifying the power distribution network corresponding to the data set to be fused, and retrieving the scale value GM and the fault coefficient GZ of that power distribution network;
the observation coefficient GF of the data set to be fused is calculated by using the formula GF=YG×g1+D1×g2+GM×g3+GZ×g4; wherein g1, g2, g3 and g4 are coefficient factors.
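As an illustration of the GF formula, the following Python sketch evaluates the weighted sum. The claim does not give values for the coefficient factors g1 to g4, so the defaults below are placeholders, and the function name is hypothetical.

```python
def observation_coefficient(yg: float, d1: float, gm: float, gz: float,
                            g1: float = 0.4, g2: float = 0.2,
                            g3: float = 0.2, g4: float = 0.2) -> float:
    """GF = YG*g1 + D1*g2 + GM*g3 + GZ*g4.

    yg: research attraction value of the time section
    d1: data size of the data set to be fused
    gm: scale value of the power distribution network
    gz: fault coefficient of the power distribution network
    g1..g4: coefficient factors (placeholder defaults, not from the source)
    """
    return yg * g1 + d1 * g2 + gm * g3 + gz * g4


if __name__ == "__main__":
    # With unit weights the result is simply the sum of the four inputs.
    print(observation_coefficient(1, 2, 3, 4, g1=1, g2=1, g3=1, g4=1))
```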
3. The multi-clustering method for large-scale multi-source heterogeneous data according to claim 2, wherein allocating the corresponding number of processing terminals to the data set to be fused according to the observation coefficient GF for fusion specifically comprises:
storing in a database a comparison table of observation coefficient ranges and allocation quantity thresholds; first determining the observation coefficient range that contains the observation coefficient GF, then determining the allocation quantity threshold corresponding to that range and marking it as L1, that is, allocating L1 processing terminals to fuse the data set to be fused.
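The range lookup in claim 3 can be sketched in Python as a binary search over range boundaries. The table contents below are invented for illustration (the source does not disclose them), and an in-memory list stands in for the database.

```python
from bisect import bisect_right

# Hypothetical comparison table: the i-th range starts at RANGE_LOWER_BOUNDS[i]
# and maps to ALLOCATION_THRESHOLDS[i] processing terminals.
RANGE_LOWER_BOUNDS = [0.0, 10.0, 50.0, 100.0]
ALLOCATION_THRESHOLDS = [1, 2, 4, 8]


def terminals_for(gf: float) -> int:
    """Return L1, the number of processing terminals for observation coefficient gf."""
    if gf < RANGE_LOWER_BOUNDS[0]:
        raise ValueError("observation coefficient is below the lowest range")
    # bisect_right finds the first lower bound greater than gf;
    # the range containing gf is the one just before it.
    index = bisect_right(RANGE_LOWER_BOUNDS, gf) - 1
    return ALLOCATION_THRESHOLDS[index]


if __name__ == "__main__":
    for gf in (0.0, 10.0, 75.0, 1000.0):
        print(gf, terminals_for(gf))
```

Treating each range as closed at its lower bound and open at its upper bound is a design choice of this sketch; the claim does not state how boundaries are handled.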
4. The multi-clustering method for large-scale multi-source heterogeneous data according to claim 2, wherein access to the data fusion result is monitored and the research attraction value YG is analyzed according to the access records, with the following specific steps:
acquiring the access records of the data fusion result within a preset time, wherein each access record comprises an access start time and an access end time; acquiring the time section corresponding to the data fusion result;
for the same time section, counting the number of accesses as C1; accumulating the duration of each access to obtain the total access duration ZT; calculating the research attraction value YG of the time section using the formula YG = C1×a1 + ZT×a2, wherein a1 and a2 are coefficient factors.
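A minimal Python sketch of the YG calculation for one time section follows. It assumes that access records are (start, end) pairs of `datetime` values and that ZT is measured in seconds; the unit of ZT and the default coefficient factors are assumptions, not values from the source.

```python
from datetime import datetime
from typing import Iterable, Tuple

AccessRecord = Tuple[datetime, datetime]  # (access start time, access end time)


def research_attraction(records: Iterable[AccessRecord],
                        a1: float = 1.0, a2: float = 0.01) -> float:
    """YG = C1*a1 + ZT*a2 for the access records of one time section."""
    records = list(records)
    c1 = len(records)  # number of accesses
    zt = sum((end - start).total_seconds() for start, end in records)  # total duration
    return c1 * a1 + zt * a2


if __name__ == "__main__":
    records = [
        (datetime(2023, 1, 1, 8, 0, 0), datetime(2023, 1, 1, 8, 1, 0)),   # 60 s
        (datetime(2023, 1, 1, 9, 0, 0), datetime(2023, 1, 1, 9, 0, 30)),  # 30 s
    ]
    print(research_attraction(records, a1=1.0, a2=1.0))
```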
5. The multi-clustering method for large-scale multi-source heterogeneous data according to claim 2, wherein scale value GM analysis is performed on the power distribution network, specifically:
acquiring the power supply area of the power distribution network; counting the length of the power supply lines in the power supply area as DL, the number of power supplies as HL and the average power consumption as VL; calculating the scale value GM of the power distribution network using the formula GM = DL×a3 + HL×a4 + VL×a5, wherein a3, a4 and a5 are coefficient factors.
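The scale value can be computed the same way. In this Python sketch the units (kilometres, kilowatt-hours) and the default coefficient factors are assumptions; the claim specifies neither.

```python
def scale_value(dl_km: float, hl: int, vl_kwh: float,
                a3: float = 1.0, a4: float = 1.0, a5: float = 1.0) -> float:
    """GM = DL*a3 + HL*a4 + VL*a5.

    dl_km: length of the power supply lines in the supply area (assumed km)
    hl: number of power supplies
    vl_kwh: average power consumption (assumed kWh)
    a3..a5: coefficient factors (placeholder defaults, not from the source)
    """
    return dl_km * a3 + hl * a4 + vl_kwh * a5


if __name__ == "__main__":
    print(scale_value(12, 3, 5, a3=1, a4=2, a5=3))
```

Because DL, HL and VL have different units, the coefficient factors also act as unit scalings, so in practice they would need to be chosen with the magnitudes of the three quantities in mind.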
6. The multi-clustering method for large-scale multi-source heterogeneous data according to claim 2, wherein overhaul tracking is performed on the power distribution network, and the fault coefficient GZ of the power distribution network is evaluated according to the overhaul information, specifically:
acquiring all overhaul information of the power distribution network within a preset time period, wherein the overhaul information comprises the faulty network elements, the overhaul duration and the overhaul grade;
counting the number of overhauls of the power distribution network as G1; marking the number of faulty network elements in each piece of overhaul information as GL, the overhaul duration as GT and the overhaul grade as GD; calculating the overhaul value JXi using the formula JXi = GL×b1 + GT×b2 + GD×b3, wherein b1, b2 and b3 are coefficient factors;
comparing the overhaul value JXi with an overhaul threshold; counting the number of times JXi is greater than the overhaul threshold as G2, and, whenever JXi is greater than the overhaul threshold, obtaining the difference between JXi and the overhaul threshold and summing the differences to obtain an over-detection value CJ; calculating an over-detection coefficient CP using the formula CP = G2×b4 + CJ×b5, wherein b4 and b5 are coefficient factors; and calculating the fault coefficient GZ using the formula shown in
Figure FDA0004143820820000031
wherein f1 and f2 are coefficient factors.
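The intermediate quantities of claim 6 (JXi, G2, CJ and CP) can be sketched in Python as follows. The sketch stops before GZ because the GZ formula is only available as a figure in this text. The record type, field names and default coefficient factors are hypothetical.

```python
from dataclasses import dataclass
from typing import Sequence, Tuple


@dataclass
class Overhaul:
    faulty_elements: int  # GL: number of faulty network elements
    duration: float       # GT: overhaul duration (unit not given in the source)
    grade: int            # GD: overhaul grade


def overhaul_value(o: Overhaul, b1: float, b2: float, b3: float) -> float:
    """JXi = GL*b1 + GT*b2 + GD*b3 for one piece of overhaul information."""
    return o.faulty_elements * b1 + o.duration * b2 + o.grade * b3


def over_detection(overhauls: Sequence[Overhaul], threshold: float,
                   b1: float = 1.0, b2: float = 1.0, b3: float = 1.0,
                   b4: float = 1.0, b5: float = 1.0) -> Tuple[int, int, float, float]:
    """Return (G1, G2, CJ, CP) for the overhauls of one distribution network."""
    values = [overhaul_value(o, b1, b2, b3) for o in overhauls]
    excesses = [v - threshold for v in values if v > threshold]
    g1 = len(overhauls)    # number of overhauls
    g2 = len(excesses)     # number of times JXi exceeds the threshold
    cj = sum(excesses)     # summed excess over the threshold
    cp = g2 * b4 + cj * b5
    return g1, g2, cj, cp


if __name__ == "__main__":
    history = [Overhaul(1, 2.0, 1), Overhaul(3, 5.0, 2)]
    print(over_detection(history, threshold=5.0))
```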
7. The multi-clustering method for large-scale multi-source heterogeneous data according to claim 1, wherein in step three the voltage classes are 35 kV, 20 kV and 10 kV; the equipment types are divided by service subsystem into transformers, switch cabinets and circuits; and the acquisition/measurement types are divided into state quantities and analog quantities, and into real-time data and non-real-time data.
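The classification in claim 7 amounts to grouping records by a composite key. The Python sketch below assumes a hypothetical record layout (dictionary keys `voltage`, `device`, `quantity` and `real_time`) and label spellings that do not come from the source.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple

VOLTAGE_CLASSES = {"35kV", "20kV", "10kV"}
DEVICE_TYPES = {"transformer", "switch cabinet", "circuit"}
QUANTITY_TYPES = {"state", "analog"}

ClassKey = Tuple[str, str, str, bool]


def classify(records: Iterable[dict]) -> Dict[ClassKey, List[dict]]:
    """Group records by (voltage class, device type, quantity type, real-time flag)."""
    groups: Dict[ClassKey, List[dict]] = defaultdict(list)
    for record in records:
        if record["voltage"] not in VOLTAGE_CLASSES:
            raise ValueError(f"unknown voltage class: {record['voltage']}")
        if record["device"] not in DEVICE_TYPES:
            raise ValueError(f"unknown device type: {record['device']}")
        if record["quantity"] not in QUANTITY_TYPES:
            raise ValueError(f"unknown quantity type: {record['quantity']}")
        key = (record["voltage"], record["device"],
               record["quantity"], bool(record["real_time"]))
        groups[key].append(record)
    return dict(groups)


if __name__ == "__main__":
    data = [
        {"voltage": "10kV", "device": "transformer", "quantity": "analog", "real_time": True},
        {"voltage": "10kV", "device": "transformer", "quantity": "analog", "real_time": True},
        {"voltage": "35kV", "device": "circuit", "quantity": "state", "real_time": False},
    ]
    for key, members in classify(data).items():
        print(key, len(members))
```

Rejecting unknown labels instead of silently dropping them is a choice of this sketch; the claim does not say how unclassifiable records are handled.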
CN202310297924.1A 2023-03-24 2023-03-24 Multi-clustering method for large-scale multi-source heterogeneous data Pending CN116340437A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202310297924.1A CN116340437A (en) 2023-03-24 2023-03-24 Multi-clustering method for large-scale multi-source heterogeneous data

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202310297924.1A CN116340437A (en) 2023-03-24 2023-03-24 Multi-clustering method for large-scale multi-source heterogeneous data

Publications (1)

Publication Number Publication Date
CN116340437A true CN116340437A (en) 2023-06-27

Family

ID=86891007

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202310297924.1A Pending CN116340437A (en) 2023-03-24 2023-03-24 Multi-clustering method for large-scale multi-source heterogeneous data

Country Status (1)

Country Link
CN (1) CN116340437A (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN117009921A (en) * 2023-08-04 2023-11-07 振宁(无锡)智能科技有限公司 Optimized data processing method and system of data fusion engine
CN117390008A (en) * 2023-12-11 2024-01-12 北京星球空天信息技术有限公司 Method and device for processing measurement data of multi-type observation instrument

Cited By (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN117009921A (en) * 2023-08-04 2023-11-07 振宁(无锡)智能科技有限公司 Optimized data processing method and system of data fusion engine
CN117009921B (en) * 2023-08-04 2024-02-23 振宁(无锡)智能科技有限公司 Optimized data processing method and system of data fusion engine
CN117390008A (en) * 2023-12-11 2024-01-12 北京星球空天信息技术有限公司 Method and device for processing measurement data of multi-type observation instrument
CN117390008B (en) * 2023-12-11 2024-04-12 北京星球空天信息技术有限公司 Method and device for processing measurement data of multi-type observation instrument

Similar Documents

Publication Publication Date Title
CN107402976B (en) Power grid multi-source data fusion method and system based on multi-element heterogeneous model
CN106022592B (en) Electricity consumption behavior abnormity detection and public security risk early warning method and device
CN116340437A (en) Multi-clustering method for large-scale multi-source heterogeneous data
CN108959424B (en) Operation method of urban electricity utilization map for monitoring load of power system
CN111291076B (en) Abnormal water use monitoring alarm system based on big data and construction method thereof
CN111429027A (en) Regional power transmission network operation multidimensional analysis method based on big data
CN113935562A (en) Intelligent rating and automatic early warning method for health condition of power equipment
CN109286188B (en) 10kV power distribution network theoretical line loss calculation method based on multi-source data set
CN107798395A (en) A kind of power grid accident signal automatic diagnosis method and system
CN103902816A (en) Electrification detection data processing method based on data mining technology
CN111581196A (en) Supply and distribution power grid intelligent data acquisition and arrangement system based on intelligent factory framework
Bhuiyan et al. Big data analysis of the electric power PMU data from smart grid
CN112688431A (en) Power distribution network load overload visualization method and system based on big data
CN112288303A (en) Method and device for determining line loss rate
CN115730749B (en) Power dispatching risk early warning method and device based on fusion power data
CN107844962B (en) Distribution network engineering cost data collection system based on standard data structure
CN113887823A (en) Self-adaptive extraction method for fault blackout line based on knowledge reasoning
CN117610214A (en) Intelligent power distribution network wiring planning method based on dynamic geographic features
Ferreira et al. A data-mining-based methodology for transmission expansion planning
CN107862459B (en) Metering equipment state evaluation method and system based on big data
CN117614141A (en) Multi-voltage-level coordination management method for power distribution network
CN113379314B (en) Intelligent annual production plan supervision method based on reasoning algorithm
Broderick et al. Clustering method and representative feeder selection for the California Solar Initiative
CN114662563A (en) Industrial electricity non-invasive load decomposition method based on gradient lifting algorithm
CN111313355A (en) Method for updating monitoring signal event rule under manual supervision

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination