CN116340883A - Power distribution network data resource fusion method, device, equipment and storage medium - Google Patents

Info

Publication number
CN116340883A
Authority
CN
China
Prior art keywords
data
abnormal
sample
sample data
initial
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202310323146.9A
Other languages
Chinese (zh)
Inventor
乔俊峰
周爱华
黄晨宏
彭林
潘森
顾华
徐敏
陈敬德
裘洪斌
蒋静
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
State Grid Smart Grid Research Institute Co ltd
State Grid Corp of China SGCC
State Grid Shanghai Electric Power Co Ltd
Original Assignee
State Grid Smart Grid Research Institute Co ltd
State Grid Corp of China SGCC
State Grid Shanghai Electric Power Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by State Grid Smart Grid Research Institute Co ltd, State Grid Corp of China SGCC, State Grid Shanghai Electric Power Co Ltd
Priority to CN202310323146.9A
Publication of CN116340883A
Legal status: Pending

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/25 Fusion techniques
    • G06F18/251 Fusion techniques of input or preprocessed data
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/21 Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/214 Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/24 Classification techniques
    • G06F18/243 Classification techniques relating to the number of classes
    • G06F18/2433 Single-class perspective, e.g. one-against-all classification; Novelty detection; Outlier detection
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/27 Regression, e.g. linear or logistic regression
    • Y GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y04 INFORMATION OR COMMUNICATION TECHNOLOGIES HAVING AN IMPACT ON OTHER TECHNOLOGY AREAS
    • Y04S SYSTEMS INTEGRATING TECHNOLOGIES RELATED TO POWER NETWORK OPERATION, COMMUNICATION OR INFORMATION TECHNOLOGIES FOR IMPROVING THE ELECTRICAL POWER GENERATION, TRANSMISSION, DISTRIBUTION, MANAGEMENT OR USAGE, i.e. SMART GRIDS
    • Y04S10/00 Systems supporting electrical power generation, transmission or distribution
    • Y04S10/50 Systems or methods supporting the power network operation or management, involving a certain degree of interaction with the load-side end user applications

Landscapes

  • Engineering & Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Theoretical Computer Science (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Artificial Intelligence (AREA)
  • Evolutionary Biology (AREA)
  • Evolutionary Computation (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Management, Administration, Business Operations System, And Electronic Commerce (AREA)

Abstract

The invention discloses a method, a device, equipment and a storage medium for fusing power distribution network data resources. The method comprises the following steps: obtaining an initial data fusion result set; acquiring abnormal data and the abnormal attributes corresponding to the abnormal data based on the initial data fusion result set; constructing a sample data set based on all the attributes contained in the initial data fusion result set and the normal data corresponding to each attribute; calculating an association coefficient between the sample data set and the data corresponding to the abnormal attribute in the sample data set; calculating a first fitting value of the abnormal data according to the initial data fusion result set and the association coefficient; and replacing the abnormal data in the initial data fusion result set with the first fitting value of the abnormal data. Because the first fitting value is obtained by predicting the abnormal data, replacing the abnormal data with the predicted first fitting value makes the data of the updated data fusion result set more accurate.

Description

Power distribution network data resource fusion method, device, equipment and storage medium
Technical Field
The present invention relates to the field of resource fusion technologies, and in particular, to a method, an apparatus, a device, and a storage medium for power distribution network data resource fusion.
Background
At present, increasingly abundant data resources are being integrated into the service systems and platforms of the power distribution network, and the accelerated formation of the novel power distribution network intensifies this trend. The degree to which data resources are shared and fused determines how far the services and management of the power distribution network can advance. With the proposal of the international flow network strategy, more urgent and higher demands are placed on data fusion and sharing.
Data fusion in the power distribution network has its own particularities. Owing to the construction and implementation stages of service systems during informatization, their technical performance, economic factors, human factors and the like, each power company shows obvious "information island" characteristics. First, the data are heterogeneous: the database systems adopted by the service systems differ, information coding, technical specifications and data formats are not unified, and data resources are difficult to convert and fuse. Second, a unified standard system is lacking: the power industry has not yet adopted a unified standard system for informatization, so the information systems are poorly integrated and interconnected, and services cannot be developed cooperatively. As a result, data anomalies easily occur when power distribution network data resources are fused, which makes the fusion result inaccurate.
Disclosure of Invention
In view of the above, the embodiment of the invention provides a method, a device, equipment and a storage medium for fusing power distribution network data resources, so as to solve the technical problem that the fusion result is inaccurate when the power distribution network data are fused with the resources.
The technical scheme provided by the invention is as follows:
the first aspect of the embodiment of the invention provides a method for fusing power distribution network data resources, which comprises the following steps: carrying out resource fusion on the data of at least two data sets to be fused to obtain an initial data fusion result set; acquiring abnormal data and abnormal attributes corresponding to the abnormal data based on an initial data fusion result set; constructing a sample data set based on all the attributes contained in the initial data fusion result set and the normal data corresponding to each attribute; calculating a correlation coefficient between data corresponding to the abnormal attribute in the sample data set and the sample data set according to the sample data set and the abnormal attribute; calculating a first fitting value of the abnormal data according to the initial data fusion result set and the association coefficient; and replacing the abnormal data in the initial data fusion result set with the first fitting value of the abnormal data to obtain an updated data fusion result set.
Optionally, the calculating, according to the sample data set and the abnormal attribute, a correlation coefficient between data corresponding to the abnormal attribute in the sample data set and the sample data set includes: and inputting a sample data set and abnormal attributes into a multiple linear regression model, acquiring linear correlation coefficients between data corresponding to the abnormal attributes in the sample data set and the sample data set through the multiple linear regression model, and taking the linear correlation coefficients as correlation coefficients.
Optionally, the obtaining, by using a multiple linear regression model, a linear correlation coefficient between the data corresponding to the abnormal attribute in the sample data set and the sample data set includes: calculating a second fitting value of data corresponding to the abnormal attribute in the sample data set according to a preset linear coefficient and the sample data set; extracting a true value of data corresponding to the abnormal attribute from the sample data set; and taking the preset linear coefficient corresponding to the situation that the sum of the variances between the second fitting value and the true value of the data corresponding to the abnormal attribute in the sample data set is minimum as the linear correlation coefficient.
Optionally, the taking the preset linear coefficient corresponding to the minimum sum of the second fitting value of the data corresponding to the abnormal attribute and the variance of the true value in the sample data set as the linear correlation coefficient includes: constructing an expression of the sum of the second fitting value of the data corresponding to the abnormal attribute in the sample data set and the variance of the true value; and deriving the expression, taking the preset linear coefficient corresponding to zero derivative as the preset linear coefficient corresponding to the minimum sum of variances, and taking the preset linear coefficient corresponding to the minimum sum of variances as a linear correlation coefficient.
Optionally, acquiring the anomaly data based on the initial set of data fusion results includes: and sequentially judging whether each data in the initial data fusion result set belongs to a set abnormal threshold range, and if so, considering the corresponding data as abnormal data.
Optionally, before calculating the correlation coefficient between the data corresponding to the abnormal attribute in the sample data set and the sample data set according to the sample data set and the abnormal attribute, the method further includes: judging whether the number of data groups in the sample data set is smaller than the number of attributes in the initial data fusion result set; if the number of the data groups in the sample data set is smaller than the number of the attributes in the initial data fusion result set, reconstructing the sample data set until the number of the data groups in the sample data set is larger than or equal to the number of the attributes in the initial data fusion result set.
Optionally, calculating a first fitting value of the abnormal data according to the initial data fusion result set and the association coefficient includes: and multiplying the initial data fusion result set by the association coefficient to obtain a first fitting value of the abnormal data.
A second aspect of an embodiment of the present invention provides a power distribution network data resource fusion device, including: the initial fusion module is used for carrying out resource fusion on the data of at least two data sets to be fused to obtain an initial data fusion result set; the abnormal data acquisition module is used for acquiring abnormal data and abnormal attributes corresponding to the abnormal data based on the initial data fusion result set; the sample construction module is used for constructing a sample data set according to the complete historical data fusion result set; the correlation coefficient calculation module is used for calculating the correlation coefficient between the data corresponding to the abnormal attribute in the sample data set and the sample data set according to the sample data set and the abnormal attribute; the fitting value calculation module is used for calculating a first fitting value of the abnormal data according to the initial data fusion result set and the association coefficient; and the updating module is used for replacing the abnormal data in the initial data fusion result set with the first fitting value of the abnormal data to obtain an updated data fusion result set.
Optionally, the association coefficient calculating module includes: the input module is used for inputting the sample data set and the abnormal attribute into the multiple linear regression model; and the linear regression module is used for acquiring a linear correlation coefficient between the data corresponding to the abnormal attribute in the sample data set and the sample data set through a multiple linear regression model, and taking the linear correlation coefficient as a correlation coefficient.
Optionally, the linear regression module includes: the first acquisition module is used for calculating a second fitting value of data corresponding to the abnormal attribute in the sample data set according to a preset linear coefficient and the sample data set; the extraction module is used for extracting the true value of the data corresponding to the abnormal attribute from the sample data set; and the second acquisition module is used for taking the preset linear coefficient corresponding to the situation that the sum of the variance between the second fitting value and the true value of the data corresponding to the abnormal attribute in the sample data set is minimum as the linear correlation coefficient.
Optionally, the second obtaining module includes: a construction module, configured to construct an expression of a sum of a second fitting value of data corresponding to the abnormal attribute in the sample data set and a variance of the true value; and the derivation module is used for deriving the expression, taking the preset linear coefficient corresponding to zero derivative as the preset linear coefficient corresponding to the minimum sum of variances, and taking the preset linear coefficient corresponding to the minimum sum of variances as the linear correlation coefficient.
Optionally, the abnormal data acquisition module includes: the first judging module is used for judging whether each data in the initial data fusion result set belongs to a set abnormal threshold range in sequence, and if the data belong to the set abnormal threshold range, the corresponding data are considered to be abnormal data.
Optionally, the power distribution network data resource fusion device further includes: and the second judging module is used for judging whether the number of the data groups in the sample data set is smaller than the number of the attributes in the initial data fusion result set, and reconstructing the sample data set until the number of the data groups in the sample data set is larger than or equal to the number of the attributes in the initial data fusion result set if the number of the data groups in the sample data set is smaller than the number of the attributes in the initial data fusion result set.
Optionally, the fitting value calculating module includes: and the multiplication module is used for multiplying the initial data fusion result set and the association coefficient to obtain a first fitting value of the abnormal data.
A third aspect of an embodiment of the present invention provides an electronic device, including: the system comprises a memory and a processor, wherein the memory and the processor are in communication connection, the memory stores computer instructions, and the processor executes the computer instructions so as to execute the power distribution network data resource fusion method according to any one of the first aspect of the embodiment of the invention.
A fourth aspect of the embodiment of the present invention provides a computer readable storage medium, where computer instructions are stored, where the computer instructions are configured to cause a computer to perform the method for fusion of data resources of a power distribution network according to any one of the first aspect of the embodiment of the present invention.
From the above technical solutions, the embodiment of the present invention has the following advantages:
According to the method, device, equipment and storage medium for fusing power distribution network data resources provided by the embodiment of the invention, resource fusion is carried out on the data of at least two data sets to be fused to obtain an initial data fusion result set; abnormal data and the abnormal attributes corresponding to the abnormal data are acquired based on the initial data fusion result set; a sample data set is constructed based on all the attributes contained in the initial data fusion result set and the normal data corresponding to each attribute; an association coefficient between the sample data set and the data corresponding to the abnormal attribute in the sample data set is calculated according to the sample data set and the abnormal attribute; a first fitting value of the abnormal data is calculated according to the initial data fusion result set and the association coefficient; and the abnormal data in the initial data fusion result set is replaced with the first fitting value of the abnormal data to obtain an updated data fusion result set. The first fitting value is obtained by predicting the abnormal data using the association coefficient among the data, and it is closer to the correct data than the abnormal data is. Replacing the abnormal data with the predicted first fitting value therefore makes the data of the updated data fusion result set more accurate.
Drawings
In order to more clearly illustrate the technical solutions of the embodiments of the present invention, the drawings that are needed for the description of the embodiments will be briefly described below, it being obvious that the drawings in the following description are only some embodiments of the present invention, and that other drawings may be obtained according to these drawings without inventive effort for a person skilled in the art.
Fig. 1 is a flowchart of a method for fusing data resources of a power distribution network in an embodiment of the present invention;
Fig. 2 is a schematic diagram of a process for implementing data resource fusion in an embodiment of the present invention;
Fig. 3 is a schematic structural diagram of a data resource fusion device of a power distribution network in an embodiment of the present invention;
Fig. 4 is a schematic structural diagram of an electronic device according to an embodiment of the present invention;
Fig. 5 is a schematic diagram of a computer-readable storage medium according to an embodiment of the present invention.
Detailed Description
In order to make the present invention better understood by those skilled in the art, the following description will clearly and completely describe the technical solutions in the embodiments of the present invention with reference to the accompanying drawings, and it is apparent that the described embodiments are only some embodiments of the present invention, not all embodiments. All other embodiments, which can be made by those skilled in the art based on the embodiments of the invention without making any inventive effort, are intended to be within the scope of the invention.
The embodiment of the invention provides a method for fusing data resources of a power distribution network, as shown in fig. 1, comprising the following steps:
Step S100: Carry out resource fusion on the data of at least two data sets to be fused to obtain an initial data fusion result set. Specifically, the data resources of different service systems serve as the different data sets to be fused, and the number of data sets to be fused may be 2, 3, 4, and so on. Take the case of 2 data sets, in which the measurement data of new energy equipment in the novel power distribution network is fused with the equipment asset data, as an example. The measurement data comprises the electricity utilization data of the equipment, mainly the voltage, current and other user electricity data collected from the equipment, and the equipment asset data comprises the basic file information of the equipment. The measurement data set and the equipment asset data set are independent of each other and have the characteristic of strong linear correlation. Assuming that the measurement data set has j attributes and the equipment asset data set has k attributes, as shown in Table 1, define the measurement data set as [Mea_1, Mea_2, ..., Mea_j] and the equipment asset data set as [Ea_1, Ea_2, ..., Ea_k], and associate the two data sets by device number.
Table 1 measurement data and equipment asset data
[Table 1 appears as images in the original publication (BDA0004152497230000061 and BDA0004152497230000071) and is not reproduced here.]
After fusing the measurement data set and the equipment asset data set, an initial data fusion result set is obtained:
[Mea_1, Mea_2, ..., Mea_{i-1}, ..., Mea_{i+1}, Mea_j, Ea_1, Ea_2, ..., Ea_k]
For ease of presentation, the initial data fusion result set is redefined, as shown in Table 2, as P_initial = [P_1, P_2, ..., P_m].
Table 2 Redefined data columns
P_1 P_2 ... Q ... P_{j-1} P_j P_{j+1} ... P_m
3.15 35 ... ... 20 102 1 ... 0.979
1.01 45 ... ... 10 104 2 ... 0.993
2.01 32 ... ... 13 112 3 ... 0.985
0.99 21 ... ... 15 103 1 ... 0.994
0.71 32 ... ... 27 113 2 ... 0.998
1.32 27 ... ... 21 99 3 ... 0.983
1.09 30 ... ... 18 112 2 ... 0.993
1.12 21 ... ... 12 101 3 ... 0.991
1.21 32 ... ... 11 99 2 ... 0.990
1.89 40 ... ... 23 100 2 ... 0.981
... ... ... ... ... ... ... ... ... ...
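The association by device number described in step S100 can be sketched as a simple keyed join. This is a minimal illustration, not the patent's implementation: the function name `fuse_by_device`, the key name `device_no` and the field names `voltage`, `current` and `rated_capacity` are all hypothetical, and unmatched measurement records are simply skipped here.

```python
def fuse_by_device(measurements, assets, key="device_no"):
    """Join measurement records with asset records that share a device number."""
    assets_by_key = {row[key]: row for row in assets}
    fused = []
    for measurement in measurements:
        asset = assets_by_key.get(measurement[key])
        if asset is None:
            continue  # assumption: records without a matching asset are skipped
        merged = dict(measurement)
        # Append the asset attributes after the measurement attributes.
        merged.update({k: v for k, v in asset.items() if k != key})
        fused.append(merged)
    return fused


# Hypothetical records standing in for the two data sets to be fused.
measurements = [
    {"device_no": "D001", "voltage": 220.1, "current": 35.0},
    {"device_no": "D002", "voltage": 219.8, "current": 45.0},
]
assets = [
    {"device_no": "D001", "rated_capacity": 102},
    {"device_no": "D002", "rated_capacity": 104},
]
fused = fuse_by_device(measurements, assets)
```

Each fused row then carries the measurement attributes followed by the asset attributes, which corresponds to one row of the initial data fusion result set.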
Step S200: Acquire the abnormal data and the abnormal attribute corresponding to the abnormal data based on the initial data fusion result set. Specifically, abnormal data includes missing data and values that deviate too far from the normal value, so each item of data in the initial data fusion result can be checked against these conditions to judge whether it is abnormal and to find the abnormal data and the corresponding abnormal attribute. For example, the data in the column corresponding to Mea_i in Table 1 is shown as 99, whereas the normal value floats between 20 and 50, so the column corresponding to Mea_i is defined as the abnormal set. As shown in Table 2, after redefinition the abnormal data in P_i is recorded as the abnormal set Q, where i denotes the serial number of the abnormal set Q in the data fusion result set. The abnormal attribute is a column name in the measurement data set or the equipment asset data set, such as current or voltage.
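The check in step S200 can be illustrated with a short sketch. It assumes, for illustration only, that each attribute has a known normal range (the 20 to 50 band from the example above) and treats any missing value or value outside that range as abnormal; the function name `find_anomalies` and the field names are hypothetical.

```python
def find_anomalies(rows, normal_ranges):
    """Return (row_index, attribute) pairs whose value is missing or out of range."""
    anomalies = []
    for index, row in enumerate(rows):
        for attribute, (low, high) in normal_ranges.items():
            value = row.get(attribute)
            if value is None or not (low <= value <= high):
                anomalies.append((index, attribute))
    return anomalies


rows = [
    {"current": 35, "voltage": 220.0},
    {"current": 99, "voltage": 221.0},    # 99 lies outside the normal 20-50 band
    {"current": None, "voltage": 219.5},  # missing value
]
normal_ranges = {"current": (20, 50)}  # assumed normal range per attribute

anomalies = find_anomalies(rows, normal_ranges)
abnormal_attributes = sorted({attribute for _, attribute in anomalies})
```

The returned pairs identify both the abnormal data (by row) and the abnormal attribute (by column name), which is the information the later steps rely on.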
Step S300: based on initial data fusion junctionAnd constructing a sample data set by all the attributes contained in the result set and the normal data corresponding to each attribute. Specifically, a sample dataset is defined as p= [ P ] 1 ,P 2 ,...,P m ]The sample data set comprises all the attributes in the initial data fusion result set, and the data corresponding to each attribute is normal data, namely, the sample data set is a data fusion result set without abnormal data after fusion, so that the data fusion result set without abnormal data can be obtained from the historical fusion data, and the sample data set is formed by the data fusion result set without abnormal data.
Step S400: Calculate the association coefficient between the sample data set and the data corresponding to the abnormal attribute in the sample data set, according to the sample data set and the abnormal attribute. Specifically, because the measurement data has a strong linear correlation, the data in the abnormal set Q would, if it were normal, have an association coefficient with the data in the sample data set. First, the abnormal attribute P_i corresponding to the abnormal set Q is obtained. Then the data whose attribute is P_i in the sample data set, that is, the data corresponding to the abnormal attribute in the sample data set, is obtained and recorded as the sample set Q_sample. The abnormal set Q and the sample set Q_sample are both sets of data corresponding to the attribute P_i. They differ in that the data in the abnormal set Q is the data in the initial data fusion result set that was identified as abnormal in step S200, whereas the data in the sample set Q_sample is the normal data corresponding to the attribute P_i in the sample data set. The association coefficient is obtained by analyzing the association relation between the data in the sample set Q_sample and the data in the sample data set P.
Step S500: Calculate a first fitting value of the abnormal data according to the initial data fusion result set and the association coefficient. Specifically, the association coefficient between the initial data fusion result set and the normal data corresponding to the abnormal set Q is the same as the association coefficient calculated in step S400, so the predicted value of the normal data corresponding to the abnormal set Q, that is, the first fitting value, can be calculated from the initial data fusion result set and the association coefficient.
Step S600: Replace the abnormal data in the initial data fusion result set with the first fitting value of the abnormal data to obtain an updated data fusion result set. Specifically, after the updated data fusion result set is obtained, it is stored in a data fusion library. The first fitting value is the value predicted for the normal data corresponding to the abnormal set Q from the association coefficient and the initial data fusion result set. Because it is re-predicted from the association relation among the data, it reflects the true value of the data better than the abnormal data produced in the fusion process.
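Steps S500 and S600 can be sketched as a weighted sum followed by an in-place substitution on a copy of the data. This is an illustrative sketch only: the coefficient values, the intercept and the attribute names are made up, and it assumes the other attributes in the same row are themselves normal.

```python
def first_fitting_value(row, coefficients, intercept, predictors):
    """Weighted sum of the row's normal attributes plus a constant term."""
    return intercept + sum(coefficients[name] * row[name] for name in predictors)


def replace_anomalies(rows, anomalies, coefficients, intercept, predictors):
    """Return a copy of rows with each abnormal cell replaced by its fitted value."""
    updated = [dict(row) for row in rows]
    for index, attribute in anomalies:
        updated[index][attribute] = first_fitting_value(
            updated[index], coefficients, intercept, predictors
        )
    return updated


# Hypothetical association coefficients for predicting "q" from "p1" and "p2".
coefficients = {"p1": 2.0, "p2": 0.5}
intercept = 1.0
predictors = ["p1", "p2"]

rows = [
    {"p1": 3.0, "p2": 40, "q": 99},  # "q" was flagged as abnormal
    {"p1": 1.0, "p2": 30, "q": 18},
]
anomalies = [(0, "q")]
updated = replace_anomalies(rows, anomalies, coefficients, intercept, predictors)
```

Working on a copy keeps the initial data fusion result set available for auditing while the updated set is what would be stored in the data fusion library.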
According to the power distribution network data resource fusion method provided by the embodiment of the invention, resource fusion is carried out on the data of at least two data sets to be fused to obtain an initial data fusion result set; abnormal data and the abnormal attributes corresponding to the abnormal data are acquired based on the initial data fusion result set; a sample data set is constructed based on all the attributes contained in the initial data fusion result set and the normal data corresponding to each attribute; an association coefficient between the sample data set and the data corresponding to the abnormal attribute in the sample data set is calculated according to the sample data set and the abnormal attribute; a first fitting value of the abnormal data is calculated according to the initial data fusion result set and the association coefficient; and the abnormal data in the initial data fusion result set is replaced with the first fitting value of the abnormal data to obtain an updated data fusion result set. The first fitting value is obtained by predicting the abnormal data using the association coefficient among the data, and it is closer to the correct data than the abnormal data is. Replacing the abnormal data with the predicted first fitting value therefore makes the data of the updated data fusion result set more accurate.
In one embodiment, step S400, calculating, according to the sample data set and the abnormal attribute, a correlation coefficient between data corresponding to the abnormal attribute in the sample data set and the sample data set, includes:
step S410: inputting the sample dataset and the anomaly property into a multiple linear regression model;
step S420: and acquiring a linear correlation coefficient between the data corresponding to the abnormal attribute in the sample data set and the sample data set through a multiple linear regression model, and taking the linear correlation coefficient as a correlation coefficient.
Some of the data in power distribution network data resources, such as measurement data, have a strong linear correlation, and the abnormal data is analyzed on that basis. Specifically, as shown in fig. 2, the sample data set and the data corresponding to the abnormal attribute in the sample data set are analyzed by the multiple linear regression model, so that data association characteristics are found in the heterogeneous data resources and the association coefficient of the data, namely the linear correlation coefficient, is obtained. This meets the data fusion requirement, provides technical support for fusing and sharing power distribution network data resources, and supports raising the data value and deepening the service applications of the power distribution network.
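A multiple linear regression fit of this kind can be sketched in plain Python by solving the least-squares normal equations. This is one common way to fit such a model, offered here as an assumption rather than as the patent's own procedure; the function names and the sample values are hypothetical.

```python
def solve_linear_system(matrix, rhs):
    """Solve matrix * x = rhs by Gaussian elimination with partial pivoting."""
    n = len(rhs)
    aug = [list(matrix[i]) + [rhs[i]] for i in range(n)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("singular system: too few or collinear sample groups")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, n):
            factor = aug[r][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[r][c] -= factor * aug[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        tail = sum(aug[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (aug[i][n] - tail) / aug[i][i]
    return x


def fit_linear_coefficients(samples, targets):
    """Least-squares fit of targets ~ w0 + w1*x1 + ...; returns [w0, w1, ...]."""
    design = [[1.0] + list(row) for row in samples]  # leading 1 for the constant
    k = len(design[0])
    xtx = [[sum(r[i] * r[j] for r in design) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * t for r, t in zip(design, targets)) for i in range(k)]
    return solve_linear_system(xtx, xty)


# Hypothetical sample groups: predictor attributes and the abnormal attribute's
# normal values, generated here from known coefficients 1.0, 2.0 and 0.5.
samples = [(1.0, 2.0), (2.0, 1.0), (3.0, 4.0), (4.0, 3.0)]
targets = [1.0 + 2.0 * x1 + 0.5 * x2 for x1, x2 in samples]
weights = fit_linear_coefficients(samples, targets)
```

The singular-system error is the practical reason the sample data set needs enough non-collinear data groups: with too few groups the coefficients are not uniquely determined.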
In one embodiment, step S420, obtaining, by using a multiple linear regression model, a linear correlation coefficient between data corresponding to an abnormal attribute in a sample data set and the sample data set includes:
step S421: calculating a second fitting value of data corresponding to the abnormal attribute in the sample data set according to the preset linear coefficient and the sample data set;
step S422: extracting a true value of data corresponding to the abnormal attribute from the sample data set;
step S423: and taking a preset linear coefficient corresponding to the condition that the sum of the variances between the second fitting value and the true value of the data corresponding to the abnormal attribute in the sample data set is minimum as a linear correlation coefficient.
Specifically, there is a linear relationship between the sample data set and the data corresponding to the abnormal attribute in the sample data set. The linear relationship between the sample set $Q_{\text{sample}}$, formed by the data corresponding to the abnormal attribute in the sample data set, and the sample data set $P$ is expressed as $Q = WP$, that is,

$$Q = W_1 P_1 + W_2 P_2 + W_3 P_3 + \dots + W_m P_m + b,$$

where the bias $b$ is written as $W_0$ (its input is the constant 1) and $W$ is the preset linear coefficient. Assuming that the sample data set $P$ has $n$ groups of data, the correspondence is:

$$Q_i = W_0 + W_1 P_{i1} + W_2 P_{i2} + \dots + W_m P_{im}, \quad i = 1, 2, \dots, n \tag{1}$$

Written as a matrix, the relationship among $Q$, $W$ and $P$ is:

$$\begin{bmatrix} Q_1 \\ Q_2 \\ \vdots \\ Q_n \end{bmatrix} = \begin{bmatrix} 1 & P_{11} & P_{12} & \cdots & P_{1m} \\ 1 & P_{21} & P_{22} & \cdots & P_{2m} \\ \vdots & \vdots & \vdots & \ddots & \vdots \\ 1 & P_{n1} & P_{n2} & \cdots & P_{nm} \end{bmatrix} \begin{bmatrix} W_0 \\ W_1 \\ \vdots \\ W_m \end{bmatrix} \tag{2}$$

Substituting the data in the sample data set $P$ into formula (2) gives the second fitting value $\hat{Q}$. The sum of variances $\beta$ between the second fitting value $\hat{Q}$ and the true values of the sample set $Q_{\text{sample}}$ corresponding to the abnormal data is then calculated. When $\beta$ is minimal, the corresponding vector $W$ is the optimal preset linear coefficient, and this optimal preset linear coefficient is taken as the final linear correlation coefficient.

Specifically, when calculating the second fitting value, $W$ is first initialized, that is, given an initial value. $W$ and the sample data set $P$ are substituted into formula (2) to calculate the second fitting value $\hat{Q}$, and the sum of variances between the second fitting value $\hat{Q}$ and the true values is calculated and stored. $W$ is then updated, that is, reassigned, and the above steps are repeated with the updated $W$ to obtain a new sum of variances. The number of repetitions is set as required. All the sums of variances are compared to find the minimum, and the $W$ corresponding to the minimum sum of variances is taken as the final linear correlation coefficient.
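The repeated-assignment procedure described above can be sketched in Python as follows. This is only an illustration under stated assumptions: the description does not say how W is reassigned, so the random perturbation around the best W found so far, the function names, and the default `repetitions`, `step` and `seed` values are all hypothetical.

```python
import random


def fitted_values(weights, rows):
    """Second fitting value for every row: W0 + W1*P1 + ... + Wm*Pm."""
    return [
        weights[0] + sum(w * p for w, p in zip(weights[1:], row))
        for row in rows
    ]


def sum_of_variances(weights, rows, true_values):
    """Sum of squared differences between true values and fitted values."""
    fits = fitted_values(weights, rows)
    return sum((q - q_hat) ** 2 for q, q_hat in zip(true_values, fits))


def search_coefficients(rows, true_values, repetitions=2000, step=0.5, seed=0):
    """Try many candidate W vectors and keep the one with the smallest sum."""
    rng = random.Random(seed)
    best_w = [0.0] * (len(rows[0]) + 1)  # initialise W (assumed start: zeros)
    best_beta = sum_of_variances(best_w, rows, true_values)
    for _ in range(repetitions):
        # Reassign W: an assumed update rule, not taken from the description.
        candidate = [w + rng.uniform(-step, step) for w in best_w]
        beta = sum_of_variances(candidate, rows, true_values)
        if beta < best_beta:  # keep the minimum sum of variances seen so far
            best_w, best_beta = candidate, beta
    return best_w, best_beta
```

Keeping a running minimum is equivalent to storing every sum of variances and comparing them at the end, and it avoids holding all intermediate results in memory.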
In an embodiment, step S423, taking as the linear correlation coefficient the preset linear coefficient for which the sum of variances between the second fitting value and the true value of the data corresponding to the abnormal attribute in the sample data set is minimal, includes:
step S4231: constructing an expression for the sum of variances between the second fitting value and the true value of the data corresponding to the abnormal attribute in the sample data set;
step S4232: differentiating the expression, taking the preset linear coefficient at which the derivative is zero as the preset linear coefficient corresponding to the minimum sum of variances, and taking the preset linear coefficient corresponding to the minimum sum of variances as the linear correlation coefficient.
Specifically, an expression for the sum of variances is constructed, where $\hat{Q} = PW$ is the second fitting value:

$$\beta = \sum_{i=1}^{n} \left(Q_i - \hat{Q}_i\right)^2 = (Q - PW)^{T}(Q - PW)$$

To obtain the vector $W$ corresponding to the minimum value of $\beta$, $\beta$ is first differentiated with respect to $W$:

$$\frac{\partial \beta}{\partial W} = 2P^{T}PW - 2P^{T}Q$$

When the derivative is 0, $\beta$ is minimal and the corresponding $W$ is the required linear correlation coefficient. Illustratively, setting the derivative

$$\frac{\partial \beta}{\partial W} = 0$$

and solving gives $2P^{T}PW = 2P^{T}Q$, and therefore $W = (P^{T}P)^{-1}P^{T}Q$.
Taking the linear coefficient at which the derivative is zero as the linear coefficient that minimizes the sum of variances yields the linear correlation coefficient directly, which reduces the cost of calculating it.
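As an illustration, the closed-form solution W = (P^T P)^-1 P^T Q can be computed with NumPy as sketched below. The function name and the leading column of ones (which carries the bias W0) are assumptions made for the example, and the sketch presumes that P^T P is invertible.

```python
import numpy as np


def fit_linear_coefficients(P, Q):
    """Solve the normal equations P^T P W = P^T Q for W = [W0, W1, ..., Wm]."""
    P = np.asarray(P, dtype=float)
    Q = np.asarray(Q, dtype=float)
    # Prepend a column of ones so that W0 plays the role of the bias b.
    design = np.column_stack([np.ones(len(P)), P])
    # Solving the linear system avoids forming the explicit inverse.
    return np.linalg.solve(design.T @ design, design.T @ Q)
```

When P^T P is ill-conditioned or singular, `np.linalg.lstsq(design, Q, rcond=None)` is a more robust way to obtain a least-squares solution.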
In one embodiment, step S200, obtaining abnormal data based on the initial data fusion result set includes:
and sequentially judging whether each data in the initial data fusion result set belongs to a set abnormal threshold range, and if so, considering the corresponding data as abnormal data.
Specifically, part of the data in the initial data fusion result set may be abnormal; abnormal data includes missing data, data whose value deviates too far from the normal value, and the like. Whether a datum is abnormal can therefore be determined by checking each datum in the initial data fusion result set, which identifies the abnormal data and the corresponding abnormal attributes. Missing data is replaced in the initial data fusion result set with a specified value or character that marks the data as missing, for example the string null or the value 999, and the abnormal threshold range includes this specified value or character. Illustratively, with an abnormal threshold range of greater than 100 or equal to null, a datum that falls within this range is treated as abnormal.
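A minimal sketch of this check is given below, using the illustrative range from the text (greater than 100, or equal to the missing-data marker). The record layout as a list of dictionaries, the function names, and the attribute names in the usage example are assumptions.

```python
MISSING_MARKERS = {"null", 999}  # markers for missing data, as in the example


def is_abnormal(value, upper_limit=100):
    """Return True if the value lies in the configured abnormal range."""
    if value is None or value in MISSING_MARKERS:
        return True
    return isinstance(value, (int, float)) and value > upper_limit


def find_abnormal(records):
    """Check every datum in turn; return (row index, attribute) pairs."""
    return [
        (row_index, attribute)
        for row_index, record in enumerate(records)
        for attribute, value in record.items()
        if is_abnormal(value)
    ]
```

For example, `find_abnormal([{"voltage": "null", "current": 36.1}])` reports the `voltage` entry of row 0 as abnormal.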
In one embodiment, step S500, calculating a first fitting value of the abnormal data according to the initial data fusion result set and the association coefficient, includes:
and multiplying the initial data fusion result set by the association coefficient to obtain a first fitting value of the abnormal data.
Specifically, the initial data fusion result set and the association coefficient are substituted into the linear relation Q = WP; that is, the initial data fusion result set is multiplied by the association coefficient to obtain the corresponding first fitting value. During this multiplication, the value of the abnormal data in the initial data fusion result set is replaced with a set average value, so that an excessive deviation of the abnormal data does not distort the first fitting value.
After the first fitting value is calculated, it must be verified manually, and the next step is carried out only after the verification passes, which prevents an abnormal first fitting value from being used.
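The multiplication step can be sketched as follows. The layout is an assumption made for illustration: `record` holds the input values P1..Pm of one row of the initial data fusion result set, `weights` holds [W0, W1, ..., Wm], `abnormal_positions` lists the inputs that were flagged as abnormal, and `set_averages` holds the preset average value for each input. Manual verification of the result is outside the scope of the sketch.

```python
def first_fitting_value(record, weights, abnormal_positions, set_averages):
    """Compute W0 + W1*P1 + ... + Wm*Pm for one record.

    Inputs flagged as abnormal are replaced with their set average value
    before the multiplication, so that they do not distort the result.
    """
    inputs = [
        set_averages[i] if i in abnormal_positions else value
        for i, value in enumerate(record)
    ]
    return weights[0] + sum(w * p for w, p in zip(weights[1:], inputs))
```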
In an embodiment, before calculating the association coefficient between the data corresponding to the abnormal attribute in the sample data set and the sample data set according to the sample data set and the abnormal attribute, the power distribution network data resource fusion method further includes:
judging whether the number of data groups in the sample data set is smaller than the number of attributes in the initial data fusion result set;
if the number of the data groups in the sample data set is smaller than the number of the attributes in the initial data fusion result set, reconstructing the sample data set until the number of the data groups in the sample data set is larger than or equal to the number of the attributes in the initial data fusion result set.
Specifically, the number of data groups is the number of data records, and one group of data is one row of data in Table 1 or Table 2. Illustratively, if the number n of data groups in the sample data set is smaller than the number m of data attributes, the sample data set is reconstructed; if n is greater than or equal to m, the association coefficient between the sample data set and the data corresponding to the abnormal attribute in the sample data set is calculated from the sample data set and the abnormal attribute. When n is smaller than m, there are too few data groups for formula (2) to be solved, so the number n of data groups in the sample data set must be greater than or equal to the number m of data attributes before the association coefficient can be calculated with formula (2).
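A minimal sketch of this check follows. How the sample data set is reconstructed is not specified in the text, so drawing additional rows from a pool of historical rows is an assumption, as are the function names and the `initial_size` parameter.

```python
def has_enough_groups(sample_rows, num_attributes):
    """True when the number of data groups n is at least the attribute count m."""
    return len(sample_rows) >= num_attributes


def build_sample(history_rows, num_attributes, initial_size):
    """Reconstruct the sample until it holds at least num_attributes rows."""
    size = initial_size
    sample = history_rows[:size]
    while not has_enough_groups(sample, num_attributes):
        if size >= len(history_rows):
            raise ValueError("not enough historical rows to build the sample")
        size += 1  # assumed reconstruction rule: take one more historical row
        sample = history_rows[:size]
    return sample
```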
An embodiment of the invention also provides a power distribution network data resource fusion device which, as shown in FIG. 3, includes:
the initial fusion module 301, configured to perform resource fusion on the data of at least two data sets to be fused to obtain an initial data fusion result set; for details, refer to the corresponding parts of the above method embodiments, which are not repeated here.
The abnormal data acquisition module 302, configured to acquire abnormal data and the abnormal attributes corresponding to the abnormal data based on the initial data fusion result set; for details, refer to the corresponding parts of the above method embodiments, which are not repeated here.
The sample construction module 303, configured to construct a sample data set according to the complete historical data fusion result set; for details, refer to the corresponding parts of the above method embodiments, which are not repeated here.
The correlation coefficient calculation module 304, configured to calculate, from the sample data set and the abnormal attribute, the correlation coefficient between the sample data set and the data corresponding to the abnormal attribute in the sample data set; for details, refer to the corresponding parts of the above method embodiments, which are not repeated here.
The fitting value calculation module 305, configured to calculate a first fitting value of the abnormal data according to the initial data fusion result set and the association coefficient; for details, refer to the corresponding parts of the above method embodiments, which are not repeated here.
The updating module 306, configured to replace the abnormal data in the initial data fusion result set with the first fitting value of the abnormal data to obtain an updated data fusion result set; for details, refer to the corresponding parts of the above method embodiments, which are not repeated here.
The invention provides a power distribution network data resource fusion device that: performs resource fusion on the data of at least two data sets to be fused to obtain an initial data fusion result set; acquires abnormal data and the abnormal attributes corresponding to the abnormal data based on the initial data fusion result set; constructs a sample data set based on all the attributes contained in the initial data fusion result set and the normal data corresponding to each attribute; calculates, from the sample data set and the abnormal attribute, the correlation coefficient between the sample data set and the data corresponding to the abnormal attribute in the sample data set; calculates a first fitting value of the abnormal data according to the initial data fusion result set and the association coefficient; and replaces the abnormal data in the initial data fusion result set with the first fitting value of the abnormal data to obtain an updated data fusion result set. The abnormal data is thus predicted to obtain the first fitting value. Compared with the abnormal data, the first fitting value predicted with the association coefficient among the data is closer to the correct data, so replacing the abnormal data with the predicted first fitting value makes the data of the updated data fusion result set more accurate.
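The flow through the six modules can be sketched end to end as below. This is an illustration under several assumptions rather than the claimed implementation: the data sets are dictionaries keyed by a shared record identifier, missing values are stored as NaN, the abnormal range is "greater than 100", the sample consists of the fully normal rows of a historical array, and each abnormal attribute is regressed on all the other attributes. All names are hypothetical.

```python
import numpy as np

ABNORMAL_LIMIT = 100.0  # assumed abnormal threshold


def fuse(set_a, set_b):
    """Initial fusion: join two data sets on their shared record keys."""
    keys = sorted(set_a.keys() & set_b.keys())
    return np.array([list(set_a[k]) + list(set_b[k]) for k in keys], dtype=float)


def abnormal_mask(data):
    """Mark missing values (NaN) and values above the abnormal limit."""
    return np.isnan(data) | (np.nan_to_num(data, nan=0.0) > ABNORMAL_LIMIT)


def repair(fused, history):
    """Replace each abnormal datum with its first fitting value."""
    fused = fused.copy()
    mask = abnormal_mask(fused)
    # Sample construction: keep only rows without any abnormal value.
    sample = history[~abnormal_mask(history).any(axis=1)]
    col_means = sample.mean(axis=0)  # plays the role of the set average value
    for row, col in zip(*np.nonzero(mask)):
        others = [c for c in range(fused.shape[1]) if c != col]
        design = np.column_stack([np.ones(len(sample)), sample[:, others]])
        # Linear correlation coefficients by least squares.
        weights, *_ = np.linalg.lstsq(design, sample[:, col], rcond=None)
        # Other abnormal inputs in the same row are replaced by the average.
        inputs = np.where(mask[row, others], col_means[others], fused[row, others])
        fused[row, col] = weights[0] + inputs @ weights[1:]
    return fused
```

A call such as `repair(fuse(set_a, set_b), history)` returns the updated data fusion result set; rows without abnormal data are returned unchanged.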
In one embodiment, the association coefficient calculation module 304 includes:
the input module is used for inputting the sample data set and the abnormal attribute into the multiple linear regression model;
and the linear regression module is used for acquiring the linear correlation coefficient between the data corresponding to the abnormal attribute in the sample data set and the sample data set through the multiple linear regression model, and taking the linear correlation coefficient as the correlation coefficient.
In one embodiment, the linear regression module includes:
the first acquisition module is used for calculating a second fitting value of data corresponding to the abnormal attribute in the sample data set according to a preset linear coefficient and the sample data set;
the extraction module is used for extracting the true value of the data corresponding to the abnormal attribute from the sample data set;
and the second acquisition module is used for taking the preset linear coefficient corresponding to the situation that the sum of the variance between the second fitting value and the true value of the data corresponding to the abnormal attribute in the sample data set is minimum as the linear correlation coefficient.
In an embodiment, the second acquisition module includes:
the construction module is used for constructing an expression of the sum of the variances of the second fitting value and the true value of the data corresponding to the abnormal attribute in the sample data set;
and the derivation module is used for differentiating the expression, taking the preset linear coefficient at which the derivative is zero as the preset linear coefficient corresponding to the minimum sum of variances, and taking the preset linear coefficient corresponding to the minimum sum of variances as the linear correlation coefficient.
In one embodiment, the anomalous data acquisition module 302 comprises:
the first judging module is used for judging whether each data in the initial data fusion result set belongs to a set abnormal threshold range in sequence, and if the data belong to the set abnormal threshold range, the corresponding data are considered to be abnormal data.
In an embodiment, the power distribution network data resource fusion device further includes:
and the second judging module is used for judging whether the number of the data groups in the sample data set is smaller than the number of the attributes in the initial data fusion result set, and reconstructing the sample data set until the number of the data groups in the sample data set is larger than or equal to the number of the attributes in the initial data fusion result set if the number of the data groups in the sample data set is smaller than the number of the attributes in the initial data fusion result set.
In one embodiment, the fitting value calculation module 305 includes:
and the multiplication module is used for multiplying the initial data fusion result set and the association coefficient to obtain a first fitting value of the abnormal data.
An embodiment of the invention also provides an electronic device which, as shown in FIG. 4, includes a memory 420 and a processor 410 that are communicatively connected to each other. The memory 420 stores computer instructions, and the processor 410 executes the computer instructions to perform the power distribution network data resource fusion method according to the above embodiments of the invention. The processor 410 and the memory 420 may be connected by a bus or by other means. The processor 410 may be a central processing unit (CPU). The processor 410 may also be another general-purpose processor, a digital signal processor (DSP), an application-specific integrated circuit (ASIC), a field-programmable gate array (FPGA) or other programmable logic device, a discrete gate or transistor logic device, a discrete hardware component, or a combination thereof. As a non-transitory computer storage medium, the memory 420 may be used to store non-transitory software programs, non-transitory computer-executable programs, and modules, such as the program instructions/modules corresponding to the embodiments of the invention. By running the non-transitory software programs, instructions, and modules stored in the memory 420, the processor 410 executes its various functional applications and data processing, that is, implements the power distribution network data resource fusion method of the above method embodiments. The memory 420 may include a program storage area and a data storage area; the program storage area may store an operating system and an application program required for at least one function, and the data storage area may store data created by the processor 410, etc.
In addition, the memory 420 may include high-speed random access memory, and may also include non-transitory memory, such as at least one magnetic disk storage device, flash memory device, or other non-transitory solid-state storage device. In some embodiments, the memory 420 may optionally include memory located remotely from the processor 410, and such remote memory may be connected to the processor 410 through a network. Examples of such networks include, but are not limited to, the internet, intranets, local area networks, mobile communication networks, and combinations thereof. One or more modules are stored in the memory 420 and, when executed by the processor 410, perform the power distribution network data resource fusion method of the above method embodiments. The specific details of the electronic device may be understood with reference to the related descriptions and effects in the foregoing method embodiments, which are not repeated here.
An embodiment of the invention further provides a computer-readable storage medium, as shown in FIG. 5, on which a computer program 510 is stored; when executed by a processor, the instructions implement the steps of the power distribution network data resource fusion method in the foregoing embodiments. The storage medium also stores audio and video stream data, characteristic frame data, interactive request signaling, encrypted data, preset data size, and the like. The storage medium may be a magnetic disk, an optical disk, a read-only memory (ROM), a random access memory (RAM), a flash memory, a hard disk drive (HDD), or a solid-state drive (SSD); the storage medium may also comprise a combination of memories of the above kinds. Those skilled in the art will appreciate that all or part of the above embodiment methods may be implemented by a computer program instructing the relevant hardware, and that the computer program may be stored in a computer-readable storage medium; when executed, the program may include the flows of the embodiment methods described above.
The above embodiments are only for illustrating the technical solution of the present invention, and not for limiting the same; although the invention has been described in detail with reference to the foregoing embodiments, it will be understood by those of ordinary skill in the art that: the technical scheme described in the foregoing embodiments can be modified or some technical features thereof can be replaced by equivalents; such modifications and substitutions do not depart from the spirit and scope of the technical solutions of the embodiments of the present invention.

Claims (16)

1. A power distribution network data resource fusion method, characterized by comprising the following steps:
carrying out resource fusion on the data of at least two data sets to be fused to obtain an initial data fusion result set;
acquiring abnormal data and abnormal attributes corresponding to the abnormal data based on an initial data fusion result set;
constructing a sample data set based on all the attributes contained in the initial data fusion result set and the normal data corresponding to each attribute;
calculating a correlation coefficient between data corresponding to the abnormal attribute in the sample data set and the sample data set according to the sample data set and the abnormal attribute;
calculating a first fitting value of the abnormal data according to the initial data fusion result set and the association coefficient;
and replacing the abnormal data in the initial data fusion result set with the first fitting value of the abnormal data to obtain an updated data fusion result set.
2. The method for fusing data resources of a power distribution network according to claim 1, wherein calculating the correlation coefficient between the data corresponding to the abnormal attribute in the sample data set and the sample data set according to the sample data set and the abnormal attribute comprises:
and inputting a sample data set and abnormal attributes into a multiple linear regression model, acquiring linear correlation coefficients between data corresponding to the abnormal attributes in the sample data set and the sample data set through the multiple linear regression model, and taking the linear correlation coefficients as correlation coefficients.
3. The method for fusing data resources of a power distribution network according to claim 2, wherein the obtaining, by a multiple linear regression model, a linear correlation coefficient between data corresponding to the abnormal attribute in the sample data set and the sample data set includes:
calculating a second fitting value of data corresponding to the abnormal attribute in the sample data set according to a preset linear coefficient and the sample data set;
extracting a true value of data corresponding to the abnormal attribute from the sample data set;
and taking the preset linear coefficient corresponding to the situation that the sum of the variances between the second fitting value and the true value of the data corresponding to the abnormal attribute in the sample data set is minimum as the linear correlation coefficient.
4. A method of power distribution network data resource fusion according to claim 3, wherein said taking the preset linear coefficient corresponding to the case where the sum of the second fitting value of the data corresponding to the abnormal attribute in the sample data set and the variance of the true value is minimum as the linear correlation coefficient includes:
constructing an expression of the sum of the second fitting value of the data corresponding to the abnormal attribute in the sample data set and the variance of the true value;
and differentiating the expression, taking the preset linear coefficient at which the derivative is zero as the preset linear coefficient corresponding to the minimum sum of variances, and taking the preset linear coefficient corresponding to the minimum sum of variances as the linear correlation coefficient.
5. The method of claim 1, wherein obtaining anomaly data based on an initial set of data fusion results comprises:
and sequentially judging whether each data in the initial data fusion result set belongs to a set abnormal threshold range, and if so, considering the corresponding data as abnormal data.
6. The power distribution network data resource fusion method according to claim 1, further comprising, before calculating correlation coefficients between data corresponding to the abnormal attribute in the sample data set and the sample data set from the sample data set and the abnormal attribute:
judging whether the number of data groups in the sample data set is smaller than the number of attributes in the initial data fusion result set;
if the number of the data groups in the sample data set is smaller than the number of the attributes in the initial data fusion result set, reconstructing the sample data set until the number of the data groups in the sample data set is larger than or equal to the number of the attributes in the initial data fusion result set.
7. The method of claim 1, wherein calculating a first fit value for the anomaly data based on the initial set of data fusion results and the correlation coefficient comprises:
and multiplying the initial data fusion result set by the association coefficient to obtain a first fitting value of the abnormal data.
8. A power distribution network data resource fusion device, comprising:
the initial fusion module is used for carrying out resource fusion on the data of at least two data sets to be fused to obtain an initial data fusion result set;
the abnormal data acquisition module is used for acquiring abnormal data and abnormal attributes corresponding to the abnormal data based on the initial data fusion result set;
the sample construction module is used for constructing a sample data set according to the complete historical data fusion result set;
the correlation coefficient calculation module is used for calculating the correlation coefficient between the data corresponding to the abnormal attribute in the sample data set and the sample data set according to the sample data set and the abnormal attribute;
the fitting value calculation module is used for calculating a first fitting value of the abnormal data according to the initial data fusion result set and the association coefficient;
and the updating module is used for replacing the abnormal data in the initial data fusion result set with the first fitting value of the abnormal data to obtain an updated data fusion result set.
9. The power distribution network data resource fusion device of claim 8, wherein the association coefficient calculation module comprises:
the input module is used for inputting the sample data set and the abnormal attribute into the multiple linear regression model;
and the linear regression module is used for acquiring the linear correlation coefficient between the data corresponding to the abnormal attribute in the sample data set and the sample data set through the multiple linear regression model, and taking the linear correlation coefficient as the correlation coefficient.
10. The power distribution network data resource fusion device of claim 9, wherein the linear regression module comprises:
the first acquisition module is used for calculating a second fitting value of data corresponding to the abnormal attribute in the sample data set according to a preset linear coefficient and the sample data set;
the extraction module is used for extracting the true value of the data corresponding to the abnormal attribute from the sample data set;
and the second acquisition module is used for taking the preset linear coefficient corresponding to the situation that the sum of the variance between the second fitting value and the true value of the data corresponding to the abnormal attribute in the sample data set is minimum as the linear correlation coefficient.
11. The power distribution network data resource fusion device of claim 10, wherein the second acquisition module comprises:
the construction module is used for constructing an expression of the sum of the variances of the second fitting value and the true value of the data corresponding to the abnormal attribute in the sample data set;
and the derivation module is used for differentiating the expression, taking the preset linear coefficient at which the derivative is zero as the preset linear coefficient corresponding to the minimum sum of variances, and taking the preset linear coefficient corresponding to the minimum sum of variances as the linear correlation coefficient.
12. The power distribution network data resource fusion device of claim 8, wherein the abnormal data acquisition module comprises:
the first judging module is used for judging whether each data in the initial data fusion result set belongs to a set abnormal threshold range in sequence, and if the data belong to the set abnormal threshold range, the corresponding data are considered to be abnormal data.
13. The power distribution network data resource fusion device of claim 8, further comprising:
and the second judging module is used for judging whether the number of the data groups in the sample data set is smaller than the number of the attributes in the initial data fusion result set, and reconstructing the sample data set until the number of the data groups in the sample data set is larger than or equal to the number of the attributes in the initial data fusion result set if the number of the data groups in the sample data set is smaller than the number of the attributes in the initial data fusion result set.
14. The power distribution network data resource fusion device of claim 8, wherein the fitting value calculation module comprises:
and the multiplication module is used for multiplying the initial data fusion result set and the association coefficient to obtain a first fitting value of the abnormal data.
15. An electronic device, comprising: a memory and a processor, the memory and the processor being communicatively connected to each other, the memory storing computer instructions, the processor executing the computer instructions to perform the power distribution network data resource fusion method of any one of claims 1 to 7.
16. A computer readable storage medium storing computer instructions for causing the computer to perform the power distribution network data resource fusion method according to any one of claims 1 to 7.
CN202310323146.9A 2023-03-29 2023-03-29 Power distribution network data resource fusion method, device, equipment and storage medium Pending CN116340883A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202310323146.9A CN116340883A (en) 2023-03-29 2023-03-29 Power distribution network data resource fusion method, device, equipment and storage medium

Publications (1)

Publication Number Publication Date
CN116340883A true CN116340883A (en) 2023-06-27

Family

ID=86882041

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202310323146.9A Pending CN116340883A (en) 2023-03-29 2023-03-29 Power distribution network data resource fusion method, device, equipment and storage medium

Country Status (1)

Country Link
CN (1) CN116340883A (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN117150438A (en) * 2023-10-31 2023-12-01 成都汉度科技有限公司 Communication data fusion method and system based on edge calculation
CN117150438B (en) * 2023-10-31 2024-02-06 成都汉度科技有限公司 Communication data fusion method and system based on edge calculation

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination