CN111080118A - Quality evaluation method and system for new energy grid-connected data - Google Patents

Quality evaluation method and system for new energy grid-connected data

Publication number
CN111080118A
Authority
CN
China
Prior art keywords
data
evaluated
sequence
quality
preset
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN201911271803.XA
Other languages
Chinese (zh)
Other versions
CN111080118B (en)
Inventor
张帆
李洋
聂松松
宣东海
吴桂栋
朱广新
张羽舒
王春梅
张松
陈翔
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Big Data Center Of State Grid Corp Of China
Original Assignee
Big Data Center Of State Grid Corp Of China
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Big Data Center Of State Grid Corp Of China
Priority to CN201911271803.XA
Publication of CN111080118A
Application granted
Publication of CN111080118B
Legal status: Active
Anticipated expiration

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q10/00 Administration; Management
    • G06Q10/06 Resources, workflows, human or project management; Enterprise or organisation planning; Enterprise or organisation modelling
    • G06Q10/063 Operations research, analysis or management
    • G06Q10/0639 Performance analysis of employees; Performance analysis of enterprise or organisation operations
    • G06Q10/06395 Quality analysis or management
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/21 Design, administration or maintenance of databases
    • G06F16/215 Improving data quality; Data cleansing, e.g. de-duplication, removing invalid entries or correcting typographical errors
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/24 Querying
    • G06F16/245 Query processing
    • G06F16/2458 Special types of queries, e.g. statistical queries, fuzzy queries or distributed queries
    • G06F16/2474 Sequence data queries, e.g. querying versioned data
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q50/00 Systems or methods specially adapted for specific business sectors, e.g. utilities or tourism
    • G06Q50/06 Electricity, gas or water supply

Abstract

The invention provides a quality evaluation method and an evaluation system for new energy grid-connected data. Using a vector matching method and a linear correlation method respectively, the quality evaluation method searches a reference sample sequence for reference data sequences that match the data sequence of preset length preceding the data to be evaluated, takes the quality mark of the data following each reference data sequence as a candidate quality mark of the data to be evaluated, and determines the candidate quality mark that occurs most often as the quality mark of the data to be evaluated. By evaluating quality on the basis of vector matching and linear correlation, the method enables real-time quality evaluation of different types of new energy grid-connected data, and the quality evaluation effect is good.

Description

Quality evaluation method and system for new energy grid-connected data
Technical Field
The invention belongs to the technical field of information technology and power transmission, and particularly relates to a quality evaluation method and an evaluation system for new energy grid-connected data.
Background
New energy grid connection is an important application of the energy Internet. Because new energy generation is green and environmentally friendly, it can greatly reduce exhaust emissions; it is gradually replacing petrochemical energy and is becoming the main mode of energy supply for future development.
The new energy power generation mainly comprises photovoltaic and wind power, and the power generation amount is limited by surrounding meteorological conditions such as illumination, wind speed and the like. Therefore, the new energy power generation has intermittence and instability, which can affect the normal operation of the network and increase the system scheduling cost, so that the power generation process needs to be effectively controlled. In order to realize accurate and effective control, large-scale distributed sampling needs to be carried out on the grid-connected data of the new energy power generation equipment.
Because the sampling equipment has inherent limitations and the quality requirements for sampled data are high, the quality of the grid-connected data needs to be evaluated directly, effectively and promptly, so that subsequent data cleaning and data integration can support more effective monitoring, control and strategy management of the new energy equipment.
Disclosure of Invention
In order to overcome the existing problems or at least partially solve the problems, embodiments of the present invention provide a quality evaluation method and an evaluation system for new energy grid-connected data.
According to a first aspect of the embodiments of the present invention, a method for evaluating quality of new energy grid-connected data is provided, including:
searching a plurality of first reference data sequences which are completely the same as a first preset length data sequence before the data to be evaluated from the reference sample sequence based on a vector matching method;
searching, from the reference sample sequence and based on a linear correlation method, a plurality of second reference data sequences whose correlation coefficient with a second preset length data sequence before the data to be evaluated reaches a preset correlation coefficient;
taking the quality marks of the data after each first reference data sequence and the data after each second reference data sequence as candidate quality marks of the data to be evaluated to form a candidate quality mark set;
determining the candidate quality marker that occurs most often in the candidate quality marker set as the quality marker of the data to be evaluated;
wherein each data in the reference sample sequence has a corresponding quality mark.
On the basis of the technical scheme, the invention can be further improved as follows.
Further, before searching a plurality of first reference data sequences which are completely the same as the first preset-length data sequences before the data to be evaluated from the reference sample sequence based on the vector matching method, the method further includes:
filtering each data in the historical data sample set, and calculating the distance between each data after filtering and before filtering to form the distance corresponding to each data;
and performing quality marking on each data according to the corresponding distance of each data, wherein each data carrying the quality marking forms a reference sample sequence.
Further, the searching, based on the vector matching method, for multiple first reference data sequences that are completely the same as a first preset-length data sequence before the data to be evaluated from the reference sample sequence further includes:
setting a first preset range corresponding to the number of the first reference data sequences, and adjusting the first preset length to enable the number of the first reference data sequences obtained through matching to be within the first preset range.
Further, the searching, based on the linear correlation method, for a plurality of second reference data sequences whose correlation coefficient with a second preset length data sequence before the data to be evaluated reaches a preset correlation coefficient from the reference sample sequence includes:
dividing the reference sample sequence according to a second preset length to form a plurality of reference sample subsequences;
and calculating a correlation coefficient between any reference sample subsequence and a second preset length data sequence before the data to be evaluated, wherein if the correlation coefficient reaches a preset correlation coefficient, the any reference sample subsequence is a second reference data sequence.
Further, the searching, from the reference sample sequence, a plurality of second reference data sequences whose correlation coefficients with a second data sequence of a preset length before the data to be evaluated reach a preset correlation coefficient based on a linear correlation method further includes:
setting a correlation coefficient threshold range and a second preset range corresponding to the number of second reference data sequences, and adjusting a second preset length and a correlation coefficient to enable the number of the obtained second reference data sequences to be within the second preset range;
wherein the correlation coefficient is adjusted within the correlation coefficient threshold range.
Further, after searching a plurality of first reference data sequences completely identical to a first preset-length data sequence before the data to be evaluated from the reference sample sequence based on the vector matching method, the method further includes:
taking the data value of the next data of each first reference data sequence as the expected value of the data to be evaluated to obtain a plurality of expected values of the data to be evaluated;
calculating a first distance between each expected value of the data to be evaluated and an actual value of the data to be evaluated to obtain a plurality of first distances;
after searching, from the reference sample sequence and based on the linear correlation method, a plurality of second reference data sequences whose correlation coefficient with a second preset length data sequence before the data to be evaluated reaches a preset correlation coefficient, the method further includes:
calculating expected values of the data to be evaluated according to the data value of the next data of each second reference data sequence to obtain a plurality of expected values of the data to be evaluated;
and calculating a second distance between each expected value of the data to be evaluated and the actual value of the data to be evaluated to obtain a plurality of second distances.
Further, the calculating an expected value of the data to be evaluated according to the data value of the next data of each second reference data sequence to obtain a plurality of expected values of the data to be evaluated includes:
establishing a plurality of linear equation sets for any second reference data sequence and a second preset length data sequence before the data to be evaluated:
$$\begin{cases} y_i = a\,x_i + b \\ y_j = a\,x_j + b \end{cases} \tag{1}$$
solving the equation set (1) to obtain a and b, wherein x is a second reference data sequence, y is the second preset length data sequence before the data to be evaluated, $x_i$ and $x_j$ are the ith and jth elements of the second reference data sequence x, $y_i$ and $y_j$ are the ith and jth elements of the second preset length data sequence y before the data to be evaluated, and n is the second preset length;
the expected value of the data to be evaluated is $E(y_{n+1}) = a \cdot x_{n+1} + b$, wherein $x_{n+1}$ is the data following the second reference data sequence and $y_{n+1}$ is the data to be evaluated.
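The linear prediction step can be sketched as follows. This is a minimal illustration, assuming each equation set pairs two positions i and j as y_i = a*x_i + b and y_j = a*x_j + b (the form implied by E(y_{n+1}) = a*x_{n+1} + b). The text does not say how the solutions of the several equation sets are combined; averaging them, and the function names `fit_line_pairwise` and `expected_value`, are assumptions made for this sketch.

```python
from itertools import combinations
from statistics import mean


def fit_line_pairwise(x, y):
    """Solve y = a*x + b for every index pair (i, j) and average the solutions."""
    slopes, intercepts = [], []
    for i, j in combinations(range(len(x)), 2):
        if x[i] == x[j]:
            continue  # this pair gives a singular system, skip it
        a = (y[i] - y[j]) / (x[i] - x[j])
        slopes.append(a)
        intercepts.append(y[i] - a * x[i])
    if not slopes:
        raise ValueError("x is constant; no pair determines a line")
    # Averaging the pairwise solutions is an assumption of this sketch.
    return mean(slopes), mean(intercepts)


def expected_value(x, y, x_next):
    """Expected value of the data to be evaluated: E(y_{n+1}) = a*x_{n+1} + b."""
    a, b = fit_line_pairwise(x, y)
    return a * x_next + b


x = [1.0, 2.0, 3.0, 4.0, 5.0]    # a second reference data sequence (n = 5)
y = [3.0, 5.0, 7.0, 9.0, 11.0]   # sequence before the data to be evaluated
e = expected_value(x, y, 6.0)    # 6.0 plays the role of x_{n+1}
```

An ordinary least-squares fit over all n points would be a common alternative way to obtain a single a and b.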
Further, before the step of taking the quality label of the data after each first reference data sequence and the quality label of the data after each second reference data sequence as the candidate quality label of the data to be evaluated, the step of forming the candidate quality label set further includes:
forming a distance set by a plurality of first distances and a plurality of second distances corresponding to the data to be evaluated, arranging all the distances in the distance set from small to large, and screening out a preset number of distances in the front of the sequence, wherein the distances are the first distances or the second distances;
taking the first reference data sequence or the second reference data sequence corresponding to each distance as a reference data sequence set;
correspondingly, taking the quality label of the latter data of each first reference data sequence and the latter data of each second reference data sequence as the candidate quality label of the data to be evaluated, and forming a candidate quality label set comprises:
and taking the quality mark of the next data of each reference data sequence in the reference data sequence set as a candidate quality mark of the data to be evaluated to form a candidate quality mark set, wherein each reference data sequence in the reference data sequence set is a first reference data sequence or a second reference data sequence.
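The screening step above can be sketched as follows: pool the first and second distances, keep a preset number of the smallest ones, and vote only over the quality marks attached to those. The pair layout `(distance, quality_mark)`, the function name `screen_and_vote` and the sample values are hypothetical.

```python
from collections import Counter


def screen_and_vote(candidates, keep):
    """candidates: list of (distance, quality_mark) pairs from both methods."""
    # Arrange distances from small to large and keep the first `keep` entries.
    nearest = sorted(candidates, key=lambda c: c[0])[:keep]
    marks = [mark for _, mark in nearest]
    # The most frequent mark among the retained candidates wins.
    return Counter(marks).most_common(1)[0][0]


first = [(0.1, "good"), (0.9, "poor")]                       # vector matching
second = [(0.2, "good"), (0.3, "medium"), (1.5, "poor")]     # linear correlation
mark = screen_and_vote(first + second, keep=3)
```

With `keep=3` the two "poor" candidates, which lie farthest from the actual value, no longer take part in the vote.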
According to a second aspect of the embodiments of the present invention, there is provided a quality evaluation system for new energy grid-connected data, including:
the first search module is used for searching a plurality of first reference data sequences which are completely the same as a first preset length data sequence before the data to be evaluated from the reference sample sequence based on a vector matching method;
the second search module is used for searching, from the reference sample sequence and based on a linear correlation method, a plurality of second reference data sequences whose correlation coefficient with a second preset length data sequence before the data to be evaluated reaches a preset correlation coefficient;
the combination module is used for taking the quality marks of the data after each first reference data sequence and the data after each second reference data sequence as candidate quality marks of the data to be evaluated to form a candidate quality mark set;
a determining module, configured to determine the candidate quality marker that occurs most often in the candidate quality marker set as the quality marker of the data to be evaluated;
wherein each data in the reference sample sequence has a corresponding quality mark.
According to a third aspect of the embodiments of the present invention, there is further provided an electronic device, including a memory, a processor, and a computer program stored on the memory and executable on the processor, where the processor calls the program instruction to perform the method for evaluating the quality of the new energy grid-connected data provided by any one of the various possible implementation manners of the first aspect.
According to a fourth aspect of the embodiments of the present invention, there is further provided a non-transitory computer-readable storage medium storing computer instructions, where the computer instructions cause the computer to execute the method for evaluating the quality of the new energy grid-connection data provided in any one of the various possible implementation manners of the first aspect.
The embodiment of the invention provides a quality evaluation method and an evaluation system for new energy grid-connected data, which are used for evaluating the quality of the new energy grid-connected data based on vector matching and linear correlation.
Drawings
In order to illustrate the embodiments of the present invention or the technical solutions in the prior art more clearly, the drawings used in the description of the embodiments or the prior art are briefly described below. The drawings described below show some embodiments of the present invention, and those skilled in the art can obtain other drawings from them without creative effort.
FIG. 1 is a schematic overall flow chart of a quality evaluation method of new energy grid-connected data according to an embodiment of the present invention;
FIG. 2 is a flow chart of quality labeling of historical sample data according to an embodiment of the present invention;
FIG. 3 is a flowchart of searching for a second reference data sequence based on a linear correlation method according to an embodiment of the present invention;
FIG. 4 is a connection block diagram of a quality evaluation system of new energy grid-connected data according to an embodiment of the present invention;
FIG. 5 is a schematic view of an overall structure of an electronic device according to an embodiment of the present invention.
Detailed Description
Referring to FIG. 1, a quality evaluation method for new energy grid-connected data is provided. The method can accurately evaluate the quality of the new energy grid-connected data and provides a basis for acquiring such data. The quality evaluation method comprises the following steps:
S110, searching a plurality of first reference data sequences which are completely identical to a first preset length data sequence before the data to be evaluated from the reference sample sequence based on a vector matching method;
S120, searching, from the reference sample sequence and based on a linear correlation method, a plurality of second reference data sequences whose correlation coefficient with a second preset length data sequence before the data to be evaluated reaches a preset correlation coefficient;
S130, taking the quality marks of the data after each first reference data sequence and the data after each second reference data sequence as candidate quality marks of the data to be evaluated to form a candidate quality mark set;
S140, determining the candidate quality marker that occurs most often in the candidate quality marker set as the quality marker of the data to be evaluated;
wherein each data in the reference sample sequence has a corresponding quality mark.
It can be understood that before quality evaluation and classification are performed on data to be evaluated (new energy grid-connected data), a large amount of historical grid-connected data needs to be collected and each grid-connected data needs to be quality-marked to obtain its quality label. The quality-marked historical grid-connected data forms a reference sample sequence, which serves as a reference when subsequent data to be evaluated undergoes quality evaluation. The collected historical grid-connected data is arranged in time order, for example one grid-connected data collected at each moment, and is then quality-marked.
The embodiment of the invention provides two matching methods, namely a vector matching method and a linear correlation method for matching the data sequence.
Based on the vector matching method, a plurality of first reference data sequences which are completely the same as a first preset length data sequence before the data to be evaluated are searched from the reference sample sequence; based on the linear correlation method, a plurality of second reference data sequences whose correlation coefficient with a second preset length data sequence before the data to be evaluated reaches a preset correlation coefficient are searched from the reference sample sequence.
In the embodiment of the invention, a reference data sequence matched with a section of data sequence before the data to be evaluated is found from the reference sample sequence, and the quality mark of the searched reference data sequence is used as the reference quality mark of the data to be evaluated.
Specifically, the quality marker of the data following each first reference data sequence found by the vector matching method and the quality marker of the data following each second reference data sequence found by the linear correlation method are used as candidate quality markers of the data to be evaluated to form a candidate quality marker set; the candidate quality marker that occurs most often in the candidate quality marker set is determined as the quality marker of the data to be evaluated.
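The voting step can be sketched in a few lines. The mark names and the function name `vote_quality_mark` are hypothetical, and the text does not say how ties are resolved; this sketch simply keeps the mark that was encountered first, which is how `Counter.most_common` orders equal counts.

```python
from collections import Counter


def vote_quality_mark(candidate_marks):
    """Return the quality mark that occurs most often in the candidate set."""
    if not candidate_marks:
        raise ValueError("candidate quality mark set is empty")
    # most_common(1) yields [(mark, count)]; ties keep first-encountered order.
    return Counter(candidate_marks).most_common(1)[0][0]


first_marks = ["good", "good", "medium"]    # marks found by vector matching
second_marks = ["good", "poor", "medium"]   # marks found by linear correlation
result = vote_quality_mark(first_marks + second_marks)
```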
The embodiment of the invention carries out quality evaluation on the new energy grid-connected data based on vector matching and linear correlation, realizes real-time evaluation on the quality of the new energy grid-connected data of different types by the method, fuses the two matching methods, and has better quality evaluation effect on the new energy grid-connected data.
Referring to FIG. 2, on the basis of the above embodiment, in the embodiment of the present invention, the quality marking of each data in the historical data sample set includes:
S210, filtering each data in the historical data sample set, and calculating the distance between each data after filtering and before filtering to form the distance corresponding to each data;
S220, marking the quality of each data according to the corresponding distance of each data.
It can be understood that when a reference sample sequence is constructed, a large amount of historical new energy grid-connected data can be collected in advance, each grid-connected data is filtered by a filter, the distance between each data before filtering and each data after filtering (the absolute value of the difference between the data before filtering and the data after filtering) is calculated, and the distance corresponding to each data is obtained.
Each data is then quality-marked according to its distance. For example, data with a distance of zero is marked as the best quality class, and the remaining data is classified adaptively into better quality, medium quality, poorer quality and worst quality. Each quality mark corresponds to a distance interval, and data with a smaller distance is of better quality than data with a larger distance. Every historical data is quality-marked in this way, and the quality-marked historical data forms the reference sample sequence.
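A minimal sketch of this labelling step follows. The text does not name the filter or the distance intervals, so the 3-point median filter, the interval bounds `(0.5, 1.0, 2.0)` and the function names are all assumptions chosen only to make the example concrete.

```python
from statistics import median

LABELS = ["best", "better", "medium", "poorer", "worst"]


def median_filter(values, window=3):
    """Assumed filter: sliding median, with the window truncated at the edges."""
    half = window // 2
    return [median(values[max(0, i - half):i + half + 1])
            for i in range(len(values))]


def label_samples(values, bounds=(0.5, 1.0, 2.0)):
    """Return (value, distance, label) triples forming a reference sample sequence."""
    labelled = []
    for raw, smooth in zip(values, median_filter(values)):
        distance = abs(raw - smooth)  # absolute difference before/after filtering
        if distance == 0:
            label = LABELS[0]
        else:
            # Each bound exceeded moves the sample one class further from "best".
            label = LABELS[1 + sum(distance > b for b in bounds)]
        labelled.append((raw, distance, label))
    return labelled


history = [10.0, 10.0, 10.0, 15.0, 10.0, 10.0, 10.4, 10.0]
reference = label_samples(history)
```

In this sample the spike at 15.0 lands in the worst class, while the small deviation at 10.4 stays in the class next to best.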
On the basis of the foregoing embodiments, in the embodiments of the present invention, the searching, based on the vector matching method, for multiple first reference data sequences that are completely the same as a first preset-length data sequence before data to be evaluated from the reference sample sequence further includes:
setting a first preset range corresponding to the number of the first reference data sequences, and adjusting the first preset length to enable the number of the first reference data sequences obtained through matching to be within the first preset range.
It can be understood that matching based on the vector matching method first sets an initial length k for the data sequence preceding the data to be evaluated, for example k = 5. The reference sample sequence is then searched, based on the vector matching method, for first reference data sequences that are completely the same as the data sequence of length k preceding the data to be evaluated. If the number of first reference data sequences found is greater than a rated maximum (e.g. 20), the length k is increased by one and the search is repeated. If the number of first reference data sequences found at the new length is less than the minimum allowable value (e.g. 6), the sequences found at the previous length are taken as the final first reference data sequences. If the number of first reference data sequences found is smaller than the minimum allowable value for every value of k, the reference data sequences found when k = 5 are taken as the final first reference data sequences.
For example, if the data to be evaluated is the nth data in a data sequence, the k = 5 data immediately preceding it form a data sequence, and all reference data sequences identical to this data sequence, hereinafter referred to as first reference data sequences, are searched from the reference sample sequence based on the vector matching method; each first reference data sequence also has length k. The value of k is then adjusted so that the number of first reference data sequences found lies between the minimum allowable value and the rated maximum.
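The search with an adjustable length k can be sketched as follows. This is an illustration under assumptions: the reference sample sequence is a plain Python list of (already discretised) values so that exact equality is meaningful, and the function names are hypothetical. The small `k0`, `min_count` and `max_count` values in the example are chosen only to keep the data short; the text uses 5, 6 and 20.

```python
def find_exact_matches(reference, window):
    """Index of the item following every exact occurrence of `window`."""
    k = len(window)
    # Stop one item early so that every match has a following item.
    return [i + k for i in range(len(reference) - k)
            if reference[i:i + k] == window]


def adaptive_vector_match(reference, preceding, k0=5, min_count=6, max_count=20):
    """Lengthen the matched window while too many matches are found."""
    k = k0
    best = find_exact_matches(reference, preceding[-k:])
    while len(best) > max_count and k < len(preceding):
        k += 1
        matches = find_exact_matches(reference, preceding[-k:])
        if len(matches) < min_count:
            break  # too few at this length: keep the previous length's result
        best = matches
    return best


reference = [1, 2, 9, 1, 2, 9, 1, 2, 7, 1, 2, 9]
preceding = [5, 9, 1, 2]  # data sequence before the data to be evaluated
next_indices = adaptive_vector_match(reference, preceding,
                                     k0=2, min_count=2, max_count=3)
```

If the initial length already yields fewer matches than `min_count`, the loop never runs and the result for `k0` is returned, which mirrors the fallback described above.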
Referring to FIG. 3, on the basis of the foregoing embodiments, in the embodiment of the present invention, searching, from the reference sample sequence and based on a linear correlation method, a plurality of second reference data sequences whose correlation coefficient with a second preset length data sequence before the data to be evaluated reaches a preset correlation coefficient includes:
S310, dividing the reference sample sequence according to a second preset length to form a plurality of reference sample subsequences;
S320, calculating a correlation coefficient between any reference sample subsequence and a second preset length data sequence before the data to be evaluated, wherein if the correlation coefficient reaches a preset correlation coefficient, that reference sample subsequence is a second reference data sequence.
On the basis of the foregoing embodiments, in an embodiment of the present invention, based on a linear correlation method, searching, from the reference sample sequence, a plurality of second reference data sequences whose correlation coefficients with a second data sequence of a preset length before data to be evaluated reach a preset correlation coefficient further includes:
setting a correlation coefficient threshold range and a second preset range corresponding to the number of second reference data sequences, and adjusting a second preset length and a correlation coefficient to enable the number of the obtained second reference data sequences to be within the second preset range;
wherein the correlation coefficient is adjusted within the correlation coefficient threshold range.
It can be understood that matching based on the linear correlation method searches the reference sample sequence for sequences that are highly linearly correlated (absolute value of the correlation coefficient greater than 0.97) with the data sequence preceding the data to be evaluated, and uses them as reference sequences. First, an initial length m of the data sequence preceding the data to be evaluated and an initial correlation coefficient threshold s are set, for example m = 5 and s = 0.97. If the number of second reference sequences found in the reference sample sequence is larger than an upper threshold (e.g. 20), the correlation coefficient threshold s is increased (in steps of 0.01, up to 1). If the number of reference sequences found is smaller than a minimum threshold (e.g. 6), the correlation coefficient threshold is decreased (in steps of 0.01, until the absolute value of the correlation coefficient reaches 0.95). If, for every correlation coefficient threshold tried, the number of reference sequences found remains smaller than the minimum threshold, the reference sequences found with a correlation coefficient threshold of 0.97 are taken as the final second reference data sequences.
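The threshold adjustment can be sketched as follows, taking as input the absolute correlation coefficients already computed for each candidate subsequence. The function name `select_by_threshold` is hypothetical, and the sketch adjusts only s, not the length m. Thresholds are stepped as integer multiples of 0.01 to avoid floating-point drift, which is an implementation choice of this sketch.

```python
def select_by_threshold(abs_corrs, s0=0.97, s_min=0.95, s_max=1.0,
                        step=0.01, min_count=6, max_count=20):
    """Return (indices of accepted subsequences, threshold finally used)."""
    def pick(level):
        # Small tolerance so that values equal to the threshold are accepted.
        return [i for i, r in enumerate(abs_corrs) if r >= level * step - 1e-12]

    start, lo, hi = round(s0 / step), round(s_min / step), round(s_max / step)
    level = start
    initial = chosen = pick(level)
    if len(initial) > max_count:
        # Too many matches: tighten the threshold, at most up to s_max.
        while len(chosen) > max_count and level < hi:
            level += 1
            chosen = pick(level)
    elif len(initial) < min_count:
        # Too few matches: relax the threshold, at most down to s_min.
        while len(chosen) < min_count and level > lo:
            level -= 1
            chosen = pick(level)
        if len(chosen) < min_count:
            level, chosen = start, initial  # fall back to the initial threshold
    return chosen, level * step


corrs = [0.99] * 3 + [0.96] * 4 + [0.5] * 5
chosen, threshold = select_by_threshold(corrs)
```

Here only three candidates pass at 0.97, so the threshold is relaxed once to 0.96, at which point seven candidates qualify.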
For example, an initial length m of the data sequence preceding the data to be evaluated and an initial correlation coefficient s are set, with m = 5 and s = 0.97. When matching based on the linear correlation method, the reference sample sequence is divided by a sliding window; with m = 5, it is divided into multiple reference sample subsequences of length 5. The linear correlation coefficient between each reference sample subsequence and the data sequence preceding the data to be evaluated is calculated as follows:
$$r(x, y) = \frac{\sum_{i=1}^{m} (x_i - Ex)(y_i - Ey)}{\sqrt{\sum_{i=1}^{m} (x_i - Ex)^2 \, \sum_{i=1}^{m} (y_i - Ey)^2}}$$
wherein x is a reference sample subsequence, y is the data sequence preceding the data to be evaluated, $x_i$ is the ith data of the reference sample subsequence, $y_i$ is the ith data of the data sequence preceding the data to be evaluated, Ex is the expected value of the reference sample subsequence, for which the average of all data in the sequence can be used, and Ey is the expected value of the data sequence preceding the data to be evaluated, for which the average of all data in that sequence can likewise be used.
The correlation coefficient between each reference sample subsequence and the data sequence preceding the data to be evaluated is calculated with the above formula; a value close to 1 indicates that the two sequences are strongly linearly correlated. When the calculated correlation coefficient reaches 0.97, the reference sample subsequence is a matched second reference data sequence. The values of m and s are adjusted so that the number of second reference data sequences found in the reference sample sequence that match the data sequence preceding the data to be evaluated falls within the set range.
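The sliding-window correlation search can be sketched as follows, using the standard Pearson correlation coefficient with the sequence mean as the expected value. The window step of 1, the treatment of constant sequences as non-matching, and the function names are assumptions of this sketch.

```python
import math


def pearson(x, y):
    """Linear correlation coefficient of two equal-length sequences."""
    ex, ey = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - ex) * (b - ey) for a, b in zip(x, y))
    vx = sum((a - ex) ** 2 for a in x)
    vy = sum((b - ey) ** 2 for b in y)
    if vx == 0 or vy == 0:
        return 0.0  # constant sequence: coefficient undefined, treated as no match
    return cov / math.sqrt(vx * vy)


def find_correlated(reference, preceding, m=5, s=0.97):
    """Return (start index, coefficient) for every window with |r| >= s."""
    y = preceding[-m:]
    hits = []
    # Stop one item early so that every window has a following item.
    for i in range(len(reference) - m):
        r = pearson(reference[i:i + m], y)
        if abs(r) >= s:
            hits.append((i, r))
    return hits


reference = [1, 2, 3, 4, 5, 100, 2, 4, 6, 8, 10, 7, 3, 1, 4, 1, 5, 9]
preceding = [10, 20, 30, 40, 50]
hits = find_correlated(reference, preceding)
starts = [i for i, _ in hits]
```

The item following a matched window starting at index `i` is `reference[i + m]`; its value and quality mark feed the later prediction and voting steps.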
On the basis of the foregoing embodiments, in the embodiments of the present invention, based on a vector matching method, after searching for a plurality of first reference data sequences that are completely the same as a first data sequence of a preset length before data to be evaluated from the reference sample sequence, the method further includes:
taking the data value of the next data of each first reference data sequence as the expected value of the data to be evaluated to obtain a plurality of expected values of the data to be evaluated;
calculating a first distance between each expected value of the data to be evaluated and an actual value of the data to be evaluated to obtain a plurality of first distances;
after searching, based on the linear correlation method, the reference sample sequence for a plurality of second reference data sequences whose correlation coefficients with a second preset-length data sequence before the data to be evaluated reach a preset correlation coefficient, the method further comprises:
calculating expected values of the data to be evaluated according to the data value of the next data of each second reference data sequence to obtain a plurality of expected values of the data to be evaluated;
and calculating a second distance between each expected value of the data to be evaluated and the actual value of the data to be evaluated to obtain a plurality of second distances.
It can be understood that, after a plurality of first reference data sequences matched with a segment of data sequence before the data to be evaluated are searched from the reference sample sequence based on a vector matching method, a data value of data after each first reference data sequence is used as an expected value of the data to be evaluated to obtain a plurality of expected values of the data to be evaluated, and a first distance between each expected value of the data to be evaluated and an actual value of the data to be evaluated (i.e., an absolute value of a difference between data values of two data) is calculated to obtain a plurality of first distances corresponding to the data to be evaluated.
Similarly, when a plurality of second reference data sequences matched with a section of data sequence before the data to be evaluated are searched from the reference sample sequence based on a linear correlation method, an expected value of the data to be evaluated is calculated according to a data value of data after each second reference data sequence to obtain a plurality of expected values of the data to be evaluated, and a second distance (i.e. an absolute value of a difference between the data values of two data) between each expected value of the data to be evaluated and an actual value of the data to be evaluated is calculated to obtain a plurality of second distances corresponding to the data to be evaluated.
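The vector matching branch and its first distances can be illustrated with a short sketch. The function name and the sample values are hypothetical, and exact equality is applied to the raw values for simplicity; real measurements might need discretization before an exact match is meaningful.

```python
def first_distances(reference, preceding, actual):
    """Exact-match search (vector matching) followed by first-distance calculation.

    reference: list of historical values (the reference sample sequence)
    preceding: the preset-length data sequence just before the data to be evaluated
    actual:    the actual value of the data to be evaluated
    Returns a list of (start_index, expected_value, distance) tuples.
    """
    m = len(preceding)
    target = list(preceding)
    results = []
    for start in range(len(reference) - m):
        if reference[start:start + m] == target:
            expected = reference[start + m]  # value right after the matched sequence
            results.append((start, expected, abs(expected - actual)))
    return results

reference = [3, 5, 7, 8, 3, 5, 7, 10, 3, 5, 7, 9]
print(first_distances(reference, [3, 5, 7], actual=9))
# [(0, 8, 1), (4, 10, 1), (8, 9, 0)]
```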
On the basis of the foregoing embodiments, in the embodiments of the present invention, calculating an expected value of the data to be evaluated according to a data value of next data of each second reference data sequence, and obtaining a plurality of expected values of the data to be evaluated includes:
establishing a plurality of linear equation sets for any second reference data sequence and a second preset length data sequence before the data to be evaluated:
Figure BDA0002314407680000121
a and b are obtained by solving the equation sets, wherein x is a second reference data sequence, y is the second preset-length data sequence before the data to be evaluated, x_i and x_j are the i-th and j-th elements of the second reference data sequence x, y_i and y_j are the i-th and j-th elements of the second preset-length data sequence y before the data to be evaluated, and n is the second preset length;
the expected value of the data to be evaluated is E(y_{n+1}) = a * x_{n+1} + b, wherein x_{n+1} is the next data after the second reference data sequence and y_{n+1} is the data to be evaluated.
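The equation set survives only as an image reference in this text. Based on the symbol definitions above, a plausible reconstruction (an assumption, not a transcription of the original figure) is one two-equation system per pair of positions:

```latex
\begin{cases}
y_i = a\,x_i + b \\
y_j = a\,x_j + b
\end{cases}
\qquad i, j \in \{1, \dots, n\},\ i \neq j
```

Under this reading, each solvable pair gives

```latex
a = \frac{y_i - y_j}{x_i - x_j}, \qquad b = y_i - a\,x_i \qquad (x_i \neq x_j)
```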
It can be understood that, when computing the linear relationship, a and b can be calculated from the linear equations formed by selecting two pieces of data at the same positions in the data sequence before the data to be evaluated and in the second reference data sequence. If the equations cannot be solved (the values at the two positions are proportional, so that the determinant of the coefficient matrix is close to zero), that partial result is discarded. Because the established equation set is linear, two pieces of data at the same positions are in theory enough to solve for the coefficients a and b. However, the a and b calculated from one pair of positions do not necessarily satisfy all the equations, so the equation sets may yield multiple groups of a and b; finally, the obtained values of a and b are weighted and averaged to obtain the final coefficients a' and b'. Compared with the least squares algorithm, whose solution can fail when linearly correlated terms are present, this approach can effectively eliminate the influence of bad data (pairs whose determinant is close to 0) on the prediction result, and the weighted average realizes high-performance data fusion, thereby improving the prediction precision.
The coefficients a and b relating each second reference data sequence to the data sequence before the data to be evaluated are calculated through the equation sets, and the expected value of the data to be evaluated is calculated from the data value of the data after each second reference data sequence according to E(y_{n+1}) = a * x_{n+1} + b. For the plurality of second reference data sequences, a plurality of expected values of the data to be evaluated can thus be calculated. A second distance between each expected value and the actual value of the data to be evaluated is then calculated, giving a plurality of second distances corresponding to the data to be evaluated.
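A minimal sketch of this pairwise solution follows. The function names and the tolerance `eps` are hypothetical, and uniform weights are assumed for the average because the text does not specify the weighting scheme.

```python
from itertools import combinations

def fit_pairwise(x, y, eps=1e-9):
    """Estimate a, b in y = a*x + b by solving every two-point system and averaging."""
    solutions = []
    for i, j in combinations(range(len(x)), 2):
        det = x[i] - x[j]  # determinant of [[x_i, 1], [x_j, 1]]
        if abs(det) < eps:
            continue       # near-singular system: discard this partial result
        a = (y[i] - y[j]) / det
        b = y[i] - a * x[i]
        solutions.append((a, b))
    if not solutions:
        raise ValueError("no solvable pair of positions")
    # Assumption: uniform weights; the source does not state how the weights are chosen.
    a_avg = sum(a for a, _ in solutions) / len(solutions)
    b_avg = sum(b for _, b in solutions) / len(solutions)
    return a_avg, b_avg

def expected_value(x, y, x_next):
    """Predict the data to be evaluated from the data following the reference sequence."""
    a, b = fit_pairwise(x, y)
    return a * x_next + b

x = [1.0, 2.0, 3.0, 4.0, 5.0]    # a second reference data sequence
y = [3.0, 5.0, 7.0, 9.0, 11.0]   # the data sequence before the data to be evaluated
expected = expected_value(x, y, x_next=6.0)
second_distance = abs(expected - 12.5)  # 12.5 is a made-up actual value
print(expected, second_distance)
```

Skipping near-singular pairs is what distinguishes this sketch from a single least squares fit: a degenerate pair simply contributes nothing to the average.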
On the basis of the foregoing embodiments, in the embodiments of the present invention, before taking the quality mark of the data after each first reference data sequence and the quality mark of the data after each second reference data sequence as candidate quality marks of the data to be evaluated to form the candidate quality mark set, the method further includes:
forming a distance set from the plurality of first distances and the plurality of second distances corresponding to the data to be evaluated, arranging all the distances in the distance set from small to large, and selecting a preset number of distances at the front of the ordering, wherein each selected distance is a first distance or a second distance;
taking the first reference data sequence or the second reference data sequence corresponding to each distance as a reference data sequence set;
correspondingly, taking the quality mark of the data after each first reference data sequence and the quality mark of the data after each second reference data sequence as candidate quality marks of the data to be evaluated to form a candidate quality mark set comprises:
and taking the quality mark of the next data of each reference data sequence in the reference data sequence set as a candidate quality mark of the data to be evaluated to form a candidate quality mark set, wherein each reference data sequence in the reference data sequence set is a first reference data sequence or a second reference data sequence.
It will be appreciated that each of the first distances calculated as described above corresponds to each of the first reference data sequences, and each of the second distances corresponds to each of the second reference data sequences. The embodiment of the invention combines the calculated plurality of first distances and the calculated plurality of second distances into a distance set, screens a preset number of distances with smaller distance values from the distance set, and takes the first reference data sequence or the second reference data sequence corresponding to the screened preset number of distances as a reference data sequence set.
For example, assuming that before the screening the total number of first distances and second distances is 200, the 150 smallest distances are retained. The smaller distances are screened from the plurality of distances by the nearest neighbor classification method; a smaller distance indicates that the quality mark of the data to be evaluated is closer to that of the data at the corresponding position of the reference data sequence, so the quality evaluation classification result is more accurate.
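The screening step can be sketched as a sort followed by truncation. The tuple layout, the function name, and the sample distances are assumptions made for illustration.

```python
def screen_nearest(first, second, keep):
    """Merge first and second distances and keep the `keep` smallest.

    first / second: lists of (distance, reference_id) pairs.
    Returns the kept (distance, source, reference_id) triples in ascending order.
    """
    pool = [(d, "first", ref) for d, ref in first]
    pool += [(d, "second", ref) for d, ref in second]
    # Ascending by distance; Python's stable sort keeps ties in input order.
    pool.sort(key=lambda item: item[0])
    return pool[:keep]

first = [(0.0, 8), (1.0, 0), (1.0, 4)]   # from vector matching
second = [(0.5, 12), (2.5, 30)]          # from linear correlation matching
print(screen_nearest(first, second, keep=3))
# [(0.0, 'first', 8), (0.5, 'second', 12), (1.0, 'first', 0)]
```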
The quality mark of the next data after each reference data sequence in the reference data sequence set is then taken as a candidate quality mark of the data to be evaluated to form a candidate quality mark set, wherein each reference data sequence in the reference data sequence set is a first reference data sequence or a second reference data sequence.
The candidate quality mark that occurs most often in the candidate quality mark set is determined as the quality mark of the data to be evaluated. For example, suppose the screened distance set contains 100 distances, so that 100 candidate quality marks are obtained for the data to be evaluated: 40 are good quality, 30 are medium quality, 20 are poor quality, and 10 are worst quality. Good quality is then determined as the quality mark of the data to be evaluated.
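The final decision is a majority vote over the candidate quality marks. A minimal sketch, with a hypothetical function name and the label strings taken from the example above:

```python
from collections import Counter

def vote_quality_mark(candidate_marks):
    """Return the candidate quality mark that occurs most often."""
    if not candidate_marks:
        raise ValueError("candidate quality mark set is empty")
    # On ties, Counter.most_common returns the mark encountered first;
    # the source does not say how ties should be broken.
    mark, _count = Counter(candidate_marks).most_common(1)[0]
    return mark

candidates = ["good"] * 40 + ["medium"] * 30 + ["poor"] * 20 + ["worst"] * 10
print(vote_quality_mark(candidates))  # good
```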
It should be noted that the data to be evaluated in the above embodiments is a single data in a data sequence; the data to be evaluated may also be a data sequence to be evaluated, that is, a small data sequence within a large data sequence. The method for evaluating the quality of a data sequence to be evaluated is the same as the method for evaluating the quality of a single data to be evaluated, except for how the first distance and the second distance are calculated. For example, if the data sequence to be evaluated includes two data, that is, two consecutive data are subjected to quality evaluation, the distance between the first 2 data of each reference data sequence (including the first reference data sequences and the second reference data sequences) and the data sequence to be evaluated is calculated by using the Euclidean distance.
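For a data sequence to be evaluated, the absolute difference is replaced by a Euclidean distance. The sketch below uses a hypothetical function name and made-up values; which two reference values are compared follows the description above.

```python
import math

def sequence_distance(reference_values, evaluated_values):
    """Euclidean distance between two equal-length value sequences."""
    if len(reference_values) != len(evaluated_values):
        raise ValueError("sequences must have the same length")
    return math.sqrt(sum((r - e) ** 2 for r, e in zip(reference_values, evaluated_values)))

# Two consecutive data to be evaluated versus two values from a reference data sequence.
print(sequence_distance([10.0, 12.0], [13.0, 16.0]))  # 5.0
```

With a sequence of length 1 this reduces to the absolute difference used for a single data to be evaluated.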
Referring to fig. 4, in an embodiment of the present invention, a quality evaluation system for new energy grid-connected data is provided, and is configured to implement the quality evaluation method in each of the foregoing embodiments. Therefore, the description and definition in each embodiment of the quality evaluation method of the new energy grid-connected data can be used for understanding each execution module in the embodiment of the present invention. Fig. 4 is a schematic diagram of an overall structure of a quality evaluation system of new energy grid-connected data according to an embodiment of the present invention, where the quality evaluation system includes a marking module 41, a first search module 42, a second search module 43, a combination module 44, and a determination module 45.
a marking module 41, configured to perform quality marking on each data in the historical data sample set to form a reference sample sequence;
a first search module 42, configured to search, based on a vector matching method, the reference sample sequence for multiple first reference data sequences that are completely the same as a first preset-length data sequence before the data to be evaluated;
a second search module 43, configured to search, based on a linear correlation method, the reference sample sequence for multiple second reference data sequences whose correlation coefficients with a second preset-length data sequence before the data to be evaluated reach a preset correlation coefficient;
a combination module 44, configured to use the quality mark of the next data after each first reference data sequence and the quality mark of the next data after each second reference data sequence as candidate quality marks of the data to be evaluated, so as to form a candidate quality mark set;
a determining module 45, configured to determine the candidate quality mark that occurs most often in the candidate quality mark set as the quality mark of the data to be evaluated.
The quality evaluation system for the new energy grid-connected data provided by the embodiment of the invention corresponds to the quality evaluation method for the new energy grid-connected data provided by each embodiment, and the relevant technical features of the embodiment of the invention can refer to the relevant technical features of the quality evaluation method for the new energy grid-connected data provided by each embodiment, and are not described herein again.
This embodiment provides an electronic device. Fig. 5 is a schematic view of an overall structure of the electronic device according to the embodiment of the present invention. The electronic device includes at least one processor 01, at least one memory 02, and a bus 03, wherein the processor 01 and the memory 02 communicate with each other through the bus 03. The memory 02 stores program instructions executable by the processor 01, and the processor calls the program instructions to execute the methods provided by the above method embodiments, for example: performing quality marking on each data in the historical data sample set to form a reference sample sequence; searching, based on a vector matching method, the reference sample sequence for a plurality of first reference data sequences that are completely the same as a first preset-length data sequence before the data to be evaluated; searching, based on a linear correlation method, the reference sample sequence for a plurality of second reference data sequences whose correlation coefficients with a second preset-length data sequence before the data to be evaluated reach a preset correlation coefficient; taking the quality mark of the data after each first reference data sequence and the quality mark of the data after each second reference data sequence as candidate quality marks of the data to be evaluated to form a candidate quality mark set; and determining the candidate quality mark that occurs most often in the candidate quality mark set as the quality mark of the data to be evaluated.
This embodiment provides a non-transitory computer-readable storage medium storing computer instructions that cause a computer to perform the methods provided by the above method embodiments, for example: performing quality marking on each data in the historical data sample set to form a reference sample sequence; searching, based on a vector matching method, the reference sample sequence for a plurality of first reference data sequences that are completely the same as a first preset-length data sequence before the data to be evaluated; searching, based on a linear correlation method, the reference sample sequence for a plurality of second reference data sequences whose correlation coefficients with a second preset-length data sequence before the data to be evaluated reach a preset correlation coefficient; taking the quality mark of the data after each first reference data sequence and the quality mark of the data after each second reference data sequence as candidate quality marks of the data to be evaluated to form a candidate quality mark set; and determining the candidate quality mark that occurs most often in the candidate quality mark set as the quality mark of the data to be evaluated.
According to the quality evaluation method and system for new energy grid-connected data provided by the embodiments of the present invention, quality evaluation is performed on the new energy grid-connected data based on vector matching, linear correlation, and the nearest neighbor classification method. This realizes real-time evaluation of the quality of different classes of new energy grid-connected data and gives a better quality evaluation effect.
Those of ordinary skill in the art will understand that all or part of the steps for implementing the method embodiments may be implemented by hardware related to program instructions. The program may be stored in a computer-readable storage medium and, when executed, performs the steps of the method embodiments; the aforementioned storage medium includes various media that can store program code, such as ROM, RAM, magnetic disks, or optical disks.
The above-described embodiments of the apparatus are merely illustrative, and the units described as separate parts may or may not be physically separate, and parts displayed as units may or may not be physical units, may be located in one place, or may be distributed on a plurality of network units. Some or all of the modules may be selected according to actual needs to achieve the purpose of the solution of the present embodiment. One of ordinary skill in the art can understand and implement it without inventive effort.
Through the above description of the embodiments, those skilled in the art will clearly understand that each embodiment can be implemented by software plus a necessary general hardware platform, and certainly can also be implemented by hardware. With this understanding in mind, the above-described technical solutions may be embodied in the form of a software product, which can be stored in a computer-readable storage medium such as ROM/RAM, magnetic disk, optical disk, etc., and includes instructions for causing a computer device (which may be a personal computer, a server, or a network device, etc.) to execute the methods described in the embodiments or some parts of the embodiments.
Finally, it should be noted that: the above examples are only intended to illustrate the technical solution of the present invention, but not to limit it; although the present invention has been described in detail with reference to the foregoing embodiments, it will be understood by those of ordinary skill in the art that: the technical solutions described in the foregoing embodiments may still be modified, or some technical features may be equivalently replaced; and such modifications or substitutions do not depart from the spirit and scope of the corresponding technical solutions of the embodiments of the present invention.

Claims (10)

1. A quality evaluation method of new energy grid-connected data is characterized by comprising the following steps:
searching a plurality of first reference data sequences which are completely the same as a first preset length data sequence before the data to be evaluated from the reference sample sequence based on a vector matching method;
searching, based on a linear correlation method, the reference sample sequence for a plurality of second reference data sequences whose correlation coefficients with a second preset-length data sequence before the data to be evaluated reach a preset correlation coefficient;
taking the quality marks of the data after each first reference data sequence and the data after each second reference data sequence as candidate quality marks of the data to be evaluated to form a candidate quality mark set;
determining the candidate quality marker with the most times in the candidate quality marker set as the quality marker of the data to be evaluated;
wherein each data in the reference sample sequence has a corresponding quality mark.
2. The quality evaluation method according to claim 1, wherein before searching for a plurality of first reference data sequences identical to a first preset-length data sequence preceding the data to be evaluated from the reference sample sequence based on the vector matching method, the method further comprises:
filtering each data in the historical data sample set, and calculating the distance between each data after filtering and before filtering to form the distance corresponding to each data;
and performing quality marking on each data according to the corresponding distance of each data, wherein each data carrying the quality marking forms a reference sample sequence.
3. The quality evaluation method according to claim 1, wherein the searching for a plurality of first reference data sequences identical to a first preset-length data sequence preceding the data to be evaluated from the reference sample sequence based on the vector matching method further comprises:
setting a first preset range corresponding to the number of the first reference data sequences, and adjusting the first preset length to enable the number of the first reference data sequences obtained through matching to be within the first preset range.
4. The quality evaluation method according to claim 1, wherein the searching, based on the linear correlation method, the reference sample sequence for a plurality of second reference data sequences whose correlation coefficients with a second preset-length data sequence before the data to be evaluated reach a preset correlation coefficient comprises:
dividing the reference sample sequence according to a second preset length to form a plurality of reference sample subsequences;
and calculating a correlation coefficient between any reference sample subsequence and a second preset length data sequence before the data to be evaluated, wherein if the correlation coefficient reaches a preset correlation coefficient, the any reference sample subsequence is a second reference data sequence.
5. The quality evaluation method according to claim 4, wherein searching the reference sample sequence for a plurality of second reference data sequences having correlation coefficients reaching a preset correlation coefficient with a second preset-length data sequence preceding the data to be evaluated based on a linear correlation method further comprises:
setting a correlation coefficient threshold range and a second preset range corresponding to the number of second reference data sequences, and adjusting a second preset length and a correlation coefficient to enable the number of the obtained second reference data sequences to be within the second preset range;
wherein the correlation coefficient is adjusted within the correlation coefficient threshold range.
6. The quality evaluation method according to claim 1, wherein after searching for a plurality of first reference data sequences identical to a first preset-length data sequence before the data to be evaluated from the reference sample sequence based on the vector matching method, the method further comprises:
taking the data value of the next data of each first reference data sequence as the expected value of the data to be evaluated to obtain a plurality of expected values of the data to be evaluated;
calculating a first distance between each expected value of the data to be evaluated and an actual value of the data to be evaluated to obtain a plurality of first distances;
after searching, based on the linear correlation method, the reference sample sequence for a plurality of second reference data sequences whose correlation coefficients with a second preset-length data sequence before the data to be evaluated reach a preset correlation coefficient, the method further comprises:
calculating expected values of the data to be evaluated according to the data value of the next data of each second reference data sequence to obtain a plurality of expected values of the data to be evaluated;
and calculating a second distance between each expected value of the data to be evaluated and the actual value of the data to be evaluated to obtain a plurality of second distances.
7. The quality evaluation method according to claim 6, wherein the calculating an expected value of the data to be evaluated according to the data value of the subsequent data of each second reference data sequence to obtain a plurality of expected values of the data to be evaluated comprises:
establishing a plurality of linear equation sets for any second reference data sequence and a second preset length data sequence before the data to be evaluated:
Figure FDA0002314407670000031
solving the equation set (1) to obtain a and b, wherein x is a second reference data sequence, y is the second preset-length data sequence before the data to be evaluated, x_i and x_j are the i-th and j-th elements of the second reference data sequence x, y_i and y_j are the i-th and j-th elements of the second preset-length data sequence y before the data to be evaluated, and n is the second preset length;
the expected value of the data to be evaluated is E(y_{n+1}) = a * x_{n+1} + b, wherein x_{n+1} is the next data after the second reference data sequence and y_{n+1} is the data to be evaluated.
8. The quality evaluation method according to claim 6, wherein before taking the quality mark of the data after each first reference data sequence and the quality mark of the data after each second reference data sequence as candidate quality marks of the data to be evaluated, the method further comprises:
forming a distance set by a plurality of first distances and a plurality of second distances corresponding to the data to be evaluated, arranging all the distances in the distance set from small to large, and screening out a preset number of distances in the front of the sequence, wherein the distances are the first distances or the second distances;
taking the first reference data sequence or the second reference data sequence corresponding to each distance as a reference data sequence set;
correspondingly, taking the quality mark of the data after each first reference data sequence and the quality mark of the data after each second reference data sequence as candidate quality marks of the data to be evaluated to form a candidate quality mark set comprises:
and taking the quality mark of the next data of each reference data sequence in the reference data sequence set as a candidate quality mark of the data to be evaluated to form a candidate quality mark set, wherein each reference data sequence in the reference data sequence set is a first reference data sequence or a second reference data sequence.
9. A quality evaluation system for new energy grid-connected data, characterized by comprising:
the first search module is used for searching a plurality of first reference data sequences which are completely the same as a first preset length data sequence before the data to be evaluated from the reference sample sequence based on a vector matching method;
the second search module is used for searching, based on a linear correlation method, the reference sample sequence for a plurality of second reference data sequences whose correlation coefficients with a second preset-length data sequence before the data to be evaluated reach a preset correlation coefficient;
the combination module is used for taking the quality marks of the data after each first reference data sequence and the data after each second reference data sequence as candidate quality marks of the data to be evaluated to form a candidate quality mark set;
a determining module, configured to determine the candidate quality mark that occurs most often in the candidate quality mark set as the quality mark of the data to be evaluated;
wherein each data in the reference sample sequence has a corresponding quality mark.
10. An electronic device comprising a memory, a processor and a computer program stored on the memory and executable on the processor, wherein the processor implements the steps of the method for evaluating the quality of new energy grid connection data according to any one of claims 1 to 8 when executing the program.
CN201911271803.XA 2019-12-12 2019-12-12 Quality evaluation method and system for new energy grid-connected data Active CN111080118B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201911271803.XA CN111080118B (en) 2019-12-12 2019-12-12 Quality evaluation method and system for new energy grid-connected data

Publications (2)

Publication Number Publication Date
CN111080118A (en) 2020-04-28
CN111080118B (en) 2023-09-29

Family

ID=70314102

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201911271803.XA Active CN111080118B (en) 2019-12-12 2019-12-12 Quality evaluation method and system for new energy grid-connected data

Country Status (1)

Country Link
CN (1) CN111080118B (en)

Citations (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20110137876A1 (en) * 2009-12-07 2011-06-09 Yeh Peter Zei-Chan Data quality enhancement for smart grid applications
CN103617553A (en) * 2013-10-22 2014-03-05 芜湖大学科技园发展有限公司 Comprehensive promotion system of grid data quality
CN106845763A (en) * 2016-12-13 2017-06-13 全球能源互联网研究院 A kind of electric network reliability analysis method and device
CN108333468A (en) * 2018-01-05 2018-07-27 南京邮电大学 The recognition methods of bad data and device under a kind of active power distribution network
CN108345985A (en) * 2018-01-09 2018-07-31 国网瑞盈电力科技(北京)有限公司 A kind of power distribution network Data Quality Assessment Methodology and system
CN109118384A (en) * 2018-07-16 2019-01-01 湖南优利泰克自动化系统有限公司 A kind of Wind turbines healthy early warning method
CN109308571A (en) * 2018-08-29 2019-02-05 华北电力科学研究院有限责任公司 Distribution wire route becomes relationship detection method
CN110389975A (en) * 2019-08-01 2019-10-29 中南大学 Time series early stage classification method and equipment based on shapelet

Also Published As

Publication number Publication date
CN111080118B (en) 2023-09-29

Similar Documents

Publication Publication Date Title
CN110825579B (en) Server performance monitoring method and device, computer equipment and storage medium
CN109873610B (en) Photovoltaic array fault diagnosis method based on I-V characteristics and deep residual network
CN113191918B (en) Monthly wind-solar power generation scenario analysis method based on time-series generative adversarial network
CN107679089A (en) Cleaning method, device and system for electric power sensing data
CN112149887A (en) PM2.5 concentration prediction method based on data space-time characteristics
CN105471647A (en) Power communication network fault positioning method
CN112508316B (en) Self-adaptive abnormality determination method and device in real-time abnormality detection system
CN115453356A (en) Power equipment running state monitoring and analyzing method, system, terminal and medium
CN114124734B (en) Network traffic prediction method based on GCN-Transformer integration model
CN114821852A (en) Inspection robot control system for deep identification of power grid defects based on feature pyramid
CN114881286A (en) Short-time rainfall prediction method based on deep learning
CN116471196B (en) Operation and maintenance monitoring network maintenance method, system and equipment
CN113536944A (en) Distribution line inspection data identification and analysis method based on image identification
CN111080118B (en) Quality evaluation method and system for new energy grid-connected data
CN112734201A (en) Multi-equipment overall quality evaluation method based on expected failure probability
CN115618286A (en) Transformer partial discharge type identification method, system, equipment, terminal and application
CN115908051A (en) Method for determining energy storage capacity of power system
CN116151799A (en) BP neural network-based distribution line multi-working-condition fault rate rapid assessment method
CN114971062A (en) Photovoltaic power prediction method and device
CN112348700B (en) Line capacity prediction method combining SOM clustering and IFOU equation
CN110716101B (en) Power line fault positioning method and device, computer and storage medium
CN113254485A (en) Real-time data stream anomaly detection method and system
CN112561153A (en) Scenic spot crowd gathering prediction method based on model integration
CN111507495A (en) Method and device for predicting missing wind measurement data
CN117880162B (en) Communication performance test method of intelligent Internet of things electric energy meter

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant