CN115099370B - Evaluation data set construction method and system for flow-type industrial production data stream - Google Patents

Info

Publication number
CN115099370B
Authority
CN
China
Prior art keywords
time
sequence
data
element data
data set
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202211014655.5A
Other languages
Chinese (zh)
Other versions
CN115099370A (en)
Inventor
南玉泽
王栋
党海峰
夏建涛
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Quanying Technology Co ltd
Original Assignee
Beijing Quanying Technology Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing Quanying Technology Co ltd
Priority to CN202211014655.5A
Publication of CN115099370A
Application granted
Publication of CN115099370B
Legal status: Active

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F17/00 Digital computing or data processing equipment or methods, specially adapted for specific functions
    • G06F17/10 Complex mathematical operations
    • G06F17/16 Matrix or vector computation, e.g. matrix-matrix or matrix-vector multiplication, matrix factorization

Abstract

The invention relates to a method and a system for constructing an evaluation data set for a flow-type industrial production data stream. The method comprises the following steps: S1, selecting a time series list $[L_1, L_2, \ldots, L_i, \ldots, L_N]$ and a specified sequence $L_0$; S2, using a distance similarity screening strategy to obtain the distance similarity between the dependent variable y in $L_0$ and the dependent variable y in each $L_i$, thereby obtaining a similarity list, and constructing a first trend data set D1 based on the similarity list and a preset construction mode; S3, selecting a sequence X of specified length from the historical data of the production data stream, obtaining the period $T_3$ of the sequence X by autocorrelation coefficient processing, generating a timestamp list based on $T_3$, and constructing a second trend data set D2 from the elements of the timestamp list and a preset construction mode; S4, using the time series list $[L_1, L_2, \ldots, L_i, \ldots, L_N]$ and the trained model to be evaluated to obtain an error sequence, and constructing a third trend data set D3 based on the error sequence and a preset construction mode; and S5, sampling D1, D2 and D3 to obtain the evaluation data set.

Description

Evaluation data set construction method and system for flow-type industrial production data stream
Technical Field
The invention relates to the technical field of flow-type industrial production, and in particular to a method and a system for constructing an evaluation data set for a flow-type industrial production data stream.
Background
For a machine learning modeling project in a flow-type industrial production scenario, evaluation of model effectiveness should run through model training, model updating, and online operation. Whether that evaluation is sound depends on both a correct evaluation method and an effective evaluation data set. In other words, to evaluate a model over its full life cycle, a reasonable method for building the evaluation data set is needed in addition to a correct evaluation method, so that the data used for model testing are sufficient and correct.
Existing evaluation data sets are mainly constructed by the hold-out method, cross-validation, and the bootstrap method. In a flow-type industrial production scenario, however, an evaluation data set built in this way lacks the trend and periodic characteristics of a time series, so it does not reflect the real data distribution at the time the model is used. The validity and generalization capability of the model cannot be correctly measured on such a data set, and the test results are distorted because the evaluation data set is invalid.
Disclosure of Invention
(I) Technical problem to be solved
In view of the above shortcomings of the prior art, the present invention provides a method and a system for constructing an evaluation data set for a flow-type industrial production data stream. It addresses the technical problem that existing evaluation data sets lack the trend and periodic characteristics of a time series, so that the validity and generalization capability of the model to be evaluated cannot be correctly reflected when such a data set is used for evaluation.
(II) technical scheme
To achieve the above purpose, the invention adopts the following main technical scheme:
An embodiment of the invention provides a method for constructing an evaluation data set for a flow-type industrial production data stream, comprising the following steps:
S1, for the time series data of a production data stream, selecting on the time axis a time series list $[L_1, L_2, \ldots, L_i, \ldots, L_N]$ and a specified sequence $L_0$, where the time length of $L_0$ equals the update period $T_0$ of the model to be evaluated, and $L_0$ comprises the element data of the production data stream within the time period $[t - T_0, t]$, t being the current timestamp;
S2, using a distance similarity screening strategy, obtaining the distance similarity between the dependent variable y in $L_0$ and the dependent variable y in each $L_i$, and obtaining a similarity list; constructing a first trend data set D1 from the time series data of the production data stream based on the similarity list and a preset construction mode;
S3, selecting a sequence X of specified length from the historical data of the production data stream, obtaining the period $T_3$ of the sequence X by autocorrelation coefficient processing, generating a timestamp list of preset length K2 based on $T_3$, and constructing a second trend data set D2 from the time series data of the production data stream according to the elements of the timestamp list and a preset construction mode;
S4, using the time series list $[L_1, L_2, \ldots, L_i, \ldots, L_N]$ and the trained model to be evaluated, obtaining an error sequence, and constructing a third trend data set D3 from the time series data of the production data stream based on the error sequence and a preset construction mode;
and S5, sampling D1, D2 and D3 to obtain an evaluation data set for evaluating the model to be evaluated.
Preferably,
the specified sequence $L_0$ comprises: $z_{01}, z_{02}, \ldots, z_{0w}, \ldots, z_{0n}$;
$z_{0w}$ is the w-th element data, in time order, of the specified sequence $L_0$;
the time series $L_i$ comprises: m pieces of element data, arranged in time order, from the time series data of the production data stream within a preset first time interval $T_1$ before the current timestamp t;
wherein $L_i = [z_{i1}, z_{i2}, \ldots, z_{ij}, \ldots, z_{im}]$;
$z_{ij}$ is the j-th element data, in time order, of the time series $L_i$;
wherein,
Figure 466745DEST_PATH_IMAGE001
f is a preset value;
each element data includes: a timestamp corresponding to the element data, and the independent variables and the dependent variable y corresponding to the preset model to be evaluated.
Preferably, S2 specifically includes:
S21, using a distance similarity screening strategy, obtaining the distance similarity between the specified sequence $L_0$ and each time series in the time series list $[L_1, L_2, \ldots, L_i, \ldots, L_N]$;
S22, obtaining a similarity list based on the distance similarity between the specified sequence $L_0$ and each time series in the time series list $[L_1, L_2, \ldots, L_i, \ldots, L_N]$;
the similarity list includes: the time series in the time series list $[L_1, L_2, \ldots, L_i, \ldots, L_N]$ that correspond to the K1 largest distance similarities;
wherein K1 is a preset value, and $0 \le K1 \le N/10$;
and S23, obtaining a first trend data set D1 from the historical production operation time series data based on the similarity list and a preset construction mode.
Preferably, S21 specifically includes:
S211, for the specified sequence $L_0$ and a time series $L_i$, obtaining the distance matrix $D_{(L_0,L_i)}$ corresponding to $L_0$ and $L_i$;
wherein,
Figure 3906DEST_PATH_IMAGE002
wherein,
Figure 625380DEST_PATH_IMAGE003
Figure 10225DEST_PATH_IMAGE004
is the dependent variable y in the w-th element data, in time order, of the specified sequence $L_0$;
Figure 121269DEST_PATH_IMAGE005
is the dependent variable y in the j-th element data, in time order, of the time series $L_i$;
S212, based on the distance matrix $D_{(L_0,L_i)}$, using recursion formula (1) to compute the minimum distance $L_{\min}(m, n)$ from element $d_{11}$ to element $d_{mn}$ of the distance matrix, and taking $L_{\min}(m, n)$ as the distance similarity between the specified sequence $L_0$ and the time series $L_i$;
formula (1) is:
Figure 958775DEST_PATH_IMAGE006
wherein $L_{\min}(w, j)$ is the minimum distance from element $d_{11}$ to any element $d_{wj}$ of the distance matrix;
wherein,
Figure 711836DEST_PATH_IMAGE007
Figure 403718DEST_PATH_IMAGE008
Figure 888926DEST_PATH_IMAGE009
Preferably, S23 specifically includes:
S231, obtaining a first timestamp set based on the similarity list;
the first timestamp set includes: the timestamp of the last element data of each time series in the similarity list;
and S232, taking each timestamp in the first timestamp set as a starting point, obtaining backwards the element data of the production operation time series data within a period $T_0$, and taking the union to obtain the first trend data set D1.
Preferably, S3 specifically includes:
S31, based on the sequence X of specified length and p preset translation time segments, obtaining the two sub data sets that correspond to the sequence X for any preset translation time segment;
the sequence X of specified length comprises: h pieces of element data, arranged in time order, from the historical production operation time series data within a second time interval $T_2$;
wherein the second time interval $T_2$ is at least 15 days;
$X = [z_1, z_2, \ldots, z_r, \ldots, z_h]$;
$z_r$ is the r-th element data, in time order, of the historical production operation time series data within the second time interval $T_2$;
the p preset translation time segments are, in order: $t_1, t_2, \ldots, t_g, \ldots, t_p$;
wherein $0 \le p \le 30$;
wherein $t_g$ is the g-th of the p preset translation time segments;
the two sub data sets corresponding to the sequence X for any preset translation time segment are the first sub data set and the second sub data set of that translation time segment;
the first sub data set of any preset translation time segment comprises: the element data of the sequence X that fall within that translation time segment, the segment starting at the start time of the second time interval $T_2$;
the second sub data set of any preset translation time segment comprises: the element data of the sequence X other than those within that translation time segment, the segment starting at the start time of the second time interval $T_2$;
S32, using formula (2) to obtain the autocorrelation coefficient of the two sub data sets corresponding to the sequence X for each preset translation time segment;
formula (2) is:
Figure 213728DEST_PATH_IMAGE010
Figure 776339DEST_PATH_IMAGE011
is the autocorrelation coefficient of the two sub data sets corresponding to the sequence X for the preset translation time segment $t_g$;
Figure 870197DEST_PATH_IMAGE012
is the mean of the dependent variable y over the element data of the sequence X;
$X_r$ is the dependent variable y in the r-th element data of the sequence X;
Figure 057465DEST_PATH_IMAGE013
is the dependent variable y in the r-th element data of the first sub data set corresponding to the sequence X for the preset translation time segment $t_g$;
Figure 525355DEST_PATH_IMAGE014
is the dependent variable y in the r-th element data of the second sub data set corresponding to the sequence X for the preset translation time segment $t_g$;
S33, determining the first time period $T_3$ of the sequence X based on the autocorrelation coefficients of the two sub data sets corresponding to the sequence X for each preset translation time segment;
wherein the first time period $T_3$ is the time interval between two adjacent peaks of a first curve;
the first curve is formed by connecting the autocorrelation coefficients obtained for all preset translation time segments, in the order of the corresponding translation time segments;
S34, starting from the current time t, taking element data at intervals of the first time period $T_3$ and recording the corresponding timestamps, giving a timestamp list $A_2$ with K2 timestamps;
wherein,
Figure 620219DEST_PATH_IMAGE015
$0 \le K2 \le h/50$, and h is the number of element data in the sequence X;
S35, determining a second trend data set D2 based on the timestamp list $A_2$;
S35 specifically includes:
taking each timestamp in the timestamp list $A_2$ as a starting point, obtaining backwards the element data within the first time period $T_3$, and taking the union to obtain the second trend data set D2.
Preferably, S4 specifically includes:
S41, inputting each element data of each time series in the time series list $[L_1, L_2, \ldots, L_i, \ldots, L_N]$ selected on the time axis into the trained model to be evaluated for prediction, obtaining a prediction result for each element data of each time series;
wherein the trained model to be evaluated has been trained in advance on the element data of the specified sequence $L_0$;
S42, obtaining the total error of each time series based on the prediction result of each element data in that time series and the actual dependent variable y;
S43, determining, based on the total error of each time series, the K3 time series with the smallest total errors;
wherein K3 is a preset value, and $0 \le K3 \le N/10$;
S44, obtaining a third trend data set D3 based on the K3 time series with the smallest total errors;
S44 specifically includes:
taking the timestamp of the last element data of each of the K3 time series with the smallest total errors as a starting point, obtaining backwards the element data of the production operation time series data within the period $T_0$, and taking the union to obtain the third trend data set D3.
Preferably, S42 specifically includes:
obtaining the total error of each time series using formula (3), based on the prediction result of each element data in that time series and the actual value of the dependent variable y;
formula (3) is:
Figure 834163DEST_PATH_IMAGE016
wherein,
Figure 926753DEST_PATH_IMAGE017
is the predicted value of the dependent variable y of the element data;
y is the actual value of the dependent variable in the element data;
m is the number of element data in the time series $L_i$;
$e_i$ is the total error of the time series $L_i$.
Preferably, S5 specifically includes:
S51, sampling the first trend data set D1, the second trend data set D2 and the third trend data set D3 at a first proportion $w_1$, a second proportion $w_2$ and a third proportion $w_3$ respectively, obtaining a first trend sample set, a second trend sample set and a third trend sample set;
and S52, taking the union of the first trend sample set, the second trend sample set and the third trend sample set and removing duplicates to obtain the final evaluation data set D.
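Steps S51 and S52 can be sketched in Python as follows. This is only an illustration under stated assumptions: the text does not say how the sampling is done, so uniform random sampling without replacement is assumed; records are reduced to hypothetical `(timestamp, y)` tuples; and duplicates are identified by timestamp. The names `sample_fraction` and `build_evaluation_set` are not from the source.

```python
import random
from typing import Dict, List, Sequence, Tuple

Record = Tuple[float, float]  # hypothetical minimal element: (timestamp, y)

def sample_fraction(data: Sequence[Record], fraction: float,
                    rng: random.Random) -> List[Record]:
    """Draw round(len(data) * fraction) records uniformly, without replacement."""
    k = round(len(data) * fraction)
    return rng.sample(list(data), k)

def build_evaluation_set(d1: Sequence[Record], d2: Sequence[Record],
                         d3: Sequence[Record], w1: float, w2: float,
                         w3: float, seed: int = 0) -> List[Record]:
    """Sample each trend set at its proportion, then union and deduplicate."""
    rng = random.Random(seed)
    merged: Dict[float, Record] = {}
    for data, w in ((d1, w1), (d2, w2), (d3, w3)):
        for rec in sample_fraction(data, w, rng):
            merged[rec[0]] = rec  # assumption: same timestamp means same record
    return [merged[ts] for ts in sorted(merged)]

d1 = [(0.0, 0.0), (1.0, 1.0)]
d2 = [(1.0, 1.0), (2.0, 2.0)]
d3 = [(2.0, 2.0), (3.0, 3.0)]
full = build_evaluation_set(d1, d2, d3, 1.0, 1.0, 1.0)
half = build_evaluation_set(d1, d2, d3, 0.5, 0.5, 0.5)
```

With all proportions set to 1.0 the result is simply the deduplicated union of the three trend sets; smaller proportions shrink each contribution before the union is taken.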
In another aspect, this embodiment further provides a system for constructing an evaluation data set for a flow-type industrial production data stream, which includes:
at least one processor; and
at least one memory communicatively coupled to the processor, wherein the memory stores program instructions executable by the processor, and the processor calls the program instructions to perform any of the above evaluation data set construction methods for a flow-type industrial production data stream.
(III) advantageous effects
The beneficial effects of the invention are as follows. In the method and system for constructing an evaluation data set for a flow-type industrial production data stream, a first trend data set D1 is constructed with a distance similarity screening strategy, starting from the change pattern of the time series; the period $T_3$ of the sequence X is obtained by autocorrelation coefficient processing, and a second trend data set D2 is constructed from the period $T_3$; a third trend data set D3 is constructed using the trained model to be evaluated. Finally, data sets with these different characteristics are all included in the evaluation data set of the model to be evaluated, so that the constructed evaluation data set better reflects the prediction accuracy and generalization capability of the model at a specific time of use.
Drawings
FIG. 1 is a flow chart of the evaluation data set construction method for a flow-type industrial production data stream according to the present invention;
FIG. 2 is a schematic diagram of the error distribution of a model to be evaluated on the time series data of an actual production data stream;
FIG. 3 is a schematic diagram of the error distribution of a model to be evaluated on an evaluation data set constructed by the evaluation data set construction method of this embodiment;
FIG. 4 is a schematic diagram of the error distribution of a model to be evaluated on an evaluation data set constructed by the conventional hold-out method.
Detailed Description
For the purpose of better explaining the present invention and to facilitate understanding, the present invention will be described in detail by way of specific embodiments with reference to the accompanying drawings.
In order to better understand the above technical solutions, exemplary embodiments of the present invention will be described in more detail below with reference to the accompanying drawings. While exemplary embodiments of the invention are shown in the drawings, it should be understood that the invention can be embodied in various forms and should not be limited to the embodiments set forth herein. Rather, these embodiments are provided so that this disclosure will be thorough and complete, and will fully convey the scope of the invention to those skilled in the art.
Example one
Referring to FIG. 1, this embodiment provides a method for constructing an evaluation data set for a flow-type industrial production data stream, including:
S1, for the time series data of a production data stream, selecting on the time axis a time series list $[L_1, L_2, \ldots, L_i, \ldots, L_N]$ and a specified sequence $L_0$, where the time length of $L_0$ equals the update period $T_0$ of the model to be evaluated, and $L_0$ comprises the element data of the production data stream within the time period $[t - T_0, t]$, t being the current timestamp.
In practical application of this embodiment, the specified sequence $L_0$ comprises: $z_{01}, z_{02}, \ldots, z_{0w}, \ldots, z_{0n}$.
$z_{0w}$ is the w-th element data, in time order, of the specified sequence $L_0$.
The time series $L_i$ comprises: m pieces of element data, arranged in time order, from the time series data of the production data stream within a preset first time interval $T_1$ before the current timestamp t.
Wherein $L_i = [z_{i1}, z_{i2}, \ldots, z_{ij}, \ldots, z_{im}]$.
$z_{ij}$ is the j-th element data, in time order, of the time series $L_i$.
Wherein,
Figure 944256DEST_PATH_IMAGE001
and f is a preset value.
Each element data includes: the timestamp corresponding to the element data, and the independent variables and the dependent variable y corresponding to the preset model to be evaluated.
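As a minimal sketch of the data layout described in S1, an element can be modelled as a timestamp plus independent variables and a dependent variable y, and $L_0$ as the slice of the stream falling in $[t - T_0, t]$. The class name `Element`, the function `select_l0`, and the choice of plain floats for timestamps are assumptions made for illustration; the source does not prescribe a representation.

```python
from dataclasses import dataclass
from typing import List, Tuple

@dataclass(frozen=True)
class Element:
    timestamp: float        # assumed to be a numeric time value, e.g. seconds
    x: Tuple[float, ...]    # independent variables of the model to be evaluated
    y: float                # dependent variable y

def select_l0(stream: List[Element], t: float, t0: float) -> List[Element]:
    """Return the elements with timestamps in [t - t0, t], in time order."""
    window = [e for e in stream if t - t0 <= e.timestamp <= t]
    return sorted(window, key=lambda e: e.timestamp)

# Toy stream: one element per time unit.
stream = [Element(float(i), (i * 0.1,), float(i % 3)) for i in range(10)]
l0 = select_l0(stream, t=9.0, t0=3.0)
```

The candidate sequences $L_i$ could be cut from the same stream in the same way; their exact length depends on the formula involving the preset value f, which is not reproduced here.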
S2, using a distance similarity screening strategy, obtaining the distance similarity between the dependent variable y in $L_0$ and the dependent variable y in each $L_i$, and obtaining a similarity list; constructing a first trend data set D1 from the time series data of the production data stream based on the similarity list and a preset construction mode.
S2 specifically includes:
S21, using a distance similarity screening strategy, obtaining the distance similarity between the specified sequence $L_0$ and each time series in the time series list $[L_1, L_2, \ldots, L_i, \ldots, L_N]$.
Specifically, S21 includes:
S211, for the specified sequence $L_0$ and a time series $L_i$, obtaining the distance matrix $D_{(L_0,L_i)}$ corresponding to $L_0$ and $L_i$.
Wherein,
Figure 062385DEST_PATH_IMAGE002
wherein,
Figure 114523DEST_PATH_IMAGE003
Figure 909173DEST_PATH_IMAGE004
is the dependent variable y in the w-th element data, in time order, of the specified sequence $L_0$.
Figure 103654DEST_PATH_IMAGE005
is the dependent variable y in the j-th element data, in time order, of the time series $L_i$.
S212, based on the distance matrix $D_{(L_0,L_i)}$, using recursion formula (1) to compute the minimum distance $L_{\min}(m, n)$ from element $d_{11}$ to element $d_{mn}$ of the distance matrix, and taking $L_{\min}(m, n)$ as the distance similarity between the specified sequence $L_0$ and the time series $L_i$.
Formula (1) is:
Figure 822212DEST_PATH_IMAGE006
wherein $L_{\min}(w, j)$ is the minimum distance from element $d_{11}$ to any element $d_{wj}$ of the distance matrix.
Wherein,
Figure 994436DEST_PATH_IMAGE007
Figure 241878DEST_PATH_IMAGE008
Figure 765132DEST_PATH_IMAGE009
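The recursion in S212, a minimum accumulated distance from $d_{11}$ to the last element of a pairwise distance matrix, has the shape of dynamic time warping (DTW). The sketch below shows a standard DTW computation as one plausible reading. Because the formula images are not reproduced in this text, two points are assumptions: the element distance is taken as $|y_{0w} - y_{ij}|$, and each step may come from the left, upper, or upper-left neighbour. The function name `dtw_distance` is hypothetical.

```python
from typing import List, Sequence

def dtw_distance(y0: Sequence[float], yi: Sequence[float]) -> float:
    """Minimum accumulated distance between two sequences of possibly
    different lengths (standard DTW; an assumed form of formula (1))."""
    n, m = len(y0), len(yi)
    if n == 0 or m == 0:
        raise ValueError("both sequences must be non-empty")
    inf = float("inf")
    # cost[w][j] plays the role of L_min(w, j); row 0 and column 0 are padding.
    cost: List[List[float]] = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for w in range(1, n + 1):
        for j in range(1, m + 1):
            d_wj = abs(y0[w - 1] - yi[j - 1])  # assumed element distance
            cost[w][j] = d_wj + min(cost[w - 1][j],      # step from above
                                    cost[w][j - 1],      # step from the left
                                    cost[w - 1][j - 1])  # diagonal step
    return cost[n][m]

same = dtw_distance([1.0, 2.0, 3.0], [1.0, 2.0, 3.0])
stretched = dtw_distance([1.0, 2.0, 3.0], [1.0, 2.0, 2.0, 3.0])
offset = dtw_distance([0.0, 0.0], [1.0, 1.0])
```

Because the alignment may stretch either sequence, two series of different lengths with the same shape can still reach an accumulated distance of zero, which is what allows $L_0$ and $L_i$ to have different numbers of elements.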
S22, obtaining a similarity list based on the distance similarity between the specified sequence $L_0$ and each time series in the time series list $[L_1, L_2, \ldots, L_i, \ldots, L_N]$.
The similarity list includes: the time series in the time series list $[L_1, L_2, \ldots, L_i, \ldots, L_N]$ that correspond to the K1 largest distance similarities.
Wherein K1 is a preset value, and $0 \le K1 \le N/10$.
And S23, obtaining a first trend data set D1 from the historical production operation time series data based on the similarity list and a preset construction mode.
Specifically, S23 includes:
S231, obtaining a first timestamp set based on the similarity list.
The first timestamp set includes: the timestamp of the last element data of each time series in the similarity list.
S232, taking each timestamp in the first timestamp set as a starting point, obtaining backwards the element data of the production operation time series data within a period $T_0$, and taking the union to obtain the first trend data set D1.
In this embodiment, the similarity of two time series of different lengths is quantified, and the resulting data set, combined with the periodic data set, better matches the characteristics of time series data.
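Steps S22 to S232 can be sketched as a ranking step followed by a windowed union. The code is illustrative only. Records are hypothetical `(timestamp, y)` tuples and the scores are made-up numbers. The translated text says the K1 "largest" similarities are kept and that the window is taken "backwards"; since both directions depend on how the score and the time axis are read, they are exposed as the `largest` and `forward` flags rather than fixed.

```python
from typing import Dict, List, Sequence, Tuple

Record = Tuple[float, float]  # hypothetical minimal element: (timestamp, y)

def top_k_indices(scores: Sequence[float], k: int,
                  largest: bool = True) -> List[int]:
    """Indices of the k best scores; `largest` selects the ranking direction."""
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=largest)
    return order[:k]

def build_trend_set(stream: Sequence[Record], starts: Sequence[float],
                    span: float, forward: bool = True) -> List[Record]:
    """Union of the records in a window of length `span` at each start time,
    deduplicated by timestamp. forward=True uses [s, s + span], otherwise
    [s - span, s]; which one the source intends is an assumption."""
    picked: Dict[float, Record] = {}
    for s in starts:
        lo, hi = (s, s + span) if forward else (s - span, s)
        for rec in stream:
            if lo <= rec[0] <= hi:
                picked[rec[0]] = rec
    return [picked[ts] for ts in sorted(picked)]

stream = [(float(i), float(i)) for i in range(20)]
candidates = [stream[0:4], stream[5:9], stream[10:14]]  # stand-ins for L_1..L_3
scores = [0.2, 0.9, 0.5]                                # made-up similarities
chosen = top_k_indices(scores, k=2)
starts = [candidates[i][-1][0] for i in chosen]  # timestamp of last element
d1 = build_trend_set(stream, starts, span=3.0)
```

Here `span` stands in for the update period $T_0$, and each start time is the timestamp of the last element of a selected sequence, as in S231.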
S3, selecting a sequence X of specified length from the historical data of the production data stream, obtaining the period $T_3$ of the sequence X by autocorrelation coefficient processing, generating a timestamp list of preset length K2 based on $T_3$, and constructing a second trend data set D2 from the time series data of the production data stream according to the elements of the timestamp list and a preset construction mode.
In practical application of this embodiment, S3 specifically includes:
S31, based on the sequence X of specified length and p preset translation time segments, obtaining the two sub data sets that correspond to the sequence X for any preset translation time segment.
The sequence X of specified length comprises: h pieces of element data, arranged in time order, from the historical production operation time series data within a second time interval $T_2$.
Wherein the second time interval $T_2$ is at least 15 days.
$X = [z_1, z_2, \ldots, z_r, \ldots, z_h]$.
$z_r$ is the r-th element data, in time order, of the historical production operation time series data within the second time interval $T_2$.
The p preset translation time segments are, in order: $t_1, t_2, \ldots, t_g, \ldots, t_p$.
Wherein $0 \le p \le 30$.
Wherein $t_g$ is the g-th of the p preset translation time segments.
The two sub data sets corresponding to the sequence X for any preset translation time segment are the first sub data set and the second sub data set of that translation time segment.
The first sub data set of any preset translation time segment comprises: the element data of the sequence X that fall within that translation time segment, the segment starting at the start time of the second time interval $T_2$.
The second sub data set of any preset translation time segment comprises: the element data of the sequence X other than those within that translation time segment, the segment starting at the start time of the second time interval $T_2$.
S32, using formula (2) to obtain the autocorrelation coefficient of the two sub data sets corresponding to the sequence X for each preset translation time segment.
Formula (2) is:
Figure 490642DEST_PATH_IMAGE010
Figure 986215DEST_PATH_IMAGE011
is the autocorrelation coefficient of the two sub data sets corresponding to the sequence X for the preset translation time segment $t_g$.
Figure 857087DEST_PATH_IMAGE012
is the mean of the dependent variable y over the element data of the sequence X.
$X_r$ is the dependent variable y in the r-th element data of the sequence X.
Figure 087212DEST_PATH_IMAGE013
is the dependent variable y in the r-th element data of the first sub data set corresponding to the sequence X for the preset translation time segment $t_g$.
Figure 865681DEST_PATH_IMAGE014
is the dependent variable y in the r-th element data of the second sub data set corresponding to the sequence X for the preset translation time segment $t_g$.
S33, determining the first time period $T_3$ of the sequence X based on the autocorrelation coefficients of the two sub data sets corresponding to the sequence X for each preset translation time segment.
Wherein the first time period $T_3$ is the time interval between two adjacent peaks of a first curve.
The first curve is formed by connecting the autocorrelation coefficients obtained for all preset translation time segments, in the order of the corresponding translation time segments.
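Steps S32 and S33 can be illustrated with the textbook sample autocorrelation and a simple peak search. Formula (2) is not reproduced in this text, so the lag-g coefficient below, $\sum_r (X_r - \bar{X})(X_{r+g} - \bar{X}) / \sum_r (X_r - \bar{X})^2$, is an assumed standard form rather than the patent's exact expression. The translation segments are taken to be integer lags measured in samples, and the names `autocorr` and `estimate_period` are hypothetical.

```python
import math
from typing import List, Optional, Sequence

def autocorr(y: Sequence[float], lag: int) -> float:
    """Sample autocorrelation of y at the given lag (assumed form of formula (2))."""
    h = len(y)
    mean = sum(y) / h
    denom = sum((v - mean) ** 2 for v in y)
    if denom == 0:
        raise ValueError("a constant series has no defined autocorrelation")
    num = sum((y[r] - mean) * (y[r + lag] - mean) for r in range(h - lag))
    return num / denom

def estimate_period(y: Sequence[float], max_lag: int) -> Optional[int]:
    """Lag distance between the first two adjacent peaks of the autocorrelation
    curve, in samples; None if fewer than two peaks are found."""
    curve = [autocorr(y, g) for g in range(1, max_lag + 1)]  # curve[k] is lag k + 1
    peaks: List[int] = [
        k + 1
        for k in range(1, len(curve) - 1)
        if curve[k] > curve[k - 1] and curve[k] > curve[k + 1]
    ]
    return peaks[1] - peaks[0] if len(peaks) >= 2 else None

# Toy series with a known period of 8 samples.
series = [math.sin(2 * math.pi * r / 8) for r in range(200)]
period = estimate_period(series, max_lag=30)
```

The result is in samples; multiplying by the sampling interval of the data stream would give $T_3$ as a time span. On noisy production data a stricter peak criterion, such as a minimum peak height, would likely be needed.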
S34, starting from the current time t, taking element data at intervals of the first time period $T_3$ and recording the corresponding timestamps, giving a timestamp list $A_2$ with K2 timestamps.
Wherein,
Figure 481339DEST_PATH_IMAGE015
$0 \le K2 \le h/50$, and h is the number of element data in the sequence X.
S35, determining a second trend data set D2 based on the timestamp list $A_2$.
S35 specifically includes:
taking each timestamp in the timestamp list $A_2$ as a starting point, obtaining backwards the element data within the first time period $T_3$, and taking the union to obtain the second trend data set D2.
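A possible reading of S34 and S35 is sketched below. The formula for $A_2$ is not reproduced in this text, so the list is assumed to be $[t - T_3, t - 2T_3, \ldots, t - K2 \cdot T_3]$, and each window is assumed to cover $[s, s + T_3]$ for a start timestamp s; both are assumptions, as are the function names. The sketch works on a sorted list of timestamps and returns indices, using the standard-library `bisect` module to locate window boundaries.

```python
from bisect import bisect_left, bisect_right
from typing import List

def timestamp_list(t: float, period: float, k2: int) -> List[float]:
    """Assumed form of A_2: K2 timestamps stepping back from t by one period each."""
    return [t - k * period for k in range(1, k2 + 1)]

def window_union(timestamps: List[float], starts: List[float],
                 span: float) -> List[int]:
    """Indices into the sorted `timestamps` covered by any window [s, s + span]."""
    covered = set()
    for s in starts:
        lo = bisect_left(timestamps, s)          # first index with value >= s
        hi = bisect_right(timestamps, s + span)  # one past last index <= s + span
        covered.update(range(lo, hi))
    return sorted(covered)

timestamps = [float(i) for i in range(100)]  # toy stream, one element per unit
a2 = timestamp_list(t=99.0, period=8.0, k2=3)
d2_idx = window_union(timestamps, a2, span=8.0)
```

Under these assumptions the windows tile a contiguous range ending at t, so D2 amounts to the most recent K2 periods of data; a different reading of the missing formula could give a different selection.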
S4. Using the time series list [L1, L2, ..., Li, ..., LN] and the trained model to be evaluated, obtain an error sequence, and construct a third trend data set D3 from the time series data of the production data stream based on the error sequence and a preset construction mode.
S4 specifically includes:
S41. Input each element data of each time series in the time series list [L1, L2, ..., Li, ..., LN] selected on the time axis into the trained model to be evaluated for prediction, obtaining a prediction result for each element data in each time series.
Wherein the trained model to be evaluated has been trained in advance on the element data in the specified sequence L0.
S42. Obtain the total error of each time series based on the prediction result of each element data in that time series and the actual dependent variable y.
S42 specifically includes:
The total error of each time series is obtained using formula (3), based on the prediction result of each element data in the time series and the value of the actual dependent variable y.
Formula (3) is:
Figure 257534DEST_PATH_IMAGE016
Wherein,
Figure 974954DEST_PATH_IMAGE018
is the predicted value of the dependent variable y of the element data.
y is the dependent variable y in the element data.
m is the number of element data in the time series Li.
ei is the total error of the time series Li.
S43. Determine the K3 time series with the smallest total errors, based on the total error of each time series.
Wherein K3 is a preset value, and 0 ≤ K3 ≤ N/10.
S44. Obtain a third trend data set D3 based on the K3 time series with the smallest total errors.
S44 specifically includes:
Taking the timestamp of the last element data in each of the K3 time series with the smallest total errors as a starting point, acquire the element data within a period T0 backwards from that timestamp in the production operation time series data, and take the union of the acquired element data as the third trend data set D3.
In this embodiment, approximate historical operating states are selected based on the model to be evaluated; selecting the data set by the errors obtained from the model to be evaluated better reflects the prediction capability of the model in a real use scenario.
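The selection in S42 and S43 can be sketched in Python as follows. Formula (3) is an image that is not reproduced in this text, so the sketch substitutes the mean absolute error over the m element data as a stand-in; that choice, the `(x, y)` record layout and the function names are assumptions for illustration only.

```python
def total_error(predict, series):
    """Stand-in for formula (3): mean absolute error over the m element data.

    series: list of (x, y) pairs, x being the independent variable(s)
    and y the actual dependent variable.
    """
    m = len(series)
    return sum(abs(predict(x) - y) for x, y in series) / m


def k_smallest_error_series(predict, series_list, k3):
    """Return the indices of the K3 series with the smallest total error, plus all errors."""
    errors = [total_error(predict, s) for s in series_list]
    order = sorted(range(len(series_list)), key=lambda i: errors[i])
    return order[:k3], errors


# Toy model y = 2x and three candidate series with increasing deviation from it.
model = lambda x: 2 * x
candidates = [
    [(1, 2.0), (2, 4.0)],   # error 0.0
    [(1, 3.0), (2, 5.0)],   # error 1.0
    [(1, 2.5), (2, 4.5)],   # error 0.5
]
best, errs = k_smallest_error_series(model, candidates, k3=2)
```

The timestamps of the last element data of the selected series would then serve as the starting points for collecting D3.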
S5. Sample D1, D2 and D3 to obtain an evaluation data set for evaluating the model to be evaluated.
Specifically, S5 includes:
S51. Sample the first trend data set D1, the second trend data set D2 and the third trend data set D3 according to a first proportion w1, a second proportion w2 and a third proportion w3, respectively, obtaining a first trend sample set D1, a second trend sample set D2 and a third trend sample set D3.
S52. Take the union of the first trend sample set D1, the second trend sample set D2 and the third trend sample set D3 and perform deduplication to obtain the final evaluation data set D.
In this embodiment, data sets with different characteristics (the first trend sample set D1, the second trend sample set D2 and the third trend sample set D3) are included in the evaluation data set, so that the evaluation data set takes both prediction capability and generalization capability into account and the evaluation of the model to be evaluated is more objective and comprehensive.
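A minimal sketch of S51 and S52 in Python is given below. The text does not say how the sampling is performed or what identifies a duplicate, so the sketch assumes uniform random sampling without replacement and deduplication by timestamp; the function names and the `(timestamp, y)` record layout are hypothetical.

```python
import random


def sample_fraction(dataset, fraction, rng):
    """Uniformly sample floor(len * fraction) records without replacement (assumed scheme)."""
    k = int(len(dataset) * fraction)
    return rng.sample(dataset, k)


def build_evaluation_set(d1, d2, d3, w1, w2, w3, seed=0):
    """Sample each trend data set by its proportion, merge, and deduplicate by timestamp."""
    rng = random.Random(seed)
    merged = {}
    for data, w in ((d1, w1), (d2, w2), (d3, w3)):
        for rec in sample_fraction(data, w, rng):
            merged[rec[0]] = rec  # same timestamp means same element data (assumption)
    return [merged[ts] for ts in sorted(merged)]


# Three overlapping toy trend data sets of (timestamp, y) records.
d1 = [(i, float(i)) for i in range(0, 10)]
d2 = [(i, float(i)) for i in range(5, 15)]
d3 = [(i, float(i)) for i in range(10, 20)]
full = build_evaluation_set(d1, d2, d3, 1.0, 1.0, 1.0)
half = build_evaluation_set(d1, d2, d3, 0.5, 0.5, 0.5)
```

Fixing the seed keeps the evaluation data set reproducible between runs, which matters when results for successive model versions are compared.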
In the evaluation data set construction method for a flow-type industrial production data stream of this embodiment, the first trend data set D1 is constructed from the change pattern of the time series using a distance similarity screening strategy; the period T3 of the sequence X is obtained by autocorrelation coefficient processing, and the second trend data set D2 is constructed from the period T3; the third trend data set D3 is constructed using the trained model to be evaluated; finally, data sets with different characteristics are included in the evaluation data set of the model to be evaluated, so that the constructed evaluation data set better reflects the prediction accuracy and generalization capability of the model at a specific time of use.
Finally, in a practical application of this embodiment, the final evaluation data set D is input into the model to be evaluated to obtain a correspondingly more accurate evaluation result.
Example two
In order to better understand the scheme of the embodiment of the present invention, the steps of the embodiment of the present invention are described in detail below.
The embodiment provides an evaluation data set construction method for a flow-type industrial production data stream, which comprises the following steps:
101. For the time series data of the production data stream, select on the time axis a time series list [L1, L2, ..., Li, ..., LN] and a specified sequence L0. The time length corresponding to L0 is the same as the update period T0 of the model to be evaluated, and the specified sequence L0 comprises the element data of the production data stream in the time period [t - T0, t], where t is the current timestamp; that is, the specified sequence L0 is the real operation data closest on the time axis to the model to be evaluated.
In a practical application of this embodiment, the specified sequence L0 comprises: z01, z02, ..., z0w, ..., z0n.
z0w is the w-th element data, in chronological order, of the specified sequence L0.
The time series Li comprises: m element data arranged in chronological order, taken from the time series data of the production data stream within a preset first time interval T1 before the current timestamp t.
Wherein Li = [zi1, zi2, ..., zij, ..., zim].
zij is the j-th element data, in chronological order, of the time series Li.
Wherein,
Figure 297394DEST_PATH_IMAGE001
and F is a preset value.
Each element data includes: the timestamp corresponding to the element data, and the independent variables and the dependent variable y corresponding to the preset model to be evaluated.
For example, in a specific application of this embodiment, Table 1 shows the specified sequence L0 for a model to be evaluated whose dependent variable is the main steam production; the independent variables are the amount of coal fed into the furnace, the total air flow into the furnace and the oxygen content of the exhaust gas:
TABLE 1
Timestamp | Amount of coal fed into furnace | Total air flow into furnace | Oxygen content of exhaust gas | Main steam production
1647050100000 | 15.721 | 76.5558 | 2.1969 | 171.1258
1647050106000 | 15.5656 | 75.834 | 2.4192 | 171.2354
1647050112000 | 15.4694 | 76.1088 | 2.4407 | 171.2512
1647050118000 | 16.0389 | 76.8488 | 2.362 | 172.0235
1647050124000 | 16.027 | 76.0773 | 2.4075 | 172.1235
1647050130000 | 16.7334 | 76.1407 | 2.1873 | 172.3258
1647050136000 | 16.993 | 76.2791 | 2.1918 | 173.3984
1647050142000 | 17.3402 | 76.0581 | 2.2747 | 173.9423
1647050148000 | 17.3856 | 77.4178 | 2.5426 | 173.4521
1647050154000 | 17.6544 | 77.3188 | 2.3211 | 174.2144
1647050160000 | 17.0855 | 77.5035 | 2.2665 | 174.6845
1647050166000 | 17.6559 | 77.9596 | 2.3259 | 174.9545
1647050172000 | 17.7959 | 79.6381 | 2.2277 | 175.3632
1647050178000 | 18.0086 | 79.8114 | 2.2929 | 175.8852
1647050184000 | 18.1417 | 79.6916 | 2.3011 | 175.8567
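One possible in-memory representation of the element data in Table 1 is sketched below in Python. The class name `ElementData` and its field names are hypothetical, and treating the timestamps as milliseconds is an assumption based on their magnitude; only the first three rows of the table are copied.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ElementData:
    timestamp_ms: int   # timestamp, assumed to be in milliseconds
    coal_feed: float    # independent variable: amount of coal fed into the furnace
    total_air: float    # independent variable: total air flow into the furnace
    exhaust_o2: float   # independent variable: oxygen content of the exhaust gas
    main_steam: float   # dependent variable y: main steam production


# First three rows of Table 1.
rows = [
    (1647050100000, 15.721, 76.5558, 2.1969, 171.1258),
    (1647050106000, 15.5656, 75.834, 2.4192, 171.2354),
    (1647050112000, 15.4694, 76.1088, 2.4407, 171.2512),
]
l0 = [ElementData(*row) for row in rows]

# Spacing between consecutive element data and the dependent-variable series.
intervals = {b.timestamp_ms - a.timestamp_ms for a, b in zip(l0, l0[1:])}
y = [e.main_steam for e in l0]
```

In these rows the timestamps are evenly spaced by 6000 units, so a sequence length can be converted between a number of element data and a time length by a constant factor.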
102. For the specified sequence L0 and a time series Li, obtain the distance matrix D(L0,Li) corresponding to the specified sequence L0 and the time series Li.
Wherein,
Figure 315029DEST_PATH_IMAGE002
Wherein,
Figure 58863DEST_PATH_IMAGE003
Figure 263579DEST_PATH_IMAGE004
is the dependent variable y in the w-th element data, in chronological order, of the specified sequence L0.
Figure 852692DEST_PATH_IMAGE005
is the dependent variable y in the j-th element data, in chronological order, of the time series Li.
103. Based on the distance matrix D(L0,Li), use the recurrence formula (1) to compute recursively the minimum distance Lmin(m, n) from element d11 to element dmn of the distance matrix D(L0,Li), and take the minimum distance Lmin(m, n) as the distance similarity between the specified sequence L0 and the time series Li.
Formula (1) is:
Figure 442942DEST_PATH_IMAGE006
Wherein Lmin(w, j) is the minimum distance from element d11 of the distance matrix to any element dwj of the distance matrix.
Wherein,
Figure 92098DEST_PATH_IMAGE007
Figure 49690DEST_PATH_IMAGE008
Figure 708073DEST_PATH_IMAGE009
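The recursion over the distance matrix described in steps 102 and 103 resembles the cumulative-distance recursion of dynamic time warping. The Python sketch below illustrates that reading under two assumptions, since the formula images are not reproduced in this text: that each element dwj is the absolute difference between the two dependent variables, and that each step of the path moves to the right, downward or diagonally. The function name `dtw_distance` is hypothetical.

```python
def dtw_distance(y0, yi):
    """Minimum cumulative distance from d11 to the last cell of the distance matrix.

    y0: dependent variables of the specified sequence L0.
    yi: dependent variables of a time series Li.
    """
    n, m = len(y0), len(yi)
    inf = float("inf")
    # acc[w][j] holds the minimum cumulative distance to cell (w, j); row and
    # column 0 are padding so the recurrence needs no special cases.
    acc = [[inf] * (m + 1) for _ in range(n + 1)]
    acc[0][0] = 0.0
    for w in range(1, n + 1):
        for j in range(1, m + 1):
            d = abs(y0[w - 1] - yi[j - 1])  # assumed form of d_wj
            acc[w][j] = d + min(acc[w - 1][j], acc[w][j - 1], acc[w - 1][j - 1])
    return acc[n][m]


same = dtw_distance([1, 2, 3], [1, 2, 3])
warped = dtw_distance([1, 2, 3], [1, 2, 2, 3])
shifted = dtw_distance([0, 0], [1, 1])
```

Because the path may repeat an element of either sequence, two series that differ only in local timing (as in `warped`) still obtain a cumulative distance of zero.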
104. Obtain a similarity list based on the distance similarity between the specified sequence L0 and each time series in the time series list [L1, L2, ..., Li, ..., LN].
The similarity list includes: the time series in the time series list [L1, L2, ..., Li, ..., LN] that correspond to the K1 largest distance similarities.
Wherein K1 is a preset value, and 0 ≤ K1 ≤ N/10.
105. Obtain a first timestamp set based on the similarity list.
The first timestamp set includes: the timestamp corresponding to the last element data in each time series in the similarity list.
106. Taking each timestamp in the first timestamp set as a starting point, acquire the element data within a period T0 backwards from that timestamp in the production operation time series data, and take the union of the acquired element data as the first trend data set D1.
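Steps 104 and 105 can be sketched in Python as below. The text selects the K1 "largest distance similarities" while defining the similarity as a minimum cumulative distance; this sketch assumes the intent is to keep the K1 series most similar to L0, that is, those with the smallest cumulative distance. That reading, the function name and the `(timestamp, y)` record layout are assumptions.

```python
def first_timestamp_set(series_list, distances, k1):
    """Timestamps of the last element data of the K1 most similar series.

    series_list: list of time series, each a list of (timestamp, y) records.
    distances:   cumulative distance of each series to L0 (smaller = more similar,
                 which is an assumed interpretation of "distance similarity").
    """
    ranked = sorted(range(len(series_list)), key=lambda i: distances[i])
    return [series_list[i][-1][0] for i in ranked[:k1]]


# Three toy candidate series and their (precomputed) distances to L0.
candidates = [
    [(0, 1.0), (10, 2.0)],
    [(20, 1.0), (30, 5.0)],
    [(40, 1.0), (50, 2.0)],
]
starts = first_timestamp_set(candidates, distances=[0.5, 4.0, 0.1], k1=2)
```

Each returned timestamp is then the starting point of a window of length T0 in step 106.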
107. Based on a sequence X of a specified length and p preset translation time segments, obtain the two sub data sets corresponding to the sequence X of the specified length in any preset translation time segment.
The sequence X of the specified length comprises: h element data arranged in chronological order in the historical production operation time series data within a second time interval T2.
Wherein the second time interval T2 is greater than or equal to 15 days.
X = [z1, z2, ..., zr, ..., zh].
zr is the r-th element data, in chronological order, of the historical production operation time series data within the second time interval T2.
The p preset translation time segments are, in order: t1, t2, ..., tg, ..., tp.
Wherein 0 ≤ p ≤ 30.
Wherein tg is the g-th of the p preset translation time segments.
The two sub data sets corresponding to the sequence X of the specified length in any preset translation time segment are the first sub data set and the second sub data set of that preset translation time segment.
The first sub data set of a preset translation time segment comprises: the element data of the sequence X of the specified length that lie within that preset translation time segment, with the start time of the second time interval T2 taken as the start time of the segment.
The second sub data set of a preset translation time segment comprises: the element data of the sequence X of the specified length other than those within that preset translation time segment, with the start time of the second time interval T2 taken as the start time of the segment.
108. Use formula (2) to obtain the autocorrelation coefficient of the two sub data sets corresponding to the sequence X of the specified length in each preset translation time segment.
Formula (2) is:
Figure 434721DEST_PATH_IMAGE010
Figure 067828DEST_PATH_IMAGE011
is the autocorrelation coefficient of the two sub data sets corresponding to the sequence X of the specified length in the preset translation time segment tg.
Figure 761983DEST_PATH_IMAGE012
is the mean of the dependent variable y in the element data of the sequence X of the specified length.
Xr is the dependent variable y in the r-th element data of the sequence X of the specified length.
Figure 505948DEST_PATH_IMAGE020
is the dependent variable y in the r-th element data of the first sub data set corresponding to the sequence X of the specified length in the preset translation time segment tg.
Figure 618261DEST_PATH_IMAGE022
is the dependent variable y in the r-th element data of the second sub data set corresponding to the sequence X of the specified length in the preset translation time segment tg.
S33. Determine the first time period T3 corresponding to the sequence X of the specified length, based on the autocorrelation coefficients of the two sub data sets corresponding to the sequence X of the specified length in each preset translation time segment.
Wherein the first time period T3 is the time interval between two adjacent peaks of the first curve.
The first curve is formed by connecting the autocorrelation coefficients obtained for the two sub data sets of the sequence X of the specified length in each preset translation time segment, in the order of the corresponding translation time segments.
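The period estimate described above can be sketched in Python. Formula (2) is an image that is not reproduced in this text, so the sketch assumes it has the form of the standard sample autocorrelation at a lag measured in element data, and it reads the period as the spacing between the first two local maxima of the autocorrelation curve. The function names are hypothetical.

```python
import math


def autocorrelation(x, lag):
    """Standard sample autocorrelation of x at the given lag (assumed form of formula (2))."""
    h = len(x)
    mean = sum(x) / h
    denom = sum((v - mean) ** 2 for v in x)
    num = sum((x[r] - mean) * (x[r + lag] - mean) for r in range(h - lag))
    return num / denom


def period_from_peaks(x, lags):
    """Spacing between the first two local maxima of the autocorrelation curve, or None."""
    curve = [autocorrelation(x, g) for g in lags]
    peaks = [lags[i] for i in range(1, len(curve) - 1)
             if curve[i - 1] < curve[i] > curve[i + 1]]
    if len(peaks) < 2:
        return None  # fewer than two peaks: no period can be read off the curve
    return peaks[1] - peaks[0]


# Toy dependent-variable series with a known period of 12 element data.
x = [math.sin(2 * math.pi * r / 12) for r in range(240)]
period = period_from_peaks(x, lags=list(range(1, 31)))
```

The result is expressed in numbers of element data; multiplying it by the sampling interval gives the first time period T3 as a time length.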
109. Starting from the current time t, acquire element data once every first time period T3 and record the timestamp of each acquired element data, obtaining a timestamp list A2 with K2 timestamps.
Wherein,
Figure 422269DEST_PATH_IMAGE015
0 ≤ K2 ≤ h/50, and h is the number of element data in the sequence X of the specified length.
110. Determine a second trend data set D2 based on the timestamp list A2.
Step 110 specifically includes:
Taking each timestamp in the timestamp list A2 as a starting point, acquire the element data within one first time period T3 backwards from that timestamp, and take the union of the acquired element data as the second trend data set D2.
111. Input each element data of each time series in the time series list [L1, L2, ..., Li, ..., LN] selected on the time axis into the trained model to be evaluated for prediction, obtaining a prediction result for each element data in each time series.
Wherein the trained model to be evaluated has been trained in advance on the element data in the specified sequence L0.
112. Obtain the total error of each time series based on the prediction result of each element data in that time series and the actual dependent variable y.
Step 112 specifically includes:
The total error of each time series is obtained using formula (3), based on the prediction result of each element data in the time series and the value of the actual dependent variable y.
Formula (3) is:
Figure 869299DEST_PATH_IMAGE023
Wherein,
Figure 151376DEST_PATH_IMAGE018
is the predicted value of the dependent variable y of the element data.
y is the dependent variable y in the element data.
m is the number of element data in the time series Li.
ei is the total error of the time series Li.
113. Determine the K3 time series with the smallest total errors, based on the total error of each time series.
Wherein K3 is a preset value, and 0 ≤ K3 ≤ N/10.
114. Obtain a third trend data set D3 based on the K3 time series with the smallest total errors.
Step 114 specifically includes:
Taking the timestamp of the last element data in each of the K3 time series with the smallest total errors as a starting point, acquire the element data within a period T0 backwards from that timestamp in the production operation time series data, and take the union of the acquired element data as the third trend data set D3.
115. Sample D1, D2 and D3 to obtain an evaluation data set for evaluating the model to be evaluated.
Specifically, step 115 includes:
Sample the first trend data set D1, the second trend data set D2 and the third trend data set D3 according to a first proportion w1, a second proportion w2 and a third proportion w3, respectively, obtaining a first trend sample set D1, a second trend sample set D2 and a third trend sample set D3.
Take the union of the first trend sample set D1, the second trend sample set D2 and the third trend sample set D3 and perform deduplication to obtain the final evaluation data set D.
In the evaluation data set construction method for a flow-type industrial production data stream of this embodiment, the first trend data set D1 is constructed from the change pattern of the time series using a distance similarity screening strategy; the period T3 of the sequence X is obtained by autocorrelation coefficient processing, and the second trend data set D2 is constructed from the period T3; the third trend data set D3 is constructed using the trained model to be evaluated; finally, data sets with different characteristics are included in the evaluation data set of the model to be evaluated, so that the constructed evaluation data set better reflects the prediction accuracy and generalization capability of the model at a specific time of use.
Finally, in a practical application of this embodiment, the final evaluation data set D is input into the model to be evaluated to obtain a correspondingly more accurate evaluation result.
As can be seen by comparing Fig. 2, Fig. 3 and Fig. 4, the error distribution of the model to be evaluated on the evaluation data set constructed by the existing hold-out method is significantly larger than both its error distribution on the time series data of the actual production data stream and its error distribution on the evaluation data set constructed by the method of this embodiment. The error distribution on the evaluation data set constructed by the method of this embodiment lies within ±3 with a mean of 0, which indicates that this evaluation data set is closer to the actual operation data, that is, it better reflects the prediction performance of the model to be evaluated in actual use.
As will be appreciated by one skilled in the art, embodiments of the present invention may be provided as a method, system, or computer program product. Accordingly, the present invention may take the form of an entirely hardware embodiment, an entirely software embodiment or an embodiment combining software and hardware aspects. Furthermore, the present invention may take the form of a computer program product embodied on one or more computer-usable storage media (including, but not limited to, disk storage, CD-ROM, optical storage, and the like) having computer-usable program code embodied therein.
The present invention is described with reference to flowchart illustrations and/or block diagrams of methods, apparatus (systems) and computer program products according to embodiments of the invention. It will be understood that each flow and/or block of the flowchart illustrations and/or block diagrams, and combinations of flows and/or blocks in the flowchart illustrations and/or block diagrams, can be implemented by computer program instructions.
It should be noted that in the claims, any reference signs placed between parentheses shall not be construed as limiting the claim. The word "comprising" does not exclude the presence of elements or steps not listed in a claim. The word "a" or "an" preceding an element does not exclude the presence of a plurality of such elements. The invention can be implemented by means of hardware comprising several distinct elements, and by means of a suitably programmed computer. In a claim enumerating several means, several of these means may be embodied by one and the same item of hardware. The terms first, second, third, etc. are used for convenience only and do not denote any order. These words are to be understood as part of the name of the component.
Furthermore, it should be noted that in the description of the present specification, a description referring to the terms "one embodiment", "some embodiments", "examples", "specific examples" or "some examples", etc., means that a specific feature, structure, material or characteristic described in connection with the embodiment or example is included in at least one embodiment or example of the present invention. In this specification, the schematic representations of the terms used above do not necessarily refer to the same embodiment or example. Furthermore, the particular features, structures, materials, or characteristics described may be combined in any suitable manner in any one or more embodiments or examples. Furthermore, various embodiments or examples and features of different embodiments or examples described in this specification can be combined by one skilled in the art without contradiction.
While preferred embodiments of the present invention have been described, additional variations and modifications in those embodiments may occur to those skilled in the art once they learn of the basic inventive concepts. Therefore, the claims should be construed to include preferred embodiments and all changes and modifications that fall within the scope of the invention.
It will be apparent to those skilled in the art that various modifications and variations can be made in the present invention without departing from the spirit or scope of the invention. Thus, if such modifications and variations of the present invention fall within the scope of the claims of the present invention and their equivalents, the present invention should also include such modifications and variations.

Claims (8)

1. An evaluation data set construction method for a flow-type industrial production data stream, characterized by comprising the following steps:
S1. for the time series data of a production data stream, selecting on the time axis a time series list [L1, L2, ..., Li, ..., LN] and a specified sequence L0, wherein the time length corresponding to L0 is the same as the update period T0 of the model to be evaluated, and the specified sequence L0 comprises the element data of the production data stream in the time period [t - T0, t], t being the current timestamp;
S2. for the specified sequence L0 and a time series Li, obtaining the distance matrix D(L0,Li) corresponding to the specified sequence L0 and the time series Li; based on the distance matrix D(L0,Li), using a recurrence formula to compute recursively the minimum distance Lmin(m, n) from element d11 to element dmn of the distance matrix D(L0,Li), and taking the minimum distance Lmin(m, n) as the distance similarity between the specified sequence L0 and the time series Li; based on the distance similarity between the specified sequence L0 and each time series in the time series list [L1, L2, ..., Li, ..., LN], obtaining a similarity list of the time series in the time series list [L1, L2, ..., Li, ..., LN] that correspond to the K1 largest distance similarities; based on the similarity list, obtaining a first timestamp set comprising the timestamp corresponding to the last element data in each time series in the similarity list; taking each timestamp in the first timestamp set as a starting point, acquiring the element data within a period T0 backwards from that timestamp in the production operation time series data, and taking the union of the acquired element data as a first trend data set D1; wherein K1 is a preset value, and 0 ≤ K1 ≤ N/10;
S3. based on a sequence X of a specified length and p preset translation time segments, obtaining the two sub data sets corresponding to the sequence X of the specified length in any preset translation time segment; obtaining the autocorrelation coefficient of the two sub data sets corresponding to the sequence X of the specified length in each preset translation time segment; determining a first time period T3 corresponding to the sequence X of the specified length based on the autocorrelation coefficients of the two sub data sets corresponding to the sequence X of the specified length in each preset translation time segment; starting from the current time t, acquiring element data once every first time period T3 and recording the timestamp of each acquired element data, obtaining a timestamp list A2 with K2 timestamps; taking each timestamp in the timestamp list A2 as a starting point, acquiring the element data within one first time period T3 backwards from that timestamp, and taking the union of the acquired element data as a second trend data set D2; wherein 0 ≤ K2 ≤ h/50, and h is the number of element data in the sequence X of the specified length;
the sequence X of the specified length comprises: h element data arranged in chronological order in the historical production operation time series data within a second time interval T2; wherein the second time interval T2 is greater than or equal to 15 days; X = [z1, z2, ..., zr, ..., zh]; zr is the r-th element data, in chronological order, of the historical production operation time series data within the second time interval T2; the p preset translation time segments are, in order: t1, t2, ..., tg, ..., tp; wherein 0 ≤ p ≤ 30; wherein tg is the g-th of the p preset translation time segments; wherein the two sub data sets corresponding to the sequence X of the specified length in any preset translation time segment are the first sub data set and the second sub data set of that preset translation time segment; the first sub data set of a preset translation time segment comprises: the element data of the sequence X of the specified length that lie within that preset translation time segment, with the start time of the second time interval T2 taken as the start time of the segment; the second sub data set of a preset translation time segment comprises: the element data of the sequence X of the specified length other than those within that preset translation time segment, with the start time of the second time interval T2 taken as the start time of the segment;
S4. inputting each element data of each time series in the time series list [L1, L2, ..., Li, ..., LN] selected on the time axis into the trained model to be evaluated for prediction, obtaining a prediction result for each element data in each time series; obtaining the total error of each time series based on the prediction result of each element data in that time series and the actual dependent variable y; determining the K3 time series with the smallest total errors based on the total error of each time series; taking the timestamp of the last element data in each of the K3 time series with the smallest total errors as a starting point, acquiring the element data within a period T0 backwards from that timestamp in the production operation time series data, and taking the union of the acquired element data as a third trend data set D3; wherein K3 is a preset value, and 0 ≤ K3 ≤ N/10;
S5. sampling the first trend data set D1, the second trend data set D2 and the third trend data set D3 according to a first proportion w1, a second proportion w2 and a third proportion w3, respectively, obtaining a first trend sample set D1, a second trend sample set D2 and a third trend sample set D3; and taking the union of the first trend sample set D1, the second trend sample set D2 and the third trend sample set D3 and performing deduplication to obtain the final evaluation data set D.
2. The method of claim 1,
the specified sequence L0 comprises: z01, z02, ..., z0w, ..., z0n;
z0w is the w-th element data, in chronological order, of the specified sequence L0;
the time series Li comprises: m element data arranged in chronological order, taken from the time series data of the production data stream within a preset first time interval T1 before the current timestamp t;
wherein Li = [zi1, zi2, ..., zij, ..., zim];
zij is the j-th element data, in chronological order, of the time series Li;
wherein,
Figure DEST_PATH_IMAGE001
F is a preset value;
each element data includes: the timestamp corresponding to the element data, and the independent variables and the dependent variable y corresponding to the preset model to be evaluated.
3. The method of claim 2,
wherein,
Figure 768475DEST_PATH_IMAGE002
wherein,
Figure DEST_PATH_IMAGE003
Figure 623298DEST_PATH_IMAGE004
is the dependent variable y in the w-th element data, in chronological order, of the specified sequence L0;
Figure DEST_PATH_IMAGE005
is the dependent variable y in the j-th element data, in chronological order, of the time series Li.
4. The method of claim 3,
the recurrence formula is:
Figure 357030DEST_PATH_IMAGE006
wherein Lmin(w, j) is the minimum distance from element d11 of the distance matrix to any element dwj of the distance matrix;
wherein,
Figure DEST_PATH_IMAGE007
Figure 689923DEST_PATH_IMAGE008
Figure DEST_PATH_IMAGE009
5. The method of claim 4,
the formula for obtaining the autocorrelation coefficient of the two sub data sets corresponding to the sequence X of the specified length in each preset translation time segment is:
Figure 956825DEST_PATH_IMAGE010
Figure DEST_PATH_IMAGE011
is the autocorrelation coefficient of the two sub data sets corresponding to the sequence X of the specified length in the preset translation time segment tg;
Figure 982550DEST_PATH_IMAGE012
is the mean of the dependent variable y in the element data of the sequence X of the specified length;
Xr is the dependent variable y in the r-th element data of the sequence X of the specified length;
Figure DEST_PATH_IMAGE013
is the dependent variable y in the r-th element data of the first sub data set corresponding to the sequence X of the specified length in the preset translation time segment tg;
Figure 452845DEST_PATH_IMAGE014
is the dependent variable y in the r-th element data of the second sub data set corresponding to the sequence X of the specified length in the preset translation time segment tg.
6. The method of claim 5,
wherein the first time period T3 is the time interval between two adjacent peaks of the first curve;
the first curve is formed by connecting the autocorrelation coefficients obtained for the two sub data sets of the sequence X of the specified length in each preset translation time segment, in the order of the corresponding translation time segments;
starting from the current time t, element data is acquired once every first time period T3 and the timestamp of each acquired element data is recorded, obtaining a timestamp list A2 with K2 timestamps;
wherein,
Figure DEST_PATH_IMAGE015
7. The method of claim 6,
the obtaining of the total error of each time series based on the prediction result of each element data in that time series and the actual dependent variable y specifically includes: obtaining the total error of each time series using formula (3);
formula (3) is:
Figure 340161DEST_PATH_IMAGE016
wherein,
Figure DEST_PATH_IMAGE017
is the predicted value of the dependent variable y of the element data;
y is the dependent variable y in the element data;
m is the number of element data in the time series Li;
ei is the total error of the time series Li.
8. An evaluation data set construction system oriented to a flow-type industrial production data stream is characterized by comprising the following steps:
at least one processor; and
at least one memory communicatively coupled to the processor, wherein the memory stores program instructions executable by the processor, and the processor invokes the program instructions to perform the evaluation data set construction method for a flow-type industrial production data stream according to any one of claims 1-7.
CN202211014655.5A 2022-08-23 2022-08-23 Evaluation data set construction method and system for flow-type industrial production data stream Active CN115099370B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202211014655.5A CN115099370B (en) 2022-08-23 2022-08-23 Evaluation data set construction method and system for flow-type industrial production data stream

Publications (2)

Publication Number Publication Date
CN115099370A CN115099370A (en) 2022-09-23
CN115099370B true CN115099370B (en) 2022-12-02

Family

ID=83301684

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202211014655.5A Active CN115099370B (en) 2022-08-23 2022-08-23 Evaluation data set construction method and system for flow-type industrial production data stream

Country Status (1)

Country Link
CN (1) CN115099370B (en)

Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110445629A (en) * 2018-05-03 2019-11-12 佛山市顺德区美的电热电器制造有限公司 A kind of server concurrency prediction technique and device
CN113033668A (en) * 2021-03-29 2021-06-25 中国人民解放军92859部队 LS-SVM algorithm depth sounding training sample thinning method based on sample Euclidean distance

Family Cites Families (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
ES2354330B1 (en) * 2009-04-23 2012-01-30 Universitat Pompeu Fabra METHOD FOR CALCULATING MEASUREMENT MEASURES BETWEEN TEMPORARY SIGNS.
CN109711755A (en) * 2019-01-23 2019-05-03 华南理工大学 Short-term power load prediction model establishment method based on EMD-VMD-PSO-LSSVM
US11887015B2 (en) * 2019-09-13 2024-01-30 Oracle International Corporation Automatically-generated labels for time series data and numerical lists to use in analytic and machine learning systems
JP7319545B2 (en) * 2019-11-28 2023-08-02 富士通株式会社 Judgment processing program, judgment processing method and judgment processing device
CN113051442A (en) * 2019-12-26 2021-06-29 中国电信股份有限公司 Time series data processing method, device and computer readable storage medium
CN111310981B (en) * 2020-01-20 2022-07-19 浙江工业大学 Reservoir water level trend prediction method based on time series
CN112232447B (en) * 2020-12-14 2021-06-04 国网江西省电力有限公司电力科学研究院 Construction method of complete sample set of power equipment state monitoring data
CN112926633A (en) * 2021-02-01 2021-06-08 长江慧控科技(武汉)有限公司 Abnormal energy consumption detection method, device, equipment and storage medium
CN114528097A (en) * 2022-01-25 2022-05-24 华南理工大学 Cloud platform service load prediction method based on time sequence convolution neural network

Also Published As

Publication number Publication date
CN115099370A (en) 2022-09-23

Similar Documents

Publication Publication Date Title
Dibaeinia et al. SERGIO: a single-cell expression simulator guided by gene regulatory networks
CN112101480B (en) Multivariate clustering and fused time sequence combined prediction method
CN110674604A (en) Transformer DGA data prediction method based on multi-dimensional time sequence frame convolution LSTM
CN104461842A (en) Log similarity based failure processing method and device
CN110083910B (en) NSGA-II based chaotic time sequence prediction sample acquisition method
Zhang et al. Inference of high-resolution trajectories in single-cell RNA-seq data by using RNA velocity
CN110442911B (en) High-dimensional complex system uncertainty analysis method based on statistical machine learning
CN111178623A (en) Business process remaining time prediction method based on multilayer machine learning
CN116340726A (en) Energy economy big data cleaning method, system, equipment and storage medium
CN115099370B (en) Evaluation data set construction method and system for flow-type industrial production data stream
Alrobaie et al. A Review of Data-Driven Approaches for Measurement and Verification Analysis of Building Energy Retrofits
Luo et al. Knot calculation for spline fitting based on the unimodality property
CN111915489A (en) Image redirection method based on supervised deep network learning
JP2021033895A (en) Variable selection method, variable selection program, and variable selection system
CN116307206A (en) Natural gas flow prediction method based on segmented graph convolution and time attention mechanism
Zhang et al. Collaborated online change-point detection in sparse time series for online advertising
CN110569966A (en) Data processing method and device and electronic equipment
CN115410642A (en) Biological relation network information modeling method and system
Wang et al. Stacking Based LightGBM-CatBoost-RandomForest Algorithm and Its Application in Big Data Modeling
CN107133292A (en) Object recommendation method and system
CN113539359A (en) Neural induction matrix supplementation-based map convolution network disease related lncRNA gene prediction method
CN111325384A (en) NDVI prediction method combining statistical characteristics and convolutional neural network model
CN117437973B (en) Single cell transcriptome sequencing data interpolation method
Blank Iteratively Weighted Least Squares as an Alternative Frontier Methodology, Applied to the Local Administrative Public Services Industry
CN117079667B (en) Scene classification method, device, equipment and readable storage medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant