CN115099370A - Evaluation data set construction method and system for flow type industrial production data flow - Google Patents


Info

Publication number
CN115099370A
Authority
CN
China
Prior art keywords
time
sequence
data
data set
list
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202211014655.5A
Other languages
Chinese (zh)
Other versions
CN115099370B (en
Inventor
南玉泽
王栋
党海峰
夏建涛
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Quanying Technology Co ltd
Original Assignee
Beijing Quanying Technology Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing Quanying Technology Co ltd
Priority to CN202211014655.5A
Publication of CN115099370A
Application granted
Publication of CN115099370B
Legal status: Active
Anticipated expiration

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F17/00 Digital computing or data processing equipment or methods, specially adapted for specific functions
    • G06F17/10 Complex mathematical operations
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F17/00 Digital computing or data processing equipment or methods, specially adapted for specific functions
    • G06F17/10 Complex mathematical operations
    • G06F17/16 Matrix or vector computation, e.g. matrix-matrix or matrix-vector multiplication, matrix factorization

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Mathematical Physics (AREA)
  • General Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • Theoretical Computer Science (AREA)
  • Computational Mathematics (AREA)
  • Mathematical Analysis (AREA)
  • Mathematical Optimization (AREA)
  • Pure & Applied Mathematics (AREA)
  • Algebra (AREA)
  • Databases & Information Systems (AREA)
  • Software Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • Computing Systems (AREA)
  • Management, Administration, Business Operations System, And Electronic Commerce (AREA)

Abstract

The invention relates to a method and a system for constructing an evaluation data set for a flow-type industrial production data stream. The method comprises the following steps: S1, selecting a time series list [L_1, L_2, …, L_i, …, L_N] and a specified sequence L_0; S2, using a distance similarity screening strategy to obtain the distance similarity between the dependent variable y in L_0 and the dependent variable y in each L_i, thereby obtaining a similarity list, and constructing a first trend data set D1 based on the similarity list and a preset construction mode; S3, selecting a sequence X of a specified length from the historical data of the production data stream, obtaining the period T_3 of the sequence X by autocorrelation coefficient processing, generating a timestamp list based on T_3, and constructing a second trend data set D2 based on the elements of the timestamp list and a preset construction mode; S4, using the time series list [L_1, L_2, …, L_i, …, L_N] and the trained model to be evaluated to obtain an error sequence, and constructing a third trend data set D3 based on the error sequence and a preset construction mode; and S5, sampling D1, D2 and D3 to obtain the evaluation data set.

Description

Evaluation data set construction method and system for flow type industrial production data flow
Technical Field
The invention relates to the technical field of flow-type industrial production, and in particular to a method and a system for constructing an evaluation data set for a flow-type industrial production data stream.
Background
In a machine learning modeling project for a flow-type industrial production scenario, the effectiveness of a model should be evaluated throughout model training, model updating and online operation. Whether that evaluation is reasonable depends on both a correct evaluation method and a valid evaluation data set. In other words, evaluating a model over its full life cycle requires, in addition to a correct evaluation method, a reasonable way of constructing the evaluation data set, so that the data used to test the model are sufficient and correct.
Existing evaluation data sets are mainly constructed by the hold-out method, the cross-validation method and the bootstrap method. In a flow-type industrial production scenario, however, an evaluation data set built by these methods lacks the trend and periodic characteristics of a time series, so it cannot properly reflect the real data distribution at the time the model is used. The validity and generalization capability of the model therefore cannot be correctly measured on such a data set, and the test results are distorted because the evaluation data set is invalid.
Disclosure of Invention
(I) Technical problem to be solved
In view of the above disadvantages and shortcomings of the prior art, the present invention provides a method and a system for constructing an evaluation data set for a flow-type industrial production data stream. It solves the technical problem that existing evaluation data sets lack the trend and periodic characteristics of a time series, so that the validity and generalization capability of the model to be evaluated cannot be correctly reflected when it is evaluated on them.
(II) Technical solution
To achieve the above purpose, the invention adopts the following main technical solution:
An embodiment of the invention provides a method for constructing an evaluation data set for a flow-type industrial production data stream, which comprises the following steps:
S1. According to the time series data of the production data stream, select on the time axis a time series list [L_1, L_2, …, L_i, …, L_N] and a specified sequence L_0, wherein the time length corresponding to L_0 is the same as the update period T_0 of the model to be evaluated, and the specified sequence L_0 comprises the element data of the production data stream within the time period [t - T_0, t], where t is the current timestamp;
S2. Using a distance similarity screening strategy, obtain the distance similarity between the dependent variable y in L_0 and the dependent variable y in each L_i, and obtain a similarity list; construct a first trend data set D1 from the time series data of the production data stream based on the similarity list and a preset construction mode;
S3. Select a sequence X of a specified length from the historical data of the production data stream, obtain the period T_3 of the sequence X by autocorrelation coefficient processing, generate a timestamp list of preset length K2 based on T_3, and construct a second trend data set D2 from the time series data of the production data stream according to the elements of the timestamp list and a preset construction mode;
S4. Using the time series list [L_1, L_2, …, L_i, …, L_N] and the trained model to be evaluated, obtain an error sequence, and construct a third trend data set D3 from the time series data of the production data stream based on the error sequence and a preset construction mode;
S5. Sample D1, D2 and D3 to obtain an evaluation data set for evaluating the model to be evaluated.
Preferably,
the specified sequence L_0 comprises: z_01, z_02, ..., z_0w, ..., z_0n;
z_0w is the w-th element data of the specified sequence L_0, arranged in time order;
the time series L_i comprises: m element data, arranged in time order, from the time series data of the production data stream within a preset first time interval T_1 before the current timestamp t;
wherein L_i = [z_i1, z_i2, ..., z_ij, ..., z_im];
z_ij is the j-th element data of the time series L_i, arranged in time order;
wherein a further relation is given as a formula image in the original publication
[formula image not reproduced in this text];
F is a preset value;
each element data comprises: the timestamp corresponding to the element data, and the independent variables and the dependent variable y corresponding to the preset model to be evaluated.
Preferably, S2 specifically includes:
S21. Using the distance similarity screening strategy, based on the specified sequence L_0 and the time series list [L_1, L_2, …, L_i, …, L_N], obtain the distance similarity between L_0 and each time series in the list;
S22. Based on the distance similarity between L_0 and each time series in the list [L_1, L_2, …, L_i, …, L_N], obtain a similarity list;
the similarity list comprises: the time series in the list [L_1, L_2, …, L_i, …, L_N] that correspond to the K1 largest distance similarities;
wherein K1 is a preset value, and 0 ≤ K1 ≤ N/10;
S23. Obtain the first trend data set D1 from the historical production operation time series data based on the similarity list and a preset construction mode.
Preferably, S21 specifically includes:
S211. For the specified sequence L_0 and a time series L_i, obtain the corresponding distance matrix D(L_0, L_i);
wherein the distance matrix and its elements are defined by two formulas given as images in the original publication
[formula images not reproduced in this text];
the two symbols used in those formulas denote, respectively:
the dependent variable y in the w-th element data of the specified sequence L_0, arranged in time order; and
the dependent variable y in the j-th element data of the time series L_i, arranged in time order;
S212. Based on the distance matrix D(L_0, L_i), use recursion formula (1) to compute the minimum distance L_min(m, n) from element d_11 to element d_mn of the distance matrix, and take L_min(m, n) as the distance similarity between the specified sequence L_0 and the time series L_i;
Formula (1) is:
[formula (1) image not reproduced in this text]
wherein L_min(w, j) is the minimum distance from element d_11 of the distance matrix to any element d_wj of the distance matrix;
wherein three further conditions are given as formula images in the original publication
[formula images not reproduced in this text].
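The formula images for S211 and S212 do not survive in this text. One reading consistent with the surrounding definitions is the standard dynamic time warping (DTW) recurrence, sketched below. This is an assumption, not the published formula: the symbols y_{0w} and y_{ij} (the dependent variable in the w-th element of L_0 and in the j-th element of L_i), the absolute-difference point distance, and the boundary conditions are all hypothetical.

```latex
% Assumed point-wise distance between the two dependent-variable values
d_{wj} = \lvert y_{0w} - y_{ij} \rvert
% Assumed cumulative minimum-path recurrence (standard DTW form)
L_{\min}(w, j) = d_{wj} + \min\bigl\{ L_{\min}(w-1, j),\; L_{\min}(w, j-1),\; L_{\min}(w-1, j-1) \bigr\}
% Assumed boundary conditions
L_{\min}(1, 1) = d_{11}, \qquad L_{\min}(w, 0) = L_{\min}(0, j) = \infty
```

Under this reading, the value at the last element of the matrix is the distance similarity that S212 assigns to the pair (L_0, L_i).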
Preferably, S23 specifically includes:
S231. Obtain a first timestamp set based on the similarity list;
the first timestamp set comprises: the timestamp corresponding to the last element data of each time series in the similarity list;
S232. Taking each timestamp in the first timestamp set as a starting point, obtain backward the element data of the production operation time series data within the period T_0, and merge the element data to obtain the first trend data set D1.
Preferably, S3 specifically includes:
S31. Based on the sequence X of the specified length and p preset translation time segments, obtain the two sub data sets of the sequence X corresponding to each preset translation time segment;
the sequence X of the specified length comprises: h element data, arranged in time order, from the historical production operation time series data within a second time interval T_2;
wherein the second time interval T_2 covers a period of at least 15 days;
X = [z_1, z_2, ..., z_r, ..., z_h];
z_r is the r-th element data, arranged in time order, of the historical production operation time series data within the second time interval T_2;
the p preset translation time segments comprise, in order: t_1, t_2, ..., t_g, ..., t_p;
wherein 0 ≤ p ≤ 30;
wherein t_g is the g-th of the p preset translation time segments;
the two sub data sets of the sequence X corresponding to any preset translation time segment are, respectively, a first sub data set and a second sub data set of that translation time segment;
the first sub data set of any preset translation time segment comprises: the element data of the sequence X that fall within that translation time segment, the start time of the second time interval T_2 being taken as the start time of the translation time segment;
the second sub data set of any preset translation time segment comprises: the element data of the sequence X other than those that fall within that translation time segment, the start time of the second time interval T_2 likewise being taken as the start time of the translation time segment;
S32. Using formula (2), obtain the autocorrelation coefficient of the two sub data sets of the sequence X corresponding to each preset translation time segment;
Formula (2) is:
[formula (2) image not reproduced in this text]
wherein the symbols of formula (2), most of which are given as images in the original publication, denote respectively:
the autocorrelation coefficient of the two sub data sets of the sequence X corresponding to the preset translation time segment t_g;
the mean of the dependent variable y over the element data of the sequence X;
X_r, the dependent variable y in the r-th element data of the sequence X;
the dependent variable y in the r-th element data of the first sub data set corresponding to the preset translation time segment t_g; and
the dependent variable y in the r-th element data of the second sub data set corresponding to the preset translation time segment t_g;
S33. Based on the autocorrelation coefficients of the two sub data sets of the sequence X corresponding to each preset translation time segment, determine a first time period T_3 corresponding to the sequence X;
wherein the first time period T_3 is the time interval between two adjacent peaks of a first curve;
the first curve is formed by connecting the autocorrelation coefficients obtained for all preset translation time segments, in the order of the corresponding translation time segments;
S34. Starting from the current time t, take element data at intervals of the first time period T_3 and record the corresponding timestamps, to obtain a timestamp list A_2 containing K2 timestamps;
wherein the list A_2 is defined by a formula given as an image in the original publication
[formula image not reproduced in this text];
0 ≤ K2 ≤ h/50, where h is the number of element data of the sequence X of the specified length;
S35. Determine the second trend data set D2 based on the timestamp list A_2;
S35 specifically includes:
taking each timestamp in the timestamp list A_2 as a starting point, obtain backward the element data within the first time period T_3, and merge them to obtain the second trend data set D2.
Preferably, S4 specifically includes:
S41. Input each element data of each time series in the time series list [L_1, L_2, …, L_i, …, L_N] selected on the time axis into the trained model to be evaluated for prediction, and obtain a prediction result for each element data of each time series;
wherein the trained model to be evaluated has been trained in advance on the element data of the specified sequence L_0;
S42. Obtain the total error of each time series based on the prediction result of each element data of the time series and the actual value of the dependent variable y;
S43. Based on the total error of each time series, determine the K3 time series with the smallest total errors;
wherein K3 is a preset value, and 0 ≤ K3 ≤ N/10;
S44. Obtain a third trend data set D3 based on the K3 time series with the smallest total errors;
S44 specifically includes:
taking the timestamp of the last element data of each of the K3 time series with the smallest total errors as a starting point, obtain backward the element data of the production operation time series data within the period T_0, and merge them to obtain the third trend data set D3.
Preferably, S42 specifically includes:
using formula (3), obtain the total error of each time series based on the prediction result of each element data of the time series and the actual value of the dependent variable y;
Formula (3) is:
[formula (3) image not reproduced in this text]
wherein the symbol given as an image in the original publication
[symbol image not reproduced in this text]
is the predicted value of the dependent variable y of the element data;
y is the actual value of the dependent variable in the element data;
m is the number of element data of the time series L_i;
e_i is the total error of the time series L_i.
Preferably, S5 specifically includes:
S51. Sample the first trend data set D1, the second trend data set D2 and the third trend data set D3 at a first proportion w_1, a second proportion w_2 and a third proportion w_3, respectively, to obtain a first trend sample set, a second trend sample set and a third trend sample set;
S52. Merge the first, second and third trend sample sets and de-duplicate the result to obtain the final evaluation data set D.
In another aspect, this embodiment further provides an evaluation data set construction system for a flow-type industrial production data stream, comprising:
at least one processor; and
at least one memory communicatively coupled to the processor, wherein the memory stores program instructions executable by the processor, and the processor invokes the program instructions to perform any of the above evaluation data set construction methods for a flow-type industrial production data stream.
(III) Advantageous effects
The beneficial effects of the invention are as follows: in the method and system for constructing an evaluation data set for a flow-type industrial production data stream, a first trend data set D1 is constructed; the period T_3 of the sequence X is obtained by autocorrelation coefficient processing, and a second trend data set D2 is constructed from the period T_3; a third trend data set D3 is constructed by means of the trained model to be evaluated; finally, data sets with different characteristics are included in the evaluation data set of the model to be evaluated, so that the constructed evaluation data set better reflects the prediction accuracy and generalization capability of the model at a specific time of use.
Drawings
FIG. 1 is a flow chart of the evaluation data set construction method for a flow-type industrial production data stream according to the present invention;
FIG. 2 is a schematic diagram of the error distribution of a model to be evaluated on the time series data of an actual production data stream;
FIG. 3 is a schematic diagram of the error distribution of a model to be evaluated on an evaluation data set constructed by the evaluation data set construction method of this embodiment;
FIG. 4 is a schematic diagram of the error distribution of a model to be evaluated on an evaluation data set constructed by the conventional hold-out method.
Detailed Description
For the purpose of better explaining the present invention and to facilitate understanding, the present invention will be described in detail by way of specific embodiments with reference to the accompanying drawings.
In order to better understand the above technical solutions, exemplary embodiments of the present invention will be described in more detail below with reference to the accompanying drawings. While exemplary embodiments of the invention are shown in the drawings, it should be understood that the invention can be embodied in various forms and should not be limited to the embodiments set forth herein. Rather, these embodiments are provided so that this disclosure will be thorough and complete, and will fully convey the scope of the invention to those skilled in the art.
Example one
Referring to FIG. 1, this embodiment provides a method for constructing an evaluation data set for a flow-type industrial production data stream, including:
S1. According to the time series data of the production data stream, select on the time axis a time series list [L_1, L_2, …, L_i, …, L_N] and a specified sequence L_0, wherein the time length corresponding to L_0 is the same as the update period T_0 of the model to be evaluated, and the specified sequence L_0 comprises the element data of the production data stream within the time period [t - T_0, t], where t is the current timestamp.
In practical application of this embodiment, the specified sequence L_0 comprises: z_01, z_02, ..., z_0w, ..., z_0n.
z_0w is the w-th element data of the specified sequence L_0, arranged in time order.
The time series L_i comprises: m element data, arranged in time order, from the time series data of the production data stream within a preset first time interval T_1 before the current timestamp t.
Wherein L_i = [z_i1, z_i2, ..., z_ij, ..., z_im].
z_ij is the j-th element data of the time series L_i, arranged in time order.
Wherein a further relation is given as a formula image in the original publication
[formula image not reproduced in this text]
and F is a preset value.
Each element data comprises: the timestamp corresponding to the element data, and the independent variables and the dependent variable y corresponding to the preset model to be evaluated.
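The selection in S1 can be sketched as follows. This is a minimal illustration under stated assumptions, not the published procedure: records are `(timestamp, y)` tuples sorted by time, the candidate windows are cut from the history that precedes L_0, and the window spacing `stride` is a hypothetical parameter (the relation involving F is not reproduced in this text, so it is not modelled). The function name `select_sequences` is illustrative only.

```python
def select_sequences(stream, t_now, t0, m, stride):
    """Split a stream into L_0 and a list of candidate windows L_1..L_N.

    stream: list of (timestamp, y) records sorted by timestamp.
    t_now:  current timestamp t.
    t0:     model update period T_0 (same unit as the timestamps).
    m:      number of records per candidate window.
    stride: hypothetical spacing between window starts, in records.
    """
    # L_0 holds the records inside [t - T_0, t].
    l0 = [rec for rec in stream if t_now - t0 <= rec[0] <= t_now]
    # Assumption: candidate windows come only from data older than L_0.
    history = [rec for rec in stream if rec[0] < t_now - t0]
    windows = [history[s:s + m]
               for s in range(0, len(history) - m + 1, stride)]
    return l0, windows
```

Each returned window keeps its timestamps, which the later steps need in order to locate the last element of a selected window.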
S2. Using a distance similarity screening strategy, obtain the distance similarity between the dependent variable y in L_0 and the dependent variable y in each L_i, and obtain a similarity list; construct a first trend data set D1 from the time series data of the production data stream based on the similarity list and a preset construction mode.
S2 specifically includes:
S21. Using the distance similarity screening strategy, based on the specified sequence L_0 and the time series list [L_1, L_2, …, L_i, …, L_N], obtain the distance similarity between L_0 and each time series in the list.
Specifically, S21 includes:
S211. For the specified sequence L_0 and a time series L_i, obtain the corresponding distance matrix D(L_0, L_i).
The distance matrix and its elements are defined by two formulas given as images in the original publication
[formula images not reproduced in this text].
The two symbols used in those formulas denote, respectively:
the dependent variable y in the w-th element data of the specified sequence L_0, arranged in time order; and
the dependent variable y in the j-th element data of the time series L_i, arranged in time order.
S212. Based on the distance matrix D(L_0, L_i), use recursion formula (1) to compute the minimum distance L_min(m, n) from element d_11 to element d_mn of the distance matrix, and take L_min(m, n) as the distance similarity between the specified sequence L_0 and the time series L_i.
Formula (1) is:
[formula (1) image not reproduced in this text]
wherein L_min(w, j) is the minimum distance from element d_11 of the distance matrix to any element d_wj of the distance matrix.
Three further conditions are given as formula images in the original publication
[formula images not reproduced in this text].
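As an illustration of S211 and S212, the following Python sketch computes a cumulative minimum-path distance over two sequences of dependent-variable values. It assumes the step corresponds to ordinary dynamic time warping with an absolute-difference point distance; the published formula images are not available here, so both the point distance and the boundary handling are assumptions, and the name `dtw_distance` is hypothetical.

```python
import math


def dtw_distance(y0, yi):
    """Cumulative minimum-path distance between two sequences of y values.

    y0: dependent-variable values of the specified sequence L_0.
    yi: dependent-variable values of one candidate time series L_i.
    The sequences may have different lengths.
    """
    n, m = len(y0), len(yi)
    # acc[w][j] plays the role of L_min(w, j); row 0 and column 0 are
    # sentinels set to infinity so the recurrence needs no special cases.
    acc = [[math.inf] * (m + 1) for _ in range(n + 1)]
    acc[0][0] = 0.0
    for w in range(1, n + 1):
        for j in range(1, m + 1):
            d_wj = abs(y0[w - 1] - yi[j - 1])  # assumed point-wise distance
            acc[w][j] = d_wj + min(acc[w - 1][j],      # step along L_0
                                   acc[w][j - 1],      # step along L_i
                                   acc[w - 1][j - 1])  # step along both
    return acc[n][m]
```

Because the path may repeat elements of either sequence, two series that differ only by a locally stretched segment receive a distance of zero, which is why this kind of measure suits sequences of unequal length.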
S22. Based on the distance similarity between L_0 and each time series in the list [L_1, L_2, …, L_i, …, L_N], obtain a similarity list.
The similarity list comprises: the time series in the list [L_1, L_2, …, L_i, …, L_N] that correspond to the K1 largest distance similarities.
Wherein K1 is a preset value, and 0 ≤ K1 ≤ N/10.
S23. Obtain the first trend data set D1 from the historical production operation time series data based on the similarity list and a preset construction mode.
Specifically, S23 includes:
S231. Obtain a first timestamp set based on the similarity list.
The first timestamp set comprises: the timestamp corresponding to the last element data of each time series in the similarity list.
S232. Taking each timestamp in the first timestamp set as a starting point, obtain backward the element data of the production operation time series data within the period T_0, and merge them to obtain the first trend data set D1.
In this embodiment, the similarity between two time series of different lengths is quantified, and combining it with the periodic data set better matches the characteristics of time series data.
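Steps S22 to S232 can be sketched as below. Two points in the text are open to interpretation, and the sketch makes its choices explicit rather than settling them: it follows the wording literally by keeping the K1 largest similarity values (pass `largest=False` to keep the smallest instead), and it reads "backward" as the span of length T_0 that follows each timestamp in time. Records are assumed to be `(timestamp, y)` tuples, and `build_first_trend_set` is a hypothetical name.

```python
def build_first_trend_set(stream, windows, scores, k1, t0, largest=True):
    """Select K1 windows by score and collect the T_0 span after each.

    stream:  list of (timestamp, y) records sorted by timestamp.
    windows: candidate time series, each a list of (timestamp, y).
    scores:  one distance-similarity value per window.
    """
    # Rank window indices by score; the direction is a stated assumption.
    order = sorted(range(len(windows)), key=lambda i: scores[i],
                   reverse=largest)
    # First timestamp set: timestamp of the last element of each pick.
    starts = [windows[i][-1][0] for i in order[:k1]]
    selected = {}
    for start in starts:
        for ts, y in stream:
            # Assumption: take the records in (start, start + T_0].
            if start < ts <= start + t0:
                selected[ts] = y  # dict keys merge overlapping spans
    return sorted(selected.items())
```

Keying the merged records by timestamp means that overlapping spans contribute each record only once.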
S3. Select a sequence X of a specified length from the historical data of the production data stream, obtain the period T_3 of the sequence X by autocorrelation coefficient processing, generate a timestamp list of preset length K2 based on T_3, and construct a second trend data set D2 from the time series data of the production data stream according to the elements of the timestamp list and a preset construction mode.
In practical application of this embodiment, S3 specifically includes:
S31. Based on the sequence X of the specified length and p preset translation time segments, obtain the two sub data sets of the sequence X corresponding to each preset translation time segment.
The sequence X of the specified length comprises: h element data, arranged in time order, from the historical production operation time series data within a second time interval T_2.
Wherein the second time interval T_2 covers a period of at least 15 days.
X = [z_1, z_2, ..., z_r, ..., z_h].
z_r is the r-th element data, arranged in time order, of the historical production operation time series data within the second time interval T_2.
The p preset translation time segments comprise, in order: t_1, t_2, ..., t_g, ..., t_p.
Wherein 0 ≤ p ≤ 30.
Wherein t_g is the g-th of the p preset translation time segments.
The two sub data sets of the sequence X corresponding to any preset translation time segment are, respectively, a first sub data set and a second sub data set of that translation time segment.
The first sub data set of any preset translation time segment comprises: the element data of the sequence X that fall within that translation time segment, the start time of the second time interval T_2 being taken as the start time of the translation time segment.
The second sub data set of any preset translation time segment comprises: the element data of the sequence X other than those that fall within that translation time segment, the start time of the second time interval T_2 likewise being taken as the start time of the translation time segment.
S32. Using formula (2), obtain the autocorrelation coefficient of the two sub data sets of the sequence X corresponding to each preset translation time segment.
Formula (2) is:
[formula (2) image not reproduced in this text]
The symbols of formula (2), most of which are given as images in the original publication, denote respectively:
the autocorrelation coefficient of the two sub data sets of the sequence X corresponding to the preset translation time segment t_g;
the mean of the dependent variable y over the element data of the sequence X;
X_r, the dependent variable y in the r-th element data of the sequence X;
the dependent variable y in the r-th element data of the first sub data set corresponding to the preset translation time segment t_g; and
the dependent variable y in the r-th element data of the second sub data set corresponding to the preset translation time segment t_g.
S33. Based on the autocorrelation coefficients of the two sub data sets of the sequence X corresponding to each preset translation time segment, determine a first time period T_3 corresponding to the sequence X.
Wherein the first time period T_3 is the time interval between two adjacent peaks of a first curve.
The first curve is formed by connecting the autocorrelation coefficients obtained for all preset translation time segments, in the order of the corresponding translation time segments.
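The period estimate of S32 and S33 can be illustrated with the sketch below. Since the image of formula (2) is not reproduced here, the sketch substitutes the standard sample autocorrelation at integer lags; that substitution is an assumption, as is treating each translation segment as a lag measured in samples. The names `autocorrelation` and `estimate_period` are hypothetical, and the input is assumed to be non-constant.

```python
import math


def autocorrelation(y, lag):
    """Standard sample autocorrelation of y at a positive integer lag."""
    n = len(y)
    mean = sum(y) / n
    centered = [v - mean for v in y]
    denom = sum(c * c for c in centered)  # zero only for constant input
    num = sum(centered[r] * centered[r + lag] for r in range(n - lag))
    return num / denom


def estimate_period(y, max_lag):
    """Spacing, in samples, between the first two peaks of the curve."""
    # curve[k] is the autocorrelation at lag k + 1.
    curve = [autocorrelation(y, lag) for lag in range(1, max_lag + 1)]
    peaks = [k + 1 for k in range(1, len(curve) - 1)
             if curve[k - 1] < curve[k] > curve[k + 1]]
    if len(peaks) < 2:
        return None  # no two adjacent peaks inside the scanned lags
    return peaks[1] - peaks[0]


# Usage: a sine wave sampled 12 points per cycle.
signal = [math.sin(2 * math.pi * t / 12) for t in range(120)]
period = estimate_period(signal, max_lag=30)
```

Multiplying the returned lag by the sampling interval converts it into a time period. Real production data would usually need smoothing before peak detection, which this sketch omits.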
S34. Starting from the current time t, take element data at intervals of the first time period T_3 and record the corresponding timestamps, to obtain a timestamp list A_2 containing K2 timestamps.
Wherein the list A_2 is defined by a formula given as an image in the original publication
[formula image not reproduced in this text];
0 ≤ K2 ≤ h/50, where h is the number of element data of the sequence X of the specified length.
S35. Determine the second trend data set D2 based on the timestamp list A_2.
S35 specifically includes:
taking each timestamp in the timestamp list A_2 as a starting point, obtain backward the element data within the first time period T_3, and merge them to obtain the second trend data set D2.
S4, adopting a time sequence list L 1 , L 2 ,L i , …,L N ]And training the model to be evaluated, obtaining an error sequence based on the error sequenceAnd a preset construction mode, constructing a third trend data set D3 in the time series data of the production data stream.
The S4 specifically includes:
s41, list of time series [ L ] to be selected on time axis 1 , L 2 ,L i , …,L N ]Inputting each element data in each time sequence into a trained model to be evaluated for prediction, and obtaining a prediction result of each element data in each time sequence.
Wherein the trained model to be evaluated is previously determined by the specified sequence L 0 Is trained.
And S42, acquiring the total error of each time series based on the prediction result of each element data in each time series and the actual dependent variable y.
The S42 specifically includes:
the total error of each time series is obtained by using formula (3) based on the prediction result of each element data in each time series and the value of the actual dependent variable y.
The formula (3) is:
Figure 257534DEST_PATH_IMAGE016
wherein,
Figure 974954DEST_PATH_IMAGE018
is the predicted value of the dependent variable y of the element data.
y is the dependent variable y in the element data.
m is the number of element data in the time series L i.
e i is the total error of the time series L i.
S43, determining the K3 time series with the smallest total errors based on the total error of each time series.
Wherein K3 is a preset value, and 0 ≤ K3 ≤ N/10.
S44, acquiring a third trend data set D3 based on the K3 time series with the smallest total errors.
The S44 specifically includes:
taking the timestamp of the last element data in each of the K3 time series with the smallest total errors as a starting point, respectively obtaining backwards the element data of the production operation time series data within a period T 0, and merging the obtained element data to obtain the third trend data set D3.
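Steps S41 to S43 can be sketched as below. Formula (3) is not legible in this text, so the sketch substitutes the mean absolute error as the total error; this is an assumption, and the patent's actual formula may differ. The names `total_error` and `k_smallest_error`, and the representation of a time series as `(independent variables, actual y)` pairs, are illustrative.

```python
from typing import Callable, List, Sequence, Tuple

# ASSUMPTION: one element is (independent variables, actual dependent variable y).
Series = Sequence[Tuple[Sequence[float], float]]
Model = Callable[[Sequence[float]], float]


def total_error(model: Model, series: Series) -> float:
    """Total error e_i of one time series L_i.

    ASSUMPTION: mean absolute error stands in for formula (3).
    """
    return sum(abs(model(x) - y) for x, y in series) / len(series)


def k_smallest_error(model: Model, series_list: List[Series], k3: int) -> List[int]:
    """Indices of the k3 time series with the smallest total error."""
    errors = [total_error(model, s) for s in series_list]
    return sorted(range(len(errors)), key=errors.__getitem__)[:k3]
```

The timestamps of the last element in each selected series would then serve as the starting points described in S44.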
In this embodiment, the data set is selected using the errors obtained by the model to be evaluated, that is, from historical periods in which the operating state of the model approximates the current one, so that the prediction capability of the model to be evaluated in a real use scenario is better reflected.
S5, sampling D1, D2 and D3 to obtain an evaluation data set for evaluating the model to be evaluated.
Specifically, the S5 includes:
S51, sampling the first trend data set D1, the second trend data set D2 and the third trend data set D3 at a first proportion w 1, a second proportion w 2 and a third proportion w 3, respectively, to obtain a first trend sample set D1, a second trend sample set D2 and a third trend sample set D3.
S52, merging the first trend sample set D1, the second trend sample set D2 and the third trend sample set D3 and removing duplicates to obtain a final evaluation data set D.
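S51 and S52 can be sketched as follows, assuming uniform random sampling without replacement (the text does not name a sampling scheme) and hashable elements such as timestamps. The function name `build_evaluation_set` and the `seed` parameter are illustrative.

```python
import random
from typing import Hashable, List, Sequence


def build_evaluation_set(
    d1: Sequence[Hashable],
    d2: Sequence[Hashable],
    d3: Sequence[Hashable],
    w1: float,
    w2: float,
    w3: float,
    seed: int = 0,
) -> List[Hashable]:
    """Sample each trend data set at its proportion, then merge and de-duplicate."""
    rng = random.Random(seed)  # fixed seed only to make the sketch reproducible
    merged: List[Hashable] = []
    seen = set()
    for data, w in ((d1, w1), (d2, w2), (d3, w3)):
        k = int(len(data) * w)  # ASSUMPTION: proportion rounded down to a count
        for item in rng.sample(list(data), k):
            if item not in seen:  # de-duplication across the three sample sets
                seen.add(item)
                merged.append(item)
    return merged
```

Because the three trend data sets may overlap in time, the final set can be smaller than the sum of the three sample sizes.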
In this embodiment, data sets with different characteristics (the first trend sample set D1, the second trend sample set D2, and the third trend sample set D3) are all included in the evaluation data set, so that both the prediction capability and the generalization capability of the model are taken into account, and the evaluation of the model to be evaluated is more objective and comprehensive.
According to the evaluation data set construction method for the flow-type industrial production data stream, starting from the change pattern of the time series, a first trend data set D1 is constructed using a distance similarity screening strategy; the period T 3 of the sequence X is obtained by an autocorrelation coefficient processing mode, and a second trend data set D2 is then constructed from the period T 3; a third trend data set D3 is constructed by means of the trained model to be evaluated; finally, the data sets with different characteristics are included in the evaluation data set of the model to be evaluated, so that the constructed evaluation data set better reflects the prediction accuracy and generalization capability of the model at a specific use moment.
Finally, in the practical application of the embodiment, the final evaluation data set D is input into the model to be evaluated, so as to obtain a more accurate evaluation result.
Example two
In order to better understand the scheme of the embodiment of the present invention, the steps of the embodiment of the present invention are described in detail below.
The embodiment provides a method for constructing an evaluation data set for a flow-type industrial production data stream, which comprises the following steps:
101. For the time series data of the production data stream, selecting a time sequence list [L 1 , L 2 , L i , …, L N ] and a specified sequence L 0 on the time axis, wherein the time length corresponding to L 0 is identical to the update period T 0 of the model to be evaluated, and the specified sequence L 0 comprises the element data of the production data stream within the time period [t-T 0 , t], t being the current timestamp; that is, the specified sequence L 0 is the real operating data closest to the model to be evaluated on the time axis.
In practical application of this embodiment, the specified sequence L 0 comprises: z 01 , z 02 , ... z 0w ... z 0n.
z 0w is the w-th element data of the specified sequence L 0 arranged in chronological order.
The time series L i comprises: m element data, arranged in chronological order, of the time series data of the production data stream within a preset first time interval T 1 before the current timestamp t.
Wherein L i = [z i1 , z i2 , ... z ij ... z im ].
z ij is the j-th element data of the time series L i arranged in chronological order.
Wherein,
Figure 297394DEST_PATH_IMAGE001
and F is a preset value.
Each element data includes: the time stamp corresponding to the element data, and the independent variable and the dependent variable y corresponding to the preset model to be evaluated.
For example, in a specific application of this embodiment, Table 1 shows part of the data of the specified sequence L 0 corresponding to a model to be evaluated that takes main steam production as the dependent variable (the independent variables are the amount of coal fed to the furnace, the total air volume fed to the furnace, and the oxygen content of the exhaust gas):
TABLE 1
Time stamp | Amount of coal fed to the furnace | Total air volume fed to the furnace | Oxygen content of exhaust gas | Main steam production
1647050100000 | 15.721 | 76.5558 | 2.1969 | 171.1258
1647050106000 | 15.5656 | 75.834 | 2.4192 | 171.2354
1647050112000 | 15.4694 | 76.1088 | 2.4407 | 171.2512
1647050118000 | 16.0389 | 76.8488 | 2.362 | 172.0235
1647050124000 | 16.027 | 76.0773 | 2.4075 | 172.1235
1647050130000 | 16.7334 | 76.1407 | 2.1873 | 172.3258
1647050136000 | 16.993 | 76.2791 | 2.1918 | 173.3984
1647050142000 | 17.3402 | 76.0581 | 2.2747 | 173.9423
1647050148000 | 17.3856 | 77.4178 | 2.5426 | 173.4521
1647050154000 | 17.6544 | 77.3188 | 2.3211 | 174.2144
1647050160000 | 17.0855 | 77.5035 | 2.2665 | 174.6845
1647050166000 | 17.6559 | 77.9596 | 2.3259 | 174.9545
1647050172000 | 17.7959 | 79.6381 | 2.2277 | 175.3632
1647050178000 | 18.0086 | 79.8114 | 2.2929 | 175.8852
1647050184000 | 18.1417 | 79.6916 | 2.3011 | 175.8567
102. For the specified sequence L 0 and the time series L i, obtaining the distance matrix D (L0,Li) corresponding to the specified sequence L 0 and the time series L i.
Wherein,
Figure 315029DEST_PATH_IMAGE002
wherein,
Figure 58863DEST_PATH_IMAGE003
Figure 263579DEST_PATH_IMAGE004
is the dependent variable y in the w-th element data of the specified sequence L 0 arranged in chronological order.
Figure 852692DEST_PATH_IMAGE005
is the dependent variable y in the j-th element data of the time series L i arranged in chronological order.
103. Based on the distance matrix D (L0,Li), using the recursion formula (1) to recursively obtain the minimum distance L min (m, n) from element d 11 to element d mn in the distance matrix D (L0,Li), and taking the minimum distance L min (m, n) as the distance similarity between the specified sequence L 0 and the time series L i.
The formula (1) is:
Figure 442942DEST_PATH_IMAGE006
wherein L min (w, j) is the minimum distance from element d 11 in the distance matrix to any element d wj in the distance matrix.
Wherein,
Figure 92098DEST_PATH_IMAGE007
Figure 49690DEST_PATH_IMAGE008
Figure 708073DEST_PATH_IMAGE009
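Steps 102 and 103 describe a minimum accumulated path distance through a distance matrix. The formula images are not legible in this text, so the following sketch rests on two assumptions: the matrix element is the absolute difference of the two dependent variables, and the recursion is the standard dynamic time warping form, in which each cell adds its own distance to the smallest of its three predecessor cells. The function name `dtw_distance` is illustrative.

```python
from typing import List, Sequence


def dtw_distance(y0: Sequence[float], yi: Sequence[float]) -> float:
    """Minimum accumulated distance from the first cell to the last cell.

    y0 -- dependent variable y of the specified sequence L_0
    yi -- dependent variable y of one time series L_i
    """
    rows, cols = len(y0), len(yi)
    inf = float("inf")
    # cost[w][j] holds the minimum distance from d_11 to d_wj (1-based),
    # with a padded row 0 and column 0 acting as the boundary.
    cost: List[List[float]] = [[inf] * (cols + 1) for _ in range(rows + 1)]
    cost[0][0] = 0.0
    for w in range(1, rows + 1):
        for j in range(1, cols + 1):
            # ASSUMPTION: element distance is the absolute difference.
            d = abs(y0[w - 1] - yi[j - 1])
            # ASSUMPTION: standard three-predecessor DTW recursion.
            cost[w][j] = d + min(cost[w - 1][j], cost[w][j - 1], cost[w - 1][j - 1])
    return cost[rows][cols]
```

Under these assumptions the two sequences need not have the same length, which matches the text's use of separate lengths n and m for L 0 and L i.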
104. Obtaining a similarity list based on the distance similarity between the specified sequence L 0 and each time series in the time sequence list [L 1 , L 2 , L i , …, L N ].
The similarity list includes: the time series in the time sequence list [L 1 , L 2 , L i , …, L N ] that respectively correspond to the K1 largest distance similarities.
Wherein K1 is a preset value, and 0 ≤ K1 ≤ N/10.
105. Acquiring a first timestamp set based on the similarity list.
The first set of timestamps includes: the timestamp corresponding to the last element data of each time series in the similarity list.
106. Taking each timestamp in the first timestamp set as a starting point, respectively obtaining backwards the element data of the production operation time series data within a period T 0, and taking the union of the obtained element data to obtain a first trend data set D1.
107. Based on the sequence X with the specified length and p preset translation time segments, two subdata sets corresponding to the sequence X with the specified length in any preset translation time segment are obtained.
The sequence X of the specified length comprises: h element data, arranged in chronological order, of the historical production operation time series data within a second time interval T 2.
Wherein the second time interval T 2 is greater than or equal to 15 days.
X=[z 1 ,z 2 ,...z r ...,z h ].
z r is the r-th element data, in chronological order, of the historical production operation time series data within the second time interval T 2.
The p preset translation time segments sequentially include: t 1 , t 2 , ... t g ... t p.
Wherein 0 ≤ p ≤ 30.
Wherein t g is the g-th preset translation time segment of the p preset translation time segments.
Two sub data sets corresponding to any preset translation time segment of the sequence X with the specified length are respectively a first sub data set of the any preset translation time segment and a second sub data set of the any preset translation time segment.
The first sub data set of any preset translation time segment comprises: the element data of the sequence X of the specified length that lie within that preset translation time segment, the start time of the second time interval T 2 being taken as the start time of that preset translation time segment.
The second sub data set of any preset translation time segment comprises: the element data of the sequence X of the specified length other than those within that preset translation time segment, the start time of the second time interval T 2 being taken as the start time of that preset translation time segment.
108. Using formula (2), respectively acquiring the autocorrelation coefficient of the two sub data sets corresponding to the sequence X with the specified length in any preset translation time segment.
The formula (2) is:
Figure 434721DEST_PATH_IMAGE010
Figure 67828DEST_PATH_IMAGE011
is the autocorrelation coefficient of the two sub data sets corresponding to the sequence X of the specified length in the preset translation time segment t g.
Figure 761983DEST_PATH_IMAGE012
is the mean of the dependent variable y in the element data of the sequence X of the specified length.
X r is the dependent variable y in the r-th element data in the sequence X with the specified length.
Figure 505948DEST_PATH_IMAGE020
is the dependent variable y in the r-th element data in the first sub data set corresponding to the sequence X of the specified length in the preset translation time segment t g.
Figure 618261DEST_PATH_IMAGE022
is the dependent variable y in the r-th element data in the second sub data set corresponding to the sequence X of the specified length in the preset translation time segment t g.
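The computation in steps 107 and 108 can be illustrated with the standard sample autocorrelation at a given lag. Formula (2) is not legible in this text, so using this standard form is an assumption, as is treating each translation time segment as an integer lag measured in element counts. The function names `autocorrelation` and `autocorrelation_curve` are illustrative.

```python
from typing import List, Sequence


def autocorrelation(x: Sequence[float], lag: int) -> float:
    """Sample autocorrelation of x at the given lag.

    ASSUMPTION: the standard estimator stands in for formula (2);
    x is the dependent variable y of sequence X and must not be constant.
    """
    h = len(x)
    mean = sum(x) / h
    denom = sum((v - mean) ** 2 for v in x)
    # Pair each value with the value `lag` positions later.
    num = sum((x[r] - mean) * (x[r + lag] - mean) for r in range(h - lag))
    return num / denom


def autocorrelation_curve(x: Sequence[float], max_lag: int) -> List[float]:
    """Coefficients for lags 1..max_lag, in order of increasing lag."""
    return [autocorrelation(x, g) for g in range(1, max_lag + 1)]
```

The list returned by `autocorrelation_curve` corresponds to the first curve from which the first time period T 3 is read.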
S33, determining a first time period T 3 corresponding to the sequence X with the specified length, based on the autocorrelation coefficients of the two sub data sets corresponding to the sequence X with the specified length in any preset translation time segment.
Wherein the first time period T 3 is the time interval between two adjacent peaks in the first curve.
The first curve is formed by connecting, in the order of the corresponding translation time segments, the autocorrelation coefficients of the two sub data sets obtained for the sequence X with the specified length in each preset translation time segment.
109. Starting from the current time t, acquiring element data at intervals of the first time period T 3 and recording the timestamp corresponding to each acquired element data, to obtain a timestamp list A 2 with K2 timestamps.
Wherein,
Figure 422269DEST_PATH_IMAGE015
(ii) a K2 is more than or equal to 0 and less than or equal to h/50, wherein h is the number of element data of the sequence X with the specified length.
110. Based on the time stamp list A 2 A second trend data set D2 is determined.
The 110 specifically includes:
by time stamp list A 2 Taking each timestamp as a starting point, and respectively obtaining a first time period T backwards 3 And merging the internal element data to obtain a second trend data set D2.
111. Inputting each element data in each time series of the time sequence list [L 1 , L 2 , L i , …, L N ] selected on the time axis into the trained model to be evaluated for prediction, to obtain a prediction result of each element data in each time series.
Wherein the trained model to be evaluated has been trained in advance on the element data in the specified sequence L 0.
112. Acquiring the total error of each time series based on the prediction result of each element data in each time series and the actual dependent variable y.
The 112 specifically includes:
the total error of each time series is obtained by using formula (3) based on the prediction result of each element data in each time series and the value of the actual dependent variable y.
The formula (3) is:
Figure 869299DEST_PATH_IMAGE023
wherein,
Figure 151376DEST_PATH_IMAGE018
is the predicted value of the dependent variable y of the element data.
y is the dependent variable y in the element data.
m is the number of element data in the time series L i.
e i is the total error of the time series L i.
113. Determining the K3 time series with the smallest total errors based on the total error of each time series.
Wherein K3 is a preset value, and 0 ≤ K3 ≤ N/10.
114. Acquiring a third trend data set D3 based on the K3 time series with the smallest total errors.
The 114 specifically includes:
taking the timestamp of the last element data in each of the K3 time series with the smallest total errors as a starting point, respectively obtaining backwards the element data of the production operation time series data within a period T 0, and merging the obtained element data to obtain the third trend data set D3.
115. Sampling D1, D2 and D3 to obtain an evaluation data set for evaluating the model to be evaluated.
Specifically, the 115 includes:
sampling the first trend data set D1, the second trend data set D2 and the third trend data set D3 at a first proportion w 1, a second proportion w 2 and a third proportion w 3, respectively, to obtain the corresponding first trend sample set D1, second trend sample set D2 and third trend sample set D3.
Merging the first trend sample set D1, the second trend sample set D2 and the third trend sample set D3 and removing duplicates to obtain a final evaluation data set D.
According to the method for constructing the evaluation data set for the flow-type industrial production data stream, starting from the change pattern of the time series, a first trend data set D1 is constructed using a distance similarity screening strategy; the period T 3 of the sequence X is obtained by an autocorrelation coefficient processing mode, and a second trend data set D2 is then constructed from the period T 3; a third trend data set D3 is constructed by means of the trained model to be evaluated; finally, the data sets with different characteristics are included in the evaluation data set of the model to be evaluated, so that the constructed evaluation data set better reflects the prediction accuracy and generalization capability of the model at a specific use moment.
Finally, in the practical application of the embodiment, the final evaluation data set D is input into the model to be evaluated, so as to obtain a corresponding more accurate evaluation result.
As can be seen by comparing fig. 2, fig. 3, and fig. 4, the error distribution of the model to be evaluated on the evaluation data set constructed by the existing hold-out method is significantly larger than both its error distribution on the time series data of the actual production data stream and its error distribution on the evaluation data set constructed by the method of this embodiment. On the evaluation data set constructed by the method of this embodiment, the error distribution of the model to be evaluated is within ± 3 with a mean value of 0, which indicates that this evaluation data set is closer to the real operating data, that is, it can reflect the prediction performance of the model to be evaluated in actual use.
As will be appreciated by one skilled in the art, embodiments of the present invention may be provided as a method, system, or computer program product. Accordingly, the present invention may take the form of an entirely hardware embodiment, an entirely software embodiment or an embodiment combining software and hardware aspects. Furthermore, the present invention may take the form of a computer program product embodied on one or more computer-usable storage media (including, but not limited to, disk storage, CD-ROM, optical storage, and the like) having computer-usable program code embodied therein.
The present invention is described with reference to flowchart illustrations and/or block diagrams of methods, apparatus (systems) and computer program products according to embodiments of the invention. It will be understood that each flow and/or block of the flow diagrams and/or block diagrams, and combinations of flows and/or blocks in the flow diagrams and/or block diagrams, can be implemented by computer program instructions.
It should be noted that in the claims, any reference signs placed between parentheses shall not be construed as limiting the claim. The word "comprising" does not exclude the presence of elements or steps not listed in a claim. The word "a" or "an" preceding an element does not exclude the presence of a plurality of such elements. The invention may be implemented by means of hardware comprising several distinct elements, and by means of a suitably programmed computer. In the claims enumerating several means, several of these means may be embodied by one and the same item of hardware. The use of the terms first, second, third and the like are for convenience only and do not denote any order. These words are to be understood as part of the name of the component.
Furthermore, it should be noted that in the description of the present specification, the description of the term "one embodiment", "some embodiments", "examples", "specific examples" or "some examples", etc., means that a specific feature, structure, material or characteristic described in connection with the embodiment or example is included in at least one embodiment or example of the present invention. In this specification, the schematic representations of the terms used above are not necessarily intended to refer to the same embodiment or example. Furthermore, the particular features, structures, materials, or characteristics described may be combined in any suitable manner in any one or more embodiments or examples. Furthermore, various embodiments or examples and features of different embodiments or examples described in this specification can be combined and combined by one skilled in the art without contradiction.
While preferred embodiments of the present invention have been described, additional variations and modifications in those embodiments may occur to those skilled in the art once they learn of the basic inventive concepts. Therefore, the claims should be construed to include preferred embodiments and all changes and modifications that fall within the scope of the invention.
It will be apparent to those skilled in the art that various modifications and variations can be made in the present invention without departing from the spirit or scope of the invention. Thus, if such modifications and variations of the present invention fall within the scope of the claims of the present invention and their equivalents, the present invention should also include such modifications and variations.

Claims (10)

1. A method for constructing an evaluation data set oriented to a flow type industrial production data flow is characterized by comprising the following steps:
S1, for the time series data of the production data stream, selecting a time sequence list [L 1 , L 2 , L i , …, L N ] and a specified sequence L 0 on the time axis, wherein the time length corresponding to L 0 is identical to the update period T 0 of the model to be evaluated, and the specified sequence L 0 comprises the element data of the production data stream within the time period [t-T 0 , t], t being the current timestamp;
S2, adopting a distance similarity screening strategy to obtain the distance similarity between the dependent variable y in L 0 and the dependent variable y in L i, and obtaining a similarity list; constructing a first trend data set D1 in the time series data of the production data stream based on the similarity list and a preset construction mode;
S3, selecting a sequence X with a specified length from the historical data of the production data stream, acquiring the period T 3 of the sequence X by an autocorrelation coefficient processing mode, generating a timestamp list of a preset length K2 based on the T 3, and constructing a second trend data set D2 in the time series data of the production data stream according to the elements in the timestamp list and a preset construction mode;
S4, using the time sequence list [L 1 , L 2 , L i , …, L N ] and the trained model to be evaluated to acquire an error sequence, and constructing a third trend data set D3 in the time series data of the production data stream based on the error sequence and a preset construction mode;
S5, sampling D1, D2 and D3 to obtain an evaluation data set for evaluating the model to be evaluated.
2. The method of claim 1,
the specified sequence L 0 comprises: z 01 , z 02 , ... z 0w ... z 0n;
z 0w is the w-th element data of the specified sequence L 0 arranged in chronological order;
the time series L i comprises: m element data, arranged in chronological order, of the time series data of the production data stream within a preset first time interval T 1 before the current timestamp t;
wherein L i = [z i1 , z i2 , ... z ij ... z im ];
z ij is the j-th element data of the time series L i arranged in chronological order;
wherein,
Figure DEST_PATH_IMAGE001
F is a preset value;
each element data includes: the time stamp corresponding to the element data, and the independent variable and the dependent variable y corresponding to the preset model to be evaluated.
3. The method according to claim 2, wherein the S2 specifically includes:
S21, adopting a distance similarity screening strategy, based on the specified sequence L 0 and the time sequence list [L 1 , L 2 , L i , …, L N ], respectively obtaining the distance similarity between the specified sequence L 0 and each time series in the time sequence list [L 1 , L 2 , L i , …, L N ];
S22, obtaining a similarity list based on the distance similarity between the specified sequence L 0 and each time series in the time sequence list [L 1 , L 2 , L i , …, L N ];
the similarity list includes: the time series in the time sequence list [L 1 , L 2 , L i , …, L N ] that respectively correspond to the K1 largest distance similarities;
wherein K1 is a preset value, and 0 ≤ K1 ≤ N/10;
S23, acquiring a first trend data set D1 in the historical production operation time series data based on the similarity list and a preset construction mode.
4. The method according to claim 3, wherein the S21 specifically includes:
S211, for the specified sequence L 0 and the time series L i, obtaining the distance matrix D (L0,Li) corresponding to the specified sequence L 0 and the time series L i;
Wherein,
Figure 290280DEST_PATH_IMAGE002
wherein,
Figure DEST_PATH_IMAGE003
Figure 597633DEST_PATH_IMAGE004
is the dependent variable y in the w-th element data of the specified sequence L 0 arranged in chronological order;
Figure DEST_PATH_IMAGE005
is the dependent variable y in the j-th element data of the time series L i arranged in chronological order;
S212, based on the distance matrix D (L0,Li), using the recursion formula (1) to recursively obtain the minimum distance L min (m, n) from element d 11 to element d mn in the distance matrix D (L0,Li), and taking the minimum distance L min (m, n) as the distance similarity between the specified sequence L 0 and the time series L i;
the formula (1) is:
Figure 49474DEST_PATH_IMAGE006
wherein L min (w, j) is the minimum distance from element d 11 in the distance matrix to any element d wj in the distance matrix;
wherein,
Figure DEST_PATH_IMAGE007
Figure 569317DEST_PATH_IMAGE008
Figure DEST_PATH_IMAGE009
5. The method according to claim 4, wherein the S23 specifically includes:
S231, acquiring a first timestamp set based on the similarity list;
the first set of timestamps includes: the timestamp corresponding to the last element data of each time series in the similarity list;
S232, taking each timestamp in the first timestamp set as a starting point, respectively obtaining backwards the element data of the production operation time series data within a period T 0, and merging the obtained element data to obtain a first trend data set D1.
6. The method according to claim 5, wherein S3 specifically comprises:
S31, acquiring two sub data sets corresponding to the sequence X with the specified length in any preset translation time segment, based on the sequence X with the specified length and p preset translation time segments;
the sequence X of the specified length comprises: h element data, arranged in chronological order, of the historical production operation time series data within a second time interval T 2;
wherein the second time interval T 2 is greater than or equal to 15 days;
X=[z 1 ,z 2 ,...z r ...,z h ];
z r is the r-th element data, in chronological order, of the historical production operation time series data within the second time interval T 2;
the p preset translation time segments sequentially include: t 1 , t 2 , ... t g ... t p;
wherein 0 ≤ p ≤ 30;
wherein t g is the g-th preset translation time segment of the p preset translation time segments;
the two sub data sets corresponding to the sequence X with the specified length in any preset translation time segment are respectively a first sub data set and a second sub data set of the any preset translation time segment;
the first sub data set of any preset translation time segment comprises: the element data of the sequence X of the specified length that lie within that preset translation time segment, the start time of the second time interval T 2 being taken as the start time of that preset translation time segment;
the second sub data set of any preset translation time segment comprises: the element data of the sequence X of the specified length other than those within that preset translation time segment, the start time of the second time interval T 2 being taken as the start time of that preset translation time segment;
S32, using formula (2), respectively acquiring the autocorrelation coefficient of the two sub data sets corresponding to the sequence X with the specified length in any preset translation time segment;
the formula (2) is:
Figure 305061DEST_PATH_IMAGE010
Figure DEST_PATH_IMAGE011
is the autocorrelation coefficient of the two sub data sets corresponding to the sequence X of the specified length in the preset translation time segment t g;
Figure 783316DEST_PATH_IMAGE012
is the mean of the dependent variable y in the element data of the sequence X with the specified length;
X r is the dependent variable y in the r-th element data in the sequence X with the specified length;
Figure DEST_PATH_IMAGE013
is the dependent variable y in the r-th element data in the first sub data set corresponding to the sequence X of the specified length in the preset translation time segment t g;
Figure 253611DEST_PATH_IMAGE014
is the dependent variable y in the r-th element data in the second sub data set corresponding to the sequence X of the specified length in the preset translation time segment t g;
S33, determining a first time period T 3 corresponding to the sequence X with the specified length, based on the autocorrelation coefficients of the two sub data sets corresponding to the sequence X with the specified length in any preset translation time segment;
wherein the first time period T 3 is the time interval between two adjacent peaks in the first curve;
the first curve is formed by connecting, in the order of the corresponding translation time segments, the autocorrelation coefficients of the two sub data sets obtained for the sequence X with the specified length in each preset translation time segment;
S34, starting from the current time t, acquiring element data at intervals of the first time period T 3 and recording the timestamp corresponding to each acquired element data, to obtain a timestamp list A 2 with K2 timestamps;
Wherein,
Figure DEST_PATH_IMAGE015
(ii) a K2 is more than or equal to 0 and less than or equal to h/50, wherein h is the number of element data of the sequence X with the specified length;
s35, list A based on the time stamps 2 Determining a second trend data set D2;
the S35 specifically includes:
listing by time stamp A 2 Taking each timestamp as a starting point, and respectively obtaining a first time period T backwards 3 And merging the internal element data to obtain a second trend data set D2.
7. The method according to claim 6, wherein the S4 specifically includes:
S41, inputting each element data in each time series of the time sequence list [L 1 , L 2 , L i , …, L N ] selected on the time axis into the trained model to be evaluated for prediction, to obtain a prediction result of each element data in each time series;
wherein the trained model to be evaluated has been trained in advance on the element data in the specified sequence L 0;
S42, acquiring the total error of each time series based on the prediction result of each element data in each time series and the actual dependent variable y;
S43, determining the K3 time series with the smallest total errors based on the total error of each time series;
wherein K3 is a preset value, and 0 ≤ K3 ≤ N/10;
S44, acquiring a third trend data set D3 based on the K3 time series with the smallest total errors;
the S44 specifically includes:
taking the timestamp of the last element data in each of the K3 time series with the smallest total errors as a starting point, respectively obtaining backwards the element data of the production operation time series data within a period T 0, and merging the obtained element data to obtain the third trend data set D3.
8. The method according to claim 7, wherein the S42 specifically includes:
acquiring the total error of each time series using formula (3), based on the prediction result of each element data in each time series and the actual value of the dependent variable y;
the formula (3) is:
Figure 108304DEST_PATH_IMAGE016
wherein,
Figure DEST_PATH_IMAGE017
is a predicted value of the dependent variable y of the element data;
y is the actual value of the dependent variable y in the element data;
m is the number of element data in the time series L_i;
e_i is the total error of the time series L_i.
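Formula (3) appears only as an image reference in this text, so its exact form cannot be confirmed here. Purely as an illustration of a per-series total error over m elements, the sketch below assumes a sum of absolute differences between predicted and actual values; the function name `total_error` and this choice of error measure are assumptions, and the patented formula may differ (for example, a squared or averaged error).

```python
from typing import Sequence


def total_error(predicted: Sequence[float], actual: Sequence[float]) -> float:
    """Assumed total error e_i of one time series L_i: the sum over its m
    elements of |predicted y - actual y|. Not taken from formula (3)."""
    if len(predicted) != len(actual):
        raise ValueError("predicted and actual must have the same length m")
    return sum(abs(p - a) for p, a in zip(predicted, actual))


e_i = total_error([1.0, 2.0, 4.0], [1.5, 2.0, 3.0])
print(e_i)
```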
9. The method according to claim 8, wherein the S5 specifically includes:
S51, sampling the first trend data set D1, the second trend data set D2 and the third trend data set D3 at a first proportion w_1, a second proportion w_2 and a third proportion w_3, respectively, to obtain a first trend sample set D1, a second trend sample set D2 and a third trend sample set D3;
S52, merging and de-duplicating the first trend sample set D1, the second trend sample set D2 and the third trend sample set D3 to obtain a final evaluation data set D.
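The sampling and merging of S51 and S52 can be sketched as follows. The sketch assumes that each proportion is the fraction of its own trend data set to draw, that sampling is uniform without replacement, and that elements are hashable tuples; the function name `build_evaluation_set` and the `seed` parameter are hypothetical and are not part of the claims.

```python
import random
from typing import List, Sequence, Tuple

Element = Tuple[float, float]  # hypothetical layout (timestamp, value)


def build_evaluation_set(
    d1: Sequence[Element],
    d2: Sequence[Element],
    d3: Sequence[Element],
    w1: float,
    w2: float,
    w3: float,
    seed: int = 0,
) -> List[Element]:
    rng = random.Random(seed)
    merged: List[Element] = []
    # S51: sample each trend data set at its own proportion
    # (assumed meaning: fraction of that set, drawn without replacement).
    for dataset, w in ((d1, w1), (d2, w2), (d3, w3)):
        k = int(len(dataset) * w)
        merged.extend(rng.sample(list(dataset), k))
    # S52: merge and de-duplicate, keeping the first occurrence of each element.
    return list(dict.fromkeys(merged))


d1 = [(1.0, 10.0), (2.0, 20.0)]
d2 = [(2.0, 20.0), (3.0, 30.0)]
d3 = [(3.0, 30.0), (4.0, 40.0)]
d = build_evaluation_set(d1, d2, d3, 1.0, 1.0, 1.0)
print(sorted(d))
```

If the proportions were instead meant as shares of the final set D, the sample sizes would be computed from a target total rather than from each set's own length.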
10. An evaluation data set construction system for a flow-type industrial production data stream, characterized by comprising:
at least one processor; and
at least one memory communicatively coupled to the processor, wherein the memory stores program instructions executable by the processor, and the processor invokes the program instructions to perform the evaluation data set construction method for a flow-type industrial production data stream according to any one of claims 1-9.
CN202211014655.5A 2022-08-23 2022-08-23 Evaluation data set construction method and system for flow-type industrial production data stream Active CN115099370B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202211014655.5A CN115099370B (en) 2022-08-23 2022-08-23 Evaluation data set construction method and system for flow-type industrial production data stream

Publications (2)

Publication Number Publication Date
CN115099370A (en) 2022-09-23
CN115099370B (en) 2022-12-02

Family

ID=83301684

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202211014655.5A Active CN115099370B (en) 2022-08-23 2022-08-23 Evaluation data set construction method and system for flow-type industrial production data stream

Country Status (1)

Country Link
CN (1) CN115099370B (en)

Citations (11)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20110178615A1 (en) * 2009-04-23 2011-07-21 Universitat Pompeu Fabra Method for calculating measures of similarity between time signals
CN109711755A (en) * 2019-01-23 2019-05-03 华南理工大学 Short-term power load prediction model establishment method based on EMD-VMD-PSO-LSSVM
CN110445629A (en) * 2018-05-03 2019-11-12 佛山市顺德区美的电热电器制造有限公司 A kind of server concurrency prediction technique and device
CN111310981A (en) * 2020-01-20 2020-06-19 浙江工业大学 Reservoir water level trend prediction method based on time series
CN112232447A (en) * 2020-12-14 2021-01-15 国网江西省电力有限公司电力科学研究院 Construction method of complete sample set of power equipment state monitoring data
US20210081818A1 (en) * 2019-09-13 2021-03-18 Oracle International Corporation Automatically-generated labels for time series data and numerical lists to use in analytic and machine learning systems
US20210165851A1 (en) * 2019-11-28 2021-06-03 Fujitsu Limited Determination method and determination apparatus
CN112926633A (en) * 2021-02-01 2021-06-08 长江慧控科技(武汉)有限公司 Abnormal energy consumption detection method, device, equipment and storage medium
CN113033668A (en) * 2021-03-29 2021-06-25 中国人民解放军92859部队 LS-SVM algorithm depth sounding training sample thinning method based on sample Euclidean distance
CN113051442A (en) * 2019-12-26 2021-06-29 中国电信股份有限公司 Time series data processing method, device and computer readable storage medium
CN114528097A (en) * 2022-01-25 2022-05-24 华南理工大学 Cloud platform service load prediction method based on time sequence convolution neural network

Also Published As

Publication number Publication date
CN115099370B (en) 2022-12-02

Similar Documents

Publication Publication Date Title
CN110674604B (en) Transformer DGA data prediction method based on multi-dimensional time sequence frame convolution LSTM
CN112101480B (en) Multivariate clustering and fused time sequence combined prediction method
CN104461842A (en) Log similarity based failure processing method and device
CN108550400B (en) Method for evaluating influence of air pollutants on number of respiratory disease patients
CN114611792A (en) Atmospheric ozone concentration prediction method based on mixed CNN-Transformer model
CN108363902B (en) Accurate prediction method for pathogenic genetic variation
JP2006268558A (en) Data processing method and program
CN111178623A (en) Business process remaining time prediction method based on multilayer machine learning
JP2007323315A (en) Cooperative filtering method, cooperative filtering device, cooperative filtering program and recording medium with the same program recorded thereon
CN107451684A (en) Stock market's probability forecasting method based on core stochastic approximation
CN115099370B (en) Evaluation data set construction method and system for flow-type industrial production data stream
CN111488460A (en) Data processing method, device and computer readable storage medium
Shan et al. Automatic Generation of Piano Score Following Videos.
JP2021033895A (en) Variable selection method, variable selection program, and variable selection system
Zhang et al. Collaborated online change-point detection in sparse time series for online advertising
CN116307206A (en) Natural gas flow prediction method based on segmented graph convolution and time attention mechanism
CN113378546B (en) Non-autoregressive sentence sequencing method
CN114819152A (en) Graph embedding expert entity alignment method based on reinforcement learning enhancement
CN114187963A (en) Prediction method of protein binding nucleotide sites on full-length circular RNA
CN113190763A (en) Information recommendation method and system
CN111797300A (en) Knowledge representation learning model based on importance negative sampling and negative sampling frame construction method
CN113704635B (en) Social network event recommendation method and system
CN117079667B (en) Scene classification method, device, equipment and readable storage medium
JP6018852B2 (en) Factor analysis / display method and system
CN117473327B (en) Regional population model training method and regional population prediction method

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant