CN114169253A - Data flow dynamic prediction method and system based on Flink and LSTM - Google Patents

Publication number
CN114169253A
Authority
CN
China
Prior art keywords
data stream
modeling
time
lstm
prediction
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202111640787.4A
Other languages
Chinese (zh)
Other versions
CN114169253B (en)
Inventor
施建明
刘亦飞
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Technology and Engineering Center for Space Utilization of CAS
Original Assignee
Technology and Engineering Center for Space Utilization of CAS
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Technology and Engineering Center for Space Utilization of CAS
Priority to CN202111640787.4A
Publication of CN114169253A
Application granted
Publication of CN114169253B
Legal status: Active
Anticipated expiration

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F30/00 Computer-aided design [CAD]
    • G06F30/20 Design optimisation, verification or simulation
    • G06F30/27 Design optimisation, verification or simulation using machine learning, e.g. artificial intelligence, neural networks, support vector machines [SVM] or training a model
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/044 Recurrent networks, e.g. Hopfield networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/08 Learning methods

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Evolutionary Computation (AREA)
  • Software Systems (AREA)
  • Artificial Intelligence (AREA)
  • General Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Health & Medical Sciences (AREA)
  • Mathematical Physics (AREA)
  • Computational Linguistics (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • Biophysics (AREA)
  • Biomedical Technology (AREA)
  • Data Mining & Analysis (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Health & Medical Sciences (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Medical Informatics (AREA)
  • Computer Hardware Design (AREA)
  • Geometry (AREA)
  • Management, Administration, Business Operations System, And Electronic Commerce (AREA)

Abstract

The invention relates to the field of industrial big data, in particular to a method and a system for dynamically predicting data flow based on Flink and LSTM. The method comprises the following steps: acquiring time sequence data generated by a monitored system in real time, forming an input data stream according to the time sequence data, converting the input data stream to obtain a marked data stream, and processing the marked data stream to form a data set which is continuously accumulated along with time; judging whether a modeling action instruction is triggered or not according to the real-time state of the input data stream; when the judgment result is yes, carrying out model construction based on the data set to obtain a prediction model; and predicting the data elements in the marked data stream through the prediction model to obtain a prediction result. According to the method, key modeling and prediction tasks do not need to be manually executed, and the method is automatically completed under the traction of the data stream, so that a large amount of manpower is saved, and the efficiency is improved.

Description

Data flow dynamic prediction method and system based on Flink and LSTM
Technical Field
The invention relates to the field of industrial big data, in particular to a method and a system for dynamically predicting data flow based on Flink and LSTM.
Background
In the industrial fields of aerospace, aviation, nuclear power, energy, chemical engineering, ships, rail transit and the like, along with the improvement of the automation level of industrial equipment, more and more sensors are integrated into an industrial system, and abundant data source support is provided for monitoring and predicting the running state of the equipment. The performance parameter data collected and formed by the sensors is generally time sequence data with time stamps, and contains the health degradation rule of the monitored system along with the time. In order to predict the future health situation, a prediction model needs to be established according to the occurred data, and then the model is used for carrying out predictive analysis, which has important significance for guiding the use and maintenance planning of equipment.
However, data that changes over time is dynamic: data that is unknown at the current moment becomes known as time passes, so the model can be updated continuously to produce more accurate predictions. The conventional time-series prediction method generally comprises the following steps: 1. at the current moment, selecting a target section from historical data to obtain a data set for model training; 2. training on the data set to obtain a model, and then using the model to predict the unknown data for a period of time beyond the current moment; 3. repeating the two steps above after new known data has been added to obtain a new model, so that prediction can be carried out with the new model. In existing time-series prediction, modeling and using the model are two operations isolated from each other, which causes two problems: 1. the whole process lacks data-stream traction, so both the first modeling and each model update are initiated manually; 2. modeling and using the model are not carried out under the same framework and lack interaction, so the prediction program does not know when a new model exists, and prediction becomes disconnected from the current state of the time-series data.
Disclosure of Invention
The invention aims to provide a method and a system for dynamically predicting data flow based on Flink and LSTM.
The technical scheme for solving the technical problems is as follows: a dynamic prediction method of time-series data flow based on Flink and LSTM comprises the following steps:
acquiring time sequence data generated by a monitored system in real time, forming an input data stream according to the time sequence data, converting the input data stream to obtain a marked data stream, and storing the marked data stream to form a data set which is continuously accumulated along with time;
judging whether a modeling action instruction is triggered or not according to the real-time state of the input data stream;
when the judgment result is yes, carrying out model construction based on the data set to obtain a prediction model;
and predicting the data elements in the marked data stream through the prediction model to obtain a prediction result.
The invention has the beneficial effects that: processing the input data stream forms an effective trigger mechanism for both the first modeling and model updating; furthermore, marking the data stream lets it carry information on whether a model exists or has been updated, so that prediction is performed automatically with the most up-to-date model under the traction of the data stream. With this method, the key modeling and prediction tasks do not need to be executed manually but are completed automatically under the traction of the data stream, which saves a large amount of manpower and improves efficiency.
Further, converting the input data stream to obtain a marked data stream specifically includes:
and converting the input data stream through a Flink map operator to obtain a marked data stream.
Further, the determining whether to trigger the modeling action instruction according to the real-time state of the input data stream specifically includes:
judging whether the real-time state meets a preset condition, triggering a modeling action instruction when the real-time state meets the preset condition, and not triggering the modeling action instruction if the real-time state does not meet the preset condition; the preset condition is that data elements in the input data stream exceed a preset limit value.
Further, when the judgment result is yes, performing model construction based on the data set, and obtaining a prediction model specifically includes:
and when the judgment result is yes, firstly triggering modeling based on the LSTM, simultaneously starting a periodic modeling cycle timer, entering a timing updating mechanism, triggering modeling based on the LSTM at fixed intervals based on the timing updating mechanism, and taking the LSTM model established in the fixed period as the latest prediction model.
Further, the timing update mechanism specifically includes:
registering and starting a timer, triggering modeling based on the LSTM when the timer is up, registering a new timer, and starting the new timer as the timer.
Further, the triggering LSTM-based modeling specifically includes:
and calling the encapsulated modeling subprogram, and carrying out LSTM modeling on the data set based on the modeling subprogram.
Further, the predicting the marked data stream by the prediction model to obtain a prediction result specifically includes:
and judging the marker of each data element in the marked data stream in real time, and inputting the data element with the true judgment result into the prediction model to obtain a prediction result.
Another technical solution of the present invention for solving the above technical problems is as follows: a system for dynamic prediction of a time-series data stream based on Flink and LSTM, comprising:
the system comprises a Flink read-write module, a data acquisition module and a data processing module, wherein the Flink read-write module is used for acquiring time sequence data generated by a monitored system in real time, forming an input data stream according to the time sequence data, converting the input data stream to obtain a marked data stream, and processing the marked data stream to form a data set which is continuously accumulated along with time;
the triggering and traction module is used for judging whether a modeling action instruction is triggered or not according to the real-time state of the input data stream;
the LSTM modeling module is used for carrying out model construction based on the data set to obtain a prediction model when the judgment result is yes;
and the dynamic prediction module is used for predicting the data elements in the marked data stream through the prediction model to obtain a prediction result.
The invention has the beneficial effects that: processing the input data stream forms an effective trigger mechanism for both the first modeling and model updating; furthermore, marking the data stream lets it carry information on whether a model exists or has been updated, so that prediction is performed automatically with the most up-to-date model under the traction of the data stream. With this system, the key modeling and prediction tasks do not need to be executed manually but are completed automatically under the traction of the data stream, which saves a large amount of manpower and improves efficiency.
Further, converting the input data stream to obtain a marked data stream specifically includes:
and converting the input data stream through a Flink map operator to obtain a marked data stream.
Further, the determining whether to trigger the modeling action instruction according to the real-time state of the input data stream specifically includes:
judging whether the real-time state meets a preset condition, triggering a modeling action instruction when the real-time state meets the preset condition, and not triggering the modeling action instruction if the real-time state does not meet the preset condition; the preset condition is that data elements in the input data stream exceed a preset limit value.
Further, when the judgment result is yes, performing model construction based on the data set, and obtaining a prediction model specifically includes:
and when the judgment result is yes, firstly triggering modeling based on the LSTM, simultaneously starting a periodic modeling cycle timer, entering a timing updating mechanism, triggering modeling based on the LSTM at fixed intervals based on the timing updating mechanism, and taking the LSTM model established in the fixed period as the latest prediction model.
Further, the timing update mechanism specifically includes:
registering and starting a timer, triggering modeling based on the LSTM when the timer is up, registering a new timer, and starting the new timer as the timer.
Further, the triggering LSTM-based modeling specifically includes:
and calling the encapsulated modeling subprogram, and carrying out LSTM modeling on the data set based on the modeling subprogram.
Further, the predicting the marked data stream by the prediction model to obtain a prediction result specifically includes:
and judging the marker of each data element in the marked data stream in real time, and inputting the data element with the true judgment result into the prediction model to obtain a prediction result.
Drawings
FIG. 1 is a schematic flowchart of a dynamic prediction method for a time-series data stream based on Flink and LSTM according to an embodiment of the present invention;
FIG. 2 is a structural framework diagram provided by an embodiment of the dynamic prediction system for time-series data flow based on Flink and LSTM of the present invention;
FIG. 3 is a schematic diagram of the output total voltage timing data of the fuel cell stack provided by an embodiment of the method for dynamically predicting the timing data stream based on Flink and LSTM of the present invention;
FIG. 4 is a schematic diagram of an LSTM dynamic prediction time provided by an embodiment of a dynamic prediction method for a time series data stream based on Flink and LSTM of the present invention;
FIG. 5 is a schematic diagram of modeling delay time dispersion provided by an embodiment of a dynamic prediction method for a time-series data stream based on Flink and LSTM of the present invention;
FIG. 6 is a schematic diagram of LSTM predicted delay time scatter provided by an embodiment of a dynamic prediction method for a time-series data stream based on Flink and LSTM of the present invention;
FIG. 7 is a schematic diagram of an LSTM prediction error curve provided by an embodiment of the dynamic prediction method for a time-series data stream based on Flink and LSTM of the present invention.
Detailed Description
The principles and features of this invention are described below in conjunction with examples which are set forth to illustrate, but are not to be construed to limit the scope of the invention.
As shown in fig. 1, a dynamic prediction method for time-series data stream based on Flink and LSTM includes:
acquiring time sequence data generated by a monitored system in real time, forming an input data stream according to the time sequence data, converting the input data stream to obtain a marked data stream, and processing the marked data stream to form a data set which is continuously accumulated along with time;
judging whether a modeling action instruction is triggered or not according to the real-time state of the input data stream;
when the judgment result is yes, carrying out model construction based on the data set to obtain a prediction model;
and predicting the data elements in the marked data stream through the prediction model to obtain a prediction result.
In some possible embodiments, processing the input data stream forms an effective trigger mechanism for both the first modeling and model updating; furthermore, marking the data stream lets it carry information on whether a model exists or has been updated, so that prediction is performed automatically with the most up-to-date model under the traction of the data stream. With this method, the key modeling and prediction tasks do not need to be executed manually but are completed automatically under the traction of the data stream, which saves a large amount of manpower and improves efficiency.
It should be noted that, the time sequence data of the monitored system is acquired in real time to form an input data stream, and the input data stream is converted to obtain a marked data stream; writing the marked data stream into an external system, which can be an external file system or an external database system, and gradually accumulating the data stream to form a data set for modeling as time goes on;
judging whether data elements in the input data stream exceed a preset limit value; if so, LSTM-based modeling is triggered for the first time and a periodic modeling cycle timer is started at the same time, entering a timing updating mechanism. That is, a timer is registered and started immediately after the first modeling instruction is triggered; when the timer reaches the preset time, LSTM-based modeling is triggered and a new timer is registered and started. Registering and starting a new timer after each modeling trigger realizes a continuous periodic modeling trigger mechanism. Based on the timing updating mechanism, LSTM-based modeling is triggered at fixed intervals, and the LSTM model established in each fixed period is used as the latest prediction model; the model is built by calling the encapsulated modeling subprogram, which completes model construction from the data set in response to the trigger instruction;
and executing dynamic prediction operation on the marked time sequence data stream, continuously judging whether the marker of each data element in the marked data stream is true, and if so, inputting the data element into a prediction model to obtain a prediction result.
For the specific process of acquiring the time series data generated by the monitored system in real time, forming an input data stream according to the time series data, converting the input data stream to obtain a marked data stream, and processing the marked data stream to form a data set which is continuously accumulated along with time, refer to Embodiment 1. The input data stream may be converted into a marked data stream using a Flink map operator: one flag field is added to each element in the data stream (namely each data point in the data stream), and the value of that field is controlled by the variable Flag of a singleton object. In other words, the conversion marks each piece of data in the data stream.
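The marking step can be illustrated with a minimal Python sketch. It is not the Flink/Scala code of the invention: `FlagHolder`, `TaggedPoint` and `tag` are hypothetical names, the record layout is an assumption, and the built-in `map` only stands in for the Flink map operator.

```python
from dataclasses import dataclass


class FlagHolder:
    """Stand-in for the singleton object; `flag` plays the role of the variable Flag."""
    flag = False  # initial value: no model is available yet


@dataclass(frozen=True)
class TaggedPoint:
    timestamp: int
    value: float
    predflag: bool  # copy of FlagHolder.flag at the moment the point is tagged


def tag(point):
    """Attach the current flag value to one (timestamp, value) data point."""
    timestamp, value = point
    return TaggedPoint(timestamp, value, FlagHolder.flag)


raw_stream = [(1, 3.31), (2, 3.30), (3, 3.29)]
tagged_before = list(map(tag, raw_stream[:2]))  # tagged while no model exists
FlagHolder.flag = True                          # e.g. set once modeling completes
tagged_after = list(map(tag, raw_stream[2:]))   # tagged after a model exists
```

Every element passing through the map step picks up whatever value the shared flag holds at that instant, which is how the stream itself can carry the "model available" signal downstream.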
Embodiment 1: because the data streams of the monitored system are generated and flow in continuously in real time, they need to be cached so that the latest known data can be called when modeling. A data channel is established from the Flink source to the sink: with Kafka as the data source, data streams are continuously ingested through the Flink Kafka source operator. The original data stream is converted into a marked data stream, where the mark is the key to realizing dynamic prediction. The data stream is then incrementally written by the Flink sink operator to an external system, which may be a file system or a database (e.g., a MySQL database).
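The accumulation of the data set in an external system can be sketched as follows. This is only an illustration under stated assumptions: an in-memory SQLite database replaces the MySQL database mentioned above, and the table name `readings` and the functions `sink` and `latest_dataset` are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # stand-in for the external database
conn.execute(
    "CREATE TABLE readings (ts INTEGER PRIMARY KEY, value REAL, predflag INTEGER)"
)


def sink(record):
    """Append one marked data point; the table grows as the stream flows."""
    conn.execute("INSERT INTO readings VALUES (?, ?, ?)", record)
    conn.commit()


def latest_dataset(limit):
    """Read the most recent `limit` values in chronological order for modeling."""
    rows = conn.execute(
        "SELECT value FROM readings ORDER BY ts DESC LIMIT ?", (limit,)
    ).fetchall()
    return [row[0] for row in reversed(rows)]


for ts, value in enumerate([3.31, 3.30, 3.29, 3.27], start=1):
    sink((ts, value, 0))

window = latest_dataset(3)
```

Writing incrementally and reading a bounded recent window keeps the sink and the modeling program decoupled: the modeling side only needs the table, not the stream.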
The specific process of determining whether to trigger the modeling action command according to the real-time state of the input data stream may refer to embodiment 2.
The real-time status can be understood here as whether the conditions that require the development of predictive analysis are met.
Embodiment 2: the state of the data in the input data stream is judged in real time, and when the condition that requires predictive analysis is satisfied, the first modeling action instruction is triggered. This condition is set according to user requirements. For example, if the temperature is 38 ℃ at the beginning and rises slowly, the modeling instruction is triggered when it reaches 39 ℃. The condition is a threshold judgment, and there is no hard requirement on its value.
For the specific process of performing model construction based on the data set to obtain the prediction model when the judgment result is yes, refer to Embodiment 3.
Embodiment 3: after the first modeling instruction is issued, a periodic update mechanism for the LSTM model is created with a cyclic timer; each time the timer expires, an action instruction for periodically updating the model is triggered.
After either of the two instructions is issued, the LSTM modeling program is called automatically, and it reads the latest data set when it runs. The external system accumulates data continuously from the time the data stream is first written to it until the modeling instruction is issued, and the latest data set is the data for the most recent period read from the external system. After modeling is finished, the Flag value is changed immediately. The prediction program judges from the Flag value whether a new model has been generated, so changing the value notifies the prediction program that it can start running. For the specific flow of LSTM online modeling triggered by the data stream, refer to Embodiment 4.
Embodiment 4: the data stream is judged in real time. When the data exceeds a preset limit value, it is necessary to start predicting how the data will change in the future, and the LSTM first-modeling response mechanism is triggered. The data compared against the preset limit is the value of the data itself, such as a temperature value or a voltage value; the data in a state-monitoring data stream are all physical quantities. For the specific procedure of the first LSTM modeling, refer to Embodiment 5.
The LSTM modeling program is packaged and called by a modeling triggering instruction, the input of the program is a data set, and the output of the program is an LSTM model.
And immediately starting a periodic modeling cycle timer after a first modeling instruction is sent out, wherein interval period parameters of the timer are configurable, and a user selects a reasonable model updating period according to actual conditions.
When the timer reaches the trigger time, the i-th (i = 1, 2, 3, 4, ...) modeling task of the LSTM is triggered immediately, and the LSTM modeling program is called.
After the LSTM modeling is finished, the LSTM model is stored in the designated path so that the prediction program can use it; when a new model is generated, it overwrites the original model.
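One way to store a model at a designated path so that each new model replaces the previous one is sketched below. The text does not specify the storage format; `pickle`, the file name `lstm_model.pkl`, the write-then-rename step and the function names are all assumptions made for illustration, and a plain dictionary stands in for a trained model.

```python
import os
import pickle
import tempfile


def save_model(model, path):
    """Write `model` to `path`, replacing any earlier model file."""
    directory = os.path.dirname(path) or "."
    fd, tmp_path = tempfile.mkstemp(dir=directory)
    with os.fdopen(fd, "wb") as handle:
        pickle.dump(model, handle)
    os.replace(tmp_path, path)  # the new model takes the old one's place


def load_model(path):
    """Load whatever model currently sits at the designated path."""
    with open(path, "rb") as handle:
        return pickle.load(handle)


model_dir = tempfile.mkdtemp()
model_path = os.path.join(model_dir, "lstm_model.pkl")
save_model({"version": 1}, model_path)  # first modeling
save_model({"version": 2}, model_path)  # periodic update
current = load_model(model_path)
```

Writing to a temporary file and renaming it is a design choice of this sketch: a prediction program reading the path concurrently would see either the old or the new file, never a partially written one.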
The above process can be summarized as follows: the method comprises the steps of flexibly and automatically realizing the first modeling and periodic updating of the LSTM under the mediation of Flink stream processing, calling a program for executing the LSTM modeling after a trigger instruction is sent out, and reading the latest data in a database by the program to obtain a training data set required by the LSTM modeling.
Embodiment 5, the first modeling and timing update triggering mechanism of LSTM is specifically as follows:
As mentioned above, the key to realizing LSTM online modeling is to realize the trigger mechanism for first modeling and timing update through Flink stream processing. Here this mechanism is implemented with a custom processing function that inherits from the Flink KeyedProcessFunction.
The function has 2 parameters: the limit used to determine whether the first modeling is performed, and the interval duration of the timing update.
When the data in the data flow exceeds the limit value and no timer exists, the first modeling operation is immediately triggered, and the timer is immediately registered.
The LSTM modeling trigger may be implemented in a listener mode, and in the processing function, only the modeling instruction needs to be issued, and the modeling program may listen to the instruction and execute it automatically.
When the timer reaches the trigger time, the modeling operation is triggered immediately and a new timer is registered at the same time; registering a new timer as soon as the old one finishes realizes the periodic updating of the LSTM model.
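The trigger logic of Embodiment 5 can be approximated in plain Python as below. This is a simulation, not a Flink KeyedProcessFunction: the class `ModelingTrigger` and its attributes are hypothetical, and timers are checked only when an element arrives, whereas Flink fires registered timers through its own timer service. The 39 ℃ limit reuses the example from Embodiment 2; the interval of 10 time units is arbitrary.

```python
class ModelingTrigger:
    """Mimics the two-parameter processing function: a limit and an update interval."""

    def __init__(self, limit, interval):
        self.limit = limit
        self.interval = interval
        self.timer = None        # time at which the registered timer fires
        self.triggered_at = []   # times at which a modeling instruction was issued

    def _fire_due_timers(self, now):
        # Each expired timer triggers modeling and immediately registers its successor.
        while self.timer is not None and now >= self.timer:
            self.triggered_at.append(self.timer)
            self.timer += self.interval

    def process_element(self, now, value):
        self._fire_due_timers(now)
        if value > self.limit and self.timer is None:
            self.triggered_at.append(now)     # first modeling instruction
            self.timer = now + self.interval  # start the periodic cycle


trigger = ModelingTrigger(limit=39.0, interval=10)
for t, v in [(0, 38.0), (3, 38.6), (5, 39.2), (12, 39.5), (15, 39.9), (27, 40.3)]:
    trigger.process_element(t, v)
```

In this run the limit is first exceeded at time 5, so modeling is triggered at 5 and then at 15 and 25 by successive timers; later values above the limit do not retrigger the first-modeling branch because a timer already exists.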
The specific process of predicting the data elements in the marked data stream by the prediction model to obtain the prediction result can refer to embodiment 6 and embodiment 7.
Embodiment 6: the key to time-series prediction is obtaining the latest LSTM model, so that once the first modeling or a model update is completed, the latest model is loaded into the time-series prediction program and prediction closely tracks how the data stream is changing at that moment. The prediction program loads the current model and feeds it the known data point at the latest time, that is, the data point of the data stream at the prediction time. For example, if a battery-voltage data stream has lasted for 5 minutes and the LSTM model has been trained on those 5 minutes of data, the model can predict future data from the voltage value of the data stream at the current prediction time. The model calculates and outputs a data sequence for a future period of time, which completes the prediction of the time-series data. After the prediction is completed, the Flag value is changed immediately.
Embodiment 7: the processing system continuously ingests the stream of object performance data; new data keeps flowing in, and the LSTM model is updated accordingly. Based on the Flink stream processing framework, the LSTM modeling completion time is used as the prediction reference time, and the data at the reference time is used as the input variable of the prediction program. The prediction program calls the LSTM model to carry out the calculation and outputs a prediction sequence of a certain length, which completes the predictive analysis of the future development trend of the data stream.
Considering that the first prediction calculation task needs to be triggered immediately after the first modeling task is completed, and a new LSTM model needs to be loaded and prediction starts after the model is updated, a dynamic prediction mechanism based on Flink data stream traction is designed.
In a Flag program written in Scala, a case class packet capacity (timing: Long, sensory: String, capacity: Double, predflag: Boolean) is designed, and the original data stream to be processed is mapped into a case-class-format data stream containing a predflag field. The predflag field is assigned from the variable Flag of a singleton object, whose initial value is false, so that a flag for dynamic prediction indication is injected into the converted data stream.
When LSTM modeling is completed, Flag is assigned to true.
If the predflag is true, the prediction program is triggered, and Flag is assigned to false immediately after prediction is finished.
After the next LSTM modeling is finished, Flag is assigned to true again, and the cycle repeats. In this way, the prediction program automatically loads the LSTM model and dynamically performs predictive analysis according to the generation and updating of the LSTM model.
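The Flag handshake between modeling and prediction described in the preceding paragraphs can be sketched in Python as follows. The names `Flag`, `on_modeling_finished` and `predict_if_flagged` are hypothetical, the moments at which modeling "finishes" are hard-coded for the demonstration, and a trivial function stands in for the LSTM prediction.

```python
class Flag:
    value = False  # stand-in for the singleton variable Flag, initially false


def on_modeling_finished():
    Flag.value = True  # a new or updated model is available


def predict_if_flagged(point, predict):
    """Run `predict` only for points tagged true, then reset the flag."""
    value, predflag = point
    if not predflag:
        return None
    result = predict(value)
    Flag.value = False  # prediction done; wait for the next model
    return result


outputs = []
for step, value in enumerate([1.0, 2.0, 3.0, 4.0, 5.0]):
    if step in (1, 3):           # pretend modeling finishes before these points
        on_modeling_finished()
    point = (value, Flag.value)  # tagging, as the map step would do
    outputs.append(predict_if_flagged(point, lambda x: x * 10))
```

Only the first element tagged after each modeling run produces a prediction; every other element passes through without triggering the prediction program.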
Corresponding to the LSTM modeling subprogram, a time-series data prediction program set based on the LSTM model is developed in Matlab or Python and packaged into a jar package to be called by the Scala-based Flink stream processing main program.
The data stream changing in real time is processed by a custom function that inherits from Flink's KeyedProcessFunction. The core of the function is to execute LSTM-based predictive analysis on each piece of data (namely each data stream element) in the data stream; the prediction itself is performed by a prediction subprogram that the function calls.
Because the data stream element contains the dynamically changing predflag field, after the data stream element is transmitted to the prediction function, the prediction function can determine whether to load a new LSTM model and carry out prediction according to the field.
The data stream element corresponding to the prediction moment is taken as the input variable of the LSTM prediction. A forward prediction step count, stepForward, is introduced, and under the current LSTM model a data sequence of the next stepForward steps is predicted with the current data point as the reference.
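Predicting stepForward steps ahead from the current data point can be sketched as a simple roll-forward loop. This is a simplification under stated assumptions: `forecast` and `toy_model` are hypothetical names, the model is treated as a stateless one-step-ahead function, and a real LSTM would also carry its hidden state from step to step.

```python
def forecast(one_step_model, current_point, step_forward):
    """Roll a one-step-ahead model forward `step_forward` times from `current_point`."""
    sequence = []
    x = current_point
    for _ in range(step_forward):
        x = one_step_model(x)  # next value predicted from the latest value
        sequence.append(x)
    return sequence


def toy_model(x):
    """Toy stand-in for a trained model: the value drops by 0.5 per step."""
    return x - 0.5


predicted = forecast(toy_model, 10.0, step_forward=3)
```

Each predicted value is fed back as the input for the next step, so the output is a sequence of length stepForward anchored at the current data point.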
Preferably, in any of the above embodiments, converting the input data stream to obtain a marked data stream specifically includes:
and converting the input data stream through a Flink map operator to obtain a marked data stream.
Preferably, in any of the above embodiments, the determining whether to trigger the modeling action instruction according to the real-time state of the input data stream specifically includes:
judging whether the real-time state meets a preset condition, triggering a modeling action instruction when the real-time state meets the preset condition, and not triggering the modeling action instruction if the real-time state does not meet the preset condition; the preset condition is that data elements in the input data stream exceed a preset limit value.
Preferably, in any embodiment above, when the determination result is yes, performing model construction based on the data set, and obtaining the prediction model specifically includes:
and when the judgment result is yes, firstly triggering modeling based on the LSTM, simultaneously starting a periodic modeling cycle timer, entering a timing updating mechanism, triggering modeling based on the LSTM at fixed intervals based on the timing updating mechanism, and taking the LSTM model established in the fixed period as the latest prediction model.
Preferably, in any of the above embodiments, the timing update mechanism specifically includes:
registering and starting a timer; when the timer expires, triggering LSTM-based modeling, registering a new timer, and starting the new timer in place of the expired one.
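The first-trigger and timer re-registration logic can be sketched as a small plain-Python simulation. It is not Flink code: in a Flink KeyedProcessFunction the same pattern would typically register a timer through the timer service and re-register it inside onTimer. The class name ModelingScheduler, the callback build_model, and the explicit advance_time call are hypothetical and exist only to make the sketch self-contained.

```python
from typing import Callable, Optional


class ModelingScheduler:
    """Simulates: first modeling on a data condition, then periodic re-modeling."""

    def __init__(self, condition: Callable[[float], bool], interval_ms: int,
                 build_model: Callable[[int], None]):
        self.condition = condition        # preset condition on a stream element
        self.interval_ms = interval_ms    # fixed update interval
        self.build_model = build_model    # hypothetical modeling callback
        self.next_timer: Optional[int] = None  # None until the first trigger

    def on_element(self, value: float, now_ms: int) -> None:
        # First modeling: triggered once, when the condition is first met.
        if self.next_timer is None and self.condition(value):
            self.build_model(now_ms)
            self.next_timer = now_ms + self.interval_ms  # register first timer

    def advance_time(self, now_ms: int) -> None:
        # Fire every timer that is due; each firing registers its successor.
        while self.next_timer is not None and self.next_timer <= now_ms:
            fired = self.next_timer
            self.build_model(fired)
            self.next_timer = fired + self.interval_ms
```

Registering the successor from inside the firing path keeps exactly one pending timer at any time, which is the property the timing update mechanism relies on.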
Preferably, in any of the above embodiments, the triggering LSTM-based modeling specifically includes:
and calling the encapsulated modeling subprogram, and carrying out LSTM modeling on the data set based on the modeling subprogram.
It should be noted that the LSTM modeling subroutine is encapsulated so that it can be called directly from a custom Flink processing function to complete modeling. Through standardized, modular design, the modeling subroutine can adapt to different data streams and is convenient to use. A sequence-to-sequence LSTM modeling program suite is developed in Python, Matlab, or the like, and an entry main function is written so that the whole suite is packaged into a jar package.
In addition, the LSTM modeling suite includes the following programs:
a. a program for reading the input data set required for modeling;
b. a program for reading and parsing the LSTM algorithm configuration parameters;
c. an LSTM model training program;
d. a program for outputting and saving the model.
It should be noted that LSTM model training adopts a "sequence-to-sequence" method: seq1 (x_1, x_2, ..., x_{N-1}) is taken as the input and seq2 (x_1, x_2, ..., x_N) as the output to construct an LSTM regression model. The model establishes the correlation between the data at the current time and the data at the next time, and can therefore be used for extrapolation prediction of the data sequence.
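A minimal sketch of how such a training pair might be built from an accumulated series is given below. It assumes the common reading in which the target sequence is the input shifted one step ahead, which matches the "current time to next time" description; the index ranges printed above do not state the shift explicitly, so this is an interpretation, and make_seq2seq_pair is a hypothetical helper name.

```python
from typing import List, Sequence, Tuple


def make_seq2seq_pair(series: Sequence[float]) -> Tuple[List[float], List[float]]:
    """Return (inputs, targets) where targets[i] is the value following inputs[i].

    Assumption: one-step-ahead shift, i.e. inputs = x_1..x_{N-1},
    targets = x_2..x_N.
    """
    if len(series) < 2:
        raise ValueError("at least two points are needed to form a training pair")
    return list(series[:-1]), list(series[1:])
```

Each position in the input is paired with the value observed one step later, so a regression model fitted on the pair learns a current-to-next mapping that can be applied repeatedly for extrapolation.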
Further, the predicting the marked data stream by the prediction model to obtain a prediction result specifically includes:
and judging the marker of each data element in the marked data stream in real time, and inputting the data element with the true judgment result into the prediction model to obtain a prediction result.
Example 8: in practical applications, the timeliness and accuracy of the modeling and prediction processes must also be ensured, so both modeling and prediction are tested; the tests can be broadly divided into functional tests and performance tests. Performance degradation test data of a fuel cell are used as the data source. The total output voltage of the battery pack, Utot, is the key index of battery performance; as the battery is used, the Utot time series shows a certain degradation pattern, as shown in fig. 3, where the abscissa represents time and the ordinate represents the voltage value in each data element. The change in battery performance can be predicted by constructing a sequence-to-sequence LSTM model.
For ease of understanding, the present invention is further explained by means of fig. 3 to 7.
Real-time monitoring by the battery performance testing system yields the corresponding battery performance monitoring data, i.e., the monitoring time-series data stream. For convenience of testing and verification, the data stream producer is implemented with Kafka: starting from the 1st point, the data of fig. 3 are published at 5 s intervals so that the Flink system can consume them. The actual degradation test of the fuel cell product spans more than 1000 hours and collected 654 performance data points; for convenience of testing, the whole process is replayed quickly through a simulated data source. Only one battery pack's data is used in the test. Because a real system may need to monitor and predict the performance degradation of several battery packs at the same time, each monitoring data point in the monitoring time-series data stream has the following format:
(timestamp,batteryid,Utot)
where the timestamp field is the time at which the data point was generated, i.e., the event time; batteryid is the battery pack number; and Utot is the performance parameter collected at that time.
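The three-field data point can be represented as a small record type. The sketch below is illustrative only: the class name MonitoringPoint, the parser parse_point, the comma-separated text encoding, and the use of an integer timestamp are assumptions, since the wire format of the records is not specified here.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class MonitoringPoint:
    timestamp: int   # event time; an integer epoch value is an assumption
    batteryid: str   # battery pack number
    utot: float      # total output voltage collected at that time


def parse_point(record: str) -> MonitoringPoint:
    """Parse a record written as '(timestamp, batteryid, Utot)' (assumed encoding)."""
    fields = [f.strip() for f in record.strip("() \n").split(",")]
    if len(fields) != 3:
        raise ValueError(f"expected 3 fields, got {len(fields)}: {record!r}")
    ts, battery, utot = fields
    return MonitoringPoint(int(ts), battery, float(utot))
```

Keeping batteryid as an explicit field is what allows the stream to be partitioned per battery pack when several packs are monitored at once.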
As time goes on, a monitoring time-series data stream composed of such data points is obtained. The corresponding time series of battery performance values contains rich temporal information, and the future performance trend needs to be predicted from the current trend.
Fig. 3 shows all of the data collected in real time throughout the test, covering the entire process from a healthy battery to a degraded one. In an actual use scenario, battery performance data is generated continuously at a certain frequency and forms a performance monitoring data stream, i.e., the monitoring time-series data stream. In such a scenario, the system receives and accumulates different data at different stages and the battery passes through different degradation stages, so the LSTM model trained on the data known so far is updated continually.
The functional and performance tests, covering online LSTM modeling and prediction, are listed in Table 1.
TABLE 1
(Table 1: image in the original publication.)
For online LSTM modeling functional testing:
LSTM modeling includes first-time modeling and periodic modeling, both of which proceed as the data stream advances. During testing, the limit for the first modeling is set so that a data stream element value less than or equal to 3.35 meets the condition; when an element in the data stream meets the condition, the first modeling is triggered and the periodic modeling cycle timer is started at the same time. The timed-update interval is set to 40 s; modeling is performed again each time the timer expires, and the new model is built from the data newly accumulated during that period. The timestamp of each modeling trigger and the recorded start time of the modeling program are listed in Table 2. Except for the first modeling, the difference between the trigger time and the actual modeling start time is generally within tens of milliseconds (ms), showing that online LSTM modeling runs automatically under the traction of the data stream. On the specified path, the LSTM model file is overwritten each time a modeling task completes.
TABLE 2 on-line LSTM modeling function test results
(Table 2: image in the original publication.)
Regarding dynamic predictive functional testing:
After each modeling run completes, one new LSTM model is generated on the specified path and overwrites the previous model. Meanwhile, prediction on the time-series data stream proceeds and prediction results at different prediction times are obtained, showing that the dynamic prediction function operates normally.
Delay time test for LSTM modeling:
The start time and end time of each modeling run are recorded automatically while the program runs, and their difference is computed automatically. The modeling records are stored in an external system, the modeling delay data are extracted from them, and the scatter plot in fig. 5 is drawn, where the abscissa is the modeling run number and the ordinate is the delay of each run. Statistical analysis gives a minimum modeling delay of 10987 ms, a maximum of 22363 ms, and an average of 18711 ms.
Regarding the prediction times in the dynamic prediction functional test:
As shown in fig. 4, the abscissa represents the cumulative number of predictions and the ordinate represents the time at which each prediction occurs. This time value is the event time extracted from the data stream, and the input variable provided to the LSTM model during prediction is the battery voltage value at that time.
Regarding the prediction delay time test:
The delay of each prediction, from the data stream element to the prediction result, is recorded automatically while the program runs. The prediction records are stored in an external system, the prediction delay data are extracted from them, and a scatter plot is drawn, as shown in fig. 6, where the abscissa is the prediction number and the ordinate is the delay of each prediction. Statistical analysis gives a minimum prediction delay of 4661 ms, a maximum of 5931 ms, and an average of 5013 ms.
Regarding the prediction error test:
For each prediction operation, the 4 corresponding future predicted values are extracted, the 4 true future values following the prediction point are extracted from the data stream, and the root mean square error (RMSE) between the two is computed, giving the prediction error curve shown in fig. 7, where the abscissa is the prediction number and the ordinate is the error of each prediction. Statistical analysis gives a minimum prediction error of 0.0029, a maximum of 0.0761, and an average of 0.0300.
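For reference, the RMSE over n paired values is sqrt((1/n) * sum((predicted_i - actual_i)^2)), with n = 4 in this test. A minimal Python sketch of that calculation follows; the function name rmse is a hypothetical helper, and the voltage values in the usage note are made-up numbers rather than test data.

```python
import math
from typing import Sequence


def rmse(predicted: Sequence[float], actual: Sequence[float]) -> float:
    """Root mean square error between equally long sequences of values."""
    if len(predicted) != len(actual) or len(predicted) == 0:
        raise ValueError("sequences must be non-empty and of equal length")
    squared = sum((p - a) ** 2 for p, a in zip(predicted, actual))
    return math.sqrt(squared / len(predicted))
```

For example, four predictions that are each off by 0.01 from the true values, such as rmse([3.30, 3.29, 3.28, 3.27], [3.31, 3.30, 3.29, 3.28]), give an RMSE of about 0.01.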
As shown in fig. 2, a dynamic prediction system for time-series data stream based on Flink and LSTM comprises:
the Flink read-write module 100 is configured to acquire time series data generated by a monitored system in real time, form an input data stream according to the time series data, convert the input data stream to obtain a marked data stream, and process the marked data stream to form a data set that is continuously accumulated over time;
the triggering and traction module 200 is used for judging whether a modeling action instruction is triggered according to the real-time state of the input data stream;
the LSTM modeling module 300 is used for carrying out model construction based on the data set to obtain a prediction model when the judgment result is yes;
and the dynamic prediction module 400 is configured to predict the data elements in the marked data stream through the prediction model to obtain a prediction result.
In some possible embodiments, an effective trigger mechanism for the first modeling and model updating can be formed through the processing of the input data stream in the application; furthermore, the tagged data stream may be such that the data stream carries information on whether a model exists or is updated, such that prediction can be performed automatically using the most up-to-date model and under the pull of the data stream. According to the method, key modeling and prediction tasks do not need to be manually executed, and the method is automatically completed under the traction of the data stream, so that a large amount of manpower is saved, and the efficiency is improved.
Preferably, in any of the above embodiments, converting the input data stream to obtain a marked data stream specifically includes:
and converting the input data stream through a Flink map operator to obtain a marked data stream.
Preferably, in any of the above embodiments, the determining whether to trigger the modeling action instruction according to the real-time state of the input data stream specifically includes:
judging whether the real-time state meets a preset condition, triggering a modeling action instruction when the real-time state meets the preset condition, and not triggering the modeling action instruction if the real-time state does not meet the preset condition; the preset condition is that the summarized data elements of the input data streams exceed a preset limit value.
Preferably, in any embodiment above, when the determination result is yes, performing model construction based on the data set, and obtaining the prediction model specifically includes:
and when the judgment result is yes, firstly triggering modeling based on the LSTM, simultaneously starting a periodic modeling cycle timer, entering a timing updating mechanism, triggering modeling based on the LSTM at fixed intervals based on the timing updating mechanism, and taking the LSTM model established in the fixed period as the latest prediction model.
Preferably, in any of the above embodiments, the timing update mechanism specifically includes:
registering and starting a timer; when the timer expires, triggering LSTM-based modeling, registering a new timer, and starting the new timer in place of the expired one.
Preferably, in any of the above embodiments, the triggering LSTM-based modeling specifically includes:
and calling the encapsulated modeling subprogram, and carrying out LSTM modeling on the data set based on the modeling subprogram.
Further, the predicting the marked data stream by the prediction model to obtain a prediction result specifically includes:
and judging the marker of each data element in the marked data stream in real time, and inputting the data element with the true judgment result into the prediction model to obtain a prediction result.
The reader should understand that in the description of this specification, reference to the description of the terms "one embodiment," "some embodiments," "an example," "a specific example," or "some examples," etc., means that a particular feature, structure, material, or characteristic described in connection with the embodiment or example is included in at least one embodiment or example of the invention. In this specification, the schematic representations of the terms used above are not necessarily intended to refer to the same embodiment or example. Furthermore, the particular features, structures, materials, or characteristics described may be combined in any suitable manner in any one or more embodiments or examples. Furthermore, various embodiments or examples and features of different embodiments or examples described in this specification can be combined and combined by one skilled in the art without contradiction.
In the several embodiments provided in the present application, it should be understood that the disclosed apparatus and method may be implemented in other ways. For example, the method embodiments described above are merely illustrative: the division into steps is only a division by logical function, and other divisions are possible in practice; for instance, multiple steps may be combined or integrated into another step, or some features may be omitted or not implemented.
The above method, if implemented in the form of software functional units and sold or used as a stand-alone product, may be stored in a computer-readable storage medium. Based on such understanding, the technical solution of the present invention, in essence, or the part that contributes to the prior art, or all or part of the technical solution, can be embodied in the form of a software product, which is stored in a storage medium and includes instructions for causing a computer device (which may be a personal computer, a server, etc.) to execute all or part of the steps of the method according to the embodiments of the present invention. The aforementioned storage medium includes various media capable of storing program code, such as a USB flash drive, a removable hard disk, a read-only memory (ROM), a random access memory (RAM), a magnetic disk, or an optical disk.
While the invention has been described with reference to specific embodiments, it will be understood by those skilled in the art that various changes in form and details may be made therein without departing from the spirit and scope of the invention as defined by the appended claims. Therefore, the protection scope of the present invention shall be subject to the protection scope of the claims.

Claims (10)

1. A dynamic prediction method for time-series data flow based on Flink and LSTM is characterized by comprising the following steps:
acquiring time sequence data generated by a monitored system in real time, forming an input data stream according to the time sequence data, converting the input data stream to obtain a marked data stream, and processing the marked data stream to form a data set which is continuously accumulated along with time;
judging whether a modeling action instruction is triggered or not according to the real-time state of the input data stream;
when the judgment result is yes, carrying out model construction based on the data set to obtain a prediction model;
and predicting the data elements in the marked data stream through the prediction model to obtain a prediction result.
2. The method of claim 1, wherein the converting the input data stream to obtain the marked data stream specifically comprises:
and converting the input data stream through a Flink map operator to obtain a marked data stream.
3. The method for dynamically predicting a time-series data stream based on Flink and LSTM according to claim 1, wherein the determining whether to trigger a modeling action command according to the real-time status of the input data stream specifically comprises:
judging whether the real-time state meets a preset condition, triggering a modeling action instruction when the real-time state meets the preset condition, and not triggering the modeling action instruction if the real-time state does not meet the preset condition; the preset condition is that data elements in the input data stream exceed a preset limit value.
4. The dynamic prediction method for time-series data flow based on Flink and LSTM according to claim 1, wherein when the determination result is yes, performing model construction based on the data set, and obtaining a prediction model specifically comprises:
and when the judgment result is yes, firstly triggering modeling based on the LSTM, simultaneously starting a periodic modeling cycle timer, entering a timing updating mechanism, triggering modeling based on the LSTM at fixed intervals based on the timing updating mechanism, and taking the LSTM model established in the fixed period as the latest prediction model.
5. The method for dynamically predicting a time-series data stream based on Flink and LSTM according to claim 4, wherein said timing update mechanism specifically comprises:
registering and starting a timer, triggering modeling based on the LSTM when the timer is up, registering a new timer, and starting the new timer as the timer.
6. The dynamic prediction method for time-series data flow based on Flink and LSTM according to claim 5, wherein the triggering of modeling based on LSTM specifically comprises:
and calling the encapsulated modeling subprogram, and carrying out LSTM modeling on the data set based on the modeling subprogram.
7. The method according to claim 1, wherein the predicting the marked data stream by the prediction model to obtain a prediction result specifically comprises:
and judging the marker of each data element in the marked data stream in real time, and inputting the data element with the true judgment result into the prediction model to obtain a prediction result.
8. A system for dynamic prediction of a time-series data stream based on Flink and LSTM, comprising:
the Flink read-write module is used for acquiring time sequence data generated by a monitored system in real time, forming an input data stream according to the time sequence data, converting the input data stream to obtain a marked data stream, and storing and processing the marked data stream to form a data set which is continuously accumulated along with time;
the triggering and traction module is used for judging whether a modeling action instruction is triggered or not according to the real-time state of the input data stream, and the marked data stream is used for traction dynamic prediction analysis;
the LSTM modeling module is used for carrying out model construction based on the data set when a trigger instruction is sent out to obtain an LSTM prediction model;
and the dynamic prediction module is used for predicting the data elements in the marked data stream through the prediction model to obtain a prediction result.
9. The system according to claim 8, wherein the converting the input data stream to obtain the marked data stream specifically comprises:
and converting the input data stream through a Flink map operator to obtain a marked data stream.
10. The dynamic prediction system of time-series data stream based on Flink and LSTM according to claim 8, wherein the determining whether to trigger a modeling action command according to the real-time status of the input data stream specifically comprises:
judging whether the real-time state meets a preset condition, triggering a modeling action instruction when the real-time state meets the preset condition, and not triggering the modeling action instruction if the real-time state does not meet the preset condition; the preset condition is that data elements in the input data stream exceed a preset limit value.
CN202111640787.4A 2021-12-29 2021-12-29 Data flow dynamic prediction method and system based on Flink and LSTM Active CN114169253B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202111640787.4A CN114169253B (en) 2021-12-29 2021-12-29 Data flow dynamic prediction method and system based on Flink and LSTM

Publications (2)

Publication Number Publication Date
CN114169253A true CN114169253A (en) 2022-03-11
CN114169253B CN114169253B (en) 2022-07-19

Family

ID=80488603

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202111640787.4A Active CN114169253B (en) 2021-12-29 2021-12-29 Data flow dynamic prediction method and system based on Flink and LSTM

Country Status (1)

Country Link
CN (1) CN114169253B (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116384592A (en) * 2023-06-01 2023-07-04 广东宏大欣电子科技有限公司 Health prediction method of energy power generation equipment based on real-time data stream processing

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111797122A (en) * 2020-05-28 2020-10-20 浙江大学 Method and device for predicting change trend of high-dimensional reappearance concept drift stream data
CN111970163A (en) * 2020-06-30 2020-11-20 网络通信与安全紫金山实验室 Network flow prediction method of LSTM model based on attention mechanism
WO2021223522A1 (en) * 2020-10-10 2021-11-11 上海星融汽车科技有限公司 Intelligent vehicle diagnosis method and system, and diagnosis device

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
施建明等: "空间站有效载荷预测性维护支持系统设计", 《载人航天》 *

Also Published As

Publication number Publication date
CN114169253B (en) 2022-07-19

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant