CN111639060A

CN111639060A - Thermal power plant time sequence data processing method, device, equipment and medium

Info

Publication number: CN111639060A
Application number: CN202010512753.6A
Authority: CN
Inventors: 袁雪峰; 马成龙; 李晓静; 张含智; 陈世和; 陈木斌; 陈建华; 卫平宝; 聂怀志; 姜利辉
Original assignee: China Resource Power Technology Research Institute
Current assignee: China Resource Power Technology Research Institute
Priority date: 2020-06-08
Filing date: 2020-06-08
Publication date: 2020-09-08

Abstract

The application discloses a method, a device, equipment and a computer readable storage medium for processing time sequence data of a thermal power plant. The method comprises the following steps: acquiring time sequence data collected by sensors in thermal power plant equipment; storing the time sequence data in a distributed file system in a distributed storage mode, and forming a data set in the distributed file system; reading the data set into the Spark computing framework, and calculating the data set using Spark SQL to obtain a calculated data set; and storing the calculated data set in a Hive database. Because the distributed file system, Spark SQL and the Hive database are all deployed in a distributed manner, the processing of the time sequence data can be spread across different servers, which improves the capacity for processing the time sequence data of the thermal power plant.

Description

Thermal power plant time sequence data processing method, device, equipment and medium

Technical Field

The application relates to the technical field of thermal power historical data processing, in particular to a thermal power plant time sequence data processing method, device and equipment and a computer readable storage medium.

Background

A thermal power plant comprises many pieces of equipment fitted with sensors that collect operation data of the corresponding equipment at regular intervals (for example, every 0.5 s). Most of this operation data is time sequence data. By analyzing and processing past (historical) time sequence data, it is possible to determine which operation parameters give the thermal power plant better operation performance, so that subsequent operation of the plant can be guided and the plant can run in a better state.

At present, time sequence data collected by the sensors is mostly stored on a single server, which also performs the calculation and stores the results. However, because a thermal power plant has many pieces of equipment and sensors and is generally in continuous operation, the volume of time sequence data keeps growing (measured in GB and TB). A single server can hardly support such storage and calculation loads; that is, the current single-server approach to processing thermal power plant time sequence data has limited processing capacity.

In summary, how to improve the processing capability of the time series data of the thermal power plant is a technical problem to be solved urgently by those skilled in the art.

Disclosure of Invention

In view of the above, an object of the present application is to provide a method, an apparatus, a device and a computer readable storage medium for processing thermal power plant time series data, which are used to improve the processing capability of the thermal power plant time series data.

In order to achieve the above purpose, the present application provides the following technical solutions:

a thermal power plant time series data processing method comprises the following steps:

acquiring time sequence data acquired by a sensor in thermal power plant equipment;

storing the time series data in a distributed file system in a distributed storage mode, and forming a data set in the distributed file system;

reading the data set into the Spark computing framework, and calculating the data set using Spark SQL to obtain a calculated data set;

and storing the calculated data set into a Hive database.

Preferably, after acquiring the time series data collected by the sensor in the thermal power plant equipment, the method further includes:

storing the time series data in a time series database;

accordingly, storing the time series data in a distributed storage manner in a distributed file system, and forming a data set in the distributed file system, includes:

reading the time sequence data from the time sequence database at intervals of preset time, storing the read time sequence data in the distributed file system in a distributed storage mode, forming a data set in the distributed file system by the time sequence data read for the first time, and adding the time sequence data read except for the first time to the corresponding data set.

Preferably, after reading the data set into the Spark computing framework, the method further includes:

storing the data set in the Spark computing framework in DataFrame format.

Preferably, before calculating the data set using Spark SQL, the method further includes:

preprocessing the data set using Spark SQL.

Preferably, preprocessing the data set by using spark sql comprises:

performing data elimination on the operation data corresponding to each type of operation parameter in the data set by using a 3 sigma criterion so as to eliminate outlier operation data;

comparing the operation data corresponding to each type of operation parameters with the corresponding set maximum value and set minimum value, and rejecting the operation data larger than the set maximum value and the operation data smaller than the set minimum value;

for each type of the operation parameters, eliminating operation data which are kept unchanged within a first set time length;

and for each type of the operation parameters, removing unstable operation data within a second set time length to obtain a preprocessed data set.

Preferably, the calculating the data set by using spark sql to obtain a calculated data set includes:

for each type of the operation parameters in the preprocessed data set, dividing parameter intervals according to the maximum value and the minimum value of corresponding operation data by utilizing spark sql;

aggregating the parameter intervals corresponding to the different types of the operation parameters by utilizing spark sql to obtain a plurality of working conditions;

and calculating the optimal operation data combination corresponding to each working condition by utilizing spark sql and a multi-objective fuzzy optimization algorithm to obtain the calculated data set.

Preferably, the storing the time series data in a distributed storage manner in a distributed file system includes:

and storing the time series data in the HDFS in a distributed storage mode.

A thermal power plant time series data processing apparatus comprising:

the acquisition module is used for acquiring time sequence data acquired by a sensor in thermal power plant equipment;

the first storage module is used for storing the time sequence data in a distributed file system in a distributed storage mode and forming a data set in the distributed file system;

the calculation module is used for reading the data set into the Spark computing framework and calculating the data set using Spark SQL to obtain a calculated data set;

and the second storage module is used for storing the calculated data set into the Hive database.

A thermal power plant time series data processing device comprising:

a memory for storing a computer program;

a processor for implementing the steps of the thermal power plant time series data processing method as claimed in any one of the above when executing the computer program.

A computer-readable storage medium, in which a computer program is stored, which computer program, when being executed by a processor, realizes the steps of the thermal power plant time series data processing method according to any one of the above.

The application provides a method, a device, equipment and a computer readable storage medium for processing time sequence data of a thermal power plant, wherein the method comprises the following steps: acquiring time sequence data collected by sensors in thermal power plant equipment; storing the time sequence data in a distributed file system in a distributed storage mode, and forming a data set in the distributed file system; reading the data set into the Spark computing framework, and calculating the data set using Spark SQL to obtain a calculated data set; and storing the calculated data set in a Hive database.

In the technical scheme disclosed herein, time sequence data collected by sensors in thermal power plant equipment is stored in a distributed file system in a distributed storage mode, the resulting data set is read into the Spark computing framework and calculated using Spark SQL, and the calculated data set is stored in a Hive database. The distributed file system, Spark SQL and the Hive database are all deployed in a distributed manner, that is, the time sequence data can be stored and calculated on multiple servers as a cluster. Processing of the time sequence data can therefore be spread across different servers, which improves the capacity for processing thermal power plant time sequence data.

Drawings

In order to more clearly illustrate the embodiments of the present application or the technical solutions in the prior art, the drawings needed to be used in the description of the embodiments or the prior art will be briefly introduced below, it is obvious that the drawings in the following description are only embodiments of the present application, and for those skilled in the art, other drawings can be obtained according to the provided drawings without creative efforts.

Fig. 1 is a flowchart of a method for processing time series data of a thermal power plant according to an embodiment of the present disclosure;

Fig. 2 is a schematic structural diagram of a thermal power plant time series data processing apparatus according to an embodiment of the present disclosure;

Fig. 3 is a schematic structural diagram of a thermal power plant time series data processing device according to an embodiment of the present application.

Detailed Description

The technical solutions in the embodiments of the present application will be clearly and completely described below with reference to the drawings in the embodiments of the present application, and it is obvious that the described embodiments are only a part of the embodiments of the present application, and not all of the embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present application.

Referring to Fig. 1, which shows a flowchart of a thermal power plant time sequence data processing method according to an embodiment of the present application, the method may include:

S11: acquiring time sequence data collected by sensors in thermal power plant equipment.

Time sequence data collected by different sensors in the thermal power plant equipment is acquired. The sensors may be installed on equipment in every stage of the plant's production and operation, and each item of time sequence data may include information about the equipment, information about the sensor, the operation data of the collected operation parameter, the collection time, and the like.

S12: the time sequence data is stored in a distributed file system in a distributed storage mode, and a data set is formed in the distributed file system.

After the time sequence data is acquired, it can be stored in a distributed file system in a distributed storage mode. The distributed file system is deployed in a distributed manner, that is, the time sequence data can be stored on multiple servers as a cluster. This deployment effectively improves scalability and can accommodate the storage of large volumes of time sequence data.

In addition, the time series data stored in the distributed file system can be formed into a data set so as to be convenient for subsequent calculation processing.

The time sequence data may be stored in the distributed file system in text formats such as CSV, JSON and TXT or in binary formats such as Avro and Parquet, and may also be stored in other formats.

S13: reading the data set into the Spark computing framework, and calculating the data set using Spark SQL to obtain a calculated data set.

After the time sequence data stored in the distributed file system has been formed into a data set, the data set can be read into the Spark computing framework and then calculated using Spark SQL to obtain a calculated data set, so that the subsequent operation of the thermal power plant can be guided according to the calculated data set.

The Spark computing framework is a fast, general-purpose computing engine designed for large-scale data processing. It is an open-source cluster computing framework originally developed at the UC Berkeley AMPLab. Spark is similar to Hadoop but differs from it in many respects; its main optimization is that intermediate results of computing tasks can be kept in memory rather than written to the distributed file system each time, which yields better performance. Spark SQL is a Spark module used mainly for processing structured data.

S14: storing the calculated data set in a Hive database.

After the data set has been calculated using Spark SQL, the calculated data set can be stored in the Hive database so that it can be queried later. The operation of the thermal power plant can then be guided according to the query results, improving the plant's operation performance.

Hive is an ETL and data warehouse tool built on top of the Hadoop Distributed File System (HDFS). It can map structured data files to database tables and provides an SQL-like query capability, which makes operations easy to perform. It offers data encapsulation, ad hoc querying and analysis of very large data sets. When a Hive database is used in thermal power plant time sequence data processing to store the calculated data set, large volumes of data can be stored and subsequent queries for target data are made easier.

The method for processing the time series data of the thermal power plant provided by the embodiment of the application can further comprise the following steps after the time series data acquired by the sensor in the thermal power plant equipment is acquired:

storing the time series data in a time series database;

accordingly, storing the time series data in a distributed storage manner in a distributed file system, and forming a data set in the distributed file system may include:

reading time sequence data from a time sequence database at intervals of preset time, storing the read time sequence data in a distributed file system in a distributed storage mode, forming a data set from the time sequence data read for the first time in the distributed file system, and adding the time sequence data read except for the first time to the corresponding data set.

In the thermal power plant time sequence data processing provided by the application, the sensors in the thermal power plant equipment collect data at short intervals and high frequency. Writing the collected time sequence data directly to the distributed file system would degrade the file system's performance and place heavy demands on the transmission network. To avoid this, the time sequence data can first be stored in a time sequence database after it is acquired. The time sequence data is then read from the time sequence database at a preset interval (longer than the sensors' collection period) and stored in the distributed file system in a distributed storage mode. This reduces how often time sequence data is written to the distributed file system, which improves the file system's storage performance and lowers the demand on the transmission network.

In the process of reading the time series data from the time series database to the distributed file system at preset time intervals, the time series data read for the first time into the distributed file system can be formed into a data set in the distributed file system, and the time series data read later can be added into the corresponding data set, namely the data set formed for the first time, so that the data set can be directly calculated and processed subsequently.

After the time sequence data read for the first time has formed the data set in the distributed file system, and before time sequence data from a later read is added to that data set, it can be judged whether the read succeeded. If the read succeeded, the time sequence data is added to the corresponding data set; if not, the time sequence data is read from the time sequence database again. This makes it easier to ensure that the time sequence data in the time sequence database is stored in the distributed file system.
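The periodic read-and-append step with a success check can be sketched as follows. This is a minimal illustration in plain Python rather than the embodiment's actual implementation: `sync_once`, `read_batch` and `max_retries` are hypothetical names, an in-memory list stands in for the data set in the distributed file system, and the cap on retries is an assumption (the description only says the data is read again on failure).

```python
from typing import Callable, List, Optional

Record = dict  # one time sequence record, e.g. {"sensor": ..., "time": ..., "value": ...}


def sync_once(read_batch: Callable[[], Optional[List[Record]]],
              dataset: List[Record],
              max_retries: int = 3) -> bool:
    """Read one batch from the time sequence database and append it to the data set.

    read_batch (hypothetical) returns a list of records on success and None on failure.
    A failed read is retried, up to max_retries attempts in total (assumed cap).
    """
    for _ in range(max_retries):
        batch = read_batch()
        if batch is not None:        # the read succeeded
            dataset.extend(batch)    # the first batch forms the data set, later ones are appended
            return True
    return False                     # every attempt failed; the caller may try again later


# Usage: a fake reader that fails once and then succeeds.
attempts = {"n": 0}


def flaky_reader() -> Optional[List[Record]]:
    attempts["n"] += 1
    if attempts["n"] == 1:
        return None
    return [{"sensor": "load", "time": 0.0, "value": 301.5}]


dataset: List[Record] = []
ok = sync_once(flaky_reader, dataset)
```

In a deployment, `sync_once` would be called once per preset interval by a scheduler, and the append would target a file or directory in the distributed file system instead of a list.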

After the data set is read into the Spark computing framework, the method for processing the thermal power plant time sequence data provided by the embodiment of the application may further include:

storing the data set in the Spark computing framework in DataFrame form.

After the data set is read into the Spark computing framework, it may be stored there in DataFrame form for subsequent Spark SQL operations. In this form, each column holds the operation data of one operation parameter, each row corresponds to one acquisition time, and the acquisition times increase from one row to the next.

It should be noted that a DataFrame is a distributed data set based on RDDs, similar to a two-dimensional table in a conventional database, and can be constructed from various sources, such as structured data files, tables in Hive, external databases, or existing RDDs. Of course, the data set may also be stored in the Spark computing framework in the form of an RDD.
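The row and column layout described above can be illustrated without Spark. The sketch below pivots individual sensor records into a table with one row per acquisition time and one column per operation parameter; the record fields (`time`, `parameter`, `value`) and the function name `to_wide_table` are assumptions made for illustration and are not taken from the application.

```python
from collections import defaultdict


def to_wide_table(records):
    """Pivot (time, parameter, value) records into rows ordered by acquisition time."""
    by_time = defaultdict(dict)
    for rec in records:
        by_time[rec["time"]][rec["parameter"]] = rec["value"]
    columns = sorted({rec["parameter"] for rec in records})
    # One row per acquisition time, in increasing time order; missing readings become None.
    rows = [[t] + [by_time[t].get(c) for c in columns] for t in sorted(by_time)]
    return ["time"] + columns, rows


records = [
    {"time": 1.0, "parameter": "load", "value": 305.0},
    {"time": 0.5, "parameter": "load", "value": 304.0},
    {"time": 0.5, "parameter": "main_steam_temp", "value": 538.0},
    {"time": 1.0, "parameter": "main_steam_temp", "value": 539.0},
]
header, rows = to_wide_table(records)
```

In Spark itself the same shape would typically be held in a DataFrame, so that each later preprocessing step can operate on one column at a time.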

Before the data set is calculated using Spark SQL, the method for processing the time sequence data of the thermal power plant according to the embodiment of the present application may include:

preprocessing the data set using Spark SQL.

Before the data set is calculated using Spark SQL, it can be preprocessed using Spark SQL to remove abnormal operation data. This improves the accuracy of the calculation and allows the operation of the thermal power plant to be guided better.

The thermal power plant time sequence data processing method provided by the embodiment of the application utilizes spark sql to preprocess a data set, and may include:

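performing data elimination on the operation data corresponding to each type of operation parameter in the data set by using the 3σ criterion so as to eliminate outlier operation data;

comparing the operation data corresponding to each type of operation parameter with the corresponding set maximum value and set minimum value, and rejecting the operation data larger than the set maximum value and the operation data smaller than the set minimum value;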

for each type of operation parameter, eliminating operation data which is kept unchanged within a first set time length;

and for each type of operation parameters, removing unstable operation data within a second set time length to obtain a preprocessed data set.

The process of preprocessing the data set by using spark sql may specifically include:

1) data culling using 3 sigma criterion

During collection of time sequence data from thermal power plant equipment, abnormal time sequence data may be produced by sensor instability, fluctuation, external interference and the like. To prevent such abnormal data from affecting subsequent calculation, the operation data corresponding to each type of operation parameter in the data set can be culled using the 3σ criterion so as to eliminate outlier operation data, that is, operation data outside the 3σ range, where σ is the standard deviation of the operation data for that type of operation parameter.

Specifically, whether to cull a data point can be determined by checking whether $|V_n(t) - \mathrm{AVG}(V_n(t_x, t_z))|$ is greater than $3 \times \mathrm{STD}(V_n(t_x, t_z))$. If $|V_n(t) - \mathrm{AVG}(V_n(t_x, t_z))| > 3 \times \mathrm{STD}(V_n(t_x, t_z))$, the data point is culled, where $V_n(t)$ is the current operation data of the operation parameter, $\mathrm{AVG}(V_n(t_x, t_z))$ is the average of the operation data of that operation parameter in the current period, $[t_x, t_z]$ is the period range, and $\mathrm{STD}(V_n(t_x, t_z))$ is the standard deviation of the operation data of that type of operation parameter in the current period.
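The 3σ test above can be sketched for a single operation parameter and a single period as follows. This is a plain-Python illustration, not the Spark SQL statement of the embodiment; the function name `cull_3sigma` is hypothetical, and the use of the population standard deviation is an assumption, since the description does not say which estimator STD denotes.

```python
from statistics import mean, pstdev


def cull_3sigma(values):
    """Keep only values v with |v - AVG| <= 3 * STD over the given period."""
    avg = mean(values)
    std = pstdev(values)  # population standard deviation (assumed estimator)
    return [v for v in values if abs(v - avg) <= 3 * std]


# One period of readings: nineteen normal samples and one spike.
period = [500.0] * 19 + [900.0]
kept = cull_3sigma(period)
```

Here the average is 520 and the standard deviation is about 87.2, so the spike at 900 deviates by 380, which exceeds 3σ (about 261.5), and is culled, while the nineteen normal samples are kept.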

2) Culling out-of-range operation data

Specifically, the operation data corresponding to each type of operation parameter is compared with the set maximum value and the set minimum value corresponding to each type of operation parameter, the operation data larger than the set maximum value are removed from the operation data, and the operation data smaller than the set minimum value are removed, so that the operation data corresponding to each type of operation parameter can be located in the corresponding limited range formed by the set minimum value and the set maximum value.

The set maximum value and the set minimum value corresponding to each type of operation parameter can be set by working personnel according to the operation performance or experience of the thermal power plant equipment.
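The range check can be sketched as a per-parameter filter. The limits in `LIMITS` below are made-up illustrative values (in practice they would be set by staff as described above), and the function name `cull_out_of_range` is hypothetical.

```python
# Hypothetical set minimum and set maximum for each operation parameter.
LIMITS = {
    "load": (300.0, 600.0),
    "main_steam_temp": (200.0, 700.0),
}


def cull_out_of_range(parameter, values, limits=LIMITS):
    """Drop values above the set maximum or below the set minimum for this parameter."""
    set_min, set_max = limits[parameter]
    return [v for v in values if set_min <= v <= set_max]


load = [295.0, 300.0, 450.0, 600.0, 612.5]
kept_load = cull_out_of_range("load", load)
```

Values equal to a limit are kept, because the description only rejects data larger than the set maximum or smaller than the set minimum.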

3) Rejecting operating data that remains unchanged for a first set length of time

Considering that the collected operation data may be kept unchanged due to poor contact of the sensor and the like in the sensor collection process, in order to avoid the influence of the operation data on the calculation result caused by participation in data calculation, the operation data which is kept unchanged in the first set time length can be eliminated.

Specifically, a sliding approach can be used. First, the operation data within a first set time length is obtained, together with its maximum value maxV(t) and minimum value minV(t), and it is judged whether maxV(t) equals minV(t). If so, the first operation data within the first set time length is rejected; if not, it is retained. In either case, a new first set time length is then formed starting from the second operation data within the current first set time length and extending to later operation data, and the steps of obtaining maxV(t) and minV(t) and making the comparison are repeated until the operation data at all times has been polled.

In addition to the sliding approach, the following approach may be used: the operation data is taken one first set time length at a time, the maximum value maxV(t) and minimum value minV(t) within each first set time length are obtained, and it is judged whether maxV(t) equals minV(t). If so, the operation data is determined to have remained unchanged within that first set time length and is rejected; if not, it is not rejected.

It should be noted that, in the above process, in addition to the criterion of whether the maximum value maxV(t) equals the minimum value minV(t), it may instead be judged whether the maximum value maxV(t) and/or the minimum value minV(t) equals the average value.
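Both variants of the unchanged-data check can be sketched as follows. The sketch assumes uniformly spaced samples, so the first set time length is expressed as a number of samples (`window`); it also assumes that samples which no longer fill a complete window at the end of the data are kept, a case the description does not address. The function names are hypothetical.

```python
def cull_constant_sliding(values, window):
    """Sliding variant: reject the first sample of every window whose max equals its min."""
    kept = []
    for i, v in enumerate(values):
        win = values[i:i + window]
        if len(win) == window and max(win) == min(win):
            continue  # the window starting here is flat, so its first sample is rejected
        kept.append(v)
    return kept


def cull_constant_blocks(values, window):
    """Block variant: split the data into consecutive windows and reject flat windows whole."""
    kept = []
    for start in range(0, len(values), window):
        block = values[start:start + window]
        if len(block) == window and max(block) == min(block):
            continue  # the whole block stayed unchanged and is rejected
        kept.extend(block)
    return kept


signal = [1.0, 2.0, 5.0, 5.0, 5.0, 5.0, 3.0, 4.0]
sliding = cull_constant_sliding(signal, window=3)
blocks = cull_constant_blocks(signal, window=3)
```

The two variants behave differently on the same data: the sliding variant removes only the leading samples of a flat run (the last `window - 1` samples of the run start windows that are no longer flat), whereas the block variant removes a flat run only where it happens to fill a whole block.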

4) Rejecting operating data that is unstable for a second set length of time

Considering that only the operating data of the thermal power plant equipment in stable operation can reflect the actual condition of the thermal power plant equipment, therefore, in order to avoid the influence of unstable data on subsequent calculation, the unstable operating data in the second set time length can be removed for each type of operating parameters.

When rejecting unstable operation data within the second set time length, the judgment and rejection can be performed in a sliding manner. Specifically, for each type of operation parameter, the first second set time length is taken as the current second set time length, the maximum value maxV'(t) and minimum value minV'(t) of the operation data within the current second set time length are obtained, the average value avg[V'(t)] of the operation data within the current second set time length is calculated, and a stable calculated value is computed from these quantities. The stable calculated value is compared with a stability threshold λ. If the stable calculated value is smaller than λ, the first operation data within the current second set time length is rejected; if it is not smaller than λ, the first operation data is not rejected. In either case, a new second set time length is then formed from the second operation data within the current second set time length and later operation data, it is taken as the current second set time length, and the step of obtaining maxV'(t) and minV'(t) is executed again, until the operation data at all times has been polled. The stability threshold λ may be 0.05, and its magnitude may of course be adjusted according to experience or requirements.

After the above four steps are performed, the preprocessing of the data set can be completed to obtain a preprocessed data set.

It should be noted that the sequence of the four pretreatment steps can be arbitrarily adjusted, and the sequence of the pretreatment steps is not limited in this application.

According to the thermal power plant time sequence data processing method provided by the embodiment of the application, the spark sql is used for calculating the data set to obtain the calculated data set, and the method can comprise the following steps:

for each type of operation parameter in the preprocessed data set, dividing parameter intervals according to the maximum value and the minimum value of the corresponding operation data by utilizing spark sql;

aggregating parameter intervals corresponding to different types of operation parameters by utilizing spark sql to obtain a plurality of working conditions;

and calculating the optimal operation data combination corresponding to each working condition by utilizing spark sql and a multi-objective fuzzy optimization algorithm to obtain a calculated data set.

After the data set has been preprocessed, for each type of operation parameter in the preprocessed data set, parameter intervals can be divided using Spark SQL according to the maximum value and minimum value of the corresponding operation data. The intervals may be divided evenly by interval length or in other ways as required. After each type of operation parameter has been divided, the parameter intervals corresponding to different types of operation parameters can be aggregated using Spark SQL; each aggregation result represents one working condition, so a plurality of working conditions is obtained. For example, taking load as an operation parameter, 30 load intervals can be divided evenly according to the maximum load value 600 and minimum load value 300: [300,310], [310,320], …, [590,600]. For the main steam temperature, 50 intervals can be divided evenly according to its maximum value 700 and minimum value 200: [200,210], [210,220], …, [690,700]. Assuming only these two types of operation parameters, the load interval [300,310] and the main steam temperature interval [200,210] are aggregated to obtain one working condition, the load interval [300,310] and the main steam temperature interval [210,220] are aggregated to obtain another, and so on, giving a plurality of working conditions. Dividing the data set into working conditions using Spark SQL addresses the problems of large calculation amount and long calculation time.
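The load and main steam temperature example above can be reproduced with a short sketch. It assumes even division and treats every pairing of a load interval with a temperature interval as one working condition; `split_intervals` and `bin_index` are hypothetical helper names, and assigning a boundary value such as 310 to the upper interval is an assumption, because the intervals in the example are written as closed on both ends.

```python
from itertools import product


def split_intervals(minimum, maximum, count):
    """Divide [minimum, maximum] evenly into `count` intervals of equal length."""
    width = (maximum - minimum) / count
    return [(minimum + k * width, minimum + (k + 1) * width) for k in range(count)]


def bin_index(value, minimum, maximum, count):
    """Index of the interval containing value; the maximum falls in the last interval."""
    width = (maximum - minimum) / count
    return min(int((value - minimum) / width), count - 1)


load_bins = split_intervals(300, 600, 30)   # [300,310], ..., [590,600]
temp_bins = split_intervals(200, 700, 50)   # [200,210], ..., [690,700]

# Each pairing of a load interval with a temperature interval is one working condition.
conditions = list(product(load_bins, temp_bins))
```

With 30 load intervals and 50 temperature intervals this enumeration yields 1,500 candidate working conditions; a sample with load 305 and main steam temperature 215 falls in load interval 0 and temperature interval 1.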

After a plurality of working conditions is obtained, Spark SQL and a multi-objective fuzzy optimization algorithm can be used to calculate the optimal operation data combination corresponding to each working condition, and each working condition together with its optimal operation data combination is taken as the calculated data set. The optimal operation data combination corresponding to the current working condition of the thermal power plant can then be looked up in the calculated data set, and the thermal power plant equipment can be adjusted accordingly to improve the operation performance of the thermal power plant.

When the optimal operation data combination for each working condition is calculated using the multi-objective fuzzy optimization algorithm, four performance aspects can be considered: stability, economy, environmental protection and comprehensiveness. Stability considers the operation parameters that strongly affect the stability of the thermal power plant equipment, such as actual load, main steam temperature and main steam pressure; economy considers the coal consumption rate for power generation of the thermal power plant equipment; environmental protection considers NOₓ emissions, SO₂ emissions and dust emissions; and comprehensiveness is a weighting of the related operation parameters. The corresponding multi-objective fuzzy optimization algorithm specifically comprises the following steps:

1) First, for each of the performances mentioned above (namely stability, economy, environmental protection and comprehensiveness), the optimal solution under the constraint conditions of the corresponding operation parameters, together with the optimal value and the worst value, is solved separately [symbols missing in the source text], where $m$ is the number of the operation parameters;

2) Each single-objective optimum is fuzzified over the interval [interval missing in the source text], i.e., it is represented by a fuzzy subset $M(f_j)$ in the target value space. The membership function $M(f_j)$ should satisfy [conditions missing in the source text] and decrease monotonically over that interval. $M(f_j)$ is:

[formula missing in the source text]

$M(f_j)$ is mapped to the design space $x$ to obtain the fuzzy optimal solution $N(f_j)$; according to the extension principle, its membership function is:

[formula missing in the source text]

In the formula, the value of $q$ is less than 1 so as to improve the precision of the optimization. The membership function $N(f_j)$ constructed in this way is a monotonic function, so the synthesized function can be optimized monotonically.

3) The intersection $D$ (the fuzzy superior set) of $N(f_j)$ ($j = 1, 2, \dots, m$) has the membership function:

[formula missing in the source text]

A satisfactory solution $x^*$ of the multi-objective optimization problem satisfies:

[formula missing in the source text]

where $\lambda^*$ is the degree of satisfaction with the optimization result, a larger value representing a higher relative satisfaction. Finally, the combination of operation data corresponding to the maximum $\lambda^*$ can be taken as the optimal operation data combination.
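The overall shape of the three steps (a membership degree per objective, an intersection across objectives, and selection of the most satisfactory combination) can be sketched as follows. This is an illustration under stated assumptions, not the formulas of the present application: the membership function is assumed to be linear between the best and worst value of each objective, every objective is assumed to be minimized, the exponent q is left out, the intersection is taken as the minimum membership, and the candidate values are hypothetical.

```python
def linear_membership(value, best, worst):
    """Assumed membership: 1 at the best value, 0 at the worst, linear between."""
    if best == worst:
        return 1.0
    degree = (worst - value) / (worst - best)
    return max(0.0, min(1.0, degree))


def most_satisfactory(candidates):
    """Return (index, satisfaction) of the candidate with the largest
    minimum membership across all objectives.

    `candidates` is a list of objective-value tuples; every objective
    is treated as one to be minimised (an assumption of this sketch).
    """
    columns = list(zip(*candidates))
    best = [min(column) for column in columns]    # per-objective optimal value
    worst = [max(column) for column in columns]   # per-objective worst value
    satisfaction = [
        # intersection of the fuzzy sets, taken as the minimum membership
        min(linear_membership(v, b, w) for v, b, w in zip(row, best, worst))
        for row in candidates
    ]
    index = max(range(len(candidates)), key=satisfaction.__getitem__)
    return index, satisfaction[index]


# Hypothetical objective values per candidate operation data combination:
# (power generation coal consumption rate, NOx emission)
candidates = [(300.0, 50.0), (310.0, 30.0), (305.0, 35.0)]
index, degree = most_satisfactory(candidates)
```

In this hypothetical data the first two candidates are each best on one objective and worst on the other, so their satisfaction is 0, while the third candidate is a compromise and is the one selected.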

In the thermal power plant time series data processing method provided by the embodiment of the present application, storing the time series data in a distributed file system in a distributed storage manner may include:

storing the time series data in the HDFS in a distributed storage manner.

Specifically, the time series data can be stored in the HDFS in a distributed storage manner. The HDFS (Hadoop Distributed File System) simplifies the consistency model of files, provides high-throughput data access for applications through streaming data access, and is suitable for applications with large data sets. It provides a write-once, read-many mechanism and distributes data in the form of blocks across different physical machines in a cluster. Therefore, using the HDFS to store the time series data of the thermal power plant equipment facilitates improving data storage performance and effectively improves scalability.

The HDFS is the mainstream storage framework of Hadoop, spark is a mainstream calculation engine in the Hadoop ecosystem, and Hive is a data warehouse based on the HDFS. The three therefore have good technical compatibility, mature interfaces and convenient data calling, which is a great advantage for processing the strongly structured time series data of a thermal power plant. Of course, the time series data can also be stored in Hive in a distributed storage manner.

An embodiment of the present application further provides a thermal power plant time series data processing apparatus. Referring to fig. 2, which shows a schematic structural diagram of the thermal power plant time series data processing apparatus provided in the embodiment of the present application, the apparatus may include:

the acquisition module 21 is configured to acquire time series data acquired by a sensor in thermal power plant equipment;

the first storage module 22 is used for storing the time sequence data in a distributed file system in a distributed storage mode and forming a data set in the distributed file system;

the calculating module 23 is configured to read the data set into a spark calculation frame, and calculate the data set by using spark sql to obtain a calculated data set;

and a second storage module 24, configured to store the calculated data set in the Hive database.

The time series data processing device of the thermal power plant provided by the embodiment of the application can also comprise:

the third storage module is used for storing the time sequence data in the time sequence database after the time sequence data acquired by the sensor in the thermal power plant equipment is acquired;

accordingly, the first storage module 22 may include:

the first storage unit is used for reading time series data from the time series database at preset time intervals, storing the read time series data in the distributed file system in a distributed storage manner, forming a data set in the distributed file system from the time series data read for the first time, and adding the time series data read on subsequent occasions (i.e., not for the first time) to the corresponding data set.

and the fourth storage module is used for storing the data set in the spark calculation frame in a DataFrame form after the data set is read into the spark calculation frame.

and the preprocessing module is used for preprocessing the data set by utilizing spark sql before the data set is calculated by utilizing spark sql.
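The periodic read-and-append behaviour of the first storage unit described above can be sketched in a few lines. The class and function names are hypothetical, both stores are plain in-memory stand-ins for the time series database and the distributed file system, and tracking the last timestamp read is an assumption about how new data is distinguished from data already stored.

```python
class DataSetStore:
    """In-memory stand-in for data sets kept in a distributed file system."""

    def __init__(self):
        self.data_sets = {}

    def store(self, name, rows):
        if name not in self.data_sets:
            # First read: the rows form a new data set.
            self.data_sets[name] = list(rows)
        else:
            # Later reads: the rows are added to the existing data set.
            self.data_sets[name].extend(rows)


def read_since(time_series_db, last_timestamp):
    """Return the (timestamp, value) rows newer than `last_timestamp`."""
    return [row for row in time_series_db if row[0] > last_timestamp]


# Hypothetical load samples taken every 0.5 s.
time_series_db = [(0.0, 301.2), (0.5, 301.4)]
store = DataSetStore()
last = float("-inf")

# First periodic read: forms the data set.
batch = read_since(time_series_db, last)
store.store("load", batch)
last = batch[-1][0]

# New samples arrive; the second periodic read only appends them.
time_series_db += [(1.0, 301.9), (1.5, 302.3)]
batch = read_since(time_series_db, last)
store.store("load", batch)
```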

In the thermal power plant time series data processing apparatus provided by the embodiment of the present application, the preprocessing module may include:

the first rejection unit is used for screening the operation data corresponding to each type of operation parameter in the data set by using the 3σ criterion, so as to reject outlier operation data;

the second rejection unit is used for comparing the operation data corresponding to each type of operation parameter with the corresponding set maximum value and set minimum value, and rejecting the operation data larger than the set maximum value and the operation data smaller than the set minimum value;

the third rejection unit is used for rejecting, for each type of operation parameter, the operation data that remain unchanged within a first set duration;

and the fourth rejection unit is used for rejecting, for each type of operation parameter, the operation data that are unstable within a second set duration, so as to obtain the preprocessed data set.
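The first three rejection rules listed above can be sketched in plain Python as follows. The function names and sample values are hypothetical, the 3σ rule uses the population standard deviation of the whole series, and the "first set duration" is expressed as a number of consecutive samples; these are assumptions made for illustration. The fourth rule is left out because it depends on a stability criterion that is not restated in this passage.

```python
from statistics import mean, pstdev


def reject_outliers_3sigma(values):
    """Keep only values within three standard deviations of the mean."""
    mu, sigma = mean(values), pstdev(values)
    return [v for v in values if abs(v - mu) <= 3 * sigma]


def reject_out_of_range(values, set_min, set_max):
    """Drop values above the set maximum or below the set minimum."""
    return [v for v in values if set_min <= v <= set_max]


def reject_constant_runs(values, run_length):
    """Drop every run of `run_length` or more identical consecutive samples."""
    kept, start = [], 0
    for end in range(1, len(values) + 1):
        # A run ends at the end of the series or when the value changes.
        if end == len(values) or values[end] != values[start]:
            if end - start < run_length:
                kept.extend(values[start:end])
            start = end
    return kept


# Hypothetical load samples: twenty normal readings and one spike.
load = [500.0] * 20 + [900.0]
without_spike = reject_outliers_3sigma(load)
in_range = reject_out_of_range([250, 300, 450, 600, 650], 300, 600)
changing = reject_constant_runs([1, 2, 2, 2, 3, 4, 4], 3)
```

Note that the 3σ rule only has an effect on reasonably long series: with the population standard deviation, a single sample cannot lie more than 3σ from the mean unless the series contains more than ten samples.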

According to the time series data processing device of the thermal power plant, provided by the embodiment of the application, the calculation module 23 may include:

the partitioning unit is used for partitioning a parameter interval according to the maximum value and the minimum value of the corresponding operation data by utilizing spark sql for each type of operation parameters in the preprocessed data set;

the aggregation unit is used for aggregating the parameter intervals corresponding to the different types of operation parameters by utilizing spark sql to obtain a plurality of working conditions;

and the calculating unit is used for calculating the optimal operation data combination corresponding to each working condition by utilizing spark sql and a multi-target fuzzy optimization algorithm so as to obtain a calculated data set.

According to an embodiment of the present application, in the time series data processing apparatus for a thermal power plant, the first storage module 22 may include:

and the second storage unit is used for storing the time sequence data in the HDFS in a distributed storage mode.

An embodiment of the present application further provides a thermal power plant time series data processing device. Referring to fig. 3, which shows a schematic structural diagram of the thermal power plant time series data processing device provided in the embodiment of the present application, the device may include:

a memory 31 for storing a computer program;

a processor 32, configured to implement the following steps when executing the computer program stored in the memory 31:

acquiring time sequence data acquired by a sensor in thermal power plant equipment; storing the time sequence data in a distributed file system in a distributed storage mode, and forming a data set in the distributed file system; reading the data set into a spark calculation frame, and calculating the data set by utilizing spark sql to obtain a calculated data set; and storing the calculated data set into a Hive database.

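An embodiment of the present application further provides a computer-readable storage medium, in which a computer program is stored, and when the computer program is executed by a processor, the following steps may be implemented:

acquiring time series data collected by a sensor in thermal power plant equipment; storing the time series data in a distributed file system in a distributed storage manner, and forming a data set in the distributed file system; reading the data set into a spark calculation framework, and calculating the data set by using spark sql to obtain a calculated data set; and storing the calculated data set into a Hive database.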

The computer-readable storage medium may include various media capable of storing program code, such as a USB flash drive, a removable hard disk, a Read-Only Memory (ROM), a Random Access Memory (RAM), a magnetic disk, or an optical disk.

For a description of a relevant part in a thermal power plant time series data processing apparatus, a device, and a computer readable storage medium provided in the embodiments of the present application, reference may be made to a detailed description of a corresponding part in a thermal power plant time series data processing method provided in the embodiments of the present application, and details are not repeated here.

It is noted that, herein, relational terms such as first and second may be used solely to distinguish one entity or action from another entity or action, without necessarily requiring or implying any actual such relationship or order between such entities or actions. Furthermore, the terms "comprises," "comprising," or any other variation thereof are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements does not include only those elements but may also include other elements not expressly listed or inherent to such process, method, article, or apparatus. Without further limitation, an element defined by the phrase "comprising a …" does not exclude the presence of other identical elements in the process, method, article, or apparatus that comprises the element. In addition, parts of the above technical solutions provided in the embodiments of the present application that are consistent with the implementation principles of corresponding technical solutions in the prior art are not described in detail, so as to avoid redundant description.

The previous description of the disclosed embodiments is provided to enable any person skilled in the art to make or use the present application. Various modifications to these embodiments will be readily apparent to those skilled in the art, and the generic principles defined herein may be applied to other embodiments without departing from the spirit or scope of the application. Thus, the present application is not intended to be limited to the embodiments shown herein but is to be accorded the widest scope consistent with the principles and novel features disclosed herein.

Claims

1. A method for processing time series data of a thermal power plant is characterized by comprising the following steps:

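acquiring time series data collected by a sensor in thermal power plant equipment;

storing the time series data in a distributed file system in a distributed storage manner, and forming a data set in the distributed file system;

reading the data set into a spark calculation framework, and calculating the data set by using spark sql to obtain a calculated data set;

and storing the calculated data set into a Hive database.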

2. The method for processing time series data of a thermal power plant according to claim 1, after acquiring the time series data collected by the sensor in the thermal power plant equipment, further comprising:

storing the time series data in a time series database;

3. The thermal power plant time series data processing method as recited in claim 1, further comprising, after reading the data set into a spark calculation framework:

storing the dataset in the spark calculation framework in a DataFrame format.

4. The thermal power plant time series data processing method according to claim 3, wherein before calculating the data set by using spark sql, the method further comprises:

preprocessing the data set by using spark sql.

5. The thermal power plant time series data processing method according to claim 4, wherein preprocessing the data set by using spark sql comprises:

6. The thermal power plant time series data processing method according to claim 5, wherein the calculating the data set by using spark sql to obtain a calculated data set comprises:

7. The thermal power plant time series data processing method according to claim 1, wherein storing the time series data in a distributed storage manner in a distributed file system comprises:

and storing the time series data in the HDFS in a distributed storage mode.

8. A thermal power plant time series data processing apparatus, comprising:

9. A thermal power plant time series data processing device, comprising:

a memory for storing a computer program;

a processor for implementing the steps of the thermal power plant time series data processing method as claimed in any one of claims 1 to 7 when executing said computer program.

10. A computer-readable storage medium, characterized in that a computer program is stored in the computer-readable storage medium, which computer program, when being executed by a processor, carries out the steps of the method for thermal power plant time series data processing according to any one of claims 1 to 7.