CN111639060A - Thermal power plant time sequence data processing method, device, equipment and medium - Google Patents

Thermal power plant time sequence data processing method, device, equipment and medium

Info

Publication number
CN111639060A
Authority
CN
China
Prior art keywords
data
power plant
thermal power
data set
time series
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Withdrawn
Application number
CN202010512753.6A
Other languages
Chinese (zh)
Inventor
袁雪峰
马成龙
李晓静
张含智
陈世和
陈木斌
陈建华
卫平宝
聂怀志
姜利辉
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
China Resource Power Technology Research Institute
Original Assignee
China Resource Power Technology Research Institute
Application filed by China Resource Power Technology Research Institute
Priority to CN202010512753.6A
Publication of CN111639060A

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/10 File systems; File servers
    • G06F16/18 File system types
    • G06F16/182 Distributed file systems
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q50/00 Information and communication technology [ICT] specially adapted for implementation of business processes of specific business sectors, e.g. utilities or tourism
    • G06Q50/06 Energy or water supply
    • Y GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02E REDUCTION OF GREENHOUSE GAS [GHG] EMISSIONS, RELATED TO ENERGY GENERATION, TRANSMISSION OR DISTRIBUTION
    • Y02E40/00 Technologies for an efficient electrical power generation, transmission or distribution
    • Y02E40/70 Smart grids as climate change mitigation technology in the energy generation sector
    • Y GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y04 INFORMATION OR COMMUNICATION TECHNOLOGIES HAVING AN IMPACT ON OTHER TECHNOLOGY AREAS
    • Y04S SYSTEMS INTEGRATING TECHNOLOGIES RELATED TO POWER NETWORK OPERATION, COMMUNICATION OR INFORMATION TECHNOLOGIES FOR IMPROVING THE ELECTRICAL POWER GENERATION, TRANSMISSION, DISTRIBUTION, MANAGEMENT OR USAGE, i.e. SMART GRIDS
    • Y04S10/00 Systems supporting electrical power generation, transmission or distribution
    • Y04S10/50 Systems or methods supporting the power network operation or management, involving a certain degree of interaction with the load-side end user applications

Landscapes

  • Engineering & Computer Science (AREA)
  • Business, Economics & Management (AREA)
  • Theoretical Computer Science (AREA)
  • Economics (AREA)
  • Physics & Mathematics (AREA)
  • Health & Medical Sciences (AREA)
  • General Physics & Mathematics (AREA)
  • Public Health (AREA)
  • General Engineering & Computer Science (AREA)
  • Databases & Information Systems (AREA)
  • Data Mining & Analysis (AREA)
  • Water Supply & Treatment (AREA)
  • General Health & Medical Sciences (AREA)
  • Human Resources & Organizations (AREA)
  • Marketing (AREA)
  • Primary Health Care (AREA)
  • Strategic Management (AREA)
  • Tourism & Hospitality (AREA)
  • General Business, Economics & Management (AREA)
  • Testing And Monitoring For Control Systems (AREA)

Abstract

The application discloses a method, an apparatus, a device, and a computer-readable storage medium for processing time series data of a thermal power plant. The method comprises: acquiring time series data collected by sensors in thermal power plant equipment; storing the time series data in a distributed file system in a distributed storage manner, and forming a data set in the distributed file system; reading the data set into the Spark computing framework, and computing on the data set by using Spark SQL to obtain a computed data set; and storing the computed data set in a Hive database. Because the distributed file system, Spark SQL, and the Hive database all process the time series data in a distributed deployment, the processing can be spread across different servers, which improves the capacity for processing thermal power plant time series data.

Description

Thermal power plant time sequence data processing method, device, equipment and medium
Technical Field
The application relates to the technical field of thermal power historical data processing, in particular to a thermal power plant time sequence data processing method, device and equipment and a computer readable storage medium.
Background
A thermal power plant comprises many pieces of equipment, each fitted with sensors that collect the operating data of that equipment at regular intervals (for example, every 0.5 s). Most of this operating data is time series data. By analyzing and processing past (historical) time series data, it is possible to learn under which operating parameters the thermal power plant performs better, which can guide subsequent operation so that the plant runs in a better state.
At present, the time series data collected by the sensors are mostly stored on a single server, which also performs the computation and stores the results. However, because a thermal power plant has many pieces of equipment and sensors and is generally in continuous operation, the volume of time series data keeps growing (on the order of GB to TB). A single server can hardly support such storage and computation, so the current single-server approach to processing thermal power plant time series data has limited processing capacity.
In summary, how to improve the processing capability of the time series data of the thermal power plant is a technical problem to be solved urgently by those skilled in the art.
Disclosure of Invention
In view of the above, an object of the present application is to provide a method, an apparatus, a device and a computer readable storage medium for processing thermal power plant time series data, which are used to improve the processing capability of the thermal power plant time series data.
In order to achieve the above purpose, the present application provides the following technical solutions:
a thermal power plant time series data processing method comprises the following steps:
acquiring time sequence data acquired by a sensor in thermal power plant equipment;
storing the time series data in a distributed file system in a distributed storage mode, and forming a data set in the distributed file system;
reading the data set into the Spark computing framework, and computing on the data set by using Spark SQL to obtain a computed data set;
and storing the calculated data set into a Hive database.
Preferably, after acquiring the time series data collected by the sensor in the thermal power plant equipment, the method further includes:
storing the time series data in a time series database;
accordingly, storing the time series data in a distributed storage manner in a distributed file system, and forming a data set in the distributed file system, includes:
reading the time sequence data from the time sequence database at intervals of preset time, storing the read time sequence data in the distributed file system in a distributed storage mode, forming a data set in the distributed file system by the time sequence data read for the first time, and adding the time sequence data read except for the first time to the corresponding data set.
Preferably, after reading the data set into the Spark computing framework, the method further includes:
storing the data set in the Spark computing framework in DataFrame form.
Preferably, before computing on the data set by using Spark SQL, the method further includes:
preprocessing the data set by using Spark SQL.
Preferably, preprocessing the data set by using Spark SQL includes:
performing data elimination on the operation data corresponding to each type of operation parameter in the data set by using a 3 sigma criterion so as to eliminate outlier operation data;
comparing the operation data corresponding to each type of operation parameters with the corresponding set maximum value and set minimum value, and rejecting the operation data larger than the set maximum value and the operation data smaller than the set minimum value;
for each type of the operation parameters, eliminating operation data which are kept unchanged within a first set time length;
and for each type of the operation parameters, removing unstable operation data within a second set time length to obtain a preprocessed data set.
Preferably, computing on the data set by using Spark SQL to obtain a computed data set includes:
for each type of operating parameter in the preprocessed data set, dividing parameter intervals according to the maximum value and the minimum value of the corresponding operating data by using Spark SQL;
aggregating the parameter intervals corresponding to the different types of operating parameters by using Spark SQL to obtain a plurality of working conditions;
and computing the optimal combination of operating data for each working condition by using Spark SQL and a multi-objective fuzzy optimization algorithm to obtain the computed data set.
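The first two of these steps (interval division and aggregation into working conditions) can be illustrated with a small Python sketch. It is only an illustration under stated assumptions: the application performs these steps with Spark SQL, whereas the sketch uses plain Python; equal-width intervals and the `n_intervals` parameter are assumptions, since the text only says the intervals are divided according to the maximum and minimum values; and the function names are hypothetical. The multi-objective fuzzy optimization step is not sketched.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

Row = Dict[str, float]  # one acquisition time: operating parameter name -> operating data


def interval_index(value: float, lo: float, hi: float, n_intervals: int) -> int:
    """Index of the equal-width interval of [lo, hi] that contains value (assumed scheme)."""
    if hi == lo:
        return 0  # a constant parameter has a single interval
    width = (hi - lo) / n_intervals
    # The maximum value falls into the last interval rather than a new one.
    return min(int((value - lo) / width), n_intervals - 1)


def group_by_working_condition(
    rows: List[Row], n_intervals: int = 2
) -> Dict[Tuple[int, ...], List[Row]]:
    """Group rows by the combination of parameter intervals they fall into."""
    params = sorted(rows[0]) if rows else []
    # Per-parameter minimum and maximum over the preprocessed data set.
    bounds = {
        p: (min(r[p] for r in rows), max(r[p] for r in rows)) for p in params
    }
    groups: Dict[Tuple[int, ...], List[Row]] = defaultdict(list)
    for row in rows:
        key = tuple(
            interval_index(row[p], bounds[p][0], bounds[p][1], n_intervals)
            for p in params
        )
        groups[key].append(row)  # each distinct key is one working condition
    return dict(groups)
```

Each key of the returned dictionary is one working condition, expressed as a tuple of interval indices in alphabetical order of parameter name.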
Preferably, the storing the time series data in a distributed storage manner in a distributed file system includes:
and storing the time series data in the HDFS in a distributed storage mode.
A thermal power plant time series data processing apparatus comprising:
the acquisition module is used for acquiring time sequence data acquired by a sensor in thermal power plant equipment;
the first storage module is used for storing the time sequence data in a distributed file system in a distributed storage mode and forming a data set in the distributed file system;
the computing module is used for reading the data set into the Spark computing framework and computing on the data set by using Spark SQL to obtain a computed data set;
and the second storage module is used for storing the computed data set in the Hive database.
A thermal power plant time series data processing device comprising:
a memory for storing a computer program;
a processor for implementing the steps of the thermal power plant time series data processing method as claimed in any one of the above when executing the computer program.
A computer-readable storage medium, in which a computer program is stored, which computer program, when being executed by a processor, realizes the steps of the thermal power plant time series data processing method according to any one of the above.
The application provides a method, an apparatus, a device, and a computer-readable storage medium for processing time series data of a thermal power plant. The method comprises: acquiring time series data collected by sensors in thermal power plant equipment; storing the time series data in a distributed file system in a distributed storage manner, and forming a data set in the distributed file system; reading the data set into the Spark computing framework, and computing on the data set by using Spark SQL to obtain a computed data set; and storing the computed data set in a Hive database.
According to the technical solution disclosed in the application, the time series data collected by the sensors in the thermal power plant equipment are stored in a distributed file system in a distributed storage manner, the resulting data set is read into the Spark computing framework, Spark SQL computes on the data set, and the computed data set is stored in a Hive database. The distributed file system, Spark SQL, and the Hive database all process the time series data in a distributed deployment, that is, the time series data can be stored and computed on multiple servers in a cluster. The processing of the time series data can therefore be spread across different servers, which improves the capacity for processing thermal power plant time series data.
Drawings
In order to illustrate the embodiments of the present application or the technical solutions in the prior art more clearly, the drawings needed in the description of the embodiments or the prior art are briefly introduced below. Obviously, the drawings in the following description are only embodiments of the present application, and those skilled in the art can obtain other drawings from the provided drawings without creative effort.
Fig. 1 is a flowchart of a method for processing time series data of a thermal power plant according to an embodiment of the present application;
Fig. 2 is a schematic structural diagram of a thermal power plant time series data processing apparatus according to an embodiment of the present application;
Fig. 3 is a schematic structural diagram of a thermal power plant time series data processing device according to an embodiment of the present application.
Detailed Description
The technical solutions in the embodiments of the present application will be clearly and completely described below with reference to the drawings in the embodiments of the present application, and it is obvious that the described embodiments are only a part of the embodiments of the present application, and not all of the embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present application.
Referring to Fig. 1, which shows a flowchart of a thermal power plant time series data processing method according to an embodiment of the present application, the method may include:
S11: Acquire time series data collected by sensors in thermal power plant equipment.
Time series data collected by the different sensors in the thermal power plant equipment are acquired. The sensors may be installed on the equipment in every link of the plant's production and operation, and each piece of time series data may include information about the equipment, information about the sensor, the operating data of the collected operating parameter, the collection time, and the like.
S12: Store the time series data in a distributed file system in a distributed storage manner, and form a data set in the distributed file system.
After the time series data are acquired, they can be stored in a distributed file system in a distributed storage manner. The distributed file system is deployed in a distributed fashion, that is, it can be deployed across multiple servers in a cluster and store data on them, which effectively improves scalability and accommodates the storage of large volumes of time series data.
In addition, the time series data stored in the distributed file system can be formed into a data set to facilitate subsequent computation.
In the distributed file system, the time series data may be stored in text formats such as CSV, JSON, and TXT, in binary formats such as Avro and Parquet, or in other formats.
S13: Read the data set into the Spark computing framework, and compute on the data set by using Spark SQL to obtain a computed data set.
After the time series data stored in the distributed file system have been formed into a data set, the data set can be read into the Spark computing framework and computed on with Spark SQL to obtain a computed data set, so that the subsequent operation of the thermal power plant can be guided according to the computed data set.
The Spark computing framework is a fast, general-purpose engine designed for large-scale data processing. It is an open-source cluster computing framework originally developed at UC Berkeley's AMPLab. It is similar to Hadoop but differs from it in many respects; in particular, intermediate results of computing tasks can be kept in memory rather than written to the distributed file system each time, which gives better performance. Spark SQL is a Spark module mainly used for processing structured data.
S14: Store the computed data set in a Hive database.
After Spark SQL has computed on the data set, the computed data set can be stored in the Hive database so that it can later be queried from there. The operation of the thermal power plant can then be guided according to the query results, improving the plant's operating performance.
Hive is an ETL and data warehouse tool built on the Hadoop Distributed File System (HDFS). It can map structured data files to database tables and provides SQL-like queries, which makes operations easy to perform, and it offers data encapsulation, ad hoc queries, and analysis of very large data sets. Using the Hive database to store the computed data set in thermal power plant time series data processing therefore allows large volumes of data to be stored and makes it convenient to query the target data later.
According to the technical solution disclosed in the application, the time series data collected by the sensors in the thermal power plant equipment are stored in a distributed file system in a distributed storage manner, the resulting data set is read into the Spark computing framework, Spark SQL computes on the data set, and the computed data set is stored in a Hive database. The distributed file system, Spark SQL, and the Hive database all process the time series data in a distributed deployment, that is, the time series data can be stored and computed on multiple servers in a cluster. The processing of the time series data can therefore be spread across different servers, which improves the capacity for processing thermal power plant time series data.
After acquiring the time series data collected by the sensors in the thermal power plant equipment, the method provided by the embodiment of the present application may further include:
storing the time series data in a time series database;
accordingly, storing the time series data in a distributed file system in a distributed storage manner, and forming a data set in the distributed file system, may include:
reading time series data from the time series database at preset time intervals, storing the read time series data in the distributed file system in a distributed storage manner, forming a data set in the distributed file system from the time series data read the first time, and appending the time series data read on each subsequent occasion to that data set.
Because the sensors in thermal power plant equipment collect data at short intervals and high frequency, writing the collected time series data directly into the distributed file system would degrade the performance of the distributed file system and place heavy demands on the transmission network. To avoid this, the time series data can first be stored in a time series database after they are acquired. The time series data are then read from the time series database at preset time intervals (longer than the sensors' collection period) and stored in the distributed file system in a distributed storage manner. This reduces how often time series data are written to the distributed file system, which improves its storage performance for time series data and lowers the demand that such writes place on the transmission network.
When the time series data are read from the time series database into the distributed file system at preset time intervals, the time series data read the first time can be formed into a data set in the distributed file system, and the time series data read later can be appended to that data set, so that the data set can be computed on directly afterwards.
After the first read has formed the data set in the distributed file system and before the time series data from a subsequent read are appended, it is possible to determine whether the read succeeded. If it succeeded, the step of appending the time series data to the data set is executed; if not, the time series data are read from the time series database again, so that the time series data in the time series database are reliably stored in the distributed file system.
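The periodic transfer described above can be sketched in Python as follows. This is a minimal illustration rather than the implementation of the application: `read_batch` is a hypothetical callable standing in for a query against the time series database, a plain list stands in for the data set in the distributed file system, and the retry limit `max_retries` is an assumption, since the text only says that a failed read is repeated.

```python
from typing import Callable, List, Optional

Record = dict  # one time series record, e.g. {"time": ..., "value": ...}


def transfer_batch(
    read_batch: Callable[[], Optional[List[Record]]],
    data_set: Optional[List[Record]],
    max_retries: int = 3,
) -> List[Record]:
    """Read one batch and store it: form the data set on the first read, append afterwards.

    read_batch returns None when the read fails (a hypothetical convention).
    max_retries is an assumed safeguard against retrying forever.
    """
    for _ in range(max_retries):
        batch = read_batch()
        if batch is not None:        # the read succeeded
            if data_set is None:     # first read: form the data set
                return list(batch)
            data_set.extend(batch)   # later reads: append to the existing data set
            return data_set
    raise RuntimeError("reading from the time series database failed after retries")
```

A scheduler would call `transfer_batch` once per preset interval, passing `None` as `data_set` the first time and the returned data set on every later call.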
After the data set is read into the Spark computing framework, the method provided by the embodiment of the present application may further include:
storing the data set in the Spark computing framework in DataFrame form.
After the data set is read into the Spark computing framework, it may be stored there in DataFrame form for the subsequent Spark SQL operations. When the data set is stored as a DataFrame, each column holds the operating data of one operating parameter and each row corresponds to one acquisition time, with acquisition times increasing from the first row downward.
It should be noted that a DataFrame is a distributed data set built on RDDs, similar to a two-dimensional table in a conventional database, and can be constructed from various sources, such as structured data files, tables in Hive, external databases, or existing RDDs. Of course, the data set may also be stored in the Spark computing framework in RDD form.
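The row and column layout described above can be illustrated without Spark. The following Python sketch pivots individual sensor readings into one row per acquisition time and one column per operating parameter. The `(time, parameter, value)` record shape and the function name `to_wide_rows` are assumptions made for illustration; in the application this layout is held in a Spark DataFrame.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple


def to_wide_rows(
    records: Iterable[Tuple[float, str, float]],
) -> Tuple[List[str], List[Tuple[float, Dict[str, float]]]]:
    """Pivot (time, parameter, value) records into one row per acquisition time.

    Returns the sorted column names and the rows in ascending time order.
    """
    by_time: Dict[float, Dict[str, float]] = defaultdict(dict)
    columns = set()
    for time, parameter, value in records:
        by_time[time][parameter] = value  # one cell of the table
        columns.add(parameter)
    rows = [(t, by_time[t]) for t in sorted(by_time)]  # acquisition time increases downward
    return sorted(columns), rows
```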
Before Spark SQL is used to compute on the data set, the method provided by the embodiment of the present application may include:
preprocessing the data set by using Spark SQL.
Before computing on the data set, Spark SQL can be used to preprocess it and remove abnormal operating data, which improves the accuracy of the computation and thus gives better guidance for the operation of the thermal power plant.
In the method provided by the embodiment of the present application, preprocessing the data set by using Spark SQL may include:
culling the operating data corresponding to each type of operating parameter in the data set by using the 3-sigma criterion, so as to remove outlier operating data;
comparing the operating data corresponding to each type of operating parameter with the corresponding set maximum value and set minimum value, and removing the operating data larger than the set maximum value and the operating data smaller than the set minimum value;
for each type of operating parameter, removing operating data that remain unchanged within a first set time length;
and for each type of operating parameter, removing operating data that are unstable within a second set time length, to obtain a preprocessed data set.
The process of preprocessing the data set by using spark sql may specifically include:
1) Data culling using the 3-sigma criterion
While the thermal power plant equipment is being monitored, sensor instability, fluctuation, external interference, and the like may introduce anomalies into the collected time series data. To keep abnormal time series data from affecting the subsequent computation, the operating data corresponding to each type of operating parameter in the data set can be culled by using the 3-sigma criterion so as to remove outlier operating data, that is, operating data outside the 3σ range are removed, where σ is the standard deviation corresponding to each type of operating parameter.
Specifically, whether to cull a value can be determined by checking whether $|V_n(t) - \mathrm{AVG}(V_n(t_x, t_z))|$ is greater than $3 \times \mathrm{STD}(V_n(t_x, t_z))$. If $|V_n(t) - \mathrm{AVG}(V_n(t_x, t_z))| > 3 \times \mathrm{STD}(V_n(t_x, t_z))$, the value is culled, where $V_n(t)$ is the current operating data corresponding to the type of operating parameter, $\mathrm{AVG}(V_n(t_x, t_z))$ is the average of the operating data corresponding to that type of operating parameter in the current period, $[t_x, t_z]$ is the period range, and $\mathrm{STD}(V_n(t_x, t_z))$ is the standard deviation of the operating data corresponding to that type of operating parameter in the current period.
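The criterion above can be sketched in a few lines of Python. This is an illustration only: the application applies the criterion with Spark SQL, the function name `cull_3sigma` is hypothetical, and using the population standard deviation is an assumption, since the text does not say whether STD denotes the population or the sample standard deviation.

```python
from statistics import mean, pstdev
from typing import List


def cull_3sigma(values: List[float]) -> List[float]:
    """Keep only the values of one period that lie within 3 standard deviations of the mean."""
    if not values:
        return []
    avg = mean(values)    # AVG over the period [t_x, t_z]
    std = pstdev(values)  # STD over the same period (population form, assumed)
    # A value is culled when |V - AVG| > 3 * STD, so it is kept otherwise.
    return [v for v in values if abs(v - avg) <= 3 * std]
```

Note that in a very short period a single outlier inflates the standard deviation so much that it may not exceed the 3σ bound, so the criterion is only meaningful when each period contains enough samples.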
2) Culling out-of-limit operating data
Specifically, the operating data corresponding to each type of operating parameter are compared with the set maximum value and the set minimum value for that type of operating parameter. Operating data larger than the set maximum value and operating data smaller than the set minimum value are removed, so that the operating data corresponding to each type of operating parameter lie within the range bounded by the set minimum value and the set maximum value.
The set maximum value and the set minimum value for each type of operating parameter can be set by staff according to the operating performance of the thermal power plant equipment or according to experience.
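A minimal Python sketch of this limit check follows. The dictionary layout, the parameter names in the usage note, and the function name `cull_out_of_range` are hypothetical; the application performs the comparison with Spark SQL.

```python
from typing import Dict, List, Tuple


def cull_out_of_range(
    series: Dict[str, List[float]],
    limits: Dict[str, Tuple[float, float]],
) -> Dict[str, List[float]]:
    """Remove values below the set minimum or above the set maximum of each parameter.

    limits maps a parameter name to (set minimum, set maximum); parameters
    without configured limits are passed through unchanged (an assumption).
    """
    result: Dict[str, List[float]] = {}
    for parameter, values in series.items():
        if parameter in limits:
            set_min, set_max = limits[parameter]
            result[parameter] = [v for v in values if set_min <= v <= set_max]
        else:
            result[parameter] = list(values)
    return result
```

For example, with a hypothetical limit of `(0.0, 660.0)` for a parameter named `load`, readings of 700.0 and -5.0 would be removed while 250.0 and 300.0 would be kept.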
3) Culling operating data that remain unchanged within a first set time length
Poor sensor contact and similar faults may cause the collected operating data to remain unchanged. To keep such data from entering the computation and distorting the result, the operating data that remain unchanged within the first set time length can be removed.
Specifically, in a sliding manner: first take the operating data within a first set time length, obtain its maximum value maxV(t) and minimum value minV(t), and determine whether maxV(t) equals minV(t). If so, remove the first operating datum within the first set time length, form another first set time length from the second operating datum in that window and the operating data that follow it, and repeat the step of obtaining maxV(t) and minV(t) and the subsequent steps. If not, keep the first operating datum, form the next first set time length in the same way, and repeat, until the operating data of all times have been traversed.
Besides the sliding approach, the following approach may also be used: take the operating data one first set time length at a time, obtain the maximum value maxV(t) and the minimum value minV(t) within each first set time length, and determine whether they are equal. If so, the operating data are deemed to have remained unchanged within that first set time length and are removed; if not, they are kept.
It should be noted that, besides testing whether the maximum value maxV(t) equals the minimum value minV(t), it is also possible to test whether the maximum value maxV(t) and/or the minimum value minV(t) equals the average value.
4) Culling operating data that are unstable within a second set time length
Only the operating data collected while the thermal power plant equipment runs stably reflect the actual condition of the equipment. Therefore, to keep unstable data from affecting the subsequent computation, the operating data that are unstable within the second set time length can be removed for each type of operating parameter.
When culling the operating data that are unstable within the second set time length, the data may be judged and culled in a sliding manner. Specifically, for each type of operating parameter, the first second set time length is taken as the current second set time length, the maximum value maxV'(t) and the minimum value minV'(t) of the operating data within the current second set time length are obtained, and the average value avg[V'(t)] of the operating data within the current second set time length is calculated. By using
Figure BDA0002528993710000101
Calculating a stable calculation value, comparing the stable calculation value with a stable threshold lambda, if the stable calculation value is smaller than the stable threshold lambda, rejecting first operation data within a current second set time length, forming another second set time length by using the second operation data within the current second set time length and operation data of later time, taking the formed another second set time length as the current second set time length, and then executing a step of obtaining a maximum value maxV '(t) and a minimum value minV' (t) of the operation data within the current second set time length; if the stable calculated value is not less than the stable threshold lambda, not rejecting the first running data within the current second set time length, and then executing a step of forming another second set time length by using the second running data within the current second set time length and the running data of the later time until the running data of all the time is polled; the stability threshold λ may be 0.05, and of course, the magnitude of the stability threshold λ may also be adjusted according to experience or requirements.
After the above four steps are performed, the preprocessing of the data set can be completed to obtain a preprocessed data set.
It should be noted that the order of the four preprocessing steps may be adjusted arbitrarily; the present application does not limit the order of the preprocessing steps.
According to the thermal power plant time series data processing method provided by the embodiment of the application, calculating the data set by using Spark SQL to obtain the calculated data set may include:
for each type of operation parameter in the preprocessed data set, dividing parameter intervals by using Spark SQL according to the maximum value and the minimum value of the corresponding operation data;
aggregating the parameter intervals corresponding to different types of operation parameters by using Spark SQL to obtain a plurality of working conditions;
and calculating the optimal operation data combination corresponding to each working condition by using Spark SQL and a multi-objective fuzzy optimization algorithm to obtain the calculated data set.
After the data set has been preprocessed to obtain the preprocessed data set, for each type of operation parameter in the preprocessed data set, parameter intervals may be divided by using Spark SQL according to the maximum value and the minimum value of the corresponding operation data. Specifically, the intervals may be divided evenly by interval length, or divided in other manners as required. After each type of operation parameter has been divided, the parameter intervals corresponding to different types of operation parameters may be aggregated by using Spark SQL; each aggregation result represents one working condition, so that a plurality of working conditions is obtained. For example, with load as an operation parameter, 30 load intervals may be divided evenly according to the maximum value 600 and the minimum value 300 of the load data: [300,310], [310,320], …, [590,600]. For main steam temperature, 50 main steam temperature intervals may be divided evenly according to its maximum value 700 and minimum value 200: [200,210], [210,220], …, [690,700]. Assuming there are only these two types of operation parameters, the load interval [300,310] and the main steam temperature interval [200,210] are aggregated into one working condition, the load interval [300,310] and the main steam temperature interval [210,220] are aggregated into another working condition, and so on, yielding a plurality of working conditions (30 × 50 = 1500 combinations in this example). Dividing the working conditions of the data set with Spark SQL can address the problems of a large calculation amount and a long calculation time.
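The interval division and aggregation in the example above can be sketched in plain Python as follows. This is a minimal single-machine illustration, not the Spark SQL job of the application; the function names, the record layout and the parameter name `main_steam_temp` are hypothetical, and a value lying exactly on a shared boundary is assigned to the upper interval (except the overall maximum, which is assigned to the last interval).

```python
from collections import defaultdict
from typing import Dict, List, Tuple


def interval_index(value: float, lo: float, hi: float, n: int) -> int:
    """Index of the equal-width interval of [lo, hi] that contains value."""
    width = (hi - lo) / n
    idx = int((value - lo) // width)
    # Clamp so that value == hi falls into the last interval.
    return min(max(idx, 0), n - 1)


def group_by_condition(
    records: List[Dict[str, float]],
    bounds: Dict[str, Tuple[float, float]],
    counts: Dict[str, int],
) -> Dict[Tuple[int, ...], List[Dict[str, float]]]:
    """Group records by the tuple of interval indices, one index per parameter."""
    params = sorted(bounds)
    conditions = defaultdict(list)
    for rec in records:
        key = []
        for p in params:
            lo, hi = bounds[p]
            key.append(interval_index(rec[p], lo, hi, counts[p]))
        conditions[tuple(key)].append(rec)
    return dict(conditions)


# Bounds and interval counts taken from the example: load 300..600 in 30
# intervals, main steam temperature 200..700 in 50 intervals.
bounds = {"load": (300.0, 600.0), "main_steam_temp": (200.0, 700.0)}
counts = {"load": 30, "main_steam_temp": 50}
records = [
    {"load": 305.0, "main_steam_temp": 204.0},
    {"load": 308.0, "main_steam_temp": 215.0},
    {"load": 302.0, "main_steam_temp": 209.0},
    {"load": 600.0, "main_steam_temp": 700.0},
]
conditions = group_by_condition(records, bounds, counts)
```

Each key of `conditions` is one working condition; here the key order is (load interval, main steam temperature interval) because the parameter names are sorted alphabetically.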
After a plurality of working conditions are obtained, Spark SQL and a multi-objective fuzzy optimization algorithm may be used to calculate the optimal operation data combination corresponding to each working condition, and each working condition together with its optimal operation data combination is taken as the calculated data set. The optimal operation data combination corresponding to the current working condition of the thermal power plant can subsequently be looked up in the calculated data set, so that the thermal power plant equipment can be adjusted accordingly and the operation performance of the thermal power plant can be improved.
When the optimal operation parameters corresponding to each working condition are calculated by using the multi-objective fuzzy optimization algorithm, four performance aspects may be considered, namely stability, economy, environmental protection and comprehensiveness, and the optimal operation data combination is determined according to them. Stability considers the operation parameters that have a larger influence on the stability of the thermal power plant equipment, such as actual load, main steam temperature and main steam pressure; economy considers the power generation coal consumption rate of the thermal power plant equipment; environmental protection considers NOx emissions, SO2 emissions and dust emissions; comprehensiveness is a weighting of the related operation parameters. The corresponding multi-objective fuzzy optimization algorithm specifically comprises the following steps:
1) First, for each performance mentioned above (namely stability, economy, environmental protection and comprehensiveness), the following are solved separately under the constraint conditions of the corresponding operation parameters: the optimal solution
Figure BDA0002528993710000111
the optimal value
Figure BDA0002528993710000112
and the worst value
Figure BDA0002528993710000113
where m is the number of operation parameters;
2) The single-objective optimal value
Figure BDA0002528993710000114
is fuzzified on the interval
Figure BDA0002528993710000115
that is, it is represented by a fuzzy subset M(f_j) of the objective value space. The membership function M(f_j) should satisfy
Figure BDA0002528993710000116
and decrease monotonically on
Figure BDA0002528993710000117
where M(f_j) is:
Figure BDA0002528993710000118
M(f_j) is mapped to the design space x to obtain the fuzzy optimal solution N(f_j); according to the extension principle, its membership function is:
Figure BDA0002528993710000119
In the formula, the value of q is less than 1 so as to improve the precision of the optimization. The membership function N(f_j) constructed in this way is a monotonic function, and the composite function can be optimized monotonically.
3) The intersection D (the fuzzy optimal set) of N(f_j) (j = 1, 2, …, m) has the membership function
Figure BDA0002528993710000121
The satisfactory solution x* of the multi-objective optimization problem satisfies:
Figure BDA0002528993710000122
where λ* is the degree of satisfaction of the optimization result; a larger value represents a higher relative satisfaction. Finally, the combination of operation data corresponding to the maximum λ* may be taken as the optimal operation data combination.
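The selection rule in step 3) can be illustrated with a generic max-min fuzzy decision in Python. This sketch rests on assumptions: the membership functions of the application (including the role of q) are given only in formula images that are not reproduced in this text, so a simple linear membership between the best and worst value is substituted here, and the intersection is taken as the minimum of the single-objective memberships, which is the standard fuzzy-set intersection. The function names and the sample numbers are hypothetical.

```python
from typing import List, Sequence, Tuple


def membership(f: float, best: float, worst: float) -> float:
    """Assumed linear membership: 1 at the best value, 0 at the worst value."""
    if best == worst:
        return 1.0
    ratio = (worst - f) / (worst - best)
    return min(max(ratio, 0.0), 1.0)


def best_combination(
    candidates: List[Sequence[float]],
    best: Sequence[float],
    worst: Sequence[float],
) -> Tuple[int, float]:
    """Return (index, satisfaction) of the candidate with the largest minimum membership."""
    scores = []
    for objectives in candidates:
        # Fuzzy intersection: overall satisfaction is the smallest membership.
        scores.append(
            min(membership(f, b, w) for f, b, w in zip(objectives, best, worst))
        )
    idx = max(range(len(candidates)), key=scores.__getitem__)
    return idx, scores[idx]


# Hypothetical objectives per candidate: (coal consumption rate, NOx emissions),
# both to be minimised; the numbers are illustrative only.
best = (300.0, 30.0)
worst = (320.0, 50.0)
candidates = [(302.0, 48.0), (310.0, 40.0), (318.0, 31.0)]
idx, satisfaction = best_combination(candidates, best, worst)
```

With these numbers the balanced second candidate is selected, because each of the other two scores poorly on one objective and the minimum operator penalises that.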
The method for processing the time series data of the thermal power plant, provided by the embodiment of the application, stores the time series data in a distributed file system in a distributed storage manner, and may include:
and storing the time sequence data in the HDFS in a distributed storage mode.
Specifically, the time series data may be stored in HDFS in a distributed storage manner. HDFS (Hadoop Distributed File System) simplifies the file consistency model, provides high-throughput data access for applications through streaming data access, and is suitable for applications with large data sets. It provides a write-once, read-many mechanism and distributes data in blocks over different physical machines in a cluster. Using it to store the time series data of thermal power plant equipment therefore helps improve data storage performance and scalability.
HDFS is the mainstream storage framework of Hadoop, Spark is a mainstream computing engine in the Hadoop ecosystem, and Hive is a data warehouse based on HDFS. The three are therefore technically compatible, have mature interfaces and make data easy to call, which is a great advantage for processing the strongly structured time series data of a thermal power plant. Of course, the time series data may also be stored in Hive in a distributed storage manner.
An embodiment of the present application further provides a thermal power plant time series data processing apparatus, refer to fig. 2, which shows a schematic structural diagram of the thermal power plant time series data processing apparatus provided in the embodiment of the present application, and the apparatus may include:
the acquisition module 21 is configured to acquire time series data acquired by a sensor in thermal power plant equipment;
the first storage module 22 is used for storing the time sequence data in a distributed file system in a distributed storage mode and forming a data set in the distributed file system;
the calculating module 23 is configured to read the data set into the Spark computing framework, and calculate the data set by using Spark SQL to obtain a calculated data set;
and a second storage module 24, configured to store the calculated data set in the Hive database.
The time series data processing device of the thermal power plant provided by the embodiment of the application can also comprise:
the third storage module is used for storing the time sequence data in the time sequence database after the time sequence data acquired by the sensor in the thermal power plant equipment is acquired;
accordingly, the first storage module 22 may include:
the first storage unit is used for reading time series data from the time series database at preset time intervals, storing the read time series data in the distributed file system in a distributed storage manner, forming a data set in the distributed file system from the time series data read for the first time, and adding the time series data read other than for the first time to the corresponding data set.
The time series data processing device of the thermal power plant provided by the embodiment of the application can also comprise:
and the fourth storage module is used for storing the data set in the Spark computing framework in DataFrame form after the data set is read into the Spark computing framework.
The time series data processing device of the thermal power plant provided by the embodiment of the application can also comprise:
and the preprocessing module is used for preprocessing the data set by using Spark SQL before the data set is calculated by using Spark SQL.
In the thermal power plant time series data processing apparatus provided by the embodiment of the application, the preprocessing module may include:
the first eliminating unit is used for performing data elimination on the operation data corresponding to each type of operation parameter in the data set by using the 3σ criterion so as to eliminate outlier operation data;
the second rejection unit is used for comparing the operation data corresponding to each type of operation parameter with the corresponding set maximum value and set minimum value, and rejecting the operation data larger than the set maximum value and the operation data smaller than the set minimum value;
the third rejection unit is used for rejecting the operation data which are kept unchanged within the first set time length for each type of operation parameter;
and the fourth eliminating unit is used for eliminating unstable running data within a second set time length for each type of running parameters to obtain a preprocessed data set.
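The work of the first two rejection units listed above (the 3σ criterion and the set maximum/minimum limits) can be sketched in Python as follows. This is an illustration under assumptions, not the implementation of the application: the function names are hypothetical, the 3σ rule is applied in a single pass, and the population standard deviation is used.

```python
from statistics import mean, pstdev
from typing import List, Sequence


def reject_3sigma(values: Sequence[float]) -> List[float]:
    """Keep only values within three standard deviations of the mean."""
    mu = mean(values)
    sigma = pstdev(values)  # assumption: population standard deviation
    return [v for v in values if abs(v - mu) <= 3 * sigma]


def reject_out_of_range(
    values: Sequence[float], set_min: float, set_max: float
) -> List[float]:
    """Reject values above the set maximum or below the set minimum."""
    return [v for v in values if set_min <= v <= set_max]


# Illustrative data: twenty normal readings and one outlier.
readings = [10.0] * 20 + [100.0]
without_outliers = reject_3sigma(readings)
in_range = reject_out_of_range([250.0, 300.0, 450.0, 650.0], 300.0, 600.0)
```

Values exactly equal to the set maximum or set minimum are kept, since only data larger than the maximum or smaller than the minimum are rejected.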
According to the time series data processing device of the thermal power plant, provided by the embodiment of the application, the calculation module 23 may include:
the partitioning unit is used for dividing, for each type of operation parameter in the preprocessed data set, parameter intervals by using Spark SQL according to the maximum value and the minimum value of the corresponding operation data;
the aggregation unit is used for aggregating the parameter intervals corresponding to the different types of operation parameters by using Spark SQL to obtain a plurality of working conditions;
and the calculating unit is used for calculating the optimal operation data combination corresponding to each working condition by using Spark SQL and a multi-objective fuzzy optimization algorithm so as to obtain a calculated data set.
According to an embodiment of the present application, in the time series data processing apparatus for a thermal power plant, the first storage module 22 may include:
and the second storage unit is used for storing the time sequence data in the HDFS in a distributed storage mode.
An embodiment of the present application further provides a thermal power plant time series data processing device. Referring to fig. 3, which shows a schematic structural diagram of the thermal power plant time series data processing device provided in the embodiment of the present application, the device may include:
a memory 31 for storing a computer program;
the processor 32, when executing the computer program stored in the memory 31, may implement the following steps:
acquiring time series data collected by a sensor in thermal power plant equipment; storing the time series data in a distributed file system in a distributed storage manner, and forming a data set in the distributed file system; reading the data set into the Spark computing framework, and calculating the data set by using Spark SQL to obtain a calculated data set; and storing the calculated data set into a Hive database.
An embodiment of the present application further provides a computer-readable storage medium, in which a computer program is stored, and when the computer program is executed by a processor, the following steps may be implemented:
acquiring time series data collected by a sensor in thermal power plant equipment; storing the time series data in a distributed file system in a distributed storage manner, and forming a data set in the distributed file system; reading the data set into the Spark computing framework, and calculating the data set by using Spark SQL to obtain a calculated data set; and storing the calculated data set into a Hive database.
The computer-readable storage medium may include various media capable of storing program code, such as a USB flash drive, a removable hard disk, a read-only memory (ROM), a random access memory (RAM), a magnetic disk, or an optical disk.
For a description of a relevant part in a thermal power plant time series data processing apparatus, a device, and a computer readable storage medium provided in the embodiments of the present application, reference may be made to a detailed description of a corresponding part in a thermal power plant time series data processing method provided in the embodiments of the present application, and details are not repeated here.
It is noted that, herein, relational terms such as first and second are used solely to distinguish one entity or action from another and do not necessarily require or imply any actual relationship or order between such entities or actions. Furthermore, the terms "comprises," "comprising," or any other variation thereof are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements includes not only those elements but may also include other elements not expressly listed or inherent to such process, method, article, or apparatus. Without further limitation, an element defined by the phrase "comprising a …" does not exclude the presence of other identical elements in the process, method, article, or apparatus that comprises the element. In addition, parts of the above technical solutions provided in the embodiments of the present application that are consistent with the implementation principles of corresponding technical solutions in the prior art are not described in detail, so as to avoid redundant description.
The previous description of the disclosed embodiments is provided to enable any person skilled in the art to make or use the present application. Various modifications to these embodiments will be readily apparent to those skilled in the art, and the generic principles defined herein may be applied to other embodiments without departing from the spirit or scope of the application. Thus, the present application is not intended to be limited to the embodiments shown herein but is to be accorded the widest scope consistent with the principles and novel features disclosed herein.

Claims (10)

1. A method for processing time series data of a thermal power plant is characterized by comprising the following steps:
acquiring time sequence data acquired by a sensor in thermal power plant equipment;
storing the time series data in a distributed file system in a distributed storage mode, and forming a data set in the distributed file system;
reading the data set into a Spark computing framework, and calculating the data set by using Spark SQL to obtain a calculated data set;
and storing the calculated data set into a Hive database.
2. The method for processing time series data of a thermal power plant according to claim 1, after acquiring the time series data collected by the sensor in the thermal power plant equipment, further comprising:
storing the time series data in a time series database;
accordingly, storing the time series data in a distributed storage manner in a distributed file system, and forming a data set in the distributed file system, includes:
reading the time sequence data from the time sequence database at intervals of preset time, storing the read time sequence data in the distributed file system in a distributed storage mode, forming a data set in the distributed file system by the time sequence data read for the first time, and adding the time sequence data read except for the first time to the corresponding data set.
3. The thermal power plant time series data processing method as recited in claim 1, further comprising, after reading the data set into the Spark computing framework:
storing the data set in the Spark computing framework in DataFrame form.
4. The thermal power plant time series data processing method according to claim 3, further comprising, before calculating the data set by using Spark SQL:
preprocessing the data set by using Spark SQL.
5. The thermal power plant time series data processing method according to claim 4, wherein preprocessing the data set by using Spark SQL comprises:
performing data elimination on the operation data corresponding to each type of operation parameter in the data set by using a 3 sigma criterion so as to eliminate outlier operation data;
comparing the operation data corresponding to each type of operation parameters with the corresponding set maximum value and set minimum value, and rejecting the operation data larger than the set maximum value and the operation data smaller than the set minimum value;
for each type of the operation parameters, eliminating operation data which are kept unchanged within a first set time length;
and for each type of the operation parameters, removing unstable operation data within a second set time length to obtain a preprocessed data set.
6. The thermal power plant time series data processing method according to claim 5, wherein calculating the data set by using Spark SQL to obtain a calculated data set comprises:
for each type of the operation parameters in the preprocessed data set, dividing parameter intervals by using Spark SQL according to the maximum value and the minimum value of the corresponding operation data;
aggregating the parameter intervals corresponding to the different types of the operation parameters by using Spark SQL to obtain a plurality of working conditions;
and calculating the optimal operation data combination corresponding to each working condition by using Spark SQL and a multi-objective fuzzy optimization algorithm to obtain the calculated data set.
7. The thermal power plant time series data processing method according to claim 1, wherein storing the time series data in a distributed storage manner in a distributed file system comprises:
and storing the time series data in the HDFS in a distributed storage mode.
8. A thermal power plant time series data processing apparatus, comprising:
the acquisition module is used for acquiring time sequence data acquired by a sensor in thermal power plant equipment;
the first storage module is used for storing the time sequence data in a distributed file system in a distributed storage mode and forming a data set in the distributed file system;
the calculation module is used for reading the data set into a Spark computing framework and calculating the data set by using Spark SQL to obtain a calculated data set;
and the second storage module is used for storing the calculated data set into the Hive database.
9. A thermal power plant time series data processing apparatus, comprising:
a memory for storing a computer program;
a processor for implementing the steps of the thermal power plant time series data processing method as claimed in any one of claims 1 to 7 when executing said computer program.
10. A computer-readable storage medium, characterized in that a computer program is stored in the computer-readable storage medium, which computer program, when being executed by a processor, carries out the steps of the method for thermal power plant time series data processing according to any one of claims 1 to 7.
CN202010512753.6A 2020-06-08 2020-06-08 Thermal power plant time sequence data processing method, device, equipment and medium Withdrawn CN111639060A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010512753.6A CN111639060A (en) 2020-06-08 2020-06-08 Thermal power plant time sequence data processing method, device, equipment and medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202010512753.6A CN111639060A (en) 2020-06-08 2020-06-08 Thermal power plant time sequence data processing method, device, equipment and medium

Publications (1)

Publication Number Publication Date
CN111639060A true CN111639060A (en) 2020-09-08

Family

ID=72331189

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010512753.6A Withdrawn CN111639060A (en) 2020-06-08 2020-06-08 Thermal power plant time sequence data processing method, device, equipment and medium

Country Status (1)

Country Link
CN (1) CN111639060A (en)

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112612823A (en) * 2020-12-14 2021-04-06 南京铁道职业技术学院 Big data time sequence analysis method based on fusion of Pyspark and Pandas
CN112700122A (en) * 2020-12-29 2021-04-23 华润电力技术研究院有限公司 Thermodynamic system performance calculation method, device and equipment
CN112835908A (en) * 2021-02-22 2021-05-25 广东数程科技有限公司 Time sequence data storage method, system, storage device and storage medium

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109933620A (en) * 2019-03-18 2019-06-25 上海大学 Thermoelectricity big data method for digging based on Spark
CN110109923A (en) * 2019-04-04 2019-08-09 北京市天元网络技术股份有限公司 Storage method, analysis method and the device of time series data
CN110765154A (en) * 2019-10-16 2020-02-07 华电莱州发电有限公司 Method and device for processing mass real-time generated data of thermal power plant

Similar Documents

Publication Publication Date Title
CN111639060A (en) Thermal power plant time sequence data processing method, device, equipment and medium
CN106528787B (en) query method and device based on multidimensional analysis of mass data
CN102521386B (en) Method for grouping space metadata based on cluster storage
CN102915347A (en) Distributed data stream clustering method and system
CN108388603B (en) Spark framework-based distributed summary data structure construction method and query method
CN107220285A (en) Towards the temporal index construction method of magnanimity track point data
US11442915B2 (en) Methods and systems for extracting and visualizing patterns in large-scale data sets
CN110222029A (en) A kind of big data multidimensional analysis computational efficiency method for improving and system
CN111078634B (en) Distributed space-time data indexing method based on R tree
CN105786996A (en) Electricity information data quality analyzing system
WO2020118928A1 (en) Distributed time sequence pattern retrieval method for massive equipment operation data
CN109783441A (en) Mass data inquiry method based on Bloom Filter
Lei et al. An incremental clustering algorithm based on grid
CN110795469A (en) Spark-based high-dimensional sequence data similarity query method and system
CN104794237A (en) Web page information processing method and device
CN106776810B (en) Big data processing system and method
KR101331350B1 (en) Large-scale, time-series data handling method using data cube
CN112148830A (en) Semantic data storage and retrieval method and device based on maximum area grid
Yu et al. DBWGIE-MR: A density-based clustering algorithm by using the weighted grid and information entropy based on MapReduce
CN108596390B (en) Method for solving vehicle path problem
Moertini et al. Big Data Reduction Technique using Parallel Hierarchical Agglomerative Clustering.
Wang et al. Uncertain top-k query processing in distributed environments
CN111813800A (en) Streaming data real-time approximate calculation method based on deep reinforcement learning
Song et al. Large scale data storage and processing of insulator leakage current using HBase and mapreduce
CN104715031A (en) Outlier division sampling method used in mass data approximate aggregation query

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
WW01 Invention patent application withdrawn after publication

Application publication date: 20200908