CN116560850A - Distributed computing method for realizing digital energy management system - Google Patents

Distributed computing method for realizing digital energy management system

Info

Publication number
CN116560850A
CN116560850A
Authority
CN
China
Prior art keywords
data
memory
spark
task
hdfs
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202310590722.6A
Other languages
Chinese (zh)
Inventor
李启龙
马越
黄晶晶
褚治广
张如燕
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Liaoning Qisheng Technology Group Co ltd
Original Assignee
Liaoning Qisheng Technology Group Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Liaoning Qisheng Technology Group Co ltd filed Critical Liaoning Qisheng Technology Group Co ltd
Priority to CN202310590722.6A priority Critical patent/CN116560850A/en
Publication of CN116560850A publication Critical patent/CN116560850A/en
Pending legal-status Critical Current

Links

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00Arrangements for program control, e.g. control units
    • G06F9/06Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/46Multiprogramming arrangements
    • G06F9/50Allocation of resources, e.g. of the central processing unit [CPU]
    • G06F9/5005Allocation of resources, e.g. of the central processing unit [CPU] to service a request
    • G06F9/5011Allocation of resources, e.g. of the central processing unit [CPU] to service a request the resources being hardware resources other than CPUs, Servers and Terminals
    • G06F9/5016Allocation of resources, e.g. of the central processing unit [CPU] to service a request the resources being hardware resources other than CPUs, Servers and Terminals the resource being the memory
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/10File systems; File servers
    • G06F16/18File system types
    • G06F16/182Distributed file systems
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00Arrangements for program control, e.g. control units
    • G06F9/06Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/46Multiprogramming arrangements
    • G06F9/50Allocation of resources, e.g. of the central processing unit [CPU]
    • G06F9/5083Techniques for rebalancing the load in a distributed system
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00Arrangements for program control, e.g. control units
    • G06F9/06Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/46Multiprogramming arrangements
    • G06F9/54Interprogram communication
    • G06F9/546Message passing systems or structures, e.g. queues
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F2209/00Indexing scheme relating to G06F9/00
    • G06F2209/50Indexing scheme relating to G06F9/50
    • G06F2209/5011Pool
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F2209/00Indexing scheme relating to G06F9/00
    • G06F2209/54Indexing scheme relating to G06F9/54
    • G06F2209/548Queue
    • YGENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02DCLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D10/00Energy efficient computing, e.g. low power processors, power management or thermal management

Abstract

The invention discloses a distributed computing method for realizing a digital energy management system, comprising the following steps: S1, collecting offline data and real-time data to achieve unified aggregation of multi-source data; S2, storing the collected data in the digital energy management system; S3, building a Spark distributed environment and allocating memory according to task requests submitted by the system; S4, computing and analyzing the data with Spark and storing the results in HDFS and HBase; S5, retrieving the data from HDFS and HBase for digital display. By processing data in real time with the in-memory Spark computing framework, the method avoids the server overload and excessively long processing times caused by data accumulation when massive collected data are stored directly in HDFS; multi-source data acquisition ensures the integrity of the collected data and addresses the risk of data loss during collection.

Description

Distributed computing method for realizing digital energy management system
Technical Field
The invention relates to the technical field of distributed computing methods, in particular to a distributed computing method for realizing a digital energy management system.
Background
With the rapid development of industrialization, science and technology, China's energy consumption is growing at an accelerating rate. 5G base stations and data centers are major energy consumers, and more 5G base stations, edge data centers and large data centers will be deployed in the future. Traditional network energy technologies and construction modes can hardly meet the requirements of low-carbon and zero-carbon networks and the sustainable development of operators, which has given rise to the digital energy management system. Such a system digitalizes and adds intelligence to traditional network energy infrastructure, realizing the interconnection, management and scheduling of the entire energy chain from power generation to power consumption, and builds a high-speed data communication network and an efficient energy supply network spanning sites, equipment rooms and data centers. At present, however, most digital energy management systems analyze and compute data on a Hadoop platform. Although Hadoop MapReduce is a powerful data processing engine, its disk-based read/write model leads to excessively long processing times as parallel computing advances, and HDFS is unsuitable for applications requiring millisecond-level low-latency data access and cannot process real-time data.
Disclosure of Invention
The invention aims to provide a distributed computing method for realizing a digital energy management system that processes data in real time with the in-memory Spark computing framework, avoiding the server overload and excessively long processing times caused by data accumulation when massive collected data are stored directly in HDFS; multi-source data acquisition ensures the integrity of the collected data and addresses the risk of data loss during collection.
In order to achieve the above object, the present invention provides a distributed computing method for implementing a digital energy management system, comprising the following steps:
S1, collecting offline data and real-time data to achieve unified aggregation of multi-source data
Based on an integrated Kafka and Flume architecture, real-time energy-consumption change data of various kinds are collected as streams, achieving unified aggregation of offline energy data, real-time energy data and metadata;
S2, storing the collected data in the digital energy management system
Offline energy data are stored in the HDFS distributed file system, and user-related data are stored in a MySQL relational database within the digital energy management system; a set of dedicated Kafka topics serves as a staging area for real-time data; data already stored in the system are restructured into a form consistent with the data stored in HDFS and imported into the HBase column-oriented database via the Sqoop component, and a memory-based secondary index is added to HBase;
S3, building a Spark distributed environment and allocating memory according to task requests submitted by the system
When the Spark distributed environment is built and the energy management system submits a query task, Spark allocates memory resources using the FAIR allocation mechanism, which implements a size-adjustable memory pool shared among tasks within the same SparkContext;
S4, computing and analyzing the data with Spark and storing the results in HDFS and HBase
According to the computing request, a Spark computing task enters a message queue and waits for YARN to allocate server resources automatically; after dequeuing, the data are distributed to Spark RDDs for distributed computation; each RDD supports MapReduce-style operations, which produce new RDDs without modifying the originals; the RDD's dataset is partitioned, and each partition is computed on a different node; the analyzed results are then stored in HDFS and HBase;
S5, retrieving data from HDFS and HBase for digital display
Data are retrieved from HDFS and HBase with the third-party tool Grafana for visualization and large-screen display.
Preferably, in step S3, the FAIR allocation mechanism comprises the following steps:
S31, the Spark program is submitted to Spark in the form of stages; each stage generates a TaskSet recording the tasks of that stage and submits it to the memory pool;
S32, all tasks in the TaskSet are traversed and added to a HashMap named MemoryFortask, which records the current memory occupancy of each task;
S33, Spark records the currently active tasks; if execution memory is insufficient, memory occupied by storage memory is reclaimed, and the allocatable per-task memory range is derived from the maximum memory available after reclamation;
S34, memory is allocated, with the per-task upper limit being the average memory of the active tasks in Spark;
S35, Spark checks the remaining memory; if it does not satisfy the next task's allocation requirement, the memory-pool management thread sleeps until memory resources suffice, after which allocation resumes from step S33.
Preferably, in step S4, the entire computation from RDD to output result forms the main line, and a specific Spark distributed computation comprises the following steps:
S41, dependencies between RDDs are constructed, and the RDDs are converted into a directed acyclic graph of stages;
S42, Spark submits tasks according to the available computing resources and monitors and handles their running state;
S43, Spark builds the task execution environment, executes the tasks and returns their results;
S44, when a wide dependency exists between two stages, a Shuffle operation is performed: because each compute node of a stage processes only part of the task's data in distributed computation, if the next stage depends on all computation results of the previous stage, those results must be re-integrated and regrouped;
S45, the results of steps S43 and S44 are collected and aggregated, and the aggregated computation results are stored in HBase and HDFS.
Therefore, the distributed computing method for realizing the digital energy management system has the following technical effects:
(1) Data are processed in real time with the in-memory Spark computing framework, avoiding the server overload and excessively long processing times caused by data accumulation that occurred when massive collected data were stored directly in the HDFS distributed file system.
(2) Multi-source data acquisition ensures the integrity of the collected data and addresses the previous risk of data loss during acquisition.
The technical scheme of the invention is further described in detail through the drawings and the embodiments.
Drawings
FIG. 1 is a flow chart of a distributed computing method for implementing a digital energy management system in accordance with the present invention.
Detailed Description
The technical scheme of the invention is further described below through the attached drawings and the embodiments.
Unless defined otherwise, technical or scientific terms used herein should be given the ordinary meaning as understood by one of ordinary skill in the art to which this invention belongs.
It will be evident to those skilled in the art that the invention is not limited to the details of the foregoing illustrative embodiments and may be embodied in other specific forms without departing from its spirit or essential characteristics. The embodiments are therefore to be considered in all respects as illustrative and not restrictive, the scope of the invention being indicated by the appended claims rather than by the foregoing description; all changes that come within the meaning and range of equivalency of the claims are intended to be embraced therein.
Furthermore, it should be understood that although this specification is described in terms of embodiments, not every embodiment contains only a single independent technical solution; this manner of description is used solely for clarity, and the embodiments may be combined as appropriate to form other embodiments apparent to those skilled in the art. Such other embodiments are also within the scope of the present invention.
It should also be understood that the above embodiments are only intended to explain the present invention and do not limit its protection scope; equivalent replacement of, or changes to, the technical scheme and its inventive concept made by those familiar with the art within the scope of the invention shall fall within the protection scope of the present invention.
Techniques, methods, and apparatus known to one of ordinary skill in the relevant art may not be discussed in detail, but are intended to be considered part of the specification where appropriate.
The disclosures of the prior art documents cited in the present specification are incorporated by reference in their entirety into the present invention and are therefore part of the present disclosure.
Example 1
As shown in FIG. 1, the invention provides a distributed computing method for realizing a digital energy management system, comprising the following steps:
step 1: based on the integrated architecture of kafka and Flume, real-time streaming acquisition is carried out on various energy consumption change real-time data, and unified collection of multi-source data such as offline energy data, real-time energy data, metadata and the like is realized.
Step 2: Store the collected data. Offline energy data are stored in the HDFS distributed file system, and user-related data are stored in a MySQL relational database within the digital energy management system. For the real-time change data, a set of dedicated Kafka topics serves as a staging area; the data are restructured into a form consistent with the data stored in HDFS and imported into the HBase column-oriented database via the Sqoop component, and a memory-based secondary index is added to HBase to speed up retrieval.
Step 3: Build the Spark distributed environment. When the energy system submits a query task, Spark allocates memory resources. In this example, the FAIR allocation mechanism is used; it implements a size-adjustable memory pool shared among tasks within the same SparkContext, which keeps memory allocation balanced across multiple tasks. The FAIR mechanism proceeds as follows:
step 3-1: the Spark program submits Spark in the form of Stage, each Stage generates a set task recording the task at that Stage and submits the set task to the memory pool.
Step 3-2: traversing all the tasks of the task set, adding the tasks into a HashMap named MemoryFortask, and recording the current memory occupation amount of each task by the HashMap.
Step 3-3: and (3) the Spark records the currently active task, if the execution memory is insufficient at the moment, the memory occupied by the storage memory is recovered, and a range of memory values, which can be allocated to the task, is obtained according to the maximum memory after the storage memory is recovered.
Step 3-4: and allocating the memory, wherein the upper limit of allocation is the average memory of the active tasks in Spark.
Step 3-5: spark checks the remaining memory, if the remaining memory does not meet the next task memory allocation requirement, the memory pool management thread sleeps until the memory resources meet the task memory allocation requirement. If yes, continuing to execute the memory allocation from the step 3-3.
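Steps 3-1 to 3-5 can be sketched as a simplified, single-threaded memory pool. This is an illustrative approximation only, not Spark's actual implementation: the real mechanism is concurrent, reclaims storage memory as in step 3-3, and blocks waiting threads rather than returning zero. The dictionary `memory_for_task` mirrors the HashMap of step 3-2, and the per-task cap is the pool averaged over active tasks as in step 3-4.

```python
# Simplified fair memory pool: each active task may hold at most
# pool_size / (number of active tasks) units of memory.
class FairMemoryPool:
    def __init__(self, pool_size):
        self.pool_size = pool_size
        self.memory_for_task = {}          # task id -> memory currently held

    def allocate(self, task_id, request):
        self.memory_for_task.setdefault(task_id, 0)
        # step 3-4: per-task upper limit is the pool averaged over active tasks
        cap = self.pool_size // len(self.memory_for_task)
        free = self.pool_size - sum(self.memory_for_task.values())
        grant = max(0, min(request, cap - self.memory_for_task[task_id], free))
        self.memory_for_task[task_id] += grant
        return grant                        # 0 means the caller must wait (step 3-5)

    def release(self, task_id):
        return self.memory_for_task.pop(task_id, 0)

pool = FairMemoryPool(1000)
print(pool.allocate("t1", 900))  # -> 900: only one active task, cap is 1000
print(pool.allocate("t2", 900))  # -> 100: cap drops to 500, but only 100 is free
```

The second call shows the balancing effect: once a second task becomes active, the first task's earlier grant limits what remains, and the newcomer receives only the free memory until releases occur.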
Step 4: Spark computes, counts and analyzes the received data. According to the computing request, a Spark computing task enters a message queue and waits for YARN to allocate server resources automatically; after dequeuing, the data are distributed to Spark RDDs for distributed computation. Each RDD supports MapReduce-style operations, which produce new RDDs without modifying the originals; the RDD's dataset is partitioned, and each partition is computed on a different node. The analyzed results are then stored in HDFS and HBase.
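The RDD behaviour just described, partitioned data plus transformations that yield new RDDs while leaving the originals untouched, can be mimicked in a few lines of plain Python. This is a conceptual sketch, not the Spark API; `partition` and `rdd_map` are illustrative names.

```python
# A dataset is split into partitions; a map is applied per partition; the
# result is a NEW collection of partitions, and the original is never modified.
def partition(data, n):
    return [data[i::n] for i in range(n)]   # round-robin split into n partitions

def rdd_map(partitions, fn):
    # each partition could run on a different node; here they run sequentially
    return [[fn(x) for x in part] for part in partitions]

readings = partition([1.0, 2.0, 3.0, 4.0], 2)
doubled = rdd_map(readings, lambda kwh: kwh * 2)
print(doubled)    # [[2.0, 6.0], [4.0, 8.0]]  (a new "RDD")
print(readings)   # [[1.0, 3.0], [2.0, 4.0]]  (the original is unchanged)
```

This immutability is what lets Spark recompute a lost partition from its lineage instead of checkpointing every intermediate result.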
In this step, the entire computation from RDD to output result forms the main line; a specific Spark distributed computation proceeds as follows:
Step 4-1: Dependencies between RDDs are constructed, and the RDDs are converted into a directed acyclic graph of stages.
Step 4-2: Spark submits tasks according to the available computing resources and monitors and handles their running state.
Step 4-3: Spark builds the task execution environment, executes the tasks and returns their results.
Step 4-4: When a wide dependency exists between two stages, a Shuffle operation is performed: because each compute node of a stage processes only part of the task's data in distributed computation, if the next stage depends on all computation results of the previous stage, those results must be re-integrated and regrouped.
Step 4-5: The results of steps 4-3 and 4-4 are collected and aggregated, and the aggregated computation results are stored in HBase and HDFS.
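The Shuffle of step 4-4 can be sketched as regrouping the previous stage's partial results by key across output partitions. This is plain Python; the `shuffle` function and the meter-reading records are illustrative, and a real Spark shuffle additionally sorts, spills to disk and transfers blocks over the network.

```python
from collections import defaultdict

# Wide dependency: the next stage needs ALL records for each key, so the
# previous stage's partial results are rerouted by key across partitions.
def shuffle(partitions, key_fn, num_out):
    out = [defaultdict(list) for _ in range(num_out)]
    for part in partitions:                 # partial results of the prior stage
        for record in part:
            k = key_fn(record)
            out[hash(k) % num_out][k].append(record)  # route by key
    return [dict(d) for d in out]

# two map-side partitions, with records for "meterA" split across them
stage1 = [[("meterA", 1), ("meterB", 2)], [("meterA", 3)]]
stage2_input = shuffle(stage1, lambda r: r[0], 2)
# after the shuffle, every "meterA" record lives in exactly one partition,
# so the next stage can aggregate per meter without cross-node coordination
```

Routing by `hash(key) % num_out` guarantees that all records with the same key land in the same output partition, which is exactly the property a key-based aggregation in the next stage relies on.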
Step 5: The third-party tool Grafana retrieves data from HDFS and HBase for visualization and large-screen display, realizing the functions of the digital energy management system and making energy management digital and intelligent.
Therefore, the distributed computing method for realizing the digital energy management system processes data in real time with the in-memory Spark computing framework, avoiding the server overload and excessively long processing times caused by data accumulation when massive collected data are stored directly in HDFS; multi-source data acquisition ensures the integrity of the collected data and addresses the risk of data loss during collection.
Finally, it should be noted that the above embodiments are intended only to illustrate, not to limit, the technical scheme of the present invention. Although the present invention has been described in detail with reference to the preferred embodiments, those skilled in the art will understand that the technical scheme may be modified or equivalently replaced without departing from the spirit and scope of the technical scheme of the invention.

Claims (3)

1. A distributed computing method for realizing a digital energy management system, characterized by comprising the following steps:
S1, collecting offline data and real-time data to achieve unified aggregation of multi-source data: based on an integrated Kafka and Flume architecture, real-time energy-consumption change data of various kinds are collected as streams, achieving unified aggregation of offline energy data, real-time energy data and metadata;
S2, storing the collected data in the digital energy management system
Offline energy data are stored in the HDFS distributed file system, and user-related data are stored in a MySQL relational database within the digital energy management system; a set of dedicated Kafka topics serves as a staging area for real-time data; data already stored in the system are restructured into a form consistent with the data stored in HDFS and imported into the HBase column-oriented database via the Sqoop component, and a memory-based secondary index is added to HBase;
S3, building a Spark distributed environment and allocating memory according to task requests submitted by the system
When the Spark distributed environment is built and the energy management system submits a query task, Spark allocates memory resources using the FAIR allocation mechanism, which implements a size-adjustable memory pool shared among tasks within the same SparkContext;
S4, computing and analyzing the data with Spark and storing the results in HDFS and HBase
According to the computing request, a Spark computing task enters a message queue and waits for YARN to allocate server resources automatically; after dequeuing, the data are distributed to Spark RDDs for distributed computation; each RDD supports MapReduce-style operations, which produce new RDDs without modifying the originals; the RDD's dataset is partitioned, and each partition is computed on a different node; the analyzed results are then stored in HDFS and HBase;
S5, retrieving data from HDFS and HBase for digital display
Data are retrieved from HDFS and HBase with the third-party tool Grafana for visualization and large-screen display.
2. The distributed computing method for realizing a digital energy management system according to claim 1, characterized in that in step S3, the FAIR allocation mechanism comprises the following steps:
S31, the Spark program is submitted to Spark in the form of stages; each stage generates a TaskSet recording the tasks of that stage and submits it to the memory pool;
S32, all tasks in the TaskSet are traversed and added to a HashMap named MemoryFortask, which records the current memory occupancy of each task;
S33, Spark records the currently active tasks; if execution memory is insufficient, memory occupied by storage memory is reclaimed, and the allocatable per-task memory range is derived from the maximum memory available after reclamation;
S34, memory is allocated, with the per-task upper limit being the average memory of the active tasks in Spark;
S35, Spark checks the remaining memory; if it does not satisfy the next task's allocation requirement, the memory-pool management thread sleeps until memory resources suffice, after which allocation resumes from step S33.
3. The distributed computing method for realizing a digital energy management system according to claim 1, characterized in that in step S4, the entire computation from RDD to output result forms the main line, and a specific Spark distributed computation comprises the following steps:
S41, dependencies between RDDs are constructed, and the RDDs are converted into a directed acyclic graph of stages;
S42, Spark submits tasks according to the available computing resources and monitors and handles their running state;
S43, Spark builds the task execution environment, executes the tasks and returns their results;
S44, when a wide dependency exists between two stages, a Shuffle operation is performed: because each compute node of a stage processes only part of the task's data in distributed computation, if the next stage depends on all computation results of the previous stage, those results must be re-integrated and regrouped;
S45, the results of steps S43 and S44 are collected and aggregated, and the aggregated computation results are stored in HBase and HDFS.
CN202310590722.6A 2023-05-24 2023-05-24 Distributed computing method for realizing digital energy management system Pending CN116560850A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202310590722.6A CN116560850A (en) 2023-05-24 2023-05-24 Distributed computing method for realizing digital energy management system

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202310590722.6A CN116560850A (en) 2023-05-24 2023-05-24 Distributed computing method for realizing digital energy management system

Publications (1)

Publication Number Publication Date
CN116560850A true CN116560850A (en) 2023-08-08

Family

ID=87501670

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202310590722.6A Pending CN116560850A (en) 2023-05-24 2023-05-24 Distributed computing method for realizing digital energy management system

Country Status (1)

Country Link
CN (1) CN116560850A (en)


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination