CN116560850A - Distributed computing method for realizing digital energy management system - Google Patents

Distributed computing method for realizing digital energy management system

Info

Publication number
CN116560850A
CN116560850A
Authority
CN
China
Prior art keywords
data
memory
spark
task
hdfs
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202310590722.6A
Other languages
Chinese (zh)
Inventor
李启龙
马越
黄晶晶
褚治广
张如燕
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Liaoning Qisheng Technology Group Co ltd
Original Assignee
Liaoning Qisheng Technology Group Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Liaoning Qisheng Technology Group Co ltd filed Critical Liaoning Qisheng Technology Group Co ltd
Priority to CN202310590722.6A priority Critical patent/CN116560850A/en
Publication of CN116560850A publication Critical patent/CN116560850A/en
Pending legal-status Critical Current

Links

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00Arrangements for program control, e.g. control units
    • G06F9/06Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/46Multiprogramming arrangements
    • G06F9/50Allocation of resources, e.g. of the central processing unit [CPU]
    • G06F9/5005Allocation of resources, e.g. of the central processing unit [CPU] to service a request
    • G06F9/5011Allocation of resources, e.g. of the central processing unit [CPU] to service a request the resources being hardware resources other than CPUs, Servers and Terminals
    • G06F9/5016Allocation of resources, e.g. of the central processing unit [CPU] to service a request the resources being hardware resources other than CPUs, Servers and Terminals the resource being the memory
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/10File systems; File servers
    • G06F16/18File system types
    • G06F16/182Distributed file systems
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00Arrangements for program control, e.g. control units
    • G06F9/06Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/46Multiprogramming arrangements
    • G06F9/50Allocation of resources, e.g. of the central processing unit [CPU]
    • G06F9/5083Techniques for rebalancing the load in a distributed system
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00Arrangements for program control, e.g. control units
    • G06F9/06Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/46Multiprogramming arrangements
    • G06F9/54Interprogram communication
    • G06F9/546Message passing systems or structures, e.g. queues
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F2209/00Indexing scheme relating to G06F9/00
    • G06F2209/50Indexing scheme relating to G06F9/50
    • G06F2209/5011Pool
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F2209/00Indexing scheme relating to G06F9/00
    • G06F2209/54Indexing scheme relating to G06F9/54
    • G06F2209/548Queue
    • YGENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02DCLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D10/00Energy efficient computing, e.g. low power processors, power management or thermal management

Abstract

The invention discloses a distributed computing method for realizing a digital energy management system, comprising the following steps: S1, collecting offline data and real-time data to achieve unified aggregation of multi-source data; S2, storing the collected data in the digital energy management system; S3, building a Spark distributed environment and allocating memory according to task requests submitted by the system; S4, computing and analyzing the data with Spark and storing the results in HDFS and HBase; S5, retrieving the data from HDFS and HBase for digital display. By processing data in real time with the in-memory Spark computing framework, the method avoids the server overload and excessively long processing times caused by data accumulation when massive collected data are stored directly in HDFS; multi-source data acquisition ensures the integrity of the collected data and addresses the risk of data loss during collection.

Description

Distributed computing method for realizing digital energy management system
Technical Field
The invention relates to the technical field of distributed computing methods, in particular to a distributed computing method for realizing a digital energy management system.
Background
With the rapid development of industrialization, science and technology, China's energy consumption is growing at an accelerating rate. 5G base stations and data centers are major energy consumers, and more 5G base stations, edge data centers and large data centers will be deployed in the future. Traditional network energy technologies and construction modes can hardly meet the requirements of low-carbon and zero-carbon networks and the sustainable development of operators, which has given rise to the digital energy management system. Such a system digitalizes and adds intelligence to traditional network energy infrastructure, realizing the interconnection, management and scheduling of the entire energy chain from power generation to power consumption, and builds a high-speed data communication network and an efficient energy supply network spanning sites, equipment rooms and data centers. At present, however, most digital energy management systems analyze and compute data on a Hadoop platform. Although Hadoop MapReduce is a powerful data processing engine, its disk-based read/write model leads to excessively long processing times as parallel computing advances, and HDFS is unsuitable for applications requiring millisecond-level low-latency data access and cannot process real-time data.
Disclosure of Invention
The invention aims to provide a distributed computing method for realizing a digital energy management system that processes data in real time with the in-memory Spark computing framework, avoiding the server overload and excessively long processing times caused by data accumulation when massive collected data are stored directly in HDFS; multi-source data acquisition ensures the integrity of the collected data and addresses the risk of data loss during collection.
In order to achieve the above object, the present invention provides a distributed computing method for implementing a digital energy management system, comprising the following steps:
S1, collecting offline data and real-time data to achieve unified aggregation of multi-source data
Based on an integrated Kafka and Flume architecture, real-time energy-consumption change data of various kinds are collected as streams, achieving unified aggregation of offline energy data, real-time energy data and metadata;
S2, storing the collected data in the digital energy management system
Offline energy data are stored in the HDFS distributed file system, and user-related data are stored in a MySQL relational database within the digital energy management system; a set of dedicated Kafka topics serves as a staging area for real-time data; data already stored in the system are restructured into a form consistent with the data stored in HDFS and imported into the HBase column-oriented database via the Sqoop component, and a memory-based secondary index is added to HBase;
S3, building a Spark distributed environment and allocating memory according to task requests submitted by the system
When the Spark distributed environment is built and the energy management system submits a query task, Spark allocates memory resources using the FAIR allocation mechanism, which implements a size-adjustable memory pool shared among tasks within the same SparkContext;
S4, computing and analyzing the data with Spark and storing the results in HDFS and HBase
According to the computing request, a Spark computing task enters a message queue and waits for YARN to allocate server resources automatically; after dequeuing, the data are distributed to Spark RDDs for distributed computation; each RDD supports MapReduce-style operations, which produce new RDDs without modifying the originals; the RDD's dataset is partitioned, and each partition is computed on a different node; the analyzed results are then stored in HDFS and HBase;
S5, retrieving data from HDFS and HBase for digital display
Data are retrieved from HDFS and HBase with the third-party tool Grafana for visualization and large-screen display.
Preferably, in step S3, the FAIR allocation mechanism comprises the following steps:
S31, the Spark program is submitted to Spark in the form of stages; each stage generates a TaskSet recording the tasks of that stage and submits it to the memory pool;
S32, all tasks in the TaskSet are traversed and added to a HashMap named MemoryFortask, which records the current memory occupancy of each task;
S33, Spark records the currently active tasks; if execution memory is insufficient, memory occupied by storage memory is reclaimed, and the allocatable per-task memory range is derived from the maximum memory available after reclamation;
S34, memory is allocated, with the per-task upper limit being the average memory of the active tasks in Spark;
S35, Spark checks the remaining memory; if it does not satisfy the next task's allocation requirement, the memory-pool management thread sleeps until memory resources suffice, after which allocation resumes from step S33.
Preferably, in step S4, the entire computation from RDD to output result forms the main line, and a specific Spark distributed computation comprises the following steps:
S41, dependencies between RDDs are constructed, and the RDDs are converted into a directed acyclic graph of stages;
S42, Spark submits tasks according to the available computing resources and monitors and handles their running state;
S43, Spark builds the task execution environment, executes the tasks and returns their results;
S44, when a wide dependency exists between two stages, a Shuffle operation is performed: because each compute node of a stage processes only part of the task's data in distributed computation, if the next stage depends on all computation results of the previous stage, those results must be re-integrated and regrouped;
S45, the results of steps S43 and S44 are collected and aggregated, and the aggregated computation results are stored in HBase and HDFS.
Therefore, the distributed computing method for realizing the digital energy management system has the following technical effects:
(1) Data are processed in real time with the in-memory Spark computing framework, avoiding the server overload and excessively long processing times caused by data accumulation that occurred when massive collected data were stored directly in the HDFS distributed file system.
(2) Multi-source data acquisition ensures the integrity of the collected data and addresses the previous risk of data loss during acquisition.
The technical scheme of the invention is further described in detail through the drawings and the embodiments.
Drawings
FIG. 1 is a flow chart of a distributed computing method for implementing a digital energy management system in accordance with the present invention.
Detailed Description
The technical scheme of the invention is further described below through the attached drawings and the embodiments.
Unless defined otherwise, technical or scientific terms used herein should be given the ordinary meaning as understood by one of ordinary skill in the art to which this invention belongs.
It will be evident to those skilled in the art that the invention is not limited to the details of the foregoing illustrative embodiments and may be embodied in other specific forms without departing from its spirit or essential characteristics. The embodiments are therefore to be considered in all respects as illustrative and not restrictive, the scope of the invention being indicated by the appended claims rather than by the foregoing description; all changes that come within the meaning and range of equivalency of the claims are intended to be embraced therein.
Furthermore, it should be understood that although this specification is described in terms of embodiments, not every embodiment contains only a single independent technical solution; this manner of description is used solely for clarity, and the embodiments may be combined as appropriate to form other embodiments apparent to those skilled in the art. Such other embodiments are also within the scope of the present invention.
It should also be understood that the above embodiments are only intended to explain the present invention and do not limit its protection scope; equivalent replacement of, or changes to, the technical scheme and its inventive concept made by those familiar with the art within the scope of the invention shall fall within the protection scope of the present invention.
Techniques, methods, and apparatus known to one of ordinary skill in the relevant art may not be discussed in detail, but are intended to be considered part of the specification where appropriate.
The disclosures of the prior art documents cited in the present specification are incorporated by reference in their entirety into the present invention and are therefore part of the present disclosure.
Example 1
As shown in FIG. 1, the invention provides a distributed computing method for realizing a digital energy management system, comprising the following steps:
step 1: based on the integrated architecture of kafka and Flume, real-time streaming acquisition is carried out on various energy consumption change real-time data, and unified collection of multi-source data such as offline energy data, real-time energy data, metadata and the like is realized.
Step 2: Store the collected data. Offline energy data are stored in the HDFS distributed file system, and user-related data are stored in a MySQL relational database within the digital energy management system. For the real-time change data, a set of dedicated Kafka topics serves as a staging area; the data are restructured into a form consistent with the data stored in HDFS and imported into the HBase column-oriented database via the Sqoop component, and a memory-based secondary index is added to HBase to speed up retrieval.
Step 3: Build the Spark distributed environment. When the energy system submits a query task, Spark allocates memory resources. In this example, the FAIR allocation mechanism is used; it implements a size-adjustable memory pool shared among tasks within the same SparkContext, which keeps memory allocation balanced across multiple tasks. The FAIR mechanism proceeds as follows:
step 3-1: the Spark program submits Spark in the form of Stage, each Stage generates a set task recording the task at that Stage and submits the set task to the memory pool.
Step 3-2: traversing all the tasks of the task set, adding the tasks into a HashMap named MemoryFortask, and recording the current memory occupation amount of each task by the HashMap.
Step 3-3: and (3) the Spark records the currently active task, if the execution memory is insufficient at the moment, the memory occupied by the storage memory is recovered, and a range of memory values, which can be allocated to the task, is obtained according to the maximum memory after the storage memory is recovered.
Step 3-4: and allocating the memory, wherein the upper limit of allocation is the average memory of the active tasks in Spark.
Step 3-5: spark checks the remaining memory, if the remaining memory does not meet the next task memory allocation requirement, the memory pool management thread sleeps until the memory resources meet the task memory allocation requirement. If yes, continuing to execute the memory allocation from the step 3-3.
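Steps 3-1 to 3-5 can be sketched as a simplified, single-threaded memory pool. This is an illustrative approximation only, not Spark's actual implementation: the real mechanism is concurrent, reclaims storage memory as in step 3-3, and blocks waiting threads rather than returning zero. The dictionary `memory_for_task` mirrors the HashMap of step 3-2, and the per-task cap is the pool averaged over active tasks as in step 3-4.

```python
# Simplified fair memory pool: each active task may hold at most
# pool_size / (number of active tasks) units of memory.
class FairMemoryPool:
    def __init__(self, pool_size):
        self.pool_size = pool_size
        self.memory_for_task = {}          # task id -> memory currently held

    def allocate(self, task_id, request):
        self.memory_for_task.setdefault(task_id, 0)
        # step 3-4: per-task upper limit is the pool averaged over active tasks
        cap = self.pool_size // len(self.memory_for_task)
        free = self.pool_size - sum(self.memory_for_task.values())
        grant = max(0, min(request, cap - self.memory_for_task[task_id], free))
        self.memory_for_task[task_id] += grant
        return grant                        # 0 means the caller must wait (step 3-5)

    def release(self, task_id):
        return self.memory_for_task.pop(task_id, 0)

pool = FairMemoryPool(1000)
print(pool.allocate("t1", 900))  # -> 900: only one active task, cap is 1000
print(pool.allocate("t2", 900))  # -> 100: cap drops to 500, but only 100 is free
```

The second call shows the balancing effect: once a second task becomes active, the first task's earlier grant limits what remains, and the newcomer receives only the free memory until releases occur.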
Step 4: Spark computes, counts and analyzes the received data. According to the computing request, a Spark computing task enters a message queue and waits for YARN to allocate server resources automatically; after dequeuing, the data are distributed to Spark RDDs for distributed computation. Each RDD supports MapReduce-style operations, which produce new RDDs without modifying the originals; the RDD's dataset is partitioned, and each partition is computed on a different node. The analyzed results are then stored in HDFS and HBase.
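The RDD behaviour just described, partitioned data plus transformations that yield new RDDs while leaving the originals untouched, can be mimicked in a few lines of plain Python. This is a conceptual sketch, not the Spark API; `partition` and `rdd_map` are illustrative names.

```python
# A dataset is split into partitions; a map is applied per partition; the
# result is a NEW collection of partitions, and the original is never modified.
def partition(data, n):
    return [data[i::n] for i in range(n)]   # round-robin split into n partitions

def rdd_map(partitions, fn):
    # each partition could run on a different node; here they run sequentially
    return [[fn(x) for x in part] for part in partitions]

readings = partition([1.0, 2.0, 3.0, 4.0], 2)
doubled = rdd_map(readings, lambda kwh: kwh * 2)
print(doubled)    # [[2.0, 6.0], [4.0, 8.0]]  (a new "RDD")
print(readings)   # [[1.0, 3.0], [2.0, 4.0]]  (the original is unchanged)
```

This immutability is what lets Spark recompute a lost partition from its lineage instead of checkpointing every intermediate result.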
In this step, the entire computation from RDD to output result forms the main line; a specific Spark distributed computation proceeds as follows:
Step 4-1: Dependencies between RDDs are constructed, and the RDDs are converted into a directed acyclic graph of stages.
Step 4-2: Spark submits tasks according to the available computing resources and monitors and handles their running state.
Step 4-3: Spark builds the task execution environment, executes the tasks and returns their results.
Step 4-4: When a wide dependency exists between two stages, a Shuffle operation is performed: because each compute node of a stage processes only part of the task's data in distributed computation, if the next stage depends on all computation results of the previous stage, those results must be re-integrated and regrouped.
Step 4-5: The results of steps 4-3 and 4-4 are collected and aggregated, and the aggregated computation results are stored in HBase and HDFS.
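The Shuffle of step 4-4 can be sketched as regrouping the previous stage's partial results by key across output partitions. This is plain Python; the `shuffle` function and the meter-reading records are illustrative, and a real Spark shuffle additionally sorts, spills to disk and transfers blocks over the network.

```python
from collections import defaultdict

# Wide dependency: the next stage needs ALL records for each key, so the
# previous stage's partial results are rerouted by key across partitions.
def shuffle(partitions, key_fn, num_out):
    out = [defaultdict(list) for _ in range(num_out)]
    for part in partitions:                 # partial results of the prior stage
        for record in part:
            k = key_fn(record)
            out[hash(k) % num_out][k].append(record)  # route by key
    return [dict(d) for d in out]

# two map-side partitions, with records for "meterA" split across them
stage1 = [[("meterA", 1), ("meterB", 2)], [("meterA", 3)]]
stage2_input = shuffle(stage1, lambda r: r[0], 2)
# after the shuffle, every "meterA" record lives in exactly one partition,
# so the next stage can aggregate per meter without cross-node coordination
```

Routing by `hash(key) % num_out` guarantees that all records with the same key land in the same output partition, which is exactly the property a key-based aggregation in the next stage relies on.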
Step 5: The third-party tool Grafana retrieves data from HDFS and HBase for visualization and large-screen display, realizing the functions of the digital energy management system and making energy management digital and intelligent.
Therefore, the distributed computing method for realizing the digital energy management system processes data in real time with the in-memory Spark computing framework, avoiding the server overload and excessively long processing times caused by data accumulation when massive collected data are stored directly in HDFS; multi-source data acquisition ensures the integrity of the collected data and addresses the risk of data loss during collection.
Finally, it should be noted that the above embodiments are intended only to illustrate, not to limit, the technical scheme of the present invention. Although the present invention has been described in detail with reference to the preferred embodiments, those skilled in the art will understand that the technical scheme may be modified or equivalently replaced without departing from the spirit and scope of the technical scheme of the invention.

Claims (3)

1. A distributed computing method for realizing a digital energy management system, characterized by comprising the following steps:
S1, collecting offline data and real-time data to achieve unified aggregation of multi-source data: based on an integrated Kafka and Flume architecture, real-time energy-consumption change data of various kinds are collected as streams, achieving unified aggregation of offline energy data, real-time energy data and metadata;
S2, storing the collected data in the digital energy management system
Offline energy data are stored in the HDFS distributed file system, and user-related data are stored in a MySQL relational database within the digital energy management system; a set of dedicated Kafka topics serves as a staging area for real-time data; data already stored in the system are restructured into a form consistent with the data stored in HDFS and imported into the HBase column-oriented database via the Sqoop component, and a memory-based secondary index is added to HBase;
S3, building a Spark distributed environment and allocating memory according to task requests submitted by the system
When the Spark distributed environment is built and the energy management system submits a query task, Spark allocates memory resources using the FAIR allocation mechanism, which implements a size-adjustable memory pool shared among tasks within the same SparkContext;
S4, computing and analyzing the data with Spark and storing the results in HDFS and HBase
According to the computing request, a Spark computing task enters a message queue and waits for YARN to allocate server resources automatically; after dequeuing, the data are distributed to Spark RDDs for distributed computation; each RDD supports MapReduce-style operations, which produce new RDDs without modifying the originals; the RDD's dataset is partitioned, and each partition is computed on a different node; the analyzed results are then stored in HDFS and HBase;
S5, retrieving data from HDFS and HBase for digital display
Data are retrieved from HDFS and HBase with the third-party tool Grafana for visualization and large-screen display.
2. The distributed computing method for realizing a digital energy management system according to claim 1, characterized in that in step S3, the FAIR allocation mechanism comprises the following steps:
S31, the Spark program is submitted to Spark in the form of stages; each stage generates a TaskSet recording the tasks of that stage and submits it to the memory pool;
S32, all tasks in the TaskSet are traversed and added to a HashMap named MemoryFortask, which records the current memory occupancy of each task;
S33, Spark records the currently active tasks; if execution memory is insufficient, memory occupied by storage memory is reclaimed, and the allocatable per-task memory range is derived from the maximum memory available after reclamation;
S34, memory is allocated, with the per-task upper limit being the average memory of the active tasks in Spark;
S35, Spark checks the remaining memory; if it does not satisfy the next task's allocation requirement, the memory-pool management thread sleeps until memory resources suffice, after which allocation resumes from step S33.
3. The distributed computing method for realizing a digital energy management system according to claim 1, characterized in that in step S4, the entire computation from RDD to output result forms the main line, and a specific Spark distributed computation comprises the following steps:
S41, dependencies between RDDs are constructed, and the RDDs are converted into a directed acyclic graph of stages;
S42, Spark submits tasks according to the available computing resources and monitors and handles their running state;
S43, Spark builds the task execution environment, executes the tasks and returns their results;
S44, when a wide dependency exists between two stages, a Shuffle operation is performed: because each compute node of a stage processes only part of the task's data in distributed computation, if the next stage depends on all computation results of the previous stage, those results must be re-integrated and regrouped;
S45, the results of steps S43 and S44 are collected and aggregated, and the aggregated computation results are stored in HBase and HDFS.
CN202310590722.6A 2023-05-24 2023-05-24 Distributed computing method for realizing digital energy management system Pending CN116560850A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202310590722.6A CN116560850A (en) 2023-05-24 2023-05-24 Distributed computing method for realizing digital energy management system

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202310590722.6A CN116560850A (en) 2023-05-24 2023-05-24 Distributed computing method for realizing digital energy management system

Publications (1)

Publication Number Publication Date
CN116560850A true CN116560850A (en) 2023-08-08

Family

ID=87501670

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202310590722.6A Pending CN116560850A (en) 2023-05-24 2023-05-24 Distributed computing method for realizing digital energy management system

Country Status (1)

Country Link
CN (1) CN116560850A (en)


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination