CN113806606A

CN113806606A - Three-dimensional scene-based electric power big data rapid visual analysis method and system

Info

Publication number: CN113806606A
Application number: CN202111046249.2A
Authority: CN
Inventors: 高菘; 姚明亮; 张龙浩; 付恩狄; 莫理; 梁宇柔; 刘永辉; 李德华; 胡道平; 陈远政
Original assignee: Information Communication Branch of Peak Regulation and Frequency Modulation Power Generation of China Southern Power Grid Co Ltd
Current assignee: Information Communication Branch of Peak Regulation and Frequency Modulation Power Generation of China Southern Power Grid Co Ltd
Priority date: 2021-09-07
Filing date: 2021-09-07
Publication date: 2021-12-17

Abstract

The invention relates to the technical field of big data analysis and discloses a method and a system for rapid visual analysis of electric power big data based on a three-dimensional scene. The visual analysis method comprises the following processing flow: S1: collecting data, serializing the data into the big data file system HDFS, and at the same time persisting the data into the database HBase; S2: determining which scheme is to be used to mine and analyze the data, and applying the corresponding intelligent algorithm; S3: mapping the output result set to a visualization module to form graphic information, the visualization engine integrating the result set with the scene and outputting it to the user in the form of a three-dimensional spatial field visualization. By integrating the big data ecosystem with the visualization module, the method and system bring the graphics computation of the three-dimensional scene and the numerical computation of data analysis into the same framework and make the storage mode and the computation mode common to both, thereby enabling multi-service collaborative visual analysis.

Description

Three-dimensional scene-based electric power big data rapid visual analysis method and system

Technical Field

The invention relates to the technical field of big data analysis, and in particular to a method and a system for rapid visual analysis of electric power big data based on a three-dimensional scene.

Background

Big data is a discipline that has emerged in recent years. Research in this field began only recently, and in-depth research and mature applications are still lacking in the field of visual analysis of electric power big data.

Applying rapid data analysis and rapid scene rendering methods to electric power big data requires an integrated platform dedicated to the power system. Existing big data application systems have no framework designed specifically for the power system, and the existing national power grid service systems have no applications designed specifically for big data. As a result, the various data and services are separated from one another, and users cannot carry out visual analysis on a unified platform. Processing different data on different platforms and then trying to integrate them into a visual analysis system not only inconveniences users but also makes it impossible to obtain and visually display the correlations among the various data during data mining.

Disclosure of Invention

In view of the deficiencies of the prior art, the invention provides a method and a system for rapid visual analysis of electric power big data based on a three-dimensional scene, and proposes an integrated model for visual analysis of electric power big data. The model integrates the big data ecosystem with the visualization module, so that the graphics computation of the three-dimensional scene and the numerical computation of data analysis are brought into the same framework and the storage mode and the computation mode are common to both. Multi-service collaborative visual analysis is thereby achieved without the traditional approach of first mining and exporting the data and then importing it into a visualization system for analysis.

To achieve the above purpose, the invention provides the following technical solution: a method for rapid visual analysis of electric power big data based on a three-dimensional scene, in which the processing flow of a visual analysis task is as follows:

Step one: collecting data, serializing the data into the big data file system HDFS, and at the same time persisting the data into the database HBase;

Step two: determining which scheme is to be used to mine and analyze the data, and applying the corresponding intelligent algorithm;

Step three: mapping the output result set to a visualization module to form graphic information, the visualization engine integrating the result set with the scene and outputting it to the user in the form of a three-dimensional spatial field visualization.
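As a rough illustration only, the three steps above might be sketched in Python as follows. The sketch uses in-memory stand-ins and calls no real HDFS or HBase client; the names `InMemoryStore`, `collect`, `analyze` and `to_graphics`, the file path, the record fields and the two analysis schemes are all hypothetical and are not taken from the invention.

```python
import json
from dataclasses import dataclass, field
from statistics import mean


@dataclass
class InMemoryStore:
    """Hypothetical stand-in for HDFS (serialized files) and HBase (rows)."""
    files: dict = field(default_factory=dict)   # path -> serialized bytes
    rows: dict = field(default_factory=dict)    # row key -> record


def collect(store, records):
    """Step one: serialize the records to the file store and persist each row."""
    store.files["/power/readings.json"] = json.dumps(records).encode("utf-8")
    for rec in records:
        store.rows[f"{rec['station']}:{rec['t']}"] = rec


def analyze(store, scheme):
    """Step two: choose an analysis scheme and run the matching algorithm."""
    schemes = {"mean_load": mean, "peak_load": max}   # assumed example schemes
    algorithm = schemes[scheme]
    by_station = {}
    for rec in store.rows.values():
        by_station.setdefault(rec["station"], []).append(rec["load_mw"])
    return {station: algorithm(vals) for station, vals in by_station.items()}


def to_graphics(result, scene_positions):
    """Step three: map the result set onto scene objects as graphic information."""
    return [
        {"station": s, "position": scene_positions[s], "bar_height": v}
        for s, v in sorted(result.items())
    ]


store = InMemoryStore()
collect(store, [
    {"station": "A", "t": 0, "load_mw": 10.0},
    {"station": "A", "t": 1, "load_mw": 14.0},
    {"station": "B", "t": 0, "load_mw": 7.0},
])
result = analyze(store, "mean_load")
graphics = to_graphics(result, {"A": (0, 0, 0), "B": (5, 0, 2)})
```

In this sketch the analysis reads from the same store that the collection step wrote to, which mirrors the idea that storage is shared between data analysis and visualization rather than exported and re-imported.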

The visual analysis system has the following functional modules:

(1) a service module: provides a high-level abstract interface to fulfil the service requirements of the user layer;

(2) a visualization engine: as the core subsystem of the model, implements the method for integrating data with scenes and renders large-scale three-dimensional scenes quickly enough to meet the requirements of practical application;

(3) a calculation module: implements the various intelligent algorithms that carry out data mining;

(4) a control module: performs the scheduling of big data jobs and provides reasonable load-balancing control;

(5) a storage module: stores the data, and must implement a big data file system and a database system.

Preferably, the system is layered by combining the services that may arise with the above modules, the layers being an interface layer, an engine layer, a computing layer, a control layer and a persistence layer;

(1) persistence layer

The Hadoop file system HDFS and the database system HBase are adopted to store all types of data, including scene data, numerical data, and log data generated during actual operation;

(2) control layer

The Hadoop task scheduling module ZooKeeper and the chip-level parallel technology MPI are used to control different types of computing tasks: ZooKeeper controls the data-intensive computing mode STORM, and MPI controls the compute-intensive computing mode CUDA. These control modules perform task scheduling, low-level load balancing and simple fault-tolerant processing;

(3) computing layer

Computing tasks are classified not into graphics computation and data computation but, according to their computational characteristics, into data-intensive and compute-intensive tasks; this classification is intended to maximize computational efficiency. All compute-intensive computations are distributed to the CUDA module for execution;

(4) engine layer

The engine is the core subsystem of the model. This module implements a fast rendering engine, with several optimization algorithms and strategies designed for large-scale three-dimensional scenes in a big data environment, so that the real-time rendering efficiency of the scene meets the requirements of practical application. Real-time rendering of large-scale scenes has long been a research hotspot, and two methods are proposed to accelerate rendering: one is an octree-based visibility culling method, and the other is a multi-resolution rendering method based on a weight-function LOD (level of detail). The algorithms and implementation of the two methods are described in detail in subsequent sections, and their efficiency is verified by experiment.
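The idea behind octree-based visibility culling can be illustrated with a minimal Python sketch. It is not the algorithm of the invention, whose details are not given in this text. The sketch assumes that scene objects are reduced to points and that the visible region is an axis-aligned box standing in for a real view frustum; the class names `Box` and `Octree`, the leaf capacity, the depth limit and the sample objects are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Box:
    lo: tuple  # minimum corner (x, y, z)
    hi: tuple  # maximum corner (x, y, z)

    def intersects(self, other):
        return all(self.lo[i] <= other.hi[i] and other.lo[i] <= self.hi[i]
                   for i in range(3))

    def contains(self, p):
        return all(self.lo[i] <= p[i] <= self.hi[i] for i in range(3))


@dataclass
class Octree:
    bounds: Box
    capacity: int = 2      # objects a leaf may hold before it splits
    depth: int = 0
    max_depth: int = 8     # stops endless splitting on coincident points
    objects: list = field(default_factory=list)    # (name, point) pairs
    children: list = field(default_factory=list)

    def insert(self, name, p):
        if not self.bounds.contains(p):
            return False
        if self.children:
            # any() stops at the first child that accepts the point
            return any(c.insert(name, p) for c in self.children)
        self.objects.append((name, p))
        if len(self.objects) > self.capacity and self.depth < self.max_depth:
            self._split()
        return True

    def _split(self):
        lo, hi = self.bounds.lo, self.bounds.hi
        mid = tuple((lo[i] + hi[i]) / 2 for i in range(3))
        for mask in range(8):   # bit i of mask selects the upper half on axis i
            c_lo = tuple(mid[i] if (mask >> i) & 1 else lo[i] for i in range(3))
            c_hi = tuple(hi[i] if (mask >> i) & 1 else mid[i] for i in range(3))
            self.children.append(
                Octree(Box(c_lo, c_hi), self.capacity, self.depth + 1, self.max_depth))
        pending, self.objects = self.objects, []
        for name, p in pending:
            self.insert(name, p)   # re-routes each object into a new child

    def visible(self, view):
        """Return names of objects inside `view`, skipping whole subtrees outside it."""
        if not self.bounds.intersects(view):
            return []              # cull this node and all of its descendants
        found = [name for name, p in self.objects if view.contains(p)]
        for child in self.children:
            found.extend(child.visible(view))
        return found


tree = Octree(Box((0, 0, 0), (100, 100, 100)))
tree.insert("tower_1", (10, 10, 10))
tree.insert("tower_2", (20, 15, 5))
tree.insert("substation", (80, 80, 80))
tree.insert("line_a", (90, 10, 50))
near_origin = tree.visible(Box((0, 0, 0), (30, 30, 30)))
```

The saving comes from the early return in `visible`: once a node's bounding box lies outside the view region, none of the objects beneath it are tested.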

(5) Interface layer

As a high-level abstraction, the interface layer defines a series of interfaces for the operations that a user performs directly, including operations on scenes and data and other operations such as scene import, data analysis and log export. Each interface also reserves a field indicating whether the computation type of the requested task is data-intensive or compute-intensive; for tasks that do not require parallel computation, this field should be set to NULL to avoid the time lost to unnecessary task scheduling and inter-node communication.
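The reserved computation-type field of the interface layer could, for example, be modelled as an optional enumeration. The following Python sketch is an assumption about how such a request might look; the names `ComputeType`, `ServiceRequest` and `needs_scheduling` and the operation strings are hypothetical, and Python's `None` plays the role of NULL.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import Optional


class ComputeType(Enum):
    DATA_INTENSIVE = "data_intensive"
    COMPUTE_INTENSIVE = "compute_intensive"


@dataclass(frozen=True)
class ServiceRequest:
    operation: str                                # e.g. "scene_import", "data_analysis", "log_export"
    payload: dict = field(default_factory=dict)
    compute_type: Optional[ComputeType] = None    # None (NULL): no parallel computation needed


def needs_scheduling(request):
    """A request is handed to the scheduler only when the field is set."""
    return request.compute_type is not None


export = ServiceRequest("log_export")
mining = ServiceRequest("data_analysis", {"scheme": "clustering"}, ComputeType.DATA_INTENSIVE)
```

Leaving the field unset by default means that simple operations such as a log export bypass task scheduling unless the caller explicitly asks for parallel execution.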

Preferably, when the interface layer receives a computing task request: for a data-intensive computing task, parallelism over the data volume should be maximized, so the distributed real-time computing framework STORM is adopted; for a compute-intensive computing task, parallelism over function threads should be maximized, so the supercomputing framework CUDA is adopted.

STORM performs parallel computation only on the CPU array, and CUDA performs parallel computation only on the GPU array, so the two middle-layer computing frameworks do not need to partition their own scope within the cluster hardware. The cluster can use multiple machines as nodes to achieve cooperative parallel computation on heterogeneous resources. According to the task type defined by the interface layer, the computing layer determines whether a computing task is data-intensive or compute-intensive, decomposes the task and distributes it to the corresponding computing framework, and processes it in parallel on the different types of computing resources. Computing tasks in the CUDA framework can be distributed by Hadoop to all GPU resources in the cluster for parallel processing, and computing tasks in the STORM framework can be distributed by Hadoop to all CPU resources in the cluster for parallel processing.
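The routing rule described above can be sketched as a small Python function. This is a simplified illustration under stated assumptions: the function name `dispatch`, the resource names, the string task types and the round-robin decomposition are hypothetical, and no real Hadoop, STORM or CUDA API is involved.

```python
def dispatch(compute_type, work_items, cluster):
    """Split work across the resource pool that matches the task type.

    compute_type: "data_intensive", "compute_intensive", or None
    cluster:      list of (resource_name, kind) pairs, kind being "cpu" or "gpu"
    """
    if compute_type is None:
        # No parallel computation requested: skip scheduling entirely.
        return {"local": list(work_items)}
    # Data-intensive work goes to CPU resources (the STORM side),
    # compute-intensive work goes to GPU resources (the CUDA side).
    pool_kind = {"data_intensive": "cpu", "compute_intensive": "gpu"}[compute_type]
    pool = [name for name, kind in cluster if kind == pool_kind]
    plan = {name: [] for name in pool}
    for i, item in enumerate(work_items):
        plan[pool[i % len(pool)]].append(item)   # assumed round-robin decomposition
    return plan


cluster = [
    ("node1-cpu", "cpu"), ("node1-gpu", "gpu"),
    ("node2-cpu", "cpu"), ("node2-gpu", "gpu"),
]
cpu_plan = dispatch("data_intensive", range(5), cluster)
gpu_plan = dispatch("compute_intensive", ["k0", "k1"], cluster)
```

Because the two pools are disjoint by hardware type, a CPU-bound plan and a GPU-bound plan can be built for the same set of machines without either framework reserving nodes for itself.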

Preferably, the STORM distributed computing framework uses a single control node (Master) named Nimbus. When the interface layer receives a service request, it first determines from the interface type which type of computing task is requested; if the task is data-intensive, it is submitted to Nimbus for topology generation. Nimbus sends the generated task topology sequence to the ZooKeeper of the control layer, and ZooKeeper schedules the tasks uniformly. The STORM computing nodes (Slave) are of two types: spouts (Supervisor), which distribute primitives, and bolts (worker), which compute on primitives. STORM does not require all primitives to undergo the same operation, which makes it better suited to data-intensive tasks. STORM can achieve a better speed-up ratio for non-iterative tasks (data-intensive tasks) in numerical computation, and it can also achieve better efficiency for those computing tasks within graphics tasks that are not suitable for CUDA acceleration.
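The spout-and-bolt dataflow can be imitated in plain Python with generators. The sketch below does not use the Apache Storm API and runs in a single process; the function names, the record fields and the overload threshold are hypothetical. It only shows a source that distributes records and a chain of bolts that each apply a different operation.

```python
from collections import Counter


def reading_spout(readings):
    """Spout: distributes primitives (here, dict records) into the topology."""
    for rec in readings:
        yield rec


def clean_bolt(stream):
    """Bolt: drops records whose load value is missing."""
    for rec in stream:
        if rec.get("load_mw") is not None:
            yield rec


def classify_bolt(stream, threshold):
    """Bolt: labels each record; a different operation from clean_bolt."""
    for rec in stream:
        label = "overload" if rec["load_mw"] > threshold else "normal"
        yield label, rec["station"]


def count_bolt(stream):
    """Bolt: aggregates the labels into counts."""
    return Counter(label for label, _station in stream)


def run_topology(readings, threshold=100.0):
    """Wire spout -> clean -> classify -> count as a linear topology."""
    return count_bolt(classify_bolt(clean_bolt(reading_spout(readings)), threshold))


counts = run_topology([
    {"station": "A", "load_mw": 120.0},
    {"station": "B", "load_mw": None},
    {"station": "C", "load_mw": 80.0},
    {"station": "D", "load_mw": 101.5},
])
```

Each stage handles one record at a time and needs no result from a previous pass over the data, which is the non-iterative pattern that the text associates with data-intensive tasks.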

Preferably, the CUDA architecture is based on the Single Instruction Multiple Thread (SIMT) model, which is an extension of the Single Instruction Multiple Data (SIMD) model. A function executed on the GPU is called a kernel function (kernel); at run time, the kernel function's instructions are issued concurrently to all stream processors (SPs) in the array. A kernel is only a function, not a complete program, so before the kernel executes the CPU must assist with data preprocessing and device initialization. The CUDA computation process is divided into three stages: input, execution and output. In the first stage, the main program allocates GPU memory space for the input and output data and transfers the input data from CPU memory to GPU memory; in the second stage, the main program launches the kernel on the GPU, which executes the task in parallel; in the third stage, when the kernel has finished, the main program transfers the kernel's output data from GPU memory to CPU memory to obtain the output result.

The CUDA architecture divides computing resources into two classes. CUDA is a parallel mode with a single control node, and its programming model adopts a three-level Grid-Block-Thread hierarchy, in which each level has its own indexing, synchronization, shared-memory and cooperative-computing mechanisms. The granularity of a computing task is refined level by level, and the Thread, at the finest granularity, has the highest parallelism. CUDA is suited to large-scale, evenly balanced, highly concurrent computing tasks.
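The three-stage process and the Grid-Block-Thread indexing can be traced with a pure-Python simulation. This is not CUDA code and gives no speed-up: the "device" buffers are ordinary lists and the threads run one after another. The names `launch`, `scale_kernel` and `run` and the block size are hypothetical; the index formula (block index times block size plus thread index) is the usual one for a one-dimensional launch.

```python
def launch(kernel, grid_dim, block_dim, *args):
    """Simulate a 1-D kernel launch: call the kernel once per (block, thread)."""
    for block_idx in range(grid_dim):
        for thread_idx in range(block_dim):
            kernel(block_idx, block_dim, thread_idx, *args)


def scale_kernel(block_idx, block_dim, thread_idx, d_in, d_out, factor, n):
    """Kernel body: every thread runs the same function on its own element."""
    i = block_idx * block_dim + thread_idx   # global index of this thread
    if i < n:                                # guard threads past the end of the data
        d_out[i] = d_in[i] * factor


def run(host_in, factor, block_dim=4):
    n = len(host_in)
    # Stage 1 (input): allocate "device" buffers and copy the input host -> device.
    d_in = list(host_in)
    d_out = [0.0] * n
    # Stage 2 (execution): launch enough blocks to cover all n elements.
    grid_dim = (n + block_dim - 1) // block_dim
    launch(scale_kernel, grid_dim, block_dim, d_in, d_out, factor, n)
    # Stage 3 (output): copy the result device -> host.
    return list(d_out)


scaled = run([1.0, 2.0, 3.0, 4.0, 5.0], 2.0)
```

With five elements and a block size of four, the launch uses two blocks, and the bounds check keeps the three surplus threads of the second block from writing outside the buffer.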

Compared with the prior art, the invention provides a method and a system for rapidly and visually analyzing electric power big data based on a three-dimensional scene, and the method and the system have the following beneficial effects:

The method and system for rapid visual analysis of electric power big data based on a three-dimensional scene are designed on the Hadoop ecosystem. Because of its broad practicality and ease of use in big data processing, Hadoop quickly attracted wide attention and research in academia after its release and has been widely adopted in industry. Hadoop is currently the most successful and most widely accepted mainstream big data processing technology and system platform, and it provides the functional modules required for complete distributed cluster computing. The Hadoop platform has by now evolved into a complete ecosystem; it runs on computing clusters made up of ordinary commodity servers, or even low-cost machines, and provides an inexpensive, convenient and scalable big data solution.

The invention provides an integrated system for visual analysis of electric power big data. It integrates the big data ecosystem with the visualization module, brings the graphics computation of the three-dimensional scene and the numerical computation of data analysis into the same framework, and makes the storage mode and the computation mode common to both. Multi-service collaborative visual analysis is thereby achieved without the traditional approach of first mining and exporting the data and then importing it into a visualization system for analysis.

Drawings

FIG. 1 is a flow chart of a visualized analysis task of a power system in a big data environment according to the present invention;

FIG. 2 is a functional block diagram of the electric power big data visual analysis platform of the present invention;

FIG. 3 is a hierarchical architecture diagram of a power big data visualization analysis model according to the present invention;

FIG. 4 is a diagram of resource allocation for different types of tasks in the compute layer of the present invention;

FIG. 5 is a block diagram of the STORM calculation framework of the present invention;

FIG. 6 is a CUDA computational framework diagram in accordance with the present invention.

Detailed Description

The technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are only a part of the embodiments of the present invention, and not all of the embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.

Referring to fig. 1, in the method for rapidly visualizing and analyzing the big power data based on the three-dimensional scene, the processing flow of the visualization analysis task is as follows:

In a big data environment, the data, services and scenes are complex, and the requirements of practical application can be met only if the whole task processing flow is organized sensibly. The system is designed on the Hadoop ecosystem. Because of its broad practicality and ease of use in big data processing, Hadoop quickly attracted wide attention and research in academia after its release and has been widely adopted in industry. Hadoop is currently the most successful and most widely accepted mainstream big data processing technology and system platform, and it provides the functional modules required for complete distributed cluster computing. The Hadoop platform has by now evolved into a complete ecosystem; it runs on computing clusters made up of ordinary commodity servers, or even low-cost machines, and provides an inexpensive, convenient and scalable big data solution.

The visual analysis system has the following functional modules:

Referring to figs. 2-3, the system is layered by combining the services that may arise with the above modules, the layers being an interface layer, an engine layer, a computing layer, a control layer and a persistence layer;

(1) persistence layer

(2) control layer

(3) computing layer

(4) engine layer

(5) Interface layer

Referring to fig. 4, when the interface layer receives a computing task request: for a data-intensive computing task, parallelism over the data volume should be maximized, so the distributed real-time computing framework STORM is adopted; for a compute-intensive computing task, parallelism over function threads should be maximized, so the supercomputing framework CUDA is adopted.

Referring to fig. 5, the STORM distributed computing framework uses a single control node (Master) named Nimbus. When the interface layer receives a service request, it first determines from the interface type which type of computing task is requested; if the task is data-intensive, it is submitted to Nimbus for topology generation. Nimbus sends the generated task topology sequence to the ZooKeeper of the control layer, and ZooKeeper schedules the tasks uniformly. The STORM computing nodes (Slave) are of two types: spouts (Supervisor), which distribute primitives, and bolts (worker), which compute on primitives. STORM does not require all primitives to undergo the same operation, which makes it better suited to data-intensive tasks. STORM can achieve a better speed-up ratio for non-iterative tasks (data-intensive tasks) in numerical computation, and it can also achieve better efficiency for those computing tasks within graphics tasks that are not suitable for CUDA acceleration.

Referring to fig. 6, the CUDA architecture is based on the Single Instruction Multiple Thread (SIMT) model, which is an extension of the Single Instruction Multiple Data (SIMD) model. A function executed on the GPU is called a kernel function (kernel); at run time, the kernel function's instructions are issued concurrently to all stream processors (SPs) in the array. A kernel is only a function, not a complete program, so before the kernel executes the CPU must assist with data preprocessing and device initialization. The CUDA computation process is divided into three stages: input, execution and output. In the first stage, the main program allocates GPU memory space for the input and output data and transfers the input data from CPU memory to GPU memory; in the second stage, the main program launches the kernel on the GPU, which executes the task in parallel; in the third stage, when the kernel has finished, the main program transfers the kernel's output data from GPU memory to CPU memory to obtain the output result.

When the computation-type field of the interface-layer service request function is set correctly, STORM and CUDA work cooperatively and can achieve very high efficiency.

In use, the method and system for rapid visual analysis of electric power big data based on a three-dimensional scene operate as follows. The system is designed on the Hadoop ecosystem. Because of its broad practicality and ease of use in big data processing, Hadoop quickly attracted wide attention and research in academia after its release and has been widely adopted in industry. Hadoop is currently the most successful and most widely accepted mainstream big data processing technology and system platform, and it provides the functional modules required for complete distributed cluster computing. The Hadoop platform has by now evolved into a complete ecosystem; it runs on computing clusters made up of ordinary commodity servers, or even low-cost machines, and provides an inexpensive, convenient and scalable big data solution. The invention provides an integrated system for visual analysis of electric power big data. It integrates the big data ecosystem with the visualization module, brings the graphics computation of the three-dimensional scene and the numerical computation of data analysis into the same framework, and makes the storage mode and the computation mode common to both. Multi-service collaborative visual analysis is thereby achieved without the traditional approach of first mining and exporting the data and then importing it into a visualization system for analysis.

It is noted that, herein, relational terms such as first and second, and the like may be used solely to distinguish one entity or action from another entity or action without necessarily requiring or implying any actual such relationship or order between such entities or actions. Also, the terms "comprises," "comprising," or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, method, article, or apparatus.

Although embodiments of the present invention have been shown and described, it will be appreciated by those skilled in the art that changes, modifications, substitutions and alterations can be made in these embodiments without departing from the principles and spirit of the invention, the scope of which is defined in the appended claims and their equivalents.

Claims

1. A method for rapid visual analysis of electric power big data based on a three-dimensional scene, characterized in that: the processing flow of the visual analysis method is as follows:

2. A system for rapid visual analysis of electric power big data based on a three-dimensional scene, characterized in that: the visual analysis system has the following functional modules:

3. The system for rapid visual analysis of electric power big data based on a three-dimensional scene according to claim 2, characterized in that: the system is layered by combining the services that may arise with the modules, the layers being an interface layer, an engine layer, a computing layer, a control layer and a persistence layer;

(1) persistence layer

(2) control layer

(3) computing layer

(4) engine layer

(5) Interface layer

4. The system for rapid visual analysis of electric power big data based on a three-dimensional scene according to claim 3, characterized in that: when the interface layer receives a computing task request, for a data-intensive computing task, parallelism over the data volume should be maximized, so the distributed real-time computing framework STORM is adopted; for a compute-intensive computing task, parallelism over function threads should be maximized, so the supercomputing framework CUDA is adopted;

5. The system for rapid visual analysis of electric power big data based on a three-dimensional scene according to claim 4, characterized in that: the STORM distributed computing framework uses a single control node (Master) named Nimbus; when the interface layer receives a service request, it first determines from the interface type which type of computing task is requested, and if the task is data-intensive, the task is submitted to Nimbus for topology generation; Nimbus sends the generated task topology sequence to the ZooKeeper of the control layer, and ZooKeeper schedules the tasks uniformly; the STORM computing nodes (Slave) are of two types, spouts (Supervisor) for distributing primitives and bolts (worker) for computing on primitives; STORM does not require all primitives to undergo the same operation and is therefore better suited to data-intensive tasks; STORM can achieve a better speed-up ratio for non-iterative tasks (data-intensive tasks) in numerical computation; and STORM can also achieve better efficiency for computing tasks within graphics tasks that are not suitable for CUDA acceleration.

6. The system for rapid visual analysis of electric power big data based on a three-dimensional scene according to claim 4, characterized in that: the CUDA architecture is based on the Single Instruction Multiple Thread (SIMT) model, which is an extension of the Single Instruction Multiple Data (SIMD) model; a function executed on the GPU is called a kernel function (kernel), and at run time the kernel function's instructions are issued in parallel to all stream processors (SPs) in the array; a kernel is only a function, not a complete program, and before the kernel executes the CPU must assist with data preprocessing and device initialization; the CUDA computation process is divided into an input stage, an execution stage and an output stage; in the first stage, the main program allocates GPU memory space for the input and output data and transfers the input data from CPU memory to GPU memory; in the second stage, the main program launches the kernel on the GPU, which executes the task in parallel; in the third stage, when the kernel has finished, the main program transfers the kernel's output data from GPU memory to CPU memory to obtain the output result;