CN112445858A - Big data management and control platform - Google Patents
- Publication number
- CN112445858A
- Authority
- CN
- China
- Prior art keywords
- module
- service
- cluster
- big data
- management
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/20—Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
- G06F16/27—Replication, distribution or synchronisation of data between databases or within a distributed database system; Distributed database system architectures therefor
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F21/00—Security arrangements for protecting computers, components thereof, programs or data against unauthorised activity
- G06F21/60—Protecting data
- G06F21/62—Protecting access to data via a platform, e.g. using keys or access control rules
- G06F21/6218—Protecting access to data via a platform, e.g. using keys or access control rules to a system of files or objects, e.g. local or distributed file system or database
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04L—TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
- H04L63/00—Network architectures or network communication protocols for network security
- H04L63/10—Network architectures or network communication protocols for network security for controlling access to devices or network resources
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04L—TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
- H04L67/00—Network arrangements or protocols for supporting network services or applications
- H04L67/01—Protocols
- H04L67/10—Protocols in which an application is distributed across nodes in the network
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04L—TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
- H04L67/00—Network arrangements or protocols for supporting network services or applications
- H04L67/01—Protocols
- H04L67/10—Protocols in which an application is distributed across nodes in the network
- H04L67/1097—Protocols in which an application is distributed across nodes in the network for distributed storage of data in networks, e.g. transport arrangements for network file system [NFS], storage area networks [SAN] or network attached storage [NAS]
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04L—TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
- H04L67/00—Network arrangements or protocols for supporting network services or applications
- H04L67/01—Protocols
- H04L67/133—Protocols for remote procedure calls [RPC]
Landscapes
- Engineering & Computer Science (AREA)
- Computer Networks & Wireless Communication (AREA)
- Signal Processing (AREA)
- Theoretical Computer Science (AREA)
- General Engineering & Computer Science (AREA)
- Databases & Information Systems (AREA)
- General Physics & Mathematics (AREA)
- Physics & Mathematics (AREA)
- Computing Systems (AREA)
- Computer Hardware Design (AREA)
- Computer Security & Cryptography (AREA)
- Data Mining & Analysis (AREA)
- Health & Medical Sciences (AREA)
- Bioethics (AREA)
- General Health & Medical Sciences (AREA)
- Software Systems (AREA)
- Debugging And Monitoring (AREA)
Abstract
The invention provides a big data management and control platform comprising: a resource management module for scheduling and allocating cluster resources such as memory and compute to upper-layer applications and services, and for managing the life cycle and resource usage of tasks running on cluster nodes; a security management module for providing per-user identity authentication and authorization and for enforcing access control on cluster data resources and services; a remote procedure call module for providing a reliable and efficient inter-process remote call service; a distributed collaborative service module for providing the basic naming service, state synchronization service and distributed lock service of a distributed system; a task scheduling module for providing a data-driven, multistage pipeline parallel computing framework for complex applications involving massive data processing and large-scale computation; and a cluster deployment and monitoring module for providing deployment, configuration management, self-checking and bootstrapping of the whole cloud operating system and its upper-layer application services.
Description
Technical Field
The invention relates to the technical field of big data, in particular to a big data management and control platform.
Background
The security and privacy of big data have been a research hotspot at home and abroad in recent years, and will remain a focus of research and discussion in academia and industry. Big data and its related core resources involve enterprise trade secrets and national sovereignty, and have drawn wide attention from all sectors of society. How to protect the security of big data and the privacy of users has therefore become a pressing social problem.
Disclosure of Invention
The big data management and control platform provided by the invention can provide an e-government big data service platform for big data applications, and effectively protects the security of big data and the privacy of users.
In a first aspect, the present invention provides a big data management and control platform, which comprises a resource management module, a security management module, a remote procedure call module, a distributed collaborative service module, a task scheduling module, and a cluster deployment and monitoring module, wherein,
the resource management module is used for scheduling and allocating cluster resources such as memory and compute to upper-layer applications and services, and for managing the life cycle and resource usage of tasks running on cluster nodes;
the security management module is used for providing per-user identity authentication and authorization, and for enforcing access control on cluster data resources and services;
the remote procedure call module is used for providing a reliable and efficient inter-process remote call service;
the distributed collaborative service module is used for providing the basic naming service, state synchronization service and distributed lock service of a distributed system;
the task scheduling module is used for providing a data-driven, multistage pipeline parallel computing framework for complex applications involving massive data processing and large-scale computation;
the cluster deployment and monitoring module is used for providing deployment, configuration management, self-checking and bootstrapping of the whole cloud operating system and its upper-layer application services.
Optionally, in a multi-user runtime environment, the resource management module supports computation quota and access control, as well as job priority and resource preemption.
Optionally, the remote procedure call module supports data compression and consistency checking of a communication channel.
Optionally, the distributed collaborative service module uses a design of multiple master control nodes based on the Paxos protocol, which avoids a single point of failure in the cluster and performs fault monitoring and data replication automatically.
Optionally, the expressive capability of the task scheduling module is compatible with the MapReduce and Cascading programming models.
Optionally, the cluster deployment and monitoring module supports online cluster extension and online upgrade of application services.
Optionally, the cluster deployment and monitoring module is configured to monitor the operating conditions and performance indicators of the cloud operating system cluster and the upper-layer application services, and to provide rich monitoring charts and cluster status dashboards.
Optionally, the cluster deployment and monitoring module supports user-defined automated alarm services, as well as online performance profiling and fault diagnosis.
In the big data management and control platform provided by the embodiment of the invention, the resource management module schedules and allocates cluster resources such as memory and compute to upper-layer applications and services, and manages the life cycle and resource usage of tasks running on cluster nodes; the security management module provides per-user identity authentication and authorization and enforces access control on cluster data resources and services; the remote procedure call module provides a reliable and efficient inter-process remote call service; the distributed collaborative service module provides the basic naming service, state synchronization service and distributed lock service of a distributed system; the task scheduling module provides a data-driven, multistage pipeline parallel computing framework for complex applications involving massive data processing and large-scale computation; and the cluster deployment and monitoring module provides deployment, configuration management, self-checking and bootstrapping of the whole cloud operating system and its upper-layer application services. The embodiment of the invention can thus provide an e-government big data service platform for big data applications, and effectively protect the security of big data and the privacy of users.
Drawings
Fig. 1 is a block diagram of a big data management and control platform according to an embodiment of the present invention.
Detailed Description
In order to make the objects, technical solutions and advantages of the embodiments of the present invention clearer, the technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are only a part of the embodiments of the present invention, and not all of the embodiments. All other embodiments, which can be obtained by a person skilled in the art without any inventive step based on the embodiments of the present invention, are within the scope of the present invention.
An embodiment of the present invention provides a big data management and control platform. As shown in Fig. 1, the platform includes a resource management module, a security management module, a remote procedure call module, a distributed collaborative service module, a task scheduling module, and a cluster deployment and monitoring module, wherein,
the resource management module is used for scheduling and allocating cluster resources such as memory and compute to upper-layer applications and services, and for managing the life cycle and resource usage of tasks running on cluster nodes;
the security management module is used for providing per-user identity authentication and authorization, and for enforcing access control on cluster data resources and services;
the remote procedure call module is used for providing a reliable and efficient inter-process remote call service;
the distributed collaborative service module is used for providing the basic naming service, state synchronization service and distributed lock service of a distributed system;
the task scheduling module is used for providing a data-driven, multistage pipeline parallel computing framework for complex applications involving massive data processing and large-scale computation;
the cluster deployment and monitoring module is used for providing deployment, configuration management, self-checking and bootstrapping of the whole cloud operating system and its upper-layer application services.
The big data management and control platform according to the embodiment of the invention is described in detail below.
The basic big data management and control platform provided by the embodiment of the invention is responsible for managing the physical server resources of the Linux clusters in a big data center, controlling the execution of distributed programs, and hiding lower-layer details such as fault recovery and data redundancy, thereby effectively providing elastic computing and cloud load balancing services.
1. Resource management module
The resource management module is responsible for scheduling and allocating cluster resources such as memory and compute to upper-layer applications and services, and for managing the life cycle and resource usage of tasks running on cluster nodes. In a multi-user runtime environment, it supports computation quotas and access control, as well as job priorities and resource preemption, so that cluster resources are shared efficiently while fairness is guaranteed.
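The quota, priority and preemption behaviour described above can be illustrated with a minimal single-process sketch. The patent does not disclose a scheduling algorithm, so the class names, the CPU-only resource model and the rule "a larger priority value may preempt a smaller one" are all assumptions made for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class Task:
    name: str
    user: str
    cpus: int
    priority: int  # assumption: a larger value means a more important task


@dataclass
class ResourceManager:
    total_cpus: int
    quotas: dict  # hypothetical per-user CPU quota: user -> max CPUs
    running: list = field(default_factory=list)

    def used(self, user=None):
        """CPUs currently held by one user, or by everyone if user is None."""
        return sum(t.cpus for t in self.running if user is None or t.user == user)

    def submit(self, task):
        """Start a task. Return the list of preempted tasks, or None if rejected."""
        # Quota check: a user may never hold more CPUs than the quota allows.
        if self.used(task.user) + task.cpus > self.quotas.get(task.user, 0):
            return None
        free = self.total_cpus - self.used()
        victims = []
        # Preempt the lowest-priority tasks first, and only strictly lower ones.
        for cand in sorted(self.running, key=lambda t: t.priority):
            if free >= task.cpus:
                break
            if cand.priority < task.priority:
                victims.append(cand)
                free += cand.cpus
        if free < task.cpus:
            return None  # not enough capacity even after preemption
        for v in victims:
            self.running.remove(v)
        self.running.append(task)
        return victims
```

A production scheduler would also requeue the preempted tasks and track memory; the sketch only shows how the three checks interact.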
2. Security management module
The security management module provides per-user identity authentication and authorization, and enforces access control on cluster data resources and services.
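As a rough illustration of per-user authentication followed by an access control check, the sketch below keeps salted password hashes and a per-user, per-resource permission set in memory. The `SecurityManager` class, its method names and the choice of PBKDF2 are assumptions; the patent does not specify the mechanism.

```python
import hashlib
import hmac
import os


class SecurityManager:
    """Hypothetical in-memory model of authentication plus access control."""

    def __init__(self):
        self._users = {}  # user -> (salt, password hash)
        self._acl = {}    # (user, resource) -> set of allowed actions

    @staticmethod
    def _hash(password, salt):
        # PBKDF2-HMAC-SHA256 from the standard library; the iteration count is illustrative.
        return hashlib.pbkdf2_hmac("sha256", password.encode("utf-8"), salt, 100_000)

    def add_user(self, user, password):
        salt = os.urandom(16)
        self._users[user] = (salt, self._hash(password, salt))

    def grant(self, user, resource, action):
        self._acl.setdefault((user, resource), set()).add(action)

    def authenticate(self, user, password):
        record = self._users.get(user)
        if record is None:
            return False
        salt, expected = record
        # Constant-time comparison avoids leaking how many bytes matched.
        return hmac.compare_digest(expected, self._hash(password, salt))

    def authorize(self, user, resource, action):
        # Deny by default: only explicitly granted actions are allowed.
        return action in self._acl.get((user, resource), set())
```

A caller would first check `authenticate(user, password)` and only then `authorize(user, resource, action)` before touching a cluster resource.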
3. Remote procedure call module
The remote procedure call module provides a reliable and efficient inter-process remote call service, and supports data compression and consistency checking on the communication channel.
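One plausible way to combine compression with a consistency check is to frame each RPC message as a length and checksum header followed by a compressed payload. The wire format below (JSON body, zlib compression, CRC32 checksum, 8-byte header) is an assumption made for illustration and is not taken from the patent.

```python
import json
import struct
import zlib

# Header layout (assumed): payload length and CRC32, both unsigned 32-bit, network byte order.
HEADER = struct.Struct("!II")


def encode_frame(message):
    """Serialize a message, compress it, and prefix it with length and checksum."""
    payload = zlib.compress(json.dumps(message).encode("utf-8"))
    return HEADER.pack(len(payload), zlib.crc32(payload)) + payload


def decode_frame(frame):
    """Verify length and checksum, then decompress and deserialize the message."""
    length, checksum = HEADER.unpack_from(frame)
    payload = frame[HEADER.size:HEADER.size + length]
    if len(payload) != length or zlib.crc32(payload) != checksum:
        raise ValueError("corrupted frame")
    return json.loads(zlib.decompress(payload).decode("utf-8"))
```

The receiver rejects a frame whose payload was truncated or altered in transit before attempting to decompress it.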
4. Distributed collaborative service module
The distributed collaborative service module provides the basic naming service, state synchronization service and distributed lock service of the distributed system. It is built on a distributed consensus protocol based on Paxos (a message-passing consistency algorithm). The distributed file system is highly scalable, supporting hundreds of millions of files and storage at the PB scale and above. The design of multiple master control nodes (Masters) based on the Paxos protocol avoids a single point of failure in the cluster, performs fault monitoring and data replication automatically, and achieves very high availability and reliability without relying on special hardware such as RAID cards or NAS. A shared-nothing architecture is adopted to support large-scale concurrent reads and writes and to make full use of the distributed parallel bandwidth. Log update operations complete at the millisecond level, allowing fast-responding online services. Incremental expansion and automatic data balancing are supported, and users can customize data distribution strategies.
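To make the Paxos reference concrete, the following is a minimal in-memory sketch of single-decree Paxos, in which the agreed value could be, for example, the identity of the current lock holder or master. It omits networking, persistence and failure handling, and the `Acceptor` and `propose` names are illustrative rather than taken from the patent.

```python
class Acceptor:
    """One Paxos acceptor holding its promise and last accepted proposal."""

    def __init__(self):
        self.promised = -1    # highest proposal number promised so far
        self.accepted = None  # (number, value) of the last accepted proposal

    def prepare(self, n):
        """Phase 1: promise to ignore proposals numbered below n."""
        if n > self.promised:
            self.promised = n
            return True, self.accepted
        return False, None

    def accept(self, n, value):
        """Phase 2: accept the proposal unless a higher number was promised."""
        if n >= self.promised:
            self.promised = n
            self.accepted = (n, value)
            return True
        return False


def propose(acceptors, n, value):
    """Run one proposal round; return the chosen value, or None without a majority."""
    majority = len(acceptors) // 2 + 1
    promises = [acc for ok, acc in (a.prepare(n) for a in acceptors) if ok]
    if len(promises) < majority:
        return None
    # If any acceptor already accepted a value, the proposer must adopt the
    # value with the highest proposal number instead of its own.
    prior = [p for p in promises if p is not None]
    if prior:
        value = max(prior, key=lambda p: p[0])[1]
    acks = sum(a.accept(n, value) for a in acceptors)
    return value if acks >= majority else None
```

Because a later proposer adopts any previously accepted value, two competing masters cannot both believe they hold the lock once a majority has accepted one of them.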
5. Task scheduling module
The task scheduling module targets complex applications involving massive data processing and large-scale computation. It provides a data-driven, multistage pipeline parallel computing framework whose expressive capability is compatible with programming models such as MapReduce and Cascading (both large-scale distributed programming models). It is highly scalable and can schedule more than one hundred thousand parallel tasks. It automatically detects faults and system hotspots and retries failed tasks, ensuring that jobs run to completion stably and reliably.
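A data-driven multistage pipeline with retry of failed tasks can be sketched as a dependency graph in which each stage runs once its upstream results exist. The sketch below runs stages sequentially for clarity, whereas a real framework would dispatch ready stages in parallel across the cluster; `run_pipeline`, its parameters and the retry limit are hypothetical.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+


def run_pipeline(stages, deps, max_retries=2):
    """Run stages in dependency order, retrying each failed stage.

    stages: name -> function taking a dict of upstream results
    deps:   name -> set of upstream stage names
    Returns (results, attempts) where attempts counts tries per stage.
    """
    results = {}
    attempts = {}
    graph = {name: deps.get(name, set()) for name in stages}
    for name in TopologicalSorter(graph).static_order():
        # A stage only ever sees the outputs of the stages it depends on.
        inputs = {d: results[d] for d in deps.get(name, ())}
        for attempt in range(1, max_retries + 2):
            attempts[name] = attempt
            try:
                results[name] = stages[name](inputs)
                break
            except Exception:
                if attempt == max_retries + 1:
                    raise  # retries exhausted: fail the whole job
    return results, attempts
```

A MapReduce-style job maps onto this shape as a "map" stage depending on a "read" stage and a "reduce" stage depending on "map".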
6. Cluster deployment and monitoring module
The cluster deployment and monitoring module provides deployment, configuration management, self-checking and bootstrapping of the whole cloud operating system and its upper-layer application services. It supports online cluster expansion and online upgrade of application services. It monitors the operating conditions and performance indicators of the cloud operating system cluster and the upper-layer application services, provides rich monitoring charts and cluster status dashboards, and supports user-defined automated alarm services as well as online performance analysis and fault diagnosis. The cloud operating system integrates the cluster so that it appears to other services and applications as a single supercomputer.
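User-defined alarm rules could, for instance, be expressed as thresholds on named metrics and evaluated against the latest sample from each node. The rule format, metric names and `evaluate` function below are assumptions for illustration only.

```python
import operator
from dataclasses import dataclass

# Comparison operators a rule may use (assumed rule syntax).
OPS = {">": operator.gt, "<": operator.lt, ">=": operator.ge, "<=": operator.le}


@dataclass(frozen=True)
class AlertRule:
    name: str
    metric: str
    op: str
    threshold: float


def evaluate(rules, samples):
    """Return a sorted list of (rule name, node) pairs whose rule fired.

    samples: node -> {metric name: latest value}
    """
    alerts = []
    for node, metrics in samples.items():
        for rule in rules:
            value = metrics.get(rule.metric)
            # A node that does not report the metric cannot trigger the rule.
            if value is not None and OPS[rule.op](value, rule.threshold):
                alerts.append((rule.name, node))
    return sorted(alerts)
```

The resulting list of fired rules is what a dashboard or notification service would consume.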
In the big data management and control platform provided by the embodiment of the invention, the resource management module schedules and allocates cluster resources such as memory and compute to upper-layer applications and services, and manages the life cycle and resource usage of tasks running on cluster nodes; the security management module provides per-user identity authentication and authorization and enforces access control on cluster data resources and services; the remote procedure call module provides a reliable and efficient inter-process remote call service; the distributed collaborative service module provides the basic naming service, state synchronization service and distributed lock service of a distributed system; the task scheduling module provides a data-driven, multistage pipeline parallel computing framework for complex applications involving massive data processing and large-scale computation; and the cluster deployment and monitoring module provides deployment, configuration management, self-checking and bootstrapping of the whole cloud operating system and its upper-layer application services. The embodiment of the invention can thus provide an e-government big data service platform for big data applications, and effectively protect the security of big data and the privacy of users.
The above description is only for the specific embodiment of the present invention, but the scope of the present invention is not limited thereto, and any changes or substitutions that can be easily conceived by those skilled in the art within the technical scope of the present invention are included in the scope of the present invention. Therefore, the protection scope of the present invention shall be subject to the protection scope of the claims.
Claims (8)
1. A big data management and control platform, characterized by comprising a resource management module, a security management module, a remote procedure call module, a distributed collaborative service module, a task scheduling module and a cluster deployment and monitoring module, wherein,
the resource management module is used for scheduling and allocating cluster resources such as memory and compute to upper-layer applications and services, and for managing the life cycle and resource usage of tasks running on cluster nodes;
the security management module is used for providing per-user identity authentication and authorization, and for enforcing access control on cluster data resources and services;
the remote procedure call module is used for providing a reliable and efficient inter-process remote call service;
the distributed collaborative service module is used for providing the basic naming service, state synchronization service and distributed lock service of a distributed system;
the task scheduling module is used for providing a data-driven, multistage pipeline parallel computing framework for complex applications involving massive data processing and large-scale computation;
the cluster deployment and monitoring module is used for providing deployment, configuration management, self-checking and bootstrapping of the whole cloud operating system and its upper-layer application services.
2. The big data management and control platform according to claim 1, wherein the resource management module supports computation quota and access control, as well as job priority and resource preemption in a multi-user execution environment.
3. The big data management and control platform according to claim 1, wherein the remote procedure call module supports data compression and consistency check of a communication channel.
4. The big data management and control platform according to claim 1, wherein the distributed collaborative service module uses a design of multiple master control nodes based on the Paxos protocol, which avoids a single point of failure in the cluster and performs fault monitoring and data replication automatically.
5. The big data management and control platform according to claim 1, wherein the expressive capability of the task scheduling module is compatible with the MapReduce and Cascading programming models.
6. The big data management and control platform according to claim 1, wherein the cluster deployment and monitoring module supports online cluster expansion and online upgrade of application services.
7. The big data management and control platform according to claim 1 or 6, wherein the cluster deployment and monitoring module is configured to monitor the operating conditions and performance indicators of the cloud operating system cluster and the upper-layer application services, and to provide rich monitoring charts and cluster status dashboards.
8. The big data management and control platform according to claim 1 or 6, wherein the cluster deployment and monitoring module supports user-defined automated alarm services, as well as online performance profiling and fault diagnosis.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201910811117.0A CN112445858A (en) | 2019-08-29 | 2019-08-29 | Big data management and control platform |
Publications (1)
Publication Number | Publication Date |
---|---|
CN112445858A true CN112445858A (en) | 2021-03-05 |
Family
ID=74742213
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201910811117.0A Pending CN112445858A (en) | 2019-08-29 | 2019-08-29 | Big data management and control platform |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN112445858A (en) |
Cited By (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN116995816A (en) * | 2023-09-25 | 2023-11-03 | 国网山东省电力公司淄博供电公司 | Power supply data processing platform and method based on artificial intelligence |
Citations (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN105703940A (en) * | 2015-12-10 | 2016-06-22 | 中国电力科学研究院 | Multistage dispatching distributed parallel computing-oriented monitoring system and monitoring method |
WO2016101638A1 (en) * | 2014-12-23 | 2016-06-30 | 国家电网公司 | Operation management method for electric power system cloud simulation platform |
US20190138410A1 (en) * | 2017-11-09 | 2019-05-09 | Bank Of America Corporation | Distributed data monitoring device |
Cited By (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN116995816A (en) * | 2023-09-25 | 2023-11-03 | 国网山东省电力公司淄博供电公司 | Power supply data processing platform and method based on artificial intelligence |
CN116995816B (en) * | 2023-09-25 | 2024-02-23 | 国网山东省电力公司淄博供电公司 | Power supply data processing platform and method based on artificial intelligence |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN112104723B (en) | Multi-cluster data processing system and method | |
EP2802990B1 (en) | Fault tolerance for complex distributed computing operations | |
CN110851278A (en) | Distribution network automation master station mobile application service management method and system based on micro-service architecture | |
CN104657497A (en) | Mass electricity information concurrent computation system and method based on distributed computation | |
CN102473170A (en) | Virtual-machine-based application-service provision | |
CN103678354A (en) | Local relation type database node scheduling method and device based on cloud computing platform | |
Kaur et al. | Analysis of different techniques used for fault tolerance | |
CN113259447A (en) | Cloud platform deployment method and device, electronic equipment and storage medium | |
CN112579288A (en) | Cloud computing-based intelligent security data management system | |
CN110727508A (en) | Task scheduling system and scheduling method | |
CN114338684A (en) | Energy management system and method | |
CN112445858A (en) | Big data management and control platform | |
CN111723401A (en) | Data access authority control method, device, system, storage medium and equipment | |
Imran et al. | Cloud-niagara: A high availability and low overhead fault tolerance middleware for the cloud | |
CN110826993A (en) | Project management processing method, device, storage medium and processor | |
CN103078764A (en) | Operational monitoring system and method based on virtual computing task | |
CN116974983A (en) | Data processing method, device, computer readable medium and electronic equipment | |
Chen et al. | Big data storage architecture design in cloud computing | |
CN110069343B (en) | Power equipment distributed storage and calculation architecture for complex high concurrency calculation | |
CN112653753B (en) | RPC-based multi-room independent multi-activity method and system and electronic equipment | |
US9274905B1 (en) | Configuration tests for computer system | |
US20210357239A1 (en) | Methods and systems for managing computing virtual machine instances | |
CN113708994A (en) | Go language-based cloud physical host and cloud server monitoring method and system | |
CN112488462A (en) | Unified pushing method, device and medium for workflow data | |
CN111381921A (en) | Front-end and back-end separation system and method based on Ambari |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
| PB01 | Publication | |
| SE01 | Entry into force of request for substantive examination | |
| RJ01 | Rejection of invention patent application after publication | Application publication date: 20210305 |