CN112445858A - Big data management and control platform - Google Patents

Info

Publication number
CN112445858A
Authority
CN
China
Prior art keywords
module
service
cluster
big data
management
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201910811117.0A
Other languages
Chinese (zh)
Inventor
江永渡
万晶
赵志武
吴朝阳
柴磊
王梨
周昌树
厉屹
饶鹏城
李懿
骆斌
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
China Soft Hangzhou Anren Network Communication Co ltd
Original Assignee
China Soft Hangzhou Anren Network Communication Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
2019-08-29
Filing date
2019-08-29
Publication date
2021-03-05
Application filed by China Soft Hangzhou Anren Network Communication Co ltd
Priority to CN201910811117.0A
Publication of CN112445858A
Legal status: Pending

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/27 Replication, distribution or synchronisation of data between databases or within a distributed database system; Distributed database system architectures therefor
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F21/00 Security arrangements for protecting computers, components thereof, programs or data against unauthorised activity
    • G06F21/60 Protecting data
    • G06F21/62 Protecting access to data via a platform, e.g. using keys or access control rules
    • G06F21/6218 Protecting access to data via a platform, e.g. using keys or access control rules to a system of files or objects, e.g. local or distributed file system or database
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L63/00 Network architectures or network communication protocols for network security
    • H04L63/10 Network architectures or network communication protocols for network security for controlling access to devices or network resources
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L67/00 Network arrangements or protocols for supporting network services or applications
    • H04L67/01 Protocols
    • H04L67/10 Protocols in which an application is distributed across nodes in the network
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L67/00 Network arrangements or protocols for supporting network services or applications
    • H04L67/01 Protocols
    • H04L67/10 Protocols in which an application is distributed across nodes in the network
    • H04L67/1097 Protocols in which an application is distributed across nodes in the network for distributed storage of data in networks, e.g. transport arrangements for network file system [NFS], storage area networks [SAN] or network attached storage [NAS]
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L67/00 Network arrangements or protocols for supporting network services or applications
    • H04L67/01 Protocols
    • H04L67/133 Protocols for remote procedure calls [RPC]

Landscapes

  • Engineering & Computer Science (AREA)
  • Computer Networks & Wireless Communication (AREA)
  • Signal Processing (AREA)
  • Theoretical Computer Science (AREA)
  • General Engineering & Computer Science (AREA)
  • Databases & Information Systems (AREA)
  • General Physics & Mathematics (AREA)
  • Physics & Mathematics (AREA)
  • Computing Systems (AREA)
  • Computer Hardware Design (AREA)
  • Computer Security & Cryptography (AREA)
  • Data Mining & Analysis (AREA)
  • Health & Medical Sciences (AREA)
  • Bioethics (AREA)
  • General Health & Medical Sciences (AREA)
  • Software Systems (AREA)
  • Debugging And Monitoring (AREA)

Abstract

The invention provides a big data management and control platform comprising: a resource management module for scheduling and allocating cluster resources such as memory and computing capacity to upper-layer applications and services, and for managing the life cycle and resource usage of tasks running on the cluster nodes; a security management module for providing per-user identity authentication and authorization, and for performing access control on cluster data resources and services; a remote procedure call module for providing a reliable and efficient inter-process remote call service; a distributed collaborative service module for providing the basic naming service, state synchronization service and distributed lock service of a distributed system; a task scheduling module for providing a data-driven, multistage pipeline parallel computing framework for complex applications involving massive data processing and large-scale computation; and a cluster deployment and monitoring module for providing deployment, configuration management, self-checking and bootstrapping of the whole cloud operating system and the upper-layer application services.

Description

Big data management and control platform
Technical Field
The invention relates to the technical field of big data, in particular to a big data management and control platform.
Background
The security and privacy of big data have been a research hotspot at home and abroad in recent years, and will remain a focus of research and discussion in academia and industry. Big data and its related core resources involve enterprise trade secrets and national sovereignty, and have attracted wide attention from all sectors of society. How to protect the security of big data and the privacy of users has therefore become a pressing social problem.
Disclosure of Invention
The big data management and control platform provided by the invention can serve as an e-government big data service platform for big data applications, and effectively protects the security of big data and the privacy of users.
In a first aspect, the present invention provides a big data management and control platform, which comprises a resource management module, a security management module, a remote procedure call module, a distributed collaborative service module, a task scheduling module, and a cluster deployment and monitoring module, wherein
the resource management module is used for scheduling and allocating cluster resources such as memory and computing capacity to upper-layer applications and services, and for managing the life cycle and resource usage of tasks running on the cluster nodes;
the security management module is used for providing per-user identity authentication and authorization, and for performing access control on cluster data resources and services;
the remote procedure call module is used for providing a reliable and efficient inter-process remote call service;
the distributed collaborative service module is used for providing the basic naming service, state synchronization service and distributed lock service of a distributed system;
the task scheduling module is used for providing a data-driven, multistage pipeline parallel computing framework for complex applications involving massive data processing and large-scale computation;
the cluster deployment and monitoring module is used for providing deployment, configuration management, self-checking and bootstrapping of the whole cloud operating system and the upper-layer application services.
Optionally, in a multi-user runtime environment, the resource management module supports computation quota and access control, as well as job priority and resource preemption.
Optionally, the remote procedure call module supports data compression and consistency checking of a communication channel.
Optionally, the distributed collaborative service module uses a design with multiple master nodes based on the Paxos protocol, which avoids a single point of failure in the cluster and performs fault monitoring and data replication automatically.
Optionally, the task scheduling module is compatible, in expressive capability, with programming models such as MapReduce and Cascading.
Optionally, the cluster deployment and monitoring module supports online cluster expansion and online upgrade of application services.
Optionally, the cluster deployment and monitoring module is configured to monitor the operating conditions and performance indicators of the cloud operating system cluster and the upper-layer application services, and to provide rich monitoring charts and a cluster status dashboard.
Optionally, the cluster deployment and monitoring module supports user-defined automated alarm services, as well as online performance profiling and fault diagnosis.
In the big data management and control platform provided by the embodiment of the invention, the resource management module schedules and allocates cluster resources such as memory and computing capacity to upper-layer applications and services, and manages the life cycle and resource usage of tasks running on the cluster nodes; the security management module provides per-user identity authentication and authorization and performs access control on cluster data resources and services; the remote procedure call module provides a reliable and efficient inter-process remote call service; the distributed collaborative service module provides the basic naming service, state synchronization service and distributed lock service of a distributed system; the task scheduling module provides a data-driven, multistage pipeline parallel computing framework for complex applications involving massive data processing and large-scale computation; and the cluster deployment and monitoring module provides deployment, configuration management, self-checking and bootstrapping of the whole cloud operating system and the upper-layer application services. The embodiment of the invention can thus serve as an e-government big data service platform for big data applications, and effectively protects the security of big data and the privacy of users.
Drawings
Fig. 1 is a block diagram of a big data management and control platform according to an embodiment of the present invention.
Detailed Description
In order to make the objects, technical solutions and advantages of the embodiments of the present invention clearer, the technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are only a part of the embodiments of the present invention, and not all of the embodiments. All other embodiments, which can be obtained by a person skilled in the art without any inventive step based on the embodiments of the present invention, are within the scope of the present invention.
An embodiment of the present invention provides a big data management and control platform. As shown in Fig. 1, the platform includes a resource management module, a security management module, a remote procedure call module, a distributed collaborative service module, a task scheduling module, and a cluster deployment and monitoring module, wherein
the resource management module is used for scheduling and allocating cluster resources such as memory and computing capacity to upper-layer applications and services, and for managing the life cycle and resource usage of tasks running on the cluster nodes;
the security management module is used for providing per-user identity authentication and authorization, and for performing access control on cluster data resources and services;
the remote procedure call module is used for providing a reliable and efficient inter-process remote call service;
the distributed collaborative service module is used for providing the basic naming service, state synchronization service and distributed lock service of a distributed system;
the task scheduling module is used for providing a data-driven, multistage pipeline parallel computing framework for complex applications involving massive data processing and large-scale computation;
the cluster deployment and monitoring module is used for providing deployment, configuration management, self-checking and bootstrapping of the whole cloud operating system and the upper-layer application services.
The big data management and control platform according to the embodiment of the invention is described in detail below.
The big data management and control platform provided by the embodiment of the invention is responsible for managing the physical server resources of the Linux cluster of a big data center, controlling the execution of distributed programs, and hiding lower-layer details such as fault recovery and data redundancy, so as to effectively provide elastic computing and cloud load balancing services.
1. Resource management module
The resource management module is responsible for scheduling and allocating cluster resources such as memory and computing capacity to upper-layer applications and services, and for managing the life cycle and resource usage of tasks running on the cluster nodes. In a multi-user runtime environment, the module supports computation quotas and access control, as well as job priority and resource preemption, so that cluster resources are shared effectively while fairness is guaranteed.
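As an illustration only, the quota check, job priority and resource preemption described above might be sketched as follows in Python. The Job and Scheduler classes, the CPU-count resource model and the convention that a larger priority value wins are all assumptions made for this sketch; the embodiment does not specify an algorithm.

```python
from dataclasses import dataclass, field


@dataclass
class Job:
    name: str
    user: str
    priority: int  # assumed convention: larger value = more important
    cpus: int


@dataclass
class Scheduler:
    total_cpus: int
    quotas: dict  # hypothetical per-user quota: user -> maximum CPUs
    running: list = field(default_factory=list)

    def used_by(self, user):
        return sum(j.cpus for j in self.running if j.user == user)

    def free_cpus(self):
        return self.total_cpus - sum(j.cpus for j in self.running)

    def submit(self, job):
        """Start the job if possible.

        Returns the list of preempted jobs (possibly empty), or None if
        the job is rejected.
        """
        # Quota: a user may never hold more CPUs than the configured quota.
        if self.used_by(job.user) + job.cpus > self.quotas.get(job.user, 0):
            return None
        # Preemption: evict strictly lower-priority jobs, lowest first,
        # until enough CPUs are free.
        victims, free = [], self.free_cpus()
        for candidate in sorted(self.running, key=lambda j: j.priority):
            if free >= job.cpus or candidate.priority >= job.priority:
                break
            victims.append(candidate)
            free += candidate.cpus
        if free < job.cpus:
            return None
        for victim in victims:
            self.running.remove(victim)
        self.running.append(job)
        return victims
```

In this sketch a job that would exceed its owner's quota is rejected before any preemption is considered, which is one simple way to keep a high-priority user from monopolizing the cluster.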
2. Security management module
The security management module provides per-user identity authentication and authorization, and performs access control on the cluster data resources and services.
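A minimal sketch of per-user authentication and access control is given below for illustration. The SecurityManager class, the salted PBKDF2 password storage and the (user, resource) access control table are assumptions of this sketch and are not taken from the embodiment, which does not name a concrete mechanism.

```python
import hashlib
import hmac
import os


class SecurityManager:
    """Hypothetical per-user authentication and access control store."""

    def __init__(self):
        self._users = {}  # user -> (salt, password hash)
        self._acl = {}    # (user, resource) -> set of permitted actions

    @staticmethod
    def _hash(password, salt):
        # Salted key derivation, so plain-text passwords are never stored.
        return hashlib.pbkdf2_hmac("sha256", password.encode(), salt, 100_000)

    def add_user(self, user, password):
        salt = os.urandom(16)
        self._users[user] = (salt, self._hash(password, salt))

    def authenticate(self, user, password):
        record = self._users.get(user)
        if record is None:
            return False
        salt, expected = record
        # Constant-time comparison avoids leaking hash prefixes via timing.
        return hmac.compare_digest(expected, self._hash(password, salt))

    def grant(self, user, resource, action):
        self._acl.setdefault((user, resource), set()).add(action)

    def is_allowed(self, user, resource, action):
        # Deny by default: only explicitly granted actions are permitted.
        return action in self._acl.get((user, resource), set())
```

A caller would first authenticate the user and then consult is_allowed before touching a data resource or service; anything not explicitly granted is denied.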
3. Remote procedure call module
The remote procedure call module provides a reliable and efficient inter-process remote call service, and supports data compression and consistency checking on the communication channel.
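One way to combine data compression with a consistency check on a channel is to frame each message as a length, a checksum and a compressed payload. The sketch below assumes zlib compression and a CRC-32 checksum; the frame layout and the function names encode_frame and decode_frame are hypothetical and are not specified by the embodiment.

```python
import struct
import zlib

# Assumed frame header: payload length and CRC-32, both unsigned 32-bit,
# in network byte order.
HEADER = struct.Struct("!II")


def encode_frame(message: bytes) -> bytes:
    """Compress the message and prepend its length and checksum."""
    payload = zlib.compress(message)
    return HEADER.pack(len(payload), zlib.crc32(payload)) + payload


def decode_frame(frame: bytes) -> bytes:
    """Verify the checksum, then decompress; raise ValueError on corruption."""
    length, checksum = HEADER.unpack_from(frame)
    payload = frame[HEADER.size:HEADER.size + length]
    if len(payload) != length or zlib.crc32(payload) != checksum:
        raise ValueError("frame failed consistency check")
    return zlib.decompress(payload)
```

The receiver checks the checksum before decompressing, so a frame damaged in transit is rejected rather than passed to the called procedure.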
4. Distributed collaborative service module
The distributed collaborative service module provides the basic naming service, state synchronization service and distributed lock service of the distributed system. It uses a distributed consensus protocol based on Paxos (a message-passing-based consensus algorithm). The distributed file system is highly scalable, supporting hundreds of millions of files and storage at the PB scale and above. The design with multiple master nodes (Masters) based on the Paxos protocol avoids a single point of failure in the cluster, performs fault monitoring and data replication automatically, and achieves very high availability and reliability without depending on special hardware such as RAID cards or NAS. A shared-nothing architecture is adopted to support large-scale concurrent reads and writes and to make full use of distributed parallel bandwidth. Log update operations complete at the millisecond level, enabling fast-responding online services. Incremental expansion and automatic data balancing are supported, and users can customize the data distribution strategy.
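For readers unfamiliar with Paxos, the following is a simplified, in-memory sketch of single-decree Paxos, in which a majority of acceptors agrees on one value (for example, which master node is active). It is an illustration under stated assumptions: the Acceptor class and propose function are hypothetical names, all acceptors are reachable in-process, and message loss, persistence and multi-round logs are omitted. It is not the embodiment's implementation.

```python
class Acceptor:
    def __init__(self):
        self.promised = -1          # highest proposal number promised so far
        self.accepted_n = -1        # number of the accepted proposal, if any
        self.accepted_value = None

    def prepare(self, n):
        """Phase 1: promise to ignore proposals numbered below n."""
        if n > self.promised:
            self.promised = n
            return True, self.accepted_n, self.accepted_value
        return False, -1, None

    def accept(self, n, value):
        """Phase 2: accept unless a higher-numbered promise has been made."""
        if n >= self.promised:
            self.promised = n
            self.accepted_n = n
            self.accepted_value = value
            return True
        return False


def propose(acceptors, n, value):
    """Run one proposal round; return the chosen value, or None on failure."""
    majority = len(acceptors) // 2 + 1
    replies = [a.prepare(n) for a in acceptors]
    promises = [(an, av) for ok, an, av in replies if ok]
    if len(promises) < majority:
        return None
    # If any acceptor already accepted a value, the proposer must adopt the
    # value of the highest-numbered accepted proposal instead of its own.
    prior_n, prior_value = max(promises, key=lambda p: p[0])
    if prior_n >= 0:
        value = prior_value
    accepted = sum(a.accept(n, value) for a in acceptors)
    return value if accepted >= majority else None
```

The key property shown is that once a value has been chosen, a later proposal with a higher number still returns the same value, which is what lets several master nodes agree without a single point of failure.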
5. Task scheduling module
The task scheduling module targets complex applications involving massive data processing and large-scale computation. It provides a data-driven, multistage pipeline parallel computing framework that is compatible, in expressive capability, with programming models such as MapReduce and Cascading (both large-scale distributed programming models). It is highly scalable and can schedule more than one hundred thousand parallel tasks. It automatically detects faults and system hotspots and retries failed tasks, ensuring that jobs run to completion stably and reliably.
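The data-driven ordering of stages and the retrying of failed tasks can be illustrated with a small sketch. It assumes Python 3.9 or later for the standard-library graphlib module; the function name run_pipeline, the stage signature and the max_retries parameter are hypothetical. Stages run sequentially here for brevity, whereas the framework described above runs independent tasks in parallel.

```python
from graphlib import TopologicalSorter


def run_pipeline(stages, deps, max_retries=2):
    """Run stages in dependency order, retrying a stage that raises.

    stages: stage name -> callable taking a dict of upstream results
    deps:   stage name -> set of upstream stage names
    """
    results = {}
    # static_order() yields every stage after all of its upstream stages.
    for name in TopologicalSorter(deps).static_order():
        inputs = {d: results[d] for d in deps.get(name, ())}
        for attempt in range(max_retries + 1):
            try:
                results[name] = stages[name](inputs)
                break
            except Exception:
                # Re-raise only after the final allowed attempt has failed.
                if attempt == max_retries:
                    raise
    return results
```

Each stage receives only the outputs of the stages it depends on, so a stage becomes runnable as soon as its input data exists, which is the sense in which the pipeline is data-driven.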
6. Cluster deployment and monitoring module
The cluster deployment and monitoring module provides deployment, configuration management, self-checking and bootstrapping of the whole cloud operating system and the upper-layer application services. It supports online cluster expansion and online upgrade of application services. It monitors the operating conditions and performance indicators of the cloud operating system cluster and the upper-layer application services, provides rich monitoring charts and a cluster status dashboard, and supports user-defined automated alarm services as well as online performance profiling and fault diagnosis. The cloud operating system integrates the cluster so that it is presented to other services and applications as a single supercomputer.
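A user-defined alarm service can be pictured as a set of rules evaluated against the latest metric samples of each node. In the sketch below the AlertRule class, the evaluate function and the metric names cpu and disk_free_gb are all hypothetical; the embodiment does not define a rule format.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class AlertRule:
    name: str
    metric: str
    predicate: Callable[[float], bool]  # True when the value should alert


def evaluate(rules, samples):
    """Return sorted (node, rule name) pairs for every rule that fired.

    samples: node name -> {metric name: latest value}
    """
    fired = []
    for node, metrics in samples.items():
        for rule in rules:
            value = metrics.get(rule.metric)
            # A node that does not report the metric cannot trigger the rule.
            if value is not None and rule.predicate(value):
                fired.append((node, rule.name))
    return sorted(fired)
```

For example, a rule such as AlertRule("high-cpu", "cpu", lambda v: v > 0.9) would fire for any node whose latest cpu sample exceeds 0.9.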
In the big data management and control platform provided by the embodiment of the invention, the resource management module schedules and allocates cluster resources such as memory and computing capacity to upper-layer applications and services, and manages the life cycle and resource usage of tasks running on the cluster nodes; the security management module provides per-user identity authentication and authorization and performs access control on cluster data resources and services; the remote procedure call module provides a reliable and efficient inter-process remote call service; the distributed collaborative service module provides the basic naming service, state synchronization service and distributed lock service of a distributed system; the task scheduling module provides a data-driven, multistage pipeline parallel computing framework for complex applications involving massive data processing and large-scale computation; and the cluster deployment and monitoring module provides deployment, configuration management, self-checking and bootstrapping of the whole cloud operating system and the upper-layer application services. The embodiment of the invention can thus serve as an e-government big data service platform for big data applications, and effectively protects the security of big data and the privacy of users.
The above description is only for the specific embodiment of the present invention, but the scope of the present invention is not limited thereto, and any changes or substitutions that can be easily conceived by those skilled in the art within the technical scope of the present invention are included in the scope of the present invention. Therefore, the protection scope of the present invention shall be subject to the protection scope of the claims.

Claims (8)

1. A big data management and control platform, characterized by comprising a resource management module, a security management module, a remote procedure call module, a distributed collaborative service module, a task scheduling module and a cluster deployment and monitoring module, wherein
the resource management module is used for scheduling and allocating cluster resources such as memory and computing capacity to upper-layer applications and services, and for managing the life cycle and resource usage of tasks running on the cluster nodes;
the security management module is used for providing per-user identity authentication and authorization, and for performing access control on cluster data resources and services;
the remote procedure call module is used for providing a reliable and efficient inter-process remote call service;
the distributed collaborative service module is used for providing the basic naming service, state synchronization service and distributed lock service of a distributed system;
the task scheduling module is used for providing a data-driven, multistage pipeline parallel computing framework for complex applications involving massive data processing and large-scale computation;
the cluster deployment and monitoring module is used for providing deployment, configuration management, self-checking and bootstrapping of the whole cloud operating system and the upper-layer application services.
2. The big data management and control platform according to claim 1, wherein the resource management module supports computation quota and access control, as well as job priority and resource preemption in a multi-user execution environment.
3. The big data management and control platform according to claim 1, wherein the remote procedure call module supports data compression and consistency check of a communication channel.
4. The big data management and control platform according to claim 1, wherein the distributed collaborative service module uses a design with a plurality of master nodes based on the Paxos protocol, which avoids a single point of failure in the cluster and performs fault monitoring and data replication automatically.
5. The big data management and control platform according to claim 1, wherein the task scheduling module is compatible, in expressive capability, with the MapReduce and Cascading programming models.
6. The big data management and control platform according to claim 1, wherein the cluster deployment and monitoring module supports online cluster expansion and online upgrade of application services.
7. The big data management and control platform according to claim 1 or 6, wherein the cluster deployment and monitoring module is configured to monitor the operating conditions and performance indicators of the cloud operating system cluster and the upper-layer application services, and to provide rich monitoring charts and a cluster status dashboard.
8. The big data management and control platform according to claim 1 or 6, wherein the cluster deployment and monitoring module supports user-defined automated alarm services, as well as online performance profiling and fault diagnosis.
CN201910811117.0A 2019-08-29 2019-08-29 Big data management and control platform Pending CN112445858A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201910811117.0A CN112445858A (en) 2019-08-29 2019-08-29 Big data management and control platform

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201910811117.0A CN112445858A (en) 2019-08-29 2019-08-29 Big data management and control platform

Publications (1)

Publication Number Publication Date
CN112445858A (en) 2021-03-05

Family

ID=74742213

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201910811117.0A Pending CN112445858A (en) 2019-08-29 2019-08-29 Big data management and control platform

Country Status (1)

Country Link
CN (1) CN112445858A (en)

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN105703940A (en) * 2015-12-10 2016-06-22 中国电力科学研究院 Multistage dispatching distributed parallel computing-oriented monitoring system and monitoring method
WO2016101638A1 (en) * 2014-12-23 2016-06-30 国家电网公司 Operation management method for electric power system cloud simulation platform
US20190138410A1 (en) * 2017-11-09 2019-05-09 Bank Of America Corporation Distributed data monitoring device

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116995816A (en) * 2023-09-25 2023-11-03 国网山东省电力公司淄博供电公司 Power supply data processing platform and method based on artificial intelligence
CN116995816B (en) * 2023-09-25 2024-02-23 国网山东省电力公司淄博供电公司 Power supply data processing platform and method based on artificial intelligence

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication
Application publication date: 20210305