CN107682209A - SDP big data automatic deployment monitoring platform - Google Patents

SDP big data automatic deployment monitoring platform

Info

Publication number
CN107682209A
CN107682209A (application number CN201711105672.9A)
Authority
CN
China
Prior art keywords
cluster environment
sdp
module
node
big data
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201711105672.9A
Other languages
Chinese (zh)
Inventor
张�林
武保权
马培娜
王成锐
韩克强
连杰
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Qingdao Sarntah Inteligent Technology Co Ltd
Original Assignee
Qingdao Sarntah Inteligent Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Qingdao Sarntah Inteligent Technology Co Ltd filed Critical Qingdao Sarntah Inteligent Technology Co Ltd
Priority to CN201711105672.9A priority Critical patent/CN107682209A/en
Publication of CN107682209A publication Critical patent/CN107682209A/en
Pending legal-status Critical Current

Classifications

    • H: ELECTRICITY
    • H04: ELECTRIC COMMUNICATION TECHNIQUE
    • H04L: TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L 67/00: Network arrangements or protocols for supporting network services or applications
    • H04L 67/01: Protocols
    • H04L 67/02: Protocols based on web technology, e.g. hypertext transfer protocol [HTTP]
    • H04L 67/025: Protocols based on web technology, e.g. hypertext transfer protocol [HTTP], for remote control or remote monitoring of applications
    • H04L 41/00: Arrangements for maintenance, administration or management of data switching networks, e.g. of packet switching networks
    • H04L 41/06: Management of faults, events, alarms or notifications
    • H04L 41/0631: Management of faults, events, alarms or notifications using root cause analysis; using analysis of correlation between notifications, alarms or events based on decision criteria, e.g. hierarchy, tree or time analysis
    • H04L 41/069: Management of faults, events, alarms or notifications using logs of notifications; post-processing of notifications
    • H04L 41/08: Configuration management of networks or network elements
    • H04L 41/0803: Configuration setting
    • H04L 67/10: Protocols in which an application is distributed across nodes in the network
    • H04L 67/1097: Protocols in which an application is distributed across nodes in the network for distributed storage of data in networks, e.g. transport arrangements for network file system [NFS], storage area networks [SAN] or network attached storage [NAS]
    • H04L 67/2866: Architectures; Arrangements
    • H04L 67/30: Profiles
    • H04L 67/50: Network services
    • H04L 67/75: Indicating network or usage conditions on the user display

Landscapes

  • Engineering & Computer Science (AREA)
  • Computer Networks & Wireless Communication (AREA)
  • Signal Processing (AREA)
  • Debugging And Monitoring (AREA)

Abstract

The invention discloses an SDP big data automatic deployment monitoring platform comprising a cluster environment creation module, a cluster environment monitoring module, a cluster environment operation module, a cluster environment alarm module, a cluster environment log analysis module, a cluster environment security control module, and a cluster environment user role and authority module. The cluster environment creation module covers the creation of an environment preparation script and the configuration and installation of the cluster environment. The disclosed SDP big data automatic deployment monitoring platform supports rapid deployment, unified cluster and service management, and intelligent monitoring and alarm management, and offers higher security.

Description

SDP big data automatic deployment monitoring platform
Technical Field
The invention relates to the technical field of computer information storage and processing, in particular to an SDP big data automatic deployment monitoring platform.
Background
Modern society develops at high speed, with advanced science and technology and rapid information circulation; people communicate ever more closely and daily life becomes more convenient, while the amount of data to be processed keeps growing and the demand for mass data processing rises in every field. Against the background that the storage space and computing power of a single machine can no longer meet these demands, distributed computing and parallel computing began to develop and be applied rapidly, eventually evolving into grid computing.
Monitoring information in a large-scale distributed system is massive, and the monitored resources are multi-level and multi-source; the dynamics and complexity of a big data platform create many difficulties for its monitoring system. How to effectively monitor the software and hardware resources in a big data platform, predict resource bottlenecks in time, and take corresponding measures before a fault occurs is key to improving the service quality of the platform, and is also a focus of current research.
Big data is a product of this high-technology era. With the advent of the cloud era, big data has attracted more and more attention. The first problems big data must solve are storage and computation. The Hadoop framework emerged around 2008, when Hadoop was split out of Nutch as an independent project; it has drawn continuous attention, and the framework itself keeps evolving, with native implementations of compression algorithms, optimization of the checksum mechanism, and short-circuit reads (reading local file blocks directly), all of which improve read performance.
With this continuous optimization, Hadoop keeps improving and the ecosystem around it keeps maturing, but deploying a Hadoop system is still very cumbersome. From 2008 to the present, countless large and small clusters have been built by hand inside companies, each involving scripts, permission settings and directory permission configuration; installing one system takes roughly half a day to a full day, and even a very experienced engineer still needs about half a day.
Big data technology is constantly evolving, but it still has some inherent shortcomings. First, the technologies proliferate: the components of the ecosystem, including Hadoop, Hive, Spark and the like, are all improving continuously, so selecting among them is difficult, and how to apply each technology is a hard problem.
In addition, integration within big data technology itself is insufficient. Every open source tool tends to build an ecosystem around itself and to emphasize how good its own performance is, while how to use these technologies together reasonably remains an open question. For example, if an existing system is built around one technology and a new technology is later introduced, how to integrate the two becomes a significant problem.
Disclosure of Invention
In order to solve the above technical problems, the invention provides an SDP big data automatic deployment monitoring platform, so as to achieve rapid deployment, unified cluster and service management, intelligent monitoring and alarm management, and higher security.
In order to achieve this purpose, the technical solution of the invention is as follows:
an SDP big data automatic deployment monitoring platform comprises a cluster environment creation module, a cluster environment monitoring module, a cluster environment operation module, a cluster environment alarm module, a cluster environment log analysis module, a cluster environment security control module and a cluster environment user role and authority module.
In the above solution, the cluster environment creation module comprises the creation of an environment preparation script and the configuration and installation of the cluster environment.
In the above scheme, the cluster environment includes components, services, interfaces, models, databases, and tools.
In the above scheme, the monitoring module of the cluster environment includes hardware usage monitoring, component service condition monitoring, data storage condition monitoring, alarm condition monitoring, task execution condition monitoring, configuration condition monitoring, and node operation condition monitoring.
In the above solution, the operation module of the cluster environment includes operations of nodes, services, background management, alarms, and data.
In the above solution, the alarm module of the cluster environment includes WEB alarm, PORT alarm, METRIC alarm, AGGREGATE alarm, SCRIPT alarm, SERVER alarm and RECOVERY alarm.
In the above scheme, the log analysis module of the cluster environment includes user behavior analysis and error log analysis.
In the above solution, the security control module of the cluster environment includes high availability of nodes and high availability of services.
In the above solution, the creation of the environment preparation script includes:
(1) closing the firewall on each node;
(2) configuring the hosts file on each node so that host names map to IP addresses;
(3) setting up passwordless SSH login between the two master nodes and each worker node;
(4) raising the maximum number of open files on each node;
(5) configuring a file server;
(6) configuring a local yum repository;
(7) configuring a time server and its clients;
(8) installing the JDK on each node and configuring its environment variables;
(9) configuring HugePages on each node;
(10) installing a MySQL database on a designated node;
(11) installing and configuring the sdp host service.
In the above scheme, the components include HADOOP, HDFS, HBASE, ZOOKEEPER, MAPREDUCE, REDIS, ELASTICSEARCH, SPARK, and STORM.
Through the technical scheme, the SDP big data automatic deployment monitoring platform provided by the invention has the following advantages:
1. Rapid deployment:
The system integrates the most common service components of the Hadoop ecosystem and provides a concise, visual installation wizard; platform deployment is completed in one pass, and the whole deployment process can be finished within one hour.
2. Unified cluster and service management:
SDP provides visual cluster management support together with rich development components and service integration; HDFS, HIVE, HBASE and the like can be installed quickly, and running resources and tasks are monitored, helping administrators improve operation and maintenance efficiency, guarantee service quality, optimize cluster performance and reduce management cost.
3. Intelligent monitoring and alarm management:
SDP applies a set of seven predefined alert types to every cluster node and service; these alerts monitor the cluster and can raise warnings that help users identify and troubleshoot problems. Users can also create custom alerts and set notification targets to receive the monitoring alerts they care about.
4. High-availability construction:
SDP ensures high availability by establishing primary and secondary components. High availability can be configured for components in many stack services, and once it is configured for a service, SDP can also manage and disable (roll back) the high availability of those components. In addition, SDP supports master-slave hot standby, which protects SDP metadata and keeps the platform safe to use.
5. Data security:
SDP can install a secure (Kerberos-based) Hadoop cluster, thereby providing Hadoop security support, role-based user authentication, authorization and auditing, and integration with LDAP and Active Directory for user management.
6. Product stack:
Most Hadoop ecosystem services, including Spark and Storm, can be installed on the platform.
Configuration dependencies and version dependencies among services are shielded at the bottom layer, so users only need to care about using the service components, which improves production efficiency.
Drawings
In order to more clearly illustrate the embodiments of the present invention or the technical solutions in the prior art, the drawings used in the description of the embodiments or the prior art will be briefly described below.
Fig. 1 is a functional framework diagram of an SDP big data automation deployment monitoring platform disclosed in the embodiment of the present invention.
Detailed Description
The technical solution in the embodiments of the present invention will be clearly and completely described below with reference to the accompanying drawings in the embodiments of the present invention.
The invention provides an SDP big data automatic deployment monitoring platform which, as shown in Fig. 1, can be deployed rapidly, provides richer visual operations and more types of alarm management, and keeps data and services safer.
An SDP big data automatic deployment monitoring platform comprises a cluster environment creation module, a cluster environment monitoring module, a cluster environment operation module, a cluster environment alarm module, a cluster environment log analysis module, a cluster environment security control module and a cluster environment user role and authority module.
First, creation module of cluster environment
In the above solution, the cluster environment creation module comprises the creation of an environment preparation script and the configuration and installation of the cluster environment.
In a further technical solution, the creation of the environment preparation script includes the following steps (a minimal automation sketch follows the list):
(1) closing the firewall on each node;
(2) configuring the hosts file on each node so that host names map to IP addresses;
(3) setting up passwordless SSH login between the two master nodes and each worker node;
(4) raising the maximum number of open files on each node;
(5) configuring a file server;
(6) configuring a local yum repository;
(7) configuring a time server and its clients;
(8) installing the JDK on each node and configuring its environment variables;
(9) configuring HugePages on each node;
(10) installing a MySQL database on a designated node;
(11) installing and configuring the sdp host service.
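As an illustration only, a few of these preparation steps could be scripted as below with Python's standard subprocess module; the commands (systemctl/firewalld), file paths and the time-server host name are assumptions for a CentOS/RHEL-style node run as root, and are not prescribed by this disclosure.

    # Minimal sketch, assuming a CentOS/RHEL-style node run as root; command
    # names, paths and the NTP host are illustrative, not part of the platform.
    import subprocess

    def run(cmd):
        """Run a shell command and fail loudly if it returns a non-zero code."""
        print("+", cmd)
        subprocess.run(cmd, shell=True, check=True)

    def prepare_node(hosts_entries, open_files=65535):
        # step (1): close the firewall on this node
        run("systemctl stop firewalld && systemctl disable firewalld")
        # step (2): map host names to IP addresses in /etc/hosts
        with open("/etc/hosts", "a") as f:
            for ip, name in hosts_entries:
                f.write(f"{ip} {name}\n")
        # step (4): raise the maximum number of open files
        with open("/etc/security/limits.conf", "a") as f:
            f.write(f"* soft nofile {open_files}\n* hard nofile {open_files}\n")
        # step (7): sync this node against a (hypothetical) internal time server
        run("ntpdate time.cluster.local")

    if __name__ == "__main__":
        prepare_node([("192.168.1.10", "master1"), ("192.168.1.11", "master2")])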
In the above scheme, the cluster environment includes components, services, interfaces, models, databases, and tools.
Wherein,
Components: HADOOP, HDFS, HBASE, ZOOKEEPER, MAPREDUCE, REDIS, ELASTICSEARCH, SPARK, STORM, Kafka, etc.
1. Hadoop has the following properties:
Convenient: Hadoop runs on large clusters of commodity machines or on cloud computing services.
Robust: Hadoop is intended to run on ordinary commercial hardware; its architecture assumes that hardware will fail frequently, and Hadoop can gracefully handle most such failures.
Scalable: Hadoop scales linearly to handle larger data sets by adding cluster nodes.
2. HBase:
HBase is a highly reliable, high-performance, column-oriented and scalable distributed storage system; using HBase, a large-scale structured storage cluster can be built on low-cost PC servers. HBase is an open-source implementation of Google Bigtable: where Bigtable uses GFS as its file storage system, HBase uses Hadoop HDFS; where Google runs MapReduce to process the mass data in Bigtable, HBase uses Hadoop MapReduce to process the mass data in HBase; and where Bigtable uses Chubby as its coordination service, HBase uses ZooKeeper. As for the relationship between HBase and HDFS: HBase stores its data on HDFS much as MySQL stores data on a disk; MySQL is the application, and the disk is the concrete storage medium. Because of its own characteristics, HDFS is not suited to random lookups and is unfriendly to update operations; for example, a network disk built on HDFS (such as Baidu Netdisk) supports uploading and deleting files but does not let users directly modify the content of a file on the disk.
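To make the HBase-on-HDFS idea concrete, here is a minimal sketch using the happybase Thrift client for HBase; the host, table and column names are hypothetical and assume an HBase Thrift server is running.

    # Minimal sketch, assuming an HBase Thrift server at the hypothetical host
    # 'hbase-master'; table name and column family are illustrative only.
    import happybase

    connection = happybase.Connection("hbase-master")
    connection.create_table("metrics", {"cf": dict()})   # one column family 'cf'
    table = connection.table("metrics")

    # Random writes and reads by row key are what HBase adds on top of HDFS,
    # which by itself is poor at random lookup and update.
    table.put(b"node-01#cpu", {b"cf:value": b"0.73"})
    print(table.row(b"node-01#cpu"))
    connection.close()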
3. Characteristics of Kafka
Kafka is distributed: all of its components, namely the broker cluster, the producers and the consumers, can be distributed.
Messages are distinguished at production time by a topic identifier and can be partitioned; each partition is an ordered, immutable message queue to which messages are continuously appended.
Kafka provides high throughput for both publishing and subscribing; roughly, it can produce about 250,000 messages per second (50 MB) and process about 550,000 messages per second (110 MB).
The processing state of a message is maintained on the consumer side rather than by the server side, and automatic balancing can be achieved when failures occur.
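For illustration, a producer/consumer pair written with the kafka-python client shows the topic model just described; the broker address and topic name are assumptions.

    # Minimal sketch using the kafka-python package; the broker address
    # 'kafka-01:9092' and the topic 'cluster-alerts' are illustrative assumptions.
    from kafka import KafkaProducer, KafkaConsumer

    producer = KafkaProducer(bootstrap_servers="kafka-01:9092")
    producer.send("cluster-alerts", b"datanode disk usage above 90%")
    producer.flush()

    # The consumer tracks its own position (offset); the broker keeps no
    # per-message "processed" flag, as noted above.
    consumer = KafkaConsumer("cluster-alerts",
                             bootstrap_servers="kafka-01:9092",
                             auto_offset_reset="earliest",
                             consumer_timeout_ms=5000)
    for message in consumer:
        print(message.partition, message.offset, message.value)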
4. Storm is a free, open-source, distributed, highly fault-tolerant real-time computing system. Storm makes continuous stream computation easy and fills the real-time gap that Hadoop batch processing cannot cover. Storm is often used for real-time analytics, online machine learning, continuous computation, distributed RPC and ETL. Storm is very simple to deploy and manage, and its performance stands out among comparable streaming tools.
Storm consists mainly of two components, Nimbus and Supervisor. Both are fail-fast and stateless: task states, heartbeat information and the like are stored on ZooKeeper, and the submitted code resources are kept on the local machine's disk.
Nimbus is responsible for distributing code within the cluster, assigning work to machines and monitoring their status. There is only one Nimbus in the whole cluster.
A Supervisor listens for the work assigned to its machine and starts or shuts down worker processes (Workers) as required. One Supervisor is deployed on every machine that runs Storm, and the number of slots it offers is set according to the machine's configuration.
ZooKeeper is an external resource on which Storm depends heavily: Nimbus, the Supervisors and even the running Workers store their heartbeats on ZooKeeper, and Nimbus performs scheduling and task allocation according to the heartbeats and task states recorded there.
ElasticSearch: ElasticSearch (ES for short) is a distributed, RESTful search and analysis server designed for distributed computing; it achieves near-real-time search and is stable, reliable and fast. Like Apache Solr, it is a Lucene-based index server, but ElasticSearch has several advantages over Solr:
Lightweight: installation and startup are easy; a single command starts the server once the package has been downloaded.
Schema free: ElasticSearch does not require a predefined schema, whereas Solr specifies the index structure with a schema.xml file.
Multi-index support: a new index can be created with different index parameters, which in Solr requires separate configuration.
Distributed: ElasticSearch is distributed out of the box, whereas SolrCloud configuration is relatively complex.
At the beginning of 2013, GitHub abandoned Solr and adopted ElasticSearch for PB-scale search.
ElasticSearch has developed rapidly in recent years and has outgrown the role of a pure search engine, adding data aggregation analysis (aggregation) and visualization features. If you have millions of documents that need to be located by keywords, ElasticSearch is certainly the best choice; and if your documents are JSON, you can also treat ElasticSearch as a kind of "NoSQL database" and apply its aggregation capability to multidimensional analysis of the data.
Some notable ElasticSearch cases at home and abroad:
GitHub: "GitHub uses ElasticSearch to search 20 TB of data, including 1.3 billion files and 130 billion lines of code."
SoundCloud: "SoundCloud uses ElasticSearch to provide instant and precise music search service to 180 million users."
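A short sketch with the official elasticsearch Python client (8.x-style API assumed) illustrates indexing and keyword search; the host, index and field names are hypothetical.

    # Minimal sketch using the official 'elasticsearch' Python client (8.x-style
    # API); host, index name and field names are illustrative assumptions.
    from elasticsearch import Elasticsearch

    es = Elasticsearch("http://es-01:9200")
    es.index(index="error-logs", document={"service": "hdfs",
                                           "level": "ERROR",
                                           "message": "DataNode heartbeat lost"})
    es.indices.refresh(index="error-logs")

    # Keyword search over the indexed documents.
    result = es.search(index="error-logs",
                       query={"match": {"message": "heartbeat"}})
    for hit in result["hits"]["hits"]:
        print(hit["_score"], hit["_source"])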
5. ZooKeeper
As the number of compute nodes increases, cluster members need to synchronize with one another and to learn where to access services and how they are configured, and that is exactly what ZooKeeper does. ZooKeeper is, as its name suggests, a zoo keeper: the administrator that manages the elephant (Hadoop), the bee (Hive) and the pig (Pig); projects such as Apache HBase, Apache Solr and LinkedIn's Sensei all adopt ZooKeeper. ZooKeeper is an open-source distributed application coordination service which, based on the Fast Paxos algorithm, provides synchronization, configuration maintenance, naming and similar services for distributed applications.
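As an illustration, the kazoo Python client can express this kind of coordination; the ensemble address and znode paths below are hypothetical.

    # Minimal sketch using the kazoo client; the ZooKeeper ensemble address and
    # the znode paths are illustrative assumptions.
    from kazoo.client import KazooClient

    zk = KazooClient(hosts="zk-01:2181,zk-02:2181,zk-03:2181")
    zk.start()

    # Publish this node's service address so other cluster members can find it;
    # an ephemeral znode disappears automatically if the owning process dies.
    zk.ensure_path("/sdp/services/hdfs")
    zk.create("/sdp/services/hdfs/namenode", b"192.168.1.10:8020", ephemeral=True)

    # Read it back, as another cluster member would.
    data, stat = zk.get("/sdp/services/hdfs/namenode")
    print("namenode registered at", data.decode())
    zk.stop()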
6. Spark is an Apache project billed as "lightning-fast cluster computing". It has a thriving open-source community and is currently the most active Apache project.
Spark provides a faster and more general data processing platform. Compared with Hadoop, Spark can run programs up to 100 times faster in memory, or 10 times faster on disk. In the past year, Spark surpassed Hadoop in the 100 TB Daytona GraySort contest, using only one tenth of the machines while running about three times faster.
Spark Core is a basic engine for massively parallel and distributed data processing. It is mainly responsible for:
memory management and fault recovery;
scheduling, distributing, and monitoring jobs across the cluster;
interacting with the storage system.
Spark introduces a concept called the resilient distributed dataset (RDD), an immutable, fault-tolerant collection of distributed objects that can be operated on in parallel. An RDD can contain objects of any type and is created by loading an external data set or by distributing a collection from the driver application.
RDDs support two types of operations:
A transformation is an operation (such as map, filter, join or union) that is performed on an RDD and creates a new RDD holding the result.
An action is an operation (such as reduce, count or first) that performs some computation on an RDD and returns the result.
In Spark, transformations are "lazy", meaning they do not compute their results immediately. Instead, they simply "remember" the operations to be performed and the data sets (e.g. files) they are to be performed on. The transformations are only actually computed when an action is invoked, and the result is then returned to the driver program. This design lets Spark run more efficiently; for example, if a large file is transformed in various ways and passed to its first action, Spark only processes the first line of the file and returns that result instead of processing the entire file.
By default, an RDD may be recomputed each time an action is run on it. However, an RDD can also be persisted in memory with the persist or cache method, in which case Spark keeps its elements on the cluster and subsequent queries on it are much faster.
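The lazy-evaluation behaviour described above can be seen in a few lines of PySpark; the input path below is hypothetical.

    # PySpark sketch of lazy transformations vs. actions; the HDFS path is a
    # hypothetical example.
    from pyspark import SparkContext

    sc = SparkContext(appName="rdd-demo")
    lines = sc.textFile("hdfs:///logs/events.log")       # nothing is read yet
    errors = lines.filter(lambda l: "ERROR" in l)        # transformation: still lazy

    errors.persist()                                     # keep this RDD in memory
    print(errors.count())                                # action: triggers computation
    print(errors.first())                                # action: reuses the cached RDD
    sc.stop()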
SparkSQL
Spark SQL is a component of Spark that supports querying data in SQL or the Hive query language. It originated from the Apache Hive project as a port to run on Spark (instead of MapReduce) and is now integrated into the Spark stack.
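A minimal Spark SQL sketch shows this SQL interface; the Hive table name is hypothetical, and Hive support is assumed to be enabled.

    # Minimal Spark SQL sketch; the Hive table 'ops.task_log' is a hypothetical
    # name, and Spark is assumed to be built with Hive support.
    from pyspark.sql import SparkSession

    spark = (SparkSession.builder
             .appName("sql-demo")
             .enableHiveSupport()
             .getOrCreate())
    df = spark.sql("SELECT status, COUNT(*) AS n FROM ops.task_log GROUP BY status")
    df.show()
    spark.stop()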
Service: the thread-level services through which a component provides a programming interface or runs background tasks.
Interface: a number of interfaces are opened for external applications to access, such as data docking interfaces; various data forms and data sources are supported, including Excel, JSON, files, databases, data warehouses and the like.
Model: several basic machine learning models are provided, such as a clustering model, a Pupperwell model and a linear regression model.
Database: databases of various forms are created, including the cache database Redis, the relational databases Oracle and MySQL, and the data warehouse Hive.
Tool: the data transfer tool Sqoop, the ETL cleaning tool Kettle, and so on.
Second, monitoring module of cluster environment
In the above scheme, the monitoring module of the cluster environment includes hardware usage monitoring, component service condition monitoring, data storage condition monitoring, alarm condition monitoring, task execution condition monitoring, configuration condition monitoring, and node operation condition monitoring.
Hardware usage monitoring covers memory, hard disk, network, CPU, HDFS and the like.
Component service condition monitoring covers basic information about a component and its resource usage, such as NameNode and DataNode status, JournalNodes, disk usage (DFS used, non-DFS used, remaining space), NameNode GC count, NameNode GC time, NameNode connection load, NameNode heap, NameNode host load, NameNode RPC, and so on.
Data storage condition monitoring covers the distribution of data on the distributed file system HDFS and its resource occupation.
Alarm condition monitoring covers the number, type and nature of alarms.
Task execution condition monitoring covers the completion percentage, elapsed time, errors and results of tasks.
Configuration condition monitoring refers to the state of the configured parameters.
Node operation condition monitoring covers a node's running state, IP, memory, disk usage, average load and the like.
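As one possible way to collect the node-level figures just listed (memory, disk usage, average load), an agent could sample them with the psutil package; the dictionary keys below are an assumption, not the platform's actual reporting format.

    # Sketch of a node-level metrics sample using psutil; the field names are
    # illustrative, not SDP's actual wire format.
    import os, socket, psutil

    def sample_node_metrics():
        disk = psutil.disk_usage("/")
        return {
            "host": socket.gethostname(),
            "cpu_percent": psutil.cpu_percent(interval=1),
            "mem_percent": psutil.virtual_memory().percent,
            "disk_used_gb": round(disk.used / 2**30, 1),
            "disk_percent": disk.percent,
            "load_avg_1m": os.getloadavg()[0],   # one-minute load average
        }

    if __name__ == "__main__":
        print(sample_node_metrics())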
Third, operation module of cluster environment
In the above solution, the operation module of the cluster environment includes operations of nodes, services, background management, alarms, and data.
The node operations mainly include adding nodes and deleting nodes.
The operations available for each node mainly include starting all components, stopping all components, restarting all components, reinstalling failed components, turning maintenance mode on, turning maintenance mode off, setting the rack, downloading the client configuration files, and so on.
The service operations include starting and stopping all services, displaying a service operation summary, adding services, executing service actions, restarting after installation, monitoring background operations, deleting services, auditing services, using quick links, refreshing the YARN capacity scheduler and managing HDFS.
The background management operations include adding users, setting roles and setting permissions.
The alarm operations include deleting alarms, adding custom alarms, managing early warnings, managing notifications and managing reminder settings. The reminder setting specifies the number of checks: set the number of checks before configuring a notification; if the status changes during an alarm check, the system retries the configured number of times before sending the notification. If a temporary problem in the environment would cause false alarms, increase this value.
Data operations: visual SQL operation is supported, so data stored in components or services such as Hive, HBase and HDFS can be queried directly with simple SQL, reducing the need for command-line tools.
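As an illustration of querying Hive with plain SQL rather than shell commands, a client such as PyHive could be used as below; the HiveServer2 host, port, user and table name are assumptions.

    # Sketch of the "query stored data with simple SQL" idea using the PyHive
    # client; host, port, username and table name are illustrative assumptions.
    from pyhive import hive

    conn = hive.Connection(host="hive-server", port=10000, username="sdp")
    cursor = conn.cursor()
    cursor.execute("SELECT level, COUNT(*) FROM error_logs GROUP BY level")
    for level, count in cursor.fetchall():
        print(level, count)
    conn.close()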
Fourth, alarm module of cluster environment
In the above solution, the alarm module of the cluster environment includes WEB alarm, PORT alarm, METRIC alarm, AGGREGATE alarm, SCRIPT alarm, SERVER alarm and RECOVERY alarm.
1. WEB alerts
A WEB alert monitors a web URL on a given component; the alert status is determined from the HTTP response code, so the response codes that determine the status of a web alert cannot be changed. You can, however, customize each threshold, the response text and the connection timeout of the whole web check. A connection timeout is treated as a critical alert, and the threshold unit is seconds. The response codes and corresponding statuses of a WEB alert are as follows:
Normal status: the web URL response code is below 400.
Warning status: the web URL response code is 400 or greater.
Error status: SDP cannot connect to the web URL.
2. PORT alert
A PORT alert checks the response time of a connection to a given port; the threshold unit is seconds.
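A simplified check in the spirit of the WEB and PORT alert types might look as follows; the URL, port and thresholds are hypothetical, and this is not the platform's actual alert engine.

    # Simplified WEB and PORT checks in the spirit of the alert types above;
    # the URL, port and thresholds are hypothetical examples.
    import socket, time, requests

    def web_alert(url, timeout=5.0):
        try:
            code = requests.get(url, timeout=timeout).status_code
        except requests.RequestException:
            return "ERROR"                        # cannot connect to the web URL
        return "OK" if code < 400 else "WARNING"  # response code threshold is 400

    def port_alert(host, port, warn_after=1.5, crit_after=5.0):
        start = time.time()
        try:
            with socket.create_connection((host, port), timeout=crit_after):
                elapsed = time.time() - start
        except OSError:
            return "ERROR"
        return "OK" if elapsed < warn_after else "WARNING"

    print(web_alert("http://namenode:50070"), port_alert("namenode", 8020))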
3. METRIC alert
A METRIC alert checks one or more monitored metric values (computed if necessary); the metrics are available from a URL endpoint on the given component. A connection timeout is treated as a false alarm. The thresholds are adjustable, and the unit of each threshold depends on what is being monitored; for example, for a CPU utilization alert the unit is a percentage, while for an RPC latency alert the unit is milliseconds.
4. AGGREGATE alert
An AGGREGATE alert aggregates the states of an instance-level alert and expresses the degree of impact as a percentage. For example, the DataNode percentage reflects how strongly the DataNode alert is affecting the cluster.
5. SCRIPT alert
A SCRIPT alert executes a script, and the result of the script determines the alert state: normal, warning or error. The response text, attribute values and thresholds of a script alert can be customized.
6. SERVER alert
A SERVER alert executes a server-side runnable class that determines the alert state, such as normal, warning or error.
7. RECOVERY alert
RECOVERY alerts are handled by the SDP components that monitor and automatically restart processes; the alert state of normal, warning or error is based on the number of times a process has been automatically restarted. This is useful for knowing when a process has terminated and SDP has restarted it automatically.
Fifth, log analysis module of cluster environment
In the above scheme, the log analysis module of the cluster environment includes user behavior analysis and error log analysis.
1. User behavior analysis
All operation records of cluster users are recorded, including login ID, time, duration and the like, together with the various operations a user performs in the system, such as which buttons were clicked at a given time, when a node was added, when a configuration was changed, and so on.
2. Error log analysis
The running logs of every component and service are collected in real time and analyzed for errors; the error points likely to cause problems are reported, so that errors are found in time and operation and maintenance personnel can correct them promptly.
Sixth, security control module of cluster environment
In the above solution, the security control module of the cluster environment includes high availability of nodes and high availability of services.
The SDP web interface provides a wizard-driven user experience that allows you to configure high availability for the components of many Hortonworks Data Platform (HDP) stack services. High availability is ensured by establishing primary and secondary components: if the primary component fails or becomes unavailable, the secondary component takes over. After high availability has been configured for a service, SDP lets you manage and disable (roll back) the high availability of its components.
High availability of nodes: HA prevents single-node failure; if one node fails, another can be switched in and started automatically to take over the function of the failed node.
High availability of services: a service provides access and functionality to other parties. Single-service failure is prevented: if one service fails, another can be switched in and started automatically to take over the function of the failed service.
Seventh, user role and authority module of cluster environment
A strict access verification mechanism is provided, so that permissions can be refined down to individual buttons and even to different data, enabling multi-dimensional permission control.
The previous description of the disclosed embodiments is provided to enable any person skilled in the art to make or use the present invention. Various modifications to these embodiments will be readily apparent to those skilled in the art, and the generic principles defined herein may be applied to other embodiments without departing from the spirit or scope of the invention. Thus, the present invention is not intended to be limited to the embodiments shown herein but is to be accorded the widest scope consistent with the principles and novel features disclosed herein.

Claims (10)

1. An SDP big data automatic deployment monitoring platform is characterized by comprising a cluster environment creating module, a cluster environment monitoring module, a cluster environment operating module, a cluster environment warning module, a cluster environment log analyzing module, a cluster environment safety control module and a cluster environment user role and authority module.
2. The SDP big data automation deployment monitoring platform of claim 1, wherein the creation module of the cluster environment comprises creation of an environment preparation script and configuration installation of the cluster environment.
3. The SDP big data automation deployment monitoring platform of claim 1, wherein the cluster environment comprises components, services, interfaces, models, databases, and tools.
4. The SDP big data automation deployment monitoring platform of claim 1, wherein the monitoring modules of the cluster environment comprise hardware usage monitoring, component service condition monitoring, data storage condition monitoring, alarm condition monitoring, task execution condition monitoring, configuration condition monitoring and node operation condition monitoring.
5. The SDP big data automation deployment monitoring platform of claim 1, wherein the operation modules of the cluster environment comprise operations of nodes, services, background management operations, alarms and data.
6. The SDP big data automation deployment monitoring platform of claim 1, wherein the alarm modules of the cluster environment comprise a WEB alert, a PORT alert, a METRIC alert, an AGGREGATE alert, a SCRIPT alert, a SERVER alert, and a RECOVERY alert.
7. The SDP big data automation deployment monitoring platform of claim 1, wherein the log analysis module of the cluster environment comprises user behavior analysis and error log analysis.
8. The SDP big data automation deployment monitoring platform of claim 1, wherein a security control module of the cluster environment comprises a high availability of nodes and a high availability of services.
9. The SDP big data automation deployment monitoring platform of claim 2, wherein the creation of the environment preparation script comprises:
(1) closing the firewall on each node;
(2) configuring the hosts file on each node so that host names map to IP addresses;
(3) setting up passwordless SSH login between the two master nodes and each worker node;
(4) raising the maximum number of open files on each node;
(5) configuring a file server;
(6) configuring a local yum repository;
(7) configuring a time server and its clients;
(8) installing the JDK on each node and configuring its environment variables;
(9) configuring HugePages on each node;
(10) installing a MySQL database on a designated node;
(11) installing and configuring the sdp host service.
10. The SDP big data automation deployment monitoring platform of claim 3, wherein the components comprise HADOOP, HDFS, HBASE, ZOOKEEPER, MAPREDUCE, REDIS, ELASTICSEARCH, SPARK, STORM.
CN201711105672.9A 2017-11-10 2017-11-10 A kind of SDP big datas automatically dispose monitor supervision platform Pending CN107682209A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201711105672.9A CN107682209A (en) 2017-11-10 2017-11-10 A kind of SDP big datas automatically dispose monitor supervision platform

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201711105672.9A CN107682209A (en) 2017-11-10 2017-11-10 A kind of SDP big datas automatically dispose monitor supervision platform

Publications (1)

Publication Number Publication Date
CN107682209A true CN107682209A (en) 2018-02-09

Family

ID=61146539

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201711105672.9A Pending CN107682209A (en) 2017-11-10 2017-11-10 A kind of SDP big datas automatically dispose monitor supervision platform

Country Status (1)

Country Link
CN (1) CN107682209A (en)

Cited By (11)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108549717A (en) * 2018-04-23 2018-09-18 泰华智慧产业集团股份有限公司 The method and system of automatically dispose O&M Hadoop ecology coil assemblies
CN109101811A (en) * 2018-08-10 2018-12-28 成都安恒信息技术有限公司 A kind of O&M and auditing method of the controllable Oracle session based on the tunnel SSH
CN110266800A (en) * 2019-06-24 2019-09-20 合肥盈川信息技术有限公司 A kind of wisdom text Luda data supervising platform
CN110286921A (en) * 2019-06-27 2019-09-27 四川中电启明星信息技术有限公司 A kind of distributed big data platform CDH method of automation installation
CN110580203A (en) * 2019-08-19 2019-12-17 武汉长江通信智联技术有限公司 Data processing method, device and system based on elastic distributed data set
CN110764788A (en) * 2019-09-10 2020-02-07 武汉联影医疗科技有限公司 Cloud storage deployment method and device, computer equipment and readable storage medium
CN111490990A (en) * 2020-04-10 2020-08-04 吴萌萌 Network security analysis method based on big data platform and big data platform server
CN111901158A (en) * 2020-07-14 2020-11-06 广东科徕尼智能科技有限公司 Intelligent home distribution network fault data analysis method, equipment and storage medium
CN112241269A (en) * 2019-07-16 2021-01-19 深圳兆日科技股份有限公司 Zookeeper cluster control system, equipment and storage medium
CN113704069A (en) * 2021-07-20 2021-11-26 北京直真科技股份有限公司 Alarm system fault positioning method based on flash log collection technology
CN113885387A (en) * 2021-10-11 2022-01-04 青岛萨纳斯智能科技股份有限公司 SDP-based big data monitoring method and device, terminal equipment and platform

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20160103877A1 (en) * 2014-10-10 2016-04-14 International Business Machines Corporation Joining data across a parallel database and a distributed processing system
US20160253340A1 (en) * 2015-02-27 2016-09-01 Podium Data, Inc. Data management platform using metadata repository
CN106209821A (en) * 2016-07-07 2016-12-07 何钟柱 The big data management system of information security based on credible cloud computing
CN106326006A (en) * 2016-08-23 2017-01-11 成都卡莱博尔信息技术股份有限公司 Task management system aiming at task flow of data platform

Patent Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20160103877A1 (en) * 2014-10-10 2016-04-14 International Business Machines Corporation Joining data across a parallel database and a distributed processing system
US20160253340A1 (en) * 2015-02-27 2016-09-01 Podium Data, Inc. Data management platform using metadata repository
CN106209821A (en) * 2016-07-07 2016-12-07 何钟柱 The big data management system of information security based on credible cloud computing
CN106326006A (en) * 2016-08-23 2017-01-11 成都卡莱博尔信息技术股份有限公司 Task management system aiming at task flow of data platform

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
江樱 et al., "Design and Practice of a Big Data Management Visualization Platform" (大数据管理可视化平台设计与实践), 《大众用电》 *
那超, "Design and Implementation of an Automatic Deployment and Monitoring System for a Big Data Platform" (大数据平台的自动化部署与监控系统设计与实现), China Master's Theses Full-text Database, Information Science and Technology (《中国优秀硕士学位论文全文数据库信息科技辑》) *

Cited By (16)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108549717B (en) * 2018-04-23 2021-06-29 泰华智慧产业集团股份有限公司 Method and system for automatically deploying operation and maintenance Hadoop ecological circle component
CN108549717A (en) * 2018-04-23 2018-09-18 泰华智慧产业集团股份有限公司 The method and system of automatically dispose O&M Hadoop ecology coil assemblies
CN109101811A (en) * 2018-08-10 2018-12-28 成都安恒信息技术有限公司 A kind of O&M and auditing method of the controllable Oracle session based on the tunnel SSH
CN109101811B (en) * 2018-08-10 2021-10-15 成都安恒信息技术有限公司 Operation, maintenance and audit method of controllable Oracle session based on SSH tunnel
CN110266800A (en) * 2019-06-24 2019-09-20 合肥盈川信息技术有限公司 A kind of wisdom text Luda data supervising platform
CN110286921A (en) * 2019-06-27 2019-09-27 四川中电启明星信息技术有限公司 A kind of distributed big data platform CDH method of automation installation
CN110286921B (en) * 2019-06-27 2023-11-10 四川中电启明星信息技术有限公司 CDH method for automatically installing distributed big data platform
CN112241269B (en) * 2019-07-16 2024-05-10 深圳兆日科技股份有限公司 Zookeeper cluster control system, device and storage medium
CN112241269A (en) * 2019-07-16 2021-01-19 深圳兆日科技股份有限公司 Zookeeper cluster control system, equipment and storage medium
CN110580203A (en) * 2019-08-19 2019-12-17 武汉长江通信智联技术有限公司 Data processing method, device and system based on elastic distributed data set
CN110764788A (en) * 2019-09-10 2020-02-07 武汉联影医疗科技有限公司 Cloud storage deployment method and device, computer equipment and readable storage medium
CN111490990A (en) * 2020-04-10 2020-08-04 吴萌萌 Network security analysis method based on big data platform and big data platform server
CN111901158A (en) * 2020-07-14 2020-11-06 广东科徕尼智能科技有限公司 Intelligent home distribution network fault data analysis method, equipment and storage medium
CN111901158B (en) * 2020-07-14 2023-07-25 广东好太太智能家居有限公司 Intelligent household distribution network fault data analysis method, equipment and storage medium
CN113704069A (en) * 2021-07-20 2021-11-26 北京直真科技股份有限公司 Alarm system fault positioning method based on flash log collection technology
CN113885387A (en) * 2021-10-11 2022-01-04 青岛萨纳斯智能科技股份有限公司 SDP-based big data monitoring method and device, terminal equipment and platform

Similar Documents

Publication Publication Date Title
CN107682209A (en) A kind of SDP big datas automatically dispose monitor supervision platform
US20200242129A1 (en) System and method to improve data synchronization and integration of heterogeneous databases distributed across enterprise and cloud using bi-directional transactional bus of asynchronous change data system
US10089307B2 (en) Scalable distributed data store
US10509696B1 (en) Error detection and mitigation during data migrations
US8312037B1 (en) Dynamic tree determination for data processing
US9336288B2 (en) Workflow controller compatibility
WO2023142054A1 (en) Container microservice-oriented performance monitoring and alarm method and alarm system
US12014248B2 (en) Machine learning performance and workload management
CN111241078A (en) Data analysis system, data analysis method and device
US11727004B2 (en) Context dependent execution time prediction for redirecting queries
CN112162821B (en) Container cluster resource monitoring method, device and system
US11488082B2 (en) Monitoring and verification system for end-to-end distribution of messages
WO2022165168A1 (en) Configuring an instance of a software program using machine learning
CN112148578A (en) IT fault defect prediction method based on machine learning
KR20170053013A (en) Data Virtualization System for Bigdata Analysis
CN107203639A (en) Parallel file system based on High Performance Computing
Hu et al. ScalaRDF: a distributed, elastic and scalable in-memory RDF triple store
CN117056303B (en) Data storage method and device suitable for military operation big data
Xiao et al. RETRACTED ARTICLE: Cloud platform wireless sensor network detection system based on data sharing
US11500874B2 (en) Systems and methods for linking metric data to resources
US11757703B1 (en) Access requests processing and failover handling across multiple fault tolerance zones
Chullipparambil Big data analytics using Hadoop tools
Chen et al. Big data storage architecture design in cloud computing
Li Design of real-time data analysis system based on Impala
Zburivsky Hadoop cluster deployment

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication (application publication date: 20180209)