Big data cluster monitoring method and related equipment
Technical Field
The present disclosure relates to the field of computer and communication technologies, and in particular, to a big data cluster monitoring method and apparatus, a computer-readable storage medium, and an electronic device.
Background
Existing monitoring and alarm methods for big data clusters are difficult to extend through secondary development. Their alarm configuration is also complex, tedious and hard to use. As big data clusters are developed and applied more widely, a new technical method is needed to help operation and maintenance personnel keep a large number of clusters running healthily while avoiding heavy, repetitive work.
It is to be noted that the information disclosed in the above background section is only for enhancement of understanding of the background of the present disclosure, and thus may include information that does not constitute prior art known to those of ordinary skill in the art.
Disclosure of Invention
The embodiment of the disclosure provides a big data cluster monitoring method and device, a computer readable storage medium and an electronic device, which can improve the efficiency and accuracy of big data cluster processing.
Additional features and advantages of the disclosure will be set forth in the detailed description which follows, or in part will be obvious from the description, or may be learned by practice of the disclosure.
According to an aspect of the present disclosure, there is provided a monitoring method for a big data cluster, including:
collecting monitoring indexes of the big data cluster through a collector;
writing the monitoring index into a time sequence database;
comparing the monitoring index written into the time sequence database with an alarm rule;
when the monitoring index reaches the alarm rule, alarming; or
when the monitoring index does not reach the alarm rule, continuing monitoring.
In one embodiment, the collector is a client data collector, and the method further comprises:
installing the client data collector on the target device of the big data cluster.
In one embodiment, the method further comprises:
running the client data collector through the interpreted programming language parser that ships with the system of the target device;
wherein client data collectors can be added dynamically.
In one embodiment, the time series database is a distributed time series database, and writing the monitoring index into the time series database includes:
writing, by the client data collector through a socket, the monitoring index into the distributed time series database, whose underlying storage is extensible;
wherein the distributed time series database uses a distributed columnar database cluster for background storage.
In one embodiment, the alarm rule is that, out of a specific number of monitoring times, the number of times that the monitoring index is greater than or equal to 90% of the maximum value is greater than or equal to one half of that specific number, and alarming when the monitoring index reaches the alarm rule includes:
alarming when the number of times that the monitoring index is greater than or equal to 90% of the maximum value is greater than or equal to one half of the specific number of monitoring times;
wherein the specific number is an even number of 2 or more.
In one embodiment, writing the monitoring index into the time series database comprises:
writing the name, the value, the collection time, the cluster name and the address of the monitoring index into the time series database.
In one embodiment, collecting the monitoring indexes of the big data cluster through the collector includes:
collecting the monitoring indexes of the big data cluster at a specific frequency through the client data collector.
According to an aspect of the present disclosure, there is also provided a monitoring apparatus for a large data cluster, including:
an acquisition module configured to collect the monitoring indexes of the big data cluster through the collector;
a writing module configured to write the monitoring index into a time series database;
a comparison module configured to compare the monitoring index written into the time series database with an alarm rule; and
an alarm module configured to alarm when the monitoring index reaches the alarm rule.
According to an aspect of the present disclosure, there is also provided an electronic device, including:
one or more processors;
a storage device configured to store one or more programs that, when executed by the one or more processors, cause the one or more processors to implement the method of any of the above.
According to an aspect of the present disclosure, there is also provided a computer readable storage medium storing a computer program which, when executed by a processor, implements the method of any of the above.
According to the embodiments of the present application, monitoring indexes of a big data cluster are collected through a client data collector; the monitoring indexes are written into a time series database; the monitoring indexes written into the time series database are compared with an alarm rule; an alarm is raised when a monitoring index reaches the alarm rule; and monitoring continues when it does not. The client data collector of the embodiments is implemented in an interpreted programming language and can run on the interpreted programming language parser that ships with the system, so it is minimally intrusive to the target machine; collectors can be added dynamically without restarting the client data collector. The client data collector writes the collected monitoring indexes into the distributed time series database through a socket, and the written data include the name and value of each monitoring index, the time, the cluster name, the IP address and the like. The distributed time series database uses a distributed columnar database cluster for background storage, so the underlying storage is extensible.
It is to be understood that both the foregoing general description and the following detailed description are exemplary and explanatory only and are not restrictive of the disclosure.
Drawings
The accompanying drawings, which are incorporated in and constitute a part of this specification, illustrate embodiments consistent with the present disclosure and together with the description, serve to explain the principles of the disclosure. It is to be understood that the drawings in the following description are merely exemplary of the disclosure, and that other drawings may be derived from those drawings by one of ordinary skill in the art without the exercise of inventive faculty. In the drawings:
FIG. 1 shows a schematic diagram of an exemplary system architecture of a monitoring method of a big data cluster or a monitoring apparatus of a big data cluster to which an embodiment of the present disclosure may be applied;
FIG. 2 illustrates a schematic structural diagram of a computer system suitable for use with the electronic device implementing embodiments of the present disclosure;
FIG. 3 schematically illustrates a flow diagram of a large data cluster monitoring method according to an embodiment of the present disclosure;
FIG. 4 schematically illustrates a block diagram of a big data cluster monitoring apparatus according to an embodiment of the present disclosure;
FIG. 5 schematically illustrates a block diagram of a big data cluster monitoring apparatus according to another embodiment of the present invention;
FIG. 6 schematically shows a block diagram of a big data cluster monitoring apparatus according to another embodiment of the present invention.
Detailed Description
Example embodiments will now be described more fully with reference to the accompanying drawings. Example embodiments may, however, be embodied in many different forms and should not be construed as limited to the examples set forth herein; rather, these embodiments are provided so that this disclosure will be thorough and complete, and will fully convey the concept of example embodiments to those skilled in the art.
Furthermore, the described features, structures, or characteristics may be combined in any suitable manner in one or more embodiments. In the following description, numerous specific details are provided to give a thorough understanding of embodiments of the disclosure. One skilled in the relevant art will recognize, however, that the subject matter of the present disclosure can be practiced without one or more of the specific details, or with other methods, components, devices, steps, and so forth. In other instances, well-known methods, devices, implementations, or operations have not been shown or described in detail to avoid obscuring aspects of the disclosure.
The block diagrams shown in the figures are functional entities only and do not necessarily correspond to physically separate entities. I.e. these functional entities may be implemented in the form of software, or in one or more hardware modules or integrated circuits, or in different networks and/or processor means and/or microcontroller means.
The flow charts shown in the drawings are merely illustrative and do not necessarily include all of the contents and operations/steps, nor do they necessarily have to be performed in the order described. For example, some operations/steps may be decomposed, and some operations/steps may be combined or partially combined, so that the actual execution sequence may be changed according to the actual situation.
Fig. 1 shows a schematic diagram of an exemplary system architecture 100 of a monitoring method of a big data cluster or a monitoring apparatus of a big data cluster to which the embodiments of the present disclosure may be applied.
As shown in fig. 1, the system architecture 100 may include one or more of terminal devices 101, 102, 103, a network 104, and a server 105. The network 104 is a medium to provide communication links between the terminal devices 101, 102, 103 and the server 105. Network 104 may include various connection types, such as wired, wireless communication links, or fiber optic cables, to name a few.
It should be understood that the number of terminal devices, networks, and servers in fig. 1 is merely illustrative. There may be any number of terminal devices, networks, and servers, as desired for implementation. For example, server 105 may be a server cluster comprised of multiple servers, or the like.
The user may use the terminal devices 101, 102, 103 to interact with the server 105 via the network 104 to receive or send messages or the like. The terminal devices 101, 102, 103 may be various electronic devices having display screens including, but not limited to, smart phones, tablets, portable and desktop computers, digital cinema projectors, and the like.
The server 105 may be a server that provides various services. For example, a user sends a monitoring request of a large data cluster to the server 105 by using the terminal device 103 (or the terminal device 101 or 102). Or the terminal device 103 automatically acquires the monitoring index of the big data cluster and sends the monitoring index to the server 105. The server 105 may write the monitoring index into a time sequence database based on a monitoring index of a big data cluster, compare the monitoring index written into the time sequence database with an alarm rule, and alarm when the monitoring index reaches the alarm rule; or when the monitoring index does not reach the alarm rule, continuing monitoring.
Also, for example, the terminal device 103 (also may be the terminal device 101 or 102) may be a smart tv, a VR (virtual Reality)/AR (Augmented Reality) helmet display, or a mobile terminal such as a smart phone, a tablet computer, etc. on which navigation, network appointment, instant messaging, video Application (APP) and the like are installed, and the user may send a monitoring request of a large data cluster to the server 105 through the smart tv, the VR/AR helmet display, or the navigation, network appointment, instant messaging, video APP. The server 105 may collect the monitoring index of the big data cluster through the collector based on the monitoring request of the big data cluster; writing the monitoring index into a time sequence database; comparing the monitoring index written into the time sequence database with an alarm rule; when the monitoring index reaches the alarm rule, alarming; or when the monitoring index does not reach the alarm rule, continuing monitoring.
FIG. 2 illustrates a schematic structural diagram of a computer system suitable for use in implementing the electronic device of an embodiment of the present disclosure.
It should be noted that the computer system 200 of the electronic device shown in fig. 2 is only an example, and should not bring any limitation to the functions and the scope of the application of the embodiments of the present disclosure.
As shown in fig. 2, the computer system 200 includes a Central Processing Unit (CPU)201 that can perform various appropriate actions and processes in accordance with a program stored in a Read-Only Memory (ROM) 202 or a program loaded from a storage section 208 into a Random Access Memory (RAM) 203. In the RAM 203, various programs and data necessary for system operation are also stored. The CPU 201, ROM 202, and RAM 203 are connected to each other via a bus 204. An input/output (I/O) interface 205 is also connected to bus 204.
The following components are connected to the I/O interface 205: an input portion 206 including a keyboard, a mouse, and the like; an output section 207 including a Cathode Ray Tube (CRT), a Liquid Crystal Display (LCD), and the like, a speaker, and the like; a storage section 208 including a hard disk and the like; and a communication section 209 including a Network interface card such as a LAN (Local Area Network) card, a modem, or the like. The communication section 209 performs communication processing via a network such as the internet. A drive 210 is also connected to the I/O interface 205 as needed. A removable medium 211, such as a magnetic disk, an optical disk, a magneto-optical disk, a semiconductor memory, or the like, is mounted on the drive 210 as necessary, so that a computer program read out therefrom is installed into the storage section 208 as necessary.
In particular, the processes described below with reference to the flowcharts may be implemented as computer software programs, according to embodiments of the present disclosure. For example, embodiments of the present disclosure include a computer program product comprising a computer program embodied on a computer readable storage medium, the computer program containing program code for performing the method illustrated by the flow chart. In such embodiments, the computer program may be downloaded and installed from a network via the communication section 209 and/or installed from the removable medium 211. The computer program, when executed by a Central Processing Unit (CPU)201, performs various functions defined in the methods and/or apparatus of the present application.
It should be noted that the computer readable medium shown in the present disclosure may be a computer readable signal medium or a computer readable storage medium or any combination of the two. A computer readable storage medium may be, for example, but not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any combination of the foregoing. More specific examples of the computer readable storage medium may include, but are not limited to: an electrical connection having one or more wires, a portable computer diskette, a hard disk, a Random Access Memory (RAM), a Read-Only Memory (ROM), an Erasable Programmable Read-Only Memory (EPROM or flash memory), an optical fiber, a portable compact disc Read-Only Memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing. In the present disclosure, a computer readable storage medium may be any tangible medium that can contain or store a program for use by or in connection with an instruction execution system, apparatus, or device. In contrast, in the present disclosure, a computer-readable signal medium may include a propagated data signal with computer-readable program code embodied therein, for example, in baseband or as part of a carrier wave. Such a propagated data signal may take many forms, including, but not limited to, electro-magnetic, optical, or any suitable combination thereof. A computer readable signal medium may also be any computer readable medium that is not a computer readable storage medium and that can communicate, propagate, or transport a program for use by or in connection with an instruction execution system, apparatus, or device.
Program code embodied on a computer readable storage medium may be transmitted using any appropriate medium, including but not limited to: wireless, wire, fiber optic cable, RF (radio frequency), etc., or any suitable combination of the foregoing.
The flowchart and block diagrams in the figures illustrate the architecture, functionality, and operation of possible implementations of methods, apparatus, and computer program products according to various embodiments of the present disclosure. In this regard, each block in the flowchart or block diagrams may represent a module, segment, or portion of code, which comprises one or more executable instructions for implementing the specified logical function(s). It should also be noted that, in some alternative implementations, the functions noted in the block may occur out of the order noted in the figures. For example, two blocks shown in succession may, in fact, be executed substantially concurrently, or the blocks may sometimes be executed in the reverse order, depending upon the functionality involved. It will also be noted that each block of the block diagrams or flowchart illustration, and combinations of blocks in the block diagrams or flowchart illustration, can be implemented by special purpose hardware-based systems which perform the specified functions or acts, or combinations of special purpose hardware and computer instructions.
The modules and/or units and/or sub-units described in the embodiments of the present disclosure may be implemented by software, or may be implemented by hardware, and the described modules and/or units and/or sub-units may also be disposed in a processor. Wherein the names of such modules and/or units and/or sub-units in some cases do not constitute a limitation on the modules and/or units and/or sub-units themselves.
As another aspect, the present application also provides a computer-readable storage medium, which may be contained in the electronic device described in the above embodiment; or may exist separately without being assembled into the electronic device. The computer-readable storage medium carries one or more programs which, when executed by an electronic device, cause the electronic device to implement the method described in the embodiments below. For example, the electronic device may implement the steps shown in fig. 3.
In the related art, for example, a machine learning method, a deep learning method, or the like may be used to perform large data cluster monitoring, and the application range of different methods is different.
FIG. 3 schematically shows a flow diagram of a big data cluster monitoring method according to an embodiment of the present disclosure. The method steps of the embodiment of the present disclosure may be executed by the terminal device, by the server, or jointly by the terminal device and the server; for example, they may be executed by the server 105 in FIG. 1, but the present disclosure is not limited thereto.
In step S310, a monitoring index of the big data cluster is collected by the collector.
In this step, the server or the terminal collects the monitoring indexes of the big data cluster through the collector. In one embodiment, big data refers to huge and complex data sets, including data of various types such as text, pictures, video and audio, especially from entirely new data sources, whose size overwhelms traditional data processing software. In one embodiment, the big data cluster is a cluster of application software deployed on a large number of computer servers for storing, computing and processing big data, where the software may include a distributed system infrastructure (Hadoop), a distributed application coordination service (ZooKeeper), a distributed columnar database (HBase), a data warehouse system (Hive), a distributed publish-subscribe messaging system (Kafka), a large-scale data computing engine (Spark), a distributed full-text search engine (Elasticsearch), and a large-scale data stream processing framework (Flink). The monitoring indexes may include, for example, memory utilization rate, network access flow, thread number, transactions per second (TPS) and disk read-write throughput.
In one embodiment, the collector is a client data collector (Tcollector).
In one embodiment, step S310 is preceded by the step of installing the client data collector (Tcollector) on a target device of the big data cluster. In one embodiment, the client data collector is run through the interpreted programming language (Python) parser that ships with the system of the target device; client data collectors can be added dynamically.
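The collection step can be illustrated with a minimal sketch of a Python collector script. It assumes a collector that emits one sample per line in the form `metric timestamp value tag=value`; the metric name `mem.utilization`, the `cluster` tag and the helper names are hypothetical and chosen only for illustration, and the memory figure is parsed from Linux `/proc/meminfo`-style text supplied by the caller.

```python
import sys
import time


def format_sample(metric, value, tags, timestamp=None):
    """Format one sample as 'metric timestamp value tag=value ...'."""
    ts = int(time.time()) if timestamp is None else int(timestamp)
    tag_text = " ".join(f"{k}={v}" for k, v in sorted(tags.items()))
    return f"{metric} {ts} {value} {tag_text}".rstrip()


def read_memory_utilization(meminfo_text):
    """Return the used-memory percentage from /proc/meminfo-style text."""
    fields = {}
    for line in meminfo_text.splitlines():
        name, _, rest = line.partition(":")
        parts = rest.split()
        if parts:
            fields[name.strip()] = int(parts[0])  # values are in kB
    total = fields["MemTotal"]
    available = fields["MemAvailable"]
    return round(100.0 * (total - available) / total, 2)


def emit_once(meminfo_text, cluster, out=sys.stdout):
    """Write one memory-utilization sample to the given output stream."""
    value = read_memory_utilization(meminfo_text)
    line = format_sample("mem.utilization", value, {"cluster": cluster})
    out.write(line + "\n")
    out.flush()
```

On a Linux target device, a caller might read `/proc/meminfo` and pass its text to `emit_once` at each collection interval. Because the script needs only the Python interpreter already present on the machine, adding another collector amounts to adding another script of this kind.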
In step S320, the monitoring index is written into a time series database.
In this step, the server or the terminal writes the monitoring index collected in step S310 into the time series database. In one embodiment, the time series database is a distributed time series database (OpenTSDB). In one embodiment, the client data collector writes the monitoring index through a socket into the distributed time series database, whose underlying storage is extensible; the distributed time series database uses a distributed columnar database (HBase) cluster for background storage. In one embodiment, the name, the value, the collection time, the cluster name and the address of the monitoring index are written into the time series database.
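As an illustration of the socket write, the following sketch formats a sample as an OpenTSDB telnet-style `put` line and sends it over a TCP connection. The tag keys `cluster` and `host`, the function names, and the host and port passed by the caller are assumptions made for this example; the source only states that the name, value, collection time, cluster name and address are written.

```python
import socket


def build_put_line(name, timestamp, value, cluster, address):
    """Build one telnet-style line: put <metric> <timestamp> <value> <tags>."""
    return (
        f"put {name} {int(timestamp)} {value} "
        f"cluster={cluster} host={address}\n"
    )


def send_lines(lines, host, port, timeout=5.0):
    """Send already formatted put lines over a single TCP connection."""
    payload = "".join(lines).encode("ascii")
    with socket.create_connection((host, port), timeout=timeout) as conn:
        conn.sendall(payload)
```

A caller would typically batch the lines produced in one collection cycle and pass them to `send_lines` together, so that each cycle opens only one connection.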
In step S330, the monitoring index written into the time-series database is compared with an alarm rule.
In this step, the server or the terminal compares the monitoring index written into the time series database with an alarm rule. In one embodiment, the alarm rule is that, out of a specific number of monitoring times, the number of times that the monitoring index is greater than or equal to 90% of the maximum value is greater than or equal to one half of that specific number.
In step S340, when the monitoring index reaches the alarm rule, an alarm is performed; or when the monitoring index does not reach the alarm rule, continuing monitoring.
In this step, the server or the terminal raises an alarm when the monitoring index reaches the alarm rule, or continues monitoring when the monitoring index does not reach the alarm rule. In one embodiment, an alarm is raised when, out of a specific number of monitoring times, the number of times that the monitoring index is greater than or equal to 90% of the maximum value is greater than or equal to one half of that specific number; the specific number is an even number of 2 or more. For example, when the specific number is 10, an alarm is triggered if the monitored value is greater than or equal to 90% of the maximum value in 5 or more of the most recent 10 monitoring times.
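The rule in this embodiment can be written as a short function. This is a minimal sketch: the function name, the parameter names and the choice to return no alarm until a full window of samples is available are assumptions, not details stated in the source.

```python
def should_alarm(recent_values, max_value, window=10, ratio=0.9):
    """Return True when at least half of the last `window` samples
    are greater than or equal to `ratio` times `max_value`."""
    if window < 2 or window % 2 != 0:
        raise ValueError("window must be an even number of 2 or more")
    samples = list(recent_values)[-window:]
    if len(samples) < window:
        return False  # assumption: wait until a full window is available
    threshold = ratio * max_value
    high_count = sum(1 for value in samples if value >= threshold)
    return high_count >= window // 2
```

With a window of 10 and a maximum of 100, five samples at 95 and five at 10 reach the rule, while four samples at 95 and six at 10 do not.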
In one embodiment, the monitoring indexes of the big data cluster are collected by the client data collector at a specific frequency, for example once per minute.
According to the embodiments of the present application, monitoring indexes of a big data cluster are collected through a client data collector; the monitoring indexes are written into a time series database; the monitoring indexes written into the time series database are compared with an alarm rule; an alarm is raised when a monitoring index reaches the alarm rule; and monitoring continues when it does not. The client data collector of the embodiments is implemented in an interpreted programming language and can run on the interpreted programming language parser that ships with the system, so it is minimally intrusive to the target machine; collectors can be added dynamically without restarting the client data collector. The client data collector writes the collected monitoring indexes into the distributed time series database through a socket, and the written data include the name and value of each monitoring index, the time, the cluster name, the IP address and the like. The distributed time series database uses a distributed columnar database cluster for background storage, so the underlying storage is extensible.
In one embodiment, monitoring is implemented with a web backend application framework (Django) and a web frontend application framework (React), which are used to display monitoring data and configure alarm condition rules.
In one embodiment, OpenTSDB is read in a loop once per minute and the data are judged against the alarm condition rules. This is implemented with a non-blocking web server framework (Tornado): using Tornado's asynchronous features, the monitoring indexes of each cluster can be read in multiple batches at one time and compared with the alarm condition rules to make a judgment.
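The idea of reading several clusters concurrently and judging each against the rule can be sketched with Python's standard `asyncio` module, used here in place of Tornado so that the example has no third-party dependency. The function `fetch_recent_values` is a hypothetical stand-in for an asynchronous query against the time series database, and its in-memory data are invented for the example.

```python
import asyncio


async def fetch_recent_values(cluster):
    """Hypothetical stand-in for an asynchronous time series query."""
    await asyncio.sleep(0)  # yields control, as a real network call would
    fake_store = {
        "cluster-a": [95] * 6 + [10] * 4,
        "cluster-b": [10] * 10,
    }
    return fake_store[cluster]


def reaches_rule(values, max_value=100, ratio=0.9):
    """True when at least half of the values are >= ratio * max_value."""
    high_count = sum(1 for value in values if value >= ratio * max_value)
    return high_count >= len(values) / 2


async def evaluate_clusters(clusters):
    """Query all clusters concurrently, then judge each result."""
    results = await asyncio.gather(
        *(fetch_recent_values(cluster) for cluster in clusters)
    )
    return {c: reaches_rule(v) for c, v in zip(clusters, results)}
```

A once-per-minute loop would call `evaluate_clusters` and then sleep until the next cycle; because the queries are awaited together, a slow response from one cluster does not delay the requests sent to the others.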
In one embodiment, a determination is made as to whether an exception host is included, and if so, no subsequent alarm determination is made.
In one embodiment, after the alarm problem is solved, a recovery notification is sent to enable operation and maintenance personnel to know the health condition of the cluster at any time.
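One way to produce a recovery notification is to remember which alarms are currently active and to notify only on state changes. The sketch below assumes that approach; the class name, the key format and the returned strings are hypothetical.

```python
class AlarmTracker:
    """Track active alarms and report transitions between states."""

    def __init__(self):
        self._active = set()

    def update(self, key, rule_reached):
        """Return 'alarm', 'recovery', or None for the latest judgment."""
        if rule_reached and key not in self._active:
            self._active.add(key)
            return "alarm"
        if not rule_reached and key in self._active:
            self._active.discard(key)
            return "recovery"
        return None  # state unchanged, so nothing is sent
```

Calling `update` once per judgment cycle for each cluster and metric yields a single alarm when a problem starts and a single recovery notification when it is resolved, rather than a message on every cycle.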
FIG. 4 schematically shows a block diagram of a big data cluster monitoring apparatus according to an embodiment of the present disclosure. The big data cluster monitoring apparatus 400 provided in the embodiment of the present disclosure may be disposed on a terminal device, may also be disposed on a server side, or may be partially disposed on a terminal device and partially disposed on a server side, for example, may be disposed on the server 105 in fig. 1, but the present disclosure is not limited thereto.
The big data cluster monitoring apparatus 400 provided by the embodiment of the present disclosure may include an acquisition module 410, a writing module 420, a comparison module 430, and an alarm module 440.
The acquisition module 410 is configured to collect the monitoring index of the big data cluster through the collector; the writing module 420 is configured to write the monitoring index into a time series database; the comparison module 430 is configured to compare the monitoring index written into the time series database with an alarm rule; and the alarm module 440 is configured to alarm when the monitoring index reaches the alarm rule.
According to the embodiment of the present disclosure, the big data cluster monitoring apparatus 400 may be used to implement the big data cluster monitoring method described in the embodiment of fig. 3.
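The division into four modules can be pictured as four interchangeable callables wired into one pass of the method. This is only a structural sketch under that assumption; the class name, the `Sample` type and the callable signatures are hypothetical and do not describe how the apparatus is actually implemented.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple

Sample = Tuple[str, float]  # (metric name, value)


@dataclass
class MonitoringApparatus:
    collect: Callable[[], List[Sample]]              # acquisition module
    write: Callable[[List[Sample]], None]            # writing module
    compare: Callable[[List[Sample]], List[Sample]]  # comparison module
    alarm: Callable[[Sample], None]                  # alarm module

    def run_once(self) -> int:
        """Run one collect, write, compare, alarm pass."""
        samples = self.collect()
        self.write(samples)
        reached = self.compare(samples)
        for sample in reached:
            self.alarm(sample)
        return len(reached)  # number of samples that reached the rule
```

Keeping each module behind a plain callable makes it straightforward to replace an in-memory store with a real time series database, or a print statement with a real notification channel, without changing the other modules.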
FIG. 5 schematically shows a block diagram of a big data cluster monitoring apparatus 500 according to another embodiment of the present invention.
As shown in fig. 5, the big data cluster monitoring apparatus 500 further includes a display module 510 in addition to the collection module 410, the writing module 420, the comparison module 430 and the alarm module 440 described in the embodiment of fig. 4.
Specifically, the display module 510 displays the monitoring index of the alarm on the terminal after the alarm module 440 alarms.
In the big data cluster monitoring apparatus 500, the display module 510 may complete the visual display of the monitoring index of the alarm.
FIG. 6 schematically shows a block diagram of a big data cluster monitoring apparatus 600 according to another embodiment of the present invention.
As shown in fig. 6, in addition to the collection module 410, the writing module 420, the comparison module 430, and the alarm module 440 described in the embodiment of fig. 4, the big data cluster monitoring apparatus 600 further includes a storage module 610.
Specifically, the storage module 610 is configured to store data of the monitoring index of the big data cluster, so as to facilitate a call and a reference of a worker or a server.
It is understood that the acquisition module 410, the writing module 420, the comparison module 430, the alarm module 440, the display module 510, and the storage module 610 may be combined into one module for implementation, or any one of the modules may be split into a plurality of modules. Alternatively, at least part of the functionality of one or more of these modules may be combined with at least part of the functionality of the other modules and implemented in one module. According to an embodiment of the present invention, at least one of the acquisition module 410, the writing module 420, the comparing module 430, the alarm module 440, the display module 510, and the storage module 610 may be implemented at least in part as a hardware circuit, such as a Field Programmable Gate Array (FPGA), a Programmable Logic Array (PLA), a system on a chip, a system on a substrate, a system on a package, an Application Specific Integrated Circuit (ASIC), or any other reasonable manner of integrating or packaging a circuit, as hardware or firmware, or as a suitable combination of software, hardware, and firmware implementations. Alternatively, at least one of the acquisition module 410, the writing module 420, the comparison module 430, the alarm module 440, the display module 510, and the storage module 610 may be at least partially implemented as a computer program module that, when executed by a computer, may perform the functions of the respective modules.
For details that are not disclosed in the apparatus embodiments of the present invention, please refer to the embodiments of the big data cluster monitoring method of the present invention described above, because each module of the big data cluster monitoring apparatus of the example embodiments may be used to implement the steps of the example embodiment of the big data cluster monitoring method described above with reference to FIG. 3.
The specific implementation of each module, unit and subunit in the big data cluster monitoring apparatus provided in the embodiments of the present disclosure may refer to the content in the big data cluster monitoring method, and will not be described herein again.
It should be noted that although several modules, units and sub-units of the apparatus for action execution are mentioned in the above detailed description, such division is not mandatory. Indeed, the features and functionality of two or more modules, units and sub-units described above may be embodied in one module, unit or sub-unit, in accordance with embodiments of the present disclosure. Conversely, the features and functions of one module, unit or sub-unit described above may be further divided so as to be embodied by a plurality of modules, units and sub-units.
Through the above description of the embodiments, those skilled in the art will readily understand that the exemplary embodiments described herein may be implemented by software, or by software in combination with necessary hardware. Therefore, the technical solution according to the embodiments of the present disclosure may be embodied in the form of a software product, which may be stored in a non-volatile storage medium (which may be a CD-ROM, a USB disk, a removable hard disk, etc.) or on a network, and which includes several instructions to enable a computing device (which may be a personal computer, a server, a touch terminal, a network device, etc.) to execute the method according to the embodiments of the present disclosure.
Other embodiments of the disclosure will be apparent to those skilled in the art from consideration of the specification and practice of the disclosure disclosed herein. This application is intended to cover any variations, uses, or adaptations of the disclosure following, in general, the principles of the disclosure and including such departures from the present disclosure as come within known or customary practice within the art to which the disclosure pertains. It is intended that the specification and examples be considered as exemplary only, with a true scope and spirit of the disclosure being indicated by the following claims.
It will be understood that the present disclosure is not limited to the precise arrangements described above and shown in the drawings and that various modifications and changes may be made without departing from the scope thereof. The scope of the present disclosure is limited only by the appended claims.