CN111274088B - Real-time monitoring method, device, medium and electronic equipment for big data platform - Google Patents

Info

Publication number
CN111274088B
Authority
CN
China
Prior art keywords
data
platform
rule
fusion
index
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202010044188.5A
Other languages
Chinese (zh)
Other versions
CN111274088A (en
Inventor
毛振赫
贺波
万书武
李均
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Ping An Technology Shenzhen Co Ltd
Original Assignee
Ping An Technology Shenzhen Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Ping An Technology Shenzhen Co Ltd filed Critical Ping An Technology Shenzhen Co Ltd
Priority to CN202010044188.5A priority Critical patent/CN111274088B/en
Priority to PCT/CN2020/093583 priority patent/WO2021143024A1/en
Publication of CN111274088A publication Critical patent/CN111274088A/en
Application granted granted Critical
Publication of CN111274088B publication Critical patent/CN111274088B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F11/00Error detection; Error correction; Monitoring
    • G06F11/30Monitoring
    • G06F11/3065Monitoring arrangements determined by the means or processing involved in reporting the monitored data
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/25Integrating or interfacing systems involving database management systems
    • G06F16/254Extract, transform and load [ETL] procedures, e.g. ETL data flows in data warehouses
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/25Integrating or interfacing systems involving database management systems
    • G06F16/256Integrating or interfacing systems involving database management systems in federated or virtual databases
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/25Integrating or interfacing systems involving database management systems
    • G06F16/258Data format conversion from or to a database

Landscapes

  • Engineering & Computer Science (AREA)
  • Databases & Information Systems (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • Quality & Reliability (AREA)
  • Debugging And Monitoring (AREA)

Abstract

The application relates to a real-time monitoring method, a real-time monitoring device, a real-time monitoring medium and electronic equipment for a big data platform, which belong to the technical field of platform monitoring, and the method comprises the following steps: collecting performance related data and queue resource information of a big data platform; converting the performance related data into first rule data, and converting the queue resource information into second rule data; performing data fusion on the first rule data and the second rule data; extracting the fusion data of a plurality of classes with index incidence relation in the multi-class fusion data, and calculating the platform scheduling condition scores of the platform monitoring indexes corresponding to the index incidence relation; and when the platform scheduling condition score is lower than a preset threshold value, analyzing the scheduling bottleneck of the big data platform by using the performance related data and the queue resource information related to the platform monitoring index. The application effectively improves the monitoring reliability of the big data platform.

Description

Real-time monitoring method, device, medium and electronic equipment for big data platform
Technical Field
The application relates to the technical field of platform monitoring, in particular to a real-time monitoring method, a real-time monitoring device, a real-time monitoring medium and electronic equipment for a big data platform.
Background
With the popularization of big data, companies' own big data platforms often hold large volumes of stored data with large daily increments, so the importance of their availability is self-evident. In some important applications, the availability of the big data platform is even required to be 100%, meaning that the platform must be accessible at any time. Monitoring of these big data platforms is therefore particularly important.
At present, a big data platform is generally monitored by comparing each item of monitoring data with a threshold value to detect anomalies, and an alarm is raised whenever abnormal data is detected. As a result, the monitoring data is treated too discretely, abnormal alarms are frequent, and the monitoring reliability of the big data platform is low.
It is to be noted that the information disclosed in the above background section is only for enhancement of understanding of the background of the present application and therefore may include information that does not constitute prior art known to a person of ordinary skill in the art.
Disclosure of Invention
The purpose of this application is to provide a real-time monitoring scheme for a big data platform, and thereby to improve, at least to a certain extent, the reliability of real-time monitoring of the big data platform.
According to one aspect of the application, a big data platform real-time monitoring method is provided, and comprises the following steps:
collecting performance related data and queue resource information of a big data platform;
converting the performance related data into first rule data, and converting the queue resource information into second rule data, wherein the first rule data and the second rule data are data in a preset format;
performing data fusion on the first rule data and the second rule data to obtain multi-class fusion data;
extracting the fusion data of a plurality of classes with index incidence relation in the multi-class fusion data, and calculating the platform scheduling condition scores of the platform monitoring indexes corresponding to the index incidence relation;
and when the platform scheduling condition score is lower than a preset threshold value, analyzing the scheduling bottleneck of the big data platform by using the performance related data and the queue resource information related to the platform monitoring index.
In an exemplary embodiment of the present application, converting the performance-related data into first rule data and converting the queue resource information into second rule data includes:
after field splicing is carried out on fields in each piece of data in the performance related data to obtain spliced data, the spliced data is normalized into the first rule data;
and after comparing the queue resource information with a preset threshold value to obtain a comparison result, normalizing the comparison result into the second rule data.
In an exemplary embodiment of the present application, performing data fusion on the first rule data and the second rule data to obtain multiple types of fused data, includes:
acquiring a preset data fusion strategy of the big data platform, wherein the preset fusion strategy comprises an incidence relation among a plurality of data;
and fusing data in the first rule data and the second rule data based on the association relationship to obtain multi-class fused data.
In an exemplary embodiment of the present application, fusing data in the first rule data and the second rule data based on the association relationship includes:
acquiring all data corresponding to the same association relation from the first rule data and the second rule data;
and fusing all the data corresponding to the same incidence relation into fused data based on a preset fusion algorithm corresponding to the same incidence relation.
In an exemplary embodiment of the present application, extracting fusion data of multiple classes having an index association relationship in the multiple classes of fusion data, and calculating a platform scheduling condition score of a platform monitoring index corresponding to the index association relationship includes:
extracting multi-class fusion data corresponding to the same index, and acquiring a preset weight of each class of fusion data in the multi-class fusion data;
and obtaining a weighted sum of each type of fusion data corresponding to the same index and the corresponding preset weight to obtain a platform scheduling condition sub-score corresponding to the same index, wherein the platform scheduling condition sub-score is used as a platform scheduling condition score of a platform monitoring index corresponding to the same index.
In an exemplary embodiment of the present application, analyzing a scheduling bottleneck of the big data platform by using the performance-related data and queue resource information associated with the platform monitoring index includes:
acquiring a pre-trained bottleneck analysis model corresponding to the platform monitoring index;
and inputting the performance related data and queue resource information associated with the platform monitoring index into the bottleneck analysis model to obtain scheduling bottleneck information of the big data platform.
In an exemplary embodiment of the present application, analyzing a scheduling bottleneck of the big data platform by using the performance-related data and queue resource information associated with the platform monitoring index includes:
when the scheduling bottleneck is that the scheduled load has reached a bottleneck limit value, acquiring the first rule data and the second rule data corresponding to the performance related data and queue resource information associated with the platform monitoring index;
and drawing and displaying by utilizing the corresponding first rule data and the second rule data.
According to an aspect of the present application, a big data platform real-time monitoring apparatus is provided, which includes:
the acquisition module is used for acquiring performance related data and queue resource information of the big data platform;
the conversion module is used for converting the performance related data into first rule data and converting the queue resource information into second rule data, wherein the first rule data and the second rule data are data in a preset format;
the fusion module is used for carrying out data fusion on the first rule data and the second rule data to obtain multi-class fusion data;
the extraction module is used for extracting the fusion data of a plurality of classes with index incidence relation in the multi-class fusion data and calculating the platform scheduling condition scores of the platform monitoring indexes corresponding to the index incidence relation;
and the analysis module is used for analyzing the scheduling bottleneck of the big data platform by using the performance related data and the queue resource information related to the platform monitoring index when the platform scheduling condition score is lower than a preset threshold value.
According to an aspect of the application, there is provided a computer readable storage medium having stored thereon program instructions, characterized in that the program instructions, when executed by a processor, implement the method of any of the above.
According to an aspect of the present application, there is provided an electronic device, comprising:
a processor; and
a memory for storing program instructions for the processor; wherein the processor is configured to perform any of the methods described above via execution of the program instructions.
The application relates to a real-time monitoring method and a related device for a big data platform, which are used for collecting performance related data and queue resource information of the big data platform; converting the performance related data into first rule data, and converting the queue resource information into second rule data, wherein the first rule data and the second rule data are data in a preset format; performing data fusion on the first rule data and the second rule data to obtain multi-class fusion data; extracting the fusion data of a plurality of classes with index association relation in the multi-class fusion data, and calculating the platform scheduling condition score of the platform monitoring index corresponding to the index association relation; and when the platform scheduling condition score is lower than a preset threshold value, analyzing the scheduling bottleneck of the big data platform by using performance related data and queue resource information associated with the platform monitoring index. In this way, after data is converted by regularization, data fusion is carried out, and then the platform scheduling condition scores of the monitoring indexes of each platform are calculated based on the fusion data, so that the scheduling conditions of the monitoring indexes of each platform can be accurately monitored and analyzed; and then when the platform scheduling condition score is lower than a preset threshold value, analyzing the relevant information of the platform monitoring index corresponding to the platform scheduling condition score lower than the preset threshold value, determining the current scheduling bottleneck, and realizing the reliable monitoring of the big data platform.
It is to be understood that both the foregoing general description and the following detailed description are exemplary and explanatory only and are not restrictive of the application.
Drawings
The accompanying drawings, which are incorporated in and constitute a part of this specification, illustrate embodiments consistent with the present application and together with the description, serve to explain the principles of the application. It is obvious that the drawings in the following description are only some embodiments of the application, and that for a person skilled in the art, other drawings can be derived from them without inventive effort.
Fig. 1 schematically shows a flow chart of a real-time monitoring method for a big data platform.
Fig. 2 schematically shows an application scenario example of a real-time monitoring method for a big data platform.
Fig. 3 schematically illustrates a flow chart of a method of data fusion.
Fig. 4 schematically shows a block diagram of a big data platform real-time monitoring device.
Fig. 5 schematically shows an example block diagram of an electronic device for implementing the above-described real-time monitoring method for a big data platform.
Fig. 6 schematically illustrates a computer-readable storage medium for implementing the above-described big data platform real-time monitoring method.
Detailed Description
Example embodiments will now be described more fully with reference to the accompanying drawings. Example embodiments may, however, be embodied in many different forms and should not be construed as limited to the examples set forth herein; rather, these embodiments are provided so that this disclosure will be thorough and complete, and will fully convey the concept of example embodiments to those skilled in the art. The described features, structures, or characteristics may be combined in any suitable manner in one or more embodiments. In the following description, numerous specific details are provided to give a thorough understanding of embodiments of the application. One skilled in the relevant art will recognize, however, that the subject matter of the present application can be practiced without one or more of the specific details, or with other methods, components, devices, steps, and so forth. In other instances, well-known technical solutions have not been shown or described in detail to avoid obscuring aspects of the present application.
Furthermore, the drawings are merely schematic illustrations of the present application and are not necessarily drawn to scale. The same reference numerals in the drawings denote the same or similar parts, and thus their repetitive description will be omitted. Some of the block diagrams shown in the figures are functional entities and do not necessarily correspond to physically or logically separate entities. These functional entities may be implemented in the form of software, or in one or more hardware modules or integrated circuits, or in different networks and/or processor devices and/or microcontroller devices.
In the present exemplary embodiment, a real-time monitoring method for a big data platform is first provided, where the real-time monitoring method for a big data platform may be run on a server, or may be run on a server cluster or a cloud server, and the like. Referring to fig. 1, the big data platform real-time monitoring method may include the following steps:
step S110, collecting performance related data and queue resource information of a big data platform;
step S120, converting the performance related data into first rule data, and converting the queue resource information into second rule data, wherein the first rule data and the second rule data are data in a preset format;
step S130, performing data fusion on the first rule data and the second rule data to obtain multi-class fusion data;
step S140, extracting the fusion data of a plurality of classes with index incidence relation in the multi-class fusion data, and calculating the platform scheduling condition scores of the platform monitoring indexes corresponding to the index incidence relation;
step S150, when the platform scheduling condition score is lower than a preset threshold value, analyzing the scheduling bottleneck of the big data platform by using performance related data and queue resource information associated with the platform monitoring index.
Hereinafter, the steps of the above real-time monitoring method for a big data platform in the present exemplary embodiment will be explained in detail with reference to the drawings.
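Steps S110 to S150 above can be read as a simple pipeline. The following is a minimal Python sketch of that flow under stated assumptions: the function `monitor_once`, the callables passed to it, and all sample values are hypothetical and are not taken from the application; each stage is reduced to a placeholder so that only the control flow is shown.

```python
from typing import Callable, Dict, Tuple


def monitor_once(
    collect: Callable[[], Tuple[dict, dict]],
    convert: Callable[[dict, dict], Tuple[dict, dict]],
    fuse: Callable[[dict, dict], dict],
    score: Callable[[dict], Dict[str, float]],
    analyze: Callable[[str, dict, dict], str],
    threshold: float,
) -> Dict[str, str]:
    """Run steps S110-S150 once; return bottleneck info per low-scoring index."""
    perf, queues = collect()              # S110: collect raw data
    rule1, rule2 = convert(perf, queues)  # S120: convert to rule data
    fused = fuse(rule1, rule2)            # S130: data fusion
    scores = score(fused)                 # S140: score each monitoring index
    # S150: analyze only the indexes whose score is below the threshold
    return {
        index: analyze(index, perf, queues)
        for index, value in scores.items()
        if value < threshold
    }


# Placeholder stages (hypothetical), just to exercise the control flow.
result = monitor_once(
    collect=lambda: ({"rpc_queue": 0.9}, {"queue_a": 0.2}),
    convert=lambda p, q: (p, q),
    fuse=lambda r1, r2: {"payment": r1["rpc_queue"] * r2["queue_a"], "login": 0.95},
    score=lambda fused: fused,
    analyze=lambda index, p, q: f"inspect data behind '{index}'",
    threshold=0.5,
)
print(result)
```

In this sketch only the "payment" index falls below the threshold, so only that index is passed to the analysis stage.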
In step S110, performance related data and queue resource information of the big data platform are collected.
In the present exemplary embodiment, referring to fig. 2, the server 201 collects performance-related data and queue resource information of the big data platform from the big data platform server 202. Therefore, the server 201 can analyze and monitor the real-time scheduling condition of the big data platform in the subsequent steps. It is understood that, the server 201 and the server 202 may be any devices with processing capability, such as computers, microprocessors, etc., and are not limited thereto.
The performance related data is data related to the processing capacity of the big data platform, and the queue resource information describes the resource usage of each queue of the big data platform, for example the state of a YARN resource queue: 500 GB of memory, 200 CPU cores, and so on. The performance related data is, for example, the metrics information of a component, such as the metrics generated in real time by the NameNode of the Hadoop HDFS component. These metrics contain the most important performance data of HDFS; items such as the RPC processing queue and throughput directly reflect the traffic handling capability of YARN.
Collecting the performance related data and queue resource information of the big data platform makes it possible to accurately analyze the scheduling condition of the big data platform.
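As an illustration of the collection step, the sketch below parses a metrics payload and pairs it with queue resource information. It is only a sketch under assumptions: the payload shape is loosely modeled on the JSON that Hadoop daemons are assumed to serve over JMX, the bean names and RPC values are invented, and the function `collect` is hypothetical. The memory and queue figures reuse the numbers quoted in this description.

```python
import json

# Hypothetical metrics payload (shape and bean names are assumptions).
SAMPLE_METRICS = """
{"beans": [
  {"name": "Hadoop:service=NameNode,name=RpcActivity",
   "RpcQueueTimeAvgTime": 0.8, "CallQueueLength": 3},
  {"name": "java.lang:type=Memory",
   "HeapMemoryUsage": {"max": 4294967296, "used": 2599783024}}
]}
"""

# Hypothetical queue resource information: 500 GB memory, 200 CPU cores.
SAMPLE_QUEUES = {"default": {"memory_gb": 500, "cpu_cores": 200}}


def collect(metrics_json: str, queues: dict) -> tuple:
    """Return (performance related data, queue resource information)."""
    beans = json.loads(metrics_json)["beans"]
    # Index each bean by its name and keep the remaining attributes.
    perf = {
        bean["name"]: {k: v for k, v in bean.items() if k != "name"}
        for bean in beans
    }
    return perf, queues


perf, queues = collect(SAMPLE_METRICS, SAMPLE_QUEUES)
print(perf["java.lang:type=Memory"]["HeapMemoryUsage"]["used"])
print(queues["default"])
```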
In step S120, the performance related data is converted into first rule data, and the queue resource information is converted into second rule data, where the first rule data and the second rule data are data in a predetermined format.
In the embodiment of the present example, the performance related data and the queue resource information collected in real time are relatively discrete and inconvenient to analyze and process, so they are converted into data in a predetermined format through the conversion rule corresponding to each type of data. The conversion may standardize the data by field splicing or by key data extraction.
After the required data has been actively collected, it is analyzed in real time to generate rule data, which facilitates the drawing display and fusion processing in subsequent steps.
In one embodiment, translating the performance-related data into first rule data and translating the queue resource information into second rule data includes:
after field splicing is carried out on fields in each piece of data in the performance related data to obtain spliced data, the spliced data is normalized into the first rule data;
and after comparing the queue resource information with a preset threshold value to obtain a comparison result, normalizing the comparison result into the second rule data.
For example, for the performance related data, a parent field is spliced with its sub-fields to obtain the spliced data. For the data "HeapMemoryUsage": {"max": 4294967296, "used": 2599783024}, the parent field HeapMemoryUsage is spliced with the sub-fields "max": 4294967296 and "used": 2599783024 at storage time, giving the spliced data {"HeapMemoryUsage_max": 4294967296, "HeapMemoryUsage_used": 2599783024}. The spliced data is then normalized into the first rule data: for each data attribute (e.g., HeapMemoryUsage_max) and its attribute value (e.g., 4294967296) in the spliced data, the attribute value is normalized with the normalization formula corresponding to that data attribute, and the result is taken as the rule value.
After the queue resource information is compared with a preset threshold value to obtain a comparison result, the comparison result is normalized into the second rule data. Specifically, the data of each attribute in the resource information of each queue may be compared with the predetermined threshold corresponding to that attribute to obtain a plurality of difference values for each queue, and the sum of the difference values of each queue is then calculated as the second rule data of that queue.
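The field splicing and normalization just described can be sketched as follows. This is a minimal illustration, not the application's implementation: the helper names are hypothetical, and because the description does not give the per-attribute normalization formulas, the sketch assumes a simple conversion from bytes to GiB.

```python
def splice_fields(record: dict, sep: str = "_") -> dict:
    """Flatten one level of nesting: {"A": {"b": 1}} -> {"A_b": 1}."""
    spliced = {}
    for key, value in record.items():
        if isinstance(value, dict):
            for sub_key, sub_value in value.items():
                spliced[f"{key}{sep}{sub_key}"] = sub_value
        else:
            spliced[key] = value
    return spliced


GIB = 2 ** 30

# Hypothetical normalization formulas, one per data attribute (assumption).
NORMALIZERS = {
    "HeapMemoryUsage_max": lambda v: v / GIB,
    "HeapMemoryUsage_used": lambda v: v / GIB,
}


def to_first_rule_data(record: dict) -> dict:
    """Splice the fields, then normalize each attribute value to a rule value."""
    spliced = splice_fields(record)
    # Attributes without a registered formula are simply cast to float.
    return {k: NORMALIZERS.get(k, float)(v) for k, v in spliced.items()}


raw = {"HeapMemoryUsage": {"max": 4294967296, "used": 2599783024}}
spliced = splice_fields(raw)
rule = to_first_rule_data(raw)
print(spliced)
print(rule)
```

Keeping the formulas in a lookup table keyed by attribute name mirrors the description's "normalization formula corresponding to the data attribute" and lets new attributes be added without changing the conversion code.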
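The threshold comparison for queue resource information can be sketched in the same way. The queue names, attribute names, and threshold values below are hypothetical; only the procedure (one difference per attribute, then the sum per queue) follows the description.

```python
# Hypothetical per-attribute thresholds (assumption).
THRESHOLDS = {"memory_gb": 400, "cpu_cores": 160}


def to_second_rule_data(queues: dict, thresholds: dict) -> dict:
    """For each queue, sum the differences between usage and threshold."""
    return {
        name: sum(
            usage[attr] - thresholds[attr]
            for attr in thresholds
            if attr in usage
        )
        for name, usage in queues.items()
    }


queues = {
    "queue_c": {"memory_gb": 500, "cpu_cores": 200},
    "queue_d": {"memory_gb": 350, "cpu_cores": 120},
}
second_rule = to_second_rule_data(queues, THRESHOLDS)
print(second_rule)
```

In this sketch a positive value means the queue exceeds its thresholds overall and a negative value means it stays below them. Attributes measured on different scales would probably need rescaling before summation; the description does not specify this, so the sketch sums the raw differences.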
In step S130, data fusion is performed on the first rule data and the second rule data to obtain multiple types of fusion data.
In this exemplary embodiment, the first rule data may contain a plurality of data, such as data originating from different components or data of different properties; likewise, the second rule data may contain a plurality of data, such as resource data of different queues. Different combinations of associated data in the first rule data and the second rule data correspond to the platform scheduling conditions of different platform monitoring indexes of the big data platform. For example, analyzing the payment condition index of a big data platform often requires the data related to component A and component B in the first rule data and the resource information of certain queues (C and D) in the second rule data, and some of the required data are associated with each other, for example A with C and B with D. Data fusion combines all of the first rule data and the second rule data according to these associations to obtain multiple classes of fusion data corresponding to different indexes, so that the relevant classes of fusion data can be selected for analysis as required.
In an embodiment, referring to fig. 3, performing data fusion on the first rule data and the second rule data to obtain multiple types of fused data includes:
step S310, acquiring a preset data fusion strategy of the big data platform, wherein the preset fusion strategy comprises an incidence relation among a plurality of data;
step S320, fusing data in the first rule data and the second rule data based on the association relationship to obtain multiple types of fused data.
The preset data fusion policy may be a fusion policy table including association relations between a plurality of data, each association relation including an association between a plurality of data. The association is a relationship that the monitoring data has at the time of generation, for example, a relationship between a set of data that affect each other. All data corresponding to each association relationship (data combinations corresponding to the association relationship in the first rule data and the second rule data) are subjected to data fusion, and fusion data reflecting the data group with the association relationship can be obtained.
In one embodiment, fusing data in the first rule data and the second rule data based on the association relationship includes:
acquiring all data corresponding to the same association relation from the first rule data and the second rule data;
and fusing all the data corresponding to the same incidence relation into fused data based on a preset fusion algorithm corresponding to the same incidence relation.
The predetermined fusion algorithm may be a weighted sum calculation formula or another predetermined algorithm. All data corresponding to the same association relationship is then fused into fused data by the preset fusion algorithm corresponding to that association relationship.
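A fusion policy table and a weighted-sum fusion algorithm of the kind mentioned above might look like the following sketch. The table layout, key names, weights, and rule values are all assumptions made for illustration; the description only states that each association relationship has a corresponding preset fusion algorithm.

```python
# Hypothetical fusion policy table: association -> data keys and weights.
FUSION_POLICY = {
    "A_C": {"keys": ["component_a", "queue_c"], "weights": [0.6, 0.4]},
    "B_D": {"keys": ["component_b", "queue_d"], "weights": [0.5, 0.5]},
}


def fuse(first_rule: dict, second_rule: dict, policy: dict) -> dict:
    """Fuse all data belonging to the same association into one value."""
    pool = {**first_rule, **second_rule}
    fused = {}
    for association, spec in policy.items():
        # Gather every datum that belongs to this association relationship.
        values = [pool[key] for key in spec["keys"]]
        # Weighted sum is used here as the preset fusion algorithm.
        fused[association] = sum(w * v for w, v in zip(spec["weights"], values))
    return fused


first_rule = {"component_a": 0.5, "component_b": 1.0}
second_rule = {"queue_c": 1.0, "queue_d": 0.5}
fused = fuse(first_rule, second_rule, FUSION_POLICY)
print(fused)
```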
In step S140, the fusion data of multiple classes having the index association relationship in the multiple classes of fusion data is extracted, and the platform scheduling condition score of the platform monitoring index corresponding to the index association relationship is calculated.
In the embodiment of the present example, an index association relationship is the association that several classes of fusion data have with a particular monitoring index. For example, analyzing the payment condition index of a big data platform often requires the data related to component A and component B in the first rule data and the resource information of certain queues (C and D) in the second rule data, and some of the required data are associated with each other, for example A with C and B with D; in this case, the fused data of A and C and the fused data of B and D share the index association relationship of the payment condition. Likewise, the login condition index may associate the fused data of A and C with the fused data of F and E. The platform monitoring indexes include, for example, registration failure, login failure, and payment failure.
The platform scheduling condition score is calculated from the fusion data of the multiple classes using the algorithm corresponding to each index. In one example, a weight is preset for each class of fusion data corresponding to the same index, and the weighted sum of the classes of fusion data having the index association relationship is taken as the platform scheduling condition score of the platform monitoring index corresponding to that index association relationship. Before the weighted sum is computed, the fusion data of each class is normalized to a numerical value; for example, the fusion data of each class may be compared with the standard fusion data of the corresponding class to obtain a similarity. The higher the score, the better the scheduling condition for the corresponding platform monitoring index.
In this way, the platform scheduling condition scores of different platform monitoring indexes can be calculated accurately according to the index association relationships, so that early warnings can be issued for the platform.
In one embodiment, the index association relationship is queried from a preset index association relationship table.
In one embodiment, extracting fused data of multiple classes having an index association relationship from the multiple classes of fused data, and calculating a platform scheduling score of a platform monitoring index corresponding to the index association relationship includes:
extracting multi-class fusion data corresponding to the same index, and acquiring a preset weight of each class of fusion data in the multi-class fusion data;
and obtaining a weighted sum of each type of fusion data corresponding to the same index and the corresponding preset weight to obtain a platform scheduling condition sub-score corresponding to the same index, wherein the platform scheduling condition sub-score is used as a platform scheduling condition score of a platform monitoring index corresponding to the same index.
For example, the payment condition index may correspond to three classes of fusion data A, B, and C with preset weights 5, 3, and 7, so that the platform scheduling condition score of the payment index is obtained as 5 × A + 3 × B + 7 × C.
In step S150, when the platform scheduling condition score is lower than a preset threshold value, the scheduling bottleneck of the big data platform is analyzed by using the performance related data and queue resource information associated with the platform monitoring index.
When the platform scheduling condition score is lower than the preset threshold value, the platform scheduling condition corresponding to a platform monitoring index is in crisis or about to be in crisis. The preset threshold value is set by an expert. By analyzing the performance related data and queue resource information associated with the platform monitoring index, the scheduling bottleneck of the big data platform can be identified and handled promptly and accurately. The analysis may use a pre-trained machine learning model corresponding to the platform monitoring index. Because the analysis imposes a heavy load, performing it only when the platform scheduling condition score falls below the preset threshold value (an early warning) effectively saves resources while preserving the reliability of monitoring the big data platform.
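The weighted sum in this example can be written out directly. In the sketch below the weights 5, 3, and 7 come from the example above, while the function name and the normalized values of A, B, and C are hypothetical (they are assumed to be similarities in the range 0 to 1, as suggested by the normalization described earlier).

```python
def scheduling_score(fused: dict, weights: dict) -> float:
    """Weighted sum of the fusion data classes belonging to one index."""
    return sum(weights[name] * fused[name] for name in weights)


PAYMENT_WEIGHTS = {"A": 5, "B": 3, "C": 7}  # preset weights from the example
fused = {"A": 0.9, "B": 0.8, "C": 0.5}      # illustrative normalized values

score = scheduling_score(fused, PAYMENT_WEIGHTS)
print(round(score, 2))  # 5 * 0.9 + 3 * 0.8 + 7 * 0.5
```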
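The gating described here, where the costly analysis runs only for indexes whose score has fallen below the expert-set threshold, can be sketched as follows. All index names, scores, and threshold values are hypothetical.

```python
def indexes_needing_analysis(scores: dict, thresholds: dict) -> list:
    """Return the monitoring indexes whose score is below their threshold."""
    return sorted(
        index for index, score in scores.items() if score < thresholds[index]
    )


scores = {"payment": 10.4, "login": 13.2, "registration": 6.1}
# Hypothetical expert-set thresholds, one per monitoring index.
thresholds = {"payment": 12.0, "login": 12.0, "registration": 8.0}

flagged = indexes_needing_analysis(scores, thresholds)
print(flagged)
```

Only the flagged indexes would then have their associated performance related data and queue resource information passed to the bottleneck analysis, which is how the scheme avoids running the analysis on every monitoring cycle.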
In an embodiment of this example, analyzing the scheduling bottleneck of the big data platform by using the performance-related data and the queue resource information associated with the platform monitoring index includes:
acquiring a pre-trained bottleneck analysis model corresponding to the platform monitoring index;
and inputting the performance related data and queue resource information associated with the platform monitoring index into the bottleneck analysis model to obtain scheduling bottleneck information of the big data platform.
In an embodiment of this example, the method further comprises:
collecting a sample set of performance-related data and queue resource information associated with a specific monitoring index, wherein each sample in the sample set is labeled in advance with corresponding scheduling bottleneck information;
inputting the input data of each sample in the sample set into a machine learning model to obtain predicted scheduling bottleneck information corresponding to each sample;
if, after the input data of a sample is input into the machine learning model, the predicted scheduling bottleneck information obtained for that sample is inconsistent with the scheduling bottleneck information labeled in advance for the sample, adjusting the coefficients of the machine learning model until the predicted scheduling bottleneck information is consistent with the scheduling bottleneck information labeled in advance for the sample;
and ending the training when, after the input data of all the samples has been input into the machine learning model, the predicted scheduling bottleneck information obtained for each sample is consistent with the scheduling bottleneck information labeled in advance for that sample.
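The training procedure above (adjust the coefficients whenever a prediction disagrees with the label, and stop once every sample is predicted consistently) can be illustrated with a perceptron. The application does not name a model type, so the perceptron is a stand-in chosen here for simplicity; the two features (CPU utilization and queue usage), the binary label (1 = bottleneck, 0 = no bottleneck), the learning rate, and the epoch limit are all assumptions.

```python
def predict(weights, bias, features):
    """Predict 1 (bottleneck) or 0 (no bottleneck) with a linear threshold."""
    activation = bias + sum(w * x for w, x in zip(weights, features))
    return 1 if activation > 0 else 0


def train(samples, learning_rate=0.1, max_epochs=1000):
    """Adjust coefficients until every sample is predicted consistently with its label."""
    weights = [0.0] * len(samples[0][0])
    bias = 0.0
    for _ in range(max_epochs):
        mistakes = 0
        for features, label in samples:
            error = label - predict(weights, bias, features)
            if error != 0:
                # Prediction disagrees with the label: adjust the coefficients.
                mistakes += 1
                weights = [w + learning_rate * error * x
                           for w, x in zip(weights, features)]
                bias += learning_rate * error
        if mistakes == 0:
            return weights, bias  # all predictions match the labels
    raise RuntimeError("training did not converge within max_epochs")


# Hypothetical labeled samples: (cpu_utilization, queue_usage) -> label.
samples = [
    ((0.9, 0.95), 1),
    ((0.8, 0.9), 1),
    ((0.2, 0.3), 0),
    ((0.1, 0.2), 0),
]
weights, bias = train(samples)
```

The epoch limit is an addition of this sketch: a stopping rule of full consistency on every sample is only guaranteed to be reached when the samples are linearly separable, so a bound avoids an endless loop on other data.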
In an embodiment of this example, analyzing the scheduling bottleneck of the big data platform by using the performance-related data and the queue resource information associated with the platform monitoring index includes:
when the scheduling bottleneck is that the scheduling load has reached the bottleneck limit value, acquiring the first rule data and the second rule data corresponding to the performance-related data and the queue resource information associated with the platform monitoring index;
and drawing and displaying by utilizing the corresponding first rule data and the second rule data.
For example, on a multi-tenant Hadoop big data platform, especially a platform providing PaaS service, the usage of user queue resources directly affects the user's budget, and users may want to view this data in real time. Without a set of charts showing how the resources purchased by a user are being used, the usability of the platform is greatly reduced. A particular Grafana chart may be drawn for a particular component by an automated show dockee. Because each service is concerned with different content, charts common to all services can be drawn in advance, while for charts specific to an individual service, users can access the Grafana address in a browser and draw the charts themselves. Grafana is a cross-platform, open-source metrics analysis and visualization tool that can query collected data, display it visually, and send timely notifications.
In one embodiment, after analyzing the scheduling bottleneck of the big data platform by using the performance-related data and queue resource information associated with the platform monitoring index when the platform scheduling condition score is lower than the predetermined threshold, the method further includes:
and looking up the label of the scheduling bottleneck in a preset processing strategy table to obtain a processing strategy corresponding to the scheduling bottleneck.
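A preset processing strategy table keyed by bottleneck label could be as simple as a dictionary lookup. In the sketch below, every label, every strategy string, and the fallback behavior are hypothetical; the application does not list the contents of the table.

```python
# Hypothetical strategy table: bottleneck label -> processing strategy.
STRATEGY_TABLE = {
    "queue_memory_exhausted": "expand the memory quota of the queue",
    "too_many_pending_jobs": "throttle job submission for the tenant",
    "node_cpu_saturated": "migrate tasks to less loaded nodes",
}

# Fallback used when a label has no entry (an assumption of this sketch).
DEFAULT_STRATEGY = "escalate to the on-call operator"


def lookup_strategy(bottleneck_label, table=STRATEGY_TABLE):
    """Return the processing strategy for a bottleneck label."""
    return table.get(bottleneck_label, DEFAULT_STRATEGY)


strategy = lookup_strategy("too_many_pending_jobs")
unknown = lookup_strategy("label_not_in_table")
```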
The application also provides a real-time monitoring device for the big data platform. Referring to fig. 4, the big data platform real-time monitoring apparatus includes:
the acquisition module 410 is used for acquiring performance related data and queue resource information of the big data platform;
the conversion module 420 is configured to convert the performance related data into first rule data, and convert the queue resource information into second rule data, where the first rule data and the second rule data are data in a predetermined format;
the fusion module 430 is configured to perform data fusion on the first rule data and the second rule data to obtain multiple types of fusion data;
the extracting module 440 is configured to extract fused data of multiple classes having an index association relationship from the multiple classes of fused data, and calculate a platform scheduling condition score of a platform monitoring index corresponding to the index association relationship;
the analysis module 450 is configured to analyze a scheduling bottleneck of the big data platform by using the performance-related data and the queue resource information associated with the platform monitoring indicator when the platform scheduling condition score is lower than a predetermined threshold.
The specific details of each module in the real-time monitoring device for the big data platform have been described in detail in the corresponding real-time monitoring method for the big data platform, and therefore are not described herein again.
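As a rough sketch of how the five modules might be wired together, the class below passes the output of each module to the next and runs the analysis module only for indexes whose score is below the threshold. The class name `MonitoringDevice`, the callable signatures, the data shapes, and the stub logic in the usage example are all assumptions made for illustration and are not taken from the application.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class MonitoringDevice:
    collect: Callable   # acquisition module 410: () -> (performance, queues)
    convert: Callable   # conversion module 420: -> (first rule data, second rule data)
    fuse: Callable      # fusion module 430: -> multi-class fusion data
    score: Callable     # extraction module 440: -> {index name: score}
    analyze: Callable   # analysis module 450: -> bottleneck information
    threshold: float

    def run_once(self):
        """Run one monitoring pass and return bottleneck info per low-scoring index."""
        performance, queues = self.collect()
        first_rules, second_rules = self.convert(performance, queues)
        fused = self.fuse(first_rules, second_rules)
        scores = self.score(fused)
        return {
            index: self.analyze(index, performance, queues)
            for index, value in scores.items()
            if value < self.threshold
        }


# Usage with stub modules (all values hypothetical).
device = MonitoringDevice(
    collect=lambda: ([{"cpu": 0.9}], [{"queue": "root.a", "used": 0.95}]),
    convert=lambda perf, queues: (perf, queues),
    fuse=lambda first, second: {"load": first[0]["cpu"] + second[0]["used"]},
    score=lambda fused: {"scheduling": 10 - 5 * fused["load"]},
    analyze=lambda index, perf, queues: f"bottleneck suspected for {index}",
    threshold=5.0,
)
result = device.run_once()
```

Keeping each module as a separate callable mirrors the statement that the division into modules is not mandatory: two stages can be merged, or one split, without changing `run_once`.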
It should be noted that although several modules or units of the device for performing actions are mentioned in the above detailed description, this division is not mandatory. Indeed, according to embodiments of the application, the features and functions of two or more modules or units described above may be embodied in a single module or unit. Conversely, the features and functions of one module or unit described above may be further divided so as to be embodied by a plurality of modules or units.
Moreover, although the steps of the methods herein are depicted in the drawings in a particular order, this does not require or imply that the steps must be performed in this particular order, or that all of the depicted steps must be performed, to achieve desirable results. Additionally or alternatively, certain steps may be omitted, multiple steps combined into one step execution, and/or one step broken down into multiple step executions, etc.
Through the above description of the embodiments, those skilled in the art will readily understand that the exemplary embodiments described herein may be implemented by software, or by software in combination with necessary hardware. Therefore, the technical solution according to the embodiments of the present application can be embodied in the form of a software product, which can be stored in a non-volatile storage medium (which can be a CD-ROM, a usb disk, a removable hard disk, etc.) or on a network, and includes several instructions to enable a computing device (which can be a personal computer, a server, a mobile terminal, or a network device, etc.) to execute the method according to the embodiments of the present application.
In an exemplary embodiment of the present application, there is also provided an electronic device capable of implementing the above method.
As will be appreciated by one skilled in the art, aspects of the present invention may be embodied as a system, method or program product. Thus, various aspects of the invention may be embodied in the form of: an entirely hardware embodiment, an entirely software embodiment (including firmware, microcode, etc.) or an embodiment combining hardware and software aspects that may all generally be referred to herein as a "circuit," "module," or "system."
An electronic device 500 according to this embodiment of the invention is described below with reference to fig. 5. The electronic device 500 shown in fig. 5 is only an example and should not bring any limitation to the functions and the scope of use of the embodiments of the present invention.
As shown in fig. 5, the electronic device 500 is embodied in the form of a general purpose computing device. The components of the electronic device 500 may include, but are not limited to: the at least one processing unit 510, the at least one memory unit 520, and a bus 530 that couples various system components including the memory unit 520 and the processing unit 510.
The storage unit stores program code that is executable by the processing unit 510, so that the processing unit 510 performs the steps according to various exemplary embodiments of the present invention described in the "exemplary methods" section of this specification above. For example, the processing unit 510 may perform the steps shown in fig. 1.
The memory unit 520 may include a readable medium in the form of a volatile memory unit, such as a random access memory unit (RAM) 5201 and/or a cache memory unit 5202, and may further include a read only memory unit (ROM) 5203.
Storage unit 520 may also include a program/utility 5204 having a set (at least one) of program modules 5205, such program modules 5205 including, but not limited to: an operating system, one or more application programs, other program modules, and program data, each of which, or some combination thereof, may comprise an implementation of a network environment.
Bus 530 may be one or more of any of several types of bus structures including a memory unit bus or memory unit controller, a peripheral bus, an accelerated graphics port, a processing unit, or a local bus using any of a variety of bus architectures.
The electronic device 500 may also communicate with one or more external devices 700 (e.g., keyboard, pointing device, bluetooth device, etc.), with one or more devices that enable a client to interact with the electronic device 500, and/or with any devices (e.g., router, modem, etc.) that enable the electronic device 500 to communicate with one or more other computing devices. Such communication may occur via input/output (I/O) interfaces 550. Also, the electronic device 500 may communicate with one or more networks (e.g., a Local Area Network (LAN), a Wide Area Network (WAN), and/or a public network, such as the internet) via the network adapter 560. As shown, the network adapter 560 communicates with the other modules of the electronic device 500 over the bus 530. It should be appreciated that although not shown in the figures, other hardware and/or software modules may be used in conjunction with the electronic device 500, including but not limited to: microcode, device drivers, redundant processing units, external disk drive arrays, RAID systems, tape drives, and data backup storage systems, among others.
Through the above description of the embodiments, those skilled in the art will readily understand that the exemplary embodiments described herein may be implemented by software, or by software in combination with necessary hardware. Therefore, the technical solution according to the embodiments of the present application can be embodied in the form of a software product, which can be stored in a non-volatile storage medium (which can be a CD-ROM, a usb disk, a removable hard disk, etc.) or on a network, and includes several instructions to make a computing device (which can be a personal computer, a server, a terminal device, or a network device, etc.) execute the method according to the embodiments of the present application.
In an exemplary embodiment of the present application, there is also provided a computer-readable storage medium having stored thereon a program product capable of implementing the above-described method of the present specification. In some possible embodiments, aspects of the invention may also be implemented in the form of a program product comprising program code means for causing a terminal device to carry out the steps according to various exemplary embodiments of the invention described in the above section "exemplary methods" of the present description, when said program product is run on the terminal device.
Referring to fig. 6, a program product 600 for implementing the above method according to an embodiment of the present invention is described, which may employ a portable compact disc read only memory (CD-ROM) and include program code, and may be run on a terminal device, such as a personal computer. However, the program product of the present invention is not limited in this regard and, in the present document, a readable storage medium may be any tangible medium that can contain, or store a program for use by or in connection with an instruction execution system, apparatus, or device.
The program product may employ any combination of one or more readable media. The readable medium may be a readable signal medium or a readable storage medium. A readable storage medium may be, for example, but not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any combination of the foregoing. More specific examples (a non-exhaustive list) of the readable storage medium include: an electrical connection having one or more wires, a portable disk, a hard disk, a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing.
A computer readable signal medium may include a propagated data signal with readable program code embodied therein, for example, in baseband or as part of a carrier wave. Such a propagated data signal may take many forms, including, but not limited to, electro-magnetic, optical, or any suitable combination thereof. A readable signal medium may also be any readable medium that is not a readable storage medium and that can communicate, propagate, or transport a program for use by or in connection with an instruction execution system, apparatus, or device.
Program code embodied on a readable medium may be transmitted using any appropriate medium, including but not limited to wireless, wireline, optical fiber cable, RF, etc., or any suitable combination of the foregoing.
Program code for carrying out operations for aspects of the present invention may be written in any combination of one or more programming languages, including an object oriented programming language such as Java, C++ or the like and conventional procedural programming languages, such as the "C" programming language or similar programming languages. The program code may execute entirely on the client computing device, partly on the client device, as a stand-alone software package, partly on the client computing device and partly on a remote computing device, or entirely on the remote computing device or server. In the case of a remote computing device, the remote computing device may be connected to the client computing device over any kind of network, including a Local Area Network (LAN) or a Wide Area Network (WAN), or may be connected to an external computing device (e.g., over the internet using an internet service provider).
Furthermore, the above-described figures are merely schematic illustrations of processes involved in methods according to exemplary embodiments of the invention, and are not intended to be limiting. It will be readily understood that the processes shown in the above figures are not intended to indicate or limit the chronological order of the processes. In addition, it is also readily understood that these processes may be performed synchronously or asynchronously, e.g., in multiple modules.
Other embodiments of the present application will be apparent to those skilled in the art from consideration of the specification and practice of the invention disclosed herein. This application is intended to cover any variations, uses, or adaptations of the invention following, in general, the principles of the application and including such departures from the present disclosure as come within known or customary practice within the art to which the invention pertains. It is intended that the specification and examples be considered as exemplary only, with a true scope and spirit of the application being indicated by the following claims.

Claims (10)

1. A real-time monitoring method for a big data platform is characterized by comprising the following steps:
collecting performance related data and queue resource information of a big data platform;
converting the performance related data into first rule data, and converting the queue resource information into second rule data, wherein the first rule data and the second rule data are data in a preset format;
performing data fusion on the first rule data and the second rule data to obtain multi-class fusion data;
extracting, from the multi-class fusion data, fusion data of a plurality of classes having an index association relationship, and calculating the platform scheduling condition score of the platform monitoring index corresponding to the index association relationship;
and when the platform scheduling condition score is lower than a preset threshold value, analyzing the scheduling bottleneck of the big data platform by using the performance related data and the queue resource information related to the platform monitoring index.
2. The method of claim 1, wherein converting the performance-related data into first rule data and converting the queue resource information into second rule data comprises:
after field splicing is carried out on fields in each piece of data in the performance related data to obtain spliced data, the spliced data is normalized into the first rule data;
and after comparing the queue resource information with a preset threshold value to obtain a comparison result, normalizing the comparison result into the second rule data.
3. The method according to claim 1, wherein the performing data fusion on the first rule data and the second rule data to obtain multiple types of fused data comprises:
acquiring a preset data fusion strategy of the big data platform, wherein the preset data fusion strategy comprises an association relationship among a plurality of data;
and fusing data in the first rule data and the second rule data based on the association relationship to obtain multi-class fused data.
4. The method according to claim 3, wherein the fusing the data in the first rule data and the second rule data based on the association relationship comprises:
acquiring all data corresponding to the same association relation from the first rule data and the second rule data;
and fusing all the data corresponding to the same association relationship into fused data based on a preset fusion algorithm corresponding to the same association relationship.
5. The method according to claim 1, wherein the extracting the fusion data of the plurality of classes having the index association relationship from the multi-class fusion data, and calculating the platform scheduling condition score of the platform monitoring index corresponding to the index association relationship comprises:
extracting multi-class fusion data corresponding to the same index, and acquiring a preset weight of each class of fusion data in the multi-class fusion data;
and obtaining a weighted sum of each type of fusion data corresponding to the same index and the corresponding preset weight to obtain a platform scheduling condition sub-score corresponding to the same index, wherein the platform scheduling condition sub-score is used as a platform scheduling condition score of a platform monitoring index corresponding to the same index.
6. The method of claim 1, wherein analyzing the scheduling bottleneck of the big data platform using the performance-related data and queue resource information associated with the platform monitoring metrics comprises:
acquiring a pre-trained bottleneck analysis model corresponding to the platform monitoring index;
and inputting the performance related data and queue resource information associated with the platform monitoring index into the bottleneck analysis model to obtain scheduling bottleneck information of the big data platform.
7. The method of claim 1, wherein analyzing the scheduling bottleneck of the big data platform using the performance-related data and queue resource information associated with the platform monitoring metrics comprises:
when the scheduling bottleneck is that the scheduling load has reached the bottleneck limit value, acquiring the first rule data and the second rule data corresponding to the performance-related data and the queue resource information associated with the platform monitoring index;
and drawing and displaying by utilizing the corresponding first rule data and the second rule data.
8. A real-time monitoring device for a big data platform, characterized by comprising:
the acquisition module is used for acquiring performance related data and queue resource information of the big data platform;
the conversion module is used for converting the performance related data into first rule data and converting the queue resource information into second rule data, wherein the first rule data and the second rule data are data in a preset format;
the fusion module is used for carrying out data fusion on the first rule data and the second rule data to obtain multi-class fusion data;
the extraction module is used for extracting, from the multi-class fusion data, fusion data of a plurality of classes having an index association relationship, and calculating the platform scheduling condition score of the platform monitoring index corresponding to the index association relationship;
and the analysis module is used for analyzing the scheduling bottleneck of the big data platform by using the performance related data and the queue resource information related to the platform monitoring index when the platform scheduling condition score is lower than a preset threshold value.
9. A computer readable storage medium having stored thereon program instructions, characterized in that the program instructions, when executed by a processor, implement the method of any of claims 1-7.
10. An electronic device, comprising:
a processor; and
a memory for storing program instructions for the processor; wherein the processor is configured to perform the method of any of claims 1-7 via execution of the program instructions.
CN202010044188.5A 2020-01-15 2020-01-15 Real-time monitoring method, device, medium and electronic equipment for big data platform Active CN111274088B (en)

Priority Applications (2)

Application Number Priority Date Filing Date Title
CN202010044188.5A CN111274088B (en) 2020-01-15 2020-01-15 Real-time monitoring method, device, medium and electronic equipment for big data platform
PCT/CN2020/093583 WO2021143024A1 (en) 2020-01-15 2020-05-30 Method and apparatus for real-time monitoring of big data platform, medium, and electronic device

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202010044188.5A CN111274088B (en) 2020-01-15 2020-01-15 Real-time monitoring method, device, medium and electronic equipment for big data platform

Publications (2)

Publication Number Publication Date
CN111274088A CN111274088A (en) 2020-06-12
CN111274088B true CN111274088B (en) 2021-08-24

Family

ID=70999045

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010044188.5A Active CN111274088B (en) 2020-01-15 2020-01-15 Real-time monitoring method, device, medium and electronic equipment for big data platform

Country Status (2)

Country Link
CN (1) CN111274088B (en)
WO (1) WO2021143024A1 (en)

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107741955A (en) * 2017-09-15 2018-02-27 平安科技(深圳)有限公司 Business datum monitoring method, device, terminal device and storage medium
CN107797875A (en) * 2017-04-17 2018-03-13 平安科技(深圳)有限公司 A kind of big data management method, terminal and equipment
CN109905267A (en) * 2017-12-11 2019-06-18 镇江共远软件开发有限公司 A kind of method and apparatus for big data system status monitoring
CN110008085A (en) * 2019-04-04 2019-07-12 安徽汇迈信息科技有限公司 A kind of monitoring system of big data platform

Family Cites Families (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US9183111B2 (en) * 2011-05-10 2015-11-10 Microsoft Technology Licensing, Llc Methods and computer program products for collecting storage resource performance data using file system hooks
CN103618644A (en) * 2013-11-26 2014-03-05 曙光信息产业股份有限公司 Distributed monitoring system based on hadoop cluster and method thereof
CN105871605A (en) * 2016-03-30 2016-08-17 国网江西省电力科学研究院 Operation and maintenance monitoring platform based on big power marketing data
US10545973B2 (en) * 2016-07-28 2020-01-28 Wipro Limited System and method for performing dynamic orchestration of rules in a big data environment
CN107070740A (en) * 2017-03-11 2017-08-18 郑州云海信息技术有限公司 A kind of efficient PAAS platform monitoring methods and system

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
大数据模型调度系统的关键问题研究;彭世锦;《中国优秀硕士学位论文全文数据库信息科技辑》;20180215;第I138-1085页 *

Also Published As

Publication number Publication date
CN111274088A (en) 2020-06-12
WO2021143024A1 (en) 2021-07-22

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant