CN108540439B - Data analysis method, system, device and storage medium - Google Patents

Info

Publication number
CN108540439B
Authority
CN
China
Prior art keywords
data
message
ganglia
big data
acquisition tool
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201810100691.0A
Other languages
Chinese (zh)
Other versions
CN108540439A (en)
Inventor
黄昌明
童晨曦
蔡适择
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
SF Technology Co Ltd
Original Assignee
SF Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by SF Technology Co Ltd
Priority to CN201810100691.0A
Publication of CN108540439A
Application granted
Publication of CN108540439B
Legal status: Active
Anticipated expiration

Classifications

    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L69/00 Network arrangements, protocols or services independent of the application payload and not provided for in the other groups of this subclass
    • H04L69/22 Parsing or analysis of headers

Abstract

The invention provides a data parsing method, system, device and storage medium. The method comprises: receiving and parsing, through a custom UDP service, message data sent by a plurality of big data components to obtain performance indicator data, and pushing the performance indicator data to a custom message middleware; and providing the performance indicator data to an application through the message middleware. By configuring a custom UDP service to parse the message data sent by a plurality of different big data components, and by providing the parsed performance indicator data to back-end applications through the custom message middleware, the invention gives the system a unified data collection entry point that adapts to diverse big data components.

Description

Data analysis method, system, device and storage medium
Technical Field
The present application relates to the field of data parsing technologies, and in particular, to a data parsing method, system, device, and storage medium.
Background
With the rapid development and widespread application of computer and information technology, the volume of data being produced is growing explosively. Various big data components have emerged to handle big data processing, including components for distributed storage, distributed computation, distributed scheduling and the like. Each big data component is a powerful tool for storing and processing data, and greater value can be mined from the data only when the components run normally and healthily, so collecting the performance indicators of the big data components is very important. Building a monitoring data foundation through performance indicator collection makes it much easier to understand and analyze the operating health of the big data components in a timely manner.
At present, big data components are highly varied, which raises the question of how to collect the performance indicators of such diverse components. Existing systems usually lack a unified data collection entry point that adapts to diverse big data components, and therefore cannot make full use of the data coming from different big data components.
Disclosure of Invention
In view of the above-mentioned deficiencies in the prior art, it is desirable to provide a data parsing method, system, device and storage medium that provide a unified data collection entry point adapted to diverse big data components.
In a first aspect, the present invention provides a data parsing method, including:
receiving and parsing, through a custom UDP service, message data sent by a plurality of big data components to obtain performance indicator data, and pushing the performance indicator data to a custom message middleware;
providing the performance indicator data to an application through the message middleware.
In a second aspect, the present invention provides a data parsing system, which includes a parsing unit and a middleware unit.
The parsing unit is configured to receive and parse, through a custom UDP service, message data sent by a plurality of big data components to obtain performance indicator data, and to push the performance indicator data to a custom message middleware;
the middleware unit is configured to provide the performance indicator data to an application through the message middleware.
In a third aspect, the present invention also provides an apparatus comprising one or more processors and a memory, wherein the memory contains instructions executable by the one or more processors to cause the one or more processors to perform a data parsing method provided according to embodiments of the present invention.
In a fourth aspect, the present invention also provides a storage medium storing a computer program that causes a computer to execute the data parsing method provided according to the embodiments of the present invention.
The data parsing method, system, device and storage medium provided by the embodiments of the invention parse the message data sent by a plurality of different big data components by configuring a custom UDP service, and provide the parsed performance indicator data to back-end applications through a custom message middleware, thereby giving the system a unified data collection entry point adapted to diverse big data components;
the data parsing method, system, device and storage medium provided by some embodiments of the invention further provide download information of the Jvmtran acquisition tool to big data components that have no ganglia plug-in, so that ganglia message data can be collected from any big data component, which further safeguards the success rate of data parsing;
the data parsing method, system, device and storage medium provided by some embodiments of the invention further improve the data throughput of the unified data collection entry point by building the custom UDP service on netty and the custom message middleware on kafka.
Drawings
Other features, objects and advantages of the present application will become more apparent upon reading of the following detailed description of non-limiting embodiments thereof, made with reference to the accompanying drawings in which:
Fig. 1 is a flowchart of a data parsing method according to an embodiment of the present invention.
Fig. 2 is a flowchart of a preferred embodiment of the method shown in Fig. 1.
Fig. 3 is a schematic structural diagram of a data parsing system according to an embodiment of the present invention.
Fig. 4 is a schematic diagram of a preferred embodiment of the system of fig. 3.
Fig. 5 is a schematic structural diagram of an apparatus according to an embodiment of the present invention.
Detailed Description
The present application will be described in further detail with reference to the following drawings and examples. It is to be understood that the specific embodiments described herein are merely illustrative of the relevant invention and not restrictive of the invention. It should be noted that, for convenience of description, only the portions related to the present invention are shown in the drawings.
It should be noted that the embodiments and features of the embodiments in the present application may be combined with each other without conflict. The present application will be described in detail below with reference to the embodiments with reference to the attached drawings.
Fig. 1 is a flowchart of a data parsing method according to an embodiment of the present invention.
As shown in fig. 1, in this embodiment, the present invention provides a data parsing method, including:
S30: receiving and parsing, through a custom UDP service, message data sent by a plurality of big data components to obtain performance indicator data, and pushing the performance indicator data to a custom message middleware;
S50: providing the performance indicator data to an application through the message middleware.
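The two steps can be sketched end to end as follows. This is a minimal illustration under stated assumptions, not the patented implementation: the `host|metric|value` wire format, the function names, and the use of an in-process `queue.Queue` as a stand-in for the message middleware are all hypothetical.

```python
import json
import queue


def parse_message(raw: bytes) -> dict:
    """S30 (parse): turn one raw datagram into a performance indicator record.

    Assumed toy wire format: b"host|metric|value".
    """
    host, metric, value = raw.decode("utf-8").split("|")
    return {"host": host, "metric": metric, "value": float(value)}


def handle_datagram(raw: bytes, middleware: queue.Queue) -> None:
    """S30 (push): parse the datagram and push the record to the middleware."""
    middleware.put(json.dumps(parse_message(raw)))


def provide_to_application(middleware: queue.Queue) -> list:
    """S50: hand every record currently held by the middleware to the application."""
    records = []
    while not middleware.empty():
        records.append(json.loads(middleware.get()))
    return records
```

Calling `handle_datagram` once per received datagram and `provide_to_application` from the back-end side keeps the parsing step and the consuming step decoupled, which is the role the message middleware plays in the method.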
Specifically, in this embodiment, the message data is configured as ganglia message data. Ganglia is an open source cluster monitoring project initiated at UC Berkeley and designed to scale to thousands of nodes. Its core consists of gmond, gmetad and a web front end, and it is mainly used to monitor system performance metrics such as CPU and memory usage, hard disk utilization, I/O load and network traffic; the working state of each node can be conveniently displayed as curves. Most big data components currently ship with a ganglia collection plug-in.
Correspondingly, in step S30, a parsing method for ganglia message data is configured in advance, and each big data component collects and generates ganglia message data in one of the following ways, depending on whether it is configured with a ganglia plug-in:
a big data component configured with a ganglia plug-in collects and generates ganglia message data through its built-in ganglia plug-in;
a big data component not configured with a ganglia plug-in is configured with the Jvmtran acquisition tool, or any other acquisition tool that carries a ganglia plug-in, and collects and generates ganglia message data through the ganglia plug-in in that acquisition tool.
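To make the idea of parsing a binary metric message concrete, the sketch below decodes an XDR-style packet (big-endian 4-byte integers, length-prefixed strings padded to 4 bytes). The field order used here (message id, host, metric name, spoof flag, format string, value) and the id 133 are assumptions for illustration only and should be checked against the Ganglia protocol definition before being relied on; the encoder exists only to produce test input.

```python
import struct

# Assumed message id for a string-valued metric; illustrative, not authoritative.
ASSUMED_STRING_METRIC_ID = 133


def _read_uint(buf: bytes, off: int):
    """Read one big-endian unsigned 32-bit integer and return (value, new offset)."""
    (value,) = struct.unpack_from(">I", buf, off)
    return value, off + 4


def _read_string(buf: bytes, off: int):
    """Read a length-prefixed string whose bytes are padded to a 4-byte boundary."""
    length, off = _read_uint(buf, off)
    text = buf[off:off + length].decode("utf-8")
    return text, off + ((length + 3) & ~3)


def _pack_string(text: str) -> bytes:
    data = text.encode("utf-8")
    padding = (-len(data)) % 4
    return struct.pack(">I", len(data)) + data + b"\x00" * padding


def encode_metric(host: str, name: str, value: str, fmt: str = "%s") -> bytes:
    """Build a packet in the assumed layout (used here to create sample input)."""
    return (struct.pack(">I", ASSUMED_STRING_METRIC_ID)
            + _pack_string(host) + _pack_string(name)
            + struct.pack(">I", 0)  # spoof flag = false
            + _pack_string(fmt) + _pack_string(value))


def decode_metric(buf: bytes) -> dict:
    """Decode a packet in the assumed layout into a performance indicator record."""
    msg_id, off = _read_uint(buf, 0)
    if msg_id != ASSUMED_STRING_METRIC_ID:
        raise ValueError(f"unsupported message id {msg_id}")
    host, off = _read_string(buf, off)
    name, off = _read_string(buf, off)
    spoof, off = _read_uint(buf, off)
    fmt, off = _read_string(buf, off)
    value, off = _read_string(buf, off)
    return {"host": host, "metric": name, "spoof": bool(spoof),
            "format": fmt, "value": value}
```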
In further embodiments, the message data may instead be configured as a different type of message data according to actual requirements; in that case a parsing method for that type of message data is configured in step S30, and a corresponding plug-in or acquisition tool is configured in each big data component to collect that type of message data.
Furthermore, the message data may be configured as a combination of multiple types of message data according to actual requirements; in that case a parsing method for each type of message data is configured in step S30, and each big data component is configured with a plug-in or acquisition tool corresponding to at least one of those types.
In step S30, each big data component sends the collected ganglia message data to the parsing unit through load balancing. The parsing unit is configured with a custom UDP service based on netty; it parses each received ganglia message through the UDP service to obtain performance indicator data, and pushes the performance indicator data to a custom message middleware based on kafka.
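Supporting several message types in step S30 amounts to keeping one parser per type and dispatching on the type. The sketch below shows one possible shape for that, assuming a hypothetical registry keyed by a type name; the two toy formats (`keyvalue` and `json`) are illustrative and are not formats named in the source.

```python
import json
from typing import Callable, Dict

# Hypothetical registry: message type name -> parsing function.
PARSERS: Dict[str, Callable[[bytes], dict]] = {}


def register_parser(message_type: str):
    """Decorator that configures a parsing method for one message type."""
    def decorator(func):
        PARSERS[message_type] = func
        return func
    return decorator


@register_parser("keyvalue")
def parse_key_value(payload: bytes) -> dict:
    # Toy format: b"metric=value"
    name, _, value = payload.decode("utf-8").partition("=")
    return {"metric": name, "value": float(value)}


@register_parser("json")
def parse_json(payload: bytes) -> dict:
    # Toy format: a JSON object with "metric" and "value" keys
    doc = json.loads(payload)
    return {"metric": doc["metric"], "value": float(doc["value"])}


def parse(message_type: str, payload: bytes) -> dict:
    """Dispatch a payload to the parser configured for its message type."""
    parser = PARSERS.get(message_type)
    if parser is None:
        raise ValueError(f"no parser configured for {message_type!r}")
    return parser(payload)
```

Adding a further message type then only requires registering one more function; the receiving side does not change.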
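The receive, parse and push sequence can be sketched with the standard library alone. This is a simplified stand-in under stated assumptions: plain `socket` replaces the netty-based UDP service, `queue.Queue` replaces the kafka-based middleware, the load balancer is omitted, and the `metric:value` payload format and all function names are hypothetical.

```python
import queue
import socket


def open_udp_endpoint(host: str = "127.0.0.1", port: int = 0,
                      timeout: float = 2.0) -> socket.socket:
    """Bind a UDP socket; port 0 lets the operating system pick a free port."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    sock.settimeout(timeout)
    sock.bind((host, port))
    return sock


def parse_datagram(raw: bytes) -> dict:
    """Parse the assumed toy payload b"metric:value" into a record."""
    name, _, value = raw.decode("utf-8").partition(":")
    return {"metric": name, "value": float(value)}


def receive_and_push(server: socket.socket, middleware: queue.Queue,
                     max_size: int = 65535) -> dict:
    """Receive one datagram, parse it, tag its source, and push it downstream."""
    raw, sender = server.recvfrom(max_size)
    record = parse_datagram(raw)
    record["source"] = sender[0]
    middleware.put(record)
    return record
```

A component-side sender would simply call `sendto(payload, server.getsockname())` on its own UDP socket; in a real deployment `receive_and_push` would run in a loop or be replaced by an event-driven handler.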
Netty is a client-server network application framework based on Java NIO; it enables rapid development of network applications such as protocol servers and clients. Netty offers an approach to developing network applications that is easy to use and highly extensible. Since netty is an open source framework, those skilled in the art can learn the technical principle of a netty-based custom UDP service from open source technical materials, and details are not described herein.
Kafka is an open source stream processing platform developed by the Apache Software Foundation, written in Scala and Java. Kafka is a high-throughput distributed publish-subscribe messaging system that can handle all the activity stream data of a consumer-scale website. Those skilled in the art can learn the technical principle of a kafka-based custom message middleware from open source technical materials, and details are not described herein.
In step S50, the performance index data is provided to the application program at the back end through the message middleware.
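The publish-subscribe behaviour that lets several back-end applications read the same performance indicator data can be illustrated with a small in-memory model. It is only a conceptual sketch: the class and method names are hypothetical and do not correspond to the Kafka client API, and persistence, partitioning and networking are left out.

```python
from collections import defaultdict


class InMemoryTopicBroker:
    """Toy publish-subscribe broker with an append-only log per topic.

    Each consumer group keeps its own read position, so independent
    applications each see every record exactly once.
    """

    def __init__(self):
        self._logs = defaultdict(list)    # topic -> list of records
        self._offsets = defaultdict(int)  # (topic, group) -> next index to read

    def publish(self, topic: str, record: dict) -> None:
        """Append one record to the topic's log."""
        self._logs[topic].append(record)

    def poll(self, topic: str, group: str) -> list:
        """Return the records this group has not read yet and advance its offset."""
        log = self._logs[topic]
        start = self._offsets[(topic, group)]
        self._offsets[(topic, group)] = len(log)
        return log[start:]
```

For example, a dashboard and an alerting service could poll the same topic under different group names and each receive the full stream.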
In this embodiment, the custom UDP service is built on netty and the message middleware on kafka; the high throughput of netty and kafka improves the data throughput of the unified data collection entry point.
In further embodiments, other technical frameworks may also be used to implement the UDP service and the message middleware, for example a custom UDP service based on Akka, ZMQ, smart-socket or a similar framework, and a custom message middleware based on RabbitMQ, RocketMQ or the like, which can achieve similar technical effects.
In the above embodiments, the message data sent by a plurality of different big data components is parsed by configuring a custom UDP service, and the parsed performance indicator data is provided to back-end applications through a custom message middleware, so that the system is given a unified data collection entry point adapted to diverse big data components.
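One way to keep the choice of framework open is to have the entry point depend only on small interfaces. The sketch below assumes a hypothetical `MessageSink` interface and `CollectionEntry` class; neither name comes from the source, and the list-backed sink is a placeholder for whichever middleware is actually used.

```python
from abc import ABC, abstractmethod
from typing import Callable


class MessageSink(ABC):
    """Hypothetical interface for any message middleware the entry point can push to."""

    @abstractmethod
    def publish(self, record: dict) -> None:
        ...


class ListSink(MessageSink):
    """Placeholder middleware that just collects records in memory."""

    def __init__(self):
        self.records = []

    def publish(self, record: dict) -> None:
        self.records.append(record)


class CollectionEntry:
    """Unified entry point: independent of the concrete transport and middleware."""

    def __init__(self, parser: Callable[[bytes], dict], sink: MessageSink):
        self._parser = parser
        self._sink = sink

    def on_datagram(self, raw: bytes) -> None:
        """Called by whatever UDP framework delivers the datagram."""
        self._sink.publish(self._parser(raw))
```

Swapping the middleware then means providing another `MessageSink` subclass, and swapping the UDP framework means calling `on_datagram` from a different receive loop.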
Fig. 2 is a flowchart of a preferred embodiment of the method shown in Fig. 1.
As shown in Fig. 2, in a preferred embodiment, the method further comprises:
S10: receiving an acquisition tool download request sent by a big data component, and returning download information of the Jvmtran acquisition tool, so that the component can download and configure the Jvmtran acquisition tool.
Specifically, providing the download information of the acquisition tool ensures that the big data component can be successfully configured with the acquisition tool and the plug-in it carries, which ensures that ganglia message data is collected successfully and in turn safeguards the success rate of data parsing.
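Step S10 can be pictured as a small request handler. Everything in this sketch is hypothetical: the request and response fields, the catalog, the placeholder URL and the checksum field are invented for illustration, since the source does not specify what the download information contains.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DownloadInfo:
    tool: str
    url: str
    sha256: str


# Hypothetical catalog; the URL and checksum are placeholders, not real values.
HYPOTHETICAL_CATALOG = {
    "jvmtran": DownloadInfo(
        tool="jvmtran",
        url="https://example.invalid/tools/jvmtran.tar.gz",
        sha256="<placeholder>",
    )
}


def handle_download_request(request: dict) -> dict:
    """S10: answer an acquisition tool download request from a big data component."""
    if request.get("has_ganglia_plugin"):
        # A component with a built-in ganglia plug-in needs no acquisition tool.
        return {"status": "not_needed"}
    info = HYPOTHETICAL_CATALOG["jvmtran"]
    return {"status": "ok", "tool": info.tool, "url": info.url, "sha256": info.sha256}
```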
Fig. 3 is a schematic structural diagram of a data parsing system according to an embodiment of the present invention. The system shown in fig. 3 may correspondingly perform the method shown in fig. 1.
As shown in fig. 3, in the present embodiment, the present invention provides a data parsing system 10, which includes a parsing unit 13 and a middleware unit 15.
The parsing unit 13 is configured to receive and parse, through a custom UDP service, message data sent by a plurality of big data components to obtain performance indicator data, and to push the performance indicator data to a custom message middleware;
the middleware unit 15 is configured to provide the performance indicator data to an application through the message middleware.
The data parsing principle of the system shown in fig. 3 can refer to the method shown in fig. 1, and is not described herein again.
Fig. 4 is a schematic diagram of a preferred embodiment of the system of fig. 3. The system shown in fig. 4 may correspondingly perform the method shown in fig. 2.
As shown in fig. 4, in a preferred embodiment, the data parsing system 10 further includes a configuration unit 11.
The configuration unit 11 is configured to receive an acquisition tool download request sent by a big data component, and to return download information of the Jvmtran acquisition tool for downloading and configuring the Jvmtran acquisition tool.
The data parsing principle of the system shown in fig. 4 can refer to the method shown in fig. 2, and is not described herein again.
Fig. 5 is a schematic structural diagram of an apparatus according to an embodiment of the present invention.
As shown in Fig. 5, as another aspect, the present application also provides an apparatus 500 including one or more Central Processing Units (CPUs) 501 that can perform various appropriate actions and processes according to a program stored in a Read Only Memory (ROM) 502 or a program loaded from a storage section 508 into a Random Access Memory (RAM) 503. The RAM 503 also stores various programs and data necessary for the operation of the apparatus 500. The CPU 501, ROM 502 and RAM 503 are connected to each other via a bus 504. An input/output (I/O) interface 505 is also connected to the bus 504.
The following components are connected to the I/O interface 505: an input section 506 including a keyboard, a mouse, and the like; an output section 507 including a display such as a Cathode Ray Tube (CRT) or a Liquid Crystal Display (LCD), and a speaker; a storage section 508 including a hard disk and the like; and a communication section 509 including a network interface card such as a LAN card, a modem, or the like. The communication section 509 performs communication processing via a network such as the Internet. A drive 510 is also connected to the I/O interface 505 as necessary. A removable medium 511 such as a magnetic disk, an optical disk, a magneto-optical disk, a semiconductor memory, or the like is mounted on the drive 510 as necessary, so that a computer program read from it can be installed into the storage section 508 as necessary.
In particular, according to an embodiment of the present disclosure, the data parsing method described in any of the above embodiments may be implemented as a computer software program. For example, embodiments of the present disclosure include a computer program product comprising a computer program tangibly embodied on a machine-readable medium, the computer program containing program code for performing a data parsing method. In such an embodiment, the computer program may be downloaded and installed from a network through the communication section 509, and/or installed from the removable medium 511.
As yet another aspect, the present application also provides a computer-readable storage medium, which may be a computer-readable storage medium included in the system of the above embodiment; or it may be a separate computer readable storage medium not incorporated into the device. The computer readable storage medium stores one or more programs for use by one or more processors in performing the data parsing method described herein.
The flowchart and block diagrams in the figures illustrate the architecture, functionality, and operation of possible implementations of systems, methods and computer program products according to various embodiments of the present invention. In this regard, each block in the flowchart or block diagrams may represent a module, segment, or portion of code, which comprises one or more executable instructions for implementing the specified logical function(s). It should also be noted that, in some alternative implementations, the functions noted in the block may occur out of the order noted in the figures. For example, two blocks shown in succession may, in fact, be executed substantially concurrently, or the blocks may sometimes be executed in the reverse order, depending upon the functionality involved. It will also be noted that each block of the block diagrams and/or flowchart illustration, and combinations of blocks in the block diagrams and/or flowchart illustration, can be implemented by special purpose hardware-based systems which perform the specified functions or acts, or combinations of special purpose hardware and computer instructions.
The units or modules described in the embodiments of the present application may be implemented in software or in hardware. The described units or modules may also be provided in a processor; for example, each unit may be a software program running on a computer or a mobile intelligent device, or may be a separately configured hardware device. The names of these units or modules do not in themselves limit the units or modules.
The above description is only a preferred embodiment of the application and is illustrative of the principles of the technology employed. It will be appreciated by those skilled in the art that the scope of the invention herein disclosed is not limited to the particular combination of features described above, but also encompasses other arrangements formed by any combination of the above features or their equivalents without departing from the spirit of the present application. For example, the above features may be replaced with (but not limited to) features having similar functions disclosed in the present application.

Claims (12)

1. A data parsing method, comprising:
receiving and parsing, through a custom UDP service, message data sent by a plurality of big data components to obtain performance indicator data, and pushing the performance indicator data to a custom message middleware, wherein the big data components are used for storing and processing data;
providing the performance indicator data to an application through the message middleware.
2. The method of claim 1, wherein the message data is ganglia message data, and the ganglia message data is generated in either of the following ways:
the big data component collects and generates it through a built-in ganglia plug-in;
the big data component is configured with a Jvmtran acquisition tool, and collects and generates it through the ganglia plug-in in the Jvmtran acquisition tool.
3. The method of claim 2, further comprising:
receiving an acquisition tool download request sent by the big data component, and returning download information of the Jvmtran acquisition tool for downloading and configuring the Jvmtran acquisition tool.
4. The method of any of claims 1-3, wherein the UDP service is customized based on netty.
5. The method of any of claims 1-3, wherein the message middleware is customized based on kafka.
6. A data parsing system, comprising:
a parsing unit configured to receive and parse, through a custom UDP service, message data sent by a plurality of big data components to obtain performance indicator data, and to push the performance indicator data to a custom message middleware, wherein the big data components are used for storing and processing data;
a middleware unit configured to provide the performance indicator data to an application through the message middleware.
7. The system of claim 6, wherein the message data is ganglia message data, and the ganglia message data is generated in either of the following ways:
the big data component collects and generates it through a built-in ganglia plug-in;
the big data component is configured with a Jvmtran acquisition tool, and collects and generates it through the ganglia plug-in in the Jvmtran acquisition tool.
8. The system of claim 7, further comprising:
a configuration unit configured to receive an acquisition tool download request sent by the big data component, and to return download information of the Jvmtran acquisition tool for downloading and configuring the Jvmtran acquisition tool.
9. The system of any of claims 6-8, wherein the UDP service is customized based on netty.
10. The system of any of claims 6-8, wherein the message middleware is customized based on kafka.
11. An apparatus, characterized in that the apparatus comprises:
one or more processors;
a memory for storing one or more programs,
the one or more programs, when executed by the one or more processors, cause the one or more processors to perform the method recited in any of claims 1-5.
12. A storage medium storing a computer program, characterized in that the program, when executed by a processor, implements the method according to any one of claims 1-5.
CN201810100691.0A 2018-02-01 2018-02-01 Data analysis method, system, device and storage medium Active CN108540439B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201810100691.0A CN108540439B (en) 2018-02-01 2018-02-01 Data analysis method, system, device and storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201810100691.0A CN108540439B (en) 2018-02-01 2018-02-01 Data analysis method, system, device and storage medium

Publications (2)

Publication Number Publication Date
CN108540439A 2018-09-14
CN108540439B 2021-10-29

Family

ID=63486238

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201810100691.0A Active CN108540439B (en) 2018-02-01 2018-02-01 Data analysis method, system, device and storage medium

Country Status (1)

Country Link
CN (1) CN108540439B (en)

Families Citing this family (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111698159A (en) * 2019-03-15 2020-09-22 顺丰科技有限公司 Service data processing method, device and storage medium
CN114168405A (en) * 2021-11-17 2022-03-11 深圳市梦网科技发展有限公司 Data monitoring method and device, terminal equipment and storage medium
CN114785808A (en) * 2022-03-28 2022-07-22 深圳开源互联网安全技术有限公司 Data synchronization analysis method, device and equipment and readable storage medium

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102520785A (en) * 2011-12-27 2012-06-27 东软集团股份有限公司 Energy consumption management method and system for cloud data center
CN104268739A (en) * 2014-08-29 2015-01-07 蓝信工场(北京)科技有限公司 Method and system for quickly converting enterprise information system into mobile application
CN104345717A (en) * 2014-10-17 2015-02-11 武汉华大优能信息有限公司 Intelligent remote data acquisition system based on Internet of Things
CN104407910A (en) * 2014-10-29 2015-03-11 华南理工大学 Virtualization server performance monitoring method and system
CN106161143A (en) * 2016-07-22 2016-11-23 浪潮电子信息产业股份有限公司 A kind of network performance test method based on ARM server and device
CN106294091A (en) * 2016-08-11 2017-01-04 福建富士通信息软件有限公司 A kind of without intrusive mood daily record interception method for analyzing performance and system

Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US7650317B2 (en) * 2006-12-06 2010-01-19 Microsoft Corporation Active learning framework for automatic field extraction from network traffic

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
SHMA: A Monitoring Framework for Cloud Platforms; Chen Lin et al.; Computer Science (《计算机科学》); 2017-04-10; Vol. 44, No. 1; pp. 7-12 *

Also Published As

Publication number Publication date
CN108540439A (en) 2018-09-14

Similar Documents

Publication Publication Date Title
CN107809331B (en) Method and device for identifying abnormal flow
US10187461B2 (en) Configuring a system to collect and aggregate datasets
US9361203B2 (en) Collecting and aggregating log data with fault tolerance
US8959063B2 (en) Managing incident reports
US9082127B2 (en) Collecting and aggregating datasets for analysis
CN109710615B (en) Database access management method, system, electronic device and storage medium
CN108540439B (en) Data analysis method, system, device and storage medium
US20110246528A1 (en) Dynamically processing an event using an extensible data model
CN108494860B (en) WEB access system, WEB access method and device for client
US11188443B2 (en) Method, apparatus and system for processing log data
CN113157545A (en) Method, device and equipment for processing service log and storage medium
CN114416685B (en) Log processing method, system and storage medium
CN110546615B (en) Super dynamic JAVA management extension
CN110928934A (en) Data processing method and device for business analysis
CN113312321A (en) Abnormal monitoring method for traffic and related equipment
CN109597702B (en) Root cause analysis method, device, equipment and storage medium for message bus abnormity
CN112579406A (en) Log call chain generation method and device
CN116668331A (en) Distributed performance monitoring system and method
CN107682432B (en) Spark-based data processing system and method
Vardhan et al. Design and development of IoT plugin for hpcc systems
CN114265866A (en) Streaming data processing method, rule plug-in, streaming data processing module and system
CN111611131A (en) Saltstack-based operation and maintenance method, device, system and storage medium
CN111078975A (en) Multi-node incremental data acquisition system and acquisition method
US10218591B2 (en) Embedded performance monitoring of a DBMS
US11921602B2 (en) Edge-based data collection system for an observability pipeline system

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant