CN108600300B - Log data processing method and device


Info

Publication number
CN108600300B
Authority
CN
China
Legal status
Active
Application number
CN201810184438.8A
Other languages
Chinese (zh)
Other versions
CN108600300A
Inventor
李士超 (Li Shichao)
Current Assignee
Beijing Sikong Tech Co ltd
Original Assignee
Beijing Sikong Tech Co ltd
Application filed by Beijing Sikong Tech Co ltd
Priority to CN201810184438.8A
Publication of CN108600300A
Application granted
Publication of CN108600300B
Legal status: Active (current)
Anticipated expiration

Classifications

    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L67/00 Network arrangements or protocols for supporting network services or applications
    • H04L67/01 Protocols
    • H04L67/02 Protocols based on web technology, e.g. hypertext transfer protocol [HTTP]
    • H04L67/06 Protocols specially adapted for file transfer, e.g. file transfer protocol [FTP]
    • H04L67/10 Protocols in which an application is distributed across nodes in the network

Landscapes

  • Engineering & Computer Science (AREA)
  • Computer Networks & Wireless Communication (AREA)
  • Signal Processing (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
  • Information Transfer Between Computers (AREA)

Abstract

The invention discloses a log data processing method and device. The method comprises the following steps: a data processing platform acquires log data of predetermined cache content, processes the acquired log data to obtain a processing message, and issues the processing message to a Kafka cluster, from which a CDN management and control platform subscribes to the processing message. The invention solves the technical problems of poor data transmission timeliness and high peak pressure in the related art.

Description

Log data processing method and device
Technical Field
The invention relates to the technical field of communication, in particular to a log data processing method and device.
Background
As a provider of underlying Content Delivery Network (CDN) services, a CDN service provider periodically generates log data recording accesses to the cached content of a specific domain name, page, or content cache, and bandwidth and traffic information must be extracted by analyzing and computing over these logs.
The CDN management and control platform is a comprehensive management system based on CDN technology that integrates management, control, traffic and bandwidth monitoring, and cost accounting. In operation, log data is acquired from the CDN service provider, submitted to a data processing platform for log analysis and result statistics, and the calculation results are then synchronized to the CDN management and control platform by some mechanism. In the related art, data synchronization is mainly implemented with File Transfer Protocol (FTP) files or a RESTful WebService interface. The FTP approach suffers from poor timeliness and a higher security risk, while a RESTful WebService interface over HTTP can transmit only a limited amount of data per request, responds more slowly at peak times, and risks bringing the server down.
In view of the above problems, no effective solution has been proposed.
Disclosure of Invention
The embodiments of the invention provide a log data processing method and device to at least solve the technical problems of poor data transmission timeliness and high peak-period pressure in the related art.
According to one aspect of the embodiments of the present invention, a log data processing method is provided, including: a data processing platform acquires log data of predetermined cache content; the data processing platform processes the acquired log data to obtain a processing message; and the data processing platform issues the processing message to a Kafka cluster, wherein the Kafka cluster is used by a CDN management and control platform to subscribe to the processing message.
Optionally, issuing the processing message to the Kafka cluster by the data processing platform includes: allocating a partition identifier to the processing message; and issuing, by the data processing platform, the processing message to the Kafka cluster according to the allocated partition identifier.
Optionally, the Kafka cluster is configured so that the CDN management and control platform subscribes to the processing message through a Storm cluster.
According to another aspect of the embodiments of the present invention, another log data processing method is provided, including: a content delivery network (CDN) management and control platform subscribes, through a Storm cluster, to a processing message issued by a data processing platform, wherein the processing message is obtained by the data processing platform processing log data of predetermined cache content; and the CDN management and control platform processes the subscribed processing message and stores the resulting processing result.
Optionally, before the CDN management and control platform subscribes, through the Storm cluster, to the processing message issued by the data processing platform, the method further includes: the CDN management and control platform allocates the Storm cluster through a ZooKeeper service.
Optionally, subscribing, by the CDN management and control platform through the Storm cluster, to the processing message issued by the data processing platform includes: the CDN management and control platform subscribes, through the Storm cluster, to the processing message that the data processing platform issues to the Kafka cluster.
According to another aspect of the embodiments of the present invention, a log data processing apparatus applied to a data processing platform is further provided, including: an acquisition module configured to acquire log data of predetermined cache content; a processing module configured to process the acquired log data to obtain a processing message; and an issuing module configured to issue the processing message to a Kafka cluster, wherein the Kafka cluster is used by the CDN management and control platform to subscribe to the processing message.
Optionally, the Kafka cluster is configured so that the CDN management and control platform subscribes to the processing message through a Storm cluster.
According to another aspect of the embodiments of the present invention, another log data processing apparatus applied to a content delivery network (CDN) management and control platform is provided, including: a subscription module configured to subscribe, through a Storm cluster, to a processing message issued by a data processing platform, wherein the processing message is obtained by the data processing platform processing log data of predetermined cache content; and a storage module configured to store a processing result obtained by processing the subscribed processing message.
Optionally, the subscription module is further configured to subscribe, through the Storm cluster, to the processing message that the data processing platform issues to the Kafka cluster.
The embodiments of the invention combine a Kafka cluster with a Storm cluster: the data processing platform acquires log data of predetermined cache content, processes it to obtain a processing message, and issues the processing message to the Kafka cluster, and the Storm cluster subscribes through the Kafka cluster to the processing message issued by the data processing platform. This provides a data transmission preprocessing service with good real-time performance, high availability, and strong timeliness, and thereby solves the technical problems of poor data transmission timeliness and high peak-period pressure in the related art.
Drawings
The accompanying drawings, which are included to provide a further understanding of the invention and are incorporated in and constitute a part of this application, illustrate embodiment(s) of the invention and together with the description serve to explain the invention without limiting the invention. In the drawings:
fig. 1 is a flowchart of a log data processing method according to embodiment 1 of the present invention;
fig. 2 is a schematic structural diagram of a log data processing apparatus according to embodiment 2 of the present invention;
fig. 3 is a flowchart of a log data processing method according to embodiment 3 of the present invention;
fig. 4 is a schematic structural diagram of a log data processing apparatus according to embodiment 4 of the present invention;
fig. 5 is a flowchart of a method, based on a streaming real-time computing framework, for improving how a CDN management and control platform synchronizes bandwidth traffic data, according to embodiment 5 of the present invention.
Detailed Description
In order to make the technical solutions of the present invention better understood, the technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are only a part of the embodiments of the present invention, and not all of the embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
It should be noted that the terms "first," "second," and the like in the description and claims of the present invention and in the drawings described above are used for distinguishing between similar elements and not necessarily for describing a particular sequential or chronological order. It is to be understood that the data so used is interchangeable under appropriate circumstances such that the embodiments of the invention described herein are capable of operation in sequences other than those illustrated or described herein. Furthermore, the terms "comprises," "comprising," and "having," and any variations thereof, are intended to cover a non-exclusive inclusion, such that a process, method, system, article, or apparatus that comprises a list of steps or elements is not necessarily limited to those steps or elements expressly listed, but may include other steps or elements not expressly listed or inherent to such process, method, article, or apparatus.
First, some terms appearing in the description of the embodiments of the present application are explained as follows:
FTP: File Transfer Protocol, a protocol for controlling the bidirectional transfer of files over the Internet. Different FTP applications exist for different operating systems, and all of them follow the same protocol to transfer files.
Storm: a distributed real-time computing framework characterized by low latency, high performance, distribution, scalability, and fault tolerance. It also guarantees that messages are not lost, processes messages strictly in order, and supports development in multiple languages. It delegates work to different types of components (spouts, which emit data streams, and bolts, which process them), each responsible for a simple, specific task.
Kafka: a distributed publish-subscribe messaging system, mainly used for processing active streaming data, with high write speed, high reliability, high capacity, and durability.
Example 1
In the related art, data synchronization is mainly implemented with File Transfer Protocol (FTP) files or a RESTful WebService interface. The FTP approach suffers from poor timeliness and a higher security risk, while a RESTful WebService interface over HTTP can transmit only a limited amount of data per request, responds more slowly at peak times, and risks bringing the server down. Consequently, data transmission between the CDN management and control platform and the data processing platform tends to be inefficient and unstable.
To solve the above technical problem, the present application provides a log data processing method. Fig. 1 is a flowchart of the log data processing method according to an embodiment of the present invention. As shown in Fig. 1, the method includes the following steps:
step S102, the data processing platform obtains log data of preset cache contents.
A CDN service provider processes a relatively large volume of log data. For a specific domain name, page, or content cache, the provider periodically generates log data recording accesses to the cached content (the log data may, for example, take the form of files, i.e. log data files), and bandwidth and traffic information must be extracted by analyzing and computing over these logs.
A CDN is a content delivery network built on top of the existing network. Relying on edge servers deployed in many locations and on the load balancing, content delivery, and scheduling modules of a central platform, it lets users obtain the required content from a nearby node, which reduces network congestion and improves users' access response speed and hit rate. The main method is to deploy many cache servers in areas or networks where user access is relatively concentrated; when a user accesses a website, global load balancing directs the request to the nearest cache server that is working normally, and that cache server responds to the request directly.
step S104, the data processing platform processes the acquired log data to obtain a processing message.
In one alternative, the log data of the predetermined cache content may be acquired and processed by a data processing platform, for example a big data computing platform (i.e., a platform designed for big data computing scenarios).
The CDN management and control platform is a comprehensive management system based on CDN technology that integrates management, control, traffic and bandwidth monitoring, and cost accounting for CDN-related services. In operation, log data is acquired from the CDN service provider and submitted to the data processing platform for log analysis and result statistics, and the data processing platform then synchronizes the calculation results back to the CDN management and control platform by some mechanism. The CDN management and control platform can download the service provider's log data files by calling an Application Programming Interface (API) and submit them to the data processing platform for log analysis and calculation.
In the related art, synchronizing files over FTP has poor timeliness, while an HTTP interface carries only a small amount of data per request and suffers high concurrency pressure at peak times; as a result, data transmission between the CDN management and control platform and the data processing platform tends to be inefficient and unstable.
step S106, the data processing platform issues the processing message to a Kafka cluster, wherein the Kafka cluster is used by the CDN management and control platform to subscribe to the processing message.
Through the above steps, the data processing platform issues the processing message, obtained from the acquired log data of the predetermined cache content, to the Kafka cluster, and the CDN management and control platform subscribes to the processing message through the Kafka cluster. Because Kafka offers high write speed, high reliability, high capacity, and durability, messages can be published by the data processing platform and subscribed to by the CDN management and control platform quickly and reliably. This effectively addresses the problems in the related art of poor timeliness when synchronizing files over FTP, and of small per-request data volume and high peak concurrency pressure when using an HTTP interface.
As an alternative embodiment, issuing the processing message to the Kafka cluster in step S106 may include the following steps:
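Extracting bandwidth information from access logs usually means summing the bytes served per domain over a fixed time window and converting that volume into an average rate. The sketch below assumes a hypothetical log line format `epoch_seconds domain bytes_sent` and a 5-minute window; neither the format nor the window length is specified in this patent.

```python
from collections import defaultdict

WINDOW_SECONDS = 300  # assumed 5-minute statistics window (not from the patent)


def parse_line(line: str) -> tuple[int, str, int]:
    """Parse a hypothetical 'epoch_seconds domain bytes_sent' log line."""
    ts, domain, sent = line.split()
    return int(ts), domain, int(sent)


def bandwidth_by_window(lines, window: int = WINDOW_SECONDS) -> dict:
    """Aggregate bytes per (domain, window start) and derive the average Mbps."""
    totals = defaultdict(int)
    for line in lines:
        if not line.strip():
            continue  # skip blank lines in the log file
        ts, domain, sent = parse_line(line)
        window_start = ts - ts % window  # align the timestamp to its window
        totals[(domain, window_start)] += sent
    # Average bandwidth = bits transferred / window length, in megabits per second.
    return {key: (total, total * 8 / window / 1_000_000) for key, total in totals.items()}


log = [
    "1520000100 img.example.com 37500000",
    "1520000200 img.example.com 37500000",
    "1520000400 img.example.com 1000",
]
stats = bandwidth_by_window(log)
print(stats[("img.example.com", 1520000100)])  # (75000000, 2.0)
```

The first two lines fall in the same window, so 75,000,000 bytes over 300 seconds average out to 2.0 Mbps; the third line opens the next window.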
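The request-routing rule described above (send the user to the nearest cache server that is working normally) can be sketched as a filter followed by a minimum-distance choice. This is a simplification under stated assumptions: real global load balancing typically also weighs network measurements and load, and the server names, coordinates, and `healthy` flag here are hypothetical.

```python
import math
from dataclasses import dataclass


@dataclass
class CacheServer:
    name: str
    lat: float
    lon: float
    healthy: bool = True  # result of a hypothetical health check


def haversine_km(lat1: float, lon1: float, lat2: float, lon2: float) -> float:
    """Great-circle distance between two points on Earth, in kilometres."""
    r = 6371.0  # mean Earth radius in km
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dp = p2 - p1
    dl = math.radians(lon2 - lon1)
    a = math.sin(dp / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dl / 2) ** 2
    return 2 * r * math.asin(math.sqrt(a))


def pick_edge(servers: list[CacheServer], user_lat: float, user_lon: float) -> CacheServer:
    """Return the closest cache server that is currently working normally."""
    candidates = [s for s in servers if s.healthy]
    if not candidates:
        raise LookupError("no healthy cache server available")
    return min(candidates, key=lambda s: haversine_km(user_lat, user_lon, s.lat, s.lon))


edges = [
    CacheServer("beijing", 39.9, 116.4),
    CacheServer("shanghai", 31.2, 121.5),
    CacheServer("guangzhou", 23.1, 113.3),
]
print(pick_edge(edges, 39.1, 117.2).name)  # beijing (user near Tianjin)
edges[0].healthy = False
print(pick_edge(edges, 39.1, 117.2).name)  # shanghai
```

Filtering on health before taking the minimum is what makes the fallback automatic: once the nearest node fails its check, the next-closest one is chosen without any special-case logic.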
step S1061, the data processing platform allocates a partition identifier to the processing message;
step S1062, the data processing platform issues the obtained processing message to the Kafka cluster according to the allocated partition identifier.
In operation, the CDN management and control platform obtains log data from the CDN service provider, submits it to the data processing platform for log analysis and result statistics, and then receives the calculation results by some synchronization mechanism. To support this, a Kafka cluster, i.e. a distributed publish-subscribe messaging system, may be deployed on the data processing platform. The log analysis program on the big data processing side processes the acquired log data to obtain the processing message: acting as a producer, it assembles the analysis results into a Topic message containing a subject, an operator, regional information, a timestamp, and bandwidth traffic data, and publishes the Topic message to the Kafka cluster.
Because more partitions in a Kafka cluster generally allow higher throughput (records in different partitions can be written and consumed in parallel), a plurality of partitions is preferably configured according to system requirements. In this optional embodiment, a partition identifier is allocated to each processed Topic message (i.e., the processing message), and the message is issued to the Kafka cluster according to that identifier, which speeds up processing, balances load, and reduces server pressure during data transmission peaks. Preferably, a partitioning key may be designated as the partition identifier.
As a preferred embodiment, the Kafka cluster in the embodiment of the present invention may be used by the CDN management and control platform to subscribe to the processing message through a Storm cluster.
Specifically, a Storm cluster (a real-time computing framework) may be deployed on the CDN management and control platform. Acting as a consumer of the Topic, the Storm cluster subscribes to the processing messages in the Kafka cluster through a Kafka Spout component, processes the data according to business rules, and stores the processing results in the MySQL/MongoDB database of the CDN management and control platform. The processing results can then be retrieved through the API of the CDN management and control platform, so that changes in the service provider's data are displayed on a front-end page in real time.
Storm offers low latency, high performance, distribution, scalability, and fault tolerance, and additionally guarantees that messages are not lost, processes messages strictly in order, and supports development in multiple languages. By combining a Kafka cluster with a Storm cluster, the data processing platform acquires log data of the predetermined cache content, processes it into processing messages, and issues them to the Kafka cluster, and the Storm cluster subscribes through the Kafka cluster to the processing messages issued by the data processing platform. This provides a data transmission preprocessing service with good real-time performance, high availability, and strong timeliness, and solves the technical problems of poor data transmission timeliness and high peak pressure in the related art.
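The Topic message described above can be modelled as a small record type that the producer serializes into the Kafka record value. The sketch below is a hypothetical schema: the field names, the interpretation of "subject" as the accelerated domain, the split of "bandwidth traffic data" into an average rate plus a byte volume, and the choice of JSON encoding are all assumptions, not details from the patent.

```python
import json
from dataclasses import asdict, dataclass


@dataclass(frozen=True)
class TopicMessage:
    """Hypothetical schema for the Topic message assembled by the producer."""
    subject: str           # e.g. the accelerated domain or customer (assumed meaning)
    operator: str          # network operator / ISP serving the traffic
    region: str            # regional information
    timestamp: int         # start of the statistics window, epoch seconds
    bandwidth_mbps: float  # bandwidth traffic data, assumed here to be an average rate...
    traffic_bytes: int     # ...plus the transferred volume

    def to_bytes(self) -> bytes:
        """Serialize as compact, key-sorted JSON: the value a producer would send."""
        return json.dumps(asdict(self), sort_keys=True, separators=(",", ":")).encode("utf-8")

    @classmethod
    def from_bytes(cls, raw: bytes) -> "TopicMessage":
        """Inverse of to_bytes, as a consumer would decode the record value."""
        return cls(**json.loads(raw.decode("utf-8")))


msg = TopicMessage("img.example.com", "operator-a", "north", 1520000100, 2.0, 75000000)
payload = msg.to_bytes()
print(payload[:24])  # b'{"bandwidth_mbps":2.0,"o'
```

Keeping the schema in one frozen type means producer and consumer share a single definition, and a round trip through `to_bytes`/`from_bytes` returns an equal object.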
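Allocating a partition identifier from a partitioning key works by hashing the key and taking the result modulo the partition count, so every message with the same key lands in the same partition (where Kafka preserves order) while distinct keys spread the load. The toy topic below is an in-memory stand-in, not the Kafka client API; it uses `zlib.crc32` in place of the murmur2 hash that Kafka's default partitioner applies to keyed records, and the domain keys are hypothetical.

```python
import zlib


class InMemoryTopic:
    """Toy stand-in for a Kafka topic: a fixed set of append-only partitions."""

    def __init__(self, num_partitions: int):
        if num_partitions < 1:
            raise ValueError("a topic needs at least one partition")
        self.partitions = [[] for _ in range(num_partitions)]

    def partition_for(self, key: bytes) -> int:
        # Allocate the partition identifier from the partitioning key.
        # CRC32 stands in for the murmur2 hash used by Kafka's default partitioner.
        return zlib.crc32(key) % len(self.partitions)

    def publish(self, key: bytes, value: bytes) -> tuple[int, int]:
        """Append the record to its key's partition and return (partition, offset)."""
        partition = self.partition_for(key)
        self.partitions[partition].append((key, value))
        return partition, len(self.partitions[partition]) - 1


topic = InMemoryTopic(num_partitions=4)
p1, o1 = topic.publish(b"img.example.com", b"window-1")
p2, o2 = topic.publish(b"img.example.com", b"window-2")
assert p1 == p2 and o2 == o1 + 1  # same key: same partition, order preserved

for i in range(200):
    topic.publish(f"domain-{i}.example.com".encode(), b"...")
print([len(p) for p in topic.partitions])  # distinct keys spread across partitions
```

Choosing the domain as the key is one plausible design: per-domain statistics stay ordered, and many domains keep the partitions roughly balanced.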
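The spout, business-rule, and storage flow can be illustrated with plain Python generators. This is only a simulation of the data flow: a real deployment would build the topology with Storm's Java API (for example a KafkaSpout from the storm-kafka-client module), and SQLite stands in for the MySQL/MongoDB store. The business rule (drop records without valid traffic, convert bytes to GB) is hypothetical.

```python
import json
import sqlite3
from typing import Iterable, Iterator


def kafka_spout(partitions: list[list[bytes]]) -> Iterator[tuple[int, int, dict]]:
    """Emit (partition, offset, record) tuples, playing the role of a Kafka spout."""
    for partition, log in enumerate(partitions):
        for offset, value in enumerate(log):
            yield partition, offset, json.loads(value)


def business_rule_bolt(tuples: Iterable[tuple[int, int, dict]]) -> Iterator[tuple[str, int, float]]:
    """Hypothetical business rule: drop records without valid traffic, convert bytes to GB."""
    for _partition, _offset, record in tuples:
        if record.get("traffic_bytes", -1) < 0:
            continue  # missing or negative traffic is treated as malformed
        yield record["subject"], record["timestamp"], record["traffic_bytes"] / 1e9


def store_bolt(rows: Iterable[tuple[str, int, float]], conn: sqlite3.Connection) -> None:
    """Persist results; SQLite stands in for the platform's MySQL/MongoDB store."""
    conn.execute("CREATE TABLE IF NOT EXISTS traffic (subject TEXT, ts INTEGER, traffic_gb REAL)")
    conn.executemany("INSERT INTO traffic VALUES (?, ?, ?)", rows)
    conn.commit()


partitions = [
    [b'{"subject": "img.example.com", "timestamp": 1520000100, "traffic_bytes": 2000000000}'],
    [b'{"subject": "bad.example.com", "timestamp": 1520000100, "traffic_bytes": -1}'],
]
conn = sqlite3.connect(":memory:")
store_bolt(business_rule_bolt(kafka_spout(partitions)), conn)
print(conn.execute("SELECT subject, traffic_gb FROM traffic").fetchall())
# [('img.example.com', 2.0)]
```

Each stage consumes the previous stage's output lazily, which mirrors how tuples stream from a spout through bolts rather than being collected in batches.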
It should be noted that the steps shown in the flowchart of the drawings may be executed in a computer system, for example as a set of computer-executable instructions, and although a logical order is shown in the flowchart, in some cases the steps shown or described may be executed in a different order.
All or part of the technical solution of the embodiment of the present invention may be embodied in the form of a computer software product, where the computer software product may be stored in a storage medium and includes several instructions to enable a computer device (which may be a personal computer, a server, or a network device) to execute all or part of the steps of the method of the embodiment of the present invention. The aforementioned storage medium includes a USB flash drive, a Read-Only Memory (ROM), a Random Access Memory (RAM), a removable hard disk, a magnetic or optical disk, and various other media capable of storing program code.
Example 2
According to an embodiment of the present application, a log data processing apparatus for implementing embodiment 1 is further provided. Fig. 2 is a schematic structural diagram of the log data processing apparatus according to an embodiment of the present invention. As shown in Fig. 2, the apparatus may be applied to a data processing platform and includes:
an obtaining module 22, configured to obtain log data of predetermined cache content;
a processing module 24, connected to the obtaining module 22, for processing the obtained log data to obtain a processing message;
and an issuing module 26, connected to the processing module 24 and configured to issue the processing message to the Kafka cluster, where the Kafka cluster is used by the CDN management and control platform to subscribe to the processing message.
It should be noted that the obtaining module 22, the processing module 24, and the issuing module 26 correspond to steps S102 to S106 in embodiment 1; the three modules implement the same examples and application scenarios as the corresponding steps, but are not limited to what is disclosed in embodiment 1.
The CDN management and control platform is a comprehensive management system based on CDN technology that integrates management, control, traffic and bandwidth monitoring, and cost accounting. In operation, log data is acquired from the CDN service provider and submitted to the data processing platform for log analysis and result statistics, and the statistical results are then synchronized to the CDN management and control platform by some mechanism. The CDN management and control platform can download the service provider's log data files by calling an API and submit them to the data processing platform for log analysis and calculation. A large amount of data is therefore transmitted during operation.
In the related art, synchronizing files over FTP has poor timeliness, while an HTTP interface carries only a small amount of data per request and suffers high concurrency pressure at peak times; as a result, data transmission between the CDN management and control platform and the data processing platform tends to be inefficient and unstable.
As an optional technical solution, the data processing platform that exchanges data with the CDN management and control platform may be implemented by configuring the functions of the modules in the log data processing apparatus: the apparatus issues the processing message, obtained from the acquired log data of the predetermined cache content, to the Kafka cluster, and the CDN management and control platform subscribes to the processing message through the Kafka cluster. Because Kafka offers high write speed, high reliability, high capacity, and durability, messages can be published and subscribed to quickly and reliably.
As an alternative embodiment, the issuing module 26 may include:
an allocation unit, configured to allocate a partition identifier to the processing message;
and an issuing unit, connected to the allocation unit and configured to issue the processing message to the Kafka cluster according to the allocated partition identifier.
It should be noted that the allocation unit and the issuing unit correspond to steps S1061 to S1062 in embodiment 1; the two units implement the same examples and application scenarios as the corresponding steps, but are not limited to what is disclosed in embodiment 1.
As a preferred embodiment, the Kafka cluster in the embodiment of the present invention may be used by the CDN management and control platform to subscribe to the processing message through a Storm cluster.
Storm offers low latency, high performance, distribution, scalability, and fault tolerance, and additionally guarantees that messages are not lost, processes messages strictly in order, and supports development in multiple languages. By combining the Kafka cluster with the Storm cluster, the log data of the predetermined cache content is acquired and processed into processing messages, which are issued to the Kafka cluster, and the Storm cluster subscribes through the Kafka cluster to the processing messages issued by the data processing platform. This provides a data transmission preprocessing service with good real-time performance, high availability, and strong timeliness, and solves the technical problems of poor data transmission timeliness and high peak pressure in the related art.
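The three modules above form a simple pipeline: obtain, then process, then issue. The sketch below wires hypothetical classes in that order; the class and method names, the `ts domain bytes` log format, and the publish callback (which in practice might wrap a Kafka producer) are all illustrative assumptions rather than the patent's implementation.

```python
from typing import Callable, Iterable


class ObtainingModule:
    """Obtaining module 22: fetches raw log lines of the predetermined cache content."""

    def __init__(self, source: Callable[[], Iterable[str]]):
        self._source = source  # hypothetical callable, e.g. a log-file download

    def obtain(self) -> list[str]:
        return [line for line in self._source() if line.strip()]


class ProcessingModule:
    """Processing module 24: turns log lines into processing messages (bytes per domain)."""

    def process(self, lines: list[str]) -> dict[str, int]:
        totals: dict[str, int] = {}
        for line in lines:
            _ts, domain, sent = line.split()  # assumed 'ts domain bytes' format
            totals[domain] = totals.get(domain, 0) + int(sent)
        return totals


class IssuingModule:
    """Issuing module 26: hands each message to a publish callback (e.g. a Kafka producer)."""

    def __init__(self, publish: Callable[[str, int], None]):
        self._publish = publish

    def issue(self, messages: dict[str, int]) -> int:
        for key, value in messages.items():
            self._publish(key, value)  # the key can double as the partitioning key
        return len(messages)


class LogDataProcessingApparatus:
    """Wires modules 22, 24 and 26 in the order of steps S102 to S106."""

    def __init__(self, obtaining: ObtainingModule, processing: ProcessingModule, issuing: IssuingModule):
        self.obtaining, self.processing, self.issuing = obtaining, processing, issuing

    def run(self) -> int:
        return self.issuing.issue(self.processing.process(self.obtaining.obtain()))


sent: list[tuple[str, int]] = []
apparatus = LogDataProcessingApparatus(
    ObtainingModule(lambda: ["1 a.example.com 10", "", "2 a.example.com 5", "3 b.example.com 1"]),
    ProcessingModule(),
    IssuingModule(lambda key, value: sent.append((key, value))),
)
print(apparatus.run(), sent)  # 2 [('a.example.com', 15), ('b.example.com', 1)]
```

Injecting the source and the publish callback keeps each module testable on its own, which matches the patent's description of separately connected modules.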
Example 3
In the related art, data synchronization is mainly implemented with File Transfer Protocol (FTP) files or a RESTful WebService interface. The FTP approach suffers from poor timeliness and a higher security risk, while a RESTful WebService interface over HTTP can transmit only a limited amount of data per request, responds more slowly at peak times, and risks bringing the server down. Consequently, data transmission between the CDN management and control platform and the data processing platform tends to be inefficient and unstable.
To solve the above technical problem, the present application further provides another log data processing method. Fig. 3 is a flowchart of this log data processing method according to an embodiment of the present invention. As shown in Fig. 3, the method includes the following steps:
step S302, the CDN management and control platform subscribes, through a Storm cluster, to a processing message issued by the data processing platform, wherein the processing message is obtained by the data processing platform processing log data of predetermined cache content;
step S304, the CDN management and control platform processes the subscribed processing message to obtain a processing result, and stores the processing result.
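One way to make "subscribe, process, store" avoid losing data is at-least-once consumption: store the results first and advance the committed offset only afterwards, so a crash between the two replays the batch instead of skipping it. Replays are then made harmless with an idempotent upsert. The consumer class, table layout, and crash flag below are hypothetical; SQLite stands in for the platform database, and a real deployment would persist the offset elsewhere (for example as Kafka consumer-group offsets).

```python
import sqlite3


class AtLeastOnceConsumer:
    """Hypothetical consumer for one partition: store results first, then commit the offset."""

    def __init__(self, partition_log: list[tuple[str, int, int]], conn: sqlite3.Connection):
        self.log = partition_log
        self.conn = conn
        self.committed_offset = 0  # next offset to read; kept in memory only in this sketch
        conn.execute(
            "CREATE TABLE IF NOT EXISTS result ("
            "subject TEXT, ts INTEGER, traffic INTEGER, PRIMARY KEY (subject, ts))"
        )

    def poll_and_store(self, crash_before_commit: bool = False) -> int:
        batch = self.log[self.committed_offset:]
        with self.conn:  # commits the writes as one transaction
            # Upsert keyed by (subject, ts): a replayed record overwrites, never double-counts.
            self.conn.executemany("INSERT OR REPLACE INTO result VALUES (?, ?, ?)", batch)
        if crash_before_commit:
            raise RuntimeError("simulated crash after storing, before committing the offset")
        self.committed_offset += len(batch)
        return len(batch)


records = [("a.example.com", 1520000100, 5), ("b.example.com", 1520000100, 7)]
consumer = AtLeastOnceConsumer(records, sqlite3.connect(":memory:"))
try:
    consumer.poll_and_store(crash_before_commit=True)
except RuntimeError:
    pass  # offset not committed, so the same batch is read again
consumer.poll_and_store()
print(consumer.committed_offset,
      consumer.conn.execute("SELECT COUNT(*) FROM result").fetchone()[0])  # 2 2
```

The replayed batch overwrites the two rows it had already written, so the table still holds exactly one result per (subject, timestamp) pair.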
In an optional implementation scheme, in an operation process of the CDN management and control platform that has a large amount of data transmission with the data processing platform, log data needs to be acquired from a CDN service provider side, submitted to the data processing platform for log analysis and result statistics, and then the statistical result is synchronized to the CDN management and control platform in a certain manner. Due to the problems of poor timeliness of synchronizing files by adopting an FTP mode and small data volume and high peak concurrency pressure when adopting an HTTP interface in the related technology, the problems of low data transmission efficiency and instability are easily caused when the CDN management and control platform and the data processing platform transmit data.
And through the characteristics of low delay, high performance, distribution, expandability, fault tolerance and the like owned by Storm, a real-time computing framework Storm cluster can be deployed on a CDN management and control platform. The Storm cluster subscribes to the processing message issued by the data processing platform, and the CDN control platform stores the processing result obtained by processing the subscribed processing message, so that the technical effects of high writing speed and high reliability in subscribing the message issued by the data processing platform can be realized.
As an optional embodiment, before the CDN management and control platform subscribes to the processing message issued by the data processing platform through the Storm cluster at step S302, the log data processing may further include the following steps:
step S301, the CDN management and control platform allocates the Storm cluster through zookeeper service.
The zookeeper as software for providing consistency service for distributed application provides the following functions: configuration maintenance, distributed synchronization, group services, etc. Specifically, the zookeeper aims to package complex and error-prone key services, and provides a simple and easy-to-use interface and a system with efficient performance and stable functions for a user. Therefore, in the optional embodiment, the Storm cluster is deployed by adopting the zookeeper service, so that the technical effect of high efficiency and reliability when the Storm cluster subscribes and processes the message can be realized.
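Storm relies on ZooKeeper to coordinate its master (Nimbus) with the worker-hosting supervisors: a supervisor's registration disappears when its session ends, and work can then be reassigned to the survivors. The sketch below simulates that idea in memory. `ToyCoordinator` is not the ZooKeeper client API, and the round-robin `assign_tasks` is a deliberate simplification of Storm's actual schedulers; all names are hypothetical.

```python
class ToyCoordinator:
    """In-memory stand-in for ZooKeeper ephemeral nodes; not the ZooKeeper client API."""

    def __init__(self) -> None:
        self._ephemeral: set[str] = set()  # ids of supervisors with a live session

    def register(self, supervisor_id: str) -> None:
        # A supervisor creates an ephemeral node when its session starts.
        self._ephemeral.add(supervisor_id)

    def session_expired(self, supervisor_id: str) -> None:
        # ZooKeeper deletes a session's ephemeral nodes when the session ends.
        self._ephemeral.discard(supervisor_id)

    def live_supervisors(self) -> list[str]:
        return sorted(self._ephemeral)


def assign_tasks(tasks: list[str], coordinator: ToyCoordinator) -> dict[str, str]:
    """Simplified round-robin assignment of topology tasks to live supervisors."""
    live = coordinator.live_supervisors()
    if not live:
        raise RuntimeError("no live supervisors registered")
    return {task: live[i % len(live)] for i, task in enumerate(tasks)}


zk = ToyCoordinator()
zk.register("supervisor-1")
zk.register("supervisor-2")
tasks = ["kafka-spout", "rule-bolt", "store-bolt"]
print(assign_tasks(tasks, zk))
# {'kafka-spout': 'supervisor-1', 'rule-bolt': 'supervisor-2', 'store-bolt': 'supervisor-1'}
zk.session_expired("supervisor-2")
print(assign_tasks(tasks, zk))  # every task moves to supervisor-1
```

Because liveness is derived from session-bound registrations rather than from explicit "I am leaving" messages, a crashed supervisor drops out of the assignment automatically.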
As an optional embodiment, in step S302, the CDN management and control platform subscribing, through the Storm cluster, to the processing message published by the data processing platform may include the following step:
Step S3021: the CDN management and control platform subscribes, through the Storm cluster, to the processing message that the data processing platform publishes to the Kafka cluster.
Specifically, during CDN operation, a Kafka cluster (a distributed publish-subscribe messaging system) may be deployed on the data processing platform. A log-parsing program on the big-data processing provider side processes the acquired log data to obtain processing messages. Acting as the producer, it assembles the parsed results into Topic messages containing the topic, operator, region information, timestamp, and bandwidth traffic data, designates a partitioning key as the partition identifier to balance load across partitions, and publishes the Topic messages to the Kafka cluster.
The Storm cluster then acts as the consumer of the Topic messages: it subscribes to the Topic messages in the Kafka cluster through the Kafka Spout component, processes the data according to business rules, and stores the results in the MySQL/MongoDB database of the CDN management and control platform. By calling the API (application programming interface) of the CDN management and control platform, a front-end page can display changes in the service provider's data in real time.
Because Kafka offers fast writes, high reliability, high capacity, and durability, messages published by the data processing platform can be written and subscribed to quickly and reliably. By combining the Kafka cluster with the Storm cluster, the Storm cluster subscribes through the Kafka cluster to the processing messages published by the data processing platform, and the results of processing those messages are stored. Data transmission between the CDN management and control platform and the data processing platform thus achieves good real-time performance, high availability, and timely preprocessing, which addresses the poor timeliness and high peak-time pressure of data transmission in the related art.
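The producer step can be sketched in Python as follows. This is an illustrative sketch only, not the patent's implementation: it does not use a Kafka client, and the record class `BandwidthRecord`, its field names and units, the JSON encoding, and the helper functions are all hypothetical. It shows how a parsed result could be serialized into a Topic message and how a partitioning key deterministically selects a partition, which is what allows keyed messages to be spread across partitions for load balancing.

```python
import json
import zlib
from dataclasses import asdict, dataclass
from typing import Tuple


@dataclass(frozen=True)
class BandwidthRecord:
    # Hypothetical field names for the contents listed in the text:
    # topic, operator, region information, timestamp, bandwidth traffic data.
    topic: str
    operator: str
    region: str
    timestamp: int        # assumed: Unix epoch seconds
    bandwidth_bps: float  # assumed unit: bits per second


def assemble_message(record: BandwidthRecord) -> bytes:
    """Serialize one parsed log result as a Topic message (JSON is an assumed format)."""
    return json.dumps(asdict(record), sort_keys=True).encode("utf-8")


def choose_partition(key: bytes, num_partitions: int) -> int:
    """Map a partitioning key to a partition index; equal keys always map alike.

    Kafka's default Java partitioner hashes keyed records with murmur2.
    CRC32 is substituted here, so indices will not match a real cluster.
    """
    if num_partitions <= 0:
        raise ValueError("num_partitions must be positive")
    return (zlib.crc32(key) & 0x7FFFFFFF) % num_partitions


def prepare_record(record: BandwidthRecord, key: bytes,
                   num_partitions: int) -> Tuple[int, bytes, bytes]:
    """Return (partition, key, value) as a producer would send them to Kafka."""
    return choose_partition(key, num_partitions), key, assemble_message(record)


# Usage: key the message by its timestamp, as described for Example 5.
rec = BandwidthRecord("cdn-bandwidth", "operator-a", "north", 1520300000, 1.5e9)
partition, key, value = prepare_record(rec, str(rec.timestamp).encode("utf-8"), 6)
```

With a real Kafka client, the key and value would normally be handed to the producer and the client's partitioner would pick the partition; computing it by hand here only makes the key-to-partition mapping visible.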
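The consumer-side processing can be sketched as below, under loudly stated assumptions: it does not use the Storm or Kafka Spout APIs, the "business rule" (summing bandwidth per operator, region, and one-minute bucket) is hypothetical, the message fields are assumed to be JSON, and Python's built-in `sqlite3` stands in for the platform's MySQL database. It models what a Storm bolt might do with a batch of subscribed messages before writing results for the API and front-end page to read.

```python
import json
import sqlite3
from collections import defaultdict
from typing import Dict, Iterable, Tuple

Key = Tuple[str, str, int]  # (operator, region, bucket start in epoch seconds)


def aggregate(messages: Iterable[bytes], bucket_seconds: int = 60) -> Dict[Key, float]:
    """Hypothetical business rule: sum bandwidth per operator, region and time bucket."""
    totals: Dict[Key, float] = defaultdict(float)
    for raw in messages:
        m = json.loads(raw)
        bucket = m["timestamp"] - m["timestamp"] % bucket_seconds
        totals[(m["operator"], m["region"], bucket)] += m["bandwidth_bps"]
    return dict(totals)


def store(conn: sqlite3.Connection, totals: Dict[Key, float]) -> None:
    """Upsert bucket totals; SQLite stands in for the platform's MySQL database."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS bandwidth_stats ("
        "operator TEXT, region TEXT, bucket INTEGER, bandwidth_bps REAL, "
        "PRIMARY KEY (operator, region, bucket))"
    )
    conn.executemany(
        "INSERT INTO bandwidth_stats VALUES (?, ?, ?, ?) "
        "ON CONFLICT(operator, region, bucket) "
        "DO UPDATE SET bandwidth_bps = bandwidth_bps + excluded.bandwidth_bps",
        [(op, region, bucket, value) for (op, region, bucket), value in totals.items()],
    )
    conn.commit()
```

The upsert accumulates totals across batches, so a redelivered message would be counted twice; this design relies on each message being applied once. The `ON CONFLICT ... DO UPDATE` syntax requires SQLite 3.24 or later.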
Example 4
According to an embodiment of the present application, a log data processing apparatus for implementing embodiment 3 is further provided. Fig. 4 is a schematic structural diagram of the log data processing apparatus according to an embodiment of the present invention. As shown in fig. 4, the apparatus may be applied to a CDN management and control platform and includes:
a subscription module 42, configured to subscribe, through the Storm cluster, to a processing message published by the data processing platform, where the processing message is obtained by the data processing platform processing log data of predetermined cache content; and
a storage module 44, connected to the subscription module 42 and configured to store a processing result obtained by processing the subscribed processing message.
It should be noted that the subscription module 42 and the storage module 44 correspond to steps S302 to S304 in embodiment 3; the two modules implement the same examples and application scenarios as the corresponding steps, but are not limited to what is disclosed in embodiment 3.
In an optional implementation, a CDN management and control platform exchanges large volumes of data with the data processing platform during operation. Log data must be acquired from the CDN service provider side and submitted to the data processing platform for log analysis and statistics, after which the calculation results must be synchronized to the CDN management and control platform. In the related art, synchronizing files over FTP has poor timeliness, while an HTTP interface carries only a small amount of data per request and suffers high concurrency pressure at peak times. As a result, data transmission between the CDN management and control platform and the data processing platform tends to be inefficient and unstable.
Because Storm, a real-time computing framework, offers low latency, high performance, distribution, scalability, and fault tolerance, a Storm cluster can be deployed on the CDN management and control platform. With the log data processing apparatus of this embodiment, the CDN management and control platform subscribes through the Storm cluster to the processing messages published by the data processing platform and stores the processing results obtained from the subscribed messages, so that messages published by the data processing platform can be subscribed to with high write speed and high reliability.
As an optional embodiment, on the basis of the structure shown in fig. 4, the log data processing apparatus may further include:
a deployment module, connected to the subscription module 42 and configured to deploy the Storm cluster through the ZooKeeper service before the processing message published by the data processing platform is subscribed to through the Storm cluster.
It should be noted that the deployment module corresponds to step S301 in embodiment 3; it implements the same example and application scenario as the corresponding step, but is not limited to what is disclosed in embodiment 3.
As an optional embodiment, the subscription module is further configured to subscribe, through the Storm cluster, to the processing message that the data processing platform publishes to the Kafka cluster.
Because Kafka offers fast writes, high reliability, high capacity, and durability, messages published by the data processing platform can be written and subscribed to quickly and reliably. The log data processing apparatus of this embodiment therefore combines the Kafka cluster with the Storm cluster: the Storm cluster subscribes through the Kafka cluster to the processing messages published by the data processing platform, and the results of processing the subscribed messages are stored. Data transmission between the CDN management and control platform and the data processing platform thus achieves good real-time performance, high availability, and timely preprocessing, further addressing the poor timeliness and high peak-time pressure of data transmission in the related art.
Example 5
According to an embodiment of the present invention, a method for improving the synchronization of bandwidth traffic data on the CDN management and control platform based on a streaming real-time computing framework is further provided. Fig. 5 is a flowchart of this method according to an embodiment of the present invention. As shown in fig. 5, the main design concept of the method is as follows:
(1) The CDN management and control platform downloads the service provider's log data files by calling an API (application programming interface) and submits them to the data processing platform for log analysis and calculation;
(2) A Kafka cluster, a distributed publish-subscribe messaging system, is deployed on the data processing platform. A log-parsing program on the big-data processing provider side processes the acquired log data to obtain parsed processing messages. Acting as the producer, it assembles the parsed results into Topic messages containing the topic, operator, region information, timestamp, and bandwidth traffic data, designates a partitioning key as the partition identifier to balance load, and publishes the messages to the Kafka cluster;
(3) A Storm cluster (a real-time computing framework) is deployed on the CDN management and control platform. Acting as the Topic consumer, it subscribes to message records in the Kafka cluster through the Kafka Spout component, processes the data according to business rules, and stores the results in the MySQL/MongoDB database of the CDN management and control platform; by calling the API of the CDN management and control platform, a front-end page displays changes in the service provider's data in real time. In addition, a ZooKeeper service is installed on the CDN management and control platform as the coordinator of the Storm cluster.
In this method, the producers, the consumers, and the Kafka queue are each configured with multiple nodes, and the ZooKeeper service is configured with 1 to 3 nodes.
Specifically, for given cache content (a domain name, page, multimedia file, etc.), the CDN service provider supplies log data in time order. The method shortens the interval at which the service provider's log data is fetched and submitted to the big-data platform for analysis, repeating this many times, which reduces the time lag between the CDN service provider and the CDN management and control platform. By processing the analysis results of the data processing platform in real time, the design goals of high real-time performance and high availability are achieved.
It should be noted that Kafka supports three message delivery semantics: at most once (a message may be lost but is never delivered more than once), at least once (a message is never lost but may be delivered more than once), and exactly once (each message is delivered once and only once). Because bandwidth traffic data is ordered primarily by time, the embodiment of the present invention preferably adopts the exactly-once model.
Meanwhile, based on the characteristics of bandwidth traffic data, the method uses the timestamp as the partitioning key when storing messages into partitions, so that messages are written and read sequentially (linearly).
Owing to Storm's low latency, high performance, distribution, scalability, and fault tolerance, the method can ensure that messages are not lost and are processed in strict order, and it supports development in multiple languages; Kafka, in turn, offers fast writes, high reliability, high capacity, and durability.
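ZooKeeper keeps serving requests only while a majority (quorum) of its servers is available, so the 1 to 3 node range can be checked with simple arithmetic. The helper below is a hypothetical illustration, not part of the method:

```python
from typing import Tuple


def zookeeper_quorum(ensemble_size: int) -> Tuple[int, int]:
    """Return (majority quorum size, number of server failures tolerated)."""
    if ensemble_size < 1:
        raise ValueError("an ensemble needs at least one server")
    quorum = ensemble_size // 2 + 1
    return quorum, ensemble_size - quorum


for n in (1, 2, 3):
    q, f = zookeeper_quorum(n)
    print(f"{n} node(s): quorum {q}, tolerates {f} failure(s)")
```

One node tolerates no failures, two nodes still tolerate none (both must be up to form a majority), and three nodes tolerate one. This is why ZooKeeper ensembles are usually sized with an odd number of servers.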
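The effect of shortening the fetch interval can be illustrated with a simple model. This sketch assumes, hypothetically, that each fetch covers one fixed, half-open time window and that staleness is bounded by one full interval plus a fixed processing time; provider-side log delays are ignored. The function names are illustrative only.

```python
from typing import List, Tuple


def fetch_windows(start: int, end: int, interval: int) -> List[Tuple[int, int]]:
    """Split [start, end) into consecutive half-open windows of `interval` seconds.

    Each window stands for one log fetch from the service provider followed by
    one submission to the data processing platform. The last window may be shorter.
    """
    if interval <= 0:
        raise ValueError("interval must be positive")
    windows = []
    t = start
    while t < end:
        windows.append((t, min(t + interval, end)))
        t += interval
    return windows


def worst_case_lag(interval: int, processing_seconds: int) -> int:
    """Staleness bound under the assumed model: a record logged just after a window
    opens waits a full interval until the window is fetched, then is processed."""
    return interval + processing_seconds
```

For example, with 120 s of processing, `worst_case_lag(3600, 120)` gives 3720 s for hourly fetches, while `worst_case_lag(300, 120)` gives 420 s for 5-minute fetches, at the cost of 12 fetches per hour (`len(fetch_windows(0, 3600, 300))`) instead of 1.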
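One common way to obtain an exactly-once effect on stored results is to combine at-least-once delivery with idempotent processing: any record whose partition offset has already been applied is skipped. This is a swapped-in technique for illustration, not necessarily how the embodiment or Kafka's own transactional exactly-once support works; the record shape and the class name `IdempotentSink` are hypothetical, and offsets are kept in memory only.

```python
from typing import Dict, Iterable, Tuple

Record = Tuple[int, int, float]  # hypothetical shape: (partition, offset, bandwidth)


class IdempotentSink:
    """Apply each (partition, offset) at most once, even if records are redelivered.

    Kafka delivers records in offset order within a partition, so remembering the
    next expected offset per partition is enough to recognize duplicates.
    """

    def __init__(self) -> None:
        self.next_offset: Dict[int, int] = {}  # partition -> next offset to apply
        self.total = 0.0

    def apply(self, records: Iterable[Record]) -> None:
        for partition, offset, value in records:
            if offset < self.next_offset.get(partition, 0):
                continue  # already applied: a redelivered duplicate
            self.total += value
            self.next_offset[partition] = offset + 1


sink = IdempotentSink()
batch = [(0, 0, 10.0), (0, 1, 5.0), (1, 0, 2.0)]
sink.apply(batch)
sink.apply(batch[1:])  # simulated redelivery after a crash before the offset commit
print(sink.total)  # 17.0
```

For the guarantee to survive crashes, the applied offsets must be persisted atomically with the result (for example, in the same database transaction); otherwise a crash between the two writes reopens the window for duplicates.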
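The sequential write-and-read property relied on here comes from Kafka's partition model: each partition is an append-only log, and records with the same key go to the same partition. The toy model below (not a Kafka API) illustrates this; the class name `PartitionedLog` and the choice of truncating the timestamp to the minute as the key are assumptions, and CRC32 stands in for Kafka's murmur2-based default partitioner.

```python
import zlib
from collections import defaultdict
from typing import Dict, List, Tuple


class PartitionedLog:
    """Toy model of a Kafka topic: each partition is an append-only list.

    Offsets within a partition grow by one per append, so a consumer reading a
    partition front to back sees records in the order they were written.
    """

    def __init__(self, num_partitions: int) -> None:
        self.num_partitions = num_partitions
        self.partitions: Dict[int, List[Tuple[int, str]]] = defaultdict(list)

    def append(self, key: bytes, value: str) -> Tuple[int, int]:
        # CRC32 stands in for Kafka's murmur2-based default partitioner.
        partition = (zlib.crc32(key) & 0x7FFFFFFF) % self.num_partitions
        offset = len(self.partitions[partition])
        self.partitions[partition].append((offset, value))
        return partition, offset

    def read(self, partition: int, from_offset: int = 0) -> List[str]:
        return [value for _, value in self.partitions[partition][from_offset:]]


log = PartitionedLog(num_partitions=4)
minute_key = str(1520300040 // 60 * 60).encode()  # hypothetical: timestamp truncated to the minute
placements = [log.append(minute_key, f"sample-{i}") for i in range(3)]
```

Kafka orders records only within a partition; records whose keys hash to different partitions have no guaranteed relative order, so time-ordered consumption across partitions would have to be restored downstream.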
The above-mentioned serial numbers of the embodiments of the present invention are merely for description and do not represent the merits of the embodiments.
In the above embodiments of the present invention, the descriptions of the respective embodiments have respective emphasis, and for parts that are not described in detail in a certain embodiment, reference may be made to related descriptions of other embodiments.
In the embodiments provided in the present application, it should be understood that the disclosed technology can be implemented in other ways. The apparatus embodiments described above are merely illustrative; for example, the division into units is merely a logical functional division, and other divisions are possible in actual implementation: multiple units or components may be combined or integrated into another system, or some features may be omitted or not executed. In addition, the mutual couplings, direct couplings, or communication connections shown or discussed may be indirect couplings or communication connections through some interfaces, units, or modules, and may be electrical or in other forms.
The units described as separate parts may or may not be physically separate, and parts displayed as units may or may not be physical units, may be located in one place, or may be distributed on a plurality of units. Some or all of the units can be selected according to actual needs to achieve the purpose of the solution of the embodiment.
In addition, functional units in the embodiments of the present invention may be integrated into one processing unit, or each unit may exist alone physically, or two or more units are integrated into one unit. The integrated unit can be realized in a form of hardware, and can also be realized in a form of a software functional unit.
If the integrated unit is implemented as a software functional unit and sold or used as a stand-alone product, it may be stored in a computer-readable storage medium. Based on this understanding, the technical solution of the present invention may be embodied as a software product stored in a storage medium and including instructions that cause a computer device (a personal computer, a server, or a network device) to execute all or part of the steps of the methods according to the embodiments of the present invention. The aforementioned storage medium includes a USB flash drive, a read-only memory (ROM), a random access memory (RAM), a removable hard disk, a magnetic or optical disk, and other media capable of storing program code.
The foregoing is only a preferred embodiment of the present invention. It should be noted that those skilled in the art can make various improvements and refinements without departing from the principle of the present invention, and such improvements and refinements shall also fall within the protection scope of the present invention.

Claims (7)

1. A log data processing method, comprising:
the data processing platform acquires log data of preset cache contents;
the data processing platform processes the acquired log data to obtain a processing message;
the data processing platform issues the obtained processing message to a Kafka cluster, wherein the Kafka cluster is used for a CDN management and control platform to subscribe the processing message;
the CDN management and control platform downloads a log data file at a service provider side in a mode of calling an Application Program Interface (API), and submits the log data file to the data processing platform for log analysis and calculation;
the step of the data processing platform publishing the obtained processing message to the Kafka cluster comprises: the data processing platform assigning a partition identifier to the processing message; and the data processing platform publishing the obtained processing message to the Kafka cluster according to the assigned partition identifier; wherein a partition key is designated as the partition identifier; and a log-parsing program processes the acquired log data to obtain the processing message, assembles the parsed result into a topic message containing the topic, operator, region information, timestamp, and bandwidth traffic data, and publishes the topic message to the Kafka cluster.
2. The method of claim 1, wherein the Kafka cluster is used by the CDN management and control platform to subscribe to the processing message through a Storm cluster.
3. A log data processing method, comprising:
a CDN management and control platform of a content delivery network subscribes to a processing message issued by a data processing platform through a Storm cluster, wherein the processing message is obtained by processing log data of preset cache content by the data processing platform;
the CDN management and control platform processes the subscribed processing message to obtain a processing result, and stores the processing result;
the CDN management and control platform downloads a log data file at a service provider side in a mode of calling an Application Program Interface (API), and submits the log data file to the data processing platform for log analysis and calculation;
the step of the CDN management and control platform subscribing, through the Storm cluster, to the processing message published by the data processing platform comprises: the CDN management and control platform subscribing, through the Storm cluster, to the processing message that the data processing platform publishes to the Kafka cluster; wherein the data processing platform assigns a partition identifier to the processing message and publishes the obtained processing message to the Kafka cluster according to the assigned partition identifier; wherein a partition key is designated as the partition identifier; and a log-parsing program processes the acquired log data to obtain the processing message, assembles the parsed result into a topic message containing the topic, operator, region information, timestamp, and bandwidth traffic data, and publishes the topic message to the Kafka cluster.
4. The method of claim 3, further comprising, before the CDN management and control platform subscribes through the Storm cluster to the processing message published by the data processing platform:
the CDN management and control platform deploying the Storm cluster through a ZooKeeper service.
5. A log data processing device, applied to a data processing platform, comprising:
the acquisition module is used for acquiring log data of preset cache contents;
the processing module is used for processing the acquired log data to obtain a processing message;
the publishing module is used for publishing the obtained processing message to a Kafka cluster, wherein the Kafka cluster is used for a CDN management and control platform to subscribe the processing message;
the CDN management and control platform downloads a log data file at a service provider side in a mode of calling an Application Program Interface (API), and submits the log data file to the data processing platform for log analysis and calculation;
the publishing module comprises: an allocation unit, configured to assign a partition identifier to the processing message; and a publishing unit, connected to the allocation unit and configured to publish the obtained processing message to the Kafka cluster according to the assigned partition identifier; wherein a partition key is designated as the partition identifier; and a log-parsing program processes the acquired log data to obtain the processing message, assembles the parsed result into a topic message containing the topic, operator, region information, timestamp, and bandwidth traffic data, and publishes the topic message to the Kafka cluster.
6. The apparatus of claim 5, wherein the Kafka cluster is used by the CDN management and control platform to subscribe to the processing message through a Storm cluster.
7. A log data processing device, applied to a content delivery network (CDN) management and control platform, comprising:
a subscription module, configured to subscribe, through a Storm cluster, to a processing message published by a data processing platform, wherein the processing message is obtained by the data processing platform processing log data of preset cache content;
the storage module is used for storing a processing result obtained by processing the subscribed processing message;
the CDN management and control platform downloads a log data file at a service provider side in a mode of calling an Application Program Interface (API), and submits the log data file to the data processing platform for log analysis and calculation;
the subscription module is further configured to subscribe, through the Storm cluster, to the processing message that the data processing platform publishes to the Kafka cluster; wherein the data processing platform assigns a partition identifier to the processing message and publishes the obtained processing message to the Kafka cluster according to the assigned partition identifier; wherein a partition key is designated as the partition identifier; and a log-parsing program processes the acquired log data to obtain the processing message, assembles the parsed result into a topic message containing the topic, operator, region information, timestamp, and bandwidth traffic data, and publishes the topic message to the Kafka cluster.
CN201810184438.8A 2018-03-06 2018-03-06 Log data processing method and device Active CN108600300B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201810184438.8A CN108600300B (en) 2018-03-06 2018-03-06 Log data processing method and device

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201810184438.8A CN108600300B (en) 2018-03-06 2018-03-06 Log data processing method and device

Publications (2)

Publication Number Publication Date
CN108600300A CN108600300A (en) 2018-09-28
CN108600300B CN108600300B (en) 2021-11-12

Family

ID=63625739

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201810184438.8A Active CN108600300B (en) 2018-03-06 2018-03-06 Log data processing method and device

Country Status (1)

Country Link
CN (1) CN108600300B (en)

Families Citing this family (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109274540A (en) * 2018-11-16 2019-01-25 四川长虹电器股份有限公司 A kind of web access log processing method based on storm
CN109901992B (en) * 2019-01-18 2022-03-04 竞技世界(北京)网络技术有限公司 Method for remotely and dynamically monitoring program execution behavior
CN109918349B (en) * 2019-02-25 2021-05-25 网易(杭州)网络有限公司 Log processing method, log processing device, storage medium and electronic device
CN109951323B (en) * 2019-02-27 2022-11-08 网宿科技股份有限公司 Log analysis method and system
CN110401724B (en) * 2019-08-22 2022-04-12 北京旷视科技有限公司 File management method, file transfer protocol server and storage medium
CN110719332B (en) * 2019-10-17 2022-07-26 北京旷视科技有限公司 Data transmission method, device, system, computer equipment and storage medium
CN111897997A (en) * 2020-06-15 2020-11-06 济南浪潮高新科技投资发展有限公司 Data processing method and system based on ROS operating system
CN111723156A (en) * 2020-06-29 2020-09-29 深圳壹账通智能科技有限公司 Data disaster tolerance method and system
CN113015203B (en) * 2021-03-22 2022-08-16 Oppo广东移动通信有限公司 Information acquisition method, device, terminal, system and storage medium

Citations (5)

Publication number Priority date Publication date Assignee Title
CN105631026A (en) * 2015-12-30 2016-06-01 北京奇艺世纪科技有限公司 Security data analysis system
CN105681303A (en) * 2016-01-15 2016-06-15 中国科学院计算机网络信息中心 Big data driven network security situation monitoring and visualization method
CN105868075A (en) * 2016-03-31 2016-08-17 浪潮通信信息系统有限公司 System and method for monitoring and analyzing great deal of logs in real time
CN107332719A (en) * 2017-08-16 2017-11-07 北京云端智度科技有限公司 A kind of method that daily record is analyzed in real time in CDN system
CN107391606A (en) * 2017-06-30 2017-11-24 中国联合网络通信集团有限公司 Log processing method and device based on Storm

Family Cites Families (1)

Publication number Priority date Publication date Assignee Title
CN106294357B (en) * 2015-05-14 2019-07-09 阿里巴巴集团控股有限公司 Data processing method and stream calculation system

Also Published As

Publication number Publication date
CN108600300A (en) 2018-09-28

Similar Documents

Publication Publication Date Title
CN108600300B (en) Log data processing method and device
CN108776934B (en) Distributed data calculation method and device, computer equipment and readable storage medium
CN107707943B (en) A kind of method and system for realizing cloud service fusion
CN106453564A (en) Elastic cloud distributed massive request processing method, device and system
CN108282514B (en) Distributed service establishing method and device
CN111459986B (en) Data computing system and method
CN101652750B (en) Data processing device, distributed processing system and data processing method
CN107888666B (en) Cross-region data storage system and data synchronization method and device
CN111092921B (en) Data acquisition method, device and storage medium
CN110716744A (en) Data stream processing method, system and computer readable storage medium
CN111309448B (en) Container instance creating method and device based on multi-tenant management cluster
WO2020119060A1 (en) Method and system for scheduling container resources, server, and computer readable storage medium
US20140173591A1 (en) Differentiated service levels in virtualized computing
CN114244717B (en) Configuration method and device of virtual network card resources, computer equipment and medium
CN103986748A (en) Method and device for achieving servitization
CN109618003B (en) Server planning method, server and storage medium
CN103577251A (en) Event based Internet computing processing system and method
CN105610869B (en) Method and device for scheduling streaming media
CN103067486A (en) Big-data processing method based on platform-as-a-service (PaaS) platform
KR20220141070A (en) Apparatus for container orchestration in geographically distributed multi cloud environment and method using the same
CN113422808B (en) Internet of things platform HTTP information pushing method, system, device and medium
CN114489985A (en) Data processing method, device and storage medium
CN109842497B (en) Configuration updating method and device of DNS (Domain name Server), terminal equipment and configuration updating system
CN111352726A (en) Streaming data processing method and device based on containerized micro-service
CN116775420A (en) Information creation cloud platform resource display and early warning method and system based on Flink flow calculation

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant