CN103838867A - Log processing method and device - Google Patents

Info

Publication number
CN103838867A
Authority
CN
China
Prior art keywords
log
server
cluster
analysis
data
Prior art date
Application number
CN 201410106430
Other languages
Chinese (zh)
Inventor
洪珂
刘华明
卢荣斌
闵杰
李波
陈燕华
Original Assignee
网宿科技股份有限公司
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 网宿科技股份有限公司
Priority to CN 201410106430
Publication of CN103838867A

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING; COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/27 Replication, distribution or synchronisation of data between databases or within a distributed database system; Distributed database system architectures therefor

Abstract

The invention discloses a log processing method and device. In the method, a cluster server receives log files from user terminals, stores the log files, analyzes them to obtain analysis results, and outputs the analysis results. The method and device improve log processing efficiency.

Description

Log processing method and device

Technical Field

[0001] The present invention relates to the field of log processing, and in particular to a log processing method and device.

Background

[0002] Existing log processing systems usually use a traditional database as the carrier for big data, storing unstructured or semi-structured data in data tables. As a result, reading and writing log data is complicated, performance is low, scalability is poor, and the system cannot adapt to rapid changes in the business. Traditional log processing systems take a long time to store and analyze massive log data, and as log data grows explosively they can only rely on hardware to raise data processing efficiency and add storage capacity. This is costly and does little to improve the efficiency of processing high-dimensional data.

[0003] Traditional architectures cannot scale the storage performance of a log processing system linearly. When the load on storage reaches the storage limit, read and write performance cannot be raised quickly and effectively. As log data grows explosively, the low efficiency of existing log processing becomes an increasingly serious problem.

[0004] No effective solution has yet been proposed for the problem of low log processing efficiency in the prior art.

Summary of the Invention

[0005] The main object of the present invention is to provide a log processing method and device that solve the problem of low log processing efficiency.

[0006] To achieve the above object, according to one aspect of the invention, a log processing method is provided. The log processing method according to the invention comprises: a cluster server receives a log file from a user terminal; the cluster server stores the log file; the cluster server analyzes the log file to obtain an analysis result; and the cluster server outputs the analysis result.

[0007] Further, the cluster server storing the log file comprises: the cluster server splits the log file into log data; and the cluster server transmits the log data to a distributed message queue, wherein the cluster server reads the log data from the distributed message queue and analyzes the log data.

[0008] Further, after the cluster server transmits the log data to the distributed message queue, the log processing method further comprises: the cluster server reads the log data from the distributed message queue; the cluster server parses the read log data to obtain a parsing result; the cluster server generates a key-value pair corresponding to the log data according to the parsing result; and the cluster server stores the log file by storing the key-value pair in a distributed database.

[0009] Further, the cluster server analyzing the log file comprises: the cluster server obtains incremental log data from the distributed database in real time; and the cluster server computes statistics on the incremental log data using stream computing.

[0010] Further, the cluster server analyzing the log file comprises: the cluster server obtains incremental log data from the distributed database at a preset period; and the cluster server performs statistical computation on the incremental log data.

[0011] To achieve the above object, according to another aspect of the invention, a log processing device is provided. The log processing device according to the invention comprises: a receiving unit, configured to cause the cluster server to receive a log file from a user terminal; a storage unit, configured to cause the cluster server to store the log file; an analysis unit, configured to cause the cluster server to analyze the log file to obtain an analysis result; and an output unit, configured to cause the cluster server to output the analysis result.

[0012] Further, the storage unit comprises: a splitting module, configured to cause the cluster server to split the log file into log data; and a transmitting module, configured to cause the cluster server to transmit the log data to a distributed message queue, wherein the cluster server reads the log data from the distributed message queue and analyzes the log data.

[0013] Further, the storage unit also comprises: a reading module, configured to cause the cluster server to read the log data from the distributed message queue after the cluster server has transmitted the log data to the distributed message queue; a parsing module, configured to cause the cluster server to parse the read log data to obtain a parsing result; a generating module, configured to cause the cluster server to generate a key-value pair corresponding to the log data according to the parsing result; and a storing module, configured to cause the cluster server to store the log file by storing the key-value pair in a distributed database.

[0014] Further, the analysis unit comprises: a first obtaining module, configured to cause the cluster server to obtain incremental log data from the distributed database in real time; and a first computing module, configured to cause the cluster server to compute statistics on the incremental log data using stream computing.

[0015] Further, the analysis unit comprises: a second obtaining module, configured to cause the cluster server to obtain incremental log data from the distributed database at a preset period; and a second computing module, configured to cause the cluster server to perform statistical computation on the incremental log data.

[0016] With the present invention, storage and analysis are handled as separate categories of work by a cluster server to achieve high-performance processing of massive logs. This enables analysis of massive logs, solves the problem of low log processing efficiency in the prior art, and improves log processing efficiency.

Brief Description of the Drawings

[0017] The accompanying drawings, which form part of this application, provide a further understanding of the invention. The exemplary embodiments of the invention and their descriptions explain the invention and do not unduly limit it. In the drawings:

[0018] FIG. 1 is a flowchart of a log processing method according to an embodiment of the invention;

[0019] FIG. 2 is a flowchart of a preferred log processing method according to an embodiment of the invention;

[0020] FIG. 3 is a schematic diagram of a log processing device according to an embodiment of the invention; and

[0021] FIG. 4 is a schematic diagram of a preferred log processing device according to an embodiment of the invention.

Detailed Description

[0022] It should be noted that, where they do not conflict, the embodiments in this application and the features of those embodiments may be combined with one another. The invention is described in detail below with reference to the drawings and in conjunction with the embodiments.

[0023] To help those skilled in the art better understand the solution of the invention, the technical solutions in the embodiments are described clearly and completely below with reference to the drawings of the embodiments. The described embodiments are only some, not all, of the embodiments of the invention. All other embodiments obtained by a person of ordinary skill in the art on the basis of these embodiments without creative effort fall within the scope of protection of the invention.

[0024] It should be noted that the terms "first", "second", and the like in the description, the claims, and the above drawings are used to distinguish similar objects and do not necessarily describe a particular order or sequence. It should be understood that data so used are interchangeable under appropriate circumstances for the purposes of the embodiments described herein. Furthermore, the terms "comprise" and "have", and any variations of them, are intended to cover non-exclusive inclusion; for example, a process, method, system, product, or device that comprises a series of steps or units is not necessarily limited to the steps or units expressly listed, but may include other steps or units that are not expressly listed or that are inherent to the process, method, product, or device.

[0025] An embodiment of the invention also provides a log processing method. The method runs on a computer device.

[0026] FIG. 1 is a flowchart of a log processing method according to an embodiment of the invention. As shown in FIG. 1, the log processing method comprises the following steps:

[0027] Step S102: the cluster server receives a log file from a user terminal.

[0028] The user terminal may be a server whose logs need to be collected, or a client on the user's side whose logs need to be collected. For example, a user serves clients through one server; different clients each run their own business and generate logs. Meanwhile, the server provides back-end services to each client and also generates logs while running. The cluster server can receive log files sent by the server or by the clients in order to process them. The cluster server can receive log files from multiple user terminals at the same time and process the log files of different user terminals separately.

[0029] In this embodiment, an agent module may be installed on the user terminal whose logs need to be collected. The agent collects log files on a timer and sends them to the cluster server. The user terminal sends a request together with the corresponding log file over HTTP. After responding to the request, the cluster server receives the log file through the service interface it provides, so that the log file can be stored on the cluster server.
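The agent's upload step described above can be sketched roughly as follows. This is only an illustration under stated assumptions: the patent says the log file is sent over HTTP but names no endpoint, method, or headers, so `UPLOAD_URL`, the use of POST, and the `X-Client-Id` and `X-Log-Name` headers are all hypothetical. The request is built but not sent, so the example runs without a network.

```python
import urllib.request

# Hypothetical service endpoint; the patent does not name one.
UPLOAD_URL = "http://cluster.example.com/logs"


def build_upload_request(client_id: str, file_name: str,
                         payload: bytes) -> urllib.request.Request:
    """Build (but do not send) an HTTP POST carrying one log file."""
    return urllib.request.Request(
        UPLOAD_URL,
        data=payload,
        method="POST",
        headers={
            "Content-Type": "application/octet-stream",
            # Hypothetical headers identifying the sender and the file.
            "X-Client-Id": client_id,
            "X-Log-Name": file_name,
        },
    )


req = build_upload_request("client-01", "app.log", b"line 1\nline 2\n")
# An agent would pass req to urllib.request.urlopen() on a timer to send it.
```

Building the request separately from sending it keeps the timer logic and the transport independent, which may make the agent easier to test.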

[0030] Step S104: the cluster server stores the log file.

[0031] After the log file of the user terminal has been received, it can be stored on the cluster server.

[0032] Specifically, storing the log file may consist of first splitting the log file into multiple lines of log data and then transmitting those lines in order to a distributed message queue, for example a Kafka message queue, so that the cluster server can read log data from the queue for analysis. After the log data has been transmitted to the distributed message queue, the cluster server may also read the log data from the queue, parse it, and generate key-value pairs that are stored in a distributed database. While the log file is being stored, descriptive information about it (such as its path and creation time) can be obtained and stored in a database on the cluster server.

[0033] Step S106: the cluster server analyzes the log file to obtain an analysis result.

[0034] After the user terminal has transferred log files to the cluster server, the user can access the cluster server and query its analysis results for the log files. For example, log analysis can reveal the running status or fault status of the user terminal's business. Analyzing a log file may consist of computing statistics on the information in the log file to obtain statistical results.

[0035] Because users query the analysis results in different ways, log analysis can be divided into real-time analysis and offline analysis according to how promptly a query must be answered. Real-time analysis usually requires that an analysis over hundreds of millions of log lines return within seconds, so that users querying the results are not affected. For real-time statistics, the volume of log data involved is generally not very large, so it can be analyzed through stream computing, with results held temporarily in a database such as Redis; the analysis results are stored after processing.

[0036] Offline analysis places low demands on the timeliness of the statistics; results can be presented a day or a month later. The parsed log data is first stored in a distributed database such as HBase, jobs are written in advance according to the business logic, and the jobs are run on a timer at a preset period to compute statistics on the logs.

[0037] Step S108: the cluster server outputs the analysis result.

[0038] Outputting the analysis result may consist of sending it to the corresponding user terminal, where it can be displayed through a web page or an application for staff to view.

[0039] In this embodiment, multiple servers in the cluster server receive log files, multiple servers store log files, and multiple servers analyze log files. The embodiment distributes the complex computation evenly across the servers, giving the whole system high concurrency; processing capacity can reach more than 10 times that of a traditional architecture. Handling storage and analysis as separate categories of work on the cluster server achieves high-performance processing of massive logs, enables analysis of massive logs, solves the problem of low log processing efficiency in the prior art, and improves log processing efficiency.

[0040] This embodiment may process log files using cloud computing principles. Cloud computing is a model for adding, using, and delivering Internet-based services, and usually involves providing dynamic, easily scalable, and often virtualized resources over the Internet. "Cloud" is a metaphor for the network and the Internet. In diagrams, a cloud was formerly used to represent the telecommunications network and later also came to represent the abstraction of the Internet and its underlying infrastructure. In the narrow sense, cloud computing refers to the delivery and usage model of IT infrastructure, in which required resources are obtained over the network on demand and in an easily scalable way. In the broad sense, it refers to the delivery and usage model of services, in which required services are obtained over the network on demand and in an easily scalable way. Such services may be IT, software, or Internet related, or may be other services. It means that computing power can also circulate over the Internet as a commodity. As an emerging technical concept, cloud computing offers cloud storage (distributed storage of massive data), cloud computation (Hadoop MapReduce, real-time stream computing), cloud security, and so on. These suit big-data needs such as storage, mining, analysis, early warning, and statistics, and their efficient performance safeguards the timeliness and accuracy of data processing.
Based on the principles of a cloud computing platform, the storage for log data is chosen up front, and processing is classified according to data volume and the real-time requirements of queries. Most importantly, the analysis of a single business task is itself processed in parallel, rather than merely running multiple tasks in parallel, which greatly improves query efficiency and the correctness of statistical results.

[0041] The purpose of this embodiment is to provide cloud storage for massive logs, together with cloud computing services through which massive logs can be analyzed promptly and mined in depth, while ensuring the security and accuracy of log data. It also means that growth in log volume can be handled simply by adding new compute nodes, rather than relying solely on hardware to raise data processing efficiency and add storage capacity.

[0042] Preferably, the step of the cluster server storing the log file comprises the following steps:

[0043] Step S1: the cluster server splits the log file into log data.

[0044] Log files from different user terminals have different formats, and each log file contains multiple log records. Splitting a log file into log data may consist of splitting it into multiple lines of log data, forming data rows, so that log files of different formats can be split into log data and transmitted to the distributed message queue.

[0045] Step S2: the cluster server transmits the log data to the distributed message queue. The cluster server then reads the log data from the distributed message queue and analyzes it.

[0046] The distributed message queue may be a Kafka message queue. Kafka's distributed message queue is well suited to simple message passing and distribution and can handle large data volumes, log data in particular. Combined with MapReduce, it also gives good results for real-time analysis.
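The split-and-enqueue steps above can be illustrated with a small sketch. It is not a Kafka client: Python's in-process `queue.Queue` stands in for the distributed message queue, and the function names and sample log layout are assumptions made for the example.

```python
import queue


def split_log_file(text: str) -> list[str]:
    """Split raw log file content into non-empty log lines."""
    return [line for line in text.splitlines() if line.strip()]


def enqueue_lines(lines: list[str], q: "queue.Queue[str]") -> int:
    """Put each log line on the queue in order; return the count sent."""
    for line in lines:
        q.put(line)
    return len(lines)


# queue.Queue is an in-process stand-in for a distributed queue such as Kafka.
message_queue: "queue.Queue[str]" = queue.Queue()
raw = "aa:bb:cc:00:11:22 1024 video\n\n11:22:33:44:55:66 2048 web\n"
sent = enqueue_lines(split_log_file(raw), message_queue)
```

Splitting into lines before enqueueing means the consumer never needs to know the original file boundaries, which is what lets files of different formats share one queue.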

[0047] Preferably, after the step of the cluster server transmitting the log data to the distributed message queue, the log processing method further comprises: the cluster server reads the log data from the distributed message queue; the cluster server parses the read log data to obtain a parsing result; the cluster server generates a key-value pair corresponding to the log data according to the parsing result; and the cluster server stores the log file by storing the key-value pair in a distributed database.

[0048] Specifically, log data is read from the distributed message queue and each log entry is parsed to extract its key fields, for example MAC address, traffic, and specific application. A key-value pair for the log data is generated from these parsing results, for example with the MAC address as the key and the other parsing results as the value. The log data is then mapped and stored in a distributed database such as HBase.
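A minimal sketch of the key-value generation described above, assuming a hypothetical whitespace-separated layout of `<mac> <traffic> <application>` (the patent does not specify the log format). A plain dict stands in for the distributed key-value store; no HBase API is used.

```python
def parse_log_line(line: str) -> tuple[str, dict[str, str]]:
    """Parse one log line into a (key, value) pair keyed by MAC address.

    Assumes a hypothetical layout: "<mac> <traffic> <application>".
    """
    mac, traffic, application = line.split(maxsplit=2)
    return mac, {"traffic": traffic, "application": application}


# A dict stands in for the distributed key-value store (e.g. HBase).
store: dict[str, list[dict[str, str]]] = {}


def put(line: str) -> None:
    """Parse a line and append its value under the MAC-address key."""
    key, value = parse_log_line(line)
    store.setdefault(key, []).append(value)


put("aa:bb:cc:00:11:22 1024 video")
put("aa:bb:cc:00:11:22 512 web")
```

Keying by MAC address groups all records for one device under a single key, so later per-device statistics need only one lookup.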

[0049] This embodiment uses the distributed database HBase to store parsed log data. Because HBase uses a key-value storage model, it scales well and data can be read from it fast enough for analysis. The results can be stored anywhere: in HBase again, in a relational database, or in Redis, without compatibility problems.

[0050] Preferably, the cluster server analyzing the log file comprises: the cluster server obtains incremental log data from the distributed database in real time; and the cluster server computes statistics on the incremental log data using stream computing.

[0051] Because log files keep accumulating, the log data stored in the distributed database also keeps growing. Real-time analysis in this embodiment may consist of the cluster server obtaining incremental log data from the distributed database in real time and computing statistics on that increment, which avoids recomputing log data that has already been processed. The incremental log data is obtained in real time and its statistics are computed by stream computing. The stream computing is performed by Storm bolts, which provide a series of operations such as filtering, aggregation, and database queries. Filtering can be done during the earlier parse stage, with the data stored in HBase in the form of DB tables, so that the stream computation only performs a map step to organize the required data for aggregate analysis.

[0052] Specifically, log data is first taken from the Kafka queue, parsed, and stored in HBase. In this process the log records are split and mapped into DB tables stored in HBase. Then stream computing is used for real-time statistics. The stream computing is performed by Storm bolts, which provide a series of operations such as filtering, aggregation, and database queries. Filtering can be done during the earlier parse stage, with the data stored in HBase in the form of DB tables, so that the stream computation only performs a map step to organize the required data for aggregate analysis. Next, the results of the stream computation are stored in a database such as Redis. Finally, the result data held in Redis is stored, as needed, in HBase or in a relational database such as MySQL, where users can query the statistics.
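The idea of computing statistics only on the increment can be sketched in plain Python. This is not Storm code: the `IncrementalTrafficStats` class, its offset bookkeeping, and the per-MAC traffic total are assumptions chosen to show how already-processed records are skipped.

```python
from collections import defaultdict


class IncrementalTrafficStats:
    """Keep running per-key totals and consume only records not yet seen."""

    def __init__(self) -> None:
        self.offset = 0  # index of the next unread record
        self.totals: dict[str, int] = defaultdict(int)

    def consume(self, records: list[tuple[str, int]]) -> int:
        """Fold records[offset:] into the totals; return how many were new."""
        new = records[self.offset:]
        for mac, traffic in new:
            self.totals[mac] += traffic
        self.offset = len(records)
        return len(new)


records = [("aa:bb", 100), ("cc:dd", 50)]
stats = IncrementalTrafficStats()
first = stats.consume(records)   # processes both records
records.append(("aa:bb", 25))
second = stats.consume(records)  # processes only the appended record
```

Because the totals are kept between calls, each call costs time proportional to the new records only, which is the property the text relies on for real-time results.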

[0053] The above embodiment describes a real-time analysis flow within log analysis. Following this flow, massive logs are analyzed in real time and results are returned to the customer almost immediately, improving the timeliness of log analysis results.

[0054] Preferably, the cluster server analyzing the log file comprises: the cluster server obtains incremental log data from the distributed database at a preset period; and the cluster server performs statistical computation on the incremental log data.

[0055] Because users query the analysis results in different ways, log data may instead be analyzed offline. An analysis period, the preset period, can be set in advance as needed, for example one week or one month. Incremental log data is obtained from the distributed database at the preset period, and statistics are then computed on it.

[0056] Specifically, this can be implemented through the following steps:

[0057] Step 1: log data is taken from the Kafka queue, parsed, and stored in HBase. In this process the log records are split and mapped into DB tables stored in HBase.

[0058] Step 2: job tasks are created one by one as needed; the task logic depends on the actual business logic.

[0059] Step 3: a periodic scheduling Task is created, that is, a job task is scheduled to run periodically. For example, task 1 is created in advance and run at midnight every day.

[0060] Step 4: when the scheduled time arrives, the task is started according to the schedule.

[0061] Step 5: the specific task logic is executed to compute statistics on the log data.

[0062] Step 6: if the task fails, a preconfigured notification module notifies the relevant user by SMS or email; after investigating the cause manually, the user restarts the job task.

[0063] Step 7: after the task succeeds, the result is stored in HBase for users to query.

[0064] Step 8: after the task has succeeded and the result has been stored in HBase, the notification module may notify the user by SMS or email that the task succeeded.
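The run, store, and notify portion of the steps above (roughly steps 5 through 8) might look like the following sketch. Scheduling itself is left out. The `run_job` function, the dict used as a result store, and the list used as a notification sink are all hypothetical stand-ins for HBase and the SMS or email module.

```python
from typing import Callable


def run_job(name: str,
            job: Callable[[], dict],
            results: dict,
            notify: Callable[[str], None]) -> bool:
    """Run one scheduled job; store its result on success, notify either way."""
    try:
        outcome = job()
    except Exception as exc:
        # Step 6: report the failure so a user can investigate and restart.
        notify(f"{name} failed: {exc}")
        return False
    results[name] = outcome      # Step 7: persist the result
    notify(f"{name} succeeded")  # Step 8: report success
    return True


messages: list[str] = []
results: dict = {}


def count_lines() -> dict:
    """Example job that succeeds."""
    return {"lines": 3}


def broken() -> dict:
    """Example job that fails."""
    raise ValueError("bad input")


ok = run_job("task1", count_lines, results, messages.append)
failed = run_job("task2", broken, results, messages.append)
```

Passing the notifier in as a callable keeps the job runner independent of how users are actually contacted.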

[0065] The above embodiment describes an offline analysis flow within log analysis. Following this flow, massive logs are analyzed offline in parallel, and the results are reported to the front end for display to users.

[0066] FIG. 2 is a flowchart of a preferred log processing method according to an embodiment of the invention. As shown in FIG. 2, the log processing method comprises the following steps:

[0067] Step 202: extract the log file of the user terminal. Extracting the log file may consist of extracting log files related to preset keywords. A script-type agent module is designed and deployed on the user terminal's server to collect the required logs at fixed intervals according to business needs. After the log file has been extracted, it can be pushed to the cluster server.

[0068] Step 204: the pushed log file is stored on the cluster server. Storing on the cluster server involves first storing the log file itself and then storing the log's description file (including the log's storage path, size, time, and so on) in Redis.
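As a rough illustration of the description record mentioned above, the sketch below gathers path, size, and a timestamp for a file. The field names are assumptions, the text does not say which timestamp is meant (last modification time is used here), and the record is returned as a dict rather than written to Redis.

```python
import os
import tempfile


def describe_log_file(path: str) -> dict:
    """Collect the descriptive fields named in the text: path, size, time."""
    info = os.stat(path)
    return {
        "path": os.path.abspath(path),
        "size": info.st_size,    # bytes
        "mtime": info.st_mtime,  # last modification, seconds since the epoch
    }


# Demonstrate on a temporary file so the example is self-contained.
with tempfile.NamedTemporaryFile("wb", suffix=".log", delete=False) as f:
    f.write(b"line 1\nline 2\n")
    tmp_path = f.name

meta = describe_log_file(tmp_path)
os.remove(tmp_path)
```

Keeping this record separate from the log body lets a lookup by path or time avoid touching the stored file itself.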

[0069] Step 206: the cluster server reads the log data and transmits it to the distributed message queue.

[0070] Step 208: the cluster server reads log data from the distributed message queue and parses it. The logs are parsed first to extract the useful data, and the parsed data is stored in the corresponding table fields in HBase.

[0071] Step 210: the parsed log data is read and analyzed to obtain an analysis result. The parsed data can be analyzed in two ways: real-time analysis and offline analysis.

[0072] Step 212: the analysis result is displayed on the user terminal. It may be presented through Thrift in the form of a web page or a mobile app.

[0073] The above embodiment describes the whole flow of a log from collection through analysis to presentation of results. Handling storage and analysis as separate categories of work on the cluster server achieves high-performance processing of massive logs and enables analysis of massive logs.

[0074] The present invention is described in detail below through an application scenario of the log processing method of an embodiment of the invention.

[0075] The processing of aggregated video traffic logs proceeds as follows. First, the aggregated video traffic logs are collected. The cluster server then splits the collected traffic logs into log data lines and sends them to a Kafka queue.

[0076] After the traffic logs have been sent to the Kafka queue, the cluster server reads the log data from the queue in order and parses each log entry into keywords such as MAC address, traffic volume, and specific application.

[0077] The result of the cluster server's parsing takes a key-value form for each piece of log data, for example with the MAC address as the key and the remaining fields as the value, and the log data is mapped and stored into HBase.
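The parsing and key-value mapping described above can be sketched as follows. The patent does not give the traffic-log format, so the three whitespace-separated fields (MAC address, byte count, application name) are an assumption made for illustration, and a dict of lists stands in for the HBase table.

```python
def parse_line(line):
    """Parse one traffic-log line into (key, value); return None if malformed.

    Assumed line format (hypothetical): "<mac> <traffic_bytes> <app>".
    """
    parts = line.split()
    if len(parts) != 3:
        return None
    mac, traffic, app = parts
    try:
        traffic_bytes = int(traffic)
    except ValueError:
        return None
    # The MAC address is the key; the remaining fields form the value.
    return mac.lower(), {"traffic": traffic_bytes, "app": app}


def store_parsed(lines, table):
    """Map parsed lines into a dict keyed by MAC (a stand-in for HBase)."""
    for line in lines:
        parsed = parse_line(line)
        if parsed is None:
            continue  # skip lines that do not match the assumed format
        key, value = parsed
        table.setdefault(key, []).append(value)
    return table


if __name__ == "__main__":
    sample = ["AA:BB:CC:00:11:22 1024 video", "aa:bb:cc:00:11:22 2048 web"]
    print(store_parsed(sample, {}))
```

Normalizing the MAC address to lower case is a design choice of this sketch so that the same device always maps to the same key.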

[0078] The log files can then be analyzed and summarized by real-time analysis or offline analysis, as required. Offline analysis may use a scheduling period of two hours: when the scheduled time arrives, a previously designed task is started, which incrementally computes the traffic for those two hours and updates the monthly traffic record. The user is also informed of the task's execution status.

[0079] Real-time analysis can, in response to a query instruction, quickly look up the traffic information from the end of the last task run up to the query point, and return the result of this real-time query, together with the statistics from the last completed task run, to the user as the actual traffic data.
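The interplay between the two modes above can be sketched as follows: a periodic batch run folds new records into the monthly totals, and a real-time query adds whatever has arrived since the last run. The class name, the record layout `(timestamp, mac, month, bytes)`, and the in-memory storage are assumptions for illustration only.

```python
from collections import defaultdict


class TrafficStats:
    def __init__(self):
        self.monthly = defaultdict(int)  # (mac, "YYYY-MM") -> bytes from batch runs
        self.last_run = 0                # timestamp covered by the last batch run

    def run_batch(self, records, now):
        """Offline task: add only records in (last_run, now] to the monthly totals."""
        for ts, mac, month, nbytes in records:
            if self.last_run < ts <= now:
                self.monthly[(mac, month)] += nbytes
        self.last_run = now

    def query(self, records, mac, month, now):
        """Real-time query: batch total plus traffic since the last batch run."""
        recent = sum(
            nbytes
            for ts, m, mo, nbytes in records
            if m == mac and mo == month and self.last_run < ts <= now
        )
        return self.monthly.get((mac, month), 0) + recent


if __name__ == "__main__":
    records = [(100, "m1", "2014-03", 10), (7300, "m1", "2014-03", 5)]
    stats = TrafficStats()
    stats.run_batch(records, now=7200)      # the two-hour scheduled run
    print(stats.query(records, "m1", "2014-03", now=7500))
```

Because each batch run only covers the interval since the previous run, no record is counted twice, and the real-time query only has to scan the comparatively small tail of recent data.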

[0080] Finally, the analysis result is displayed to the user through an interface.

[0081] Based on the principles of a cloud computing platform, the storage for the incoming data was selected up front, and processing was classified according to data volume and the real-time requirements of queries. Most importantly, parallel processing is achieved within the analysis of a single business task, rather than parallel processing of multiple tasks, which greatly improves query efficiency and the correctness of the statistical results.

[0082] An embodiment of the present invention provides a log processing device whose functions may be implemented by a cluster server. It should be noted that the log processing device of the embodiment may be used to perform the log processing method provided by embodiments of the invention, and that the log processing method of the embodiment may likewise be performed by the log processing device provided by embodiments of the invention.

[0083] FIG. 3 is a schematic diagram of a log processing device according to an embodiment of the present invention. As shown in FIG. 3, the log processing device includes a receiving unit 10, a storage unit 30, an analysis unit 50, and an output unit 70.

[0084] The receiving unit 10 is configured to cause the cluster server to receive the log files of user terminals.

[0085] A user terminal may be a server whose logs need to be collected, or a client on the user's side whose logs need to be collected. For example, a user serves a number of clients through one server; the different clients each run their own services and generate logs. Meanwhile, the server provides back-end services for each client and also generates logs while it runs. The cluster server may receive log files sent by the server or by the clients in order to process them. The cluster server may receive log files from multiple user terminals at the same time and process the log files of different user terminals separately.

[0086] In embodiments of the present invention, an agent module may be installed on the user terminal whose logs need to be collected; it collects log files periodically and sends them to the cluster server. The user terminal sends a request and the corresponding log file over the HTTP protocol. After responding to the request, the cluster server receives the log file through the service interface it provides, so that the log file can be stored on the cluster server.
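The agent's upload step can be sketched as follows. The patent only states that the request and the log file travel over HTTP; the endpoint URL, the use of POST, and the `X-Log-Name` header are hypothetical choices for illustration. The sketch builds the request without sending it, so it runs without a collector.

```python
import urllib.request


def build_upload_request(url, log_name, payload):
    """Build (but do not send) an HTTP POST request carrying one log file."""
    return urllib.request.Request(
        url,
        data=payload,
        headers={
            "Content-Type": "application/octet-stream",
            "X-Log-Name": log_name,  # hypothetical header naming the log file
        },
        method="POST",
    )


if __name__ == "__main__":
    # Hypothetical collector endpoint; a real agent would pass the request to
    # urllib.request.urlopen() on its collection interval.
    req = build_upload_request("http://collector.example/logs", "access.log", b"line1\n")
    print(req.get_method(), req.full_url)
```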

[0087] The storage unit 30 is configured to cause the cluster server to store the log files.

[0088] After the log files of the user terminal have been received, they may be stored on the cluster server.

[0089] Specifically, storing a log file may consist of first splitting the log file into multiple lines of log data and then sending those lines in order to a distributed message queue, such as a Kafka message queue, so that the cluster server can read the log data from the distributed message queue for analysis. After the log data has been sent to the distributed message queue, the cluster server may also read the log data from the queue, parse the data it has read, and generate key-value pairs that are stored in a distributed database. While the log file is being stored, its description information (such as the path of the log file and its creation time) may be obtained and stored in a database on the cluster server.

[0090] The analysis unit 50 is configured to cause the cluster server to analyze the log files and obtain an analysis result.

[0091] After the user terminal has transferred its log files to the cluster server, the user can access the cluster server and query its analysis results for those log files. For example, log analysis can reveal the operating status or fault status of the services on the user terminal. Analyzing a log file may mean computing statistics over the information in the log file to obtain statistical results.

[0092] Because users query the analysis results of log files in different ways, log analysis can be divided, according to how promptly a query must be answered, into real-time analysis and offline analysis. Real-time analysis typically has to return an analysis over hundreds of millions of lines of log data within seconds so that the user's query of the analysis result is not affected. Real-time statistics are computed over the log data; this portion of the log data is generally not very large and can be analyzed by stream computing, with the results held temporarily in a database such as Redis and stored persistently after processing.

[0093] Offline analysis places low demands on the timeliness of the statistics; the analysis results may be displayed a day or a month later. The parsed log data is first stored in a distributed database such as HBase, task jobs are written in advance according to the business-logic requirements, and the tasks are run on a preset period to compute statistics over the logs.

[0094] The output unit 70 is configured to cause the cluster server to output the analysis result.

[0095] Outputting the analysis result may mean sending it to the corresponding user terminal, where it can be displayed through a web page or an application for staff to view.

[0096] In embodiments of the present invention, multiple servers in the cluster server receive log files, multiple servers store log files, and multiple servers analyze log files. The embodiment distributes the complex computation evenly across the servers, which gives the whole system high concurrency; processing capacity can reach more than 10 times that of a traditional architecture. Classified handling of storage and analysis by the cluster server makes the processing of massive logs efficient, enables massive-log analysis, solves the prior-art problem of low log processing efficiency, and achieves the effect of improving log processing efficiency.

[0097] Embodiments of the present invention may process log files using cloud computing principles. Cloud computing is a model for the addition, use, and delivery of Internet-based services, and usually involves providing dynamic, easily scalable, and often virtualized resources over the Internet. "Cloud" is a metaphor for the network and the Internet. In the past, diagrams often used a cloud to represent the telecommunications network, and later also to represent the abstraction of the Internet and its underlying infrastructure. In the narrow sense, cloud computing refers to the delivery and usage model of IT infrastructure: obtaining the required resources over the network in an on-demand, easily scalable way. In the broad sense, cloud computing refers to the delivery and usage model of services: obtaining the required services over the network in an on-demand, easily scalable way. Such services may be related to IT, software, or the Internet, or may be other services. It means that computing power can also circulate as a commodity over the Internet. As an emerging technical concept, cloud computing offers cloud storage (distributed storage of massive data), cloud computation (Hadoop MapReduce, real-time stream computing), cloud security, and so on, which are well suited to big-data storage, mining, analysis, early warning, statistics, and similar needs, and whose efficiency helps ensure that data is processed promptly and accurately. Based on the principles of a cloud computing platform, the storage for the incoming log data was selected up front, and processing was classified according to data volume and the real-time requirements of queries. Most importantly, parallel processing is achieved within the analysis of a single business task, rather than parallel processing of multiple tasks, which greatly improves query efficiency and the correctness of the statistical results.

[0098] The purpose of embodiments of the present invention is to provide cloud storage for massive logs, together with a cloud computing service in which massive logs can be analyzed promptly and mined in depth, while ensuring the security and accuracy of the log data. It also means that growth in log volume can be handled simply by adding new compute nodes, rather than relying solely on hardware upgrades to improve data processing efficiency and increase storage capacity.

[0099] Preferably, the storage unit includes a splitting module and a transmission module.

[0100] The splitting module is configured to cause the cluster server to split a log file into log data.

[0101] Because the log files of different user terminals have different formats, and each log file contains multiple log records, splitting a log file into log data may mean splitting it into multiple lines of log data to form data lines, so that log files of different formats can be split into log data and sent to the distributed message queue.

[0102] The transmission module is configured to cause the cluster server to send the log data to the distributed message queue. The cluster server then reads the log data from the distributed message queue and analyzes it.
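The splitting and transmission modules can be sketched as follows. The standard-library `queue.Queue` stands in for the distributed message queue purely so that the sketch runs on its own; it is not the Kafka API, and the function names are assumptions.

```python
import queue


def split_log(text):
    """Split a log file's text into non-empty data lines."""
    return [line for line in text.splitlines() if line.strip()]


def send_lines(lines, mq):
    """Send the lines to the queue in order; return how many were sent."""
    for line in lines:
        mq.put(line)
    return len(lines)


def read_lines(mq):
    """Drain the queue in FIFO order, as the consuming side would."""
    out = []
    while True:
        try:
            out.append(mq.get_nowait())
        except queue.Empty:
            return out


if __name__ == "__main__":
    mq = queue.Queue()
    send_lines(split_log("first entry\nsecond entry\n"), mq)
    print(read_lines(mq))
```

Splitting into lines before queuing means the consumer never needs to know the layout of the original file, only how to parse a single line.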

[0103] The distributed message queue may be a Kafka message queue. Kafka's distributed message queue is well suited to simple message passing and distribution and can support large data volumes, especially log data; combined with MapReduce, it can also give good results for real-time analysis.

[0104] Preferably, the storage unit further includes a reading module, a parsing module, a generating module, and a storing module.

[0105] The reading module is configured to cause the cluster server to read the log data from the distributed message queue after the cluster server has sent the log data to it. The parsing module is configured to cause the cluster server to parse the log data that was read and obtain a parsing result. The generating module is configured to cause the cluster server to generate, from the parsing result, the key-value pairs corresponding to the log data. The storing module is configured to cause the cluster server to store the log file by storing the key-value pairs in a distributed database.

[0106] Specifically, the log data is read from the distributed message queue and each piece of log data is parsed to obtain the keywords of the log, such as MAC address, traffic volume, and specific application. Key-value pairs corresponding to the log data are generated from these parsing results, for example with the MAC address as the key and the other parsing results as the value. The log data is then mapped and stored into a distributed database such as HBase.

[0107] Embodiments of the present invention use the distributed database HBase to store the parsed log data. Because HBase is based on a key-value data storage model, it scales well, and data can be fetched from HBase for analysis quickly enough. The results can be stored wherever is convenient, whether back in HBase, in a relational database, or in Redis, without incompatibilities.

[0108] Preferably, the analysis unit includes a first obtaining module and a first computing module.

[0109] The first obtaining module is configured to cause the cluster server to obtain incremental log data from the distributed database in real time. The first computing module is configured to cause the cluster server to compute statistics over the incremental log data using stream computing.

[0110] As log files keep accumulating, the log data stored in the distributed database keeps growing. In real-time analysis according to embodiments of the present invention, the cluster server may obtain the incremental log data from the distributed database in real time and compute statistics over that increment only, which avoids recomputing log data that has already been processed. The incremental log data is obtained in real time, and stream computing is used to compute statistics over it. The stream computing is carried out with Storm bolts, which come with a series of operations such as filtering, aggregation, and database queries. The filtering can be done in the earlier parse stage, with the data stored in HBase in the form of DB tables, so that the stream computation only performs a map step that organizes the required data for aggregation.

[0111] Specifically, the log data is first taken from the Kafka queue, parsed, and stored in HBase; in this process the log records are split and mapped into DB tables stored in HBase. Stream computing is then used for real-time statistics. It is carried out with Storm bolts, which come with a series of operations such as filtering, aggregation, and database queries. The filtering can be done in the earlier parse stage, with the data stored in HBase in the form of DB tables, so that the stream computation only performs a map step that organizes the required data for aggregation. Next, the results of the stream computation are stored in a database such as Redis. Finally, the result data held in Redis is stored, as actual needs dictate, in the HBase database or in the relational database MySQL, where users can query the statistics.
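The filter, map, and aggregate stages described above can be illustrated with plain Python generators. This is only an analogy for the shape of the computation: it does not use Storm or its bolt API, the record fields are assumptions, and a `Counter` stands in for the Redis staging store.

```python
from collections import Counter


def filter_step(records, app):
    """Filter stage: keep only the records for one application."""
    return (r for r in records if r["app"] == app)


def map_step(records):
    """Map stage: reduce each record to a (mac, traffic) pair."""
    return ((r["mac"], r["traffic"]) for r in records)


class RunningTotals:
    """Aggregate stage: running totals updated as each increment arrives."""

    def __init__(self):
        self.totals = Counter()  # stand-in for the Redis staging store

    def consume(self, pairs):
        for mac, traffic in pairs:
            self.totals[mac] += traffic
        return self.totals


if __name__ == "__main__":
    agg = RunningTotals()
    increment = [{"mac": "m1", "traffic": 10, "app": "video"}]
    agg.consume(map_step(filter_step(increment, "video")))
    print(dict(agg.totals))
```

Each call to `consume` touches only the newly arrived increment, which mirrors the point made above that already-processed log data is not recomputed.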

[0112] The above embodiment describes a real-time-analysis flow for log analysis. Following this flow, the real-time analysis of massive logs is carried out and the results are returned to the customer almost immediately, which improves the timeliness of the log analysis results.

[0113] Preferably, the analysis unit includes a second obtaining module and a second computing module.

[0114] The second obtaining module is configured to cause the cluster server to obtain incremental log data from the distributed database according to a preset period. The second computing module is configured to cause the cluster server to compute statistics over the incremental log data.

[0115] Because users query the analysis results of log files in different ways, the log data may also be analyzed offline. The analysis period, that is, the preset period, can be set in advance as required, for example to one week or one month. Incremental log data is obtained from the distributed database according to the preset period, and statistics are then computed over it.

[0116] Specifically, this can be achieved through the following steps:

[0117] Step 1: log data is taken from the Kafka queue, parsed, and stored in HBase; in this process the log records are split and mapped into DB tables stored in HBase.

[0118] Step 2: job tasks are created as needed, with the task logic determined by the actual business logic.

[0119] Step 3: a periodic scheduling task is created, that is, a periodic schedule is set for the job tasks; for example, task 1 is created in advance and run at midnight every day.

[0120] Step 4: when the scheduled time arrives, the task is started according to the schedule.

[0121] Step 5: the specific task logic is executed to compute statistics over the log data.

[0122] Step 6: if the task fails, the preset notification module notifies the relevant user by SMS or e-mail; after investigating the cause manually, the user restarts the job task.

[0123] Step 7: after the task has executed successfully, the execution result is stored in the HBase database so that users can query it.

[0124] Step 8: after the task has executed successfully and the result has been stored in the HBase database, the notification module may notify the user by SMS or e-mail that the task succeeded.
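Steps 4 through 8 above can be sketched as a small task runner. The function names, the `"latest"` result key, and the status strings are assumptions; a dict stands in for HBase, and the `notify` callback stands in for the SMS or e-mail notification module.

```python
def is_due(now, last_run, period_seconds):
    """Step 4: a task is due once a full period has passed since the last run."""
    return now - last_run >= period_seconds


def run_scheduled_task(task, records, result_store, notify):
    """Run one scheduled job; store the result on success, notify either way."""
    try:
        result = task(records)              # step 5: execute the task logic
    except Exception as exc:
        notify("FAILED", str(exc))          # step 6: report the failure
        return False
    result_store["latest"] = result         # step 7: dict stands in for HBase
    notify("OK", "task finished")           # step 8: report success
    return True


if __name__ == "__main__":
    store = {}
    if is_due(now=86400, last_run=0, period_seconds=86400):
        run_scheduled_task(sum, [1, 2, 3], store, lambda status, msg: print(status, msg))
    print(store)
```

On failure the sketch only notifies and returns; restarting the job is left to the user, as in step 6.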

[0125] The above embodiment describes an offline-analysis flow for log analysis. Following this flow, the offline analysis of massive logs is processed in parallel, and the results are reported to the front end for display to the user.

[0126] FIG. 4 is a schematic diagram of a preferred log processing device according to an embodiment of the present invention. As shown in FIG. 4, the log processing device of this embodiment includes a log collection module 20, a log storage module 40, a log analysis module 60, and a display module 80.

[0127] The log collection module 20 is configured to extract the relevant logs from an external system. The external system may be a server whose logs need to be collected, or a client on the user's side whose logs need to be collected, that is, the user terminal described in embodiments of the present invention. Specifically, an agent is designed and installed on the server whose logs need to be collected; it collects the relevant logs periodically and sends them to the storage module.

[0128] The log storage module 40 is configured to store the collected logs on the collector cluster server. It has two functions. First, the log files collected over the HTTP protocol are stored on the cluster server, and their description information (such as file path and creation time) is stored in Redis. Second, a processor stage reads the description information of the log files from Redis and sends the actual log file data to the Kafka message queue, from which the log analysis module 60 retrieves it for analysis. The log storage module 40 may implement its functions through the storage unit of embodiments of the present invention.

[0129] The log analysis module 60 is configured to compute statistics over the log data; according to how promptly a query must be answered, the analysis is divided into real-time analysis and offline analysis. The log analysis module 60 may implement its functions through the analysis unit of embodiments of the present invention.

[0130] Real-time analysis typically has to return an analysis over hundreds of millions of lines of data within seconds. The current log data distributed from the log storage module 40 is used for real-time statistics; this portion of the data is generally not very large and can be analyzed by stream computing. The results are held temporarily in Redis and, after processing, stored in HBase, from which the front end can conveniently fetch them for display.

[0131] Offline analysis places low demands on the timeliness of the statistics; the results may be displayed a day or a month later. The parsed log data from the log storage module is first stored in the HBase database, task jobs are written in advance according to the business-logic requirements, and the tasks are run on a schedule to compute statistics over the logs.

[0132] The display module 80 is configured to display the log analysis results to the user through a web page or a mobile app.

[0133] The advantages of embodiments of the present invention are as follows. First, logs are collected by an agent that can be attached and detached at will: the types of logs to collect are easy to configure, and the agent can be uninstalled at any time when it is no longer needed, which is quick and convenient and requires no custom redevelopment. Second, cluster storage is used as a log center that accepts all logs sent to it and stores them centrally after key-value processing; in particular, as log volume grows, capacity can be expanded simply by adding hard disks, memory, and other hardware, which is both convenient and economical. Third, the log analysis module 60 classifies its processing according to log volume and actual requirements, so that large volumes of data are analyzed quickly and accurately, and users can be notified of the results automatically, which safeguards timeliness. Fourth, HBase is used to store the parsed log data. Because HBase is based on a key-value data storage model, it scales well, and data can be fetched from the HBase database for analysis quickly enough. The results can be stored wherever is convenient, whether back in the HBase database, in a relational database, or in the Redis database, without incompatibilities.

[0134] In summary, the present invention has the following effects:

[0135] High computing power: the complex computation is distributed evenly across the servers, which gives the whole device high concurrency; processing capacity is more than 10 times that of a traditional architecture.

[0136] In a user's actual application environment, software and hardware failures of various kinds are fairly likely. Anomalies such as hardware damage, network interruptions, and system crashes can interrupt service and even cause data loss. The embodiment of the present invention is a log processing device for massive logs built on a cloud platform, so it can use the multi-host redundancy of the cloud computing environment to ensure highly reliable service.

[0137] Embodiments of the present invention can consolidate the local storage of all user terminals, support petabyte-scale storage capacity, and make it very easy to expand storage; the expansion process does not affect the continued operation of the service.

[0138] The software used in embodiments of the present invention is open source and the hardware consists of low-end PC servers, so the total cost is low.

[0139] 需要说明的是,对于前述的各方法实施例,为了简单描述,故将其都表述为一系列的动作组合,但是本领域技术人员应该知悉,本发明并不受所描述的动作顺序的限制,因为依据本发明,某些步骤可以采用其他顺序或者同时进行。 [0139] Incidentally, the foregoing embodiments of the methods for, for ease of description, it is described as a series combination of actions, those skilled in the art should understand that the present invention is not described in the operation sequence It limited since according to the present invention, some steps may be performed simultaneously or in other sequences. 其次,本领域技术人员也应该知悉,说明书中所描述的实施例均属于优选实施例,所涉及的动作和模块并不一定是本发明所必须的。 Secondly, those skilled in the art should also understand that the embodiments are described in the specification are exemplary embodiments, actions and modules involved are not necessarily required by the present invention.

[0140] 在上述实施例中,对各个实施例的描述都各有侧重,某个实施例中没有详述的部分,可以参见其他实施例的相关描述。 [0140] In the above embodiment, the description of the various embodiments have different emphases, certain embodiments not detailed in part, be related descriptions in other embodiments.

[0141] 在本申请所提供的几个实施例中,应该理解到,所揭露的装置,可通过其它的方式实现。 [0141] In several embodiments provided herein present embodiment, it should be understood that the disclosed apparatus can be implemented in other ways. 例如,以上所描述的装置实施例仅仅是示意性的,例如所述单元的划分,仅仅为一种逻辑功能划分,实际实现时可以有另外的划分方式,例如多个单元或组件可以结合或者可以集成到另一个系统,或一些特征可以忽略,或不执行。 For example, the described apparatus embodiments are merely illustrative of, for example, the unit division is merely logical function division, there may be other division in actual implementation, for example, a plurality of units or components may be combined or integrated into another system, or some features may be ignored or not performed. 另一点,所显示或讨论的相互之间的耦合或直接耦合或通信连接可以是通过一些接口,装置或单元的间接耦合或通信连接,可以是电性或其它的形式。 Another point, displayed or coupling or direct coupling or communication between interconnected in question may be through some interface, device, or indirect coupling or communication connection unit, may be electrical or other forms.

[0142] 所述作为分离部件说明的单元可以是或者也可以不是物理上分开的,作为单元显示的部件可以是或者也可以不是物理单元,即可以位于一个地方,或者也可以分布到多个网络单元上。 [0142] The unit described as separate components may be or may not be physically separate, parts displayed as units may be or may not be physical units, i.e. may be located in one place, or may be distributed to a plurality of networks unit. 可以根据实际的需要选择其中的部分或者全部单元来实现本实施例方案的目的。 You can select some or all of the units according to actual needs to achieve the object of the solutions of the embodiments.

[0143] In addition, the functional units in the embodiments of the present invention may be integrated into one processing unit, each unit may exist physically on its own, or two or more units may be integrated into one unit. The integrated unit may be implemented in the form of hardware or in the form of a software functional unit.

[0144] If the integrated unit is implemented in the form of a software functional unit and is sold or used as an independent product, it may be stored in a computer-readable storage medium. Based on this understanding, the technical solution of the present invention in essence, or the part that contributes to the prior art, or all or part of the technical solution, may be embodied in the form of a software product. The computer software product is stored in a storage medium and includes several instructions for causing a computer device (which may be a personal computer, a mobile terminal, a server, a network device, or the like) to perform all or some of the steps of the methods described in the embodiments of the present invention. The aforementioned storage medium includes various media that can store program code, such as a USB flash drive, read-only memory (ROM), random access memory (RAM), a removable hard disk, a magnetic disk, or an optical disc.

[0145] The above are merely preferred embodiments of the present invention and are not intended to limit it. For those skilled in the art, the present invention may have various modifications and variations. Any modification, equivalent replacement, improvement, or the like made within the spirit and principles of the present invention shall fall within the scope of protection of the present invention.

Claims (10)

1. A log processing method, characterized by comprising: a cluster server receiving a log file from a user terminal; the cluster server storing the log file; the cluster server analyzing the log file to obtain an analysis result; and the cluster server outputting the analysis result.
2. The log processing method according to claim 1, characterized in that the cluster server storing the log file comprises: the cluster server splitting the log file into log data; and the cluster server transmitting the log data to a distributed message queue, wherein the cluster server reads the log data from the distributed message queue and analyzes the log data.
3. The log processing method according to claim 2, characterized in that, after the cluster server transmits the log data to the distributed message queue, the log processing method further comprises: the cluster server reading the log data from the distributed message queue; the cluster server parsing the read log data to obtain a parsing result; the cluster server generating, according to the parsing result, a key-value pair corresponding to the log data; and the cluster server storing the log file by storing the key-value pair in a distributed database.
4. The log processing method according to claim 3, characterized in that the cluster server analyzing the log file comprises: the cluster server acquiring incremental log data from the distributed database in real time; and the cluster server computing statistics on the incremental log data using stream computing.
5. The log processing method according to claim 3, characterized in that the cluster server analyzing the log file comprises: the cluster server acquiring incremental log data from the distributed database according to a preset period; and the cluster server performing statistical calculation on the incremental log data.
6. A log processing apparatus, characterized by comprising: a receiving unit configured to cause a cluster server to receive a log file from a user terminal; a storage unit configured to cause the cluster server to store the log file; an analysis unit configured to cause the cluster server to analyze the log file to obtain an analysis result; and an output unit configured to cause the cluster server to output the analysis result.
7. The log processing apparatus according to claim 6, characterized in that the storage unit comprises: a splitting module configured to cause the cluster server to split the log file into log data; and a transmitting module configured to cause the cluster server to transmit the log data to a distributed message queue, wherein the cluster server reads the log data from the distributed message queue and analyzes the log data.
8. The log processing apparatus according to claim 7, characterized in that the storage unit further comprises: a reading module configured to cause the cluster server, after the cluster server transmits the log data to the distributed message queue, to read the log data from the distributed message queue; a parsing module configured to cause the cluster server to parse the read log data to obtain a parsing result; a generating module configured to cause the cluster server to generate, according to the parsing result, a key-value pair corresponding to the log data; and a storing module configured to cause the cluster server to store the log file by storing the key-value pair in a distributed database.
9. The log processing apparatus according to claim 8, characterized in that the analysis unit comprises: a first acquiring module configured to cause the cluster server to acquire incremental log data from the distributed database in real time; and a first calculating module configured to cause the cluster server to compute statistics on the incremental log data using stream computing.
10. The log processing apparatus according to claim 8, characterized in that the analysis unit comprises: a second acquiring module configured to cause the cluster server to acquire incremental log data from the distributed database according to a preset period; and a second calculating module configured to cause the cluster server to perform statistical calculation on the incremental log data.
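
As an informal illustration only, and not part of the claims, the flow recited in claims 1 to 5 might be sketched in Python as follows. Everything concrete here is an assumption made for the sketch: the four-field log layout, the names `LogPipeline`, `split_log_file`, and `parse_record`, the key format, and the use of an in-process `queue.Queue` and a `dict` as stand-ins for the distributed message queue and the distributed database, whose products the claims do not name.

```python
from collections import Counter
from queue import Queue


def split_log_file(text):
    """Split a log file into log data: one record per non-empty line."""
    return [line.strip() for line in text.splitlines() if line.strip()]


def parse_record(line):
    """Parse one record. The 4-field layout is an assumption of this sketch."""
    timestamp, client_ip, url, status = line.split()
    return {"timestamp": timestamp, "client_ip": client_ip,
            "url": url, "status": status}


class LogPipeline:
    def __init__(self):
        self.message_queue = Queue()    # stand-in for a distributed message queue
        self.database = {}              # stand-in for a distributed key-value database
        self.insert_order = []          # keys in arrival order, used to find increments
        self.cursor = 0                 # number of keys already analyzed
        self.status_counts = Counter()  # running analysis result

    def receive(self, log_text):
        """Receive a log file, split it, and push the records to the queue."""
        for record in split_log_file(log_text):
            self.message_queue.put(record)

    def store(self):
        """Read records from the queue, parse them, and store key-value pairs."""
        while not self.message_queue.empty():
            parsed = parse_record(self.message_queue.get())
            # Hypothetical key format; the sequence number keeps keys unique.
            key = f"{parsed['timestamp']}|{parsed['client_ip']}|{len(self.insert_order)}"
            self.database[key] = parsed
            self.insert_order.append(key)

    def analyze_increment(self):
        """Update the statistics using only records stored since the last call."""
        new_keys = self.insert_order[self.cursor:]
        self.cursor = len(self.insert_order)
        for key in new_keys:
            self.status_counts[self.database[key]["status"]] += 1
        return len(new_keys)

    def output(self):
        """Return the analysis result (here, a count per status field)."""
        return dict(self.status_counts)


pipeline = LogPipeline()
pipeline.receive("2014-03-20T10:00:00 10.0.0.1 /index.html 200\n"
                 "2014-03-20T10:00:01 10.0.0.2 /missing 404\n")
pipeline.store()
pipeline.analyze_increment()
print(pipeline.output())
```

In this sketch, the difference between claims 4 and 5 would come down to when `analyze_increment` is invoked: immediately after each `store` call to approximate real-time stream computing, or from a timer at a preset period for periodic statistics.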
CN 201410106430 2014-03-20 2014-03-20 Log processing method and device CN103838867A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN 201410106430 CN103838867A (en) 2014-03-20 2014-03-20 Log processing method and device

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN 201410106430 CN103838867A (en) 2014-03-20 2014-03-20 Log processing method and device

Publications (1)

Publication Number Publication Date
CN103838867A true CN103838867A (en) 2014-06-04

Family

ID=50802363

Family Applications (1)

Application Number Title Priority Date Filing Date
CN 201410106430 CN103838867A (en) 2014-03-20 2014-03-20 Log processing method and device

Country Status (1)

Country Link
CN (1) CN103838867A (en)


Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
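魏彬 (Wei Bin): "基于分布式日志系统的数据云服务平台设计与实现" [Design and Implementation of a Data Cloud Service Platform Based on a Distributed Log System], Master's thesis, Zhejiang University (Wanfang Database) *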



Legal Events

Date Code Title Description
C06 Publication
C10 Entry into substantive examination
RJ01