WO2017107812A1 - User log storage method and device - Google Patents

User log storage method and device

Info

Publication number
WO2017107812A1
Authority
WO
WIPO (PCT)
Prior art keywords
log
stored
node
preset
storage
Prior art date
Application number
PCT/CN2016/109674
Other languages
English (en)
French (fr)
Inventor
李灼灵
熊奇
韩森
李巨雷
Original Assignee
阿里巴巴集团控股有限公司
李灼灵
熊奇
韩森
李巨雷
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 阿里巴巴集团控股有限公司, 李灼灵, 熊奇, 韩森, 李巨雷
Publication of WO2017107812A1

Classifications

    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L67/00 Network arrangements or protocols for supporting network services or applications
    • H04L67/01 Protocols
    • H04L67/10 Protocols in which an application is distributed across nodes in the network
    • H04L67/1097 Protocols in which an application is distributed across nodes in the network for distributed storage of data in networks, e.g. transport arrangements for network file system [NFS], storage area networks [SAN] or network attached storage [NAS]
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L65/00 Network arrangements, protocols or services for supporting real-time applications in data packet communication
    • H04L65/40 Support for services or applications
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L67/00 Network arrangements or protocols for supporting network services or applications
    • H04L67/01 Protocols
    • H04L67/10 Protocols in which an application is distributed across nodes in the network
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L67/00 Network arrangements or protocols for supporting network services or applications
    • H04L67/01 Protocols
    • H04L67/10 Protocols in which an application is distributed across nodes in the network
    • H04L67/1001 Protocols in which an application is distributed across nodes in the network for accessing one among a plurality of replicated servers
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/10 File systems; File servers
    • G06F16/18 File system types
    • G06F16/1805 Append-only file systems, e.g. using logs or journals to store data

Definitions

  • the present invention relates to the field of communications technologies, and in particular, to a user log storage method.
  • the application also relates to a user log storage device.
  • cloud computing platforms have increasingly become the object of attention, and cloud computing platforms are also called cloud platforms.
  • the cloud platform can be divided into three categories according to functions: a storage-based cloud platform based on data storage, a computing-based cloud platform based on data processing, and an integrated cloud computing platform that combines computing and data storage processing.
  • the cloud platform allows developers to run the programs they have written in the "cloud", to use the services provided in the "cloud", or both.
  • the architecture design of the cloud platform log service is usually divided into five layers: (log) collection layer, (log) transport layer, processing layer, storage layer and access layer.
  • the collection layer is responsible for reading various types of logs of the user, and then sending the logs to be stored to the transport layer.
  • in the existing cloud platform log service architecture shown in FIG. 1, the functions of this layer are implemented by various Agents combined with existing cloud service functions; the Agents are deployed on physical machines or virtual machines at each level, and read and send user logs according to rules.
  • the processing layer is generally composed of a plurality of scalable worker nodes (the processing workers in FIG. 1); it receives logs from the transport layer, processes them, and stores them in various storage devices.
  • the transport layer sits between the collection layer and the processing layer and is responsible for ensuring that logs are delivered to the processing layer. It is generally implemented by a message queue that is disaster-tolerant and can accumulate a backlog, and it is the bridge between the collection layer and the processing layer.
  • the storage layer is responsible for data storage.
  • the access layer is provided with a dedicated access API to provide a unified data access interface.
  • the existing cloud platform log service often simply sorts the returned results at the access API layer, which can reduce some of the log out of order problems, but in the case of paging queries or large log volumes, the order of the logs cannot be guaranteed. Therefore, how to ensure accurate and orderly logs in the cloud platform has become a technical problem to be solved by those skilled in the art.
  • the invention provides a user log storage method to solve the problem of log disorder in existing cloud platforms.
  • the method is applied to a log processing system including a collection node, a storage node, and a processing node, and a transmission channel corresponding to each of the processing nodes is set in advance between the collection node and the processing node, and the method includes:
  • the transmission channel corresponding to the to-be-stored log is determined according to the user information of the log to be stored, specifically:
  • the to-be-stored log is sent from the processing node to the storage node according to a preset sending policy, specifically:
  • the sending policy includes at least the data sending ratio and the cache threshold.
  • the cache threshold includes a log cache threshold and a log cache time threshold, and the sorted log is sequentially sent according to the data sending ratio, specifically:
  • the quantity of logs is selected from the sorted processed logs and sent.
  • the to-be-stored log is stored in the storage node according to the log cache condition of the storage node and the preset log cache condition, specifically:
  • the log to be stored is merged with the other to-be-stored logs in the storage node into one package and stored;
  • log cache condition of the storage node meets a preset log storage condition, specifically:
  • the present application further provides a user log storage device, applied to a log processing system including a collection node, a storage node, and a processing node, where the device sets in advance, between the collection node and the processing node, a transmission channel corresponding to each of the processing nodes; the device includes:
  • a determining module, which determines, according to the user information of the log to be stored, a transmission channel corresponding to the to-be-stored log, and uses the transmission channel to send the to-be-stored log to the processing node, where to-be-stored logs with the same user information correspond to the same transmission channel;
  • the sending module sends the to-be-stored log from the processing node to the storage node according to a preset sending policy
  • the storage module stores the to-be-stored log in the storage node according to a log cache condition of the storage node and a preset log cache condition.
  • the determining module is specifically configured to:
  • the sending module is specifically configured to:
  • the logs are sorted according to the receiving time
  • the sending policy includes at least the data sending ratio and the cache threshold.
  • the cache threshold includes a log cache threshold and a log cache time threshold, and the sorted log is sequentially sent according to the data sending ratio, specifically:
  • the quantity of logs is selected from the sorted processed logs and sent.
  • the storage module is specifically configured to:
  • the log to be stored is merged with the other to-be-stored logs in the storage node into one package and stored;
  • the determining module is specifically configured to:
  • FIG. 1 is a schematic diagram of a cloud platform log service architecture in the prior art
  • FIG. 2 is a schematic flowchart of a method for storing a user log according to the present application
  • FIG. 3 is a schematic diagram of a cloud platform log service architecture provided by a specific embodiment of the present application.
  • FIG. 4 is a schematic structural diagram of a user log storage device according to the present application.
  • the existing cloud platform log service often simply sorts the returned results at the access API layer, which reduces part of the log disorder problem, but in paged queries or when the log volume is large, the order of the logs cannot be guaranteed.
  • the present application proposes a user log storage method. Since the solution of the present application is directed to improving the transmission process in the log processing system, the log processing system needs to include a collection node, a storage node, and a processing node.
  • a transmission channel corresponding to each processing node is preset between the collection node and the processing node. It should be noted that the transmission channels corresponding to the processing nodes may be a plurality of transmission lines that are actually set up, or a single transmission line that is logically divided into a plurality of lines corresponding to the respective processing nodes; both are within the scope of protection of the present application.
  • the method includes the following steps:
  • S201 Determine, according to user information of the log to be stored, a transmission channel corresponding to the to-be-stored log, and use the transmission channel to send the to-be-stored log to the processing node, where to-be-stored logs with the same user information correspond to the same transmission channel.
  • Step a) receiving the to-be-stored log sent by the collection layer
  • Step b) determining a user corresponding to the log to be stored, and acquiring the user information of the user;
  • Step d) querying a transmission channel currently corresponding to the value, and generating a correspondence between the value, the transmission channel, and the user information.
  • the processing layer is generally designed as a stateless and extensible working node.
  • the processing speed of different nodes may be inconsistent, so the order of the final processing results can differ from the order of sending, which causes the logs to be out of order. To further avoid this situation, a preferred embodiment of the present application uses a data sending ratio and a cache threshold as the sending policy: when the number and/or time of the logs cached in the cache pool of the processing node reaches the cache threshold, the logs are sorted by receiving time and the sorted logs are sent in order according to the data sending ratio, thereby ensuring the order of log sending.
  • a log cache quantity threshold and a log cache time threshold may be set in advance as the cache threshold. When logs need to be sent according to the sending ratio, the number of logs that can be sent is determined from the capacity of the cache pool of the processing node and the data sending ratio, and that number of logs is selected from the sorted logs and sent.
  • the collection layer in Figure 3 is composed of different agents.
  • the processing layer is composed of multiple processing nodes (processing workers), and the storage layer is composed of multiple storage nodes (including storage workers and storage Nodes); between the collection layer and the processing layer there is a transport layer for transmitting user logs, which is composed of multiple transmission channels.
  • this step can use a preset algorithm or other means to ensure that the transmission channel selected based on the user information is unique and balanced.
  • uniqueness means that the user can be uniquely identified; balance means that the algorithm distributes multiple users across the transmission channels with equal probability, so that no single transmission channel is overloaded.
  • the hash algorithm can be used to process the user ID and send the log of the same user to a channel.
  • the processing node caches and sorts the data it has finished processing, then sends the earlier data to the storage node, while the later data takes part in the next sort.
  • the technician can adjust the cache policy, the size of the cache pool, and the proportion of data sent according to the actual situation of the system, so that the logs output by the processing node are 100% ordered.
  • the cache policy can use dual control by cache quantity and cache time, that is, sorting is performed when the number of cached logs reaches a certain quantity or the cache time reaches a certain duration. After sorting, only data older than a certain time is output; newer data takes part in the next round of sorting.
  • S203 Store the to-be-stored log in the storage node according to a log cache condition of the storage node and a preset log cache condition.
  • the preferred embodiment of the present application determines whether the log cache status of the storage node meets a preset log storage condition, and only if the determination result is yes merges the to-be-stored log with the other to-be-stored logs in the storage node into one package and stores it. If the determination result is no, it determines again, after a preset period, whether the log cache status of the storage node meets the preset log storage condition.
  • the processing may be performed based on the following three points.
  • the technician may set other judgment conditions that can achieve the purpose on the basis of the above, which are all within the protection scope of the present application:
  • the storage node applies different caching and merging policies to the log data according to the storage service, thereby preventing the storage service from merging small packets itself and ensuring that the data finally "lands" in order.
  • the cache condition can be set to "the logs have been cached for more than 30 seconds, the number of logs reaches 300, or the log size reaches 1 MB".
  • storage services generally merge packets along these three dimensions.
  • the function of the storage node is to perform the merging itself while ensuring the order of business time.
  • the present application further provides a user log storage device.
  • the device is applied to a log processing system including a collection node, a storage node, and a processing node.
  • a transmission channel corresponding to each of the processing nodes is disposed between the collection node and the processing node, and the device includes:
  • the determining module 410 is configured to determine, according to the user information of the log to be stored, a transmission channel corresponding to the to-be-stored log, and to use the transmission channel to send the to-be-stored log to the processing node, where to-be-stored logs with the same user information correspond to the same transmission channel;
  • the sending module 420 is configured to send the to-be-stored log from the processing node to the storage node according to a preset sending policy.
  • the storage module 430 stores the to-be-stored log in the storage node according to the log cache condition of the storage node and a preset log cache condition.
  • the determining module is specifically configured to:
  • the sending module is specifically configured to:
  • the logs are sorted according to the receiving time
  • the sending policy includes at least the data sending ratio and the cache threshold.
  • the cache threshold includes a log cache threshold and a log cache time threshold, and the log after the sorting process is sequentially sent according to the data sending ratio, specifically:
  • the quantity of logs is selected from the sorted processed logs and sent.
  • the storage module is specifically configured to:
  • the log to be stored is merged with the other to-be-stored logs in the storage node into one package and stored;
  • the determining module is specifically configured to:
  • on the basis of presetting, between the collection node and the processing node, a transmission channel corresponding to each processing node, the present application determines the transmission channel corresponding to the to-be-stored log according to the user information of that log and sends the log to the processing node over that channel. After the to-be-stored log is sent from the processing node to the storage node according to the preset sending policy, it is stored in the storage node according to the log cache status of the storage node and the preset log cache condition. Because to-be-stored logs with the same user information correspond to the same transmission channel, the problem of out-of-order logs can be effectively avoided, thereby ensuring the order of the entire cloud platform log system.
  • the present invention can be implemented by hardware or by means of software plus a necessary general hardware platform.
  • the technical solution of the present invention may be embodied in the form of a software product, which may be stored in a non-volatile storage medium (such as a CD-ROM, a USB flash drive, or a removable hard disk) and includes several instructions for causing a computer device (such as a personal computer, a server, or a network device) to perform the methods described in the implementation scenarios of the present invention.
  • the modules of the apparatus in an implementation scenario may be distributed in the apparatus of the implementation scenario as described, or may be changed accordingly and located in one or more apparatuses different from those of this implementation scenario.
  • the modules of the above implementation scenarios may be combined into one module, or may be further split into multiple sub-modules.

Abstract

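The present invention discloses a user log storage method. On the basis of presetting, between the collection node and the processing node, a transmission channel corresponding to each processing node, the transmission channel corresponding to a to-be-stored log is determined according to the user information of that log, and the log is sent to the processing node over that channel. After the to-be-stored log is sent from the processing node to the storage node according to a preset sending policy, it is stored in the storage node according to the log cache status of the storage node and a preset log cache condition. Because to-be-stored logs with the same user information correspond to the same transmission channel, log disorder can be effectively avoided, thereby ensuring the orderliness of the entire cloud platform log system.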

Description

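User log storage method and device
Technical field
The present invention relates to the field of communication technologies, and in particular to a user log storage method. The present application also relates to a user log storage device.
Background
With the continuous development of Internet technology, cloud computing platforms, also called cloud platforms, are receiving increasing attention. Cloud platforms can be divided into three categories by function: storage-oriented cloud platforms focused on data storage, computing-oriented cloud platforms focused on data processing, and integrated cloud computing platforms that handle both computing and data storage. A cloud platform is a platform that allows developers to run the programs they have written in the "cloud", to use the services provided in the "cloud", or both.
The architecture of a cloud platform log service is usually divided into five layers: the (log) collection layer, the (log) transport layer, the processing layer, the storage layer, and the access layer. The collection layer reads the user's various logs and sends the logs that need to be stored to the transport layer. In the existing cloud platform log service architecture shown in FIG. 1, the functions of this layer are implemented by various Agents combined with existing cloud service functions; the Agents are deployed on physical or virtual machines at each level, and read and send user logs according to rules. The processing layer generally consists of multiple scalable worker nodes (the processing workers in FIG. 1); it receives logs from the transport layer, processes them, and stores them in various storage devices. In general, whether log order can be guaranteed is closely tied to the logic of the processing layer. The transport layer sits between the collection layer and the processing layer and is responsible for ensuring that logs are delivered to the processing layer; it is generally implemented by a message queue that is disaster-tolerant and can accumulate a backlog, and it is the bridge between the collection layer and the processing layer. The storage layer is responsible for data storage. The access layer provides a dedicated access API that offers a unified data access interface externally.
In the course of implementing the present invention, the inventors found that multiple service instances in the existing cloud platform log service architecture cause log disorder. Taking FIG. 1 as an example, when a cloud service in the collection layer has multiple instances, the logs of different instances are written to different log files and sent out by different Agents. For performance reasons, the Agents send asynchronously, so disorder may arise when the logs of different instances are merged.
In view of the above, existing cloud platform log services often only perform a simple sort of the returned results at the access API layer. This reduces some log disorder, but in paged queries or when the log volume is large, log order cannot be guaranteed. How to keep the logs in a cloud platform accurate and ordered has therefore become a technical problem to be solved urgently by those skilled in the art.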
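Summary of the invention
The present invention provides a user log storage method to solve the problem of log disorder in existing cloud platforms. The method is applied to a log processing system including a collection node, a storage node, and a processing node, where a transmission channel corresponding to each of the processing nodes is set in advance between the collection node and the processing node. The method includes:
determining, according to user information of a to-be-stored log, a transmission channel corresponding to the to-be-stored log, and sending the to-be-stored log to the processing node over the transmission channel, where to-be-stored logs with the same user information correspond to the same transmission channel;
sending the to-be-stored log from the processing node to the storage node according to a preset sending policy;
storing the to-be-stored log in the storage node according to the log cache status of the storage node and a preset log cache condition.
Preferably, determining the transmission channel corresponding to the to-be-stored log according to the user information of the to-be-stored log specifically includes:
receiving the to-be-stored log sent by the collection layer;
determining the user corresponding to the to-be-stored log, and acquiring the user information of the user;
acquiring the value obtained by processing the user information with a preset hash algorithm;
querying the transmission channel currently corresponding to the value, and generating a correspondence among the value, the transmission channel, and the user information.
Preferably, sending the to-be-stored log from the processing node to the storage node according to the preset sending policy specifically includes:
when the number and/or time of the logs cached in the cache pool of the processing node reaches the cache threshold, sorting the logs by receiving time;
sending the sorted logs in order according to the data sending ratio;
where the sending policy includes at least the data sending ratio and the cache threshold.
Preferably, the cache threshold includes a log cache quantity threshold and a log cache time threshold, and sending the sorted logs in order according to the data sending ratio specifically includes:
determining the number of logs that can be sent according to the capacity of the cache pool of the processing node and the data sending ratio;
selecting that number of logs from the sorted logs and sending them.
Preferably, storing the to-be-stored log in the storage node according to the log cache status of the storage node and the preset log cache condition specifically includes:
judging whether the log cache status of the storage node meets a preset log storage condition;
if so, merging the to-be-stored log with the other to-be-stored logs in the storage node into one package and storing it;
if not, judging again, after a preset period, whether the log cache status of the storage node meets the preset log storage condition.
Preferably, judging whether the log cache status of the storage node meets the preset log storage condition specifically includes:
judging whether the time for which the storage node has currently cached logs exceeds a preset time threshold;
or, judging whether the number of logs currently cached by the storage node exceeds a preset quantity threshold;
or, judging whether the size of the logs currently cached by the storage node exceeds a preset capacity threshold.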
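Correspondingly, the present application further provides a user log storage device. The device is applied to a log processing system including a collection node, a storage node, and a processing node, and sets in advance, between the collection node and the processing node, a transmission channel corresponding to each of the processing nodes. The device includes:
a determining module, which determines, according to user information of a to-be-stored log, a transmission channel corresponding to the to-be-stored log, and sends the to-be-stored log to the processing node over the transmission channel, where to-be-stored logs with the same user information correspond to the same transmission channel;
a sending module, which sends the to-be-stored log from the processing node to the storage node according to a preset sending policy;
a storage module, which stores the to-be-stored log in the storage node according to the log cache status of the storage node and a preset log cache condition.
Preferably, the determining module is specifically configured to:
receive the to-be-stored log sent by the collection layer;
determine the user corresponding to the to-be-stored log, and acquire the user information of the user;
acquire the value obtained by processing the user information with a preset hash algorithm;
query the transmission channel currently corresponding to the value, and generate a correspondence among the value, the transmission channel, and the user information.
Preferably, the sending module is specifically configured to:
when the number and/or time of the logs cached in the cache pool of the processing node reaches the cache threshold, sort the logs by receiving time;
send the sorted logs in order according to the data sending ratio;
where the sending policy includes at least the data sending ratio and the cache threshold.
Preferably, the cache threshold includes a log cache quantity threshold and a log cache time threshold, and sending the sorted logs in order according to the data sending ratio specifically includes:
determining the number of logs that can be sent according to the capacity of the cache pool of the processing node and the data sending ratio;
selecting that number of logs from the sorted logs and sending them.
Preferably, the storage module is specifically configured to:
judge whether the log cache status of the storage node meets a preset log storage condition;
if so, merge the to-be-stored log with the other to-be-stored logs in the storage node into one package and store it;
if not, judge again, after a preset period, whether the log cache status of the storage node meets the preset log storage condition.
Preferably, the judging module is specifically configured to:
judge whether the time for which the storage node has currently cached logs exceeds a preset time threshold;
or, judge whether the number of logs currently cached by the storage node exceeds a preset quantity threshold;
or, judge whether the size of the logs currently cached by the storage node exceeds a preset capacity threshold.
It can thus be seen that, by applying the technical solution of the present application, on the basis of presetting, between the collection node and the processing node, a transmission channel corresponding to each processing node, the transmission channel corresponding to a to-be-stored log is determined according to the user information of that log, and the log is sent to the processing node over that channel. After the to-be-stored log is sent from the processing node to the storage node according to the preset sending policy, it is stored in the storage node according to the log cache status of the storage node and the preset log cache condition. Because to-be-stored logs with the same user information correspond to the same transmission channel, log disorder can be effectively avoided, thereby ensuring the orderliness of the entire cloud platform log system.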
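Brief description of the drawings
FIG. 1 is a schematic diagram of a cloud platform log service architecture in the prior art;
FIG. 2 is a schematic flowchart of a user log storage method proposed by the present application;
FIG. 3 is a schematic diagram of a cloud platform log service architecture provided by a specific embodiment of the present application;
FIG. 4 is a schematic structural diagram of a user log storage device proposed by the present application.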
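Detailed description
As described in the Background, existing cloud platform log services often only perform a simple sort of the returned results at the access API layer. This reduces some log disorder, but in paged queries or when the log volume is large, log order cannot be guaranteed. The present application therefore proposes a user log storage method. Because the solution aims to improve the transmission process in a log processing system, the log processing system needs to include a collection node, a storage node, and a processing node, and a transmission channel corresponding to each processing node is set in advance between the collection node and the processing node. It should be noted that the transmission channels corresponding to the processing nodes may be multiple transmission lines that are actually set up, or a single transmission line logically divided into multiple lines corresponding to the respective processing nodes; both fall within the scope of protection of the present application.
As shown in FIG. 2, the method includes the following steps:
S201: Determine, according to user information of a to-be-stored log, a transmission channel corresponding to the to-be-stored log, and send the to-be-stored log to the processing node over the transmission channel, where to-be-stored logs with the same user information correspond to the same transmission channel.
Because current cloud platforms generally have collection nodes for collecting user logs, a preferred embodiment of the present application proposes a corresponding transmission channel determination procedure for a log processing system that includes a collection node:
Step a) receiving the to-be-stored log sent by the collection layer;
Step b) determining the user corresponding to the to-be-stored log, and acquiring the user information of the user;
Step c) acquiring the value obtained by processing the user information with a preset hash algorithm;
Step d) querying the transmission channel currently corresponding to the value, and generating a correspondence among the value, the transmission channel, and the user information.
In addition, because the log volume of existing cloud platforms is large, the processing layer is generally designed as a set of stateless, scalable worker nodes. Different nodes may process at different speeds, so the order of the final processing results can differ from the order in which the logs were sent, which causes log disorder. To further avoid this, a preferred embodiment of the present application uses a data sending ratio and a cache threshold as the sending policy: when the number and/or time of the logs cached in the cache pool of the processing node reaches the cache threshold, the logs are sorted by receiving time, and the sorted logs are sent in order according to the data sending ratio, thereby ensuring that logs are sent in order.
It should be noted that, in the above preferred embodiment, a log cache quantity threshold and a log cache time threshold may be set in advance as the cache threshold. When logs need to be sent according to the sending ratio, the number of logs that can be sent is determined from the capacity of the cache pool of the processing node and the data sending ratio, and that number of logs is selected from the sorted logs and sent.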
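The ratio-based selection in this preferred embodiment can be sketched as follows. This is a minimal illustration under stated assumptions, not the patent's implementation: the function name `select_logs_to_send`, the `received_at` field, and the rule that the sendable count equals the pool capacity multiplied by the sending ratio (rounded down) are all hypothetical, since the text only says the count is determined from those two quantities.

```python
def select_logs_to_send(buffered, pool_capacity, send_ratio):
    """Sort buffered logs by receive time and split them into (to_send, to_keep).

    Assumption: sendable count = floor(pool_capacity * send_ratio),
    capped at the number of logs actually buffered.
    """
    ordered = sorted(buffered, key=lambda log: log["received_at"])
    count = min(len(ordered), int(pool_capacity * send_ratio))
    # The earliest `count` logs are sent; the rest wait for the next sort.
    return ordered[:count], ordered[count:]
```

Logs that are not selected stay in the cache pool, so a late-arriving log with an early timestamp still has a chance to be ordered correctly in the next round.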
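To further explain the technical idea of the present invention, the technical solution is now described with reference to the cloud platform log service architecture shown in FIG. 3. The collection layer in FIG. 3 consists of various Agents, the processing layer consists of multiple processing nodes (processing workers), and the storage layer consists of multiple storage nodes (including storage workers and storage Nodes). Between the collection layer and the processing layer is a transport layer for transmitting user logs, which consists of multiple transmission channels. Because each Agent independently and randomly acquires user logs in the cloud platform, when the collection layer needs to send logs, it sends the logs whose order must be preserved to the same log transmission channel toward the processing layer. This step can use a preset algorithm or other means to ensure that the transmission channel selected on the basis of the user information is unique and balanced. Uniqueness means that the user can be uniquely identified; balance means that the algorithm distributes users across the transmission channels with equal probability, so that no single transmission channel is overloaded.
For example, to preserve the log order of a given user, a hash algorithm can be applied to the user ID so that all logs of that user are sent to one channel. The hash granularity should be kept as fine as possible and the hash kept balanced. Each transmission channel is guaranteed to be received and processed by a single processing node, so logs whose order must be preserved all arrive at the same processing node, which simplifies subsequent processing. If the data volume of one channel is large, the throughput of the processing node can be improved by refining the hash granularity and adding data channels to reduce the data volume per channel, or by adding resources to the processing node, processing concurrently within the node, and similar measures.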
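A minimal sketch of hash-based channel selection follows. The patent only specifies "a preset hash algorithm"; the use of SHA-256, the modulo mapping, and the names `channel_for_user` and `route` are assumptions made for illustration.

```python
import hashlib


def channel_for_user(user_id: str, channel_count: int) -> int:
    """Map a user ID to a channel index in [0, channel_count).

    Assumption: SHA-256 stands in for the unspecified preset hash algorithm.
    A stable hash is needed so the same user always maps to the same channel.
    """
    digest = hashlib.sha256(user_id.encode("utf-8")).digest()
    value = int.from_bytes(digest[:8], "big")
    return value % channel_count


def route(logs, channel_count):
    """Group (user_id, message) pairs by channel, preserving arrival order."""
    channels = {index: [] for index in range(channel_count)}
    for user_id, message in logs:
        channels[channel_for_user(user_id, channel_count)].append((user_id, message))
    return channels
```

Because every log of one user lands in one channel, and each channel is consumed by a single processing node, per-user ordering only has to be maintained inside that node. Note that changing `channel_count` remaps users, so in this sketch the channel count would have to stay fixed while logs are in flight.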
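The processing node caches and sorts the data it has finished processing, then sends the earlier data to the storage node, while the later data takes part in the next sort. It should be noted that technicians can adjust the cache policy, the cache pool size, and the proportion of data sent according to the actual conditions of the system, so that the logs output by the processing node are 100% ordered. The cache policy can use dual control by cache quantity and cache time: sorting is performed when the number of cached logs reaches a certain quantity or the cache time reaches a certain duration. After sorting, only data older than a certain time is output; newer data takes part in the next round of sorting.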
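The dual-control cache policy described above might look like the following sketch. The class name `SortingBuffer`, its parameters, and the explicit `now` argument (used instead of a system clock so the behavior is deterministic) are hypothetical; the patent does not prescribe a data structure.

```python
class SortingBuffer:
    """Processing-node buffer: sort on a count or age trigger, emit only old data."""

    def __init__(self, max_count, max_age_seconds, hold_back_seconds):
        self.max_count = max_count                  # count trigger
        self.max_age_seconds = max_age_seconds      # time trigger
        self.hold_back_seconds = hold_back_seconds  # newest data is held back
        self.logs = []                              # list of (log_time, payload)
        self.first_buffered_at = None

    def add(self, log_time, payload, now):
        if not self.logs:
            self.first_buffered_at = now
        self.logs.append((log_time, payload))

    def should_sort(self, now):
        """True when either the count threshold or the age threshold is reached."""
        if not self.logs:
            return False
        return (len(self.logs) >= self.max_count
                or now - self.first_buffered_at >= self.max_age_seconds)

    def flush(self, now):
        """Sort by log time; return logs at or before the cutoff, keep the rest."""
        self.logs.sort(key=lambda item: item[0])
        cutoff = now - self.hold_back_seconds
        ready = [item for item in self.logs if item[0] <= cutoff]
        self.logs = [item for item in self.logs if item[0] > cutoff]
        self.first_buffered_at = now if self.logs else None
        return ready
```

Holding back the newest entries is what lets a slightly delayed log slot into its correct position in the next round; a larger `hold_back_seconds` trades latency for a stronger ordering guarantee.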
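S202: Send the to-be-stored log from the processing node to the storage node according to a preset sending policy.
S203: Store the to-be-stored log in the storage node according to the log cache status of the storage node and a preset log cache condition.
In current cloud platform log caching systems, many cloud storage services merge small packets for reasons of access efficiency. Merging small packets may put multiple small packets out of order and thereby cause log disorder. To avoid this, a preferred embodiment of the present application judges whether the log cache status of the storage node meets a preset log storage condition, and only if so merges the to-be-stored log with the other to-be-stored logs in the storage node into one package and stores it; if not, it judges again, after a preset period, whether the log cache status of the storage node meets the preset log storage condition.
Specifically, the judgment may be based on the following three points; technicians may also set other judgment conditions that achieve the same purpose on this basis, all of which fall within the scope of protection of the present application:
(1) judging whether the time for which the storage node has currently cached logs exceeds a preset time threshold;
(2) judging whether the number of logs currently cached by the storage node exceeds a preset quantity threshold;
(3) judging whether the size of the logs currently cached by the storage node exceeds a preset capacity threshold.
Taking the architecture shown in FIG. 3 as an example, in this specific embodiment the storage node applies different caching and merging policies to log data depending on the storage service, which prevents the storage service from merging small packets itself and ensures that the data finally "lands" in order. For example, the cache condition can be set to "the logs have been cached for more than 30 seconds, the number of logs reaches 300, or the log size reaches 1 MB". Storage services generally merge packets along these three dimensions; the role of the storage node is to perform the merging itself while preserving the order of business time.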
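The three-way cache condition in this example (30 seconds, 300 logs, 1 MB) can be sketched as a small batcher on the storage node. The class name `StorageBatcher`, the method names, and the choice to measure size as UTF-8 byte length are assumptions for illustration; only the three thresholds come from the text.

```python
class StorageBatcher:
    """Storage-node buffer that merges logs into one package when any limit is hit."""

    def __init__(self, max_age_seconds=30, max_count=300, max_bytes=1024 * 1024):
        self.max_age_seconds = max_age_seconds
        self.max_count = max_count
        self.max_bytes = max_bytes
        self.pending = []        # log lines in arrival order
        self.pending_bytes = 0
        self.oldest_at = None    # time the oldest pending log was cached

    def add(self, log_line, now):
        if not self.pending:
            self.oldest_at = now
        self.pending.append(log_line)
        # Assumption: size is measured as the UTF-8 encoded length.
        self.pending_bytes += len(log_line.encode("utf-8"))

    def ready(self, now):
        """True when the time, count, or size condition is met."""
        if not self.pending:
            return False
        return (now - self.oldest_at > self.max_age_seconds
                or len(self.pending) >= self.max_count
                or self.pending_bytes >= self.max_bytes)

    def take_batch(self, now):
        """Return the merged batch if a condition is met, otherwise None."""
        if not self.ready(now):
            return None
        batch = self.pending
        self.pending, self.pending_bytes, self.oldest_at = [], 0, None
        return batch
```

A caller would invoke `take_batch` on a fixed period, which corresponds to re-checking the condition "after a preset period" when it is not yet met. Because the batch keeps arrival order and is written as one package, the downstream storage service has no small packets left to reorder.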
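The content disclosed in the above embodiments effectively solves the log disorder caused by "multiple service instances", "processing-layer concurrency", and "small packet merging" in existing log processing systems, thereby ensuring the orderliness of the logs of the entire cloud platform log system.
To achieve the above technical purpose, the present application further proposes a user log storage device. As shown in FIG. 4, the device is applied to a log processing system including a collection node, a storage node, and a processing node, and sets in advance, between the collection node and the processing node, a transmission channel corresponding to each of the processing nodes. The device includes:
a determining module 410, which determines, according to user information of a to-be-stored log, a transmission channel corresponding to the to-be-stored log, and sends the to-be-stored log to the processing node over the transmission channel, where to-be-stored logs with the same user information correspond to the same transmission channel;
a sending module 420, which sends the to-be-stored log from the processing node to the storage node according to a preset sending policy;
a storage module 430, which stores the to-be-stored log in the storage node according to the log cache status of the storage node and a preset log cache condition.
In a specific application scenario, the determining module is specifically configured to:
receive the to-be-stored log sent by the collection layer;
determine the user corresponding to the to-be-stored log, and acquire the user information of the user;
acquire the value obtained by processing the user information with a preset hash algorithm;
query the transmission channel currently corresponding to the value, and generate a correspondence among the value, the transmission channel, and the user information.
In a specific application scenario, the sending module is specifically configured to:
when the number and/or time of the logs cached in the cache pool of the processing node reaches the cache threshold, sort the logs by receiving time;
send the sorted logs in order according to the data sending ratio;
where the sending policy includes at least the data sending ratio and the cache threshold.
In a specific application scenario, the cache threshold includes a log cache quantity threshold and a log cache time threshold, and sending the sorted logs in order according to the data sending ratio specifically includes:
determining the number of logs that can be sent according to the capacity of the cache pool of the processing node and the data sending ratio;
selecting that number of logs from the sorted logs and sending them.
In a specific application scenario, the storage module is specifically configured to:
judge whether the log cache status of the storage node meets a preset log storage condition;
if so, merge the to-be-stored log with the other to-be-stored logs in the storage node into one package and store it;
if not, judge again, after a preset period, whether the log cache status of the storage node meets the preset log storage condition.
In a specific application scenario, the judging module is specifically configured to:
judge whether the time for which the storage node has currently cached logs exceeds a preset time threshold;
or, judge whether the number of logs currently cached by the storage node exceeds a preset quantity threshold;
or, judge whether the size of the logs currently cached by the storage node exceeds a preset capacity threshold.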
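As can be seen from the above technical solution, on the basis of presetting, between the collection node and the processing node, a transmission channel corresponding to each processing node, the present application determines the transmission channel corresponding to a to-be-stored log according to the user information of that log and sends the log to the processing node over that channel. After the to-be-stored log is sent from the processing node to the storage node according to the preset sending policy, it is stored in the storage node according to the log cache status of the storage node and the preset log cache condition. Because to-be-stored logs with the same user information correspond to the same transmission channel, log disorder can be effectively avoided, thereby ensuring the orderliness of the entire cloud platform log system.
From the description of the above embodiments, those skilled in the art can clearly understand that the present invention can be implemented by hardware, or by software plus a necessary general-purpose hardware platform. Based on this understanding, the technical solution of the present invention may be embodied in the form of a software product, which may be stored in a non-volatile storage medium (such as a CD-ROM, a USB flash drive, or a removable hard disk) and includes several instructions for causing a computer device (such as a personal computer, a server, or a network device) to perform the methods described in the implementation scenarios of the present invention.
Those skilled in the art can understand that the drawings are only schematic diagrams of a preferred implementation scenario, and that the modules or flows in the drawings are not necessarily required for implementing the present invention.
Those skilled in the art can understand that the modules of the apparatus in an implementation scenario may be distributed in the apparatus of the implementation scenario as described, or may be changed accordingly and located in one or more apparatuses different from those of this implementation scenario. The modules of the above implementation scenarios may be combined into one module, or further split into multiple sub-modules.
The above serial numbers of the present invention are for description only and do not represent the relative merits of the implementation scenarios.
The above discloses only several specific implementation scenarios of the present invention; however, the present invention is not limited thereto, and any variation that those skilled in the art can conceive shall fall within the scope of protection of the present invention.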

Claims (12)

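  1. A user log storage method, characterized in that the method is applied to a log processing system including a collection node, a storage node, and a processing node, a transmission channel corresponding to each of the processing nodes being set in advance between the collection node and the processing node, the method comprising:
    determining, according to user information of a to-be-stored log, a transmission channel corresponding to the to-be-stored log, and sending the to-be-stored log to the processing node over the transmission channel, where to-be-stored logs with the same user information correspond to the same transmission channel;
    sending the to-be-stored log from the processing node to the storage node according to a preset sending policy;
    storing the to-be-stored log in the storage node according to the log cache status of the storage node and a preset log cache condition.
  2. The method according to claim 1, characterized in that determining the transmission channel corresponding to the to-be-stored log according to the user information of the to-be-stored log specifically comprises:
    receiving the to-be-stored log sent by the collection layer;
    determining the user corresponding to the to-be-stored log, and acquiring the user information of the user;
    acquiring the value obtained by processing the user information with a preset hash algorithm;
    querying the transmission channel currently corresponding to the value, and generating a correspondence among the value, the transmission channel, and the user information.
  3. The method according to claim 1, characterized in that sending the to-be-stored log from the processing node to the storage node according to the preset sending policy specifically comprises:
    when the number and/or time of the logs cached in the cache pool of the processing node reaches the cache threshold, sorting the logs by receiving time;
    sending the sorted logs in order according to the data sending ratio;
    where the sending policy includes at least the data sending ratio and the cache threshold.
  4. The method according to claim 3, characterized in that the cache threshold includes a log cache quantity threshold and a log cache time threshold, and sending the sorted logs in order according to the data sending ratio specifically comprises:
    determining the number of logs that can be sent according to the capacity of the cache pool of the processing node and the data sending ratio;
    selecting that number of logs from the sorted logs and sending them.
  5. The method according to claim 1, characterized in that storing the to-be-stored log in the storage node according to the log cache status of the storage node and the preset log cache condition specifically comprises:
    judging whether the log cache status of the storage node meets a preset log storage condition;
    if so, merging the to-be-stored log with the other to-be-stored logs in the storage node into one package and storing it;
    if not, judging again, after a preset period, whether the log cache status of the storage node meets the preset log storage condition.
  6. The method according to claim 5, characterized in that judging whether the log cache status of the storage node meets the preset log storage condition specifically comprises:
    judging whether the time for which the storage node has currently cached logs exceeds a preset time threshold;
    or, judging whether the number of logs currently cached by the storage node exceeds a preset quantity threshold;
    or, judging whether the size of the logs currently cached by the storage node exceeds a preset capacity threshold.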
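  7. A user log storage device, characterized in that the device is applied to a log processing system including a collection node, a storage node, and a processing node, the device setting in advance, between the collection node and the processing node, a transmission channel corresponding to each of the processing nodes, the device comprising:
    a determining module, which determines, according to user information of a to-be-stored log, a transmission channel corresponding to the to-be-stored log, and sends the to-be-stored log to the processing node over the transmission channel, where to-be-stored logs with the same user information correspond to the same transmission channel;
    a sending module, which sends the to-be-stored log from the processing node to the storage node according to a preset sending policy;
    a storage module, which stores the to-be-stored log in the storage node according to the log cache status of the storage node and a preset log cache condition.
  8. The device according to claim 7, characterized in that the determining module is specifically configured to:
    receive the to-be-stored log sent by the collection layer;
    determine the user corresponding to the to-be-stored log, and acquire the user information of the user;
    acquire the value obtained by processing the user information with a preset hash algorithm;
    query the transmission channel currently corresponding to the value, and generate a correspondence among the value, the transmission channel, and the user information.
  9. The device according to claim 7, characterized in that the sending module is specifically configured to:
    when the number and/or time of the logs cached in the cache pool of the processing node reaches the cache threshold, sort the logs by receiving time;
    send the sorted logs in order according to the data sending ratio;
    where the sending policy includes at least the data sending ratio and the cache threshold.
  10. The device according to claim 7, characterized in that the cache threshold includes a log cache quantity threshold and a log cache time threshold, and sending the sorted logs in order according to the data sending ratio specifically comprises:
    determining the number of logs that can be sent according to the capacity of the cache pool of the processing node and the data sending ratio;
    selecting that number of logs from the sorted logs and sending them.
  11. The device according to claim 7, characterized in that the storage module is specifically configured to:
    judge whether the log cache status of the storage node meets a preset log storage condition;
    if so, merge the to-be-stored log with the other to-be-stored logs in the storage node into one package and store it;
    if not, judge again, after a preset period, whether the log cache status of the storage node meets the preset log storage condition.
  12. The device according to claim 11, characterized in that the judging module is specifically configured to:
    judge whether the time for which the storage node has currently cached logs exceeds a preset time threshold;
    or, judge whether the number of logs currently cached by the storage node exceeds a preset quantity threshold;
    or, judge whether the size of the logs currently cached by the storage node exceeds a preset capacity threshold.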
PCT/CN2016/109674 2015-12-21 2016-12-13 User log storage method and device WO2017107812A1 (zh)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN201510965308.4 2015-12-21
CN201510965308.4A CN106899643A (zh) 2015-12-21 2015-12-21 User log storage method and device

Publications (1)

Publication Number Publication Date
WO2017107812A1 (zh) 2017-06-29

Family

ID=59089082

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2016/109674 WO2017107812A1 (zh) 2015-12-21 2016-12-13 User log storage method and device

Country Status (2)

Country Link
CN (1) CN106899643A (zh)
WO (1) WO2017107812A1 (zh)

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
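CN111752895A (zh) * 2020-06-28 2020-10-09 北京经纬恒润科技有限公司 Log storage method and device between multiple system-on-chips
CN112732999A (zh) * 2021-01-21 2021-04-30 建信金融科技有限责任公司 Static disaster recovery method, system, electronic device and storage medium
CN113301285A (zh) * 2021-05-11 2021-08-24 深圳市度信科技有限公司 Multi-channel data transmission method, device and system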

Families Citing this family (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107291928B (zh) * 2017-06-29 2020-03-10 国信优易数据有限公司 Log storage system and method
CN107979490A (zh) * 2017-11-17 2018-05-01 北京联想超融合科技有限公司 Log data recording method and server cluster
CN108762984B (zh) * 2018-05-23 2021-05-25 杭州宏杉科技股份有限公司 Method and device for continuous data backup
CN110245059B (zh) * 2019-05-20 2022-11-08 平安普惠企业管理有限公司 Data processing method, device and storage medium
CN115086296B (zh) * 2022-05-27 2024-04-05 阿里巴巴(中国)有限公司 Log transmission system, log transmission method and related device

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
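CN102411533A (zh) * 2011-08-08 2012-04-11 浪潮电子信息产业股份有限公司 Log management optimization method for a cluster storage system
CN103368756A (zh) * 2012-03-29 2013-10-23 福建星网视易信息系统有限公司 Management system for logs used to monitor the operation of embedded systems
CN104883269A (zh) * 2014-02-28 2015-09-02 中国移动通信集团上海有限公司 Method and device for processing AC logs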

Family Cites Families (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102447633A (zh) * 2011-12-29 2012-05-09 北京亿赞普网络技术有限公司 Method and system for log transmission
US8614821B2 (en) * 2012-01-24 2013-12-24 Xerox Corporation Systems and methods for managing customer replaceable unit monitor (CRUM) paired identifiers using a cloud administration system
CN103312544B (zh) * 2013-06-14 2015-12-02 青岛海信传媒网络技术有限公司 Method, device and system for controlling a terminal to report log files

Patent Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
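CN102411533A (zh) * 2011-08-08 2012-04-11 浪潮电子信息产业股份有限公司 Log management optimization method for a cluster storage system
CN103368756A (zh) * 2012-03-29 2013-10-23 福建星网视易信息系统有限公司 Management system for logs used to monitor the operation of embedded systems
CN104883269A (zh) * 2014-02-28 2015-09-02 中国移动通信集团上海有限公司 Method and device for processing AC logs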

Cited By (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
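CN111752895A (zh) * 2020-06-28 2020-10-09 北京经纬恒润科技有限公司 Log storage method and device between multiple system-on-chips
CN112732999A (zh) * 2021-01-21 2021-04-30 建信金融科技有限责任公司 Static disaster recovery method, system, electronic device and storage medium
CN112732999B (zh) * 2021-01-21 2023-06-09 建信金融科技有限责任公司 Static disaster recovery method, system, electronic device and storage medium
CN113301285A (zh) * 2021-05-11 2021-08-24 深圳市度信科技有限公司 Multi-channel data transmission method, device and system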

Also Published As

Publication number Publication date
CN106899643A (zh) 2017-06-27

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 16877623

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 16877623

Country of ref document: EP

Kind code of ref document: A1