CN109992469B - Method and device for merging logs - Google Patents


Info

Publication number
CN109992469B
CN109992469B (application CN201711489592.8A)
Authority
CN
China
Prior art keywords
logs
cache
log
persistent
persistent cache
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201711489592.8A
Other languages
Chinese (zh)
Other versions
CN109992469A (en)
Inventor
严锁鹏
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
3600 Technology Group Co ltd
Original Assignee
3600 Technology Group Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 3600 Technology Group Co ltd
Priority to CN201711489592.8A
Publication of CN109992469A
Application granted
Publication of CN109992469B
Legal status: Active

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 11/00 Error detection; Error correction; Monitoring
    • G06F 11/07 Responding to the occurrence of a fault, e.g. fault tolerance
    • G06F 11/0703 Error or fault processing not based on redundancy, i.e. by taking additional measures to deal with the error or fault not making use of redundancy in operation, in hardware, or in data representation
    • G06F 11/0766 Error or fault reporting or storing
    • G06F 11/0778 Dumping, i.e. gathering error/state information after a fault for later diagnosis
    • G06F 11/0787 Storage of error reports, e.g. persistent data storage, storage using memory protection
    • G06F 11/30 Monitoring
    • G06F 11/3065 Monitoring arrangements determined by the means or processing involved in reporting the monitored data
    • G06F 11/3072 Monitoring arrangements determined by the means or processing involved in reporting the monitored data where the reporting involves data filtering, e.g. pattern matching, time or event triggered, adaptive or policy-based reporting
    • G06F 11/3082 Monitoring arrangements determined by the means or processing involved in reporting the monitored data where the reporting involves data filtering, e.g. pattern matching, time or event triggered, adaptive or policy-based reporting, the data filtering being achieved by aggregating or compressing the monitored data

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Quality & Reliability (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Health & Medical Sciences (AREA)
  • Biomedical Technology (AREA)
  • Debugging And Monitoring (AREA)

Abstract

The invention provides a method and a device for merging logs. The method includes duplicating received logs into both a configured non-persistent cache and a persistent cache, where the non-persistent cache retains logs for a shorter duration than the persistent cache; searching the non-persistent cache for logs with the same field; acquiring, from the found logs with the same field, multiple logs that satisfy a preset backtracking condition; merging them; and caching the merged log in the persistent cache. Because the received log is duplicated into both the configured non-persistent cache and the persistent cache, large numbers of acquisition requests can be kept from hitting the persistent cache directly when logs are fetched later, reducing the log processing pressure on the persistent cache. Moreover, merging logs with the same field effectively saves log storage space and facilitates subsequent centralized log management.

Description

Method and device for merging logs
Technical Field
The present invention relates to the field of computer technologies, and in particular, to a method and an apparatus for merging logs.
Background
Logging is a very broad concept in computer systems: almost any program (e.g., an operating system kernel or an application server) may output logs. Although logs differ in content, scale, and purpose, their overall function is to record the running state of software and to store event information generated by the system.
As computer technology develops, log volumes keep growing. In particular, when there are many log backtracking rules (for example, the backtracking rules for the same log include backtracking every 5 minutes, 1 hour, or 5 hours), logs must be fetched repeatedly from the cache database that stores them. Because the read performance of a persistent cache is limited, the persistent cache database comes under heavy load, which easily degrades the log processing system's performance. Providing sufficient flexibility and good scalability for log caching and processing is therefore a major current challenge.
Disclosure of Invention
The present invention has been made in view of the above problems, and aims to provide a method and apparatus for merging logs that overcome, or at least partially solve, those problems.
According to an aspect of the present invention, there is provided a method of merging logs, including:
duplicating the received log into a configured non-persistent cache and a persistent cache, wherein the non-persistent cache retains the log for a shorter duration than the persistent cache;
searching logs with the same field in the non-persistent cache;
acquiring, from the found logs with the same field, multiple logs that satisfy a preset backtracking condition, merging them, and caching the merged log in the persistent cache.
Optionally, if the log with the same field is not found in the non-persistent cache, continuing to find the log with the same field from the persistent cache.
Optionally, before duplicating the received log into the configured non-persistent cache and persistent cache, the method further includes: setting a first-level cache and a second-level cache as non-persistent caches, and setting a third-level cache as the persistent cache.
Optionally, searching the non-persistent cache for logs with the same field includes:
searching the first-level cache and then the second-level cache, in sequence, for logs with the same field.
Optionally, the first-level cache includes a Redis cluster, the second-level cache includes Aerospike, and the third-level cache includes HBase.
Optionally, the method is applied to a Kafka system, and duplicating the received log into the configured non-persistent cache and persistent cache includes:
distributing logs with the same field among the received logs to the same worker process in the Kafka system, wherein the field includes a user ID or another unique user identification key;
each worker process writing its allocated logs into both the non-persistent cache and the persistent cache.
Optionally, distributing logs with the same field among the received logs to the same worker process in the Kafka system includes:
using a Flink key-partitioning operation to distribute logs with the same field to the same worker process in the Kafka system.
Optionally, searching the non-persistent cache for logs with the same field includes:
starting a backtracking operation when a preset backtracking time is reached, and searching the non-persistent cache for logs with the same field.
Optionally, the method further comprises:
when the preset backtracking time is reached and the backtracking operation starts, determining whether the difference between the backtracking time and the log write time exceeds the cache duration of the non-persistent cache;
if so, searching for logs with the same field directly in the persistent cache.
Optionally, duplicating the received log into the configured non-persistent cache and persistent cache includes:
receiving the log, adding a timestamp to it, and duplicating the timestamped log into the configured non-persistent cache and persistent cache.
Optionally, determining, when the preset backtracking time is reached and the backtracking operation starts, whether the difference between the backtracking time and the log write time exceeds the cache duration of the non-persistent cache includes:
when the preset backtracking time is reached and the backtracking operation starts, determining, according to the log's timestamp, whether the difference between the backtracking time and the log write time exceeds the cache duration of the non-persistent cache.
Optionally, the multiple logs satisfying the preset backtracking condition include:
logs corresponding to multiple constituent objects of a complete session action, the constituent objects being determined according to service requirements.
Optionally, the method further includes: if multiple logs satisfying the preset backtracking condition cannot be obtained from the found logs with the same field, creating a topic from the found logs with the same field and setting a maximum backtracking time for the created topic;
when the maximum backtracking time is reached, searching the non-persistent cache and the persistent cache for logs that have the same field as the logs in the topic and satisfy the preset backtracking condition;
if such logs exist, merging the logs that have the same field and satisfy the preset backtracking condition, and caching the merged log in the persistent cache.
According to another aspect of the present invention, there is also provided an apparatus for merging logs, including:
the writing module, adapted to duplicate the received log into a configured non-persistent cache and a persistent cache, wherein the non-persistent cache retains the log for a shorter duration than the persistent cache;
the searching module, adapted to search the non-persistent cache for logs with the same field;
the merging module, adapted to acquire, from the found logs with the same field, multiple logs that satisfy a preset backtracking condition, merge them, and cache the merged log in the persistent cache.
Optionally, the search module is further adapted to:
if the logs with the same fields are not found in the non-persistent cache, continuing to find the logs with the same fields from the persistent cache.
Optionally, the apparatus further comprises:
the setting module, adapted to set a first-level cache and a second-level cache as non-persistent caches, and a third-level cache as the persistent cache, before the writing module duplicates the received log into the configured non-persistent cache and persistent cache.
Optionally, the search module is further adapted to:
and searching logs with the same field from the first-level cache and the second-level cache in sequence.
Optionally, the first-level cache includes a Redis cluster, the second-level cache includes Aerospike, and the third-level cache includes HBase.
Optionally, the apparatus is applied to a Kafka system, and the writing module is further adapted to:
distribute logs with the same field among the received logs to the same worker process in the Kafka system, wherein the field includes a user ID or another unique user identification key;
each worker process writing its allocated logs into both the non-persistent cache and the persistent cache.
Optionally, the writing module is further adapted to:
use a Flink key-partitioning operation to distribute logs with the same field to the same worker process in the Kafka system.
Optionally, the search module is further adapted to:
and starting backtracking operation when the preset backtracking time is reached, and searching the logs with the same field in the non-persistent cache.
Optionally, the apparatus further comprises:
the judging module, adapted to determine, when the preset backtracking time is reached and the backtracking operation starts, whether the difference between the backtracking time and the log write time exceeds the cache duration of the non-persistent cache;
if so, the searching module searches for logs with the same field directly in the persistent cache.
Optionally, the writing module is further adapted to:
receive the log, add a timestamp to it, and duplicate the timestamped log into the configured non-persistent cache and persistent cache.
Optionally, the judging module is further adapted to:
when the preset backtracking time is reached and the backtracking operation starts, determine, according to the log's timestamp, whether the difference between the backtracking time and the log write time exceeds the cache duration of the non-persistent cache.
Optionally, the multiple logs satisfying the preset backtracking condition include:
logs corresponding to multiple constituent objects of a complete session action, the constituent objects being determined according to service requirements.
Optionally, the apparatus further comprises:
the establishing module, adapted to create a topic from the found logs with the same field and set a maximum backtracking time for the created topic, if the merging module cannot obtain multiple logs satisfying the preset backtracking condition from the found logs with the same field;
the searching module, when the maximum backtracking time is reached, searches the non-persistent cache and the persistent cache for logs that have the same field as the logs in the topic and satisfy the preset backtracking condition;
if such logs exist, the merging module merges them and caches the merged log in the persistent cache.
According to still another aspect of the present invention, there is also provided an electronic apparatus including:
a processor; and
a memory arranged to store computer executable instructions that, when executed, cause the processor to perform a method of merging logs according to any of the embodiments above.
According to yet another aspect of the present invention, there is also provided a computer storage medium, wherein the computer-readable storage medium stores one or more programs, which when executed by an electronic device comprising a plurality of application programs, cause the electronic device to perform the method of merging logs according to any of the embodiments above.
In the embodiment of the invention, a received log is duplicated into a configured non-persistent cache and a persistent cache, where the non-persistent cache retains the log for a shorter duration than the persistent cache. After the log is written, logs with the same field can be searched for in the non-persistent cache; multiple logs satisfying a preset backtracking condition are acquired from the found logs with the same field and merged, and the merged log is cached in the persistent cache. By configuring the two cache types and duplicating the received log into both, the embodiment keeps large numbers of acquisition requests from hitting the persistent cache directly when logs are fetched later, reducing the log processing pressure on the persistent cache. Furthermore, merging logs with the same field effectively saves log storage space and facilitates subsequent centralized log management.
The foregoing is merely an overview of the technical solution of the present invention. To enable a clearer understanding of the technical means of the invention so that it can be implemented according to this description, and to make the above and other objects, features, and advantages of the invention more readily apparent, specific embodiments of the invention are set forth below.
The above, as well as additional objectives, advantages, and features of the present invention will become apparent to those skilled in the art from the following detailed description of a specific embodiment of the present invention when read in conjunction with the accompanying drawings.
Drawings
Various other advantages and benefits will become apparent to those of ordinary skill in the art upon reading the following detailed description of the preferred embodiments. The drawings are only for purposes of illustrating the preferred embodiments and are not to be construed as limiting the invention. Also, like reference numerals are used to designate like parts throughout the figures. In the drawings:
FIG. 1 shows a flow diagram of a method of merging logs according to one embodiment of the invention;
FIG. 2 shows a flow diagram of a method of merging logs according to another embodiment of the present invention;
FIG. 3 shows a schematic diagram of an apparatus for merging logs according to one embodiment of the present invention;
FIG. 4 is a schematic diagram showing the structure of an apparatus for merging logs according to another embodiment of the present invention;
FIG. 5 illustrates a block diagram of a computing device for performing a method of merging logs in accordance with the present invention; and
fig. 6 illustrates a memory unit for holding or carrying program code implementing a method of merging logs according to the present invention.
Detailed Description
Exemplary embodiments of the present disclosure will be described in more detail below with reference to the accompanying drawings. While exemplary embodiments of the present disclosure are shown in the drawings, it should be understood that the present disclosure may be embodied in various forms and should not be limited to the embodiments set forth herein. Rather, these embodiments are provided so that this disclosure will be thorough and complete, and will fully convey the scope of the disclosure to those skilled in the art.
To solve the above technical problems, an embodiment of the present invention provides a method for merging logs that can be applied to a Kafka system. Kafka is a high-throughput distributed publish-subscribe messaging system capable of handling all the action-stream data of a consumer-scale website; a primary goal of Kafka's development was to build a data processing framework for handling massive volumes of logs, user behavior, website operation statistics, and the like. FIG. 1 shows a flow diagram of a method of merging logs according to one embodiment of the invention. Referring to fig. 1, the method at least includes steps S102 to S106.
Step S102, duplicating the received log into a configured non-persistent cache and a persistent cache, wherein the non-persistent cache retains the log for a shorter duration than the persistent cache.
In this step, duplicating the log into the configured non-persistent cache and persistent cache means that, after a log is received, the log is written both into the configured non-persistent cache and into the configured persistent cache. The log may be written into the two cache types (i.e., the non-persistent cache and the persistent cache) simultaneously, or into one type first and then the other; the embodiment of the present invention does not specifically limit this.
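As an illustrative sketch only (not the patented implementation), the dual write of step S102 can be pictured with in-memory dictionaries standing in for the two cache tiers; the names `non_persistent_cache`, `persistent_cache`, and `write_log` are hypothetical.

```python
import time

# In-memory stand-ins for the two cache tiers; a real deployment would use
# systems such as Redis (non-persistent) and HBase (persistent) instead.
non_persistent_cache = {}  # short retention
persistent_cache = {}      # long or indefinite retention

def write_log(log):
    """Duplicate one received log into both cache tiers."""
    entry = dict(log, ts=time.time())  # timestamp added on receipt
    key = entry["user_id"]
    # The two writes may happen simultaneously or one after the other;
    # the embodiment does not limit the order.
    non_persistent_cache.setdefault(key, []).append(entry)
    persistent_cache.setdefault(key, []).append(entry)

write_log({"user_id": "u1", "event": "bid"})
```

After the call, the same log entry exists in both tiers, so later reads can be served from the short-lived tier first.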
Step S104, searching the non-persistent cache for logs with the same field.
In this step, logs with the same field are logs whose content may differ (or be identical) but which share the same field value. The field may include a user ID, another unique user identification key, or the like.
Step S106, acquiring, from the found logs with the same field, multiple logs that satisfy a preset backtracking condition, merging them, and caching the merged log in the persistent cache.
In the embodiment of the invention, a received log is duplicated into a configured non-persistent cache and a persistent cache, where the non-persistent cache retains the log for a shorter duration than the persistent cache. After the log is written, logs with the same field can be searched for in the non-persistent cache; multiple logs satisfying a preset backtracking condition are acquired from the found logs with the same field and merged, and the merged log is cached in the persistent cache. By configuring the two cache types and duplicating the received log into both, the embodiment keeps large numbers of acquisition requests from hitting the persistent cache directly when logs are fetched later, reducing the log processing pressure on the persistent cache. Furthermore, merging logs with the same field effectively saves log storage space and facilitates subsequent centralized log management.
Referring to step S102 above, in an embodiment of the present invention, before the received log is duplicated into the configured non-persistent cache and persistent cache, the two cache types, non-persistent and persistent, are set up. For example, a first-level cache and a second-level cache may be set as non-persistent caches, and a third-level cache as the persistent cache. The first-level cache may adopt a Redis cluster; Redis is an open-source key-value database that provides APIs (Application Program Interfaces) in multiple languages. The second-level cache may adopt an Aerospike database, a distributed, scalable NoSQL (Not Only SQL, non-relational) database. The third-level cache may adopt an HBase database, a distributed, column-oriented open-source database. The databases named here for each cache level are merely illustrative; other types of databases may be used, and the embodiment of the present invention is not specifically limited in this respect.
In the embodiment of the invention, the non-persistent cache may retain logs for far less time than the persistent cache. For example, the non-persistent cache duration may be 1 hour or 2 hours, while the persistent cache duration may be 10 days or 20 days; alternatively, no specific duration is set for the persistent cache, i.e., logs cached there are kept until manually deleted. If the non-persistent cache includes a first-level cache and a second-level cache, each can be given its own cache duration, and the two durations may be the same or different. For example, if both the first-level and second-level caches use a duration of 1 hour, the logs in both levels are refreshed every hour: the originally cached logs are deleted, and newly received logs continue to be cached. If the persistent cache includes a third-level cache with a duration of 10 days, its logs are similarly refreshed every 10 days; but if no cache duration is configured for the third-level cache, its logs are kept permanently, and the user may delete them manually. The embodiment of the invention does not limit the cache durations of the non-persistent and persistent caches.
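The retention policy described above can be sketched as follows; the one-hour TTL and the tier names are illustrative values from the example, not fixed parameters of the invention.

```python
import time

# Hypothetical retention settings: first- and second-level caches are
# non-persistent (short TTL); the third-level cache is persistent
# (a TTL of None means the log is kept until manually deleted).
TTL_SECONDS = {"l1": 3600, "l2": 3600, "l3": None}

def expired(write_ts, tier, now):
    """Return True if a log written at write_ts has aged out of `tier`."""
    ttl = TTL_SECONDS[tier]
    return ttl is not None and now - write_ts > ttl
```

Under these settings, a log written two hours ago has aged out of the non-persistent tiers but survives in the persistent tier.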
In the embodiment of the invention, the received logs are cached in the databases at each level rather than in system memory, mainly because logs held in memory may be lost when the system restarts, i.e., all logs cached in memory before the restart are lost, leaving only the logs in the persistent cache (such as the HBase database). Subsequent services would then be able to obtain logs only from the persistent cache, whose processing performance might not keep up with service requirements. By configuring non-persistent and persistent caches such as the first-level, second-level, and third-level caches, the logs in each cache level are not lost after a system restart and can satisfy various service requirements, avoiding increased log processing pressure on the persistent cache.
Referring to step S104 above, in an embodiment of the present invention, suppose the non-persistent cache includes a first-level cache and a second-level cache. When searching the non-persistent cache for logs with the same field, the first-level and second-level caches can be searched in sequence: first the first-level cache is searched; if logs with the same field are found there, multiple logs satisfying the preset backtracking condition can be acquired from them; if not, the second-level cache is searched. Of course, if logs with the same field are still not found in the second-level cache, the search can continue in the persistent cache (e.g., the third-level cache).
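The tier-by-tier lookup order can be sketched as below; the function name and the dictionary-based tiers are assumptions for illustration, not the patent's API.

```python
def find_same_field(key, level1, level2, level3):
    """Search the first-level, then the second-level, then the persistent
    third-level cache for logs sharing the given field value."""
    for tier in (level1, level2, level3):
        logs = tier.get(key)
        if logs:
            return logs
    return None  # the field value is unknown to every tier

# Example: the log is absent from L1 but present in L2, so the lookup
# stops at the second level without touching the persistent cache.
l1, l2, l3 = {}, {"u1": [{"event": "bid"}]}, {"u1": [{"event": "bid"}]}
result = find_same_field("u1", l1, l2, l3)
```

Because the loop returns on the first hit, most lookups are served by the fast non-persistent tiers and never reach the persistent store.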
In one embodiment of the present invention, searching the non-persistent cache for logs with the same field is the process of backtracking the cached logs. When a preset backtracking time is reached and a backtracking operation (i.e., a join operation) starts, logs with the same field are looked up in the non-persistent cache. The preset backtracking time may be 5 minutes, 1 hour, 5 hours, and so on; the embodiment of the present invention does not limit it.
In this embodiment, when the preset backtracking time is reached and the backtracking operation starts, it may further be determined whether the difference between the backtracking time and the log write time exceeds the cache duration of the non-persistent cache; if so, logs with the same field can be searched for directly in the persistent cache. Because the non-persistent cache's duration is short, i.e., logs are stored there only briefly, the logs to be backtracked may no longer be in the non-persistent cache when the preset backtracking time arrives. Therefore, when the difference between the backtracking time and the log write time is judged to exceed the non-persistent cache duration, searching the persistent cache directly for logs with the same field avoids unnecessary operations and improves backtracking efficiency.
Whether the difference between the backtracking time and the log write time exceeds the non-persistent cache duration can be judged according to the timestamp of the log in the non-persistent cache. The timestamp may be added to the log upon receipt, after which the timestamped log is duplicated into the configured non-persistent cache and persistent cache.
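A minimal sketch of that decision, assuming Unix-second timestamps and a known non-persistent TTL; the function and tier names are hypothetical.

```python
def pick_lookup_start(backtrack_ts, log_write_ts, np_ttl_seconds):
    """If the gap between the backtracking time and the log's write
    timestamp exceeds the non-persistent cache duration, the log can only
    still exist in the persistent cache, so start the search there."""
    if backtrack_ts - log_write_ts > np_ttl_seconds:
        return "persistent"
    return "non_persistent"
```

For example, with a one-hour (3600 s) non-persistent TTL, a log written 9000 seconds before the backtracking time must be fetched from the persistent cache.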
Referring to step S106 above, in the embodiment of the present invention, the multiple logs satisfying the preset backtracking condition may be the logs corresponding to the multiple constituent objects of a complete session action, the constituent objects being determined according to service requirements. For example, in service 1, a complete session action may include the bid, show, and click operations of user traffic; the bid, show, and click operations are then three constituent objects of service 1's complete session action. In service 2, a complete session action may include the bid and show of user traffic but no click operation; the bid and show are then two constituent objects of service 2's complete session action. Thus, the complete session actions corresponding to different service requirements also differ.
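For the service-1 example, the backtracking condition could be sketched as requiring all three constituent objects before merging; `REQUIRED_EVENTS` and `try_merge` are illustrative names, not the patent's API.

```python
# Hypothetical condition for service 1: a complete session action consists
# of a bid, a show, and a click log for the same user.
REQUIRED_EVENTS = {"bid", "show", "click"}

def try_merge(logs):
    """Merge per-event logs into one session log once every required
    constituent object is present; return None if the set is incomplete."""
    by_event = {log["event"]: log for log in logs}
    if not REQUIRED_EVENTS <= by_event.keys():
        return None
    return {"user_id": logs[0]["user_id"],
            "events": sorted(by_event)}  # one merged record per session
```

Merging the three per-event logs into a single session record is what saves storage space and simplifies later centralized management.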
In this embodiment, multiple logs satisfying the preset backtracking condition may fail to be acquired from the found logs with the same field. In that case, the logs that do not yet satisfy the preset backtracking condition can be treated as a special log type: a topic is created from the found logs with the same field, and a maximum backtracking time is set for the created topic. When the maximum backtracking time is reached, the non-persistent cache and the persistent cache are searched for logs that have the same field as the logs in the topic and satisfy the preset backtracking condition; if such logs exist, they are merged, and the merged log is cached in the persistent cache. The Kafka system can use topics to manage logs by category; one service can apply for multiple topics, and the logs of one topic can be shared by multiple services.
For example, suppose that in service 1 above, the backtracking process finds only the logs for user A's traffic bid and show, and does not find the log for user A's click operation; the backtracking is thus unsuccessful. A topic can then be created from the found logs of user A, and a maximum backtracking time, say 5 hours, set for the created topic. After 5 hours, the non-persistent and persistent caches are searched again for the logs corresponding to the bid, show, and click operations of user A's traffic; if they exist, the found logs are merged and the merged log is cached in the persistent cache.
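The retry path for unmerged logs can be sketched as parking them under a topic together with a deadline; the queue structure below is an assumption for illustration, not the Kafka topic mechanism itself.

```python
# Logs that fail to merge are parked under a topic together with a
# deadline equal to "now + maximum backtracking time"; when the deadline
# arrives, the caches are searched again and the merge is re-attempted.
pending_topics = []  # list of (deadline_ts, user_id) pairs

def park_unmerged(user_id, now, max_backtrack_seconds=5 * 3600):
    """Record an unmerged session for a later retry."""
    pending_topics.append((now + max_backtrack_seconds, user_id))

def due_topics(now):
    """Topics whose maximum backtracking time has been reached."""
    return [uid for deadline, uid in pending_topics if now >= deadline]
```

A periodic job could call `due_topics` and, for each returned user, repeat the cache search and merge attempt described above.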
It should be noted that most backtracking operations complete within seconds; the set maximum backtracking time is the maximum backtracking time the service side can tolerate, that is, the maximum time the service side is willing to wait to obtain a successfully backtracked log. Of course, a log whose backtracking previously failed may come to satisfy the preset backtracking condition before the maximum backtracking time (such as 5 hours) is reached. For example, suppose that in a previous backtracking pass only the bid and show logs of user A's traffic were found and the click log was not, with a maximum backtracking time of 5 hours; if the click log is received within the next minute, the logs can be merged at that point, and no further backtracking operation needs to be performed when the set maximum backtracking time arrives.
Therefore, the maximum backtracking time can be regarded as a fallback guarantee: within the maximum time the service side can tolerate, the system does its best to determine whether the backtracking of the log can succeed. This is because some business rules require the partially backtracked logs to be merged even if backtracking has not fully succeeded by the maximum tolerated time (i.e., the maximum backtracking time).
With continued reference to step S106, in the present invention, after the logs are successfully merged, the merged log can be cached directly into the persistent cache for offline job processing. Of course, after the logs are successfully merged, the merged log can also be distributed outward directly, with the kafka system as the hub, for real-time job processing.
The embodiment of the invention also provides another method for merging logs. FIG. 2 shows a flow diagram of a method of merging logs according to another embodiment of the present invention. Referring to fig. 2, the method at least includes steps S202 to S212.
Step S202, the logs with the same field among the received logs are assigned to the same worker process in the kafka-based system, where the field may include a user ID or another unique identification key of the user.
In this step, a flink keyBy operation (e.g., a hash) may be employed to assign logs with the same field to the same worker process in the kafka-based system. Flink is an open-source technology stack similar to spark, providing batch processing, stream computing, graph computing, interactive query, machine learning, and other functions. Assigning the logs with the same field to the same worker process makes the subsequent merging of those logs more convenient and effectively preserves log consistency when the system is scaled out or upgraded. For example, through the flink keyBy operation, a log with a user ID of "11" is assigned to worker1, a log with a user ID of "12" is assigned to worker2, and a log with a user ID of "13" is assigned to worker3.
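For illustration only, the keyBy-style routing above can be approximated as deterministic hash partitioning. In the sketch below, `zlib.crc32` is a stand-in hash chosen for this example; Flink's actual key-group hashing differs, and the worker count is an assumed parameter.

```python
import zlib

NUM_WORKERS = 3  # assumed parallelism; in Flink this would be the operator parallelism

def assign_worker(user_id: str, num_workers: int = NUM_WORKERS) -> int:
    """Map a key to a worker index so that all logs sharing a user ID are
    routed to the same worker, mirroring the effect of a keyBy on that field."""
    return zlib.crc32(user_id.encode("utf-8")) % num_workers

# All logs with the same user ID always land on the same worker:
assert assign_worker("11") == assign_worker("11")
```

The design point is determinism: because the mapping depends only on the key, any node that receives a log for user "11" routes it to the same worker, which is what makes the later per-worker merge safe.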
In step S204, each worker process writes its assigned logs into both the non-persistent cache and the persistent cache.
In this step, since the logs of the same field have already been assigned to the same worker process above, each worker process can write the logs of that field assigned to it into both the non-persistent cache and the persistent cache. For example, continuing the above embodiment in which worker1 is assigned the logs with the user ID "11" and worker2 the logs with the user ID "12": worker1 writes the logs with the user ID "11" into both the non-persistent cache and the persistent cache, worker2 does the same for the logs with the user ID "12", and worker3 for the logs with the user ID "13".
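The duplicate write of step S204 can be sketched as follows, assuming a plain dict per tier; in the described system the non-persistent tier would be the redis/aerospike caches and the persistent tier hbase, and the field names are hypothetical.

```python
import time

def dual_write(log, non_persistent, persistent):
    """Write one log record into both cache tiers. A write timestamp is
    attached so that a later backtracking pass can judge whether the
    non-persistent copy may already have expired."""
    record = dict(log, write_ts=time.time())
    key = record["user_id"]
    for tier in (non_persistent, persistent):
        # Each tier gets its own copy of the record.
        tier.setdefault(key, []).append(dict(record))
    return record

# worker1's assigned log with user ID "11" goes into both tiers:
np_cache, p_cache = {}, {}
dual_write({"user_id": "11", "type": "bid"}, np_cache, p_cache)
```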
Step S206, searching whether logs with the same field exist in the non-persistent cache; if so, executing step S208; if not, executing step S210.
Step S208, acquiring and merging, from the found logs with the same field, the plurality of logs satisfying the preset backtracking condition, caching the merged log into the persistent cache, and executing step S212.
Step S210, searching whether logs with the same field exist in the persistent cache; if so, executing step S208; if not, executing step S212 and ending.
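Steps S206 to S210 amount to a two-tier lookup followed by a conditional merge. The sketch below is a hedged illustration: the `{user_id: {constituent_object: log}}` cache layout and the `"_merged"` bucket are assumptions made for this example, not the storage format of the actual system.

```python
def lookup_and_merge(user_id, non_persistent, persistent, required):
    """S206: look in the non-persistent cache; S210: fall back to the
    persistent cache; S208: merge once every constituent object of the
    complete session action is present, and cache the merged result."""
    logs = non_persistent.get(user_id) or persistent.get(user_id)
    if not logs or not required <= set(logs):
        return None  # nothing found, or backtracking condition not yet met
    merged = {"user_id": user_id, **{obj: logs[obj] for obj in required}}
    persistent.setdefault("_merged", {})[user_id] = merged
    return merged

# Service 1: the session is complete once bid, show, and click are all present.
np_cache = {"11": {"bid": "b-log", "show": "s-log", "click": "c-log"}}
p_cache = {}
merged = lookup_and_merge("11", np_cache, p_cache, {"bid", "show", "click"})
```

Checking the cheap, short-lived tier first is what shields the persistent cache from the bulk of lookup traffic, which is the stated benefit of the two-tier design.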
The embodiment of the invention can perform stream processing on the received logs based on the kafka system and flink, and can further implement capacity expansion and reduction of the system through parallelism configuration.
Based on the same inventive concept, an embodiment of the present invention further provides a log merging device, fig. 3 shows a schematic structural diagram of a log merging device according to an embodiment of the present invention, and referring to fig. 3, a log merging device 300 includes a writing module 310, a searching module 320, and a merging module 330.
The functions of the components of the log merging apparatus 300 according to the embodiment of the present invention, and the connection relationships between them, are described as follows:
the writing module 310 is adapted to write the received logs into both the set non-persistent cache and the set persistent cache, wherein the cache duration of the non-persistent cache for the logs is smaller than the cache duration of the persistent cache for the logs;
a lookup module 320, coupled to the write module 310, adapted to lookup a log having the same field in the non-persistent cache;
and the merging module 330, coupled with the searching module 320, is adapted to acquire and merge, from the found logs with the same field, the plurality of logs satisfying the preset backtracking condition, and to cache the merged log into the persistent cache. The plurality of logs satisfying the preset backtracking condition may include logs corresponding to a plurality of constituent objects in a complete session action, where the constituent objects in a complete session action are determined according to the service requirement.
In an embodiment of the present invention, the lookup module 320 is further adapted to continue to lookup the log with the same field from the persistent cache if the log with the same field is not found in the non-persistent cache.
In an embodiment of the present invention, the apparatus for merging logs may further be applied to a kafka system, and the writing module 310 is further adapted to assign the logs with the same field among the received logs to the same worker process in the kafka system, where the field includes a user ID or another unique identification key of the user. Each worker process then writes its assigned logs into both the non-persistent cache and the persistent cache.
In an embodiment of the present invention, the writing module 310 is further adapted to assign logs with the same field to the same worker process in the kafka-based system using a flink keyBy operation.
In an embodiment of the present invention, the searching module 320 is further adapted to initiate a backtracking operation when a preset backtracking time is reached, and search the non-persistent cache for the log having the same field.
The embodiment of the present invention further provides another log merging device, referring to fig. 4, the log merging device 300 further includes a setting module 340, a judging module 350, and an establishing module 360, in addition to the above modules.
A setting module 340, coupled to the writing module 310, is adapted to set the first-level cache and the second-level cache as non-persistent caches and the third-level cache as a persistent cache before the writing module 310 writes the received logs into the set non-persistent cache and persistent cache.
The judging module 350 is coupled to the writing module 310 and the searching module 320, and is adapted to judge, after the writing module 310 has written the received logs into the set non-persistent cache and persistent cache, whether the difference between the backtracking time and the log writing time exceeds the cache duration of the non-persistent cache when the preset backtracking time is reached and the backtracking operation is started; if so, the searching module 320 searches for the logs with the same field directly from the persistent cache.
The establishing module 360 is coupled to the merging module 330 and the searching module 320, and is adapted to establish a topic according to the found logs of the same field if the merging module 330 cannot acquire a plurality of logs satisfying the preset backtracking condition from those logs, and to set a maximum backtracking time for the established topic. When the maximum backtracking time is reached, the searching module 320 searches the non-persistent cache and the persistent cache for logs that have the same field as the logs in the topic and satisfy the preset backtracking condition; if such logs exist, the merging module 330 merges the logs having the same field and satisfying the preset backtracking condition, and caches the merged log into the persistent cache.
In an embodiment of the present invention, the searching module 320 is further adapted to search for the logs with the same field from the first-level cache and the second-level cache in sequence, where the first-level cache may include a redis cluster, the second-level cache may include aerospike, and the third-level cache may include hbase.
In an embodiment of the present invention, the writing module 310 is further adapted to receive a log, add a timestamp to the log, and write the log carrying the timestamp into both the set non-persistent cache and persistent cache.
In an embodiment of the present invention, the judging module 350 is further adapted to judge, according to the timestamp of the log, whether the difference between the backtracking time and the log writing time exceeds the cache duration of the non-persistent cache when the preset backtracking time is reached and the backtracking operation is started.
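The judging module's rule above reduces to a single comparison on the log's timestamp. A minimal sketch, assuming times are expressed in seconds and the TTL value is illustrative:

```python
# Assumed retention of the non-persistent tier (redis/aerospike); illustrative value.
NON_PERSISTENT_TTL_SECONDS = 3600

def choose_start_tier(backtrack_ts, log_write_ts, ttl=NON_PERSISTENT_TTL_SECONDS):
    """If the gap between the backtracking time and the log's write time
    exceeds the non-persistent cache's retention, the non-persistent copy
    may already be evicted, so search the persistent cache directly."""
    if backtrack_ts - log_write_ts > ttl:
        return "persistent"
    return "non_persistent"
```

This is why the writing module attaches a timestamp on receipt: without it, the judging module could not tell which tier is still guaranteed to hold the log.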
According to any one of the above preferred embodiments or a combination thereof, the embodiments of the present invention can achieve the following advantageous effects:
in the embodiment of the invention, when logs are received, the received logs are written into both the set non-persistent cache and the set persistent cache, wherein the cache duration of the non-persistent cache for the logs is smaller than that of the persistent cache. After the logs are written, the logs with the same field can be searched for in the non-persistent cache; the plurality of logs satisfying the preset backtracking condition are acquired from the found logs with the same field and merged, and the merged log is cached into the persistent cache. By setting a non-persistent cache and a persistent cache and writing the received logs into both, the embodiment of the invention prevents a large number of acquisition requests from being routed directly to the persistent cache when logs are later acquired, reducing the log processing pressure on the persistent cache. Furthermore, merging the logs with the same field effectively saves log storage space and facilitates subsequent centralized management of the logs.
In the description provided herein, numerous specific details are set forth. However, it is understood that embodiments of the invention may be practiced without these specific details. In some instances, well-known methods, structures and techniques have not been shown in detail in order not to obscure an understanding of this description.
Similarly, it should be appreciated that in the foregoing description of exemplary embodiments of the invention, various features of the invention are sometimes grouped together in a single embodiment, figure, or description thereof for the purpose of streamlining the disclosure and aiding in the understanding of one or more of the various inventive aspects. However, the disclosed method should not be construed as reflecting the intention that: i.e., the claimed invention requires more features than are expressly recited in each claim. Rather, as the following claims reflect, inventive aspects lie in less than all features of a single foregoing disclosed embodiment. Thus, the claims following the detailed description are hereby expressly incorporated into this detailed description, with each claim standing on its own as a separate embodiment of this invention.
Those skilled in the art will appreciate that the modules in the apparatus of the embodiments may be adaptively changed and disposed in one or more apparatuses different from the embodiments. The modules or units or components of the embodiments may be combined into one module or unit or component and, furthermore, they may be divided into a plurality of sub-modules or sub-units or sub-components. Any combination of all features disclosed in this specification (including any accompanying claims, abstract and drawings), and all of the processes or units of any method or apparatus so disclosed, may be used in combination, except insofar as at least some of such features and/or processes or units are mutually exclusive. Each feature disclosed in this specification (including any accompanying claims, abstract and drawings), may be replaced by alternative features serving the same, equivalent or similar purpose, unless expressly stated otherwise.
Furthermore, those skilled in the art will appreciate that while some embodiments herein include some features but not others included in other embodiments, combinations of features of different embodiments are meant to be within the scope of the invention and form different embodiments. For example, in the claims, any of the claimed embodiments may be used in any combination.
Various component embodiments of the invention may be implemented in hardware, or in software modules running on one or more processors, or in a combination thereof. Those skilled in the art will appreciate that some or all of the functions of some or all of the components in an apparatus for merging logs according to an embodiment of the present invention may be implemented in practice using a microprocessor or Digital Signal Processor (DSP). The present invention can also be implemented as an apparatus or device program (e.g., a computer program and a computer program product) for performing a portion or all of the methods described herein. Such a program embodying the present invention may be stored on a computer readable medium, or may have the form of one or more signals. Such signals may be downloaded from an internet website, provided on a carrier signal, or provided in any other form.
According to still another aspect of the present invention, there is also provided an electronic device including a processor; and a memory arranged to store computer executable instructions that, when executed, cause the processor to perform a method of merging logs according to any of the embodiments above.
According to another aspect of the present invention, there is also provided a computer storage medium storing one or more programs, which when executed by an electronic device including a plurality of application programs, cause the electronic device to perform the method of merging logs according to any of the embodiments above.
For example, FIG. 5 illustrates a computing device that may implement a method of merging logs. The computing device conventionally includes a computer program product or a computer readable medium in the form of a processor 510 and a memory 520. The memory 520 may be an electronic memory such as a flash memory, an EEPROM (electrically erasable programmable read only memory), an EPROM, a hard disk, or a ROM. The memory 520 has a memory space 530 storing program code 531 for performing any of the method steps described above. For example, the memory space 530 storing the program code may include respective program codes 531 for realizing the respective steps in the above method, respectively. The program code can be read from or written to one or more computer program products. These computer program products comprise a program code carrier such as a hard disk, a Compact Disc (CD), a memory card or a floppy disk. Such a computer program product is typically a portable or fixed storage unit as shown in fig. 6, for example. The storage unit may have memory segments, memory spaces, etc. arranged similarly to the memory 520 in the computing device of fig. 5. The program code may be compressed, for example, in a suitable form. In general, the memory unit comprises computer readable code 531', i.e. code readable by a processor such as 510, for performing the method steps of the invention, which code, when run by a computing device, causes the computing device to perform the steps in the method described above.
It should be noted that the above-mentioned embodiments illustrate rather than limit the invention, and that those skilled in the art will be able to design alternative embodiments without departing from the scope of the appended claims. In the claims, any reference signs placed between parentheses shall not be construed as limiting the claim. The word "comprising" does not exclude the presence of elements or steps not listed in a claim. The word "a" or "an" preceding an element does not exclude the presence of a plurality of such elements. The invention may be implemented by means of hardware comprising several distinct elements, and by means of a suitably programmed computer. In the unit claims enumerating several means, several of these means may be embodied by one and the same item of hardware. The use of the words first, second, third, etc. do not denote any order. These words may be interpreted as names.

Claims (22)

1. A method of merging logs, the method being applied to a kafka system, comprising:
writing received logs into both a set non-persistent cache and a set persistent cache, wherein the cache duration of the non-persistent cache for the logs is smaller than that of the persistent cache for the logs, and the non-persistent cache comprises a first-level cache and a second-level cache;
When reaching the preset backtracking time and starting the backtracking operation, judging whether the difference between the backtracking time and the log writing time exceeds the cache time of the non-persistent cache;
if yes, directly searching the logs with the same fields from the persistent cache;
acquiring and merging, from the found logs with the same field, a plurality of logs satisfying a preset backtracking condition, and caching the merged log into the persistent cache;
if a plurality of logs satisfying the preset backtracking condition cannot be acquired from the found logs with the same field, establishing a topic according to the found logs with the same field, and setting a maximum backtracking time for the established topic;
searching, when the maximum backtracking time is reached, whether logs that have the same field as the logs in the topic and satisfy the preset backtracking condition exist in the non-persistent cache and the persistent cache;
if yes, merging the logs that have the same field and satisfy the preset backtracking condition, and caching the merged log into the persistent cache.
2. The method of claim 1, further comprising:
if the logs with the same fields are not found in the non-persistent cache, continuing to find the logs with the same fields from the persistent cache.
3. The method of claim 1 or 2, further comprising, before writing the received logs into the set non-persistent cache and persistent cache: setting a first-level cache and a second-level cache as non-persistent caches, and setting a third-level cache as a persistent cache.
4. The method of claim 3, wherein looking up the log with the same field in the non-persistent cache comprises:
and searching logs with the same field from the first-level cache and the second-level cache in sequence.
5. The method of claim 3, wherein the first-level cache comprises a redis cluster, the second-level cache comprises aerospike, and the third-level cache comprises hbase.
6. The method of claim 1 or 2, wherein the method is applied to a kafka system, and the writing of received logs into the set non-persistent cache and persistent cache comprises:
assigning the logs with the same field among the received logs to the same worker process in the kafka system, wherein the field comprises a user ID or another unique identification key of the user;
each worker process writing its assigned logs into both the non-persistent cache and the persistent cache.
7. The method of claim 6, wherein the assigning of the logs with the same field among the received logs to the same worker process in the kafka system comprises:
using a flink keyBy operation to assign the logs with the same field to the same worker process in the kafka system.
8. The method of claim 1, wherein the writing of the received logs into the set non-persistent cache and persistent cache comprises:
receiving a log, adding a timestamp to the log, and writing the log carrying the timestamp into both the set non-persistent cache and persistent cache.
9. The method of claim 8, wherein when the preset trace back time is reached and the trace back operation is started, determining whether a difference between the trace back time and the log writing time exceeds a cache duration of the non-persistent cache comprises:
when the preset backtracking time is reached and the backtracking operation is started, judging whether the difference between the backtracking time and the log writing time exceeds the cache duration of the non-persistent cache according to the time stamp of the log.
10. The method according to claim 1 or 2, wherein the plurality of logs satisfying a preset backtracking condition include:
logs corresponding to a plurality of constituent objects in a complete session action, wherein the plurality of constituent objects in the complete session action are determined according to the service requirement.
11. An apparatus for merging logs, wherein the apparatus is applied to a kafka system, comprising:
the writing module is adapted to write the received logs into both a set non-persistent cache and a set persistent cache, wherein the cache duration of the non-persistent cache for the logs is smaller than that of the persistent cache for the logs, and the non-persistent cache comprises a first-level cache and a second-level cache;
the judging module is suitable for judging whether the difference between the backtracking time and the log writing time exceeds the cache duration of the non-persistent cache when the preset backtracking time is reached and the backtracking operation is started;
if yes, the searching module is suitable for directly searching the logs with the same fields from the persistent cache;
the merging module is adapted to acquire and merge, from the found logs with the same field, a plurality of logs satisfying a preset backtracking condition, and to cache the merged log into the persistent cache;
the establishing module is adapted to establish a topic according to the found logs with the same field if the merging module cannot acquire a plurality of logs satisfying the preset backtracking condition from the found logs with the same field, and to set a maximum backtracking time for the established topic;
the searching module searches, when the maximum backtracking time is reached, whether logs that have the same field as the logs in the topic and satisfy the preset backtracking condition exist in the non-persistent cache and the persistent cache;
if yes, the merging module merges the logs that have the same field and satisfy the preset backtracking condition, and caches the merged log into the persistent cache.
12. The apparatus of claim 11, wherein the lookup module is further adapted to:
if the logs with the same fields are not found in the non-persistent cache, continuing to find the logs with the same fields from the persistent cache.
13. The apparatus of claim 11 or 12, further comprising:
the setting module is adapted to set the first-level cache and the second-level cache as non-persistent caches and the third-level cache as a persistent cache before the writing module writes the received logs into the set non-persistent cache and persistent cache.
14. The apparatus of claim 13, wherein the lookup module is further adapted to:
and searching logs with the same field from the first-level cache and the second-level cache in sequence.
15. The apparatus of claim 13, wherein the first-level cache comprises a redis cluster, the second-level cache comprises aerospike, and the third-level cache comprises hbase.
16. The apparatus of claim 11 or 12, wherein the writing module is further adapted to:
assign the logs with the same field among the received logs to the same worker process in the kafka system, wherein the field comprises a user ID or another unique identification key of the user;
each worker process writing its assigned logs into both the non-persistent cache and the persistent cache.
17. The apparatus of claim 16, wherein the writing module is further adapted to:
use a flink keyBy operation to assign the logs with the same field to the same worker process in the kafka system.
18. The apparatus of claim 17, wherein the writing module is further adapted to:
receive a log, add a timestamp to the log, and write the log carrying the timestamp into both the set non-persistent cache and persistent cache.
19. The apparatus of claim 18, wherein the determination module is further adapted to:
When the preset backtracking time is reached and the backtracking operation is started, judging whether the difference between the backtracking time and the log writing time exceeds the cache duration of the non-persistent cache according to the time stamp of the log.
20. The apparatus according to claim 11 or 12, wherein the plurality of logs satisfying a preset backtracking condition include:
logs corresponding to a plurality of constituent objects in a complete session action, wherein the plurality of constituent objects in the complete session action are determined according to the service requirement.
21. An electronic device, comprising:
a processor; and
a memory arranged to store computer executable instructions which when executed cause the processor to perform the method of merging logs according to any of claims 1-10.
22. A computer storage medium storing one or more programs that, when executed by an electronic device comprising a plurality of application programs, cause the electronic device to perform the method of merging logs according to any of claims 1-10.
CN201711489592.8A 2017-12-29 2017-12-29 Method and device for merging logs Active CN109992469B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201711489592.8A CN109992469B (en) 2017-12-29 2017-12-29 Method and device for merging logs


Publications (2)

Publication Number Publication Date
CN109992469A CN109992469A (en) 2019-07-09
CN109992469B true CN109992469B (en) 2023-08-18

Family

ID=67110676

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201711489592.8A Active CN109992469B (en) 2017-12-29 2017-12-29 Method and device for merging logs

Country Status (1)

Country Link
CN (1) CN109992469B (en)

Families Citing this family (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113656645B (en) * 2020-05-12 2024-08-20 北京字节跳动网络技术有限公司 Log consumption method and device
CN112000698B (en) * 2020-08-25 2023-09-19 青岛海尔科技有限公司 Log recording method and device, storage medium and electronic device
CN112667686B (en) * 2020-12-30 2024-07-05 中国农业银行股份有限公司 Real-time stream data splicing method and device
CN112988741B (en) * 2021-02-04 2024-08-16 北京淇瑀信息科技有限公司 Real-time service data merging method and device and electronic equipment

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
KR100725415B1 (en) * 2005-12-24 2007-06-07 삼성전자주식회사 Log compaction method and apparatus of database
CN105138481A (en) * 2014-05-30 2015-12-09 华为技术有限公司 Stored data processing method and apparatus and system
CN106502875A (en) * 2016-10-21 2017-03-15 过冬 A kind of daily record generation method and system based on cloud computing
CN106649627A (en) * 2016-12-06 2017-05-10 杭州迪普科技股份有限公司 Log searching method and device
CN106775498A (en) * 2017-01-23 2017-05-31 深圳国泰安教育技术股份有限公司 A kind of data cached synchronous method and system




Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
TA01 Transfer of patent application right

Effective date of registration: 20230727

Address after: Room 03, 2nd Floor, Building A, No. 20 Haitai Avenue, Huayuan Industrial Zone (Huanwai), Binhai New Area, Tianjin, 300450

Applicant after: 3600 Technology Group Co.,Ltd.

Address before: 100088 room 112, block D, 28 new street, new street, Xicheng District, Beijing (Desheng Park)

Applicant before: BEIJING QIHOO TECHNOLOGY Co.,Ltd.

GR01 Patent grant