CN110324211A - Data acquisition method and device - Google Patents

Data acquisition method and device

Info

Publication number
CN110324211A
Authority
CN
China
Prior art keywords
initial data
statistical
period
collected
time
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201910731693.4A
Other languages
Chinese (zh)
Inventor
李善任
董会存
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Puxin Hengye Technology Development (beijing) Co Ltd
Yiren Hengye Technology Development (beijing) Co Ltd
Original Assignee
Puxin Hengye Technology Development (beijing) Co Ltd
Yiren Hengye Technology Development (beijing) Co Ltd
Application filed by Puxin Hengye Technology Development (beijing) Co Ltd, Yiren Hengye Technology Development (beijing) Co Ltd
Priority to CN201910731693.4A
Publication of CN110324211A

Classifications

    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L43/00 Arrangements for monitoring or testing data switching networks
    • H04L43/06 Generation of reports
    • H04L43/067 Generation of reports using time frame reporting
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L49/00 Packet switching elements
    • H04L49/90 Buffering arrangements

Landscapes

  • Engineering & Computer Science (AREA)
  • Computer Networks & Wireless Communication (AREA)
  • Signal Processing (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The invention discloses a data acquisition method and device. The method comprises: acquiring log information in real time; obtaining statistics-related raw data from the acquired log information according to a statistical rule, and writing the obtained raw data into a first message queue; collecting, from the first message queue, raw data corresponding to the current statistical period; and, when raw data is collected, obtaining the statistical result of a second time period from the currently collected raw data, the statistical result of a first time period, and the statistical rule. The invention can compute, at the moment log information is acquired, the statistical result of the current statistical period up to the current time, thereby achieving real-time periodic statistics.

Description

Data acquisition method and device
Technical field
This application relates to the technical field of data processing, and in particular to a data acquisition method and device.
Background
In work and daily life, it is often necessary to compute statistics over the data within a certain period (hereinafter, periodic statistics), that is, to compute a data value over a given span of time. For example, when a crawler is used to crawl data, the crawl success rate over a period of time needs to be computed.
In the prior art, however, periodic statistics can usually be computed from the data of a period only after that statistical period has ended. For example, when computing the crawl success rate, the success rate for a statistical period is obtained from the number of successful crawls and the total number of crawls in that period only after the period ends. Such periodic statistics have poor real-time performance and a certain delay, so data acquisition is not very effective.
Summary of the invention
In view of this, embodiments of the present application provide a data acquisition method and device that can solve the poor real-time performance of periodic statistics in the prior art.
A first aspect of the embodiments of the present application provides a data acquisition method, comprising:
acquiring log information in real time;
obtaining statistics-related raw data from the acquired log information according to a statistical rule, and writing the obtained raw data into a first message queue;
collecting, from the first message queue, raw data corresponding to the current statistical period;
when raw data corresponding to the current statistical period is collected, obtaining the statistical result of a second time period from the currently collected raw data, the statistical result of a first time period, and the statistical rule;
wherein the first time period runs from the start time of the current statistical period to the time corresponding to the previously collected raw data, and the second time period runs from the start time of the current statistical period to the time corresponding to the currently collected raw data; the statistical result of the first time period is obtained from the statistical rule and each piece of raw data whose corresponding time lies between the start time of the current statistical period and the time corresponding to the previously collected raw data.
Optionally, the first message queue is implemented based on Kafka.
Optionally, obtaining statistics-related raw data from the acquired log information according to the statistical rule specifically comprises:
when the acquired log information contains a field that matches the statistical rule, obtaining the raw data based on the field that matches the statistical rule.
Optionally, when there are multiple statistical periods, the raw data includes the data participating in the statistics and a statistical period identifier; obtaining the raw data based on the field that matches the statistical rule specifically comprises:
obtaining multiple pieces of raw data based on the field that matches the statistical rule and the statistical period identifier corresponding to each statistical period;
wherein the obtained pieces of raw data correspond one-to-one to the statistical periods, and each piece of raw data carries the statistical period identifier of its corresponding statistical period.
Optionally, after obtaining the statistical result of the second time period, the method further comprises:
writing the obtained statistical result into a second message queue;
obtaining the statistical result from the second message queue, and writing the obtained statistical result into a preset storage area.
Optionally, the second message queue is implemented based on Kafka.
A second aspect of the embodiments of the present application provides a data acquisition device, comprising:
an acquisition module, configured to acquire log information in real time;
a sorting module, configured to obtain statistics-related raw data from the log information acquired by the acquisition module, according to a statistical rule;
a first writing module, configured to write the raw data obtained by the sorting module into a first message queue;
a collection module, configured to collect, from the first message queue, raw data corresponding to the current statistical period;
a statistical module, configured to obtain the statistical result of a second time period from the current raw data collected by the collection module, the statistical result of a first time period, and the statistical rule;
wherein the first time period runs from the start time of the current statistical period to the time corresponding to the previously collected raw data, and the second time period runs from the start time of the current statistical period to the time corresponding to the current raw data; the statistical result of the first time period is obtained from the statistical rule and each piece of raw data whose corresponding time lies between the start time of the current statistical period and the time corresponding to the previously collected raw data.
Optionally, the device further comprises:
a second writing module, configured to obtain the statistical result written into a second message queue by the statistical module, and to write the obtained statistical result into a preset storage area.
A third aspect of the embodiments of the present application provides a computer-readable storage medium storing a computer program which, when executed by a processor, implements any one of the data acquisition methods provided by the first aspect of the embodiments of the present application.
A fourth aspect of the embodiments of the present application provides a server, comprising a processor and a memory;
the memory is configured to store program code and to transmit the program code to the processor;
the processor is configured to execute, according to instructions in the program code, any one of the data acquisition methods provided by the first aspect of the embodiments of the present application.
Compared with the prior art, the present application has at least the following advantages:
In the embodiments of the present application, log information is first acquired in real time; statistics-related raw data is then obtained from the acquired log information according to a statistical rule, and the obtained raw data is written into a first message queue. Next, raw data corresponding to the current statistical period is collected from the first message queue, and when such raw data is collected, the statistical result of the current statistical period at the current time is obtained from the collected raw data, the already known statistical result of the current statistical period, and the corresponding statistical rule. The embodiments of the present application can thus compute, at the moment log information is acquired, the statistical result of the current statistical period up to the current time, achieving real-time periodic statistics.
Brief description of the drawings
To describe the technical solutions in the embodiments of the present application or in the prior art more clearly, the drawings needed for describing the embodiments or the prior art are briefly introduced below. Obviously, the drawings described below show only some embodiments of the present application, and a person of ordinary skill in the art may derive other drawings from them without creative effort.
Fig. 1 is a schematic flowchart of a data acquisition method provided by an embodiment of the present application;
Fig. 2 is a schematic diagram of a statistical period in an embodiment of the present application;
Fig. 3 is a schematic flowchart of another data acquisition method provided by an embodiment of the present application;
Fig. 4 is a schematic structural diagram of a data acquisition device provided by an embodiment of the present application;
Fig. 5 is a schematic structural diagram of another data acquisition device provided by an embodiment of the present application.
Detailed description
To help those skilled in the art better understand the solutions of the present application, the technical solutions in the embodiments of the present application are described clearly and completely below with reference to the accompanying drawings. Obviously, the described embodiments are only some, not all, of the embodiments of the present application. All other embodiments obtained by a person of ordinary skill in the art from the embodiments of the present application without creative effort fall within the protection scope of the present application.
It should be understood that, in this application, "at least one (item)" means one or more, and "multiple" means two or more. "And/or" describes an association between associated objects and indicates that three relationships are possible; for example, "A and/or B" may mean that only A exists, only B exists, or both A and B exist, where A and B may each be singular or plural. The character "/" generally indicates an "or" relationship between the objects before and after it. "At least one of the following (items)" or a similar expression refers to any combination of those items, including any combination of a single item or multiple items. For example, at least one of a, b, or c may mean: a, b, c, "a and b", "a and c", "b and c", or "a and b and c", where each of a, b, and c may be single or multiple.
To achieve real-time periodic statistics, the embodiments of the present application provide a data acquisition method and device that use a message queue mechanism to achieve low-cost real-time periodic statistics. Statistics-related raw data is first obtained from the acquired log information and written into a first message queue. Raw data corresponding to the current statistical period is then obtained from the first message queue, and the periodic statistical result at the current time is obtained from that raw data and the statistical rule. This achieves real-time periodic statistics, reduces development and maintenance costs, makes business intervention more flexible, shortens the end-to-end latency of the data flow, makes the calculation logic real-time, spreads the calculation cost, and ultimately meets the business need to process big data in real time. In addition, because the acquisition thread writes the obtained raw data into the first message queue, the risk of data loss is also avoided.
Based on the above idea, and to make the above objects, features, and advantages of the present application clearer, specific embodiments of the present application are described in detail below with reference to the accompanying drawings.
Referring to Fig. 1, which is a schematic flowchart of a data acquisition method provided by an embodiment of the present application.
The data acquisition method provided by the embodiment of the present application comprises:
S101: Acquire log information in real time.
In the embodiments of the present application, log information may be acquired from at least one object, and the log information contains the data to be counted (that is, the data participating in the statistics). For example, when the crawl success rate is being computed, the acquired log information contains the total number of crawls and the number of successful crawls.
In practical applications, an acquisition thread may be deployed to acquire log information in real time. The acquisition thread may be implemented by a data collection client (for example, Filebeat).
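A real deployment would normally rely on a collection client such as Filebeat, as noted above. Purely to illustrate what acquiring log information in real time involves, the following Python sketch polls a text stream for newly appended lines. The function name `follow`, its parameters, and the idle-poll stop condition are assumptions made for this example and are not part of the application.

```python
import io
import time
from typing import Iterator, Optional, TextIO


def follow(stream: TextIO, poll_interval: float = 0.5,
           max_idle_polls: Optional[int] = None) -> Iterator[str]:
    """Yield lines as they are appended to `stream`.

    Hypothetical helper: stops after `max_idle_polls` consecutive empty
    reads, or keeps polling forever when `max_idle_polls` is None.
    """
    idle = 0
    while max_idle_polls is None or idle < max_idle_polls:
        line = stream.readline()
        if line:
            idle = 0
            yield line.rstrip("\n")
        else:
            idle += 1
            time.sleep(poll_interval)  # wait for the writer to append more


# In-memory stand-in for a log file that is still being written.
demo_log = io.StringIO('{"crawl_total": 1, "crawl_success": 1}\n')
for log_line in follow(demo_log, poll_interval=0.0, max_idle_polls=1):
    print(log_line)
```

With a real file, the same generator could be given an open file handle and left running in its own thread.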
S102: According to a statistical rule, obtain statistics-related raw data from the acquired log information, and write the obtained raw data into a first message queue.
In the embodiments of the present application, the data in the acquired log information is organized according to the statistical rule to obtain the statistics-related raw data. In a specific implementation, this organizing step may also be performed by the acquisition thread, which writes the resulting raw data into the first message queue, so that when periodic statistics are performed, the raw data corresponding to a statistical period can be collected from the first message queue and real-time periodic statistics can be computed.
In practical applications, the statistical rule for the statistics thread may be set according to the actual statistical requirements. A statistical rule may specifically include the identifiers of the statistics-related fields, the calculation rule, and so on. As an example, when computing the crawl success rate of a crawler system, the statistical rule may include the field identifiers for the total number of crawls and the number of successful crawls, together with the rule for calculating the crawl success rate. The data in the acquired log information can then be organized according to the field information in the statistical rule to obtain the statistics-related raw data, which is written into the first message queue.
It should be noted that, because the obtained raw data is written into the first message queue, a system failure does not cause data loss, which safeguards the accuracy and efficiency of the real-time periodic statistics. In practical applications, the first message queue may be implemented based on Kafka. Kafka can automatically record the data at each stage, which avoids data loss events.
In some possible implementations of the embodiments of the present application, step S102 may specifically include:
when the acquired log information contains a field that matches the statistical rule, obtaining the raw data based on the field that matches the statistical rule.
It can be understood that a field that matches the statistical rule is a field that corresponds to one of the statistics-related field identifiers, such as the field for the total number of crawls or the field for the number of successful crawls. In practical applications, any field identification and matching algorithm may be used to determine whether the acquired log information contains a field that matches the statistical rule; the algorithm is not limited here, and the options are not enumerated.
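Step S102 can be sketched as follows. This is a minimal illustration, not the applicant's implementation: it assumes JSON-formatted log lines, represents the statistical rule as a plain dictionary, matches fields by exact key name, and uses Python's in-process `queue.Queue` as a stand-in for the Kafka-based first message queue. The names `CRAWL_RULE`, `extract_raw_data`, and `organize_and_enqueue`, and the field names `crawl_total` and `crawl_success`, are hypothetical.

```python
import json
import queue
from typing import Optional

# Hypothetical statistical rule: the statistics-related field identifiers
# plus a label for the calculation that is applied later.
CRAWL_RULE = {
    "name": "crawl_success_rate",
    "fields": ("crawl_total", "crawl_success"),
    "calc": "crawl_success / crawl_total",
}

# Stand-in for the first message queue (Kafka in the described embodiment).
first_queue: queue.Queue = queue.Queue()


def extract_raw_data(log_line: str, rule: dict) -> Optional[dict]:
    """Return the rule's fields found in one JSON log line, or None."""
    try:
        entry = json.loads(log_line)
    except json.JSONDecodeError:
        return None  # not a structured log line
    if not isinstance(entry, dict):
        return None
    matched = {f: entry[f] for f in rule["fields"] if f in entry}
    if not matched:
        return None  # no field matches the statistical rule
    return {"rule": rule["name"], "data": matched}


def organize_and_enqueue(log_line: str, rule: dict) -> bool:
    """Write the raw data into the queue when the line matches the rule."""
    raw = extract_raw_data(log_line, rule)
    if raw is None:
        return False
    first_queue.put(raw)
    return True


organize_and_enqueue(
    '{"url": "https://example.com", "crawl_total": 5, "crawl_success": 4}',
    CRAWL_RULE,
)
```

Log lines without any matching field are simply skipped, so only statistics-related data reaches the queue.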
In some possible implementations of the embodiments of the present application, real-time statistics may be computed for the data of multiple statistical periods at once. When there are multiple statistical periods, corresponding raw data may be generated for each statistical period in order to meet the statistical requirements of the different periods. To identify which statistical period a piece of raw data belongs to, the raw data may specifically include the data participating in the statistics (such as the total number of crawls and the number of successful crawls) and a statistical period identifier. Step S102 may then specifically include:
obtaining multiple pieces of raw data based on the field that matches the statistical rule and the statistical period identifier corresponding to each statistical period.
In the embodiments of the present application, the statistical period identifier marks the statistical period to which the raw data corresponds and its time information within that period. For example, the identifier may indicate that the raw data is the m-th piece of raw data in the n-th statistical period. The obtained pieces of raw data correspond one-to-one to the statistical periods, and each piece carries the statistical period identifier of its corresponding period. The obtained pieces of raw data are written into the first message queue so that, when periodic statistics are performed, the raw data needed for the current statistics can be collected from the first message queue based on the statistical period identifier, and the data in each statistical period can be accurately counted in real time.
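The fan-out to several statistical periods might look like the sketch below. The identifier layout (period name, period start, and offset within the period) is only one possible encoding chosen for illustration; the application does not prescribe it. `PERIODS` and `fan_out` are hypothetical names, and timestamps are assumed to be integer seconds.

```python
from typing import Dict, List

# Hypothetical statistical periods: name -> length in seconds.
PERIODS: Dict[str, int] = {"minute": 60, "hour": 3600}


def fan_out(data: dict, timestamp: int, periods: Dict[str, int]) -> List[dict]:
    """Create one piece of raw data per statistical period.

    Each piece carries a period identifier made of the period name, the
    start of the period that contains `timestamp`, and the offset in it.
    """
    records = []
    for name, length in periods.items():
        start = timestamp - timestamp % length
        records.append({
            "period_id": {"period": name, "start": start,
                          "offset": timestamp - start},
            "data": dict(data),  # copy so the pieces stay independent
        })
    return records


pieces = fan_out({"crawl_total": 1, "crawl_success": 1}, 3725, PERIODS)
for piece in pieces:
    print(piece["period_id"])
```

A consumer can then select the pieces for the current statistical period by comparing the identifier's period name and start time.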
S103: Collect, from the first message queue, raw data corresponding to the current statistical period.
In the embodiments of the present application, the execution order of steps S101-S102 and step S103 is not limited; the two do not interfere with each other and can be executed independently. In practical applications, different threads may be used to execute steps S101-S102 and step S103.
S104: When raw data corresponding to the current statistical period is collected, obtain the statistical result of a second time period from the currently collected raw data, the statistical result of a first time period, and the statistical rule.
In the embodiments of the present application, the first time period runs from the start time of the current statistical period to the time corresponding to the previously collected raw data, and the second time period runs from the start time of the current statistical period to the time corresponding to the currently collected raw data. The statistical result of the first time period is obtained from the statistical rule and each piece of raw data whose corresponding time lies between the start time of the current statistical period and the time corresponding to the previously collected raw data. That is, the statistical result of the current statistical period up to the current time (the statistical result of the second time period) is obtained by applying the statistical rule to the previous statistical result of the current statistical period (the statistical result of the first time period) together with the currently collected raw data. It can be understood that, when the currently collected raw data is the first piece of raw data collected in its statistical period, the statistical result can be obtained directly from that raw data and the corresponding statistical rule.
Take the data axis shown in Fig. 2 as an example. The current statistical period runs from t2 to t3, and raw data is collected at times T1, T2, and T3. First, the statistical result for the period from t2 to T1 is obtained from the raw data corresponding to T1 and the statistical rule. Then, when the raw data corresponding to T2 is collected, the statistical result for the period from t2 to T2 is obtained from the statistical result for t2 to T1, the raw data corresponding to T2, and the statistical rule. This continues until all raw data corresponding to t2 to t3 has been processed, which yields the final statistical result for the whole statistical period.
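The incremental update in the Fig. 2 example can be sketched as follows for the crawl success rate. The sketch assumes that each piece of raw data carries the crawl counts observed since the previous piece, which the application does not state; the class name `PeriodState`, the field names, and the numbers are illustrative only.

```python
from dataclasses import dataclass


@dataclass
class PeriodState:
    """Running totals that represent the result of the first time period."""
    crawl_total: int = 0
    crawl_success: int = 0

    def update(self, raw: dict) -> float:
        """Fold in one piece of raw data; return the success rate so far."""
        self.crawl_total += raw["crawl_total"]
        self.crawl_success += raw["crawl_success"]
        if self.crawl_total == 0:
            return 0.0  # nothing crawled yet in this period
        return self.crawl_success / self.crawl_total


state = PeriodState()
# Raw data collected at T1, T2 and T3 (illustrative numbers only).
rate_t1 = state.update({"crawl_total": 10, "crawl_success": 8})   # t2 to T1
rate_t2 = state.update({"crawl_total": 10, "crawl_success": 10})  # t2 to T2
rate_t3 = state.update({"crawl_total": 20, "crawl_success": 12})  # t2 to T3
print(rate_t1, rate_t2, rate_t3)
```

With these numbers the running success rate is 0.8 after T1, 0.9 after T2, and 0.75 after T3. Each update touches only the stored totals and the new piece, so no earlier raw data has to be re-read.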
In practical applications, steps S103-S104 may be performed by a statistics thread that is separate from the acquisition thread. The calculation rule used by the statistics thread may be set according to the actual statistical requirements; for example, the statistics thread may support atomic operations such as maximum (max), minimum (min), count (count), sum (sum), deduplication (distinct), query (where), addition (add), subtraction (subtract), multiplication (multiply), and division (divide). In a specific implementation, the statistics thread uses the Kafka stream computing framework. Kafka is a lightweight computing framework, so real-time periodic statistics can be achieved without spending heavily to deploy a dedicated heavyweight computing cluster, which reduces development and maintenance costs.
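Several of the atomic operations listed above can be written as fold steps that combine the previous result with one new value, which is what makes them suitable for incremental periodic statistics. The sketch below is plain Python rather than Kafka code, covers only five of the operations, and uses the hypothetical names `ATOMIC_OPS` and `fold`.

```python
from typing import Any, Callable, Dict, Iterable

# Each operation is a fold step: (previous result, new value) -> new result.
# The first value in a period seeds the result, so `prev` may be None.
ATOMIC_OPS: Dict[str, Callable[[Any, Any], Any]] = {
    "max": lambda prev, x: x if prev is None else max(prev, x),
    "min": lambda prev, x: x if prev is None else min(prev, x),
    "count": lambda prev, x: 1 if prev is None else prev + 1,
    "sum": lambda prev, x: x if prev is None else prev + x,
    "distinct": lambda prev, x: {x} if prev is None else prev | {x},
}


def fold(op: str, values: Iterable[Any]) -> Any:
    """Apply one atomic operation to the values, one value at a time."""
    result = None
    for value in values:
        result = ATOMIC_OPS[op](result, value)
    return result


latencies = [3, 1, 3, 7]
summary = {op: fold(op, latencies) for op in ATOMIC_OPS}
print(summary)
```

Derived statistics such as a success rate can then be composed from these results, for example by dividing one sum by another.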
It should also be noted that, in practical applications, the acquisition thread and the statistics thread may be added to different projects in the form of jar packages. This allows business logic and calculation logic to be combined in a lighter, more flexible, and more convenient way, without spending heavily to deploy a dedicated heavyweight computing cluster, while still meeting real-time calculation requirements.
In some possible implementations of the present application, to avoid losing statistical results, as shown in Fig. 3, the method may further include the following after step S104:
S301: Write the obtained statistical result into a second message queue.
S302: Obtain the statistical result from the second message queue, and write the obtained statistical result into a preset storage area.
In the embodiments of the present application, the preset storage area may be a message queue, a database, or the like, and is not limited here. In practical applications, front-end charting and UI presentation tools may be used to display the periodic results intuitively, based on the statistical results written into the preset storage area.
In a specific implementation, step S301 may also be performed by the statistics thread: after the statistics thread obtains the statistical result of the current time period (for example, the second time period), it writes the result into the second message queue. Step S302 may be performed by a writing thread that is separate from both the acquisition thread and the statistics thread. The writing thread independently obtains statistical results from the second message queue and writes them into the preset storage area, so that the statistical results are preserved long term and can be displayed intuitively.
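The division of work between a statistics thread and a writing thread can be sketched with two in-process queues. This is an illustration under stated assumptions: `queue.Queue` stands in for both Kafka-based message queues, a Python list stands in for the preset storage area, and the `STOP` sentinel and function names are hypothetical.

```python
import queue
import threading

first_queue: queue.Queue = queue.Queue()   # stand-in for the first message queue
second_queue: queue.Queue = queue.Queue()  # stand-in for the second message queue
storage: list = []                         # stand-in for the preset storage area
STOP = object()                            # sentinel that ends each thread


def statistics_thread() -> None:
    """S103, S104, S301: fold raw data and publish each running result."""
    total = success = 0
    while (raw := first_queue.get()) is not STOP:
        total += raw["crawl_total"]
        success += raw["crawl_success"]
        if total:  # skip publishing until at least one crawl was counted
            second_queue.put({"success_rate": success / total})
    second_queue.put(STOP)  # tell the writing thread to finish


def writing_thread() -> None:
    """S302: move statistical results from the second queue into storage."""
    while (result := second_queue.get()) is not STOP:
        storage.append(result)


workers = [threading.Thread(target=statistics_thread),
           threading.Thread(target=writing_thread)]
for worker in workers:
    worker.start()
for raw in ({"crawl_total": 4, "crawl_success": 2},
            {"crawl_total": 4, "crawl_success": 4}):
    first_queue.put(raw)
first_queue.put(STOP)
for worker in workers:
    worker.join()
print(storage)
```

Because the two threads share only the queues, either one can be restarted or replaced without changing the other.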
In practical applications, the second message queue may also be implemented based on Kafka. It is similar to the first message queue; see the description of the first message queue, which is not repeated here.
It should be noted that, to describe the statistical result at the current time, the latest statistical result in a statistical period needs to overwrite the previous one. In practical applications, this may be done by maintaining a timer that periodically refreshes the statistical result in memory, or, after the statistical results are written to persistent storage, by using a composite primary key to quickly look up the statistical result to be overwritten and then performing an update operation.
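The persistent-storage variant can be illustrated with SQLite from the Python standard library. The table layout and the choice of (rule name, period start) as the composite primary key are assumptions made for this sketch. It also uses `INSERT OR REPLACE`, which folds the lookup and the update into one statement, rather than the separate query and update described above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE period_stats (
        rule_name    TEXT NOT NULL,
        period_start INTEGER NOT NULL,
        value        REAL NOT NULL,
        PRIMARY KEY (rule_name, period_start)  -- composite primary key
    )
    """
)


def save_latest(rule_name: str, period_start: int, value: float) -> None:
    """Insert the result, or overwrite the row that has the same key."""
    with conn:  # commits on success, rolls back on error
        conn.execute(
            "INSERT OR REPLACE INTO period_stats VALUES (?, ?, ?)",
            (rule_name, period_start, value),
        )


save_latest("crawl_success_rate", 3600, 0.80)
save_latest("crawl_success_rate", 3600, 0.90)  # same period: overwrites
save_latest("crawl_success_rate", 7200, 0.50)  # next period: new row
rows = conn.execute(
    "SELECT period_start, value FROM period_stats ORDER BY period_start"
).fetchall()
print(rows)
```

Only one row per rule and period remains, so a reader always sees the latest result for each statistical period.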
In the embodiments of the present application, log information is first acquired in real time; statistics-related raw data is then obtained from the acquired log information according to a statistical rule, and the obtained raw data is written into a first message queue. Next, raw data corresponding to the current statistical period is collected from the first message queue, and when such raw data is collected, the statistical result of the current statistical period at the current time is obtained from the collected raw data, the already known statistical result of the current statistical period, and the corresponding statistical rule. The embodiments of the present application can thus compute, at the moment log information is acquired, the statistical result of the current statistical period up to the current time, achieving real-time periodic statistics.
Based on the data acquisition method provided by the above embodiments, an embodiment of the present application also provides a data acquisition device.
Referring to Fig. 4, which is a schematic structural diagram of a data acquisition device provided by an embodiment of the present application.
The data acquisition device provided by the embodiment of the present application comprises:
an acquisition module 100, configured to acquire log information in real time;
a sorting module 200, configured to obtain statistics-related raw data from the log information acquired by the acquisition module 100, according to a statistical rule;
a first writing module 300, configured to write the raw data obtained by the sorting module 200 into a first message queue;
a collection module 400, configured to collect, from the first message queue, raw data corresponding to the current statistical period;
a statistical module 500, configured to obtain the statistical result of a second time period from the current raw data collected by the collection module 400, the statistical result of a first time period, and the statistical rule;
wherein the first time period runs from the start time of the current statistical period to the time corresponding to the previously collected raw data, and the second time period runs from the start time of the current statistical period to the time corresponding to the current raw data; the statistical result of the first time period is obtained from the statistical rule and each piece of raw data whose corresponding time lies between the start time of the current statistical period and the time corresponding to the previously collected raw data.
In a specific implementation, the first message queue may be implemented based on Kafka.
In some possible implementations of the embodiments of the present application, the acquisition module 100 may specifically be configured to: when the acquired log information contains a field that matches the statistical rule, obtain the raw data based on the field that matches the statistical rule.
In some possible implementations of the embodiments of the present application, when there are multiple statistical periods, the raw data includes the data participating in the statistics and a statistical period identifier; the acquisition module 100 may specifically be configured to:
obtain multiple pieces of raw data based on the field that matches the statistical rule and the statistical period identifier corresponding to each statistical period;
wherein the obtained pieces of raw data correspond one-to-one to the statistical periods, and each piece of raw data carries the statistical period identifier of its corresponding statistical period.
In some possible implementations of the embodiments of the present application, as shown in Fig. 5, the data acquisition device may further include:
a second writing module 600, configured to obtain the statistical result written into a second message queue by the statistical module 500, and to write the obtained statistical result into a preset storage area.
Optionally, the second message queue may be implemented based on Kafka.
In a specific implementation, the data acquisition device may be added to different projects in the form of a jar package. This allows business logic and calculation logic to be combined in a lighter, more flexible, and more convenient way, without spending heavily to deploy a dedicated heavyweight computing cluster, while still meeting real-time calculation requirements.
In the embodiment of the present application, acquisition log information in real time first, then using statistical rules as foundation, based on collecting Log information obtain the initial data of statistical correlation, first message queue is written into the initial data of acquisition.Then, from first In message queue, initial data corresponding with the current statistic period is collected, and be collected into original corresponding with the current statistic period When beginning data, according to the initial data being collected into, statistical result and corresponding statistical rules known to the current statistic period, obtain Statistical result of the current statistic period at current time.The embodiment of the present application can be in the moment for collecting log information, in real time The statistical result in the current statistic period by the end of current time is counted, the period statistics of real-time is realized.
The data capture method and device provided based on the above embodiment, the embodiment of the present application also provides a kind of computers Readable storage medium storing program for executing is stored thereon with computer program, when the computer program is executed by processor, realizes such as above-mentioned implementation Example provide data capture method in any one.
Based on the data capture method and device provided by the above embodiments, the embodiments of the present application further provide a server, comprising a processor and a memory;
The memory is configured to store program code and to transmit the program code to the processor;
The processor is configured to execute, according to the instructions in the program code, any one of the data capture methods provided by the above embodiments.
It should be noted that the embodiments in this specification are described in a progressive manner: each embodiment focuses on its differences from the other embodiments, and the same or similar parts of the embodiments may be referred to one another. Because the system disclosed in the embodiments corresponds to the method disclosed in the embodiments, its description is relatively brief; for the relevant details, refer to the description of the method.
It should also be noted that, herein, relational terms such as "first" and "second" are used only to distinguish one entity or operation from another, and do not necessarily require or imply any actual relationship or order between these entities or operations. Moreover, the terms "include", "comprise", or any other variant thereof are intended to cover non-exclusive inclusion, so that a process, method, article, or device that includes a series of elements includes not only those elements but also other elements that are not explicitly listed, or elements inherent to such a process, method, article, or device. Without further limitation, an element defined by the phrase "including a ..." does not exclude the existence of other identical elements in the process, method, article, or device that includes the element.
The steps of the method or algorithm described in connection with the embodiments disclosed herein may be implemented directly in hardware, in a software module executed by a processor, or in a combination of the two. The software module may reside in random access memory (RAM), internal memory, read-only memory (ROM), electrically programmable ROM, electrically erasable programmable ROM, a register, a hard disk, a removable disk, a CD-ROM, or any other form of storage medium known in the technical field.
The above are only preferred embodiments of the present application and do not limit the application in any form. Although the application has been disclosed above by way of preferred embodiments, these are not intended to limit it. Any person skilled in the art may, without departing from the scope of the technical solution of the present application, use the methods and technical content disclosed above to make many possible changes and modifications to the technical solution, or modify it into equivalent embodiments. Therefore, any simple modification, equivalent change, or refinement made to the above embodiments according to the technical essence of the present application, without departing from the content of its technical solution, still falls within the scope of protection of the technical solution of the present application.

Claims (10)

1. A data capture method, characterized by comprising:
collecting log information in real time;
obtaining, based on a statistical rule, initial data relevant to the statistics from the collected log information, and writing the obtained initial data into a first message queue;
collecting, from the first message queue, initial data corresponding to a current statistical period;
when initial data corresponding to the current statistical period is collected, obtaining a statistical result of a second time period according to the currently collected initial data, a statistical result of a first time period, and the statistical rule;
wherein the first time period covers the span from the start time of the current statistical period to the moment corresponding to the previously collected initial data, and the second time period covers the span from the start time of the current statistical period to the moment corresponding to the current initial data; the statistical result of the first time period is obtained based on the statistical rule and each piece of initial data corresponding to the span between the start time of the current statistical period and the moment corresponding to the previously collected initial data.
2. The method according to claim 1, characterized in that the first message queue is implemented based on Kafka.
3. The method according to claim 1 or 2, characterized in that obtaining, based on the statistical rule, the initial data relevant to the statistics from the collected log information specifically comprises:
when a field matching the statistical rule exists in the collected log information, obtaining the initial data based on the field matching the statistical rule.
4. The method according to claim 3, characterized in that, when there are multiple statistical periods, the initial data includes the data participating in the statistics and a statistical period identifier; and obtaining the initial data based on the field matching the statistical rule specifically comprises:
obtaining multiple pieces of initial data based on the field matching the statistical rule and the statistical period identifier corresponding to each statistical period;
wherein the multiple pieces of initial data obtained correspond one-to-one to the statistical periods, and each piece of initial data carries the statistical period identifier of its corresponding statistical period.
5. The method according to claim 1 or 2, characterized in that, after obtaining the statistical result of the second time period, the method further comprises:
writing the obtained statistical result into a second message queue;
obtaining the statistical result from the second message queue, and writing the obtained statistical result into a preset storage region.
6. The method according to claim 5, characterized in that the second message queue is implemented based on Kafka.
7. A data capture device, characterized by comprising:
an acquisition module, configured to collect log information in real time;
a sorting module, configured to obtain, based on a statistical rule, initial data relevant to the statistics from the log information collected by the acquisition module;
a first writing module, configured to write the initial data obtained by the sorting module into a first message queue;
a collection module, configured to collect, from the first message queue, initial data corresponding to a current statistical period;
a statistical module, configured to obtain a statistical result of a second time period according to the current initial data collected by the collection module, a statistical result of a first time period, and the statistical rule;
wherein the first time period covers the span from the start time of the current statistical period to the moment corresponding to the previously collected initial data, and the second time period covers the span from the start time of the current statistical period to the moment corresponding to the current initial data; the statistical result of the first time period is obtained based on the statistical rule and each piece of initial data corresponding to the span between the start time of the current statistical period and the moment corresponding to the previously collected initial data.
8. The device according to claim 7, characterized by further comprising:
a second writing module, configured to obtain the statistical result written into a second message queue by the statistical module, and to write the obtained statistical result into a preset storage region.
9. A computer-readable storage medium, characterized in that a computer program is stored thereon, and when the computer program is executed by a processor, the data capture method according to any one of claims 1 to 6 is implemented.
10. A server, characterized by comprising a processor and a memory;
the memory is configured to store program code and to transmit the program code to the processor;
the processor is configured to execute, according to the instructions in the program code, the data capture method according to any one of claims 1 to 6.
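
For illustration only, the field matching of claim 3 and the per-period initial data of claim 4 can be sketched as below. The rule format (`RULE` with a `field` name and a list of `periods`), the function `sort_log`, and the sample log entries are all assumptions made for this sketch; the claims do not prescribe any particular data layout.

```python
from typing import Dict, List

# Hypothetical statistical rule: which log field takes part in the statistics,
# and which statistical periods the statistics are kept for.
RULE = {"field": "amount", "periods": ["minute", "hour"]}


def sort_log(log: Dict[str, str], rule: Dict[str, object] = RULE) -> List[Dict[str, object]]:
    """Return one piece of initial data per statistical period, or [] if no field matches."""
    field_name = rule["field"]
    if field_name not in log:
        return []  # no field matching the rule: the log entry is not used
    value = float(log[field_name])
    # One record per period, each carrying its statistical period identifier.
    return [{"period_id": period, "value": value} for period in rule["periods"]]


matched = sort_log({"user": "u1", "amount": "12.5"})
skipped = sort_log({"user": "u2", "action": "login"})
print(matched)
print(skipped)
```

Each record produced this way can be written to the first message queue independently, so the statistics for different periods can be updated separately.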
CN201910731693.4A 2019-08-08 2019-08-08 A kind of data capture method and device Pending CN110324211A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201910731693.4A CN110324211A (en) 2019-08-08 2019-08-08 A kind of data capture method and device

Publications (1)

Publication Number Publication Date
CN110324211A true CN110324211A (en) 2019-10-11

Family

ID=68125773

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201910731693.4A Pending CN110324211A (en) 2019-08-08 2019-08-08 A kind of data capture method and device

Country Status (1)

Country Link
CN (1) CN110324211A (en)

Citations (10)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
EP2369438A1 (en) * 2010-02-24 2011-09-28 Fujitsu Semiconductor Limited Calibration method of a real time clock signal
CN102594581A (en) * 2011-01-12 2012-07-18 鼎桥通信技术有限公司 Method for recording log data
CN103944162A (en) * 2014-04-15 2014-07-23 国网辽宁省电力有限公司沈阳供电公司 Power distribution network fault recovery method based on real-time contingency sets
CN106533735A (en) * 2016-10-11 2017-03-22 北京奇虎科技有限公司 Mobile terminal use behavior monitoring method and device, server and system
CN106656660A (en) * 2016-11-30 2017-05-10 努比亚技术有限公司 Traffic monitoring device and method
CN108769167A (en) * 2018-05-17 2018-11-06 北京奇艺世纪科技有限公司 A kind of the push distribution method and device of business datum
CN108989463A (en) * 2018-08-27 2018-12-11 浙江易享节能技术服务股份有限公司 A kind of data processing method and device
CN109587075A (en) * 2018-11-27 2019-04-05 联想(北京)有限公司 A kind of method for processing business, device, equipment and storage medium
CN109639396A (en) * 2018-12-19 2019-04-16 惠科股份有限公司 Transmission method, device and the computer readable storage medium of data
CN109753368A (en) * 2018-12-20 2019-05-14 清华大学 A kind of real time data sending method and system

Cited By (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111881686A (en) * 2020-07-20 2020-11-03 杭州安恒信息技术股份有限公司 Detection method and device for newly appeared entity, electronic device and storage medium
CN112084147A (en) * 2020-09-10 2020-12-15 珠海美佳音科技有限公司 Data storage method, data acquisition recorder and electronic equipment
CN112764947A (en) * 2021-01-15 2021-05-07 百果园技术(新加坡)有限公司 Message data pulling method, device, equipment and storage medium
CN112764947B (en) * 2021-01-15 2023-12-26 百果园技术(新加坡)有限公司 Message data pulling method, device, equipment and storage medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication

Application publication date: 20191011