CN103401934A - Method and system for acquiring log data

Method and system for acquiring log data

Info

Publication number
CN103401934A
Authority
CN
China
Prior art keywords
kafka
log
log data
data
flume
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN2013103404125A
Other languages
Chinese (zh)
Inventor
姚仁捷
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Guangzhou Vipshop Information And Technology Co Ltd
Original Assignee
Guangzhou Vipshop Information And Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Guangzhou Vipshop Information And Technology Co Ltd filed Critical Guangzhou Vipshop Information And Technology Co Ltd
Priority to CN2013103404125A priority Critical patent/CN103401934A/en
Publication of CN103401934A publication Critical patent/CN103401934A/en
Pending legal-status Critical Current

Landscapes

  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The invention discloses a method and a system for acquiring log data. The method comprises the following steps: a first Flume acquires log data from an application server; the first Flume transmits the acquired log data to Kafka, and Kafka converts the received log data into a Kafka message queue. According to the method and the system for acquiring log data, the log data in the application server are transmitted to Kafka via the first Flume and converted into a Kafka message queue by Kafka. When acquiring log data from Kafka, a user only needs to connect to Kafka, so that no complex restart and plug-in operations are needed, and the flexibility of log-data acquisition is enhanced.

Description

Method and system for acquiring log data
Technical field
The present invention relates to the field of data communication technology, and in particular to a method and system for acquiring log data.
Background
With the development of e-commerce technology, the load carried by the network's back-end servers keeps increasing, and the volume of data to be processed is growing geometrically; accurately collecting, transmitting, and computing massive logs in real time has accordingly become an urgent demand in e-commerce. The prior art mainly uses the flume-ng technology involved when Twitter collects logs. Flume is a distributed, reliable, high-performance tool for collecting, aggregating, and transmitting large amounts of log data from different data sources to a central data store. Flume has three important concepts, namely source, channel, and sink; these three logical components make up a Flume agent. The source defines where the data comes from (a file, for example), the sink defines where the data goes, and the channel is the conduit between the source and the sink. Source, channel, and sink all scale horizontally and can be adjusted according to performance requirements.
However, Flume is a Java process that loads the various platform-independent JAR (Java Archive) files in its lib directory only at startup. If a program needs to read information from Flume, a Flume plug-in must be written (in Java) and Flume must be restarted before the plug-in takes effect; the operation is cumbersome and inflexible. Flume also places too much emphasis on message reliability, so its throughput is low, which makes it inconvenient for users to quickly acquire log data from Flume.
Summary of the invention
Based on this, it is necessary to provide a method and system for acquiring log data that address the above problems of Flume's inflexible log-data collection and low throughput.
A method for acquiring log data comprises the following steps:
a first Flume acquires log data from an application server;
the first Flume sends the acquired log data to Kafka, and Kafka converts the received log data into a Kafka message queue.
A system for acquiring log data comprises an application server, a first Flume, and Kafka, wherein:
the first Flume is configured to acquire log data from the application server and send the acquired log data to Kafka;
Kafka is configured to convert the received log data into a Kafka message queue.
In the above method and system for acquiring log data, the log data in the application server are sent to Kafka through the first Flume, and Kafka converts the log data into a Kafka message queue. When a user acquires log data from Kafka, the user only needs to connect to Kafka; no cumbersome restart and plug-in operations are required, which improves the flexibility of log-data acquisition.
Brief description of the drawings
Fig. 1 is a schematic flowchart of the first embodiment of the method for acquiring log data according to the present invention;
Fig. 2 is a schematic flowchart of the second embodiment of the method for acquiring log data according to the present invention;
Fig. 3 is a schematic flowchart of the third embodiment of the method for acquiring log data according to the present invention;
Fig. 4 is a schematic structural diagram of the first embodiment of the system for acquiring log data according to the present invention;
Fig. 5 is a schematic structural diagram of the second embodiment of the system for acquiring log data according to the present invention;
Fig. 6 is a schematic structural diagram of the third embodiment of the system for acquiring log data according to the present invention.
Detailed description
Please refer to Fig. 1, which is a schematic flowchart of the first embodiment of the method for acquiring log data according to the present invention.
The method for acquiring log data described in this embodiment comprises the following steps:
Step 101: a first Flume acquires log data from an application server.
Step 102: the first Flume sends the acquired log data to Kafka, and Kafka converts the received log data into a Kafka message queue.
With the method for acquiring log data described in this embodiment, the log data in the application server are sent to Kafka through the first Flume, and Kafka converts the log data into a Kafka message queue. When a user acquires log data from Kafka, the user only needs to connect to Kafka; no cumbersome restart and plug-in operations are required, which improves the flexibility of log-data acquisition.
For step 101, the first Flume preferably captures log data from the application server through its three logical components: source, channel, and sink. The number and types of application servers can be set in advance according to the user's needs and user types.
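As an illustration of the source, channel, and sink arrangement described above, a minimal Flume agent configuration might look like the following sketch; the agent name, component names, file path, and sink class are hypothetical and are not taken from the patent:

```properties
# Hypothetical agent: tail an application log file (source),
# buffer events in memory (channel), and hand them to a Kafka sink.
agent1.sources = appLogSource
agent1.channels = memChannel
agent1.sinks = kafkaSink

agent1.sources.appLogSource.type = exec
agent1.sources.appLogSource.command = tail -F /var/log/app/app.log
agent1.sources.appLogSource.channels = memChannel

agent1.channels.memChannel.type = memory
agent1.channels.memChannel.capacity = 10000

agent1.sinks.kafkaSink.type = org.example.flume.KafkaSink
agent1.sinks.kafkaSink.channel = memChannel
```

Each of the three components can be tuned or replaced independently, which is the horizontal extensibility the background section mentions.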
For step 102, Kafka is a high-throughput distributed publish-subscribe messaging system. First, the operating-system file cache on which Kafka relies is sufficiently advanced and powerful; as long as writes are not random, sequential read and write performance is very efficient. Kafka's data can only be appended sequentially, and its deletion policy is to delete data only after it has accumulated to a certain size or has exceeded a certain age. Another unique characteristic of Kafka is that consumer state is kept on the client rather than on the MQ server, so the server does not need to record the delivery progress of messages; each client knows for itself where it should read the next message from, and message delivery adopts a client-pull model, which greatly lightens the server's burden. Kafka also emphasizes reducing the serialization and copying overhead of data: it can group messages into message sets and store and send them in batches.
Preferably, Kafka has the following characteristics:
1. It provides message persistence through an O(1) disk data structure, which remains stable over long periods even with TB-scale storage of real-time message data.
2. High throughput: even on very ordinary hardware, Kafka can support hundreds of thousands of messages per second.
3. It supports partitioning messages across the Kafka servers and the consumer-machine cluster.
4. It supports parallel data loading into Hadoop.
In one embodiment, the step in which the first Flume sends the acquired log data to Kafka comprises the following steps:
Step 1021: the first Flume establishes a data transmission channel to Kafka through a first Github system, and performs parameter configuration with Kafka as the data receiving end.
Step 1022: the first Flume pre-processes each piece of acquired log data and sends it to Kafka through the established data transmission channel.
Step 1023: when the log data received by Kafka reaches a preset amount, Kafka packs that amount of log data, stores it into Kafka's storage area, and converts it into the Kafka message queue.
In this embodiment, Github can host all kinds of Git repositories and provides a web interface; unlike other services such as SourceForge or Google Code, Github's unique feature is how simple it makes forking a branch from another project. Git is a distributed version control system, originally written by Linus Torvalds for managing the Linux kernel code.
Preferably, the first Flume and Kafka transmit log data through a Flume plug-in, flume-kafka (the data transmission channel), which is hosted in the first Github system. flume-kafka supports both pulling log data from Kafka and pushing log data to Kafka.
Further, before transmitting log data, the first Flume first defines Kafka as the data source in a code snippet, and presets the pre-processing, configuration, and shutdown of each piece of log data through the process, configure, and stop routines of the plug-in. The specific operation code is as follows:
(The code listing appears only as an image, Figure BDA00003629304900041, in the original publication.)
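Since that listing is not reproduced in the source, the following self-contained Java sketch only models the process, configure, and stop lifecycle the paragraph describes; the class and method signatures are simplified stand-ins invented for illustration, not the actual Flume plug-in API:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Simplified stand-in for a Flume sink: configure() reads parameters,
// process() handles one piece of log data, stop() shuts the sink down.
class SimpleLogSink {
    private String topic;
    private boolean running;
    private final List<String> delivered = new ArrayList<>();

    void configure(Map<String, String> params) {
        // Parameter configuration with Kafka as the data receiving end.
        this.topic = params.getOrDefault("topic", "logs");
        this.running = true;
    }

    void process(String logLine) {
        if (!running) throw new IllegalStateException("sink stopped");
        // Pre-process (here: trim) and "deliver" the log line.
        delivered.add(topic + ":" + logLine.trim());
    }

    void stop() {
        running = false;
    }

    List<String> delivered() {
        return delivered;
    }
}
```

Calling process() after stop() fails, mirroring the point that a reconfigured component must be brought up again before use.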
Preferably, with Kafka as the data source, the code by which the first Flume connects to Kafka is as follows:
(The connection code appears only as an image, Figure BDA00003629304900042, in the original publication.)
In the above code, props.put sets the attributes of the connection to Kafka. Each attribute is described below:
groupid: the name of the connection (the consumer group);
autocommit.enable: automatically informs Kafka which log message has currently been consumed;
autooffset.reset: automatically acquires the latest log messages;
socket.buffersize: the buffer size used for socket communication.
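Based on the four attributes just listed, the missing props.put listing presumably resembled the sketch below; the concrete values are illustrative guesses, and the property keys simply follow the names given in the text:

```java
import java.util.Properties;

class KafkaConsumerProps {
    // Build the connection attributes described in the text:
    // groupid, autocommit.enable, autooffset.reset, socket.buffersize.
    static Properties build() {
        Properties props = new Properties();
        props.put("groupid", "flume-log-consumer");   // name of the connection
        props.put("autocommit.enable", "true");       // report the consumed position automatically
        props.put("autooffset.reset", "largest");     // start from the latest log messages
        props.put("socket.buffersize", "1048576");    // socket buffer size in bytes
        return props;
    }
}
```

With these four properties defined, a consumer would hand the Properties object to the Kafka client library to open the connection.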
With these attributes defined, the first Flume and Kafka finally establish a connection. Next, we explain how Kafka receives data after the first Flume has connected to it. We define the number of messages in one batch. A so-called batch means that Kafka packs a certain amount of data and sends it to the data destination in one go, rather than forwarding data to Kafka's storage area once for every piece of data received from the first Flume. Sending in batches saves network transmission overhead.
Similarly, there is corresponding code for connecting to Kafka with Kafka as the data destination (that listing is likewise omitted from this text).
Connecting and sending log data in this case differs from the case with Kafka as the data source: with Kafka as the data destination, the batching of log data is controlled by Kafka itself rather than by the first Flume. According to batch.size in that code, Kafka takes a certain number of log messages and sends them as one batch.
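As a self-contained illustration of the batch.size behaviour described above (the class and its methods are invented toy names, not Kafka's API), messages can be accumulated and emitted as one batch once the configured size is reached:

```java
import java.util.ArrayList;
import java.util.List;

// Toy model of batch.size: accumulate log messages and emit them
// as one batch once the configured size is reached.
class BatchingSender {
    private final int batchSize;
    private final List<String> pending = new ArrayList<>();
    private final List<List<String>> sentBatches = new ArrayList<>();

    BatchingSender(int batchSize) {
        this.batchSize = batchSize;
    }

    void send(String message) {
        pending.add(message);
        if (pending.size() >= batchSize) {
            flush();
        }
    }

    void flush() {
        if (!pending.isEmpty()) {
            // One network transmission for the whole batch, saving overhead.
            sentBatches.add(new ArrayList<>(pending));
            pending.clear();
        }
    }

    List<List<String>> sentBatches() {
        return sentBatches;
    }
}
```

With batchSize = 3, seven messages produce two full batches plus one pending message that a final flush() sends, so the network is touched three times instead of seven.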
In another embodiment, after the step of Kafka converting the received log data into a Kafka message queue, the method further comprises the following step:
acquiring Kafka's operating data through the JMX monitoring interface provided by Kafka.
Compared with the first Flume, Kafka has more monitoring data available, which makes it convenient to monitor the health of the whole system.
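The patent does not name Kafka's JMX MBeans, but the general mechanism of reading operating data over JMX can be sketched with the JDK's platform MBean server; here the standard java.lang:type=Runtime bean stands in for a Kafka metric bean:

```java
import java.lang.management.ManagementFactory;
import javax.management.MBeanServer;
import javax.management.ObjectName;

class JmxProbe {
    // Read one attribute from an MBean, the same mechanism a monitor
    // would use against Kafka's JMX endpoint.
    static Object readAttribute(String beanName, String attribute) {
        try {
            MBeanServer server = ManagementFactory.getPlatformMBeanServer();
            return server.getAttribute(new ObjectName(beanName), attribute);
        } catch (Exception e) {
            throw new RuntimeException(e);
        }
    }
}
```

A monitoring process would connect to Kafka's remote JMX port in the same way and poll such attributes periodically to judge the health of the system.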
Please refer to Fig. 2, which is a schematic flowchart of the second embodiment of the method for acquiring log data according to the present invention.
The method for acquiring log data described in this embodiment differs from the first embodiment in that, after the step of Kafka converting the received log data into a Kafka message queue, it further comprises the following steps:
Step 201: a Storm real-time computation cluster establishes a data transmission channel to Kafka through a Storm system.
Step 202: the Storm real-time computation cluster acquires the needed log data from the Kafka message queue through the established data channel.
For step 201, Storm provides a set of general primitives for distributed real-time computation and can be used in "stream processing" to process messages and update databases in real time. Storm is a way of managing queues and worker clusters. Storm can also be used for "continuous computation", running continuous queries over data streams and outputting results to users as streams while the computation runs, and for "distributed RPC", running expensive computations in parallel.
For step 202, the Storm real-time computation cluster can connect to Kafka and acquire log messages using the Java methods of storm-contrib within Storm. Systems other than the Storm real-time computation cluster can also acquire log messages (one log entry per line) from Kafka acting as message-oriented middleware.
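The storm-contrib connection code is not shown in the patent; the pull pattern it implements, in which a consumer repeatedly takes the next log lines from the message queue, can be modelled with a plain in-memory queue (all names here are invented for illustration):

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;

// Toy stand-in for a Kafka message queue and a consumer that pulls
// log lines from it, as a Storm spout (or any other client) would.
class LogQueue {
    private final Deque<String> queue = new ArrayDeque<>();

    void publish(String logLine) {
        queue.addLast(logLine);
    }

    // Client-pull model: the consumer decides when to read next.
    List<String> drain(int max) {
        List<String> out = new ArrayList<>();
        while (out.size() < max && !queue.isEmpty()) {
            out.add(queue.pollFirst());
        }
        return out;
    }
}
```

Because the client drives every read, no restart or plug-in work on the broker side is needed when a new consumer joins, which is the flexibility the embodiment emphasizes.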
At present, Kafka has affinity with many languages: popular languages such as Java, Python, and Ruby all have libraries that support Kafka.
With the method for acquiring log data described in this embodiment, the distinctive connection between Storm and Kafka lets users quickly acquire log data from Kafka without repeatedly restarting Kafka.
Please refer to Fig. 3, which is a schematic flowchart of the third embodiment of the method for acquiring log data according to the present invention.
The method for acquiring log data described in this embodiment differs from the first embodiment in that, after the step of Kafka converting the received log data into a Kafka message queue, it further comprises the following step:
a second Flume acquires the log data the user needs from the Kafka message queue.
For systems other than the Storm real-time computation cluster, such as an HDFS cluster or a full-text search cluster, which have no intrinsic way of connecting to Kafka, a second Flume can be used to connect to Kafka and acquire log messages from Kafka acting as message-oriented middleware.
In one embodiment, the step in which the second Flume acquires the log data the user needs from the Kafka message queue comprises the following steps:
Step 301: the second Flume establishes a data transmission channel to Kafka through a second Github system, and performs parameter configuration with Kafka as the data transmitting end.
Step 302: the second Flume sends a log request to Kafka through the established data transmission channel.
Step 303: according to the log request, Kafka acquires the corresponding log data from the Kafka message queue and sends it to the second Flume in batches through the data transmission channel.
The way the second Flume connects to Kafka in this embodiment can use the same connection code as connecting through flume-kafka with Kafka as the data destination in the first embodiment.
With the method for acquiring log data described in this embodiment, the second Flume connects to Kafka and, on behalf of other systems, quickly acquires log messages from Kafka acting as message-oriented middleware.
Please refer to Fig. 4, which is a schematic structural diagram of the first embodiment of the system for acquiring log data according to the present invention.
The system for acquiring log data described in this embodiment comprises an application server 100, a first Flume 200, and Kafka 300, wherein:
the first Flume 200 is configured to acquire log data from the application server 100 and send the acquired log data to Kafka 300;
Kafka 300 is configured to convert the received log data into a Kafka message queue.
With the system for acquiring log data described in this embodiment, the log data in the application server are sent to Kafka through the first Flume, and Kafka converts the log data into a Kafka message queue. When a user acquires log data from Kafka, the user only needs to connect to Kafka; no cumbersome restart and plug-in operations are required, which improves the flexibility of log-data acquisition.
For the application server 100, its number and type can be set in advance according to the user's needs and user types.
For the first Flume 200, it preferably captures log data from the application server through its three logical components: source, channel, and sink.
For Kafka 300, it is a high-throughput distributed publish-subscribe messaging system. First, the operating-system file cache on which it relies is sufficiently advanced and powerful; as long as writes are not random, sequential read and write performance is very efficient. Kafka 300's data can only be appended sequentially, and its deletion policy is to delete data only after it has accumulated to a certain size or has exceeded a certain age. Another unique characteristic of Kafka 300 is that consumer state is kept on the client rather than on the MQ server, so the server does not need to record the delivery progress of messages; each client knows for itself where it should read the next message from, and message delivery adopts a client-pull model, which greatly lightens the server's burden. Kafka 300 also emphasizes reducing the serialization and copying overhead of data: it can group messages into message sets and store and send them in batches.
Preferably, Kafka 300 has the following characteristics:
1. It provides message persistence through an O(1) disk data structure, which remains stable over long periods even with TB-scale storage of real-time message data.
2. High throughput: even on very ordinary hardware, Kafka can support hundreds of thousands of messages per second.
3. It supports partitioning messages across the Kafka servers and the consumer-machine cluster.
4. It supports parallel data loading into Hadoop.
In one embodiment, the system for acquiring log data described in this embodiment further comprises a first Github system, wherein:
the first Flume 200 is further configured to establish a data transmission channel to Kafka 300 through the first Github system, perform parameter configuration with Kafka 300 as the data receiving end, and pre-process each piece of acquired log data before sending it to Kafka 300 through the established data transmission channel;
Kafka 300 is further configured to, when the received log data reaches a preset amount, pack that amount of log data, store it into Kafka 300's storage area, and convert it into the Kafka message queue.
In this embodiment, Github can host all kinds of Git repositories and provides a web interface; unlike other services such as SourceForge or Google Code, Github's unique feature is how simple it makes forking a branch from another project. Git is a distributed version control system, originally written by Linus Torvalds for managing the Linux kernel code.
Preferably, the first Flume 200 and Kafka 300 transmit log data through a Flume plug-in, flume-kafka (the data transmission channel), which is hosted in the first Github system. flume-kafka supports both pulling log data from Kafka 300 and pushing log data to Kafka 300.
Further, before transmitting log data, the first Flume 200 first defines Kafka 300 as the data source in a code snippet, and presets the pre-processing, configuration, and shutdown of each piece of log data through the process, configure, and stop routines of the plug-in. The specific operation code is as follows:
(The code listing appears only as an image, Figure BDA00003629304900091, in the original publication.)
Preferably, with Kafka 300 as the data source, the code by which the first Flume 200 connects to Kafka 300 is as follows:
(The connection code appears only as images, Figures BDA00003629304900092 and BDA00003629304900101, in the original publication.)
In the above code, props.put sets the attributes of the connection to Kafka 300. Each attribute is described below:
groupid: the name of the connection (the consumer group);
autocommit.enable: automatically informs Kafka 300 which log message has currently been consumed;
autooffset.reset: automatically acquires the latest log messages;
socket.buffersize: the buffer size used for socket communication.
With these attributes defined, the first Flume 200 and Kafka 300 finally establish a connection. Next, we explain how Kafka 300 receives data after the first Flume 200 has connected to it. We define the number of messages in one batch. A so-called batch means that Kafka 300 packs a certain amount of data and sends it to the data destination in one go, rather than forwarding data to Kafka 300's storage area once for every piece of data received from the first Flume 200. Sending in batches saves network transmission overhead.
Similarly, below is the code for connecting to Kafka 300 with Kafka 300 as the data destination:
(The code listing appears only as images, Figures BDA00003629304900102 and BDA00003629304900111, in the original publication.)
Connecting and sending log data in this case differs from the case with Kafka 300 as the data source: with Kafka 300 as the data destination, the batching of log data is controlled by Kafka 300 itself rather than by the first Flume 200. According to batch.size in the above code, Kafka 300 takes a certain number of log messages and sends them as one batch.
In another embodiment, the system for acquiring log data described in this embodiment may further comprise a monitoring unit configured to acquire Kafka 300's operating data through the JMX monitoring interface provided by Kafka 300 after the received log data has been converted into the Kafka message queue.
Compared with the first Flume 200, Kafka 300 has more monitoring data available, which makes it convenient to monitor the health of the whole system.
Please refer to Fig. 5, which is a schematic structural diagram of the second embodiment of the system for acquiring log data according to the present invention.
The system for acquiring log data described in this embodiment differs from the first embodiment in that it further comprises a Storm real-time computation cluster 400 and a Storm system 500; the Storm real-time computation cluster 400 is configured to acquire the needed log data from the Kafka message queue through a data transmission channel established to Kafka 300 via the Storm system 500.
For the Storm system 500, Storm provides a set of general primitives for distributed real-time computation and can be used in "stream processing" to process messages and update databases in real time. Storm is a way of managing queues and worker clusters. Storm can also be used for "continuous computation", running continuous queries over data streams and outputting results to users as streams while the computation runs, and for "distributed RPC", running expensive computations in parallel.
For the Storm real-time computation cluster 400, it can connect to Kafka 300 and acquire log messages using the Java methods of storm-contrib within Storm. Systems other than the Storm real-time computation cluster 400 can also acquire log messages (one log entry per line) from Kafka 300 acting as message-oriented middleware.
At present, Kafka 300 has affinity with many languages: popular languages such as Java, Python, and Ruby all have libraries that support Kafka 300.
With the system for acquiring log data described in this embodiment, the distinctive connection between Storm and Kafka lets users quickly acquire log data from Kafka without repeatedly restarting Kafka.
Please refer to Fig. 6, which is a schematic structural diagram of the third embodiment of the system for acquiring log data according to the present invention.
The system for acquiring log data described in this embodiment differs from the first embodiment in that it further comprises a second Flume 600 configured to acquire the log data the user needs from the Kafka message queue.
For systems other than the Storm real-time computation cluster 400, such as an HDFS cluster or a full-text search cluster, which have no intrinsic way of connecting to Kafka 300, the second Flume 600 can be used to connect to Kafka 300 and acquire log messages from Kafka 300 acting as message-oriented middleware.
In one embodiment, the system for acquiring log data described in this embodiment further comprises a second Github system, wherein:
the second Flume 600 is further configured to establish a data transmission channel to Kafka 300 through the second Github system, perform parameter configuration with Kafka 300 as the data transmitting end, and send a log request to Kafka 300;
Kafka 300 is further configured to, according to the log request, acquire the corresponding log data from the Kafka message queue and send it to the second Flume 600 in batches through the data transmission channel.
The way the second Flume 600 connects to Kafka 300 in this embodiment can use the same connection code as connecting through flume-kafka with Kafka 300 as the data destination in the first embodiment.
With the system for acquiring log data described in this embodiment, the second Flume connects to Kafka and, on behalf of other systems, quickly acquires log messages from Kafka acting as message-oriented middleware.
The above embodiments express only several implementations of the present invention, and their description is relatively specific and detailed, but they should not therefore be construed as limiting the scope of the claims of the present invention. It should be pointed out that persons of ordinary skill in the art can make several variations and improvements without departing from the concept of the present invention, and these all fall within the protection scope of the present invention. Therefore, the protection scope of this patent shall be determined by the appended claims.

Claims (10)

1. A method for acquiring log data, characterized by comprising the following steps:
a first Flume acquires log data from an application server;
the first Flume sends the acquired log data to Kafka, and Kafka converts the received log data into a Kafka message queue.
2. The method for acquiring log data according to claim 1, characterized in that the step of the first Flume sending the acquired log data to Kafka comprises the following steps:
the first Flume establishes a data transmission channel to Kafka through a first Github system, and performs parameter configuration with Kafka as the data receiving end;
the first Flume pre-processes each piece of acquired log data and sends it to Kafka through the established data transmission channel;
when the log data received by Kafka reaches a preset amount, the log data is packed, stored into Kafka's storage area, and converted into the Kafka message queue.
3. The method for acquiring log data according to claim 1, characterized in that, after the step of Kafka converting the received log data into a Kafka message queue, the method further comprises the following steps:
a Storm real-time computation cluster establishes a data transmission channel to Kafka through a Storm system;
the Storm real-time computation cluster acquires the needed log data from the Kafka message queue through the established data channel.
4. The method for acquiring log data according to any one of claims 1 to 3, characterized in that, after the step of Kafka converting the received log data into a Kafka message queue, the method further comprises the following step:
a second Flume acquires the log data the user needs from the Kafka message queue.
5. The method for acquiring log data according to claim 4, characterized in that the step of the second Flume acquiring the log data the user needs from the Kafka message queue comprises the following steps:
the second Flume establishes a data transmission channel to Kafka through a second Github system, and performs parameter configuration with Kafka as the data transmitting end;
the second Flume sends a log request to Kafka through the established data transmission channel;
according to the log request, Kafka acquires the corresponding log data from the Kafka message queue and sends it to the second Flume in batches through the data transmission channel.
6. A system for acquiring log data, characterized by comprising an application server, a first Flume, and Kafka, wherein:
the first Flume is configured to acquire log data from the application server and send the acquired log data to Kafka;
Kafka is configured to convert the received log data into a Kafka message queue.
7. The system for acquiring log data according to claim 6, characterized in that it further comprises a first Github system, wherein:
the first Flume is further configured to establish a data transmission channel with the Kafka via the first Github system, perform parameter configuration with the Kafka as the data receiving end, and preprocess each piece of acquired log data before sending it to the Kafka through the established data transmission channel;
the Kafka is further configured to, when the received log data reaches a preset data amount, pack the log data, store it in the storage area of the Kafka, and convert it into the Kafka message queue.
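Claim 7's pack-on-threshold behavior can be sketched as follows. The threshold value, the whitespace-trimming "preliminary treatment", and packing a batch as a tuple are all illustrative assumptions; the patent leaves the preset data amount and the preprocessing unspecified.

```python
PRESET_AMOUNT = 3  # hypothetical threshold; the patent leaves the value open

storage = []   # stand-in for Kafka's storage area
buffer = []    # records received but not yet packed

def preprocess(line):
    """Toy preprocessing step (trim whitespace), as one possible example
    of the 'preliminary treatment' mentioned in the claim."""
    return line.strip()

def receive(line):
    buffer.append(preprocess(line))
    if len(buffer) >= PRESET_AMOUNT:   # preset data amount reached
        storage.append(tuple(buffer))  # pack the batch and store it
        buffer.clear()

for raw in ["  a ", "b", " c", "d "]:
    receive(raw)
```

After the loop, the first three records have been packed and stored as one unit, while the fourth waits in the buffer for the next batch to fill.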
8. The system for acquiring log data according to claim 6, characterized in that it further comprises a Storm real-time computation cluster and a Storm system, wherein the Storm real-time computation cluster is configured to establish a data transmission channel with the Kafka via the Storm system and acquire the required log data from the Kafka message queue through the established data channel.
9. The system for acquiring log data according to any one of claims 6 to 8, characterized in that it further comprises a second Flume configured to acquire the log data required by the user from the Kafka message queue.
10. The system for acquiring log data according to claim 9, characterized in that it further comprises a second Github system, wherein:
the second Flume is further configured to establish a data transmission channel with the Kafka via the second Github system, perform parameter configuration with the Kafka as the data sending end, and send a log request to the Kafka;
the Kafka is further configured to, according to the log request, acquire the corresponding log data from the Kafka message queue and send the corresponding log data to the second Flume in batches through the data transmission channel.
CN2013103404125A 2013-08-06 2013-08-06 Method and system for acquiring log data Pending CN103401934A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN2013103404125A CN103401934A (en) 2013-08-06 2013-08-06 Method and system for acquiring log data

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN2013103404125A CN103401934A (en) 2013-08-06 2013-08-06 Method and system for acquiring log data

Publications (1)

Publication Number Publication Date
CN103401934A true CN103401934A (en) 2013-11-20

Family

ID=49565457

Family Applications (1)

Application Number Title Priority Date Filing Date
CN2013103404125A Pending CN103401934A (en) 2013-08-06 2013-08-06 Method and system for acquiring log data

Country Status (1)

Country Link
CN (1) CN103401934A (en)

Cited By (39)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN104036025A (en) * 2014-06-27 2014-09-10 蓝盾信息安全技术有限公司 Distribution-base mass log collection system
CN104657502A (en) * 2015-03-12 2015-05-27 浪潮集团有限公司 System and method for carrying out real-time statistics on mass data based on Hadoop
CN105335406A (en) * 2014-07-30 2016-02-17 阿里巴巴集团控股有限公司 Log data processing method and device
CN105450618A (en) * 2014-09-26 2016-03-30 Tcl集团股份有限公司 Operation method and operation system of big data process through API (Application Programming Interface) server
CN105490854A (en) * 2015-12-11 2016-04-13 传线网络科技(上海)有限公司 Real-time log collection method and system, and application server cluster
CN105589856A (en) * 2014-10-21 2016-05-18 阿里巴巴集团控股有限公司 Log data processing method and log data processing system
CN105630869A (en) * 2015-12-15 2016-06-01 北京奇虎科技有限公司 Voice data storage method and device
CN105653662A (en) * 2015-12-29 2016-06-08 中国建设银行股份有限公司 Flume based data processing method and apparatus
CN105786683A (en) * 2016-03-03 2016-07-20 四川长虹电器股份有限公司 Customized log collecting system and method
CN105868075A (en) * 2016-03-31 2016-08-17 浪潮通信信息系统有限公司 System and method for monitoring and analyzing large amount of logs in real time
CN105933169A (en) * 2016-07-04 2016-09-07 江苏飞搏软件股份有限公司 Efficient, robust and safe large data polymerization system and method
CN105933736A (en) * 2016-04-18 2016-09-07 天脉聚源(北京)传媒科技有限公司 Log processing method and device
WO2017008658A1 (en) * 2015-07-14 2017-01-19 阿里巴巴集团控股有限公司 Storage checking method and system for text data
CN106569936A (en) * 2016-09-26 2017-04-19 深圳盒子支付信息技术有限公司 Method and system for acquiring scrolling log in real time
CN106682071A (en) * 2016-11-17 2017-05-17 安徽华博胜讯信息科技股份有限公司 University library digital resource sharing method based on big data
CN106682119A (en) * 2016-12-08 2017-05-17 杭州销冠网络科技有限公司 System and method for asynchronous data synchronization on basis of http service aspect and log system
CN106709069A (en) * 2017-01-25 2017-05-24 焦点科技股份有限公司 High-reliability big data logging collection and transmission method
CN106775989A (en) * 2016-12-31 2017-05-31 北京神州绿盟信息安全科技股份有限公司 A kind of job control method and device
CN106776231A (en) * 2017-01-09 2017-05-31 武汉斗鱼网络科技有限公司 Android crash logs optimization method and system based on Git
CN107181612A (en) * 2017-05-08 2017-09-19 深圳市众泰兄弟科技发展有限公司 A kind of visual network method for safety monitoring based on big data
CN107508888A (en) * 2017-08-25 2017-12-22 同方(深圳)云计算技术股份有限公司 A kind of car networking service platform
CN107704545A (en) * 2017-11-08 2018-02-16 华东交通大学 Railway distribution net magnanimity information method for stream processing based on Storm Yu Kafka message communicatings
CN107704478A (en) * 2017-01-16 2018-02-16 贵州白山云科技有限公司 A kind of method and system for writing daily record
CN107748756A (en) * 2017-09-20 2018-03-02 努比亚技术有限公司 Collecting method, mobile terminal and readable storage medium storing program for executing
CN107948234A (en) * 2016-10-13 2018-04-20 北京国双科技有限公司 The processing method and processing device of data
CN107979477A (en) * 2016-10-21 2018-05-01 苏宁云商集团股份有限公司 A kind of method and system of business monitoring
CN108092849A (en) * 2017-12-06 2018-05-29 链家网(北京)科技有限公司 Business datum monitoring method, apparatus and system
CN108388478A (en) * 2018-02-07 2018-08-10 平安普惠企业管理有限公司 Daily record data processing method and system
CN108989314A (en) * 2018-07-20 2018-12-11 北京木瓜移动科技股份有限公司 A kind of Transmitting Data Stream, processing method and processing device
CN109446215A (en) * 2018-10-31 2019-03-08 北京百分点信息科技有限公司 A kind of logical engine method of ID drawing real-time priority-based
CN109684370A (en) * 2018-09-07 2019-04-26 平安普惠企业管理有限公司 Daily record data processing method, system, equipment and storage medium
CN109800128A (en) * 2019-01-15 2019-05-24 苏州工品汇软件技术有限公司 Operation log recording collection method based on micro services
CN110262807A (en) * 2019-06-20 2019-09-20 北京百度网讯科技有限公司 Cluster creates Progress Log acquisition system, method and apparatus
CN110460876A (en) * 2019-08-15 2019-11-15 网易(杭州)网络有限公司 Processing method, device and the electronic equipment of log is broadcast live
CN110502491A (en) * 2019-07-25 2019-11-26 北京神州泰岳智能数据技术有限公司 A kind of Log Collect System and its data transmission method, device
CN111371586A (en) * 2018-12-26 2020-07-03 顺丰科技有限公司 Log data transmission method, device and equipment
CN111382022A (en) * 2018-12-27 2020-07-07 北京神州泰岳软件股份有限公司 Method and device for monitoring real-time streaming computing platform, electronic equipment and storage medium
CN113190528A (en) * 2021-04-21 2021-07-30 中国海洋大学 Parallel distributed big data architecture construction method and system
CN113468259A (en) * 2021-09-01 2021-10-01 北京华品博睿网络技术有限公司 Real-time data acquisition and storage method and system

Non-Patent Citations (4)

* Cited by examiner, † Cited by third party
Title
EWHAUSER: "flume-kafka-plugin", GitHub *
Flume official website: "Welcome to Apache Flume", Flume official website - Apache Flume 1.4.0 *
TOMNOTCAT: "flume-kafka-sink", GitHub *
Zhang Xin: "Kafka+FlumeNG+Storm+HBase", Baidu Wenku *

Cited By (55)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN104036025A (en) * 2014-06-27 2014-09-10 蓝盾信息安全技术有限公司 Distribution-base mass log collection system
CN105335406A (en) * 2014-07-30 2016-02-17 阿里巴巴集团控股有限公司 Log data processing method and device
CN105335406B (en) * 2014-07-30 2018-10-02 阿里巴巴集团控股有限公司 Daily record data processing method and processing device
CN105450618A (en) * 2014-09-26 2016-03-30 Tcl集团股份有限公司 Operation method and operation system of big data process through API (Application Programming Interface) server
CN105589856A (en) * 2014-10-21 2016-05-18 阿里巴巴集团控股有限公司 Log data processing method and log data processing system
CN105589856B (en) * 2014-10-21 2019-04-26 阿里巴巴集团控股有限公司 Daily record data processing method and system
CN104657502A (en) * 2015-03-12 2015-05-27 浪潮集团有限公司 System and method for carrying out real-time statistics on mass data based on Hadoop
WO2017008658A1 (en) * 2015-07-14 2017-01-19 阿里巴巴集团控股有限公司 Storage checking method and system for text data
CN105490854A (en) * 2015-12-11 2016-04-13 传线网络科技(上海)有限公司 Real-time log collection method and system, and application server cluster
CN105490854B (en) * 2015-12-11 2019-03-12 传线网络科技(上海)有限公司 Real-time logs collection method, system and application server cluster
CN105630869A (en) * 2015-12-15 2016-06-01 北京奇虎科技有限公司 Voice data storage method and device
CN105630869B (en) * 2015-12-15 2019-02-05 北京奇虎科技有限公司 A kind of storage method and device of voice data
CN105653662A (en) * 2015-12-29 2016-06-08 中国建设银行股份有限公司 Flume based data processing method and apparatus
CN105786683A (en) * 2016-03-03 2016-07-20 四川长虹电器股份有限公司 Customized log collecting system and method
CN105786683B (en) * 2016-03-03 2019-02-12 四川长虹电器股份有限公司 Customed result collection system and method
CN105868075A (en) * 2016-03-31 2016-08-17 浪潮通信信息系统有限公司 System and method for monitoring and analyzing large amount of logs in real time
CN105933736A (en) * 2016-04-18 2016-09-07 天脉聚源(北京)传媒科技有限公司 Log processing method and device
CN105933169A (en) * 2016-07-04 2016-09-07 江苏飞搏软件股份有限公司 Efficient, robust and safe large data polymerization system and method
CN106569936B (en) * 2016-09-26 2019-05-03 深圳盒子信息科技有限公司 A kind of real-time acquisition rolls the method and system of log
CN106569936A (en) * 2016-09-26 2017-04-19 深圳盒子支付信息技术有限公司 Method and system for acquiring scrolling log in real time
CN107948234A (en) * 2016-10-13 2018-04-20 北京国双科技有限公司 The processing method and processing device of data
CN107979477A (en) * 2016-10-21 2018-05-01 苏宁云商集团股份有限公司 A kind of method and system of business monitoring
CN106682071A (en) * 2016-11-17 2017-05-17 安徽华博胜讯信息科技股份有限公司 University library digital resource sharing method based on big data
CN106682119A (en) * 2016-12-08 2017-05-17 杭州销冠网络科技有限公司 System and method for asynchronous data synchronization on basis of http service aspect and log system
CN106775989A (en) * 2016-12-31 2017-05-31 北京神州绿盟信息安全科技股份有限公司 A kind of job control method and device
CN106776231A (en) * 2017-01-09 2017-05-31 武汉斗鱼网络科技有限公司 Android crash logs optimization method and system based on Git
CN106776231B (en) * 2017-01-09 2019-11-15 武汉斗鱼网络科技有限公司 Android crash log optimization method and system based on Git
CN107704478B (en) * 2017-01-16 2019-03-15 贵州白山云科技股份有限公司 A kind of method and system that log is written
CN107704478A (en) * 2017-01-16 2018-02-16 贵州白山云科技有限公司 A kind of method and system for writing daily record
WO2018130222A1 (en) * 2017-01-16 2018-07-19 贵州白山云科技有限公司 Method for writing to log, system, medium, and device
CN106709069B (en) * 2017-01-25 2018-06-15 焦点科技股份有限公司 The big data log collection and transmission method of high reliability
CN106709069A (en) * 2017-01-25 2017-05-24 焦点科技股份有限公司 High-reliability big data logging collection and transmission method
CN107181612A (en) * 2017-05-08 2017-09-19 深圳市众泰兄弟科技发展有限公司 A kind of visual network method for safety monitoring based on big data
CN107508888A (en) * 2017-08-25 2017-12-22 同方(深圳)云计算技术股份有限公司 A kind of car networking service platform
CN107748756A (en) * 2017-09-20 2018-03-02 努比亚技术有限公司 Collecting method, mobile terminal and readable storage medium storing program for executing
CN107704545A (en) * 2017-11-08 2018-02-16 华东交通大学 Railway distribution net magnanimity information method for stream processing based on Storm Yu Kafka message communicatings
CN108092849A (en) * 2017-12-06 2018-05-29 链家网(北京)科技有限公司 Business datum monitoring method, apparatus and system
CN108388478A (en) * 2018-02-07 2018-08-10 平安普惠企业管理有限公司 Daily record data processing method and system
CN108388478B (en) * 2018-02-07 2020-10-27 平安普惠企业管理有限公司 Log data processing method and system
CN108989314A (en) * 2018-07-20 2018-12-11 北京木瓜移动科技股份有限公司 A kind of Transmitting Data Stream, processing method and processing device
CN109684370A (en) * 2018-09-07 2019-04-26 平安普惠企业管理有限公司 Daily record data processing method, system, equipment and storage medium
CN109446215B (en) * 2018-10-31 2022-04-12 北京百分点科技集团股份有限公司 Real-time ID pull-through engine method based on priority
CN109446215A (en) * 2018-10-31 2019-03-08 北京百分点信息科技有限公司 A kind of logical engine method of ID drawing real-time priority-based
CN111371586A (en) * 2018-12-26 2020-07-03 顺丰科技有限公司 Log data transmission method, device and equipment
CN111371586B (en) * 2018-12-26 2023-01-10 顺丰科技有限公司 Log data transmission method, device and equipment
CN111382022A (en) * 2018-12-27 2020-07-07 北京神州泰岳软件股份有限公司 Method and device for monitoring real-time streaming computing platform, electronic equipment and storage medium
CN111382022B (en) * 2018-12-27 2024-02-20 北京神州泰岳软件股份有限公司 Method, device, electronic equipment and storage medium for monitoring real-time stream computing platform
CN109800128A (en) * 2019-01-15 2019-05-24 苏州工品汇软件技术有限公司 Operation log recording collection method based on micro services
CN110262807A (en) * 2019-06-20 2019-09-20 北京百度网讯科技有限公司 Cluster creates Progress Log acquisition system, method and apparatus
CN110262807B (en) * 2019-06-20 2023-12-26 北京百度网讯科技有限公司 Cluster creation progress log acquisition system, method and device
CN110502491A (en) * 2019-07-25 2019-11-26 北京神州泰岳智能数据技术有限公司 A kind of Log Collect System and its data transmission method, device
CN110460876A (en) * 2019-08-15 2019-11-15 网易(杭州)网络有限公司 Processing method, device and the electronic equipment of log is broadcast live
CN113190528A (en) * 2021-04-21 2021-07-30 中国海洋大学 Parallel distributed big data architecture construction method and system
CN113190528B (en) * 2021-04-21 2022-12-06 中国海洋大学 Parallel distributed big data architecture construction method and system
CN113468259A (en) * 2021-09-01 2021-10-01 北京华品博睿网络技术有限公司 Real-time data acquisition and storage method and system

Similar Documents

Publication Publication Date Title
CN103401934A (en) Method and system for acquiring log data
CN110262807B (en) Cluster creation progress log acquisition system, method and device
CN110147398A (en) A kind of data processing method, device, medium and electronic equipment
CN111475575B (en) Data synchronization method and device based on block chain and computer readable storage medium
CN106940677A (en) One kind application daily record data alarm method and device
CN103064731A (en) Device and method for improving message queue system performance
CN105577772B (en) Material receiving method, material uploading method and device
CN104899274B (en) A kind of memory database Efficient Remote access method
CN108270860A (en) The acquisition system and method for environmental quality online monitoring data
US11188443B2 (en) Method, apparatus and system for processing log data
CN108062368B (en) Full data translation method, device, server and storage medium
CN108228363A (en) A kind of message method and device
CN113779094B (en) Batch-flow-integration-based data processing method and device, computer equipment and medium
CN106383764A (en) Data acquisition method and device
CN111813573A (en) Communication method of management platform and robot software and related equipment thereof
CN113329139B (en) Video stream processing method, device and computer readable storage medium
TW201733312A (en) Automatic fusing-based message sending method, device and system
CN101977361A (en) Method for preprocessing messages in batch
CN112527530A (en) Message processing method, device, equipment, storage medium and computer program product
CN110737655B (en) Method and device for reporting data
CN112702430A (en) Data transmission system and method based on cloud edge mode and Web technology and application thereof
CN109670952B (en) Collecting and paying transaction platform
CN107196818A (en) A kind of system and method for Linux cluster monitorings
CN110647575B (en) Distributed heterogeneous processing framework construction method and system
CN111401819B (en) Intersystem data pushing method and system

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication

Application publication date: 20131120

RJ01 Rejection of invention patent application after publication