CN106855837A - Data processing method and device based on Flume - Google Patents

Data processing method and device based on Flume

Info

Publication number
CN106855837A
Authority
CN
China
Prior art keywords
source file
title
read
name group
source
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN201611161579.5A
Other languages
Chinese (zh)
Other versions
CN106855837B (en)
Inventor
陈尧
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
MIGU Culture Technology Co Ltd
Original Assignee
MIGU Culture Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by MIGU Culture Technology Co Ltd
Priority to CN201611161579.5A
Publication of CN106855837A
Application granted
Publication of CN106855837B
Current legal status: Active
Anticipated expiration

Classifications

    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06F - ELECTRIC DIGITAL DATA PROCESSING
    • G06F11/00 - Error detection; Error correction; Monitoring
    • G06F11/30 - Monitoring
    • G06F11/3065 - Monitoring arrangements determined by the means or processing involved in reporting the monitored data
    • G06F11/3072 - Monitoring arrangements determined by the means or processing involved in reporting the monitored data where the reporting involves data filtering, e.g. pattern matching, time or event triggered, adaptive or policy-based reporting

Abstract

The invention discloses a data processing method based on Flume. The method includes: after a source file has been read completely, acquiring the source file name of that source file; saving the source file name to a source file name group; and filtering out source files that have been read completely by querying the source file name group. The invention also discloses a data processing device based on Flume.

Description

Data processing method and device based on Flume
Technical field
The present invention relates to data processing technology, and in particular to a data processing method and device based on Flume.
Background
Flume is a highly available, highly reliable, distributed system provided by Cloudera for collecting, aggregating, and transmitting massive volumes of logs. Flume supports customizing various data senders in a logging system for collecting data; it can collect data from sources such as the console, RPC (Thrift-RPC), text (files), tail (UNIX tail), syslog, and exec (command execution). Flume also provides the ability to perform simple processing on the data and to write it to various customizable data receivers.
Fig. 1 is a schematic diagram of the Flume system architecture in the prior art. As shown in Fig. 1, the collection agent is responsible for collecting data, where the data refers to logs on the devices to be collected from, such as servers. During data collection, once a monitored directory has been specified for the agent, the agent by default reads all source files under that directory. While reading source files, the agent needs to filter out the source files whose transmission has finished, that is, the source files that have already been read.
In the prior art, to filter out source files whose transmission has finished, the agent renames them. Renaming adds, to the name of each such source file, a marker indicating that the source file has been read completely.
However, after a source file has been renamed, the agent or the device where the source file resides may be unable to find the file under its original name using the original naming format, so the file lookup fails.
Summary of the invention
To solve the above technical problem, embodiments of the present invention provide a data processing method and device based on Flume that can filter out source files that have already been read without changing their names.
The technical solution of the embodiments of the present invention is implemented as follows:
An embodiment of the present invention provides a data processing method based on Flume, including: after a source file has been read completely, acquiring the source file name of that source file; saving the source file name to a source file name group; and filtering out source files that have been read completely by querying the source file name group.
In the above solution, saving the source file name to the source file name group includes: saving the source file name to a source file name group in a table document; or saving the source file name to a source file name group in a database.
In the above solution, the method further includes: filtering the source files according to a regular expression.
In the above solution, after the source files that have been read completely are filtered out, the method further includes: reading the source files in order of last modification time, with the earliest-modified source file read first.
In the above solution, before the source files are filtered according to the regular expression, the method further includes: presetting a screening field in the names of the source files that do not need to be filtered out.
An embodiment of the present invention provides a data processing device based on Flume, the device including:
an acquiring unit, configured to acquire, after a source file has been read completely, the source file name of that source file; a storage unit, configured to save the source file name to a source file name group; and a filter unit, configured to filter out source files that have been read completely by querying the source file name group.
In the above solution, the storage unit is specifically configured to: save the source file name to a source file name group in a table document; or save the source file name to a source file name group in a database.
In the above solution, the filter unit is further configured to filter the source files according to a regular expression.
In the above solution, the device further includes a reading unit, configured to read the source files in order of last modification time, with the earliest-modified source file read first.
In the above solution, the device further includes a preset unit, configured to preset a screening field in the names of the source files that do not need to be filtered out.
With the Flume-based data processing method and device provided by the embodiments of the present invention, the names of source files that have been read are saved to a source file name group, and source files that have been read completely are filtered out by querying that group. Compared with the prior art, source files that have been read can be filtered out without changing their names.
Brief description of the drawings
Fig. 1 is a schematic diagram of the Flume system architecture in the prior art;
Fig. 2 is a flow chart of the Flume-based data processing method according to an embodiment of the present invention;
Fig. 3 is a schematic diagram of the specific execution flow of the Flume-based data processing method according to an embodiment of the present invention;
Fig. 4 is a schematic structural diagram of the Flume-based data processing device according to an embodiment of the present invention.
Detailed description
To provide a fuller understanding of the features and technical content of the present invention, its implementation is described in detail below with reference to the accompanying drawings. The drawings are for reference and illustration only and are not intended to limit the present invention.
Fig. 2 is a flow chart of the Flume-based data processing method according to an embodiment of the present invention. As shown in Fig. 2, the Flume-based data processing method provided by the embodiment includes:
Step 201: after a source file has been read completely, acquire the source file name of that source file.
Step 202: save the source file name to a source file name group.
Step 203: filter out source files that have been read completely by querying the source file name group.
In the embodiment of the present invention, when a source file name is saved to the source file name group, it can be saved to a source file name group in a table document, or to a source file name group in a database. That is, in step 202 the source file name can be saved either by creating a table or by creating a database.
In step 201, the agent scans the monitored directory and monitors the reading status of the source files; once it detects that a source file has been read completely, it acquires the name of that source file.
In step 202, the agent stores the acquired name of the completely read source file in the table or database.
In step 203, when filtering files, the agent queries the table or database to learn which source files have been read completely, and can thereby filter out the source files that have been read.
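Steps 201 to 203 can be sketched as follows. This is a minimal illustration in Python rather than the patented implementation: it assumes the source file name group is a single-column SQLite table, and the names `open_name_group`, `mark_as_read`, `filter_unread`, and `source_file_names` are hypothetical and do not come from the patent or from Flume.

```python
import sqlite3


def open_name_group(db_path=":memory:"):
    """Open (or create) the source file name group as a one-column table.

    The table name and schema are assumptions made for this sketch.
    """
    conn = sqlite3.connect(db_path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS source_file_names (name TEXT PRIMARY KEY)"
    )
    return conn


def mark_as_read(conn, file_name):
    """Steps 201-202: record the name of a file that has been read completely."""
    conn.execute(
        "INSERT OR IGNORE INTO source_file_names (name) VALUES (?)",
        (file_name,),
    )
    conn.commit()


def filter_unread(conn, candidate_names):
    """Step 203: drop every candidate whose name is already in the name group."""
    rows = conn.execute("SELECT name FROM source_file_names").fetchall()
    already_read = {row[0] for row in rows}
    return [name for name in candidate_names if name not in already_read]
```

Because only the name group changes, the files on disk keep their original names, which is the property the method relies on.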
With the technical solution of the embodiment of the present invention, if the agent or the server where the source file resides wants to look up a file, it can generate the name of the desired source file according to the original file naming format (for example, a log named by year, month, day, and hour) and look up the file by the generated name. Alternatively, the agent or the server can receive a source file name in that format entered by a user and look up the file by that name.
As shown in Fig. 3, in the Flume-based data processing method provided by the embodiment of the present invention, the agent first scans the monitored directory, then filters the source files according to the configured rules, and finally reads the source files.
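The scan, filter, and read sequence of Fig. 3 might look like the sketch below. It is written under stated assumptions: the name group is modeled as an in-memory Python set, and `process_directory` and the `read_file` callback are hypothetical names introduced only for illustration.

```python
import os


def process_directory(directory, already_read, read_file):
    """Scan a monitored directory, skip files already read, read the rest.

    already_read: a set of file names standing in for the name group.
    read_file:    a callback that consumes one file path (hypothetical).
    """
    for name in sorted(os.listdir(directory)):
        path = os.path.join(directory, name)
        # Skip sub-directories and files whose names are in the name group.
        if not os.path.isfile(path) or name in already_read:
            continue
        read_file(path)
        # The file is recorded by name only; it is never renamed.
        already_read.add(name)
    return already_read
```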
Filtering the source files includes not only the filtering of already-read source files in steps 201 to 203, but also the filtering of source files that have been preset as not needing to be read. In the embodiment of the present invention, the agent filters the source files according to a regular expression.
Before the source files are filtered using the regular expression, a screening field needs to be preset in the names of the source files that do not need to be filtered out.
For example, a default value such as ^ can be preset: if this default value is present in a source file name, the source file needs to be read. For instance, the expression for certain source file names may be set to ^.*\.tmp$ to filter .tmp files. A source file identified in this way is a source file that does not need to be filtered out, that is, the file needs to be read; otherwise the file is not read.
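A regular-expression filter of this kind can be sketched in Python as below. The text leaves some room for interpretation about whether a match means "read this file" or "skip this file", so the sketch takes that choice as a parameter; `regex_filter` and `keep_matches` are hypothetical names, not part of the patent or of Flume.

```python
import re


def regex_filter(file_names, pattern, keep_matches=True):
    """Split file names by whether they match a regular expression.

    keep_matches=True  -> return only the names that match (read these).
    keep_matches=False -> return only the names that do not match.
    """
    compiled = re.compile(pattern)
    return [
        name
        for name in file_names
        if (compiled.search(name) is not None) == keep_matches
    ]
```

With the pattern ^.*\.tmp$, calling the function with `keep_matches=False` would exclude .tmp files from reading, while `keep_matches=True` would select only them.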
After both the regular expression filtering and the already-read-file filtering have been applied to the source files, the source files are read in order of last modification time, with the earliest-modified source file read first.
For example, if source files dated the same day with last modification times of 18:00, 19:00, and 20:00 need to be read, the source file last modified at 18:00 is read first, then the one last modified at 19:00, and finally the one last modified at 20:00.
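The ordering rule can be illustrated with a short Python sketch that sorts candidate paths by their last modification time as reported by the file system. The function name `order_by_last_modified` is an assumption made for this example.

```python
import os


def order_by_last_modified(paths):
    """Return the paths sorted so the earliest-modified file comes first."""
    return sorted(paths, key=os.path.getmtime)
```

Applied to the 18:00, 19:00, and 20:00 files from the example above, this would yield the 18:00 file first and the 20:00 file last.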
With the Flume-based data processing method provided by the embodiment of the present invention, the names of source files that have been read are saved to a source file name group, and source files that have been read completely are filtered out by querying that group. Compared with the prior art, source files that have been read can be filtered out without changing their names.
As shown in Fig. 4, the Flume-based data processing device provided by the embodiment of the present invention includes:
an acquiring unit 401, configured to acquire, after a source file has been read completely, the source file name of that source file;
a storage unit 402, configured to save the source file name to a source file name group;
a filter unit 403, configured to filter out source files that have been read completely by querying the source file name group.
In the embodiment of the present invention, when saving a source file name to the source file name group, the storage unit 402 is specifically configured to save the source file name to a source file name group in a table document; it can also be configured to save the source file name to a source file name group in a database. That is, the storage unit 402 can save the source file name either by creating a table or by creating a database.
When already-read source files are to be filtered out, the acquiring unit 401 scans the monitored directory and monitors the reading status of the source files; once it detects that a source file has been read completely, it acquires the name of that source file.
The storage unit 402 then stores the acquired name of the completely read source file in the table or database.
When filtering files, the filter unit 403 queries the table or database to learn which source files have been read completely, and can thereby filter out the source files that have been read.
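How the three units cooperate can be sketched as three small Python classes. This is an illustration under assumptions, not the device itself: the table document is modeled as a one-column CSV file, and the class and method names (`AcquiringUnit`, `StorageUnit`, `FilterUnit`, `on_read_finished`, `save`, `load`, `unread`) are hypothetical.

```python
import csv
import os


class AcquiringUnit:
    """Reports the name of a source file once it has been read completely."""

    def on_read_finished(self, path):
        return os.path.basename(path)


class StorageUnit:
    """Keeps the source file name group in a one-column CSV table file."""

    def __init__(self, table_path):
        self.table_path = table_path

    def save(self, file_name):
        with open(self.table_path, "a", newline="", encoding="utf-8") as f:
            csv.writer(f).writerow([file_name])

    def load(self):
        if not os.path.exists(self.table_path):
            return set()
        with open(self.table_path, newline="", encoding="utf-8") as f:
            return {row[0] for row in csv.reader(f) if row}


class FilterUnit:
    """Filters out candidates whose names are already in the name group."""

    def __init__(self, storage):
        self.storage = storage

    def unread(self, candidate_names):
        already_read = self.storage.load()
        return [n for n in candidate_names if n not in already_read]
```

Swapping the CSV file for a database table would only change `StorageUnit`, which mirrors the two storage options described for the storage unit 402.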
With the technical solution of the embodiment of the present invention, if the agent or the server where the source file resides wants to look up a file, it can generate the name of the desired source file according to the original file naming format (for example, a log named by year, month, day, and hour) and look up the file by the generated name. Alternatively, the agent or the server can receive a source file name in that format entered by a user and look up the file by that name.
In the embodiment of the present invention, the filter unit 403 performs not only the filtering of already-read source files but also the filtering of source files that have been preset as not needing to be read. The filter unit 403 is therefore further configured to filter the source files according to a regular expression.
Before the filter unit 403 filters the source files using the regular expression, a screening field needs to be preset in the names of the source files that do not need to be filtered out. The Flume-based data processing device therefore further includes a preset unit (not shown in the figure), configured to preset a screening field in the names of the source files that do not need to be filtered out.
For example, a default value such as ^ can be preset: if this default value is present in a source file name, the source file needs to be read. For instance, the expression for certain source file names may be set to ^.*\.tmp$ to filter .tmp files. A source file identified in this way is a source file that does not need to be filtered out, that is, the file needs to be read; otherwise the file is not read.
After both the regular expression filtering and the already-read-file filtering have been applied to the source files, the filtered source files still need to be read. The Flume-based data processing device of the embodiment of the present invention therefore further includes a reading unit (not shown in the figure), configured to read the source files in order of last modification time, with the earliest-modified source file read first.
For example, if source files dated the same day with last modification times of 18:00, 19:00, and 20:00 need to be read, the reading unit first reads the source file last modified at 18:00, then the one last modified at 19:00, and finally the one last modified at 20:00.
With the Flume-based data processing device provided by the embodiment of the present invention, the names of source files that have been read are saved to a source file name group, and source files that have been read completely are filtered out by querying that group. Compared with the prior art, source files that have been read can be filtered out without changing their names.
In practical applications, the acquiring unit 401, the storage unit 402, the filter unit 403, the reading unit, and the preset unit can each be implemented by a central processing unit (CPU, Central Processing Unit), a microprocessor (MPU, Micro Processor Unit), a digital signal processor (DSP, Digital Signal Processor), or a field programmable gate array (FPGA, Field Programmable Gate Array) located in the Flume-based data processing device.
Those skilled in the art should understand that embodiments of the present invention may be provided as a method, a system, or a computer program product. Accordingly, the present invention may take the form of a hardware embodiment, a software embodiment, or an embodiment combining software and hardware. Moreover, the present invention may take the form of a computer program product implemented on one or more computer-usable storage media (including but not limited to magnetic disk storage and optical storage) that contain computer-usable program code.
The present invention is described with reference to flow charts and/or block diagrams of the method, device (system), and computer program product according to embodiments of the present invention. It should be understood that each flow and/or block in the flow charts and/or block diagrams, and combinations of flows and/or blocks in the flow charts and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to the processor of a general-purpose computer, a special-purpose computer, an embedded processor, or another programmable data processing device to produce a machine, such that the instructions executed by the processor of the computer or other programmable data processing device produce means for implementing the functions specified in one or more flows of the flow charts and/or one or more blocks of the block diagrams.
These computer program instructions may also be stored in a computer-readable memory capable of directing a computer or other programmable data processing device to operate in a particular manner, such that the instructions stored in the computer-readable memory produce an article of manufacture including instruction means, the instruction means implementing the functions specified in one or more flows of the flow charts and/or one or more blocks of the block diagrams.
These computer program instructions may also be loaded onto a computer or other programmable data processing device, so that a series of operational steps is performed on the computer or other programmable device to produce computer-implemented processing, whereby the instructions executed on the computer or other programmable device provide steps for implementing the functions specified in one or more flows of the flow charts and/or one or more blocks of the block diagrams.
The above are merely preferred embodiments of the present invention and are not intended to limit the scope of protection of the present invention.

Claims (10)

1. A data processing method based on Flume, characterized in that the method includes:
after a source file has been read completely, acquiring the source file name of that source file;
saving the source file name to a source file name group;
filtering out source files that have been read completely by querying the source file name group.
2. The method according to claim 1, characterized in that saving the source file name to the source file name group includes:
saving the source file name to a source file name group in a table document;
or, saving the source file name to a source file name group in a database.
3. The method according to claim 1 or 2, characterized in that the method further includes:
filtering the source files according to a regular expression.
4. The method according to claim 1 or 2, characterized in that after the source files that have been read completely are filtered out, the method further includes:
reading the source files in order of last modification time, with the earliest-modified source file read first.
5. The method according to claim 3, characterized in that before the source files are filtered according to the regular expression, the method further includes:
presetting a screening field in the names of the source files that do not need to be filtered out.
6. A data processing device based on Flume, characterized in that the device includes:
an acquiring unit, configured to acquire, after a source file has been read completely, the source file name of that source file;
a storage unit, configured to save the source file name to a source file name group;
a filter unit, configured to filter out source files that have been read completely by querying the source file name group.
7. The device according to claim 6, characterized in that the storage unit is specifically configured to:
save the source file name to a source file name group in a table document;
or, save the source file name to a source file name group in a database.
8. The device according to claim 6 or 7, characterized in that the filter unit is further configured to filter the source files according to a regular expression.
9. The device according to claim 6 or 7, characterized in that the device further includes a reading unit, configured to read the source files in order of last modification time, with the earliest-modified source file read first.
10. The device according to claim 8, characterized in that the device further includes a preset unit, configured to preset a screening field in the names of the source files that do not need to be filtered out.
CN201611161579.5A 2016-12-15 2016-12-15 Data processing method and device based on Flume Active CN106855837B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201611161579.5A CN106855837B (en) 2016-12-15 2016-12-15 Data processing method and device based on Flume

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201611161579.5A CN106855837B (en) 2016-12-15 2016-12-15 Data processing method and device based on Flume

Publications (2)

Publication Number Publication Date
CN106855837A true CN106855837A (en) 2017-06-16
CN106855837B CN106855837B (en) 2020-12-18

Family

ID=59125857

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201611161579.5A Active CN106855837B (en) 2016-12-15 2016-12-15 Data processing method and device based on Flume

Country Status (1)

Country Link
CN (1) CN106855837B (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108073705A (en) * 2017-12-18 2018-05-25 郑州云海信息技术有限公司 A kind of distributed mass data polymerize acquisition method

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2010084344A1 (en) * 2009-01-20 2010-07-29 Secerno Ltd Method, computer program and apparatus for analysing symbols in a computer system
CN103092712A (en) * 2011-11-04 2013-05-08 阿里巴巴集团控股有限公司 Method and device for recovering interrupt tasks
CN104503864A (en) * 2014-11-20 2015-04-08 北京世纪高蓝科技有限公司 Method and device for file backup based on local area network
CN104753972A (en) * 2013-12-25 2015-07-01 腾讯科技(深圳)有限公司 Network resource collection processing method and server
CN105930379A (en) * 2016-04-14 2016-09-07 北京思特奇信息技术股份有限公司 Method and system for collecting log data by means of interceptor

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108073705A (en) * 2017-12-18 2018-05-25 郑州云海信息技术有限公司 A kind of distributed mass data polymerize acquisition method
CN108073705B (en) * 2017-12-18 2022-06-14 浪潮云信息技术股份公司 Distributed mass data aggregation acquisition method

Also Published As

Publication number Publication date
CN106855837B (en) 2020-12-18

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant