CN106855837A - Data processing method and device based on Flume - Google Patents
- Publication number
- CN106855837A (application CN201611161579.5A)
- Authority
- CN
- China
- Prior art keywords
- source file
- name
- read
- name group
- source
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Granted
Classifications
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F11/00—Error detection; Error correction; Monitoring
- G06F11/30—Monitoring
- G06F11/3065—Monitoring arrangements determined by the means or processing involved in reporting the monitored data
- G06F11/3072—Monitoring arrangements determined by the means or processing involved in reporting the monitored data where the reporting involves data filtering, e.g. pattern matching, time or event triggered, adaptive or policy-based reporting
Abstract
The invention discloses a data processing method based on Flume. The method includes: after a source file has been completely read, obtaining the name of that source file; saving the source file name to a source file name group; and filtering out source files that have already been read by querying the source file name group. The invention also discloses a data processing device based on Flume.
Description
Technical field
The present invention relates to data processing technology, and in particular to a data processing method and device based on Flume.
Background
Flume is a highly available, highly reliable, distributed system provided by Cloudera for collecting, aggregating, and transmitting massive volumes of logs. Flume supports customizing various kinds of data senders in a logging system to collect data: it can collect data from sources such as the console, RPC (Thrift-RPC), text files, tail (UNIX tail), syslog, and command execution (exec). Flume can also perform simple processing on the data and write it to various customizable data receivers.
Fig. 1 is a schematic diagram of the Flume system architecture in the prior art. As shown in Fig. 1, the collection agent is responsible for collecting data, namely the logs on the devices being collected, such as servers. During data collection, once a monitored directory has been specified for the agent, the agent by default reads all source files in that directory. While reading, source files that have already been completely transmitted, i.e. already read, must be filtered out.
In the prior art, to filter out source files whose transmission is complete, the agent renames them: the new name carries a mark indicating that "this source file has been completely read".
However, once a source file has been renamed, the agent or the device hosting the source file can no longer find it under its original naming format, so file lookup fails.
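To make the prior-art problem concrete, here is a minimal Python sketch (a stand-in for an agent, not Flume's actual code) that marks a file as read by appending a suffix, which is what Flume's Spooling Directory Source does by default with `.COMPLETED`. It then shows that a lookup by the original name fails. The helper `mark_read_by_rename` and the file name `app-2016121518.log` are hypothetical.

```python
import os
import tempfile


def mark_read_by_rename(path: str, suffix: str = ".COMPLETED") -> str:
    """Prior-art style marking: append a suffix to the file name once it has been read."""
    new_path = path + suffix
    os.rename(path, new_path)
    return new_path


def demo() -> tuple[bool, bool]:
    with tempfile.TemporaryDirectory() as spool_dir:
        original = os.path.join(spool_dir, "app-2016121518.log")
        with open(original, "w") as f:
            f.write("log line\n")
        renamed = mark_read_by_rename(original)
        # A lookup that rebuilds the original name no longer finds the file;
        # only the renamed path exists.
        return os.path.exists(original), os.path.exists(renamed)


print(demo())  # (False, True)
```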
Summary of the invention
To solve the above technical problem, embodiments of the present invention provide a data processing method and device based on Flume that can filter out source files that have already been read without changing their names.
The technical solution of the embodiments of the present invention is implemented as follows:
An embodiment of the present invention provides a data processing method based on Flume, including: after a source file has been completely read, obtaining the name of that source file; saving the source file name to a source file name group; and filtering out source files that have already been read by querying the source file name group.
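The three steps can be sketched in a few lines of Python, assuming the source file name group is simply an in-memory set of file names. `SourceFileNameGroup`, `on_read_finished` and `filter_unread` are hypothetical names for illustration, not Flume APIs.

```python
class SourceFileNameGroup:
    """In-memory stand-in for the source file name group (hypothetical class)."""

    def __init__(self) -> None:
        self._names: set[str] = set()

    def save(self, name: str) -> None:
        self._names.add(name)

    def contains(self, name: str) -> bool:
        return name in self._names


def on_read_finished(group: SourceFileNameGroup, name: str) -> None:
    # Steps 1 and 2: once a file has been completely read, record its
    # unchanged name in the group.
    group.save(name)


def filter_unread(group: SourceFileNameGroup, candidates: list[str]) -> list[str]:
    # Step 3: query the group and drop files that were already read.
    # No file is renamed at any point.
    return [n for n in candidates if not group.contains(n)]


group = SourceFileNameGroup()
on_read_finished(group, "a.log")
print(filter_unread(group, ["a.log", "b.log"]))  # ['b.log']
```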
In the above solution, saving the source file name to the source file name group includes: saving the source file name to a source file name group in a table document; or saving the source file name to a source file name group in a database.
In the above solution, the method further includes: filtering the source files according to a regular expression.
In the above solution, after the source files that have already been read are filtered out, the method further includes: reading the source files in order of last modification time, earliest first.
In the above solution, before the source files are filtered according to the regular expression, the method further includes: presetting a screening field in the names of source files that are not to be filtered.
An embodiment of the present invention provides a data processing device based on Flume, the device including:
an acquiring unit, configured to obtain, after a source file has been completely read, the name of that source file; a storage unit, configured to save the source file name to a source file name group; and a filter unit, configured to filter out source files that have already been read by querying the source file name group.
In the above solution, the storage unit is specifically configured to: save the source file name to a source file name group in a table document; or save the source file name to a source file name group in a database.
In the above solution, the filter unit is further configured to filter the source files according to a regular expression.
In the above solution, the device further includes a reading unit, configured to read the source files in order of last modification time, earliest first.
In the above solution, the device further includes a presetting unit, configured to preset a screening field in the names of source files that are not to be filtered.
In the Flume-based data processing method and device provided by the embodiments of the present invention, the names of source files that have been read are saved to a source file name group, and source files that have already been read are filtered out by querying that group. Compared with the prior art, already-read source files can therefore be filtered without changing their names.
Brief description of the drawings
Fig. 1 is a schematic diagram of the Flume system architecture in the prior art;
Fig. 2 is an implementation flowchart of the Flume-based data processing method according to an embodiment of the present invention;
Fig. 3 is a schematic flowchart of the specific execution of the Flume-based data processing method according to an embodiment of the present invention;
Fig. 4 is a schematic structural diagram of the Flume-based data processing device according to an embodiment of the present invention.
Detailed description of the embodiments
To give a fuller understanding of the features and technical content of the present invention, its implementation is described in detail below with reference to the accompanying drawings. The drawings are for reference and illustration only and are not intended to limit the present invention.
Fig. 2 is an implementation flowchart of the Flume-based data processing method according to an embodiment of the present invention. As shown in Fig. 2, the method includes:
Step 201: after a source file has been completely read, obtain the name of that source file.
Step 202: save the source file name to a source file name group.
Step 203: filter out source files that have already been read by querying the source file name group.
In this embodiment, the source file name may be saved to a source file name group in a table document, or to a source file name group in a database. That is, when implementing step 202, source file names can be saved either by creating a table or by creating a database.
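A minimal Python sketch of the two storage options, assuming the "table document" is a plain text file with one name per line and the database is SQLite. The classes `TextFileNameGroup` and `SqliteNameGroup` and the `read_files` table are hypothetical; the patent does not specify a file layout or schema.

```python
import os
import sqlite3
import tempfile


class TextFileNameGroup:
    """Name group kept in a plain text document, one name per line (assumed layout)."""

    def __init__(self, path: str) -> None:
        self.path = path

    def save(self, name: str) -> None:
        # Append the name of a completely read source file.
        with open(self.path, "a", encoding="utf-8") as f:
            f.write(name + "\n")

    def contains(self, name: str) -> bool:
        if not os.path.exists(self.path):
            return False
        with open(self.path, encoding="utf-8") as f:
            return any(line.rstrip("\n") == name for line in f)


class SqliteNameGroup:
    """Name group kept in a SQLite table (table and column names are assumptions)."""

    def __init__(self, db_path: str) -> None:
        self.conn = sqlite3.connect(db_path)
        self.conn.execute(
            "CREATE TABLE IF NOT EXISTS read_files (name TEXT PRIMARY KEY)"
        )

    def save(self, name: str) -> None:
        # "with self.conn" commits the insert; OR IGNORE skips names already stored.
        with self.conn:
            self.conn.execute(
                "INSERT OR IGNORE INTO read_files (name) VALUES (?)", (name,)
            )

    def contains(self, name: str) -> bool:
        row = self.conn.execute(
            "SELECT 1 FROM read_files WHERE name = ?", (name,)
        ).fetchone()
        return row is not None


with tempfile.TemporaryDirectory() as d:
    text_group = TextFileNameGroup(os.path.join(d, "read_files.txt"))
    db_group = SqliteNameGroup(os.path.join(d, "read_files.db"))
    for group in (text_group, db_group):
        group.save("app-01.log")
        print(group.contains("app-01.log"), group.contains("app-02.log"))  # True False
    db_group.conn.close()
```

Both classes expose the same `save`/`contains` pair, so the filtering step can stay unchanged whichever store is chosen.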
In step 201, the agent scans the monitored directory and tracks how source files are being read. As soon as it detects that a source file has been completely read, it obtains that file's name.
In step 202, the agent stores the names of the completely read source files in the table or database.
In step 203, when filtering files, the agent queries the table or database to learn which source files have already been completely read, and can thereby filter them out.
With the technical solution of this embodiment, if the agent or the server hosting the source files wants to look up a file, it can generate the name of the desired source file according to the original file naming format (for example, a log file name built from the date and hour) and look up the file by that generated name. Alternatively, the agent or the server can receive a source file name in that format entered by the user and look up the file by that name.
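Because names are never changed, a file can be located by rebuilding its name from the naming convention. The sketch below assumes a hypothetical format `app-%Y%m%d%H.log`; the patent only says files keep their original naming format, and `expected_name` and `was_read` are illustrative helpers.

```python
from datetime import datetime

# Hypothetical naming format, assumed for illustration only.
NAME_FORMAT = "app-%Y%m%d%H.log"


def expected_name(ts: datetime, fmt: str = NAME_FORMAT) -> str:
    """Rebuild the name a log file for the given hour has under the original format."""
    return ts.strftime(fmt)


def was_read(read_names: set[str], ts: datetime) -> bool:
    # The generated name can be checked directly against the saved name group
    # (and used to open the file), since no file was renamed after reading.
    return expected_name(ts) in read_names


read_names = {"app-2016121518.log"}
print(expected_name(datetime(2016, 12, 15, 18)))        # app-2016121518.log
print(was_read(read_names, datetime(2016, 12, 15, 19)))  # False
```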
As shown in Fig. 3, in the Flume-based data processing method provided by this embodiment, the agent first scans the monitored directory, then filters the source files according to the configured rules, and finally reads the remaining source files.
The filtering covers not only the already-read source files handled in steps 201 to 203, but also source files that have been preset as not to be read. In this embodiment, the agent filters the source files according to a regular expression.
Before the source files are filtered with the regular expression, a screening field needs to be preset in the names of source files that are not to be filtered.
For example, a default value `^` can be preset; when a source file name contains this preset value, the source file needs to be read. As another example, the pattern can be set to `^.*\.tmp$` to filter out `.tmp` files. Each source file is thus checked against the configured pattern and is read or skipped according to the result.
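A minimal sketch of the regular-expression filter, using the `^.*\.tmp$` example as an exclusion pattern. Exactly how the text's "screening field" interacts with the pattern is not clear from the description, so this shows only the exclusion case. `regex_filter` is a hypothetical helper; Flume's Spooling Directory Source exposes a comparable `ignorePattern` property.

```python
import re

# Exclusion pattern taken from the example in the text: skip names ending in ".tmp".
IGNORE_PATTERN = re.compile(r"^.*\.tmp$")


def regex_filter(names: list[str], ignore: re.Pattern = IGNORE_PATTERN) -> list[str]:
    """Keep only names that do NOT fully match the exclusion pattern."""
    return [n for n in names if not ignore.fullmatch(n)]


print(regex_filter(["a.log", "a.log.tmp", "b.tmp.log"]))  # ['a.log', 'b.tmp.log']
```

Note that `b.tmp.log` is kept because the pattern is anchored at the end of the name.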
After the source files have been filtered by regular expression and by read status, they are read in order of last modification time, earliest first.
For example, if source files from the same day whose last modification times are 18:00, 19:00, and 20:00 need to be read, the file last modified at 18:00 is read first, then the one last modified at 19:00, and finally the one last modified at 20:00.
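The earliest-first ordering can be sketched by sorting on each file's modification time. `oldest_first` is a hypothetical helper. The demo reproduces the 18:00/19:00/20:00 example by setting modification times with `os.utime` on files created in deliberately unsorted order.

```python
import os
import tempfile
import time


def oldest_first(paths: list[str]) -> list[str]:
    """Order candidate source files by last modification time, earliest first."""
    return sorted(paths, key=os.path.getmtime)


with tempfile.TemporaryDirectory() as d:
    base = time.mktime((2016, 12, 15, 0, 0, 0, 0, 0, -1))  # local midnight, 2016-12-15
    paths = []
    # Files last modified at 20:00, 18:00 and 19:00 on the same day.
    for name, hour in [("c.log", 20), ("a.log", 18), ("b.log", 19)]:
        p = os.path.join(d, name)
        with open(p, "w"):
            pass
        t = base + hour * 3600
        os.utime(p, (t, t))
        paths.append(p)
    print([os.path.basename(p) for p in oldest_first(paths)])  # ['a.log', 'b.log', 'c.log']
```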
The Flume-based data processing method provided by this embodiment saves the names of source files that have been read to a source file name group and filters out already-read source files by querying that group. Compared with the prior art, already-read source files can therefore be filtered without changing their names.
As shown in Fig. 4, the Flume-based data processing device provided by an embodiment of the present invention includes:
an acquiring unit 401, configured to obtain, after a source file has been completely read, the name of that source file;
a storage unit 402, configured to save the source file name to a source file name group;
a filter unit 403, configured to filter out source files that have already been read by querying the source file name group.
In this embodiment, the storage unit 402 is specifically configured to save the source file name to a source file name group in a table document, or to save it to a source file name group in a database. That is, the storage unit 402 can save source file names either by creating a table or by creating a database.
When filtering already-read source files, the acquiring unit 401 scans the monitored directory and tracks how source files are being read. As soon as it detects that a source file has been completely read, it obtains that file's name.
The storage unit 402 then stores the names of the completely read source files in the table or database.
When filtering files, the filter unit 403 queries the table or database to learn which source files have already been completely read, and can thereby filter them out.
With the technical solution of this embodiment, if the agent or the server hosting the source files wants to look up a file, it can generate the name of the desired source file according to the original file naming format (for example, a log file name built from the date and hour) and look up the file by that generated name. Alternatively, the agent or the server can receive a source file name in that format entered by the user and look up the file by that name.
In this embodiment, the filter unit 403 not only filters out source files that have already been read, but also filters out source files preset as not to be read. The filter unit 403 is therefore further configured to filter the source files according to a regular expression.
Before the filter unit 403 filters the source files with the regular expression, a screening field needs to be preset in the names of source files that are not to be filtered. The Flume-based data processing device therefore further includes a presetting unit (not shown in the figure), configured to preset the screening field in the names of source files that are not to be filtered.
For example, a default value `^` can be preset; when a source file name contains this preset value, the source file needs to be read. As another example, the pattern can be set to `^.*\.tmp$` to filter out `.tmp` files. Each source file is thus checked against the configured pattern and is read or skipped according to the result.
After the source files have been filtered by regular expression and by read status, the remaining source files still need to be read. The Flume-based data processing device of this embodiment therefore further includes a reading unit (not shown in the figure), configured to read the source files in order of last modification time, earliest first.
For example, if source files from the same day whose last modification times are 18:00, 19:00, and 20:00 need to be read, the reading unit first reads the file last modified at 18:00, then the one last modified at 19:00, and finally the one last modified at 20:00.
The Flume-based data processing device provided by this embodiment saves the names of source files that have been read to a source file name group and filters out already-read source files by querying that group. Compared with the prior art, already-read source files can therefore be filtered without changing their names.
In practical applications, the acquiring unit 401, the storage unit 402, the filter unit 403, the reading unit, and the presetting unit can be implemented by a central processing unit (CPU), microprocessor unit (MPU), digital signal processor (DSP), field programmable gate array (FPGA), or similar component in the Flume-based data processing device.
Those skilled in the art should understand that embodiments of the present invention may be provided as a method, a system, or a computer program product. Accordingly, the present invention may take the form of an entirely hardware embodiment, an entirely software embodiment, or an embodiment combining software and hardware aspects. Moreover, the present invention may take the form of a computer program product implemented on one or more computer-usable storage media (including but not limited to disk storage and optical storage) containing computer-usable program code.
The present invention is described with reference to flowcharts and/or block diagrams of methods, devices (systems), and computer program products according to embodiments of the present invention. It should be understood that each flow and/or block in the flowcharts and/or block diagrams, and combinations of flows and/or blocks in the flowcharts and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to a processor of a general-purpose computer, special-purpose computer, embedded processor, or other programmable data processing device to produce a machine, such that the instructions executed by the processor of the computer or other programmable data processing device produce an apparatus for implementing the functions specified in one or more flows of the flowcharts and/or one or more blocks of the block diagrams.
These computer program instructions may also be stored in a computer-readable memory that can direct a computer or other programmable data processing device to operate in a particular manner, such that the instructions stored in the computer-readable memory produce an article of manufacture including an instruction apparatus that implements the functions specified in one or more flows of the flowcharts and/or one or more blocks of the block diagrams.
These computer program instructions may also be loaded onto a computer or other programmable data processing device, so that a series of operational steps is performed on the computer or other programmable device to produce a computer-implemented process, such that the instructions executed on the computer or other programmable device provide steps for implementing the functions specified in one or more flows of the flowcharts and/or one or more blocks of the block diagrams.
The above are only preferred embodiments of the present invention and are not intended to limit the scope of protection of the present invention.
Claims (10)
1. A data processing method based on Flume, characterized in that the method includes:
after a source file has been completely read, obtaining the name of the completely read source file;
saving the source file name to a source file name group;
filtering out source files that have already been completely read by querying the source file name group.
2. The method according to claim 1, characterized in that saving the source file name to the source file name group includes:
saving the source file name to the source file name group in a table document;
or saving the source file name to the source file name group in a database.
3. The method according to claim 1 or 2, characterized in that the method further includes:
filtering the source files according to a regular expression.
4. The method according to claim 1 or 2, characterized in that, after the source files that have already been completely read are filtered out, the method further includes:
reading the source files in order of last modification time, earliest first.
5. The method according to claim 3, characterized in that, before the source files are filtered according to the regular expression, the method further includes:
presetting a screening field in the names of source files that are not to be filtered.
6. A data processing device based on Flume, characterized in that the device includes:
an acquiring unit, configured to obtain, after a source file has been completely read, the name of the completely read source file;
a storage unit, configured to save the source file name to a source file name group;
a filter unit, configured to filter out source files that have already been completely read by querying the source file name group.
7. The device according to claim 6, characterized in that the storage unit is specifically configured to:
save the source file name to the source file name group in a table document;
or save the source file name to the source file name group in a database.
8. The device according to claim 6 or 7, characterized in that the filter unit is further configured to filter the source files according to a regular expression.
9. The device according to claim 6 or 7, characterized in that the device further includes a reading unit, configured to read the source files in order of last modification time, earliest first.
10. The device according to claim 8, characterized in that the device further includes a presetting unit, configured to preset a screening field in the names of source files that are not to be filtered.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201611161579.5A CN106855837B (en) | 2016-12-15 | 2016-12-15 | Data processing method and device based on Flume |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201611161579.5A CN106855837B (en) | 2016-12-15 | 2016-12-15 | Data processing method and device based on Flume |
Publications (2)
Publication Number | Publication Date |
---|---|
CN106855837A (en) | 2017-06-16 |
CN106855837B (en) | 2020-12-18 |
Family ID: 59125857
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201611161579.5A Active CN106855837B (en) | 2016-12-15 | 2016-12-15 | Data processing method and device based on Flume |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN106855837B (en) |
Citations (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
WO2010084344A1 (en) * | 2009-01-20 | 2010-07-29 | Secerno Ltd | Method, computer program and apparatus for analysing symbols in a computer system |
CN103092712A (en) * | 2011-11-04 | 2013-05-08 | 阿里巴巴集团控股有限公司 | Method and device for recovering interrupt tasks |
CN104503864A (en) * | 2014-11-20 | 2015-04-08 | 北京世纪高蓝科技有限公司 | Method and device for file backup based on local area network |
CN104753972A (en) * | 2013-12-25 | 2015-07-01 | 腾讯科技(深圳)有限公司 | Network resource collection processing method and server |
CN105930379A (en) * | 2016-04-14 | 2016-09-07 | 北京思特奇信息技术股份有限公司 | Method and system for collecting log data by means of interceptor |
- 2016-12-15: CN201611161579.5A (CN106855837B), status: active
Cited By (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN108073705A (en) * | 2017-12-18 | 2018-05-25 | 郑州云海信息技术有限公司 | Distributed mass data aggregation and acquisition method |
CN108073705B (en) * | 2017-12-18 | 2022-06-14 | 浪潮云信息技术股份公司 | Distributed mass data aggregation acquisition method |
Also Published As
Publication number | Publication date |
---|---|
CN106855837B (en) | 2020-12-18 |
Legal Events
Code | Title |
---|---|
PB01 | Publication |
SE01 | Entry into force of request for substantive examination |
GR01 | Patent grant |