CN112905555A - Log file merging method, system, device and medium - Google Patents
- Publication number
- CN112905555A
- Authority
- CN
- China
- Prior art keywords
- files
- directory
- merging
- file
- log file
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/10—File systems; File servers
- G06F16/18—File system types
- G06F16/1805—Append-only file systems, e.g. using logs or journals to store data
- G06F16/1815—Journaling file systems
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/10—File systems; File servers
- G06F16/13—File access structures, e.g. distributed indices
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/10—File systems; File servers
- G06F16/16—File or folder operations, e.g. details of user interfaces specifically adapted to file systems
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/10—File systems; File servers
- G06F16/18—File system types
- G06F16/182—Distributed file systems
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- Data Mining & Analysis (AREA)
- Databases & Information Systems (AREA)
- Physics & Mathematics (AREA)
- General Engineering & Computer Science (AREA)
- General Physics & Mathematics (AREA)
- Human Computer Interaction (AREA)
- Debugging And Monitoring (AREA)
- Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
Abstract
The invention discloses a method, a system, equipment and a medium for merging log files. The log file merging method comprises the following steps: scanning all files in a directory to be processed, and preprocessing the files if new files are scanned; writing the preprocessed file data into new files under different directories according to types and time periods; and newly building a merging thread under the directory, merging the newly built files, and uploading the merged files to hdfs. A log file merging system, comprising: a scanning preprocessing module; a directory write module; and combining the uploading modules. The invention further provides log file merging equipment and a medium.
Description
Technical Field
The present invention relates to the field of computer technologies, and in particular, to a method, a system, a device, and a medium for merging log files.
Background
In big data processing, it is common to collect large volumes of log files from each node through an ftp service and use them as a data source. Log file sizes are generally arbitrary: a file may be relatively large or relatively small. These log data are generally stored as files on hdfs for subsequent business processing. If log files of arbitrary size are stored directly on hdfs, they can occupy a large amount of the memory of the NameNode in the hdfs cluster and seriously reduce the processing efficiency of later MR (MapReduce) tasks.
Log files are generally uploaded through ftp at a fixed interval. To keep the data close to real time while preventing too many uploads, log files are typically uploaded once per minute. The number of log entries produced in one minute is not stable, so log file sizes vary. In addition, log files may be of multiple types rather than a single type. In big data processing, data of the same type share the same business processing logic, while data of different types correspond to different business processing logic.
Taken together, log files have two characteristics: they come in multiple types and they vary in size. To keep the big data business processing logic simple and the processing efficient, the logs need to be extracted and merged by type, and the size of the final merged file needs to align with the hdfs block size.
Therefore, efficiently merging many log files into files of a fixed size is very practical in big data processing.
Disclosure of Invention
Based on this, the invention aims to provide a method, a system, equipment and a medium for merging log files.
In a first aspect, the present invention provides a log file merging method, including:
scanning all files in a directory to be processed, and preprocessing the files if new files are scanned;
writing the preprocessed file data into new files under different directories according to types and time periods;
and newly building a merging thread under the directory, merging the newly built files, and uploading the merged files to hdfs.
In an embodiment of the foregoing technical solution, scanning all files in the directory to be processed includes: using walkfiletree to scan all files under the entire ftp root directory.
In an embodiment of the foregoing technical solution, preprocessing the file if a new file is scanned includes: if a new file appears under a directory and the current directory is not found in the hashset, creating a thread to merge the files under the current directory.
In an embodiment of the foregoing technical solution, creating the thread includes: writing the current directory into the hashset when the thread is created.
In an embodiment of the foregoing technical solution, before the thread is created, the hashset is checked for the directory, and the thread is created only if the directory is not present.
In an embodiment of the foregoing technical solution, merging the new files includes: writing the merged file into a merging directory, and moving the file to the merged directory when its size reaches 255M.
In an embodiment of the foregoing technical solution, uploading the merged file to hdfs includes: after the files under the current directory are merged and the file size reaches 255M, uploading the merged file to hdfs, deleting the current directory from the hashset, and then exiting the current thread.
In a second aspect, the present invention provides a log file merging system, including:
the scanning preprocessing module is configured to scan all files in the directory to be processed, and preprocess the files if new files are scanned;
the directory writing module is configured to write the preprocessed file data into new files under different directories according to types and time periods;
and the merging and uploading module is configured to create a merging thread under the directory, merge the new files, and upload the merged files to hdfs.
In a third aspect, the present invention further provides a log file merging device, including:
a memory for storing one or more programs;
a processor for executing the program stored in the memory to implement the log file merging method as described in any one of the above.
In a fourth aspect, the present invention also provides a computer-readable storage medium storing at least one program which, when executed by a processor, implements the log file merging method according to any one of the above.
Compared with the prior art, the log file merging method, the log file merging system, the log file merging equipment and the log file merging medium have the beneficial effects that:
the log file merging method, system, equipment and medium of the invention use multithreading to make full use of the multiple cores of a CPU, logically guarantee that file reads and writes do not conflict, preprocess and merge log files efficiently, and are effective and practical for data preprocessing in big data.
For a better understanding and practice, the invention is described in detail below with reference to the accompanying drawings.
Drawings
FIG. 1 is an exemplary flow diagram of a log file merging method of the present invention.
FIG. 2 is a connection block diagram of the log file merge system of the present invention.
Detailed Description
The terms of orientation of up, down, left, right, front, back, top, bottom, and the like, referred to or may be referred to in this specification, are defined relative to their configuration, and are relative concepts. Therefore, it may be changed according to different positions and different use states. Therefore, these and other directional terms should not be construed as limiting terms.
The implementations described in the exemplary embodiments below are not intended to represent all implementations consistent with the present disclosure. Rather, they are merely examples of methods consistent with certain aspects of the present disclosure, as detailed in the appended claims.
The terminology used in the present disclosure is for the purpose of describing particular embodiments only and is not intended to be limiting of the disclosure. As used in this disclosure and the appended claims, the singular forms "a," "an," and "the" are intended to include the plural forms as well, unless the context clearly indicates otherwise. It should also be understood that the term "and/or" as used herein refers to and encompasses any and all possible combinations of one or more of the associated listed items.
Referring to fig. 1, fig. 1 is a block diagram illustrating an exemplary process of a log file merging method according to the present invention.
In a first aspect, the present invention provides a log file merging method, including:
S1, scanning all files in a directory to be processed, and preprocessing the files if new files are scanned.
In the above S1, scanning all files in the directory to be processed includes: using walkfiletree to scan all files under the entire ftp root directory.
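The scan in S1 can be sketched as follows. This is only an illustration, assuming that "walkfiletree" refers to the standard Java call `Files.walkFileTree`; the class name `FtpRootScanner`, the method `scanForNewFiles`, and the in-memory set used to decide whether a file is "new" are hypothetical and do not come from the patent text.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.FileVisitResult;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.SimpleFileVisitor;
import java.nio.file.attribute.BasicFileAttributes;
import java.util.HashSet;
import java.util.Set;
import java.util.TreeSet;

/** Hypothetical scanner: remembers every file it has already reported. */
class FtpRootScanner {
    // Assumption: a file counts as "new" if this scanner has not seen its path before.
    private final Set<Path> seen = new HashSet<>();

    /** Walks the whole tree under root and returns the directories that contain unseen files. */
    Set<Path> scanForNewFiles(Path root) {
        Set<Path> dirsWithNewFiles = new TreeSet<>();
        try {
            Files.walkFileTree(root, new SimpleFileVisitor<Path>() {
                @Override
                public FileVisitResult visitFile(Path file, BasicFileAttributes attrs) {
                    // seen.add returns true only the first time a path is added.
                    if (attrs.isRegularFile() && seen.add(file)) {
                        dirsWithNewFiles.add(file.getParent());
                    }
                    return FileVisitResult.CONTINUE;
                }
            });
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return dirsWithNewFiles;
    }
}
```

Calling `scanForNewFiles` repeatedly on the ftp root would then yield, on each pass, only the directories that received files since the previous pass.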
Further, preprocessing the file if a new file is scanned includes: if a new file appears under a directory and the current directory is not found in the hashset, creating a thread to merge the files under the current directory.
Creating the thread includes: writing the current directory into the hashset when the thread is created.
To prevent multiple threads from merging files under the same directory and causing read-write conflicts, the hashset is checked for the directory before the thread is created, and the thread is created only if the directory is not present.
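The directory check described above can be sketched as follows. The patent only says "hashset"; this sketch swaps in a concurrent set (`ConcurrentHashMap.newKeySet()`) so that the check and the write happen in one atomic `add` call, which is an assumption made here for thread safety. The names `DirectoryMergeGuard`, `startIfIdle`, and `isActive` are hypothetical.

```java
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;

/** Hypothetical guard that allows at most one merge thread per directory. */
class DirectoryMergeGuard {
    // Assumption: a concurrent set stands in for the "hashset" named in the text.
    private final Set<String> activeDirs = ConcurrentHashMap.newKeySet();

    /** Starts a merge thread for dir unless one is already registered; returns the thread, or null. */
    Thread startIfIdle(String dir, Runnable mergeTask) {
        // add() both checks for the directory and records it, so two callers cannot both succeed.
        if (!activeDirs.add(dir)) {
            return null;
        }
        Thread worker = new Thread(() -> {
            try {
                mergeTask.run();
            } finally {
                // The directory is removed when the merge thread finishes, as in the described method.
                activeDirs.remove(dir);
            }
        }, "merge-" + dir);
        worker.start();
        return worker;
    }

    boolean isActive(String dir) {
        return activeDirs.contains(dir);
    }
}
```

Removing the directory in a `finally` block is a design choice of this sketch: it keeps a failed merge from blocking the directory permanently.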
And S2, writing the preprocessed file data into new files in different directories according to types and time periods.
Specifically, the preprocessed file data may be written into new files in different directories, preferably using 10-minute periods of the log time.
For example: if the file is of ftp type and the log time is 12:13:50 on October 20, 2020, the file is written into the ftp/20201020/1210 directory.
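The directory naming in this example can be sketched as a small function. This is an illustration only, assuming the layout type/yyyyMMdd/HHmm with the minute rounded down to a multiple of 10; the class `LogBucket` and the method `directoryFor` are hypothetical names.

```java
import java.time.LocalDateTime;
import java.time.format.DateTimeFormatter;

/** Hypothetical helper that maps a log type and log time to a 10-minute directory. */
class LogBucket {
    private static final DateTimeFormatter DAY = DateTimeFormatter.ofPattern("yyyyMMdd");
    private static final DateTimeFormatter HOUR_MINUTE = DateTimeFormatter.ofPattern("HHmm");

    static String directoryFor(String type, LocalDateTime logTime) {
        // Round the minute down to the start of its 10-minute period (13 -> 10).
        LocalDateTime bucketStart = logTime
                .withMinute(logTime.getMinute() / 10 * 10)
                .withSecond(0)
                .withNano(0);
        return type + "/" + bucketStart.format(DAY) + "/" + bucketStart.format(HOUR_MINUTE);
    }
}
```

With this helper, `LogBucket.directoryFor("ftp", LocalDateTime.of(2020, 10, 20, 12, 13, 50))` returns `ftp/20201020/1210`.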
And S3, newly building a merging thread under the directory, merging the newly built files, and uploading the merged files to hdfs.
In the above S3, merging the new files includes: writing the merged file into a merging directory, and moving the file to the merged directory when its size reaches 255M.
Further, uploading the merged file to hdfs includes: after the files under the current directory are merged and the file size reaches 255M, uploading the merged file to hdfs, deleting the current directory from the hashset, and then exiting the current thread.
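The merge step in S3 can be sketched as follows. This is a minimal sketch under several assumptions: "255M" is read as 255 × 1024 × 1024 bytes, merged files are named `part-N`, and the upload is abstracted as a `Consumer<Path>` callback so that the example runs without a cluster. In a real deployment the callback might wrap Hadoop's `FileSystem.copyFromLocalFile`. The class `SizeBoundedMerger` and its members are hypothetical.

```java
import java.io.IOException;
import java.io.OutputStream;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;
import java.nio.file.StandardOpenOption;
import java.util.List;
import java.util.function.Consumer;

/** Hypothetical merger: appends inputs to one file and hands it off once it reaches a threshold. */
class SizeBoundedMerger {
    // Assumption: "255M" means 255 MiB.
    static final long DEFAULT_THRESHOLD_BYTES = 255L * 1024 * 1024;

    private final long thresholdBytes;
    private final Path mergingDir;   // files still being appended to
    private final Path mergedDir;    // files that reached the threshold
    private final Consumer<Path> uploader;
    private int sequence = 0;

    SizeBoundedMerger(long thresholdBytes, Path mergingDir, Path mergedDir, Consumer<Path> uploader) {
        this.thresholdBytes = thresholdBytes;
        this.mergingDir = mergingDir;
        this.mergedDir = mergedDir;
        this.uploader = uploader;
    }

    void merge(List<Path> inputs) {
        try {
            Files.createDirectories(mergingDir);
            Files.createDirectories(mergedDir);
            Path current = mergingDir.resolve("part-" + sequence);
            for (Path input : inputs) {
                // Append the whole input file to the file currently being merged.
                try (OutputStream out = Files.newOutputStream(current,
                        StandardOpenOption.CREATE, StandardOpenOption.APPEND)) {
                    Files.copy(input, out);
                }
                if (Files.size(current) >= thresholdBytes) {
                    // Threshold reached: move to the merged directory, then upload.
                    Path done = Files.move(current, mergedDir.resolve(current.getFileName()),
                            StandardCopyOption.REPLACE_EXISTING);
                    uploader.accept(done);
                    sequence++;
                    current = mergingDir.resolve("part-" + sequence);
                }
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

Because inputs are appended whole, a merged file can exceed the threshold by up to one input file. A file that has not yet reached the threshold stays in the merging directory until later inputs fill it.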
The log file merging method is preferably implemented in Java with springboot.
Referring further to fig. 2, fig. 2 is a connection block diagram of the log file merging system of the present invention.
In a second aspect, the present invention provides a log file merging system, including:
the scanning preprocessing module is configured to scan all files in the directory to be processed, and preprocess the files if new files are scanned;
the directory writing module is configured to write the preprocessed file data into new files under different directories according to types and time periods;
and the merging and uploading module is configured to create a merging thread under the directory, merge the new files, and upload the merged files to hdfs.
In a third aspect, the present invention further provides a log file merging device, including:
a memory for storing one or more programs;
and the processor is used for operating the program stored in the memory so as to realize the log file merging method.
The device may also preferably include a communication interface for communicating with external devices and for interactive transmission of data.
It should be noted that the memory may include high-speed RAM, and may also include non-volatile memory, such as at least one disk memory.
In a specific implementation, if the memory, the processor and the communication interface are integrated on a chip, the memory, the processor and the communication interface can complete mutual communication through the internal interface. If the memory, the processor and the communication interface are implemented independently, the memory, the processor and the communication interface may be connected to each other through a bus and perform communication with each other.
In a fourth aspect, the present invention also provides a computer-readable storage medium storing at least one program which, when executed by a processor, implements the log file merging method as described above.
It should be appreciated that the computer-readable storage medium is any data storage device that can store data or programs which can thereafter be read by a computer system. Examples of the computer readable storage medium include read-only memory, random-access memory, CD-ROMs, HDDs, DVDs, magnetic tapes, optical data storage devices, and the like. The computer readable storage medium can also be distributed over network coupled computer systems so that the computer readable code is stored and executed in a distributed fashion.
Program code embodied on a computer readable storage medium may be transmitted using any appropriate medium, including but not limited to: wireless, wire, fiber optic cable, Radio Frequency (RF), etc., or any suitable combination of the foregoing.
In some embodiments, the computer-readable storage medium may be non-transitory.
Compared with the prior art, the log file merging method, the log file merging system, the log file merging equipment and the log file merging medium have the beneficial effects that:
the log file merging method, system, equipment and medium of the invention use multithreading to make full use of the multiple cores of a CPU, logically guarantee that file reads and writes do not conflict, preprocess and merge log files efficiently, and are effective and practical for data preprocessing in big data.
The above-mentioned embodiments only express several embodiments of the present invention, and the description thereof is more specific and detailed, but not construed as limiting the scope of the invention. It should be noted that, for a person skilled in the art, several variations and modifications can be made without departing from the inventive concept, which falls within the scope of the present invention.
Claims (10)
1. A log file merging method is characterized by comprising the following steps:
scanning all files in a directory to be processed, and preprocessing the files if new files are scanned;
writing the preprocessed file data into new files under different directories according to types and time periods;
and newly building a merging thread under the directory, merging the newly built files, and uploading the merged files to hdfs.
2. The log file merging method according to claim 1, wherein scanning all files in the directory to be processed comprises: using walkfiletree to scan all files under the entire ftp root directory.
3. The log file merging method according to claim 2, wherein preprocessing the file if a new file is scanned comprises: if a new file appears under a directory and the current directory is not found in the hashset, creating a thread to merge the files under the current directory.
4. The log file merging method according to claim 3, wherein creating the thread comprises: writing the current directory into the hashset when the thread is created.
5. The log file merging method according to claim 4, wherein before the thread is created, the hashset is checked for the directory, and the thread is created only if the directory is not present.
6. The log file merging method according to claim 5, wherein merging the new files comprises: writing the merged file into a merging directory, and moving the file to the merged directory when its size reaches 255M.
7. The log file merging method according to claim 6, wherein uploading the merged file to hdfs comprises: after the files under the current directory are merged and the file size reaches 255M, uploading the merged file to hdfs, deleting the current directory from the hashset, and then exiting the current thread.
8. A log file merging system, comprising:
the scanning preprocessing module is configured to scan all files in the directory to be processed, and preprocess the files if new files are scanned;
the directory writing module is configured to write the preprocessed file data into new files under different directories according to types and time periods;
and the merging and uploading module is configured to create a merging thread under the directory, merge the new files, and upload the merged files to hdfs.
9. A log file merging apparatus, comprising:
a memory for storing one or more programs;
a processor for executing the program stored in the memory to implement the log file merging method according to any one of claims 1 to 7.
10. A computer-readable storage medium storing at least one program, which when executed by a processor, implements the log file merging method according to any one of claims 1 to 7.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202110195748.1A CN112905555A (en) | 2021-02-19 | 2021-02-19 | Log file merging method, system, device and medium |
Publications (1)
Publication Number | Publication Date |
---|---|
CN112905555A true CN112905555A (en) | 2021-06-04 |
Family
ID=76124272
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202110195748.1A Pending CN112905555A (en) | 2021-02-19 | 2021-02-19 | Log file merging method, system, device and medium |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN112905555A (en) |
Citations (6)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN103577123A (en) * | 2013-11-12 | 2014-02-12 | 河海大学 | Small file optimization storage method based on HDFS |
CN105159966A (en) * | 2015-08-25 | 2015-12-16 | 航天恒星科技有限公司 | Method and apparatus for creating directory entity and directory entity processing system |
CN105488201A (en) * | 2015-12-08 | 2016-04-13 | 北京皮尔布莱尼软件有限公司 | Log inquiry method and system |
US20180102938A1 (en) * | 2016-10-11 | 2018-04-12 | Oracle International Corporation | Cluster-based processing of unstructured log messages |
CN108520016A (en) * | 2018-03-21 | 2018-09-11 | 四川斐讯信息技术有限公司 | Data storage method based on clock timer and Duo Tai upload servers and system |
CN109815198A (en) * | 2018-12-10 | 2019-05-28 | 北京龙拳风暴科技有限公司 | Moving game big data pastes active layer implementation method and device |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN100472445C (en) | Configuring load application method and system of communication apparatus | |
CN108847977A (en) | A kind of monitoring method of business datum, storage medium and server | |
US9836516B2 (en) | Parallel scanners for log based replication | |
CN106503008B (en) | File storage method and device and file query method and device | |
CN104516921A (en) | Automatic response method and device | |
CN109299152B (en) | Suffix array indexing method and device for real-time data stream | |
CN102508913A (en) | Cloud computing system with data cube storage index structure | |
CN103150149A (en) | Method and device for processing redo data of database | |
CN103595571B (en) | Preprocess method, the apparatus and system of web log | |
CN106569964A (en) | Power-off protection method, power-off protection device, power-off protection system and memory | |
CN109144955A (en) | A kind of file reading and electronic equipment | |
CN104408068A (en) | Report form data processing method and related equipment | |
CN105183384A (en) | Direct erasure correction implementation method and device | |
CN111897828A (en) | Data batch processing implementation method, device, equipment and storage medium | |
CN108762979A (en) | A kind of end message backup method and alternate device based on matching tree | |
CN103678360A (en) | Data storing method and device for distributed file system | |
CN112860412B (en) | Service data processing method and device, electronic equipment and storage medium | |
CN111159265A (en) | ETL data migration method and system | |
CN104699815A (en) | Data processing method and system | |
CN112905555A (en) | Log file merging method, system, device and medium | |
CN112235124B (en) | Method and device for configuring pico-cell, storage medium and electronic device | |
CN108491274A (en) | Optimization method, device, storage medium and the equipment of distributed data management | |
CN108197323A (en) | Applied to distributed system map data processing method | |
US20200133749A1 (en) | Method and apparatus for transformation of mpi programs for memory centric computers | |
CN110928484B (en) | Hybrid cloud storage method based on software defined storage |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
RJ01 | Rejection of invention patent application after publication | ||
Application publication date: 20210604 |