CN112905555A - Log file merging method, system, device and medium - Google Patents

Info

Publication number
CN112905555A
Authority
CN
China
Legal status
Pending
Application number
CN202110195748.1A
Other languages
Chinese (zh)
Inventor
马聪
Current Assignee
Guangdong Yunzhi Anxin Technology Co ltd
Original Assignee
Guangdong Yunzhi Anxin Technology Co ltd
Priority date
2021-02-19
Filing date
2021-02-19
Publication date
2021-06-04
2021-02-19: Application filed by Guangdong Yunzhi Anxin Technology Co ltd
2021-02-19: Priority to CN202110195748.1A
2021-06-04: Publication of CN112905555A
Status: Pending

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/10 File systems; File servers
    • G06F16/13 File access structures, e.g. distributed indices
    • G06F16/16 File or folder operations, e.g. details of user interfaces specifically adapted to file systems
    • G06F16/18 File system types
    • G06F16/1805 Append-only file systems, e.g. using logs or journals to store data
    • G06F16/1815 Journaling file systems
    • G06F16/182 Distributed file systems

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Databases & Information Systems (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Human Computer Interaction (AREA)
  • Debugging And Monitoring (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The invention discloses a method, a system, a device, and a medium for merging log files. The log file merging method comprises the following steps: scanning all files in a directory to be processed, and preprocessing the files if new files are found; writing the preprocessed file data into new files under different directories according to type and time period; and creating a merging thread for the directory, merging the new files, and uploading the merged files to HDFS. The log file merging system comprises a scanning preprocessing module, a directory writing module, and a merging and uploading module. The invention further provides a log file merging device and a medium.

Description

Log file merging method, system, device and medium
Technical Field
The present invention relates to the field of computer technologies, and in particular, to a method, a system, a device, and a medium for merging log files.
Background
In big data processing, it is common to collect massive numbers of log files from each node through an FTP service as a data source. The log files vary randomly in size: some are relatively large and others small. In big data systems, this log data is generally stored as files on HDFS for subsequent business processing. If log files of random sizes are stored on HDFS directly, they occupy a large amount of memory on the NameNode of the HDFS cluster and seriously degrade the efficiency of later MapReduce (MR) tasks.
Log files are generally uploaded through FTP on a fixed schedule. To keep the data close to real time while avoiding too many uploads, a log file is typically uploaded once per minute. Because the number of log entries produced in a minute fluctuates, the resulting files are sometimes small and sometimes large. In addition, the log files may contain not just one type of log but several. In big data processing, data of the same type is handled by the same business processing logic, while different types of data are handled by different business processing logic.
Taken together, log files have two characteristics: mixed types and uneven sizes. To simplify the design of big data business processing logic and improve processing efficiency, the logs need to be extracted and merged by type, and the final file size needs to be aligned with the HDFS block size.
Therefore, efficiently merging many log files into files of a fixed size is a very practical technique in big data processing.
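The NameNode cost described above can be made concrete with a rough count of namespace objects, since the NameNode keeps one in-memory object per file and per block (each is often quoted at roughly 150 bytes of heap). The sketch below is illustrative only: the 256 MiB block size, the 1 MiB file size, and the count of 100,000 files are hypothetical figures not taken from the patent (Hadoop's default `dfs.blocksize` is 128 MiB), and files are assumed to be non-empty.

```java
/**
 * Rough estimate of NameNode namespace objects (files + blocks) before and after merging.
 * Assumptions (not from the source document): one object per file and per block,
 * non-empty files, and a cluster block size of 256 MiB.
 */
class NameNodeEstimate {
    static final long MIB = 1024L * 1024L;

    /** Objects for fileCount files of bytesPerFile each: one per file plus one per block. */
    static long namespaceObjects(long fileCount, long bytesPerFile, long blockSize) {
        long blocksPerFile = (bytesPerFile + blockSize - 1) / blockSize;  // ceiling division
        return fileCount * (1 + blocksPerFile);
    }

    /** Objects after packing totalBytes into files of targetSize, plus one smaller file for any remainder. */
    static long mergedObjects(long totalBytes, long targetSize, long blockSize) {
        long fullFiles = totalBytes / targetSize;
        long remainder = totalBytes % targetSize;
        long objects = namespaceObjects(fullFiles, targetSize, blockSize);
        if (remainder > 0) {
            objects += namespaceObjects(1, remainder, blockSize);
        }
        return objects;
    }

    public static void main(String[] args) {
        long blockSize = 256 * MIB;  // assumed block size
        long smallFiles = 100_000;   // hypothetical: 100,000 log files of 1 MiB each
        long before = namespaceObjects(smallFiles, MIB, blockSize);
        long after = mergedObjects(smallFiles * MIB, 255 * MIB, blockSize);
        System.out.println("before: " + before + ", after: " + after);  // before: 200000, after: 786
    }
}
```

Under these assumptions, packing the same data into files of about 255 MiB shrinks the object count by more than two orders of magnitude, which is the effect the background section relies on.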
Disclosure of Invention
Based on this, the invention aims to provide a method, a system, equipment and a medium for merging log files.
In a first aspect, the present invention provides a log file merging method, including:
scanning all files in a directory to be processed, and preprocessing the files if new files are scanned;
writing the preprocessed file data into new files under different directories according to types and time periods;
and creating a merging thread for the directory, merging the new files, and uploading the merged files to HDFS.
In an embodiment of the foregoing technical solution, scanning all files in the directory to be processed includes: using walkFileTree to scan all files under the entire FTP root directory.
In an embodiment of the foregoing technical solution, preprocessing the file if a new file is scanned includes: if a new file appears under a directory and that directory is not found in the hashset, creating a thread to merge the files under that directory.
In an embodiment of the foregoing technical solution, creating the thread includes: when the thread is created, writing the current directory into the hashset.
In an embodiment of the above technical solution, before the thread is created, it is checked whether the directory is already stored in the hashset, and the thread is created only if it is not.
In an embodiment of the foregoing technical solution, merging the new files includes: writing the merged output into a merging directory, and moving the file to a merged directory once its size reaches 255M.
In an embodiment of the foregoing technical solution, uploading the merged file to HDFS includes: after the files under the current directory have been merged and the merged file reaches 255M, uploading it to HDFS, deleting the current directory from the hashset, and then exiting the current thread.
In a second aspect, the present invention provides a log file merging system, including:
the scanning preprocessing module is configured to scan all files in the directory to be processed, and preprocess the files if new files are scanned;
the directory writing module is configured to write the preprocessed file data into new files under different directories according to types and time periods;
and the merging and uploading module is configured to create a merging thread under the directory, merge the new files, and upload the merged files to hdfs.
In a third aspect, the present invention further provides a log file merging device, including:
a memory for storing one or more programs;
a processor for executing the program stored in the memory to implement the log file merging method as described in any one of the above.
In a fourth aspect, the present invention also provides a computer-readable storage medium storing at least one program which, when executed by a processor, implements the log file merging method according to any one of the above.
Compared with the prior art, the log file merging method, system, device, and medium of the present invention have the following beneficial effects:
they use multithreading to make full use of multiple CPU cores, logically guarantee that file reads and writes do not conflict, preprocess and merge log files very efficiently, and are therefore effective and practical for data preprocessing in big data applications.
For a better understanding and practice, the invention is described in detail below with reference to the accompanying drawings.
Drawings
FIG. 1 is an exemplary flow diagram of a log file merging method of the present invention.
FIG. 2 is a connection block diagram of the log file merging system of the present invention.
Detailed Description
Orientation terms such as up, down, left, right, front, back, top, and bottom that are or may be used in this specification are defined relative to the configuration shown and are relative concepts; they may therefore change with different positions and states of use. Accordingly, these and other directional terms should not be construed as limiting.
The implementations described in the exemplary embodiments below are not intended to represent all implementations consistent with the present disclosure. Rather, they are merely examples of methods consistent with certain aspects of the present disclosure, as detailed in the appended claims.
The terminology used in the present disclosure is for the purpose of describing particular embodiments only and is not intended to be limiting of the disclosure. As used in this disclosure and the appended claims, the singular forms "a," "an," and "the" are intended to include the plural forms as well, unless the context clearly indicates otherwise. It should also be understood that the term "and/or" as used herein refers to and encompasses any and all possible combinations of one or more of the associated listed items.
Referring to FIG. 1, FIG. 1 is an exemplary flow diagram of the log file merging method of the present invention.
In a first aspect, the present invention provides a log file merging method, including:
S1, scanning all files in a directory to be processed, and preprocessing the files if new files are scanned.
In the above S1, scanning all files in the directory to be processed includes: using walkFileTree to scan all files under the entire FTP root directory.
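A minimal sketch of the S1 scan, assuming "walkfiletree" refers to Java's `java.nio.file.Files.walkFileTree`. The patent does not say how a file is recognized as "new", so the `seenFiles` set below is a hypothetical stand-in; the class and method names are likewise illustrative.

```java
import java.io.IOException;
import java.nio.file.FileVisitResult;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.SimpleFileVisitor;
import java.nio.file.attribute.BasicFileAttributes;
import java.util.HashSet;
import java.util.Set;
import java.util.TreeSet;

/** Sketch of S1: walk the whole FTP root and report directories that received new files. */
class FtpScanner {
    // Hypothetical bookkeeping: files already reported by earlier scans.
    private final Set<Path> seenFiles = new HashSet<>();

    Set<Path> scanForNewFiles(Path ftpRoot) throws IOException {
        Set<Path> dirsWithNewFiles = new TreeSet<>();
        Files.walkFileTree(ftpRoot, new SimpleFileVisitor<Path>() {
            @Override
            public FileVisitResult visitFile(Path file, BasicFileAttributes attrs) {
                // Set.add returns true only the first time this path is seen.
                if (attrs.isRegularFile() && seenFiles.add(file)) {
                    dirsWithNewFiles.add(file.getParent());
                }
                return FileVisitResult.CONTINUE;
            }

            @Override
            public FileVisitResult visitFileFailed(Path file, IOException exc) {
                // A file may be moved or deleted mid-scan; skip it instead of aborting the walk.
                return FileVisitResult.CONTINUE;
            }
        });
        return dirsWithNewFiles;
    }
}
```

In a long-running service the `seenFiles` set would grow without bound; a real implementation would more likely move or delete input files once they are merged, and would also need some way to ignore files that FTP is still writing.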
Further, preprocessing the file if a new file is scanned includes: if a new file appears under a directory and that directory is not found in the hashset, creating a thread to merge the files under that directory.
Creating the thread includes: when the thread is created, writing the current directory into the hashset.
To prevent multiple threads from merging files under the same directory and causing read-write conflicts, the hashset is checked before a thread is created, and the thread is created only if the directory is not already present.
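The check-then-create rule above can be sketched as follows. This is one possible design, not the patent's code: it swaps a plain `HashSet` for `ConcurrentHashMap.newKeySet()` so that "is the directory present?" and "write the directory" happen in one atomic `add` call, and it runs tasks on an `ExecutorService` rather than creating a raw thread each time. `MergeDispatcher` and its methods are hypothetical names.

```java
import java.nio.file.Path;
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ExecutorService;

/** Sketch of the hashset guard: at most one merge task per directory at a time. */
class MergeDispatcher {
    // Concurrent set standing in for the document's hashset of directories being merged.
    private final Set<Path> activeDirs = ConcurrentHashMap.newKeySet();
    private final ExecutorService pool;

    MergeDispatcher(ExecutorService pool) {
        this.pool = pool;
    }

    /** Starts mergeTask for dir unless one is already running; returns whether a task was started. */
    boolean dispatch(Path dir, Runnable mergeTask) {
        // add() checks and inserts atomically, so two concurrent callers cannot both claim dir.
        if (!activeDirs.add(dir)) {
            return false;
        }
        try {
            pool.execute(() -> {
                try {
                    mergeTask.run();
                } finally {
                    activeDirs.remove(dir);  // release dir when the task ends, even on failure
                }
            });
        } catch (RuntimeException e) {
            activeDirs.remove(dir);  // the pool rejected the task, so nothing holds dir
            throw e;
        }
        return true;
    }

    boolean isActive(Path dir) {
        return activeDirs.contains(dir);
    }
}
```

With a plain `HashSet`, a separate `contains` followed by `add` leaves a window in which two scans could both start a thread for the same directory; the atomic `add` closes that window.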
S2, writing the preprocessed file data into new files in different directories according to type and time period.
Specifically, the preprocessed data is preferably grouped by log time into 10-minute periods, and the data for each period is written into a new file in its own directory.
For example, if a file is of the ftp type and its log time is 12:13:50 on October 20, 2020, its data is written into the ftp/20201020/1210 directory.
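The directory rule in the example can be expressed as a small function: the type name, then the date as `yyyyMMdd`, then the hour and the minute rounded down to a multiple of 10. This is a sketch that reproduces only the single example given; the method name `bucketDir` and the use of `/` as a separator are assumptions.

```java
import java.time.LocalDateTime;
import java.time.format.DateTimeFormatter;
import java.util.Locale;

/** Sketch of the S2 directory layout: <type>/<yyyyMMdd>/<HH><10-minute window start>. */
class BucketDirs {
    private static final DateTimeFormatter DAY = DateTimeFormatter.ofPattern("yyyyMMdd");

    static String bucketDir(String type, LocalDateTime logTime) {
        int windowStart = logTime.getMinute() / 10 * 10;  // e.g. 13 -> 10, 59 -> 50
        return String.format(Locale.ROOT, "%s/%s/%02d%02d",
                type, logTime.format(DAY), logTime.getHour(), windowStart);
    }

    public static void main(String[] args) {
        // The example from the description: ftp log at 2020-10-20 12:13:50.
        System.out.println(bucketDir("ftp", LocalDateTime.of(2020, 10, 20, 12, 13, 50)));  // ftp/20201020/1210
    }
}
```

Because every record in the same 10-minute window maps to the same directory, each directory collects one type and one time slice, which is what lets S3 merge a directory's files without mixing business logic.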
S3, creating a merging thread for the directory, merging the new files, and uploading the merged files to HDFS.
In the above S3, merging the new files includes: writing the merged output into a merging directory, and moving the file to a merged directory once its size reaches 255M.
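A minimal sketch of the merging/merged split: input files are appended to one file under the merging directory, and once that file reaches the threshold it is renamed into the merged directory, so anything that reads the merged directory sees only finished files. `RollingMerger`, the `merged-N.log` naming, and the reading of "255M" as 255 MiB are assumptions; the atomic move also assumes both directories are on the same file system.

```java
import java.io.IOException;
import java.io.OutputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;
import java.nio.file.StandardOpenOption;

/** Sketch of the S3 merge step: append inputs in "merging", move to "merged" at the threshold. */
class RollingMerger {
    private final Path mergingDir;
    private final Path mergedDir;
    private final long thresholdBytes;
    private Path current;      // file currently being filled, or null
    private int sequence = 0;  // hypothetical naming scheme for merged files

    RollingMerger(Path mergingDir, Path mergedDir, long thresholdBytes) throws IOException {
        this.mergingDir = Files.createDirectories(mergingDir);
        this.mergedDir = Files.createDirectories(mergedDir);
        this.thresholdBytes = thresholdBytes;
    }

    /** Appends input to the current merging file; returns the moved file if the threshold was reached, else null. */
    Path append(Path input) throws IOException {
        if (current == null) {
            current = mergingDir.resolve("merged-" + sequence++ + ".log");
        }
        try (OutputStream out = Files.newOutputStream(current,
                StandardOpenOption.CREATE, StandardOpenOption.APPEND)) {
            Files.copy(input, out);
        }
        if (Files.size(current) < thresholdBytes) {
            return null;
        }
        // Same-file-system rename, so readers of mergedDir never see a half-written file.
        Path target = mergedDir.resolve(current.getFileName());
        Files.move(current, target, StandardCopyOption.ATOMIC_MOVE);
        current = null;
        return target;
    }
}
```

For the threshold in the text, one would construct it as `new RollingMerger(merging, merged, 255L * 1024 * 1024)`. Since whole input files are appended, a merged file can end slightly above the threshold.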
Further, uploading the merged file to HDFS includes: after the files under the current directory have been merged and the merged file reaches 255M, uploading it to HDFS, deleting the current directory from the hashset, and then exiting the current thread.
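The end-of-thread sequence (upload, remove the directory from the hashset, exit) can be sketched as a `Runnable` whose `run` method returns once the directory is released. The `Merger` and `Uploader` interfaces are hypothetical hooks introduced for this sketch; a real uploader might wrap Hadoop's `FileSystem#copyFromLocalFile`, which is not called here so the example stays self-contained.

```java
import java.io.IOException;
import java.nio.file.Path;
import java.util.List;
import java.util.Set;

/** Sketch of the end of S3: upload finished files, release the directory, let the thread exit. */
class MergeWorker implements Runnable {
    /** Hypothetical upload hook, e.g. a wrapper around an HDFS client. */
    interface Uploader {
        void upload(Path mergedFile) throws IOException;
    }

    /** Hypothetical hook that merges the files under dir and returns merged files that reached the threshold. */
    interface Merger {
        List<Path> mergeDirectory(Path dir) throws IOException;
    }

    private final Path dir;
    private final Set<Path> activeDirs;    // the shared "hashset" of directories being merged
    private final Merger merger;
    private final Uploader uploader;
    private volatile IOException failure;  // kept so the caller can inspect or log it

    MergeWorker(Path dir, Set<Path> activeDirs, Merger merger, Uploader uploader) {
        this.dir = dir;
        this.activeDirs = activeDirs;
        this.merger = merger;
        this.uploader = uploader;
    }

    @Override
    public void run() {
        try {
            for (Path merged : merger.mergeDirectory(dir)) {
                uploader.upload(merged);
            }
        } catch (IOException e) {
            failure = e;
        } finally {
            // Remove dir only after uploading, so a later scan cannot start a second merger too early.
            activeDirs.remove(dir);
        }
        // Returning from run() ends the thread, matching "the current thread exits".
    }

    IOException failure() {
        return failure;
    }
}
```

Putting the removal in `finally` means a failed upload still releases the directory, so the next scan can retry it instead of leaving it locked forever.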
The log file merging method is preferably implemented in Java using Spring Boot.
Referring further to fig. 2, fig. 2 is a connection block diagram of the log file merging system of the present invention.
In a second aspect, the present invention provides a log file merging system, including:
the scanning preprocessing module is configured to scan all files in the directory to be processed, and preprocess the files if new files are scanned;
the directory writing module is configured to write the preprocessed file data into new files under different directories according to types and time periods;
and the merging and uploading module is configured to create a merging thread under the directory, merge the new files, and upload the merged files to hdfs.
In a third aspect, the present invention further provides a log file merging device, including:
a memory for storing one or more programs;
a processor for executing the program stored in the memory to implement the log file merging method.
The device may also preferably include a communication interface for communicating with external devices and exchanging data with them.
It should be noted that the memory may include high-speed RAM and may also include non-volatile memory, such as at least one disk storage device.
In a specific implementation, if the memory, the processor, and the communication interface are integrated on one chip, they can communicate with each other through internal interfaces. If they are implemented independently, they may be connected to and communicate with each other through a bus.
In a fourth aspect, the present invention also provides a computer-readable storage medium storing at least one program which, when executed by a processor, implements the log file merging method as described above.
It should be appreciated that the computer-readable storage medium is any data storage device that can store data or programs which can thereafter be read by a computer system. Examples of the computer readable storage medium include read-only memory, random-access memory, CD-ROMs, HDDs, DVDs, magnetic tapes, optical data storage devices, and the like. The computer readable storage medium can also be distributed over network coupled computer systems so that the computer readable code is stored and executed in a distributed fashion.
Program code embodied on a computer readable storage medium may be transmitted using any appropriate medium, including but not limited to: wireless, wire, fiber optic cable, Radio Frequency (RF), etc., or any suitable combination of the foregoing.
In some embodiments, the computer-readable storage medium may be non-transitory.
Compared with the prior art, the log file merging method, system, device, and medium of the present invention have the following beneficial effects:
they use multithreading to make full use of multiple CPU cores, logically guarantee that file reads and writes do not conflict, preprocess and merge log files very efficiently, and are therefore effective and practical for data preprocessing in big data applications.
The embodiments described above express only several implementations of the present invention. Although they are described in specific detail, they should not be construed as limiting the scope of the invention. It should be noted that a person skilled in the art can make several variations and modifications without departing from the inventive concept, and these all fall within the scope of the present invention.

Claims (10)

1. A log file merging method is characterized by comprising the following steps:
scanning all files in a directory to be processed, and preprocessing the files if new files are scanned;
writing the preprocessed file data into new files under different directories according to types and time periods;
and creating a merging thread for the directory, merging the new files, and uploading the merged files to HDFS.
2. The log file merging method according to claim 1, wherein scanning all files in the directory to be processed comprises: using walkFileTree to scan all files under the entire FTP root directory.
3. The log file merging method according to claim 2, wherein preprocessing the file if a new file is scanned comprises: if a new file appears under a directory and that directory is not found in the hashset, creating a thread to merge the files under that directory.
4. The log file merging method according to claim 3, wherein creating a thread comprises: when the thread is created, writing the current directory into the hashset.
5. The log file merging method according to claim 4, wherein before the thread is created, it is checked whether the directory is already stored in the hashset, and the thread is created only if the directory is not present.
6. The log file merging method according to claim 5, wherein merging the new files comprises: writing the merged output into a merging directory, and moving the file to a merged directory when its size reaches 255M.
7. The log file merging method according to claim 6, wherein uploading the merged file to HDFS comprises: after the files under the current directory have been merged and the merged file reaches 255M, uploading it to HDFS, deleting the current directory from the hashset, and then exiting the current thread.
8. A log file merging system, comprising:
the scanning preprocessing module is configured to scan all files in the directory to be processed, and preprocess the files if new files are scanned;
the directory writing module is configured to write the preprocessed file data into new files under different directories according to types and time periods;
and the merging and uploading module is configured to create a merging thread under the directory, merge the new files, and upload the merged files to hdfs.
9. A log file merging apparatus, comprising:
a memory for storing one or more programs;
a processor for executing the program stored in the memory to implement the log file merging method according to any one of claims 1 to 7.
10. A computer-readable storage medium storing at least one program, which when executed by a processor, implements the log file merging method according to any one of claims 1 to 7.

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110195748.1A CN112905555A (en) 2021-02-19 2021-02-19 Log file merging method, system, device and medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202110195748.1A CN112905555A (en) 2021-02-19 2021-02-19 Log file merging method, system, device and medium

Publications (1)

Publication Number Publication Date
CN112905555A true CN112905555A (en) 2021-06-04

Family

ID=76124272

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110195748.1A Pending CN112905555A (en) 2021-02-19 2021-02-19 Log file merging method, system, device and medium

Country Status (1)

Country Link
CN (1) CN112905555A (en)

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103577123A (en) * 2013-11-12 2014-02-12 河海大学 Small file optimization storage method based on HDFS
CN105159966A (en) * 2015-08-25 2015-12-16 航天恒星科技有限公司 Method and apparatus for creating directory entity and directory entity processing system
CN105488201A (en) * 2015-12-08 2016-04-13 北京皮尔布莱尼软件有限公司 Log inquiry method and system
US20180102938A1 (en) * 2016-10-11 2018-04-12 Oracle International Corporation Cluster-based processing of unstructured log messages
CN108520016A (en) * 2018-03-21 2018-09-11 四川斐讯信息技术有限公司 Data storage method based on clock timer and Duo Tai upload servers and system
CN109815198A (en) * 2018-12-10 2019-05-28 北京龙拳风暴科技有限公司 Moving game big data pastes active layer implementation method and device

Similar Documents

Publication Publication Date Title
CN100472445C (en) Configuring load application method and system of communication apparatus
CN108847977A (en) A kind of monitoring method of business datum, storage medium and server
US9836516B2 (en) Parallel scanners for log based replication
CN106503008B (en) File storage method and device and file query method and device
CN104516921A (en) Automatic response method and device
CN109299152B (en) Suffix array indexing method and device for real-time data stream
CN102508913A (en) Cloud computing system with data cube storage index structure
CN103150149A (en) Method and device for processing redo data of database
CN103595571B (en) Preprocess method, the apparatus and system of web log
CN106569964A (en) Power-off protection method, power-off protection device, power-off protection system and memory
CN109144955A (en) A kind of file reading and electronic equipment
CN104408068A (en) Report form data processing method and related equipment
CN105183384A (en) Direct erasure correction implementation method and device
CN111897828A (en) Data batch processing implementation method, device, equipment and storage medium
CN108762979A (en) A kind of end message backup method and alternate device based on matching tree
CN103678360A (en) Data storing method and device for distributed file system
CN112860412B (en) Service data processing method and device, electronic equipment and storage medium
CN111159265A (en) ETL data migration method and system
CN104699815A (en) Data processing method and system
CN112905555A (en) Log file merging method, system, device and medium
CN112235124B (en) Method and device for configuring pico-cell, storage medium and electronic device
CN108491274A (en) Optimization method, device, storage medium and the equipment of distributed data management
CN108197323A (en) Applied to distributed system map data processing method
US20200133749A1 (en) Method and apparatus for transformation of mpi programs for memory centric computers
CN110928484B (en) Hybrid cloud storage method based on software defined storage

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication

Application publication date: 20210604