CN106776617B - Log file saving method and device - Google Patents


Info

Publication number
CN106776617B
Authority
CN
China
Prior art keywords
distributed database
log file
target
file
server
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201510812631.8A
Other languages
Chinese (zh)
Other versions
CN106776617A (en)
Inventor
汤卫群
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Gridsum Technology Co Ltd
Original Assignee
Beijing Gridsum Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing Gridsum Technology Co Ltd
Priority to CN201510812631.8A
Publication of CN106776617A
Application granted
Publication of CN106776617B
Current legal status: Active
Anticipated expiration

Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00: Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20: Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/27: Replication, distribution or synchronisation of data between databases or within a distributed database system; Distributed database system architectures therefor

Abstract

The application discloses a log file saving method and device. The method comprises the following steps: acquiring a log file recorded by a server; creating a target file in a distributed database, wherein the target file is used for storing the log file; acquiring data transmission resources between the server and the distributed database; and saving the log file to the target file of the distributed database by using the data transmission resources. This solves the technical problem that the log file of a server cannot be saved quickly and efficiently.

Description

Log file saving method and device
Technical Field
The present application relates to the field of data processing, and in particular, to a method and an apparatus for saving a log file.
Background
Tomcat is a well-known application server for Java web applications: a Java web application can run in Tomcat to provide services externally. The log of each user's access to Tomcat is valuable, so it needs to be recorded and stored. Tomcat provides two ways to store these access logs: one is the file-based AccessLogValve, and the other is the JDBCAccessLogValve provided in Tomcat version 7.0.
If logs are saved with the file-based AccessLogValve, the original log file needs to be converted, and because the log files are written locally they are limited by the capacity of the disk.
To avoid being limited by disk capacity, the JDBC-based JDBCAccessLogValve can be used to save logs, but this approach has the following disadvantages: (1) the connection used by JDBCAccessLogValve is not taken from a connection pool, so resources are consumed each time a connection is established; (2) JDBCAccessLogValve does not use batch insertion; (3) JDBCAccessLogValve inserts synchronously; and (4) when the data volume is large, splitting the data across multiple databases and tables is troublesome. As a result, the log file of a server cannot be saved quickly and efficiently.
In view of the above problems, no effective solution has been proposed.
Disclosure of Invention
The embodiment of the application provides a method and a device for saving log files, which are used for at least solving the technical problem that the log files of a server cannot be saved quickly and efficiently.
According to an aspect of an embodiment of the present application, there is provided a log file saving method, including: acquiring a log file recorded by a server; creating a target file in a distributed database, wherein the target file is used for storing the log file; acquiring data transmission resources between the server and the distributed database; and saving the log file to the target file of the distributed database by using the data transmission resource.
Further, before saving the log file to the target file of the distributed database using the data transfer resource, the method further comprises: storing the log file to a buffer of the server; judging whether the buffer amount of the buffer area reaches a preset value; saving the log file to the target file of the distributed database by using the data transmission resource comprises: and under the condition that the buffering amount of the buffer area is judged to reach the preset value, the log file stored in the buffer area is saved to the target file of the distributed database through the data transmission resource.
Further, before acquiring the data transmission resource between the server and the distributed database, the method further includes: establishing a connection pool between the server and the distributed database, wherein the connection pool comprises a plurality of connections, and the data transmission resource is the connection; when the buffering amount of the buffer area is judged to reach the preset value, the saving of the log file stored in the buffer area to the target file of the distributed database through the data transmission resource comprises: under the condition that the buffering amount of the buffer area is judged to reach the preset value, acquiring a plurality of target connections from the connection pool, wherein the target connections are idle connections in the connection pool; and saving the log file to the target file of the distributed database by using the plurality of target connections.
Further, there are a plurality of servers. Acquiring the log file recorded by the server includes: acquiring the log file recorded by a server Sj, where j takes the values 1 to n in turn and n is the number of the servers. Creating a target file in the distributed database includes: creating sub-target files D1 to Dn in the distributed database, where the sub-target files D1 to Dn constitute the target file. Establishing a connection pool between the server and the distributed database includes: establishing a connection pool Pj between the server Sj and the distributed database. Saving the log file to the target file of the distributed database by using the plurality of target connections includes: saving the log file recorded by the server Sj into the sub-target file Dj of the distributed database by using the target connections acquired from the connection pool Pj.
Further, saving the log file to the target file of the distributed database using the data transmission resource further includes: and storing the plurality of pieces of data in the log file in batches to the target file of the distributed database in an asynchronous transmission mode by using the data transmission resource.
According to another aspect of the embodiments of the present application, there is also provided a log file saving apparatus, including: a first acquisition unit, configured to acquire a log file recorded by a server; a creating unit, configured to create a target file in a distributed database, wherein the target file is used for storing the log file; a second acquisition unit, configured to acquire a data transmission resource between the server and the distributed database; and a saving unit, configured to save the log file into the target file of the distributed database by using the data transmission resource.
Further, the apparatus further comprises: a storage unit, configured to store the log file in a buffer of the server before the saving unit saves the log file in the target file of the distributed database using the data transmission resource; the judging unit is used for judging whether the buffering amount of the buffer area reaches a preset value or not; the saving unit includes: and the first saving subunit is configured to, when the determining unit determines that the buffering amount of the buffer reaches the preset value, save the log file stored in the buffer to the target file of the distributed database through the data transmission resource.
Further, the apparatus further comprises: an establishing unit, configured to establish a connection pool between the server and the distributed database before the second acquisition unit acquires the data transmission resources between the server and the distributed database, wherein the connection pool comprises a plurality of connections, and the data transmission resources are the connections. The first saving subunit includes: an obtaining module, configured to obtain a plurality of target connections from the connection pool when the judging unit determines that the buffering amount of the buffer reaches the preset value, where the target connections are idle connections in the connection pool; and a saving module, configured to save the log file into the target file of the distributed database by using the plurality of target connections obtained by the obtaining module.
Further, there are a plurality of servers. The first acquisition unit includes: a first acquisition subunit, configured to acquire the log file recorded by a server Sj, where j takes the values 1 to n in turn and n is the number of the servers. The creating unit includes: a creating subunit, configured to create sub-target files D1 to Dn in the distributed database, where the sub-target files D1 to Dn constitute the target file. The establishing unit includes: an establishing subunit, configured to establish a connection pool Pj between the server Sj and the distributed database. The saving module includes: a saving sub-module, configured to save the log file recorded by the server Sj into the sub-target file Dj of the distributed database by using the target connections obtained from the connection pool Pj by the obtaining module.
Further, the saving unit further includes: and the second saving subunit is configured to save, in a batch manner in an asynchronous transmission manner, the plurality of pieces of data in the log file to the target file of the distributed database by using the data transmission resource.
In the embodiments of the present application, a log file recorded by a server is acquired; a target file for storing the log file is created in a distributed database; data transmission resources between the server and the distributed database are acquired; and the log file is saved into the target file of the distributed database by using the data transmission resources. This solves the technical problem that the log file of the server cannot be saved quickly and efficiently.
Drawings
The accompanying drawings, which are included to provide a further understanding of the application and are incorporated in and constitute a part of this application, illustrate embodiment(s) of the application and together with the description serve to explain the application and not to limit the application. In the drawings:
FIG. 1 is a flowchart of a log file saving method according to an embodiment of the present application; and
FIG. 2 is a schematic diagram of a log file saving device according to an embodiment of the present application.
Detailed Description
In order to make the technical solutions better understood by those skilled in the art, the technical solutions in the embodiments of the present application will be clearly and completely described below with reference to the drawings in the embodiments of the present application, and it is obvious that the described embodiments are only partial embodiments of the present application, but not all embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present application.
It should be noted that the terms "first," "second," and the like in the description and claims of this application and in the drawings described above are used for distinguishing between similar elements and not necessarily for describing a particular sequential or chronological order. It is to be understood that the data so used is interchangeable under appropriate circumstances such that the embodiments of the application described herein are capable of operation in sequences other than those illustrated or described herein. Furthermore, the terms "comprises," "comprising," and "having," and any variations thereof, are intended to cover a non-exclusive inclusion, such that a process, method, system, article, or apparatus that comprises a list of steps or elements is not necessarily limited to those steps or elements expressly listed, but may include other steps or elements not expressly listed or inherent to such process, method, article, or apparatus.
First, technical terms related to the embodiments of the present application are explained as follows:
Tomcat: Tomcat is a free, open-source Web application server. It is a lightweight server and is generally used in small and medium-sized systems.
HBase: HBase is a distributed storage system that is highly reliable, high-performance, column-oriented, and scalable.
According to an embodiment of the present application, there is provided an embodiment of a log file saving method. It should be noted that the steps shown in the flowchart of the drawings may be executed in a computer system, for example as a set of computer-executable instructions, and that although a logical order is shown in the flowchart, in some cases the steps shown or described may be executed in a different order.
Fig. 1 is a flowchart of a log file saving method according to an embodiment of the present application, and as shown in fig. 1, the method includes the following steps:
Step S102: acquire the log file recorded by the server. The log file recorded by the server may be an access log of users or an operation log of the server.
Step S104: create a target file in the distributed database, where the target file is used for storing the log file. The distributed database may be HBase, Cassandra, Hypertable, or the like. When the volume of data to be stored is very large, splitting a conventional database into multiple databases and tables is troublesome; using a distributed database such as HBase avoids this splitting and improves data storage efficiency. The target file may be a table.
Step S106: acquire data transmission resources between the server and the distributed database.
Step S108: save the log file to the target file of the distributed database by using the data transmission resources.
HBase is a distributed, column-oriented database. Its biggest differences from a general relational database are that HBase is well suited to storing unstructured data and that it uses a column-based rather than row-based model.
A Rowkey is a binary byte string with a maximum length of 64 KB whose content is defined by the user. Data is also typically loaded in ascending binary order of the Rowkey.
HBase is retrieved according to Rowkey, and the system obtains data by finding the Region where a certain Rowkey (or a certain Rowkey range) is located and then routing the request for querying data to the Region. HBase retrieval supports 3 approaches:
(1) accessing through a single Rowkey, namely performing get operation according to a certain Rowkey value, so as to obtain a unique record;
(2) scanning is carried out through the range of Rowkey, namely scanning is carried out in the range by setting startRowKey (start row key) and endRowKey (end row key), so that a batch of records can be obtained according to a specified condition;
(3) full table scanning, that is, directly scanning all row records in the whole table.
Retrieval by a single Rowkey is efficient: it takes less than 1 millisecond, and 1000 to 2000 records can be obtained per second.
With a carefully designed Rowkey, the records to be retrieved are stored close to each other (ideally in the same Region), so good performance is obtained when traversing the results.
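The three retrieval approaches and the effect of Rowkey design can be illustrated with a small in-memory sketch. It is not the HBase client API: a sorted map stands in for a table ordered by Rowkey, and the key layout (server identifier followed by a zero-padded timestamp) is a hypothetical design chosen only so that one server's records sort next to each other. It also assumes ASCII keys, for which string order matches byte order.

```java
import java.util.Map;
import java.util.SortedMap;
import java.util.TreeMap;

/** In-memory stand-in for a table sorted by row key; not the HBase client API. */
class SortedLogTable {
    private final TreeMap<String, String> rows = new TreeMap<>();

    /** Hypothetical key layout: server id, then a zero-padded timestamp, so one server's rows are adjacent. */
    static String rowKey(String serverId, long timestampMillis) {
        return serverId + "|" + String.format("%013d", timestampMillis);
    }

    void put(String rowKey, String logLine) {
        rows.put(rowKey, logLine);
    }

    /** (1) Access by a single row key: returns the unique record, or null if absent. */
    String get(String rowKey) {
        return rows.get(rowKey);
    }

    /** (2) Range scan: start key inclusive, end key exclusive. */
    SortedMap<String, String> scan(String startRowKey, String endRowKey) {
        return rows.subMap(startRowKey, endRowKey);
    }

    /** (3) Full table scan: every row in key order. */
    Map<String, String> scanAll() {
        return rows;
    }
}
```

With this layout, all records of server S1 can be read with a single range scan from "S1|" to "S1~", without touching the rows of other servers.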
The method comprises the steps of acquiring a log file recorded by a server, creating a target file for storing the log file in a distributed database, acquiring data transmission resources between the server and the distributed database, and storing the log file into the target file of the distributed database by using the data transmission resources.
Optionally, before saving the log file to the target file of the distributed database by using the data transmission resource, the method for saving the log file provided in the embodiment of the present application further includes: storing the log file to a buffer area of a server; judging whether the buffer amount of the buffer zone reaches a preset value; saving the log file to a target file of a distributed database by using data transmission resources comprises the following steps: and under the condition that the buffering amount of the buffer area is judged to reach a preset value, saving the log file stored in the buffer area to a target file of the distributed database through the data transmission resource.
Before the log file is saved into the target file of the distributed database by using the data transmission resource, the log file is stored in a buffer of the server; once the buffered amount reaches a preset value, the log file is saved into the target file of the distributed database by using the data transmission resource. The preset value is configured in advance: when the maximum capacity of the buffer is M units, the preset value may be set to 0.5M, 0.6M, 0.7M, 0.8M, and so on. The preset value acts as a threshold: when the buffered amount reaches the threshold, the log file in the buffer is saved to the distributed database.
For example, the preset value is set to be 60% of the maximum buffer amount, the log file is stored in the buffer area, and after the buffer amount of the buffer area reaches 60% of the maximum buffer amount, the log file is stored in the target file of the distributed database by using the data transmission resource.
Since the CPU is fast and I/O (input/output) devices are slow, a bottleneck easily arises from insufficient channels. Using a buffer mitigates the speed mismatch between the CPU and I/O devices and reduces how often I/O devices interrupt the CPU. This improves the working efficiency of the CPU and the performance of the server, so that users can access the server more efficiently and user experience is improved.
Setting a threshold on the buffered amount allows the log files stored in the buffer to be transmitted to the distributed database promptly once the threshold is reached. This avoids losing log data that can no longer be stored after the buffer has reached its maximum capacity.
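The threshold behaviour described above can be sketched as follows. This is a minimal illustration, not the patent's implementation: the class name, the record-count unit, and the sink callback (standing in for the write to the distributed database) are all assumptions.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;

/** Buffers log records and flushes them once the fill level reaches a preset share of capacity. */
class ThresholdLogBuffer {
    private final int threshold;                // preset value, counted in records
    private final Consumer<List<String>> sink;  // hypothetical writer to the distributed database
    private final List<String> buffer = new ArrayList<>();

    /** capacity is the maximum buffer amount M; percent = 60 gives a preset value of 0.6M. */
    ThresholdLogBuffer(int capacity, int percent, Consumer<List<String>> sink) {
        this.threshold = Math.max(1, capacity * percent / 100);
        this.sink = sink;
    }

    /** Stores one record and flushes when the preset value is reached. */
    synchronized void append(String record) {
        buffer.add(record);
        if (buffer.size() >= threshold) {
            flush();
        }
    }

    /** Hands a copy of the buffered records to the sink and empties the buffer. */
    synchronized void flush() {
        if (buffer.isEmpty()) {
            return;
        }
        sink.accept(new ArrayList<>(buffer));
        buffer.clear();
    }

    synchronized int size() {
        return buffer.size();
    }
}
```

Flushing at a share of the capacity rather than at the full capacity leaves headroom for records that arrive while a flush is still in progress.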
Optionally, before acquiring data transmission resources between the server and the distributed database, the method for saving the log file provided in the embodiment of the present application further includes: establishing a connection pool between a server and a distributed database, wherein the connection pool comprises a plurality of connections, and data transmission resources are connections; under the condition that the buffering amount of the buffer area is judged to reach a preset value, the log file stored in the buffer area is stored into a target file of the distributed database through a data transmission resource, and the method comprises the following steps: under the condition that the buffering amount of the buffer area is judged to reach a preset value, a plurality of target connections are obtained from the connection pool, wherein the target connections are idle connections in the connection pool; and saving the log file into a target file of the distributed database by using a plurality of target connections.
A connection pool is established between the server and the distributed database. The connection pool comprises a plurality of connections, and the data transmission resources are these connections. When the buffered amount reaches the preset value, idle connections are obtained from the connection pool; the obtained idle connections are the target connections, and the log file is saved into the target file of the distributed database by using the plurality of target connections.
A connection is a critical, limited, and expensive resource. How connections are managed significantly affects the scalability and robustness of the whole application, as well as its performance metrics. The connection pool is responsible for allocating, managing, and releasing connections, and allows an application to reuse an existing database connection instead of establishing a new one, thereby avoiding wasted resources and improving application performance.
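The allocate, reuse, and release cycle of a connection pool can be sketched with a bounded queue of idle connections. The class and method names are hypothetical, and the generic type C stands for whatever connection object the database client provides.

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;
import java.util.function.Supplier;

/** Fixed-size pool: connections are created once and then reused. */
class SimpleConnectionPool<C> {
    private final BlockingQueue<C> idle;

    /** Creates maxConnections connections up front using the supplied factory. */
    SimpleConnectionPool(int maxConnections, Supplier<C> factory) {
        idle = new ArrayBlockingQueue<>(maxConnections);
        for (int i = 0; i < maxConnections; i++) {
            idle.add(factory.get());
        }
    }

    /** Returns an idle connection (a "target connection"), or null if all are in use. */
    C tryAcquire() {
        return idle.poll();
    }

    /** Returns a connection to the pool so that it can be reused. */
    void release(C connection) {
        idle.offer(connection);
    }

    int idleCount() {
        return idle.size();
    }
}
```

A caller would acquire a connection, write a batch, and release the connection in a finally block so that it returns to the pool even when the write fails.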
Optionally, there are a plurality of servers. Acquiring the log file recorded by the server includes: acquiring the log file recorded by a server Sj, where j takes the values 1 to n in turn and n is the number of the servers. Creating a target file in the distributed database includes: creating sub-target files D1 to Dn in the distributed database, where the sub-target files D1 to Dn constitute the target file. Establishing a connection pool between the server and the distributed database includes: establishing a connection pool Pj between the server Sj and the distributed database. Saving the log file into the target file of the distributed database by using the plurality of target connections includes: saving the log file recorded by the server Sj into the sub-target file Dj of the distributed database by using the target connections acquired from the connection pool Pj.
The log file saving method provided by the embodiment of the present application can save log files recorded by different servers at the same time, as described below. The servers S1 to Sn are n different servers whose log files need to be saved in the distributed database. Under the target file in the distributed database, n sub-target files D1 to Dn are created. The sub-target file D1 stores the log file of the server S1, the sub-target file D2 stores the log file of the server S2, the sub-target file D3 stores the log file of the server S3, and so on, up to the sub-target file Dn, which stores the log file of the server Sn.
A connection pool is established between each server and the distributed database, for a total of n connection pools: P1, P2, ..., Pn. The connection pool P1 is the connection pool between the server S1 and the distributed database, the connection pool P2 is the connection pool between the server S2 and the distributed database, the connection pool P3 is the connection pool between the server S3 and the distributed database, and so on, up to the connection pool Pn between the server Sn and the distributed database.
Each of the n connection pools comprises at least one connection. The connection pool Pi contains m(i) connections, namely connection Ci-1, connection Ci-2, connection Ci-3, ..., connection Ci-m(i).
For example, assume that there are 3 different servers, so n = 3. The connection pool P1 is the connection pool between the server S1 and the distributed database and includes 6 connections, i.e., m(1) = 6: connection C1-1, connection C1-2, connection C1-3, connection C1-4, connection C1-5, and connection C1-6.
The connection pool P2 is the connection pool between the server S2 and the distributed database and includes 3 connections, i.e., m(2) = 3: connection C2-1, connection C2-2, and connection C2-3.
The connection pool P3 is the connection pool between the server S3 and the distributed database and includes 4 connections, i.e., m(3) = 4: connection C3-1, connection C3-2, connection C3-3, and connection C3-4.
The 6 connections in the connection pool P1 all contain the identification information of the server S1, the 3 connections in the connection pool P2 all contain the identification information of the server S2, the 4 connections in the connection pool P3 all contain the identification information of the server S3, it can be determined which server is connected to the distributed database according to the identification information contained in the connections, and then the log files recorded by the server are stored in the corresponding sub-target files in the distributed database, that is, the log files recorded by the server S1 are stored in the sub-target file D1, the log files recorded by the server S2 are stored in the sub-target file D2, and the log files recorded by the server S3 are stored in the sub-target file D3.
The log files recorded by different servers are stored into corresponding sub-target files, so that the log files are clearly and orderly stored, and the log files are very convenient to query later. For example, when a log file of a certain server needs to be queried, only the sub-target file corresponding to the server needs to be queried, and the whole target file does not need to be queried, so that the query speed can be increased, and the query efficiency can be improved.
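The mapping from server Sj to sub-target file Dj can be sketched as a simple router. This is an illustration under stated assumptions: the identifiers "S1".."Sn" and "D1".."Dn" follow the notation above, while the class name and the in-memory map that stands in for the distributed database are hypothetical.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Routes each server's log records to that server's own sub-target. */
class PerServerRouter {
    private final Map<String, String> subTargetByServer = new HashMap<>();
    // In-memory stand-in for the sub-target files in the distributed database.
    private final Map<String, List<String>> store = new HashMap<>();

    /** Registers servers S1..Sn and creates the matching sub-targets D1..Dn. */
    PerServerRouter(int n) {
        for (int j = 1; j <= n; j++) {
            subTargetByServer.put("S" + j, "D" + j);
            store.put("D" + j, new ArrayList<>());
        }
    }

    /** Saves a record from server Sj into sub-target Dj. */
    void save(String serverId, String record) {
        store.get(subTargetOf(serverId)).add(record);
    }

    /** Queries only the sub-target of one server, not the whole target. */
    List<String> query(String serverId) {
        return Collections.unmodifiableList(store.get(subTargetOf(serverId)));
    }

    String subTargetOf(String serverId) {
        String subTarget = subTargetByServer.get(serverId);
        if (subTarget == null) {
            throw new IllegalArgumentException("unknown server: " + serverId);
        }
        return subTarget;
    }
}
```

Because a query for one server touches only that server's sub-target, its cost does not grow with the volume logged by the other servers.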
Optionally, saving the log file to a target file of the distributed database by using a data transmission resource includes: and storing a plurality of pieces of data in the log file into a target file of the distributed database in a batch manner in an asynchronous transmission mode by utilizing data transmission resources.
The asynchronous batch processing is carried out on a plurality of pieces of data in the log file, so that the data processing speed can be greatly increased, and the log file can be more efficiently stored in a distributed database. For example, the batch size BatchSize can be set to 2000, i.e., 2000 pieces of data are processed per data interaction.
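Asynchronous batch saving can be sketched as follows: the records are split into batches of at most batchSize, and each batch is handed to an executor so that the caller is not blocked while the data is written. The class name and the batchSink callback (standing in for one batched write to the distributed database) are assumptions made for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.Executor;
import java.util.function.Consumer;

/** Splits records into batches and writes each batch asynchronously. */
class AsyncBatchWriter {

    /** Splits records into consecutive batches of at most batchSize entries. */
    static List<List<String>> partition(List<String> records, int batchSize) {
        if (batchSize <= 0) {
            throw new IllegalArgumentException("batchSize must be positive");
        }
        List<List<String>> batches = new ArrayList<>();
        for (int from = 0; from < records.size(); from += batchSize) {
            int to = Math.min(from + batchSize, records.size());
            batches.add(new ArrayList<>(records.subList(from, to)));
        }
        return batches;
    }

    /** Submits every batch to the executor; the returned future completes when all batches are written. */
    static CompletableFuture<Void> writeAsync(List<String> records, int batchSize,
                                              Executor executor, Consumer<List<String>> batchSink) {
        List<CompletableFuture<Void>> pending = new ArrayList<>();
        for (List<String> batch : partition(records, batchSize)) {
            pending.add(CompletableFuture.runAsync(() -> batchSink.accept(batch), executor));
        }
        return CompletableFuture.allOf(pending.toArray(new CompletableFuture[0]));
    }
}
```

With a batch size of 2000, for example, 4500 records result in three writes of 2000, 2000, and 500 records.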
When establishing the connection between the server and the distributed database, two methods are implemented:
(1) public void invoke(Request, Response), which is responsible for forwarding the request; and
(2) public void log(Request, Response, long time).
The implemented HbaseAccessLogValve is packaged as a jar and placed in the lib directory under the Tomcat directory; the jar of the HBase client and the jars it depends on are also placed in the lib directory under the Tomcat directory.
The following configuration is added to the server configuration file (server.xml in a standard Tomcat installation):
<Valve className="com.gridsum.tomcat.valves.HbaseAccessLogValve" zkqualrum="192.168.1.100" zkClientPort="2181" maxConnections="10" batchSize="1000" tableName="access_log" columeFamily="logInfo"/>
Wherein:
maxConnections="10" indicates that the maximum number of connections is 10.
batchSize="1000" indicates a batch size of 1000, i.e., 1000 pieces of data are processed per data interaction.
tableName="access_log" indicates that the name of the table created in HBase is "access_log".
columeFamily="logInfo" indicates that the name of the column family of the table created in HBase is "logInfo".
According to an embodiment of the present application, a log file saving device is further provided. The log file saving device may execute the log file saving method described above, and the log file saving method may be implemented by the log file saving device.
Fig. 2 is a schematic diagram of a log file saving device according to an embodiment of the present application. As shown in fig. 2, the apparatus includes a first acquisition unit 22, a creation unit 24, a second acquisition unit 26, and a saving unit 28.
The first obtaining unit 22 is configured to obtain a log file recorded by a server. The log file recorded by the server may be an access log of the user or an operation log of the server.
The creating unit 24 is configured to create a target file in the distributed database, where the target file is used for storing the log file. The distributed database may be HBase, Cassandra, Hypertable, or the like. When the volume of data to be stored is very large, splitting a conventional database into multiple databases and tables is troublesome; using a distributed database such as HBase avoids this splitting and improves data storage efficiency. The target file may be a table.
The second obtaining unit 26 is configured to obtain data transmission resources between the server and the distributed database.
The saving unit 28 is configured to save the log file into a target file of the distributed database by using the data transmission resource.
HBase is a distributed, column-oriented database. Its biggest differences from a general relational database are that HBase is well suited to storing unstructured data and that it uses a column-based rather than row-based model.
A Rowkey is a binary byte string with a maximum length of 64 KB whose content is defined by the user. Data is also typically loaded in ascending binary order of the Rowkey.
HBase retrieves data by Rowkey: the system locates the Region in which a given Rowkey (or Rowkey range) resides and routes the query request to that Region. HBase retrieval supports 3 approaches:
(1) access through a single Rowkey, namely performing a get operation on a given Rowkey value to obtain a unique record;
(2) scanning a Rowkey range, namely setting startRowKey (start row key) and endRowKey (end row key) and scanning within that range, so that a batch of records matching the specified condition is obtained;
(3) full table scanning, that is, directly scanning all row records in the whole table.
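The three retrieval approaches can be illustrated with a minimal sketch. It does not use the HBase client API: a sorted in-memory map stands in for a table ordered by Rowkey, and the names RowkeyTableSketch, get, scan and scanAll are hypothetical. String keys are compared here, which matches byte order only for ASCII keys, and the end key is treated as exclusive by assumption.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.NavigableMap;
import java.util.TreeMap;

// Hypothetical in-memory stand-in for a table whose rows are kept sorted by Rowkey.
class RowkeyTableSketch {
    private final NavigableMap<String, String> rows = new TreeMap<>();

    void put(String rowkey, String record) {
        rows.put(rowkey, record);
    }

    // (1) Single-Rowkey access: returns the unique record, or null if absent.
    String get(String rowkey) {
        return rows.get(rowkey);
    }

    // (2) Range scan: startRowKey inclusive, endRowKey exclusive (an assumption of this sketch).
    List<String> scan(String startRowKey, String endRowKey) {
        return new ArrayList<>(rows.subMap(startRowKey, true, endRowKey, false).values());
    }

    // (3) Full table scan: every record, in Rowkey order.
    List<String> scanAll() {
        return new ArrayList<>(rows.values());
    }

    public static void main(String[] args) {
        RowkeyTableSketch table = new RowkeyTableSketch();
        table.put("a1", "x");
        table.put("a2", "y");
        table.put("b1", "z");
        System.out.println(table.get("a2"));
        System.out.println(table.scan("a1", "b1"));
        System.out.println(table.scanAll());
    }
}
```

Because the rows are kept sorted, the range scan touches only the keys between the two bounds, whereas the full scan visits every row.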
Retrieval by a single Rowkey is efficient: it takes less than 1 millisecond, and 1000 to 2000 records can be obtained per second.
By designing the Rowkey carefully so that data retrieved together is stored close together (ideally in the same Region), good performance can be obtained when traversing the results.
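One possible Rowkey design is sketched below. It is an assumption, since the embodiment does not prescribe a layout: each key is prefixed with a server identifier followed by a zero-padded timestamp, so that one server's records sort next to each other and in time order. The class name RowkeyDesignSketch, the "|" separator and the 13-digit width are all hypothetical.

```java
import java.util.Locale;

// Hypothetical Rowkey layout: "<serverId>|<13-digit zero-padded epoch millis>".
class RowkeyDesignSketch {
    static String rowkey(String serverId, long epochMillis) {
        // Zero padding keeps lexicographic order equal to numeric order for non-negative values.
        return String.format(Locale.ROOT, "%s|%013d", serverId, epochMillis);
    }

    // Start (inclusive) and end (exclusive) keys covering one server's time window.
    static String[] scanRange(String serverId, long fromMillis, long toMillis) {
        return new String[] { rowkey(serverId, fromMillis), rowkey(serverId, toMillis) };
    }

    public static void main(String[] args) {
        System.out.println(rowkey("S1", 5L));
        String[] range = scanRange("S1", 1000L, 2000L);
        System.out.println(range[0] + " .. " + range[1]);
    }
}
```

Without the padding, the key for timestamp 10 would sort before the key for timestamp 9, and a range scan over a time window would miss or misorder records.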
The method comprises the steps of acquiring a log file recorded by a server, creating a target file for storing the log file in a distributed database, acquiring data transmission resources between the server and the distributed database, and storing the log file into the target file of the distributed database by using the data transmission resources.
Optionally, the apparatus for saving a log file provided in the embodiment of the present application further includes a storage unit and a judging unit. The storage unit is used for storing the log file into a buffer area of the server before the saving unit saves the log file into the target file of the distributed database by using the data transmission resource. The judging unit is used for judging whether the buffering amount of the buffer area reaches a preset value. The saving unit includes a first saving subunit. The first saving subunit is configured to, when the judging unit determines that the buffering amount of the buffer area reaches the preset value, save the log file stored in the buffer area to the target file of the distributed database through the data transmission resource.
Before the log file is saved into the target file of the distributed database by using the data transmission resource, the log file is stored in a buffer area of the server; after the buffering amount of the buffer area reaches a preset value, the log file is saved into the target file of the distributed database by using the data transmission resource. The preset value can be configured in advance: when the maximum buffering amount of the buffer area is M units, the preset value can be set to 0.5M, 0.6M, 0.7M, 0.8M, etc. The preset value serves as a threshold, namely, when the buffering amount of the buffer area reaches the threshold, the log file in the buffer area is saved into the distributed database.
For example, the preset value is set to be 60% of the maximum buffer amount, the log file is stored in the buffer area, and after the buffer amount of the buffer area reaches 60% of the maximum buffer amount, the log file is stored in the target file of the distributed database by using the data transmission resource.
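The threshold behaviour described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the patented implementation: the buffering amount is counted in log entries, the preset value is given as an integer percentage of the maximum, and the names ThresholdBufferSketch and sink are hypothetical. The sink stands in for the save into the distributed database.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;

// Hypothetical buffer that hands its contents to a sink once a preset fill level is reached.
class ThresholdBufferSketch {
    private final int threshold;                 // preset value, in entries
    private final List<String> buffer = new ArrayList<>();
    private final Consumer<List<String>> sink;   // stand-in for the save to the distributed database

    ThresholdBufferSketch(int maxEntries, int percent, Consumer<List<String>> sink) {
        // Integer arithmetic avoids floating-point rounding; e.g. 10 entries at 60% gives 6.
        this.threshold = Math.max(1, maxEntries * percent / 100);
        this.sink = sink;
    }

    synchronized void append(String logLine) {
        buffer.add(logLine);
        if (buffer.size() >= threshold) {
            sink.accept(new ArrayList<>(buffer)); // pass a copy, then empty the buffer
            buffer.clear();
        }
    }

    synchronized int buffered() {
        return buffer.size();
    }

    public static void main(String[] args) {
        ThresholdBufferSketch b = new ThresholdBufferSketch(10, 60,
                batch -> System.out.println("flushing " + batch.size() + " entries"));
        for (int i = 0; i < 7; i++) {
            b.append("line" + i);
        }
        System.out.println("still buffered: " + b.buffered());
    }
}
```

Flushing below the maximum leaves headroom, so new entries can still be accepted while the previous batch is being written.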
Since the CPU is fast and I/O (input/output) devices are slow, a "bottleneck" easily arises from insufficient channels. Using the buffer area mitigates the speed mismatch between the CPU and the I/O devices and reduces how often the I/O devices interrupt the CPU, so that the working efficiency of the CPU and the performance of the server are improved, users can access the server more efficiently, and the user experience is improved.
A threshold is set for the buffering amount of the buffer area so that the log files stored in the buffer area are transmitted to the distributed database in time when the buffering amount reaches the threshold. This avoids the situation in which, after the buffering amount reaches its maximum, data in the log file can no longer be stored and is lost.
Optionally, the apparatus for saving a log file provided in the embodiment of the present application further includes an establishing unit. The establishing unit is used for establishing a connection pool between the server and the distributed database before the second obtaining unit obtains the data transmission resources between the server and the distributed database, wherein the connection pool comprises a plurality of connections, and the data transmission resources are the connections. The first saving subunit comprises an obtaining module and a saving module. The obtaining module is used for obtaining a plurality of target connections from the connection pool when the judging unit determines that the buffering amount of the buffer area reaches the preset value, wherein the target connections are idle connections in the connection pool. The saving module is used for saving the log file into the target file of the distributed database by using the plurality of target connections obtained by the obtaining module.
A connection pool is established between the server and the distributed database; the connection pool comprises a plurality of connections, and the data transmission resources are these connections. When the buffering amount of the buffer area reaches the preset value, idle connections are obtained from the connection pool; the obtained idle connections are the target connections, and the log file is saved into the target file of the distributed database by using the plurality of target connections.
A connection is a critical, limited and expensive resource; how connections are managed can significantly affect the scalability and robustness of the entire application as well as the performance of the program. The connection pool is responsible for allocating, managing and releasing connections, and allows an application to reuse an existing database connection instead of establishing a new one, thereby avoiding resource waste and improving the performance of the application.
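The allocate, reuse and release cycle of a connection pool can be sketched as below. This is a simplified illustration and makes several assumptions: Conn is a placeholder object rather than a real database connection, and the names ConnectionPoolSketch, acquire and release are hypothetical. Connections are labelled in the style "C1-1" (pool index, then connection index).

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;

// Hypothetical fixed-size pool that hands out idle connections and takes them back for reuse.
class ConnectionPoolSketch {
    // Placeholder for a real connection; only carries an identifier.
    static final class Conn {
        final String id;
        Conn(String id) { this.id = id; }
    }

    private final BlockingQueue<Conn> idle;

    ConnectionPoolSketch(int poolIndex, int size) {
        idle = new ArrayBlockingQueue<>(size);
        for (int k = 1; k <= size; k++) {
            idle.add(new Conn("C" + poolIndex + "-" + k)); // all connections are created once, up front
        }
    }

    // Returns an idle connection (a "target connection"), or null if none is idle.
    Conn acquire() {
        return idle.poll();
    }

    // Returns a connection to the pool so that it can be reused instead of rebuilt.
    void release(Conn conn) {
        idle.offer(conn);
    }

    int idleCount() {
        return idle.size();
    }

    public static void main(String[] args) {
        ConnectionPoolSketch pool = new ConnectionPoolSketch(1, 2);
        Conn c = pool.acquire();
        System.out.println("using " + c.id + ", idle left: " + pool.idleCount());
        pool.release(c);
        System.out.println("idle after release: " + pool.idleCount());
    }
}
```

Returning null when no connection is idle is a choice of this sketch; a real pool might instead block or time out.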
Optionally, there are multiple servers. The first obtaining unit includes a first obtaining subunit. The first obtaining subunit is configured to obtain a log file recorded by the server Sj, where j sequentially takes 1 to n, and n is the number of servers. The creating unit includes a creating subunit. The creating subunit is configured to create sub-target files D1 to Dn in the distributed database, where the sub-target files D1 to Dn constitute the target file. The establishing unit comprises an establishing subunit. The establishing subunit is configured to establish a connection pool Pj between the server Sj and the distributed database. The saving module comprises a saving submodule. The saving submodule is used for saving the log file recorded by the server Sj into the sub-target file Dj of the distributed database by using the target connection obtained from the connection pool Pj by the obtaining module.
The log file saving device provided by the embodiment of the application can save log files recorded by different servers at the same time, as described below. The servers S1 to Sn are n different servers, and the log files recorded by the n servers need to be saved in a distributed database. n sub-target files, namely the sub-target files D1 to Dn, are established under the target file in the distributed database. The sub-target file D1 is used to store the log file of the server S1, the sub-target file D2 is used to store the log file of the server S2, the sub-target file D3 is used to store the log file of the server S3, and so on; the sub-target file Dn is used to store the log file of the server Sn.
A connection pool is established between each server and the distributed database, so that n connection pools are established in total, namely connection pool P1, connection pool P2, ..., and connection pool Pn. The connection pool P1 is a connection pool between the server S1 and the distributed database, the connection pool P2 is a connection pool between the server S2 and the distributed database, the connection pool P3 is a connection pool between the server S3 and the distributed database, and so on; the connection pool Pn is a connection pool between the server Sn and the distributed database.
Each of the n connection pools comprises at least one connection. The connection pool Pi contains m(i) connections, namely connection Ci-1, connection Ci-2, connection Ci-3, ..., and connection Ci-m(i).
For example, assuming that there are 3 different servers, n is 3. The connection pool P1 is a connection pool between the server S1 and the distributed database and includes 6 connections, i.e., m(1) = 6; the 6 connections are connection C1-1, connection C1-2, connection C1-3, connection C1-4, connection C1-5, and connection C1-6.
The connection pool P2 is a connection pool between the server S2 and the distributed database and includes 3 connections, i.e., m(2) = 3; the 3 connections are connection C2-1, connection C2-2, and connection C2-3.
The connection pool P3 is a connection pool between the server S3 and the distributed database and includes 4 connections, i.e., m(3) = 4; the 4 connections are connection C3-1, connection C3-2, connection C3-3, and connection C3-4.
The 6 connections in the connection pool P1 all contain the identification information of the server S1, the 3 connections in the connection pool P2 all contain the identification information of the server S2, and the 4 connections in the connection pool P3 all contain the identification information of the server S3. From the identification information contained in a connection, it can be determined which server is connected to the distributed database, and the log files recorded by that server are then stored in the corresponding sub-target file in the distributed database. That is, the log files recorded by the server S1 are stored in the sub-target file D1, the log files recorded by the server S2 are stored in the sub-target file D2, and the log files recorded by the server S3 are stored in the sub-target file D3.
The log files recorded by different servers are stored into corresponding sub-target files, so that the log files are clearly and orderly stored, and the log files are very convenient to query later. For example, when a log file of a certain server needs to be queried, only the sub-target file corresponding to the server needs to be queried, and the whole target file does not need to be queried, so that the query speed can be increased, and the query efficiency can be improved.
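The mapping from server Sj to sub-target file Dj can be sketched as follows. This is an in-memory illustration only: lists stand in for the sub-target files, the server identifier is assumed to have the form "S" followed by the index j, and the names SubTargetRouterSketch, save and query are hypothetical.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical router that stores each server's log lines in its own sub-target file.
class SubTargetRouterSketch {
    private final Map<String, List<String>> subTargets = new HashMap<>();

    // Creates the sub-target files D1 to Dn, which together form the target file.
    SubTargetRouterSketch(int n) {
        for (int j = 1; j <= n; j++) {
            subTargets.put("D" + j, new ArrayList<>());
        }
    }

    // Maps identification information such as "S3" to the sub-target name "D3".
    static String subTargetFor(String serverId) {
        return "D" + serverId.substring(1);
    }

    private List<String> lookup(String serverId) {
        List<String> target = subTargets.get(subTargetFor(serverId));
        if (target == null) {
            throw new IllegalArgumentException("unknown server: " + serverId);
        }
        return target;
    }

    void save(String serverId, String logLine) {
        lookup(serverId).add(logLine);
    }

    // Queries only the sub-target file of one server, not the whole target file.
    List<String> query(String serverId) {
        return Collections.unmodifiableList(lookup(serverId));
    }

    public static void main(String[] args) {
        SubTargetRouterSketch router = new SubTargetRouterSketch(3);
        router.save("S1", "GET /index");
        router.save("S2", "GET /login");
        router.save("S1", "GET /about");
        System.out.println(router.query("S1"));
    }
}
```

A query for one server reads a single list here, which mirrors the point that only the corresponding sub-target file has to be searched.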
Optionally, the saving unit further comprises a second saving subunit. The second saving subunit is configured to save the plurality of pieces of data in the log file to the target file of the distributed database in batches, by asynchronous transmission, using the data transmission resource.
Processing the pieces of data in the log file in asynchronous batches can greatly increase the data processing speed, so that the log file is saved into the distributed database more efficiently. For example, the batch size BatchSize can be set to 2000, i.e., 2000 pieces of data are processed per data interaction.
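Asynchronous batch saving can be sketched as below, assuming that each batch is handed to a writer callback on a background thread. The writer stands in for the actual write to the distributed database, and the names AsyncBatchSketch, partition and saveAsync are hypothetical; only standard java.util.concurrent classes are used.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.CompletableFuture;
import java.util.function.Consumer;

// Hypothetical helper that splits log entries into batches and writes the batches asynchronously.
class AsyncBatchSketch {
    // Splits items into consecutive batches of at most batchSize entries.
    static <T> List<List<T>> partition(List<T> items, int batchSize) {
        if (batchSize <= 0) {
            throw new IllegalArgumentException("batchSize must be positive");
        }
        List<List<T>> batches = new ArrayList<>();
        for (int i = 0; i < items.size(); i += batchSize) {
            batches.add(new ArrayList<>(items.subList(i, Math.min(i + batchSize, items.size()))));
        }
        return batches;
    }

    // Submits every batch to the common pool; the returned future completes when all are written.
    static <T> CompletableFuture<Void> saveAsync(List<T> items, int batchSize,
                                                 Consumer<List<T>> writer) {
        List<CompletableFuture<Void>> pending = new ArrayList<>();
        for (List<T> batch : partition(items, batchSize)) {
            pending.add(CompletableFuture.runAsync(() -> writer.accept(batch)));
        }
        return CompletableFuture.allOf(pending.toArray(new CompletableFuture<?>[0]));
    }

    public static void main(String[] args) {
        List<String> lines = new ArrayList<>();
        for (int i = 0; i < 4500; i++) {
            lines.add("line" + i);
        }
        saveAsync(lines, 2000, batch -> System.out.println("wrote " + batch.size())).join();
    }
}
```

The caller is not blocked while the batches are written and only waits if it calls join() on the returned future. The order in which batches complete is not guaranteed in this sketch.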
When establishing the connection between the server and the distributed database, two methods need to be implemented:
(1) public void invoke(Request, Response), which is responsible for forwarding the request;
(2) public void log(Request, Response, long time).
The implemented HbaseaccessLogValve is packaged into a jar and placed in the lib directory under the tomcat directory; the jar of the HBase client and the jars it depends on are also placed in the lib directory under the tomcat directory.
The following configuration is added to the server configuration file:
<Valve className="com.gridsum.tomcat.valves.HbaseaccessLogValve" zkqualrum="192.168.1.100" zkClientPort="2181" maxConnections="10" batchSize="1000" tableName="access_log" columeFamily="logInfo"/>
Wherein:
maxConnections="10" indicates that the maximum number of connections is 10.
batchSize="1000" indicates a batch size of 1000, i.e., 1000 pieces of data are processed per data interaction.
tableName="access_log" indicates that the name of the table created in HBase is "access_log".
columeFamily="logInfo" indicates that the name of the column family of the table created in HBase is "logInfo".
The log file saving device comprises a processor and a memory, wherein the first acquiring unit 22, the creating unit 24, the second acquiring unit 26, the saving unit 28 and the like are stored in the memory as program units, and the processor executes the program units stored in the memory to realize corresponding functions.
The processor comprises a kernel, and the kernel calls the corresponding program unit from the memory. One or more kernels may be provided, and the log file is saved by adjusting kernel parameters.
The memory may include forms of computer-readable media such as volatile memory, random access memory (RAM), and/or non-volatile memory, for example read-only memory (ROM) or flash memory (flash RAM); the memory includes at least one memory chip.
The present application further provides a computer program product which, when executed on a data processing device, is adapted to execute program code that initializes the following method steps: acquiring a log file recorded by a server; creating a target file in a distributed database, wherein the target file is used for storing the log file; acquiring data transmission resources between the server and the distributed database; and saving the log file to the target file of the distributed database by using the data transmission resource.
The above-mentioned serial numbers of the embodiments of the present application are merely for description and do not represent the merits of the embodiments.
In the above embodiments of the present application, the descriptions of the respective embodiments have respective emphasis, and for parts that are not described in detail in a certain embodiment, reference may be made to related descriptions of other embodiments.
In the embodiments provided in the present application, it should be understood that the disclosed technology can be implemented in other ways. The above-described embodiments of the apparatus are merely illustrative, and for example, the division of the units may be a logical division, and in actual implementation, there may be another division, for example, multiple units or components may be combined or integrated into another system, or some features may be omitted, or not executed. In addition, the shown or discussed mutual coupling or direct coupling or communication connection may be an indirect coupling or communication connection through some interfaces, units or modules, and may be in an electrical or other form.
The units described as separate parts may or may not be physically separate, and parts displayed as units may or may not be physical units, may be located in one place, or may be distributed on a plurality of units. Some or all of the units can be selected according to actual needs to achieve the purpose of the solution of the embodiment.
In addition, functional units in the embodiments of the present application may be integrated into one processing unit, or each unit may exist alone physically, or two or more units are integrated into one unit. The integrated unit can be realized in a form of hardware, and can also be realized in a form of a software functional unit.
The integrated unit, if implemented in the form of a software functional unit and sold or used as a stand-alone product, may be stored in a computer readable storage medium. Based on such understanding, the technical solution of the present application in essence, or the part of it that contributes to the prior art, or all or part of the technical solution, may be embodied in the form of a software product, which is stored in a storage medium and includes instructions for causing a computer device (which may be a personal computer, a server, or a network device) to execute all or part of the steps of the method according to the embodiments of the present application. The aforementioned storage medium includes: a USB flash disk, a Read-Only Memory (ROM), a Random Access Memory (RAM), a removable hard disk, a magnetic or optical disk, and various other media capable of storing program code.
The foregoing is only a preferred embodiment of the present application. It should be noted that those skilled in the art can make several improvements and modifications without departing from the principle of the present application, and these improvements and modifications should also be considered to fall within the protection scope of the present application.

Claims (8)

1. A log file saving method is characterized by comprising the following steps:
acquiring a log file recorded by a server;
creating a target file in a distributed database, wherein the target file is used for storing the log file;
acquiring data transmission resources between the server and the distributed database; and
saving the log file to the target file of the distributed database by using the data transmission resource;
wherein, the number of the servers is a plurality,
the obtaining of the log file recorded by the server includes: acquiring a log file recorded by a server Sj, wherein j is 1 to n in sequence, and n is the number of the servers;
creating a target file in a distributed database includes: creating sub-target files D1 to Dn in the distributed database, wherein the sub-target files D1 to Dn form the target file;
before acquiring data transmission resources between the server and the distributed database, the method further comprises: establishing a connection pool Pj between the server Sj and the distributed database, wherein the connection pool comprises a plurality of connections, and the data transmission resource is the connection;
saving the log file to the target file of the distributed database comprises: saving the log file recorded by the server Sj into a sub-target file Dj of the distributed database by using the target connection acquired from the connection pool Pj, wherein the target connection is an idle connection in the connection pool.
2. The method of claim 1, wherein prior to saving the log file to the target file of the distributed database using the data transfer resource, the method further comprises:
storing the log file to a buffer of the server;
judging whether the buffer amount of the buffer area reaches a preset value;
saving the log file to the target file of the distributed database by using the data transmission resource comprises:
and under the condition that the buffering amount of the buffer area is judged to reach the preset value, the log file stored in the buffer area is saved to the target file of the distributed database through the data transmission resource.
3. The method of claim 2, wherein prior to obtaining data transmission resources between the server and the distributed database, the method further comprises:
establishing a connection pool between the server and the distributed database, wherein the connection pool comprises a plurality of connections, and the data transmission resource is the connection;
when the buffering amount of the buffer area is judged to reach the preset value, the saving of the log file stored in the buffer area to the target file of the distributed database through the data transmission resource comprises:
under the condition that the buffering amount of the buffer area is judged to reach the preset value, acquiring a plurality of target connections from the connection pool, wherein the target connections are idle connections in the connection pool;
and saving the log file to the target file of the distributed database by using the plurality of target connections.
4. The method of claim 1, wherein saving the log file to the target file of the distributed database using the data transfer resource further comprises:
and storing the plurality of pieces of data in the log file in batches to the target file of the distributed database in an asynchronous transmission mode by using the data transmission resource.
5. An apparatus for saving a log file, comprising:
the first acquisition unit is used for acquiring a log file recorded by the server;
the creating unit is used for creating a target file in a distributed database, wherein the target file is used for storing the log file;
a second obtaining unit, configured to obtain a data transmission resource between the server and the distributed database; and
the saving unit is used for saving the log file into the target file of the distributed database by using the data transmission resource;
wherein, the number of the servers is a plurality,
the first acquisition unit includes: a first obtaining subunit, configured to obtain a log file recorded by a server Sj, where j sequentially takes 1 to n, n is the number of servers,
the creating unit includes: a creating subunit, configured to create sub-target files D1 through Dn in the distributed database, where the sub-target files D1 through Dn constitute the target file,
the device further comprises: an establishing unit, wherein the establishing unit includes: an establishing subunit, configured to establish a connection pool Pj between the server Sj and the distributed database before the second obtaining unit obtains the data transmission resource between the server and the distributed database, where the connection pool includes multiple connections, and the data transmission resource is the connection;
the saving unit is further configured to save the log file recorded by the server Sj into the sub-target file Dj of the distributed database by using the target connection obtained from the connection pool Pj by the obtaining module.
6. The apparatus of claim 5, further comprising:
a storage unit, configured to store the log file in a buffer of the server before the saving unit saves the log file in the target file of the distributed database using the data transmission resource;
the judging unit is used for judging whether the buffering amount of the buffer area reaches a preset value or not;
the saving unit includes:
and the first saving subunit is configured to, when the determining unit determines that the buffering amount of the buffer reaches the preset value, save the log file stored in the buffer to the target file of the distributed database through the data transmission resource.
7. The apparatus of claim 6, wherein the first saving subunit comprises:
an obtaining module, configured to obtain a plurality of target connections from the connection pool when the determining unit determines that the buffer amount of the buffer reaches the preset value, where the target connections are idle connections in the connection pool;
and the storage module is used for storing the log file into the target file of the distributed database by utilizing the plurality of target connections acquired by the acquisition module.
8. The apparatus of claim 5, wherein the saving unit further comprises:
and the second saving subunit is configured to save, in a batch manner in an asynchronous transmission manner, the plurality of pieces of data in the log file to the target file of the distributed database by using the data transmission resource.
CN201510812631.8A 2015-11-20 2015-11-20 Log file saving method and device Active CN106776617B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201510812631.8A CN106776617B (en) 2015-11-20 2015-11-20 Log file saving method and device

Publications (2)

Publication Number Publication Date
CN106776617A CN106776617A (en) 2017-05-31
CN106776617B true CN106776617B (en) 2020-11-06

Family

ID=58886030

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201510812631.8A Active CN106776617B (en) 2015-11-20 2015-11-20 Log file saving method and device

Country Status (1)

Country Link
CN (1) CN106776617B (en)

Families Citing this family (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107515813B (en) * 2017-09-07 2021-04-09 杭州安恒信息技术股份有限公司 Distributed modular log processing method, device and system
CN109492045A (en) * 2018-11-22 2019-03-19 郑州云海信息技术有限公司 A kind of log information processing method and system
CN110147411A (en) * 2019-05-20 2019-08-20 平安科技(深圳)有限公司 Method of data synchronization, device, computer equipment and storage medium

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101808121A (en) * 2010-02-24 2010-08-18 深圳市五巨科技有限公司 Method and device for writing server log of mobile terminal into database
WO2010107626A3 (en) * 2009-03-16 2011-01-13 Microsoft Corporation Flexible logging, such as for a web server
CN104899278A (en) * 2015-05-29 2015-09-09 北京京东尚科信息技术有限公司 Method and apparatus for generating data operation logs of Hbase database

Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101163265B (en) * 2007-11-20 2010-08-18 中兴通讯股份有限公司 Distributed database based on multimedia message log inquiring method and system

Also Published As

Publication number Publication date
CN106776617A (en) 2017-05-31

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
CB02 Change of applicant information

Address after: 100083 No. 401, 4th Floor, Haitai Building, 229 North Fourth Ring Road, Haidian District, Beijing

Applicant after: Beijing Guoshuang Technology Co.,Ltd.

Address before: 100086 Cuigong Hotel, 76 Zhichun Road, Shuangyushu District, Haidian District, Beijing

Applicant before: Beijing Guoshuang Technology Co.,Ltd.

GR01 Patent grant