CN113485962A - Log file storage method, device, equipment and storage medium

Info

Publication number: CN113485962A
Application number: CN202110741313.2A
Authority: CN
Inventors: 周凯洋; 张成思; 刘叶; 周子站; 王雪飞
Original assignee: China Travelsky Technology Co Ltd
Current assignee: China Travelsky Technology Co Ltd
Priority date: 2021-06-30
Filing date: 2021-06-30
Publication date: 2021-10-08
Anticipated expiration: 2041-06-30
Also published as: WO2023273544A1; CN113485962B

Abstract

The application provides a storage method, a device, equipment and a storage medium of a log file, wherein the method comprises the steps of collecting log data from a target log file; processing the log data by a log analysis component; judging whether the current index is in an unavailable state due to the fact that the data size of the index fragment is too large, wherein the current index refers to an index which is used for storing a target log file at present; and if the current index is in an unavailable state, creating a new index by using the name and the acquisition time of the log data, and storing the log data in the new index. According to the scheme, the size of the index fragment of the current index is automatically identified in the storage process, and when the index fragment is large, a new index is automatically established to store the acquired log data, so that the occurrence of an oversized index fragment is avoided.

Description

Log file storage method, device, equipment and storage medium

Technical Field

The present invention relates to the field of computer technologies, and in particular, to a method, an apparatus, a device, and a storage medium for storing a log file.

Background

Elasticsearch (hereinafter abbreviated as ES) is an existing open-source distributed search and data analysis engine. When storing files, it creates a plurality of indexes and then saves the files in those indexes. One ES index is comparable to a database. An index can be divided into a plurality of index fragments (shards), each index fragment stores a certain amount of files, and the index fragments are stored on a plurality of computer nodes of the distributed system.

The Elasticsearch index template (Index Template) provides a reuse mechanism through which indexes can be created automatically. In actual use, however, when an index is created automatically from an ES index template, each log file can be stored in only one index, and the number of index fragments contained in that index must be specified by the user in advance. In real service scenarios the data volume of a log file is difficult to estimate accurately, so a log file may turn out to be too large. The data volume stored in each index fragment of the corresponding index then becomes too large (exceeding the range of 20 GB to 40 GB officially recommended for ES); that is, an oversized fragment occurs, which adversely affects the stability and query performance of the ES system.

Disclosure of Invention

Based on the defects of the prior art, the application provides a storage method, device, equipment and storage medium of a log file, so as to solve the problem that the data volume of index shards in an Elasticsearch system is too large.

A first aspect of the present application provides a method for storing a log file, including:

collecting log data from a target log file;

processing the log data by a log analysis component;

judging whether the current index is in an available state or an unavailable state; wherein the current index refers to an index currently used for storing the target log file; when the data volume of the index fragment of the current index is not within a preset data volume range, the current index is in an unavailable state;

and if the current index is in an unavailable state, creating a new index by using the acquisition items and the acquisition time of the log data, and storing the log data in the new index.

A second aspect of the present application provides a storage apparatus for a log file, including:

the acquisition unit is used for acquiring log data from the target log file;

a processing unit for processing the log data by a log analysis component;

the judging unit is used for judging whether the current index is in an available state or an unavailable state; wherein the current index refers to an index currently used for storing the target log file; when the data volume of the index fragment of the current index is not within a preset data volume range, the current index is in an unavailable state;

the creating unit is used for creating a new index by using the acquisition items and the acquisition time of the log data if the current index is in an unavailable state;

and the storage unit is used for storing the log data in the new index.

A third aspect of the present application provides a computer storage medium for storing a computer program, where the computer program is specifically configured to implement the storage method of a log file provided in any one of the first aspects of the present application when executed.

A fourth aspect of the present application provides an electronic device comprising a memory and a processor;

wherein the memory is for storing a computer program;

the processor is configured to execute the computer program, and is specifically configured to implement the log file storage method provided in any one of the first aspects of the present application.

Drawings

In order to more clearly illustrate the embodiments of the present invention or the technical solutions in the prior art, the drawings used in the description of the embodiments or the prior art will be briefly described below, it is obvious that the drawings in the following description are only embodiments of the present invention, and for those skilled in the art, other drawings can be obtained according to the provided drawings without creative efforts.

Fig. 1 is a schematic block structure diagram of a storage system for log files according to an embodiment of the present disclosure;

Fig. 2 is a flowchart of a method for storing a log file according to an embodiment of the present application;

Fig. 3 is a schematic structural diagram of a storage device for log files according to an embodiment of the present disclosure;

Fig. 4 is a schematic structural diagram of an electronic device according to an embodiment of the present application.

Detailed Description

Embodiments of the present disclosure will be described in more detail below with reference to the accompanying drawings. While certain embodiments of the present disclosure are shown in the drawings, it is to be understood that the present disclosure may be embodied in various forms and should not be construed as limited to the embodiments set forth herein, but rather are provided for a more thorough and complete understanding of the present disclosure. It should be understood that the drawings and embodiments of the disclosure are for illustration purposes only and are not intended to limit the scope of the disclosure.

The term "include" and variations thereof as used herein are open-ended, i.e., "including but not limited to". The term "based on" is "based, at least in part, on". The term "one embodiment" means "at least one embodiment"; the term "another embodiment" means "at least one additional embodiment"; the term "some embodiments" means "at least some embodiments". Relevant definitions for other terms will be given in the following description.

It should be noted that the terms "first", "second", and the like in the present disclosure are only used for distinguishing different devices, modules or units, and are not used for limiting the order or interdependence relationship of the functions performed by the devices, modules or units.

It is noted that references to "a", "an", and "the" modifications in this disclosure are intended to be illustrative rather than limiting, and that those skilled in the art will recognize that "one or more" may be used unless the context clearly dictates otherwise.

The storage method of the log file provided by the embodiment of the application is mainly used for storing, in the Elasticsearch system, the log files generated when each application server operates, and the method can be executed by the storage system of the log file shown in fig. 1. As shown in fig. 1, the log file storage system may include a log collection component, an index monitoring component, an index generation component and a log analysis component; the connection relationships among the components are shown in fig. 1.

A log collection component is installed on each application server and is used for collecting the log files of that application server and sending them to the log analysis component. Specifically, the log collection component in the present application may be implemented with Filebeat, an existing collector of log data from local files (a data collection program). Filebeat can monitor a log directory or a specific log file (tail file) and forward the log data to Elasticsearch through a Kafka message queue, so that the ES system stores the logs in the corresponding index. In the system provided by the application, the log collection component on each application server can run in an independent process and monitor the log files of the application server in real time. When log data is newly added to a log file of the application server, the newly added log data is read and stored in the Kafka message queue, so that it can be transmitted to the log analysis component through the message queue.

The index monitoring component is used for monitoring the current capacity of each index in the ES system in real time. Specifically, it may monitor the data amount currently stored in each index fragment, the total data amount of each index (the sum of the data amounts of all index fragments contained in the index) and the number of index fragments. When the data amount of an index fragment of a certain index is found to be too large, or the total data amount of a certain index is found not to match its number of index fragments, the index may be set to the unavailable state, which indicates that data can no longer be written to the index.

The index generation component acts on the monitoring result of the index monitoring component. If an index that can be written normally exists, no index generation operation is performed. If no such index exists, a new index is generated according to the naming rule, and the information of the newly generated index is transmitted to the log analysis component, so that the log analysis component can write the log data newly collected by the log collection component into the newly generated index.

In the system provided by the present application, the log analysis component may be implemented by the Logstash program (an existing log data processing program). Logstash is equivalent to a pipeline with real-time data transmission capability: it transmits data from the input end of the pipeline to the output end and processes the data in transit. In this scheme, the log analysis component is based on a Logstash program deployed on a dedicated log analysis server. It can acquire the log data written by the log collection component from the Kafka message queue, process the log data according to configured rules, and write the processed log data into the index corresponding to the log file.

Referring to fig. 2, an embodiment of the present application provides a method for storing a log file, which may include the following steps:

S201, collecting log data from the target log file.

The target log file may refer to a log file of each application server, in other words, the storage method of the log file provided by the present application may be used to collect data included in the log file of each application server in real time, and store the data in a plurality of indexes of the ES system.

Generally, each application server defines a log file (corresponding to a file path) in advance, and the application server writes log data generated in the running process into the log file in real time.

The log data may include data requests sent by the client to the application server, and reply data fed back to the client by the application server after the application server processes the data requests. For example, the data request sent by the client may be a flight information query request, and correspondingly, the reply data fed back by the application server may be flight information of one or more flights to be queried.

S202, processing the log data through a log analysis component.

According to different processing rules configured by the log analysis component, there are various specific implementation manners of step S202, which are not limited herein.

S203, judging whether the current index is in an available state or an unavailable state.

Wherein the current index refers to an index currently used for storing the target log file; and when the data volume of the index fragment of the current index is not in the preset data volume range, the current index is in an unavailable state.

When an application server is started, the ES system may create an index for the log file of the application server, and designate the index for storing the log data of the application server, where the index is the current index in step S203.

As the application server runs, the amount of log data stored in its log file (i.e. the aforementioned target log file) gradually increases, and the data volume of each index fragment of the current index storing that log data increases accordingly. When the index monitoring component detects that the data volume of an index fragment of the current index exceeds the data volume range (for example, 20 GB to 40 GB), the current index can be set to the unavailable state. Conversely, if the index monitoring component detects that the data volume of the index fragments of the current index does not exceed the data volume range, the current index is kept in the available state.

If the current index is in the available state, step S204 is executed, otherwise, if the current index is in the unavailable state, step S205 is executed.
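The decision made in step S203 can be illustrated with a minimal Python sketch. It assumes that "not within the range" means that some index fragment has grown past the upper bound of the range (40 GB in the example above); the function names and the default limit are hypothetical and are not taken from the application.

```python
from typing import Iterable

AVAILABLE = "available"
UNAVAILABLE = "unavailable"


def index_state(fragment_sizes_gb: Iterable[float],
                upper_limit_gb: float = 40.0) -> str:
    """Return the state of an index given the size of each index fragment."""
    # Assumption: one oversized fragment is enough to make the index unavailable.
    if any(size > upper_limit_gb for size in fragment_sizes_gb):
        return UNAVAILABLE
    return AVAILABLE


def next_step(state: str) -> str:
    """Map the index state to the step that stores the log data."""
    # S204 writes to the current index, S205 creates a new index first.
    return "S204" if state == AVAILABLE else "S205"
```

For instance, an index whose fragments hold 25 GB and 41.2 GB would be reported as unavailable, so the log data would be handled by step S205.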

S204, storing the log data in the current index.

S205, creating a new index by using the collection item and the collection time of the log data, and storing the log data in the new index.

The collection item of the log data refers to one or more fields of the log data that are specified in advance.
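As an illustration of how an index name might be derived from the collection item and the collection time, the following Python sketch joins the two with a hyphen. The naming format and the function name are assumptions made for this example only; the application does not specify the exact naming rule. The name is lower-cased because Elasticsearch requires index names to be lowercase.

```python
from datetime import datetime


def new_index_name(collection_item: str, collected_at: datetime) -> str:
    """Build an index name from a collection item and a collection time."""
    # Hypothetical rule: <item>-<yyyy.mm.dd>-<hhmmss>, all lowercase.
    item = collection_item.strip().lower().replace(" ", "_")
    return f"{item}-{collected_at:%Y.%m.%d-%H%M%S}"
```

Including the time down to the second keeps two indexes created for the same collection item on the same day from colliding; a coarser format could be chosen if indexes are created rarely.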

Step S205 may call an original log template of the ES system to execute, and a detailed implementation process is not described in detail.

The following is a detailed description of the above steps:

In step S201, the log data may be collected as follows:

The log collection component (namely, Filebeat) pre-installed on each application server is started, so that the log collection component collects log data, in incremental collection mode, from the path to which the target log file belongs.

In this scheme, the log collection component may use two kinds of parts, prospectors and harvesters, to read the data of a target log file (tail file) and send the data to a specified output (i.e., a Kafka message queue); there may be a plurality of prospectors.

When the log collection component of an application server is started, it starts one or more prospectors, which check the paths of the target log files specified in advance on the application server, and it starts a harvester for each log file found by a prospector. Each harvester reads the newly added log data of its log file. Finally, Filebeat combines the log data read by the harvesters from the log files into aggregated data and writes the aggregated data to the pre-specified output, namely the Kafka message queue.

Specifically, in the present application, the log collection component Filebeat may collect the log data of the target log file based on the following configuration information:

filebeat.inputs:
- input_type: log
  paths:  # path of the target log file
    - /var/log/apache/httpd-*.log
  tail_files: true  # whether the collection mode is incremental collection or full collection
……………
output.kafka:  # output path of the log data; here the output path is a Kafka message queue
  enabled: true  # the Kafka message queue is enabled
  hosts: ['10.1.1.1:9092', '10.1.1.2:9092', '10.1.1.3:9092']  # IP addresses and ports of the Kafka message queue
  topic: AOS_APPLOG_1  # name of the Kafka message queue
  username: prouser
  password: Kafkapwd  # user name and password of the Kafka message queue

In the above configuration information, the path item specifies the path of a target log file. A plurality of paths may be specified here, so that the log collection component can collect log data from a plurality of target log files. The path of a target log file supports pattern matching with wildcards; for example, httpd-*.log matches every path that begins with httpd- and ends with .log, such as httpd-1.log, httpd-2.log and httpd-a.log.
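The pattern matching of paths described above can be tried out with Python's standard `fnmatch` module. This is only an approximation for illustration: Filebeat performs its own matching, and Python's rules differ in some details (for example, `*` in `fnmatch` also matches the `/` character). The helper name is hypothetical.

```python
from fnmatch import fnmatchcase

# Pattern taken from the configuration information above.
PATTERN = "/var/log/apache/httpd-*.log"


def is_target_log(path: str, pattern: str = PATTERN) -> bool:
    """Return True if `path` is matched by the wildcard `pattern`."""
    # fnmatchcase is used so the result does not depend on the platform.
    return fnmatchcase(path, pattern)
```

With this helper, `/var/log/apache/httpd-1.log` and `/var/log/apache/httpd-a.log` are accepted, while `/var/log/apache/access.log` is rejected.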

The tail_files item specifies the mode in which the log file is collected: true means incremental collection, in which only newly generated log data is collected, and false means full collection, in which all log data is collected. In the present application, the tail_files item is generally set to true; that is, the log data is usually collected in incremental collection mode.

In the incremental acquisition mode, the harvester acquires log data at regular intervals, and only newly added log data in a target log file from the last acquired time to the current time are acquired each time.

Specifically, in the incremental acquisition mode, the harvester records a log offset (representing the position of currently acquired log data in the target log file) at each acquisition, and continues to acquire newly-added log data downwards from the log offset recorded last time at the next acquisition.
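The offset-based incremental collection described above can be sketched in Python as follows. This is a simplified illustration and not the harvester's actual implementation: the function names are hypothetical, the offset is assumed to be a byte position, and a line that is still being written is deliberately left for the next round.

```python
import os
import tempfile
from typing import List, Tuple


def read_new_lines(path: str, offset: int) -> Tuple[List[str], int]:
    """Read the complete lines added to `path` after byte position `offset`."""
    with open(path, "rb") as f:
        f.seek(offset)
        chunk = f.read()
    # Consume only up to the last newline so that a half-written line
    # is picked up in the next collection round.
    end = chunk.rfind(b"\n") + 1
    lines = chunk[:end].decode("utf-8").splitlines()
    return lines, offset + end


def demo() -> List[Tuple[List[str], int]]:
    """Run two collection rounds over a temporary file standing in for a log."""
    rounds = []
    fd, path = tempfile.mkstemp(suffix=".log")
    os.close(fd)
    try:
        with open(path, "ab") as f:
            f.write(b"request 1\nreply 1\n")
        rounds.append(read_new_lines(path, 0))
        with open(path, "ab") as f:
            f.write(b"request 2\nrep")  # the last line is still incomplete
        rounds.append(read_new_lines(path, rounds[0][1]))
    finally:
        os.remove(path)
    return rounds
```

The second round starts from the offset returned by the first round, so it returns only `request 2` and leaves the incomplete line untouched.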

Optionally, a corresponding Filebeat monitoring process may also be set up on the application server to monitor the state of the Filebeat process and automatically restart it when the Filebeat process is lost. This prevents the situation in which the Filebeat process has been down for a long time, too much log data has backed up, and a large amount of log data is collected and output to the Kafka message queue in a short time when the Filebeat process is restarted.

The information below output.kafka is the configuration of the Kafka message queue in the present application. Because of a compatibility problem between Filebeat and Kafka, Filebeat does not support Kafka authentication modes other than SASL/PLAIN, so the SASL/PLAIN authentication mode needs to be added to the Kafka message queue. A Kafka message queue with this authentication mode requires other applications to provide a specified user name and password for access, so the user name and password need to be added below output.kafka.

When the Kafka message queue stores the log data transmitted by the log acquisition component, the local storage space can be divided into 3 to 30 data partitions according to the data volume of the log data, so that the storage efficiency is improved.

Generally, because the log data of one target log file is stored in a plurality of different index fragments, the log data needs to be divided into a plurality of log fragments, and the context correspondence between the data in those log fragments needs to be determined, specifically, which log data correspond to the same data request. Accordingly, in step S202, the log analysis component may process the log data as follows:

A correspondence is established between the reply data contained in the current log fragment and the reply data contained in the previous log fragment that corresponds to the same request.

The current log fragment refers to each log fragment except the first log fragment obtained by dividing log data; and the previous log fragment refers to a log fragment previous to the current log fragment.

Specifically, after a piece of acquired log data is divided into a plurality of log fragments, the log fragments may be sorted according to the sequence of occurrence in the log data.

Then, for the second log fragment, the first log fragment is copied as redundant data, the second log fragment is compared with the redundant data, and the data in the second log fragment that corresponds to the same data request as data in the redundant data is detected. The correspondence between the second log fragment and the redundant data is then recorded as the context correspondence of the second log fragment. After all context correspondences of the second log fragment have been obtained, the currently copied redundant data may be discarded, and the context correspondences of the third log fragment are then detected.

For example, an application server feeds back reply information twice for one data request; the first reply is recorded in the first log fragment and the second reply is recorded in the second log fragment. By performing the above comparison, a correspondence can be established between the second reply and the first reply, indicating that the two replies correspond to the same data request.

Next, for the third log fragment, the second log fragment is copied as redundant data, the third log fragment is likewise compared with the redundant data to determine the data in the third log fragment that corresponds to the same data request as data in the redundant data, and the correspondence between the third log fragment and the redundant data is recorded as the context correspondence of the third log fragment.

These steps are repeated until the context correspondence of each log fragment other than the first log fragment has been obtained.
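The pairwise comparison of each log fragment with a copy of its predecessor can be sketched in Python as follows. The record layout is an assumption made for illustration: each record is a dictionary carrying a hypothetical `request_id` field that identifies the data request it belongs to. The application does not state how records of the same request are recognised.

```python
from typing import Dict, List, Tuple

Record = Dict[str, str]
# (request id, position in previous fragment, position in current fragment)
Link = Tuple[str, int, int]


def context_correspondences(fragments: List[List[Record]]) -> Dict[int, List[Link]]:
    """Link records of each fragment to records of the previous fragment."""
    result: Dict[int, List[Link]] = {}
    for i in range(1, len(fragments)):
        # Copy the previous fragment as temporary redundant data.
        redundant = list(fragments[i - 1])
        positions: Dict[str, List[int]] = {}
        for pos, record in enumerate(redundant):
            positions.setdefault(record["request_id"], []).append(pos)
        links: List[Link] = []
        for cur_pos, record in enumerate(fragments[i]):
            for prev_pos in positions.get(record["request_id"], []):
                links.append((record["request_id"], prev_pos, cur_pos))
        result[i] = links
        # The redundant copy is dropped here, before the next fragment is handled.
        del redundant
    return result
```

Only one extra fragment is held at any time, which matches the idea of discarding the redundant data once the correspondences of the current fragment are known.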

To achieve the above effect, the start position and the offset of each log fragment in the target log file need to be recorded, where the start position indicates where the log fragment starts in the target log file and the offset indicates where the log fragment ends in the target log file.
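A minimal Python sketch of recording where each log fragment starts and ends is given below. It assumes, purely for illustration, that a fragment consists of a fixed number of lines and that positions are byte positions in the target log file; the function and field names are hypothetical.

```python
from typing import Dict, List, Union

Fragment = Dict[str, Union[int, bytes]]


def split_into_fragments(lines: List[bytes], lines_per_fragment: int,
                         base: int = 0) -> List[Fragment]:
    """Split log lines into fragments and record each fragment's boundaries."""
    fragments: List[Fragment] = []
    position = base  # byte position of the first line in the target log file
    for i in range(0, len(lines), lines_per_fragment):
        chunk = lines[i:i + lines_per_fragment]
        size = sum(len(line) for line in chunk)
        fragments.append({
            "start": position,       # where the fragment starts
            "end": position + size,  # where the fragment ends
            "data": b"".join(chunk),
        })
        position += size
    return fragments
```

Because the end of one fragment equals the start of the next, the recorded boundaries are enough to restore the original order of the fragments.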

The context correspondence of each log fragment may be recorded in a message manager inside the Kafka message queue, and the message manager may be implemented by using existing program frameworks such as Zookeeper and Redis.

The Kafka message queue is a distributed messaging system based on the Zookeeper framework that supports partitions (partition) and multiple replicas (replica). It was originally developed by LinkedIn, and its most prominent feature is that it can process large amounts of data in real time to meet various demand scenarios. In the present invention, the Kafka message queue is adopted, in view of its high throughput, low latency and high concurrency, to store the log data collected by the log collection component and the context correspondences between the log fragments. In the present application, the Kafka message queue may be deployed on a Spark cluster, and accordingly the data may be stored in an internal cache of the Spark cluster.

In addition to determining the context correspondence between the log fragments, the processing of the log data in step S202 may further include parsing the log data with the Logstash program. The log data processing flow of the Logstash program can include five stages: input, decode, filter, encode and output. Input is the input stage, in which the log data to be processed is read from the Kafka message queue. Decode is the decoding stage, in which the Logstash program converts the log data from the data format defined by the Kafka message queue into a data format that the Logstash program can recognize. Filter is the filtering stage, in which the Logstash program processes the log data converted in the previous stage according to preset parsing rules. Encode is the encoding stage, in which the Logstash program encodes the processed log data into a data format that the subsequent program can recognize; in the present application, this is a data format that the Elasticsearch system can recognize. Output is the output stage, in which the Logstash program sends the encoded data to the subsequent program, namely the Elasticsearch system.
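The five stages can be mimicked with a small Python sketch. It is not the log processing program itself and uses none of its APIs: a list of byte strings stands in for the Kafka message queue, another list stands in for the Elasticsearch system, and the message format and the filter rule are assumptions chosen only to make the stages visible.

```python
import json
from typing import Dict, Iterable, List


def decode(raw: bytes) -> Dict[str, str]:
    """Decode stage: turn a queued message into an internal event."""
    return json.loads(raw.decode("utf-8"))


def apply_filter(event: Dict[str, str]) -> Dict[str, str]:
    """Filter stage: apply a (hypothetical) parsing rule to the event."""
    parsed = dict(event)
    # Assumed rule: the first word of the message is the log level.
    parsed["level"] = event["message"].split(" ", 1)[0]
    return parsed


def encode(event: Dict[str, str]) -> str:
    """Encode stage: serialise the event for the receiving system."""
    return json.dumps(event, sort_keys=True)


def run_pipeline(source: Iterable[bytes], sink: List[str]) -> None:
    """Push every message from `source` through all stages into `sink`."""
    for raw in source:                # input stage
        event = decode(raw)           # decode stage
        event = apply_filter(event)   # filter stage
        sink.append(encode(event))    # encode stage, then output stage
```

Keeping each stage as a separate function mirrors the pipeline idea: a stage can be replaced (for example, a different filter rule) without touching the others.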

In the present application, the Logstash program may process log data based on the following configuration information:

In the above configuration information, the input item specifies the input source of the data to be processed by the Logstash program. In the present application, the Logstash program reads the log data to be processed from the Kafka message queue, so the input item configures the relevant information of the Kafka message queue: bootstrap_servers is the IP and port information of the Kafka message queue, and topics_pattern gives the queue names of the Kafka message queue to read from. topics_pattern supports regular-expression matching; as configured above, it matches all queues whose names begin with AOS_APP_LOG_.

The filter item specifies the parsing rules used when the Logstash program processes log data; the parsing rules can be defined with various syntaxes and plug-ins. In the configuration information, grok is a plug-in that combines a plurality of predefined regular expressions to match segmented text and map it to keywords. It is typically used for relatively simple pre-processing of log data.
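The idea of matching text with regular expressions and mapping the matched pieces to keywords can be shown in Python with named groups. This is an analogy only: it does not use grok syntax or grok's predefined patterns, and the log line layout (client address, method, path, status) is a hypothetical example.

```python
import re
from typing import Dict, Optional

# Each named group plays the role of a keyword that the matched text is mapped to.
LINE_PATTERN = re.compile(
    r"(?P<client>\d{1,3}(?:\.\d{1,3}){3}) "
    r"(?P<method>[A-Z]+) "
    r"(?P<path>\S+) "
    r"(?P<status>\d{3})"
)


def parse_line(line: str) -> Optional[Dict[str, str]]:
    """Return the keyword-to-text mapping for a log line, or None if it does not match."""
    match = LINE_PATTERN.match(line)
    return match.groupdict() if match else None
```

A line such as `10.1.1.9 GET /flights 200` yields a mapping with the keys `client`, `method`, `path` and `status`, while a line that does not fit the pattern yields `None`.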

The output item specifies the destination to which the log data is output after being processed by the Logstash program. In the present application, the processed log data needs to be output to the Elasticsearch system so as to be stored in the index corresponding to the target log file, so the output item specifies the index of the Elasticsearch system to which the data is output.

If the current index is in an available state, the index specified by the output item is the current index, and if the current index is in an unavailable state, the index generation component needs to generate a new index, and at the moment, the index specified in the output item is correspondingly changed into the newly generated index, so that the processed log data is stored in the new index.

As described above, whether each index is in an available state or an unavailable state may be configured by the index monitoring component, and specifically, the index monitoring component may:

monitoring whether the data volume of the index fragment of the current index is within the data volume range in real time;

and if the data volume of the index fragment of the current index is not in the data volume range, setting the use state of the current index in the index record file to be an unavailable state.

The index monitoring component can call the HTTP-based REST interface (API) provided by Elasticsearch, and can monitor the data volume of the index fragments of each index in the Elasticsearch system by sending REST requests to that interface.

Specifically, the index monitoring component may obtain the index information of Elasticsearch through the RESTful interface of Elasticsearch. For any index, the index information includes the number of index fragments of the index, the data volume stored in each index fragment, and similar information.

Generally, the index information obtained through the RESTful interface is in JSON format. After the JSON index information is parsed, the total data size of each index, expressed in GB, can be obtained, and the data size of each index fragment can then be obtained by dividing the total data size of the index by its number of index fragments.

Subsequently, the index monitoring component may determine whether the size of each index fragment exceeds the preset data size range (generally, the range may be set to 20 GB to 40 GB, or set according to the actual situation), and if so, set the index to the unavailable state.
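The calculation described above (total size of an index divided by its number of index fragments, then compared with the limit) can be sketched in Python. The JSON field names `index`, `pri` and `pri.store.size` are modelled on the columns of the Elasticsearch `_cat/indices` API with sizes expressed in GB; they should be treated as assumptions and checked against the actual response of the cluster. No HTTP request is made here; the JSON text is passed in directly.

```python
import json
from typing import List


def oversized_indexes(index_info_json: str,
                      upper_limit_gb: float = 40.0) -> List[str]:
    """Return the names of indexes whose average fragment size exceeds the limit."""
    flagged: List[str] = []
    for entry in json.loads(index_info_json):
        total_gb = float(entry["pri.store.size"])  # assumed: total size in GB
        fragment_count = int(entry["pri"])         # assumed: number of index fragments
        per_fragment_gb = total_gb / fragment_count
        if per_fragment_gb > upper_limit_gb:
            flagged.append(entry["index"])
    return flagged
```

For example, an index of 210 GB with 5 index fragments averages 42 GB per fragment and would be flagged, while an index of 60 GB with 5 index fragments averages 12 GB and would not. Note that this average can hide a single oversized fragment when data is unevenly distributed.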

Generally, the index record file of Elasticsearch records, for each index, the index name, the data size of the index, whether it is in the available or unavailable state, the latest update time of the index, and so on. Therefore, after judging that the data size of an index fragment of the current index exceeds the data size range, the index monitoring component can access the index record file and set the current index in the index record file to the unavailable state. Thus, in step S203, the current index is determined to be in the unavailable state.

The index record file of the Elasticsearch system can be synchronized at regular intervals to the server where the Logstash program is located. On this basis, the Logstash program can access the index record file local to its server. If the current index in the index record file is in the available state, the Logstash program continues to process the log data according to the original configuration information and stores the processed log data in the current index of the Elasticsearch system.

If the current index in the index record file is in the unavailable state, the Logstash program can create a new index in the Elasticsearch system by using the collection item and the collection time of the currently read log data, and then replace the index name of the current index in the configuration information with the index name of the newly created index, so that subsequent log data is stored in the new index. The index name of the newly created index is also added to the index record file of the Elasticsearch system.
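The switch from the current index to a new index, driven by the index record file, can be sketched in Python as follows. The in-memory dictionary standing in for the index record file, its field names, and the naming format of the new index are all assumptions made for illustration; the sketch does not contact an Elasticsearch cluster.

```python
from datetime import datetime
from typing import Dict

IndexRecord = Dict[str, Dict[str, object]]


def target_index(record: IndexRecord, current: str,
                 collection_item: str, collected_at: datetime) -> str:
    """Return the index that the next log data should be written to."""
    if record[current]["state"] == "available":
        return current
    # Hypothetical naming rule: <collection item>-<collection time>.
    new_name = f"{collection_item.lower()}-{collected_at:%Y.%m.%d-%H%M%S}"
    # Register the new index so that later lookups find it.
    record[new_name] = {
        "state": "available",
        "size_gb": 0.0,
        "updated": collected_at.isoformat(),
    }
    return new_name
```

A caller would keep the returned name as the new current index, which corresponds to replacing the index name in the configuration information.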

The storage method of the log file provided by the embodiment of the application has the following beneficial effects:

In a first aspect, the invention overcomes the limitation that the existing Elasticsearch index template cannot create personalized indexes: for a single target log file whose data volume is too large, a plurality of indexes can be created to store the log data of that target log file.

In a second aspect, the indexing strategy is simple and efficient. The data volume of the log data of the log file to be stored does not need to be accurately estimated; only the maximum capacity and the number of fragments of a single index need to be set according to the actual resources of the system. When the data volume of the log data saved in a single index is too large, a new index can be created and the log data saved in the new index, so that the data volume of each index fragment is prevented from becoming too large.

In a third aspect, the invention monitors the state of each index in the Elasticsearch system in real time, can dynamically evaluate the actual data volume of each index, and can automatically create a new index according to the indexing strategy when the data volume of an index is too large.

In a fourth aspect, dynamically creating indexes in the Elasticsearch system avoids both fragmented, undersized index fragments and oversized index fragments, and improves the stability and query performance of the ES cluster.

In a fifth aspect, by dynamically creating and finely controlling the indexes in the Elasticsearch system, the invention makes efficient use of system resources and reduces the cost of later manual maintenance.

The following describes a flow of the storage method provided in the embodiment of the present application with reference to a specific log file:

After the application server A is started, the data of its log file B needs to be stored in an Elasticsearch system. For this purpose, an index is created in the Elasticsearch system for the log file B and is recorded as index-1. While the application server A is running, it writes log data into the log file B in real time; meanwhile, a log collection component collects the log data written into the log file B in real time and transmits the collected log data to a logstack program, where the index entry in the logstack program is: and the logstack program processes the log data and stores the log data into index-1 of the Elasticsearch system.

After a period of time, the index monitoring component detects that the data volume of an index fragment of index-1 is greater than 40GB, and therefore sets the status of index-1 to the unavailable state in the index record file.
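The monitoring step in this example can be sketched as below. The sketch is illustrative only: the function name `update_record` and the dictionary layout are assumptions, and the fragment sizes are passed in directly rather than queried from the Elasticsearch system. The sketch also assumes that an index, once marked unavailable, stays unavailable.

```python
GB = 1024 ** 3
SHARD_LIMIT_BYTES = 40 * GB  # threshold taken from the example above


def update_record(record, shard_sizes, limit=SHARD_LIMIT_BYTES):
    """Mark an index unavailable once any of its fragments exceeds the limit.

    record:      dict index name -> "available" / "unavailable"
                 (hypothetical stand-in for the index record file)
    shard_sizes: dict index name -> list of fragment sizes in bytes
    """
    for index, sizes in shard_sizes.items():
        if max(sizes, default=0) > limit:
            record[index] = "unavailable"
        else:
            # Do not overwrite an earlier "unavailable" mark.
            record.setdefault(index, "available")
    return record


record = update_record({}, {"index-1": [41 * GB, 10 * GB], "index-2": [5 * GB]})
print(record)  # {'index-1': 'unavailable', 'index-2': 'available'}
```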

After the index record file is synchronized to the server where the logstack program is located, the logstack program finds that index-1 is in an unavailable state. It therefore creates a new index, index-2, in the Elasticsearch system according to the collection item and the collection time of the currently read log data, and modifies the index item in the configuration information into: and the log data subsequently collected from the log file B by the log collection component is stored in the newly created index-2 after being processed by the logstack program.

Therefore, even if the overall data volume of the log file B is large, creating new indexes keeps the data volume of the index fragments of each index from growing too large and prevents the occurrence of oversized fragments.

Although the operations are depicted in a particular order, this should not be understood as requiring that such operations be performed in the particular order shown or in sequential order. Under certain circumstances, multitasking and parallel processing may be advantageous.

It should be understood that the various steps recited in the method embodiments of the present disclosure may be performed in a different order, and/or performed in parallel. Moreover, method embodiments may include additional steps and/or omit performing the illustrated steps. The scope of the present disclosure is not limited in this respect.

Computer program code for carrying out operations for the present disclosure may be written in any combination of one or more programming languages, including but not limited to an object oriented programming language such as Java, Smalltalk, C++, and conventional procedural programming languages, such as the "C" programming language or similar programming languages. The program code may execute entirely on the user's computer, partly on the user's computer, as a stand-alone software package, partly on the user's computer and partly on a remote computer or entirely on the remote computer or electronic device. In the case of a remote computer, the remote computer may be connected to the user's computer through any type of network, including a Local Area Network (LAN) or a Wide Area Network (WAN), or the connection may be made to an external computer (for example, through the Internet using an Internet service provider).

The flowchart and block diagrams in the figures illustrate the architecture, functionality, and operation of possible implementations of systems, methods and computer program products according to various embodiments of the present disclosure. In this regard, each block in the flowchart or block diagrams may represent a module, segment, or portion of code, which comprises one or more executable instructions for implementing the specified logical function(s). It should also be noted that, in some alternative implementations, the functions noted in the block may occur out of the order noted in the figures. For example, two blocks shown in succession may, in fact, be executed substantially concurrently, or the blocks may sometimes be executed in the reverse order, depending upon the functionality involved. It will also be noted that each block of the block diagrams and/or flowchart illustration, and combinations of blocks in the block diagrams and/or flowchart illustration, can be implemented by special purpose hardware-based systems which perform the specified functions or acts, or combinations of special purpose hardware and computer instructions.

With reference to fig. 3, the storage device may include the following units:

A collecting unit 301, configured to collect log data from the target log file.

A processing unit 302, configured to process the log data through the log analysis component.

A judging unit 303, configured to judge whether the current index is in an available state or an unavailable state.

A creating unit 304, configured to create a new index by using the collection item and the collection time of the log data if the current index is in an unavailable state.

A storage unit 305, configured to store the log data in the new index.

Optionally, when the collecting unit 301 collects log data, it is specifically configured to:

starting a log collection component that is pre-installed on each application server, so that the log collection component collects log data from the path to which the target log file belongs in an incremental collection mode.
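One common way to realize incremental collection is to remember the byte offset reached on the previous pass and read only what has been appended since. The sketch below shows this idea under stated assumptions: the function name `collect_increment` is hypothetical, the log file is assumed to be newline-delimited UTF-8 text, and the embodiment does not specify that the log collection component works this way.

```python
def collect_increment(path, offset):
    """Read the lines appended to `path` since byte `offset`.

    Returns (new_lines, new_offset). A trailing partial line is left
    unread so that it is picked up whole on the next pass.
    """
    with open(path, "rb") as f:
        f.seek(offset)
        data = f.read()
    # Position just past the last complete line (0 if there is none).
    end = data.rfind(b"\n") + 1
    lines = data[:end].decode("utf-8", errors="replace").splitlines()
    return lines, offset + end
```

A caller would store the returned offset and pass it back in on the next collection pass, so that each log line is forwarded only once.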

Optionally, when the processing unit 302 processes the log data through the log analysis component, the processing unit is specifically configured to:

establishing a corresponding relation between reply data contained in the current log fragment and reply data contained in the previous log fragment and corresponding to the same data request; the current log fragment refers to each log fragment except the first log fragment obtained by dividing log data; and the previous log fragment refers to a log fragment previous to the current log fragment.
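The correspondence between adjacent log fragments can be pictured with the following sketch. It is only one possible reading of the step: each reply is modelled as a dictionary, and the field names `id` and `request_id`, as well as the function name `link_replies`, are assumptions introduced for illustration.

```python
def link_replies(fragments):
    """Pair replies in each fragment with replies in the previous
    fragment that belong to the same data request.

    fragments: list of log fragments, each a list of reply dicts with
               hypothetical keys "id" and "request_id".
    Returns a list of (previous_reply_id, current_reply_id) pairs.
    """
    links = []
    # The first fragment has no predecessor, so start from the second.
    for prev, cur in zip(fragments, fragments[1:]):
        prev_by_request = {}
        for reply in prev:
            prev_by_request.setdefault(reply["request_id"], []).append(reply)
        for reply in cur:
            for earlier in prev_by_request.get(reply["request_id"], []):
                links.append((earlier["id"], reply["id"]))
    return links


fragments = [
    [{"id": 1, "request_id": "r1"}, {"id": 2, "request_id": "r2"}],
    [{"id": 3, "request_id": "r1"}],
]
print(link_replies(fragments))  # [(1, 3)]
```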

Optionally, the storage apparatus further includes a monitoring unit 306, configured to:

The specific working principle of the storage device for log files provided in the embodiments of the present application may refer to relevant steps in the storage method for log files provided in any embodiment of the present application, and details are not repeated here.

In the storage device of the log file, the collecting unit 301 corresponds to the aforementioned log collection component, and the processing unit 302, the judging unit 303, and the storage unit 305 correspond to the aforementioned log analysis component. The creating unit 304 corresponds to the aforementioned index generating component, and the monitoring unit 306 corresponds to the aforementioned index monitoring component.

The application provides a storage device for log files, wherein the collecting unit 301 collects log data from a target log file; the processing unit 302 processes the log data through the log analysis component; the judging unit 303 judges whether the current index is in an unavailable state due to an excessively large data volume of an index fragment, the current index referring to the index currently used for storing the target log file; if the current index is in an unavailable state, the creating unit 304 creates a new index by using the name and the collection time of the log data, and the storage unit 305 stores the log data in the new index. According to the scheme, the size of the index fragments of the current index is automatically identified during storage, and when an index fragment is too large, a new index is automatically created to store the collected log data, so that the occurrence of oversized index fragments is avoided.

The units described in the embodiments of the present disclosure may be implemented by software or hardware. Where the name of a unit does not in some cases constitute a limitation of the unit itself, for example, the first retrieving unit may also be described as a "unit for retrieving at least two internet protocol addresses".

The functions described herein above may be performed, at least in part, by one or more hardware logic components. For example, without limitation, exemplary types of hardware logic components that may be used include: field Programmable Gate Arrays (FPGAs), Application Specific Integrated Circuits (ASICs), Application Specific Standard Products (ASSPs), systems on a chip (SOCs), Complex Programmable Logic Devices (CPLDs), and the like.

An embodiment of the present application further provides an electronic device suitable for implementing an embodiment of the present disclosure, and a schematic structural diagram of the electronic device is shown in fig. 4. The terminal device in the embodiments of the present disclosure may include, but is not limited to, a mobile terminal such as a mobile phone, a notebook computer, a digital broadcast receiver, a PDA (personal digital assistant), a PAD (tablet computer), a PMP (portable multimedia player), a vehicle terminal (e.g., a car navigation terminal), and the like, and a stationary terminal such as a digital TV, a desktop computer, and the like. The electronic device shown in fig. 4 is only an example, and should not bring any limitation to the functions and the scope of use of the embodiments of the present disclosure.

As shown in fig. 4, the electronic device 400 may include a processing means (e.g., a central processing unit, a graphics processor, etc.) 401 that may perform various appropriate actions and processes in accordance with a program stored in a Read Only Memory (ROM)402 or a program loaded from a storage device 406 into a Random Access Memory (RAM) 403. In the RAM 403, various programs and data necessary for the operation of the electronic apparatus 400 are also stored. The processing device 401, the ROM 402, and the RAM 403 are connected to each other via a bus 404. An input/output (I/O) interface 405 is also connected to bus 404.

Generally, the following devices may be connected to the I/O interface 405: input devices 406 including, for example, a touch screen, touch pad, keyboard, mouse, camera, microphone, accelerometer, gyroscope, etc.; an output device 407 including, for example, a Liquid Crystal Display (LCD), a speaker, a vibrator, and the like; storage devices 406 including, for example, magnetic tape, hard disk, etc.; and a communication device 409. The communication means 409 may allow the electronic device 400 to communicate wirelessly or by wire with other devices to exchange data. While fig. 4 illustrates an electronic device 400 having various means, it is to be understood that not all illustrated means are required to be implemented or provided. More or fewer devices may alternatively be implemented or provided.

Embodiments of the present application also provide a computer storage medium (i.e., a computer readable medium), which carries one or more programs that, when executed by the electronic device, cause the electronic device to execute the storage method of the log file provided in any embodiment of the present application.

In the context of this disclosure, a computer-readable medium may be a tangible medium that can contain, or store a program for use by or in connection with an instruction execution system, apparatus, or device. The machine-readable medium may be a machine-readable signal medium or a machine-readable storage medium. A machine-readable medium may include, but is not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any suitable combination of the foregoing. More specific examples of a machine-readable storage medium would include an electrical connection based on one or more wires, a portable computer diskette, a hard disk, a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing.

It should be noted that the computer readable medium in the present disclosure can be a computer readable signal medium or a computer readable storage medium or any combination of the two. A computer readable storage medium may be, for example, but not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any combination of the foregoing. More specific examples of the computer readable storage medium may include, but are not limited to: an electrical connection having one or more wires, a portable computer diskette, a hard disk, a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing. In the present disclosure, a computer readable storage medium may be any tangible medium that can contain, or store a program for use by or in connection with an instruction execution system, apparatus, or device. In contrast, in the present disclosure, a computer readable signal medium may comprise a propagated data signal with computer readable program code embodied therein, either in baseband or as part of a carrier wave. Such a propagated data signal may take many forms, including, but not limited to, electro-magnetic, optical, or any suitable combination thereof. A computer readable signal medium may also be any computer readable medium that is not a computer readable storage medium and that can communicate, propagate, or transport a program for use by or in connection with an instruction execution system, apparatus, or device. Program code embodied on a computer readable medium may be transmitted using any appropriate medium, including but not limited to: electrical wires, optical cables, RF (radio frequency), etc., or any suitable combination of the foregoing.

The computer readable medium may be embodied in the electronic device; or may exist separately without being assembled into the electronic device.

According to one or more embodiments of the present disclosure, an embodiment shown in fig. 2 of the present application provides a method for storing a log file, including:

collecting log data from a target log file;

processing the log data by a log analysis component;

Optionally, the collecting log data includes:

starting a log collection component that is pre-installed on each application server, so that the log collection component collects log data from the path to which the target log file belongs in an incremental collection mode.

Optionally, the processing the log data by the log analysis component includes:

establishing a corresponding relation between reply data contained in the current log fragment and reply data contained in the previous log fragment and corresponding to the same request; the current log fragment refers to each log fragment except the first log fragment obtained by dividing the log data; the previous log fragment refers to a log fragment previous to the current log fragment.

Optionally, before the determining that the current index is in the available state or the unavailable state, the method further includes:

According to one or more embodiments of the present disclosure, an embodiment of the present application as shown in fig. 3 provides a storage apparatus for log files, including:

the acquisition unit is used for acquiring log data from the target log file;

a processing unit for processing the log data by a log analysis component;

and the storage unit is used for storing the log data in the new index.

Optionally, when the collecting unit collects log data, the collecting unit is specifically configured to:

Optionally, when the processing unit processes the log data through the log analysis component, the processing unit is specifically configured to:

establishing a corresponding relation between reply data contained in the current log fragment and reply data contained in the previous log fragment and corresponding to the same data request; the current log fragment refers to each log fragment except the first log fragment obtained by dividing the log data; the previous log fragment refers to a log fragment previous to the current log fragment.

Optionally, the storage device further includes a monitoring unit, configured to:

The embodiment of the present application further provides a computer storage medium, which is used for storing a computer program, and when the computer program is executed, the computer program is specifically used for implementing the storage method of the log file provided in the embodiment of the present application.

The embodiment of the application also provides an electronic device, which comprises a memory and a processor;

wherein the memory is for storing a computer program;

the processor is configured to execute the computer program, and is specifically configured to implement the storage method for the log file provided in the embodiment of the present application.

In particular, according to an embodiment of the present disclosure, the processes described above with reference to the flowcharts may be implemented as computer software programs. For example, embodiments of the present disclosure include a computer program product comprising a computer program carried on a non-transitory computer readable medium, the computer program containing program code for performing the method illustrated by the flow chart. In such an embodiment, the computer program may be downloaded and installed from a network via the communication means 409, or from the storage means 406, or from the ROM 402. The computer program performs the above-described functions defined in the methods of the embodiments of the present disclosure when executed by the processing device 401.

Although the subject matter has been described in language specific to structural features and/or methodological acts, it is to be understood that the subject matter defined in the appended claims is not necessarily limited to the specific features or acts described above. Rather, the specific features and acts described above are disclosed as example forms of implementing the claims.

While several specific implementation details are included in the above discussion, these should not be construed as limitations on the scope of the disclosure. Certain features that are described in the context of separate embodiments can also be implemented in combination in a single embodiment. Conversely, various features that are described in the context of a single embodiment can also be implemented in multiple embodiments separately or in any suitable subcombination.

The foregoing description is only exemplary of the preferred embodiments of the disclosure and is illustrative of the principles of the technology employed. It will be appreciated by those skilled in the art that the scope of the disclosure herein is not limited to the particular combination of features described above, but also encompasses other embodiments in which any combination of the features described above or their equivalents does not depart from the spirit of the disclosure. For example, a technical solution may be formed by replacing the above features with features having similar functions that are disclosed in (but not limited to) this disclosure.

Claims

1. A method for storing a log file, comprising:

collecting log data from a target log file;

processing the log data by a log analysis component;

2. The storage method according to claim 1, wherein the collecting log data comprises:

3. The storage method according to claim 1, wherein the processing the log data by the log analysis component comprises:

4. The storage method according to claim 1, wherein before determining whether the current index is in the available state or the unavailable state, the method further comprises:

5. An apparatus for storing a log file, comprising:

the acquisition unit is used for acquiring log data from the target log file;

a processing unit for processing the log data by a log analysis component;

and the storage unit is used for storing the log data in the new index.

6. The storage device according to claim 5, wherein the collection unit, when collecting the log data, is specifically configured to:

7. The storage device according to claim 5, wherein the processing unit, when processing the log data through the log analysis component, is specifically configured to:

8. The storage device of claim 5, further comprising a monitoring unit to:

9. A computer storage medium for storing a computer program, the computer program, when executed, being particularly adapted to implement the storage method of a log file according to any one of claims 1 to 4.

10. An electronic device comprising a memory and a processor;

wherein the memory is for storing a computer program;

the processor is configured to execute the computer program, and in particular to implement the log file storage method according to any one of claims 1 to 4.