CN111143161B

CN111143161B - Log file processing method and device, storage medium and electronic equipment

Info

Publication number: CN111143161B
Application number: CN201911251886.6A
Authority: CN
Inventors: 李琛; 张霞
Original assignee: Neusoft Corp
Current assignee: Neusoft Corp
Priority date: 2019-12-09
Filing date: 2019-12-09
Publication date: 2024-04-09
Anticipated expiration: 2039-12-09
Also published as: CN111143161A

Abstract

The disclosure relates to a method and a device for processing a log file, a storage medium, and an electronic device. The method is applied to a client and comprises the following steps: determining the resource usage amount of each log file in the next acquisition period according to the historical increment index, within the previous preset time period, of each log file in a first number of log files in a log task set, wherein the log task set comprises a second number of log file groups and each of the first number of log files is allocated to one of the second number of log file groups; adjusting the allocation of the log files among the log file groups in the log task set according to the resource threshold of each log file group and the resource usage amount of each log file in that group; and, after entering the next acquisition period, scanning each adjusted log file group and, if a scanned target log file is found to contain incremental data, collecting the incremental data and sending it to a server.

Description

Log file processing method and device, storage medium and electronic equipment

Technical Field

The disclosure relates to the technical field of electronic information, in particular to a method and a device for processing a log file, a storage medium and electronic equipment.

Background

With the continuous development of electronic information technology, the devices in a network generate log files of many types and in very large numbers. A client (namely an Agent) needs to send the collected log files to a server so that various platforms and systems can perform operations such as data preprocessing, data searching, data analysis and data mining on the data they contain. Because the total number of log files and the amount of data in each log file keep increasing, while the system resources (e.g., CPU, memory, network bandwidth) available to the client are limited, how to collect a large number of log files with the client's limited system resources is a problem to be solved. Log file collection methods generally fall into two types. The first traverses all log files in sequence at a fixed period (for example, 1 s) to collect incremental data; this requires opening a large number of file handles, occupies excessive system resources, and easily leads to stalls or errors. The second limits the occupation of system resources through self-throttling, which introduces excessive delay in collecting the log files and makes the collection efficiency too low.

Disclosure of Invention

The disclosure aims to provide a method and a device for processing log files, a storage medium and electronic equipment, which are used for solving the problem of low collection efficiency of the log files caused by unreasonable system resource utilization in the prior art.

To achieve the above object, according to a first aspect of embodiments of the present disclosure, there is provided a method for processing a log file, applied to a client, the method including:

determining the resource usage amount of each log file in a next acquisition period according to a historical increment index, within a previous preset time period, of each log file in a first number of log files in a log task set, wherein the log task set comprises a second number of log file groups, and each log file in the first number of log files is allocated to one of the second number of log file groups;

adjusting the allocation of the log files among the log file groups in the log task set according to the resource threshold of each log file group and the resource usage amount of each log file in the log file group;

after entering the next acquisition period, scanning each adjusted log file group;

if a scanned target log file is found to contain incremental data, collecting the incremental data and sending the incremental data to a server.

Optionally, before the scanning of each adjusted log file group after entering the next acquisition period, the method further comprises:

determining the scanning frequency of each adjusted log file group in the next acquisition period according to the number of accesses to each log file sent by the server;

the scanning of each adjusted log file group after entering the next acquisition period includes:

after entering the next acquisition period, scanning each log file in each log file group according to the scanning frequency of each adjusted log file group.

Optionally, determining the resource usage amount of each log file in the next acquisition period according to the historical increment index, within the previous preset time period, of each log file in the first number of log files in the log task set includes:

determining a log increment prediction model according to the historical increment index of each log file within the previous preset time period;

predicting the increment index of each log file in the next acquisition period according to the log increment prediction model;

and determining the resource usage amount of each log file in the next acquisition period according to the corresponding relation between the increment index and the resource usage amount and the increment index of each log file.
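The last sub-step above, mapping a predicted increment index to a resource usage amount, can be sketched as a table lookup. This is only an illustrative sketch: the names `ResourceUsage`, `CORRESPONDENCE` and `lookup_resource_usage`, the units, and every numeric value in the table are assumptions made for the example and are not taken from the disclosure.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ResourceUsage:
    """Resource usage for one log file (hypothetical fields and units)."""
    cpu_mhz: float
    memory_mb: float
    bandwidth_kbps: float


# Each record: (max entries/s, max bytes/s, resource usage).
# All values are invented for illustration only.
CORRESPONDENCE = [
    (10.0, 1_000.0, ResourceUsage(50.0, 8.0, 16.0)),
    (100.0, 10_000.0, ResourceUsage(200.0, 32.0, 128.0)),
    (float("inf"), float("inf"), ResourceUsage(800.0, 128.0, 1024.0)),
]


def lookup_resource_usage(entries_per_s, bytes_per_s, table=CORRESPONDENCE):
    """Return the usage of the first record whose bounds cover the index."""
    for max_entries, max_bytes, usage in table:
        if entries_per_s <= max_entries and bytes_per_s <= max_bytes:
            return usage
    raise ValueError("no record matches the given increment index")
```

A predicted index of 5 entries/s and 5000 bytes/s, for instance, falls through the first record (its byte rate is too high) and matches the second.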

Optionally, the adjusting the allocation of the log files in the log file group in the log task set according to the resource threshold value of each log file group and the resource usage amount of each log file in the log file group includes:

for each log file group, taking the sum of the resource usage amounts of all the log files in the log file group as the total resource usage amount of the log file group;

if the total resource usage of the log file group is greater than the resource threshold of the log file group, circularly executing the log adjustment step until the total resource usage of the log file group is less than or equal to the resource threshold of the log file group;

if the total resource usage of the log file group is less than or equal to the resource threshold of the log file group, maintaining the log file group;

The log adjustment step includes:

determining whether an empty log file group exists in the log task set;

if the empty log file group exists, combining the empty log file group with the log file group so as to update the log task set;

if the empty log file group does not exist, migrating a first log file to a first log file group so as to update the log task set, wherein the first log file is the log file with the lowest resource usage amount in the log file group, and the first log file group is the log file group with the lowest total resource usage in the log task set; or, if the empty log file group does not exist, migrating a second log file to the first log file group so as to update the log task set, wherein the second log file is a log file whose resource usage amount in the log task set is smaller than a preset threshold.

Optionally, the determining, according to the number of accesses to each log file sent by the server, the adjusted scanning frequency of each log file group in the next acquisition period includes:

determining the hit rate of each adjusted log file group according to the access times of each log file;

and determining the scanning frequency of each adjusted log file group in the next acquisition period according to the hit rate of each adjusted log file group, wherein the scanning frequency is positively correlated with the hit rate.

Optionally, the method further comprises:

acquiring a third number of initial log task sets, wherein each initial log task set comprises a fourth number of initial log file groups, each log file in the first number of log files is distributed to any initial log file group in the fourth number of initial log file groups, and the fourth number is smaller than or equal to the second number;

and screening the log task set from the third number of initial log task sets by means of a genetic algorithm, according to the first resource utilization rate of each initial log task set and the second resource utilization rate of each initial log file group.

Optionally, screening the log task set from the third number of initial log task sets by means of a genetic algorithm, according to the first resource utilization rate of each initial log task set and the second resource utilization rate of each initial log file group, includes:

taking the initial log task set with the highest first resource utilization rate as a target task set, and storing the initial log file group with the highest second resource utilization rate in a preset position as a candidate (to-be-selected) log file group;

deleting, from each initial log task set, the log files that also appear in the candidate log file group, so as to update the initial log task set;

on the premise of ensuring that the second resource utilization rate of each log file group in the updated initial log task set is smaller than or equal to the utilization rate threshold of that log file group, distributing the log files in the log file group with the lowest second resource utilization rate in the initial log task set to the other log file groups in the initial log task set;

repeating the above steps, from taking the initial log task set with the highest first resource utilization rate as the target task set and storing the initial log file group with the highest second resource utilization rate in the preset position as a candidate log file group, through distributing the log files in the log file group with the lowest second resource utilization rate in the initial log task set to the other log file groups in the initial log task set, until distributing the log files in the log file group with the lowest second resource utilization rate in the target log task set to the other log file groups in the target log task set would make the second resource utilization rate of at least one log file group in the target log task set larger than the utilization rate threshold of that log file group;

and placing the candidate log file groups stored in the preset position into the target log task set, and taking the target log task set as the log task set.

According to a second aspect of embodiments of the present disclosure, there is provided a log file processing apparatus, applied to a client, the apparatus including:

the determining module is used for determining the resource usage amount of each log file in a next acquisition period according to a historical increment index, within a previous preset time period, of each log file in a first number of log files in a log task set, wherein the log task set comprises a second number of log file groups, and each log file in the first number of log files is allocated to one of the second number of log file groups;

the adjustment module is used for adjusting the allocation of the log files among the log file groups in the log task set according to the resource threshold of each log file group and the resource usage amount of each log file in the log file group;

the scanning module is used for scanning each adjusted log file group after entering the next acquisition period;

and the acquisition module is used for collecting the incremental data and sending the incremental data to a server if a scanned target log file is found to contain incremental data.

Optionally, the apparatus further comprises:

the scanning determining module is used for determining the scanning frequency of each adjusted log file group in the next acquisition period according to the access times of each log file sent by the server before scanning each adjusted log file group after entering the next acquisition period;

the scanning module is used for:

Optionally, the determining module includes:

the model determining submodule is used for determining a log increment prediction model according to the historical increment index of each log file in the previous preset time period;

a prediction sub-module for predicting the increment index of each log file in the next acquisition period according to the log increment prediction model;

the prediction submodule is further used for determining the resource usage amount of each log file in the next acquisition period according to the corresponding relation between the increment index and the resource usage amount and the increment index of each log file.

Optionally, the adjustment module is configured to:

the log adjustment step includes:

determining whether an empty log file group exists in the log task set;

Optionally, the scan determining module is configured to:

Optionally, the apparatus further comprises:

the acquisition module is used for acquiring a third number of initial log task sets, each initial log task set comprises a fourth number of initial log file groups, each log file in the first number of log files is allocated to any initial log file group in the fourth number of initial log file groups, and the fourth number is smaller than or equal to the second number;

and the screening module is used for screening the log task set from the third number of initial log task sets by means of a genetic algorithm, according to the first resource utilization rate of each initial log task set and the second resource utilization rate of each initial log file group.

Optionally, the screening module is configured to:

According to a third aspect of the disclosed embodiments, there is provided a computer readable storage medium having stored thereon a computer program which when executed by a processor implements the steps of the method of the first aspect of the disclosed embodiments.

According to a fourth aspect of embodiments of the present disclosure, there is provided an electronic device, comprising:

a memory having a computer program stored thereon;

a processor for executing the computer program in the memory to implement the steps of the method of the first aspect of the embodiments of the present disclosure.

Through the above technical solution, the log task set processed in the disclosure comprises a second number of log file groups, among which a first number of log files are allocated. First, the resource usage amount of each log file in the next acquisition period is determined according to the historical increment index of each log file in the log task set within the previous preset time period. Then, the allocation of the log files among the log file groups in the log task set is adjusted according to the resource threshold of each log file group and the resource usage amount of each log file in that group. Finally, after the next acquisition period is entered, each adjusted log file group is scanned; if a scanned target log file is found to contain incremental data, the incremental data is collected and sent to a server. By predicting the resource usage amount of the log files in the next acquisition period and adjusting their allocation accordingly, the disclosure makes the allocation of log files within each log file group suit the system resources that the client deploys for that group, thereby improving the collection efficiency of the log files.

Additional features and advantages of the present disclosure will be set forth in the detailed description which follows.

Drawings

The accompanying drawings are included to provide a further understanding of the disclosure, and are incorporated in and constitute a part of this specification, illustrate the disclosure and together with the description serve to explain, but do not limit the disclosure. In the drawings:

FIG. 1 is a flowchart illustrating a method of processing a log file according to an example embodiment;

FIG. 2 is a flowchart illustrating another method of processing a log file according to an example embodiment;

FIG. 3 is a flowchart illustrating another method of processing a log file according to an example embodiment;

FIG. 4 is a flowchart illustrating another method of processing a log file according to an example embodiment;

FIG. 5 is a schematic diagram illustrating a log file adjustment process according to an example embodiment;

FIG. 6 is a flowchart illustrating another method of processing a log file according to an example embodiment;

FIG. 7 is a flowchart illustrating another method of processing a log file according to an example embodiment;

FIG. 8 is a flowchart illustrating another method of processing a log file according to an example embodiment;

FIG. 9 is a schematic diagram illustrating the screening of a log task set according to an example embodiment;

FIG. 10 is a block diagram of a log file processing apparatus according to an example embodiment;

FIG. 11 is a block diagram of another log file processing apparatus according to an example embodiment;

FIG. 12 is a block diagram of another log file processing apparatus according to an example embodiment;

FIG. 13 is a block diagram of another log file processing apparatus according to an example embodiment;

FIG. 14 is a block diagram of an electronic device according to an example embodiment.

Detailed Description

Reference will now be made in detail to exemplary embodiments, examples of which are illustrated in the accompanying drawings. When the following description refers to the accompanying drawings, the same numbers in different drawings refer to the same or similar elements, unless otherwise indicated. The implementations described in the following exemplary examples are not representative of all implementations consistent with the present disclosure. Rather, they are merely examples of apparatus and methods consistent with some aspects of the present disclosure as detailed in the accompanying claims.

Before introducing the log file processing method and device, storage medium, and electronic equipment provided by the disclosure, the application scenario involved in the various embodiments of the disclosure is first described. The application scenario may be a log collection system including a client and a server. The client is configured to scan log files and, when a scan finds that a log file contains incremental data, to send the incremental data to the server. The server is used for storing the incremental data sent by the client into the corresponding log files; accordingly, a user or other service systems and platforms can obtain the log files stored on the server by accessing the server. The log collection system may include one or more clients and one or more servers. The client may be a mobile terminal such as a smart phone, a tablet computer, a smart television, a smart watch, a PDA (Personal Digital Assistant), or a portable computer, or a fixed terminal such as a desktop computer. The server may include, but is not limited to, physical servers, server clusters, cloud servers, etc. The server and the client are connected through a network so that data can be transmitted between them.

FIG. 1 is a flowchart illustrating a method of processing a log file according to an exemplary embodiment. As shown in FIG. 1, the method is applied to a client and comprises:

Step 101, determining the resource usage amount of each log file in the next acquisition period according to the historical increment index of each log file in the first number of log files in the log task set in the previous preset time period, wherein the log task set comprises a second number of log file groups, and each log file in the first number of log files is allocated to one of the second number of log file groups.

For example, the client is responsible for managing the first number of log files, which may be divided into a second number of log file groups; each log file group may include zero, one, or multiple log files. The second number of log file groups forms a log task set on the client. First, for each of the first number of log files, the historical increment index over the previous preset time period may be obtained. The previous preset time period may be understood as a time window (e.g. 24 h or 100 min) before the current time. The historical increment index of a log file represents the increment index generated by that log file within the preset time period, where the increment index is a statistical indicator of the incremental data. For example, the historical increment index may be the generation speed (entries/s) and the generation rate (bytes/s) of the file's incremental data over 100 minutes.
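As a minimal sketch of how such a historical increment index could be computed, the following assumes the client periodically records cumulative entry counts and file sizes for a log file. The `Sample` structure and the function name `historical_increment_index` are hypothetical and introduced only for this example.

```python
from dataclasses import dataclass


@dataclass
class Sample:
    """One observation of a log file (hypothetical structure)."""
    timestamp: float  # seconds
    entries: int      # cumulative number of log entries observed
    size_bytes: int   # cumulative file size in bytes


def historical_increment_index(samples, window_seconds):
    """Return (entries/s, bytes/s) over the trailing time window.

    `samples` must be ordered by timestamp. Returns (0.0, 0.0) when the
    window holds fewer than two samples or spans no time.
    """
    if len(samples) < 2:
        return 0.0, 0.0
    end = samples[-1].timestamp
    in_window = [s for s in samples if s.timestamp >= end - window_seconds]
    if len(in_window) < 2:
        return 0.0, 0.0
    first, last = in_window[0], in_window[-1]
    elapsed = last.timestamp - first.timestamp
    if elapsed <= 0:
        return 0.0, 0.0
    return ((last.entries - first.entries) / elapsed,
            (last.size_bytes - first.size_bytes) / elapsed)
```

With a 100 min window, `window_seconds` would be 6000; a shorter window makes the index react faster to bursts at the cost of more noise.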

An ARIMA (Autoregressive Integrated Moving Average) model capable of describing the increment index of each log file within the preset time period is then established according to the historical increment index of each log file, so as to predict the increment index of each log file in the next acquisition period. Next, the resource usage amount of each log file in the next acquisition period is determined according to the correspondence between the increment index and the resource usage amount. The resource usage amount of a log file reflects the resources the client needs when scanning that log file, and may include, for example, CPU frequency, memory occupation, network bandwidth occupation, and the like.
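To illustrate the idea of forecasting the next period from a window of history, the sketch below fits a first-order autoregressive model, x_t = c + phi * x_(t-1), by ordinary least squares and extrapolates one step. This is a deliberately simplified stand-in and not the ARIMA model named in the disclosure (it has no differencing and no moving-average term); a practical implementation would more likely rely on a statistics library. The function name `forecast_next` is hypothetical.

```python
def forecast_next(series):
    """One-step forecast with a least-squares AR(1) fit (ARIMA stand-in).

    `series` is a list of past increment-index values, oldest first.
    Falls back to the last value when the history is too short or constant.
    """
    if not series:
        return 0.0
    if len(series) < 3:
        return float(series[-1])
    x = series[:-1]  # predictors: x_(t-1)
    y = series[1:]   # targets:    x_t
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    var_x = sum((a - mean_x) ** 2 for a in x)
    if var_x == 0:
        return float(series[-1])
    cov_xy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    phi = cov_xy / var_x
    c = mean_y - phi * mean_x
    # An increment index is a rate, so negative forecasts are clamped to 0.
    return max(0.0, c + phi * series[-1])
```

Refitting on each new window, as the sliding-window description suggests, keeps the model aligned with recent behaviour of the log file.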

Step 102, according to the resource threshold value of each log file group and the resource usage amount of each log file in the log file group, the allocation of the log files in the log file group in the log task set is adjusted.

For example, the client deploys certain system resources for each log file group in the log task set. This can be understood as each log file group corresponding to one thread on the client: the client deploys certain system resources for each thread, and the sum of the system resources of all log file groups should be less than or equal to the system resources of the client. The threads can execute in parallel, so the client can scan multiple log file groups at the same time, which improves the utilization of system resources. The system resources of the log file groups may be the same or different. Accordingly, a resource threshold can be set for each log file group based on its system resources (the resource threshold is positively correlated with the system resources); the threshold is the upper limit of the processing capacity of the log file group. Whether the log files in a log file group will exceed the resource threshold in the next acquisition period can be predicted from the resource threshold of the group and the sum of the resource usage amounts of all log files in the group. For example, the sum of the resource usage amounts of all log files in the log file group may be taken as the total resource usage of the log file group. If the total resource usage exceeds the resource threshold, the allocation of log files in the group is unreasonable; if it does not, the allocation is reasonable and the log file group does not need to be adjusted.
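The threshold check described here can be sketched in a few lines. The example assumes, for simplicity, that resource usage is collapsed into a single number per log file (the disclosure lists CPU, memory and bandwidth separately); the function name `overloaded_groups` and the dictionary layout are hypothetical.

```python
def overloaded_groups(groups, usage, thresholds):
    """Return the ids of groups whose predicted total usage exceeds the threshold.

    groups:     dict mapping group id -> list of log file names
    usage:      dict mapping log file name -> predicted resource usage
    thresholds: dict mapping group id -> resource threshold
    """
    result = []
    for group_id, files in groups.items():
        total = sum(usage[name] for name in files)  # total resource usage
        if total > thresholds[group_id]:
            result.append(group_id)
    return result
```

An empty group sums to zero and so is never reported, which matches the treatment of empty log file groups in the description.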

Specifically, for a log file group whose total resource usage exceeds the resource threshold, some of its log files can be migrated to other log file groups that have not exceeded their resource thresholds, so as to reduce the total resource usage of the group; that is, the other log file groups share part of its log files. Alternatively, an empty log file group (i.e. a log file group without any log files) included in the log task set can be merged with the log file group, so that the system resources of the merged log file group are the sum of the system resources of the two groups, which is equivalent to raising the resource threshold of the merged log file group.

It should be noted that, because the log task set may include an empty log file group, the client also deploys system resources for the empty log file group, and a resource threshold is set for it accordingly. An empty log file group contains no log files, so the sum of the resource usage amounts of its log files is zero; that is, its total resource usage is zero and does not exceed its resource threshold. Therefore, when step 102 is executed, a log file group that is determined to be empty does not need to be adjusted. Take log file group A, an empty log file group, as an example: in step 102, group A itself does not need to be adjusted, but when other non-empty log file groups are adjusted, some of their log files may be migrated to group A. After the next acquisition period is entered, group A is then no longer empty, and accordingly, when step 102 is next performed, it is also determined whether the total resource usage of group A exceeds its resource threshold.
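The adjustment of one overloaded group can be sketched as a loop that first merges with an empty group if one exists and otherwise migrates the cheapest log file to the least-loaded group, following the rule stated in the disclosure. Several points are assumptions of this sketch only: usage is a single number per file, merging is modelled by adding the two thresholds, and the loop stops when only one file remains. The name `adjust_group` is hypothetical.

```python
def adjust_group(group_id, groups, usage, thresholds):
    """Rebalance one group in place until it fits its threshold (sketch).

    groups:     dict group id -> list of log file names
    usage:      dict log file name -> predicted resource usage
    thresholds: dict group id -> resource threshold
    """
    def total(gid):
        return sum(usage[name] for name in groups[gid])

    while total(group_id) > thresholds[group_id]:
        empty = next((g for g in groups if g != group_id and not groups[g]), None)
        if empty is not None:
            # Merge: the overloaded group absorbs the empty group's budget
            # (assumption: thresholds add up when resources are pooled).
            thresholds[group_id] += thresholds.pop(empty)
            del groups[empty]
            continue
        others = [g for g in groups if g != group_id]
        if not others or len(groups[group_id]) <= 1:
            break  # nothing left to migrate; give up rather than loop forever
        target = min(others, key=total)            # lowest total usage
        name = min(groups[group_id], key=usage.get)  # lowest-usage file
        groups[group_id].remove(name)
        groups[target].append(name)
```

The sketch does not re-check whether the receiving group is pushed over its own threshold; in a full pass each group would be examined in turn.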

Step 103, after entering the next acquisition period, scanning each log file group after adjustment.

Step 104, if a scanned target log file is found to contain incremental data, the incremental data is collected and sent to the server.

For example, the acquisition period may be understood as the interval at which the client, while scanning the log task set, adjusts the allocation of the log files in the log task set: the allocation is adjusted once per acquisition period. The acquisition period may be preset on the client, for example to 2 min. The next acquisition period is the acquisition period following the one to which the current moment belongs. After entering the next acquisition period, the client can scan the adjusted log file groups according to a preset scanning frequency, where the scanning frequencies of the log file groups may be the same or different. When the client scans a log file group, if a scanned log file contains incremental data, the incremental data is collected and sent to the server, and the server, after receiving the incremental data, writes it into the target log file stored on the server; if the scanned log file has not generated incremental data, the client continues to scan the other log files. Because step 102 adjusts the allocation of each log file group in the log task set to within a reasonable range, that is, the total resource usage of each log file group does not exceed its resource threshold and suits the system resources deployed by the client for that group, the client can make full use of the system resources of each log file group when scanning it, without stalls, errors, or excessive delay, which effectively improves the collection efficiency of the log files.
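One scan pass over a log file group can be sketched with per-file byte offsets: a file has incremental data when its size has grown past the offset recorded on the previous pass. The function name `scan_group`, the `offsets` dictionary and the `send` callback are hypothetical, and restarting from offset 0 after a file shrinks is an assumption of this sketch rather than something the disclosure specifies.

```python
import os


def scan_group(paths, offsets, send):
    """Scan each file once and forward any newly appended bytes.

    paths:   iterable of log file paths in the group
    offsets: dict path -> byte offset already collected (updated in place)
    send:    callable(path, data) that delivers incremental data to the server
    """
    for path in paths:
        last = offsets.get(path, 0)
        try:
            size = os.path.getsize(path)
        except OSError:
            continue  # file missing or unreadable; try again next pass
        if size < last:
            last = 0  # assumption: file was truncated or rotated
        if size == last:
            continue  # no incremental data; move on to the next file
        with open(path, "rb") as handle:  # one handle open at a time
            handle.seek(last)
            data = handle.read(size - last)
        send(path, data)
        offsets[path] = last + len(data)
```

Opening each file only when it has grown, and closing it immediately, avoids holding a large number of file handles open at once.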

In summary, the log task set processed in the present disclosure comprises a second number of log file groups, among which a first number of log files are allocated. First, the resource usage amount of each log file in the next acquisition period is determined according to the historical increment index of each log file in the log task set within the previous preset time period. Then, the allocation of the log files among the log file groups in the log task set is adjusted according to the resource threshold of each log file group and the resource usage amount of each log file in that group. Finally, after the next acquisition period is entered, each adjusted log file group is scanned; if a scanned target log file is found to contain incremental data, the incremental data is collected and sent to a server. By predicting the resource usage amount of the log files in the next acquisition period and adjusting their allocation accordingly, the disclosure makes the allocation of log files within each log file group suit the system resources that the client deploys for that group, thereby improving the collection efficiency of the log files.

FIG. 2 is a flowchart illustrating another method of processing a log file according to an exemplary embodiment. As shown in FIG. 2, before step 103, the method further comprises:

Step 105, determining the scanning frequency of each adjusted log file group in the next acquisition period according to the access times of each log file sent by the server.

In a specific application scenario, in an initialization stage, the client sends all log files in the locally stored log task set to the server. The client then periodically scans each log file group in the log task set (i.e. scans each log file); if a log file has generated incremental data, the client sends the incremental data to the server, and the server writes it into the corresponding log file. In this way, the log files on the client and the log files stored on the server remain synchronized. A user can obtain each log file on the server in order to perform operations such as data preprocessing, data searching, data analysis and data mining on the data it contains. Users may pay different amounts of attention to different log files, and accordingly the numbers of accesses to different log files differ. Thus, the client may adjust the frequency of uploading incremental data, i.e., the frequency of scanning each log file group, based on the number of times each log file is accessed on the server. Specifically, if the log files in a log file group are frequently accessed on the server (i.e., the number of accesses is large), the scanning frequency of that group may be increased; if they are rarely accessed (i.e., the number of accesses is small), the scanning frequency of that group may be reduced.

Accordingly, the implementation manner of step 103 may be:

after entering the next acquisition period, scanning each log file in each log file group according to the adjusted scanning frequency of each log file group.

For example, after entering the next acquisition period, each log file within each log file group may be scanned at the adjusted scanning frequency determined for that group in step 105. The higher the scanning frequency, the more system resources are occupied and the smaller the log collection delay; conversely, the lower the scanning frequency, the fewer system resources are occupied and the longer the log collection delay. The scanning frequency of each log file group can therefore be adjusted flexibly according to the number of accesses to its log files. For frequently accessed log files, timeliness matters more: the collection delay should be reduced as much as possible so that the log files a user accesses on the server stay synchronized with those collected by the client, so the scanning frequency of the log file group containing them can be increased. For rarely accessed log files, the timeliness requirement is lower, and the occupation of system resources can be reduced at the cost of some additional delay, so the scanning frequency of the log file group containing them can be reduced. In this way, the resource usage amount and the number of accesses of the log files are considered together, the log collection delay is adjusted reasonably, system resources are allocated accordingly, and the efficiency of log file collection is further improved.
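A minimal sketch of deriving per-group scanning frequencies from access counts follows. The disclosure states only that the scanning frequency is positively correlated with the hit rate; defining the hit rate as a group's share of all accesses, and mapping it linearly between a minimum and maximum frequency, are assumptions of this example, as are the names `group_hit_rates` and `scan_frequencies` and the default bounds.

```python
def group_hit_rates(groups, access_counts):
    """Hit rate per group, taken here as its share of all reported accesses.

    groups:        dict group id -> list of log file names
    access_counts: dict log file name -> number of accesses on the server
    """
    totals = {gid: sum(access_counts.get(name, 0) for name in files)
              for gid, files in groups.items()}
    grand_total = sum(totals.values())
    if grand_total == 0:
        return {gid: 0.0 for gid in groups}
    return {gid: count / grand_total for gid, count in totals.items()}


def scan_frequencies(hit_rates, min_hz=0.1, max_hz=1.0):
    """Map each hit rate in [0, 1] linearly onto [min_hz, max_hz] scans/s."""
    return {gid: min_hz + (max_hz - min_hz) * rate
            for gid, rate in hit_rates.items()}
```

Any monotonically increasing mapping would satisfy the stated positive correlation; the linear form is chosen only because it is easy to reason about.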

FIG. 3 is a flowchart illustrating another method of processing a log file according to an exemplary embodiment. As shown in FIG. 3, step 101 may include the following steps:

Step 1011, determining a log increment prediction model according to the historical increment index of each log file in the previous preset time period.

Step 1012, predicting the increment index of each log file in the next acquisition period according to the log increment prediction model.

Step 1013, determining the resource usage of each log file in the next acquisition period according to the corresponding relation between the increment index and the resource usage and the increment index of each log file.

For example, a log increment prediction model may be established according to the historical increment index of each log file in the previous preset time period; the log increment prediction model may be, for example, an ARIMA model. The previous preset time period may be understood as a sliding time window before the current moment. Taking a time window of 50 min as an example and starting from 0 min, the log increment prediction model is first determined according to the historical increment index within 0 min to 50 min; 5 min later, it is determined according to the historical increment index within 5 min to 55 min, and so on. After the log increment prediction model is determined, the increment index of each log file in the next acquisition period can be predicted, and the resource usage of each log file in the next acquisition period is determined according to the preset correspondence between the increment index and the resource usage. This correspondence can be obtained in advance by collecting statistics on a large number of log files. It can be stored on the client, or stored on the server and queried by the client when needed. The correspondence may be, for example, a table containing a plurality of records, each record including an increment index and the resource usage corresponding to that increment index. For example, if the increment index includes a generation speed (entries/s) and a generation rate (byte/s), and the resource usage includes the CPU frequency, the memory occupancy, and the network bandwidth occupancy, the correspondence may be as shown in Table 1:

TABLE 1

Generation speed (entries/s)	Generation rate (byte/s)	CPU frequency (MHz)	Memory occupancy	Network bandwidth occupancy
2	50	50	5%	2%
3	40	40	5%	3%
10	120	100	8%	10%
…	…	…	…	…
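Steps 1011 to 1013 can be sketched as follows. The sketch makes two substitutions that should be read as assumptions: a plain sliding-window mean stands in for the ARIMA model mentioned above, and the correspondence table is queried by picking the nearest record, since the text does not say how an increment index that falls between records is handled. The names `predict_increment` and `lookup_usage` and the sample history are hypothetical.

```python
from statistics import mean

# Records from Table 1: (entries/s, byte/s) -> (CPU MHz, memory %, bandwidth %)
TABLE_1 = [
    ((2, 50), (50, 5, 2)),
    ((3, 40), (40, 5, 3)),
    ((10, 120), (100, 8, 10)),
]

def predict_increment(history, window):
    """Predict the next-period increment index as the mean over the sliding window.

    history is a list of (entries/s, byte/s) samples, oldest first. A simple
    mean is used here as a stand-in for a fitted ARIMA model (assumption).
    """
    recent = history[-window:]
    speed = mean(s for s, _ in recent)
    rate = mean(r for _, r in recent)
    return speed, rate

def lookup_usage(increment):
    """Return the resource usage of the table record closest to the increment.

    Nearest-record matching by squared distance is an assumption of this sketch.
    """
    def distance(record):
        (speed, rate), _ = record
        return (speed - increment[0]) ** 2 + (rate - increment[1]) ** 2
    return min(TABLE_1, key=distance)[1]

history = [(1, 30), (2, 45), (3, 50), (3, 55)]    # hypothetical samples
predicted = predict_increment(history, window=3)  # uses only the last 3 samples
usage = lookup_usage(predicted)                   # (CPU MHz, memory %, bandwidth %)
```

As the window slides forward, older samples drop out of `history[-window:]`, which mirrors the 0 to 50 min, 5 to 55 min progression described above.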

FIG. 4 is a flowchart illustrating another method of processing a log file according to an exemplary embodiment. As shown in FIG. 4, the implementation of step 102 may include:

Step 1021, for each log file group, taking the sum of the resource usage amounts of all log files in the log file group as the total resource usage amount of the log file group.

Step 1022, if the total resource usage of the log file group is greater than the resource threshold of the log file group, the log adjustment step is circularly executed until the total resource usage of the log file group is less than or equal to the resource threshold of the log file group.

Step 1023, if the total resource usage of the log file group is less than or equal to the resource threshold of the log file group, maintaining the log file group.

For example, the log files in each log file group are adjusted as follows. First, the total resource usage of the log file group is obtained, i.e., the sum of the resource usage of all log files in the log file group. Then, whether the log file group needs to be adjusted is determined according to its total resource usage and its resource threshold. If the total resource usage is greater than the resource threshold, the allocation of log files in the log file group is unreasonable, and the log adjustment step may be executed until the total resource usage of the log file group is less than or equal to its resource threshold. If the total resource usage is less than or equal to the resource threshold, the allocation of log files in the log file group is reasonable and is maintained.
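The check in steps 1021 to 1023 can be sketched for several resource types at once. The data layout (a dictionary of per-file usage dictionaries), the resource keys, and the rule that a group is over its threshold when any single resource exceeds its limit are assumptions of this sketch; the text only speaks of a total resource usage and a resource threshold.

```python
def total_usage(group):
    """Sum each resource over all log files in the group (step 1021)."""
    totals = {}
    for usage in group.values():
        for resource, amount in usage.items():
            totals[resource] = totals.get(resource, 0) + amount
    return totals

def exceeds_threshold(group, threshold):
    """Return True if any resource total is above its limit (assumed rule)."""
    totals = total_usage(group)
    return any(totals.get(r, 0) > limit for r, limit in threshold.items())

# Hypothetical group holding two log files, using values from Table 1.
group = {
    "a.log": {"cpu_mhz": 50, "mem_pct": 5, "net_pct": 2},
    "b.log": {"cpu_mhz": 100, "mem_pct": 8, "net_pct": 10},
}
threshold = {"cpu_mhz": 120, "mem_pct": 20, "net_pct": 20}

totals = total_usage(group)
needs_adjustment = exceeds_threshold(group, threshold)  # CPU total is over its limit
```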

The log adjustment step comprises the following steps:

Step 1), determining whether an empty log file group exists in the log task set.

Step 2), if an empty log file group exists, merging the empty log file group with the log file group, so as to update the log task set.

Step 3), if no empty log file group exists, migrating a first log file to a first log file group so as to update the log task set, where the first log file is the log file with the lowest resource usage in the log file group, and the first log file group is the log file group with the lowest total resource usage in the log task set. Alternatively, if no empty log file group exists, migrating a second log file to the first log file group so as to update the log task set, where the second log file is a log file in the log task set whose resource usage is less than a preset threshold.

Specifically, when a log file group whose total resource usage is greater than its resource threshold is adjusted, the adjustment falls into one of three scenarios: merging, migration, and migration plus merging. FIG. 5 (a) shows the merging scenario. The log task set includes 10 log file groups, and the total resource usage of log file group 2 is greater than its resource threshold. If an empty log file group exists in the log task set, namely log file group 9, log file group 9 and log file group 2 may be merged. The system resources of the merged log file group are the sum of the system resources of log file group 9 and log file group 2, which is equivalent to raising the resource threshold of the merged log file group; the number of log file groups in the updated log task set is reduced by one (i.e., the updated log task set includes 9 log file groups). It is then determined again whether the total resource usage of the merged log file group is greater than its resource threshold, so as to decide whether the log file group needs further adjustment.

FIG. 5 (b) shows the migration scenario. The log task set includes 10 log file groups, and the total resource usage of log file group 2 is greater than its resource threshold. If no empty log file group exists in the log task set, the log file u with the lowest resource usage in log file group 2 (i.e., the first log file) may be migrated to log file group 9, which has the lowest total resource usage in the log task set (i.e., the first log file group), so as to update the log task set. The total resource usage of the updated log file group 2 is thereby reduced. Alternatively, a second log file in the log task set whose resource usage is less than the preset threshold may be migrated to the first log file group to update the log task set; there may be one or more second log files. It is then determined again whether the total resource usage of the updated log file group is greater than its resource threshold, so as to decide whether the log file group needs further adjustment.

If the total resource usage of the updated log file group is still greater than the resource threshold, then, while step 3) is executed, the files in a second log file group with a relatively low total resource usage in the log task set may also be migrated to the first log file group. By repeating these steps, log files with low resource usage are gathered in the first log file group, and a new empty log file group may appear in the log task set. This empty log file group can be merged with the log file group (the migration plus merging scenario), or it can be used for merging with a log file group whose total resource usage is greater than its resource threshold when the log task set (which is used for scanning the log file groups after the next acquisition period is entered) is adjusted in the next acquisition period.

When the log adjustment step is executed on the log file group, it must be ensured that the total resource usage of the first log file group remains less than or equal to the resource threshold of the first log file group. If, after the log adjustment step has been executed repeatedly, the total resource usage of the log file group still cannot be brought to or below its resource threshold, the system resources on the client cannot meet the current log collection requirement, and an alarm may be issued on the client or the server to prompt an administrator to perform operations such as capacity expansion.
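The merging and migration scenarios above can be sketched as a single loop. For brevity the sketch reduces resource usage to one number per log file, and the function and variable names are hypothetical. Returning `False` corresponds to the case in which an alarm would be raised; the bound `max_rounds` is an assumption added to guarantee termination.

```python
def group_total(files):
    """Total resource usage of a group given as {file name: usage}."""
    return sum(files.values())

def adjust_group(name, groups, thresholds, max_rounds=100):
    """Run the log adjustment step on one group until it fits its threshold.

    groups maps group name -> {file name: usage}; thresholds maps group name
    -> resource threshold. Both are updated in place. Returns True once the
    group fits, or False if it cannot be made to fit (alarm case).
    """
    for _ in range(max_rounds):
        if group_total(groups[name]) <= thresholds[name]:
            return True
        empty = next((g for g in groups if g != name and not groups[g]), None)
        if empty is not None:
            # Merging: the empty group's resources are added to this group,
            # and the task set shrinks by one group.
            thresholds[name] += thresholds.pop(empty)
            del groups[empty]
            continue
        others = [g for g in groups if g != name]
        if not others or not groups[name]:
            return False
        # Migration: move the lightest file to the lightest other group.
        target = min(others, key=lambda g: group_total(groups[g]))
        lightest = min(groups[name], key=groups[name].get)
        if group_total(groups[target]) + groups[name][lightest] > thresholds[target]:
            return False  # the receiving group must stay within its own threshold
        groups[target][lightest] = groups[name].pop(lightest)
    return False

# Merging scenario: g3 is empty, so it is merged into the overloaded g2.
groups_a = {"g1": {"a": 5, "b": 2}, "g2": {"u": 6, "v": 3, "w": 4}, "g3": {}}
limits_a = {"g1": 10, "g2": 10, "g3": 10}
merged_ok = adjust_group("g2", groups_a, limits_a)

# Migration scenario: no empty group, so the lightest files move to g1.
groups_b = {"g1": {"a": 3}, "g2": {"u": 2, "v": 6, "w": 5}}
limits_b = {"g1": 10, "g2": 10}
migrated_ok = adjust_group("g2", groups_b, limits_b)
```

The sketch implements only the first alternative of step 3) (moving the single lightest file); the alternative based on a preset usage threshold would replace the choice of `lightest`.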

FIG. 6 is a flowchart illustrating another method of processing a log file according to an exemplary embodiment. As shown in FIG. 6, step 105 may include:

Step 1051, determining the hit rate of each adjusted log file group according to the number of accesses of each log file.

Step 1052, determining the scanning frequency of each adjusted log file group in the next acquisition period according to the hit rate of each adjusted log file group, wherein the scanning frequency is positively correlated with the hit rate.

Specifically, the hit rate of each log file group can be determined by the following formula:

where H_i represents the hit rate of the i-th log file group, Vt_i represents the number of accesses of all log files in the i-th log file group, N represents the number of log file groups (i.e., the second number), V_p represents the number of accesses of the p-th log file in the i-th log file group, and M represents the number of log files in the i-th log file group.
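One reading that is consistent with these symbol definitions, offered as an assumption rather than as the formula of the disclosure, is that the hit rate of a group is its share of all accesses across the N groups:

```latex
H_i = \frac{Vt_i}{\sum_{j=1}^{N} Vt_j}, \qquad Vt_i = \sum_{p=1}^{M} V_p
```

Under this reading the hit rates of all groups sum to 1, and a group whose log files are accessed more often has a higher hit rate.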

Accordingly, the sweep frequency may be determined by the following formula:

where F_i represents the scanning frequency of the i-th log file group, and F_0 represents the initial scanning frequency.
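Steps 1051 and 1052 can be sketched as follows. Both formulas used here are assumptions: the hit rate is taken as each group's share of all accesses, and the scanning frequency is taken as F_i = F_0 × N × H_i, which is merely one form that is positively correlated with the hit rate and that returns F_0 when all groups are accessed equally. The function names are hypothetical.

```python
def hit_rates(access_counts):
    """Hit rate per group, given one list of per-file access counts per group.

    Assumed form: group accesses divided by accesses over all groups.
    """
    group_totals = [sum(counts) for counts in access_counts]
    grand_total = sum(group_totals)
    if grand_total == 0:
        # No accesses at all: treat every group alike.
        return [1.0 / len(group_totals)] * len(group_totals)
    return [t / grand_total for t in group_totals]

def scan_frequencies(rates, f0):
    """Assumed form F_i = F_0 * N * H_i (positively correlated with H_i)."""
    n = len(rates)
    return [f0 * n * h for h in rates]

access_counts = [[30, 10], [10, 10], [15, 5]]  # three groups, two files each
rates = hit_rates(access_counts)
freqs = scan_frequencies(rates, f0=1.0)
```

With this form a group that is never accessed would get a frequency of zero, so a practical implementation would probably also apply a lower bound; that bound is not shown here.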

FIG. 7 is a flowchart illustrating another method of processing a log file according to an exemplary embodiment. As shown in FIG. 7, the method further includes:

Step 106, obtaining a third number of initial log task sets, each initial log task set including a fourth number of initial log file groups, each log file in the first number of log files being assigned to any one of the fourth number of initial log file groups, the fourth number being less than or equal to the second number.

Step 107, screening out the log task set from the third number of initial log task sets by means of a genetic algorithm, according to the first resource utilization rate of each initial log task set and the second resource utilization rate of each initial log file group.

In a specific application scenario, in an initialization stage, the log collection system may perform random allocation for a first number of log files locally stored in the client to obtain a third number of initial log task sets, where each initial log task set includes the first number of log files, and the first number of log files is allocated to a fourth number (less than or equal to the second number) of initial log file groups. It can be understood that, for the first number of log files stored on the client, random allocation is performed, and under the condition that the total resource usage of each initial log file group in the initial log file groups is less than or equal to the resource threshold, all possible allocation modes are exhausted, so as to obtain the third number of initial log task sets. It should be noted that, the number of log files included in each initial log task set is the same (i.e., the first number), and the number of initial log file groups included in each initial log task set may be different, i.e., the fourth number of each initial log task set may be different.
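Generating initial log task sets by random allocation under the resource threshold can be sketched as below. The sketch samples a fixed number of random allocations instead of exhausting every possible allocation, uses a single number per log file as its resource usage, and applies one common threshold to every group; all of these, like the function names, are simplifying assumptions.

```python
import random

def random_task_set(file_usage, threshold, rng):
    """Randomly assign files to groups, opening a new group when none fits."""
    files = list(file_usage)
    rng.shuffle(files)
    groups = []
    for name in files:
        if file_usage[name] > threshold:
            raise ValueError(f"{name} alone exceeds the resource threshold")
        # Groups that can still take this file without exceeding the threshold.
        fitting = [g for g in groups
                   if sum(file_usage[f] for f in g) + file_usage[name] <= threshold]
        if fitting:
            rng.choice(fitting).append(name)
        else:
            groups.append([name])
    return groups

def initial_task_sets(file_usage, threshold, count, seed=0):
    """Return `count` randomly generated initial log task sets."""
    rng = random.Random(seed)
    return [random_task_set(file_usage, threshold, rng) for _ in range(count)]

file_usage = {"a": 4, "b": 3, "c": 5, "d": 2}  # hypothetical usage per log file
task_sets = initial_task_sets(file_usage, threshold=8, count=5)
```

Every generated set contains all log files, while the number of groups may differ from set to set, matching the note that the fourth number can differ between initial log task sets.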

Then, according to the first resource utilization rate of each initial log task set and the second resource utilization rate of each initial log file group, the log task set is screened out from the third number of initial log task sets by means of a genetic algorithm; this is the log task set suitable for scanning by the client in steps 101 to 104.

The first resource usage rate of each initial log task set reflects the fitness of the initial log task set. It can be understood that the higher the total resource usage of the log file groups and the fewer the log file groups, the higher the first resource usage rate. The first resource usage rate may be obtained by the following formula:

where F_j represents the first resource usage rate of the j-th initial log task set, P_j represents the fourth number of the j-th initial log task set, and the remaining symbol represents the total resource usage of the k-th resource of the i-th initial log file group in the j-th initial log task set. Each initial log file group has total resource usages for K resources, which may include, for example, a total CPU frequency, a total memory occupancy, a total network bandwidth occupancy, and so on.

The second resource usage rate of each initial log file group reflects how fully the resources of the initial log file group are used. It can be understood that the higher the resource usage of the log files, the more fully the system resources of the initial log file group are utilized, and the higher the second resource usage rate. The second resource usage rate may be obtained by the following formula:

where C_m represents the second resource usage rate of the m-th initial log file group, T_k represents the threshold of the k-th resource, and the remaining symbol represents the k-th resource usage of the n-th log file in the m-th initial log file group (the m-th initial log file group includes N log files). The resource thresholds may include, for example, a CPU frequency threshold, a memory occupancy threshold, and a network bandwidth occupancy threshold, and the resource usage may include the CPU frequency, the memory occupancy, and the network bandwidth occupancy.
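The two usage rates can be illustrated with a sketch, with the caveat that the forms used are assumptions chosen only to match the qualitative description: the second resource usage rate of a group is taken as the mean, over the K resources, of summed file usage divided by the threshold T_k, and the first resource usage rate of a task set is taken as the average of the second resource usage rates of its P_j groups. The function names and sample values are hypothetical.

```python
def second_usage_rate(group, thresholds):
    """Assumed form: mean over resources of (summed file usage / threshold).

    group is a list of per-file usage tuples, one entry per resource;
    thresholds holds the limit T_k for each resource in the same order.
    """
    ratios = []
    for k, limit in enumerate(thresholds):
        used = sum(file_usage[k] for file_usage in group)
        ratios.append(used / limit)
    return sum(ratios) / len(ratios)

def first_usage_rate(task_set, thresholds):
    """Assumed form: average second usage rate over the groups of the set."""
    return sum(second_usage_rate(g, thresholds) for g in task_set) / len(task_set)

# Per-file usage as (CPU MHz, memory %, bandwidth %), values taken from Table 1.
a, b, c = (50, 5, 2), (40, 5, 3), (100, 8, 10)
thresholds = (200, 20, 20)

packed = [[a, b, c]]      # one well-filled group
spread = [[a], [b], [c]]  # three lightly used groups

packed_rate = first_usage_rate(packed, thresholds)
spread_rate = first_usage_rate(spread, thresholds)
```

Under these assumed forms the task set with fewer, fuller groups scores higher, which is the behaviour the fitness measure is described as rewarding.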

FIG. 8 is a flowchart illustrating another method of processing a log file according to an exemplary embodiment. As shown in FIG. 8, the implementation of step 107 may include:

Step 1071, taking the initial log task set with the highest first resource utilization rate as a target task set, and storing the initial log file group with the highest second resource utilization rate in a preset position as a to-be-selected log file group.

Step 1072, deleting from each initial log task set the log files that duplicate those in the to-be-selected log file group, so as to update the initial log task set.

Step 1073, allocating the log files in the log file group with the lowest second resource utilization rate in the initial log task set to other log file groups in the initial log task set, on the premise that the second resource utilization rate of each log file group in the updated initial log task set is less than or equal to the utilization rate threshold of the log file group.

For example, a genetic algorithm is used to screen out the log task set from the third number of initial log task sets; the aim is to select, as the log task set, the initial log task set that has the smallest fourth number and in which the resource usage of the log files in each log file group is high. The log task set is thus an initial log task set that inherits two attributes: the highest first resource usage rate and the highest second resource usage rate.

As shown in FIG. 9 (a), the initial log task set j with the highest first resource usage rate is first selected from the third number of initial log task sets as the target task set, which serves as the crossover (intersection) object of the genetic algorithm. Then the initial log file group i2 (belonging to the initial log task set i) with the highest second resource usage rate is taken as the crossover (intersection) part and stored in the preset position as a to-be-selected log file group. The initial log task set j includes m initial log file groups, and the initial log task set i includes n initial log file groups.

As shown in FIG. 9 (b), the log files e, u, …, and f included in the log file group i2 are deleted from the initial log task set 1 through the initial log task set r (r being the third number), so that none of the updated third number of initial log task sets includes the log files e, u, …, and f. The initial log task set 1 includes k initial log file groups, and the initial log task set r includes t initial log file groups.

Then, the allocation of the remaining log files in each initial log task set is adjusted: the log files in the log file group with the lowest second resource utilization rate in the initial log task set are allocated to other log file groups in the initial log task set, on the premise that the second resource utilization rate of each log file group in the updated initial log task set is less than or equal to the utilization rate threshold of the log file group. It will be appreciated that, after the log files contained in the crossover (intersection) part are deleted from an initial log task set, the remaining log files are consolidated so as to obtain at least one empty initial log file group. The usage rate threshold may be set according to the system resources of the log file group and is positively correlated with those system resources.

As shown in FIG. 9 (c), taking an initial log task set q (including u initial log file groups) as an example, if the initial log file group qu is the log file group with the lowest second resource usage rate in the initial log task set q, the log files in the initial log file group qu may be allocated to the initial log file group q2, so that the initial log file group qu becomes an empty initial log file group.
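The consolidation in step 1073 can be sketched on its own. The sketch assumes a single number per log file as its resource usage and one common capacity for every group, so that the group with the lowest second resource usage rate is simply the non-empty group with the lowest total; files are placed first-fit, largest first. These choices and the names used are assumptions, not details given in the text.

```python
def dissolve_lowest_group(groups, usage, capacity):
    """Try to empty the least-used group by moving its files into other groups.

    groups maps group name -> list of file names; usage maps file name ->
    resource usage. Returns True if the group was emptied; otherwise leaves
    `groups` unchanged and returns False.
    """
    def total(g):
        return sum(usage[f] for f in groups[g])

    non_empty = [g for g in groups if groups[g]]
    if len(non_empty) < 2:
        return False
    lowest = min(non_empty, key=total)
    load = {g: total(g) for g in non_empty if g != lowest}
    plan = {}
    # Plan the moves first so that nothing changes if some file cannot be placed.
    for f in sorted(groups[lowest], key=usage.get, reverse=True):
        target = next((g for g in load if load[g] + usage[f] <= capacity), None)
        if target is None:
            return False
        load[target] += usage[f]
        plan[f] = target
    for f, target in plan.items():
        groups[target].append(f)
    groups[lowest] = []
    return True

# Hypothetical task set q: group qu is the least used and is merged into q2.
usage = {"a": 5, "b": 3, "c": 2, "d": 1}
groups = {"q1": ["a", "b"], "q2": ["c"], "qu": ["d"]}
emptied = dissolve_lowest_group(groups, usage, capacity=8)
```

When the function returns `False` for the target task set, no further empty group can be obtained, which is the stopping condition used when the steps are repeated.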

Step 1074, repeatedly executing steps 1071 to 1073 until allocating the log files in the log file group with the lowest second resource usage rate in the target log task set to other log file groups in the target log task set would cause the second resource usage rate of at least one log file group in the target log task set to be greater than the usage rate threshold of that log file group.

Step 1075, the group of log files to be selected stored in the preset position is put into the target log task set, and the target log task set is used as the log task set.

For example, steps 1071 to 1073 are repeated until the log files in the log file group with the lowest second resource usage rate in the target task set can no longer be allocated to other log file groups (i.e., no empty initial log file group can be obtained when step 1073 is executed), which indicates that the target task set now contains the smallest number of initial log file groups. Finally, the to-be-selected log file groups that were stored in the preset position each time steps 1071 to 1073 were executed are put into the target log task set to obtain the log task set. The target log task set ensures that the number of initial log file groups is the smallest, and the to-be-selected log file groups ensure that the resource usage of the log files is the highest, so the log task set inherits both attributes, the highest first resource usage rate and the highest second resource usage rate, and is the log task set best suited for scanning by the current client. The system resources of the client can thus be allocated reasonably, and the efficiency of log file acquisition is further improved.

FIG. 10 is a block diagram of an apparatus for processing log files according to an exemplary embodiment. As shown in FIG. 10, the apparatus 200 is applied to a client and includes:

The determining module 201 is configured to determine, according to a history increment index of each log file in the first number of log files in the log task set in a previous preset period, a resource usage amount of each log file in a next acquisition period, where the log task set includes a second number of log file groups, and each log file in the first number of log files is allocated to any log file group in the second number of log file groups.

The adjustment module 202 is configured to adjust allocation of the log files in the log file group in the log task set according to the resource threshold of each log file group and the resource usage amount of each log file in the log file group.

The scanning module 203 is configured to scan each adjusted log file group after the next acquisition period is entered.

The acquisition module 204 is configured to acquire the incremental data if incremental data exists in a scanned target log file, and to send the incremental data to the server.

FIG. 11 is a block diagram of another log file processing apparatus according to an exemplary embodiment. As shown in FIG. 11, the apparatus 200 further includes:

The scan determining module 205 is configured to determine, before each adjusted log file group is scanned after the next acquisition period is entered, the scanning frequency of each adjusted log file group in the next acquisition period according to the number of accesses of each log file sent by the server.

Accordingly, the scanning module 203 is configured to:

FIG. 12 is a block diagram of another log file processing apparatus according to an exemplary embodiment. As shown in FIG. 12, the determining module 201 includes:

The model determining submodule 2011 is configured to determine a log increment prediction model according to the historical increment index of each log file in the previous preset time period.

The prediction submodule 2012 is configured to predict the increment index of each log file in the next acquisition period according to the log increment prediction model.

The prediction submodule 2012 is further configured to determine the resource usage amount of each log file in the next acquisition period according to the corresponding relationship between the increment index and the resource usage amount and the increment index of each log file.

Optionally, the adjustment module 202 is configured to perform the following steps:

Step A), for each log file group, taking the sum of the resource usage amounts of all log files in the log file group as the total resource usage amount of the log file group.

Step B), if the total resource usage of the log file group is greater than the resource threshold of the log file group, executing the log adjustment step cyclically until the total resource usage of the log file group is less than or equal to the resource threshold of the log file group.

Step C), if the total resource usage of the log file group is less than or equal to the resource threshold of the log file group, maintaining the log file group.

The log adjustment step comprises the following steps:

Step 1), determining whether an empty log file group exists in the log task set.

Optionally, the scan determination module 205 is configured to:

First, the hit rate of each adjusted log file group is determined according to the number of accesses of each log file.

Then, the scanning frequency of each adjusted log file group in the next acquisition period is determined according to the hit rate of each adjusted log file group, where the scanning frequency is positively correlated with the hit rate.

FIG. 13 is a block diagram of another log file processing apparatus according to an exemplary embodiment. As shown in FIG. 13, the apparatus 200 further includes:

The obtaining module 206 is configured to obtain a third number of initial log task sets, where each initial log task set includes a fourth number of initial log file groups, each log file in the first number of log files is allocated to any initial log file group in the fourth number of initial log file groups, and the fourth number is less than or equal to the second number.

The screening module 207 is configured to screen out the log task set from the third number of initial log task sets by means of a genetic algorithm, according to the first resource usage rate of each initial log task set and the second resource usage rate of each initial log file group.

Optionally, the screening module 207 is configured to perform the following steps:

Step D), taking the initial log task set with the highest first resource utilization rate as a target task set, and storing the initial log file group with the highest second resource utilization rate in a preset position as a to-be-selected log file group.

Step E), deleting from each initial log task set the log files that duplicate those in the to-be-selected log file group, so as to update the initial log task set.

Step F), allocating the log files in the log file group with the lowest second resource utilization rate in the initial log task set to other log file groups in the initial log task set, on the premise that the second resource utilization rate of each log file group in the updated initial log task set is less than or equal to the utilization rate threshold of the log file group.

Step G), repeatedly executing steps D to F until allocating the log files in the log file group with the lowest second resource utilization rate in the target log task set to other log file groups in the target log task set would cause the second resource utilization rate of at least one log file group in the target log task set to be greater than the utilization rate threshold of that log file group.

Step H), putting the to-be-selected log file group stored in the preset position into the target log task set, and taking the target log task set as the log task set.

The specific manner in which the modules of the apparatus in the above embodiments perform their operations has been described in detail in connection with the method embodiments and is not repeated here.

Fig. 14 is a block diagram of an electronic device 300, according to an example embodiment. As shown in fig. 14, the electronic device 300 may include: a processor 301, a memory 302. The electronic device 300 may also include one or more of a multimedia component 303, an input/output (I/O) interface 304, and a communication component 305.

The processor 301 is configured to control the overall operation of the electronic device 300 to complete all or part of the steps in the log file processing method described above. The memory 302 is used to store various types of data to support operation on the electronic device 300; the data may include, for example, instructions for any application or method operating on the electronic device 300, as well as application-related data such as contact data, sent and received messages, pictures, audio, and video. The memory 302 may be implemented by any type of volatile or non-volatile memory device or a combination thereof, such as static random access memory (Static Random Access Memory, SRAM for short), electrically erasable programmable read-only memory (Electrically Erasable Programmable Read-Only Memory, EEPROM for short), erasable programmable read-only memory (Erasable Programmable Read-Only Memory, EPROM for short), programmable read-only memory (Programmable Read-Only Memory, PROM for short), read-only memory (Read-Only Memory, ROM for short), magnetic memory, flash memory, a magnetic disk, or an optical disk. The multimedia component 303 may include a screen and an audio component. The screen may be, for example, a touch screen, and the audio component is used for outputting and/or inputting audio signals. For example, the audio component may include a microphone for receiving external audio signals. The received audio signals may be further stored in the memory 302 or transmitted through the communication component 305. The audio component further includes at least one speaker for outputting audio signals. The I/O interface 304 provides an interface between the processor 301 and other interface modules, which may be a keyboard, a mouse, buttons, and the like. These buttons may be virtual buttons or physical buttons. The communication component 305 is used for wired or wireless communication between the electronic device 300 and other devices. The wireless communication may be, for example, Wi-Fi, Bluetooth, near field communication (Near Field Communication, NFC for short), 2G, 3G, or 4G, or a combination of one or more of them; accordingly, the communication component 305 may include a Wi-Fi module, a Bluetooth module, and an NFC module.

In an exemplary embodiment, the electronic device 300 may be implemented by one or more application specific integrated circuits (Application Specific Integrated Circuit, abbreviated as ASIC), digital signal processor (Digital Signal Processor, abbreviated as DSP), digital signal processing device (Digital Signal Processing Device, abbreviated as DSPD), programmable logic device (Programmable Logic Device, abbreviated as PLD), field programmable gate array (Field Programmable Gate Array, abbreviated as FPGA), controller, microcontroller, microprocessor, or other electronic component for performing the above-described log file processing method.

In another exemplary embodiment, a computer readable storage medium is also provided, comprising program instructions which, when executed by a processor, implement the steps of the method of processing a log file described above. For example, the computer readable storage medium may be the memory 302 described above including program instructions executable by the processor 301 of the electronic device 300 to perform the method of processing a log file described above.

In another exemplary embodiment, a computer program product is also provided, comprising a computer program executable by a programmable apparatus, the computer program having code portions for performing the above-mentioned log file processing method when executed by the programmable apparatus.

The preferred embodiments of the present disclosure have been described in detail above with reference to the accompanying drawings, but the present disclosure is not limited to the specific details of the above embodiments, and various simple modifications may be made to the technical solutions of the present disclosure within the scope of the technical concept of the present disclosure, and all the simple modifications belong to the protection scope of the present disclosure.

In addition, the specific features described in the foregoing embodiments may be combined in any suitable manner, and in order to avoid unnecessary repetition, the present disclosure does not further describe various possible combinations.

Moreover, any combination between the various embodiments of the present disclosure is possible as long as it does not depart from the spirit of the present disclosure, which should also be construed as the disclosure of the present disclosure.

Claims

1. A method for processing a log file, the method being applied to a client, the method comprising:

2. The method of claim 1, wherein prior to scanning each of the adjusted log file groups after the entering the next acquisition period, the method further comprises:

3. The method of claim 1, wherein determining the resource usage of each log file in the next acquisition period based on the historical delta index for each log file in the first number of log files in the log task set for the previous preset period of time comprises:

4. The method of claim 1, wherein adjusting the allocation of the log files in the log file group within the log task set based on the resource threshold for each of the log file groups and the resource usage of each of the log files in the log file group comprises:

the log adjustment step includes:

determining whether an empty log file group exists in the log task set;

5. The method according to claim 2, wherein the determining the scanning frequency of each log file group after adjustment in the next acquisition period according to the access times of each log file sent by the server includes:

6. The method according to any one of claims 1-5, further comprising:

7. The method of claim 6, wherein screening the log task set from the third number of initial log task sets according to a genetic algorithm, based on a first resource utilization rate of each initial log task set and a second resource utilization rate of each initial log file group, comprises:

taking the initial log task set with the highest first resource utilization rate as a target log task set, and storing the initial log file group with the highest second resource utilization rate in a preset position as a to-be-selected log file group;

on the premise of ensuring that the second resource utilization rate of each log file group in the initial log task set after updating is smaller than or equal to the utilization rate threshold value of the log file group, distributing the log files in the log file group with the lowest second resource utilization rate in the initial log task set to other log file groups in the initial log task set;

repeatedly executing the steps from taking the initial log task set with the highest first resource utilization rate as the target log task set and storing the initial log file group with the highest second resource utilization rate in the preset position as a to-be-selected log file group, through distributing, on the premise of ensuring that the second resource utilization rate of each log file group in the updated initial log task set is smaller than or equal to the utilization rate threshold of that log file group, the log files in the log file group with the lowest second resource utilization rate in the initial log task set to the other log file groups in the initial log task set, until distributing the log files in the log file group with the lowest second resource utilization rate in the initial log task set to the other log file groups in the target log task set causes the second resource utilization rate of at least one log file group in the target log task set to be larger than the utilization rate threshold of that log file group;
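As an illustration only, the redistribution step and its stopping condition can be sketched in Python. This is a simplified sketch under stated assumptions, not the claimed implementation: it operates on a single task set rather than a population of initial log task sets, it omits the selection of the target log task set and the preset position, and the names `Group`, `utilization`, `dissolve_lowest_group`, `consolidate`, the shared `capacity`, and the largest-file-first placement order are all hypothetical.

```python
from typing import Dict, List, Optional

# Hypothetical representation: log file name -> predicted resource usage.
Group = Dict[str, float]


def utilization(group: Group, capacity: float) -> float:
    """Utilization rate of one group: total predicted usage / capacity."""
    return sum(group.values()) / capacity


def dissolve_lowest_group(groups: List[Group], capacity: float,
                          threshold: float) -> Optional[List[Group]]:
    """Try to move every file out of the least-utilized group.

    Returns the updated groups (one fewer than before), or None when some
    file cannot be placed without pushing a group above `threshold`.
    """
    if len(groups) < 2:
        return None
    lowest = min(range(len(groups)),
                 key=lambda i: utilization(groups[i], capacity))
    # Work on copies so a failed attempt leaves the input untouched.
    others = [dict(g) for i, g in enumerate(groups) if i != lowest]
    # Assumption: place larger files first (first-fit-decreasing heuristic).
    for name, usage in sorted(groups[lowest].items(), key=lambda kv: -kv[1]):
        for g in others:
            if (sum(g.values()) + usage) / capacity <= threshold:
                g[name] = usage
                break
        else:
            return None  # no group can accept this file within the threshold
    return others


def consolidate(groups: List[Group], capacity: float,
                threshold: float) -> List[Group]:
    """Repeat the redistribution until it would exceed a group's threshold."""
    while True:
        updated = dissolve_lowest_group(groups, capacity, threshold)
        if updated is None:
            return groups
        groups = updated
```

With three groups holding predicted usages of 70, 10 and 15 against a capacity of 100 and a threshold of 0.9, this sketch merges the 10-unit group into the first group and then stops, because moving the 15-unit group would raise the first group's utilization rate to 0.95.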

8. A log file processing apparatus, for application to a client, the apparatus comprising:

9. A computer readable storage medium, on which a computer program is stored, characterized in that the program, when being executed by a processor, implements the steps of the method according to any one of claims 1-7.

10. An electronic device, comprising:

a memory having a computer program stored thereon;

a processor for executing the computer program in the memory to implement the steps of the method of any one of claims 1-7.