CN112749139B - Log file processing method, electronic equipment and storage medium

Info

Publication number: CN112749139B
Application number: CN202011614132.5A
Authority: CN
Inventors: 邵传贤; 周振江; 王浩然; 马兵; 吴庆双
Original assignee: China Mobile Communications Group Co Ltd; MIGU Culture Technology Co Ltd
Current assignee: China Mobile Communications Group Co Ltd; MIGU Culture Technology Co Ltd
Priority date: 2020-12-30
Filing date: 2020-12-30
Publication date: 2024-04-19
Anticipated expiration: 2040-12-30
Also published as: CN112749139A

Abstract

The invention provides a log file processing method, electronic equipment and a storage medium, wherein the method comprises the following steps: determining each coding field in the file to be decoded, wherein each coding field comprises a corresponding coding type, a coding table ID and a coding number in the coding table; for each coding field, determining a coding table corresponding to the coding field in a coding table set according to the coding type and the coding table ID corresponding to the coding field, and determining a source character segment corresponding to the coding field according to the coding number and the coding table; and generating a source file according to the source character segment corresponding to each coding field. According to the log file processing method, the electronic equipment and the storage medium, the encoding table set is built and updated based on the character segments in the source file, the source file is encoded and compressed through the encoding table set, the storage space of the source file in the storage process is released, the cost of hardware equipment is reduced, and meanwhile the encoding file is decoded through the encoding table set, so that quick decoding is realized.

Description

Log file processing method, electronic equipment and storage medium

Technical Field

The present invention relates to the field of information encoding and decoding technologies, and in particular, to a log file processing method, an electronic device, and a storage medium.

Background

During the operation of the business system, the requests and system responses of the users are recorded in a log file mode. The log files are collected to a big data analysis platform and used for big data analysis, and the analyzed data are uniformly circulated to a data storage system for storage. The log files are stored, so that later historical services can be conveniently and deeply mined, historical data can be analyzed when problems exist, a problem rule is found, and the problems are conveniently located and solved.

If the log file is stored in the source string format, a large amount of hardware storage space is required. For this reason, the log file needs to be encoded and compressed to reduce the storage space it occupies.

Disclosure of Invention

Aiming at the problems existing in the prior art, the invention provides a log file processing method, electronic equipment and a storage medium.

The invention provides a log file processing method, which comprises the following steps:

Determining each coding field in the file to be decoded, wherein each coding field comprises a corresponding coding type, a coding table ID and a coding number in the coding table;

For each coding field, determining a coding table corresponding to the coding field in a coding table set according to the coding type and the coding table ID corresponding to the coding field, and determining a source character segment corresponding to the coding field according to the coding number and the coding table corresponding to the coding field;

Generating a source file according to the source character segment corresponding to each coding field;

wherein the set of encoding tables comprises one or more subsets of encoding tables; each coding table subset comprises at least one coding table, and the coding types of the coding tables are the same; each code table in the same code table subset corresponds to a different code table ID; the encoding type is determined based on the source character segment length.

According to the log file processing method provided by the invention, the length of the code numbers in the code table is sequentially increased, and the source character segments sequentially correspond to the code numbers according to the occurrence times from more to less.

According to the log file processing method provided by the invention, the number of the code table subsets is the same as the numerical value of the preset interception length, and the interception length is the reference length for sectionally dividing the source file in the source file encoding process.

According to the log file processing method provided by the invention, the method further comprises the following steps:

Acquiring a source file to be encoded, and carrying out sectional division on the source file to be encoded according to the intercepting length to obtain each sub-section;

and encoding the source character segment in each sub-segment based on the existing encoding table set to obtain an encoding file, and updating the encoding table set.

According to the log file processing method provided by the invention, the source character segments in each sub-segment are respectively encoded based on the existing encoding table set to obtain the encoding file, and the encoding table set is updated, and the log file processing method comprises the following steps:

judging whether a corresponding coding table subset of the coding types exists in the existing coding table set according to the maximum length of the subsections, and obtaining a first judging result;

If the first judgment result is yes, judging whether the coding tables in the existing coding table subset can be matched with the subsections, obtaining a second judgment result, and configuring the coding numbers for the subsections according to the second judgment result;

If the first judgment result is negative, a new code table subset is established, the corresponding code type is configured according to the maximum length of the sub-segment, a new code table is established in the new code table subset, the corresponding code table ID is configured, and the code number is configured for the sub-segment.

According to the log file processing method provided by the invention, when the first judgment result is NO, the log file processing method further comprises the following steps:

determining a code table subset with corresponding code types in the existing code table set according to each single character of the subsection, determining that the code tables in the existing code table subset cannot be matched with the single character, and configuring the code numbers of the single character in the code tables;

And determining that the encoding table subset of the corresponding encoding type does not exist in the existing encoding table set according to each single character of the subsection, establishing an encoding table in the new encoding table subset, and configuring the encoding number of the single character in the encoding table.

Acquiring character segments S (0, i) of the sub-segment, wherein the i takes any one value from 1 to (L-1), S (0, i) represents the character segment formed by splicing the 0 th to the i th character sequence in the sub-segment, and L is the maximum length of the sub-segment;

determining that the character segment S (0, i) does not have a corresponding code table subset of the code types in the existing code table set, establishing a new code table subset, configuring the corresponding code types according to the lengths of the character segment S (0, i), establishing a new code table in the new code table subset, configuring a corresponding code table ID, and configuring code numbers for the character segment S (0, i);

Determining that the character segment S (0, i) exists in the existing coding table set and a corresponding coding table subset of the coding type, determining that the character segment S (0, i) cannot be matched in the coding table of the existing coding table subset, matching a new coding number for the character segment S (0, i), and updating the coding table.

According to the log file processing method provided by the invention, the code number is configured for the sub-segment according to the second judging result, and the log file processing method comprises the following steps:

if the fact that the code table in the existing code table subset cannot be matched with the sub-segment is determined, a new code number is configured for the sub-segment, and the code table is updated;

And if the fact that the sub-segment can be matched in the code tables in the existing code table subset is determined, the matched code numbers are configured for the sub-segment.

The invention also provides a log file processing method, which comprises the following steps:

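acquiring a source file to be encoded, and carrying out sectional division on the source file to be encoded according to an interception length to obtain each sub-segment, wherein the interception length is a reference length for carrying out sectional division on the source file;

and encoding the source character segment in each sub-segment based on the existing encoding table set to obtain an encoding file, and updating the encoding table set.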

The invention also provides an electronic device comprising a memory, a processor and a computer program stored on the memory and executable on the processor, the processor implementing the steps of any of the log file processing methods described above when executing the program.

The present invention also provides a non-transitory computer readable storage medium having stored thereon a computer program which, when executed by a processor, implements the steps of the log file processing method as described in any of the above.

According to the log file processing method, the electronic equipment and the storage medium, the encoding table set is built and updated based on the character segments in the source file, the source file is encoded and compressed through the encoding table set, the storage space of the source file in the storage process is released, the cost of hardware equipment is reduced, and meanwhile the encoding file is decoded through the encoding table set, so that quick decoding is realized.

Drawings

In order to more clearly illustrate the invention or the technical solutions of the prior art, the following description will briefly explain the drawings used in the embodiments or the description of the prior art, and it is obvious that the drawings in the following description are some embodiments of the invention, and other drawings can be obtained according to the drawings without inventive effort for a person skilled in the art.

FIG. 1 is a schematic flow chart of a log file processing method provided by the invention;

FIG. 2 is a schematic flow chart of a log file processing method according to the present invention;

FIG. 3 is a schematic diagram of a log file processing device according to the present invention;

FIG. 4 is a schematic diagram of another configuration of a log file processing apparatus according to the present invention;

FIG. 5 is a schematic structural diagram of an electronic device provided by the present invention.

Detailed Description

For the purpose of making the objects, technical solutions and advantages of the present invention more apparent, the technical solutions of the present invention will be clearly and completely described below with reference to the accompanying drawings, and it is apparent that the described embodiments are some embodiments of the present invention, not all embodiments. All other embodiments, which can be made by those skilled in the art based on the embodiments of the invention without making any inventive effort, are intended to be within the scope of the invention.

The following describes a log file processing method, a device, an electronic device and a storage medium provided by the invention with reference to fig. 1 to 5.

Fig. 1 shows a flow chart of a log file processing method provided by the invention, referring to fig. 1, the method comprises the following steps:

S11, determining each coding field in a file to be decoded, wherein each coding field comprises a corresponding coding type, a coding table ID and a coding number in the coding table;

S12, respectively aiming at each coding field, determining a coding table corresponding to the coding field in a coding table set according to the coding type and the coding table ID corresponding to the coding field, and determining a source character segment corresponding to the coding field according to the coding number and the coding table corresponding to the coding field;

S13, generating a source file according to the source character segments corresponding to each coding field.

Wherein the set of encoding tables comprises one or more subsets of encoding tables; each coding table subset comprises at least one coding table, and the coding types of the coding tables are the same; each code table in the same code table subset corresponds to a different code table ID; the coding type is determined based on the source segment length.

In the above steps S11 to S13, it should be noted that, during the operation of the service system, the request and the system response for the user are recorded in the form of a log file. The log files are collected to a big data analysis platform and used for big data analysis, and the analyzed data are uniformly circulated to a data storage system for storage. The log files are stored, so that later historical services can be conveniently and deeply mined, historical data can be analyzed when problems exist, a problem rule is found, and the problems are conveniently located and solved.

Each log file is a character string, and each complete character string can generate different character segments due to the arrangement sequence of characters, and the different character segments cause different coding information. For this reason, a corresponding encoding table set needs to be established based on the characteristics of the log file, and encoding and decoding of the log file are completed through the established encoding table set.

The way the characters are grouped determines the length of the character segments. For example, the length of the character segment ab is 2 bytes and the length of the character segment abcdf is 5 bytes.

In order to save the encoding time, the character segments with different lengths need to be matched in the corresponding encoding table to obtain the encoding number. To this end, the set of encoding tables includes one or more subsets of encoding tables, each having stored therein one or more encoding tables of the same type. I.e. each subset of coding tables has a unique coding type, comprising one or more coding tables.

Since the character segments are distinguished by their lengths (i.e., byte lengths), in the present invention the coding type of each coding table subset is determined based on the character segment length.

For example, in the present invention, a character segment includes one character or a plurality of characters.

The coding table corresponding to the character segment of one character is called a single character coding table, and the coding type is called a single character.

The coding table corresponding to the character segments of the two characters is called a double-character coding table, and the coding type is called a double-character.

……

The coding table corresponding to the character segment of i characters is called an i character coding table, and the coding type is called an "i character".

In the present invention, each coding table subset includes one or more coding tables, so within a coding table subset each coding table has a unique coding table ID.

For example, if the subset of encoding tables includes 3 encoding tables, the IDs of the respective encoding tables are 1.1, 1.2, and 1.3.

In the present invention, an encoding table is used to encode a log file. The encoding table includes the correspondence between code numbers and source character segments. Each log file to be encoded is referred to as a source file, and the log file is made up of character segments, so the character segments of the source file are referred to herein as source character segments. The source character segments and the code numbers form a one-to-one correspondence and are stored in the code table.

The code number is a unique number corresponding to the source character segment. The code number is determined using binary coding.

In the invention, the source file is encoded in a segmented encoding mode, and therefore, the encoded file comprises a plurality of encoding fields, and each encoding field comprises an encoding type, an encoding table ID and an encoding number in the encoding table.

In the decoding process, a file to be decoded is obtained, and the coding type, the coding table ID and the coding number in the coding table of each coding field are determined according to the file to be decoded.

Determining a code table corresponding to each code field in the code table set according to the code type and the code table ID in each code field, and obtaining a source character segment corresponding to each code field according to the code number and the code table.

And generating a source file according to the source character segments corresponding to the coding fields.

For example, the file to be decoded contains coding fields of { three characters, 3.1, 011}, { three characters, 3.2, 001}, respectively.

If there is a correspondence between 011 and abc in the encoding table with the encoding table ID of 3.1 in the three-character encoding table subset, the source character segment with the encoding field { three characters, 3.1, 011} is abc.

If there is a correspondence between 001 and jkh in the encoding table whose encoding table ID is 3.2 in the three-character encoding table subset, the source character segment whose encoding field is { three characters, 3.2, 001} is jkh.

At this point, the source file generated is abcjkh.
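The lookup in this example can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the nested-dictionary layout, the name `CODE_TABLE_SET` and the function `decode` are hypothetical and are not defined in the source, which does not prescribe a storage format for the coding table set.

```python
# Hypothetical layout: {coding type: {coding table ID: {code number: source segment}}}
CODE_TABLE_SET = {
    "three characters": {
        "3.1": {"011": "abc"},
        "3.2": {"001": "jkh"},
    },
}


def decode(fields, table_set):
    """Rebuild the source file from (coding type, table ID, code number) fields."""
    segments = []
    for coding_type, table_id, code_number in fields:
        # S12: locate the coding table by coding type and coding table ID.
        table = table_set[coding_type][table_id]
        # S12: map the code number back to its source character segment.
        segments.append(table[code_number])
    # S13: splice the segments in field order to obtain the source file.
    return "".join(segments)


fields = [
    ("three characters", "3.1", "011"),
    ("three characters", "3.2", "001"),
]
source = decode(fields, CODE_TABLE_SET)
print(source)
```

Each field costs two dictionary lookups to reach its table and one more to reach the segment, which is consistent with the quick decoding the method aims for.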

The log file processing method provided by the invention establishes and updates the coding table set based on the character segments in the source file, codes and compresses the source file through the coding table set, releases the storage space of the source file in the storage process, reduces the cost of hardware equipment, and decodes the coding file through the coding table set to realize quick decoding.

In a further explanation of the above method, the code numbers in the code table are described in detail: the lengths of the code numbers increase sequentially, and the source character segments are assigned to the code numbers in descending order of their number of occurrences.

In this regard, it should be noted that each code number has its own length. For example, in the binary codes 0, 1, 10, 11, 100, 101, 110, 111, 1000, ……, the length of the code numbers increases overall.
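As a small illustration of how such code numbers grow, the sketch below derives the binary code number from a zero-based position in the code table. The helper name `code_number` and the choice to number entries from zero are assumptions made for the example.

```python
def code_number(index):
    """Return the binary code number for a zero-based position in a code table."""
    return format(index, "b")


numbers = [code_number(i) for i in range(9)]
lengths = [len(n) for n in numbers]
print(numbers)
print(lengths)
```

The digit counts never decrease as the position grows, so entries placed earlier in a table always receive code numbers that are no longer than those of later entries.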

In the present invention, some character segments occur frequently but have been allocated long code numbers in the code table. For this reason, the code table needs to be updated and optimized so that the source character segments are assigned code numbers in descending order of their number of occurrences.

For example, the source character segment abcdefgh occurs many times in different source files, but its code number in the code table is 10000, while the code number of another source character segment abcvgjhk in the code table is 0. In this case, the code number of abcdefgh is replaced with 0, and the code number of abcvgjhk is replaced with 10000. In this way, source character segments that occur more often are moved in turn to the front code numbers of the code table.

In the invention, the same character segment is compressed to obtain the same compression result, and the occurrence frequency of the character segment can be counted through the compression result to optimize the coding table, thereby obtaining better character compression rate.
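The swap described in this example can be generalized to a whole table as sketched below. The function `reassign_by_frequency`, the table layout and the occurrence counts are hypothetical; the source states only that more frequent segments should receive the code numbers at the front of the table.

```python
from collections import Counter


def reassign_by_frequency(table, counts):
    """Return a new table in which more frequent segments get shorter code numbers.

    table:  {code number: source character segment}
    counts: {source character segment: number of occurrences}
    """
    # Existing code numbers, shortest first (ties broken by numeric value).
    numbers = sorted(table, key=lambda n: (len(n), int(n, 2)))
    # Segments, most frequent first; sorted() is stable, so ties keep table order.
    segments = sorted(table.values(), key=lambda s: -counts.get(s, 0))
    return dict(zip(numbers, segments))


table = {"0": "abcvgjhk", "10000": "abcdefgh"}
counts = Counter({"abcdefgh": 120, "abcvgjhk": 3})  # made-up occurrence counts
optimized = reassign_by_frequency(table, counts)
print(optimized)
```

The set of code numbers in the table is unchanged; only the pairing between code numbers and segments is rearranged.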

In a further explanation of the above method, the number of code table subsets is described in detail: the number of code table subsets is the same as the value of a preset interception length, and the interception length is the reference length for sectionally dividing the source file in the source file encoding process.

It should be noted that the purpose of the method of the present invention is to compress the source file. Compression here means replacing a character segment of a given length (i.e., the interception length) in the source file with a code of shorter length. That is, file compression is achieved only if the maximum length of the code numbers in each code table is shorter than the interception length.

Based on the above explanation of the coding type, the number of the subsets of the coding table is the same as the value of the preset interception length. The intercepting length is the reference length for sectionally dividing the source file in the source file encoding process.

For example, the configuration interception length is 10 bytes, and at this time, for carrying out segmentation division on a source file with 98 bytes, 9 sub-segments with the length of 10 bytes and 1 sub-segment with the length of 8 bytes need to be divided.

In the above encoding process in which the truncated length is 10 bytes, since 10 bytes are the maximum length of the segmentation, the number of subsets of the encoding table can be only 10.

The method of the invention can reasonably manage the number of the subset of the established coding table through reasonably setting the interception length.
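The 98-byte example above can be reproduced with a short sketch. The function name `split_into_sub_segments` is hypothetical, and the sketch treats each character of a Python string as one byte, which is an assumption that holds only for single-byte characters.

```python
def split_into_sub_segments(source, cut_length):
    """Divide the source into consecutive sub-segments of at most cut_length."""
    return [source[i:i + cut_length] for i in range(0, len(source), cut_length)]


source = "x" * 98          # stand-in for a 98-byte source file
subs = split_into_sub_segments(source, 10)
lengths = [len(s) for s in subs]
print(lengths)
```

Only the last sub-segment can be shorter than the interception length, so the segment lengths that appear lie between 1 and the interception length.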

In the further description of the above method, the process of creating and updating the coding table is mainly explained as follows:

In this regard, it should be noted that, in the present invention, the encoding tables in the encoding table set may be dynamically updated during the encoding process of the log file.

Encoding and compressing a source file is to compress a character segment with a certain length of intercepting bytes (namely intercepting length) in the source file into a code with a shorter length of bytes.

After the source file to be encoded is obtained, the source file to be encoded is segmented according to the set intercepting length, and each sub-segment is obtained.

And then encoding the source character segment in each sub-segment based on the existing encoding table set to obtain an encoding file, and updating the encoding table set.

In the present invention, an initial encoding table may be configured. The character segment corresponding to the code number in the code table is a common character or character segment. That is, common characters or character segments are encoded, and corresponding relations between encoding numbers and character segments in the encoding tables corresponding to different character segment lengths are established.

The initial coding table set is then used to encode a certain number of source files, which yields the encoded files and, at the same time, a more complete coding table set.

And then, the follow-up source file is encoded by the more perfect encoding table set, so that the encoding file is obtained, and meanwhile, the encoding table set is dynamically updated.

The method further encodes the source file through the existing encoding table set, and realizes the dynamic update of the encoding table set while obtaining the encoding file, so that the encoding table set is more perfect and has better adaptability.

In the further description of the above method, the processing procedure of encoding the source character segment in each sub-segment based on the existing encoding table set to obtain the encoded file and updating the encoding table set is mainly explained, which is specifically as follows:

In this regard, in the present invention, when the configuration cut length is 10 bytes, the source file of 98 bytes is divided into segments, and 9 sub-segments of 10 bytes and 1 sub-segment of 8 bytes are required to be divided.

For 9 sub-segments of 10 bytes length, the maximum length of the sub-segments is 10.

For 1 sub-segment of 8 bytes length, the maximum length of the sub-segment is 8.

In the encoding process, it is first judged whether the current sub-segment can be matched to a code number in a code table. If it can, the matched code number is configured directly, and the encoding and compression of the current sub-segment is completed.

When it is determined, according to the maximum length of the sub-segment, that the existing code table set has no code table subset of the corresponding code type, character segments of this length have not yet been encoded. In this case, a new code table subset is established for this length, a code table is established in the subset and given a code table ID, a code number is configured for the sub-segment, and the sub-segment is then encoded and compressed with that code number.

For example, for the sub-segment abcdefghlk, a subset of the encoding table with the encoding type of "10 characters" is created, then the encoding table with the ID of 10.1 is configured, and the correspondence between "0" and "abcdefghlk" is created in the encoding table.

When it is determined, according to the maximum length of the sub-segment, that the existing code table set has a code table subset of the corresponding code type, there is still no guarantee that any code table in that subset contains a correspondence between the sub-segment and a code number. The sub-segment therefore has to be matched against the code tables in the subset.

In the further method, when the code tables in the existing code table set cannot match a sub-segment, a new code table is established and a code number is configured, so that subsequent identical sub-segments can be matched directly.
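The two judgments described above can be sketched as follows. This is a minimal illustration, not the patented implementation: the function `encode_sub_segment`, the type names such as "10 characters", the ID scheme "length.index" and the encoder-side table layout {segment: code number} are all assumptions made for the example.

```python
def encode_sub_segment(segment, table_set):
    """Return (coding type, table ID, code number) and update table_set in place.

    table_set: {coding type: {coding table ID: {segment: code number}}}
    """
    length = len(segment)
    coding_type = f"{length} characters"
    subset = table_set.get(coding_type)
    if subset is None:
        # First judgment "no": create the subset and its first coding table.
        subset = table_set[coding_type] = {f"{length}.1": {}}
    # Second judgment: search every coding table of the subset for the segment.
    for table_id, table in subset.items():
        if segment in table:
            return coding_type, table_id, table[segment]
    # No match: add the segment to the newest table with the next code number.
    table_id = list(subset)[-1]
    table = subset[table_id]
    table[segment] = format(len(table), "b")
    return coding_type, table_id, table[segment]


table_set = {}
first = encode_sub_segment("abcdefghlk", table_set)   # new subset and new table
again = encode_sub_segment("abcdefghlk", table_set)   # matched directly
other = encode_sub_segment("zzzzzzzzzz", table_set)   # new code number
print(first, again, other)
```

Because the table set is updated while encoding, the second occurrence of a sub-segment is matched immediately and produces the same field as the first.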

In the further description of the above method, the following processing procedure when the first determination result is no is mainly described in a supplementary manner, which is specifically as follows:

File compression is achieved only when the maximum length of the code numbers in each code table is shorter than the interception length. Therefore, when the length of the code numbers in a code table approaches the interception length, a new code table needs to be established, and a new sub-segment that does not yet have a correspondence with a code number is added to the new code table and configured with a new code number.
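One way to picture this rollover is sketched below. The function `table_for_new_segment` and the exact threshold are assumptions: the sketch compares the digit count of the next binary code number with the interception length, as the text does, and opens a new table as soon as the next code number would no longer be shorter.

```python
def table_for_new_segment(subset, type_length, cut_length):
    """Pick the coding table ID that should receive a new segment.

    subset: {coding table ID: {segment: code number}}, newest table last.
    A new table is created when the next code number in the newest table
    would be at least as long as the interception length.
    """
    table_id = list(subset)[-1]
    next_number = format(len(subset[table_id]), "b")
    if len(next_number) >= cut_length:
        table_id = f"{type_length}.{len(subset) + 1}"
        subset[table_id] = {}
    return table_id


# Interception length 3: code numbers 0, 1, 10, 11 fit, 100 does not.
roomy = {"3.1": {"abc": "0", "abd": "1", "abe": "10"}}
full = {"3.1": {"abc": "0", "abd": "1", "abe": "10", "abf": "11"}}
roomy_id = table_for_new_segment(roomy, 3, 3)
full_id = table_for_new_segment(full, 3, 3)
print(roomy_id, full_id)
```

With this rule every code number stays shorter than the interception length, at the cost of more coding tables per subset.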

When the first judgment result is no, sub-segments of this length have not been encoded before, so it cannot be determined whether the character segments of different lengths contained in the sub-segment already have correspondences with code numbers in the corresponding code tables. This matters because, when a source file is divided, the last remaining character segment may be shorter than the maximum length of the sub-segments. For example, the maximum length of a sub-segment is 10 and the length of the last character segment is 2.

At this time, the character segments S (0, i) of the sub-segment are acquired successively, where i takes any value from 1 to (L-1), S (0, i) represents the character segment formed by the 0th to i-th characters of the sub-segment, and L is the maximum length of the sub-segment;

Determining that the character segment S (0, i) does not have a corresponding code table subset of the code types in the existing code table set, establishing a new code table subset, configuring the corresponding code types according to the lengths of the character segment S (0, i), establishing a new code table in the new code table subset, configuring a corresponding code table ID, and configuring code numbers for the character segment S (0, i); and determining that the corresponding coding table subset of the coding types exists in the existing coding table set according to the length of the character segment S (0, i), and if the coding table subset exists and cannot be matched with the character segment S (0, i), matching a new coding number for the character segment S (0, i), and updating the coding table.

The processing procedure of the character segment S (0, i) of the sub-segment is the same as the processing procedure of the whole sub-segment, and will not be described here again.

In addition, if the first judgment result is no, determining that a corresponding coding table subset of the coding types exists in the existing coding table set according to each single character of the subsection, determining that the coding tables in the existing coding table subset cannot be matched with the single character, and configuring the coding number of the single character in the coding table; and determining that the encoding table subset of the corresponding encoding type does not exist in the existing encoding table set according to each single character of the subsection, establishing an encoding table in the new encoding table subset, and configuring the encoding number of the single character in the encoding table. These processes are the same as the above-described processes and will not be described again.

The above-described processing procedure is explained below with specific examples:

Suppose the source file to be encoded is the string &₁&₂…&₁₀@₁@₂…@₁₀％₁％₂…％₁₀… with a length of 101 characters, and the character segment for each compression is 10 characters. Compression is then performed 11 times, that is, on 10 character segments of 10 characters and 1 character segment of 1 character.

Firstly, taking the 1 st to 10 th characters from a source file to be encoded as a first character segment needing to be compressed "& ₁&₂…&₁₀";

If a 10-character code table subset exists in the code table set, a corresponding code table exists in the 10-character code table subset, "& ₁&₂…&₁₀", and a corresponding code 01 in the code table is displayed, the code 01 is directly output;

If the encoding table set does not have the 10-character encoding table subset or the 10-character encoding table subset, but the encoding table is not successfully matched, the first character '₁' is obtained from '₁&₂…&₁₀', whether the character '₁' is in the single-character encoding table is judged, and if so, the next step is continued; if not, firstly putting '₁' into a single character coding table, distributing a coding number for the single character coding table, and then carrying out the next step;

Reading a second character '₂' from '₁&₂…&₁₀', wherein the processing of the single character is consistent with that in the previous step; after the single character processing is completed, the last processed character '₁' is spliced with the single character '₂' read at this time to obtain a character string '₁&₂'. Judging whether "& ₁&₂" is in the double-character encoding table, if so, performing the next step; if not, adding '₁&₂' into the double-character coding table, and distributing corresponding coding numbers for the double-character coding table;

Reading a second character '₃' from '₁&₂…&₁₀', wherein the processing of the single character is consistent with that in the previous step; after the single character processing is completed, the character string '₁&₂&₃' is obtained by splicing the character segment 'ab' processed last time with the single character '₃' read at this time. Judging whether "& ₁&₂&₃" is in the three-character coding table, if so, performing the next step; if not, adding '₁&₂&₃' into the three-character coding table, and distributing corresponding coding numbers for the three-character coding table;

The i-th character '&ᵢ' is read from "&₁&₂…&₁₀", where i ranges from 4 to 10, and processed as a single character in the same way as in the previous step. After the single-character processing is completed, the previously processed character segment "&₁…&ᵢ₋₁" is spliced with the newly read character '&ᵢ' to obtain the character segment "&₁…&ᵢ". It is then judged whether "&₁…&ᵢ" is in the i-character encoding table. If so, the next step is performed; if not, "&₁…&ᵢ" is added to the i-character encoding table and assigned a corresponding encoding number;

After "&₁&₂…&₁₀" has been processed, the code in the 10-character encoding table corresponding to the character segment is obtained, and the processing of the first character segment is complete. The final encoded output format is {ten characters, 10.1, 00}, where "ten characters" is the coding type, 10.1 is the coding table ID, and 00 is the coding number.

Then the subsequent character segments "@₁@₂…@₁₀", "%₁%₂…%₁₀", … are intercepted in turn and processed on the same principle as the character segment "&₁&₂…&₁₀" until the whole source file to be encoded has been processed, at which point the compression is complete.
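The walkthrough above can be sketched in Python as follows. This is a minimal illustration, not the patented implementation: the names `register` and `encode_segment` are hypothetical, the encoding table set is modelled as a dictionary keyed by segment length with a single encoding table per subset (so coding table IDs are omitted), and encoding numbers are assumed to be integers assigned in order of first appearance.

```python
def register(table_set, text):
    """Add text to the table for its length if missing; return its number.

    Assumption: one table per length, numbers assigned in insertion order.
    """
    table = table_set.setdefault(len(text), {})
    if text not in table:
        table[text] = len(table)
    return table[text]


def encode_segment(segment, table_set):
    """Return (coding_type, encoding_number) for one character segment.

    The coding type is taken to be the segment length. If the whole
    segment is already known, its number is returned directly; otherwise
    each character and each growing prefix is registered first.
    """
    n = len(segment)
    if segment in table_set.get(n, {}):
        return n, table_set[n][segment]
    for i, ch in enumerate(segment, start=1):
        register(table_set, ch)            # single-character table
        register(table_set, segment[:i])   # i-character table
    return n, register(table_set, segment)


tables = {}
first = encode_segment("abc", tables)   # no match: tables are filled in
second = encode_segment("abc", tables)  # match: number is output directly
```

Calling `encode_segment` a second time on the same segment takes the direct-output branch, which corresponds to the case in which the 10-character encoding table already contains the segment.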

In this further method, when the sub-segments of the source file to be encoded are not successfully matched in the encoding table set, the encoding table is dynamically updated by separately encoding and analyzing all the character strings of each sub-segment.

Fig. 2 shows a flow chart of a log file processing method provided by the invention. Referring to Fig. 2, the method includes the following steps:

S21, acquiring a source file to be encoded, and dividing the source file to be encoded into segments according to an interception length to obtain each sub-segment, wherein the interception length is a reference length for dividing the source file into segments;

S22, encoding the source character segment in each sub-segment based on the existing encoding table set to obtain an encoded file, and updating the encoding table set.
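Step S21 can be illustrated with a short Python sketch. The function name `split_into_subsegments` is hypothetical, and the sketch assumes that the final sub-segment is simply allowed to be shorter than the interception length when the file length is not an exact multiple of it; the source text does not specify how the remainder is handled.

```python
def split_into_subsegments(source, interception_length):
    """Divide source text into consecutive sub-segments.

    Assumption: the last sub-segment keeps whatever characters remain,
    so it may be shorter than interception_length.
    """
    if interception_length <= 0:
        raise ValueError("interception_length must be positive")
    return [
        source[start:start + interception_length]
        for start in range(0, len(source), interception_length)
    ]


sub_segments = split_into_subsegments("2020-12-30 INFO login ok", 10)
```

Each resulting sub-segment would then be passed to the encoding of step S22.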

A further explanation of the above method mainly concerns the case in which the first determination result is no:

A further description of the above method mainly explains the process of configuring the code number for the sub-segment according to the second determination result, which specifically includes the following steps:

The encoding process has been described in detail above and is not repeated here.

The method encodes the source file using the existing encoding table set and dynamically updates the encoding table set while obtaining the encoded file, so that the encoding table set becomes more complete and more adaptable.

The log file processing device provided by the invention is described below; the log file processing device described below and the log file processing method described above may be cross-referenced.

Fig. 3 shows a schematic structural diagram of a log file processing device provided by the present invention. Referring to Fig. 3, the device includes a parsing module 31, a decoding module 32, and a generating module 33, where:

the parsing module 31 is configured to determine each encoding field in the file to be decoded, where each encoding field includes a corresponding encoding type, an encoding table ID, and an encoding number in the encoding table;

the decoding module 32 is configured to determine, for each encoding field, an encoding table corresponding to the encoding field in the encoding table set according to the encoding type and the encoding table ID corresponding to the encoding field, and to determine a source character segment corresponding to the encoding field according to the encoding number and the encoding table corresponding to the encoding field;

the generating module 33 is configured to generate a source file according to the source character segment corresponding to each encoding field.
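The cooperation of the three modules can be sketched in Python as below. This is an illustrative assumption about data layout rather than the device's actual design: each encoding field is taken to be an already-parsed tuple of (encoding type, encoding table ID, encoding number), the encoding table set is modelled as a dictionary keyed by (encoding type, encoding table ID), and the function name `decode_fields` is hypothetical.

```python
def decode_fields(fields, table_set):
    """Rebuild source text from parsed encoding fields.

    fields:    iterable of (encoding_type, table_id, encoding_number)
    table_set: {(encoding_type, table_id): {encoding_number: segment}}
    """
    pieces = []
    for encoding_type, table_id, encoding_number in fields:
        # Decoding step: locate the table, then look up the segment.
        table = table_set[(encoding_type, table_id)]
        pieces.append(table[encoding_number])
    # Generating step: concatenate segments in field order.
    return "".join(pieces)


# Hypothetical table contents, for illustration only.
example_tables = {
    (10, "10.1"): {"00": "2020-12-30"},
    (5, "5.1"): {"00": " INFO"},
}
restored = decode_fields([(10, "10.1", "00"), (5, "5.1", "00")], example_tables)
```

Because decoding is a pair of dictionary lookups per field, no search over the table set is needed, which is consistent with the quick decoding the description aims for.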

In a further description of the above apparatus, the lengths of the code numbers in the code table increase sequentially, and the source character segments correspond to the code numbers in order of their number of occurrences, from most to least.
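One way to picture this ordering is the Python sketch below, in which more frequent source character segments receive shorter code numbers. The decimal code alphabet, the generator `code_numbers`, and the function `build_table` are all assumptions made for illustration; the source does not state how code numbers of increasing length are generated.

```python
from collections import Counter
from itertools import count, islice, product


def code_numbers(alphabet="0123456789"):
    """Yield code strings of length 1, then 2, then 3, and so on."""
    for length in count(1):
        for digits in product(alphabet, repeat=length):
            yield "".join(digits)


def build_table(segments):
    """Map each distinct segment to a code, most frequent first."""
    ranked = [segment for segment, _ in Counter(segments).most_common()]
    return dict(zip(ranked, code_numbers()))


table = build_table(["GET", "GET", "POST", "GET", "PUT", "POST"])
first_codes = list(islice(code_numbers(), 12))
```

With a ten-symbol alphabet, the ten most frequent segments receive one-character codes and the eleventh receives the first two-character code.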

In a further description of the above apparatus, the number of encoding table subsets equals the value of a preset interception length, where the interception length is the reference length used to divide the source file into segments during source file encoding.

In a further illustration of the apparatus described above, the apparatus further comprises an encoding module for:

In a further description of the above apparatus, the encoding module is specifically configured to, in a process of encoding the source character segment in each sub-segment based on the existing encoding table set to obtain the encoded file and updating the encoding table:

In a further description of the above apparatus, when the first determination result is no, the encoding module is further configured to:

acquiring character segments S(0, i) of the sub-segment, where i ranges from 0 to L-1, S(0, i) denotes the character segment formed by splicing the 0th to the i-th characters of the sub-segment in sequence, and L is the maximum length of the sub-segment;
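The notation S(0, i) can be made concrete with a small Python sketch. The function name `prefix_segments` is hypothetical, and the sketch assumes zero-based character positions with both endpoints included, as the definition above suggests.

```python
def prefix_segments(sub_segment):
    """Return [S(0, 0), S(0, 1), ..., S(0, L-1)] for a sub-segment.

    S(0, i) covers characters 0 through i inclusive, so it has i + 1
    characters and corresponds to the slice sub_segment[:i + 1].
    """
    return [sub_segment[: i + 1] for i in range(len(sub_segment))]


segments = prefix_segments("abc")
```

Each S(0, i) has length i + 1, so it would be looked up in the encoding table for segments of that length.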

In a further description of the above apparatus, the encoding module is specifically configured to:

Since the apparatus according to the embodiment of the present invention corresponds to the method of the above embodiment, the details are not repeated here.

It should be noted that, in the embodiment of the present invention, the related functional modules may be implemented by a hardware processor.

Fig. 4 shows a schematic structural diagram of a log file processing device provided by the present invention. Referring to Fig. 4, the device includes a dividing module 41 and an encoding module 42, where:

the dividing module 41 is configured to obtain a source file to be encoded and divide the source file to be encoded into segments according to an interception length to obtain each sub-segment, where the interception length is a reference length for dividing the source file into segments;

the encoding module 42 is configured to encode the source character segment in each sub-segment based on the existing encoding table set to obtain an encoded file, and to update the encoding table set.

In a further description of the above apparatus, the encoding module is specifically configured to, in a process of encoding the source character segment in each sub-segment based on the existing encoding table set to obtain the encoded file and updating the encoding table set:

The device encodes the source file using the existing encoding table set and dynamically updates the encoding table set while obtaining the encoded file, so that the encoding table set becomes more complete and more adaptable.

Fig. 5 shows a schematic physical structure of an electronic device. As shown in Fig. 5, the electronic device may include a processor 51, a communication interface 52, a memory 53 and a communication bus 54, wherein the processor 51, the communication interface 52 and the memory 53 communicate with each other through the communication bus 54. The processor 51 may call logic instructions in the memory 53 to perform a log file processing method comprising: determining each coding field in the file to be decoded, wherein each coding field comprises a corresponding coding type, a coding table ID and a coding number in the coding table; for each coding field, determining a coding table corresponding to the coding field in a coding table set according to the coding type and the coding table ID corresponding to the coding field, and determining a source character segment corresponding to the coding field according to the coding number and the coding table corresponding to the coding field; and generating a source file according to the source character segment corresponding to each coding field. The coding table set comprises one or more coding table subsets; each coding table subset comprises at least one coding table, and the coding tables within a subset have the same coding type; each coding table in the same coding table subset corresponds to a different coding table ID; and the coding type is determined based on the source character segment length.

Further, the logic instructions in the memory 53 described above may be implemented in the form of software functional units and, when sold or used as a stand-alone product, may be stored in a computer-readable storage medium. Based on this understanding, the technical solution of the present invention, in essence or in the part contributing to the prior art or in a part of the technical solution, may be embodied in the form of a software product stored in a storage medium, comprising several instructions for causing a computer device (which may be a personal computer, a server, a network device, etc.) to perform all or part of the steps of the method according to the embodiments of the present invention. The aforementioned storage medium includes a USB flash drive, a removable hard disk, a read-only memory (ROM), a random access memory (RAM), a magnetic disk, an optical disk, or any other medium capable of storing program code.

In another aspect, the present invention also provides a computer program product comprising a computer program stored on a non-transitory computer readable storage medium, the computer program comprising program instructions which, when executed by a computer, enable the computer to perform the log file processing method provided by the above methods, the method comprising: determining each coding field in the file to be decoded, wherein each coding field comprises a corresponding coding type, a coding table ID and a coding number in the coding table; for each coding field, determining a coding table corresponding to the coding field in a coding table set according to the coding type and the coding table ID corresponding to the coding field, and determining a source character segment corresponding to the coding field according to the coding number and the coding table corresponding to the coding field; and generating a source file according to the source character segment corresponding to each coding field. Wherein the set of encoding tables comprises one or more subsets of encoding tables; each coding table subset comprises at least one coding table, and the coding types of the coding tables are the same; each code table in the same code table subset corresponds to a different code table ID; the coding type is determined based on the source segment length.

In yet another aspect, the present invention also provides a non-transitory computer readable storage medium having stored thereon a computer program which, when executed by a processor, performs the log file processing method provided above, the method comprising: determining each coding field in the file to be decoded, wherein each coding field comprises a corresponding coding type, a coding table ID and a coding number in the coding table; for each coding field, determining a coding table corresponding to the coding field in a coding table set according to the coding type and the coding table ID corresponding to the coding field, and determining a source character segment corresponding to the coding field according to the coding number and the coding table corresponding to the coding field; and generating a source file according to the source character segment corresponding to each coding field. The coding table set comprises one or more coding table subsets; each coding table subset comprises at least one coding table, and the coding tables within a subset have the same coding type; each coding table in the same coding table subset corresponds to a different coding table ID; and the coding type is determined based on the source character segment length.

The present invention provides an electronic device, which may include a processor, a communication interface, a memory, and a communication bus, wherein the processor, the communication interface, and the memory communicate with each other through the communication bus. The processor may invoke logic instructions in the memory to perform a log file processing method comprising: acquiring a source file to be encoded, and dividing the source file to be encoded into segments according to an interception length to obtain each sub-segment, wherein the interception length is a reference length for dividing the source file into segments; and encoding the source character segment in each sub-segment based on the existing encoding table set to obtain an encoded file, and updating the encoding table set.

Further, the logic instructions in the memory described above may be implemented in the form of software functional units and, when sold or used as a stand-alone product, may be stored in a computer-readable storage medium. Based on this understanding, the technical solution of the present invention, in essence or in the part contributing to the prior art or in a part of the technical solution, may be embodied in the form of a software product stored in a storage medium, comprising several instructions for causing a computer device (which may be a personal computer, a server, a network device, etc.) to perform all or part of the steps of the method according to the embodiments of the present invention. The aforementioned storage medium includes a USB flash drive, a removable hard disk, a read-only memory (ROM), a random access memory (RAM), a magnetic disk, an optical disk, or any other medium capable of storing program code.

In another aspect, the present invention also provides a computer program product comprising a computer program stored on a non-transitory computer readable storage medium, the computer program comprising program instructions which, when executed by a computer, enable the computer to perform the log file processing method provided above, the method comprising: acquiring a source file to be encoded, and dividing the source file to be encoded into segments according to an interception length to obtain each sub-segment, wherein the interception length is a reference length for dividing the source file into segments; and encoding the source character segment in each sub-segment based on the existing encoding table set to obtain an encoded file, and updating the encoding table set.

In yet another aspect, the present invention also provides a non-transitory computer readable storage medium having stored thereon a computer program which, when executed by a processor, performs the log file processing method provided above, the method comprising: acquiring a source file to be encoded, and dividing the source file to be encoded into segments according to an interception length to obtain each sub-segment, wherein the interception length is a reference length for dividing the source file into segments; and encoding the source character segment in each sub-segment based on the existing encoding table set to obtain an encoded file, and updating the encoding table set.

The apparatus embodiments described above are merely illustrative. The units described as separate components may or may not be physically separate, and the components shown as units may or may not be physical units; they may be located in one place or distributed over a plurality of network units. Some or all of the modules may be selected according to actual needs to achieve the purpose of the solution of this embodiment. Those of ordinary skill in the art can understand and implement the present invention without undue burden.

From the above description of the embodiments, it will be apparent to those skilled in the art that the embodiments may be implemented by means of software plus necessary general hardware platforms, or of course may be implemented by means of hardware. Based on this understanding, the foregoing technical solution may be embodied essentially or in a part contributing to the prior art in the form of a software product, which may be stored in a computer readable storage medium, such as ROM/RAM, a magnetic disk, an optical disk, etc., including several instructions for causing a computer device (which may be a personal computer, a server, or a network device, etc.) to execute the method described in the respective embodiments or some parts of the embodiments.

Finally, it should be noted that: the above embodiments are only for illustrating the technical solution of the present invention, and are not limiting; although the invention has been described in detail with reference to the foregoing embodiments, it will be understood by those of ordinary skill in the art that: the technical scheme described in the foregoing embodiments can be modified or some technical features thereof can be replaced by equivalents; such modifications and substitutions do not depart from the spirit and scope of the technical solutions of the embodiments of the present invention.

Claims

1. A log file processing method, comprising:

2. The log file processing method as claimed in claim 1, wherein the length of the code numbers in the code table is sequentially increased, and the source character segments sequentially correspond to the code numbers according to the number of occurrences from more to less.

3. The log file processing method according to claim 2, wherein the maximum value of the number of the subset of the encoding table is the same as a value of a preset interception length, and the interception length is a reference length for dividing the source file into segments in the source file encoding process.

4. A log file processing method as claimed in any of claims 1 to 3, wherein the method further comprises:

5. The log file processing method as set forth in claim 4 wherein the encoding the source character segment in each sub-segment based on the existing encoding table set to obtain the encoded file and updating the encoding table set comprises:

6. The log file processing method as set forth in claim 5, wherein when the first judgment result is no, further comprising:

7. The log file processing method as set forth in claim 6, wherein when the first judgment result is no, further comprising:

acquiring character segments S (0, i) of the subsections, wherein the i takes any one value from 1 to L-1, S (0, i) represents the character segments formed by splicing the 0 th to the i th character sequences in the subsections, and L is the maximum length of the subsections;

8. The method of claim 5, wherein configuring the code number for the sub-segment according to the second determination result comprises:

9. A log file processing method, comprising:

encoding the source character segment in each sub-segment based on the existing encoding table set to obtain an encoding file, and updating the encoding table set;

10. An electronic device comprising a memory, a processor and a computer program stored on the memory and executable on the processor, characterized in that the processor, when executing the program, implements the steps of the log file processing method according to any one of claims 1 to 8 or the steps of the log file processing method according to claim 9.

11. A non-transitory computer readable storage medium, on which a computer program is stored, characterized in that the computer program, when being executed by a processor, implements the steps of the log file processing method according to any one of claims 1 to 8 or the steps of the log file processing method according to claim 9.