CN113297154B - Website log compression method and device - Google Patents

Publication number
CN113297154B
Authority
CN
China
Prior art keywords
field
type
result
compression
coding
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202110487122.8A
Other languages
Chinese (zh)
Other versions
CN113297154A (en)
Inventor
李传咏
卢颖
赵莉
陈宁
李玲
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Xi'an Webber Software Co ltd
Original Assignee
Xi'an Webber Software Co ltd
Application filed by Xi'an Webber Software Co ltd
Priority to CN202110487122.8A
Publication of CN113297154A
Application granted
Publication of CN113297154B

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/10 File systems; File servers
    • G06F16/17 Details of further file system functions
    • G06F16/174 Redundancy elimination performed by the file system
    • G06F16/1744 Redundancy elimination performed by the file system using compression, e.g. sparse files
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/10 File systems; File servers
    • G06F16/18 File system types
    • G06F16/1805 Append-only file systems, e.g. using logs or journals to store data
    • G06F16/1815 Journaling file systems

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Databases & Information Systems (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Information Transfer Between Computers (AREA)

Abstract

The invention provides a website log compression method and device, and relates to the field of data compression. The website log compression method comprises the following steps: judging the type of each field in the website log to be compressed to obtain a type judgment result; inputting each field into the corresponding preset model according to the type judgment result to obtain a field compression result of each field; and establishing a position index of each field compression result to obtain a log compression result. The method and device take into account the different characteristics of different types of fields and apply different compression processing to each type of field accordingly, so that the compression rate of the website log to be compressed can be greatly improved and a good compression effect achieved, which reduces the storage space required on the server and the bandwidth and time required to transmit the log compression result.

Description

Website log compression method and device
Technical Field
The invention relates to the field of data compression, in particular to a method and a device for compressing website logs.
Background
Data compression is a technique for representing raw signal data with as little data as possible. With the rapid development of informatization, digitization and networking, the volume of data in all its forms has become enormous, and data compression has become a key enabling technology in computing, communication, storage and multimedia entertainment.
There are two main categories of data compression algorithms: lossy compression and lossless compression. Lossy compression algorithms typically reduce file size by discarding fine details that would otherwise require a large amount of data to represent faithfully. Because underlying data is discarded, the original file cannot be restored exactly. Lossless compression reduces file size in such a way that the original file can be completely recovered by a decompression function, with no loss of data. Lossless compression is ubiquitous in computing and saves storage space.
The basic principle of lossless compression is as follows: any non-random file contains repeated data, which can be compressed using statistical modeling techniques that determine the probability of occurrence of a character or phrase. Using these and other techniques, an 8-bit character or a character string can be represented with fewer bits, so that a large amount of redundant data is removed. Typical compression algorithms include the dictionary-based LZ77 (1977) and LZ78 (1978) families, as well as other approaches such as PPM (1984) and BZIP2 (1996).
At present, most compression algorithms treat the object to be compressed simply as a sequence of characters and ignore its characteristics. In practical applications of lossless compression, especially in networking and informatization, large volumes of website access logs are processed with general-purpose lossless compression algorithms. As a result, the logs occupy relatively more storage space, communication bandwidth and time, and a good compression effect cannot be achieved.
Disclosure of Invention
The invention aims to provide a website log compression method and device that solve the following problem in the prior art: when large volumes of website access logs are processed with a general-purpose lossless compression algorithm, relatively more storage space, communication bandwidth and time are occupied, and a good compression effect cannot be achieved.
The embodiments of the invention are implemented as follows:
In a first aspect, an embodiment of the present application provides a website log compression method, which includes the following steps: judging the type of each field in the website log to be compressed to obtain a type judgment result; inputting each field into the corresponding preset model according to the type judgment result to obtain a field compression result of each field; and establishing a position index of each field compression result to obtain a log compression result.
In some embodiments of the present invention, before the step of judging the type of each field in the website log to be compressed, the website log compression method further includes: acquiring the website log to be compressed.
In some embodiments of the present invention, after the step of inputting each field into the corresponding preset model, the website log compression method further includes: when the type judgment result is the first type, inputting the field into the first model; counting the number of repetitions of each distinct line content in the field; encoding each distinct line content according to its number of repetitions to obtain a first encoding result; and replacing the line contents of the field with the first encoding result to obtain a first data stream.
In some embodiments of the present invention, after the step of inputting each field into the corresponding preset model, the website log compression method further includes: when the type judgment result is the second type, inputting the field into the second model; dividing the content of each line of the field into a first character string and a second character string, and counting the number of repetitions of each first character string; encoding each first character string according to its number of repetitions to obtain a second encoding result; and replacing the first character strings with the second encoding result to obtain a second data stream.
In some embodiments of the present invention, after the step of inputting each field into the corresponding preset model, the website log compression method further includes: when the type judgment result is the third type, inputting the field into the third model; calculating the time difference between every two adjacent lines in the field to obtain a plurality of time differences; counting the repetition frequency of each time difference; encoding the time differences according to the repetition frequency to obtain a third encoding result; and obtaining a third data stream from the third encoding result.
In a second aspect, an embodiment of the present application provides a website log compression apparatus, which includes: a type judgment module, configured to judge the type of each field in the website log to be compressed to obtain a type judgment result; a field compression module, configured to input each field into the corresponding preset model according to the type judgment result to obtain a field compression result of each field; and a log compression module, configured to establish a position index of each field compression result to obtain a log compression result.
In some embodiments of the present invention, the website log compression apparatus further includes a to-be-compressed website log obtaining module, which is configured to obtain the website log to be compressed.
In some embodiments of the invention, the field compression module comprises: a first type input unit, configured to input the field into the first model when the type judgment result is the first type; a repetition count unit, configured to count the number of repetitions of each distinct line content in the field; a first encoding unit, configured to encode each distinct line content according to its number of repetitions to obtain a first encoding result; and a first data stream obtaining unit, configured to replace the line contents of the field with the first encoding result to obtain a first data stream.
In some embodiments of the invention, the field compression module comprises: a second type input unit, configured to input the field into the second model when the type judgment result is the second type; a repetition number counting unit, configured to divide the content of each line of the field into a first character string and a second character string and to count the number of repetitions of each first character string; a second encoding unit, configured to encode each first character string according to its number of repetitions to obtain a second encoding result; and a second data stream obtaining unit, configured to replace the first character strings with the second encoding result to obtain a second data stream.
In some embodiments of the present invention, the field compression module includes: a third type input unit, configured to input the field into the third model when the type judgment result is the third type; a time difference calculation unit, configured to calculate the time difference between every two adjacent lines in the field to obtain a plurality of time differences; a repetition frequency counting unit, configured to count the repetition frequency of each time difference; a third encoding unit, configured to encode the time differences according to the repetition frequency to obtain a third encoding result; and a third data stream obtaining unit, configured to obtain a third data stream from the third encoding result.
In a third aspect, an embodiment of the present application provides an electronic device, which includes a memory for storing one or more programs, and a processor. The one or more programs, when executed by the processor, implement the method of any of the first aspects described above.
In a fourth aspect, embodiments of the present application provide a computer-readable storage medium, on which a computer program is stored, which, when executed by a processor, implements the method according to any one of the first aspect described above.
Compared with the prior art, the embodiment of the invention has at least the following advantages or beneficial effects:
The invention provides a website log compression method and device. The method comprises the following steps: judging the type of each field in the website log to be compressed to obtain a type judgment result; inputting each field into the corresponding preset model according to the type judgment result to obtain a field compression result of each field; and establishing a position index of each field compression result to obtain a log compression result. That is, the website log to be compressed is first acquired and the type of each of its fields is judged; fields of different types then undergo different compression processing according to their types; and finally the position indexes of all field compression results are established to obtain the log compression result. Because the method and device take into account the different characteristics of different types of fields and process each type of field accordingly, the compression rate of the website log to be compressed can be greatly improved and a good compression effect achieved, which reduces the storage space required on the server and the bandwidth and time required to transmit the log compression result.
Drawings
In order to more clearly illustrate the technical solutions of the embodiments of the present invention, the drawings needed to be used in the embodiments will be briefly described below, it should be understood that the following drawings only illustrate some embodiments of the present invention and therefore should not be considered as limiting the scope, and for those skilled in the art, other related drawings can be obtained according to the drawings without inventive efforts.
Fig. 1 is a flowchart of a website log compression method according to an embodiment of the present invention;
Fig. 2 shows the specific contents of a website log to be compressed according to an embodiment of the present invention;
Fig. 3 is a flowchart illustrating compression of a first type field according to an embodiment of the present invention;
Fig. 4 is a schematic structural diagram of a website log compression apparatus according to an embodiment of the present invention;
Fig. 5 is a schematic structural block diagram of an electronic device according to an embodiment of the present invention.
Reference numerals: 100-website log compression apparatus; 110-type judgment module; 120-field compression module; 130-log compression module; 101-memory; 102-processor; 103-communication interface.
Detailed Description
In order to make the objects, technical solutions and advantages of the embodiments of the present application clearer, the technical solutions in the embodiments of the present application will be clearly and completely described below with reference to the drawings in the embodiments of the present application, and it is obvious that the described embodiments are some embodiments of the present application, but not all embodiments. The components of the embodiments of the present application, generally described and illustrated in the figures herein, can be arranged and designed in a wide variety of different configurations.
Thus, the following detailed description of the embodiments of the present application, presented in the accompanying drawings, is not intended to limit the scope of the claimed application, but is merely representative of selected embodiments of the application. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present application.
It should be noted that: like reference numbers and letters refer to like items in the following figures, and thus, once an item is defined in one figure, it need not be further defined and explained in subsequent figures. Meanwhile, in the description of the present application, the terms "first", "second", and the like are used only for distinguishing the description, and are not construed as indicating or implying relative importance.
It is noted that, herein, relational terms such as first and second, and the like may be used solely to distinguish one entity or action from another entity or action without necessarily requiring or implying any actual such relationship or order between such entities or actions. Also, the terms "comprises," "comprising," or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, method, article, or apparatus. Without further limitation, the presence of an element identified by the phrase "comprising an … …" does not exclude the presence of other like elements in a process, method, article, or apparatus that comprises the element.
In the description of the present application, it should be noted that if the terms "upper", "lower", "inner", "outer", etc. are used to indicate an orientation or positional relationship based on that shown in the drawings or that the application product is usually placed in use, the description is merely for convenience and simplicity, and it is not intended to indicate or imply that the referred device or element must have a specific orientation, be constructed in a specific orientation, and be operated, and therefore should not be construed as limiting the present application.
In the description of the present application, it should also be noted that, unless otherwise explicitly stated or limited, the terms "disposed" and "connected" should be interpreted broadly, and may be, for example, fixedly connected, detachably connected, or integrally connected; can be mechanically or electrically connected; they may be connected directly or indirectly through intervening media, or they may be interconnected between two elements. The specific meaning of the above terms in the present application can be understood in a specific case by those of ordinary skill in the art.
Some embodiments of the present application will be described in detail below with reference to the accompanying drawings. The embodiments described below and the individual features of the embodiments can be combined with one another without conflict.
Examples
Referring to fig. 1, fig. 1 is a flowchart illustrating a website log compression method according to an embodiment of the present disclosure. A website log compression method comprises the following steps:
S110: judging the type of each field in the website log to be compressed to obtain a type judgment result;
The type judgment result of each field can be obtained by scanning the website log to be compressed and iterating over each of its fields. Specifically, the type judgment result is one of a first type field, a second type field, and a third type field. The first type field may include fields such as remote host IP, E-mail data, user login name, status code data, total number of bytes sent, and user browser. The second type field may include "method + resource + protocol". The third type field may include the request time. Fig. 2 shows the specific contents of a website log to be compressed according to an embodiment of the present application. In the figure, the field "kyy.njtech.edu.cn 122.228.19.71" belongs to the first type, the field "[ 25/Jan/2021:00:00 +0800]" belongs to the third type, and the field "GET /info/1017/profiles/14433/529.jsp HTTP/1.1" belongs to the second type.
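The type judgment of step S110 can be sketched as follows. The source names the three field types but does not state how a field is recognized, so the two regular expressions and the function name `judge_field_type` are assumptions made only for illustration: a bracketed timestamp is treated as the third type, a "method + resource + protocol" string as the second type, and everything else as the first type.

```python
import re

# Hypothetical detection rules; the source does not specify them.
TIME_RE = re.compile(r"^\[?\s*\d{1,2}/[A-Za-z]{3}/\d{4}:[\d:]+\s+[+-]\d{4}\]?$")
REQUEST_RE = re.compile(r"^[A-Z]+\s*/\S*\s+HTTP/\d(\.\d)?$")


def judge_field_type(sample: str) -> int:
    """Return 1, 2 or 3 for a sample value taken from one field."""
    value = sample.strip().strip('"')
    if TIME_RE.match(value):
        return 3  # request time
    if REQUEST_RE.match(value):
        return 2  # "method + resource + protocol"
    return 1      # host, status code, byte count, user agent, ...


print(judge_field_type("[25/Jan/2021:00:00:00 +0800]"))
print(judge_field_type("GET /info/1017/profiles/14433/529.jsp HTTP/1.1"))
print(judge_field_type("kyy.njtech.edu.cn"))
```

In this sketch one sample value per field is enough, since every line of a given field is expected to share the same type.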
S120: inputting each field into the corresponding preset model according to the type judgment result to obtain a field compression result of each field;
Each field is input into the preset model corresponding to its type, so that fields of different types undergo different compression processing and a field compression result is obtained for each field. The preset models comprise a first model, a second model and a third model, and the field compression results comprise a first data stream, a second data stream and a third data stream. The first model may be a model for processing first type fields, the second model a model for processing second type fields, and the third model a model for processing third type fields. The first data stream is the field compression result of a first type field, the second data stream is the field compression result of a second type field, and the third data stream is the field compression result of a third type field. Inputting different types of fields into different models for processing yields the corresponding data streams.
S130: establishing a position index of each field compression result to obtain a log compression result.
All the field compression results can be arranged according to the positions of the corresponding fields in the website logs to be compressed by establishing the position indexes, and after the log compression results are decompressed, the decompressed website logs can be consistent with the website logs to be compressed, so that lossless compression of the website logs to be compressed is guaranteed.
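The role of the position index can be illustrated with a small round trip. The source does not describe the layout of the index, so the dictionary structure, the function names `build_position_index` and `restore_log`, and the `|` separator below are hypothetical. The raw column values stand in for the compressed data streams so that the example stays short.

```python
def build_position_index(lines, sep="|"):
    """Split each line into fields and store one stream per field position."""
    rows = [line.split(sep) for line in lines]
    width = len(rows[0]) if rows else 0
    if any(len(row) != width for row in rows):
        raise ValueError("all lines must have the same number of fields")
    # Each entry records where the field sits in the original line.
    index = [
        {"position": i, "stream": [row[i] for row in rows]}
        for i in range(width)
    ]
    return {"sep": sep, "index": index}


def restore_log(result):
    """Re-interleave the per-field streams in position order."""
    ordered = sorted(result["index"], key=lambda entry: entry["position"])
    columns = [entry["stream"] for entry in ordered]
    return [result["sep"].join(values) for values in zip(*columns)]


log = ["a.example|t1|GET /x HTTP/1.1", "b.example|t2|GET /y HTTP/1.1"]
assert restore_log(build_position_index(log)) == log
```

Because every stream is tagged with its field position, the streams can be stored in any order and still be reassembled into lines identical to the input, which is the lossless property the paragraph above describes.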
In the implementation process, the website log to be compressed is first scanned to judge the type of each of its fields, fields of different types are then compressed according to their types, and finally position indexes of all field compression results are established to obtain the log compression result. The website log compression method takes into account the different characteristics of different types of fields and applies different compression processing to each type of field accordingly, so that the compression rate of the website log to be compressed can be greatly improved and a better compression effect achieved, which reduces the storage space required on the server and the bandwidth and time required to transmit the log compression result.
It should be noted that the more frequently a website is visited, the more website logs are generated each day, and the more pronounced the compression effect of the website log compression method becomes.
In some embodiments of this embodiment, before the step of judging the type of each field in the website log to be compressed, the website log compression method further includes: acquiring the website log to be compressed. Specifically, the website log to be compressed includes fields such as remote host IP, E-mail data, user login name, request time, "method + resource + protocol" data, status code data, total number of bytes sent, and user browser. Separators are placed between the different fields in the website log to be compressed, so that each field can be delimited accurately and its type determined conveniently.
In some embodiments of this embodiment, after the step of inputting each field into the corresponding preset model, the website log compression method further includes: when the type judgment result is the first type, inputting the field into the first model; counting the number of repetitions of each distinct line content in the field; encoding each distinct line content according to its number of repetitions to obtain a first encoding result; and replacing the line contents of the field with the first encoding result to obtain a first data stream. A first type field is one whose line contents are very often repeated in their entirety. Specifically, after the first type field is input into the first model, the number of repetitions of each distinct line content of the field in the website log to be compressed is counted, and each distinct line content is encoded according to its number of repetitions. The encoding may be Huffman coding or arithmetic coding. The resulting first encoding result assigns a code to each distinct line content, and each line content is then replaced with its code to obtain the field compression result of the first type field, that is, the first data stream.
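The first model can be sketched as a frequency count followed by a standard Huffman construction. This is a minimal illustration rather than the patented implementation; the function names `huffman_codes` and `encode_column` are hypothetical, and the bit string is kept as text for readability instead of being packed into bytes.

```python
import heapq
from collections import Counter
from itertools import count


def huffman_codes(values):
    """Build a prefix code in which frequent values get shorter codewords."""
    freq = Counter(values)
    if not freq:
        return {}
    if len(freq) == 1:
        return {next(iter(freq)): "0"}
    tiebreak = count()  # unique counter so the dicts are never compared
    heap = [(n, next(tiebreak), {v: ""}) for v, n in freq.items()]
    heapq.heapify(heap)
    while len(heap) > 1:
        n0, _, left = heapq.heappop(heap)
        n1, _, right = heapq.heappop(heap)
        merged = {v: "0" + code for v, code in left.items()}
        merged.update({v: "1" + code for v, code in right.items()})
        heapq.heappush(heap, (n0 + n1, next(tiebreak), merged))
    return heap[0][2]


def encode_column(values):
    """Replace every line content with its codeword (the first data stream)."""
    codes = huffman_codes(values)
    return codes, "".join(codes[v] for v in values)


column = ["a.example"] * 5 + ["b.example"] * 2 + ["c.example"]
codes, stream = encode_column(column)
print(codes, len(stream))
```

The code table has to be stored alongside the stream so that the decoder can map codewords back to line contents.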
Referring to Fig. 3, Fig. 3 is a flowchart illustrating compression of a first type field according to an embodiment of the present invention. First, the number of repetitions of each distinct line content of the first type field in the website log to be compressed in Fig. 2 is counted. The counts are: "kyy.njtech.edu.cn" 7, "ny.njtech.edu.cn" 4, "ngdrwy.njtech.njtech.edu.cn" 3, "licme.njtech.edu.cn" 5, "zz.njtech.edu.cn" 1, "mint.njtech.edu.cn" 1, and "maker.njtech.edu.cn" 1. Then, Huffman coding is performed on each distinct line content according to its number of repetitions to obtain the first encoding result. The resulting codes are: "kyy.njtech.edu.cn" 01, "ny.njtech.edu.cn" 0001, "ngdrwyjy.njtech.edu.cn" 00001, "licme.njtech.edu.cn" 001, "zz.njtech.edu.cn" 000001, "nth.njtech.edu.cn" 0000001, and "maker.njtech.edu.cn" 0000000. Taking the first type field of the first eight lines in Fig. 2 as an example, replacing the line contents of these eight lines with the first encoding result gives the compressed data stream 010100010100001000010101.
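The example stream can be checked by decoding it with the code table as listed in the text. The sketch below copies the codewords and the host-name spellings from the coding sentence above; the helper names `decode` and `encode` are hypothetical. The listed codewords form a valid prefix code, although a textbook Huffman construction over the listed counts would be expected to assign different codeword lengths, so the table is taken here as given rather than recomputed.

```python
# Code table copied from the worked example in the text.
CODES = {
    "kyy.njtech.edu.cn": "01",
    "ny.njtech.edu.cn": "0001",
    "ngdrwyjy.njtech.edu.cn": "00001",
    "licme.njtech.edu.cn": "001",
    "zz.njtech.edu.cn": "000001",
    "nth.njtech.edu.cn": "0000001",
    "maker.njtech.edu.cn": "0000000",
}
STREAM = "010100010100001000010101"


def decode(bits, codes):
    """Read the bit string left to right, emitting a value at each codeword."""
    inverse = {code: value for value, code in codes.items()}
    out, buffer = [], ""
    for bit in bits:
        buffer += bit
        if buffer in inverse:
            out.append(inverse[buffer])
            buffer = ""
    if buffer:
        raise ValueError("trailing bits do not form a codeword")
    return out


def encode(values, codes):
    return "".join(codes[v] for v in values)


lines = decode(STREAM, CODES)
assert len(lines) == 8
assert encode(lines, CODES) == STREAM
for host in lines:
    print(host)
```

Decoding yields eight host names and re-encoding them reproduces the 24-bit stream, which is consistent with the stream representing eight lines.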
In some embodiments of this embodiment, after the step of inputting each field into the corresponding preset model, the website log compression method further includes: when the type judgment result is the second type, inputting the field into the second model; dividing the content of each line of the field into a first character string and a second character string, and counting the number of repetitions of each first character string; encoding each first character string according to its number of repetitions to obtain a second encoding result; and replacing the first character strings with the second encoding result to obtain a second data stream. In a second type field, a large part of each line content is identical from line to line, so the line content can be divided into a first character string and a second character string that are processed separately. Specifically, the first character string is the part of the line content that repeats very often, and the number of repetitions of each first character string across the line contents is counted. Huffman coding or arithmetic coding is then performed on all first character strings according to their numbers of repetitions to obtain the second encoding result, which assigns a code to each distinct first character string. Replacing each first character string with its code gives the field compression result of the second type field, that is, the second data stream.
The second character string may be processed by saving it directly.
Alternatively, the second character string may be processed by counting the number of repetitions of each second character string across the line contents, performing Huffman coding on all second character strings according to their numbers of repetitions, and replacing each second character string with its Huffman code to obtain a data stream for the second character strings. This data stream and the second data stream then jointly form the field compression result of the second type field.
These two ways of processing the second character string are merely two options within this embodiment and do not limit how the second character string may be processed.
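The second model can be sketched as follows. The source does not say where a line is split, so this example assumes that the first character string consists of the method, the directory part of the resource, and the protocol, and that the second character string is the final path segment. It also uses a frequency-ranked table index as a stand-in for the Huffman or arithmetic codes, and saves the second character string directly (the first of the two options above). All function names are hypothetical.

```python
from collections import Counter


def split_request(request):
    """Split 'METHOD resource PROTOCOL' into (first string, second string)."""
    method, resource, protocol = request.split(" ")
    directory, slash, name = resource.rpartition("/")
    return (method, directory + slash, protocol), name


def join_request(first, second):
    method, directory, protocol = first
    return f"{method} {directory}{second} {protocol}"


def compress_requests(requests):
    pairs = [split_request(r) for r in requests]
    counts = Counter(first for first, _ in pairs)
    # Most frequent first strings get the smallest indexes.
    table = [first for first, _ in counts.most_common()]
    rank = {first: i for i, first in enumerate(table)}
    stream = [(rank[first], second) for first, second in pairs]
    return table, stream


def restore_requests(table, stream):
    return [join_request(table[i], second) for i, second in stream]


requests = [
    "GET /info/1017/a.jsp HTTP/1.1",
    "GET /info/1017/b.jsp HTTP/1.1",
    "POST /login HTTP/1.1",
]
table, stream = compress_requests(requests)
assert restore_requests(table, stream) == requests
```

A real encoder would replace the integer indexes with variable-length codes derived from the same counts.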
In some embodiments of this embodiment, after the step of inputting each field into the corresponding preset model, the website log compression method further includes: when the type judgment result is the third type, inputting the field into the third model; calculating the time difference between every two adjacent lines in the field to obtain a plurality of time differences; counting the repetition frequency of each time difference; encoding the time differences according to the repetition frequency to obtain a third encoding result; and obtaining a third data stream from the third encoding result. Specifically, a third type field is a time series or another regular data series. Based on this characteristic, the time difference between every two adjacent lines of the third type field in the website log to be compressed can be calculated. The time difference between two adjacent lines is mostly 0 or 1, so the repetition frequency of these time differences is high. Huffman coding or arithmetic coding is performed on the distinct time differences according to their repetition frequency to obtain the third encoding result, which assigns a code to each distinct time difference. The field compression result of the third type field, that is, the third data stream, is obtained from the third encoding result.
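The time-difference step of the third model can be sketched as follows. The timestamp layout `%d/%b/%Y:%H:%M:%S %z`, with seconds and a numeric time zone, is an assumption based on the common access-log format, and the sample timestamps are made up for illustration. The sketch stops at the frequency count, which is the input a Huffman or arithmetic coder would use. The names `to_deltas` and `from_deltas` are hypothetical.

```python
from collections import Counter
from datetime import datetime, timedelta

TIME_FORMAT = "%d/%b/%Y:%H:%M:%S %z"  # assumed layout, not taken from the source


def to_deltas(stamps):
    """Return the first timestamp and the differences in seconds between adjacent lines."""
    if not stamps:
        raise ValueError("at least one timestamp is required")
    times = [datetime.strptime(s.strip("[] "), TIME_FORMAT) for s in stamps]
    deltas = [int((b - a).total_seconds()) for a, b in zip(times, times[1:])]
    return times[0], deltas


def from_deltas(first, deltas):
    """Rebuild the bracketed timestamps from the first value and the differences."""
    out, current = [first], first
    for delta in deltas:
        current += timedelta(seconds=delta)
        out.append(current)
    return ["[" + t.strftime(TIME_FORMAT) + "]" for t in out]


stamps = [
    "[25/Jan/2021:00:00:00 +0800]",
    "[25/Jan/2021:00:00:00 +0800]",
    "[25/Jan/2021:00:00:01 +0800]",
    "[25/Jan/2021:00:00:01 +0800]",
]
first, deltas = to_deltas(stamps)
print(deltas, Counter(deltas))
assert from_deltas(first, deltas) == stamps
```

Only the first timestamp has to be stored in full. Every later line is represented by a small integer drawn from a very skewed distribution, which is what makes the subsequent entropy coding effective.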
Referring to Fig. 4, Fig. 4 is a schematic structural diagram of a website log compression apparatus 100 according to an embodiment of the present disclosure. The website log compression apparatus 100 comprises: a type judgment module 110, configured to judge the type of each field in the website log to be compressed to obtain a type judgment result; a field compression module 120, configured to input each field into the corresponding preset model according to the type judgment result to obtain a field compression result of each field; and a log compression module 130, configured to establish a position index of each field compression result to obtain a log compression result. Specifically, the website log compression apparatus 100 judges the type of each field in the website log to be compressed through the type judgment module 110, performs different compression processing on fields of different types through the field compression module 120, and finally obtains the log compression result through the log compression module 130. Different types of fields are thus compressed differently according to their characteristics, so that the compression rate of the website log to be compressed can be greatly improved and a good compression effect achieved, which reduces the storage space required on the server and the bandwidth and time required to transmit the log compression result.
In some implementations of this embodiment, the website log compression apparatus 100 further includes a to-be-compressed website log obtaining module, through which the apparatus obtains the website log to be compressed.
In some implementations of this embodiment, the field compression module 120 includes: a first type input unit, configured to input the field into the first model when the type judgment result is the first type; a repetition count unit, configured to count the number of times each distinct row content occurs in the field to obtain the repetition counts; a first encoding unit, configured to encode each distinct row content according to the repetition counts to obtain a first encoding result; and a first data stream obtaining unit, configured to replace the row content of the field with the first encoding result to obtain a first data stream. Specifically, after a first-type field is input into the first model, the number of repetitions of each distinct row content is counted, and Huffman coding or arithmetic coding is applied to the distinct row contents according to these counts to obtain the first encoding result, in which each code corresponds to one distinct row content. Replacing each row content with its code yields the field compression result of the first-type field, namely the first data stream.
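The first model can be illustrated with a short Python sketch. It is a minimal example under stated assumptions, not the implementation of the application: the function names are hypothetical, the field is assumed to be available as a list of row strings, Huffman coding is chosen over arithmetic coding, and codes are kept as "0"/"1" strings for readability.

```python
import heapq
from collections import Counter
from itertools import count


def huffman_codes(freqs):
    """Build a prefix code (symbol -> bit string) from a symbol -> count mapping."""
    if not freqs:
        return {}
    if len(freqs) == 1:
        # A single distinct symbol still needs one bit per occurrence.
        return {next(iter(freqs)): "0"}
    tiebreak = count()  # unique counter so the heap never compares the dicts
    heap = [(n, next(tiebreak), {sym: ""}) for sym, n in freqs.items()]
    heapq.heapify(heap)
    while len(heap) > 1:
        # Merge the two least frequent subtrees, prefixing their codes.
        n0, _, left = heapq.heappop(heap)
        n1, _, right = heapq.heappop(heap)
        merged = {s: "0" + c for s, c in left.items()}
        merged.update({s: "1" + c for s, c in right.items()})
        heapq.heappush(heap, (n0 + n1, next(tiebreak), merged))
    return heap[0][2]


def compress_repeated_field(rows):
    """Count each distinct row content, code it, and replace every row by its code."""
    codes = huffman_codes(Counter(rows))       # repetition counts -> codes
    stream = [codes[row] for row in rows]      # row content replaced by its code
    return codes, stream
```

For a field such as an HTTP status column, where one value dominates, the dominant value receives a one-bit code and the rare values receive longer ones; the code table must be stored alongside the stream so that the rows can be restored.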
In some implementations of this embodiment, the field compression module 120 includes: a second type input unit, configured to input the field into the second model when the type judgment result is the second type; a repetition count unit, configured to divide the content of each row of the field into a first character string and a second character string, and to count the number of times each first character string occurs to obtain the repetition counts; a second encoding unit, configured to encode each first character string according to the repetition counts to obtain a second encoding result; and a second data stream obtaining unit, configured to replace the first character string with the second encoding result to obtain a second data stream. Specifically, the first character string is the part of the row content of a second-type field that repeats heavily from row to row, and the number of times each first character string occurs is counted. Huffman coding or arithmetic coding is then applied to the first character strings according to these counts to obtain the second encoding result, and each first character string is replaced with its code to obtain the field compression result of the second-type field, namely the second data stream.
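This passage does not state how a row is divided into its first and second character strings. The sketch below assumes, purely for illustration, a URL-like field that is split at its last "/" so that the shared directory prefix becomes the first character string and the file name becomes the second; the function names and the separator are hypothetical. Only the split-and-count step is shown; the resulting counts would then drive the Huffman or arithmetic coder exactly as in the first model.

```python
from collections import Counter


def split_row(row, sep="/"):
    """Split a row at the last separator into (first string, second string)."""
    head, found, tail = row.rpartition(sep)
    if not found:
        # No separator present: treat the whole row as the variable part.
        return "", row
    return head + sep, tail


def split_and_count(rows, sep="/"):
    """Return the (first, second) pairs and the repetition count of each first string."""
    pairs = [split_row(row, sep) for row in rows]
    counts = Counter(first for first, _ in pairs)
    return pairs, counts
```

Concatenating the two parts of each pair reproduces the original row, so replacing only the first character string with a short code loses no information.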
In some implementations of this embodiment, the field compression module 120 includes: a third type input unit, configured to input the field into the third model when the type judgment result is the third type; a time difference calculation unit, configured to calculate the time difference between every two adjacent rows in the field to obtain a plurality of time differences; a repetition frequency unit, configured to count the number of times each distinct time difference occurs to obtain its repetition frequency; a third encoding unit, configured to encode the time differences according to the repetition frequency to obtain a third encoding result; and a third data stream obtaining unit, configured to obtain a third data stream from the third encoding result. Specifically, the time difference between adjacent rows of a third-type field in the website log to be compressed is calculated, and the repetition frequency of each distinct time difference is counted. Huffman coding or arithmetic coding is then applied to the distinct time differences according to their repetition frequencies to obtain the third encoding result, from which the field compression result of the third-type field, namely the third data stream, is obtained.
Referring to fig. 5, fig. 5 is a schematic structural block diagram of an electronic device according to an embodiment of the present application. The electronic device includes a memory 101, a processor 102, and a communication interface 103, which are electrically connected to one another, directly or indirectly, to enable data transmission or interaction. For example, these components may be electrically connected via one or more communication buses or signal lines. The memory 101 may be used to store software programs and modules, such as the program instructions/modules corresponding to the website log compression apparatus 100 provided in the embodiments of the present application, and the processor 102 executes the software programs and modules stored in the memory 101 to perform various functional applications and data processing. The communication interface 103 may be used to exchange signaling or data with other node devices.
The memory 101 may be, but is not limited to, a Random Access Memory (RAM), a Read-Only Memory (ROM), a Programmable Read-Only Memory (PROM), an Erasable Programmable Read-Only Memory (EPROM), an Electrically Erasable Programmable Read-Only Memory (EEPROM), and the like.
The processor 102 may be an integrated circuit chip having signal processing capabilities. The processor 102 may be a general-purpose processor, such as a Central Processing Unit (CPU) or a Network Processor (NP); it may also be a Digital Signal Processor (DSP), an Application Specific Integrated Circuit (ASIC), a Field Programmable Gate Array (FPGA) or other programmable logic device, a discrete gate or transistor logic device, or a discrete hardware component.
It will be appreciated that the configuration shown in fig. 5 is merely illustrative and that the electronic device may include more or fewer components than shown in fig. 5 or have a different configuration than shown in fig. 5. The components shown in fig. 5 may be implemented in hardware, software, or a combination thereof.
In the embodiments provided in the present application, it should be understood that the disclosed apparatus and method may also be implemented in other ways. The apparatus embodiments described above are merely illustrative. For example, the flowcharts and block diagrams in the figures illustrate the architecture, functionality, and operation of possible implementations of apparatuses, methods, and computer program products according to various embodiments of the present application. In this regard, each block in a flowchart or block diagram may represent a module, segment, or portion of code, which comprises one or more executable instructions for implementing the specified logical function(s). It should also be noted that, in some alternative implementations, the functions noted in the blocks may occur out of the order noted in the figures. For example, two blocks shown in succession may, in fact, be executed substantially concurrently, or the blocks may sometimes be executed in the reverse order, depending on the functionality involved. It should further be noted that each block of the block diagrams and/or flowcharts, and combinations of such blocks, can be implemented by special-purpose hardware-based systems that perform the specified functions or acts, or by combinations of special-purpose hardware and computer instructions.
In addition, functional modules in the embodiments of the present application may be integrated together to form an independent part, or each module may exist separately, or two or more modules may be integrated to form an independent part.
The functions, if implemented in the form of software functional modules and sold or used as a stand-alone product, may be stored in a computer-readable storage medium. Based on this understanding, the technical solution of the present application, or the part of it that contributes over the prior art, may be embodied in the form of a software product that is stored in a storage medium and includes instructions for causing a computer device (which may be a personal computer, a server, or a network device) to execute all or part of the steps of the method according to the embodiments of the present application. The aforementioned storage medium includes a USB flash drive, a removable hard disk, a Read-Only Memory (ROM), a Random Access Memory (RAM), a magnetic disk, an optical disk, and various other media capable of storing program code.
To sum up, the website log compression method and device provided by the embodiments of the present application include the following steps: judging the type of each field in the website log to be compressed to obtain a type judgment result; inputting each field into the corresponding preset model according to the type judgment result to obtain a field compression result for each field; and establishing a position index of the field compression results to obtain a log compression result. That is, the types of all fields in the website log to be compressed are judged first, each field is then compressed by the process suited to its type, and finally a position index of the field compression results is established to obtain the log compression result. Because the method takes the different characteristics of different field types into account and compresses each type of field accordingly, it improves the compression ratio of the website log to be compressed and reduces the server storage space, the communication bandwidth, and the transmission time required for the log compression result.
The above description is only a preferred embodiment of the present application and is not intended to limit the present application, and various modifications and changes may be made by those skilled in the art. Any modification, equivalent replacement, improvement and the like made within the spirit and principle of the present application shall be included in the protection scope of the present application.
It will be evident to those skilled in the art that the present application is not limited to the details of the foregoing illustrative embodiments, and that the present application may be embodied in other specific forms without departing from the spirit or essential attributes thereof. The present embodiments are therefore to be considered in all respects as illustrative and not restrictive, the scope of the application being indicated by the appended claims rather than by the foregoing description, and all changes which come within the meaning and range of equivalency of the claims are therefore intended to be embraced therein. Any reference sign in a claim should not be construed as limiting the claim concerned.

Claims (6)

1. A website log compression method is characterized by comprising the following steps:
judging the type of each field in the website log to be compressed to obtain a type judgment result;
according to the type judgment result, inputting each field into a corresponding preset model respectively to obtain a field compression result of each field;
establishing a position index of each field compression result to obtain a log compression result;
after the step of inputting each field into the corresponding preset model, the method further includes:
when the type judgment result is a first type, inputting the field into a first model, wherein a field of the first type is a field in which a large number of rows have completely identical content;
counting the number of repetitions of each distinct row content in the field to obtain repetition counts;
encoding each distinct row content according to the repetition counts to obtain a first encoding result;
replacing the row content of the field with the first encoding result to obtain a first data stream;
after the step of inputting each field into the corresponding preset model, the method further includes:
when the type judgment result is a second type, inputting the field into a second model, wherein, in a field of the second type, most of the character string making up each row's content is the same from row to row;
dividing the content of each row of the field into a first character string and a second character string, and counting the number of repetitions of each first character string to obtain repetition counts;
encoding each first character string according to the repetition counts to obtain a second encoding result;
replacing the first character string with the second encoding result to obtain a second data stream;
after the step of inputting each field into the corresponding preset model, the method further includes:
when the type judgment result is a third type, inputting the field into a third model, wherein a field of the third type is a time series or a regular data series;
calculating the time difference between every two adjacent rows in the field to obtain a plurality of time differences;
counting the repetition frequency of each time difference;
encoding the time differences according to the repetition frequency to obtain a third encoding result;
obtaining a third data stream according to the third encoding result;
wherein the encoding is Huffman coding.
2. The website log compression method according to claim 1, wherein before the step of judging the type of each field in the website log to be compressed, the method further comprises:
acquiring the website log to be compressed.
3. A website log compression apparatus, comprising:
a type judgment module, configured to judge the type of each field in the website log to be compressed to obtain a type judgment result;
a field compression module, configured to input each field into a corresponding preset model according to the type judgment result to obtain a field compression result of each field;
a log compression module, configured to establish a position index of each field compression result to obtain a log compression result;
wherein the field compression module includes:
a first type input unit, configured to input the field into a first model when the type judgment result is a first type, wherein a field of the first type is a field in which a large number of rows have completely identical content;
a repetition count unit, configured to count the number of repetitions of each distinct row content in the field to obtain repetition counts;
a first encoding unit, configured to encode each distinct row content according to the repetition counts to obtain a first encoding result;
a first data stream obtaining unit, configured to replace the row content of the field with the first encoding result to obtain a first data stream;
a second type input unit, configured to input the field into a second model when the type judgment result is a second type, wherein, in a field of the second type, most of the character string making up each row's content is the same from row to row;
a repetition count unit, configured to divide the content of each row of the field into a first character string and a second character string, and to count the number of repetitions of each first character string to obtain repetition counts;
a second encoding unit, configured to encode each first character string according to the repetition counts to obtain a second encoding result;
a second data stream obtaining unit, configured to replace the first character string with the second encoding result to obtain a second data stream;
a third type input unit, configured to input the field into a third model when the type judgment result is a third type, wherein a field of the third type is a time series or a regular data series;
a time difference calculation unit, configured to calculate the time difference between every two adjacent rows in the field to obtain a plurality of time differences;
a repetition frequency unit, configured to count the repetition frequency of each time difference;
a third encoding unit, configured to encode the time differences according to the repetition frequency to obtain a third encoding result;
and a third data stream obtaining unit, configured to obtain a third data stream according to the third encoding result, wherein the encoding is Huffman coding.
4. The website log compression apparatus according to claim 3, further comprising a to-be-compressed website log obtaining module, wherein the to-be-compressed website log obtaining module is configured to obtain the website log to be compressed.
5. An electronic device, comprising:
a memory for storing one or more programs;
a processor;
the one or more programs, when executed by the processor, implement the method of any of claims 1-2.
6. A computer-readable storage medium, on which a computer program is stored which, when being executed by a processor, carries out the method according to any one of claims 1-2.
CN202110487122.8A 2021-05-04 2021-05-04 Website log compression method and device Active CN113297154B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110487122.8A CN113297154B (en) 2021-05-04 2021-05-04 Website log compression method and device

Publications (2)

Publication Number Publication Date
CN113297154A (en) 2021-08-24
CN113297154B (en) 2022-05-17

Family ID=77321710

Country Status (1)

Country Link
CN (1) CN113297154B (en)

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant