CN114070325A - Text data compression method and device, computer equipment and storage medium - Google Patents

Text data compression method and device, computer equipment and storage medium

Info

Publication number
CN114070325A
Authority
CN
China
Prior art keywords
compressed
char
character unit
unit
data
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202111271127.3A
Other languages
Chinese (zh)
Inventor
王勇
栾乐
周凯
莫文雄
许中
马智远
李党
霍建彬
代晓丰
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Guangzhou Power Supply Bureau of Guangdong Power Grid Co Ltd
Original Assignee
Guangzhou Power Supply Bureau of Guangdong Power Grid Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Guangzhou Power Supply Bureau of Guangdong Power Grid Co Ltd
Priority to CN202111271127.3A
Publication of CN114070325A
Legal status: Pending

Classifications

    • H: ELECTRICITY
    • H03: ELECTRONIC CIRCUITRY
    • H03M: CODING; DECODING; CODE CONVERSION IN GENERAL
    • H03M7/00: Conversion of a code where information is represented by a given sequence or number of digits to a code where the same, similar or subset of information is represented by a different sequence or number of digits
    • H03M7/30: Compression; Expansion; Suppression of unnecessary data, e.g. redundancy reduction
    • H03M7/70: Type of the data to be coded, other than image and sound

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Compression, Expansion, Code Conversion, And Decoders (AREA)

Abstract

The application relates to a method, an apparatus, a computer device, a storage medium and a computer program product for compressing text data. The method comprises: acquiring text data to be compressed; converting the text data to be compressed into a plurality of units to be compressed, each two bytes long; converting each unit to be compressed into a first char character unit and a second char character unit; shifting the first char character unit to the left by a preset number of bits to obtain a third char character unit; adding the third char character unit and the corresponding second char character unit to obtain a compressed character; and splicing the compressed characters in sequence to obtain compressed data. With the compression method of this embodiment, the compressed data has a higher compression ratio and occupies less storage space, and bandwidth usage during network transmission can be effectively reduced.

Description

Text data compression method and device, computer equipment and storage medium
Technical Field
The present application relates to the field of computer technologies, and in particular, to a method and an apparatus for compressing text data, a computer device, a storage medium, and a computer program product.
Background
As the number of power quality monitoring points grows, the volume of power quality monitoring data grows with it. Increasingly rich monitoring data brings great convenience to equipment state monitoring and to operation and maintenance, but it has also led to an explosion in the amount of monitoring data.
Data is currently stored in text format, which is ill-suited to long-term storage and transfer of data, and a large amount of monitoring data has to be discarded or deleted periodically because there are insufficient funds to purchase enough hard disks.
In addition, because conventional compression software such as WinRAR is general-purpose, it cannot achieve a high compression ratio on monitoring data.
Therefore, how to increase the data compression ratio and improve hard disk utilization has become a technical problem that urgently needs to be solved.
Disclosure of Invention
In view of the foregoing, it is desirable to provide a method, an apparatus, a computer device, a computer readable storage medium, and a computer program product for compressing text data, which can store compressed data files, improve the utilization of storage space, have a higher compression ratio, and occupy less storage space.
A method of compressing text data, the method comprising:
acquiring text data to be compressed;
converting the text data to be compressed into a plurality of units to be compressed with the length of two bytes;
converting each unit to be compressed into a first char character unit and a second char character unit;
shifting the first char character unit to the left by a preset number of bits to obtain a third char character unit;
adding the third char character unit and the corresponding second char character unit to obtain a compressed character;
and splicing the compressed characters in sequence to obtain compressed data.
In one embodiment, the text data is character string data, and the converting the text data to be compressed into a plurality of units to be compressed with a length of two bytes includes:
converting the character string data into array data of a plurality of bytes;
traversing the array data of the bytes by taking the two bytes as step length to obtain a plurality of units to be compressed with the length of two bytes.
In one embodiment, the method further comprises:
when the length of the array data is an odd number of bytes, assigning zero to the second byte of the last unit to be compressed.
In one embodiment, the adding the third char character unit and the corresponding second char character unit to obtain the compressed character includes:
adding the third char character unit and the second char character unit to obtain an added character unit;
and converting the added character units into char character units to obtain compressed characters.
In one embodiment, the method comprises:
and carrying out secondary compression on the compressed data through a specific component to obtain a secondary compression result.
In one embodiment, the method further comprises:
and calculating and storing the Hash value of the secondary compression result.
An apparatus for compressing text data, the apparatus comprising:
the text data acquisition module is used for acquiring text data to be compressed;
the unit to be compressed converting module is used for converting the text data to be compressed into a plurality of units to be compressed with the length of two bytes;
the char character unit conversion module is used for converting each unit to be compressed into a first char character unit and a second char character unit;
a left shift module, configured to shift the first char character unit by a preset number of bits to the left to obtain a third char character unit;
the adding module is used for adding the third char character unit and the corresponding second char character unit to obtain a compressed character;
and the splicing module is used for splicing the compressed characters in sequence to obtain compressed data.
A computer device comprising a memory and a processor, the memory storing a computer program, the processor implementing the following steps when executing the computer program:
acquiring text data to be compressed;
converting the text data to be compressed into a plurality of units to be compressed with the length of two bytes;
converting each unit to be compressed into a first char character unit and a second char character unit;
shifting the first char character unit to the left by a preset number of bits to obtain a third char character unit;
adding the third char character unit and the corresponding second char character unit to obtain a compressed character;
and splicing the compressed characters in sequence to obtain compressed data.
In one embodiment, the processor, when executing the computer program, further performs the steps of:
converting the character string data into array data of a plurality of bytes;
traversing the array data of the bytes by taking the two bytes as step length to obtain a plurality of units to be compressed with the length of two bytes.
In one embodiment, the processor, when executing the computer program, further performs the steps of:
when the length of the array data is an odd number of bytes, assigning zero to the second byte of the last unit to be compressed.
In one embodiment, the processor, when executing the computer program, further performs the steps of:
adding the third char character unit and the second char character unit to obtain an added character unit;
and converting the added character units into char character units to obtain compressed characters.
In one embodiment, the processor, when executing the computer program, further performs the steps of:
and carrying out secondary compression on the compressed data through a specific component to obtain a secondary compression result.
In one embodiment, the processor, when executing the computer program, further performs the steps of:
and calculating and storing the Hash value of the secondary compression result.
A computer-readable storage medium, on which a computer program is stored which, when executed by a processor, carries out the steps of:
acquiring text data to be compressed;
converting the text data to be compressed into a plurality of units to be compressed with the length of two bytes;
converting each unit to be compressed into a first char character unit and a second char character unit;
shifting the first char character unit to the left by a preset number of bits to obtain a third char character unit;
adding the third char character unit and the corresponding second char character unit to obtain a compressed character;
and splicing the compressed characters in sequence to obtain compressed data.
In one embodiment, the computer program when executed by the processor further performs the steps of:
converting the character string data into array data of a plurality of bytes;
traversing the array data of the bytes by taking the two bytes as step length to obtain a plurality of units to be compressed with the length of two bytes.
In one embodiment, the computer program when executed by the processor further performs the steps of:
when the length of the array data is an odd number of bytes, assigning zero to the second byte of the last unit to be compressed.
In one embodiment, the computer program when executed by the processor further performs the steps of:
adding the third char character unit and the second char character unit to obtain an added character unit;
and converting the added character units into char character units to obtain compressed characters.
In one embodiment, the computer program when executed by the processor further performs the steps of:
and carrying out secondary compression on the compressed data through a specific component to obtain a secondary compression result.
In one embodiment, the computer program when executed by the processor further performs the steps of:
and calculating and storing the Hash value of the secondary compression result.
A computer program product comprising a computer program which when executed by a processor performs the steps of:
acquiring text data to be compressed;
converting the text data to be compressed into a plurality of units to be compressed with the length of two bytes;
converting each unit to be compressed into a first char character unit and a second char character unit;
shifting the first char character unit to the left by a preset number of bits to obtain a third char character unit;
adding the third char character unit and the corresponding second char character unit to obtain a compressed character;
and splicing the compressed characters in sequence to obtain compressed data.
In one embodiment, the computer program when executed by the processor further performs the steps of:
converting the character string data into array data of a plurality of bytes;
traversing the array data of the bytes by taking the two bytes as step length to obtain a plurality of units to be compressed with the length of two bytes.
In one embodiment, the computer program when executed by the processor further performs the steps of:
when the length of the array data is an odd number of bytes, assigning zero to the second byte of the last unit to be compressed.
In one embodiment, the computer program when executed by the processor further performs the steps of:
adding the third char character unit and the second char character unit to obtain an added character unit;
and converting the added character units into char character units to obtain compressed characters.
In one embodiment, the computer program when executed by the processor further performs the steps of:
and carrying out secondary compression on the compressed data through a specific component to obtain a secondary compression result.
In one embodiment, the computer program when executed by the processor further performs the steps of:
and calculating and storing the Hash value of the secondary compression result.
The above text data compression method, apparatus, computer device, storage medium and computer program product store data as compressed files, which improves the utilization of storage space; the compression ratio is higher, less storage space is occupied, and bandwidth usage during network transmission can be effectively reduced.
Drawings
FIG. 1 is a flowchart illustrating a method for compressing text data according to an embodiment;
FIG. 2 is a schematic flow chart of the unit to be compressed obtaining step in one embodiment;
FIG. 3 is a flow diagram illustrating assignment steps in one embodiment;
FIG. 4 is a flow diagram illustrating the char unit addition step in one embodiment;
FIG. 5 is a schematic flow chart of the secondary compression step in one embodiment;
FIG. 6 is a block diagram showing a configuration of a compression apparatus for text data in one embodiment;
FIG. 7 is a diagram illustrating an internal structure of a computer device according to an embodiment.
Detailed Description
In order to make the objects, technical solutions and advantages of the present application more apparent, the present application is described in further detail below with reference to the accompanying drawings and embodiments. It should be understood that the specific embodiments described herein are merely illustrative of the present application and are not intended to limit the present application.
In an embodiment, as shown in fig. 1, a method for compressing text data is provided. This embodiment is illustrated by applying the method to a terminal; it is to be understood that the method may also be applied to a server, or to a system including a terminal and a server, in which case it is implemented through interaction between the terminal and the server. In this embodiment, the method includes the following steps:
Step 101, acquiring text data to be compressed;
In this embodiment, the terminal first obtains the text data to be compressed. The text data may be data of various types, such as power quality monitoring data, which is not limited in this embodiment; further, the text data may be character string data.
The terminal can be, but is not limited to, various personal computers, notebook computers, smart phones, tablet computers, internet of things devices, portable wearable devices and the like. The server may be implemented as a stand-alone server or as a server cluster consisting of a plurality of servers.
Step 102, converting the text data to be compressed into a plurality of units to be compressed with the length of two bytes;
In practice, after the text data to be compressed is obtained, it can be converted into units to be compressed that are each two bytes long. First, the total length of the text data in bytes is calculated; the bytes are then grouped two by two, each group forming one unit to be compressed. Each unit to be compressed is two bytes long, and there may be a plurality of such units.
For example, when the total length of the text data is 26 bytes, the text data can be divided into 13 units to be compressed with a length of two bytes, and if the total length of the text data is 25 bytes, a dummy byte is added at the end and assigned as 0, and the text data is divided into 13 units to be compressed with a length of two bytes.
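The 26-byte and 25-byte cases above can be sketched in Python as follows. This is a minimal illustration under stated assumptions rather than the applicant's implementation; the function name `split_into_units` is hypothetical.

```python
from typing import List


def split_into_units(data: bytes) -> List[bytes]:
    """Split data into two-byte units, zero-padding the last unit if needed."""
    if len(data) % 2 == 1:
        data += b"\x00"  # dummy byte assigned the value 0
    return [data[i:i + 2] for i in range(0, len(data), 2)]


units_even = split_into_units(bytes(26))    # 26 bytes, no padding needed
units_odd = split_into_units(b"x" * 25)     # 25 bytes, one dummy byte appended
print(len(units_even), len(units_odd))      # 13 13
```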
Step 103, converting each unit to be compressed into a first char character unit and a second char character unit;
After the plurality of two-byte units to be compressed is obtained, each unit can be converted into a first char character unit and a second char character unit, that is, each byte of the unit is converted into its own char character unit. Here, char is a basic data type in computer programming languages that holds a single character.
Specifically, the two bytes of data in each unit to be compressed may be assigned to two separate byte units, and the two separate byte units may be converted into a first char character unit and a second char character unit.
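As a rough sketch of this conversion: Python has no fixed-width char type, so the example below models a char character unit as an integer code unit. That modelling choice and the helper name `to_char_units` are assumptions made for illustration only.

```python
from typing import Tuple


def to_char_units(unit: bytes) -> Tuple[int, int]:
    """Return the two bytes of a unit as two char units (modelled as ints)."""
    if len(unit) != 2:
        raise ValueError("a unit to be compressed must be exactly two bytes")
    first_byte, second_byte = unit[0], unit[1]    # two separate byte units
    first_char = int(first_byte)                  # widened to a char unit
    second_char = int(second_byte)
    return first_char, second_char


print(to_char_units(b"AB"))  # (65, 66)
```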
Step 104, shifting the first char character unit to the left by a preset number of bits to obtain a third char character unit;
In this embodiment, the first char character unit is shifted to the left by a preset number of bits to obtain the third char character unit. The preset number of bits may be 8, which is not limited in this embodiment.
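The shift can be illustrated with a short sketch. It assumes a 16-bit char (as in C# or Java), which the text does not state explicitly, so the result is masked to 16 bits; `PRESET_SHIFT_BITS` and `shift_first_char` are hypothetical names.

```python
PRESET_SHIFT_BITS = 8  # the value given in this embodiment; other values are not excluded


def shift_first_char(first_char: int, shift_bits: int = PRESET_SHIFT_BITS) -> int:
    """Shift the first char unit left and keep the low 16 bits (assumed char width)."""
    return (first_char << shift_bits) & 0xFFFF


print(hex(shift_first_char(0x41)))  # 0x4100
```

With a shift of 8, the first byte moves into the high half of the 16-bit value, leaving the low 8 bits free for the second char character unit.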
Step 105, adding the third char character unit and the corresponding second char character unit to obtain a compressed character;
Further, in this embodiment, the third char character unit is added to the corresponding second char character unit, and the result is then forcibly converted (cast) to a char character unit to obtain a compressed character.
Steps 103 to 105 are repeated until every two-byte unit to be compressed has been converted into char character units and shifted and added to obtain its corresponding compressed character.
For example, when the total length of the text data is 26 bytes and the text data is divided into 13 two-byte units to be compressed, each unit is converted into a first char character unit and a second char character unit, and the shift and add operations are then applied to each of the 13 units, yielding 13 compressed characters.
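The 26-byte example can be traced with the following sketch, which repeats steps 103 to 105 over every unit. It assumes a shift of 8 bits and a 16-bit char; the function name `compress_units` is hypothetical.

```python
from typing import List


def compress_units(units: List[bytes]) -> List[int]:
    """Apply steps 103-105 to each two-byte unit and collect the compressed chars."""
    compressed = []
    for unit in units:
        first_char, second_char = unit[0], unit[1]                # step 103
        third_char = (first_char << 8) & 0xFFFF                   # step 104
        compressed.append((third_char + second_char) & 0xFFFF)    # step 105
    return compressed


data = b"abcdefghijklmnopqrstuvwxyz"                        # 26 bytes
units = [data[i:i + 2] for i in range(0, len(data), 2)]     # 13 units
codes = compress_units(units)
print(len(units), len(codes))  # 13 13
print(hex(codes[0]))           # 0x6162
```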
Step 106, splicing the compressed characters in sequence to obtain compressed data.
All the compressed characters are spliced in order to obtain the compressed data, and the compressed result is written into a data file.
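A minimal sketch of the splicing step, again modelling each compressed character as a 16-bit code unit; the helper name `splice` and the sample values are illustrative assumptions.

```python
from typing import List


def splice(compressed_chars: List[int]) -> str:
    """Join the compressed characters, in order, into one string."""
    return "".join(chr(code) for code in compressed_chars)


codes = [0x4142, 0x4344]        # packed from the four bytes of "ABCD"
compressed = splice(codes)
print(len(compressed))          # 2
```

Four input bytes become two characters here. The source says the result is written to a data file but does not name the character encoding used, so the size on disk depends on that choice.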
In the above text data compression method, data is stored as compressed files, which improves the utilization of storage space; the compression ratio is higher, less storage space is occupied, and bandwidth usage during network transmission can be effectively reduced. The method also provides a certain degree of data security as long as the compression method is kept confidential.
In an embodiment, fig. 2 is a schematic flow chart of the step of obtaining the units to be compressed in this embodiment, which includes:
Step 201, converting the character string data into array data of a plurality of bytes;
In this embodiment, the text data is character string data, and the character string data is converted into array data of a plurality of bytes, that is, into a byte array whose length may be n.
Step 202, traversing the array data of the plurality of bytes by taking two bytes as step length to obtain a plurality of units to be compressed with the length of two bytes.
Further, in this embodiment, the step length is set to two bytes, and the n-byte array is traversed to obtain n/2 units to be compressed, each two bytes long (when n is odd, the count is rounded up, because the last unit is padded with a zero byte).
Specifically, fig. 3 is a schematic flow chart of the assignment step in this embodiment, which includes:
Step 301, when the length of the array data is an odd number of bytes, assigning zero to the second byte of the last unit to be compressed.
That is, when n is odd, one additional byte is appended to the byte array and assigned zero; in other words, a dummy byte with the value zero is added to the last unit to be compressed. This solves the problem of dividing the data into two-byte units when the length is odd and improves compression efficiency.
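To make the unit count concrete, the sketch below converts a string to a byte array and counts the two-byte units, rounding up when n is odd. The UTF-8 encoding is an assumption (the source does not name one), as are the function name `count_units` and the sample strings.

```python
def count_units(text: str, encoding: str = "utf-8") -> int:
    """Number of two-byte units for a string, counting the padded last unit."""
    n = len(text.encode(encoding))   # length n of the byte array (step 201)
    return (n + 1) // 2              # n/2 rounded up: odd n gains one padded unit


print(count_units("voltage=220"))    # 11 bytes -> 6
print(count_units("voltage=220V"))   # 12 bytes -> 6
```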
In a specific implementation, fig. 4 is a schematic flow chart of the char character unit adding step in this embodiment, which includes:
Step 401, adding the third char character unit and the second char character unit to obtain an added character unit;
First, the third char character unit, that is, the shifted first char character unit, is added to the second char character unit to obtain an added character unit.
Step 402, converting the added character units into char character units to obtain compressed characters.
Further, in this embodiment, the added character unit is forcibly converted (cast) to a char character unit to obtain a compressed character.
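Steps 401 and 402 might look like the sketch below. The forced conversion is modelled as a mask to 16 bits, on the assumption that char is a 16-bit type; `add_and_cast` is a hypothetical name.

```python
def add_and_cast(third_char: int, second_char: int) -> int:
    """Add the shifted first char and the second char, then cast to 16 bits."""
    added = third_char + second_char   # step 401: added character unit
    return added & 0xFFFF              # step 402: forced conversion to char


third, second = 0x4100, 0x42
print(hex(add_and_cast(third, second)))                   # 0x4142
print(add_and_cast(third, second) == (third | second))    # True
```

When the shift is 8 bits and the second char character unit holds a single byte, the two operands share no set bits, so the addition gives the same result as a bitwise OR.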
In an embodiment of practical application, fig. 5 is a schematic flow chart of the secondary compression step in this embodiment, which includes:
Step 501, performing secondary compression on the compressed data through a specific component to obtain a secondary compression result.
The specific component may be, for example, SharpZipLib or WinZip; secondary compression may also be performed by other components, and this embodiment does not limit the type of the specific component.
In detail, SharpZipLib is a C# class library mainly used for compressing and decompressing formats such as Zip, GZip, BZip2 and Tar; it is implemented as a managed assembly and can be conveniently used in other projects. WinZip is a powerful and easy-to-use compression program that supports compressing and decompressing files in ZIP, CAB, TAR, GZIP, MIME and other formats.
Further, after the secondary compression operation produces the secondary compression result, a Hash value of the secondary compression result is calculated and stored.
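The secondary compression and Hash steps can be sketched with Python's standard library. Note the substitutions: `zlib` stands in for SharpZipLib or WinZip, and SHA-256 is an assumed choice because the source does not name a hash algorithm. The function names and the sample payload are hypothetical.

```python
import hashlib
import zlib


def secondary_compress(compressed_data: bytes) -> bytes:
    """Second-stage compression; zlib is a stand-in for the 'specific component'."""
    return zlib.compress(compressed_data, 9)


def hash_of(result: bytes) -> str:
    """Hash of the secondary compression result (SHA-256 is an assumption)."""
    return hashlib.sha256(result).hexdigest()


first_stage = b"220.1,219.8,220.3;" * 200   # stand-in for first-stage output
second_stage = secondary_compress(first_stage)
stored_hash = hash_of(second_stage)
print(len(second_stage) < len(first_stage))
print(len(stored_hash))  # 64
```

Storing the hash alongside the file allows a later integrity check: recompute the hash of the stored file and compare it with the saved value.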
In another preferred embodiment, the secondary compression result can be compressed again to obtain a tertiary compression result, so that the compression ratio is further improved, less storage space is occupied, bandwidth occupation can be more effectively reduced, and transmission efficiency is improved.
In order that those skilled in the art may better understand the core concept of the present application, a specific example is given below.
The text data compression method comprises the following steps:
Step 1, reading the character string content of a data file to be compressed;
Step 2, converting the character string content from step 1 into a byte array of length n;
Step 3, traversing the byte array obtained in step 2 with a step length of 2 bytes to obtain n/2 units to be compressed; if n is an odd number, the second byte of the last unit to be compressed is assigned 0;
Step 4, taking the 1st byte and the 2nd byte of each 2-byte unit to be compressed obtained in step 3, assigning them to the 1st and 2nd byte units to be calculated respectively, and forcibly converting these into the 1st and 2nd char character units to be calculated respectively;
Step 5, shifting the 1st char character unit to be calculated to the left by 8 bits to obtain the 1st calculated char character unit;
Step 6, adding the 1st calculated char character unit obtained in step 5 and the 2nd char character unit to be calculated obtained in step 4, and then forcibly converting the sum into a char to obtain a compressed character unit;
Step 7, repeating steps 3 to 6 until all the character string content has been compressed, splicing the results to obtain the compressed result, and writing it into a data file;
Step 8, compressing the data file from step 7 again using SharpZipLib to generate the secondarily compressed data file;
Step 9, calculating the Hash of the secondarily compressed data file and storing the Hash value.
In another preferred embodiment, step 8 may use another tool such as WinZip to perform the secondary compression, and the data obtained in step 8 may be compressed yet again after step 8 to further improve compression efficiency.
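Steps 1 to 9 of the example can be strung together as in the sketch below. It is an illustration under several assumptions that the source does not specify: UTF-8 for the string-to-bytes conversion, UTF-16 big-endian for serialising the spliced characters, `zlib` in place of SharpZipLib, and SHA-256 as the Hash. The function name `compress_text` is hypothetical, and file reading and writing are replaced by in-memory values.

```python
import hashlib
import zlib
from typing import Tuple


def compress_text(text: str) -> Tuple[bytes, str]:
    """Run steps 1-9 in memory and return (recompressed bytes, hash value)."""
    data = text.encode("utf-8")                       # steps 1-2: string -> byte array
    if len(data) % 2 == 1:                            # step 3: pad odd length with 0
        data += b"\x00"
    chars = []
    for i in range(0, len(data), 2):                  # step 3: traverse in steps of 2
        first_char, second_char = data[i], data[i + 1]         # step 4
        shifted = (first_char << 8) & 0xFFFF                   # step 5
        chars.append(chr((shifted + second_char) & 0xFFFF))    # step 6
    spliced = "".join(chars)                          # step 7: splice the results
    # Assumed serialisation; surrogatepass keeps code units in the D800-DFFF range.
    payload = spliced.encode("utf-16-be", errors="surrogatepass")
    recompressed = zlib.compress(payload)             # step 8: secondary compression
    digest = hashlib.sha256(recompressed).hexdigest() # step 9: Hash of the result
    return recompressed, digest


blob, digest = compress_text("U=220.4;I=5.1;" * 100)
print(len(blob), digest[:16])
```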
The data file compression method adopted by this embodiment has a higher compression ratio, occupies less storage space, and can effectively reduce bandwidth usage during network transmission. It also provides a certain degree of data security as long as the compression method is kept confidential.
It should be understood that although the steps in the flow charts of figs. 1-5 are shown in the order indicated by the arrows, they are not necessarily performed in that order. Unless explicitly stated otherwise herein, the steps are not strictly limited to the order shown and may be performed in other orders. Moreover, at least some of the steps in figs. 1-5 may include multiple sub-steps or stages, which are not necessarily performed at the same time but may be performed at different times, and which are not necessarily performed in sequence but may be performed in turn or alternately with other steps or with at least some of the sub-steps or stages of other steps.
In one embodiment, as shown in fig. 6, there is provided an apparatus for compressing text data, the apparatus including:
a text data obtaining module 601, configured to obtain text data to be compressed;
a to-be-compressed unit conversion module 602, configured to convert the to-be-compressed text data into a plurality of to-be-compressed units with lengths of two bytes;
a char character unit conversion module 603, configured to convert each unit to be compressed into a first char character unit and a second char character unit;
a left shift module 604, configured to shift the first char character unit by a preset number of bits to the left to obtain a third char character unit;
an adding module 605, configured to add the third char character unit and the corresponding second char character unit to obtain a compressed character;
and a splicing module 606, configured to splice the compressed characters in sequence to obtain compressed data.
In one embodiment, the text data is character string data, and the to-be-compressed unit conversion module includes:
the array data conversion submodule is used for converting the character string data into array data of a plurality of bytes;
and the unit to be compressed obtaining submodule is used for traversing the array data of the bytes by taking the two bytes as step length to obtain a plurality of units to be compressed with the length of two bytes.
In one embodiment, the apparatus comprises:
and the assignment module is used for assigning zero to the second byte of the last unit to be compressed when the length of the array data is an odd number of bytes.
In one embodiment, the adding module comprises:
the adding submodule is used for adding the third char character unit and the second char character unit to obtain an added character unit;
and the conversion submodule is used for converting the added character units into char character units to obtain compressed characters.
In one embodiment, the apparatus comprises:
and the secondary compression module is used for carrying out secondary compression on the compressed data through a specific component to obtain a secondary compression result.
In one embodiment, the apparatus comprises:
and the Hash value calculating module is used for calculating and storing the Hash value of the secondary compression result.
For the specific definition of the compression apparatus of the text data, reference may be made to the above definition of the compression method of the text data, which is not repeated here. The modules in the above apparatus for compressing text data may be implemented entirely or partially by software, hardware, or a combination thereof. The modules may be embedded in, or independent of, a processor in the computer device in hardware form, or may be stored in a memory in the computer device in software form, so that the processor can call them and execute the operations corresponding to each module.
In one embodiment, a computer device is provided, which may be a terminal, and its internal structure diagram may be as shown in fig. 7. The computer device includes a processor, a memory, a communication interface, a display screen, and an input device connected by a system bus. Wherein the processor of the computer device is configured to provide computing and control capabilities. The memory of the computer device comprises a nonvolatile storage medium and an internal memory. The non-volatile storage medium stores an operating system and a computer program. The internal memory provides an environment for the operation of an operating system and computer programs in the non-volatile storage medium. The communication interface of the computer device is used for carrying out wired or wireless communication with an external terminal, and the wireless communication can be realized through WIFI, a mobile cellular network, NFC (near field communication) or other technologies. The computer program is executed by a processor to implement a method of compressing text data. The display screen of the computer equipment can be a liquid crystal display screen or an electronic ink display screen, and the input device of the computer equipment can be a touch layer covered on the display screen, a key, a track ball or a touch pad arranged on the shell of the computer equipment, an external keyboard, a touch pad or a mouse and the like.
Those skilled in the art will appreciate that the architecture shown in fig. 7 is merely a block diagram of some of the structures associated with the disclosed aspects and is not intended to limit the computing devices to which the disclosed aspects apply, as particular computing devices may include more or fewer components than those shown, or may combine certain components, or have a different arrangement of components.
A computer device comprising a memory and a processor, the memory storing a computer program, the processor implementing the following steps when executing the computer program:
acquiring text data to be compressed;
converting the text data to be compressed into a plurality of units to be compressed with the length of two bytes;
converting each unit to be compressed into a first char character unit and a second char character unit;
shifting the first char character unit to the left by a preset number of bits to obtain a third char character unit;
adding the third char character unit and the corresponding second char character unit to obtain a compressed character;
and splicing the compressed characters in sequence to obtain compressed data.
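The steps above can be sketched in Java as follows. This is a minimal illustration under stated assumptions, not the implementation described by the application: the class and method names are hypothetical, the preset shift is assumed to be 8 bits, and UTF-8 is assumed as the encoding used to obtain bytes from the text.

```java
import java.nio.charset.StandardCharsets;

final class PairPackSketch {
    // Assumed preset shift: 8 bits, so the first byte occupies the high half of a 16-bit char.
    static final int SHIFT_BITS = 8;

    static String compress(String text) {
        // Assumption: the text is turned into bytes with UTF-8.
        byte[] bytes = text.getBytes(StandardCharsets.UTF_8);
        StringBuilder out = new StringBuilder((bytes.length + 1) / 2);
        for (int i = 0; i < bytes.length; i += 2) {
            // First and second char character units of the two-byte unit.
            char first = (char) (bytes[i] & 0xFF);
            char second = (i + 1 < bytes.length) ? (char) (bytes[i + 1] & 0xFF) : (char) 0;
            // Third char character unit: the first unit shifted left.
            char third = (char) (first << SHIFT_BITS);
            // Add, narrow back to char, and splice in order.
            out.append((char) (third + second));
        }
        return out.toString();
    }

    public static void main(String[] args) {
        String packed = compress("Hi!");
        // 'H' = 0x48, 'i' = 0x69, '!' = 0x21, padded with 0x00.
        System.out.printf("%04x %04x%n", (int) packed.charAt(0), (int) packed.charAt(1)); // prints "4869 2100"
    }
}
```

In this sketch each output character carries two input bytes, so the number of characters is half the number of bytes, rounded up.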
In one embodiment, the processor, when executing the computer program, further performs the steps of:
converting the character string data into array data of a plurality of bytes;
traversing the array data of the bytes by taking the two bytes as step length to obtain a plurality of units to be compressed with the length of two bytes.
In one embodiment, the processor, when executing the computer program, further performs the steps of:
and when the number of bytes in the array data is odd, assigning zero to the second byte of the last unit to be compressed.
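The two preceding embodiments (traversing the byte array with a step of two, and zero-filling the last unit when the length is odd) might look like this in Java. The names `UnitSplitSketch` and `toUnits` are hypothetical, and representing each unit as a two-element `byte[]` is an assumption made for illustration.

```java
import java.util.ArrayList;
import java.util.List;

final class UnitSplitSketch {
    // Splits the byte array into two-byte units, walking it with a step of two.
    static List<byte[]> toUnits(byte[] data) {
        List<byte[]> units = new ArrayList<>();
        for (int i = 0; i < data.length; i += 2) {
            // For an odd-length array the last unit has no second byte; use zero.
            byte second = (i + 1 < data.length) ? data[i + 1] : 0;
            units.add(new byte[] { data[i], second });
        }
        return units;
    }
}
```

For example, the three-byte array `{1, 2, 3}` yields the units `{1, 2}` and `{3, 0}`.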
In one embodiment, the processor, when executing the computer program, further performs the steps of:
adding the third char character unit and the second char character unit to obtain an added character unit;
and converting the added character units into char character units to obtain compressed characters.
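The add-then-convert step matters in a language such as Java, where adding two `char` values produces an `int` that has to be narrowed back to `char`. The sketch below illustrates this; the names are hypothetical, and masking each byte with `0xFF` is an assumption, since the text does not say how a byte becomes a char character unit.

```java
final class AddAndCastSketch {
    // Converts one byte into a char character unit.
    // The mask keeps bytes above 0x7F from being sign-extended into the high byte.
    static char toCharUnit(byte b) {
        return (char) (b & 0xFF);
    }

    // Adds the shifted (third) unit and the second unit, then narrows to char.
    static char combine(char third, char second) {
        int added = third + second; // char + char is promoted to int in Java
        return (char) added;        // convert the added unit back to a 16-bit char
    }
}
```

With the bytes `0xE4` and `0xB8` and an assumed shift of 8, the third unit is `0xE400`, the second unit is `0x00B8`, and the compressed character is `0xE4B8`. Without the mask, `(char) (byte) 0xB8` would be `0xFFB8`, and the sum would narrow to `0xE3B8`, which corrupts the high byte.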
In one embodiment, the processor, when executing the computer program, further performs the steps of:
and carrying out secondary compression on the compressed data through a specific component to obtain a secondary compression result.
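The text does not name the specific component used for secondary compression. As one possible illustration, the sketch below assumes GZIP from the Java standard library (`java.util.zip.GZIPOutputStream`); the class and method names are hypothetical, and writing each compressed character as two bytes, high byte first, is likewise an assumption.

```java
import java.io.ByteArrayOutputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.zip.GZIPOutputStream;

final class SecondaryCompressionSketch {
    // Applies GZIP (an assumed stand-in for the "specific component")
    // to the compressed data and returns the secondary compression result.
    static byte[] secondaryCompress(String compressedData) {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(new GZIPOutputStream(buffer))) {
            // writeChars emits every char as two bytes, high byte first.
            out.writeChars(compressedData);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return buffer.toByteArray();
    }
}
```

Reading the result back through `GZIPInputStream` and `DataInputStream.readChar` recovers the compressed characters in order.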
In one embodiment, the processor, when executing the computer program, further performs the steps of:
and calculating and storing the hash value of the secondary compression result.
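The hash algorithm and the storage location are not specified in this passage. The following sketch assumes SHA-256 via `java.security.MessageDigest` and a hypothetical in-memory map keyed by the hex digest; both choices are illustrative only.

```java
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.HashMap;
import java.util.Map;

final class HashStoreSketch {
    // Hypothetical in-memory store: hex hash value -> secondary compression result.
    private final Map<String, byte[]> store = new HashMap<>();

    // Computes the SHA-256 digest (assumed algorithm) as a lowercase hex string.
    static String sha256Hex(byte[] data) {
        try {
            byte[] digest = MessageDigest.getInstance("SHA-256").digest(data);
            StringBuilder hex = new StringBuilder(digest.length * 2);
            for (byte b : digest) {
                hex.append(String.format("%02x", b & 0xFF));
            }
            return hex.toString();
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException(e);
        }
    }

    // Calculates the hash value of the result, stores it, and returns it.
    String put(byte[] secondaryResult) {
        String hash = sha256Hex(secondaryResult);
        store.put(hash, secondaryResult);
        return hash;
    }

    byte[] get(String hash) {
        return store.get(hash);
    }
}
```

A stored hash value can later be recomputed over the same bytes to check that the secondary compression result has not changed.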
A computer-readable storage medium, on which a computer program is stored which, when executed by a processor, carries out the steps of:
acquiring text data to be compressed;
converting the text data to be compressed into a plurality of units to be compressed with the length of two bytes;
converting each unit to be compressed into a first char character unit and a second char character unit;
shifting the first char character unit to the left by a preset number of bits to obtain a third char character unit;
adding the third char character unit and the corresponding second char character unit to obtain a compressed character;
and splicing the compressed characters in sequence to obtain compressed data.
In one embodiment, the computer program when executed by the processor further performs the steps of:
converting the character string data into array data of a plurality of bytes;
traversing the array data of the bytes by taking the two bytes as step length to obtain a plurality of units to be compressed with the length of two bytes.
In one embodiment, the computer program when executed by the processor further performs the steps of:
and when the number of bytes in the array data is odd, assigning zero to the second byte of the last unit to be compressed.
In one embodiment, the computer program when executed by the processor further performs the steps of:
adding the third char character unit and the second char character unit to obtain an added character unit;
and converting the added character units into char character units to obtain compressed characters.
In one embodiment, the computer program when executed by the processor further performs the steps of:
and carrying out secondary compression on the compressed data through a specific component to obtain a secondary compression result.
In one embodiment, the computer program when executed by the processor further performs the steps of:
and calculating and storing the hash value of the secondary compression result.
A computer program product comprising a computer program which when executed by a processor performs the steps of:
acquiring text data to be compressed;
converting the text data to be compressed into a plurality of units to be compressed with the length of two bytes;
converting each unit to be compressed into a first char character unit and a second char character unit;
shifting the first char character unit to the left by a preset number of bits to obtain a third char character unit;
adding the third char character unit and the corresponding second char character unit to obtain a compressed character;
and splicing the compressed characters in sequence to obtain compressed data.
In one embodiment, the computer program when executed by the processor further performs the steps of:
converting the character string data into array data of a plurality of bytes;
traversing the array data of the bytes by taking the two bytes as step length to obtain a plurality of units to be compressed with the length of two bytes.
In one embodiment, the computer program when executed by the processor further performs the steps of:
and when the number of bytes in the array data is odd, assigning zero to the second byte of the last unit to be compressed.
In one embodiment, the computer program when executed by the processor further performs the steps of:
adding the third char character unit and the second char character unit to obtain an added character unit;
and converting the added character units into char character units to obtain compressed characters.
In one embodiment, the computer program when executed by the processor further performs the steps of:
and carrying out secondary compression on the compressed data through a specific component to obtain a secondary compression result.
In one embodiment, the computer program when executed by the processor further performs the steps of:
and calculating and storing the hash value of the secondary compression result.
It should be noted that the user information (including but not limited to user device information, user personal information, etc.) and data (including but not limited to data for analysis, stored data, presented data, etc.) involved in the present application are information and data authorized by the user or fully authorized by all parties.
It will be understood by those skilled in the art that all or part of the processes of the methods in the above embodiments can be implemented by a computer program instructing the relevant hardware. The computer program can be stored in a non-volatile computer-readable storage medium and, when executed, can include the processes of the method embodiments described above. Any reference to memory, storage, database, or other medium used in the embodiments provided herein can include at least one of non-volatile and volatile memory. Non-volatile memory may include read-only memory (ROM), magnetic tape, floppy disk, flash memory, optical storage, or the like. Volatile memory can include random access memory (RAM) or external cache memory. By way of illustration and not limitation, RAM can take many forms, such as static random access memory (SRAM) or dynamic random access memory (DRAM).
The technical features of the above embodiments may be combined arbitrarily. For brevity, not all possible combinations of these technical features are described; however, any combination that involves no contradiction should be considered within the scope of this specification.
The above embodiments express only several implementations of the present application. Their description is relatively specific and detailed, but it should not be construed as limiting the scope of the invention. It should be noted that a person skilled in the art can make several variations and improvements without departing from the concept of the present application, all of which fall within the scope of protection of the present application. Therefore, the scope of protection of this patent shall be subject to the appended claims.

Claims (10)

1. A method for compressing text data, the method comprising:
acquiring text data to be compressed;
converting the text data to be compressed into a plurality of units to be compressed with the length of two bytes;
converting each unit to be compressed into a first char character unit and a second char character unit;
shifting the first char character unit to the left by a preset number of bits to obtain a third char character unit;
adding the third char character unit and the corresponding second char character unit to obtain a compressed character;
and splicing the compressed characters in sequence to obtain compressed data.
2. The method according to claim 1, wherein the text data is character string data, and the converting the text data to be compressed into a plurality of units to be compressed with a length of two bytes comprises:
converting the character string data into array data of a plurality of bytes;
traversing the array data of the bytes by taking the two bytes as step length to obtain a plurality of units to be compressed with the length of two bytes.
3. The method of claim 2, further comprising:
and when the number of bytes in the array data is odd, assigning zero to the second byte of the last unit to be compressed.
4. The method as recited in claim 1, wherein adding the third char character unit to the corresponding second char character unit to obtain the compressed character comprises:
adding the third char character unit and the second char character unit to obtain an added character unit;
and converting the added character units into char character units to obtain compressed characters.
5. The method of claim 1, further comprising:
and carrying out secondary compression on the compressed data through a specific component to obtain a secondary compression result.
6. The method of claim 5, further comprising:
and calculating and storing the hash value of the secondary compression result.
7. An apparatus for compressing text data, the apparatus comprising:
a text data acquisition module, configured to acquire text data to be compressed;
a to-be-compressed unit conversion module, configured to convert the text data to be compressed into a plurality of units to be compressed with the length of two bytes;
a char character unit conversion module, configured to convert each unit to be compressed into a first char character unit and a second char character unit;
a left shift module, configured to shift the first char character unit to the left by a preset number of bits to obtain a third char character unit;
an adding module, configured to add the third char character unit and the corresponding second char character unit to obtain a compressed character;
and a splicing module, configured to splice the compressed characters in sequence to obtain compressed data.
8. A computer device comprising a memory and a processor, the memory storing a computer program, characterized in that the processor, when executing the computer program, implements the steps of the method of any of claims 1 to 6.
9. A computer-readable storage medium, on which a computer program is stored, which, when being executed by a processor, carries out the steps of the method of any one of claims 1 to 6.
10. A computer program product comprising a computer program, characterized in that the computer program realizes the steps of the method of any one of claims 1 to 6 when executed by a processor.
CN202111271127.3A 2021-10-29 2021-10-29 Text data compression method and device, computer equipment and storage medium Pending CN114070325A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202111271127.3A CN114070325A (en) 2021-10-29 2021-10-29 Text data compression method and device, computer equipment and storage medium

Publications (1)

Publication Number Publication Date
CN114070325A true CN114070325A (en) 2022-02-18

Family

ID=80236041

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202111271127.3A Pending CN114070325A (en) 2021-10-29 2021-10-29 Text data compression method and device, computer equipment and storage medium

Country Status (1)

Country Link
CN (1) CN114070325A (en)

Similar Documents

Publication Publication Date Title
KR102069940B1 (en) Page-based compressed storage management
CN111177302B (en) Service bill processing method, device, computer equipment and storage medium
CN108628898B (en) Method, device and equipment for data storage
CN101261825B (en) A word library management method for mobile terminal system
KR102535450B1 (en) Data storage method and apparatus, and computer device and storage medium thereof
CN102841901A (en) Web page display method and device
US20180041224A1 (en) Data value suffix bit level compression
CN105631035A (en) Data storage method and device
CN109582231B (en) Data storage method and device, electronic equipment and storage medium
JP2019504426A (en) Method and apparatus for generating random character string
CN114529741A (en) Picture duplicate removal method and device and electronic equipment
US20220182072A1 (en) Data Compression Method and Apparatus, Computer-Readable Storage Medium, and Electronic Device
CN104408178A (en) Device and method for WEB control loading
CN112559462A (en) Data compression method and device, computer equipment and storage medium
CN111158606B (en) Storage method, storage device, computer equipment and storage medium
CN114070325A (en) Text data compression method and device, computer equipment and storage medium
KR102236521B1 (en) Method and apparatus for processing data
CN112905575A (en) Data acquisition method, system, storage medium and electronic equipment
Chen et al. The real-time compression layer for flash memory in mobile multimedia devices
CN117375627B (en) Lossless compression method and system for plain text format data suitable for character strings
CN109471855B (en) Ship data index establishing method, loading method, device and computer equipment
US11940998B2 (en) Database compression oriented to combinations of record fields
CN112073174B (en) Communication account decryption method, device, equipment, storage medium and information interaction system
CN113553300B (en) File processing method and device, readable medium and electronic equipment
CN117834613A (en) Data transmission method, device, computer equipment and storage medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination