CN110569487B - Base64 expansion coding method and system based on high-frequency character substitution algorithm - Google Patents
- Publication number
- CN110569487B (application CN201910762390.9A, also published as CN110569487A)
- Authority
- CN
- China
- Prior art keywords
- character
- frequency
- high frequency
- characters
- bit
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Active
Landscapes
- Compression, Expansion, Code Conversion, And Decoders (AREA)
Abstract
The invention provides a Base64 expansion coding method and system based on a high-frequency character substitution algorithm. Starting from the 1/3 expansion ratio of original Base64 transcoding, the invention substitutes single characters for high-frequency characters; calculated at a 40% substitution rate, the expansion ratio is 20%, well below 33.3%, which improves conversion economy and saves expansion space when large volumes of data are transcoded. The invention replaces the shift-based transcoding of Base64 with a D×64+d algorithm, so that each code is independent of the other codes and an abnormality in one code does not prevent the subsequent codes from being recovered. The conversion range covers all ASCII data fields and suits binary data transcoding in a variety of applications. Because the invention concerns only a conversion algorithm, it is independent of the implementation language and platform and has good compatibility. No end padding is required, so the algorithm is simpler.
Description
Technical Field
The invention relates to the technical field of character coding, in particular to a Base64 expansion coding method and system based on a high-frequency character substitution algorithm.
Background
It is well known that all data in a computer is stored and processed as binary data, and that ASCII encoding is widely adopted to facilitate information exchange between different hosts on a network. ASCII values 0 to 31 and 127 are invisible (control) characters. Data exchanged over a network often passes through several routing devices, and because different devices handle characters in different ways, invisible characters may be processed incorrectly, which hinders transmission.
The same problem arises in everyday applications. Some application scenarios, such as conventional mail systems, accept only the printable characters of ASCII, so ASCII control characters cannot be sent by mail. This is a serious limitation: the bytes of a picture's binary stream are not all visible characters, so a picture cannot be transferred directly.
The best way to solve the above problems without changing the conventional protocols is to introduce a coding scheme that supports the transmission of binary files by representing invisible characters with visible characters. Base64 is a method of representing binary data with 64 visible characters. Although it can transcode binary data into visible characters, it has the following disadvantages:
1. the encoded output is 1/3 larger than the source; the more source codes there are, the more space the converted data occupies, and for transcoding large volumes of data this inefficiency is a significant defect that cannot be ignored;
2. it uses a bit-shifting algorithm that spans adjacent codes, so once an error occurs, the subsequent codes of the same unit cannot be restored;
3. the end padding character "=" must be appended when the number of source codes is not an integer multiple of 3.
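As a baseline for the first and third points, the following minimal Python sketch uses the standard-library `base64` module to show the 4-for-3 growth and the `=` padding of standard Base64. The helper name `base64_expansion` is introduced here for illustration only.

```python
import base64


def base64_expansion(n_bytes: int) -> float:
    """Return (encoded length / source length) - 1 for standard Base64."""
    encoded = base64.b64encode(bytes(n_bytes))
    return len(encoded) / n_bytes - 1


# 3 source bytes become 4 characters, with no padding needed.
print(base64.b64encode(b"abc"))
# 4 source bytes become 8 characters, two of which are "=" padding.
print(base64.b64encode(b"abcd"))
# For a length that is a multiple of 3 the growth is exactly one third.
print(base64_expansion(3000))
```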
Disclosure of Invention
The invention aims to provide a Base64 expansion coding method and system based on a high-frequency character substitution algorithm, so as to solve the problems of large occupied space and low efficiency caused by the size increase of Base64 coding in the prior art, effectively saving expansion space and improving conversion efficiency.
In order to achieve the technical purpose, the invention provides a Base64 expansion coding method based on a high-frequency character substitution algorithm, which comprises the following steps:
S1, traversing the binary character source codes to be converted to obtain a high-frequency character table;
S2, replacing the high-frequency characters in the binary characters to be converted by using a high-frequency character substitution table;
S3, for the non-high frequency characters, expanding by taking three non-high frequency characters as a group, converting according to the D×64+d algorithm, and mapping the converted non-high frequency characters according to a conversion mapping table, wherein D is the integer part of the character divided by 64, and d is the remainder of the character divided by 64.
Preferably, the specific conversion operation according to the D×64+d algorithm is as follows:
adding a character K with all bits initialized to 0, filling the D of the first non-high frequency character into the 6th and 5th bits of the character K, filling the D of the second non-high frequency character into the 4th and 3rd bits of the character K, and filling the D of the third non-high frequency character into the 2nd and 1st bits of the character K;
replacing the first non-high frequency character with its d, replacing the second non-high frequency character with its d, and replacing the third non-high frequency character with its d.
Preferably, the specific operation of traversing the binary character source code to be converted to obtain the high-frequency character table is as follows:
assuming each character occurs with frequency 1/256, the average number of occurrences of each character is the total number of source codes/256;
a character whose number of occurrences reaches three times the average number is set as a high-frequency character;
if the number of high-frequency characters obtained reaches the required number, the traversal stops before all source codes have been read; otherwise all characters are traversed and the high-frequency characters are selected according to occurrence frequency.
Preferably, when the last group contains fewer than 3 binary source codes, the integer part D of each vacant character is fixed to binary 00, and no remainder byte is produced for it.
The invention also provides a Base64 expansion coding system based on the high-frequency character substitution algorithm, which comprises:
the high-frequency character acquisition module is used for traversing binary character source codes to be converted to acquire a high-frequency character table;
the high-frequency character replacement module is used for replacing the high-frequency characters in the binary characters to be converted by using a high-frequency character replacement table;
and the non-high frequency character expansion module is used for expanding the non-high frequency characters by taking three non-high frequency characters as a group, converting according to the D×64+d algorithm, and mapping the converted non-high frequency characters according to a conversion mapping table, wherein D is the integer part of the character divided by 64, and d is the remainder of the character divided by 64.
Preferably, the non-high frequency character expansion module includes:
the expansion character filling unit is used for expanding by taking three non-high frequency characters as a group, adding one character K with all bits initialized to 0, filling the D of the first non-high frequency character into the 6th and 5th bits of the character K, filling the D of the second non-high frequency character into the 4th and 3rd bits of the character K, and filling the D of the third non-high frequency character into the 2nd and 1st bits of the character K;
an original non-high frequency character replacing unit for replacing the first non-high frequency character with its d, replacing the second non-high frequency character with its d, and replacing the third non-high frequency character with its d;
and the mapping unit is used for mapping the converted non-high frequency characters according to the conversion mapping table.
Preferably, the high frequency character acquisition module includes:
a high-frequency character determining unit for setting as a high-frequency character any character whose number of occurrences reaches three times the average number;
and the traversing unit is used for stopping the traversal before all source codes have been read if the number of high-frequency characters obtained reaches the required number, and otherwise traversing all characters and selecting the high-frequency characters according to occurrence frequency.
Preferably, when the last group contains fewer than 3 binary source codes, the integer part D of each vacant character is fixed to binary 00, and no remainder byte is produced for it.
The effects provided in the summary of the invention are merely effects of embodiments, not all effects of the invention, and one of the above technical solutions has the following advantages or beneficial effects:
compared with the prior art, the invention starts from the 1/3 expansion ratio of original Base64 transcoding and substitutes single characters for high-frequency characters; calculated at a 40% substitution rate, the expansion ratio is 20%, well below 33.3%, so that for transcoding large volumes of data the conversion economy is greatly improved and expansion space is effectively saved. The invention replaces the shift-based transcoding of Base64 with a D×64+d algorithm, so that each code is independent of the other codes and an abnormality in one code does not prevent the subsequent codes from being recovered. The conversion range covers all ASCII data fields and suits binary data transcoding in a variety of applications. Because the invention concerns only a conversion algorithm, it is independent of the implementation language and platform and has good compatibility. No end padding is required, so the algorithm is simpler.
Drawings
FIG. 1 is a flowchart of a Base64 extension encoding method based on a high frequency character substitution algorithm according to an embodiment of the present invention;
FIG. 2 is a schematic diagram of a converted binary data structure according to an embodiment of the present invention;
FIG. 3 is a block diagram of a Base64 extension coding system based on a high-frequency character substitution algorithm according to an embodiment of the present invention.
Detailed Description
In order to clearly illustrate the technical features of the present solution, the present invention will be described in detail below with reference to the following detailed description and the accompanying drawings. The following disclosure provides many different embodiments, or examples, for implementing different structures of the invention. In order to simplify the present disclosure, components and arrangements of specific examples are described below. Furthermore, the present invention may repeat reference numerals and/or letters in the various examples. This repetition is for the purpose of simplicity and clarity and does not in itself dictate a relationship between the various embodiments and/or configurations discussed. It should be noted that the components illustrated in the figures are not necessarily drawn to scale. Descriptions of well-known components and processing techniques and processes are omitted so as to not unnecessarily obscure the present invention.
The following describes a Base64 expansion coding method and system based on a high-frequency character substitution algorithm in detail by referring to the accompanying drawings.
As shown in fig. 1, the embodiment of the invention discloses a Base64 expansion coding method based on a high-frequency character substitution algorithm, which comprises the following steps:
S1, traversing the binary character source codes to be converted to obtain a high-frequency character table;
S2, replacing the high-frequency characters in the binary characters to be converted by using a high-frequency character substitution table;
S3, for the non-high frequency characters, expanding by taking three non-high frequency characters as a group, converting according to the D×64+d algorithm, and mapping the converted non-high frequency characters according to a conversion mapping table, wherein D is the integer part of the character divided by 64, and d is the remainder of the character divided by 64.
The binary character source codes to be converted are traversed to rank the characters by frequency. To obtain the high-frequency character table as quickly as possible, the following shortcut can be adopted:
assuming each character occurs with frequency 1/256, the average number of occurrences of each character is: total number of source codes/256;
a character whose number of occurrences reaches three times the average number is set as a high-frequency character;
if the number of high-frequency characters obtained reaches the required number, the traversal stops before all source codes have been read; otherwise all characters are traversed and the high-frequency characters are selected according to occurrence frequency.
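The selection rule above might be sketched in Python as follows. This is a simplified illustration rather than the patented procedure: it always counts the whole input instead of stopping early, it reads "three times the average" as "at least three times", and the function name and the `wanted` and `factor` parameters are hypothetical.

```python
from collections import Counter


def high_frequency_table(data: bytes, wanted: int = 10, factor: int = 3) -> list[int]:
    """Return up to `wanted` byte values, most frequent first, whose
    occurrence count is at least `factor` times the average count."""
    counts = Counter(data)
    average = len(data) / 256  # expected count per byte value if uniform
    ranked = [byte for byte, n in counts.most_common() if n >= factor * average]
    return ranked[:wanted]


sample = b"\x00" * 50 + b"\x01" * 30 + bytes(range(2, 22))
print(high_frequency_table(sample))
```

In this sample only the byte values 0 and 1 clear the threshold; every other value occurs once, which is below three times the average of 100/256.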
For the high-frequency characters, substitute characters are set to replace them, which saves transcoding space to the greatest extent. The substitute characters can be chosen according to the actual situation, provided that characters liable to cause ambiguity are excluded, as shown in Table 1.
TABLE 1
Frequency rank of character | Substitute character |
Character with the highest frequency | 0x23 (#) |
Character with the second highest frequency | 0x24 ($) |
Character with the third highest frequency | 0x25 (%) |
Character with the fourth highest frequency | 0x26 (&) |
Character with the fifth highest frequency | 0x2D (-) |
Character with the sixth highest frequency | 0x5F (_) |
Character with the seventh highest frequency | 0x5B ([) |
Character with the eighth highest frequency | 0x5D (]) |
Character with the ninth highest frequency | 0x7B ({) |
Character with the tenth highest frequency | 0x7D (}) |
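A substitution table of this kind can be built by pairing the ranked high-frequency bytes with the substitute characters. The sketch below takes the ten glyphs of Table 1; treating the brackets and braces as 0x5B, 0x5D, 0x7B and 0x7D is an assumption of this sketch, and the names `SUBSTITUTES` and `build_substitution` are hypothetical.

```python
# Substitute glyphs in rank order; none of them belongs to the
# 64-character mapping alphabet (A-Z, a-z, 0-9, "+", "/").
SUBSTITUTES = "#$%&-_[]{}"


def build_substitution(high_freq: list[int]) -> dict[int, str]:
    """Map each high-frequency byte value, most frequent first, to its substitute."""
    usable = high_freq[: len(SUBSTITUTES)]
    return {byte: SUBSTITUTES[rank] for rank, byte in enumerate(usable)}


print(build_substitution([0x00, 0xFF]))
```

Because the substitutes lie outside the mapping alphabet, a decoder can tell a substituted character from an expanded one. How the decoder learns which byte each substitute stands for is not described in the text.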
One character at a time is taken from the binary characters to be converted, and it is judged whether the character is a high-frequency character. A high-frequency character is replaced according to the high-frequency character substitution table. A non-high frequency character is converted according to the D×64+d algorithm, where D is the integer part of the original data divided by 64 and d is the remainder of the original data divided by 64.
The non-high frequency characters are expanded in groups of three. One character K is added with all bits initialized to 0. The 6th and 5th bits of K are filled with the D of the first non-high frequency character, i.e. the integer part of the first character divided by 64; the 4th and 3rd bits of K are filled with the D of the second non-high frequency character, i.e. the integer part of the second character divided by 64; and the 2nd and 1st bits of K are filled with the D of the third non-high frequency character, i.e. the integer part of the third character divided by 64. The first non-high frequency character is then replaced with its d, i.e. the remainder of the first character divided by 64; the second non-high frequency character is replaced with its d, i.e. the remainder of the second character divided by 64; and the third non-high frequency character is replaced with its d, i.e. the remainder of the third character divided by 64.
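For one full group, the D×64+d split and the packing of K can be sketched as below. The sketch assumes bits are numbered from 1 at the least significant end, so that "6th and 5th bits" means a left shift by 4; the function name `pack_group` is hypothetical.

```python
def pack_group(m1: int, m2: int, m3: int) -> tuple[int, int, int, int]:
    """Split three byte values into quotients D and remainders d modulo 64,
    and pack the three 2-bit quotients into one 6-bit value K."""
    (D1, d1), (D2, d2), (D3, d3) = (divmod(m, 64) for m in (m1, m2, m3))
    k = (D1 << 4) | (D2 << 2) | D3  # bits 6-5, 4-3, 2-1
    return k, d1, d2, d3


# 200 = 3*64 + 8, 65 = 1*64 + 1, 3 = 0*64 + 3
print(pack_group(200, 65, 3))  # (52, 8, 1, 3)
```

Each of the four results lies in the range 0 to 63, so each can be mapped to one printable character.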
Through this conversion, every three non-high frequency characters (M1, M2, M3) are expanded into four characters (C1, C2, C3, C4): C1 is the newly added character and stores the D values of the three non-high frequency characters, C2 stores the remainder d1 of the character M1, C3 stores the remainder d2 of the character M2, and C4 stores the remainder d3 of the character M3. The above operation is repeated until all binary source codes are converted. When the last group contains fewer than 3 binary source codes, the integer part D of each vacant character is fixed to binary 00 and no remainder byte is produced for it, as shown in FIG. 2.
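The handling of a short final group can be illustrated with a sketch that converts a run of non-high frequency bytes into 6-bit indices. It assumes that K is emitted before the remainders of its group, following the order (C1, C2, C3, C4), and that a vacant position simply contributes no remainder; the function name `encode_indices` is hypothetical.

```python
def encode_indices(data: bytes) -> list[int]:
    """Convert non-high frequency bytes to 6-bit indices, three bytes per group."""
    out: list[int] = []
    for start in range(0, len(data), 3):
        group = data[start:start + 3]
        k = 0
        for pos, m in enumerate(group):
            k |= (m // 64) << (4 - 2 * pos)  # vacant positions stay 00
        out.append(k)
        out.extend(m % 64 for m in group)  # no remainder for vacant positions
    return out


# One full group followed by a final group holding a single byte (130 = 2*64 + 2).
print(encode_indices(bytes([200, 65, 3, 130])))  # [52, 8, 1, 3, 32, 2]
```

Under these assumptions a final group of one byte yields two indices and a final group of two bytes yields three, with no padding character.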
The converted non-high frequency characters are then mapped according to the mapping table, Table 2, which turns them into printable characters.
TABLE 2
Index | Corresponding character | Index | Corresponding character | Index | Corresponding character | Index | Corresponding character |
0 | A | 17 | R | 34 | i | 51 | z |
1 | B | 18 | S | 35 | j | 52 | 0 |
2 | C | 19 | T | 36 | k | 53 | 1 |
3 | D | 20 | U | 37 | l | 54 | 2 |
4 | E | 21 | V | 38 | m | 55 | 3 |
5 | F | 22 | W | 39 | n | 56 | 4 |
6 | G | 23 | X | 40 | o | 57 | 5 |
7 | H | 24 | Y | 41 | p | 58 | 6 |
8 | I | 25 | Z | 42 | q | 59 | 7 |
9 | J | 26 | a | 43 | r | 60 | 8 |
10 | K | 27 | b | 44 | s | 61 | 9 |
11 | L | 28 | c | 45 | t | 62 | + |
12 | M | 29 | d | 46 | u | 63 | / |
13 | N | 30 | e | 47 | v | ||
14 | O | 31 | f | 48 | w | ||
15 | P | 32 | g | 49 | x | ||
16 | Q | 33 | h | 50 | y |
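Table 2 is the usual Base64 alphabet, so the mapping can be sketched as a lookup into a 64-character string. The names `ALPHABET` and `to_printable` are introduced here for illustration.

```python
import string

# Indices 0-25 are A-Z, 26-51 are a-z, 52-61 are 0-9, 62 is "+", 63 is "/".
ALPHABET = string.ascii_uppercase + string.ascii_lowercase + string.digits + "+/"


def to_printable(indices: list[int]) -> str:
    """Map 6-bit indices (0 to 63) to printable characters using Table 2."""
    return "".join(ALPHABET[i] for i in indices)


print(to_printable([52, 8, 1, 3]))  # 0IBD
```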
The invention starts from the 1/3 expansion ratio of original Base64 transcoding and substitutes single characters for high-frequency characters; calculated at a 40% substitution rate, the expansion ratio is 20%, well below 33.3%, so that for transcoding large volumes of data the conversion economy is greatly improved and expansion space is effectively saved. The invention replaces the shift-based transcoding of Base64 with a D×64+d algorithm, so that each code is independent of the other codes and an abnormality in one code does not prevent the subsequent codes from being recovered. The conversion range covers all ASCII data fields and suits binary data transcoding in a variety of applications. Because the invention concerns only a conversion algorithm, it is independent of the implementation language and platform and has good compatibility. No end padding is required, so the algorithm is simpler.
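The 20% figure can be checked with a short calculation: a substituted character costs one output character, while the remaining characters cost 4/3 each. The sketch below uses exact fractions and ignores a short final group and any cost of conveying the high-frequency table to the decoder, neither of which the text quantifies; the function name `expansion_ratio` is hypothetical.

```python
from fractions import Fraction


def expansion_ratio(substitution_rate: Fraction) -> Fraction:
    """Output growth relative to the input for a given share of substituted characters."""
    substituted = substitution_rate * 1               # 1 output character each
    expanded = (1 - substitution_rate) * Fraction(4, 3)  # 4 output characters per 3
    return substituted + expanded - 1


print(expansion_ratio(Fraction(2, 5)))  # 1/5, i.e. 20%
print(expansion_ratio(Fraction(0)))     # 1/3, the plain Base64 ratio
```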
As shown in fig. 3, the embodiment of the present invention further discloses a Base64 extension coding system based on a high-frequency character substitution algorithm, where the system includes:
the high-frequency character acquisition module is used for traversing binary character source codes to be converted to acquire a high-frequency character table;
the high-frequency character replacement module is used for replacing the high-frequency characters in the binary characters to be converted by using a high-frequency character replacement table;
and the non-high frequency character expansion module is used for expanding the non-high frequency characters by taking three non-high frequency characters as a group, converting according to the D×64+d algorithm, and mapping the converted non-high frequency characters according to a conversion mapping table, wherein D is the integer part of the character divided by 64, and d is the remainder of the character divided by 64.
The high frequency character acquisition module includes:
a high-frequency character determining unit for setting as a high-frequency character any character whose number of occurrences reaches three times the average number;
and the traversing unit is used for stopping the traversal before all source codes have been read if the number of high-frequency characters obtained reaches the required number, and otherwise traversing all characters and selecting the high-frequency characters according to occurrence frequency.
The non-high frequency character extension module includes:
the expansion character filling unit is used for expanding by taking three non-high frequency characters as a group, adding one character K with all bits initialized to 0, filling the D of the first non-high frequency character into the 6th and 5th bits of the character K, filling the D of the second non-high frequency character into the 4th and 3rd bits of the character K, and filling the D of the third non-high frequency character into the 2nd and 1st bits of the character K;
an original non-high frequency character replacing unit for replacing the first non-high frequency character with its d, replacing the second non-high frequency character with its d, and replacing the third non-high frequency character with its d;
and the mapping unit is used for mapping the converted non-high frequency characters according to the conversion mapping table.
Through extension coding, every three non-high frequency characters (M1, M2, M3) are expanded into four characters (C1, C2, C3, C4): C1 is the newly added character and stores the D values of the three non-high frequency characters, C2 stores the remainder d1 of the character M1, C3 stores the remainder d2 of the character M2, and C4 stores the remainder d3 of the character M3. The above operation is repeated until all binary source codes are converted. When the last group contains fewer than 3 binary source codes, the integer part D of each vacant character is fixed to binary 00 and no remainder byte is produced for it.
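The text does not spell out decoding, but the name D×64+d suggests the inverse directly: each original value is its 2-bit quotient from C1 times 64 plus its remainder. The following sketch of that inverse for a single group is an assumption rather than part of the described system, and it takes bit 1 to be the least significant bit of C1; the function name `unpack_group` is hypothetical.

```python
def unpack_group(k: int, remainders: list[int]) -> bytes:
    """Rebuild up to three byte values from K and their remainders via D*64 + d."""
    out = bytearray()
    for pos, d in enumerate(remainders):
        D = (k >> (4 - 2 * pos)) & 0b11  # 2-bit quotient for this position
        out.append(D * 64 + d)
    return bytes(out)


print(list(unpack_group(52, [8, 1, 3])))  # [200, 65, 3]
print(list(unpack_group(32, [2])))        # [130], a short final group
```

Each group is rebuilt from its own four characters only, which matches the statement that each code is independent of the codes in other groups.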
The foregoing description of the preferred embodiments of the invention is not intended to be limiting, but rather is intended to cover all modifications, equivalents, and alternatives falling within the spirit and principles of the invention.
Claims (4)
1. A Base64 extension encoding method based on a high-frequency character substitution algorithm, the method comprising the steps of:
S1, traversing the binary character source codes to be converted to obtain a high-frequency character table;
S2, replacing the high-frequency characters in the binary characters to be converted by using a high-frequency character substitution table;
S3, for the non-high frequency characters, expanding by taking three non-high frequency characters as a group, converting according to the D×64+d algorithm, and mapping the converted non-high frequency characters according to a conversion mapping table, wherein D is the integer part of the character divided by 64, and d is the remainder of the character divided by 64;
the specific conversion operation according to the D×64+d algorithm is as follows:
adding a character K with all bits initialized to 0, filling the D of the first non-high frequency character into the 6th and 5th bits of the character K, filling the D of the second non-high frequency character into the 4th and 3rd bits of the character K, and filling the D of the third non-high frequency character into the 2nd and 1st bits of the character K; replacing the first non-high frequency character with its d, replacing the second non-high frequency character with its d, and replacing the third non-high frequency character with its d;
the high frequency character is set as follows:
the frequency of each character is taken as 1/256, so the average number of occurrences of each character is the total number of source codes/256; a character whose number of occurrences reaches three times the average number is set as the high-frequency character.
2. The Base64 extension encoding method based on the high frequency character substitution algorithm according to claim 1, wherein when the last group contains fewer than 3 binary source codes, the integer part D of each vacant character is fixed to binary 00, and no remainder byte is produced for it.
3. A Base64 extension encoding system based on a high frequency character substitution algorithm, the system comprising:
the high-frequency character acquisition module is used for traversing binary character source codes to be converted to acquire a high-frequency character table;
the high-frequency character replacement module is used for replacing the high-frequency characters in the binary characters to be converted by using a high-frequency character replacement table;
the non-high frequency character expansion module is used for expanding the non-high frequency characters by taking three non-high frequency characters as a group, converting the non-high frequency characters according to the D×64+d algorithm, and mapping the converted non-high frequency characters according to a conversion mapping table, wherein D is the integer part of the character divided by 64, and d is the remainder of the character divided by 64; the non-high frequency character extension module includes:
the expansion character filling unit is used for expanding by taking three non-high frequency characters as a group, adding one character K with all bits initialized to 0, filling the D of the first non-high frequency character into the 6th and 5th bits of the character K, filling the D of the second non-high frequency character into the 4th and 3rd bits of the character K, and filling the D of the third non-high frequency character into the 2nd and 1st bits of the character K;
an original non-high frequency character replacing unit for replacing the first non-high frequency character with its d, replacing the second non-high frequency character with its d, and replacing the third non-high frequency character with its d;
the mapping unit is used for mapping the converted non-high frequency characters according to a conversion mapping table;
the high frequency character is set as follows:
the frequency of each character is taken as 1/256, so the average number of occurrences of each character is the total number of source codes/256; a character whose number of occurrences reaches three times the average number is set as the high-frequency character.
4. The Base64 extension encoding system based on a high frequency character substitution algorithm according to claim 3, wherein when the last group contains fewer than 3 binary source codes, the integer part D of each vacant character is fixed to binary 00, and no remainder byte is produced for it.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201910762390.9A CN110569487B (en) | 2019-08-19 | 2019-08-19 | Base64 expansion coding method and system based on high-frequency character substitution algorithm |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201910762390.9A CN110569487B (en) | 2019-08-19 | 2019-08-19 | Base64 expansion coding method and system based on high-frequency character substitution algorithm |
Publications (2)
Publication Number | Publication Date |
---|---|
CN110569487A CN110569487A (en) | 2019-12-13 |
CN110569487B CN110569487B (en) | 2023-07-18 |
Family
ID=68775675
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201910762390.9A Active CN110569487B (en) | 2019-08-19 | 2019-08-19 | Base64 expansion coding method and system based on high-frequency character substitution algorithm |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN110569487B (en) |
Families Citing this family (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN113987556B (en) * | 2021-12-24 | 2022-05-10 | 杭州趣链科技有限公司 | Data processing method and device, electronic equipment and storage medium |
Family Cites Families (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN103684760B (en) * | 2012-09-24 | 2018-12-07 | 腾讯科技(深圳)有限公司 | The encryption of communication and the method, apparatus of decryption and system |
CN105450232A (en) * | 2014-08-28 | 2016-03-30 | 华为技术有限公司 | Encoding method, decoding method, encoding device and decoding device |
CN105740215A (en) * | 2016-01-23 | 2016-07-06 | 北京掌阔移动传媒科技有限公司 | Data communication coding and decoding method |
CN107919943B (en) * | 2016-10-11 | 2020-08-04 | 阿里巴巴集团控股有限公司 | Method and device for coding and decoding binary data |
Also Published As
Publication number | Publication date |
---|---|
CN110569487A (en) | 2019-12-13 |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||