CN110569487B - Base64 expansion coding method and system based on high-frequency character substitution algorithm - Google Patents

Publication number
CN110569487B
Authority
CN
China
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201910762390.9A
Other languages
Chinese (zh)
Other versions
CN110569487A (en)
Inventor
周秀丽
瞿晓宏
刘立元
孙发恩
孟庆媛
李玉兰
谈凤真
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Integrated Electronic Systems Lab Co Ltd
Original Assignee
Integrated Electronic Systems Lab Co Ltd

Abstract

The invention provides a Base64 expansion coding method and system based on a high-frequency character substitution algorithm. Building on original Base64 transcoding, whose expansion ratio is 1/3, the invention replaces high-frequency characters with substitute characters; at a substitution rate of 40% the expansion ratio is 20%, well below 33.3%, which greatly improves conversion economy and saves expansion space when transcoding large volumes of data. The invention replaces the shift-based transcoding of Base64 with a D×64+d conversion, so that each code is independent of the other codes and an error in one code does not prevent the subsequent codes from being recovered. The conversion range covers the entire ASCII data field, so the method is suitable for binary data transcoding in a variety of applications; because the invention concerns only the conversion algorithm, it is independent of the implementation language and platform and has good compatibility. No trailing padding is required, and the algorithm is simpler.

Description

Base64 expansion coding method and system based on high-frequency character substitution algorithm
Technical Field
The invention relates to the technical field of character coding, in particular to a Base64 expansion coding method and system based on a high-frequency character substitution algorithm.
Background
All data in a computer is stored and processed in binary, and ASCII encoding is commonly adopted to facilitate information exchange between different hosts on a network. ASCII values 0-31 and 127 are invisible (non-printable) characters. Data exchanged over a network often passes through several routing devices, and because different devices handle characters in different ways, invisible characters may be processed incorrectly, which hinders transmission.
Such problems also arise in everyday applications. Some application scenarios, such as conventional mail systems, support only the printable characters of ASCII, so ASCII control characters cannot be transmitted by mail. This is a serious limitation: the bytes of a picture's binary stream are not all visible characters, so the picture cannot be transferred.
The best way to solve these problems is to introduce a new coding scheme that supports the transmission of binary files without changing the conventional protocol, by representing invisible characters with visible characters. Base64 is a method for representing binary data with 64 visible characters. Although it can transcode binary data into visible characters, it has the following disadvantages:
1. the encoded output is 1/3 larger than the source; the more source codes there are, the more space the converted data occupies, so for transcoding large volumes of data the low efficiency is a significant defect that cannot be ignored;
2. a code-by-code shift algorithm is used, so once an error occurs, the subsequent codes of the same unit cannot be restored;
3. when the number of source codes is not an integer multiple of 3, the trailing padding character "=" must be appended.
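The first and third drawbacks can be observed directly with a standard Base64 implementation. The following is a minimal sketch using Python's standard `base64` module; the helper name `base64_overhead` is illustrative and does not come from the patent.

```python
import base64


def base64_overhead(data: bytes) -> float:
    """Return the relative size increase of standard Base64 for non-empty data."""
    encoded = base64.b64encode(data)
    return (len(encoded) - len(data)) / len(data)


# Three source bytes become four output characters (an increase of 1/3).
full_group = base64.b64encode(b"abc")   # b'YWJj'

# Four source bytes are not a multiple of 3, so "=" padding is appended.
padded = base64.b64encode(b"abcd")      # b'YWJjZA=='
```

For inputs whose length is a multiple of 3, `base64_overhead` returns exactly 1/3; for other lengths the padding makes the increase slightly larger.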
Disclosure of Invention
The invention aims to provide a Base64 expansion coding method and system based on a high-frequency character substitution algorithm, so as to solve the prior-art problems of large occupied space and low efficiency caused by the size increase of Base64 encoding, effectively saving expansion space and improving conversion efficiency.
In order to achieve the technical purpose, the invention provides a Base64 expansion coding method based on a high-frequency character substitution algorithm, which comprises the following steps:
s1, traversing binary character source codes to be converted to obtain a high-frequency character table;
s2, replacing the high-frequency character in the binary character to be converted by using a high-frequency character substitution table;
s3, expanding the non-high-frequency characters in groups of three, converting them according to the D×64+d algorithm, and mapping the converted non-high-frequency characters according to a conversion mapping table, wherein D is the integer part (quotient) of the character value divided by 64, and d is the remainder of the character value divided by 64.
Preferably, the specific conversion operation according to the D×64+d algorithm is as follows:
adding a character K with all bits initialized to 0, filling the D of the first non-high-frequency character into the 6th and 5th bits of the character K, filling the D of the second non-high-frequency character into the 4th and 3rd bits of the character K, and filling the D of the third non-high-frequency character into the 2nd and 1st bits of the character K;
the first non-high-frequency character is replaced with its own d, the second non-high-frequency character is replaced with its own d, and the third non-high-frequency character is replaced with its own d.
Preferably, the specific operation of traversing the binary character source code to be converted to obtain the high-frequency character table is as follows:
the expected frequency of each character is 1/256, so the average number of occurrences of each character is the total number of source codes divided by 256;
a character whose number of occurrences is three times the average number is set as a high-frequency character;
if the number of high-frequency characters obtained reaches the required number, the traversal of the source codes is stopped; otherwise all characters are traversed and the high-frequency characters are selected according to their number of occurrences.
Preferably, when the last group contains fewer than three binary source codes, the integer part D of each vacant character is fixed to binary 00, and no remainder byte exists for it.
The invention also provides a Base64 expansion coding system based on the high-frequency character substitution algorithm, which comprises:
the high-frequency character acquisition module is used for traversing binary character source codes to be converted to acquire a high-frequency character table;
the high-frequency character replacement module is used for replacing the high-frequency characters in the binary characters to be converted by using a high-frequency character replacement table;
and the non-high-frequency character expansion module is used for expanding the non-high-frequency characters in groups of three, converting them according to the D×64+d algorithm, and mapping the converted non-high-frequency characters according to a conversion mapping table, wherein D is the integer part (quotient) of the character value divided by 64, and d is the remainder of the character value divided by 64.
Preferably, the non-high frequency character expansion module includes:
the expansion character filling unit is used for expanding in groups of three non-high-frequency characters, adding one character K with all bits initialized to 0, filling the 6th and 5th bits of the character K with the D of the first non-high-frequency character, filling the 4th and 3rd bits of the character K with the D of the second non-high-frequency character, and filling the 2nd and 1st bits of the character K with the D of the third non-high-frequency character;
an original non-high frequency character replacing unit for replacing the first non-high frequency character with d of the character, replacing the second non-high frequency character with d of the character, and replacing the third non-high frequency character with d of the character;
and the mapping unit is used for mapping the converted non-high frequency characters according to the conversion mapping table.
Preferably, the high frequency character acquisition module includes:
a high-frequency character determining unit for setting a character whose number of occurrences is three times the average number as a high-frequency character;
and the traversing unit is used for stopping traversing all the source codes if the number of the obtained high-frequency characters reaches the required number, otherwise traversing all the characters and selecting the high-frequency characters according to the occurrence frequency.
Preferably, when the last group contains fewer than three binary source codes, the integer part D of each vacant character is fixed to binary 00, and no remainder byte exists for it.
The effects provided in the summary of the invention are merely effects of embodiments, not all effects of the invention, and one of the above technical solutions has the following advantages or beneficial effects:
compared with the prior art, the invention builds on original Base64 transcoding, whose expansion ratio is 1/3, and replaces high-frequency characters with substitute characters; at a substitution rate of 40% the expansion ratio is 20%, well below 33.3%, so that for transcoding large volumes of data the conversion economy is greatly improved and expansion space is effectively saved. The invention replaces the shift-based transcoding of Base64 with a D×64+d conversion, so that each code is independent of the other codes and an error in one code does not prevent the subsequent codes from being recovered. The conversion range covers the entire ASCII data field, so the method is suitable for binary data transcoding in a variety of applications; because the invention concerns only the conversion algorithm, it is independent of the implementation language and platform and has good compatibility. No trailing padding is required, and the algorithm is simpler.
Drawings
FIG. 1 is a flowchart of a Base64 extension encoding method based on a high frequency character substitution algorithm according to an embodiment of the present invention;
FIG. 2 is a schematic diagram of a converted binary data structure according to an embodiment of the present invention;
FIG. 3 is a block diagram of a Base64 extension coding system based on a high-frequency character substitution algorithm according to an embodiment of the present invention.
Detailed Description
In order to clearly illustrate the technical features of the present solution, the present invention will be described in detail below with reference to the following detailed description and the accompanying drawings. The following disclosure provides many different embodiments, or examples, for implementing different structures of the invention. In order to simplify the present disclosure, components and arrangements of specific examples are described below. Furthermore, the present invention may repeat reference numerals and/or letters in the various examples. This repetition is for the purpose of simplicity and clarity and does not in itself dictate a relationship between the various embodiments and/or configurations discussed. It should be noted that the components illustrated in the figures are not necessarily drawn to scale. Descriptions of well-known components and processing techniques and processes are omitted so as to not unnecessarily obscure the present invention.
The following describes a Base64 expansion coding method and system based on a high-frequency character substitution algorithm in detail by referring to the accompanying drawings.
As shown in fig. 1, the embodiment of the invention discloses a Base64 expansion coding method based on a high-frequency character substitution algorithm, which comprises the following steps:
s1, traversing binary character source codes to be converted to obtain a high-frequency character table;
s2, replacing the high-frequency character in the binary character to be converted by using a high-frequency character substitution table;
s3, expanding the non-high-frequency characters in groups of three, converting them according to the D×64+d algorithm, and mapping the converted non-high-frequency characters according to a conversion mapping table, wherein D is the integer part (quotient) of the character value divided by 64, and d is the remainder of the character value divided by 64.
The binary character source codes to be converted are traversed to generate a ranking of high-frequency characters. To obtain the high-frequency character table as quickly as possible, the following shortcut procedure can be adopted:
the expected frequency of each character is 1/256, so the average number of occurrences of each character is the total number of source codes divided by 256;
a character whose number of occurrences is three times the average number is set as a high-frequency character;
if the number of high-frequency characters obtained reaches the required number, the traversal of the source codes is stopped; otherwise all characters are traversed and the high-frequency characters are selected according to their number of occurrences.
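The shortcut above can be sketched as follows. This is a minimal illustration under stated assumptions, not the patent's implementation: the function name `high_frequency_table`, the parameters `wanted` and `factor`, the "at least three times the average" comparison, and the fallback of ranking all bytes by count are all choices made here for illustration.

```python
from collections import Counter


def high_frequency_table(data: bytes, wanted: int = 10, factor: int = 3) -> list[int]:
    """Pick up to `wanted` high-frequency byte values from `data`."""
    if not data:
        return []
    # Average occurrences per byte value is len(data) / 256; the threshold is
    # `factor` times that average (three times, following the text).
    threshold = factor * len(data) / 256
    counts = Counter()
    found = []
    for byte in data:
        counts[byte] += 1
        if byte not in found and counts[byte] >= threshold:
            found.append(byte)
            if len(found) == wanted:
                return found  # enough characters found: stop traversing early
    # Assumed fallback: the whole input was traversed without reaching
    # `wanted`, so rank every byte value by its number of occurrences.
    return [byte for byte, _ in counts.most_common(wanted)]
```

On the early-stop path the result is ordered by when each byte crossed the threshold rather than by final count; whether that ordering matters depends on how the substitute characters are assigned.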
Substitute characters are assigned to the high-frequency characters so as to save transcoding space to the greatest extent. The substitute characters can be chosen according to the actual situation, provided that characters which may cause ambiguity are excluded, as shown in Table 1.
TABLE 1
Frequency rank of character               Substitute character
Character with the highest frequency      0x23 (#)
Character with the 2nd highest frequency  0x24 ($)
Character with the 3rd highest frequency  0x25 (%)
Character with the 4th highest frequency  0x26 (&)
Character with the 5th highest frequency  0x2D (-)
Character with the 6th highest frequency  0x5F (_)
Character with the 7th highest frequency  0x5B ([)
Character with the 8th highest frequency  0x5D (])
Character with the 9th highest frequency  0x7B ({)
Character with the 10th highest frequency 0x7D (})
One character at a time is taken from the binary characters to be converted, and it is determined whether the character is a high-frequency character. A high-frequency character is replaced according to the high-frequency character substitution table; a non-high-frequency character is converted according to the D×64+d algorithm, where D is the integer part (quotient) of the original data value divided by 64 and d is the remainder of the original data value divided by 64.
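The substitution step can be sketched as a simple lookup. The sketch assumes the ten substitute characters are `# $ % & - _ [ ] { }` in rank order; the names `SUBSTITUTES` and `build_substitution_table` are hypothetical.

```python
# Assumed substitute characters, in rank order (most frequent first).
SUBSTITUTES = b"#$%&-_[]{}"


def build_substitution_table(ranked_bytes: list[int]) -> dict[int, int]:
    """Map each high-frequency byte value to its substitute character code."""
    usable = ranked_bytes[:len(SUBSTITUTES)]
    return {byte: SUBSTITUTES[rank] for rank, byte in enumerate(usable)}


# Example: 0x00 is the most frequent byte and 0xFF the second most frequent.
table = build_substitution_table([0x00, 0xFF])
```

A byte found in `table` is emitted as its single substitute character; any other byte goes through the D×64+d conversion.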
The non-high-frequency characters are expanded in groups of three. One character K is added with all bits initialized to 0. The 6th and 5th bits of K are filled with the D of the first non-high-frequency character, i.e., the integer part of the first character divided by 64; the 4th and 3rd bits of K are filled with the D of the second non-high-frequency character, i.e., the integer part of the second character divided by 64; and the 2nd and 1st bits of K are filled with the D of the third non-high-frequency character, i.e., the integer part of the third character divided by 64. The first non-high-frequency character is then replaced with its d, i.e., the remainder of the first character divided by 64; the second is replaced with its d, i.e., the remainder of the second character divided by 64; and the third is replaced with its d, i.e., the remainder of the third character divided by 64.
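A sketch of the conversion for one full group of three bytes follows. It assumes bits are numbered from 1 starting at the least significant bit, so that "6th and 5th bits" means a left shift by 4; the text does not state the numbering convention, and the name `pack_group` is hypothetical.

```python
def pack_group(m1: int, m2: int, m3: int) -> tuple[int, int, int, int]:
    """Split three byte values into (K, d1, d2, d3), each in the range 0..63."""
    big_d1, d1 = divmod(m1, 64)  # D is the quotient (0..3), d the remainder (0..63)
    big_d2, d2 = divmod(m2, 64)
    big_d3, d3 = divmod(m3, 64)
    # Assumed layout of K: bits 6-5 hold D1, bits 4-3 hold D2, bits 2-1 hold D3.
    k = (big_d1 << 4) | (big_d2 << 2) | big_d3
    return k, d1, d2, d3


# 'M' = 77 = 1*64 + 13, 'a' = 97 = 1*64 + 33, 'n' = 110 = 1*64 + 46
example = pack_group(77, 97, 110)  # (21, 13, 33, 46)
```

Each original byte can be recovered on its own as D×64+d once K is known, for example 1×64+13 = 77.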
After conversion, every three non-high-frequency characters (M1, M2, M3) are expanded into four characters (C1, C2, C3, C4): C1 is the newly added character and stores the D values of the three non-high-frequency characters, C2 stores the remainder d1 of the character M1, C3 stores the remainder d2 of the character M2, and C4 stores the remainder d3 of the character M3. The operation is repeated until all binary source codes are converted. When the last group contains fewer than three binary source codes, the integer part D of each vacant character is fixed to binary 00 and no remainder byte exists for it, as shown in FIG. 2.
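The grouping, including a short final group, might look like the sketch below. It assumes the same bit layout for K as described in the text (first character in bits 6-5, counted from the least significant bit) and that a missing character contributes no output value; `encode_groups` is a hypothetical name.

```python
def encode_groups(non_hf: bytes) -> list[list[int]]:
    """Turn non-high-frequency bytes into groups [K, d1, d2, d3] of 6-bit values."""
    groups = []
    for start in range(0, len(non_hf), 3):
        chunk = non_hf[start:start + 3]  # the last chunk may hold 1 or 2 bytes
        k = 0
        for position, byte in enumerate(chunk):
            # Missing characters simply leave their two bits of K at 00.
            k |= (byte // 64) << (4 - 2 * position)
        groups.append([k] + [byte % 64 for byte in chunk])
    return groups
```

A final group of one byte therefore yields two values (K, d1) and a final group of two bytes yields three (K, d1, d2), with no padding character.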
The converted non-high-frequency characters are then mapped according to mapping Table 2, which converts them into printable characters.
TABLE 2
Index  Character  Index  Character  Index  Character  Index  Character
0      A          17     R          34     i          51     z
1      B          18     S          35     j          52     0
2      C          19     T          36     k          53     1
3      D          20     U          37     l          54     2
4      E          21     V          38     m          55     3
5      F          22     W          39     n          56     4
6      G          23     X          40     o          57     5
7      H          24     Y          41     p          58     6
8      I          25     Z          42     q          59     7
9      J          26     a          43     r          60     8
10     K          27     b          44     s          61     9
11     L          28     c          45     t          62     +
12     M          29     d          46     u          63     /
13     N          30     e          47     v
14     O          31     f          48     w
15     P          32     g          49     x
16     Q          33     h          50     y
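Table 2 is the standard Base64 alphabet, so the mapping step can be sketched with a 64-character lookup string. The names `ALPHABET` and `to_printable` are illustrative.

```python
import string

# Index 0-25 -> A-Z, 26-51 -> a-z, 52-61 -> 0-9, 62 -> '+', 63 -> '/'
ALPHABET = string.ascii_uppercase + string.ascii_lowercase + string.digits + "+/"


def to_printable(indices: list[int]) -> str:
    """Map 6-bit values (0..63) to printable characters using Table 2."""
    return "".join(ALPHABET[index] for index in indices)


# The group (K, d1, d2, d3) = (21, 13, 33, 46) maps to "VNhu".
printable = to_printable([21, 13, 33, 46])
```

None of the substitute characters in Table 1 appears in this alphabet, which is what lets a decoder tell a substituted high-frequency character from a converted one.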
The invention builds on original Base64 transcoding, whose expansion ratio is 1/3, and replaces high-frequency characters with substitute characters; at a substitution rate of 40% the expansion ratio is 20%, well below 33.3%, so that for transcoding large volumes of data the conversion economy is greatly improved and expansion space is effectively saved. The invention replaces the shift-based transcoding of Base64 with a D×64+d conversion, so that each code is independent of the other codes and an error in one code does not prevent the subsequent codes from being recovered. The conversion range covers the entire ASCII data field, so the method is suitable for binary data transcoding in a variety of applications; because the invention concerns only the conversion algorithm, it is independent of the implementation language and platform and has good compatibility. No trailing padding is required, and the algorithm is simpler.
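The 20% figure can be checked with a short calculation. The symbols $N$ (number of source bytes) and $r$ (fraction of bytes replaced by a substitute character) are introduced here for illustration. The estimate ignores the short final group and any cost of conveying the high-frequency table to the decoder, which the text does not discuss.

```latex
\begin{aligned}
L_{\text{out}} &= rN + \tfrac{4}{3}(1-r)N
  && \text{(substituted bytes stay 1:1, the rest expand 3:4)}\\[4pt]
\frac{L_{\text{out}} - N}{N} &= \frac{1-r}{3}\\[4pt]
r = 0.4 \;&\Rightarrow\; \frac{1 - 0.4}{3} = 0.2 = 20\%,
\qquad
r = 0 \;\Rightarrow\; \frac{1}{3} \approx 33.3\%
\end{aligned}
```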
As shown in fig. 3, the embodiment of the present invention further discloses a Base64 extension coding system based on a high-frequency character substitution algorithm, where the system includes:
the high-frequency character acquisition module is used for traversing binary character source codes to be converted to acquire a high-frequency character table;
the high-frequency character replacement module is used for replacing the high-frequency characters in the binary characters to be converted by using a high-frequency character replacement table;
and the non-high-frequency character expansion module is used for expanding the non-high-frequency characters in groups of three, converting them according to the D×64+d algorithm, and mapping the converted non-high-frequency characters according to a conversion mapping table, wherein D is the integer part (quotient) of the character value divided by 64, and d is the remainder of the character value divided by 64.
The high frequency character acquisition module includes:
a high-frequency character determining unit for setting a character whose number of occurrences is three times the average number as a high-frequency character;
and the traversing unit is used for stopping traversing all the source codes if the number of the obtained high-frequency characters reaches the required number, otherwise traversing all the characters and selecting the high-frequency characters according to the occurrence frequency.
The non-high frequency character extension module includes:
the expansion character filling unit is used for expanding in groups of three non-high-frequency characters, adding one character K with all bits initialized to 0, filling the 6th and 5th bits of the character K with the D of the first non-high-frequency character, filling the 4th and 3rd bits of the character K with the D of the second non-high-frequency character, and filling the 2nd and 1st bits of the character K with the D of the third non-high-frequency character;
an original non-high frequency character replacing unit for replacing the first non-high frequency character with d of the character, replacing the second non-high frequency character with d of the character, and replacing the third non-high frequency character with d of the character;
and the mapping unit is used for mapping the converted non-high frequency characters according to the conversion mapping table.
Through extension coding, every three non-high-frequency characters (M1, M2, M3) are expanded into four characters (C1, C2, C3, C4): C1 is the newly added character and stores the D values of the three non-high-frequency characters, C2 stores the remainder d1 of the character M1, C3 stores the remainder d2 of the character M2, and C4 stores the remainder d3 of the character M3. The operation is repeated until all binary source codes are converted. When the last group contains fewer than three binary source codes, the integer part D of each vacant character is fixed to binary 00 and no remainder byte exists for it.
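The three modules can be combined into one round-trip sketch. Several points are assumptions made here because the text does not specify them: the substitute characters are taken to be `# $ % & - _ [ ] { }`; K is emitted immediately before the first remainder of its group, while substitute characters stay at their original positions; bits of K are numbered from the least significant bit; and the decoder is given the same high-frequency table as the encoder. The `decode` function is not described in the text and is included only to show that the layout is reversible.

```python
import string

ALPHABET = (string.ascii_uppercase + string.ascii_lowercase + string.digits + "+/").encode()
SUBSTITUTES = b"#$%&-_[]{}"  # assumed substitute characters, in rank order


def encode(data: bytes, high_frequency: list[int]) -> bytes:
    substitute = dict(zip(high_frequency, SUBSTITUTES))
    non_hf = bytes(b for b in data if b not in substitute)
    out = bytearray()
    seen = 0  # number of non-high-frequency bytes emitted so far
    for byte in data:
        if byte in substitute:
            out.append(substitute[byte])  # high-frequency byte: one character
            continue
        if seen % 3 == 0:  # start of a group: emit K first
            k = 0
            for position, member in enumerate(non_hf[seen:seen + 3]):
                k |= (member // 64) << (4 - 2 * position)
            out.append(ALPHABET[k])
        out.append(ALPHABET[byte % 64])  # remainder d of this byte
        seen += 1
    return bytes(out)


def decode(text: bytes, high_frequency: list[int]) -> bytes:
    original = dict(zip(SUBSTITUTES, high_frequency))
    index = {char: i for i, char in enumerate(ALPHABET)}
    out = bytearray()
    k, slot = 0, 3  # slot == 3 means the next alphabet character is a new K
    for char in text:
        if char in original:
            out.append(original[char])
        elif slot == 3:
            k, slot = index[char], 0
        else:
            quotient = (k >> (4 - 2 * slot)) & 0b11
            out.append(quotient * 64 + index[char])  # D*64 + d
            slot += 1
    return bytes(out)
```

With no high-frequency table, `encode(b"Man", [])` yields `b"VNhu"`, four characters for three bytes as in Base64; each byte marked as high-frequency costs only one output character.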
The foregoing description of the preferred embodiments of the invention is not intended to be limiting, but rather is intended to cover all modifications, equivalents, and alternatives falling within the spirit and principles of the invention.

Claims (4)

1. A Base64 extension encoding method based on a high-frequency character substitution algorithm, the method comprising the steps of:
s1, traversing binary character source codes to be converted to obtain a high-frequency character table;
s2, replacing the high-frequency character in the binary character to be converted by using a high-frequency character substitution table;
s3, for non-high-frequency characters, expanding in groups of three non-high-frequency characters, converting according to the D×64+d algorithm, and mapping the converted non-high-frequency characters according to a conversion mapping table, wherein D is the integer part (quotient) of the character value divided by 64, and d is the remainder of the character value divided by 64;
the specific conversion operation according to the D×64+d algorithm is as follows:
adding a character K with all bits initialized to 0, filling the D of the first non-high-frequency character into the 6th and 5th bits of the character K, filling the D of the second non-high-frequency character into the 4th and 3rd bits of the character K, and filling the D of the third non-high-frequency character into the 2nd and 1st bits of the character K; the first non-high-frequency character is replaced with its own d, the second non-high-frequency character is replaced with its own d, and the third non-high-frequency character is replaced with its own d;
the high frequency character is set as follows:
the expected frequency of each character is 1/256, so the average number of occurrences of each character is the total number of source codes divided by 256; a character whose number of occurrences is three times the average number is set as a high-frequency character.
2. The Base64 extension encoding method based on the high-frequency character substitution algorithm according to claim 1, wherein when the last group contains fewer than three binary source codes, the integer part D of each vacant character is fixed to binary 00, and no remainder byte exists for it.
3. A Base64 extension encoding system based on a high frequency character substitution algorithm, the system comprising:
the high-frequency character acquisition module is used for traversing binary character source codes to be converted to acquire a high-frequency character table;
the high-frequency character replacement module is used for replacing the high-frequency characters in the binary characters to be converted by using a high-frequency character replacement table;
the non-high-frequency character expansion module is used for expanding the non-high-frequency characters in groups of three, converting them according to the D×64+d algorithm, and mapping the converted non-high-frequency characters according to a conversion mapping table, wherein D is the integer part (quotient) of the character value divided by 64, and d is the remainder of the character value divided by 64; the non-high-frequency character expansion module includes:
the expansion character filling unit is used for expanding in groups of three non-high-frequency characters, adding one character K with all bits initialized to 0, filling the 6th and 5th bits of the character K with the D of the first non-high-frequency character, filling the 4th and 3rd bits of the character K with the D of the second non-high-frequency character, and filling the 2nd and 1st bits of the character K with the D of the third non-high-frequency character;
an original non-high frequency character replacing unit for replacing the first non-high frequency character with d of the character, replacing the second non-high frequency character with d of the character, and replacing the third non-high frequency character with d of the character;
the mapping unit is used for mapping the converted non-high frequency characters according to a conversion mapping table;
the high frequency character is set as follows:
the expected frequency of each character is 1/256, so the average number of occurrences of each character is the total number of source codes divided by 256; a character whose number of occurrences is three times the average number is set as a high-frequency character.
4. The Base64 extension coding system based on the high-frequency character substitution algorithm according to claim 3, wherein when the last group contains fewer than three binary source codes, the integer part D of each vacant character is fixed to binary 00, and no remainder byte exists for it.
CN201910762390.9A 2019-08-19 2019-08-19 Base64 expansion coding method and system based on high-frequency character substitution algorithm Active CN110569487B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201910762390.9A CN110569487B (en) 2019-08-19 2019-08-19 Base64 expansion coding method and system based on high-frequency character substitution algorithm

Publications (2)

Publication Number Publication Date
CN110569487A (en) 2019-12-13
CN110569487B (en) 2023-07-18

Family

ID=68775675

Country Status (1)

Country Link
CN (1) CN110569487B (en)

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant