WO2016029801A1 - Encoding and decoding method, encoding device and decoding device - Google Patents
- Publication number
- WO2016029801A1 (application PCT/CN2015/087239)
- Authority
- WO
- WIPO (PCT)
- Prior art keywords
- data
- encoded
- character
- ary
- encoding
- Prior art date
Classifications
- H—ELECTRICITY
- H03—ELECTRONIC CIRCUITRY
- H03M—CODING; DECODING; CODE CONVERSION IN GENERAL
- H03M7/00—Conversion of a code where information is represented by a given sequence or number of digits to a code where the same, similar or subset of information is represented by a different sequence or number of digits
Definitions
- Embodiments of the present invention relate to data communication technologies, and in particular, to an encoding and decoding method, and an encoding apparatus and a decoding apparatus.
- In big data scenarios, applications such as log text and K-means contain a large number of character-type numbers, that is, numbers stored as strings of characters that can be displayed and printed as text.
- Character-type numbers take up more storage space than numeric-type numbers. For example, the decimal number 255 needs only 1 byte when stored as a numeric value, because 1 byte can hold 256 different states (0-255). Stored as characters it takes 3 bytes, and each byte then uses only 10 of its states (0-9), so most of the states are wasted.
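The size difference can be checked with a short Python sketch. The variable names are illustrative and do not come from the patent:

```python
# Store the decimal number 255 two ways and compare the sizes.
n = 255

numeric_form = n.to_bytes(1, "big")        # numeric storage: one byte covers 0-255
character_form = str(n).encode("ascii")    # character storage: one byte per digit

print(len(numeric_form), len(character_form))
```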
- In the prior art, the radix of the number is increased so that each character can store more states, thereby reducing the storage space of character-type numbers. For example, storing a number as 64-ary characters saves about 44.6% of the storage space relative to storing it as decimal characters, and about 1/3 relative to storing it as hexadecimal characters.
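Both figures follow from the number of characters needed to write a value $N$ in radix $b$, which grows as $\log_b N$. As a worked check, assuming the numbers are long enough that rounding to whole characters can be neglected:

```latex
\[
\frac{\log_{64} N}{\log_{10} N} = \frac{\log_2 10}{\log_2 64} = \frac{3.3219\ldots}{6} \approx 0.554
\quad\Rightarrow\quad \text{saving} \approx 1 - 0.554 = 44.6\%
\]
\[
\frac{\log_{64} N}{\log_{16} N} = \frac{\log_2 16}{\log_2 64} = \frac{4}{6}
\quad\Rightarrow\quad \text{saving} = 1 - \frac{2}{3} = \frac{1}{3}
\]
```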
- the existing 64-ary encoding method uses the character set A-Z, a-z, 0-9, +, /, whose characters represent the numbers 0-25, 26-51, 52-61, 62 and 63, respectively.
- the character "/" cannot be used in a file name under the Linux operating system or the Windows operating system, and the character "+" means "one or more" in the shell and must be escaped before it can be used as an ordinary character.
- the characters "/" and "+" are also the division sign and the plus sign, which creates ambiguity when they are written in an arithmetic expression.
- for example, the string "///" can be understood either as a 64-ary encoded string or as the decimal expression 63/63.
- the shell here refers to software that provides the user with an interface to the system.
- the embodiments of the invention provide an encoding method, a decoding method, an encoding device and a decoding device, to solve the problems that some characters of the prior-art 64-ary encoding method cannot be used in file names under the Linux and Windows operating systems and create ambiguity in arithmetic expressions.
- a first aspect of the present invention provides an encoding method, including:
- acquiring data to be encoded, where the data to be encoded is a binary string;
- encoding the data to be encoded according to a 64-ary encoding rule to obtain the corresponding encoded data, where the encoded data is a 64-ary encoded string and the character set used by the 64-ary encoding rule includes the following 64 characters: 0-9, @, A-Z, _, and a-z.
- in a first possible implementation, the order of the numbers represented by the characters in the character set is consistent with the order of the characters' American Standard Code for Information Interchange (ASCII) code values.
- in ascending order of ASCII code value the characters are 0-9, @, A-Z, _, a-z, where the characters 0-9 represent the numbers 0-9, the character @ represents 10, the characters A-Z represent 11-36, the character _ represents 37, and the characters a-z represent 38-63.
- the 64-ary encoded string has a start character, which is used to identify the starting position of the 64-ary encoded string.
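The character set and its mapping can be written down in a few lines of Python. This is a minimal sketch; the names `ALPHABET` and `VALUE_OF` are illustrative and not taken from the patent:

```python
import string

# The 64 characters in ascending ASCII order: 0-9, @, A-Z, _, a-z.
ALPHABET = (string.digits + "@" + string.ascii_uppercase
            + "_" + string.ascii_lowercase)

# Character -> number it represents (0-63).
VALUE_OF = {ch: i for i, ch in enumerate(ALPHABET)}

# The represented numbers increase together with the ASCII code values.
assert len(ALPHABET) == 64
assert list(ALPHABET) == sorted(ALPHABET)

print(VALUE_OF["@"], VALUE_OF["A"], VALUE_OF["_"], VALUE_OF["a"], VALUE_OF["z"])
```

Because the alphabet string is already sorted by ASCII code, the index of a character doubles as the number it represents.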
- a second aspect of the present invention provides a decoding method, including:
- acquiring data to be decoded, where the data to be decoded is a 64-ary encoded string;
- decoding the data to be decoded according to a 64-ary decoding rule to obtain the corresponding decoded data, where the decoded data is a binary string and the character set used by the 64-ary decoding rule includes the following 64 characters: 0-9, @, A-Z, _, and a-z.
- a third aspect of the present invention provides an encoding apparatus, including:
- an obtaining module, configured to acquire data to be encoded, where the data to be encoded is a binary string;
- an encoding module, configured to encode the data to be encoded according to a 64-ary encoding rule to obtain the corresponding encoded data, where the encoded data is a 64-ary encoded string and the character set used by the 64-ary encoding rule includes the following 64 characters: 0-9, @, A-Z, _, a-z.
- the order of the numbers represented by the characters in the character set is consistent with the order of the characters' American Standard Code for Information Interchange (ASCII) code values.
- in ascending order of ASCII code value the characters are 0-9, @, A-Z, _, a-z, where the characters 0-9 represent the numbers 0-9, the character @ represents 10, the characters A-Z represent 11-36, the character _ represents 37, and the characters a-z represent 38-63.
- the 64-ary encoded string has a start character, which is used to identify the starting position of the 64-ary encoded string.
- a fourth aspect of the present invention provides a decoding apparatus, including:
- An acquiring module configured to acquire data to be decoded, where the data to be decoded is a 64-ary encoded string;
- a decoding module, configured to decode the data to be decoded according to a 64-ary decoding rule to obtain the corresponding decoded data, where the decoded data is a binary string and the character set used by the 64-ary decoding rule includes the following 64 characters: 0-9, @, A-Z, _, a-z.
- a fifth aspect of the present invention provides an encoding apparatus, including:
- a processor, a memory, and a system bus, where the processor and the memory are connected by the system bus and communicate with each other over it;
- the memory is configured to store computer-executable instructions;
- the processor is configured to run the computer-executable instructions, to cause the encoding apparatus to perform the method of the first aspect or of any one of the first to third possible implementations of the first aspect of the present invention.
- a sixth aspect of the present invention provides a decoding apparatus, including:
- a processor, a memory, and a system bus, where the processor and the memory are connected by the system bus and communicate with each other over it;
- the memory is configured to store computer-executable instructions;
- the processor is configured to run the computer-executable instructions, to cause the decoding apparatus to perform the method provided by the second aspect of the present invention.
- in the encoding and decoding methods and the encoding and decoding devices provided by the embodiments of the present invention, the encoding device acquires data to be encoded, where the data to be encoded is a binary string, and encodes it according to a 64-ary encoding rule to obtain the corresponding encoded data.
- the encoded data is a 64-ary encoded string, and the character set used by the 64-ary encoding rule includes the following 64 characters: 0-9, @, A-Z, _, a-z.
- encoding the data in this way shortens the data to be encoded and thereby saves storage space.
- FIG. 1 is a flowchart of an encoding method according to Embodiment 1 of the present invention.
- FIG. 2 is a flowchart of a decoding method according to Embodiment 2 of the present invention.
- FIG. 3 is a schematic structural diagram of an encoding apparatus according to Embodiment 3 of the present invention.
- FIG. 4 is a schematic structural diagram of a decoding apparatus according to Embodiment 4 of the present invention.
- FIG. 5 is a schematic structural diagram of an encoding apparatus according to Embodiment 5 of the present invention.
- FIG. 6 is a schematic structural diagram of a decoding apparatus according to Embodiment 6 of the present invention.
- Base64 encoding is an encoding method that represents binary data based on 64 printable characters. Since the 6th power of 2 is equal to 64, each 6 bits is a unit, and the unit corresponds to one printable character.
- Base64 encoding reads three bytes of data at a time. Three bytes are 24 bits in total, and with 6 bits per unit they correspond to 4 units, so every 3 bytes of data are represented by 4 printable characters after Base64 encoding.
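The 3-byte to 4-character relationship can be observed with Python's standard `base64` module. This is an illustrative check, not part of the patent. The input is chosen so that every 6-bit unit is 111111, which the standard alphabet maps to "/":

```python
import base64

raw = b"\xff\xff\xff"              # 3 bytes = 24 bits, all ones
encoded = base64.b64encode(raw)    # four 6-bit units -> 4 characters

print(len(raw), len(encoded), encoded)
```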
- the printable characters in Base64 encoding include the letters A-Z and a-z and the digits 0-9, 62 characters in total; the other two printable characters differ between systems. In standard Base64, the other two printable characters are the operator characters "+" and "/".
- the Base64 encoding has a wide range of application scenarios.
- Base64 encoding can be used to transmit long identification information in a Hypertext Transfer Protocol (HTTP) environment.
- Base64 encoding is used to encode a long unique identifier into a 64-ary string.
- URL Universal Resource Locator
- the use of Base64 encoding not only makes the encoded data relatively short, but also makes the encoded data not directly recognized by the human eye.
- the link can be encrypted with Base64 encoding to ensure data security.
- Base64 encoding can be used to encode emails in the Multipurpose Internet Mail Extensions (MIME) format, or to encode data carried in Extensible Markup Language (XML).
- Base64 encoding can be used to encode data that needs to be stored as a string, such as Base64 encoding of file names or log content.
- some characters of the existing Base64 character set cannot be used in file names under the Linux operating system and the Windows operating system.
- the character "+" means "one or more" in the shell and must be escaped before it can be used as an ordinary character. Therefore, the existing Base64 encoding method has certain application limitations.
- the character set used in the 64-ary encoding provided by the embodiments of the present invention differs from the existing Base64 character set, which widens the application range of 64-ary encoding and gives better compatibility.
- the encoding method and the decoding method provided by the embodiments of the present invention are described in detail below.
- FIG. 1 is a flowchart of an encoding method according to Embodiment 1 of the present invention.
- the method in this embodiment may be performed by an encoding device integrated in a server or a personal computer (PC).
- as shown in FIG. 1, the method of this embodiment may include the following steps:
- Step 101: Acquire data to be encoded, where the data to be encoded is a binary string.
- Step 102: Encode the data to be encoded according to a 64-ary encoding rule to obtain the corresponding encoded data, where the encoded data is a 64-ary encoded string and the character set used by the 64-ary encoding rule includes the following 64 characters: 0-9, @, A-Z, _, a-z.
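Steps 101 and 102 can be sketched in Python as follows. The function name `encode_bits` is hypothetical, and the mapping used is the ASCII-ordered one (0-9, @, A-Z, _, a-z representing 0-63). Padding the input with leading zeros to a multiple of 6 bits is an assumption of this sketch, since the text does not say how a partial group is handled:

```python
import string

# 0-9, @, A-Z, _, a-z: index in the string = number represented.
ALPHABET = (string.digits + "@" + string.ascii_uppercase
            + "_" + string.ascii_lowercase)

def encode_bits(bits: str) -> str:
    """Encode a string of '0'/'1' characters as a 64-ary string."""
    if not bits or set(bits) - {"0", "1"}:
        raise ValueError("expected a non-empty binary string")
    # Assumption: pad with leading zeros to a multiple of 6 bits,
    # which leaves the numeric value unchanged.
    width = -(-len(bits) // 6) * 6          # round up to a multiple of 6
    padded = bits.zfill(width)
    groups = [padded[i:i + 6] for i in range(0, width, 6)]
    # Each 6-bit group selects one character of the alphabet.
    return "".join(ALPHABET[int(group, 2)] for group in groups)

print(encode_bits("111111111111"))
```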
- All data in a computer is stored and processed as binary numbers.
- for character-type data, the ASCII code corresponding to each character is what is stored in the computer.
- ASCII codes use a specified combination of 7-bit or 8-bit binary numbers to represent 128 or 256 possible characters. Taking the number 4095 as an example, when 4095 is stored as a numeric value its binary string is 111111111111, which is 12 bits in total and requires two bytes.
- Character-type numbers occupy more storage space than this. To reduce the space they occupy, character-type numbers need to be encoded.
- the character-type number can be 64-ary encoded and the ASCII code values of the resulting 64-ary string stored instead; the encoded string occupies less storage space than the string before encoding. For example, the decimal number 4095 has the binary string 111111111111. For 64-ary encoding, the binary string is divided into groups of 6 bits, giving "111111/111111", where "111111" represents the number 63.
- the character that represents the number 63 is then selected from the character set of the 64-ary encoding rule. If the character "z" represents the number 63 in the 64-ary encoding rule, the resulting 64-ary encoded string is zz.
- the number represented by each of the 64 characters is not limited: the characters 0-9, @, A-Z, _, a-z may be assigned to the numbers 0-63 in any way, provided that each character corresponds to exactly one number.
- stored as a decimal character-type number, 4095 takes 4 bytes; stored as a 64-ary string it takes only 2 bytes, which shortens the string and saves storage space for character-type numbers.
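The 4095 example can be reproduced by repeated division by 64. This is a minimal sketch: the helper `to_64ary` is a hypothetical name, and the mapping used is the ASCII-ordered one in which "z" represents 63:

```python
import string

ALPHABET = (string.digits + "@" + string.ascii_uppercase
            + "_" + string.ascii_lowercase)

def to_64ary(n: int) -> str:
    """Write a non-negative integer using the 64-character alphabet."""
    if n < 0:
        raise ValueError("expected a non-negative integer")
    digits = []
    while True:
        n, remainder = divmod(n, 64)      # peel off the lowest 6 bits
        digits.append(ALPHABET[remainder])
        if n == 0:
            break
    return "".join(reversed(digits))      # most significant character first

decimal_text = str(4095)
compact_text = to_64ary(4095)
print(decimal_text, len(decimal_text))
print(compact_text, len(compact_text))
```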
- the character set used in the 64-ary encoding rule in this embodiment is different from the character set used in the existing 64-ary encoding rule.
- the character set used in the 64-ary encoding rule of this embodiment satisfies the following three conditions: (1) its characters can be used in file names under Linux and Windows; (2) its characters are not arithmetic or logical operators; (3) its characters can be used directly in the shell and in regular expressions, do not conflict with existing syntax, and are compatible with code statements in the common languages C/C++/Java.
- based on these conditions, the following 64 characters are selected from all visible ASCII characters: 0-9, @, A-Z, _, and a-z.
- the 64 characters used by Base64 are: 0-9, a-z, A-Z, +, /.
- the character set of the 64-ary encoding rule in this embodiment replaces the Base64 characters "+" and "/" with the characters "@" and "_".
- the characters "@" and "_" in this embodiment can be used in file names under the Linux operating system and the Windows operating system.
- the characters "@" and "_" are not arithmetic or logical operators, so they cause no ambiguity when written in an arithmetic expression, and the expression keeps a unique meaning.
- the characters "@" and "_" do not conflict with existing syntax, can be used in shells and regular expressions without escaping, and are compatible with code statements in the common programming languages C/C++ and Java.
- a string built from the 64-ary character set can therefore be used as a file name and is compatible with the shell.
- mkdir_aS ls
- the character set using the 64-ary code can also be directly written into the expression.
- the expression is shortened from 16 characters to 11 characters.
- each 6-bit string is encoded as the 64-ary coded character that corresponds to it, and the remaining bytes in the buffer are encoded in turn until all the data to be encoded has been encoded.
- the 64-ary encoded string also has a start character, which is used to identify the starting position of the 64-ary encoded string.
- for example, the characters "_@" mark the start of a 64-ary encoded string, which runs from the first character after "_@" up to the first character that is not a 64-ary coded character.
- the number is recognized by the decoding device as a 64-ary number.
- %s outputs a 64-ary number as a string.
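One way to locate such marked strings in surrounding text is a regular expression. The sketch below assumes the start marker is the two characters "_@" and that the encoded string ends at the first character outside the 64-character set; `find_encoded` is a hypothetical helper name:

```python
import re

# Assumption: "_@" is the start marker, followed by one or more
# characters from the set 0-9, @, A-Z, _, a-z.
MARKED = re.compile(r"_@([0-9@A-Z_a-z]+)")

def find_encoded(text):
    """Return every 64-ary string that follows a "_@" start marker."""
    return MARKED.findall(text)

print(find_encoded("size=_@zz, id=_@@8Fs;"))
```

Since "_" and "@" also belong to the character set, the marker is only unambiguous at the start of a run, which is why the pattern consumes the whole run greedily.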
- encoding the data to be encoded in 64-ary shortens it and thereby saves storage space.
- all characters selected for the 64-ary encoding of this embodiment can be used in file names under the Linux operating system and the Windows operating system, and the characters "@" and "_" are not arithmetic or logical operators, so they cause no ambiguity when written in an arithmetic expression.
- the characters "@" and "_" can be used directly in the shell and in regular expressions, do not conflict with existing syntax, and are compatible with code statements in the common languages C/C++/Java.
- the encoding method provided by the embodiments of the present invention therefore has a wider application range and better compatibility.
- the encoding method of this embodiment can be applied to a server or a PC. After the encoding device of the server or PC encodes the data, the server or PC can send the encoded data to other devices; for example, when the 64-ary encoding method is used as the transfer encoding of an e-mail, the server or PC sends the encoded data to the terminal of the mail recipient. Alternatively, the server or PC stores the encoded data. What is actually stored is the ASCII code value of each character of the 64-ary encoded string; for example, if the encoded string is ABCD, the ASCII code values of the 64-ary characters "A", "B", "C", and "D" are stored.
- the encoding method of this embodiment is applicable to a big data scenario.
- some applications such as log text and K-means contain a large number of character-type numbers, which need to be encoded with a 64-ary encoding method.
- in a big data scenario, not only storage space but also the sorting performance of big data must be considered.
- the order of the numbers represented by the characters in the Base64 character set is inconsistent with the order of their ASCII code values.
- as a result, ASCII code values cannot be compared directly; each 64-ary encoded string must first be converted to the corresponding decimal number and the two decimal numbers compared, which hurts big data sorting performance.
- Table 1 gives the relationship between the characters of the existing Base64 encoding method, their ASCII code values and the numbers they represent.
- the number represented by the character A is decimal 0 and the number represented by the character 0 is decimal 52; therefore, the character A is smaller than the character 0.
- if the ASCII code values are compared directly, the ASCII code of the character A is 65 and that of the character 0 is 48, so the character A appears larger than the character 0.
- this result is obviously incorrect.
- the 64-ary string therefore has to be converted to the corresponding decimal number before comparison, and because that conversion is expensive, it greatly reduces big data sorting performance.
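The mismatch described above is easy to reproduce. In this illustrative Python sketch the standard Base64 alphabet is written out so that a character's index in the string is the number it represents:

```python
import string

# Standard Base64 alphabet: A-Z, a-z, 0-9, +, /.
BASE64 = (string.ascii_uppercase + string.ascii_lowercase
          + string.digits + "+/")

value_a, value_0 = BASE64.index("A"), BASE64.index("0")
print(value_a, value_0)       # numbers represented by "A" and "0"
print(ord("A"), ord("0"))     # ASCII codes of "A" and "0"

by_value = sorted(BASE64, key=BASE64.index)   # order of represented numbers
by_ascii = sorted(BASE64)                     # order of ASCII code values
print(by_value == by_ascii)
```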
- the order of the numbers represented by the characters in the character set used in the 64-ary encoding rule is consistent with the order of the ASCII code values, as shown in Table 2.
- in ascending order of ASCII code value the characters in the character set are 0-9, @, A-Z, _, a-z; the 64-ary coded characters 0-9 represent the numbers 0-9, and the 64-ary coded character @ represents 10.
- the 64-ary coded characters A-Z represent 11-36.
- the 64-ary coded character _ represents 37.
- the 64-ary coded characters a-z represent 38-63. Table 2 shows that the numbers represented by the 64-ary coded characters increase in sequence and that the corresponding ASCII code values increase in the same sequence.
- the ASCII code values of the 64-ary coded characters can therefore be compared directly. For example, the 64-ary strings "@8Fs" and "_u0I" have the same number of characters, so it is enough to compare the ASCII codes of their first characters: the ASCII code of "@" is 64 and that of "_" is 95, so "@" is less than "_" and the string "@8Fs" is smaller than the string "_u0I".
- the conversion-based method first converts the character "@" to the number 10 and the character "_" to the number 37; since 10 is smaller than 37, "@" is smaller than "_". Comparing the ASCII code values directly therefore gives the same result as converting the characters to the numbers they represent.
- because the order of the numbers represented by the characters in the character set of the 64-ary encoding rule is the same as the ASCII code order, the ASCII values of 64-ary strings can be compared directly, which reduces the overhead of sorting data.
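The equivalence between direct string comparison and numeric comparison can be checked in Python for strings of equal length. The helper `to_int` and the extra sample strings are illustrative assumptions; only "@8Fs" and "_u0I" come from the text:

```python
import string

ALPHABET = (string.digits + "@" + string.ascii_uppercase
            + "_" + string.ascii_lowercase)
VALUE_OF = {ch: i for i, ch in enumerate(ALPHABET)}

def to_int(text: str) -> int:
    """Convert a 64-ary string to the integer it represents."""
    n = 0
    for ch in text:
        n = n * 64 + VALUE_OF[ch]
    return n

# Equal-length strings: plain string order versus numeric order.
samples = ["_u0I", "@8Fs", "zzzz", "0001", "A_b@"]
by_text = sorted(samples)                 # compares character codes directly
by_number = sorted(samples, key=to_int)   # converts every string first

print(by_text == by_number)
```

The shortcut relies on the strings having the same number of characters; strings of different lengths would first need to be padded or compared by length.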
- FIG. 2 is a flowchart of a decoding method according to Embodiment 2 of the present invention.
- the method in this embodiment may be performed by a decoding device, and the decoding device is integrated in a server or a PC.
- as shown in FIG. 2, the method provided in this embodiment may include the following steps:
- Step 201: Acquire data to be decoded, where the data to be decoded is a 64-ary encoded string.
- Step 202: Decode the data to be decoded according to a 64-ary decoding rule to obtain the corresponding decoded data, where the decoded data is a binary string and the character set used by the 64-ary decoding rule includes the following 64 characters: 0-9, @, A-Z, _, a-z.
- the decoding method of this embodiment is the inverse of the encoding method of Embodiment 1, that is, it converts a 64-ary string into a binary string. Specifically, each character of the 64-ary string is converted into its 6-bit binary string, and the binary strings of the characters are concatenated in order to obtain the binary string corresponding to the 64-ary string. For example, for the 64-ary string zz, the character z is converted to the binary string 111111, so the binary string corresponding to zz is 111111111111, and the decimal number corresponding to that binary string is 4095.
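Steps 201 and 202 can be sketched in Python as follows. The function name `decode_to_bits` is hypothetical, and the mapping used is the ASCII-ordered one in which "z" represents 63:

```python
import string

ALPHABET = (string.digits + "@" + string.ascii_uppercase
            + "_" + string.ascii_lowercase)
VALUE_OF = {ch: i for i, ch in enumerate(ALPHABET)}

def decode_to_bits(text: str) -> str:
    """Convert a 64-ary string to a binary string, 6 bits per character."""
    try:
        # format(value, "06b") writes the value as exactly six binary digits.
        return "".join(format(VALUE_OF[ch], "06b") for ch in text)
    except KeyError as exc:
        raise ValueError(f"not a 64-ary character: {exc.args[0]!r}") from None

bits = decode_to_bits("zz")
print(bits, int(bits, 2))
```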
- FIG. 3 is a schematic structural diagram of an encoding apparatus according to Embodiment 3 of the present invention. As shown in FIG. 3, the encoding apparatus of this embodiment includes: an obtaining module 11 and an encoding module 12.
- the obtaining module 11 is configured to obtain data to be encoded, where the data to be encoded is a binary string;
- the encoding module 12 is configured to encode the data to be encoded according to a 64-ary encoding rule to obtain the corresponding encoded data, where the encoded data is a 64-ary encoded string and the character set used by the 64-ary encoding rule includes the following 64 characters: 0-9, @, A-Z, _, a-z.
- the order of the numbers represented by the characters in the character set is consistent with the order of the characters' American Standard Code for Information Interchange (ASCII) code values.
- in ascending order of ASCII code value the characters in the character set are 0-9, @, A-Z, _, a-z, where the 64-ary coded characters 0-9 represent the numbers 0-9, the 64-ary coded character @ represents 10, and the 64-ary coded characters A-Z represent 11-36.
- the 64-ary coded character _ represents 37, and the 64-ary coded characters a-z represent 38-63.
- the 64-ary encoded character string has a start character, and the start character is used to identify a starting position of the 64-ary encoded character string.
- the encoding device of this embodiment can be used to perform the encoding method provided in the first embodiment, and the implementation principle and technical effects thereof are similar, and details are not described herein again.
- the decoding apparatus of this embodiment includes: an obtaining module 21 and a decoding module 22.
- the acquiring module is configured to obtain data to be decoded, where the data to be decoded is a 64-ary encoded string;
- the decoding module is configured to decode the data to be decoded according to a 64-ary decoding rule to obtain the corresponding decoded data, where the decoded data is a binary string and the character set used by the 64-ary decoding rule includes the following 64 characters: 0-9, @, A-Z, _, a-z.
- the decoding apparatus of this embodiment may be used to perform the decoding method provided in the second embodiment, and the implementation principle and technical effects thereof are similar, and details are not described herein again.
- FIG. 5 is a schematic structural diagram of an encoding apparatus according to Embodiment 5 of the present invention.
- the encoding apparatus 300 of this embodiment includes a processor 31, a memory 32, and a system bus 33. The processor 31 and the memory 32 are connected through the system bus 33 and communicate with each other over it; the memory 32 is configured to store computer-executable instructions 321; the processor 31 is configured to run the computer-executable instructions 321 to perform the following method:
- acquiring data to be encoded, where the data to be encoded is a binary string, and encoding it according to a 64-ary encoding rule to obtain a 64-ary encoded string, where the character set used includes the following 64 characters: 0-9, @, A-Z, _, and a-z.
- the order of the numbers represented by the characters in the character set is consistent with the order of the characters' American Standard Code for Information Interchange (ASCII) code values.
- in ascending order of ASCII code value the characters in the character set are 0-9, @, A-Z, _, a-z, where the 64-ary coded characters 0-9 represent the numbers 0-9, the 64-ary coded character @ represents 10, and the 64-ary coded characters A-Z represent 11-36.
- the 64-ary coded character _ represents 37, and the 64-ary coded characters a-z represent 38-63.
- the 64-ary encoded character string has a start character, and the start character is used to identify a starting position of the 64-ary encoded character string.
- the encoding device of this embodiment can be used to perform the encoding method provided in the first embodiment, and the implementation principle and technical effects thereof are similar, and details are not described herein again.
- FIG. 6 is a schematic structural diagram of a decoding apparatus according to Embodiment 6 of the present invention.
- the decoding apparatus 400 of this embodiment includes a processor 41, a memory 42, and a system bus 43. The processor 41 and the memory 42 are connected through the system bus 43 and communicate with each other over it; the memory 42 is configured to store computer-executable instructions 421; the processor 41 is configured to run the computer-executable instructions 421 to perform the following method:
- acquiring data to be decoded, where the data to be decoded is a 64-ary encoded string, and decoding it according to a 64-ary decoding rule to obtain the corresponding decoded data, where the decoded data is a binary string and the character set used by the 64-ary decoding rule includes the following 64 characters: 0-9, @, A-Z, _, and a-z.
- the decoding apparatus of this embodiment may be used to perform the decoding method provided in the second embodiment, and the implementation principle and technical effects thereof are similar, and details are not described herein again.
- the aforementioned program can be stored in a computer readable storage medium.
- the program when executed, performs the steps including the foregoing method embodiments; and the foregoing storage medium includes various media that can store program codes, such as a ROM, a RAM, a magnetic disk, or an optical disk.
- each component in the above embodiment may have another division manner in actual implementation.
- multiple modules or components may be combined or integrated into another device, or some features may be omitted or not performed.
- the coupling or direct coupling or communication connection of the components shown or discussed may be through some communication interface, indirect coupling or communication connection of the modules, and may include electrical, mechanical, or other forms of connection.
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- Compression, Expansion, Code Conversion, And Decoders (AREA)
Abstract
Provided in an embodiment of the present invention are an encoding and decoding method, and encoding device and decoding device, the encoding method comprising: acquiring data to be encoded, the data to be encoded being a binary character string; and according to a base-64 encoding rule, encoding the data to be encoded so as to obtain encoded data corresponding to the data to be encoded, the encoded data being a base-64 encoded character string, and the base-64 encoding rule utilizing the following 64 characters: 0-9, @, A-Z, _ and a-z. The encoding method shortens the length of the data to be encoded and saves storage space. All characters in the character set employed by the encoding method in the embodiment of the present invention can be used as file names in a Linux operating system and a Windows operating system, and the characters "@" and "_" can be directly used in a shell and a regular expression, and the encoding method is compatible with a code statement in the common languages C/C++/Java; therefore, the encoding method has a better compatibility.
Description
本发明实施例涉及数据通信技术,尤其涉及一种编码、解码方法以及编码装置和解码装置。Embodiments of the present invention relate to data communication technologies, and in particular, to an encoding and decoding method, and an encoding apparatus and a decoding apparatus.
在大数据场景中,诸如日志文本、K-means等应用均包含大量的字符型数字,字符型数字是指能够以文本的方式显示和打印的字符,并且以字符串的形式存储。字符型数字比数字型数字占用更多的存储空间,例如,10进制数字255,若以数字型数字存储,只需要1个字节,因为1个字节可以存储256个不同的状态(0~255)。若以字符型数字存储,需要3个字节,此时每个字节只用了10个状态(0-9),大量的状态被浪费掉了。In big data scenarios, applications such as log text and K-means contain a large number of character numbers, which are characters that can be displayed and printed in text and stored as strings. Character-type numbers take up more storage space than digital-type numbers, for example, a decimal number of 255. If stored as a digital number, only 1 byte is needed, because 1 byte can store 256 different states (0 ~255). If it is stored in a character type, it takes 3 bytes. At this time, only 10 states (0-9) are used for each byte, and a large number of states are wasted.
为了减少字符型数字的存储空间,现有技术中,通过增大数字的进制使每一个字符型数字能够存储更多的状态,从而减少字符型数字的存储空间。例如,存储64进制编码的字符型数字相对于存储10进制编码的字符型数字大约可以节约44.6%的存储空间,存储64进制编码的字符型数字相对于存储16进制编码的字符型数字大约可以节约1/3的存储空间。现有64进制编码方法使用的字符集为:A-Z、a-z、0-9、+、/,该字符集依次代表的ASCII码的值分别为0-25、26-51、52-61、62、63。In order to reduce the storage space of character-type numbers, in the prior art, by increasing the number of digits, each character-type number can store more states, thereby reducing the storage space of character-type numbers. For example, storing a 64-character-encoded character number can save about 44.6% of the storage space compared to storing a 10-digit-encoded character-type number, and storing a 64-encoded character-type number relative to a hexadecimal-encoded character type. The number can save about 1/3 of the storage space. The existing 64-ary encoding method uses the character set: AZ, az, 0-9, +, /, and the character set represents the ASCII code values of 0-25, 26-51, 52-61, 62 respectively. 63.
In the prior art, however, the character "/" cannot be used in a file name under the Linux or Windows operating system, and the character "+" means "one or more" in a shell and must be escaped before it can be used as an ordinary character. In addition, "/" and "+" are the division and addition operators, so an encoded string written into an expression is ambiguous. For example, the string "///" can be read either as a base-64 encoded string or as the decimal expression 63/63, with the outer characters read as the digit 63 and the middle character read as a division sign. Here, a shell is software that provides an interface through which a user operates the system.
Summary of the invention
Embodiments of the present invention provide an encoding method, a decoding method, an encoding apparatus, and a decoding apparatus, to solve the problems that some characters of the prior-art base-64 encoding method cannot be used in file names under the Linux and Windows operating systems and cause ambiguity in expressions.
A first aspect of the present invention provides an encoding method, including:
acquiring data to be encoded, where the data to be encoded is a binary character string; and
encoding the data to be encoded according to a base-64 encoding rule to obtain encoded data corresponding to the data to be encoded, where the encoded data is a base-64 encoded character string, and the character set used by the base-64 encoding rule includes the following 64 characters: 0-9, @, A-Z, _, and a-z.
With reference to the first aspect, in a first possible implementation of the first aspect, the order of the numbers represented by the characters in the character set is consistent with the order of the American Standard Code for Information Interchange (ASCII) codes of those characters.
With reference to the first possible implementation of the first aspect, in a second possible implementation of the first aspect, the characters in the character set are, in ascending order: 0-9, @, A-Z, _, a-z, where the characters 0-9 represent the numbers 0-9, the character @ represents the number 10, the characters A-Z represent the numbers 11-36, the character _ represents the number 37, and the characters a-z represent the numbers 38-63.
With reference to the first aspect or the first or second possible implementation of the first aspect, in a third possible implementation of the first aspect, the base-64 encoded character string has a start character that identifies the starting position of the base-64 encoded character string.
A second aspect of the present invention provides a decoding method, including:
acquiring data to be decoded, where the data to be decoded is a base-64 encoded character string; and
decoding the data to be decoded according to a base-64 decoding rule to obtain decoded data corresponding to the data to be decoded, where the decoded data is a binary character string, and the character set used by the base-64 decoding rule includes the following 64 characters: 0-9, @, A-Z, _, and a-z.
A third aspect of the present invention provides an encoding apparatus, including:
an acquiring module, configured to acquire data to be encoded, where the data to be encoded is a binary character string; and
an encoding module, configured to encode the data to be encoded according to a base-64 encoding rule to obtain encoded data corresponding to the data to be encoded, where the encoded data is a base-64 encoded character string, and the character set used by the base-64 encoding rule includes the following 64 characters: 0-9, @, A-Z, _, and a-z.
With reference to the third aspect, in a first possible implementation of the third aspect, the order of the numbers represented by the characters in the character set is consistent with the order of the American Standard Code for Information Interchange (ASCII) codes of those characters.
With reference to the first possible implementation of the third aspect, in a second possible implementation of the third aspect, the characters in the character set are, in ascending order: 0-9, @, A-Z, _, a-z, where the characters 0-9 represent the numbers 0-9, the character @ represents the number 10, the characters A-Z represent the numbers 11-36, the character _ represents the number 37, and the characters a-z represent the numbers 38-63.
With reference to the third aspect or the first or second possible implementation of the third aspect, in a third possible implementation of the third aspect, the base-64 encoded character string has a start character that identifies the starting position of the base-64 encoded character string.
A fourth aspect of the present invention provides a decoding apparatus, including:
an acquiring module, configured to acquire data to be decoded, where the data to be decoded is a base-64 encoded character string; and
a decoding module, configured to decode the data to be decoded according to a base-64 decoding rule to obtain decoded data corresponding to the data to be decoded, where the decoded data is a binary character string, and the character set used by the base-64 decoding rule includes the following 64 characters: 0-9, @, A-Z, _, and a-z.
A fifth aspect of the present invention provides an encoding apparatus, including:
a processor, a memory, and a system bus, where the processor and the memory are connected through the system bus and communicate with each other over it;
the memory is configured to store computer-executable instructions; and
the processor is configured to run the computer-executable instructions so that the encoding apparatus performs the method according to the first aspect or any one of the first to third possible implementations of the first aspect.
A sixth aspect of the present invention provides a decoding apparatus, including:
a processor, a memory, and a system bus, where the processor and the memory are connected through the system bus and communicate with each other over it;
the memory is configured to store computer-executable instructions; and
the processor is configured to run the computer-executable instructions so that the decoding apparatus performs the method provided by the second aspect of the present invention.
In the encoding method, decoding method, encoding apparatus, and decoding apparatus provided by the embodiments of the present invention, the encoding apparatus acquires data to be encoded, where the data to be encoded is a binary character string, and encodes the data according to a base-64 encoding rule to obtain the corresponding encoded data, where the encoded data is a base-64 encoded character string and the character set used by the base-64 encoding rule includes the following 64 characters: 0-9, @, A-Z, _, and a-z. Encoding shortens the data to be encoded and therefore saves storage space. All characters of the character set can be used in file names under Linux and Windows. The characters "@" and "_" are neither arithmetic nor logical operators, so an encoded string written into an expression is unambiguous and the expression has a single interpretation. In addition, "@" and "_" can be used directly in a shell and in regular expressions and are compatible with code statements in the common languages C, C++, and Java. Compared with the existing base-64 encoding method, the encoding method provided by the embodiments of the present invention has a wider range of application and better compatibility.
To describe the technical solutions in the embodiments of the present invention or in the prior art more clearly, the accompanying drawings needed for describing the embodiments or the prior art are briefly introduced below.
FIG. 1 is a flowchart of an encoding method according to Embodiment 1 of the present invention;
FIG. 2 is a flowchart of a decoding method according to Embodiment 2 of the present invention;
FIG. 3 is a schematic structural diagram of an encoding apparatus according to Embodiment 3 of the present invention;
FIG. 4 is a schematic structural diagram of a decoding apparatus according to Embodiment 4 of the present invention;
FIG. 5 is a schematic structural diagram of an encoding apparatus according to Embodiment 5 of the present invention;
FIG. 6 is a schematic structural diagram of a decoding apparatus according to Embodiment 6 of the present invention.
To make the objectives, technical solutions, and advantages of the embodiments of the present invention clearer, the technical solutions in the embodiments are described clearly and completely below with reference to the accompanying drawings. The described embodiments are some but not all of the embodiments of the present invention.
Base64 is an encoding method that represents binary data using 64 printable characters. Because 2 to the 6th power equals 64, every 6 bits form one unit, and each unit corresponds to one printable character. Base64 reads three bytes of data at a time; three bytes are 24 bits, that is, four 6-bit units, so every 3 bytes of data are represented by 4 printable characters. The printable characters include the letters A-Z and a-z and the digits 0-9, 62 characters in total. The remaining two characters differ between systems; in standard Base64 they are the operator symbols "+" and "/".
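The grouping of three bytes into four 6-bit units can be sketched as follows. This is a minimal illustration of the bit arithmetic only; the function name and the sample input are assumptions made for the example.

```python
def six_bit_units(three_bytes: bytes) -> list:
    """Split exactly three bytes (24 bits) into four 6-bit values, high bits first."""
    if len(three_bytes) != 3:
        raise ValueError("expected exactly 3 bytes")
    buffer = int.from_bytes(three_bytes, "big")  # 24-bit buffer, first byte in the high bits
    return [(buffer >> shift) & 0x3F for shift in (18, 12, 6, 0)]

units = six_bit_units(b"Man")
print(units)
```

Each of the four values lies in the range 0-63 and is then mapped to one printable character by whichever character set the encoding uses.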
Base64 encoding is widely used. In one scenario, it is used in a Hypertext Transfer Protocol (HTTP) environment to carry long identification information. For example, the Java persistence framework Hibernate uses Base64 encoding to encode a long unique identifier as a base-64 string. Other applications also often need to encode binary data into a form suitable for a Uniform Resource Locator (URL). Base64 encoding then keeps the encoded data short and also prevents it from being read directly by the human eye. In another scenario, Base64 encoding is applied to links to obscure their content for data security. Base64 encoding can also be used to encode e-mail in the Multipurpose Internet Mail Extensions (MIME) format, or to encode and store complex data in Extensible Markup Language (XML). In a big data environment, Base64 encoding can be used for data that must be stored as character strings, such as file names or log content.
However, in the existing Base64 character set, the character "/" cannot be used in a file name under the Linux or Windows operating system, and the character "+" means "one or more" in a shell and must be escaped before it can be used as an ordinary character. The existing Base64 encoding method therefore has limited applicability.
The base-64 encoding provided by the embodiments of the present invention uses a character set different from that of the existing Base64 encoding, which widens the range of application of base-64 encoding and gives better compatibility. The encoding method and the decoding method provided by the embodiments of the present invention are described in detail below.
FIG. 1 is a flowchart of the encoding method according to Embodiment 1 of the present invention. The method of this embodiment may be performed by an encoding apparatus integrated in a server or a personal computer (PC). As shown in FIG. 1, the method of this embodiment may include the following steps:
Step 101: Acquire data to be encoded, where the data to be encoded is a binary character string.
Step 102: Encode the data to be encoded according to a base-64 encoding rule to obtain encoded data corresponding to the data to be encoded, where the encoded data is a base-64 encoded character string, and the character set used by the base-64 encoding rule includes the following 64 characters: 0-9, @, A-Z, _, a-z.
In a computer, all data is represented in binary for storage and computation. When the data is a displayable character-type number, the computer stores the ASCII codes of its characters. ASCII uses 7-bit or 8-bit binary combinations to represent 128 or 256 possible characters. Take the number 4095 as an example. Stored in numeric form, its binary string is 111111111111, 12 bits in total, which requires two bytes. Stored as a character-type number without further encoding, it requires 4 bytes: the characters "4", "0", "9", and "5" each occupy one byte, and what is actually stored are the American Standard Code for Information Interchange (ASCII) codes of those characters.
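The two storage forms of 4095 can be compared directly. The snippet below is an illustrative check only; the variable names are chosen for this example.

```python
value = 4095

as_number = value.to_bytes(2, "big")   # numeric form: 12 significant bits fit in 2 bytes
as_text = str(value).encode("ascii")   # character form: one byte per digit character

print(bin(value))                      # binary string of the number
print(len(as_number), len(as_text))    # bytes needed by each form
print(list(as_text))                   # ASCII codes stored for "4", "0", "9", "5"
```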
A character-type number stored directly in this way occupies a large amount of storage space, so it needs to be encoded to reduce that space. In the method of this embodiment, the character-type number is encoded in base 64, and the ASCII codes of the resulting base-64 string are stored; the encoded base-64 string occupies less storage space than the data did before encoding. For example, the decimal number 4095 corresponds to the binary string 111111111111. For base-64 encoding, the binary string is divided into 6-bit groups, giving "111111/111111". Each group "111111" represents the number 63, so the character that represents 63 is selected from the character set used by the base-64 encoding rule as the encoded character. If the encoding rule uses the character "z" to represent 63, the encoded base-64 string is zz. This embodiment does not restrict which number each of the 64 characters represents: the characters 0-9, @, A-Z, _, a-z may be assigned to the numbers 0-63 in any way, provided that each character corresponds to exactly one number. As this example shows, the decimal number 4095 stored as a character-type number occupies 4 bytes when stored directly but only two bytes when stored as a base-64 string, which shortens the string and saves storage space for character-type numbers.
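The 4095 example can be reproduced with a small radix conversion. This is a sketch under one stated assumption: the characters are assigned to 0-63 in ascending ASCII order (0-9, @, A-Z, _, a-z), so that "z" represents 63; the text allows other assignments. The names `ALPHABET` and `to_base64_digits` are hypothetical.

```python
import string

# Assumed digit assignment: 0-9 -> 0-9, @ -> 10, A-Z -> 11-36, _ -> 37, a-z -> 38-63.
ALPHABET = string.digits + "@" + string.ascii_uppercase + "_" + string.ascii_lowercase

def to_base64_digits(number: int) -> str:
    """Write a non-negative integer as a string of base-64 digits."""
    if number < 0:
        raise ValueError("number must be non-negative")
    if number == 0:
        return ALPHABET[0]
    digits = []
    while number:
        number, remainder = divmod(number, 64)  # peel off the lowest 6 bits
        digits.append(ALPHABET[remainder])
    return "".join(reversed(digits))            # most significant digit first

encoded = to_base64_digits(4095)
print(encoded, len(encoded), len(str(4095)))
```

The encoded form needs two characters where the decimal form needs four.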
The character set used by the base-64 encoding rule in this embodiment differs from the one used by the existing base-64 encoding rule. It satisfies the following three conditions: (1) every character can be used in a file name under Linux and Windows; (2) no character is an arithmetic or logical operator; (3) every character can be used directly in a shell and in regular expressions without conflicting with existing syntax, and is compatible with code statements in the common languages C, C++, and Java.
Based on these three conditions, the following 64 characters are selected from the visible ASCII characters: 0-9, @, A-Z, _, and a-z. Base64, by contrast, uses the 64 characters 0-9, a-z, A-Z, +, and /. The character set of this embodiment replaces the Base64 characters "+" and "/" with "@" and "_". The characters "@" and "_" can be used in file names under the Linux and Windows operating systems. They are neither arithmetic nor logical operators, so an encoded string written into an arithmetic expression is unambiguous and the expression has a single interpretation. They also do not conflict with existing syntax: they can be used in a shell and in regular expressions without escaping, and they are compatible with code statements in the common programming languages C, C++, and Java.
In the embodiments of the present invention, strings over the base-64 character set can be used as file names and are compatible with the shell, for example: mkdir _aS; ls | grep @_3; find / -name D_@. Such strings can also be written directly into expressions. For example, the decimal expression (127+1473)/40=40 corresponds to the base-64 expression (1z+M1)/c=c when the characters 0-9, @, A-Z, _, a-z represent the numbers 0-63 in that order, which shortens the expression from 16 characters to 11.
In this embodiment, encoding with the base-64 encoding rule proceeds as follows. First, 3 bytes of the data to be encoded are read in order into a 24-bit buffer, with the bytes placed first occupying the high-order bits; if fewer than 3 bytes of data remain, the unused bits of the buffer are filled with 0. Then 6 bits at a time are read from the buffer, from high-order to low-order, and for each 6-bit string the corresponding character is selected from the character set used by the base-64 encoding rule as its encoded character. The remaining bytes are encoded in the same way until all of the data to be encoded has been encoded. When the length of the data to be encoded is not an integer multiple of 3 bytes: if one byte of data is left at the end, one "=" is appended to the encoded data; if two bytes are left, two "=" are appended; if no data is left, nothing is appended. The purpose of this is to ensure that the data is decoded correctly.
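The buffer procedure above can be sketched as follows. This is one literal reading of the text, not a definitive implementation: the digit assignment (ascending ASCII order), the choice to emit all four characters of a zero-filled final buffer, and the names `ALPHABET` and `encode_bytes` are assumptions of this sketch. The number of trailing "=" characters follows the text as written, which differs from the padding convention of standard Base64 (RFC 4648).

```python
import string

# Assumed digit assignment: 0-9, @, A-Z, _, a-z represent 0-63 in that order.
ALPHABET = string.digits + "@" + string.ascii_uppercase + "_" + string.ascii_lowercase

def encode_bytes(data: bytes) -> str:
    """Encode bytes in 3-byte groups through a 24-bit buffer, 6 bits per character."""
    out = []
    for start in range(0, len(data), 3):
        group = data[start:start + 3]
        # Bytes read first occupy the high bits; a short final group is zero-filled.
        buffer = int.from_bytes(group.ljust(3, b"\x00"), "big")
        for shift in (18, 12, 6, 0):            # read 6 bits at a time, high to low
            out.append(ALPHABET[(buffer >> shift) & 0x3F])
    leftover = len(data) % 3
    # Trailing markers as stated in the text: one "=" for one leftover byte, two for two.
    out.append("=" * leftover)
    return "".join(out)

print(encode_bytes(b"Man"))
print(encode_bytes(b"\xff"))
```

A decoder can use the number of trailing "=" characters to tell how many bytes of the last group are real data and how many are zero fill.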
Similar to the octal start marker "0" and the hexadecimal start marker "0x", the base-64 encoded string in this embodiment also has a start character, which identifies the starting position of the base-64 encoded string. For example, if the characters "_@" are used as the start characters of a base-64 encoded string, the decoding apparatus treats the characters from the first character after "_@" up to the first character that is not a base-64 character as one base-64 number. In standard input/output formats, similar to "%o" for octal and "%x" for hexadecimal, "%_" denotes input or output of a base-64 string; a base-64 number can also be output as a string with "%s" through the sprintf function.
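Recognizing a marked base-64 number in surrounding text can be sketched with a regular expression. The start characters "_@" are the example given in the text; the digit assignment (ascending ASCII order) and the names `START_MARK`, `LITERAL`, and `find_literals` are assumptions of this sketch.

```python
import re
import string

# Assumed digit assignment: 0-9, @, A-Z, _, a-z represent 0-63 in that order.
ALPHABET = string.digits + "@" + string.ascii_uppercase + "_" + string.ascii_lowercase
START_MARK = "_@"  # example start characters named in the text

# The start mark, then every following character up to the first non-base-64 character.
LITERAL = re.compile(re.escape(START_MARK) + r"([0-9@A-Z_a-z]+)")

def find_literals(text: str) -> list:
    """Return the integer value of every marked base-64 number found in text."""
    values = []
    for match in LITERAL.finditer(text):
        value = 0
        for ch in match.group(1):
            value = value * 64 + ALPHABET.index(ch)  # accumulate digits, high to low
        values.append(value)
    return values

found = find_literals("size=_@zz; count=_@M1,")
print(found)
```

Because "_" and "@" are themselves base-64 characters, any "_@" that appears inside the digits is read as two more digits rather than as a new start mark.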
In this embodiment, the data to be encoded is encoded in base 64 to shorten it and thereby save storage space. All characters selected for the base-64 encoding of this embodiment can be used in file names under the Linux and Windows operating systems. The characters "@" and "_" are neither arithmetic nor logical operators, so an encoded string written into an expression is unambiguous and the expression has a single interpretation. In addition, "@" and "_" can be used directly in a shell and in regular expressions, do not conflict with existing syntax, and are compatible with code statements in the common languages C, C++, and Java. Compared with the existing base-64 encoding method, the encoding method provided by this embodiment of the present invention has a wider range of application and better compatibility.
The encoding method of this embodiment can be applied in a server or a PC. After the encoding apparatus in the server or PC encodes the data to be encoded, the server or PC may send the encoded data to another device; for example, when the base-64 encoding method is used as the transfer encoding for e-mail, the server or PC sends the encoded data to the terminal of the mail recipient. Alternatively, the server or PC stores the encoded data. What is actually stored are the ASCII codes of the encoded base-64 string; for example, if the encoded base-64 string is ABCD, the ASCII codes of the characters "A", "B", "C", and "D" are stored.
The encoding method of this embodiment is suitable for big data scenarios, where application content such as log text and K-means data may contain large numbers of character-type numbers that need to be encoded in base 64. In big data scenarios, however, sorting performance must be considered as well as storage space. In the character set used by Base64, the order of the numbers that the characters represent is not consistent with the order of their ASCII codes. Big data therefore cannot be sorted by comparing ASCII codes directly: each base-64 encoded string must first be converted to its decimal number and the two decimal numbers are then compared, which degrades sorting performance. Table 1 shows the relationship between the characters of the existing Base64 encoding method, their ASCII codes, and the numbers they represent:
Table 1
Base-64 character | A-Z | a-z | 0-9 | + | / |
ASCII code | 65-90 | 97-122 | 48-57 | 43 | 47 |
Represented number | 0-25 | 26-51 | 52-61 | 62 | 63 |
For example, to compare the character-type numbers A and 0: the character A represents the decimal number 0 and the character 0 represents the decimal number 52, so A is smaller than 0. If the ASCII codes are compared directly, the ASCII code of A is 65 and the ASCII code of 0 is 48, so A compares greater than 0, which is clearly incorrect. In the prior art, comparing two base-64 strings therefore requires first converting each string to its decimal number and then comparing the decimal numbers. Because converting base-64 characters to decimal numbers is expensive, this greatly reduces big data sorting performance.
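The mismatch between value order and ASCII order in the standard Base64 character set can be checked directly. The snippet is illustrative; `STANDARD` and `value_of` are names introduced for this example.

```python
import string

# Standard Base64 digit assignment from Table 1: A-Z, a-z, 0-9, +, / represent 0-63.
STANDARD = string.ascii_uppercase + string.ascii_lowercase + string.digits + "+/"
value_of = {ch: i for i, ch in enumerate(STANDARD)}

numeric_order = value_of["A"] < value_of["0"]  # compares 0 with 52
ascii_order = "A" < "0"                        # compares code 65 with code 48

by_value = list(STANDARD)                      # characters in order of represented number
by_ascii = sorted(STANDARD)                    # characters in order of ASCII code

print(numeric_order, ascii_order, by_value == by_ascii)
```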
To address the sorting performance problem, the encoding method of this embodiment of the present invention makes the order of the numbers represented by the characters in the base-64 character set consistent with the order of their ASCII codes. As shown in Table 2, the characters in the character set are, in ascending order: 0-9, @, A-Z, _, a-z; the characters 0-9 represent the numbers 0-9, the character @ represents 10, the characters A-Z represent 11-36, the character _ represents 37, and the characters a-z represent 38-63. Table 2 shows that as the numbers represented by the base-64 characters increase, their ASCII codes increase as well, so big data can be sorted by comparing the ASCII codes of the base-64 characters directly. For example, the base-64 strings "@8Fs" and "_u0I" have the same number of characters, so it is enough to compare the ASCII codes of their first characters. The ASCII code of "@" is 64 and the ASCII code of "_" is 95; 64 is less than 95, so the string "@8Fs" is smaller than the string "_u0I". With the conventional approach, "@" would first be converted to its number 10 and "_" to its number 37; because 10 is less than 37, "@" is smaller than "_". Comparing ASCII codes directly therefore gives the same result as converting the characters to their numbers and comparing those.
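The order-preserving property can be demonstrated by sorting equal-length strings in two ways, once as plain strings (by ASCII code) and once by decoded value. This is a minimal sketch; `ALPHABET`, `decode_value`, and the sample strings other than "@8Fs" and "_u0I" are chosen for the example.

```python
import string

# Digit assignment from Table 2: 0-9, @, A-Z, _, a-z represent 0-63 in that order.
ALPHABET = string.digits + "@" + string.ascii_uppercase + "_" + string.ascii_lowercase

def decode_value(encoded: str) -> int:
    """Convert a base-64 digit string to the integer it represents."""
    value = 0
    for ch in encoded:
        value = value * 64 + ALPHABET.index(ch)
    return value

words = ["_u0I", "@8Fs", "a000", "0zzz"]       # all the same length
by_ascii = sorted(words)                       # plain string comparison
by_value = sorted(words, key=decode_value)     # comparison after conversion

print(by_ascii == by_value, by_ascii)
```

The comparison applies to strings of equal length, as in the example in the text; strings of different lengths would first need to be aligned, for instance by left-padding with the zero digit.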
Table 2
64-ary encoded character | 0-9 | @ | A-Z | _ | a-z |
---|---|---|---|---|---|
ASCII code | 48-57 | 64 | 65-90 | 95 | 97-122 |
Represented value | 0-9 | 10 | 11-36 | 37 | 38-63 |
In this embodiment, the order of the numbers represented by the characters in the character set used by the 64-ary encoding rule is kept consistent with the order of their ASCII codes. The ASCII values of 64-ary strings can therefore be compared directly, which reduces the overhead of sorting big data.
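The ordering property described above can be checked with a short sketch. The character set and the two example strings come from the text; the names `ALPHABET`, `VALUE_OF`, `alphabet_is_ascii_ordered`, `compare_by_value`, and `compare_by_ascii` are hypothetical and used only for illustration.

```python
import string

# Character set from Table 2, listed in order of the value each character represents.
ALPHABET = string.digits + "@" + string.ascii_uppercase + "_" + string.ascii_lowercase
VALUE_OF = {ch: i for i, ch in enumerate(ALPHABET)}


def alphabet_is_ascii_ordered() -> bool:
    """True if a larger represented value always means a larger ASCII code."""
    codes = [ord(ch) for ch in ALPHABET]
    return all(a < b for a, b in zip(codes, codes[1:]))


def compare_by_value(a: str, b: str) -> int:
    """Reference comparison: convert every character to its value first."""
    va = [VALUE_OF[ch] for ch in a]
    vb = [VALUE_OF[ch] for ch in b]
    return (va > vb) - (va < vb)


def compare_by_ascii(a: str, b: str) -> int:
    """Direct comparison of the raw characters, with no table lookup."""
    return (a > b) - (a < b)


print(alphabet_is_ascii_ordered())         # True
print(compare_by_ascii("@8Fs", "_u0I"))    # -1
print(compare_by_value("@8Fs", "_u0I"))    # -1
```

Both comparisons agree on the example pair, which is the point of the paragraph: the table lookup in `compare_by_value` can be skipped.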
FIG. 2 is a flowchart of a decoding method according to Embodiment 2 of the present invention. The method of this embodiment may be performed by a decoding apparatus, which is integrated in a server or a PC. As shown in FIG. 2, the method provided in this embodiment may include the following steps:
Step 201: Obtain data to be decoded, where the data to be decoded is a 64-ary encoded string.
Step 202: Decode the data to be decoded according to a 64-ary decoding rule to obtain decoded data corresponding to the data to be decoded, where the decoded data is a binary string and the character set used by the 64-ary decoding rule includes the following 64 characters: 0-9, @, A-Z, _, and a-z.
The decoding method of this embodiment is the inverse of the encoding method of Embodiment 1; that is, it converts a 64-ary string into a binary string. Specifically, each character of the 64-ary string is converted into its corresponding binary string, and the binary strings of the characters are then concatenated in order to obtain the binary string corresponding to the 64-ary string. For example, for the 64-ary string zz, the character z converts to the binary string 111111, so the binary string corresponding to zz is 111111111111, which corresponds to the decimal number 4095.
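The zz example can be reproduced with a minimal sketch of the per-character conversion. It assumes each character maps to a 6-bit group, as the value 111111 for z suggests; `decode_to_binary_string` is a hypothetical name, not one taken from the patent.

```python
import string

# Character set of the 64-ary decoding rule, in order of represented value.
ALPHABET = string.digits + "@" + string.ascii_uppercase + "_" + string.ascii_lowercase
VALUE_OF = {ch: i for i, ch in enumerate(ALPHABET)}


def decode_to_binary_string(encoded: str) -> str:
    """Convert each character to its 6-bit group and join the groups in order."""
    return "".join(format(VALUE_OF[ch], "06b") for ch in encoded)


bits = decode_to_binary_string("zz")
print(bits)          # 111111111111
print(int(bits, 2))  # 4095
```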
FIG. 3 is a schematic structural diagram of an encoding apparatus according to Embodiment 3 of the present invention. As shown in FIG. 3, the encoding apparatus of this embodiment includes an obtaining module 11 and an encoding module 12.
The obtaining module 11 is configured to obtain data to be encoded, where the data to be encoded is a binary string.
The encoding module 12 is configured to encode the data to be encoded according to a 64-ary encoding rule to obtain encoded data of the data to be encoded, where the encoded data is a 64-ary encoded string and the character set used by the 64-ary encoding rule includes the following 64 characters: 0-9, @, A-Z, _, and a-z.
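What the encoding module does can be sketched as follows. This is only an illustration under assumptions that this passage does not state: the binary string is padded with zeros on the left to a multiple of 6 bits, and each 6-bit group is mapped to one character. The function name `encode_binary_string` is hypothetical.

```python
import string

# Character set of the 64-ary encoding rule, in order of represented value.
ALPHABET = string.digits + "@" + string.ascii_uppercase + "_" + string.ascii_lowercase


def encode_binary_string(bits: str) -> str:
    """Encode a string of '0'/'1' characters as a 64-ary encoded string."""
    if not bits or set(bits) - {"0", "1"}:
        raise ValueError("expected a non-empty string of 0s and 1s")
    # Assumption: pad on the left so the numeric value of the bits is unchanged.
    width = (len(bits) + 5) // 6 * 6
    padded = bits.zfill(width)
    # Map every 6-bit group to the character that represents its value.
    return "".join(ALPHABET[int(padded[i:i + 6], 2)] for i in range(0, width, 6))


print(encode_binary_string("111111111111"))  # zz
print(encode_binary_string("11111111"))      # 3z  (255 = 3 * 64 + 63)
```

With this padding choice, the one-byte value 255 needs two encoded characters instead of the three decimal digits "255".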
Optionally, in this embodiment, the order of the numbers represented by the characters in the character set is consistent with the order of the American Standard Code for Information Interchange (ASCII) codes of the characters.
Optionally, the characters in the character set are, in ascending order: 0-9, @, A-Z, _, a-z, where the 64-ary encoded characters 0-9 represent the numbers 0-9, the 64-ary encoded character @ represents the number 10, the 64-ary encoded characters A-Z represent the numbers 11-36, the 64-ary encoded character _ represents the number 37, and the 64-ary encoded characters a-z represent the numbers 38-63.
In this embodiment, the 64-ary encoded string has a start character, which identifies the starting position of the 64-ary encoded string.
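One way a start character could be used is sketched below. This passage does not say which character serves as the start character, so `#` is purely an illustrative assumption (chosen because it is outside the 64-character set), and `add_start_character` and `find_encoded_strings` are hypothetical helper names.

```python
import re
from typing import List

START_CHAR = "#"  # hypothetical choice; the passage does not name the start character

# A start character followed by one or more characters from the 64-ary character set.
_ENCODED = re.compile(re.escape(START_CHAR) + r"([0-9@A-Z_a-z]+)")


def add_start_character(encoded: str) -> str:
    """Prefix an encoded string so its starting position can be identified."""
    return START_CHAR + encoded


def find_encoded_strings(text: str) -> List[str]:
    """Return every 64-ary encoded string that follows a start character."""
    return _ENCODED.findall(text)


message = "id=" + add_start_character("@8Fs") + "; next=" + add_start_character("_u0I") + "."
print(find_encoded_strings(message))  # ['@8Fs', '_u0I']
```

The surrounding letters in `id=` and `next=` belong to the character set too, but they are ignored because no start character precedes them.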
The encoding apparatus of this embodiment may be used to perform the encoding method provided in Embodiment 1. Its implementation principle and technical effects are similar and are not described again here.
FIG. 4 is a schematic structural diagram of a decoding apparatus according to Embodiment 4 of the present invention. As shown in FIG. 4, the decoding apparatus of this embodiment includes an obtaining module 21 and a decoding module 22.
The obtaining module 21 is configured to obtain data to be decoded, where the data to be decoded is a 64-ary encoded string.
The decoding module 22 is configured to decode the data to be decoded according to a 64-ary decoding rule to obtain decoded data corresponding to the data to be decoded, where the decoded data is a binary string and the character set used by the 64-ary decoding rule includes the following 64 characters: 0-9, @, A-Z, _, and a-z.
The decoding apparatus of this embodiment may be used to perform the decoding method provided in Embodiment 2. Its implementation principle and technical effects are similar and are not described again here.
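The two-module split of the decoding apparatus might be modelled in software roughly as follows. This is a sketch, not the patent's implementation: the class names `ObtainingModule`, `DecodingModule`, and `DecodingApparatus` are hypothetical, and the whitespace stripping and input validation are added assumptions.

```python
import string

# Character set of the 64-ary decoding rule, in order of represented value.
ALPHABET = string.digits + "@" + string.ascii_uppercase + "_" + string.ascii_lowercase
VALUE_OF = {ch: i for i, ch in enumerate(ALPHABET)}


class ObtainingModule:
    """Obtains the data to be decoded and checks that it is a 64-ary string."""

    def obtain(self, raw: str) -> str:
        data = raw.strip()  # assumption: surrounding whitespace is not part of the data
        if not data or set(data) - set(ALPHABET):
            raise ValueError(f"not a 64-ary encoded string: {raw!r}")
        return data


class DecodingModule:
    """Decodes a 64-ary encoded string into a binary string."""

    def decode(self, encoded: str) -> str:
        return "".join(format(VALUE_OF[ch], "06b") for ch in encoded)


class DecodingApparatus:
    """Wires the obtaining module to the decoding module."""

    def __init__(self) -> None:
        self.obtaining = ObtainingModule()
        self.decoding = DecodingModule()

    def run(self, raw: str) -> str:
        return self.decoding.decode(self.obtaining.obtain(raw))


print(DecodingApparatus().run(" zz\n"))  # 111111111111
```

Rejecting characters outside the set in the obtaining module keeps the decoding module a pure table lookup.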
FIG. 5 is a schematic structural diagram of an encoding apparatus according to Embodiment 5 of the present invention. As shown in FIG. 5, the encoding apparatus 300 of this embodiment includes a processor 31, a memory 32, and a system bus 33. The processor 31 and the memory 32 are connected to, and communicate with, each other through the system bus 33. The memory 32 is configured to store computer-executable instructions 321, and the processor 31 is configured to run the computer-executable instructions 321 to perform the following method:
obtaining data to be encoded, where the data to be encoded is a binary string; and
encoding the data to be encoded according to a 64-ary encoding rule to obtain encoded data corresponding to the data to be encoded, where the encoded data is a 64-ary encoded string and the character set used by the 64-ary encoding rule includes the following 64 characters: 0-9, @, A-Z, _, and a-z.
Optionally, the order of the numbers represented by the characters in the character set is consistent with the order of the American Standard Code for Information Interchange (ASCII) codes of the characters.
Optionally, the characters in the character set are, in ascending order: 0-9, @, A-Z, _, a-z, where the 64-ary encoded characters 0-9 represent the numbers 0-9, the 64-ary encoded character @ represents the number 10, the 64-ary encoded characters A-Z represent the numbers 11-36, the 64-ary encoded character _ represents the number 37, and the 64-ary encoded characters a-z represent the numbers 38-63.
In this embodiment, the 64-ary encoded string has a start character, which identifies the starting position of the 64-ary encoded string.
The encoding apparatus of this embodiment may be used to perform the encoding method provided in Embodiment 1. Its implementation principle and technical effects are similar and are not described again here.
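The benefit of the ASCII-consistent character order for sorting can be illustrated with a small experiment. It is a sketch under one added assumption: all values are encoded to the same fixed width, so that the strings being compared have the same number of characters. `encode_fixed_width` is a hypothetical helper.

```python
import random
import string

# Character set of the 64-ary encoding rule, in order of represented value.
ALPHABET = string.digits + "@" + string.ascii_uppercase + "_" + string.ascii_lowercase


def encode_fixed_width(value: int, width: int) -> str:
    """Encode a non-negative integer as exactly `width` 64-ary characters."""
    if value < 0 or value >= 64 ** width:
        raise ValueError("value does not fit in the requested width")
    chars = []
    for _ in range(width):
        value, digit = divmod(value, 64)
        chars.append(ALPHABET[digit])
    return "".join(reversed(chars))  # most significant character first


rng = random.Random(0)
values = rng.sample(range(64 ** 4), 1000)
encoded = [encode_fixed_width(v, 4) for v in values]

# Plain string sort on the encoded data, with no decoding step.
sorted_by_string = sorted(encoded)
# Reference result: sort the numbers, then encode.
sorted_by_value = [encode_fixed_width(v, 4) for v in sorted(values)]
print(sorted_by_string == sorted_by_value)  # True
```

The equal-length condition matters: the one-character string "z" (63) sorts after the two-character string "10" (64) in a plain string comparison, so strings of different lengths would need padding or a length check first.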
FIG. 6 is a schematic structural diagram of a decoding apparatus according to Embodiment 6 of the present invention. As shown in FIG. 6, the decoding apparatus 400 of this embodiment includes a processor 41, a memory 42, and a system bus 43. The processor 41 and the memory 42 are connected to, and communicate with, each other through the system bus 43. The memory 42 is configured to store computer-executable instructions 421, and the processor 41 is configured to run the computer-executable instructions 421 to perform the following method:
obtaining data to be decoded, where the data to be decoded is a 64-ary encoded string; and
decoding the data to be decoded according to a 64-ary decoding rule to obtain decoded data of the data to be decoded, where the decoded data is a binary string and the character set used by the 64-ary decoding rule includes the following 64 characters: 0-9, @, A-Z, _, and a-z.
The decoding apparatus of this embodiment may be used to perform the decoding method provided in Embodiment 2. Its implementation principle and technical effects are similar and are not described again here.
A person of ordinary skill in the art will understand that all or some of the steps of the foregoing method embodiments may be carried out by hardware under the control of program instructions. The program may be stored in a computer-readable storage medium and, when executed, performs the steps of the foregoing method embodiments. The storage medium includes any medium that can store program code, such as a ROM, a RAM, a magnetic disk, or an optical disc.
It should be noted that the embodiments provided in this application are merely illustrative. For example, the division of components in the foregoing embodiments is only one possibility, and other divisions are possible in an actual implementation: multiple modules or components may be combined or integrated into another device, or some features may be omitted or not performed. In addition, the couplings, direct couplings, or communication connections shown or discussed between components may be indirect couplings or communication connections through communication interfaces or modules, and may be electrical, mechanical, or in other forms.
A person skilled in the art will clearly understand that, for convenience and brevity of description, each of the foregoing embodiments has its own emphasis; for parts not described in detail in one embodiment, refer to the related descriptions of the other embodiments. The features disclosed in the embodiments, claims, and drawings of the present invention may exist independently or in combination. Features described in hardware form in the embodiments of the present invention may be implemented in software, and vice versa. No limitation is imposed here.
Claims (12)
- An encoding method, comprising: obtaining data to be encoded, wherein the data to be encoded is a binary string; and encoding the data to be encoded according to a 64-ary encoding rule to obtain encoded data corresponding to the data to be encoded, wherein the encoded data is a 64-ary encoded string, and the character set used by the 64-ary encoding rule includes the following 64 characters: 0-9, @, A-Z, _, and a-z.
- The method according to claim 1, wherein the order of the numbers represented by the characters in the character set is consistent with the order of the American Standard Code for Information Interchange (ASCII) codes of the characters.
- The method according to claim 2, wherein the characters in the character set are, in ascending order: 0-9, @, A-Z, _, a-z, wherein the 64-ary encoded characters 0-9 represent the numbers 0-9, the 64-ary encoded character @ represents the number 10, the 64-ary encoded characters A-Z represent the numbers 11-36, the 64-ary encoded character _ represents the number 37, and the 64-ary encoded characters a-z represent the numbers 38-63.
- The method according to any one of claims 1 to 3, wherein the 64-ary encoded string has a start character, and the start character is used to identify the starting position of the 64-ary encoded string.
- A decoding method, comprising: obtaining data to be decoded, wherein the data to be decoded is a 64-ary encoded string; and decoding the data to be decoded according to a 64-ary decoding rule to obtain decoded data of the data to be decoded, wherein the decoded data is a binary string, and the character set used by the 64-ary decoding rule includes the following 64 characters: 0-9, @, A-Z, _, and a-z.
- An encoding apparatus, comprising: an obtaining module, configured to obtain data to be encoded, wherein the data to be encoded is a binary string; and an encoding module, configured to encode the data to be encoded according to a 64-ary encoding rule to obtain encoded data of the data to be encoded, wherein the encoded data is a 64-ary encoded string, and the character set used by the 64-ary encoding rule includes the following 64 characters: 0-9, @, A-Z, _, and a-z.
- The encoding apparatus according to claim 6, wherein the order of the numbers represented by the characters in the character set is consistent with the order of the American Standard Code for Information Interchange (ASCII) codes of the characters.
- The encoding apparatus according to claim 7, wherein the characters in the character set are, in ascending order: 0-9, @, A-Z, _, a-z, wherein the 64-ary encoded characters 0-9 represent the numbers 0-9, the 64-ary encoded character @ represents the number 10, the 64-ary encoded characters A-Z represent the numbers 11-36, the 64-ary encoded character _ represents the number 37, and the 64-ary encoded characters a-z represent the numbers 38-63.
- The encoding apparatus according to any one of claims 6 to 8, wherein the 64-ary encoded string has a start character, and the start character is used to identify the starting position of the 64-ary encoded string.
- A decoding apparatus, comprising: an obtaining module, configured to obtain data to be decoded, wherein the data to be decoded is a 64-ary encoded string; and a decoding module, configured to decode the data to be decoded according to a 64-ary decoding rule to obtain decoded data corresponding to the data to be decoded, wherein the decoded data is a binary string, and the character set used by the 64-ary decoding rule includes the following 64 characters: 0-9, @, A-Z, _, and a-z.
- An encoding apparatus, comprising: a processor, a memory, and a system bus, wherein the processor and the memory are connected to, and communicate with, each other through the system bus; the memory is configured to store computer-executable instructions; and the processor is configured to run the computer-executable instructions so that the encoding apparatus performs the method according to any one of claims 1 to 4.
- A decoding apparatus, comprising: a processor, a memory, and a system bus, wherein the processor and the memory are connected to, and communicate with, each other through the system bus; the memory is configured to store computer-executable instructions; and the processor is configured to run the computer-executable instructions so that the decoding apparatus performs the method according to claim 5.
Applications Claiming Priority (2)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201410432775.6A CN105450232A (en) | 2014-08-28 | 2014-08-28 | Encoding method, decoding method, encoding device and decoding device |
CN201410432775.6 | 2014-08-28 |
Publications (1)
Publication Number | Publication Date |
---|---|
WO2016029801A1 true WO2016029801A1 (en) | 2016-03-03 |
Family
ID=55398744
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
PCT/CN2015/087239 WO2016029801A1 (en) | 2014-08-28 | 2015-08-17 | Encoding and decoding method, encoding device and decoding device |
Country Status (2)
Country | Link |
---|---|
CN (1) | CN105450232A (en) |
WO (1) | WO2016029801A1 (en) |
Cited By (12)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN106789898A (en) * | 2016-11-18 | 2017-05-31 | 杭州塔网科技有限公司 | Remote data transmission terminal, system and its coding, coding/decoding method |
CN109213973A (en) * | 2018-07-04 | 2019-01-15 | 珠海市特车网络科技有限公司 | VIN code transcoding storage method and device and corresponding read method and device |
CN109871520A (en) * | 2019-02-28 | 2019-06-11 | 魏勇 | A kind of binary data decoding method embedded suitable for HTTP content |
CN110070162A (en) * | 2019-03-11 | 2019-07-30 | 上海因致信息科技有限公司 | The coding method and system of data to be filled in bar code |
CN110503174A (en) * | 2019-08-28 | 2019-11-26 | 苏州儒拉玛特智芸科技有限公司 | A kind of label and recognition methods, device, storage medium of product information |
CN111131403A (en) * | 2019-12-06 | 2020-05-08 | 深圳猛犸电动科技有限公司 | Message coding and decoding method and device for Internet of things equipment |
CN111211887A (en) * | 2019-12-23 | 2020-05-29 | 中移(杭州)信息技术有限公司 | Resource encryption method, system, device and computer readable storage medium |
CN111222306A (en) * | 2020-01-06 | 2020-06-02 | 广州虎牙科技有限公司 | Data processing method and device, electronic equipment and storage medium |
CN112395830A (en) * | 2019-07-31 | 2021-02-23 | 腾讯科技(深圳)有限公司 | Form processing method based on Wan Guo code and related device |
CN114070470A (en) * | 2021-11-17 | 2022-02-18 | 中国银行股份有限公司 | Encoding and decoding method and device |
CN114758728A (en) * | 2022-06-15 | 2022-07-15 | 成都边界元科技有限公司 | Genotype identification and visualization method for generating minimum hamming distance under mixed system |
CN115883688A (en) * | 2022-12-16 | 2023-03-31 | 雷沃工程机械集团有限公司 | Excavator Tbox equipment data frame reporting and analyzing algorithm |
Families Citing this family (17)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN105721882B (en) * | 2016-04-18 | 2021-01-05 | 上海泥娃通信科技有限公司 | Method for separating coding and decoding |
CN105975607A (en) * | 2016-05-16 | 2016-09-28 | 乐视控股(北京)有限公司 | Picture storing and reading methods and picture storing system |
CN106445890B (en) * | 2016-07-07 | 2019-06-25 | 湖南千年华光软件开发有限公司 | Data processing method |
CN108123721B (en) * | 2016-11-29 | 2022-01-11 | 展讯通信(上海)有限公司 | Encoding method and device |
CN109446488A (en) * | 2018-08-21 | 2019-03-08 | 深圳市华力特电气有限公司 | A kind of data processing method and device |
CN109617680B (en) * | 2018-12-06 | 2021-12-03 | 中国移动通信集团福建有限公司 | Encryption method, device, equipment and medium |
CN110569487B (en) * | 2019-08-19 | 2023-07-18 | 积成电子股份有限公司 | Base64 expansion coding method and system based on high-frequency character substitution algorithm |
CN110958024B (en) * | 2019-12-13 | 2024-01-19 | 深圳市道通智能航空技术股份有限公司 | Method and device for encoding serial data, embedded equipment and communication system |
CN111428442B (en) * | 2020-03-25 | 2023-04-21 | 北京思特奇信息技术股份有限公司 | Data conversion method, system and storage medium without dictionary table |
CN111832067B (en) * | 2020-05-26 | 2021-12-17 | 华控清交信息科技(北京)有限公司 | Data processing method and device and data processing device |
CN111666737B (en) * | 2020-06-04 | 2023-04-25 | 广州博高信息科技有限公司 | Multi-coding rule compatible processing method, device, equipment and medium for regional library |
CN111988297B (en) * | 2020-08-13 | 2022-09-13 | 北京诚志重科海图科技有限公司 | Text communication secret transmission plain secret conversion system |
CN112016270B (en) * | 2020-09-08 | 2024-04-02 | 中国物品编码中心 | Logistics information coding method, device and equipment of Chinese-character codes |
CN112818639A (en) * | 2020-12-30 | 2021-05-18 | 平安普惠企业管理有限公司 | Data encoding method, data encoding device, computer equipment and storage medium |
CN113535838B (en) * | 2021-07-20 | 2024-08-20 | 大文传媒集团(山东)有限公司 | Binary coding-based data interaction method and system |
CN114615074B (en) * | 2022-03-25 | 2024-08-13 | 山石网科通信技术股份有限公司 | Network message decoding method, network attack detection method, device and storage medium |
CN115333735B (en) * | 2022-10-11 | 2023-03-14 | 浙江御安信息技术有限公司 | Safe data transmission method |
Citations (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
EP1496668A1 (en) * | 2003-07-11 | 2005-01-12 | Samsung Electronics Co., Ltd. | Apparatus and method for enabling communications between terminals having different protocols |
CN101388767A (en) * | 2008-10-14 | 2009-03-18 | 苏盛辉 | Certificate false proof method based on light weight digital signature scheme |
CN101414908A (en) * | 2008-12-04 | 2009-04-22 | 苏盛辉 | Symbolism stamping method based on public key system |
CN103051480A (en) * | 2012-12-25 | 2013-04-17 | 华为技术有限公司 | DN (Domain Name) storage method and DN storage device |
Family Cites Families (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN1937582B (en) * | 2006-08-11 | 2012-08-15 | 白杰 | Method for preprocessing data to be compressed and compressed data transmission method |
US8368567B2 (en) * | 2010-07-29 | 2013-02-05 | Sap Ag | Codepage-independent binary encoding method |
CN101976241B (en) * | 2010-09-26 | 2013-01-09 | 用友软件股份有限公司 | Method and system for generating identification code |
2014
- 2014-08-28 CN CN201410432775.6A patent/CN105450232A/en active Pending
2015
- 2015-08-17 WO PCT/CN2015/087239 patent/WO2016029801A1/en active Application Filing
Also Published As
Publication number | Publication date |
---|---|
CN105450232A (en) | 2016-03-30 |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
121 | Ep: the epo has been informed by wipo that ep was designated in this application |
Ref document number: 15835844 Country of ref document: EP Kind code of ref document: A1 |
|
NENP | Non-entry into the national phase |
Ref country code: DE |
|
122 | Ep: pct application non-entry in european phase |
Ref document number: 15835844 Country of ref document: EP Kind code of ref document: A1 |