CN116827348A - Data encoding and decoding method and device - Google Patents
- Publication number
- CN116827348A CN116827348A CN202210285796.4A CN202210285796A CN116827348A CN 116827348 A CN116827348 A CN 116827348A CN 202210285796 A CN202210285796 A CN 202210285796A CN 116827348 A CN116827348 A CN 116827348A
- Authority
- CN
- China
- Prior art keywords
- coding
- character
- sequence
- queue
- encoding
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
- H—ELECTRICITY
- H03—ELECTRONIC CIRCUITRY
- H03M—CODING; DECODING; CODE CONVERSION IN GENERAL
- H03M7/00—Conversion of a code where information is represented by a given sequence or number of digits to a code where the same, similar or subset of information is represented by a different sequence or number of digits
- H03M7/30—Compression; Expansion; Suppression of unnecessary data, e.g. redundancy reduction
Abstract
An embodiment of the application provides a data encoding and decoding method and device. In the method, a computing device encodes each character in target data into a coding sequence and stores the coding sequence in a coding queue, where the value of the coding sequence corresponding to any character lies in the sub-probability interval corresponding to that character. Because the value of each character's coding sequence lies in that character's own sub-probability interval, the coding sequences of different characters have no correlation, so when the coding sequences in the coding queue are later decoded, multiple coding sequences can be decoded simultaneously in parallel. In summary, the data encoding method provided by the embodiment of the application is an entropy coding method that supports parallel decoding without affecting the compression rate, thereby improving decoding efficiency.
Description
Technical Field
The present application relates to the field of coding and decoding technologies, and in particular, to a data coding and decoding method and device.
Background
In the big-data age, massive data is exploited by many fields and emerging technologies, and as data volume grows year by year, the ever-larger data poses great challenges to data storage. For example, artificial intelligence (AI) technology, fifth-generation (5G) mobile communication systems, and the internet of things (IoT) all make use of massive data.
Efficient, fast compression algorithms are a key technology for reducing storage costs, and entropy coding is a common coding technique in high-compression-rate algorithms. Currently, the entropy coding methods with low coding redundancy are mainly arithmetic coding and asymmetric numeral systems (ANS).
However, arithmetic coding requires the computing device to maintain an encoding interval, and similarly ANS requires the computing device to maintain an encoding state parameter (state). Both the encoding interval and the encoding state parameter depend strongly on the coding order, so if the characters of the target data are encoded serially in arithmetic coding or ANS coding, the decoding process must also be serial, which limits the decoding efficiency of the computing device.
Currently, parallel decoding is usually achieved only by splitting the target data into blocks and then encoding and decoding each block in parallel. However, this approach results in a loss of compression rate.
Disclosure of Invention
The application provides a data encoding and decoding method and device, which are used for realizing parallelization decoding under the condition of not influencing the compression rate.
In a first aspect, an embodiment of the present application provides a data encoding method, which is described below by taking a computing device as an execution body as an example. The method comprises the following steps:
The computing device obtains target data to be encoded and, within a set probability interval, the sub-probability interval corresponding to each character in the target data. The computing device then performs the following encoding processing on each character to be encoded in the target data: determining the interval start value and the interval length of the sub-probability interval corresponding to the character to be encoded; generating a coding sequence according to the interval start value, the interval length, and the coding state parameter, where the value of the coding sequence lies in the sub-probability interval corresponding to the character to be encoded and the coding state parameter is predetermined; storing the coding sequence into a coding queue; and updating the coding state parameter according to the interval length.
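The first-aspect steps can be sketched as follows. This is a minimal illustrative sketch, not the patent's implementation: the names (`encode`, `intervals`, `x0`) and the back-to-front coding order are assumptions, and `intervals` maps each character to the start value and length of its sub-probability interval inside (0, 2^n].

```python
def encode(data, intervals, x0):
    """Sketch of the first-aspect encoding loop.

    intervals: char -> (c_S, b_S), the interval start value and
    interval length of that character's sub-probability interval.
    x0: the predetermined coding state parameter.
    """
    queue = []                      # the coding queue
    x = x0                          # coding state parameter
    for ch in reversed(data):       # one possible coding order: back to front
        c_S, b_S = intervals[ch]
        d = c_S + x % b_S           # coding sequence; value lies in [c_S, c_S + b_S)
        queue.append(d)             # save to the end of the coding queue
        x //= b_S                   # update the coding state parameter
    return queue, x
```

Decoding the queue from its tail with the final state then recovers the characters in their original order, since the character encoded last is decoded first.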
In the method, the computing device encodes each character in the target data into a coding sequence and stores it in a coding queue, where the value of the coding sequence corresponding to any character lies in the sub-probability interval corresponding to that character. Because each character's coding sequence lies in that character's own sub-probability interval, the coding sequences of different characters have no correlation, so when the coding sequences in the coding queue are later decoded, multiple coding sequences can be decoded simultaneously in parallel. In addition, the computing device can decode the coding sequence at any position in the coding queue, realizing random access. In summary, the data encoding method provided by the embodiment of the application is an entropy coding method that supports parallel decoding without affecting the compression rate, thereby improving decoding efficiency.
In one possible design, the probability interval is (0, 2^n], where n is an integer greater than 1, and the coding sequence is an n-bit sequence.
In one possible design, so that the coding queue can reflect the coding order of the characters (the order in which each coding sequence was obtained), and so that the computing device can determine the position in the target data of the character corresponding to each coding sequence and thus restore the original target data, the computing device may save the coding sequences into the coding queue in a set order. Optionally, the computing device may append each coding sequence to the end of the coding queue; thus the coding sequence at the front of the queue was encoded first and the coding sequence at the end was encoded last. Based on this, during subsequent decoding, the computing device may decode the coding sequences in order from the back of the queue to the front.
In one possible design, the computing device may generate the encoded sequence by:
when the coding state parameter is greater than or equal to a first threshold value, generating a coding sequence according to the interval starting value, the interval length and the coding state parameter; or alternatively
Reading v first sequences from a first queue when the encoding state parameter is less than the first threshold; updating the coding state parameters according to the v first sequences; wherein the updated coding state parameter is greater than or equal to the first threshold value, and v is a positive integer; generating a coding sequence according to the interval starting value, the interval length and the updated coding state parameter;
In this scheme, as the characters of the target data are encoded one after another, the coding state parameter is continually updated and its value becomes smaller and smaller. To prevent an underflowing (too small) coding state parameter from degrading calculation precision and hence coding and decoding performance, in the embodiment of the application the computing device can enlarge the coding state parameter whenever it underflows, keeping it within the precision requirement.
In one possible design, each first sequence is an n-bit sequence; the computing device updates the encoding state parameters according to the v first sequences by:
forming the v first sequences into a first adjustment sequence, the first adjustment sequence being a v*n-bit sequence; and left-shifting the binary bits of the coding state parameter by v*n bits and adding the first adjustment sequence to obtain the updated coding state parameter.
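The enlargement step can be sketched as follows (the name `enlarge_state` and the front-of-queue read position are illustrative assumptions; reading from the front of a first queue is only one of the designs described here):

```python
def enlarge_state(x, first_queue, v, n):
    """Pack v n-bit first sequences into a v*n-bit first adjustment
    sequence, left-shift x by v*n bits, and add the adjustment."""
    adj = 0
    for _ in range(v):
        adj = (adj << n) | first_queue.pop(0)  # read v first sequences from the front
    return (x << (v * n)) + adj                # left-shift, then add the adjustment
```

Since the adjustment value is strictly less than 2^(v*n), adding it after the left shift simply fills the freed low-order bits.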
Through this design, when the coding state parameter underflows, the computing device can enlarge it by left-shifting it and superimposing the adjustment sequence, keeping it within the precision requirement, thereby ensuring the calculation precision of the encoding process and hence the coding and decoding performance of the computing device.
In one possible design, the first threshold is equal to b_S[i] * 2^(T − v*n), where b_S[i] is the interval length and T is the upper limit on the number of binary bits of the coding state parameter.
In one possible design, the first queue is the coding queue, and the v first sequences are v coding sequences in the coding queue; or the first queue is a random queue, and the v first sequences are v random sequences in the random queue.
When the coding state parameter underflows, the computing device can update it by taking coding sequences out of the coding queue to generate the adjustment sequence; this both compresses the coding sequences in the queue and keeps the coding state parameter within the precision requirement. With this scheme, during data decoding, whenever the coding state parameter does not overflow, the computing device can decode multiple coding sequences in the coding queue simultaneously, realizing parallel decoding.
If instead, each time the coding state parameter underflows, the computing device reads random sequences from a random queue to generate the adjustment sequence, the scheme can achieve a compression rate close to the information entropy of the data. Furthermore, if the coding state parameter is always replenished from the random queue during encoding, the coding queue is never compressed, so the final coding queue contains a coding sequence for every character of the target data, and each coding sequence can be decoded into its character independently. The computing device is then not constrained by the coding order (i.e., the position in the target data of the character corresponding to each coding sequence) when decoding the coding queue, but may decode any coding sequence, thereby realizing random access.
In one possible design, the computing device should read the v coding sequences from known positions in the coding queue, so that later, during decoding, it can restore the v decoded coding sequences to their original positions according to the coding state parameter. For ease of operation, the computing device may optionally read the v coding sequences from the front of the coding queue, i.e., the v first sequences are the v coding sequences at the front of the queue. In this way, during decoding the computing device can restore the v decoded coding sequences to the front of the coding queue.
In one possible design, the coding sequence conforms to the formula: d[i] = c_S[i] + x % b_S[i];
wherein d[i] is the coding sequence, c_S[i] is the interval start value, b_S[i] is the interval length, and x is the coding state parameter.
According to this formula, the value of the coding sequence obtained by encoding character S[i] lies in the sub-probability interval corresponding to S[i]. Thus, when d[i] is later decoded, the computing device can determine directly that the character corresponding to d[i] is S[i] by checking that the value of d[i] lies in the sub-probability interval of S[i]. In summary, for each character the computing device can make the value of the corresponding coding sequence lie in that character's sub-probability interval. In this way there is no correlation between the coding sequences of different characters, which improves decoding efficiency and makes parallel decoding of the coding sequences possible.
In one possible design, when the coding state parameter is updated according to the interval length, it conforms to the formula: x = ⌊x / b_S[i]⌋; wherein ⌊·⌋ is the round-down (floor) operator, x is the coding state parameter, and b_S[i] is the interval length.
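A numeric check of the two formulas, d[i] = c_S[i] + x % b_S[i] and x = ⌊x / b_S[i]⌋, with purely illustrative values:

```python
# Illustrative values (not from the patent): sub-probability interval [4, 7).
c_S, b_S = 4, 3
x = 11                        # coding state parameter before encoding
d = c_S + x % b_S             # coding sequence: 4 + 11 % 3 = 6, inside [4, 7)
x_new = x // b_S              # updated state: floor(11 / 3) = 3
# Decoding inverts both steps: x = x_new * b_S + (d - c_S).
assert c_S <= d < c_S + b_S
assert x_new * b_S + (d - c_S) == x
```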
In one possible design, the computing device may encode each character in the target data in turn using the set encoding order. For example, the computing device may sequentially select one character from the target data for encoding processing in a back-to-front or front-to-back encoding order. In other words, the character to be encoded is selected from the target data by the computing device in a back-to-front or front-to-back order. After the computing device encodes the current character to be encoded, the computing device selects the next character in the target data as the character to be encoded according to the encoding sequence, and continues encoding until all characters in the target data are encoded, and then the encoding process is finished.
In a second aspect, an embodiment of the present application provides a data decoding method, which is described below by taking a computing device as an execution body as an example. The method comprises the following steps:
The computing device obtains, within the set probability interval, the sub-probability interval corresponding to each character in the target data; the computing device then decodes the sequence to be decoded in the coding queue according to the sub-probability interval of each character to obtain the target character. The sequence to be decoded is at least one coding sequence at the tail of the coding queue; the target character comprises at least one character, and the characters contained in the target character correspond one-to-one to the at least one coding sequence.
According to the method, the computing equipment can decode a plurality of coding sequences in parallel at the same time to realize parallelization decoding; in addition, the computing device can decode the coding sequence at any position in the coding queue to realize random access.
In one possible design, the probability interval is (0, 2^n], where n is an integer greater than 1; each coding sequence in the sequence to be decoded is an n-bit sequence.
In one possible design, the computing device may decode the coding sequences in the coding queue sequentially in the reverse of the order in which they were encoded. Assuming that during encoding each newly obtained coding sequence was appended to the end of the coding queue, the computing device can decode the queue from back to front, i.e., each time select the sequence to be decoded from the end of the queue, until all coding sequences in the queue have been decoded. In summary, the sequence to be decoded is at least one coding sequence at the tail of the coding queue.
In one possible design, the computing device may perform decoding the sequence to be decoded according to the sub-probability interval corresponding to each character to obtain the target character by:
Selecting a sub-probability interval in which the value of the sequence to be decoded is located from the sub-probability intervals corresponding to each character; and determining the target character as the character corresponding to the selected sub-probability interval.
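The two decoding steps above can be sketched as follows; the name `decode_symbol`, the linear scan over intervals, and the returned base parameters are illustrative assumptions (a real implementation might use a lookup table instead of a scan):

```python
def decode_symbol(d, intervals):
    """Locate the sub-probability interval containing the value d of the
    sequence to be decoded. intervals: char -> (c_S, b_S). Returns the
    target character plus its base parameters: interval length b_s and
    offset r_s = d - c_S."""
    for ch, (c_S, b_S) in intervals.items():
        if c_S <= d < c_S + b_S:
            return ch, b_S, d - c_S
    raise ValueError("value lies outside every sub-probability interval")
```

Because each coding sequence carries its own value inside a sub-probability interval, this lookup needs no state from neighboring sequences, which is what permits decoding several sequences in parallel.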
In one possible design, after obtaining the target character, the computing device may also determine base parameters for the target character; the base parameter comprises the interval length of the sub-probability interval corresponding to the target character and an offset value between the value of the sequence to be decoded and the interval starting value of the sub-probability interval corresponding to the target character; and then, updating the coding state parameters according to the base parameters of the target characters.
From the data encoding method provided in the first aspect, it is known that when the coding state parameter underflows, it is adjusted by compressing the coding queue or consuming the random queue. To decode the target data accurately and completely, in the embodiment of the application, after decoding the sequence to be decoded to obtain the target character, the computing device needs to update the coding state parameter, so as to restore the sequences of the coding queue or random queue that were compressed when the coding state parameter underflowed.
In one possible design, the computing device may update the encoding state parameters sequentially according to the base parameters of each of the target characters in a back-to-front order of the corresponding encoding sequence in the encoding queue.
In one possible design, the coding state parameter conforms to the formula: x = x * b_s + r_s;
wherein x is the coding state parameter, b_s and r_s are base parameters of a first character in the target characters, b_s is the interval length of a sub-probability interval corresponding to the first character, and r_s is an offset value between the value of the coding sequence corresponding to the first character and the interval starting value of the sub-probability interval corresponding to the first character.
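A round-trip check of the decoder update x = x * b_s + r_s against the encoder's update x → ⌊x / b_s⌋ (values below are illustrative):

```python
# Illustrative values: one character with interval length b_s = 3.
b_s = 3
x_before = 11                  # encoder state before encoding the character
r_s = x_before % b_s           # offset carried by the coding sequence
x_after = x_before // b_s      # encoder state after encoding
x_restored = x_after * b_s + r_s
assert x_restored == x_before  # the decoder update exactly inverts the encoder
```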
In one possible design, after updating the coding state parameter according to the base parameters of the target character, if the coding state parameter is greater than or equal to a second threshold, the computing device right-shifts the coding state parameter to update it.
As the coding sequences in the coding queue are decoded one after another, the coding state parameter is continually updated and its value grows larger and larger. To prevent an overflowing (too large) coding state parameter from degrading calculation precision and hence the decoding performance of the computing device, in the embodiment of the application the computing device can also reduce the coding state parameter when it overflows, keeping it within the precision requirement.
In one possible design, the second threshold is equal to 2^T, where T is the upper limit on the number of binary bits of the coding state parameter.
In one possible design, the computing device right shifting the encoding-state parameters to update the encoding-state parameters includes:
right-shifting the binary bits of the coding state parameter by v*n bits to obtain the updated coding state parameter, where v is a positive integer.
In one possible design, before the computing device performs a right shift process on the encoding-state parameters to update the encoding-state parameters, the following steps may be further performed:
performing a bitwise AND operation of the coding state parameter with 2^(v*n) − 1 to obtain a second adjustment sequence of v*n bits; splitting the second adjustment sequence into v second sequences, each second sequence being an n-bit sequence; and storing the v second sequences into the coding queue.
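The steps above can be sketched as follows (the name `shrink_state` and the front-of-queue restore position are illustrative assumptions; restoring to the front matches the design in which the encoder read the v first sequences from the front):

```python
def shrink_state(x, queue, v, n):
    """Extract the low v*n bits of x as the second adjustment sequence,
    split it into v n-bit second sequences, restore them to the front of
    the coding queue, then right-shift x by v*n bits."""
    adj = x & ((1 << (v * n)) - 1)              # bitwise AND with 2^(v*n) - 1
    seqs = [(adj >> (n * i)) & ((1 << n) - 1)   # split into v n-bit sequences
            for i in reversed(range(v))]
    queue[:0] = seqs                            # restore to the front of the queue
    return x >> (v * n)                         # updated coding state parameter
```

This is the exact inverse of the encoder-side enlargement, so applying it when the state overflows reproduces the queue contents that were compressed during encoding.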
During data encoding, the computing device may compress sequences from the coding queue, or consume sequences from the random queue, to adjust the coding state parameter when it underflows. Based on that encoding scheme, in the data decoding scheme of the embodiment of the application, the computing device can recover the sequences compressed during encoding through the present design.
It should also be noted that if the computing device determines, during decoding, that a random queue was used to adjust the coding state parameter during encoding of the target data, the computing device may refrain from saving the v second sequences (i.e., the v random sequences) to the coding queue; or it may mark them after saving them to the coding queue so that the marked sequences are not decoded later (i.e., the computing device skips marked sequences when selecting sequences to be decoded); or, if a marked sequence is decoded anyway, it does not update the coding state parameter for that decoding.
In one possible design, the computing device saves the v second sequences to the positions of the encoding queues corresponding to the positions in the first queues of the v first sequences read by the computing device of the first aspect. For example, when a computing device selects v first sequences from the front of the encoding queue while encoding, the computing device saves the resulting v second sequences to the front of the encoding queue during decoding.
In one possible design, the computing device may also recover target data from the target character; the initial value of the coding state parameter is equal to the value of the coding state parameter after coding all characters in the target data. It should be noted that, since the decoding process is performed in the order from the back to the front of the encoding queue, the characters encoded later in the target data are decoded first. Therefore, when the original target data is restored, the computing device restores the original target data according to the coding sequence and the decoding sequence.
In a third aspect, embodiments of the present application provide a computing device comprising means for performing the steps of any of the above aspects.
In a fourth aspect, embodiments of the present application provide a computing device comprising at least one processing element and at least one storage element, wherein the at least one storage element is for storing programs and data, and the at least one processing element is for performing the method provided in any of the aspects of the present application.
In a fifth aspect, embodiments of the present application also provide a computer program which, when run on a computer, causes the computer to perform the method provided in any of the above aspects.
In a sixth aspect, embodiments of the present application also provide a computer-readable storage medium having a computer program stored therein, which when executed by a computer, causes the computer to perform the method provided in any of the above aspects.
In a seventh aspect, an embodiment of the present application further provides a chip, where the chip is configured to read a computer program stored in a memory, and perform the method provided in any one of the above aspects.
In an eighth aspect, an embodiment of the present application further provides a chip system, where the chip system includes a processor, and the processor is configured to support a computer device to implement the method provided in any one of the above aspects. In one possible design, the chip system further includes a memory for storing programs and data necessary for the computer device. The chip system may be formed of a chip or may include a chip and other discrete devices.
Drawings
Fig. 1 is a diagram of an example Huffman tree for Huffman coding;
FIG. 2 is a diagram showing an example of coding section variation of arithmetic coding;
FIG. 3 is a block diagram of a computing device provided by an embodiment of the present application;
FIG. 4 is a flowchart of a data encoding method according to an embodiment of the present application;
FIG. 5 is a flowchart of a data decoding method according to an embodiment of the present application;
fig. 6A is a schematic diagram of a data decoding flow according to an embodiment of the present application;
FIG. 6B is a schematic diagram of another data decoding process according to an embodiment of the present application;
fig. 7 is a schematic flow chart of a data encoding method according to an embodiment of the present application;
fig. 8 is a flow chart of a data decoding method according to an embodiment of the present application;
fig. 9 is a schematic flow chart of a data decoding method according to an embodiment of the present application;
fig. 10 is a schematic diagram of an example of data encoding and decoding according to an embodiment of the present application;
fig. 11A is a schematic diagram of a scenario where a data encoding and decoding scheme provided in an embodiment of the present application is applicable;
FIG. 11B is a schematic diagram of a scenario in which another data encoding/decoding scheme according to an embodiment of the present application is applicable;
FIG. 12 is a block diagram of a computing device according to an embodiment of the present application;
FIG. 13 is a block diagram of another computing device provided in accordance with an embodiment of the present application;
FIG. 14 is a block diagram of a computing device according to an embodiment of the present application.
Detailed Description
The application provides a data encoding and decoding method and device, which are used for realizing parallelization decoding under the condition of not influencing the compression rate. The method and the device provided by the embodiment of the application are based on the same technical conception, and because the principles of solving the problems are similar, the embodiments can be mutually referred to, and the repetition is omitted.
Some terms used in the present application are explained below to facilitate understanding by those skilled in the art.
1) Entropy coding: a coding technique that uses the statistical distribution of characters in data to compress the data losslessly. Entropy coding can achieve or approach the information entropy of the data.
2) Coding redundancy: the gap between the actual coded length of data and the information entropy of the data.
3) Compression ratio, the ratio between the amount of data before compression and the amount of data after compression.
4) Random access: compressed data can be decoded to obtain the character at any position without performing a complete decompression.
5) Computing devices, devices or apparatus having data computing, storage functions. The present application is not limited to a specific form of computing device, and the computing device may be a server, a computer, a mobile phone, a vehicle-mounted device, a cloud platform, or the like.
6) "And/or" describes an association between associated objects and indicates that three relationships are possible; for example, "A and/or B" can mean: A alone, both A and B, or B alone. The character "/" generally indicates that the associated objects before and after it are in an "or" relationship.
The term "plurality" as used herein means two or more. At least one, meaning one or more.
In addition, it should be understood that in the description of the present application, the words "first," "second," and the like are used merely for distinguishing between the descriptions and not for indicating or implying any relative importance or order.
A simple description of a conventional entropy coding scheme is provided below.
1. Huffman coding. Characteristics: fast encoding, but high coding redundancy.
Huffman coding constructs prefix-free codewords of the shortest average length according to the character occurrence probabilities; its main idea is to construct a Huffman tree according to the frequencies with which the characters occur, so as to maximally save storage space.
By way of example, assuming that the data "abcde" needs to be encoded, the Huffman tree constructed by the computing device may be as shown in fig. 1.
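As an illustration of the tree construction just described, the following sketch builds prefix-free Huffman codewords with Python's standard `heapq` module. The frequency table is a hypothetical one chosen for the characters "a" to "e"; it is not taken from fig. 1, which is not reproduced here.

```python
import heapq
from itertools import count

def huffman_codes(freqs):
    """Build prefix-free codewords from a character -> frequency map
    by repeatedly merging the two lowest-frequency subtrees."""
    tie = count()  # tiebreaker so heapq never has to compare dicts
    heap = [(f, next(tie), {ch: ""}) for ch, f in freqs.items()]
    heapq.heapify(heap)
    while len(heap) > 1:
        f1, _, c1 = heapq.heappop(heap)
        f2, _, c2 = heapq.heappop(heap)
        # prefix the two subtrees with 0 and 1 and merge them
        merged = {ch: "0" + code for ch, code in c1.items()}
        merged.update({ch: "1" + code for ch, code in c2.items()})
        heapq.heappush(heap, (f1 + f2, next(tie), merged))
    return heap[0][2]

# hypothetical frequencies for the five characters of "abcde"
codes = huffman_codes({"a": 5, "b": 2, "c": 1, "d": 1, "e": 1})
```

With these frequencies the most frequent character "a" receives the shortest codeword, and no codeword is a prefix of another, which is what allows instantaneous decoding.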
2. Arithmetic coding. The characteristics are as follows: coding redundancy is low but the speed is slow.
The coding process is as follows: determining probability distribution of each character in target data to be encoded; dividing the set probability interval to obtain a sub-probability interval corresponding to each character; initializing a coding interval as the set probability interval; and then, sequentially carrying out shrinkage updating on the coding section according to the sub-probability section corresponding to each character in the target data, thereby realizing the coding of each character. And after the coding of all the characters in the target data is finished, taking any numerical value in the final coding section as a coding result of the target data.
The decoding process is the reverse of the encoding process, and is: initializing a coding interval into the set probability interval, and judging which character corresponding sub-probability interval in the coding interval the coding result belongs to so as to recover one character; then, the coding section is reduced and updated according to the sub-probability section corresponding to the recovered character; and then continuing to judge which character corresponds to the sub-probability interval in the updated coding interval, and analogizing until all the characters are recovered.
Illustratively, during the encoding/decoding process, the change of the coding interval is as shown in fig. 2. Assume that the set probability interval is [0, 1); before encoding/decoding the data, the coding interval is [0, 1). After the first character "S" is encoded, or decoded to obtain the first character "S", the coding interval is updated according to the sub-probability interval corresponding to "S" and becomes [0.7, 0.75). After the second character "W" is encoded, or decoded to obtain the second character "W", the coding interval is updated according to the sub-probability interval corresponding to "W" and becomes [0.71, 0.72), and so on.
3. Asymmetric numeral systems (ANS). Characteristics: the encoding speed is close to that of Huffman coding, while the coding redundancy can approach that of arithmetic coding.
The encoding process can be expressed by the following formula:

state' = ⌊state / f_s⌋ × 2^n + CDF(s) + state mod f_s

wherein state is the coding state parameter, 2^n is the maximum value of the set probability interval, s is the character to be coded in the data, f_s is the probability of s in the data, CDF(s) is the cumulative probability of the characters preceding s, ⌊·⌋ is the round-down symbol, and mod is a remainder function.
By executing the above-described encoding process for each character in the target data in turn by the above-described formula, each character can be encoded into the encoding state parameter in turn.
The decoding process can be expressed by the following formulas:

s = symbol(state mod 2^n)
state' = f_s × ⌊state / 2^n⌋ + state mod 2^n − CDF(s)

wherein symbol(·) is a function that looks up the character corresponding to a value, s is the character obtained by decoding, and f_s is the probability of s in the data obtained for decoding.

The above formulas show that the coding state parameter finally obtained in the encoding process is first subjected to the calculation symbol(state mod 2^n), so that one character is decoded; the coding state parameter is then updated by the second formula; decoding of the next character then continues according to the updated coding state parameter, and so on until the original data is recovered.
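The ANS encoding and decoding formulas can be exercised together in a short round-trip sketch. This is an illustrative toy, not the claimed method: the frequencies f_s are scaled integers assumed to sum to 2^n, CDF(s) is the cumulative frequency of the characters preceding s, and the characters are encoded in reverse order so that decoding emits them in forward order (a common ANS convention).

```python
N_BITS = 8                 # maximum of the set probability interval is 2^n = 256
M = 1 << N_BITS
freqs = {"a": 128, "b": 64, "c": 64}     # assumed f_s, scaled so sum == 2^n
cdf, acc = {}, 0
for ch, f in freqs.items():              # CDF(s): cumulative frequency before s
    cdf[ch] = acc
    acc += f

def symbol(v):
    """Find the character whose slot [CDF(s), CDF(s) + f_s) contains v."""
    for ch, f in freqs.items():
        if cdf[ch] <= v < cdf[ch] + f:
            return ch

def encode(data, state=1):
    for s in reversed(data):             # encode in reverse; decode runs forward
        state = (state // freqs[s]) * M + cdf[s] + state % freqs[s]
    return state

def decode(state, length):
    out = []
    for _ in range(length):
        s = symbol(state % M)            # s = symbol(state mod 2^n)
        out.append(s)
        state = freqs[s] * (state // M) + state % M - cdf[s]
    return "".join(out)
```

For example, `decode(encode("abcab"), 5)` recovers `"abcab"`: the entire string is folded into a single integer state, which is exactly why a naive ANS stream must be decoded serially.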
As is apparent from the above description of the conventional entropy coding scheme, arithmetic coding and ANS have high coding performance. In arithmetic coding and ANS, coding is typically achieved by only one coding state parameter or coding interval. However, the specific value of the coding state parameter or the coding section has a strong dependency on the coding sequence. Therefore, if the characters in the target data are encoded in series during arithmetic coding and ANS coding, serial processing is also required during decoding, which affects the decoding efficiency of the computing device.
In order to achieve decoding parallelization, it is generally necessary to partition the target data into blocks at encoding time, and then encode and decode the plurality of data blocks in parallel. However, partitioning the data into blocks reduces the compression rate of the target data; obtaining parallel decoding at the expense of compression rate greatly compromises the compression effect.
Therefore, how to implement parallelized decoding without affecting the compression rate on the basis of high coding performance is a problem that needs to be solved in the art.
Embodiments of the present application will be described in detail below with reference to the accompanying drawings.
Fig. 3 shows a structure of a computing device to which the data encoding and decoding method according to the embodiment of the present application is applicable. Referring to fig. 3, the computing device 300 includes: processor 310, memory 320, communication module 330, input unit 340, display unit 350, power supply 360, and power management module 370. The various constituent elements of the computing device 300 are described in detail below in conjunction with fig. 3:
The communication module 330 is configured to implement the data communication function of the computing device 300 and implement data communication with other devices. Optionally, the communication module 330 may connect to other devices through a wireless connection and/or a physical connection, so as to implement data transmission and reception of the computing device 300. Optionally, the communication module 330 may include a wireless communication module, a mobile communication module, a communication interface, and the like.
The wireless communication module may provide solutions for wireless communication applied on the computing device 300, including wireless local area network (wireless local area networks, WLAN) (e.g., wireless fidelity (wireless fidelity, Wi-Fi) network), bluetooth (BT), global navigation satellite system (global navigation satellite system, GNSS), frequency modulation (frequency modulation, FM), near field communication technology (near field communication, NFC), infrared technology (IR), etc. The wireless communication module may be integrated with at least one communication processing module and implement the wireless communication techniques described above in conjunction with an antenna.
The mobile communication module may provide a solution for mobile communication technologies including 2G/3G/4G/5G, etc. applied on the computing device 300. The mobile communication module may include at least one filter, switch, power amplifier, low noise amplifier (low noise amplifier, LNA), etc. The mobile communication module can receive electromagnetic waves by the antenna, filter and amplify the received electromagnetic waves, and transmit them to the modem processor for demodulation. The mobile communication module can also amplify the signal modulated by the modem processor and convert it into electromagnetic waves for radiation through the antenna. In some embodiments, at least some of the functional modules of the mobile communication module may be disposed in the processor 310. In some embodiments, at least some of the functional modules of the mobile communication module may be provided in the same device as at least some of the modules of the processor 310.
A communication interface for physically connecting the computing device 300 with other devices. The communication interface may be connected to the communication interfaces of the other devices through cables, so as to implement data transmission between the computing device 300 and the other devices. For example, the computing device 300 may establish physical connections with various network devices through a communication interface.
The memory 320 may be used to store program instructions and data. The processor 310 performs various functions of the computing device 300 by executing program instructions stored in the memory 320. Among the program instructions, there are program instructions that can cause the processor 310 to execute the data encoding and decoding method according to the following embodiments of the present application.
Alternatively, the memory 320 may mainly include a storage program area and a storage data area. The storage program area can store an operating system, various application programs, program instructions and the like; the storage data area may store data such as text, databases, and images. In addition, the memory 320 may include high-speed random access memory (random access memory, RAM), and may also include nonvolatile memory such as: a solid state disk (solid state drive, SSD), mechanical hard disk (HDD), optical disk, magnetic storage device, flash memory device, or other non-volatile solid state storage device.
The input unit 340 may be used to receive information such as data or operation instructions input by a user. Alternatively, the input unit 340 may include an input device such as a touch panel, function keys, a physical keyboard, a mouse, a camera, a monitor, etc.
The display unit 350 may implement man-machine interaction for displaying information input by a user through a user interface, and contents such as information required to be provided to the user. The display unit 350 may include a display panel 3501. Alternatively, the display panel 3501 may be configured in the form of a liquid crystal display (liquid crystal display, LCD), an organic light-emitting diode (OLED), or the like.
Further, when a touch panel is included in the input unit 340, the touch panel may cover the display panel 3501, and when the touch panel detects a touch event thereon or thereabout, the touch event is transmitted to the processor 310 to determine the type of the touch event to perform a corresponding operation.
The processor 310 is a control center of the computer device, and connects the above components using various interfaces and lines. The processor 310 may implement the data encoding and decoding methods provided by the embodiments of the present application by executing program instructions stored in the memory 320 and invoking data stored in the memory 320 to perform various functions of the computing device 300.
Optionally, the processor 310 may include one or more processing units. For example, the processor 310 may include at least one of: an application processor (application processor, AP), a network processor (network processor, NP), a modem processor, a baseband processor, a graphics processor (graphics processing unit, GPU), an image signal processor (image signal processor, ISP), a controller, an arithmetic unit, a video codec, a digital signal processor (digital signal processor, DSP), a neural-network processor (neural-network processing unit, NPU). The different processing units may be separate devices or may be integrated in one or more processors. The controller can be the neural center and command center of the electronic device; it can generate operation control signals according to the instruction operation codes and timing signals to control instruction fetching and instruction execution. Furthermore, the processor 310 may also include hardware chips, such as at least one or a combination of the following: an application-specific integrated circuit (ASIC), a programmable logic device (programmable logic device, PLD), a micro control unit (micro controller unit, MCU), and a single chip microcomputer. The PLD may be a complex programmable logic device (complex programmable logic device, CPLD), a field-programmable gate array (field-programmable gate array, FPGA), generic array logic (generic array logic, GAL), or any combination thereof.
Optionally, the processor 310 may also be provided with an internal memory, i.e. a memory of the processor 310. The memory is used for storing temporary instructions and data of the processor 310 in the running process, so as to avoid the process of repeatedly accessing data from the memory 320, thereby improving the data reading and writing efficiency of the processor 310 and further improving the working efficiency of the processor 310.
The computer device also includes a power supply 360 (e.g., a battery) for powering the various components. Alternatively, the power supply 360 may be logically connected to the processor 310 through a power management module 370, so that the processor 310 may implement power management of the computing device, power up and power down of various components, and the like through the power management module 370.
Those skilled in the art will appreciate that the architecture of computing device 300 illustrated in fig. 3 is not limiting of the computing device, and that embodiments of the present application may include more or fewer components than illustrated, or may combine certain components, or a different arrangement of components. For example, the computing device 300 may further include a camera, a sensor, an audio circuit, etc., which are not described herein.
In order to achieve parallelized decoding without affecting compression rate on the basis of high coding performance, embodiments of the present application provide a data encoding and decoding method that can be applied to the computing device 300 as described in fig. 3 and can be applied to various storage-related data compression scenarios. The data encoding process and the data decoding process are described in detail below with reference to the flowcharts shown in fig. 4 and 5, respectively.
Fig. 4 is a data encoding method provided in an embodiment of the present application, which specifically includes the following steps:
s401: the computing equipment acquires target data to be encoded and a sub-probability interval corresponding to each character in the target data in a set probability interval.
In the embodiment of the application, similar to the conventional encoding and decoding schemes, the union of the sub-probability intervals corresponding to the characters in the target data is equal to the probability interval. Optionally, the probability interval is (0, 2^n], where n is an integer greater than 1.
Optionally, the target data may be at least one type of data stored in a memory of the computing device, data input or specified by a user, and data sent by other received devices, which is not limited by the present application.
In the embodiment of the application, the length of the probability interval is not limited, namely the value of n is not limited. In practical application, the user can specifically set the value of n according to factors such as specific scenes and the data volume of target data, the distribution characteristics of each character in the target data, and the like.
It should be noted that, in the embodiment of the present application, the computing device may determine, according to the distribution condition of each character in the target data (may, but is not limited to, probability of occurrence of each character), a sub-probability interval corresponding to each character in the set probability interval; alternatively, the sub-probability interval corresponding to each character is preset, and the method of acquiring the sub-probability interval corresponding to each character is not limited.
For example, taking n=2, the probability interval is (0, 4]. The target data to be encoded is "abca"; the probability that the character "a" appears in the target data is 1/2, and the probabilities that the character "b" and the character "c" appear in the target data are each 1/4. Therefore, the interval length of the sub-probability interval corresponding to the character "a" is 4×1/2=2, and the interval length of the sub-probability interval corresponding to each of the character "b" and the character "c" is 4×1/4=1.
For another example, taking n=8, the probability interval is (0, 256]. The target data to be encoded is "baabc"; the frequencies of occurrence of the characters "a" and "b" in the target data are the same and greater than the frequency of occurrence of the character "c". Therefore, the computing device may determine that the sub-probability interval corresponding to the character "a" is (0, 96], the sub-probability interval corresponding to the character "b" is (96, 192], and the sub-probability interval corresponding to the character "c" is (192, 256].
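One way such a partition of (0, 2^n] can be derived from character frequencies is sketched below. The rounding policy (giving the residue of the rounding to the last character so the intervals exactly cover the probability interval) is an assumption made for the illustration, not a rule stated by the application.

```python
def sub_intervals(freqs, n):
    """Partition the set probability interval (0, 2^n] into one
    sub-probability interval per character, proportional to frequency."""
    total = sum(freqs.values())
    scale = 1 << n
    intervals, start = {}, 0
    items = sorted(freqs.items(), key=lambda kv: -kv[1])  # more frequent first
    for i, (ch, f) in enumerate(items):
        # give any rounding residue to the last character so the
        # intervals exactly cover (0, 2^n]
        end = scale if i == len(items) - 1 else start + round(scale * f / total)
        intervals[ch] = (start, end)   # interval (start, end], length end - start
        start = end
    return intervals

# the n=2 example above: "abca" -> lengths 2, 1, 1 within (0, 4]
iv = sub_intervals({"a": 2, "b": 1, "c": 1}, 2)
```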
S402: the computing device encodes characters to be encoded in the target data.
Alternatively, the computing device may sequentially perform the encoding process on each character in the target data using the set encoding order. For example, the computing device may sequentially select one character from the target data for encoding processing in a back-to-front or front-to-back encoding order. In other words, the character to be encoded is selected from the target data by the computing device in a back-to-front or front-to-back order. After the computing device encodes the current character to be encoded, the computing device selects the next character in the target data as the character to be encoded according to the encoding sequence, and continues encoding until all characters in the target data are encoded, and then the encoding process is finished.
Taking the current character to be encoded as the (i+1)-th character in the target data as an example, the character to be encoded can be denoted as S[i], where i is an integer in [0, N-1] and N is the length of the target data (the total number of characters it contains).
As shown in fig. 4, the computing device may perform the encoding process on the current character to be encoded in the following steps S4021 to S4024:
s4021: and the computing equipment determines an interval starting value and an interval length of the sub-probability interval corresponding to the character to be encoded.
S4022: the computing device generates a coding sequence according to the interval start value, the interval length and the coding state parameter. The value of the coding sequence is located in a sub-probability interval corresponding to the character to be coded.
Alternatively, when the probability interval set in S401 is (0, 2^n], the coding sequence is an n-bit sequence.
The encoding state parameter is predetermined, which can be understood as: if the character to be encoded is the first character to be encoded in the target data, the encoding state parameter is an initial value set by a user or the computing equipment; and when the character to be encoded is not the first character to be encoded in the target data, the encoding state parameter is updated when the computing equipment performs encoding processing on the last character.
In the embodiment of the present application, as the characters in the target data are continuously encoded, the coding state parameter is also continuously updated, and its value becomes smaller and smaller (see the update process of S4024). In order to prevent the coding state parameter from becoming so small that the calculation precision, and hence the coding and decoding performance, is reduced, in the embodiment of the application the computing device may amplify the coding state parameter when it underflows, so as to keep the coding state parameter continuously within the precision requirement.
In summary, in one embodiment, when executing S4022, the computing device may first determine whether the current value of the coding state parameter has underflowed, that is, determine whether the coding state parameter is smaller than a set first threshold, and perform different operations in the two cases, as shown in steps A1 and A2 below.
A1: when the coding state parameter is greater than or equal to a first threshold, the computing device does not process the coding state parameter, and generates the coding sequence directly according to the section start value, the section length and the coding state parameter.
A2: when the encoding state parameter is less than the first threshold, the computing device reads v first sequences from a first queue; then, updating the coding state parameters according to the v first sequences; wherein the updated coding state parameter is greater than or equal to the first threshold value, and v is a positive integer; and finally, generating the coding sequence according to the section starting value, the section length and the updated coding state parameter.
Alternatively, in an embodiment of the present application, the first threshold may be set to b_S[i] × 2^(T − v·n), where b_S[i] is the interval length of the sub-probability interval corresponding to the character to be encoded S[i], and T is the upper limit of the number of binary bits of the coding state parameter. Note that T and v are constants; in practical application, the value of T or v may be set according to the specific scenario. Each first sequence is an n-bit sequence.
In one embodiment, the computing device may update the encoding state parameters during execution of A2 by:
the computing equipment composes the v first sequences into a first adjustment sequence according to the sequence in the first queue, wherein the first adjustment sequence is a sequence with v x n bits; and shifting the binary bit of the coding state parameter by v x n bits left and then taking the sum of the binary bit and the first adjustment sequence as the updated coding state parameter.
It should be noted that in step A2, the first queue may be a code queue for storing code sequences corresponding to each character, where the v first sequences are v code sequences in the code queue. The first queue may also be a random queue, where the v first sequences are v random sequences in the random queue; the random queue is obtained by dividing a random data sequence by the computing device according to n bits of each random sequence.
Alternatively, when the computing device determines at different times that the coding state parameter has underflowed and A2 is to be executed, the v first sequences may be read from different queues to form the first adjustment sequence. For example, when the number of coding sequences in the coding queue is greater than or equal to v, the computing device may read v coding sequences from the coding queue to form the first adjustment sequence; and when the number of coding sequences in the coding queue is smaller than v, the computing device may read v random sequences from the random queue to form the first adjustment sequence. Of course, the computing device may also, on every underflow of the coding state parameter, always read v coding sequences from the coding queue, or always read v random sequences from the random queue; the present application is not limited in this respect.
In addition, when the coding state parameter underflows and the computing device executes A2, v coding sequences can be taken from the coding queue to form the first adjustment sequence for adjusting the coding state parameter. In this way, while the coding state parameter is adjusted, the number of coding sequences in the coding queue is reduced, achieving data compression and further reducing storage/transmission cost. It should be noted that the computing device should record the positions in the coding queue from which the v coding sequences were read, so that in the subsequent decoding process the v coding sequences decoded from the coding state parameter can be restored to their original positions. Therefore, for ease of operation, the computing device may optionally read the v coding sequences from the front of the coding queue, i.e., the v first sequences are the v coding sequences at the front of the coding queue. In this way, during decoding, the computing device may later restore the decoded v coding sequences to the front end of the coding queue.
Through step A2, when the coding state parameter underflows, the computing device can amplify it through operations such as left shifting and superimposing the adjustment sequence, so that the coding state parameter is continuously kept within the precision requirement, thereby ensuring the calculation precision of the computing device in the encoding process and hence its encoding and decoding performance.
Optionally, in steps A1 and A2, when the computing device generates a coding sequence according to the interval start value, the interval length, and the coding state parameter, the coding sequence conforms to the formula: d[i] = c_S[i] + x mod b_S[i]; where d[i] is the coding sequence corresponding to S[i], c_S[i] is the interval start value of the sub-probability interval corresponding to S[i], b_S[i] is the interval length of the sub-probability interval corresponding to S[i], and x is the current coding state parameter. According to this formula, the value of the coding sequence obtained by encoding the character S[i] is located in the sub-probability interval corresponding to S[i]. In this way, when d[i] is decoded later, the character corresponding to d[i] can be directly determined to be S[i] by judging that the value of d[i] is located in the sub-probability interval corresponding to S[i].
In summary, by using the coding method provided by the embodiment of the present application, when each character is coded, the computing device may enable the value of the coding sequence corresponding to the character to be located in the sub-probability interval corresponding to the character. In this way, there is no correlation between the code sequences corresponding to different characters, so that the decoding efficiency in the subsequent decoding of the code sequences and the possibility of parallelization can be improved.
S4023: the computing device saves the generated code sequence to a code queue.
In order to enable the coding queue to represent the coding order of the characters (the order in which each coding sequence is obtained by coding), so that the computing device can determine the position of the character corresponding to each coding sequence in the target data to restore the original target data after subsequent decoding, the computing device can store the coding sequences in the coding queue according to the set order. Alternatively, the computing device may save the encoded sequence to the end of the encoding queue. Thus, the code sequence at the front of the code queue is encoded first, and the code sequence at the end of the code queue is encoded last. Based on this, in the subsequent decoding process, the computing device may sequentially perform decoding processing on the code sequences in the code queue in order from the back to the front of the code queue.
S4024: the computing device updates the encoding state parameter according to the interval length.
In one embodiment, the computing device may update the coding state parameter by the following formula:

x' = ⌊x / b_S[i]⌋

wherein ⌊·⌋ is the round-down symbol, x is the coding state parameter, and b_S[i] is the interval length of the sub-probability interval corresponding to S[i].
Through S4024, after encoding a character in the target data, the computing device needs to update the encoding state parameter according to the interval length of the sub-probability interval corresponding to the character, so that the encoding state parameter can embody the feature of the character being encoded, and can ensure that the encoding state parameter is used for encoding the next character continuously.
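The encoding loop of steps S4021 to S4024 can be sketched end to end as follows. This is a minimal illustration under stated assumptions, not the claimed implementation: it assumes a fixed hypothetical character model, half-open [start, start+length) sub-intervals for the containment test (the application writes them as (start, end]), n = 8, T = 16, v = 1, an initial state of 1, and the variant of step A2 that always reads from the random queue.

```python
from collections import deque

N_BITS = 8
T = 16      # assumed upper limit on the bits of the coding state parameter
V = 1       # assumed number of first sequences read per renormalization

# hypothetical sub-probability intervals (start, length) within (0, 2^n]
intervals = {"a": (0, 128), "b": (128, 64), "c": (192, 64)}

def encode(data, random_queue):
    x = 1                                  # assumed initial state parameter
    code_queue = deque()
    for s in data:
        c, b = intervals[s]                # S4021: interval start and length
        threshold = b << (T - V * N_BITS)  # first threshold b_S[i] * 2^(T-v*n)
        while x < threshold:               # A2: amplify an underflowed state
            x = (x << N_BITS) + random_queue.popleft()
        code_queue.append(c + x % b)       # S4022: d[i] = c_S[i] + x mod b_S[i]
        x //= b                            # S4024: x' = floor(x / b_S[i])
    return code_queue, x
```

Every d[i] appended to the queue falls inside the sub-probability interval of its character, so each coding sequence can later be decoded without reference to its neighbours.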
It should be further noted that, before performing the encoding process on the characters in the target data, the computing device may also initialize the encoding queue and the encoding status parameter, and the embodiments of the present application do not limit the initial values of the encoding queue and the encoding status parameter.
After the computing device performs encoding processing on all characters in the target data, the final coding queue and coding state parameter can be saved so that the coding sequences in the coding queue can be decoded later. In addition, the computing device may also store the total number of characters in the original target data, the set probability interval, the sub-probability interval corresponding to each character, and the values of parameters such as v, T and n. Optionally, the computing device packages the coding queue and the at least one item of information into a compressed file, so that the computing device or another device receiving the compressed file can quickly determine this information when decoding the coding queue, thereby improving decoding efficiency.
In summary, the embodiment of the application provides a data encoding method. In the method, the computing device can encode each character in the target data into a coding sequence and store the coding sequence into a coding queue, wherein the value of the coding sequence corresponding to any character is positioned in a sub-probability interval corresponding to the character. In the data coding mode, the value of the coding sequence corresponding to each character is positioned in the sub-probability interval corresponding to the character, and the coding sequences corresponding to different characters have no correlation, so that when the coding sequences in the coding queue are decoded later, a plurality of coding sequences can be simultaneously decoded in parallel to realize parallelization decoding. In addition, the computing device can decode the coding sequence at any position in the coding queue to realize random access. In summary, the data encoding method provided by the embodiment of the application is used as an entropy encoding method, and can support parallelization decoding without affecting the compression rate, thereby improving the decoding efficiency.
Furthermore, as the coding state parameter is updated during data encoding, when it underflows the computing device can take v coding sequences out of the coding queue to generate an adjustment sequence and update the coding state parameter, so that the coding sequences in the coding queue are further compressed while the coding state parameter is kept within the precision requirement. Based on this scheme, in the data decoding process, as long as the coding state parameter did not underflow, no operation of decompressing coding sequences out of the coding state parameter is involved, and the computing device can decode a plurality of coding sequences in the coding queue in parallel, thereby realizing parallelized decoding.
In addition, it has been verified experimentally that when, on each underflow of the coding state parameter, the computing device reads random sequences from the random queue to generate the adjustment sequence and update the coding state parameter, the coding scheme can achieve a compression rate close to the information entropy. Furthermore, if the coding state parameter is amplified with random sequences from the random queue on every underflow during encoding, the finally obtained coding queue is never itself compressed; it therefore contains the coding sequence corresponding to every character of the target data, and each coding sequence can be decoded into its character independently. In this way, when decoding the coding queue, the computing device is not restricted to any particular coding sequence (i.e., to the position in the target data of the character corresponding to a coding sequence) and can perform decoding processing on any coding sequence, thereby realizing random access.
Fig. 5 shows a data decoding method provided in an embodiment of the present application, which specifically includes the following steps:
S501: the computing device obtains the sub-probability interval corresponding to each character in the target data within the set probability interval.
The union of the sub-probability intervals corresponding to the characters in the target data equals the probability interval. Optionally, the probability interval is (0, 2^n], where n is an integer greater than 1.
Alternatively, the sub-probability interval corresponding to each character acquired by the computing device during decoding may be the same as the sub-probability interval corresponding to each character acquired by the computing device during encoding.
Optionally, in one implementation, when the compressed file containing the coding queue to be decoded includes the sub-probability interval corresponding to each character, the computing device may obtain the sub-probability intervals directly from the compressed file. In another implementation, when the compressed file does not include the sub-probability intervals, the computing device may predict the sub-probability interval corresponding to each character; the embodiment of the present application does not limit the specific prediction process. Therefore, the present application does not limit the specific implementation of S501.
S502: the computing device decodes the sequence to be decoded in the coding queue according to the sub-probability interval corresponding to each character, obtaining the target character. The sequence to be decoded is at least one coding sequence in the coding queue; the target character comprises at least one character, and the characters in the target character correspond one-to-one to the coding sequences in the sequence to be decoded.
Before S502 is executed, the coding queue is the queue obtained after encoding all characters in the target data. Optionally, when the probability interval set in S501 is (0, 2^n], each coding sequence in the coding queue is an n-bit sequence.
Optionally, in the embodiment of the present application, the computing device may decode the coding sequences in the coding queue sequentially, in the reverse of the encoding order. Assuming that during encoding of the target data each newly obtained coding sequence is appended to the end of the coding queue, the computing device decodes the coding queue from back to front, i.e., each time it selects a sequence to be decoded from the end of the coding queue, until all coding sequences in the coding queue have been decoded and the decoding process ends.
In one embodiment, the computing device may decode the sequence to be decoded as follows: the computing device selects, among the sub-probability intervals corresponding to the characters, the sub-probability interval in which the value of the sequence to be decoded lies, and determines the target character as the character corresponding to the selected sub-probability interval.
When the sequence to be decoded contains multiple coding sequences, the computing device can determine the sub-probability interval of each coding sequence's value in parallel, thereby determining the character corresponding to each coding sequence and realizing parallel decoding.
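The interval-lookup decoding step described above can be sketched as follows. This is a minimal Python sketch; the function and variable names are illustrative rather than taken from the application, sub-probability intervals are represented as (start, length) pairs, and the boundary convention c ≤ d < c + b is assumed so that it matches the later encoding formula d = c_S[i] + x % b_S[i].

```python
def decode_symbol(d, intervals):
    """Map the value d of an n-bit coding sequence to its character.

    intervals: dict of character -> (c, b), where c is the interval
    start value and b the interval length; the intervals partition
    the probability interval. Returns (character, b_s, r_s), where
    r_s is the offset of d from the interval start -- the base
    parameters later needed to update the coding state parameter.
    """
    for ch, (c, b) in intervals.items():
        if c <= d < c + b:          # value of d lies in this sub-interval
            return ch, b, d - c
    raise ValueError("value lies outside the probability interval")
```

With the model of the Fig. 10 example ({'a': (0, 96), 'b': (96, 96), 'c': (192, 64)}), decode_symbol(255, ...) yields ('c', 64, 63). Because each lookup touches only its own n-bit value, lookups for several coding sequences can run in parallel.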
It should be noted that, in the data encoding method shown in fig. 4, the coding state parameter is adjusted by consuming sequences from the coding queue or the random queue whenever it underflows. To decode the target data accurately and completely, in the embodiment of the present application, after decoding the sequence to be decoded into the target character, the computing device needs to update the coding state parameter, so that the sequences of the coding queue or random queue that were folded into the coding state parameter before an underflow can later be restored. In the embodiment of the present application, the computing device may update the coding state parameter as follows:
B1: the computing device determines the base parameters of the target character. The base parameters include the interval length of the sub-probability interval corresponding to the target character, and the offset value between the value of the sequence to be decoded and the interval start value of that sub-probability interval.
B2: the computing device updates the encoding state parameter according to the base parameter of the target character. Wherein the encoding state parameter is predetermined.
In one embodiment, when executing B2, the computing device updates the coding state parameter sequentially according to the base parameters of each character in the target character, in back-to-front order of their corresponding coding sequences in the coding queue. For example, when the last coding sequence in the coding queue corresponds to the character "a" and the second-to-last corresponds to the character "b", the computing device may decode the two coding sequences in parallel to obtain "a" and "b"; when updating the coding state parameter, however, it must first update it according to the base parameters of "a" and only then according to the base parameters of "b".
Optionally, taking the first character among the target characters as an example, when the coding state parameter is updated according to the base parameters of the first character, the update follows the formula x = x·b_s + r_s, where x is the coding state parameter and b_s and r_s are the base parameters of the first character: b_s is the interval length of the sub-probability interval corresponding to the first character, and r_s is the offset value between the value of the coding sequence corresponding to the first character and the interval start value of that sub-probability interval.
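Steps B1-B2 and the ordering requirement can be sketched as follows (illustrative names; the loop body is exactly the x = x·b_s + r_s update above):

```python
def update_state(x, base_params):
    """Apply the B2 update x = x * b_s + r_s once per decoded character.

    base_params: list of (b_s, r_s) pairs ordered back-to-front along
    the coding queue -- this order is mandatory even when the
    characters themselves were decoded in parallel.
    """
    for b_s, r_s in base_params:
        x = x * b_s + r_s
    return x
```

For the "a"-then-"b" example above, the two characters may be decoded concurrently, but their (b_s, r_s) pairs must still be applied in queue back-to-front order.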
It should be noted that, with the above update method, the coding state parameter is updated continually as the coding sequences in the coding queue are decoded, and its value keeps growing. To prevent an oversized coding state parameter from affecting calculation precision and thus degrading the decoding performance of the computing device, in the embodiment of the application the computing device can also reduce the coding state parameter whenever it overflows, keeping it continuously within the precision requirement.
Accordingly, in one embodiment, after executing the update of the coding state parameter in B2, the computing device may further judge whether the current value of the coding state parameter has overflowed, that is, whether the coding state parameter is greater than or equal to a set second threshold; when it is, the computing device right-shifts the coding state parameter to update and thereby reduce it. Optionally, the second threshold equals 2^T, where T is the upper limit on the number of binary bits of the coding state parameter.
In one embodiment, when the coding state parameter is greater than or equal to the second threshold, the computing device may right-shift the binary bits of the coding state parameter by v·n bits and take the result as the updated coding state parameter, where v is a positive integer. The value of v may be agreed between the computing device and the encoding device, preset, or obtained from the compressed file containing the coding queue. In this way, the computing device reduces the coding state parameter through a right-shift operation, keeping it continuously within the precision requirement, which guarantees the calculation precision of the computing device during decoding and hence its decoding performance.
In one embodiment, during encoding the computing device adjusts the coding state parameter by consuming sequences from the coding queue or the random queue whenever it underflows. To decode the target data accurately, in the embodiment of the present application, when the coding state parameter overflows, the computing device may, before right-shifting it, restore from its value the sequences of the coding queue or random queue that were previously consumed. The specific process is as follows:
C1: the computing device performs a binary bitwise AND of the coding state parameter with 2^(v·n) − 1, obtaining a second adjustment sequence of v·n bits.
C2: the computing device splits the second adjustment sequence into v second sequences according to the sequence of bits in the second adjustment sequence, wherein each second sequence is an n-bit sequence.
C3: the computing device saves the v second sequences to the coding queue. In this step, the position at which the computing device stores the v second sequences corresponds to the position from which it selected the v first sequences in the first queue in the embodiment shown in fig. 4. For example, when the computing device selected the v first sequences from the front end of the coding queue during encoding, it saves the v resulting second sequences to the front end of the coding queue when executing step C3 during decoding.
Since, in the data encoding process shown in fig. 4, the computing device adjusts the coding state parameter using sequences from the coding queue or random queue whenever it underflows, in the data decoding scheme of the embodiment of the present application the computing device can recover those consumed sequences through steps C1-C3.
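The overflow check, the C1-C3 recovery, and the right shift can be sketched together as follows. This is a hedged Python sketch: the names, the high-bits-first split order in C2, and restoring to the front end of the queue are assumptions consistent with the example given for C3, not prescribed details.

```python
def renormalize_state(x, n, v, T, queue):
    """If x >= 2^T, recover the v n-bit sequences folded into x, then shrink x.

    C1: AND x with 2^(v*n) - 1 to obtain the second adjustment sequence.
    C2: split it into v n-bit second sequences (high bits first, assumed).
    C3: restore them to the front end of the coding queue.
    Finally right-shift x by v*n bits to keep it within the precision bound.
    """
    if x >= 1 << T:                                  # second threshold 2^T
        adj = x & ((1 << (v * n)) - 1)               # C1
        seqs = [(adj >> (k * n)) & ((1 << n) - 1)    # C2
                for k in reversed(range(v))]
        queue[:0] = seqs                             # C3: front of the queue
        x >>= v * n                                  # right-shift update
    return x
```

When x is below the second threshold the function leaves both x and the queue untouched, matching the "no overflow" branch of the decoding flow.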
It should also be noted that, if the computing device determines during decoding that a random queue was used to adjust the coding state parameter during encoding of the target data, then when executing C3 the computing device may: not save the generated v second sequences (i.e., v random sequences) to the coding queue; or save them to the coding queue but mark them so that they are not decoded later (i.e., the computing device skips marked sequences when selecting the sequence to be decoded in the coding queue); or save and mark them, and even if a marked sequence is later decoded, not update the coding state parameter for that decoding and not use the resulting character when recovering the target data.
It should be noted that the computing device also needs to obtain initial values of the encoding state parameters before decoding the sequences in the encoding queue. The initial value of the coding state parameter is the value of the coding state parameter after coding all characters in the target data. Alternatively, the computing device may obtain the initial value of the encoding state parameter from a compressed file containing the encoding queue.
In addition, when the computing device can also acquire the total number of characters in the original target data, after decoding that number of characters in the above manner it can recover the original target data from the decoded characters. Optionally, the computing device may obtain the total number of characters in the original target data from the compressed file containing the coding queue.
It should be noted that, since the decoding process proceeds from the back to the front of the coding queue, the characters encoded later in the target data are decoded first. Therefore, when restoring the original target data, the computing device orders the decoded characters according to the encoding order and the decoding order. For example, when the computing device encoded the characters in the target data from front to back, the last character of the target data is decoded first, so the original target data is restored by ordering the decoded characters from back to front in the order in which they were decoded. Conversely, if the computing device encoded the characters from back to front, the first character of the target data is decoded first, so the original target data is restored by ordering the decoded characters from front to back in the order in which they were decoded.
Referring to fig. 6A, the process of the computing device decoding data from the encoding queue obtained by the data encoding method provided by the embodiment shown in fig. 4 is as follows:
the computing device may read several coding sequences from the end of the coding queue and decode them in parallel, obtaining several characters and their base parameters (see the description in S502 above); it then updates the coding state parameter x according to the base parameters of each character sequentially, in the back-to-front order of the corresponding coding sequences in the coding queue (see the description in B2); when the updated coding state parameter x overflows, it generates v second sequences from x and writes them into the coding queue (an optional step; see C1-C3); and finally right-shifts x to reduce it.
It should further be noted that, with the data encoding method provided by the embodiment shown in fig. 4, since there is no correlation between the coding sequences in the coding queue, the computing device can decode the coding sequence at any position in the coding queue to obtain its corresponding character. Furthermore, if no coding sequences in the coding queue were consumed on underflow of the coding state parameter while encoding the target data, the final coding queue contains the coding sequence corresponding to every character in the target data. Since each coding sequence can be decoded into its character independently, the computing device need not sequence its decoding through the coding state parameter; that is, it is not constrained by the encoding order (the position in the target data of the character corresponding to a coding sequence), but can decode the coding sequence at any position, realizing random access.
If, while encoding the target data, the coding state parameter was adjusted by consuming coding sequences from the coding queue on underflow, then during decoding the consumed coding sequences must be recovered from the coding state parameter and written back to the coding queue whenever the coding state parameter overflows. Based on the decoding process shown in fig. 6A, the decoding scheme provided by the application has local robustness on top of supporting parallel decoding. Taking fig. 6B as an example, the black-underlined data in fig. 6B is erroneous. When a coding sequence decodes incorrectly, an erroneous character and erroneous base parameters are obtained, and the coding state parameter x updated from those base parameters is also erroneous, so the error accumulates in x. However, since different coding sequences are uncorrelated and can be decoded independently, as long as the erroneous x does not overflow, the decoding error does not affect the decoding of the other coding sequences, which can still be decoded successfully. As x continues to be updated, once it overflows the computing device writes v erroneous second sequences into the coding queue based on the erroneous x, which may affect the accuracy of subsequent decoding. Evidently, the data decompression method has local robustness: when one coding sequence decodes incorrectly, the other coding sequences are not necessarily immediately affected.
In summary, based on the data encoding method shown in fig. 4, the embodiment of the application provides a data decoding method. In the method, the computing device can decode a plurality of coding sequences simultaneously in parallel to realize parallelization decoding; in addition, the computing device can decode the coding sequence at any position in the coding queue to realize random access.
Based on the data encoding and decoding schemes provided in fig. 4 and fig. 5, the embodiment of the present application further provides a data encoding and decoding method, and the flow of the method is described below with reference to fig. 7 to fig. 9.
As shown in fig. 7, the data encoding flow of the computing device is as follows:
S701: the computing device acquires the data S of length N to be encoded and, within the set probability interval (0, 2^n], the sub-probability interval corresponding to each character in the data S.
The union of the sub-probability intervals of the characters equals the probability interval (0, 2^n]. For the manner in which the computing device determines the sub-probability interval corresponding to each character, refer to the description of S401 in the embodiment shown in fig. 4, which is not repeated here.
S702: the computing device initializes i=n-1. Where i is an integer in [0, N-1], i represents the position of the character to be encoded in the data S, and the character to be encoded can be denoted as S [ i ]. S [0] represents the first character in the data S, and S [ N-1] represents the last character in the data S.
The embodiment of the application encodes each character in the data S in sequence from the back to the front, so that the computing device encodes the last character in the data S first.
S703: the computing device determines a section start value c_S [ i ] and a section length b_S [ i ] of a sub-probability section corresponding to the character S [ i ] to be encoded.
S704: the computing device judges whether the coding state parameter x underflows, i.e., whether x is smaller than the first threshold b_S[i]·2^(T−v·n). When x does not underflow, S707 is performed; when x underflows, S705 is performed.
Wherein T is the upper limit of the bit number of the binary bit of x, and v is a positive integer. T and v are preset constants.
S705: when x < b_S[i]·2^(T−v·n), the computing device reads v n-bit first sequences from the first queue and concatenates them, in their order in the first queue, into a v·n-bit first adjustment sequence V.
In one embodiment, the first queue may be the coding queue itself, i.e., the v first sequences are the v coding sequences at the front end of the coding queue. In this embodiment, whenever the coding state parameter x underflows, the computing device takes v coding sequences out of the coding queue to adjust x, reducing the number of coding sequences in the coding queue and thereby achieving compression.
In another embodiment, the first queue may be a random queue, i.e., the v first sequences are v random sequences from the random queue. In this embodiment, whenever the coding state parameter x underflows, the computing device takes v random sequences out of the random queue to adjust x, leaving the number of coding sequences in the coding queue unchanged. After the data S is encoded, the coding queue thus contains the coding sequence corresponding to every character in S, and the coding scheme can achieve a compression rate close to the information entropy. Furthermore, each coding sequence in the final coding queue can be decoded independently into its corresponding character, so when decoding the coding queue the computing device is not constrained by the encoding order and can decode the coding sequence at any position, realizing random access.
S706: the computing device updates x = (x << v·n) + V, i.e., the binary bits of the coding state parameter x are left-shifted by v·n bits and the first adjustment sequence V is added, giving the updated coding state parameter x.
S707: this is the coding step. The computing device obtains the coding sequence d[i] of the character S[i] from the interval start value c_S[i] and interval length b_S[i] of the sub-probability interval corresponding to S[i] and the current coding state parameter x, i.e., d[i] = c_S[i] + x % b_S[i]. d[i] is then written to the end of the coding queue.
S708: after encoding the character S[i], the computing device updates the coding state parameter x according to the interval length b_S[i] of the sub-probability interval corresponding to S[i], i.e., x = ⌊x / b_S[i]⌋.
S709: the computing device judges whether i is equal to 0. If yes, it indicates that the first character in the data S has been encoded, i.e., all characters in the data S have been encoded, so the encoding flow ends. If i is not equal to 0, indicating that the computing device has not yet encoded all the characters in the data S, the computing device selects the next character and continues encoding, i.e., S710 is performed.
S710: the computing device updates i=i-1 and continues to step S703, enabling continued encoding of the next character in the data S.
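The S701-S710 flow can be sketched end to end as follows. This is a minimal Python sketch under stated assumptions: intervals maps each character to its (c_S, b_S) pair, the first queue is the coding queue itself (the first embodiment of S705), and the initial value x = 2^T − 1 follows the Fig. 10 example; all names are illustrative.

```python
def encode(S, intervals, n, v, T):
    """Encode S back to front per Fig. 7; returns (coding queue, final x)."""
    x = (1 << T) - 1                      # initial coding state parameter
    queue = []                            # coding queue
    for ch in reversed(S):                # S702/S710: last character first
        c, b = intervals[ch]              # S703: interval start and length
        if x < b << (T - v * n):          # S704: underflow vs. first threshold
            V = 0                         # S705: build first adjustment sequence
            for _ in range(v):
                V = (V << n) | queue.pop(0)   # consume from the queue front
            x = (x << (v * n)) + V        # S706
        queue.append(c + x % b)           # S707: d[i] = c_S[i] + x % b_S[i]
        x //= b                           # S708: x = floor(x / b_S[i])
    return queue, x
```

With the Fig. 10 model, encoding "bc" yields the queue [255, 159] and x = 2730, matching the walkthrough of the Fig. 10 example.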
Fig. 8 illustrates an example in which a computing device selects one code sequence at a time for decoding, and as shown in fig. 8, a data decoding flow of the computing device is as follows:
S801: the computing device obtains the coding queue obtained by encoding the data S, the length N of S, the initial value of the coding state parameter x (i.e., the value of x after all characters in S were encoded in the encoding flow shown in fig. 7), and the sub-probability interval corresponding to each character in the data S within the set probability interval (0, 2^n].
Optionally, the computing device may obtain this information from a compressed file containing the coding queue. Optionally, the computing device may also predict the sub-probability interval corresponding to each character in the data S within the set probability interval (0, 2^n]; the specific prediction process is not limited.
S802: the computing device initializes i=0. Where i is an integer of [0, N-1], i represents the position of the character in the data S obtained by the decoding, and the character can be denoted as S [ i ]. S [0] represents the first character in the data S, and S [ N-1] represents the last character in the data S.
Since each character in the data S is encoded sequentially in the order from back to front in the embodiment shown in fig. 7, in the embodiment of the present application, the encoded sequence in the encoded queue is decoded sequentially in the order from back to front, that is, each character in the data S is obtained sequentially in the order from front to back.
S803: the computing device reads a coding sequence d [ i ] from the tail end of the coding queue, decodes the d [ i ] according to the sub-probability interval corresponding to each character to obtain the character S [ i ] and the base parameter (b_S [ i ], r_S [ i ]) of the S [ i ], wherein b_S [ i ] is the interval length of the sub-probability interval corresponding to the character S [ i ], and r_S [ i ] is the offset value between the value of d [ i ] and the interval starting value of the sub-probability interval corresponding to the S [ i ].
S804: the computing device updates x = x·b_S[i] + r_S[i].
S805: the computing device judges whether the updated coding state parameter x overflows, i.e., whether x is greater than or equal to the second threshold 2^T. When x does not overflow, S809 is performed; when x overflows, S806 is performed.
Where T is the upper limit of the number of bits of the binary bit of x.
S806: the computing device performs a binary bitwise AND of the coding state parameter with 2^(v·n) − 1, i.e., x & (2^(v·n) − 1), obtaining a second adjustment sequence of v·n bits.
S807: according to the sequence of bits in the second adjustment sequence, the computing device splits the second adjustment sequence into v second sequences with n bits, and writes the v second sequences into the front end of the coding queue.
It should be noted that, assuming x was adjusted during the data encoding flow shown in fig. 7 by consuming v first sequences from the first queue whenever x underflowed, then in the decoding process of the embodiment of the present application the computing device likewise recovers the consumed first sequences from x whenever x overflows. When the first queue is the coding queue, the second sequences recovered in S807 are the coding sequences corresponding to v characters in the data S, and can therefore be written directly into the coding queue for subsequent decoding by the computing device.
However, when the first queue is a random queue, the second sequences recovered in S807 are random sequences. Since decoding them would be meaningless, the computing device may mark them so that they are not decoded, or may not update x and i after decoding them. Accordingly, in S803 the coding sequence d[i] selected for decoding should each time be a non-random sequence, i.e., the computing device can skip marked random sequences when reading the coding sequence to be decoded.
S808: the computing device updates x = x >> (v·n), i.e., right-shifts the binary bits of the coding state parameter x by v·n bits.
S809: after decoding the character S[i], the computing device judges whether i is equal to N−1. If yes, it indicates that the last character in the data S has been decoded, i.e., all characters in the data S have been decoded, so the decoding flow ends. If i is not equal to N−1, indicating that the computing device has not yet decoded all characters in the data S, the computing device updates i and continues to select the next coding sequence in the coding queue for decoding, i.e., S810 is performed.
After the decoding process is finished, the computing device may recover the data S according to the decoded N characters.
S810: the computing device updates i=i+1 and continues to execute S803, achieving the goal of continuing to decode the next code sequence in the code queue.
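The S801-S810 flow can likewise be sketched in full. This Python sketch mirrors the encoder assumptions above (the first queue was the coding queue; the C2 split is high bits first); the concrete queue and state values in the test were obtained by running the Fig. 7 flow to completion on the Fig. 10 example and are illustrative.

```python
def decode(queue, x, N, intervals, n, v, T):
    """Decode N characters of S from the coding queue per Fig. 8.

    Returns (recovered string, final x); x should return to its
    initial encoding value when decoding completes.
    """
    queue = list(queue)
    out = []
    for _ in range(N):                        # S802/S810: i = 0 .. N-1
        d = queue.pop()                       # S803: read d[i] from the end
        for ch, (c, b) in intervals.items():  # find the sub-interval of d
            if c <= d < c + b:
                break
        x = x * b + (d - c)                   # S804: x = x*b_S[i] + r_S[i]
        if x >= 1 << T:                       # S805: overflow check
            adj = x & ((1 << (v * n)) - 1)    # S806: second adjustment sequence
            queue[:0] = [(adj >> (k * n)) & ((1 << n) - 1)
                         for k in reversed(range(v))]   # S807: restore to front
            x >>= v * n                       # S808
        out.append(ch)                        # characters emerge front to back
    return "".join(out), x
```

Continuing the Fig. 10 example under these assumptions, decoding the queue [105] with x = 13257555 (the state after the whole of "baabc" has been encoded) recovers "baabc" and returns x to its initial value 16777215.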
Fig. 9 illustrates an example in which a computing device selects two code sequences at a time for decoding, i.e., parallel decoding of the two code sequences is implemented. It should be noted that the number of parallel decoding supported by the computing device may be specifically set according to the actual scenario, which the present application does not limit. As shown in fig. 9, the data decoding flow of the computing device is as follows:
s901 to S902 and S905 to S909 in fig. 9 are the same as S801 to S802 and S805 to S809 in fig. 8, respectively, and thus, the same steps may be referred to each other, and are not described herein. Unlike the flow of fig. 8, S903 to S904, and S910, the following description is given for these steps:
S903: the computing device reads two coding sequences from the end of the coding queue, denoted d[i] and d[i+1], and decodes d[i] and d[i+1] in parallel according to the sub-probability interval corresponding to each character, obtaining the characters S[i] and S[i+1] and their base parameters (b_S[i], r_S[i]) and (b_S[i+1], r_S[i+1]), where b_S[i] is the interval length of the sub-probability interval corresponding to S[i], r_S[i] is the offset value between the value of d[i] and the interval start value of that sub-probability interval, b_S[i+1] is the interval length of the sub-probability interval corresponding to S[i+1], and r_S[i+1] is the offset value between the value of d[i+1] and the interval start value of the sub-probability interval corresponding to S[i+1].
Where the coding sequence at the end of the coding queue is denoted d[i], and the coding sequence adjacent to and in front of d[i] is denoted d[i+1].
S904: the computing device updates the coding state parameter using the base parameters of S[i] and S[i+1] sequentially, in the back-to-front order of d[i] and d[i+1] in the coding queue, i.e., it first updates x = x·b_S[i] + r_S[i], and then updates x = x·b_S[i+1] + r_S[i+1].
S910: the computing device updates i=i+2 and continues to execute S903, so as to achieve the purpose of continuing to decode the two coding sequences in the coding queue in parallel.
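The two-way parallel step S903-S904 can be sketched as follows (an illustrative sketch: the two interval lookups are independent of each other and could run on separate threads or SIMD lanes, while the state updates must remain sequential):

```python
def find_symbol(d, intervals):
    # Interval lookup for one n-bit value; independent per value,
    # hence parallelizable (S903).
    for ch, (c, b) in intervals.items():
        if c <= d < c + b:
            return ch, b, d - c

def decode_two(queue, x, intervals):
    d_i, d_next = queue[-1], queue[-2]        # d[i] at the end, d[i+1] before it
    sym_i = find_symbol(d_i, intervals)       # these two lookups have no
    sym_next = find_symbol(d_next, intervals) # dependency on each other
    del queue[-2:]
    # S904: apply the state updates back to front -- d[i] first, then d[i+1]
    for ch, b, r in (sym_i, sym_next):
        x = x * b + r
    return (sym_i[0], sym_next[0]), x
```

The degree of parallelism is not limited to two; the same pattern extends to any batch size supported by the computing device, as noted above.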
Fig. 10 is a schematic diagram of an example of the data encoding and decoding method provided by the above embodiments; the example is described below with reference to fig. 10. The example encodes and decodes the target data "baabc" of length 5, with n = 8, v = 2, T = 24, and the initial value of x before encoding is 2^T − 1 = 2^24 − 1 = 16777215. In this example, when x underflows during encoding, the computing device adjusts x by consuming coding sequences from the coding queue.
According to the distribution of the characters "a", "b" and "c" in the target data, the computing device can determine the sub-probability interval corresponding to each character within the set probability interval (0, 2^n] (i.e., (0, 256]). Assume the probability distribution of the characters is P(a) = P(b) = 96/256 and P(c) = 64/256. Based on this distribution, the computing device may determine that the sub-probability interval corresponding to the character "a" is (0, 96], that corresponding to "b" is (96, 192], and that corresponding to "c" is (192, 256], as shown in fig. 10.
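The interval assignment above can be sketched as follows (a sketch only; the application does not prescribe how the probability model is built, so representing it as integer frequencies that sum to 2^n is an assumption):

```python
def build_intervals(freqs, n):
    """Partition the probability interval (0, 2^n] into sub-intervals.

    freqs: character -> integer frequency, with frequencies summing
    to 2^n. Returns character -> (interval start value, interval length).
    """
    assert sum(freqs.values()) == 1 << n
    intervals, start = {}, 0
    for ch, f in freqs.items():
        intervals[ch] = (start, f)
        start += f
    return intervals
```

For the Fig. 10 model this reproduces the intervals given in the text: "a" gets (start 0, length 96), "b" gets (96, 96), and "c" gets (192, 64).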
In addition, the code queue for storing the code sequence may be as shown in fig. 10, in which the front end of the code queue is located at the leftmost side and the end of the code queue is located at the rightmost side.
Before encoding the target data, the initial state of the encoding queue is empty, the initial value of x is 16777215, and the computing device encodes the target data sequentially in a sequence from back to front. The specific coding process is as follows:
1. Encoding starts. The computing device determines the character to be encoded as the last character "c" in the target data, denoted S[4]. The computing device obtains, from the sub-probability interval (192, 256] corresponding to the character "c" to be encoded, the interval start value c_S[4] = 192 and the interval length b_S[4] = 64. The computing device determines that the current value of x (16777215) is greater than the first threshold (b_S[4]*2^(T-v*n) = 64*2^8 = 16384), so x does not underflow. According to the interval start value c_S[4] = 192, the interval length b_S[4] = 64 and the current value of x (16777215), it obtains the coding sequence corresponding to "c": d[4] = c_S[4] + x % b_S[4] = 192 + 16777215 % 64 = 255, where d[4] is an 8-bit coding sequence with value 255; as shown in fig. 10, d[4] may be denoted "255".
After encoding S[4], the computing device writes the coding sequence "255" corresponding to "c" to the end of the coding queue; the state of the coding queue is then: "255". In addition, the computing device updates x according to b_S[4], i.e., x = ⌊x / b_S[4]⌋ = ⌊16777215 / 64⌋ = 262143, so the updated value of x is 262143.
2. The computing device determines the character to be encoded as the next character "b" in the target data, denoted S[3]. From the sub-probability interval (96, 192] corresponding to the character "b" to be encoded, it obtains the interval start value c_S[3] = 96 and the interval length b_S[3] = 96. The computing device determines that the current value of x (262143) is greater than the second threshold (b_S[3]*2^(T-v*n) = 96*2^8 = 24576), so x does not underflow. According to c_S[3] = 96, b_S[3] = 96 and the current value of x (262143), it obtains the coding sequence corresponding to "b": d[3] = c_S[3] + x % b_S[3] = 96 + 262143 % 96 = 159, where d[3] is an 8-bit coding sequence with value 159; as shown in fig. 10, d[3] may be denoted "159".
After encoding S[3], the computing device writes the coding sequence "159" corresponding to "b" to the end of the coding queue; the state of the coding queue is then: "255", "159". In addition, the computing device updates x according to b_S[3], i.e., x = ⌊262143 / 96⌋ = 2730, so the updated value of x is 2730.
3. The computing device determines the character to be encoded as the next character "a" in the target data, denoted S[2]. From the sub-probability interval (0, 96] corresponding to the character "a" to be encoded, it obtains the interval start value c_S[2] = 0 and the interval length b_S[2] = 96. The computing device determines that the current value of x (2730) is less than the third threshold (b_S[2]*2^(T-v*n) = 96*2^8 = 24576, Ia in the figure), so x underflows. The computing device takes the two coding sequences "255" and "159" from the front of the coding queue, after which the coding queue is empty; it then combines the two coding sequences "255" and "159" into a 16-bit adjustment sequence V1, whose value is 255*2^8 + 159 = 65439. Finally, the computing device updates x = (x << 16) + V1 = 2730*2^16 + 65439 = 178978719, i.e., the binary bits of x are shifted left by 16 and summed with V1 to give the updated x. The updated x is greater than the third threshold (178978719 is greater than 24576).
Through the above step, when x underflows during encoding, the computing device can adjust x by folding coding sequences from the front of the coding queue into x, which keeps x within the precision requirement while also reducing the number of coding sequences held in the coding queue.
After adjusting x, the computing device obtains the coding sequence corresponding to "a" according to the interval start value c_S[2] = 0, the interval length b_S[2] = 96 and the current value of x (178978719): d[2] = c_S[2] + x % b_S[2] = 0 + 178978719 % 96 = 63, where d[2] is an 8-bit coding sequence with value 63; as shown in fig. 10, d[2] may be denoted "63".
After encoding S[2], the computing device writes the coding sequence "63" corresponding to "a" to the end of the coding queue; the state of the coding queue is then: "63". In addition, the computing device updates x according to b_S[2], i.e., x = ⌊178978719 / 96⌋ = 1864361, so the updated value of x is 1864361.
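The underflow adjustment used in step 3 can be sketched as follows; a minimal illustration assuming the example parameters n = 8, v = 2, with `adjust_state` as a hypothetical name:

```python
def adjust_state(x, queue, n=8, v=2):
    """On underflow, fold v coding sequences from the queue front into x:
    shift x left by v*n bits and add the v*n-bit adjustment sequence."""
    adj = 0
    for _ in range(v):
        adj = (adj << n) | queue.pop(0)   # index 0 is the queue front
    return (x << (v * n)) + adj

# Step 3 of the example: x = 2730, front of the queue holds "255", "159"
queue = [255, 159]
print(adjust_state(2730, queue))   # 178978719, and the queue is now empty
```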
4. The computing device determines the character to be encoded as the next character "a" in the target data, denoted S[1]. From the sub-probability interval (0, 96] corresponding to the character "a" to be encoded, it obtains the interval start value c_S[1] = 0 and the interval length b_S[1] = 96. The computing device determines that the current value of x (1864361) is greater than the fourth threshold (b_S[1]*2^(T-v*n) = 96*2^8 = 24576), so x does not underflow. According to c_S[1] = 0, b_S[1] = 96 and the current value of x (1864361), it obtains the coding sequence corresponding to "a": d[1] = c_S[1] + x % b_S[1] = 0 + 1864361 % 96 = 41, where d[1] is an 8-bit coding sequence with value 41; as shown in fig. 10, d[1] may be denoted "41".
After encoding S[1], the computing device writes the coding sequence "41" corresponding to "a" to the end of the coding queue; the state of the coding queue is then: "63", "41". In addition, the computing device updates x according to b_S[1], i.e., x = ⌊1864361 / 96⌋ = 19420, so the updated value of x is 19420.
5. The computing device determines the character to be encoded as the next character "b" in the target data, denoted S[0]. From the sub-probability interval (96, 192] corresponding to the character "b" to be encoded, it obtains the interval start value c_S[0] = 96 and the interval length b_S[0] = 96. The computing device determines that the current value of x (19420) is less than the fifth threshold (b_S[0]*2^(T-v*n) = 96*2^8 = 24576, Ib in the figure), so x underflows. The computing device takes the two coding sequences "63" and "41" from the front of the coding queue, after which the coding queue is empty; it then combines the two coding sequences "63" and "41" into a 16-bit adjustment sequence V2, whose value is 63*2^8 + 41 = 16169. Finally, the computing device updates x = (x << 16) + V2 = 19420*2^16 + 16169 = 1272725289, i.e., the binary bits of x are shifted left by 16 and summed with V2 to give the updated x. The updated x is greater than the fifth threshold (1272725289 is greater than 24576).
After adjusting x, the computing device obtains the coding sequence corresponding to "b" according to the interval start value c_S[0] = 96, the interval length b_S[0] = 96 and the current value of x (1272725289): d[0] = c_S[0] + x % b_S[0] = 96 + 1272725289 % 96 = 105, where d[0] is an 8-bit coding sequence with value 105; as shown in fig. 10, d[0] may be denoted "105".
After encoding S[0], the computing device writes the coding sequence "105" corresponding to "b" to the end of the coding queue; the state of the coding queue is then: "105". In addition, the computing device updates x according to b_S[0], i.e., x = ⌊1272725289 / 96⌋ = 13257555, so the updated value of x is 13257555.
So far, the computing device has finished encoding all characters in the target data; the final state of the coding queue is: "105", and x is 13257555.
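The whole encoding walkthrough can be reproduced with a short sketch; this is an illustration under the example's assumptions (n = 8, v = 2, T = 24 and the sub-probability intervals of fig. 10), not the patent's implementation:

```python
# Sub-probability intervals as (interval start c, interval length b)
INTERVALS = {'a': (0, 96), 'b': (96, 96), 'c': (192, 64)}

def encode(data, intervals, n=8, v=2, T=24):
    x = (1 << T) - 1              # initial state 2^T - 1 = 16777215
    queue = []                    # index 0 = front, append side = end
    for ch in reversed(data):     # encode the characters back to front
        c, b = intervals[ch]
        if x < b << (T - v * n):  # underflow: x below b * 2^(T - v*n)
            adj = 0
            for _ in range(v):    # fold v sequences from the queue front
                adj = (adj << n) | queue.pop(0)
            x = (x << (v * n)) + adj
        queue.append(c + x % b)   # d[i] = c_S[i] + x % b_S[i]
        x //= b                   # x = floor(x / b_S[i])
    return queue, x

print(encode("baabc", INTERVALS))   # ([105], 13257555)
```

Running it reproduces the final queue state "105" and x = 13257555 of the walkthrough above.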
The computing device may package the last code queue, the value of x, the total number of characters in the target data (5), and the sub-probability intervals corresponding to each character into a compressed file for subsequent decompression by itself or when transmitting to other computing devices.
Before starting to decode the target data, the computing device may read the coding queue, the value of x, the total number of characters in the target data (5), and the sub-probability interval for each character from the compressed file. The state of the coding queue at this time is: "105"; the value of x is 13257555; the total number of characters in the target data is 5; and the sub-probability intervals are: (0, 96] for the character "a", (96, 192] for the character "b", and (192, 256] for the character "c".
The computing device may decode the encoded sequences in the encoded queues in a back-to-front order such that each character in the target data may be sequentially derived in a front-to-back order. I.e. the character decoded by the code sequence at the end of the code queue is the first character in the target data. The specific decoding process is as follows:
1. At the beginning of decoding, the computing device reads a coding sequence d[0] from the end of the coding queue; as shown in fig. 10, d[0] is an 8-bit coding sequence with value 105, denoted "105". After the read, the coding queue is empty. The computing device determines that "105" lies in the sub-probability interval (96, 192] corresponding to the character "b", and can therefore decode the character S[0] = "b" corresponding to d[0].
The computing device may further obtain, according to the sub-probability interval (96, 192] of the character "b", the base parameter (b_S[0], r_S[0]) of S[0] as (96, 9), where b_S[0] is the interval length of the sub-probability interval of "b", and r_S[0] is the offset between the value of d[0] and the interval start value of that sub-probability interval, i.e., 105 - 96 = 9.
The computing device updates x according to the base parameter (96, 9) of S[0], i.e., x = x*b_S[0] + r_S[0] = 13257555*96 + 9 = 1272725289.
After updating x, the computing device determines that the current value of x is greater than the set threshold (2^T = 2^24 = 16777216, i.e., I in the figure), so x overflows. The computing device performs a binary AND of x with 2^(v*n) - 1 (65535), i.e., x & 65535, obtaining a 16-bit adjustment sequence V (0011111100101001). The computing device splits V into its first 8-bit sequence (00111111) and last 8-bit sequence (00101001), obtaining two coding sequences with values 63 and 41, denoted "63" and "41" respectively. The computing device writes "63" and "41" to the front of the coding queue in back-to-front order, i.e., it first writes "41" to the front of the coding queue and then writes "63" to the front; or it writes "63" and "41" to the front of the coding queue simultaneously, with "63" before "41". The updated state of the coding queue is: "63", "41".
Finally, the computing device right shifts the binary bit of the current x by 16 bits to update x, i.e., x= (x > > 16) =19420.
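The overflow handling in this decoding step can be sketched as follows (`renormalize` is a hypothetical name; parameters as in the example):

```python
def renormalize(x, queue, n=8, v=2, T=24):
    """If x >= 2^T, AND x with 2^(v*n) - 1 to get a v*n-bit adjustment
    sequence, split it into v n-bit sequences written to the queue front
    (high part first), then shift x right by v*n bits."""
    if x >= 1 << T:
        adj = x & ((1 << (v * n)) - 1)
        parts = [(adj >> (n * (v - 1 - k))) & ((1 << n) - 1) for k in range(v)]
        queue[0:0] = parts        # write to the front, high part first
        x >>= v * n
    return x

# Step 1 of the decode walkthrough: x = 1272725289 spills "63", "41"
queue = []
print(renormalize(1272725289, queue), queue)   # 19420 [63, 41]
```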
2. The computing device reads the next coding sequence d[1] from the end of the coding queue; as shown in fig. 10, d[1] is "41". After the read, the state of the coding queue is: "63". The computing device determines that "41" lies in the sub-probability interval (0, 96] corresponding to the character "a", and can therefore decode the character S[1] = "a" corresponding to d[1].
The computing device may further obtain, according to the sub-probability interval (0, 96] of the character "a", the base parameter (b_S[1], r_S[1]) of S[1] as (96, 41), where b_S[1] is the interval length of that sub-probability interval, and r_S[1] is the offset between the value of d[1] and its interval start value, i.e., 41 - 0 = 41.
The computing device updates x according to the base parameter (96, 41) of S[1], i.e., x = x*b_S[1] + r_S[1] = 19420*96 + 41 = 1864361.
Since the updated x is less than I, there is no excessive overflow, and thus the computing device does not make adjustments to the updated x.
3. The computing device reads the next coding sequence d[2] from the end of the coding queue; as shown in fig. 10, d[2] is "63". After the read, the coding queue is empty. The computing device determines that "63" lies in the sub-probability interval (0, 96] corresponding to the character "a", and can therefore decode the character S[2] = "a" corresponding to d[2].
The computing device may further obtain, according to the sub-probability interval (0, 96] of the character "a", the base parameter (b_S[2], r_S[2]) of S[2] as (96, 63), where b_S[2] is the interval length of that sub-probability interval, and r_S[2] is the offset between the value of d[2] and its interval start value, i.e., 63 - 0 = 63.
The computing device updates x according to the base parameter (96, 63) of S[2], i.e., x = x*b_S[2] + r_S[2] = 1864361*96 + 63 = 178978719.
After updating x, the computing device determines that the current value of x is greater than the set threshold (2^T = 2^24 = 16777216, i.e., I in the figure), so x overflows. The computing device performs a binary AND of x with 2^(v*n) - 1 (65535), i.e., x & 65535, obtaining a 16-bit adjustment sequence V (1111111110011111). The computing device splits V into its first 8-bit sequence (11111111) and last 8-bit sequence (10011111), obtaining two coding sequences with values 255 and 159, denoted "255" and "159" respectively. The computing device writes "255" and "159" to the front of the coding queue in back-to-front order, i.e., it first writes "159" to the front of the coding queue and then writes "255" to the front; or it writes "255" and "159" to the front simultaneously, with "255" before "159". The updated state of the coding queue is: "255", "159".
Finally, the computing device right shifts the binary bit of the current x by 16 bits to update x, i.e., x= (x > > 16) =2730.
4. The computing device reads the next coding sequence d[3] from the end of the coding queue; as shown in fig. 10, d[3] is "159". After the read, the state of the coding queue is: "255". The computing device determines that "159" lies in the sub-probability interval (96, 192] corresponding to the character "b", and can therefore decode the character S[3] = "b" corresponding to d[3].
The computing device may further obtain, according to the sub-probability interval (96, 192] of the character "b", the base parameter (b_S[3], r_S[3]) of S[3] as (96, 63), where b_S[3] is the interval length of that sub-probability interval, and r_S[3] is the offset between the value of d[3] and its interval start value, i.e., 159 - 96 = 63.
The computing device updates x according to the base parameter (96, 63) of S[3], i.e., x = x*b_S[3] + r_S[3] = 2730*96 + 63 = 262143.
Since the updated x is less than I, there is no excessive overflow, and thus the computing device does not make adjustments to the updated x.
5. The computing device reads the next coding sequence d[4] from the end of the coding queue; as shown in fig. 10, d[4] is "255". After the read, the coding queue is empty. The computing device determines that "255" lies in the sub-probability interval (192, 256] corresponding to the character "c", and can therefore decode the character S[4] = "c" corresponding to d[4].
The computing device may further obtain, according to the sub-probability interval (192, 256] of the character "c", the base parameter (b_S[4], r_S[4]) of S[4] as (64, 63), where b_S[4] is the interval length of that sub-probability interval, and r_S[4] is the offset between the value of d[4] and its interval start value, i.e., 255 - 192 = 63.
The computing device updates x according to the base parameter (64, 63) of S[4], i.e., x = x*b_S[4] + r_S[4] = 262143*64 + 63 = 16777215.
Since the updated x is less than I, there is no excessive overflow, and thus the computing device does not make adjustments to the updated x.
So far, the computing device decodes to obtain 5 characters, and the decoding is finished. Finally, the computing device may sequentially sort the 5 characters according to the decoding order to obtain "baabc", thereby recovering the original target data.
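The decode walkthrough can likewise be reproduced with a sketch (same assumed parameters and interval table as in the encoding example; the membership test uses c <= d < c + b, matching the values d[i] = c + x % b can take):

```python
INTERVALS = {'a': (0, 96), 'b': (96, 96), 'c': (192, 64)}  # (start c, length b)

def decode(queue, x, count, intervals, n=8, v=2, T=24):
    q, out = list(queue), []
    for _ in range(count):
        d = q.pop()                      # read from the end of the coding queue
        for ch, (c, b) in intervals.items():
            if c <= d < c + b:           # locate the sub-probability interval
                out.append(ch)
                x = x * b + (d - c)      # x = x * b_S + r_S
                break
        if x >= 1 << T:                  # overflow: spill the low v*n bits
            adj = x & ((1 << (v * n)) - 1)
            q[0:0] = [(adj >> (n * (v - 1 - k))) & ((1 << n) - 1)
                      for k in range(v)]
            x >>= v * n
    return ''.join(out)

print(decode([105], 13257555, 5, INTERVALS))   # baabc
```

Starting from the packed state (queue "105", x = 13257555, 5 characters), this recovers "baabc" as in the walkthrough.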
It should be noted that during decoding, if multiple encoded sequences are present in the encoded queue, the computing device may decode the multiple encoded sequences simultaneously. For example, when the state of the encoding queue is "63", "41", the computing device may decode "63" and "41" simultaneously to achieve parallelized decoding, but when updating x, it is also necessary to update sequentially in the order in fig. 10.
It should be noted that fig. 7 to 10 are specific examples and do not limit the encoding and decoding methods provided in the embodiments of the present application. In practical application, the coding and decoding scheme can encode data containing any number of characters; the initial value of x before encoding is not limited, and the values of the parameters n, v, T and the like involved in encoding and decoding are not limited. In addition, the order in which the target data is encoded and the layout of the coding sequences can be adaptively adjusted.
It should be further noted that, the data encoding and decoding method provided by the embodiment of the present application may be applied to various data compression scenarios related to storage, and as an entropy encoding scheme, the method can support parallelized decoding without affecting the compression rate, so that the method may replace the conventional entropy encoding scheme in the data compression scenario, so as to improve the decoding efficiency. The following describes a scenario to which the data encoding and decoding method provided in the embodiment of the present application is applied, taking the data compression scenario shown in fig. 11A and 11B as an example.
Referring to fig. 11A, the data encoding and decoding scheme provided by the embodiment of the present application may be applied to a main storage system. Alternatively, the primary storage system may store various types of data, such as databases, text, images, executable files, and the like. In the scene, the compression layer can encode the mixed type data through the data encoding scheme provided by the application, thereby realizing data compression, reducing the storage overhead and saving the cost. In addition, although not shown, the system further comprises a decompression layer for decoding the compressed data by the data decoding method provided by the embodiment of the application, so as to realize data decompression.
Table 1 is compression-rate comparison data obtained by encoding different types of data with different data encoding and decoding methods, and table 2 is encoding and decoding speed comparison data for the same methods.
TABLE 1
Wherein the compression ratio difference in table 1 is the compression ratio of the inventive scheme minus the compression ratio of the ANS.
As can be seen from the comparison data in table 1, the coding calculation accuracy of the scheme of the present application is higher than that of the conventional ANS, and in terms of compression rate the scheme of the present application can come closer to the information entropy than the conventional ANS.
TABLE 2
The encoding speed difference in table 2 is the difference of the encoding speed of the scheme of the present application minus the encoding speed of the ANS, and the decoding speed difference is the difference of the decoding speed of the present application minus the decoding speed of the ANS.
As can be seen from the comparison data in table 2, although the encoding speed of the inventive scheme is slightly slower than the ANS, the decoding speed is clearly higher than the ANS, and thus, the inventive scheme can greatly improve the decoding efficiency of the computing device as an alternative to the ANS.
Referring to FIG. 11B, in some scenarios, the data in the storage system may be divided into hot data and cold data according to how frequently it is used. Hot data is data used with higher frequency, and cold data is data used with lower frequency. Illustratively, a flash-based key-value (KV) cache system may distinguish hot KV data from cold KV data.
The compression layer in the storage system may use the data encoding scheme provided by the application when compressing the stored hot or cold data. Because hot data is accessed frequently, and to support high-performance retrieval, the situation in which hot data can be accessed only after all of its encoded form is decoded should be avoided; therefore, when the compression layer encodes hot data and the encoding state parameter underflows, it can adjust the encoding state parameter using a random queue generated based on the cold data. As described in the above embodiment, when a random queue is used to adjust the encoding state parameter, the coding sequence corresponding to each character of the hot data can be decoded independently, thereby enabling random access and high-performance retrieval.
Based on the same technical concept, the application also provides a computing device which can be applied to various storage related data compression scenes. The computing device is used to implement the data encoding methods provided in the embodiments and examples above. Referring to fig. 12, the computing device 1200 includes: an acquisition unit 1201 and an encoding unit 1202. The specific functions of the individual units in the computing device 1200 are described separately below.
The acquiring unit 1201 is configured to acquire target data to be encoded, and a sub-probability interval corresponding to each character in the target data within a set probability interval;
the encoding unit 1202 is configured to perform the following encoding process on the character to be encoded in the target data:
determining a section starting value and a section length of a sub-probability section corresponding to the character to be coded;
generating a coding sequence according to the interval starting value, the interval length and the coding state parameter; the value of the coding sequence is positioned in a sub-probability interval corresponding to the character to be coded, and the coding state parameter is predetermined;
storing the coding sequence into a coding queue;
and updating the coding state parameters according to the interval length.
Optionally, the probability interval is (0, 2^n], where n is an integer greater than 1; the coding sequence is an n-bit sequence.
Optionally, the encoding unit 1202 is specifically configured to, when storing the encoded sequence in an encoding queue:
and storing the coding sequence to the tail end of the coding queue.
Optionally, the encoding unit 1202 is specifically configured to, when generating the encoded sequence according to the section start value, the section length, and the encoding status parameter:
When the coding state parameter is greater than or equal to a first threshold, generating the coding sequence according to the section start value, the section length and the coding state parameter; or alternatively
Reading v first sequences from a first queue when the encoding state parameter is less than the first threshold; updating the coding state parameters according to the v first sequences; wherein the updated coding state parameter is greater than or equal to the first threshold value, and v is a positive integer; and generating the coding sequence according to the interval starting value, the interval length and the coding state parameter.
Optionally, each first sequence is an n-bit sequence; the encoding unit 1202, when updating the encoding status parameters according to the v first sequences, is specifically configured to:
forming the v first sequences into a first adjustment sequence, the first adjustment sequence being a v*n-bit sequence;
and shifting the binary bits of the encoding state parameter left by v*n bits and then taking the sum with the first adjustment sequence as the updated encoding state parameter.
Optionally, the first threshold is equal to b_S[i]*2^(T-v*n), where b_S[i] is the interval length and T is the upper limit of the number of binary bits of the encoding state parameter.
Optionally, the first queue is the coding queue, and the v first sequences are v coding sequences in the coding queue; or the first queue is a random queue, and the v first sequences are v random sequences in the random queue.
Optionally, when the first queue is the coding queue, the v first sequences are v coding sequences at the front end of the coding queue.
Optionally, the coding sequence conforms to the formula: d[i] = c_S[i] + x % b_S[i];
wherein d[i] is the coding sequence, c_S[i] is the interval start value, b_S[i] is the interval length, and x is the encoding state parameter.
Optionally, the encoding state parameter conforms to the formula: x = ⌊x / b_S[i]⌋;
wherein ⌊·⌋ is the round-down (floor) operator, x is the encoding state parameter, and b_S[i] is the interval length.
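The two formulas above (d[i] = c_S[i] + x % b_S[i], then x updated by flooring x / b_S[i]) can be checked numerically; the values below are the first encoding step of the fig. 10 example:

```python
# One encoding step: state x, interval start c and length b for the character "c"
x, c, b = 16777215, 192, 64
d = c + x % b          # coding sequence d[i] = c_S[i] + x % b_S[i]
x = x // b             # floor division: x = floor(x / b_S[i])
print(d, x)            # 255 262143
```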
Optionally, the character to be encoded is selected from the target data according to a back-to-front or front-to-back encoding order.
Based on the same technical concept, the application also provides a computing device which can be applied to various storage related data compression scenes. The computing device is used to implement the data decoding methods provided in the above embodiments and examples. Referring to fig. 13, the computing device 1300 includes: an acquisition unit 1301 and a decoding unit 1302. The specific functions of the various units in computing device 1300 are described separately below.
An obtaining unit 1301, configured to obtain a sub-probability interval corresponding to each character in the target data in the set probability interval;
a decoding unit 1302, configured to decode a sequence to be decoded in the encoding queue according to the sub-probability interval corresponding to each character, so as to obtain a target character; the sequence to be decoded is at least one coding sequence in the coding queue, the target character comprises at least one character, and the at least one character contained in the target character corresponds to the at least one coding sequence one by one.
Optionally, the probability interval is (0, 2^n], where n is an integer greater than 1; each coding sequence in the sequence to be decoded is an n-bit sequence.
Optionally, the sequence to be decoded is at least one coding sequence at the tail end of the coding queue.
Optionally, the decoding unit 1302 is further configured to:
after obtaining a target character, determining a base parameter of the target character; the base parameter comprises the interval length of the sub-probability interval corresponding to the target character and an offset value between the value of the sequence to be decoded and the interval starting value of the sub-probability interval corresponding to the target character;
Updating the coding state parameters according to the base parameters of the target characters; wherein the encoding state parameter is predetermined.
Optionally, the decoding unit 1302 is specifically configured to, when updating the encoding status parameter according to the base parameter of the target character:
and updating the coding state parameters according to the base parameters of each character in the target characters in sequence from back to front in the coding queue according to the corresponding coding sequences.
Optionally, the encoding state parameter conforms to the formula: x = x*b_s + r_s;
wherein x is the coding state parameter, b_s and r_s are base parameters of a first character in the target characters, b_s is the interval length of a sub-probability interval corresponding to the first character, and r_s is an offset value between the value of the coding sequence corresponding to the first character and the interval starting value of the sub-probability interval corresponding to the first character.
Optionally, the decoding unit 1302 is further configured to:
after updating the coding state parameter according to the base parameter of the target character, when the coding state parameter is greater than or equal to a second threshold value, performing right shift processing on the coding state parameter to update the coding state parameter.
Optionally, the second threshold is equal to 2^T, where T is the upper limit of the number of binary bits of the encoding state parameter.
Optionally, the decoding unit 1302 is specifically configured to, when performing a right shift process on the encoding state parameter to update the encoding state parameter:
and taking the binary bits of the encoding state parameter shifted right by v*n bits as the updated encoding state parameter, wherein v is a positive integer.
Optionally, the decoding unit 1302 is further configured to:
before the encoding state parameter is updated by the right shift, performing a binary AND operation on the encoding state parameter and 2^(v*n) - 1 to obtain a second adjustment sequence of v*n bits;
splitting the second adjustment sequence into v second sequences, each second sequence being an n-bit sequence;
and storing the v second sequences into the coding queue.
Optionally, the decoding unit 1302 is specifically configured to, when storing the v second sequences in the encoding queue:
and storing the v second sequences to the front end of the coding queue.
Optionally, the decoding unit 1302 is specifically configured to, when decoding the sequence to be decoded according to the sub-probability interval corresponding to each character to obtain the target character:
Selecting a sub-probability interval in which the value of the sequence to be decoded is located from the sub-probability intervals corresponding to each character; and determining the target character as the character corresponding to the selected sub-probability interval.
Optionally, the decoding unit 1302 is further configured to:
restoring the target data according to the target character; the initial value of the coding state parameter is equal to the value of the coding state parameter after coding all characters in the target data, and the coding queue is obtained after coding all characters in the target data.
Based on the above embodiments, the present application further provides a computing device, which may be applied to various storage-related data compression scenarios, and may implement the data encoding and decoding methods in the above embodiments, and has the functions of the computing apparatus 1200 or the computing apparatus 1300 provided in the above embodiments. Referring to fig. 14, the computing device 1400 includes: a processor 1401, and a memory 1402. Optionally, the computing device 1400 may further include a communication module 1403, where the communication module 1403, the processor 1401, and the memory 1402 are connected to each other.
Optionally, the communication module 1403, the processor 1401 and the memory 1402 are connected to each other by a bus 1404. The bus 1404 may be a peripheral component interconnect (PCI) bus or an extended industry standard architecture (EISA) bus, among others. Buses may be classified as address buses, data buses, control buses, etc. For ease of illustration, only one thick line is shown in fig. 14, but this does not mean there is only one bus or only one type of bus.
The communication module 1403 is configured to receive and send data, and implement communication with other devices. Optionally, some examples and implementations of the communication module 1403 may refer to the description of the communication module 330 in the computing device shown in fig. 3 above, which is not repeated herein.
The function of the processor 1401 may refer to the description in the above embodiments, and will not be described herein.
Among other things, the processor 1401 may be a central processor (central processing unit, CPU), a network processor (network processor, NP) or a combination of CPU and NP, or the like. The processor 1401 may further comprise a hardware chip. The hardware chip may be an application-specific integrated circuit (ASIC), a programmable logic device (programmable logic device, PLD), or a combination thereof. The PLD may be a complex programmable logic device (complex programmable logic device, CPLD), a field-programmable gate array (field-programmable gate array, FPGA), general-purpose array logic (generic array logic, GAL), or any combination thereof. The processor 1401 may realize the above functions by hardware, and may be realized by executing corresponding software by hardware.
The memory 1402 is used for storing program instructions and the like. In particular, the program instructions may include program code comprising computer operating instructions. The memory 1402 may include random access memory (RAM) and may also include non-volatile memory, such as at least one disk storage. The processor 1401 executes the program instructions stored in the memory 1402 to realize the above functions, thereby implementing the methods provided by the above embodiments.
Based on the above embodiments, the present application also provides a computer program product containing instructions which, when executed by a computer, cause the computer to perform the data encoding and decoding method provided by the above embodiments.
Based on the above embodiments, the present application also provides a computer-readable storage medium having stored therein a computer program which, when executed by a computer, causes the computer to execute the data encoding and decoding method provided in the above embodiments.
The storage medium may be any available medium that can be accessed by a computer. By way of example and not limitation, the computer-readable medium may include RAM, ROM, EEPROM, CD-ROM or other optical disk storage, magnetic disk storage media or other magnetic storage devices, or any other medium that can be used to carry or store desired program code in the form of instructions or data structures and that can be accessed by a computer.
Based on the above embodiments, the embodiments of the present application further provide a chip, where the chip is configured to read a computer program stored in a memory, so as to implement the data encoding and decoding method provided in the above embodiments.
Based on the above embodiments, the embodiments of the present application provide a chip system, which includes a processor for supporting a computer device in implementing the functions involved in the above embodiments. In one possible design, the chip system further includes a memory for storing the programs and data necessary for the computer device. The chip system may be composed of chips, or may include chips and other discrete devices.
In summary, the embodiments of the present application provide a data encoding and decoding method and apparatus. In the method, the computing device encodes each character in the target data into a coding sequence and stores the coding sequence in a coding queue, where the value of the coding sequence corresponding to any character lies in the sub-probability interval corresponding to that character. Because the value of each coding sequence lies in the sub-probability interval of its character, and the coding sequences corresponding to different characters have no correlation, multiple coding sequences can later be decoded simultaneously, realizing parallelized decoding. In addition, the computing device can decode the coding sequence at any position in the coding queue, realizing random access. As an entropy encoding method, the data encoding method provided by the embodiments of the present application can therefore support parallelized decoding without affecting the compression rate, thereby improving decoding efficiency.
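For illustration only, the encoding and decoding passes summarized above can be sketched as follows. The alphabet, the frequency table, the parameter values (n = 8, T = 16, v = 1), the initial state value, and all function names are hypothetical and are not part of the claimed method; the initial state is also chosen large enough here that the random-queue replenishment mentioned in the embodiments is never needed.

```python
n, T = 8, 16                              # n-bit coding sequences, T-bit state cap
TOTAL = 1 << n                            # set probability interval (0, 2^n]

freqs  = {'a': 128, 'b': 64, 'c': 64}     # sub-interval lengths b_S (sum to 2^n)
starts = {'a': 0,   'b': 128, 'c': 192}   # sub-interval start values c_S

def encode(data):
    x = (1 << T) - 1      # predetermined coding state parameter; large enough
                          # that the queue is never read while empty in this toy
    queue = []            # coding queue; coding sequences are stored at the tail
    for ch in data:
        c, b = starts[ch], freqs[ch]
        while x < b << (T - n):           # below the first threshold b_S*2^(T-n)
            x = (x << n) | queue.pop(0)   # replenish from the front of the queue
        queue.append(c + x % b)           # the code lies in [c_S, c_S + b_S)
        x //= b                           # update the coding state parameter
    return queue, x

def decode(queue, x, count):
    out = []
    for _ in range(count):
        d = queue.pop()                   # decode from the tail of the queue
        ch = next(s for s in starts if starts[s] <= d < starts[s] + freqs[s])
        out.append(ch)
        x = x * freqs[ch] + (d - starts[ch])
        while x >= 1 << T:                # above the second threshold 2^T
            queue.insert(0, x & (TOTAL - 1))  # push an n-bit chunk to the front
            x >>= n
    return ''.join(reversed(out))

q, x = encode("abc")
print(decode(q, x, 3))                    # prints: abc
```

Each stored code is an n-bit value inside its character's sub-interval, so any code can be mapped back to its character independently of the others, which is what permits the parallel and random-access decoding described above.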
It will be appreciated by those skilled in the art that embodiments of the present application may be provided as a method, system, or computer program product. Accordingly, the present application may take the form of an entirely hardware embodiment, an entirely software embodiment or an embodiment combining software and hardware aspects. Furthermore, the present application may take the form of a computer program product embodied on one or more computer-usable storage media (including, but not limited to, disk storage, CD-ROM, optical storage, and the like) having computer-usable program code embodied therein.
The present application is described with reference to flowchart illustrations and/or block diagrams of methods, apparatus (systems) and computer program products according to the application. It will be understood that each flow and/or block of the flowchart illustrations and/or block diagrams, and combinations of flows and/or blocks in the flowchart illustrations and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to a processor of a general purpose computer, special purpose computer, embedded processor, or other programmable data processing apparatus to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing apparatus, create means for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be stored in a computer-readable memory that can direct a computer or other programmable data processing apparatus to function in a particular manner, such that the instructions stored in the computer-readable memory produce an article of manufacture including instruction means which implement the function specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be loaded onto a computer or other programmable data processing apparatus to cause a series of operational steps to be performed on the computer or other programmable apparatus to produce a computer implemented process such that the instructions which execute on the computer or other programmable apparatus provide steps for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
It will be apparent to those skilled in the art that various modifications and variations can be made to the present application without departing from the spirit or scope of the application. Thus, it is intended that the present application also include such modifications and alterations insofar as they come within the scope of the appended claims or the equivalents thereof.
Claims (30)
1. A method of encoding data, comprising:
acquiring target data to be encoded, and a sub-probability interval corresponding to each character in the target data in a set probability interval;
and carrying out the following coding processing on the character to be coded in the target data:
determining a section starting value and a section length of a sub-probability section corresponding to the character to be coded;
generating a coding sequence according to the interval starting value, the interval length and the coding state parameter; the value of the coding sequence is positioned in a sub-probability interval corresponding to the character to be coded, and the coding state parameter is predetermined;
storing the coding sequence into a coding queue;
and updating the coding state parameters according to the interval length.
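For illustration, the four processing steps above amount to one arithmetic pass per character. A minimal sketch with hypothetical names, where x is the coding state parameter:

```python
def encode_char(c_S, b_S, x, queue):
    """One coding pass: c_S and b_S are the section start value and length."""
    d = c_S + x % b_S   # generated code lies in the sub-interval [c_S, c_S + b_S)
    queue.append(d)     # store the coding sequence into the coding queue
    return x // b_S     # coding state parameter updated from the interval length
```

For example, with c_S = 0, b_S = 128 and x = 65535, the stored code is 127 and the updated state is 511.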
2. The method of claim 1, wherein the probability interval is (0, 2^n], and n is an integer greater than 1;
the coding sequence is an n-bit sequence.
3. The method of claim 2, wherein saving the encoded sequence into an encoding queue comprises:
and storing the coding sequence to the tail end of the coding queue.
4. The method of claim 3, wherein generating a coding sequence based on the interval start value, the interval length, and a coding state parameter comprises:
When the coding state parameter is greater than or equal to a first threshold, generating the coding sequence according to the section start value, the section length and the coding state parameter; or alternatively
Reading v first sequences from a first queue when the encoding state parameter is less than the first threshold; updating the coding state parameters according to the v first sequences; wherein the updated coding state parameter is greater than or equal to the first threshold value, and v is a positive integer; and generating the coding sequence according to the interval starting value, the interval length and the coding state parameter.
5. The method of claim 4, wherein each first sequence is an n-bit sequence; updating the encoding state parameters according to the v first sequences, including:
forming the v first sequences into a first adjustment sequence, wherein the first adjustment sequence is a v*n-bit sequence;
and left-shifting the binary bits of the coding state parameter by v*n bits, and taking the sum of the shifted result and the first adjustment sequence as the updated coding state parameter.
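One possible reading of this update, sketched with hypothetical names (v is the number of first sequences and n their bit width):

```python
def replenish(x, first_sequences, n):
    adjustment = 0
    for s in first_sequences:            # concatenate into a v*n-bit
        adjustment = (adjustment << n) | s   # first adjustment sequence
    v = len(first_sequences)
    return (x << (v * n)) + adjustment   # left-shift x by v*n bits, then add
```

For example, with x = 1 and two 8-bit first sequences 0xAB and 0xCD, the updated state is 0x1ABCD.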
6. The method of claim 4 or 5, wherein the first threshold is equal to b_S[i]*2^(T-v*n), wherein b_S[i] is the interval length, and T is the upper limit of the number of binary bits of the coding state parameter.
7. The method of any of claims 4-6, wherein the first queue is the coding queue and the v first sequences are v coding sequences in the coding queue; or alternatively
The first queues are random queues, and the v first sequences are v random sequences in the random queues.
8. The method of claim 7, wherein when the first queue is the code queue, the v first sequences are v code sequences at a front end of the code queue.
9. The method according to any of claims 4-7, wherein generating the coding sequence from the interval start value, the interval length, and the coding state parameter comprises:
the coding sequence conforms to the formula: d[i] = c_S[i] + x % b_S[i];
wherein d[i] is the coding sequence, c_S[i] is the section start value, b_S[i] is the section length, and x is the coding state parameter.
10. The method according to any of claims 1-9, wherein updating the coding state parameter according to the interval length comprises:
the encoding state parameter conforms to the formula: x = ⌊x / b_S[i]⌋;
wherein ⌊·⌋ is the round-down symbol, x is the coding state parameter, and b_S[i] is the interval length.
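For illustration, claims 9 and 10 together describe one per-character arithmetic step; a worked instance with hypothetical values:

```python
c_S, b_S = 128, 64   # hypothetical section start value and section length
x = 1000             # coding state parameter before the step

d = c_S + x % b_S    # claim 9: 1000 % 64 = 40, so d = 168, inside [128, 192)
x = x // b_S         # claim 10: floor(1000 / 64) = 15
```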
11. The method according to any of claims 1-10, wherein the character to be encoded is selected from the target data in a back-to-front or front-to-back encoding order.
12. A method of decoding data, comprising:
acquiring a sub-probability interval corresponding to each character in target data in a set probability interval;
decoding a sequence to be decoded in the coding queue according to the sub-probability interval corresponding to each character to obtain a target character; the sequence to be decoded is at least one coding sequence in the coding queue, the target character comprises at least one character, and the at least one character contained in the target character corresponds to the at least one coding sequence one by one.
13. The method of claim 12, wherein the probability interval is (0, 2^n], and n is an integer greater than 1; each coding sequence in the sequence to be decoded is an n-bit sequence.
14. The method of claim 13, wherein the sequence to be decoded is at least one coding sequence at the end of the coding queue.
15. The method of claim 14, wherein after obtaining the target character, the method further comprises:
determining a base parameter of the target character; the base parameter comprises the interval length of the sub-probability interval corresponding to the target character and an offset value between the value of the sequence to be decoded and the interval starting value of the sub-probability interval corresponding to the target character;
updating the coding state parameters according to the base parameters of the target characters; wherein the encoding state parameter is predetermined.
16. The method of claim 15, wherein updating the encoding state parameter based on the base parameter of the target character comprises:
and updating the coding state parameters sequentially according to the base parameters of each character in the target characters, in order from back to front of the corresponding coding sequences in the coding queue.
17. The method of claim 16, wherein updating the encoding state parameter based on the base parameter for each of the target characters comprises:
the encoding state parameter conforms to the formula: x = x * b_s + r_s;
wherein x is the coding state parameter, b_s and r_s are base parameters of a first character in the target characters, b_s is the interval length of a sub-probability interval corresponding to the first character, and r_s is an offset value between the value of the coding sequence corresponding to the first character and the interval starting value of the sub-probability interval corresponding to the first character.
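This decoder-side update inverts the encoder's divide step: multiplying the state back by the interval length b_s and adding the offset r_s restores the pre-division value. A sketch with hypothetical names:

```python
def absorb(x, d, c_s, b_s):
    r_s = d - c_s         # offset of the code within its sub-probability interval
    return x * b_s + r_s  # updated coding state parameter
```

For example, with b_s = 64, c_s = 128, a state of 15 and a code of 168, the state becomes 15 * 64 + 40 = 1000.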
18. The method of any of claims 15-17, wherein after updating the encoding state parameters based on the base parameters of the target character, the method further comprises:
and when the coding state parameter is greater than or equal to a second threshold value, performing right shift processing on the coding state parameter to update the coding state parameter.
19. The method of claim 18, wherein the second threshold is equal to 2^T, wherein T is the upper limit of the number of binary bits of the encoding state parameter.
20. The method of claim 18 or 19, wherein right shifting the encoding-state parameters to update the encoding-state parameters comprises:
and taking the binary bit right shift v x n bits of the coding state parameter as the updated coding state parameter, wherein v is a positive integer.
21. The method of any of claims 18-20, wherein prior to updating the encoding-state parameters by right shifting the encoding-state parameters, the method further comprises:
the coding state parameter is 2 v*n -1 performing a binary bit and operation to obtain a second adjustment sequence of v x n bits;
splitting the second adjustment sequence into v second sequences, each second sequence being an n-bit sequence;
And storing the v second sequences into the coding queue.
22. The method of claim 21, wherein saving the v second sequences into the encoding queue comprises:
and storing the v second sequences to the front end of the coding queue.
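For illustration, claims 20-22 can be read as one right-shift renormalization step: the low v*n bits of the state are split into v n-bit second sequences and returned to the front of the coding queue. A sketch with hypothetical names:

```python
def shift_out(x, queue, v, n):
    adjustment = x & ((1 << (v * n)) - 1)      # AND with 2^(v*n) - 1 (claim 21)
    sequences = [(adjustment >> (n * i)) & ((1 << n) - 1)
                 for i in reversed(range(v))]  # split into v n-bit sequences
    queue[:0] = sequences                      # store at the queue front (claim 22)
    return x >> (v * n)                        # right-shift the state by v*n bits
```

For example, with x = 0x1ABCD, v = 2 and n = 8, the sequences 0xAB and 0xCD are stored at the front of the queue and the state becomes 1.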
23. The method according to any one of claims 12-22, wherein decoding the sequence to be decoded according to the sub-probability interval corresponding to each character to obtain the target character comprises:
selecting a sub-probability interval in which the value of the sequence to be decoded is located from the sub-probability intervals corresponding to each character; and determining the target character as the character corresponding to the selected sub-probability interval.
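The selection step above is a range lookup; a linear-scan sketch with hypothetical names (a practical decoder might use a lookup table or binary search instead):

```python
def find_character(d, starts, lengths):
    for ch in starts:   # select the sub-probability interval containing d
        if starts[ch] <= d < starts[ch] + lengths[ch]:
            return ch
    raise ValueError("value lies outside the probability interval")
```

For example, with sub-intervals [0, 128) for 'a' and [128, 256) for 'b', the value 150 selects 'b'.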
24. The method of any one of claims 12-23, wherein the method further comprises:
restoring the target data according to the target character; the initial value of the coding state parameter is equal to the value of the coding state parameter after coding all characters in the target data, and the coding queue is obtained after coding all characters in the target data.
25. A computing device, comprising: an acquisition unit and a coding unit;
The acquisition unit is used for acquiring target data to be coded and a sub-probability interval corresponding to each character in the target data in a set probability interval;
the coding unit is used for carrying out the following coding processing on the character to be coded in the target data:
determining a section starting value and a section length of a sub-probability section corresponding to the character to be coded;
generating a coding sequence according to the interval starting value, the interval length and the coding state parameter; the value of the coding sequence is positioned in a sub-probability interval corresponding to the character to be coded, and the coding state parameter is predetermined;
storing the coding sequence into a coding queue;
and updating the coding state parameters according to the interval length.
26. A computing device, comprising: an acquisition unit and a decoding unit;
the acquisition unit is used for acquiring a sub-probability interval corresponding to each character in the target data in the set probability interval;
the decoding unit is used for decoding the sequence to be decoded in the coding queue according to the sub-probability interval corresponding to each character to obtain target characters; the sequence to be decoded is at least one coding sequence in the coding queue, the target character comprises at least one character, and the at least one character contained in the target character corresponds to the at least one coding sequence one by one.
27. A computing device, comprising:
a memory for storing program instructions and data;
a processor for reading program instructions and data in said memory, implementing the method of any of claims 1-24.
28. A computer readable storage medium, characterized in that the computer readable storage medium has stored therein a computer program which, when run on a computer, causes the computer to perform the method of any of claims 1-24.
29. A computer program product comprising instructions which, when executed by a computer, cause the computer to perform the method of any of claims 1-24.
30. A chip, characterized in that the chip is coupled to a memory, the chip reading a computer program stored in the memory, performing the method of any of claims 1-24.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202210285796.4A CN116827348A (en) | 2022-03-22 | 2022-03-22 | Data encoding and decoding method and device |
Publications (1)
Publication Number | Publication Date |
---|---|
CN116827348A true CN116827348A (en) | 2023-09-29 |
Family
ID=88111438
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202210285796.4A Pending CN116827348A (en) | 2022-03-22 | 2022-03-22 | Data encoding and decoding method and device |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN116827348A (en) |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||