WO2024106556A1

WO2024106556A1 - Method and device for floating-point data compression

Info

Publication number: WO2024106556A1
Application number: PCT/KR2022/018017
Authority: WO
Inventors: 한정호
Original assignee: 주식회사 사피온코리아
Priority date: 2022-11-14
Filing date: 2022-11-15
Publication date: 2024-05-23
Also published as: KR20240070783A

Abstract

A method and a device for floating-point data compression are disclosed. According to an aspect of the present invention, a data compression device is provided, the device comprising: at least one memory for storing instructions; and at least one processor for, by executing the instructions, generating compressed data elements from a data block including data elements encoded according to a floating-point format.

Description

Method and apparatus for floating point data compression

Embodiments of the present invention relate to a method and apparatus for floating point data compression.

The content described below simply provides background information related to this embodiment and does not constitute prior art.

Recently, as deep learning algorithms develop, deep learning algorithms are being applied to various technical fields. As an example, neural networks trained according to deep learning algorithms provide solutions to various problems. In particular, in the image processing field, convolutional neural networks (CNNs) are mainly used.

Since image processing using convolutional neural networks requires a vast amount of data calculation, hardware accelerators or artificial intelligence (AI) accelerators for efficient data calculation are being developed. AI accelerators include hardware components or engines optimized for deep learning algorithms and have faster neural network calculation speeds than general computing devices.

However, due to limitations in internal memory, it is difficult for AI accelerators to store high-quality images or many feature maps based on neural network calculations in internal memory or on-chip memory.

Accordingly, the AI accelerator divides the input images or feature maps into a plurality of small tiles and stores the divided tiles in external memory or off-chip memory. Afterwards, the AI accelerator copies some of the necessary tiles among the stored tiles to internal memory or on-chip memory and performs image processing using the copied tiles.

Nevertheless, the performance of the AI accelerator may deteriorate because the AI accelerator moves large amounts of data to and from external memory. For example, a large amount of power is consumed when an AI accelerator writes data to an external memory and reads data from an external memory. Specifically, the power required to move data between an AI accelerator and external memory can be hundreds of times greater than the power required to move data within an AI accelerator. Furthermore, due to bandwidth limitations between the AI accelerator and external memory, not only does it take a lot of time for the AI accelerator to process data, but the performance of the AI accelerator may also be limited.

Therefore, methods are needed to minimize access to the external memory of the AI accelerator.

Previously, methods to reduce the number of times an AI accelerator accesses external memory were mainly studied. As an example, there is a method of setting the operation order so that data copied to the internal memory of the AI accelerator can be used again. As another example, when the calculation for the current layer is completed using data copied to the internal memory of the AI accelerator, there is a method of using the calculation result in the next layer rather than moving the calculation result to external memory.

However, the method of reducing the number of times an AI accelerator accesses external memory has limitations in solving power consumption, time consumption, and performance limitations due to data movement between the AI accelerator and external memory.

The main purpose of embodiments of the present invention is to provide a data compression device and method that reduce the power consumption and time required for data access by compressing data transferred between an AI accelerator and an external memory, and that alleviate the performance limitations of the AI accelerator caused by memory bandwidth limitations.

The problems to be solved by the present invention are not limited to the problems mentioned above, and other problems not mentioned can be clearly understood by those skilled in the art from the description below.

According to one aspect of the present invention, there is provided a data compression device comprising: at least one memory storing instructions; and at least one processor that, by executing the instructions, generates compressed data elements from a data block containing data elements encoded according to a floating point format. For a first data element whose exponent value matches one of preset exponent values among the data elements, the processor generates a first compressed data element containing an operation code corresponding to the exponent value of the first data element, and the sign bit and mantissa bits of the first data element. For a second data element whose exponent value does not match the preset exponent values and which is not a zero value, the processor generates a second compressed data element including a skip code and the second data element.

According to another aspect of the present embodiment, a computer-implemented method for data compression is provided. The method includes generating compressed data elements from a data block containing data elements encoded according to a floating point format. The step of generating the compressed data elements includes, for a first data element whose exponent value matches one of preset exponent values among the data elements, generating a first compressed data element including an operation code corresponding to the exponent value of the first data element, and the sign bit and mantissa bits of the first data element. The step of generating the compressed data elements also includes, for a second data element whose exponent value does not match the preset exponent values and which is not a zero value, generating a second compressed data element including a skip code and the second data element.

As described above, according to an embodiment of the present invention, compressing data between the AI accelerator and external memory can reduce the power consumption and time required for data access, and can alleviate the performance limitations of the AI accelerator caused by memory bandwidth limitations.

The effects of the present disclosure are not limited to the effects mentioned above, and other effects not mentioned may be clearly understood by those skilled in the art from the description below.

Figure 1 is a configuration diagram of an AI acceleration system according to an embodiment of the present invention.

Figure 2 is an example diagram to explain the half-precision floating point format.

Figures 3A and 3B are diagrams illustrating a compressed data format according to an embodiment of the present invention.

Figure 4 is a flowchart of a data compression method according to an embodiment of the present invention.

Figure 5 is a diagram showing examples of data elements.

Figures 6a and 6b are diagrams showing a data compression method according to an embodiment of the present invention.

Figures 7a and 7b are diagrams showing a data compression method according to an embodiment of the present invention.

Figures 8a and 8b are diagrams showing a data compression method according to an embodiment of the present invention.

Figure 9 is a flowchart of a bit stream generation method according to an embodiment of the present invention.

Figure 10 is a diagram for explaining compression of input values of the softmax function according to an embodiment of the present invention.

Hereinafter, some embodiments of the present disclosure will be described in detail using exemplary drawings. When adding reference signs to components in each drawing, it should be noted that the same components are given the same reference numerals as much as possible even if they are shown in different drawings. Additionally, in describing the present disclosure, if it is determined that a detailed description of a related known configuration or function may obscure the gist of the present disclosure, the detailed description will be omitted.

In describing the components of the embodiment according to the present disclosure, symbols such as first, second, i), ii), a), and b) may be used. These symbols are only used to distinguish a component from other components, and the nature, sequence, or order of the component is not limited by the symbol. In the specification, when a part is said to 'include' or 'have' a certain component, this means that it does not exclude other components, but may further include other components, unless explicitly stated to the contrary.

Each component of the device or method according to the present invention may be implemented as hardware or software, or may be implemented as a combination of hardware and software. Additionally, the function of each component may be implemented as software and a microprocessor may be implemented to execute the function of the software corresponding to each component.

Below, the technical problems of neural network operation and solutions for the data compression device according to an embodiment of the present invention will be explained through several drawings.

Types of deep neural networks used in deep learning algorithms include Convolutional Neural Network (CNN), Recurrent Neural Network (RNN), or Transformer Neural Network.

Among deep neural networks, operations on a convolutional neural network include convolutional operations in each of a plurality of convolutional layers, so a deep learning algorithm based on a convolutional neural network requires a hardware device that can process a large amount of calculations.

In addition, since operations on recurrent neural networks and transformer neural networks among deep neural networks include large-scale matrix multiplication and matrix addition, recurrent neural network-based deep learning algorithms require hardware devices that can handle a large amount of calculations.

The AI acceleration system according to an embodiment of the present invention can accelerate neural network calculations using an AI accelerator including calculation units optimized for neural network calculations.

Referring to FIG. 1, the AI acceleration system includes an AI accelerator 110 and an external memory 120.

Because the amount of feature data in the neural network calculation process is enormous, it is difficult for the AI accelerator 110 to store the feature data of all layers in the internal memory.

Accordingly, the external memory 120 stores data necessary for neural network calculation, and the AI accelerator 110 loads data stored in the external memory 120 and processes the retrieved data to perform neural network calculation.

Below, each component of the AI accelerator system is described.

The external memory 120 stores image data, feature data, weights, and biases as input data or output data of neural network operations. When calculating a convolutional neural network, the external memory 120 stores a plurality of tiles divided from an input image or feature maps.

The external memory 120 has a larger storage capacity than the data memory 116 within the AI accelerator 110. The external memory 120 may be implemented as Dynamic Random Access Memory (DRAM).

External memory 120 may be referred to as off-chip memory.

The AI accelerator 110 is a hardware device optimized for neural network calculations and performs neural network calculations using data stored in the external memory 120. The AI accelerator 110 stores the processed data in the external memory 120. When calculating a convolutional neural network, the AI accelerator 110 may perform the neural network calculation on some of the tiles stored in the external memory 120.

The AI accelerator 110 includes a processor 111, a computation unit 112, a direct memory access (DMA) unit 113, a compression/decompression unit 114, a program memory 115, and a data memory 116.

The processor 111 controls the overall operations of the AI accelerator 110 by executing program instructions stored in the program memory 115.

The data memory 116 stores data necessary for controlling the processor 111 and data read from the external memory 120. As an example, data memory 116 may store feature data or weights of a neural network.

Program memory 115 and data memory 116 may be referred to as internal memory or on-chip memory.

The computation unit 112 performs neural network operations such as convolution operations.

The computation unit 112 is configured to be optimized for neural network calculations and has a fast processing speed for neural network calculations.

The DMA 113 supports data transfer between the data memory 116 and the external memory 120.

The compression/decompression unit 114 supports data compression or data decompression to reduce the amount of data transfer between the data memory 116 and the external memory 120.

The compression/decompression unit 114 may be controlled according to program execution of the processor 111 or may include at least one processor and at least one memory that performs a data compression process.

According to one embodiment of the present invention, the compression/decompression unit 114 can reduce the power and time required for data transmission by reducing the bit length of data between the data memory 116 and the external memory 120.

Specifically, the neural network training device trains the parameters of the neural network in conjunction with the AI accelerator 110. The parameters of the neural network are primarily trained according to the floating point format for the accuracy of the neural network, and secondarily trained according to the fixed point format for the computational speed of the neural network. As an alternative, the parameters of the neural network can be secondarily trained according to the integer format.

Specifically, because the floating point format can represent a larger range of numbers than the fixed point format or integer type, the parameters of a neural network trained according to the floating point format can improve the accuracy of the neural network.

However, since data encoded in floating point format has a long bit length, there is a problem in that power consumption and time required to access data between the AI accelerator 110 and the external memory 120 are large.

To solve this problem, the compression/decompression unit 114 may use at least one of two compression methods: a method of encoding data elements encoded according to a floating point format according to a compressed data format, and a method of compressing data elements to be input to the softmax function.

Specifically, the compression/decompression unit 114 can reduce the bit length of a data element by converting the data between the data memory 116 and the external memory 120 from a floating point format to a data format according to an embodiment of the present invention. In particular, the bit length of the exponent value of a data element encoded according to the floating point format is reduced, or the total bit length of a data element with a value of zero is reduced.

Furthermore, taking into account the characteristics of the softmax function used in the softmax layer of the neural network, the compression/decompression unit 114 stores in the external memory 120 only the upper mantissa bits among the mantissa bits of a data element, within a range that has no effect or only a slight effect on the result value of the softmax function.

According to one embodiment of the present invention, a data compression device for compressing data may correspond to the compression/decompression unit 114. In another embodiment, the data compression device may include a processor 111, a program memory 115, a data memory 116, and a compression/decompression unit 114.

Floating point formats are expressed in binary for machine operations.

Depending on the precision, floating point formats can be half-precision, single-precision, or double-precision. Half-precision floating point numbers are expressed with 16 bits, single-precision floating point numbers are expressed with 32 bits, and double-precision floating point numbers are expressed with 64 bits. The floating point format includes a sign field, an exponent field, and a mantissa field.

Referring to Figure 2, bit configurations and categories for a half-precision floating point format are shown.

The floating point format is based on the floating point format disclosed in the IEEE 754 standard document. The half-precision floating point format includes a 1-bit sign field, a 5-bit exponent field, and a 10-bit mantissa field. In other embodiments, the number of bits in each field may vary.

Depending on the values of the exponent and mantissa, the half-precision floating point format can indicate a normal number, a subnormal number, zero (+/- zero), NaN (Not a Number), or infinity (+/- infinity).

Normal values are values that can be expressed in the form (-1)^S × 2^(E-15) × (1.M)₂, where S is the sign bit, E is the exponent value, and M is the mantissa bits. Because the exponent field does not have a sign bit, a bias of -15 is used to represent negative exponents. For example, the exponent bits 10000 represent 16 in decimal representation, but actually represent an exponent value of 1 due to the -15 bias. Meanwhile, the mantissa bits have an implied leading 1. To interpret a normal value, the binary value (1.M)₂ is shifted to the left by (E-15) bit positions and the sign is applied, which yields the value of the binary representation.

Subnormal values are values expressed in the form (-1)^S × 2^-14 × (0.M)₂. Subnormal values are intended to represent small numbers that cannot be represented by normal values. The mantissa bits of subnormal values do not have an implied leading 1. Numbers close to 0 can be expressed through subnormal values.

A zero value represents a value in which both the exponent bits and mantissa bits are zero. The sign bit can be 0 or 1.

Infinity represents a value outside the range represented by the normal value.
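The categories above can be illustrated with a minimal Python sketch that decodes a 16-bit pattern using the standard 1/5/10 half-precision layout. The function names `decode_half` and `reference_decode` are hypothetical and are not part of the disclosure; the second function only cross-checks the result against Python's standard `struct` codec.

```python
import math
import struct


def decode_half(bits: int) -> float:
    """Decode a 16-bit IEEE 754 half-precision pattern (1/5/10 layout)."""
    sign = (bits >> 15) & 0x1        # S: 1 bit
    exponent = (bits >> 10) & 0x1F   # E: 5 bits, stored with a bias of 15
    mantissa = bits & 0x3FF          # M: 10 bits
    factor = -1.0 if sign else 1.0

    if exponent == 0:
        # Zero (M == 0) or subnormal: (-1)^S x 2^-14 x (0.M)_2
        return factor * 2.0 ** -14 * (mantissa / 1024.0)
    if exponent == 0x1F:
        # All-ones exponent: infinity (M == 0) or NaN (M != 0)
        return factor * math.inf if mantissa == 0 else math.nan
    # Normal value: (-1)^S x 2^(E-15) x (1.M)_2
    return factor * 2.0 ** (exponent - 15) * (1.0 + mantissa / 1024.0)


def reference_decode(bits: int) -> float:
    """Cross-check using the standard library's half-precision codec."""
    return struct.unpack("<e", struct.pack("<H", bits))[0]
```

For instance, the pattern 0x4000 has exponent bits 10000 and a zero mantissa, so it decodes to 2^(16-15) × 1.0, in line with the exponent example given above.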

Below, of the two compression methods, a method of encoding data elements encoded according to a floating point format according to a data compression format will be described.

Meanwhile, hereinafter, the half-precision floating point format will be described as including a 1-bit sign field, a 6-bit exponent field, and a 9-bit mantissa field.

Feature values generated during neural network training tend to be more similar the closer they are to each other on a feature map. For example, within an image tile, pixels associated with a particular object are likely to have similar values. In floating point format, adjacent feature values are therefore likely to have similar exponent values. That is, when the feature values of a neural network are encoded into data elements according to a floating point format, the exponent values of the data elements have locality.
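As a small illustration of exponent locality, the following sketch extracts the exponent field of a few nearby values. It assumes the standard 5-bit-exponent half-precision layout supported by Python's `struct` module rather than the 6-bit layout used in the description above; the values in `neighbours` and the helper name `half_exponent_field` are hypothetical.

```python
import struct


def half_exponent_field(value: float) -> int:
    """Return the stored (biased) 5-bit exponent of a value in IEEE half precision."""
    bits = struct.unpack("<H", struct.pack("<e", value))[0]
    return (bits >> 10) & 0x1F


# Hypothetical neighbouring feature values of similar magnitude.
neighbours = [0.52, 0.61, 0.75, 0.98]
exponents = [half_exponent_field(v) for v in neighbours]
```

All four values lie in the interval [0.5, 1), so they share one exponent field even though their mantissas differ, which is the redundancy a shared reference exponent can exploit.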

The compressed data format according to an embodiment of the present invention is a format that can compress the bits of data elements using the locality of the exponent values of the data elements.

The data compression device sets a reference exponent value for data elements with locality, and compresses the data elements according to whether their exponent values are included in the exponent range according to the reference exponent value or whether the data elements have zero values.

Referring to Figure 3A, a compressed data format according to one embodiment of the present invention is shown.

The compressed data format includes a data item and an operation code (OP code) for compressing the exponent bits of the data element.

A data item includes the sign bit and mantissa bits of the data element. The data item may contain additional exponent bits as needed, or may contain no bits.

The command code may indicate a characteristic of the data element or the zero bits of the data element. Indication according to the command code can be defined in advance.

Referring further to FIG. 3B, a table in which instructions according to command codes are recorded is shown.

The command code, skip code, and zero code are shorter than the exponent bit length according to the floating point format. To compress the 5-bit exponent of the half-precision floating point format, the command code can be 3 bits long.

In Figure 3b, E_in represents the exponent value of the input data element, and E_ref represents the reference exponent value. The exponent range according to the reference exponent value includes E_ref, E_ref+1, E_ref+2, E_ref-1, and E_ref-2.

The command code of 000 bits indicates that the exponent value of the input data element is the same as the reference exponent value.

Command codes of 001 bits to 100 bits indicate that the exponent value of the input data element is included in the exponent range according to the reference exponent value. In Figure 3b, the range of exponent values is evenly distributed around the reference exponent value. On the other hand, in other embodiments, the exponent range may not be symmetrical with respect to the reference exponent value. For example, the reference exponent value may be the smallest value within the exponent range. As another example, the reference exponent value may be the largest value.

A command code of 101 bits indicates that the input data element is a zero value. In another embodiment, a command code representing an exponent range may be used instead of a command code representing a zero value. In that case, when the amount of data with zero values in the layers of a neural network is small, the compression rate can be increased.

The 110-bit command code indicates that the exponent value of the input data element is outside the exponent range according to the reference exponent value.

The 111-bit command code indicates a command that sets the exponent value of the input data element as the reference exponent value.
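The code assignments described above can be sketched as a small selection function. This is only an illustration under stated assumptions: the text fixes 000 (equal to E_ref), 101 (zero), 110 (out of range), and 111 (set reference), but the mapping of 001 to 100 onto the offsets +1, +2, -1, and -2 is assumed here to follow the order in which the range is listed. The names `select_opcode` and `RANGE_CODES` are hypothetical, and the sketch always emits a skip code for an out-of-range exponent instead of resetting the reference.

```python
from typing import Optional

SET_CODE = 0b111   # set the input exponent as the reference exponent
SKIP_CODE = 0b110  # exponent lies outside the exponent range
ZERO_CODE = 0b101  # input data element is a zero value

# Assumed assignment of the range codes (offset from E_ref -> 3-bit code).
RANGE_CODES = {0: 0b000, +1: 0b001, +2: 0b010, -1: 0b011, -2: 0b100}


def select_opcode(e_in: int, mantissa: int, e_ref: Optional[int]) -> int:
    """Pick the 3-bit command code for one data element."""
    if e_in == 0 and mantissa == 0:
        return ZERO_CODE              # zero check is done first
    if e_ref is None:
        return SET_CODE               # no reference exponent exists yet
    return RANGE_CODES.get(e_in - e_ref, SKIP_CODE)
```

With a reference exponent of 48, an input exponent of 49 yields 001 and an input exponent of 43 yields the skip code 110.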

The data compression device encodes data elements encoded according to the floating point format according to one of the compressed data formats shown in FIG. 3A.

A data compression device generates compressed data elements from a data block containing data elements encoded according to a floating point format.

Referring to FIG. 4, the data compression device receives a first data element as an initial data element.

When a preset reference exponent value does not exist, the data compression device sets the exponent value of the input first data element as the reference exponent value (S410).

Furthermore, the data compression device sets a plurality of exponent values including the exponent value of the first input data element. The command code corresponding to each of the set exponent values is also set.

The data compression device generates a first compressed data element by combining the first data element with a setting code indicating the setting of the reference exponent value (S420).

With further reference to FIG. 3B, the first encoded data element may include the 111 bits and all bits of the first data element.

Thereafter, the data compression device generates a second compressed data element depending on whether the exponent value of the input second data element is included in the exponent range according to the reference exponent value (S430).

Here, the exponent range according to the reference exponent value includes a plurality of exponent values including the reference exponent value. Referring further to FIG. 3B, whether the exponent value of the second data element is included in the exponent range according to the reference exponent value may be determined depending on whether the exponent value of the second data element matches one of the preset exponent values.

When the exponent value of the second data element matches one of the preset exponent values, the data compression device encodes the second data element to produce a second compressed data element containing a command code corresponding to the exponent value of the second data element, the sign bit of the second data element, and the mantissa bits of the second data element.

On the other hand, if the exponent value of the second data element does not match any of the preset exponent values, the data compression device may skip data compression or set a new reference exponent value.

Specifically, when the exponent value of the second data element does not match any one of the preset exponent values, the data compression device may encode the second data element to include a skip code indicating compression skip and all bits of the second data element. If data elements input after the second data element have locality with the first data element rather than the second data element, the compression rate can be increased.

Otherwise, the data compression device may set the exponent value of the second data element as the reference exponent value and encode the second data element to include a setting code indicating the setting of the reference exponent value and all bits of the second data element. If data elements input after the second data element have locality with the second data element, the compression rate can be increased.

Meanwhile, according to an embodiment of the present invention, when the input second data element is a zero value, the data compression device may encode the second data element to include a zero code indicating the zero value. Determining whether the second data element is a zero value may take precedence over determining whether the second data element is included in the exponent range. In this case, the second encoded data element is compressed to a length of 3 bits. When a lot of data with a value of 0 is generated by the ReLU (Rectified Linear Unit) function in the layers of the neural network, the compression rate can increase.

According to one embodiment of the present invention, the data compression device may apply at least one of a skip code or a zero code to each layer of the neural network. In a layer with many feature values having a value of 0, only zero codes may be used, or zero codes and skip codes may be used. On the other hand, in a layer with few feature values whose value is 0, only skip codes may be used. To this end, the data compression device can predict the amount of feature values with a value of 0 according to the operation to be processed in each layer of the neural network, and can determine whether to use a zero code based on that amount.

According to another embodiment of the present invention, the data compression apparatus may analyze the distribution of data elements based on the values of the data elements, and apply at least one of a skip code or a zero code based on the distribution of the data elements. If there are many data elements with a value of 0, only the zero code may be used, or the zero code and skip code may be used. On the other hand, when there are few data elements with a value of 0, only a skip code can be used.

As described above, the data compression device can reduce the total bit length of the data elements by encoding data elements according to the floating point format according to the compressed data format. Furthermore, no bit loss occurs.
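The encoding steps above can be sketched end to end as follows. This is a minimal illustration, not the disclosed implementation: it assumes the 1/6/9 sign/exponent/mantissa layout used in this description, represents the bit stream as a string of '0' and '1' characters for readability, assumes the same mapping of range codes to offsets as listed for FIG. 3B, always uses a skip code (never a reset) for out-of-range exponents, and applies the zero code only to positive zero so that the round trip stays bit-exact. All names, and every element of `example_block` other than the first, are hypothetical.

```python
from typing import List, Optional, Tuple

EXP_BITS, MAN_BITS = 6, 9
SET_CODE, SKIP_CODE, ZERO_CODE = "111", "110", "101"
# Assumed mapping of exponent offsets to 3-bit range codes.
RANGE_CODES = {0: "000", 1: "001", 2: "010", -1: "011", -2: "100"}
RANGE_DELTAS = {code: delta for delta, code in RANGE_CODES.items()}


def pack(sign: int, exponent: int, mantissa: int) -> int:
    """Build a 16-bit element from its three fields."""
    return (sign << (EXP_BITS + MAN_BITS)) | (exponent << MAN_BITS) | mantissa


def split(element: int) -> Tuple[int, int, int]:
    """Split a 16-bit element into (sign, exponent, mantissa)."""
    sign = element >> (EXP_BITS + MAN_BITS)
    exponent = (element >> MAN_BITS) & ((1 << EXP_BITS) - 1)
    mantissa = element & ((1 << MAN_BITS) - 1)
    return sign, exponent, mantissa


def compress(block: List[int]) -> str:
    out = []
    e_ref: Optional[int] = None
    for element in block:
        sign, exponent, mantissa = split(element)
        if element == 0:                       # zero check comes first
            out.append(ZERO_CODE)
        elif e_ref is None:                    # no reference yet: set it
            e_ref = exponent
            out.append(SET_CODE + format(element, "016b"))
        elif exponent - e_ref in RANGE_CODES:  # drop the exponent bits
            code = RANGE_CODES[exponent - e_ref]
            out.append(code + str(sign) + format(mantissa, "09b"))
        else:                                  # out of range: keep all bits
            out.append(SKIP_CODE + format(element, "016b"))
    return "".join(out)


def decompress(stream: str) -> List[int]:
    block, e_ref, pos = [], None, 0
    while pos < len(stream):
        code, pos = stream[pos:pos + 3], pos + 3
        if code == ZERO_CODE:
            block.append(0)
        elif code in (SET_CODE, SKIP_CODE):
            element, pos = int(stream[pos:pos + 16], 2), pos + 16
            if code == SET_CODE:
                e_ref = split(element)[1]
            block.append(element)
        else:
            sign = int(stream[pos])
            mantissa = int(stream[pos + 1:pos + 10], 2)
            pos += 10
            block.append(pack(sign, e_ref + RANGE_DELTAS[code], mantissa))
    return block


# First element is (0, 11_0000, 0_1111_1111); the rest are hypothetical.
example_block = [pack(0, 48, 0b011111111), pack(0, 49, 16), 0,
                 pack(0, 43, 427), pack(1, 47, 3)]
```

In this example the five elements occupy 19 + 13 + 3 + 19 + 13 = 67 bits instead of 80, and `decompress(compress(example_block))` returns the original block, reflecting the statement that no bit loss occurs.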

The data compression device stores the first compressed data element and the second compressed data element in an external memory (S450).

A data compression device can encode data elements in predetermined units and store the encoded data elements in an external memory. As an alternative, a data compression device can encode and then store data elements element by element.

According to one embodiment of the present invention, a data compression device can reduce the number of bits of data elements transmitted for storage in an external memory by ignoring some of the mantissa bits of the compressed data elements. A detailed explanation will be provided later.

Figure 5 is a diagram showing examples of data elements.

Referring to Figure 5, 14 data elements are shown. Data elements may be grouped into predetermined units to form one data block. Each data element includes a 1-bit sign field, a 6-bit exponent field, and a 9-bit mantissa field.

In Figure 5, most of the data elements have values of similar sizes, and some data elements have zero values or values that are significantly different from other data elements.

Below, various embodiments of compressing the data elements illustrated in FIG. 5 will be described.

Referring to FIG. 6A, a table in which instructions according to command codes are recorded is shown. The table includes command codes for setting reference exponent values and command codes for exponent ranges, but does not include command codes for compression skip and command codes for zero values.

Meanwhile, as one method of determining whether the exponent value of a data element is included in the exponent range according to a preset reference exponent value, the data compression device can determine whether the difference between the exponent value of the data element and the reference exponent value falls within a preset range. As another method, the data compression device may set values adjacent to the reference exponent value and determine whether the exponent value of the data element matches one of the set exponent values. However, the explanation below will be based on the former method.

The exponent range has a value that is 3 greater than the reference exponent value as the upper limit, and a value that is 3 less than the reference exponent value as the lower limit.
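The two membership tests described above give the same answer for a contiguous range, as the following sketch suggests. The function names and the `lower`/`upper` parameters are hypothetical; the defaults reflect the limits of 3 below and 3 above the reference stated for FIG. 6A.

```python
def in_range_by_difference(e_in: int, e_ref: int,
                           lower: int = 3, upper: int = 3) -> bool:
    """Former method: compare the exponent difference with a preset range."""
    return -lower <= e_in - e_ref <= upper


def in_range_by_match(e_in: int, e_ref: int,
                      lower: int = 3, upper: int = 3) -> bool:
    """Latter method: match against exponent values set around the reference."""
    preset = {e_ref + delta for delta in range(-lower, upper + 1)}
    return e_in in preset
```

Keeping the lower and upper limits as separate parameters also covers ranges that are not symmetrical about the reference exponent value.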

Referring to FIG. 6B, the 0th data element having (0, 11_0000, 0_1111_1111) bits is first input to the data compression device. Since there is no preset reference exponent value, the difference value between the exponent value of the 0th data element and the reference exponent value also does not exist.

The exponent value of the 0th data element is set as the reference exponent value by the data compression device. The zeroth data element is encoded to include all bits of the zeroth data element and 111 bits indicating the setting of the reference exponent value. The number of bits of the zeroth encoded data element is 19.

Since the exponent value of the first data element is greater than the reference exponent value by 1, it is included in the exponent range according to the reference exponent value. Accordingly, the first data element is encoded to include 001 bits representing a value that is one greater than the reference exponent value, a sign bit of the first data element, and mantissa bits of the first data element. The number of bits of the first encoded data element is 13.

The second data element is encoded through a similar process as the first data element. The number of bits of the second encoded data element is 13.

Since the exponent value of the third data element is 48 smaller than the reference exponent value, the difference between the exponent value of the third data element and the reference exponent value exceeds the preset range of 3. Accordingly, the exponent value of the third data element is set as a new reference exponent value by the data compression device. The third data element is encoded to include all bits of the third data element and 111 bits indicating the setting of the reference exponent value.

Afterwards, since the exponent value of the fifth data element is greater than the reset reference exponent value by 43, the difference value between the exponent value of the fifth data element and the reset reference exponent value exceeds the preset range of 3. Accordingly, the exponent value of the fifth data element is set as a new reference exponent value by the data compression device.

Through the above-described process, the data compression device converts each data element encoded into 16 bits into 13 bits or 19 bits depending on the compressed data format. According to the results of converting 14 data elements, the data compression device can reduce the total number of bits of data elements from 224 to 218.

However, even though the 5th to 10th data elements have exponent values similar in size to those of the 0th, 1st, and 2nd data elements, the presence of the 3rd data element with a zero value makes it necessary to set the reference exponent value two more times.
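The reference-reset behavior described for FIGS. 6A and 6B can be sketched as follows. This is a minimal illustration, not the claimed implementation: the function name `encode_lengths_reset` and its parameters are hypothetical, the 1/6/9-bit sign/exponent/mantissa layout and 3-bit command codes follow the example above, and the exponent stream in the usage lines is made up for illustration.

```python
def encode_lengths_reset(exponents, half_range=3, total_bits=16,
                         exp_bits=6, code_bits=3):
    """Return (action, bit_length) for each exponent in the stream.

    Hypothetical helper: an exponent within +/- half_range of the current
    reference is stored as code + sign + mantissa; any other exponent
    becomes the new reference and the element is stored as code + all bits.
    """
    reference = None  # no reference exponent exists before the first element
    result = []
    for e in exponents:
        in_range = reference is not None and abs(e - reference) <= half_range
        if in_range:
            # The 3-bit range code replaces the exponent field.
            result.append(("range", code_bits + total_bits - exp_bits))
        else:
            reference = e  # set (or reset) the reference exponent
            result.append(("set_ref", code_bits + total_bits))
    return result


# Illustrative stream: a reference, two neighbours, a zero exponent, then a jump.
lengths = encode_lengths_reset([48, 49, 49, 0, 0, 43])
total = sum(n for _, n in lengths)
```

With this stream the reference is set three times, which mirrors the drawback noted above: one outlier forces later elements to pay for additional 19-bit encodings.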

Referring to FIG. 7A, a table in which instructions according to command codes are recorded is shown. The table includes command codes for setting reference exponent values, command codes for exponent ranges, and command codes for compression skip, and does not include command codes for zero values.

The command code for compression skip is encoded along with all bits of the data element when the exponent value of the data element is outside the exponent range according to the reference exponent value. In this case, the reference exponent value is not reset. Once set, the reference exponent value can be used within one or more data blocks. The command code for compression skip may be referred to as a skip code.

Meanwhile, in FIG. 7A, the exponent range has a value 2 greater than the reference exponent value as its upper limit, and a value 3 less than the reference exponent value as its lower limit.

Referring to FIG. 7B, the 0th data element, the first data element, and the second data element are encoded in the same way as described with reference to FIGS. 6A and 6B.

The exponent value of the third data element is smaller than the reference exponent value by 48, so it is outside the exponent range according to the reference exponent value. Here, the exponent value of the third data element is not set as a new reference exponent value.

The third data element is encoded to include all bits of the third data element and the skip code 110 representing compression skip. The number of bits of the third encoded data element is 19.

In summary, when a data element with an exponent value exceeding the exponent range according to the reference exponent value is input, the data compression device does not reset the reference exponent value but omits data compression.

The more data elements with similar sizes in a data block, the higher the compression rate. According to the results of converting the 14 data elements, the data compression device can reduce the number of bits of the data elements from 224 to 212.
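The skip-code variant can be sketched in the same spirit. The function name `encode_lengths_skip`, its parameters, and the sample exponent stream are assumptions for illustration; the asymmetric range (2 above, 3 below the reference) and the bit widths follow the description of FIG. 7A.

```python
def encode_lengths_skip(exponents, lower=-3, upper=2, total_bits=16,
                        exp_bits=6, code_bits=3):
    """Return (action, bit_length) per exponent when a skip code is available.

    Hypothetical helper: the reference exponent is set once; an out-of-range
    element is stored uncompressed behind a skip code and the reference is
    left unchanged.
    """
    reference = None
    result = []
    for e in exponents:
        if reference is None:
            reference = e
            result.append(("set_ref", code_bits + total_bits))
        elif lower <= e - reference <= upper:
            result.append(("range", code_bits + total_bits - exp_bits))
        else:
            # Reference stays as it is; only this element is left uncompressed.
            result.append(("skip", code_bits + total_bits))
    return result


# Illustrative stream: one outlier (exponent 0) between similar exponents.
lengths = encode_lengths_skip([48, 49, 49, 0, 47, 50])
total = sum(n for _, n in lengths)
```

Because the outlier no longer resets the reference, the elements after it are still encoded with 13-bit range codes.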

Referring to FIG. 8A, a table in which instructions according to command codes are recorded is shown. The table includes command codes for setting reference exponent values, command codes for exponent ranges, command codes for compression skip, and command codes for zero values.

The command code for the zero value is a code that indicates that all bits of the data element are 0. Data elements are encoded according to the command code for the zero value. The encoded data element consists of bits according to the command code. A command code related to a zero value may be referred to as a zero code.

Referring to FIG. 8B, the 0th data element, the first data element, and the second data element are encoded in the same way as described with reference to FIGS. 6A and 6B.

Meanwhile, before determining whether the exponent value of the third data element is included in the exponent range, whether the third data element is a zero value may be determined first. Since the third data element is a zero value, it is encoded to include only the zero code 101. The number of bits of the third encoded data element is 3.

According to the results of converting the 14 data elements, the data compression device can reduce the number of bits of the data elements from 224 to 196.

The more data elements with zero values in a data block, the higher the compression ratio. As an example, when the ReLU function is used as the activation function, the compression ratio can be increased.
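A zero-first classification of 16-bit words might look like the sketch below. The helper names `pack` and `classify`, the range bounds `lower`/`upper`, and the sample words are assumptions; the text does not state the exponent range used in FIG. 8A, so the bounds are left as parameters.

```python
def pack(sign, exponent, mantissa, exp_bits=6, man_bits=9):
    """Pack fields into one word (1/6/9-bit layout used in the example)."""
    return (sign << (exp_bits + man_bits)) | (exponent << man_bits) | mantissa


def classify(words, lower=-2, upper=2, exp_bits=6, man_bits=9, code_bits=3):
    """Return (action, bit_length) per word, testing for zero first."""
    total_bits = 1 + exp_bits + man_bits
    reference = None
    out = []
    for w in words:
        if w == 0:  # all bits are 0: only the zero code is emitted
            out.append(("zero", code_bits))
            continue
        e = (w >> man_bits) & ((1 << exp_bits) - 1)  # extract exponent field
        if reference is None:
            reference = e
            out.append(("set_ref", code_bits + total_bits))
        elif lower <= e - reference <= upper:
            out.append(("range", code_bits + 1 + man_bits))
        else:
            out.append(("skip", code_bits + total_bits))
    return out


# The first word reproduces the (0, 11_0000, 0_1111_1111) element of FIG. 6B.
words = [pack(0, 48, 255), pack(0, 49, 0), 0, pack(1, 47, 3)]
result = classify(words)
compressed_bits = sum(n for _, n in result)
```

In this made-up example, four 16-bit words (64 bits) shrink to 48 bits, with the zero element contributing only 3 bits.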

Referring to FIG. 9, the data compression device groups compressed data elements according to exponent ranges or zero values (S910).

A data compression device sets data elements that have similar sizes and are adjacent to each other into one data group. If consecutive data elements have exponent values included within an exponent range according to one reference exponent value, they form one data group. Compressed data elements with zero values are included in the zero group. Accordingly, a zero group and at least one data group are created.

The data compression device merges at least one data group according to the number or zero value of compressed data elements (S920).

Among two consecutive data groups, when the trailing data group contains one compressed data element, the trailing data group is merged into the preceding data group. The command code of the compressed data element included in the trailing data group is changed to a skip code.

When the trailing data group among two consecutive data groups is a zero group, the trailing data group is merged into the leading data group.

The data compression device generates a bit stream from data groups (S930).
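Steps S910 to S930 can be sketched as below. This is only one possible reading of the grouping and merging rules: each compressed element is modeled as a hypothetical `(kind, reference)` tuple, the function names are illustrative, and the bit stream of S930 is reduced to a flat list of elements.

```python
from itertools import groupby


def group_elements(elements):
    """S910: consecutive elements sharing (kind, reference) form one group."""
    return [list(run) for _, run in groupby(elements)]


def merge_groups(groups):
    """S920: fold zero groups and single-element groups into the group before."""
    merged = []
    for g in groups:
        kind = g[0][0]
        if merged and kind == "zero":
            merged[-1].extend(g)  # zero elements keep their zero code
        elif merged and len(g) == 1:
            # The lone element is re-tagged with a skip code and keeps
            # the reference exponent of the preceding group.
            merged[-1].append(("skip", merged[-1][0][1]))
        else:
            merged.append(list(g))
    return merged


elements = [("data", 48), ("data", 48), ("zero", None),
            ("data", 20), ("data", 30), ("data", 30)]
groups = group_elements(elements)
merged = merge_groups(groups)
stream = [e for g in merged for e in g]  # S930, simplified to a flat list
```

Here four initial groups collapse into two, so only two reference exponents need to be set in the output.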

Meanwhile, Table 1 is a table showing compression rates according to embodiments of the present invention.

Referring to Table 1, compression results derived by applying embodiments of data compression formats to each of a plurality of neural networks are shown.

The data compression embodiments include general data compression, data compression with a skip code, and data compression with a skip code and a zero code.

General data compression refers to compression using command codes for setting reference exponent values and command codes for exponent ranges. Data compression according to a skip code indicates that a skip code is applied to general data compression. Data compression with skip code and zero code applied indicates that skip code and zero code are applied to general data compression.

Meanwhile, per-layer mixed compression refers to a method of applying a skip code to layers in which feature values of size 0 occur with low frequency, and applying a skip code and a zero code to layers in which feature values of size 0 occur with high frequency.

In Table 1, the results for the BERT model assume that the sign field, exponent field, and mantissa field are 1 bit, 6 bits, and 9 bits, respectively. The results for the Resnet50 model and the Yolov3 model assume that the sign field, exponent field, and mantissa field are 1 bit, 4 bits, and 3 bits, respectively.

General data compression achieves a compression ratio of about 10% for the BERT model.

Data compression with a skip code achieves higher compression rates than general data compression for all neural networks. However, for the Resnet50 model, the average data size becomes larger than the original data size.

Data compression with a skip code and a zero code results in a slightly lower compression rate than data compression with a skip code for the BERT model, but achieves much higher compression rates for the remaining models. This is because the BERT model uses the softmax function as an activation function, so the frequency of occurrence of feature values with a size of 0 is low, whereas in the Resnet50 model and the YOLOv3 model, the frequency of occurrence of feature values with a size of 0 is high because the ReLU function is used. In other words, when the frequency of occurrence of feature values with size 0 in each layer of the neural network is high, the compression rate of data compression using a skip code and a zero code is high. On the other hand, when the frequency of occurrence of feature values with size 0 in each layer of the neural network is low, the compression rate of data compression using only a skip code is high. This is because the bits corresponding to the zero code can instead be used as a code representing the exponent range.

Below, before explaining the method of compressing the mantissa bits of data elements to be input to the Softmax function among the two compression methods, the calculation process and technical problems of the Softmax function will be explained.

The AI accelerator uses Equation 1 to calculate the softmax function in the neural network.

[Equation 1]

$$\mathrm{softmax}(X_i) = \frac{e^{X_i - X_{max}}}{\sum_{j} e^{X_j - X_{max}}}$$

In Equation 1, X represents a set of data elements, and X_max represents the reference data element with the largest value among the data elements. X may be referred to as a vector containing data elements. Data elements within a data block X are expressed as X_i. X_i - X_max is referred to as the first intermediate value of the softmax function, and e^(X_i - X_max) is referred to as the second intermediate value of the softmax function.

X_i - X_max is used instead of the data element X_i as the input value of the exponential function in order to prevent the exponential function value for the data element X_i from overflowing.

Meanwhile, since it is difficult for the AI accelerator to store all the data elements input to the softmax function in internal memory at once, it loads some of the data elements stored in external memory and calculates the softmax intermediate values for those data elements. Softmax function values are obtained by calculating the softmax intermediate values for some data elements at a time, storing those intermediate values in external memory, and normalizing the softmax intermediate values for all data elements.

Specifically, a set of data elements X is stored in external memory. When data elements are compressed according to a data compression format, the compressed data elements are stored in external memory. Data elements are stored as inputs to the softmax function. Before saving, the AI accelerator can identify the reference data element with the largest value among the data elements.

Afterwards, the AI accelerator loads the data element X_i, calculates e^(X_i - X_max) as the second intermediate value of the softmax function, and stores it in external memory. Before storing, the AI accelerator can obtain the sum of the natural exponential function values for the differences between all data elements and the reference data element by accumulating the second intermediate values.

Once the second intermediate values and their total for all data elements are calculated, the AI accelerator loads some of the second intermediate values stored in external memory and divides them by the total of the second intermediate values to obtain the softmax function value for the data element X_i. That is, the AI accelerator can obtain the output values of the softmax function by normalizing the second intermediate values of the softmax function. The AI accelerator stores the softmax function values for the data elements in external memory.
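The chunked calculation described above can be sketched as follows. This is an illustration under stated assumptions: the function name `chunked_softmax` and the `chunk_size` parameter are hypothetical, and plain Python lists stand in for external memory.

```python
import math


def chunked_softmax(values, chunk_size):
    """Softmax computed chunk by chunk, mimicking loads from external memory."""
    x_max = max(values)  # reference data element
    second = []          # stands in for external memory
    total = 0.0
    for start in range(0, len(values), chunk_size):
        chunk = values[start:start + chunk_size]      # "load" some elements
        inter = [math.exp(x - x_max) for x in chunk]  # second intermediate values
        total += sum(inter)                           # accumulate before storing
        second.extend(inter)                          # "store" the intermediates
    out = []
    for start in range(0, len(second), chunk_size):
        chunk = second[start:start + chunk_size]      # "load" intermediates again
        out.extend(v / total for v in chunk)          # normalize
    return out


probs = chunked_softmax([1.0, 2.0, 3.0, 4.0, 5.0], chunk_size=2)
```

Every element crosses the memory boundary several times (input load, intermediate store, intermediate load, output store), which is the traffic the compression methods aim to reduce.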

Here, the AI accelerator generates data elements X and natural exponential function values e^(X_i - X_max) for the difference between the data elements and the reference data element.

That is, in the process of storing the second intermediate values in external memory and reading them from the external memory, a lot of power and time are consumed depending on the amount of data transfer.

Below, of the two compression methods, a method of compressing the mantissa bits of data elements to be input to the softmax function will be described.

According to one embodiment of the present invention, the data compression device included in the AI accelerator removes some bits of the data elements that are input values of the softmax function, in particular at least one lower mantissa bit among the mantissa bits of the data elements, thereby reducing the amount of data transfer between the AI accelerator and external memory.

Specifically, the data compression device omits some of the mantissa bits of the data elements X, which are input values of the softmax function, to the extent that the influence on the first intermediate value, indicating the difference between the data element and the reference data element, or on the second intermediate value, indicating the natural exponential function value for that difference, is low.

Compression of data elements considering their influence on the first intermediate value or compression of data elements considering their influence on the second intermediate value may be selectively applied.

Below, compression of data elements considering their influence on the first intermediate value is described.

Referring to FIG. 10, when calculating the softmax function, among the lower mantissa bits of the data element X_i, the lower mantissa bits that affect the difference value between the data element X_i and the reference data element X_max are identified.

Among the data elements, the reference data element X_max with the largest value and one data element X_i are shown as blocks in binary form. The 1 preceding the mantissa bits represents the leading 1 of a normal number. The sign bit is not shown. As an example, the reference data element X_max is a combination of the preceding 1 bit and the mantissa bits M₁.

First, in the softmax function, the AI accelerator calculates the difference value X_max - X_i between the reference data element and the data element. The difference value X_max - X_i between the reference data element and the data element is referred to as the first intermediate value of the softmax function. For this purpose, the data element is bit-shifted to the right relative to the reference data element. The data element is bit-shifted to the right by the value obtained by subtracting the exponent value of the data element from the exponent value of the reference data element. Once bit alignment is complete, subtraction is performed between the reference data element and the data element.

When the difference between the exponent value of the reference data element and the exponent value of the data element is greater than 1, the result of subtraction between the reference data element and the data element can be classified into three cases.

In case a), the most significant bit of the first intermediate value X_max - X_i has the value of 1 at the same position as the most significant bit of the reference data element X_max. The mantissa bits M₃ of the first intermediate value X_max - X_i have the same bit range as the mantissa bits M₁ of the reference data element X_max. Meanwhile, among the mantissa bits M₂ of the data element X_i, the lower mantissa bits located below the least significant bit of the reference data element X_max are not used. That is, among the mantissa bits of the data element, the lower mantissa bits corresponding to the difference between the exponent value E_max of the reference data element and the exponent value E_i of the data element are not used. Alternatively, the lower mantissa bits may be rounded and added to the mantissa bits of the first intermediate value X_max - X_i.

In case b), the most significant bit of the first intermediate value X_max - X_i has the value of 1 at a position one bit below the most significant bit of the reference data element X_max. Among the mantissa bits M₂ of the data element X_i, the mantissa bit located one bit below the least significant bit of the reference data element X_max is used. On the other hand, the mantissa bits from the least significant bit among the mantissa bits M₂ of the data element X_i up to the position two bits below the least significant bit of the reference data element X_max are not used.

In case c), due to the sign difference between the reference data element and the data element, the most significant bit of the first intermediate value X_max - X_i has a value of 1 at a position one bit higher than the most significant bit of the reference data element X_max. In other words, a carry-out occurs. The mantissa bits from the least significant bit among the mantissa bits M₂ of the data element X_i up to the position one bit higher than the least significant bit of the reference data element X_max are not used.

Considering this situation, when the difference between the exponent value of the reference data element and the exponent value of the data element is greater than 1, the data compression device ignores at least one lower mantissa bit among the mantissa bits included in the data element. The compressed data element X_i' is stored as the input value of the softmax function.

Specifically, the data compression device identifies the reference data element representing the largest value among the data elements and determines the difference between the exponent value of the data element and the exponent value of the reference data element. The data compression device determines the number of at least one lower mantissa bit to be ignored among the mantissa bits included in one compressed data element, based on the difference between the exponent value of the data element and the exponent value of the reference data element. The data compression device stores the secondary compressed data element generated by ignoring the determined at least one lower mantissa bit as an input value of the softmax function.

Here, the data compression device may vary the number of lower mantissa bits to be ignored according to a comparison between the sign value S_i of the data element and the sign value S_max of the reference data element.

If the sign value of the data element is the same as the sign value of the reference data element, the data compression device ignores the first lower mantissa bits among the mantissa bits included in the data element. Here, the first lower mantissa bits are the lower mantissa bits whose number equals the difference between the exponent value of the reference data element and the exponent value of the data element, minus 1. That is, the compressed data element X_i' does not include the first lower mantissa bits.

If the sign value of the data element is not the same as the sign value of the reference data element, the data compression device ignores the second lower mantissa bits among the mantissa bits included in the data element. The second lower mantissa bits are the lower mantissa bits whose number equals the difference between the exponent value of the reference data element and the exponent value of the data element. That is, the compressed data element X_i' does not include the second lower mantissa bits.

Alternatively, regardless of the sign value S_i of the data element and the sign value S_max of the reference data element, the data compression device may ignore, among the mantissa bits included in the data element, third lower mantissa bits whose number is less than the difference between the exponent value of the data element and the exponent value of the reference data element. For example, the data compression device may ignore the lower mantissa bits whose number equals the difference minus 2. If the number of lower mantissa bits to be ignored is small, compression performance deteriorates. On the other hand, if the number of lower mantissa bits to be ignored is large, compression performance increases.
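The sign-dependent rule above can be sketched as a small helper. The names `ignored_bits` and `drop_lower_bits` are hypothetical, the 9-bit mantissa length is taken from the earlier 1/6/9-bit example as an assumption, and zeroing the low bits stands in for not storing them.

```python
def ignored_bits(e_i, e_max, s_i, s_max, man_bits=9):
    """Number of lower mantissa bits that may be ignored for one element."""
    diff = e_max - e_i
    if diff <= 1:
        return 0  # the text only describes dropping bits when diff > 1
    n = diff - 1 if s_i == s_max else diff  # same sign: diff - 1, else diff
    return min(n, man_bits)                 # cannot drop more than exists


def drop_lower_bits(mantissa, n):
    """Clear the n lowest mantissa bits (a stand-in for omitting them)."""
    return (mantissa >> n) << n


same_sign = ignored_bits(45, 48, 0, 0)
diff_sign = ignored_bits(45, 48, 1, 0)
truncated = drop_lower_bits(0b011111111, same_sign)
```

The clamp to `man_bits` reflects that an element far smaller than the reference can lose its whole mantissa under this rule.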

In this way, even if some of the mantissa bits of the data element are discarded, there is no effect on the difference value X_i - X_max between the data element and the reference data element. Even if rounding is applied to the lower mantissa bits, the impact on the first intermediate value X_i - X_max is minimal. That is, the natural exponential function value e^(X_i - X_max) for the difference between the data element and the reference data element and the natural exponential function value e^(X_i' - X_max) for the difference between the compressed data element and the reference data element are the same or similar. There is no or minimal effect on the result of the softmax function.

Meanwhile, due to data compression, power consumption and time required for data transfer between the AI accelerator and external memory are reduced.

Below, compression of data elements considering their influence on the second intermediate value is described.

The AI accelerator calculates the difference value X_i - X_max between the data element and the reference data element. The AI accelerator then calculates the natural exponential function value e^(X_i - X_max) for the difference value. The natural exponential function value e^(X_i - X_max) for the difference value X_i - X_max is referred to as the second intermediate value of the softmax function. The AI accelerator stores the second intermediate values in external memory, loads them once the second intermediate values for all data elements have been calculated, and normalizes them to calculate the softmax function values for the data elements.

According to one embodiment of the present invention, the data compression device ignores the lower mantissa bits, determined according to a predefined table, among the mantissa bits included in the data element, and stores the resulting compressed data element X_i' as the input value of the softmax function. The predefined table is a table in which the number of upper mantissa bits to be stored among the mantissa bits of the data element, the exponent value of the data element, and the error rate according to data element compression are recorded. When the exponent value of a data element is given, the data compression device identifies the number of upper mantissa bits that satisfies a specific error rate and discards the lower mantissa bits, that is, the mantissa bits of the data element other than the upper mantissa bits.

Specifically, one data element can be expressed as an upper part including upper mantissa bits and a lower part including lower mantissa bits, as shown in Equation 2.

In Equation 2, xi represents a data element, xi_H represents the upper part of the data element, and xi_L represents the lower part of the data element. S represents the sign value of the data element, E represents the exponent value of the data element, M_H(2) represents the upper mantissa bits in binary form, and M_L(2) represents the lower mantissa bits in binary form. T represents the length of the upper mantissa bits of the data element. When the mantissa bit length of a data element is m, the length of the lower mantissa bits of the data element is m - T.

Below, it will be explained how much the second intermediate value e^(xi-xmax) of the softmax function changes when the lower mantissa bits M_L among the mantissa bits of the data element are excluded.

The second intermediate value e^(xi-xmax) of the softmax function for data element xi can be expressed as Equation 3.

In Equation 3, the term that depends on the lower part xi_L of the data element corresponds to the error rate due to truncation of the lower mantissa bits of the data element.

The error rate can be expressed as Equation 4.

Since xi - xmax, the exponent of the second intermediate value e^(xi-xmax), always has a value of 0 or less, the sign value S of the lower part xi_L of the data element in Equation 4 is 1. Furthermore, the lower mantissa 0.M_L of the data element has a value between 0.5 and 1. Considering this, the range of the error rate can be expressed as Equation 5.

The maximum value of the error rate is |1 - e^(2^(E-T))|.

Various tables can be defined from the exponent value E of the data element, the length T of the upper mantissa bits to be stored among the mantissa bits of the data element, and the maximum error rate |1 - e^(2^(E-T))| of the second intermediate value according to data compression.
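The equations referenced in this passage are not reproduced in the text. As a hedged reconstruction only, the following form is consistent with the symbol definitions given for Equation 2 and with the column headings of Table 2; the exact layout of Equations 2 to 5 in the original may differ.

```latex
\begin{align}
x_i &= x_{i_H} + x_{i_L}, \qquad
x_{i_H} = (-1)^{S} \cdot 2^{E} \cdot 1.M_{H(2)}, \qquad
x_{i_L} = (-1)^{S} \cdot 2^{E-T} \cdot 0.M_{L(2)} \\
e^{x_i - x_{max}} &= e^{x_{i_H} - x_{max}} \cdot e^{x_{i_L}} \\
\left| 1 - e^{x_{i_L}} \right| &\le \left| 1 - e^{2^{E-T}} \right|
\qquad \text{since } |x_{i_L}| < 2^{E-T}
\end{align}
```

Under this reading, truncating the lower part multiplies the second intermediate value by a factor close to 1, and the bound on the last line is the quantity tabulated as abs(1 - exp(2^(E-T))).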

Table 2 is a table that exemplarily calculates the error rate according to the exponent value of the data element when the length of the upper mantissa bits is 7.

Table 3 is a table that exemplarily defines the exponent values of data elements according to the length and error rate of the upper mantissa bits.

Table 4 is a table that exemplarily defines the length of the upper mantissa bits according to the exponent value of the data element and the error rate.

Table 2

E	T = 7
E	2^(E-T)	exp(2^(E-T))	abs(1 - exp(2^(E-T)))
0	7.813E-03	1.007843097206E+00	7.843E-03
-1	3.906E-03	1.003913889338E+00	3.914E-03
-2	1.953E-03	1.001955033591E+00	1.955E-03
-3	9.766E-04	1.000977039492E+00	9.770E-04
-4	4.883E-04	1.000488400479E+00	4.884E-04
-5	2.441E-04	1.000244170430E+00	2.442E-04
-6	1.221E-04	1.000122077763E+00	1.221E-04
-7	6.104E-05	1.000061037019E+00	6.104E-05
-8	3.052E-05	1.000030518044E+00	3.052E-05
-9	1.526E-05	1.000015258905E+00	1.526E-05
-10	7.629E-06	1.000007629424E+00	7.629E-06
-11	3.815E-06	1.000003814705E+00	3.815E-06
-12	1.907E-06	1.000001907350E+00	1.907E-06
-13	9.537E-07	1.000000953675E+00	9.537E-07
-14	4.768E-07	1.000000476837E+00	4.768E-07
-15	2.384E-07	1.000000238419E+00	2.384E-07
-16	1.192E-07	1.000000119209E+00	1.192E-07
-17	5.960E-08	1.000000059605E+00	5.960E-08
-18	2.980E-08	1.000000029802E+00	2.980E-08
-19	1.490E-08	1.000000014901E+00	1.490E-08
-20	7.451E-09	1.000000007451E+00	7.451E-09
-21	3.725E-09	1.000000003725E+00	3.725E-09
-22	1.863E-09	1.000000001863E+00	1.863E-09
-23	9.313E-10	1.000000000931E+00	9.313E-10
-24	4.657E-10	1.000000000466E+00	4.657E-10
-25	2.328E-10	1.000000000233E+00	2.328E-10
-26	1.164E-10	1.000000000116E+00	1.164E-10
-27	5.821E-11	1.000000000058E+00	5.821E-11
-28	2.910E-11	1.000000000029E+00	2.910E-11
-29	1.455E-11	1.000000000015E+00	1.455E-11
-30	7.276E-12	1.000000000007E+00	7.276E-12
-31	3.638E-12	1.000000000004E+00	3.638E-12

Table 3

T	Error ratio [%]
T	1	0.1	0.01	0.001
8	0	-2	-6	-9
7	0	-3	-7	-10
6	-1	-4	-8	-11
5	-2	-5	-9	-12
4	-3	-6	-10	-13
3	-4	-7	-11	-14
2	-5	-8	-12	-15
1	-6	-9	-13	-16

Table 4

E	Error ratio [%]
E	1	0.1	0.01	0.001
0	7	10	10	10
-1	6	10	10	10
-2	5	8	10	10
-3	4	7	10	10
-4	3	6	10	10
-5	2	5	10	10
-6	1	4	8	10
-7	-	3	7	10
-8	-	2	6	10
-9	-	1	5	8
-10	-	-	4	7
-11	-	-	3	6
-12	-	-	2	5
-13	-	-	1	4
-14	-	-	-	3
-15	-	-	-	2
-16	-	-	-	1

Referring to Table 4, the data compression device can preset a specific error rate, identify the exponent value E of the input data element, and identify the T value according to the specific error rate and exponent value E. The data compression device extracts only the T upper mantissa bits from among the mantissa bits of the data element and stores them in external memory. Data compression devices can reduce power consumption and time required for data transfer between AI accelerators and external memory through data compression.
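The table lookup can also be expressed as a calculation. The sketch below assumes the error bound abs(1 - exp(2^(E-T))) from the Table 2 column heading and assumes a 10-bit mantissa, since 10 is the largest entry in Table 4; the function names are hypothetical, and a return value of 0 is taken to correspond to the "-" entries.

```python
import math


def max_error_ratio(e, t):
    """Bound |1 - exp(2^(E-T))| on the relative error (Table 2 column)."""
    return abs(math.expm1(2.0 ** (e - t)))


def upper_bits_needed(e, error_percent, man_bits=10):
    """Smallest T whose error bound is below the limit, capped at man_bits."""
    limit = error_percent / 100.0
    for t in range(man_bits + 1):
        if max_error_ratio(e, t) < limit:
            return t
    return man_bits  # bound cannot be met: keep the whole mantissa


# Reproduce the E = -2 row for error ratios of 1, 0.1, 0.01, and 0.001 percent.
row = [upper_bits_needed(-2, p) for p in (1, 0.1, 0.01, 0.001)]
```

Computing T on the fly trades a small amount of arithmetic for not having to store the table; a hardware implementation would more likely keep the table, as the text describes.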

The operation of the data compression device using a predefined table is as follows.

The data compression device refers to a predefined table and identifies the exponent value of the data element and the mantissa bit length according to the preset error rate. Referring to Equation 4, the exponent value of the data element may be referred to as E and the mantissa bit length may be referred to as T.

Here, the data element may be data encoded according to a floating point format or may be a data element encoded according to a data compression format.

The data compression device stores, in an external memory, the upper mantissa bits corresponding to the mantissa bit length among the mantissa bits of the data element as the mantissa bits of the input value of the softmax function. The compressed data element includes a sign bit, exponent bits, and upper mantissa bits.

The data compression device can trade off the amount of data transfer between the AI accelerator and external memory against the error rate by choosing the length of the upper mantissa bits of the data element according to the error rate.
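Keeping only the T upper mantissa bits can be illustrated on IEEE 754 half-precision values. The use of half precision (10 mantissa bits) is an assumption, the function name `keep_upper_mantissa` is hypothetical, and clearing the low bits stands in for not transferring them.

```python
import struct


def keep_upper_mantissa(x, t, man_bits=10):
    """Return x with only the t upper mantissa bits of its half-precision form."""
    bits = struct.unpack("<H", struct.pack("<e", x))[0]  # raw 16-bit pattern
    drop = man_bits - t                                  # low bits to discard
    mask = 0xFFFF & ~((1 << drop) - 1)                   # keep sign, exponent, top t bits
    return struct.unpack("<e", struct.pack("<H", bits & mask))[0]


# 1.9990234375 has an all-ones mantissa; keeping 2 bits leaves 1.11 in binary.
approx = keep_upper_mantissa(1.9990234375, 2)
```

The sign and exponent fields are untouched, so only the precision of the stored value changes, not its scale.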

However, some data loss may occur due to compression of intermediate values of the softmax function.

Various implementations of the systems and techniques described herein may be realized through digital electronic circuits, integrated circuits, field programmable gate arrays (FPGAs), application specific integrated circuits (ASICs), computer hardware, firmware, software, and/or combinations thereof. These various implementations may include being implemented as one or more computer programs executable on a programmable system. The programmable system includes at least one programmable processor (which may be a special-purpose processor or a general-purpose processor) coupled to receive data and instructions from, and transmit data and instructions to, a storage system, at least one input device, and at least one output device. Computer programs (also known as programs, software, software applications, or code) contain instructions for a programmable processor and are stored on a "computer-readable medium."

Computer-readable recording media include all types of recording devices that store data that can be read by a computer system. These computer-readable recording media may be non-volatile or non-transitory media such as ROM, CD-ROM, magnetic tape, floppy disk, memory card, hard disk, magneto-optical disk, and storage device, and may further include a transitory medium such as a data transmission medium. Additionally, the computer-readable recording medium may be distributed over computer systems connected via a network, and the computer-readable code may be stored and executed in a distributed manner.

In the flowchart/timing diagram of this specification, each process is described as being executed sequentially, but this is merely an illustrative explanation of the technical idea of an embodiment of the present disclosure. In other words, a person skilled in the art to which an embodiment of the present disclosure pertains may change the order described in the flowchart/timing diagram, or execute one or more of the processes in parallel, without departing from the essential characteristics of the embodiment of the present disclosure. Since various modifications and variations can be applied in this way, the flowchart/timing diagram is not limited to a time-series order.

The above description is merely an illustrative explanation of the technical idea of the present embodiment, and those skilled in the art will be able to make various modifications and variations without departing from the essential characteristics of the present embodiment. Accordingly, the present embodiments are not intended to limit the technical idea of the present embodiment, but rather to explain it, and the scope of the technical idea of the present embodiment is not limited by these examples. The scope of protection of this embodiment should be interpreted in accordance with the claims below, and all technical ideas within the equivalent scope should be interpreted as being included in the scope of rights of this embodiment.

(Statement regarding sponsored research or development)

This invention resulted from a research project (project identification number: 1711152619, detailed task number: 2020-0-01305-003, project management (specialized) organization: Information and Communications Planning and Evaluation Institute, research program name: next-generation intelligent semiconductor technology development (design), research project title: development of a 2,000 TFLOPS-class server artificial intelligence deep learning processor and module, contribution rate: 1/1, project performing organization: Sapion Korea Co., Ltd., research period: 2022-01-01 to 2022-12-31).

(CROSS-REFERENCE TO RELATED APPLICATION)

This patent application claims priority to Patent Application No. 10-2022-0152079, filed in Korea on November 14, 2022, which is incorporated herein by reference in its entirety.

Claims

A data compression device, comprising:

at least one memory storing instructions; and

at least one processor configured to, by executing the instructions, generate compressed data elements from a data block containing data elements encoded according to a floating-point format,

wherein the processor:

for a first data element, among the data elements, whose exponent value matches one of preset exponent values, generates a first compressed data element including an operation code corresponding to the exponent value of the first data element, sign bits, and mantissa bits; and

for a second data element, among the data elements, whose exponent value does not match the preset exponent values and is not zero, generates a second compressed data element including a skip code and the second data element.
The data compression device of claim 1,

wherein each of the bit length of the operation code and the bit length of the skip code is shorter than the exponent bit length of the floating-point format.
The data compression device of claim 1,

wherein the processor

generates, for a third data element whose exponent value is zero among the data elements, a third compressed data element including a zero code.
The data compression device of claim 1,

wherein the processor,

for an initial data element among the data elements, sets the preset exponent values so as to include an exponent value of the initial data element, and generates an initial compressed data element including a predefined setting code and the initial data element.
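The four cases recited in the device claims above (operation code, skip code, zero code, and setting code) can be illustrated with a minimal Python sketch. It is not the claimed implementation: the 3-bit code values, the limit of five preset exponents, the use of IEEE 754 binary32, and the names `fields`, `compress`, and `total_bits` are all assumptions made for illustration. The sketch also assumes that a zero-exponent element is represented by the zero code alone, so subnormal values are flushed to zero.

```python
import struct

# Hypothetical 3-bit code assignment; the claims do not fix these values.
ZERO_CODE, SKIP_CODE, SET_CODE = 0, 1, 2
FIRST_OPCODE = 3   # operation codes 3..7 index the preset exponent table
MAX_PRESETS = 5    # number of operation codes available with 3 bits
CODE_BITS = 3      # shorter than the 8 exponent bits of binary32


def fields(x):
    """Split a float into binary32 (sign, exponent, mantissa) fields."""
    bits = struct.unpack("<I", struct.pack("<f", x))[0]
    return bits >> 31, (bits >> 23) & 0xFF, bits & 0x7FFFFF


def compress(block, presets=()):
    """Return a list of (code, payload, payload_width_in_bits) tuples."""
    presets = list(presets)[:MAX_PRESETS]
    out = []
    for i, x in enumerate(block):
        sign, exp, man = fields(x)
        raw = (sign << 31) | (exp << 23) | man
        if i == 0:
            # Initial element: make sure its exponent is among the presets,
            # then emit the setting code followed by the whole element.
            if exp not in presets:
                presets.insert(0, exp)
                del presets[MAX_PRESETS:]
            out.append((SET_CODE, raw, 32))
        elif exp == 0:
            # Zero exponent: the zero code alone (assumption: no payload).
            out.append((ZERO_CODE, 0, 0))
        elif exp in presets:
            # Matching exponent: the operation code replaces the exponent
            # field, leaving 1 sign bit and 23 mantissa bits as payload.
            code = FIRST_OPCODE + presets.index(exp)
            out.append((code, (sign << 23) | man, 24))
        else:
            # No match: skip code followed by the uncompressed element.
            out.append((SKIP_CODE, raw, 32))
    return out


def total_bits(compressed):
    """Total size of the compressed stream in bits."""
    return sum(CODE_BITS + width for _, _, width in compressed)
```

For example, under these assumptions `compress([1.0, 1.5, 0.0, 3.0, 1024.0], presets=[128])` yields one setting-code element, two operation-code elements, one zero-code element, and one skip-code element, for 127 bits in total instead of the 160 bits of the uncompressed block. A matching element saves 5 bits, while a skipped element costs 3 extra bits, so the scheme pays off only when most exponents fall among the presets.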
The data compression device of claim 1,

wherein the processor

stores the compressed data elements in an external memory.
The data compression device of claim 5,

wherein the processor:

identifies a reference data element representing the largest value among the compressed data elements;

determines a difference value between an exponent value of one compressed data element among the compressed data elements and an exponent value of the reference data element; and

stores, as an input value of a softmax function, a secondary compressed data element generated by ignoring at least one lower mantissa bit among the mantissa bits included in the one compressed data element, based on the difference value between the exponent value of the one compressed data element and the exponent value of the reference data element.
The data compression device of claim 6,

wherein the processor:

when the sign value of the one compressed data element is the same as the sign value of the reference data element, ignores, among the mantissa bits included in the one compressed data element, first lower mantissa bits whose number corresponds to the difference value minus 1; and

when the sign value of the one compressed data element is not the same as the sign value of the reference data element, ignores, among the mantissa bits included in the one compressed data element, second lower mantissa bits whose number corresponds to the difference value.
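The sign-dependent rule above can be sketched in Python as follows. This is an illustrative reading of the claim language, not the claimed implementation: it assumes IEEE 754 binary32, computes the difference value as the reference exponent minus the element exponent, and clamps the number of ignored bits to the range 0 to 23 (the clamping is an assumption, since the claims do not say what happens when the difference is zero or negative). The names `fields`, `bits_to_ignore`, and `secondary_compress` are hypothetical.

```python
import struct

MANTISSA_BITS = 23  # IEEE 754 binary32, assumed for illustration


def fields(x):
    """Split a float into binary32 (sign, exponent, mantissa) fields."""
    bits = struct.unpack("<I", struct.pack("<f", x))[0]
    return bits >> 31, (bits >> 23) & 0xFF, bits & 0x7FFFFF


def bits_to_ignore(elem, ref):
    """Number of lower mantissa bits of `elem` to ignore, given reference `ref`."""
    sign, exp, _ = fields(elem)
    ref_sign, ref_exp, _ = fields(ref)
    diff = ref_exp - exp
    # Same sign: difference value minus 1; different sign: difference value.
    n = diff - 1 if sign == ref_sign else diff
    # Assumption: clamp so the count is never negative or above 23.
    return max(0, min(n, MANTISSA_BITS))


def secondary_compress(block):
    """Return (sign, exponent, kept_mantissa, kept_width) per element."""
    ref = max(block)  # reference data element: the largest value
    out = []
    for x in block:
        sign, exp, man = fields(x)
        n = bits_to_ignore(x, ref)
        out.append((sign, exp, man >> n, MANTISSA_BITS - n))
    return out
```

With the block `[8.0, 1.5, -2.0]`, the reference is 8.0. The element 1.5 has the same sign and an exponent smaller by 3, so 2 lower mantissa bits are ignored; the element -2.0 has the opposite sign and an exponent smaller by 2, so 2 bits are ignored as well.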
The data compression device of claim 6,

wherein the processor

ignores, among the mantissa bits included in the one compressed data element, third lower mantissa bits whose number is less than the difference value.
The data compression device of claim 5,

wherein the processor:

identifies, with reference to a predefined table, a mantissa bit length according to an exponent value of one compressed data element among the compressed data elements and a preset error rate; and

stores upper mantissa bits, corresponding to the mantissa bit length among the mantissa bits of the one compressed data element, as mantissa bits of an input value of a softmax function.
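The table-based variant can be sketched as a simple lookup followed by a shift. The table entries below are placeholders invented purely for illustration; the claims do not disclose the table contents here. The binary32 mantissa width, the fallback to the full mantissa for missing entries, and the names `LENGTH_TABLE` and `upper_mantissa` are likewise assumptions.

```python
MANTISSA_BITS = 23  # IEEE 754 binary32, assumed for illustration

# Placeholder table: (exponent value, error rate) -> mantissa bit length.
# These entries are invented for illustration only.
LENGTH_TABLE = {
    (127, 0.01): 7,
    (127, 0.001): 10,
    (120, 0.01): 4,
    (120, 0.001): 7,
}


def upper_mantissa(man, exp, error_rate, table=LENGTH_TABLE):
    """Keep only the upper mantissa bits selected by the table.

    Returns (kept_bits, kept_width). Assumption: when the table has no
    entry for the key, the full mantissa is kept.
    """
    length = table.get((exp, error_rate), MANTISSA_BITS)
    return man >> (MANTISSA_BITS - length), length
```

Because the bit length is read from a table rather than computed, the per-element cost in this sketch is one lookup and one shift.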
A computer-implemented method for data compression, comprising:

generating compressed data elements from a data block containing data elements encoded according to a floating-point format,

wherein the generating of the compressed data elements comprises:

for a first data element, among the data elements, whose exponent value matches one of preset exponent values, generating a first compressed data element including an operation code corresponding to the exponent value of the first data element, sign bits, and mantissa bits; and

for a second data element, among the data elements, whose exponent value does not match the preset exponent values and is not zero, generating a second compressed data element including a skip code and the second data element.
The method of claim 10,

wherein the generating of the compressed data elements comprises:

generating, for a third data element whose exponent value is zero among the data elements, a third compressed data element including a zero code.
The method of claim 10,

wherein the generating of the compressed data elements comprises:

for an initial data element among the data elements, setting the preset exponent values so as to include an exponent value of the initial data element; and

generating an initial compressed data element including a predefined setting code and the initial data element.
The method of claim 10,

further comprising storing the compressed data elements in an external memory.
The method of claim 13,

wherein the storing of the compressed data elements in the external memory comprises:

identifying a reference data element representing the largest value among the compressed data elements;

determining a difference value between an exponent value of one compressed data element among the compressed data elements and an exponent value of the reference data element; and

storing, as an input value of a softmax function, a secondary compressed data element generated by ignoring at least one lower mantissa bit among the mantissa bits included in the one compressed data element, based on the difference value between the exponent value of the one compressed data element and the exponent value of the reference data element.
The method of claim 14,

wherein the storing of the secondary compressed data element as the input value of the softmax function comprises:

when the sign value of the one compressed data element is the same as the sign value of the reference data element, ignoring, among the mantissa bits included in the one compressed data element, first lower mantissa bits whose number corresponds to the difference value minus 1; and

when the sign value of the one compressed data element is not the same as the sign value of the reference data element, ignoring, among the mantissa bits included in the one compressed data element, second lower mantissa bits whose number corresponds to the difference value.