WO2024025386A1

WO2024025386A1 - Method and device for deep learning network encoding/decoding using standard normal distribution-based quantization technique

Info

Publication number: WO2024025386A1
Application number: PCT/KR2023/011047
Authority: WO
Inventors: 김성제; 정진우; 김규헌; 이성배; 이민석
Original assignee: Korea Electronics Technology Institute (한국전자기술연구원)
Priority date: 2022-07-28
Filing date: 2023-07-28
Publication date: 2024-02-01
Also published as: KR20240015991A

Abstract

The present disclosure relates to a method and a device for encoding or decoding of a deep learning network. In particular, the present disclosure relates to a method and a device for deep learning network encoding or decoding using a standard normal distribution-based quantization technique for encoding or decoding of residual information or a residual parameter. A deep learning network encoding method using a standard normal distribution-based quantization technique according to an embodiment of the present disclosure comprises the steps of quantizing a residual parameter, and entropy-encoding the quantized residual parameter. Here, the step of quantizing the residual parameter comprises determining importance of the residual parameter on the basis of a predefined threshold by using a standard normal distribution, and then selectively applying one or more of multiple quantization techniques on the basis of the determined importance.

Description

Deep learning network encoding/decoding method and device using standard normal distribution-based quantization technique

The present disclosure relates to a method and device for encoding or decoding a deep learning network, and in particular to a method and device for encoding or decoding a deep learning network using a standard normal distribution-based quantization technique for encoding or decoding residual information or residual parameters.

Artificial intelligence (AI) systems are developing rapidly thanks to recent advances in storage and computing devices, and research on deep learning, a core technology for implementing AI, is particularly active. The same advances have also dramatically increased the size of deep learning networks, which has recently made transmitting them increasingly difficult.

Accordingly, NNC (Compression of Neural Network for Multimedia Content Description and Analysis), a standard developed by ISO/IEC JTC1/SC29/WG4, proposed a method of compressing deep learning networks that first reduces parameters to lighten the network and then applies parameter quantization and entropy coding. The NNC standard addresses the compression of such pre-trained deep learning networks and is currently discussing the compression of residual information generated during federated learning.

The purpose of this disclosure is to provide a method and device for deep learning network encoding/decoding.

Additionally, the present disclosure seeks to provide a method and device for efficiently quantizing residual information generated from federated learning of a deep learning network.

In addition, the present disclosure seeks to provide a method and device for making a deep learning network lighter by compressing parameters, such as the weights of a pre-trained deep learning network, in artificial intelligence systems and applications that use deep learning networks.

Other objects and advantages of the present disclosure can be understood by the following description and will be more clearly understood by the examples of the present disclosure. In addition, it will be readily apparent that the objects and advantages of the present disclosure can be realized by means and combinations thereof as indicated in the claims.

According to an embodiment of the present disclosure, a deep learning network encoding method using a standard normal distribution-based quantization technique includes quantizing a residual parameter and entropy encoding the quantized residual parameter. In the quantizing step, the importance of the residual parameter is determined against a predefined threshold using a standard normal distribution, and one or more of a plurality of quantization techniques are then selectively applied according to the determined importance.

In addition, according to an embodiment of the present disclosure, quantizing the residual parameter includes: determining a binary flag for selecting a quantization technique using the mean and standard deviation of the residual parameter; converting the residual parameter into a standard normal distribution; determining the importance of the residual parameter from the standard normal distribution; and quantizing the residual parameter by applying, according to the determined importance and the binary flag, any one or a combination of a removal quantization technique, a binary quantization technique, a ternary quantization technique, and a cumulative exponential quantization technique.

Additionally, according to an embodiment of the present disclosure, the residual parameter may be a residual weight of a deep learning network generated in federated learning.

Additionally, according to an embodiment of the present disclosure, quantizing the residual parameter further includes reducing the dimension of the residual parameter to be quantized by flattening it to one dimension.

Additionally, according to an embodiment of the present disclosure, when the absolute value of the value (z) normalized by the standard normal distribution is smaller than a preset first threshold, the removal quantization technique is selected. Here, the preset first threshold may be 1.

Additionally, according to an embodiment of the present disclosure, the removal quantization technique replaces all residual parameters to which the removal quantization technique is applied with 0.

In addition, according to an embodiment of the present disclosure, either the binary quantization technique or the ternary quantization technique is selected when the absolute value of the value (z) normalized by the standard normal distribution falls within a preset interval. Here, the preset interval may be the interval between 1 and 2.

In addition, according to an embodiment of the present disclosure, quantizing the residual parameter further includes calculating a probability density function for each layer of the deep learning network. The mean and standard deviation are obtained from the normal distribution given by this probability density function and are used to determine the binary flag that selects either the binary quantization technique or the ternary quantization technique.

Additionally, according to an embodiment of the present disclosure, if the absolute value of the residual parameter mean minus the standard deviation (|μ| - σ) is greater than 0, the binary flag is set to True and the binary quantization technique is selected.

Additionally, according to an embodiment of the present disclosure, if the absolute value of the residual parameter mean minus the standard deviation (|μ| - σ) is less than 0, the binary flag is set to False and the ternary quantization technique is selected.

Additionally, according to an embodiment of the present disclosure, when the absolute value of the value (z) normalized by the standard normal distribution is greater than a preset second threshold, the cumulative exponential quantization technique is selected. Here, the preset second threshold may be 2.
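The threshold-based selection rule described in the preceding paragraphs can be sketched as follows. This is a minimal illustration, not the disclosed implementation: the function name and technique labels are hypothetical, the thresholds default to the example values 1 and 2 given in the text, and assigning a |z| exactly equal to a threshold to the middle interval is an assumption, since the text does not specify boundary handling. The binary flag is taken as an input.

```python
import numpy as np

def select_techniques(z, binary_flag, first_threshold=1.0, second_threshold=2.0):
    """Map standardized residual values z to (hypothetical) technique labels by |z|.

    |z| <  first_threshold                      -> "removal" (value replaced by 0)
    first_threshold <= |z| <= second_threshold  -> "binary" if binary_flag else "ternary"
    |z| >  second_threshold                     -> "cumulative_exp"
    Boundary handling at exactly 1 or 2 is an assumption.
    """
    abs_z = np.abs(np.asarray(z, dtype=np.float64))
    middle = "binary" if binary_flag else "ternary"
    return np.where(abs_z < first_threshold, "removal",
                    np.where(abs_z <= second_threshold, middle, "cumulative_exp"))

labels = select_techniques([0.2, -1.5, 2.7, 1.0, -2.0], binary_flag=False)
```

Values near the mean (low importance) are zeroed, while the rare large-magnitude values keep the most precise treatment, which matches the importance ordering the text describes.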

In addition, according to an embodiment of the present disclosure, a method of decoding a deep learning network encoded using a standard normal distribution-based quantization technique includes an entropy decoding step of obtaining a residual parameter to be dequantized and quantization information, and an inverse quantization step of dequantizing the residual parameter. The inverse quantization step includes determining, from the obtained quantization information, which of a plurality of quantization techniques was applied to the encoded residual parameter, and deriving a restored residual parameter by applying the inverse quantization technique corresponding to that quantization technique.

Additionally, according to an embodiment of the present disclosure, the plurality of quantization techniques include a removal quantization technique, a binary quantization technique, a ternary quantization technique, and a cumulative exponential quantization technique.

In addition, according to an embodiment of the present disclosure, a deep learning network encoding method that performs federated learning through a plurality of clients includes generating residual information, which is the difference between an update model additionally trained by each client and the reference model, and quantizing the residual information. Quantizing the residual information includes determining the importance of the residual information against a predefined threshold using a standard normal distribution and then selectively applying one or more of a plurality of quantization techniques according to the determined importance.

In addition, according to an embodiment of the present disclosure, a deep learning network system that performs federated learning includes a plurality of clients, each of which generates residual information that is the difference between an additionally trained update model and the reference model, and a central server that receives the residual information from the plurality of clients, generates supplemented residual information, and transmits it to the plurality of clients. The residual information generated by the clients, or the supplemented residual information generated by the central server, is quantized by determining its importance against a predefined threshold using a standard normal distribution and then selectively applying one or more of a plurality of quantization techniques according to the determined importance.

In addition, according to an embodiment of the present disclosure, a quantization method for deep learning network encoding includes: determining a binary flag for selecting a quantization technique using the mean and standard deviation of the residual information to be quantized; converting the residual information into a standard normal distribution; determining the importance of the residual information from the standard normal distribution; and quantizing the residual information by applying, according to the determined importance and the binary flag, any one or a combination of a removal quantization technique, a binary quantization technique, a ternary quantization technique, and a cumulative exponential quantization technique.

In addition, according to an embodiment of the present disclosure, a dequantization method for deep learning network decoding includes: obtaining quantization information for encoded residual information; determining from the quantization information which quantization technique, among the binary, ternary, and cumulative exponential quantization techniques, was applied to the encoded residual information; and deriving restored residual information by applying the dequantization technique corresponding to that quantization technique.

According to various embodiments of the present disclosure, applying a standard normal distribution-based quantization technique to the residual information generated by federated learning makes it possible to compress that residual information efficiently when encoding and decoding a deep learning network. Specifically, setting effective thresholds on the standard normal distribution preserves, as far as possible, the weights judged to be highly important, which minimizes data loss. This addresses the problem in the prior art of large data loss caused by quantizing the positive and negative non-zero values in a data set to their mean. In addition, when residual information generated by federated learning must be transmitted efficiently from a central server to many devices, the technique can serve as a quantization method that offers a high compression rate with little performance degradation.

Figure 1 is an example showing a fully connected layer of a deep learning network according to an embodiment of the present disclosure, and is a diagram for explaining parameters occurring in the deep learning network.

Figure 2 is a diagram illustrating an example of a service model of federated learning according to an embodiment of the present disclosure.

Figure 3 is a diagram for explaining the residual information generation process in federated learning covered by the Compression of Neural Network for Multimedia Content Description and Analysis (NNC) standard, according to an embodiment of the present disclosure.

Figure 4 is a diagram for explaining the general tendency of residual information generated in federated learning according to an embodiment of the present disclosure.

Figure 5 illustrates an NNC encoding and decoding device for explaining the compression process in the NNC standard, according to an embodiment of the present disclosure.

Figure 6 is a diagram for explaining the process of a standard normal distribution-based quantization technique for residual weights in federated learning, according to an embodiment of the present disclosure.

Figure 7 is a diagram for explaining a flattening process according to an embodiment of the present disclosure.

Figure 8 is a diagram for explaining a normal distribution according to an embodiment of the present disclosure.

Figure 9 is a diagram for visually showing the criteria for determining True/False of a binary flag (binary_flag), according to an embodiment of the present disclosure.

Figure 10 is a diagram for explaining the process of determining the true/false status of a binary flag, according to an embodiment of the present disclosure.

Figure 11 is a diagram for explaining a standard normal distribution according to an embodiment of the present disclosure.

Figure 12 is a diagram for explaining pruning quantization according to an embodiment of the present disclosure.

Figure 13 is a diagram for explaining binary-ternary quantization according to an embodiment of the present disclosure.

Figure 14 is a diagram for explaining additive exponent quantization according to an embodiment of the present disclosure.

Figure 15 is a diagram illustrating a specific example of cumulative exponential quantization according to an embodiment of the present disclosure.

Figures 16 and 17 illustrate a deep learning network encoding method using a standard normal distribution-based quantization technique according to an embodiment of the present disclosure.

Figures 18 and 19 illustrate a deep learning network decoding method encoded with a standard normal distribution-based quantization technique according to an embodiment of the present disclosure.

Figure 20 illustrates a deep learning network encoding method that performs federated learning through a plurality of clients, according to an embodiment of the present disclosure.

Figure 21 illustrates a quantization method for deep learning network encoding according to an embodiment of the present disclosure.

Figure 22 illustrates a dequantization method for deep learning network decoding according to an embodiment of the present disclosure.

Figure 23 exemplarily shows a content streaming system to which an embodiment according to the present disclosure can be applied.

Since the present disclosure can be modified in various ways and can have various embodiments, specific embodiments are illustrated in the drawings and described in detail in the detailed description. However, this is not intended to limit the present disclosure to specific embodiments, and it should be understood to include all changes, equivalents, and substitutes within the spirit and technical scope of the present disclosure. Similar reference numerals in the drawings refer to identical or similar functions across various aspects. The shapes and sizes of elements in the drawings may be exaggerated for clarity. For the detailed description of the exemplary embodiments below, reference is made to the accompanying drawings, which illustrate specific embodiments by way of example. These embodiments are described in sufficient detail to enable those skilled in the art to practice them. It should be understood that the various embodiments differ from one another but are not necessarily mutually exclusive. For example, specific shapes, structures, and characteristics described herein in connection with one embodiment may be implemented in other embodiments without departing from the spirit and scope of the disclosure. It should also be understood that the location or arrangement of individual components within each disclosed embodiment may be changed without departing from the spirit and scope of the embodiment. Accordingly, the following detailed description is not to be taken in a limiting sense, and the scope of the exemplary embodiments, if properly described, is limited only by the appended claims together with all equivalents of what those claims assert.

In the present disclosure, terms such as first and second may be used to describe various components, but the components should not be limited by these terms. The terms are used only to distinguish one component from another. For example, a first component may be referred to as a second component, and similarly, a second component may be referred to as a first component, without departing from the scope of the present disclosure. The term "and/or" includes any one of a plurality of related listed items or any combination of them.

When a component of the present disclosure is referred to as being “connected” or “coupled” to another component, it may be directly connected or coupled to that other component, but it should be understood that other components may also exist in between. In contrast, when a component is referred to as being “directly connected” or “directly coupled” to another component, it should be understood that no other components exist in between.

The components shown in the embodiments of the present disclosure are illustrated independently to represent distinct characteristic functions; this does not mean that each component consists of separate hardware or a single software unit. That is, each component is listed separately for convenience of explanation: at least two components may be combined into one component, or one component may be divided into a plurality of components that together perform its function. Embodiments in which components are integrated and embodiments in which they are separated are both included in the scope of the present disclosure, as long as they do not depart from its essence.

The terms used in this disclosure are used only to describe specific embodiments and are not intended to limit the disclosure. Singular expressions include plural expressions unless the context clearly dictates otherwise. In the present disclosure, terms such as “comprise” or “have” are intended to designate the presence of the features, numbers, steps, operations, components, parts, or combinations thereof described in the specification, and should be understood not to preclude in advance the presence or addition of one or more other features, numbers, steps, operations, components, parts, or combinations thereof. In other words, describing that a specific configuration is “included” in this disclosure does not exclude configurations other than that configuration, and means that additional configurations may be included within the scope of implementation or the technical idea of the disclosure.

Some components of the present disclosure may not be essential components that perform essential functions, but merely optional components that improve performance. The present disclosure may be implemented with only the components essential to its essence, excluding those used only to improve performance, and a structure that includes only the essential components, excluding the optional components used only to improve performance, is also within the scope of rights of the present disclosure.

Hereinafter, embodiments of the present disclosure will be described in detail with reference to the attached drawings so that those skilled in the art can easily practice them. In describing the embodiments of this specification, detailed descriptions of related known configurations or functions are omitted where they might obscure the gist of the specification. The same reference numerals are used for the same components in the drawings, and duplicate descriptions of the same components are omitted.

In relation to this, an embodiment of the invention described in this disclosure is characterized by compressing residual information generated by federated learning by applying a new quantization technique, and the technology for implementing this will be described in detail below.

First, 'federated learning' as used in this disclosure refers to a method in which a central server distributes a baseline model to multiple devices, each device additionally trains the baseline model to create an updated model, and the server collects the residual information between the updated models and the baseline model to create and redistribute a strengthened model. This is described in detail later with reference to Figure 2. To compress the residual information generated during federated learning, the present disclosure keeps the concept of the basic modules presented by the NNC standard, but proposes variations and modifications to the NNC modules that take the characteristics of residual information in federated learning into account.

Additionally, 'residual information' referred to in this disclosure means all information related to residuals, including the above-described residual parameters. Here, the residual parameter may include a residual weight. Hereinafter, in the description of the present disclosure, the terms residual information, residual parameter, and residual weight are used interchangeably for more accurate understanding.

Generally, a deep learning network consists of numerous layers, and, as shown in Figure 1, each layer consists of neurons (x1 to x3 and y1 to y2), weights (W11 to W32), and a bias (b). Figure 1 is only a simplified conceptual diagram for convenience of explanation; in practice, the layers of commonly used deep learning models contain numerous neurons and weights and therefore a huge amount of data. Neurons are values computed using the weights, and a pre-trained model generally refers to the weights and biases determined through training. How the weight and bias information is stored varies with the deep learning framework; commonly used frameworks include PyTorch and TensorFlow.
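As a rough illustration of the layer in Figure 1, the sketch below computes outputs y1 and y2 from inputs x1 to x3 with a 3×2 weight matrix. The indexing convention (W_ij connecting input x_i to output y_j), one bias per output neuron, and all numeric values are assumptions for illustration. The arrays W and b are the parameters a pre-trained model stores and that NNC-style compression operates on.

```python
import numpy as np

def fully_connected(x, W, b):
    """Compute y_j = sum_i x_i * W[i, j] + b_j (assumed indexing convention)."""
    return x @ W + b

x = np.array([1.0, 2.0, 3.0])      # neurons x1..x3
W = np.array([[0.1, 0.2],          # W11, W12
              [0.3, 0.4],          # W21, W22
              [0.5, 0.6]])         # W31, W32
b = np.array([0.01, 0.02])         # one bias per output neuron (assumption)
y = fully_connected(x, W, b)       # neurons y1, y2
```

Only W and b (8 values here) need to be stored or transmitted; x and y are recomputed at inference time.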

Figure 2 is a diagram illustrating an example of a service model of federated learning according to an embodiment of the present disclosure. The service model in Figure 2 conceptually illustrates a typical federated learning scenario for convenience of explanation. For example, when the central server 210 distributes a baseline model to each of the clients 221, 223, and 225, each client trains the baseline model on its own data. Each of the clients 221, 223, and 225 then sends residual information, which is the difference between its trained update model and the reference model, back to the central server 210; this residual information corresponds to the per-client residuals shown in Figure 2. The central server 210 then supplements the reference model based on the collected residual information and distributes the supplemented residual information, which is the difference between the supplemented model and the existing reference model, to each client; this corresponds to the supplemented residual shown in Figure 2. This process is repeated at every step (epoch). Because the data of the clients 221, 223, and 225 is never sent directly to the central server 210, federated learning has the advantage of preventing leakage of personal information.
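One round of the Figure 2 scenario can be sketched as follows. The local training step is a random stand-in, and the server's supplementing rule is assumed to be a simple average of the client residuals (FedAvg-style); the text does not specify how the central server combines residual information. Function names are hypothetical.

```python
import numpy as np

def client_update(reference, rng, lr=0.01):
    """Hypothetical stand-in for local training: a small random step away from the reference."""
    return reference - lr * rng.standard_normal(reference.shape)

def federated_round(reference, num_clients, rng):
    # Each client trains locally and reports only its residual (update - reference).
    residuals = [client_update(reference, rng) - reference for _ in range(num_clients)]
    # Assumption: the server supplements the model by averaging the client residuals.
    supplemented_residual = np.mean(residuals, axis=0)
    supplemented_model = reference + supplemented_residual
    return supplemented_model, supplemented_residual, residuals

rng = np.random.default_rng(0)
model, sres, res = federated_round(np.zeros((3, 2)), num_clients=3, rng=rng)
```

Clients can rebuild the supplemented model by adding the supplemented residual to their copy of the reference model, so only residuals, never raw client data, cross the network.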

Figure 3 is a diagram for explaining the residual information generation process in federated learning discussed in the NNC standard, according to an embodiment of the present disclosure. As shown in Figure 3, residual information is the difference between the parameters of the additionally trained model and the parameters of the reference model; that is, the residual information in Figure 3 is obtained by subtracting the reference model parameters from the additionally trained model parameters. If the models contain biases, the bias residuals are also stored, but they are generally 0. The two models used to generate residual information must therefore have the same structure, and the residual information has the same dimensions as the two models. In federated learning, retraining at each stage generally does not change the model weights significantly, so the supplemented models at successive stages have similar parameters. Consequently, the residual information, being the difference between the models at each stage, contains a high proportion of zeros, and its non-zero values are also very close to 0. This means that residual information must be compressed differently from the weights of conventionally trained models.
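The residual computation of Figure 3 can be sketched with models stored as dictionaries of NumPy arrays keyed by layer name. The layer names, the dictionary layout, and the tolerance used to count near-zero values are hypothetical choices for illustration; the checks enforce the requirement that both models have the same structure.

```python
import numpy as np

def compute_residual(updated, reference):
    """Residual = updated-model parameters minus reference-model parameters, per layer."""
    if updated.keys() != reference.keys():
        raise ValueError("models must have the same layers")
    residual = {}
    for name, ref in reference.items():
        upd = updated[name]
        if upd.shape != ref.shape:
            raise ValueError(f"shape mismatch in layer {name}")
        residual[name] = upd - ref
    return residual

def near_zero_ratio(residual, tol=1e-3):
    """Fraction of residual values whose magnitude is at most tol (hypothetical tolerance)."""
    values = np.concatenate([r.ravel() for r in residual.values()])
    return float(np.mean(np.abs(values) <= tol))

reference = {"fc.weight": np.ones((2, 3)), "fc.bias": np.zeros(2)}
updated = {"fc.weight": np.array([[1.0, 1.0005, 1.0], [1.0, 1.2, 1.0]]),
           "fc.bias": np.zeros(2)}
residual = compute_residual(updated, reference)
```

In this toy example 7 of the 8 residual values are within the tolerance, illustrating the high proportion of zero or near-zero values described above.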

Figure 4 is a diagram for explaining the general tendency of residual information generated in federated learning according to an embodiment of the present disclosure. As described above, residual information consists mostly of values that are 0 or close to 0. Figure 4 is a histogram that illustrates this, derived from actual data for the federated learning scenario of Figure 2 described above. The histogram in Figure 4 shows that 0 has the highest frequency and that the frequency gradually decreases with distance from 0. In other words, Figure 4 shows that the residual information generated by federated learning approximately follows a normal distribution.

Figure 5 shows the configuration of an NNC encoding and decoding device for explaining the compression process in the NNC standard according to the present disclosure. As shown in FIG. 5, the NNC encoding device 100 may include three basic modules: a parameter reduction module 110 (also called a 'parameter removal module'), a parameter quantization module 120, and an entropy coding module 130.

Specifically, the parameter removal module 110 performs network lightweighting and typically includes a sparsification process and a pruning process. For example, sparsifying a pre-trained model means retraining the model weights so that the proportion of values close to 0 increases, while pruning means evaluating the importance of each weight and setting the low-importance weights to 0. Here, importance refers to the degree to which a parameter affects the deep learning network; the closer a parameter is to 0, the lower its importance. However, unlike the weights of a trained model, residual information in federated learning is difficult to retrain, so parameter removal is generally not applied to it.
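Pruning by importance, where values closer to 0 are treated as less important, can be sketched as magnitude pruning. This is a generic sketch under that assumption, not the NNC parameter removal tool; the function name and the `ratio` parameter are hypothetical, and ties at the cutoff may prune slightly more than the requested fraction.

```python
import numpy as np

def prune_by_magnitude(weights, ratio):
    """Set to 0 (roughly) the `ratio` fraction of weights with the smallest magnitude."""
    w = np.asarray(weights, dtype=np.float64)
    k = int(ratio * w.size)
    if k == 0:
        return w.copy()
    magnitudes = np.abs(w).ravel()
    cutoff = np.partition(magnitudes, k - 1)[k - 1]  # k-th smallest magnitude
    return np.where(np.abs(w) <= cutoff, 0.0, w)

w = np.array([0.5, -0.01, 0.2, 0.03, -0.8, 0.001])
pruned = prune_by_magnitude(w, 0.5)
```

`np.partition` finds the cutoff without fully sorting, which matters for layers with millions of weights.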

The parameter quantization module 120 quantizes the information (e.g., residual information) that has passed through parameter removal. Representative methods for quantizing the weights of a trained model include scalar quantization, codebook quantization, and stochastic binary-ternary quantization (SBT quantization). For residual information in particular, SBT quantization is mainly used.

Here, SBT quantization in the parameter quantization module stochastically selects binary or ternary quantization based on probability. It seeks to increase compression efficiency by quantizing the positive and negative non-zero values in the data set to their mean. However, because non-zero values are replaced by the mean, large data loss occurs, and the performance of a deep learning network compressed this way may degrade. For example, most residual information generated in federated learning is 0 or close to 0. Applying SBT quantization to such data does take its characteristics into account, and increasing the redundancy of the data in this way raises the compression rate. However, high redundancy also means large data loss, which leads to performance degradation. Accordingly, as an element technology supporting the NNC standard, the present disclosure proposes a new parameter quantization technique that minimizes performance degradation for the residual information generated in federated learning. Specifically, to prevent degradation of deep learning performance when compressing this residual information, the present disclosure uses a standard normal distribution to set the importance of each parameter against predefined thresholds and then applies different quantization techniques according to that importance. This is described in detail later with reference to FIGS. 6 to 15.
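The data-loss problem can be seen with a simplified, deterministic stand-in for the mean-replacement step: positive values are replaced by the mean of the positives and negative values by the mean of the negatives, while zeros stay 0. This omits the stochastic binary/ternary selection of actual SBT quantization and only illustrates the trade-off described above; the function name and data are hypothetical.

```python
import numpy as np

def mean_replace(x):
    """Replace positives by their mean and negatives by their mean; zeros stay 0.

    Simplified illustration of mean replacement only, not NNC's SBT quantizer.
    """
    x = np.asarray(x, dtype=np.float64)
    out = np.zeros_like(x)
    pos, neg = x > 0, x < 0
    if pos.any():
        out[pos] = x[pos].mean()
    if neg.any():
        out[neg] = x[neg].mean()
    return out

x = np.array([0.0, 0.0, 0.01, 0.02, 0.9, -0.03, -0.6])
q = mean_replace(x)  # three distinct values: 0, 0.31, -0.315
```

The result is highly redundant (three distinct values), but the largest value 0.9 is reconstructed as 0.31, an error of 0.59, which is the kind of loss on important values the proposed method aims to avoid.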

Additionally, the entropy coding module 130 increases compression efficiency by exploiting the redundancy of repeated values. Entropy coding represents the same information with fewer bits by allocating a different number of bits to each value according to its probability of occurrence; for example, DeepCABAC (Context-Adaptive Binary Arithmetic Coding for Deep Neural Network Compression) can be applied. The quantized information (e.g., residual information) is thus finally compressed into a bitstream by the entropy coding module and transmitted.
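Why higher redundancy means fewer bits can be checked with the empirical Shannon entropy, the lower bound on the average number of bits per symbol that an ideal entropy coder can achieve for a memoryless source with that distribution. This is a generic estimate, not DeepCABAC, which additionally uses binarization and context modeling; the function name is hypothetical.

```python
import numpy as np

def empirical_entropy_bits(symbols):
    """Shannon entropy, in bits per symbol, of the empirical symbol distribution."""
    _, counts = np.unique(np.asarray(symbols), return_counts=True)
    p = counts / counts.sum()
    return float(-(p * np.log2(p)).sum())

print(empirical_entropy_bits([0, 0, 1, 2]))  # 1.5 bits per symbol
```

A stream of all-identical symbols has entropy 0, while a stream with more distinct, evenly spread values needs more bits; this is why quantization that increases redundancy raises the compression rate.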

Additionally, the NNC decoding device 200 may be configured to include two basic modules: an entropy decoding module 210 and a parameter inverse-quantization module 220 (also referred to as a parameter dequantization module).

Here, the entropy decoding module 210 refers to a process of decoding the bitstream encoded by the entropy encoding module 130 described above. In addition, the parameter dequantization module 220 refers to a dequantization process of restoring information (e.g., residual information) quantized by the above-described parameter quantization module 120 to generate restored residual information.

Figure 6 is a diagram for explaining the process of a standard normal distribution-based quantization technique for residual weights in federated learning, according to an embodiment of the present disclosure. The process of FIG. 6 may be performed, for example, by the parameter quantization module 120 of FIG. 5 described above.

Referring to FIG. 6, a deep learning network encoding method using a standard normal distribution-based quantization technique according to an embodiment of the present disclosure includes quantizing a residual parameter through the parameter quantization module 120, and entropy encoding the quantized residual parameter through the entropy encoding module 130. Here, quantizing the residual parameter involves determining the importance of the residual parameter based on a predefined threshold using a standard normal distribution, and then selectively applying one or more of a plurality of quantization techniques based on the determined importance.

The process of FIG. 6 is explained in more detail as follows. First, residual information generated from federated learning is received as input (S1110). The residual information may be a residual weight and may also be called a residual parameter. Because the input residual information is a multidimensional tensor, it is reduced to one dimension by flattening each layer (S1120). Next, a probability density function is computed for the data of each layer to obtain its normal distribution (S1130), and the binary flag (binary_flag) is set to true or false based on that normal distribution (S1140). The normal distribution is then converted to a standard normal distribution (S1150), and at least one of three types of quantization techniques, or a combination of them, is applied according to each value's position in the standard normal distribution. The quantization techniques proposed in the present disclosure are Pruning Quantization (S1160), Binary-Ternary Quantization (S1170), and Additive Exponent Quantization (S1180), where Binary-Ternary Quantization (S1170) comprises the Binary Quantization technique (S1171) and the Ternary Quantization technique (S1172). Each process of FIG. 6 is described in detail below with reference to FIGS. 7 to 15.

Figure 7 is a diagram for explaining the flattening process (S1120) according to an embodiment of the present disclosure. This dimension reduction converts a multidimensional tensor into a one-dimensional tensor so that the data can be handled more easily. As in the example in Figure 7, when obtaining a probability density function for one layer, one-dimensional data is simpler to process than data composed of two-dimensional tensors.
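As a minimal sketch of the flattening step (S1120), assuming each layer's residual tensor is given as nested Python lists (a plain-Python stand-in for a framework tensor; the function name `flatten_layer` is hypothetical), the layer can be reduced to one dimension by a recursive walk in row-major order:

```python
from typing import Any, List

def flatten_layer(tensor: Any) -> List[float]:
    """Flatten a nested-list tensor of any depth into a 1-D list in row-major order."""
    if not isinstance(tensor, (list, tuple)):
        return [float(tensor)]  # a single scalar element
    flat: List[float] = []
    for item in tensor:
        flat.extend(flatten_layer(item))  # recurse into sub-tensors
    return flat

# A 2x3 layer becomes one list of 6 values.
layer = [[0.01, -0.02, 0.0], [0.05, 0.30, -0.07]]
print(flatten_layer(layer))
```

With NumPy or PyTorch tensors, the same step would normally be done with `ndarray.ravel()` or `torch.flatten()`.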

Figure 8 is a diagram for explaining the normal distribution obtained by calculating the probability density function for each layer (S1130) according to an embodiment of the present disclosure. The probability density function of the normal distribution is given by Equation 1, and Figure 8 plots it.

[Equation 1]

$$f(x) = \frac{1}{\sqrt{2\pi\sigma^{2}}} \exp\left(-\frac{(x-\mu)^{2}}{2\sigma^{2}}\right)$$

Here, $\mu$ is the mean of the weights of the corresponding layer, $\sigma^{2}$ is the variance of the weights, and $\sigma$ is the standard deviation of the weights. Referring again to the above-described Figure 4, residual information is distributed in the form of a normal distribution, and because the number of samples is large, it can be modeled by Equation 1. For a normal distribution, the probability of the interval within one standard deviation ($-\sigma$ to $\sigma$) is 68.27%, within two standard deviations ($-2\sigma$ to $2\sigma$) is 95.45%, and within three standard deviations ($-3\sigma$ to $3\sigma$) is 99.73%. These properties are used in the conversion to the standard normal distribution described later, and the mean and standard deviation derived to obtain the probability density function are also used to determine whether the binary flag (binary_flag) of the next module is True or False.
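The per-layer statistics behind Equation 1 can be sketched with the Python standard library. This is an illustration, not the disclosure's implementation: the helper names are hypothetical, and the use of the population standard deviation (dividing by n) is an assumption because the text does not name an estimator. The interval probabilities use the identity P(|X - μ| < kσ) = erf(k/√2).

```python
import math
import statistics
from typing import List, Tuple

def layer_mean_std(weights: List[float]) -> Tuple[float, float]:
    """Mean and standard deviation of one flattened layer.

    Uses the population standard deviation (divide by n); the text does not say
    which estimator is used, so this is an assumption.
    """
    return statistics.fmean(weights), statistics.pstdev(weights)

def normal_pdf(x: float, mu: float, sigma: float) -> float:
    """Equation 1: density of N(mu, sigma^2) evaluated at x."""
    coeff = 1.0 / math.sqrt(2.0 * math.pi * sigma ** 2)
    return coeff * math.exp(-((x - mu) ** 2) / (2.0 * sigma ** 2))

def prob_within(k: float) -> float:
    """Probability that a normal variable lies within k standard deviations of its mean."""
    return math.erf(k / math.sqrt(2.0))

mu, sigma = layer_mean_std([0.02, -0.01, 0.0, 0.03, -0.02, 0.01])
print(f"mean={mu:.4f}, std={sigma:.4f}")
for k in (1, 2, 3):
    print(f"within +/-{k} std: {prob_within(k):.2%}")
```

The loop reproduces the 68.27%, 95.45%, and 99.73% figures quoted above.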

Figure 9 is a diagram for explaining the binary flag setting (S1140) for each layer according to an embodiment of the present disclosure, and visually illustrates the criterion for setting the binary flag to true or false. The binary flag determines whether Binary Quantization (S1171) or Ternary Quantization (S1172) is selected in the Binary-Ternary Quantization process (S1170) among the above-described quantization techniques. In general, residual information follows a normal distribution centered at 0. However, exceptional cases may occur in which the mean is not 0 but some positive or negative value. In this case, as shown in Figure 9, if the difference between the absolute value of the mean and the standard deviation is greater than 0, the binary flag is set to True. This encourages the subsequent binary quantization to keep only one sign, because values whose sign is opposite to that of the mean are generally judged to be less important.

Figure 10 is a diagram for explaining the process of determining whether the binary flag is true or false according to an embodiment of the present disclosure. It expresses the logic described for Figure 9 in pseudocode.

For example, $w_1$ to $w_n$ are the weight values of one layer of residual information, stored as a dimension-reduced one-dimensional tensor (1010). The mean (1020) and standard deviation (1030) of $w_1$ to $w_n$ are then derived in the process of calculating the probability density function of the normal distribution described above. When the mean is positive, the binary flag is true if the mean minus the standard deviation is positive (1040), and false if it is negative (1050). Conversely, when the mean is negative, the binary flag is true if the sum of the mean and the standard deviation is less than 0 (1060), and false if it is greater than 0 (1070).
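The branching of Fig. 10 can be sketched as a small Python function. The function name is hypothetical, the population standard deviation is an assumed choice, and the case of a mean exactly equal to 0 is not covered by the text, so this sketch returns False there (consistent with the Fig. 9 criterion, since 0 minus the standard deviation is never positive).

```python
import statistics
from typing import List

def binary_flag(weights: List[float]) -> bool:
    """Fig. 10 logic: True when the layer's mean lies more than one std away from 0."""
    mu = statistics.fmean(weights)
    sigma = statistics.pstdev(weights)  # population std (assumption)
    if mu > 0:
        return mu - sigma > 0   # (1040) True / (1050) False
    if mu < 0:
        return mu + sigma < 0   # (1060) True / (1070) False
    return False                # mean exactly 0: not covered by the text (assumption)
```

Both branches reduce to the single test |μ| - σ > 0, which is the Fig. 9 criterion that the difference between the absolute value of the mean and the standard deviation is greater than 0.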

Figure 11 is a diagram for explaining the standard normal distribution transformation step (S1150) according to an embodiment of the present disclosure. The standard normal distribution is the normal distribution with a mean of 0 and a standard deviation of 1, and any normal distribution can be standardized into it. The normal distribution is standardized for computational convenience: in the probability density function of a general normal distribution, the mean and variance are needed to find the probability of appearance of a specific value, whereas in the probability density function of the standard normal distribution, the probability of appearance is obtained directly from z, as shown in Figure 11.

The standard normal distribution has two advantages here: the importance of each weight can be calculated by considering the magnitude and the frequency of appearance of the residual weight values at the same time, and the thresholds for the subsequent quantization can be set more easily.

For example, the probability density function of the standard normal distribution is given by Equation 2, where the input data x is normalized to z by Equation 3.

[Equation 2]

$$\phi(z) = \frac{1}{\sqrt{2\pi}} \exp\left(-\frac{z^{2}}{2}\right)$$

[Equation 3]

$$z = \frac{x-\mu}{\sigma}$$

Accordingly, as in the case of Figure 8 above, in the standard normal distribution the probability of the interval -1 to 1 is 68.27%, that of the interval -2 to 2 is 95.45%, and that of the interval -3 to 3 is 99.73%.
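Equations 2 and 3 translate directly into code. The sketch below uses hypothetical function names and the population standard deviation (an assumption); a layer whose weights are all identical has σ = 0, and mapping every weight to z = 0 in that case is likewise an assumption, since Equation 3 is undefined there.

```python
import math
import statistics
from typing import List

def to_z_scores(weights: List[float]) -> List[float]:
    """Equation 3: z = (x - mu) / sigma for every weight in the layer."""
    mu = statistics.fmean(weights)
    sigma = statistics.pstdev(weights)
    if sigma == 0:
        # Constant layer: every weight equals the mean (assumed z = 0).
        return [0.0] * len(weights)
    return [(x - mu) / sigma for x in weights]

def standard_normal_pdf(z: float) -> float:
    """Equation 2: density of N(0, 1) evaluated at z."""
    return math.exp(-z * z / 2.0) / math.sqrt(2.0 * math.pi)

print(to_z_scores([2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]))
```

Once every weight carries a z value, the importance thresholds used below (for example |z| < 1 or |z| ≥ 2) apply to every layer on the same scale, regardless of that layer's own mean and spread.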

Using the binary flag and the standard normal distribution, the quantization techniques proposed in this disclosure comprise Pruning Quantization (S1160), Binary-Ternary Quantization (S1170), and Additive Exponent Quantization (S1180), also called cumulative exponent quantization. Removal (pruning) quantization (S1160) is performed on the least important data. Data judged to be of higher importance are quantized through binomial-ternary quantization (S1170), and values of the highest importance are quantized through cumulative exponent quantization (S1180), which keeps them as close to the original as possible. By preserving near-original values for highly important parameters, the performance degradation caused by compressing the residual information generated in federated learning can be minimized. Each quantization method is described in detail below.

First, Figure 12 is a diagram for explaining removal quantization (S1160) according to an embodiment of the present disclosure. Removal quantization is the technique applied to residual information of the lowest importance. In the standard normal distribution of step S1150 described above, every value whose z has an absolute value smaller than a certain threshold (e.g., '1' or '0.1') is replaced with 0. Such values occur frequently but have low importance, because they do not significantly affect the performance of the model. Replacing these high-frequency values with 0 causes some data loss, but it increases redundancy accordingly, enabling a high compression rate in the entropy encoding process. For example, in Figure 12, if the residual weights before removal quantization are 0.21, 0.09, and 0.05 and the threshold is set to '0.1', the quantized residual weights are 0.21, 0, and 0.
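Removal quantization reduces to a masked replacement. In this hedged sketch (the function name and the default threshold of 1.0 are illustrative), weights whose |z| is below the threshold become 0, and the remaining weights are passed through unchanged so that the other techniques can quantize them afterwards.

```python
from typing import List

def removal_quantize(weights: List[float], z_scores: List[float],
                     threshold: float = 1.0) -> List[float]:
    """Replace each weight whose |z| is below `threshold` with 0; pass the others through."""
    if len(weights) != len(z_scores):
        raise ValueError("weights and z_scores must have the same length")
    return [0.0 if abs(z) < threshold else w for w, z in zip(weights, z_scores)]

# Fig. 12 applies the threshold 0.1 to the weights themselves, so the weights
# are also passed as the scores here.
print(removal_quantize([0.21, 0.09, 0.05], [0.21, 0.09, 0.05], threshold=0.1))
# [0.21, 0.0, 0.0]
```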

Figure 13 is a diagram for explaining binomial-ternary quantization (S1170) according to an embodiment of the present disclosure. Binomial-ternary quantization is applied when the absolute value of z in the standard normal distribution lies within a specific range (e.g., between 1 and 2). It follows the same concept as stochastic binomial-ternary quantization (SBT quantization), one of the quantization techniques of the conventional NNC standard described above. The difference is that the choice between binomial and ternary quantization is not made randomly according to a Bernoulli probability distribution, as in the conventional technique, but is determined by the binary flag set in step S1140. That is, if the binary flag is true, binary quantization (S1171) is selected, and if it is false, ternary quantization (S1172) is selected.

Figure 13 (a) shows a simple example of the binomial quantization technique, and Figure 13 (b) shows a simple example of the ternary quantization technique.

The binomial quantization (S1171) technique compares the average of the positive values with the average of the negative values and keeps only the sign with more influence. The sign whose average has the larger absolute value has more influence, and all values of the opposite sign are set to 0. The values of the dominant sign are replaced with that sign's average, which increases redundancy. For example, if the residual information is {0,0,1,3,2,0,0,0,-2,-1} as in Figure 13 (a), the average of the positive numbers is 2 and the average of the negative numbers is -1.5. Since 2 is greater than 1.5, the absolute value of -1.5, the positive numbers are interpreted as having more influence than the negative numbers, and all negative values are set to 0. The positive values 1 and 3 are replaced by the average 2, so the tensor containing the values -2, -1, 0, 1, 2, and 3 becomes {0,0,2,2,2,0,0,0,0,0}, a tensor expressed only by 0 and 2, and the redundancy of 0 and 2 increases.
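The binomial rule of Figure 13 (a) can be sketched as follows. The function name is hypothetical, and two details are assumptions not covered by the text: a sign with no values is given an average of 0, and when the two averages have equal magnitude the positive sign is kept.

```python
from statistics import fmean
from typing import List

def binary_quantize(values: List[float]) -> List[float]:
    """Keep only the more influential sign: map its values to that sign's mean, others to 0."""
    pos = [v for v in values if v > 0]
    neg = [v for v in values if v < 0]
    pos_mean = fmean(pos) if pos else 0.0   # empty sign -> 0 (assumption)
    neg_mean = fmean(neg) if neg else 0.0
    # The sign whose average has the larger magnitude wins; ties go to the
    # positive sign (assumption).
    if pos_mean >= abs(neg_mean):
        return [pos_mean if v > 0 else 0.0 for v in values]
    return [neg_mean if v < 0 else 0.0 for v in values]

print(binary_quantize([0, 0, 1, 3, 2, 0, 0, 0, -2, -1]))
# [0.0, 0.0, 2.0, 2.0, 2.0, 0.0, 0.0, 0.0, 0.0, 0.0]
```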

In the ternary quantization (S1172) technique, taking the same residual information {0,0,1,3,2,0,0,0,-2,-1} as in Figure 13 (b), the average of each sign (positive and negative) is calculated, and positive weights are replaced with the positive average while negative weights are replaced with the negative average. That is, whereas binomial quantization keeps only one sign, ternary quantization keeps both signs, reducing data loss. As shown in Figure 13 (b), positive values become the positive average 2 and negative values become the negative average -1.5, giving {0,0,2,2,2,0,0,0,-1.5,-1.5}. Ternary quantization therefore preserves more characteristics than binomial quantization by using an additional value (e.g., '-1.5').
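A matching sketch of the ternary rule of Figure 13 (b) keeps both signs. The function name is hypothetical, and an empty sign is given an average of 0 by assumption.

```python
from statistics import fmean
from typing import List

def ternary_quantize(values: List[float]) -> List[float]:
    """Map positives to the positive mean, negatives to the negative mean, zeros to 0."""
    pos = [v for v in values if v > 0]
    neg = [v for v in values if v < 0]
    pos_mean = fmean(pos) if pos else 0.0   # empty sign -> 0 (assumption)
    neg_mean = fmean(neg) if neg else 0.0
    return [pos_mean if v > 0 else neg_mean if v < 0 else 0.0 for v in values]

print(ternary_quantize([0, 0, 1, 3, 2, 0, 0, 0, -2, -1]))
# [0.0, 0.0, 2.0, 2.0, 2.0, 0.0, 0.0, 0.0, -1.5, -1.5]
```

The output uses three distinct values (0, 2, and -1.5) instead of two, which is the extra characteristic that ternary quantization retains.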

Binomial quantization (S1171) and ternary quantization (S1172) cause less data loss than the removal quantization technique of FIG. 12, but some loss is inevitable because various values are replaced with a single average. Nevertheless, because they preserve the minimum characteristics of the layer being quantized, in one embodiment of the present disclosure the binomial quantization (S1171) technique or the ternary quantization (S1172) technique is selectively applied to values whose absolute z in the standard normal distribution lies in a specific range (e.g., between 1 and 2). In other words, for weights judged to be neither particularly low nor particularly high in importance, binomial quantization (S1171) or ternary quantization (S1172) is selectively performed on the assumption that only their minimum characteristics need to be preserved.

Figure 14 is a diagram for explaining the cumulative exponent quantization (S1180) method according to an embodiment of the present disclosure. This quantization technique is applied to values whose absolute z in the standard normal distribution of step S1150 described above is greater than a specific threshold (e.g., '2'). Cumulative exponent quantization (S1180) uses a small number of bits while keeping the value as close to the original as possible.

Figure 14 expresses the cumulative exponent quantization (S1180) process in pseudocode. First, an array with N elements (e.g., 4) is initialized (1410). Then, starting from i=0, 2 to the power of i is computed (1420), and i is decreased until $2^{i}$ becomes smaller than the value to be quantized (1430). When it becomes smaller than the parameter to be quantized (1440), i stops decreasing (1450), and the current $2^{i}$ is set as the reference point (1460). From the reference point, the same process is repeated: i is decreased until the reference point plus $2^{i}$ becomes smaller than the value to be quantized, and that sum becomes the new reference point. The process is performed a total of N times (e.g., 4 times), and the i determined in each loop is stored as an element of the pre-initialized array. The process is illustrated with the example of FIG. 15 as follows.

Figure 15 illustrates a specific example of cumulative exponent quantization (S1180) according to an embodiment of the present disclosure, namely the process of quantizing the value '0.65' with this technique.

Referring to Figure 15, first, with i=0, $2^{0}$ is 1. Since 1 is greater than 0.65, i is decreased to -1. $2^{-1}$ is 0.5, which is smaller than 0.65, so 0.5 becomes the reference point and -1 is inserted as the first element of the array (1510). Next, $2^{i}$ is added to 0.5 and i is decreased until the sum becomes smaller than 0.65. With i=-2, $0.5 + 2^{-2}$ is 0.5+0.25=0.75. Decreasing i again gives $0.5 + 2^{-3}$ = 0.5+0.125=0.625. Since 0.625 is smaller than 0.65, 0.625 becomes the reference point and -3 is inserted as the second element of the array (1520). Repeating the process twice more finally yields the one-dimensional array [-1,-3,-6,-7], as shown in Figure 15 (1540).
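The loop of Figs. 14 and 15 can be sketched in Python. This is a minimal illustration under stated assumptions. The function names are hypothetical. For a negative value, the sign is stored separately and only the magnitude is quantized, because the text does not say how negative values are handled. Each pass restarts the search at i = `start_exp` (0, as described for Fig. 14). A floor `min_exp` is added so that the loop stops when no further term fits, for example when the value is 0.

```python
from typing import List, Tuple

def cumulative_exponent_quantize(value: float, n_terms: int = 4,
                                 start_exp: int = 0,
                                 min_exp: int = -64) -> Tuple[int, List[int]]:
    """Return (sign, exponents) such that value ~= sign * sum(2**i for i in exponents)."""
    sign = -1 if value < 0 else 1  # sign kept separately (assumption)
    target = abs(value)
    ref = 0.0                      # current reference point
    exponents: List[int] = []
    for _ in range(n_terms):
        i = start_exp
        # Decrease i until ref + 2**i becomes smaller than the target (1430-1450).
        while i >= min_exp and ref + 2.0 ** i >= target:
            i -= 1
        if i < min_exp:
            break                  # nothing representable left to add
        ref += 2.0 ** i            # new reference point (1460)
        exponents.append(i)        # store the chosen exponent in the array
    return sign, exponents

def cumulative_exponent_dequantize(sign: int, exponents: List[int]) -> float:
    """Rebuild the approximated value by summing the stored powers of two."""
    return sign * sum(2.0 ** i for i in exponents)

sign, exps = cumulative_exponent_quantize(0.65)
print(exps, cumulative_exponent_dequantize(sign, exps))  # [-1, -3, -6, -7] 0.6484375
```

Because the sketch follows the text's strict "smaller than" comparison, an exact power of two such as 0.5 is approximated from below (0.46875 with four terms). Changing the comparison to "not larger than" would represent such a value exactly.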

In the example of Figure 15, 0.65 is approximated by the very close value 0.6484375. Judging from this single approximated value (0.6484375) alone, it may not be obvious that fewer bits are needed to express 0.65. However, across all parameters, representing values as arrays of integers by cumulative exponent quantization (S1180) increases redundancy and ultimately achieves a high compression rate in entropy coding. In addition, because cumulative exponent quantization (S1180) restores values very close to the original parameters, it is well suited to weights of high importance. It may therefore be most efficient to apply it to parameters judged to be of high importance, for example those whose absolute z is 2 or more in the standard normal distribution of step S1150 described above.

To summarize the processes of FIGS. 6 to 15, according to an embodiment of the present disclosure, the parameters of the input residual information are converted to the standard normal distribution, and different quantization techniques are applied depending on the range of z. If the absolute value of z is less than 1, the parameter is judged to be of low importance and is replaced with 0 by removal quantization (S1160). For parameters whose absolute z is between 1 and 2, binomial quantization (S1171) or ternary quantization (S1172), which preserve the minimum characteristics, is performed. Finally, parameters whose absolute z is 2 or more, judged to be of the highest importance, are quantized by cumulative exponent quantization (S1180). Using all three types of quantization techniques proposed in this disclosure, or a combination of some of them, can provide a high compression rate with low performance degradation.
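The summary above can be pulled together into one per-layer routine. The sketch below is a hypothetical, plain-Python illustration; names such as `quantize_layer`, `LOW_Z`, and `HIGH_Z` do not come from the disclosure. It assumes the example thresholds 1 and 2. It computes the binary flag as |μ| > σ, which is equivalent to the branches of Fig. 10. It takes the sign averages for binary and ternary quantization over the mid-importance weights only. It returns reconstructed values and a per-weight technique label instead of an encoded bitstream. The text does not specify these details.

```python
import statistics
from typing import List, Tuple

# Illustrative |z| boundaries taken from the examples in the text.
LOW_Z = 1.0   # |z| <  LOW_Z          -> removal quantization (S1160)
HIGH_Z = 2.0  # LOW_Z <= |z| < HIGH_Z -> binary/ternary quantization (S1170)
              # |z| >= HIGH_Z         -> cumulative exponent quantization (S1180)

def _sign_means(values: List[float]) -> Tuple[float, float]:
    """Average of the positive values and of the negative values (0.0 if none)."""
    pos = [v for v in values if v > 0]
    neg = [v for v in values if v < 0]
    return (statistics.fmean(pos) if pos else 0.0,
            statistics.fmean(neg) if neg else 0.0)

def _cumulative_exponent(x: float, n_terms: int = 4, min_exp: int = -64) -> float:
    """Reconstructed value of x after cumulative exponent quantization with n_terms terms."""
    target, ref = abs(x), 0.0
    for _ in range(n_terms):
        i = 0
        while i >= min_exp and ref + 2.0 ** i >= target:
            i -= 1
        if i < min_exp:
            break
        ref += 2.0 ** i
    return ref if x >= 0 else -ref

def quantize_layer(weights: List[float]) -> Tuple[List[float], List[str]]:
    """Quantize one flattened layer; return reconstructed values and the technique per weight."""
    mu = statistics.fmean(weights)
    sigma = statistics.pstdev(weights)
    flag = abs(mu) > sigma                                          # binary flag (S1140)
    zs = [(w - mu) / sigma if sigma > 0 else 0.0 for w in weights]  # S1150
    # Sign averages are taken over the mid-importance weights only (assumption).
    mid = [w for w, z in zip(weights, zs) if LOW_Z <= abs(z) < HIGH_Z]
    pos_mean, neg_mean = _sign_means(mid)
    keep_positive = pos_mean >= abs(neg_mean)  # only used when flag is True
    out: List[float] = []
    used: List[str] = []
    for w, z in zip(weights, zs):
        if abs(z) < LOW_Z:
            out.append(0.0)
            used.append("removal")
        elif abs(z) < HIGH_Z:
            if w > 0 and (not flag or keep_positive):
                out.append(pos_mean)
            elif w < 0 and (not flag or not keep_positive):
                out.append(neg_mean)
            else:
                out.append(0.0)  # zero, or the sign dropped by binary quantization
            used.append("binary" if flag else "ternary")
        else:
            out.append(_cumulative_exponent(w))
            used.append("exponent")
    return out, used
```

For a layer centered at 0, the flag is False, so the mid band is ternary-quantized. A layer whose mean sits more than one standard deviation from 0 switches that band to binary quantization.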

Figure 16 illustrates a deep learning network encoding method using a standard normal distribution-based quantization technique according to an embodiment of the present disclosure. The method may include quantizing a residual parameter (S1610) and entropy encoding the quantized residual parameter (S1620). The quantizing step determines the importance of the residual parameter based on a predefined threshold using a standard normal distribution, and then selectively applies one or more of a plurality of quantization techniques based on the determined importance.

Figure 17 shows the detailed steps of quantizing the residual parameter (S1610) of Figure 16. Quantizing the residual parameter (S1610) includes determining a binary flag for selecting a quantization technique using the mean and standard deviation of the residual parameter (S1611); converting the residual parameter into a standard normal distribution (S1612); and determining the importance of the residual parameter from the standard normal distribution and quantizing the residual parameter by applying, according to the determined importance and the binary flag, one of the removal quantization technique, the binomial quantization technique, the trinomial quantization technique, or the cumulative exponent quantization technique, or a combination thereof (S1613).

Figure 18 illustrates a deep learning network decoding method encoded using a standard normal distribution-based quantization technique, according to an embodiment of the present disclosure. The deep learning network decoding method includes an entropy decoding step (S1810) of acquiring a residual parameter to be dequantized and quantization information, and an inverse quantization step (S1820) of inversely quantizing the residual parameter.

Figure 19 shows the detailed steps of the inverse quantization step (S1820) of Figure 18. The inverse quantization step (S1820) includes deriving, from the obtained quantization information, which of a plurality of quantization techniques was applied to the encoded residual parameter (S1821), and deriving the restored residual parameter by applying the inverse quantization technique corresponding to that quantization technique (S1822). Here, according to an embodiment of the present disclosure, the plurality of quantization techniques include a removal quantization technique, a binomial quantization technique, a ternary quantization technique, and a cumulative exponent quantization technique.

Figure 20 illustrates a deep learning network encoding method that performs federated learning through a plurality of clients, according to an embodiment of the present disclosure. The method includes generating residual information, which is the difference between an update model additionally trained by a client and a reference model (S2010), and quantizing the residual information (S2020). Quantizing the residual information (S2020) determines the importance of the residual information based on a predefined threshold using a standard normal distribution, and then selectively applies one or more of a plurality of quantization techniques based on the determined importance.
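Step S2010 amounts to a layer-wise subtraction. In this sketch, models are represented as hypothetical dictionaries that map layer names to flattened weight lists. The sign convention (update minus reference) is an assumption based on the wording "difference between an update model and a reference model".

```python
from typing import Dict, List

def compute_residuals(update: Dict[str, List[float]],
                      reference: Dict[str, List[float]]) -> Dict[str, List[float]]:
    """Per-layer residual: client's updated weights minus the reference model's weights."""
    if update.keys() != reference.keys():
        raise ValueError("update and reference models must have the same layers")
    residuals: Dict[str, List[float]] = {}
    for name, new_weights in update.items():
        old_weights = reference[name]
        if len(new_weights) != len(old_weights):
            raise ValueError(f"layer {name!r} has mismatched sizes")
        residuals[name] = [n - o for n, o in zip(new_weights, old_weights)]
    return residuals

print(compute_residuals({"fc": [1.0, 2.5]}, {"fc": [0.5, 2.5]}))  # {'fc': [0.5, 0.0]}
```

Each residual layer is then flattened and quantized as in S2020.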

Figure 21 illustrates a quantization method for deep learning network encoding according to an embodiment of the present disclosure. The quantization method includes determining a binary flag for selecting a quantization technique using the mean and standard deviation of the residual information to be quantized (S2110); converting the residual information into a standard normal distribution (S2120); and determining the importance of the residual information from the standard normal distribution and quantizing the residual information by applying, according to the determined importance and the binary flag, any one or a combination of a removal quantization technique, a binomial quantization technique, a ternary quantization technique, or a cumulative exponent quantization technique (S2220).

Figure 22 illustrates an inverse quantization method for deep learning network decoding according to an embodiment of the present disclosure. The inverse quantization method includes obtaining quantization information for the encoded residual information (S2210); deriving from the quantization information which of the binomial quantization technique, the trinomial quantization technique, or the cumulative exponent quantization technique was applied to the encoded residual information (S2220); and deriving restored residual information by applying the inverse quantization technique corresponding to that quantization technique (S2230).
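The decoder-side dispatch of S2220 and S2230 can be sketched under a hypothetical record format, since the text does not define how the quantization information is serialized. In this sketch, each weight arrives as a tuple whose first field names the technique. Removed weights restore to 0. Binary and ternary weights restore to their stored sign average. Cumulative exponent weights restore to the signed sum of the stored powers of two.

```python
from typing import Any, List, Sequence, Tuple

# Hypothetical per-weight records produced by an encoder:
#   ("removal",)                      -> weight was pruned to 0
#   ("binary", mean) / ("ternary", m) -> the stored sign average
#   ("exponent", sign, [i1, i2, ...]) -> cumulative exponent terms
Record = Tuple[Any, ...]

def dequantize(records: Sequence[Record]) -> List[float]:
    """Restore residual values by applying the inverse of each record's technique."""
    out: List[float] = []
    for rec in records:
        kind = rec[0]
        if kind == "removal":
            out.append(0.0)
        elif kind in ("binary", "ternary"):
            out.append(float(rec[1]))
        elif kind == "exponent":
            sign, exponents = rec[1], rec[2]
            out.append(sign * sum(2.0 ** i for i in exponents))
        else:
            raise ValueError(f"unknown quantization technique: {kind!r}")
    return out

print(dequantize([("removal",), ("ternary", -1.5), ("exponent", 1, [-1, -3, -6, -7])]))
# [0.0, -1.5, 0.6484375]
```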

Although the above-described exemplary methods of the present disclosure are expressed as a series of operations for clarity of explanation, this is not intended to limit the order in which the steps are performed, and each step may be performed simultaneously or in a different order if necessary. To implement the method according to the present disclosure, other steps may be included in addition to the exemplified steps, some steps may be excluded while the remaining steps are included, or some steps may be excluded and additional steps included.

In the present disclosure, an encoding device or a decoding device that performs a predetermined operation (step) may perform an operation (step) of checking the conditions or situation for performing that operation (step). For example, when it is described that a predetermined operation is performed when a predetermined condition is satisfied, the encoding device or decoding device may check whether the predetermined condition is satisfied and then perform the predetermined operation.

The various embodiments of the present disclosure do not list all possible combinations but are intended to explain representative aspects of the present disclosure, and matters described in the various embodiments may be applied independently or in combination of two or more.

Additionally, various embodiments of the present disclosure may be implemented by hardware, firmware, software, or a combination thereof. For hardware implementation, they may be implemented by one or more ASICs (Application Specific Integrated Circuits), DSPs (Digital Signal Processors), DSPDs (Digital Signal Processing Devices), PLDs (Programmable Logic Devices), FPGAs (Field Programmable Gate Arrays), general-purpose processors, controllers, microcontrollers, microprocessors, and the like.

In addition, the decoding device and the encoding device to which embodiments of the present disclosure are applied may be included in multimedia broadcasting transmission and reception devices, mobile communication terminals, home cinema video devices, digital cinema video devices, surveillance cameras, video conversation devices, real-time communication devices such as video communication devices, mobile streaming devices, storage media, camcorders, video on demand (VoD) service provision devices, OTT (over-the-top) video devices, Internet streaming service provision devices, three-dimensional (3D) video devices, video phone devices, medical video devices, and the like, and may be used to process video signals or data signals. For example, OTT video devices may include game consoles, Blu-ray players, Internet-connected TVs, home theater systems, smartphones, tablet PCs, and DVRs (Digital Video Recorders).

Figure 23 is a diagram illustrating a content streaming system to which an embodiment according to the present disclosure can be applied. As shown in Figure 23, a content streaming system to which an embodiment of the present disclosure is applied may largely include an encoding server, a streaming server, a web server, a media storage, a user device, and a multimedia input device.

The encoding server compresses content input from multimedia input devices such as smartphones, cameras, CCTV, etc. into digital data, generates a bitstream, and transmits it to the streaming server. As another example, when multimedia input devices such as smartphones, cameras, CCTV, etc. directly generate bitstreams, the encoding server may be omitted.

The bitstream may be generated by an encoding method and/or an encoding device to which an embodiment of the present disclosure is applied, and the streaming server may temporarily store the bitstream in the process of transmitting or receiving the bitstream.

The streaming server transmits multimedia data to the user device based on a user request through a web server, and the web server can serve as a medium to inform the user of what services are available. When a user requests a desired service from the web server, the web server delivers it to a streaming server, and the streaming server can transmit multimedia data to the user. At this time, the content streaming system may include a separate control server, and in this case, the control server may control commands/responses between each device in the content streaming system.

The streaming server may receive content from a media repository and/or encoding server. For example, when receiving content from the encoding server, the content can be received in real time. In this case, in order to provide a smooth streaming service, the streaming server may store the bitstream for a certain period of time.

Examples of the user device include mobile phones, smartphones, laptop computers, digital broadcasting terminals, personal digital assistants (PDAs), portable multimedia players (PMPs), navigation devices, slate PCs, tablet PCs, ultrabooks, wearable devices (e.g., smartwatches, smart glasses, head mounted displays), digital TVs, desktop computers, digital signage, and the like.

Each server in the content streaming system may be operated as a distributed server, and in this case, data received from each server may be distributedly processed.

The scope of the present disclosure includes software or machine-executable instructions (e.g., an operating system, application, firmware, program, etc.) that cause operations according to the methods of the various embodiments to be executed on a device or computer, and a non-transitory computer-readable medium that stores such software or instructions so that they can be executed on a device or computer.

Embodiments of the present disclosure can be used for encoding or decoding deep learning networks.

Claims

A deep learning network encoding method using a standard normal distribution-based quantization technique, the method comprising:

quantizing a residual parameter; and

entropy encoding the quantized residual parameter,

wherein the quantizing of the residual parameter comprises determining an importance of the residual parameter based on a predefined threshold by using a standard normal distribution, and then selectively applying one or more of a plurality of quantization techniques based on the determined importance.
The method of claim 1,

wherein the quantizing of the residual parameter comprises:

determining a binary flag for selecting a quantization technique by using a mean and a standard deviation of the residual parameter;

converting the residual parameter into a standard normal distribution; and

determining the importance of the residual parameter from the standard normal distribution, and quantizing the residual parameter by applying, according to the determined importance and the binary flag, one of a removal quantization technique, a binomial quantization technique, a ternary quantization technique, or a cumulative exponent quantization technique, or a combination thereof.
3. The deep learning network encoding method of claim 2, wherein the residual parameter is a residual weight of a deep learning network generated in federated learning.
4. The deep learning network encoding method of claim 2, wherein quantizing the residual parameter further comprises reducing the dimensionality of the residual parameter to be quantized such that the reshaped residual parameter is one-dimensional.
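Reducing a multi-dimensional residual (for example a convolution kernel) to one dimension, while keeping enough information to restore it after decoding, might look like the sketch below. It assumes the residual is a rectangular nested Python list stored in row-major order; `flatten` and `unflatten` are hypothetical helper names.

```python
def _walk(node, out):
    """Append the leaves of a nested list to out in row-major order."""
    if isinstance(node, list):
        for child in node:
            _walk(child, out)
    else:
        out.append(node)


def flatten(tensor):
    """Flatten a rectangular nested list to 1-D and record its shape."""
    shape = []
    probe = tensor
    while isinstance(probe, list):
        shape.append(len(probe))
        probe = probe[0] if probe else None
    flat = []
    _walk(tensor, flat)
    return flat, tuple(shape)


def unflatten(flat, shape):
    """Inverse of flatten: rebuild the nested list from row-major values."""
    if len(shape) == 1:
        return list(flat)
    if shape[0] == 0:
        return []
    step = len(flat) // shape[0]
    return [unflatten(flat[i * step:(i + 1) * step], shape[1:])
            for i in range(shape[0])]


# Usage: a 2x2x2 kernel becomes an 8-element vector and back.
kernel = [[[1, 2], [3, 4]], [[5, 6], [7, 8]]]
flat, shape = flatten(kernel)
assert unflatten(flat, shape) == kernel
```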
5. The deep learning network encoding method of claim 2, wherein the removal quantization technique is selected when the absolute value of the value (z) normalized by the standard normal distribution is smaller than a preset first threshold.
6. The deep learning network encoding method of claim 5, wherein the preset first threshold is 1.
7. The deep learning network encoding method of claim 6, wherein the removal quantization technique replaces every residual parameter to which it is applied with 0.
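Removal quantization with the first threshold of 1 reduces to zeroing every residual whose z-score magnitude is below the threshold. The sketch below assumes one layer's residuals as a flat list and the population standard deviation; `removal_quantize` is a hypothetical name.

```python
import statistics


def removal_quantize(values, t1=1.0):
    """Replace every residual with |z| < t1 by 0; keep the rest unchanged."""
    mu = statistics.fmean(values)
    sigma = statistics.pstdev(values)  # population std (assumption)
    out = []
    for v in values:
        z = 0.0 if sigma == 0 else (v - mu) / sigma
        out.append(0.0 if abs(z) < t1 else v)
    return out


# Usage: the four small residuals fall within one standard deviation
# of the mean and are removed; the large one survives.
print(removal_quantize([0.1, -0.1, 0.0, 0.05, 5.0]))
```

The surviving values are then handed to the binomial, ternary, or cumulative exponential techniques according to their band.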
8. The deep learning network encoding method of claim 2, wherein one of the binomial quantization technique and the ternary quantization technique is selected when the absolute value of the value (z) normalized by the standard normal distribution falls within a preset interval.
9. The deep learning network encoding method of claim 8, wherein the preset interval is the interval between 1 and 2.
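The claims name a two-level ("binomial") and a three-level ("ternary") technique for the middle band but do not define them in this section. The sketch below substitutes the common scaled-sign rules from the binary and ternary weight quantization literature: binomial maps each value to ±α with α the mean magnitude, and ternary zeroes values below Δ = 0.7 · mean|v| and maps the rest to ±α. Both rules, the 0.7 ratio, and the function names are assumptions, not the patent's definitions.

```python
def binary_quantize(values):
    """Two-level stand-in: map each value to +alpha or -alpha.

    alpha is the mean absolute value of the inputs (assumed rule).
    """
    if not values:
        return [], 0.0
    alpha = sum(abs(v) for v in values) / len(values)
    return [alpha if v >= 0 else -alpha for v in values], alpha


def ternary_quantize(values, ratio=0.7):
    """Three-level stand-in: map each value to -alpha, 0, or +alpha.

    Values with |v| <= delta become 0, where delta = ratio * mean|v|;
    alpha is the mean magnitude of the values above delta (assumed rule).
    """
    if not values:
        return [], 0.0
    delta = ratio * sum(abs(v) for v in values) / len(values)
    big = [abs(v) for v in values if abs(v) > delta]
    alpha = sum(big) / len(big) if big else 0.0
    out = [0.0 if abs(v) <= delta else (alpha if v > 0 else -alpha)
           for v in values]
    return out, alpha


# Usage
print(binary_quantize([0.5, -1.5, 1.0]))        # ([1.0, -1.0, 1.0], 1.0)
print(ternary_quantize([0.1, -2.0, 2.0, 0.3]))  # ([0.0, -2.0, 2.0, 0.0], 2.0)
```

Only α (one float) plus one or two bits per value need to be signalled, which is what makes these techniques attractive for the moderately important band.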
10. The deep learning network encoding method of claim 8, wherein quantizing the residual parameter further comprises calculating a probability density function for each layer of the deep learning network, wherein the mean and standard deviation are calculated from the normal distribution of the probability density function to determine the binary flag for selecting either the binomial quantization technique or the ternary quantization technique.
11. The deep learning network encoding method of claim 10, wherein, when the absolute value of the difference between the mean and the standard deviation of the residual parameters is greater than 0, the binary flag is set to True and the binomial quantization technique is selected.
12. The deep learning network encoding method of claim 10, wherein, when the absolute value of the difference between the mean and the standard deviation of the residual parameters is less than 0, the binary flag is set to False and the ternary quantization technique is selected.
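The per-layer statistics and binary flag can be sketched as below, fitting a normal distribution to each layer's residuals with `statistics.NormalDist.from_samples` (which uses the sample mean and sample standard deviation). The flag follows the "greater than 0" condition literally. Because an absolute value can never be less than 0, the sketch treats every case that fails that condition (μ exactly equal to σ) as False. This reading, the dictionary layout, and the function name are assumptions.

```python
from statistics import NormalDist


def layer_statistics(layers):
    """Fit a normal distribution per layer and derive the binary flag.

    Returns {layer_name: (mean, std, flag)}, where flag True selects the
    binomial technique and False selects the ternary technique.
    """
    result = {}
    for name, values in layers.items():
        dist = NormalDist.from_samples(values)  # needs at least two values
        mu, sigma = dist.mean, dist.stdev
        # Literal reading: |mu - sigma| > 0 sets the flag to True.
        # Every remaining case (mu == sigma) is treated as False here.
        flag = abs(mu - sigma) > 0
        result[name] = (mu, sigma, flag)
    return result


# Usage
stats = layer_statistics({"fc1": [1.0, 3.0], "fc2": [0.2, -0.1, 0.05]})
for name, (mu, sigma, flag) in stats.items():
    print(name, round(mu, 4), round(sigma, 4), "binomial" if flag else "ternary")
```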
13. The deep learning network encoding method of claim 2, wherein the cumulative exponential quantization technique is selected when the absolute value of the value (z) normalized by the standard normal distribution is greater than a preset second threshold.
14. The deep learning network encoding method of claim 13, wherein the preset second threshold is 2.
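The cumulative exponential technique for the most important residuals (|z| > 2) is not defined in this section. As a clearly labelled substitute, the sketch below uses power-of-two (logarithmic) quantization: each value keeps its sign and an integer exponent e, and is reconstructed as ±2^e. Whether the patent's technique works this way is an assumption; the function names are hypothetical.

```python
import math


def pow2_quantize(values):
    """Stand-in for 'cumulative exponential' quantization.

    Stores (sign, exponent) so that |v| is approximated by 2**exponent;
    zero is stored as (0, None).
    """
    codes = []
    for v in values:
        if v == 0:
            codes.append((0, None))
        else:
            e = round(math.log2(abs(v)))  # nearest exponent in the log domain
            codes.append((1 if v > 0 else -1, e))
    return codes


def pow2_dequantize(codes):
    """Rebuild values from (sign, exponent) pairs."""
    return [0.0 if e is None else s * 2.0 ** e for s, e in codes]


# Usage
codes = pow2_quantize([3.0, -0.3, 8.0])
print(codes)                   # [(1, 2), (-1, -2), (1, 3)]
print(pow2_dequantize(codes))  # [4.0, -0.25, 8.0]
```

A log-domain code keeps relative error bounded across a wide dynamic range, which suits outliers whose magnitudes vary greatly.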
15. A deep learning network decoding method for a network encoded using a standard normal distribution-based quantization technique, the method comprising:

an entropy decoding step of obtaining residual parameters to be inverse-quantized and quantization information; and

an inverse quantization step of inverse-quantizing the residual parameters,

wherein the inverse quantization step comprises deriving, from the obtained quantization information, which of a plurality of quantization techniques was applied to the encoded residual parameter, and deriving a restored residual parameter by applying the inverse quantization technique corresponding to that quantization technique according to the result of the derivation.
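The decoder-side dispatch can be sketched as a lookup on the signalled technique. The layout of the quantization information (a dict with a `technique` key and, where needed, a scale `alpha`), the technique strings, and the symbol formats are all hypothetical; the cumulative exponential branch assumes the power-of-two stand-in described for the encoder side.

```python
def dequantize(symbols, info):
    """Restore residual values from decoded symbols.

    info["technique"] names the quantization technique that was applied
    (hypothetical format); scale factors travel in the same dict.
    """
    technique = info["technique"]
    if technique == "removal":
        # Removed residuals are restored as zeros.
        return [0.0] * len(symbols)
    if technique in ("binomial", "ternary"):
        # Symbols are +1/-1 (binomial) or -1/0/+1 (ternary) times a scale.
        return [s * info["alpha"] for s in symbols]
    if technique == "cum_exp":
        # Symbols are (sign, exponent) pairs; None marks an exact zero.
        return [0.0 if e is None else s * 2.0 ** e for s, e in symbols]
    raise ValueError(f"unknown quantization technique: {technique}")


# Usage
print(dequantize([1, -1, 0], {"technique": "ternary", "alpha": 0.5}))
print(dequantize([(1, 3), (-1, -1)], {"technique": "cum_exp"}))
```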
16. The deep learning network decoding method of claim 15, wherein the plurality of quantization techniques include a removal quantization technique, a binomial quantization technique, a ternary quantization technique, and a cumulative exponential quantization technique.
17. A deep learning network encoding method that performs federated learning through a plurality of clients, the method comprising:

generating residual information, which is the difference between an update model additionally trained by a client and a reference model; and

quantizing the residual information,

wherein quantizing the residual information comprises determining the importance of the residual information based on a predefined threshold using a standard normal distribution, and then selectively applying one or more of a plurality of quantization techniques based on the determined importance.
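On the client side, residual information is simply the per-layer difference between the additionally trained update model and the reference model; adding it back to the reference recovers the update. The sketch below assumes models are dicts mapping layer names to flat lists of weights; the function names are hypothetical.

```python
def make_residual(update, reference):
    """Residual information: update model minus reference model, per layer."""
    if update.keys() != reference.keys():
        raise ValueError("update and reference models must share layer names")
    return {name: [u - r for u, r in zip(update[name], reference[name])]
            for name in reference}


def apply_residual(reference, residual):
    """Reconstruct the updated model by adding the residual to the reference."""
    return {name: [r + d for r, d in zip(reference[name], residual[name])]
            for name in reference}


# Usage
reference = {"fc": [1.0, 2.5]}
update = {"fc": [1.5, 2.0]}
res = make_residual(update, reference)   # {'fc': [0.5, -0.5]}
assert apply_residual(reference, res) == update
```

Only the residual, typically concentrated near zero, is quantized and transmitted, which is why the z-score based removal step is effective here.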
18. A deep learning network system that performs federated learning, the system comprising:

a plurality of clients, each generating residual information, which is the difference between an additionally trained update model and a reference model; and

a central server that receives the residual information generated by the plurality of clients, generates supplemented residual information, and transmits it to the plurality of clients,

wherein the residual information generated by the plurality of clients or the supplemented residual information generated by the central server is quantized by determining the importance of the residual information based on a predefined threshold using a standard normal distribution, and then selectively applying one or more of a plurality of quantization techniques based on the determined importance.
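The claims do not specify how the central server produces the "supplemented" residual information. As an assumed stand-in, the sketch below averages the clients' residuals element-wise per layer, in the style of federated averaging; the real combination rule may differ, and `aggregate` is a hypothetical name.

```python
def aggregate(client_residuals):
    """Average per-layer residuals from several clients (FedAvg-style stand-in).

    client_residuals is a non-empty list of {layer_name: [values]} dicts
    that all share the same layers and lengths.
    """
    if not client_residuals:
        raise ValueError("need at least one client residual")
    n = len(client_residuals)
    names = client_residuals[0].keys()
    return {
        name: [sum(vals) / n for vals in zip(*(c[name] for c in client_residuals))]
        for name in names
    }


# Usage: two clients, one layer.
print(aggregate([{"fc": [1.0, 2.0]}, {"fc": [3.0, 0.0]}]))  # {'fc': [2.0, 1.0]}
```

The averaged residual would then pass through the same z-score based quantizer before being sent back to the clients.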
19. A quantization method for deep learning network encoding, the method comprising:

determining a binary flag for selecting a quantization technique using the mean and standard deviation of the residual information to be quantized;

converting the residual information into a standard normal distribution; and

determining the importance of the residual information from the standard normal distribution, and quantizing the residual information by applying one of, or a combination of, a removal quantization technique, a binomial quantization technique, a ternary quantization technique, and a cumulative exponential quantization technique, using the determined importance and the binary flag.
20. An inverse quantization method for deep learning network decoding, the method comprising:

obtaining quantization information for encoded residual information;

deriving, from the quantization information, which of a binomial quantization technique, a ternary quantization technique, and a cumulative exponential quantization technique was applied to the encoded residual information; and

deriving restored residual information by applying the inverse quantization technique corresponding to that quantization technique according to the result of the derivation.