TWI729939B

TWI729939B - Method and processor for decompression of model parameters using functions based upon cumulative count distributions

Info

Publication number: TWI729939B
Application number: TW109132752A
Authority: TW
Inventors: 強納森亞歷山德羅斯; 丹尼斯查爾斯艾伯茲
Original assignee: 美商葛如克公司
Priority date: 2019-03-22
Filing date: 2019-03-22
Publication date: 2021-06-01
Also published as: TW202103064A

Abstract

A predictive model utilizes a set of coefficients for processing received input data. To reduce the memory used to store the coefficients, a compression circuit compresses the set of coefficients prior to storage by generating a cumulative count distribution of the coefficient values and identifying a distribution function approximating the cumulative count distribution. Function parameters for the determined function are stored in a memory and used by a decompression circuit to apply the function to the compressed coefficients to determine the decompressed coefficient values. Storing the function parameters may consume less memory than storing a look-up table for decompression, and may reduce the number of memory look-ups required during decompression.

Description

Method and processor for decompression of model parameters using functions based on cumulative count distribution

The present invention relates to a method and a processor for decompression, and in particular to a method and a processor for decompression of model parameters using functions based on cumulative count distributions.

The present invention relates generally to the decompression of model parameters, and specifically to the decompression of model parameters used for a neural network.

Neural networks and other types of models can be used to process various types of data. For example, a neural network model may be trained to recognize whether certain types of objects are present within received input images. Training and machine learning can be used to determine a set of coefficients, such as the weights between the neurons of a neural network model, to be used by the model for processing input data.

A predictive model (e.g., a neural network model) may be used in conjunction with a set of coefficients for the model. The set of coefficients may be stored in a memory and accessed for performing arithmetic operations on input data (e.g., an image to be analyzed by the model).

To reduce memory usage, the set of coefficients is compressed prior to storage. The stored compressed coefficients need to be decompressed before they can be used to operate on the input data. In some embodiments, the determined coefficient values are compressed based on a function. The function is generated based on a cumulative count distribution of the decompressed coefficient values. For example, the count values of a set of model coefficients may approximate a bimodal distribution, a Gaussian distribution, a Poisson distribution, or another type of distribution that can be defined by a function. Function parameters for the determined function may be stored in a memory and used by a decompression circuit to apply the function to the compressed model coefficients for decompression. Compared with other decompression methods (e.g., a look-up table), storing the function parameters may consume less memory and may also reduce the number of memory look-ups required during decompression.

In some embodiments, a method for decompressing model coefficient values is provided. The method comprises receiving compressed coefficient data associated with a model. In some embodiments, the values of the coefficient data are determined through a model training process, and the coefficient data is compressed using a compression function based on a cumulative distribution of the values of the coefficient data. The method further comprises retrieving a set of function parameters associated with the compression function, the set of function parameters specifying at least a function type. The method further comprises configuring a decompression circuit based on the retrieved function parameters. The method further comprises decompressing, at the decompression circuit, the compressed coefficient data based on the function parameters to produce decompressed coefficient values. The method further comprises applying the model to received input data by performing arithmetic operations on the received input data based on the decompressed coefficient values.

A predictive model (e.g., a neural network model) may utilize a set of coefficients when processing received input data. For example, for a neural network model, the set of coefficients may correspond to the weights between different neurons of the neural network. The set of coefficients may be stored in a memory and accessed for performing arithmetic operations on input data (e.g., an image to be analyzed by the model).

To reduce memory usage, the set of coefficients is compressed prior to storage. The stored compressed coefficients need to be decompressed before they can be used to operate on the input data. A look-up table may be used to map compressed coefficient values to decompressed coefficient values. However, a look-up table may require a large amount of memory for storage, particularly when the range of coefficients is large. In addition, in some embodiments, different types of compression may be performed on different subsets of coefficients, in which case multiple look-up tables would need to be stored.

In some embodiments, the determined coefficient values are compressed based on a function. The function is generated based on a cumulative count distribution of the decompressed coefficient values. For example, the count values of a set of model coefficients may approximate a bimodal distribution, a Gaussian distribution, a Poisson distribution, or another type of distribution that can be defined by a function. Function parameters for the determined function may be stored in a memory and used by a decompression circuit to apply the function to the compressed model coefficients for decompression. Storing the function parameters may consume less memory than storing a look-up table for decompression. In addition, the amount of storage needed for the function parameters of a determined function is independent of the range of coefficient values and of the number of different possible coefficient values.

Using a decompression function also reduces the number of memory look-ups required during decompression. For example, the function parameters of a given function need to be looked up only once, when the decompression circuit begins decompression, and are then used to decompress a large number of coefficients that were compressed using that function. In contrast, decompression using a look-up table typically requires a memory look-up for each coefficient to be decompressed.
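As a non-limiting illustration of the comparison above, the following sketch tallies what a table-based decoder and a function-based decoder would each store and look up. The helper names are hypothetical, and the table size of one entry per possible code of `code_bits` bits is an assumption rather than something specified in this description.

```python
def lookup_table_cost(code_bits, num_coefficients):
    """Assumed cost model: one stored entry per possible code, one lookup per coefficient."""
    return {"stored_entries": 2 ** code_bits, "lookups": num_coefficients}


def function_cost(num_function_parameters, num_coefficients):
    """Function parameters are fetched once, regardless of how many coefficients follow."""
    return {"stored_entries": num_function_parameters, "lookups": 1}


# Hypothetical workload: one million coefficients, 16-bit codes, and ten function
# parameters (for example, a function type plus nine polynomial coefficients).
table = lookup_table_cost(code_bits=16, num_coefficients=1_000_000)
function = function_cost(num_function_parameters=10, num_coefficients=1_000_000)
```

Under these assumptions the table cost grows with both the code width and the number of coefficients, while the function cost depends on neither.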

FIG. 1 illustrates a schematic diagram of a system for storing and decompressing model coefficients for use in a model, according to some embodiments. A tensor streaming processor (TSP) 100 or other type of processor is configured to receive and process input data values 102 (e.g., from an input image) based on a stored model to produce output data values (e.g., a classification of the input image, an identification of certain types of objects or characteristics in the input data, and/or the like). The TSP 100 may be an integrated circuit (IC). In some embodiments, the input data values 102 may be input values stored in the memory 108 that represent the results of arithmetic operations performed elsewhere within the TSP 100.

The TSP 100 operates on the input data values 102 using one or more arithmetic circuit units and one or more model coefficients. The arithmetic circuit units include logic circuits that perform arithmetic operations on the input values 102 and the model coefficients, and generate output data values representing a result of the arithmetic operations. For example, an arithmetic circuit unit may perform a matrix multiplication operation on the input values using the model coefficients, and generate output data values representing the matrix product. Execution of a predictive model 118, such as a neural network, can often be implemented as several successive stages of matrix multiplication. In other embodiments, the arithmetic operations of the arithmetic circuit units may include a convolution operation, a dot product operation, a fast Fourier transform (FFT) operation, and/or other arithmetic operations. The arithmetic circuit unit 106 may perform operations using single instruction multiple data (SIMD) processing.

The TSP 100 comprises a memory 108 that stores compressed model coefficients 112 used by the arithmetic units to operate on the input data values 102. The compressed model coefficients 112 may be generated from the predictive model 118 by the compiler 120. The predictive model 118 may correspond to any type of model that utilizes a set of coefficients. In some embodiments, the set of coefficients is determined through a machine learning or training process. For example, in some embodiments, the predictive model 118 is a convolutional neural network (CNN) or other type of neural network model.

Once the predictive model 118 has been constructed or sufficiently trained, the model 118 may be compiled by a compiler 120 for use by the TSP 100 in processing the input data values 102. The compiler 120 analyzes the coefficient values of the predictive model 118 and selects one or more compression schemes for compressing the coefficient values of the model. The compressed coefficient values are then stored in the memory 108 as the compressed model coefficients 112.

To be used by the arithmetic circuit units to operate on the input data values 102, the compressed model coefficients 112 associated with the model need to be decompressed. A decompression circuit is configured to receive the compressed model coefficients 112 from the memory 108 and to output decompressed model coefficients that can be operated on by the arithmetic units.

In some embodiments, the compiler 120 selects a compression scheme for the coefficients of the predictive model 118 based on a function derived from a distribution of the coefficient values associated with the model. For example, in many cases, the coefficient values of the model may have a distribution that is a bimodal distribution, a Gaussian distribution, or a Poisson distribution. The compiler 120 determines a function type that best fits the distribution of the model coefficients, and stores the parameters of the determined function in the memory 108 as function parameters 114. The function parameters 114 may indicate a function type associated with the distribution, as well as the values of the coefficients of the function and/or other parameters related to the function. In some embodiments, the types of function parameters that are stored depend on the function type.

The decompression circuit supports several possible functions for decompressing the compressed model coefficients 112. The decompression circuit decompresses the compressed model coefficients 112 by applying the specific function defined by the function parameters 114 to the compressed model coefficients 112 to determine the decompressed model coefficients.

Performing decompression using a function may reduce the amount of memory needed to store the data used for decompression (e.g., compared with a look-up table). In addition, the amount of memory needed to store the function parameters may be independent of the range of coefficient values and of the number of different possible coefficient values. Using a decompression function also reduces the number of memory look-ups required during decompression. For example, the function parameters 114 may represent a small, constant amount of memory that is looked up once at the start of decompression and can then be used to decompress a long string of data comprising many coefficients. In contrast, decompression using a look-up table typically requires a memory look-up for each coefficient to be decompressed.

In some embodiments, the memory 108 may store the compressed model coefficients of the predictive model 118 as a plurality of different coefficient sets (e.g., a first set of compressed model coefficients 112A and a second set of compressed model coefficients 112B). Each set of compressed model coefficients 112 may have been compressed based on a different function (e.g., a first function associated with first function parameters 114A and a second function associated with second function parameters 114B) using arithmetic or Huffman coding. In some embodiments, different decompression circuits (e.g., decompression circuits 110A and 110B) may be used to decompress the different sets of compressed model coefficients compressed using the different functions, to produce different sets of decompressed model coefficients (e.g., decompressed model coefficients 116A and 116B). The output decompressed model coefficients 116A and 116B may be operated on by multiple arithmetic units (e.g., arithmetic units 106A and 106B) to produce multiple sets of output data values (e.g., output data 104A and 104B).

In some embodiments, multiple functions may be used to decompress the compressed model coefficients. For example, when compressing the model coefficients, the compiler may divide the coefficients into one or more subsets and determine, for each subset, a function and parameters corresponding to the distribution of the coefficient values in that subset.

FIG. 2 illustrates a block diagram of a set of compressed model coefficients that may be decompressed using different functions. In some embodiments, the compressed model coefficients may be transmitted from the memory to a decompression circuit over a plurality of bit channels (e.g., bit channels 0 through n). The decompression circuit may decompress the incoming coefficient data on each bit channel using a different function (e.g., functions f_0 through f_n). For example, the decompression circuit may receive multiple sets of function parameters corresponding to functions f_0 through f_n, each function being used to decompress the coefficient data received over a corresponding bit channel.

Although FIG. 2 illustrates the decompression circuit applying a different function to each bit channel, it should be understood that in other embodiments a common function may be applied to multiple bit channels. In addition, the functions used by the decompression circuit to decompress the compressed model coefficients may be configured to change over time. For example, at a time t=0, the decompression circuit may use functions f_0 through f_n for decompression. At a later time t=t₁, however, the decompression circuit may receive different sets of function parameters that change the function used to decompress one or more of the bit channels. In some embodiments, the compiler, when compressing the model coefficients to be stored in the memory of the TSP, determines which functions are used for which bit channels, and at which times, to compress and decompress the model coefficients.
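As a non-limiting illustration, the time-varying assignment of functions to bit channels described above might be modelled as a schedule of updates. The schedule layout, the time values, and the function labels below are hypothetical placeholders; this description does not specify how the compiler encodes such a schedule.

```python
# Hypothetical schedule emitted by a compiler: (start time, {bit channel: function label}).
# Each entry overrides only the channels it names; the other channels keep their function.
SCHEDULE = [
    (0, {0: "f_0", 1: "f_1", 2: "f_2"}),
    (100, {1: "g_1"}),  # at t = 100 only bit channel 1 is reconfigured
]


def active_functions(t):
    """Return the function label in effect on each bit channel at time t."""
    active = {}
    for start, update in SCHEDULE:
        if start <= t:
            active.update(update)
    return active
```

Mapping two channels to the same label would model the case in which a common function is applied to multiple bit channels.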

In some embodiments, when certain compression schemes are used, some coefficient values may occupy a larger number of bits when compressed than when uncompressed. The compiler may therefore determine that these coefficient values need not be compressed. During decompression, the decompression circuit may be configured to apply an identity function to these coefficient values. Alternatively, the decompression circuit may be bypassed.

FIG. 3A illustrates an exemplary graph showing a distribution of model coefficients, according to some embodiments. The graph 300 has an x-axis corresponding to coefficient values and a y-axis corresponding to count values. Although the x-axis of the graph 300 shows only integer coefficient values, it should be understood that the coefficient values of a model may be represented using integers, floating-point numbers, fixed-point numbers, and/or the like.

The graph 300 contains a first curve 302 showing a distribution of the coefficient values of a particular model. After a set of coefficients for the model has been generated (e.g., through a training process), the number of coefficients in the set having each value is counted. In many cases, the number of coefficients having each value will approximate a common distribution, such as a bimodal distribution, a Gaussian distribution, a Poisson distribution, and/or the like. For example, as illustrated by the first curve 302, the coefficient values of the particular model have a roughly bimodal distribution, in which the largest numbers of coefficients have values of -2 or 2.

The graph 300 also illustrates a second curve 304 indicating a cumulative distribution of the coefficient values of the model. For each coefficient value represented on the x-axis of the graph 300, the cumulative distribution curve 304 indicates the total number of coefficients that are less than or equal to that value. The cumulative distribution of a set of coefficients is therefore monotonically increasing, which allows a unique coefficient value to be derived from a given count value using a function of the distribution.
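As a non-limiting illustration, a cumulative count distribution of the kind described above can be computed as follows. The coefficient set is hypothetical, chosen only to be roughly bimodal with peaks at -2 and 2, and the helper name is likewise an assumption.

```python
from collections import Counter
from itertools import accumulate


def cumulative_counts(coefficients):
    """Return the sorted distinct values and, for each, the number of coefficients <= it."""
    counts = Counter(coefficients)
    values = sorted(counts)
    running_totals = list(accumulate(counts[v] for v in values))
    return values, running_totals


# Hypothetical coefficient set (16 coefficients, peaks at -2 and 2).
coeffs = [-3] * 2 + [-2] * 4 + [-1] * 2 + [0] + [1] * 2 + [2] * 4 + [3]
values, cumulative = cumulative_counts(coeffs)

# Because the running totals never decrease, a count maps back to a single value.
is_monotonic = all(a <= b for a, b in zip(cumulative, cumulative[1:]))
```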

The coefficient values of the model are compressed by the compiler based on a function fitted to the cumulative count distribution of the coefficients. In some embodiments, the compiler may first select a function type based on the cumulative count distribution, and then determine function parameters for the selected function type that achieve a best fit between the function type and the cumulative count distribution. For example, the third curve 306 illustrated in the graph 300 corresponds to a polynomial function that may be selected by the compiler to approximate the cumulative count distribution 304. As illustrated in FIG. 3, the polynomial function corresponding to the third curve 306 may be an eighth-order polynomial function. In some embodiments, the function may be based on an integral of a function approximating the count distribution of the coefficient values (curve 302).
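As a non-limiting illustration of determining function parameters, the sketch below fits a Gaussian function type to a hypothetical, bell-shaped coefficient set and measures how closely its scaled cumulative function tracks the empirical cumulative counts. Two simplifications are assumptions of this sketch: the parameters are estimated from the sample mean and standard deviation rather than by a best-fit search over the cumulative curve, and a Gaussian stands in for the eighth-order polynomial of the example above.

```python
from statistics import NormalDist

# Hypothetical, roughly bell-shaped set of integer coefficients.
coeffs = [-3, -2, -2, -1, -1, -1, 0, 0, 0, 0, 1, 1, 1, 2, 2, 3]

# Function parameters for a Gaussian function type: mean and standard deviation.
fitted = NormalDist.from_samples(coeffs)


def empirical_cumulative(value):
    """Number of coefficients less than or equal to value."""
    return sum(1 for c in coeffs if c <= value)


def modelled_cumulative(value):
    """Scaled Gaussian CDF; +0.5 treats integer v as the interval [v - 0.5, v + 0.5)."""
    return len(coeffs) * fitted.cdf(value + 0.5)


# Largest gap between the fitted function and the empirical cumulative counts.
worst_error = max(
    abs(empirical_cumulative(v) - modelled_cumulative(v)) for v in range(-3, 4)
)
```

A compiler comparing several candidate function types could, under these assumptions, keep the type whose `worst_error` (or another fit metric) is smallest.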

In some embodiments, the compiler compresses the coefficient values based on the determined function using arithmetic coding. For example, as illustrated in FIG. 3B, the count values of the function are fitted to a range between 0 and 1, where 0 is represented by the binary sequence 0000… and 1 is represented by the binary sequence 1111…. As a result, more frequently occurring coefficients (e.g., coefficients having higher count values) are represented by short bit sequences, and less frequently occurring coefficients are represented by long bit sequences.
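As a non-limiting illustration of fitting count values to the range between 0 and 1, the sketch below assigns each coefficient value a sub-interval whose width is proportional to its count. The counts are hypothetical, and exact counts are used in place of a fitted function to keep the arithmetic simple.

```python
from fractions import Fraction
from itertools import accumulate

# Hypothetical count of coefficients having each value (16 coefficients in total).
COUNTS = {-3: 2, -2: 4, -1: 2, 0: 1, 1: 2, 2: 4, 3: 1}


def unit_intervals(counts):
    """Map each value to a half-open interval [low, high) of the range [0, 1)."""
    total = sum(counts.values())
    values = sorted(counts)
    uppers = [Fraction(c, total) for c in accumulate(counts[v] for v in values)]
    lowers = [Fraction(0)] + uppers[:-1]
    return {v: (low, high) for v, low, high in zip(values, lowers, uppers)}


intervals = unit_intervals(COUNTS)
widths = {v: high - low for v, (low, high) in intervals.items()}
```

Values with higher counts receive wider intervals, which is what allows them to be identified by shorter bit sequences.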

In some embodiments, each coefficient value may correspond to a value interval, based on the intervals between adjacent coefficient values. The interval for each coefficient value may be determined based on a rounding scheme, a ceiling function, a floor function, and/or the like. For example, in an embodiment in which the coefficient values are integers and a floor function is used to determine the value intervals, the coefficient value 1 may correspond to the interval [1, 2), the coefficient value 2 may correspond to the interval [2, 3), and so on.
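As a non-limiting illustration, the interval selection schemes mentioned above can be sketched for integer coefficient values as follows. The function name and the scheme labels are hypothetical, and rounding half up is one assumed choice of rounding scheme.

```python
import math


def select_coefficient(value, scheme="floor"):
    """Map a real-valued function output to an integer coefficient value."""
    if scheme == "floor":
        return math.floor(value)        # coefficient v covers [v, v + 1)
    if scheme == "ceiling":
        return math.ceil(value)         # coefficient v covers (v - 1, v]
    if scheme == "round":
        return math.floor(value + 0.5)  # coefficient v covers [v - 0.5, v + 0.5)
    raise ValueError(f"unknown interval selection scheme: {scheme}")
```

With the floor scheme, every value in [1, 2) selects the coefficient 1; with the rounding scheme, every value in [1.5, 2.5) selects the coefficient 2.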

Each interval may correspond to a range of binary sequence values (as determined using the function), and each coefficient value is coded using a bit sequence that represents the range of binary sequence values corresponding to its interval. Because coefficient values with high counts will generally correspond to larger ranges of binary sequence values, they can be compressed using a smaller number of bits.

For example, assume that the coefficient values are integers, that the coefficient value 0 corresponds to the interval [-0.5, 0.5), and that the coefficient value 2 corresponds to the interval [1.5, 2.5). As illustrated in FIG. 3B, based on the function 306, bit sequences within the range 308 map to the coefficient value 0, while bit sequences within the range 310 map to the coefficient value 2. Because the range 310 spans a larger range of bit sequences, the bit sequences of that range can generally be represented using a smaller number of common bits than the bit sequences of the range 308. Consequently, the coefficient value 2 (which has a higher count than the coefficient value 0, as illustrated in FIG. 3A) is represented using fewer bits when compressed than the coefficient value 0. For example, the range 308 spans the sequence 1000…, while the range 310 may span the binary sequences 1011… through 1110…. The coefficient value 0 may therefore be represented using the bit sequence 1000 (4 bits), while the coefficient value 2 may be represented using the bit sequence 110 (3 bits). It should be understood that, in some embodiments, the bit sequence representing a compressed coefficient value need not represent all of the bit sequences within the range corresponding to the interval associated with that value, as long as it does not represent any bit sequence in the ranges corresponding to the intervals of other coefficient values.
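As a non-limiting illustration of the example above, the sketch below searches for the shortest bit sequence all of whose extensions fall inside a given range. The ranges are expressed as fractions of the range between 0 and 1 at four-bit resolution (range 308 as [8/16, 9/16) and range 310 as [11/16, 15/16)), which is an assumption read off the bit sequences quoted above; the helper name is hypothetical.

```python
from fractions import Fraction


def shortest_prefix(low, high, max_bits=16):
    """Shortest bit string whose every extension lies in [low, high), or None."""
    for n in range(1, max_bits + 1):
        width = Fraction(1, 2 ** n)
        # Index of the first n-bit cell starting at or after `low` (ceiling division).
        k = -((-low) // width)
        # The cell [k * width, (k + 1) * width) must end at or before `high`.
        if (k + 1) * width <= high:
            return format(k, f"0{n}b")
    return None


code_for_0 = shortest_prefix(Fraction(8, 16), Fraction(9, 16))    # range 308
code_for_2 = shortest_prefix(Fraction(11, 16), Fraction(15, 16))  # range 310
```

Under these assumptions the search yields 1000 for the coefficient value 0 and 110 for the coefficient value 2. The sequence 110 covers only 1100… and 1101…, not the whole of range 310, consistent with the remark that a code need not represent every bit sequence in its range.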

FIG. 4A illustrates a block diagram of a decompression circuit, according to some embodiments. The decompression circuit 400 may correspond to the decompression circuit illustrated in FIG. 1, and is configured to receive compressed coefficient values 402 and output decompressed coefficient values 404. In some embodiments, the decompression circuit decompresses the bits received from the compressed coefficient values 402 using arithmetic coding techniques and a function associated with the coefficient values 402.

The decompression circuit receives a sequence of one or more bits of the compressed coefficient values 402 at a sequence extender circuit 406, which generates a high bit sequence 408 and a low bit sequence 410 from the received bit sequence. As used herein, the high bit sequence 408 corresponds to the received bit sequence with a plurality of binary "1" values appended, and the low bit sequence 410 corresponds to the received bit sequence with a plurality of binary "0" values appended. For example, for the received bit sequence "10", the high bit sequence is "10111…" and the low bit sequence is "10000…".
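As a non-limiting illustration, the sequence extension described above can be sketched on bit strings. The fixed `width` is a hypothetical precision; the description leaves the number of appended bits open ("…").

```python
def extend(bits, width=8):
    """Return (high, low): the received bits padded to `width` with 1s and with 0s."""
    high = bits.ljust(width, "1")
    low = bits.ljust(width, "0")
    return high, low


high, low = extend("10", width=5)
```

Here `high` is "10111" and `low` is "10000", matching the example for the received bit sequence "10".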

The decompression function circuit 414 determines a function to be used for decompression based on one or more received function parameters 412. For example, FIG. 4B illustrates an exemplary decompression function circuit containing function calculation circuits corresponding to different function types, according to some embodiments. The decompression function circuit 414 includes several function calculation circuits, each of which implements a different type of function for calculating an output value from an input value. For example, as illustrated in FIG. 4B, the function calculation circuits may comprise a first function calculation circuit 450a corresponding to a polynomial function, a second function calculation circuit 450b corresponding to a Gaussian distribution function, and a third function calculation circuit 450c corresponding to a Poisson distribution function.

The function parameters 412 may comprise a first function type parameter, which indicates a function type (e.g., a polynomial function, a Gaussian distribution function, and/or the like) and can be used by the decompression function circuit 414 to determine which function calculation circuit to use, and zero or more additional function coefficient parameters (e.g., the coefficients of a polynomial function). As illustrated in FIG. 4B, different types of functions may be associated with different numbers of coefficients and/or different types of coefficients. For example, the function calculation circuit 450b may be configured to calculate a function for decompressing coefficient values that fit a Gaussian-type distribution (e.g., an inverse of an integral of a Gaussian distribution), while the function calculation circuit 450c may be configured to calculate a function for decompressing a Poisson-type distribution. In some embodiments, the decompression function circuit 414 may receive different sets of function parameters 412 depending on the set or subset of compressed coefficients to be decompressed. The decompression function circuit 414 applies the function to the high bit sequence 408 to determine a high coefficient value 416, and applies the function to the low bit sequence 410 to determine a low coefficient value 418.
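As a non-limiting illustration, selecting a function calculation from a function type parameter plus zero or more coefficient parameters might be sketched in software as follows. The type labels, the parameter ordering, and the use of the standard library's inverse normal CDF as the Gaussian case are assumptions of this sketch, not a description of circuits 450a to 450c.

```python
from statistics import NormalDist


def make_decompression_function(function_type, *params):
    """Return a callable selected by the function type and configured by params."""
    if function_type == "polynomial":
        def polynomial(x):
            # Horner evaluation; params are coefficients, highest order first.
            result = 0.0
            for coefficient in params:
                result = result * x + coefficient
            return result
        return polynomial
    if function_type == "gaussian":
        mean, stdev = params
        # Inverse of the integral (CDF) of a Gaussian; defined for 0 < x < 1.
        return NormalDist(mean, stdev).inv_cdf
    if function_type == "identity":
        # No extra parameters: values that were stored uncompressed pass through.
        return lambda x: x
    raise ValueError(f"unsupported function type: {function_type}")


poly = make_decompression_function("polynomial", 2.0, 0.0, 1.0)  # 2x^2 + 1
gauss = make_decompression_function("gaussian", 0.0, 1.0)
```

Each function type takes a different number of parameters, mirroring the observation that different types of functions may be associated with different numbers or types of coefficients.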

In some embodiments, when processing a received bit sequence (e.g., a high or low bit sequence), the decompression function circuit 414 uses the function to determine a corresponding value, and identifies the coefficient value corresponding to the bit sequence based on the interval in which the corresponding value lies. For example, if the corresponding value determined by the function falls between two different coefficient values, the decompression function circuit 414 may select a coefficient value based on an interval selection scheme (e.g., rounding, a ceiling function, a floor function, and/or the like).

The comparator and control circuit 420 receives the high coefficient value 416 and the low coefficient value 418 determined by the decompression function circuit 414, and determines whether the high and low coefficient values are the same. If they are the same, the coefficient value determined for the received bit sequence is output as a decompressed output coefficient 404. The decompression circuit 400 may then begin receiving a new bit sequence from the compressed coefficient values 402.

另一方面，若高係數值416及係數值418不相同，則無法使用當前接收之位元序列判定一解壓縮輸出係數。解壓縮電路自壓縮係數值402接收一額外位元，且更新高位元序列408及低位元序列410。在一些實施例中，因為高位元序列408或低位元序列410之任一者在接收一額外位元時將保持相同，所以對於各隨後接收之位元，僅需重新計算一單一額外擴展位元序列(例如，若經接收位元係「1」則重新計算低位元序列410，或若經接收位元係「0」則重新計算高位元序列408)。類似地，解壓縮函數電路414僅需針對重新計算之擴展位元序列判定一係數值，而無需針對高擴展位元序列及低擴展位元序列重新計算高係數值及低係數值兩者。接著，藉由比較器420比較更新係數值以判定是否可輸出一解壓縮係數值或是否需要額外位元。On the other hand, if the high coefficient value 416 and the coefficient value 418 are not the same, the currently received bit sequence cannot be used to determine a decompressed output coefficient. The decompression circuit receives an extra bit from the compression coefficient value 402, and updates the high bit sequence 408 and the low bit sequence 410. In some embodiments, since either the upper bit sequence 408 or the lower bit sequence 410 will remain the same when receiving an extra bit, for each subsequent received bit, only a single extra extension bit needs to be recalculated Sequence (for example, if the received bit is "1", the lower bit sequence 410 is recalculated, or if the received bit is "0", the higher bit sequence 408 is recalculated). Similarly, the decompression function circuit 414 only needs to determine a coefficient value for the recalculated extension bit sequence, and does not need to recalculate both the high coefficient value and the low coefficient value for the high extension bit sequence and the low extension bit sequence. Then, the comparator 420 compares the updated coefficient values to determine whether a decompression coefficient value can be output or whether additional bits are needed.
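The observation that only one extended sequence changes per received bit can be sketched as follows. This is an illustrative software model, not the circuit itself; the function name `update` and the fixed `width` of the extended sequences are assumptions.

```python
from typing import Tuple


def update(prefix: str, bit: str, high: str, low: str, width: int) -> Tuple[str, str, str]:
    """Append one received bit and recompute only the extension that changes.

    prefix: bits received so far for the current coefficient
    high:   prefix padded with 1s to `width` bits
    low:    prefix padded with 0s to `width` bits
    """
    new_prefix = prefix + bit
    pad = width - len(new_prefix)
    if bit == "1":
        # The high extension already had a 1 in this position, so it is unchanged.
        low = new_prefix + "0" * pad
    else:
        # The low extension already had a 0 in this position, so it is unchanged.
        high = new_prefix + "1" * pad
    return new_prefix, high, low
```

Starting from the prefix "0" with extensions "0111" and "0000", receiving a "1" changes only the low extension (to "0100"), and then receiving a "0" changes only the high extension (to "0101").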

Table 1 shows a simplified example of compressed bit sequences mapped to decompressed coefficient values. For example, the decompression function circuit 414 may apply a function (as defined by the received function parameters 412) to a received bit sequence (e.g., 0011…), where the resulting value falls within the interval of a coefficient value (e.g., -2). The decompression function circuit 414 therefore returns the coefficient value "-2" in response to the received bit sequence "0011".

Table 1

Compressed bit sequence    Decompressed coefficient value
0000…                      -3
0001…                      -3
0010…                      -2
0011…                      -2
0100…                      -2
0101…                      -2
0110…                      -1
0111                       -1
1000                       0
1001                       1
1010                       1
1011                       2
1100                       2
1101                       2
1110                       2
1111                       3

As an illustrative example, suppose the decompression circuit receives the bit sequence "0100111000000110". The decompression circuit 400 receives the first bit of the stream ("0"), so the sequence extender circuit 406 determines a high extended bit sequence "0111…" and a low extended bit sequence "0000…". The decompression function circuit 414 receives the high and low extended bit sequences, and determines high and low coefficient values corresponding to "-1" and "-3", respectively. Because the high and low coefficient values do not match, the comparator and control circuit 420 cannot determine a single output coefficient value to output. The decompression circuit 400 therefore receives a subsequent bit from the bit stream.

When the next bit of the bit stream is received, the current bit sequence at the decompression circuit 400 is "01". Because the high extended bit sequence is still "0111…", the sequence extender circuit 406 only needs to recalculate a low extended bit sequence ("0100…") for the current bit sequence. The decompression function circuit 414 likewise calculates an updated low coefficient value ("-2") for the low extended bit sequence. Because the high and low coefficient values still do not match, the decompression circuit 400 receives another bit from the bit stream without outputting a decompressed coefficient value.

After the next bit of the bit stream is received, the current bit sequence is "010". The sequence extender circuit 406 determines an updated high extended bit sequence "0101…", which the decompression function circuit 414 determines to correspond to a coefficient value of "-2". Because the high and low coefficient values now match, the decompression circuit 400 outputs "-2" as a decompressed coefficient value 404. The decompression circuit may continue receiving bits of the compressed bit sequence "0100111000000110" and outputting the corresponding coefficient values (e.g., "-1" for the bit sequence "011", "0" for the bit sequence "1000", "-3" for the bit sequence "000", and "2" for the bit sequence "110").
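The worked example can be reproduced with a short sketch. To keep it self-contained, a lookup of Table 1 stands in for the decompression function, and the extended sequences are assumed to be exactly 4 bits wide; both choices are simplifications made for illustration, and the names `TABLE_1`, `WIDTH`, and `decode` are hypothetical.

```python
from typing import List

# Table 1, used here as a stand-in for the function applied by circuit 414.
TABLE_1 = {
    "0000": -3, "0001": -3, "0010": -2, "0011": -2,
    "0100": -2, "0101": -2, "0110": -1, "0111": -1,
    "1000": 0, "1001": 1, "1010": 1, "1011": 2,
    "1100": 2, "1101": 2, "1110": 2, "1111": 3,
}
WIDTH = 4  # assumed width of the extended bit sequences


def decode(bits: str) -> List[int]:
    """Decode a bit stream one bit at a time using high/low extensions."""
    output = []
    prefix = ""
    for bit in bits:
        prefix += bit
        pad = WIDTH - len(prefix)
        high = TABLE_1[prefix + "1" * pad]  # coefficient for the high extension
        low = TABLE_1[prefix + "0" * pad]   # coefficient for the low extension
        if high == low:
            # Both extensions agree, so the prefix identifies one coefficient.
            output.append(high)
            prefix = ""
    return output


print(decode("0100111000000110"))  # [-2, -1, 0, -3, 2]
```

The bit stream splits into "010", "011", "1000", "000", and "110", which matches the coefficient values listed in the example above.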

Although the above examples primarily discuss using arithmetic coding and decompression functions to compress and decompress model coefficient values, it should be understood that different types of coding may be used in other embodiments. For example, in some embodiments, Huffman coding may be used in combination with a function to compress and decompress model coefficient values.

In some embodiments, the coefficient set of a model may be divided into a plurality of subsets, where the coefficient counts of each subset may follow a different distribution. Each coefficient subset may therefore be compressed and decompressed based on a different function (e.g., as illustrated in FIG. 2). For example, different functions may be applied to compressed coefficient values stored in the memory of the TSP based on the bit channel and position of the compressed coefficients.

Interleaved input

In some embodiments, a plurality of decompression circuits may be used to decompress, in parallel, a bit stream containing compressed coefficient data. For example, during a first clock cycle, each decompression circuit may process a first bit of a different compressed coefficient. When a particular decompression circuit finishes decompressing a particular coefficient, it may move on to a subsequent compressed coefficient that is not currently being processed.

For example, a bit stream of compressed coefficient data may include x bits corresponding to a first coefficient and y bits corresponding to a second coefficient. During a first clock cycle, a first decompression circuit may process the first bit of the first coefficient, while a second decompression circuit processes the first bit of the second coefficient. If x < y, then at clock cycle x+1 the first decompression circuit has finished processing the first coefficient and may begin processing a first bit of a third coefficient, while the second decompression circuit may process a first bit of a fourth coefficient at clock cycle y+1.

For example, FIG. 5 illustrates a diagram of a plurality of decompression circuits for decompressing compressed coefficient data in parallel, according to some embodiments. The compressed model coefficients 112 may produce a bit stream represented as "aabbbbcccdd…", which includes 2 bits encoding a first coefficient "a", 4 bits encoding a second coefficient "b", 3 bits encoding a third coefficient "c", and 2 bits encoding a fourth coefficient "d". A divider circuit 502 divides the bit stream between a first decompression circuit 110A and a second decompression circuit 110B. The divider 502 determines the position in the bit stream at which the encoding of each coefficient begins, and divides the bits of the bit stream between the decompression circuits 110A and 110B so that each decompression circuit decompresses the bits of a different coefficient. For example, at a first clock cycle, the divider circuit 502 is configured to transmit a first bit of coefficient "a" to the decompression circuit 110A and a first bit of coefficient "b" to the decompression circuit 110B. Each of the decompression circuits 110A and 110B processes the received bits using a function based on the stored function parameters 114. During a third clock cycle, the decompression circuit 110A has finished processing the bits of coefficient "a" and receives a first bit of the next unprocessed coefficient (e.g., coefficient "c"), while the decompression circuit 110B receives and processes the third bit of coefficient "b".
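The assignment of coefficients to decompression circuits in this example can be modeled with a small scheduling sketch. It assumes one bit is processed per circuit per clock cycle and that the next unprocessed coefficient always goes to the circuit that becomes free first; the function name `schedule` and this greedy policy are illustrative assumptions rather than the divider circuit's specified behavior.

```python
import heapq
from typing import List, Tuple


def schedule(bit_lengths: List[int], num_circuits: int = 2) -> List[Tuple[int, int]]:
    """Return (circuit index, start cycle) for each coefficient, in stream order.

    Cycles are numbered from 1. Each circuit consumes one bit per cycle.
    """
    # Min-heap of (next free cycle, circuit index).
    free = [(1, i) for i in range(num_circuits)]
    heapq.heapify(free)
    plan = []
    for length in bit_lengths:
        start, circuit = heapq.heappop(free)
        plan.append((circuit, start))
        heapq.heappush(free, (start + length, circuit))
    return plan


# Bit stream "aabbbbcccdd": a=2 bits, b=4 bits, c=3 bits, d=2 bits.
print(schedule([2, 4, 3, 2]))  # [(0, 1), (1, 1), (0, 3), (1, 5)]
```

Here circuit 0 (playing the role of 110A) starts coefficient "c" at cycle 3, and circuit 1 (110B) starts coefficient "d" at cycle 5, consistent with the x+1 and y+1 timing described above.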

The decompression circuits 110A and 110B output decompressed model coefficients 116A and 116B, respectively. In some embodiments, an interleaver circuit (not shown) that interleaves the decompressed coefficients 116A and 116B may be used to form a decompressed coefficient bit stream.

Because the compiler performs the initial compression of the model coefficients, and therefore knows the bit length corresponding to each compressed coefficient value, the compiler may store to memory instructions specifying which portions of the bit stream are operated on by which decompression circuits, so that each decompression circuit can receive a first bit of a subsequent compressed coefficient after decompressing a previous coefficient.

Process flow

FIG. 6 is a flowchart of a process for generating a set of compressed model coefficients, according to some embodiments. A predictive model is constructed and/or trained 602 using a machine learning process, which produces a set of coefficients for the model. In some embodiments, the model may be a neural network model.

For each of one or more subsets of the coefficient set, a compiler selects 604 a function based on the distribution of coefficient values within the subset. For example, the compiler generates a cumulative count distribution of the coefficient values of the subset, and identifies a function type that best fits the generated distribution. The function type may be based on a polynomial function, a Gaussian distribution function, a Poisson distribution function, and/or the like. The compiler determines 606 parameters for the selected function type, so as to determine a function that best fits the distribution (e.g., the cumulative count distribution) of the coefficient values of the subset. The compiler compresses 608 the coefficient subset based on the determined function type and function parameters.
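A minimal sketch of the first part of this step, building a cumulative count distribution and scoring one candidate function type against it, might look as follows. The source does not say how "best fit" is measured; the sum of squared differences used here, and the function names `cumulative_counts` and `gaussian_fit_error`, are assumptions made for illustration.

```python
from collections import Counter
from itertools import accumulate
from statistics import NormalDist
from typing import List, Sequence, Tuple


def cumulative_counts(values: Sequence[float]) -> List[Tuple[float, int]]:
    """Return (coefficient value, number of coefficients <= that value) pairs."""
    counts = Counter(values)
    levels = sorted(counts)
    totals = accumulate(counts[v] for v in levels)
    return list(zip(levels, totals))


def gaussian_fit_error(values: Sequence[float]) -> float:
    """Score a Gaussian CDF against the normalized cumulative count distribution.

    Assumed metric: sum of squared differences (lower means a closer fit).
    Requires at least two values that are not all identical.
    """
    dist = NormalDist.from_samples(values)
    n = len(values)
    return sum((total / n - dist.cdf(v)) ** 2
               for v, total in cumulative_counts(values))


coefficients = [-1, 0, 0, 1, 1, 1]
print(cumulative_counts(coefficients))  # [(-1, 1), (0, 3), (1, 6)]
```

A compiler following this approach could compute a comparable error for each candidate function type and keep the type and parameters with the smallest error.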

The compressed coefficient subset and the determined function parameters are stored 610 in a memory. The compressed coefficients may be used (after decompression) by one or more arithmetic units to perform operations on input data (e.g., image data) in accordance with the predictive model.

FIG. 7 is a flowchart of a process for decompressing compressed model coefficients, according to some embodiments. The decompression circuit receives 702 data corresponding to compressed coefficients. In some embodiments, the input data is received as a bit stream, in which each compressed coefficient is represented by a variable-length bit sequence.

The decompression circuit receives 704 one or more function parameters corresponding to a function to be used to decompress the received compressed coefficient data. The function parameters may indicate a function type and one or more coefficients of the function (e.g., where the function type is a polynomial, the function parameters may indicate the coefficients of the polynomial function). The decompression circuit configures 706, based on the received function parameters, the function to be used by a decompression function circuit. For example, in some embodiments, the decompression circuit includes a plurality of decompression function circuits, each corresponding to a different function type. In response to receiving the function parameters, the decompression circuit selects the particular decompression function circuit corresponding to the function type indicated by the received parameters, and configures the selected decompression function circuit based on one or more additional function parameters (e.g., corresponding to function coefficient values).

The decompression circuit uses the decompression function circuit to decompress 708, based on the configured function, the input data corresponding to the compressed coefficients, and outputs decompressed coefficients. The decompressed coefficients may be provided to a TSP.

The TSP applies 710 the model to received input data by performing arithmetic operations on the input data using the decompressed coefficients received from the decompression circuit. The arithmetic operations may include matrix multiplication, dot product operations, FFT, and/or the like.

FIG. 8 is a flowchart of a process for decompressing compressed model coefficients using arithmetic decoding. The decompression circuit may receive the compressed coefficients as a bit stream. Because each coefficient value may be represented by a variable-length bit sequence, the decompression circuit may evaluate each bit of the bit stream and determine whether a decompressed coefficient value can be obtained from the bits received so far.

The decompression circuit receives 802 a bit of compressed coefficient data. The decompression circuit generates 804 a high extended bit sequence and a low extended bit sequence from the currently received bit sequence of the compressed coefficient data by appending a sequence of high bits or low bits, respectively, to the received sequence. The received bit sequence may correspond to a sequence of bits received by the decompression circuit that does not correspond to a decompressed coefficient value already output by the decompression circuit.

The decompression circuit applies 806 a determined function to the high and low extended bit sequences to determine decompressed coefficient values. The determined function may correspond to a plurality of received function parameters corresponding to the compressed coefficient values. In some embodiments, applying the function to the high or low extended bit sequence produces a value that lies between two different possible coefficient values and is associated with a particular coefficient value based on an interval scheme.

The decompression circuit determines 808 whether the decompressed coefficient values for the high and low bit sequences are the same. If so, the current bit sequence is sufficient to determine a decompressed coefficient value, and the decompression circuit outputs 810 the decompressed coefficient value corresponding to the currently received bit sequence. The decompression circuit may then receive additional bits of the compressed coefficient data as part of a new bit sequence to determine subsequent decompressed coefficient values.

On the other hand, if the decompressed coefficient values for the high and low bit sequences differ, the current bit sequence is insufficient to produce a decompressed coefficient value, and the decompression circuit receives 812 additional bits of compressed coefficient data until the decompressed coefficient values corresponding to the high and low extended bit sequences match.

Additional configuration information

The foregoing description of the embodiments of the invention has been presented for purposes of illustration; it is not intended to be exhaustive or to limit the invention to the precise forms disclosed. Persons skilled in the relevant art can appreciate that many modifications and variations are possible in light of the above disclosure.

Some portions of this description describe embodiments of the invention in terms of algorithms and symbolic representations of operations on information. These algorithmic descriptions and representations are commonly used by those skilled in the data processing arts to convey the substance of their work effectively to others skilled in the art. These operations, while described functionally, computationally, or logically, are understood to be implemented by computer programs or equivalent electrical circuits, microcode, or the like. Furthermore, it has also proven convenient at times to refer to these arrangements of operations as modules, without loss of generality. The described operations and their associated modules may be embodied in software, firmware, hardware, or any combination thereof.

Any of the steps, operations, or processes described herein may be performed or implemented with one or more hardware or software modules, alone or in combination with other devices. In one embodiment, a software module is implemented with a computer program product comprising a computer-readable medium containing computer program code, which can be executed by a computer processor to perform any or all of the steps, operations, or processes described.

Embodiments of the invention may also relate to an apparatus for performing the operations herein. This apparatus may be specially constructed for the required purposes, and/or it may comprise a general-purpose computing device selectively activated or reconfigured by a computer program stored in the computer. Such a computer program may be stored in a non-transitory, tangible computer-readable storage medium, or any type of media suitable for storing electronic instructions, which may be coupled to a computer system bus. Furthermore, any computing system referred to in this specification may include a single processor or may be an architecture employing multiple-processor designs for increased computing capability.

Embodiments of the invention may also relate to a product that is produced by a computing process described herein. Such a product may comprise information resulting from a computing process, where the information is stored on a non-transitory, tangible computer-readable storage medium and may include any embodiment of a computer program product or other data combination described herein.

Finally, the language used in this specification has been principally selected for readability and instructional purposes, and it may not have been selected to delineate or circumscribe the inventive subject matter. It is therefore intended that the scope of the invention be limited not by this detailed description, but rather by any claims that issue on an application based hereon. Accordingly, the disclosure of the embodiments is intended to be illustrative, but not limiting, of the scope of the invention, which is set forth in the following claims.

100: tensor stream processor (TSP)
102: input data values
104: output data values / output values
106: arithmetic circuit unit
106A: arithmetic unit
106B: arithmetic unit
108: memory
110: decompression circuit
110A: decompression circuit / first decompression circuit
110B: decompression circuit / second decompression circuit
112: compressed model coefficients
112A: compressed model coefficients
112B: compressed model coefficients
114: function parameters
114A: first function parameters
114B: second function parameters
116: decompressed model coefficients
116A: decompressed model coefficients / decompressed model parameters
116B: decompressed model coefficients / decompressed model parameters
118: predictive model
120: compiler
300: graph
302: first curve
304: second curve / cumulative distribution curve / cumulative count distribution
306: third curve / function
308: range
310: range
400: decompression circuit
402: compressed coefficient values
404: decompressed coefficient value / decompressed output coefficient
406: sequence extender circuit
408: high bit sequence
410: low bit sequence
412: function parameters
414: decompression function circuit
416: high coefficient value
418: low coefficient value
420: comparator and control circuit
450: function calculation circuit
450a: first function calculation circuit
450b: second function calculation circuit
450c: function calculation circuit
502: divider circuit / divider
602: process step
604: process step
606: process step
608: process step
610: process step
702: process step
704: process step
706: process step
708: process step
710: process step
802: process step
804: process step
806: process step
808: process step
810: process step
812: process step

FIG. 1 illustrates a schematic diagram of a system for storing and decompressing model coefficients used in a model, according to some embodiments.

FIG. 2 illustrates a block diagram of a set of compressed model coefficients that may be decompressed using different functions, according to some embodiments.

FIGS. 3A and 3B illustrate example graphs showing a distribution of model coefficients, according to some embodiments.

FIG. 4A illustrates a block diagram of a decompression circuit, according to some embodiments.

FIG. 4B illustrates an example decompression function circuit containing function calculation circuits corresponding to different function types, according to some embodiments.

FIG. 5 illustrates a diagram of a plurality of decompression circuits for decompressing compressed coefficient data in parallel, according to some embodiments.

FIG. 6 is a flowchart of a process for generating a set of compressed model coefficients, according to some embodiments.

FIG. 7 is a flowchart of a process for decompressing compressed model coefficients, according to some embodiments.

FIG. 8 is a flowchart of a process for decompressing compressed model coefficients using arithmetic decoding.

The figures depict embodiments of the invention for purposes of illustration only. One skilled in the art will readily recognize from the following description that alternative embodiments of the structures and methods illustrated herein may be employed without departing from the principles, or the resulting advantages, of the invention described herein.

702: receive
704: receive
706: configure
708: decompress
710: apply

Claims

A processor includes: a memory, which stores: compression coefficient data corresponding to a coefficient set associated with a prediction model; a function parameter set associated with the compression coefficient data; a decompression circuit, which Comprising: a first function calculation circuit associated with a first function type; and a second function calculation circuit associated with a second function type; wherein the decompression circuit is configured to: receive from the memory The function parameter set; choose between the first function calculation circuit and the second function calculation circuit based on the received function parameter set; and use the selected function calculation circuit to decompress the compression coefficient data to generate a Decompress the set of coefficients.

For example, the processor of claim 1, further comprising an arithmetic circuit unit to receive an input data set and the decompression coefficient set, and perform one or more arithmetic operations based on the input data set and the decompression coefficient set to generate an output Value set.

For example, the processor of claim 1, wherein the function parameter set includes a first parameter indicating a function type and at least one additional parameter corresponding to a function coefficient, and wherein the decompression circuit is based on the function type indicating the first parameter The first parameter is in the first function calculation circuit and the Make a choice between the second function calculation circuit.

Such as the processor of claim 1, wherein a compression function is used to compress the compression coefficient data from the coefficient set, the compression function is selected based on a cumulative distribution of a plurality of values of the coefficient set, and the function parameter set corresponds to all Select the compression function.

Such as the processor of claim 1, wherein a model training procedure is used to determine the plural values of the coefficient set.

For example, the processor of claim 1, wherein the function parameter set corresponds to a function type, and the function type is selected from a polynomial function, a bimodal distribution function, a Gaussian distribution function, or a Passon distribution function ( At least one of Poisson distribution function).

Such as the processor of claim 1, wherein arithmetic coding or Huffman coding is used to compress the coefficient set to generate the compression coefficient data.

Such as the processor of claim 1, wherein the decompression circuit is configured to apply the function parameter set to at least a part of the compression factor data corresponding to a first compression factor to determine a first decompression of the decompression factor set The coefficient value.

Such as the processor of claim 8, wherein the decompression circuit is further configured to: receive one or more bits of the first compression factor from the compression factor data Row; generate a first extended bit sequence and a second extended bit sequence based on the sequence of received bits; apply the function parameter set to the first extended bit sequence and the second extended bit sequence to determine A first respective coefficient value and a second respective coefficient value; and in response to determining that the first coefficient value and the second coefficient value are the same, output the first coefficient value as the first decompression coefficient value.

Such as the processor of claim 9, wherein the decompression circuit is further configured to: in response to determining that the first coefficient value and the second coefficient value are different, receive the compression appended to the sequence of one or more bits At least one extra bit of the coefficient data is used to generate an updated bit sequence; and based on the updated bit sequence, an updated first extended bit sequence and a second extended bit sequence are generated.

A method for decompressing model coefficients, comprising: receiving compression coefficient data in a decompression circuit, wherein the compression coefficient data corresponds to a coefficient set associated with a prediction model, and the decompression circuit includes a first A first function calculation circuit associated with a function type and a second function calculation circuit associated with a second function type; a function parameter set associated with the compression factor data is received in the decompression circuit; based on the received The function parameter set is selected between the first function calculation circuit and the second function calculation circuit to decompress the compression factor data; and Use the selected function calculation circuit to decompress the compression coefficient data to generate a decompression coefficient set.

For example, the method of claim 11 further includes: receiving an input data set and the decompression coefficient set in an arithmetic circuit unit; and executing one or more arithmetic in the arithmetic circuit unit based on the input data set and the decompression coefficient set Operate to generate a set of output values.

The method of claim 11, wherein the function parameter set includes a first parameter indicating a function type and at least one additional parameter corresponding to a function coefficient, and wherein the decompression circuit selects between the first function calculation circuit and the second function calculation circuit based on the first parameter indicating the function type.
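
The selection recited in the preceding claim can be sketched in software as a dispatch on the first parameter. This is only an illustration under assumptions: the type codes `POLYNOMIAL` and `GAUSSIAN`, the evaluator functions, and the layout of the remaining parameters are hypothetical and stand in for the two function calculation circuits.

```python
import math
from typing import Callable, Dict, Sequence

POLYNOMIAL, GAUSSIAN = 0, 1  # hypothetical function-type codes


def eval_polynomial(coeffs: Sequence[float], x: float) -> float:
    """Evaluate c0 + c1*x + c2*x**2 + ... for coeffs = (c0, c1, c2, ...)."""
    return sum(c * x ** i for i, c in enumerate(coeffs))


def eval_gaussian(coeffs: Sequence[float], x: float) -> float:
    """Evaluate an unnormalized Gaussian with coeffs = (mean, standard deviation)."""
    mean, std = coeffs
    return math.exp(-((x - mean) ** 2) / (2 * std ** 2))


# Each entry plays the role of one function calculation circuit.
EVALUATORS: Dict[int, Callable[[Sequence[float], float], float]] = {
    POLYNOMIAL: eval_polynomial,
    GAUSSIAN: eval_gaussian,
}


def apply_function(params: Sequence[float], x: float) -> float:
    """Use params[0] to pick the evaluator and pass the remaining parameters to it."""
    func_type, coeffs = int(params[0]), params[1:]
    return EVALUATORS[func_type](coeffs, x)
```

For instance, `apply_function([POLYNOMIAL, 1.0, 2.0], 3.0)` evaluates the polynomial 1 + 2x at x = 3, while a parameter set starting with `GAUSSIAN` is routed to the other evaluator.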

The method of claim 11, wherein the compressed coefficient data is compressed from the coefficient set using a compression function, the compression function is selected based on a cumulative distribution of a plurality of values of the coefficient set, and the function parameter set corresponds to the selected compression function.
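
One way to picture the selection recited in the preceding claim is to build the cumulative count distribution of the coefficient values and keep whichever candidate function tracks it most closely. The sketch below is an assumption-laden illustration, not the claimed procedure: the least-squares criterion, the candidate functions, and the names `cumulative_counts` and `pick_function` are all hypothetical.

```python
from collections import Counter
from itertools import accumulate
from typing import Callable, Dict, List, Sequence, Tuple


def cumulative_counts(coeffs: Sequence[int]) -> Tuple[List[int], List[int]]:
    """Return the sorted distinct values and the running total of their counts."""
    counts = Counter(coeffs)
    values = sorted(counts)
    totals = list(accumulate(counts[v] for v in values))
    return values, totals


def pick_function(coeffs: Sequence[int],
                  candidates: Dict[str, Callable[[float], float]]) -> str:
    """Return the name of the candidate with the smallest squared error (assumed criterion)."""
    values, totals = cumulative_counts(coeffs)

    def sq_error(f: Callable[[float], float]) -> float:
        return sum((f(v) - t) ** 2 for v, t in zip(values, totals))

    return min(candidates, key=lambda name: sq_error(candidates[name]))


# Hypothetical coefficient values and candidate functions.
coeffs = [0, 1, 1, 2, 2, 2]
candidates = {
    "linear": lambda v: 1 + 2.5 * v,
    "quadratic": lambda v: 1 + 1.5 * v + 0.5 * v * v,
}
best = pick_function(coeffs, candidates)
```

With these sample values the cumulative counts are 1, 3, and 6, which the quadratic candidate reproduces exactly, so it is the one selected.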

The method of claim 11, wherein a plurality of values of the coefficient set are determined through a model training process.

The method of claim 11, wherein the function parameter set corresponds to a function type, and the function type is selected from at least one of a polynomial function, a bimodal distribution function, a Gaussian distribution function, or a Poisson distribution function.

The method of claim 11, wherein the coefficient set is compressed using arithmetic coding or Huffman coding to generate the compressed coefficient data.

The method of claim 11, wherein decompressing the compressed coefficient data using the selected function calculation circuit comprises: applying the function parameter set to at least a portion of the compressed coefficient data corresponding to a first compressed coefficient to determine a first decompressed coefficient value of the decompressed coefficient set.

The method of claim 18, wherein decompressing the compressed coefficient data using the selected function calculation circuit further comprises: receiving a sequence of one or more bits of the first compressed coefficient from the compressed coefficient data; generating a first extended bit sequence and a second extended bit sequence based on the received sequence of bits; applying the function parameter set to the first extended bit sequence and the second extended bit sequence to determine a first respective coefficient value and a second respective coefficient value; and in response to determining that the first coefficient value and the second coefficient value are the same, outputting the first coefficient value as the first decompressed coefficient value.

A system for storing and decompressing model coefficients, comprising: a decompression circuit, which includes: a first function calculation circuit associated with a first function type; and a second function calculation circuit associated with a second function type; wherein the decompression circuit is configured to: select between the first function calculation circuit and the second function calculation circuit based on a received function parameter set associated with a compressed coefficient data set; and decompress the compressed coefficient data set using the selected function calculation circuit to generate a decompressed coefficient set; and an arithmetic circuit unit configured to receive an input data set and the decompressed coefficient set, and to perform one or more arithmetic operations based on the input data set and the decompressed coefficient set to generate a set of output values.