KR101236673B1 - Memory compression

Info

Publication number: KR101236673B1
Application number: KR1020067026117A
Authority: KR
Inventors: Paul Wilkinson Dent
Original assignee: Telefonaktiebolaget LM Ericsson (publ)
Priority date: 2004-06-04
Filing date: 2005-06-03
Publication date: 2013-02-22
Also published as: KR20070021254A

Abstract

Exemplary embodiments of the invention include a method of compressing data for storage in a memory. According to one embodiment, the method forms a set of values based on a monotonically ordered series of lookup table values. For one or more paired values in the set, the method generates a difference of the values in the pair. After modifying the set by replacing one of the values in the pair with a value based on the difference, the method stores final values based on the modified set in memory. The invention also includes a memory for storing lookup table values. One exemplary memory includes a decoder, an encoder, and one or more patterns of crisscrossed interconnect lines that interconnect the encoder with the decoder. The patterns of crisscrossed interconnect lines may be implemented on one or more planar layers of conductor tracks vertically interleaved with insulating material.

Memory compression, lookup tables, Walsh transforms, butterfly circuits

Description

Memory Compression

The present invention enables reductions in the size and cost of low-cost electronic devices based on embedded microprocessors that have memory requirements for fixed data, such as lookup tables or programs.

Electronic devices often use microprocessors to perform a variety of tasks, ranging from the very simple to the very complex. When a device (an electronic communication device, a remote control, another mobile device, and so on) is designed for a specific function or functions, the programs corresponding to those functions are usually stored in read-only memory (ROM) once development is complete. Other fixed data, such as lookup tables of values of functions needed for calculations (for example exponential, logarithmic or trigonometric functions), may also be stored in ROM.

A low-cost and compact form of ROM is the mask-programmed ROM. In a mask-programmed ROM, the address is decoded to provide a logic level on one of a plurality of signal lines corresponding to the address. The address lines cross the bit lines corresponding to the desired output word, and a transistor is placed at each crossing of an address line and a bit line where a binary "1" is the desired bit output for that address. Such a ROM is in effect equivalent to a giant OR gate for each output bit, which sets the output bit to "1" if address A1.OR.A2.OR.A3... is active and to "0" otherwise.
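The OR-gate view of a mask-programmed ROM can be illustrated with a small model. This is only a sketch: the names `build_mask_rom` and `read_mask_rom` and the 4-bit sample words are hypothetical and do not come from the patent.

```python
def build_mask_rom(words, width):
    """For each output bit, record which address lines carry a transistor.

    A transistor is present at (address line, bit line) only where the
    stored bit is 1, so each set is the input list of one wide OR gate.
    """
    return [
        {addr for addr, word in enumerate(words) if (word >> bit) & 1}
        for bit in range(width)
    ]


def read_mask_rom(or_inputs, address):
    """Exactly one decoded address line is active; OR it into each bit."""
    word = 0
    for bit, lines in enumerate(or_inputs):
        if address in lines:  # the active line drives this OR gate
            word |= 1 << bit
    return word


# Hypothetical 4-word, 4-bit contents.
words = [0b1010, 0b0001, 0b1111, 0b0000]
rom = build_mask_rom(words, width=4)
readback = [read_mask_rom(rom, a) for a in range(len(words))]
transistor_count = sum(len(lines) for lines in rom)
```

In this model the number of transistors equals the number of stored "1" bits, which is why each OR-gate input can be counted as one memory bit of silicon area.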

The transistors used to indicate a stored '1' provide the inputs to each multi-input OR gate. FIG. 1 shows such a ROM, in which a 32-bit word is stored for each of 1024 addresses. If a larger memory is desired, the pattern of FIG. 1 can be repeated to form successive rows of 1024 words, with row-select lines provided to enable the output from a selected row; together with activation of a particular column address line, this selects a particular word.

The transistors are placed by custom-designed diffusion and/or contact and/or metallization masks, used during the fabrication process, that are tailored to the desired bit pattern. Because mask-making is expensive, this approach is used only when the ROM contents are expected to be fixed for a large number of production units. Consequently, other forms of memory, such as electrically or UV-erasable and reprogrammable read-only memory (EAROM, EEROM, EPROM, "flash" memory, etc.), and more recent non-volatile memory developments such as ferroelectric memory, are more commonly chosen in preference to mask ROM, because the memory can be manufactured before the desired contents are known, filled later, and modified in use. It may be said that the silicon-area advantage of mask-programmed ROM, and the resulting cost advantage, have so far not been sufficient to outweigh the convenience of in-situ programming.

One solution that can provide the benefits of mask ROM while preserving some flexibility is to store mature portions of the program, i.e. subroutines, fixed tables for mathematical functions, or character-set displays, in mask ROM, but to link and call them from a smaller program in reprogrammable memory. Then, if a mask-ROM routine needs to be replaced, the replacement routine need only be placed in reprogrammable memory and the mask-ROM version bypassed (a process called "patching"). However, the area advantage of mask ROM is still not sufficient to encourage widespread use of this technique. There is therefore a need for fixed-content memory that is significantly more compact and economical than mask ROM has been to date.

Exemplary embodiments of the invention include a method of compressing data for storage in a memory. According to one embodiment, the method includes forming a set of values based on a monotonically ordered series of lookup table values. For one or more paired values in the set, the method generates a difference of the values in the pair. After replacing one of the values in the pair with a value based on the difference, so as to modify the set, the method stores final values based on the modified set in memory.

According to one embodiment, the remaining values in the modified pairs are left unchanged. After this first iteration, final values based on the modified set are stored in memory. Alternatively, additional iterations may be performed to compress the stored data further. For example, in a second iteration, the method forms pairs between the unmodified values and pairs between the modified values, and generates new differences of the values in these new pairs. The method then modifies the current set by replacing one of the values in each pair with a value based on the difference. This process repeats until a predetermined number of iterations has been completed. After the final iteration, final values based on the modified set of values produced by that iteration are stored in memory.

According to another embodiment, the remaining values in the modified pairs are replaced with values generated from the sum of the values in the pair. After this first iteration, the sums and differences of each pair are combined to generate the final values. Alternatively, additional iterations may be performed to compress the stored data further. For example, in a second iteration, the method forms pairs between the sum values and pairs between the difference values, and generates new sums and differences of the values within each of the new pairs. This process repeats until a predetermined number of iterations has been completed. After the final iteration, the sums and differences corresponding to the pairs of the final iteration are combined to generate the final values.

In one embodiment, by use of the inventive algorithm, a ROM for storing 2^N1 quantities of word length L1 is compressed from a total size of L1 x 2^N1 bits to a ROM that stores 2^N2 words of L2 bits, a total of L2 x 2^N2 bits, where L2 < L1 and N2 < N1.

One application is to provide a lookup table for a regular function F(x), where the table is addressed by a binary bit pattern corresponding to x and outputs the value F pre-stored at that location. In this case, first-order compression replaces alternate values of the function with their delta values from the preceding address. Alternatively, each pair of adjacent values may be replaced by their sum, with the least significant bit (LSB) discarded, and their difference. When the function is regular, the difference between successive values is much smaller than the function value itself and can therefore be stored with fewer bits.
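A minimal sketch of the first-order scheme, assuming an even-length table of integers: even-address values are kept and each odd-address value is replaced by its delta from the preceding even address. The function names and the sample table are illustrative only.

```python
def compress_first_order(values):
    """Keep even-address values; replace odd-address values by deltas."""
    evens = values[0::2]
    deltas = [odd - even for even, odd in zip(values[0::2], values[1::2])]
    return evens, deltas


def read_first_order(evens, deltas, address):
    """Even address: return the stored value. Odd address: add the delta."""
    base = evens[address >> 1]
    return base + deltas[address >> 1] if address & 1 else base


# Hypothetical smooth, monotonically increasing table.
table = [1000, 1003, 1007, 1012, 1018, 1025, 1033, 1042]
evens, deltas = compress_first_order(table)
restored = [read_first_order(evens, deltas, a) for a in range(len(table))]
delta_bits = max(d.bit_length() for d in deltas)
value_bits = max(v.bit_length() for v in evens)
```

For this sample table the deltas need far fewer bits than the values themselves, which is the source of the saving.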

A second-order algorithm may comprise repeating the compression procedure, taking as input the array of even-address values from the first stage and, separately, the array of odd-address delta values, thereby obtaining, in place of four function values, one function value, two first-order differences and one second-order difference. The second-order differences are usually much smaller than the first-order differences, and can therefore be stored using far fewer bits, achieving further compression. A systematic procedure is disclosed for devising higher-order compression algorithms using a Fast Walsh Transform based on a modified butterfly operation in which the LSB of the sum term is discarded. Values are reconstructed using the inverse transform when the ROM is read.
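The modified butterfly can be sketched as follows. The text states only that the LSB of the sum is discarded; the inverse shown here is an assumption, namely the standard reversible form that recovers the dropped bit from the parity of the difference (a sum and a difference of two integers always have the same parity).

```python
def butterfly(a, b):
    """Modified butterfly: halved sum (LSB dropped) and full difference."""
    return (a + b) >> 1, a - b


def inverse_butterfly(s, d):
    """Recover (a, b) exactly.

    a + b = 2*s + (d & 1), so a = s + (d + (d & 1)) / 2, which equals
    s + ((d + 1) >> 1) with Python's floor-rounding right shift.
    """
    a = s + ((d + 1) >> 1)
    b = a - d
    return a, b


# Exhaustive round trip over a small range, including negative values.
pairs = [(a, b) for a in range(-8, 9) for b in range(-8, 9)]
round_trip_ok = all(inverse_butterfly(*butterfly(a, b)) == (a, b) for a, b in pairs)
```

Dropping the LSB keeps the sum term at the same word length as the inputs, so no bits are wasted on information already carried by the difference.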

A second implementation stores blocks of numerically adjacent values by storing a common, or nearly common, most significant bit (MSB) pattern together with the differing LSB patterns, thereby reducing the average number of bits per stored value. Allowing the common most significant bit pattern within a block to differ by up to +1 or -1 permits a more convenient block size, i.e. a regular block size. Extra bits associated with the least significant part determine whether the common most significant part must be incremented or decremented for a particular retrieved value. The second implementation permits simpler reconstruction than the first.
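A rough sketch of the shared-MSB idea, under assumptions that this section does not spell out: values are non-negative integers, the extra information per entry is modelled as a signed offset in {-1, 0, +1}, and the shared MSB pattern is chosen as the midpoint of the block's MSB patterns. The names `pack_block` and `unpack_value` are hypothetical.

```python
def pack_block(values, low_bits):
    """Store one shared MSB pattern plus (offset, LSBs) per value."""
    highs = [v >> low_bits for v in values]
    if max(highs) - min(highs) > 2:
        raise ValueError("block spans more than three MSB patterns")
    base = (min(highs) + max(highs)) // 2  # every offset lands in {-1, 0, +1}
    mask = (1 << low_bits) - 1
    return base, [(h - base, v & mask) for h, v in zip(highs, values)]


def unpack_value(base, entry, low_bits):
    """Adjust the shared MSB pattern by the offset, then append the LSBs."""
    offset, low = entry
    return ((base + offset) << low_bits) | low


# Hypothetical block that crosses an MSB boundary (0x12xx -> 0x13xx).
block = [0x12FE, 0x12FF, 0x1300, 0x1301]
base, entries = pack_block(block, low_bits=8)
restored = [unpack_value(base, e, 8) for e in entries]
```

Reconstruction here needs only an increment or decrement of the shared part, with no chain of adders across several stored differences.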

The invention also includes a memory for storing a plurality of lookup table values, where each lookup table value is associated with an address comprising a plurality of address symbols. One exemplary memory includes a decoder, an encoder, and one or more patterns of crisscrossed interconnect lines. The decoder generates an enable signal on one of the decoder outputs by decoding one or more of the plurality of address symbols, while the encoder generates an output word based on the enable signal. The patterns of crisscrossed interconnect lines connect each of the decoder outputs to one of the encoder inputs. To reduce the size of the memory, some aspects of the invention form the patterns of crisscrossed interconnect lines using one or more planar layers of conductor tracks vertically interleaved with insulating material, so as to exploit the vertical dimension of the memory.

One exemplary use of this memory makes it possible to compress the memory size needed to store values of an irregular function, or random values such as computer program instructions. In this case, the inventive algorithm operates on the function or data values in sorted numerical order, which can be achieved by permuting the address lines on the chip in a fixed pattern. The compression procedure described for regular functions can then be used to replace certain absolute data values with their delta relative to the nearest absolute data value stored anywhere, using fewer bits to store the delta than to store the absolute data value. When two or more adjacent values are identical, they may be compressed to one, and the corresponding address lines merged using an OR-gate input. Each OR-gate input requires one transistor and is therefore equivalent to a memory bit in terms of silicon area occupied.
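The sorting step can be sketched in software as follows. The list `wiring` stands in for the fixed permutation of address lines, and merging duplicates stands in for OR-ing several address lines onto one stored row; all names and the sample data are hypothetical.

```python
def sort_and_merge(words):
    """Sort the distinct words and record where each address is wired."""
    distinct = sorted(set(words))
    position = {value: i for i, value in enumerate(distinct)}
    wiring = [position[w] for w in words]  # address line -> sorted row
    return distinct, wiring


def deltas_from_sorted(distinct):
    """First entry is absolute; each later entry is a non-negative delta."""
    return [distinct[0]] + [b - a for a, b in zip(distinct, distinct[1:])]


def read_word(distinct, wiring, address):
    """Follow the wiring from the address line to its stored row."""
    return distinct[wiring[address]]


# Hypothetical "random" contents, e.g. program bytes with repeats.
program = [0x3A, 0x07, 0x3A, 0xC4, 0x07, 0x51]
distinct, wiring = sort_and_merge(program)
deltas = deltas_from_sorted(distinct)
restored = [read_word(distinct, wiring, a) for a in range(len(program))]
```

After sorting, the stored sequence is monotonic, so the differencing procedure described for regular functions applies to data that was originally unordered.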

Reconstruction of an addressed value that has been compressed using an algorithm of order M comprises reading a word of length L2 bits whenever one of M adjacent address lines is activated. Its constituent parts, comprising a base value and first- and higher-order differences, are then combined according to which of the address lines is active. The M address lines of each group may be combined using an M-line to log2(M)-line priority encoder to obtain log2(M) bits for systematically enabling the reconstruction combinatorial logic.
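As a behavioural sketch only, an M-line priority encoder can be modelled as a function that returns the index of an active line; the log2(M)-bit index would then steer the reconstruction logic. Giving priority to the highest-numbered line is an assumption made here for illustration.

```python
def priority_encode(lines):
    """Return the index of the highest-numbered active line, or None."""
    for index in range(len(lines) - 1, -1, -1):
        if lines[index]:
            return index
    return None


# A group of M = 4 address lines yields a 2-bit select value.
select = priority_encode([0, 0, 1, 0])
idle = priority_encode([0, 0, 0, 0])
```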

Using the invention, ROM silicon area can typically be reduced by a factor of between 2 and 5, depending on the order of the algorithm and the amount of reconstruction combinatorial logic used. Bit-exact reconstruction always occurs, as is of course required when the compressed data is a computer program.

In a second aspect of the exemplary memory implementation, compression of random ROM contents is achieved by sorting the data into numerical order by permuting the address lines. It is possible to store the same number of information bits in the ROM using fewer actual bit elements than information bits, because information is effectively stored in the permutation pattern. Because a chip can accommodate a great many layers of interconnection patterns separated by insulating layers, a large number of different address-line permutation patterns, each containing information, may be fabricated, thereby exploiting the vertical dimension to store an increased amount of information.

FIG. 1 shows a mask-programmed read-only memory (ROM).

FIG. 2 is a graph of an exemplary regular function.

FIG. 3 illustrates an exemplary compression algorithm.

FIG. 4 illustrates exemplary reconstruction logic.

FIG. 5 shows a graphical comparison between a true monotonic function and an exponential approximation.

FIGS. 6A to 6D illustrate an exemplary process for eliminating the first layer of a butterfly circuit used to reconstruct stored data.

FIG. 7 shows a conventional MOS transistor.

FIG. 8 illustrates the construction of an exemplary memory circuit.

FIGS. 9A and 9B illustrate an exemplary row address decoder.

FIG. 10 illustrates an exemplary process for arranging stored data in sorted order.

FIG. 11 illustrates an exemplary implementation of a memory using first-order compression of random data.

FIGS. 12A to 12C illustrate an exemplary implementation of address decoding.

FIG. 13 illustrates an exemplary priority encoder.

FIGS. 14A and 14B compare exemplary encoder and decoder combinations.

First, an exemplary application of the invention is described in which a lookup table is required for the values of a regular function, for example a monotonically increasing function, needed for a particular calculation. Equation 1 shows an exemplary function,

where "a" is the base of the logarithm. This function arises in the logarithmic arithmetic described in co-pending U.S. patent application Ser. Nos. 11/142760 and 11/142485, both filed June 1, 2005, which are incorporated herein by reference in their entirety.

FIG. 2 plots this function for base 2, for values of x from 0 to 16383 divided by 512, that is, values of x with 9 bits to the right of the binary point and 5 bits to the left. The function is drawn, over the range 0 to 32, against the complement of 512x, and gives a monotonically increasing function value as that complemented argument increases. The goal is to provide fast access to the function value, accurate to 23 binary places, for any given x among the 16384 possible discrete table points. Alternatively, base e can be used, in which case similar accuracy is achieved with 24 binary places.

The uncompressed lookup table for the latter contains 16384 24-bit words, 393216 bits in total. The value of each word must be rounded to the nearest LSB, an important fact that is discussed later. Whatever the values are, however, any compression and decompression algorithm applied to these lookup table values must regenerate the desired rounded values in a bit-exact manner.

For values of x greater than 17.32, the function is zero to 24-bit accuracy for base e, and for base 2 it is zero when x > 24. F(x) can therefore be taken as 0 for x > 24, that is, x > 11xxx.xxxxxxxxx, or a complemented argument < 00xxxxxxxxxxxx, so values are needed only for a complemented argument > 00111111111111 = 4095. This, however, is specific to the exemplary function and is not pursued further here. Instead, the provision of all 16384 values is considered.

The difference between successive values, for a change of just one least significant bit in the argument, is clearly much smaller than the function itself and can therefore be represented with fewer bits. This observation holds for any smooth function. A first-order compression algorithm therefore stores 24-bit values only at even addresses, and at odd addresses stores the delta from the preceding even-addressed value.

Direct calculation for the exemplary function shows that the largest delta that occurs is 16376 (in units of 2^-24), which fits in a 14-bit word. The ROM can therefore be reorganized as 8192 24-bit words and 8192 14-bit words. For an even address only the 24-bit value is needed; for an odd address both the 24-bit value and the 14-bit value must be read and added. The ROM has thus been compressed from 16384 x 24 = 393216 bits to 8192 x (24 + 14) = 311296 bits, a saving of 21%, i.e. compression to 79% of the original size.
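The storage figures for the first-order scheme can be checked with a few lines of arithmetic. The helper `rom_bits` is a hypothetical name used only for this check.

```python
def rom_bits(entries, field_widths):
    """Total bits for a ROM of `entries` rows, each row holding the given fields."""
    return entries * sum(field_widths)


original = rom_bits(16384, [24])         # one 24-bit word per address
first_order = rom_bits(8192, [24, 14])   # 24-bit value plus 14-bit delta per pair
ratio = first_order / original
saving_percent = round((1 - ratio) * 100)
```

The same helper applies to any other combination of row count and field widths.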

A second-order algorithm can be derived by applying the process of replacing pairs of adjacent values with a value and a delta value to the first array of 8192 24-bit values and, separately, to the second array of 8192 14-bit delta values. The first application yields delta values two apart in the original function, which are almost twice the single-spaced deltas and therefore require 15 bits of storage. The second application, to the array of 14-bit first-order deltas, yields second-order deltas, which should in theory lie in the range 0 to 31 and so require 5 bits. However, because of the rounding of the function to the nearest 24-bit word mentioned above, the range of the second-order deltas widens to a minimum of -1 and a maximum of 33, requiring 6 bits of storage. This occurs because one value may be rounded down while an adjacent word is rounded up, increasing (or decreasing) the delta by 1 or 2.

The ROM size is now reduced to 4096 values each of 24, 15, 14 and 6 bits, a total of 4096 x 59 = 241664 bits, a saving of 39%, i.e. compression to 61% of the original size. Reconstructing a desired value involves addressing the 4096 x 59-bit ROM with the most significant 12 bits of the address to obtain four sub-words of length 24, 15, 14 and 6 bits respectively; the least significant two bits of the address are then used to control reconstruction of the desired value from the four sub-words, as follows.

Denoting the 24-, 15-, 14- and 6-bit values by F, D2, D1 and DD respectively, the desired value for each setting of the two LSBs is:

LSBs = 00: F

LSBs = 01: F + D1

LSBs = 10: F + D2

LSBs = 11: F + D1 + D2 + DD

Forming F + D1 requires a 14-bit adder, forming F + D2 requires a 15-bit adder, and forming D2 + DD requires a 6-bit adder. Thus 35 bits of full-adder cells are needed to reconstruct the desired value. For the exemplary function, the DD range of -1 to +33 can be shifted to 0 to 34 by decrementing the stored 15-bit D2 value by 1 and using the following rule.

With the shifted values, the desired value for each setting of the two LSBs is:

LSBs = 00: F

LSBs = 01: F + D1

LSBs = 10: F + D2 + 1

LSBs = 11: F + D1 + D2 + DD

The 6-bit adder thus adds either 1 or DD to D2, depending on the LSB of the address. More generally, it can add a value DD0 or DD1 so that any range of DD, from negative to positive, can be accommodated.
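The equivalence of the plain rule and the shifted rule can be checked with a short sketch. It assumes the stored fields are D2 - 1 and DD + 1; the function names and the sample values are hypothetical.

```python
def reconstruct(F, D1, D2, DD, lsbs):
    """Plain rule: 00 -> F, 01 -> F+D1, 10 -> F+D2, 11 -> F+D1+D2+DD."""
    value = F
    if lsbs & 1:
        value += D1
    if lsbs & 2:
        value += D2
    if lsbs == 3:
        value += DD
    return value


def reconstruct_shifted(F, D1, D2_stored, DD_stored, lsbs):
    """Shifted rule with D2_stored = D2 - 1 and DD_stored = DD + 1.

    When the upper LSB is set, the small adder adds either 1 (lower LSB
    clear) or DD_stored (lower LSB set) to D2_stored.
    """
    value = F + (D1 if lsbs & 1 else 0)
    if lsbs & 2:
        value += D2_stored + (DD_stored if lsbs & 1 else 1)
    return value


# Compare both rules over the DD range quoted in the text (-1 to +33).
F, D1, D2 = 5_000_000, 12_000, 24_010  # hypothetical field values
rules_agree = all(
    reconstruct(F, D1, D2, DD, lsbs)
    == reconstruct_shifted(F, D1, D2 - 1, DD + 1, lsbs)
    for DD in range(-1, 34)
    for lsbs in range(4)
)
stored_dd_range = (min(DD + 1 for DD in range(-1, 34)), max(DD + 1 for DD in range(-1, 34)))
```

The shift removes the need for a sign bit on DD: the +1 added to DD is cancelled by the -1 applied to D2 when both are summed.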

FIG. 3 illustrates the compression algorithm obtained by repeating the process of taking differences between adjacent values. The desired values to be stored are initially the 16384 24-bit values denoted f0, f1, f2, ..., f6, ..., and so on, up to f16383. The first application of the differencing process leaves the even-numbered values unchanged, but replaces each odd-numbered value with its difference from the preceding even-numbered value. For example, f5 is replaced by d4 = f5 - f4. For the exemplary smooth function, these differences are computed to be at most 14 bits long, which is 1024 times smaller than the original 24-bit values. For a much smoother function that is nearly a straight line, the difference between two values only 1/16384 of the range apart could be expected to be 1/16384 of the maximum original value, and thus 14 bits shorter. In general, this shortening of the word length occurs for any smooth function. In the second row, therefore, the function is represented by 8192 pairs of values, one 24 bits long and the other 14 bits long.

In the third row of FIG. 3, the even-numbered values, denoted f0, f2, f4, ..., and the difference values, denoted d0, d2, d4, ..., are shown separated to indicate that the differencing algorithm is applied to the remaining f-values and, separately, to the d-values. The result, shown in the fourth row, is that every second even-numbered value, i.e. every fourth value of the original values, still remains as a 24-bit value, while the even-numbered values in between are replaced by 15-bit differences such as f2-f0. Likewise, alternate d-values remain unchanged as 14-bit values, while the values in between are reduced to 6-bit second-order differences such as d6-d4. The original 16384 24-bit values have now been compressed into 4096 sets of values, each comprising a 24-bit, a 15-bit, a 14-bit and a 6-bit value.

The process can be iterated until most values are high-order differences only about 2 bits long. These ultimate high-order differences form a somewhat random pattern, because the original 24-bit values were rounded up or down to the nearest 24th binary place. The reduction in total memory size depends on how many times the differencing algorithm is iterated.
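The iterated procedure can be sketched recursively: each pass splits an array into kept values and deltas, and the next pass is applied to both arrays separately. This sketch assumes the table length is a multiple of 2^order; the function names and the sample table (a simple quadratic) are illustrative and are not the function from the patent.

```python
def encode(values, order):
    """Return 2**order arrays: kept values, then differences of rising order."""
    if order == 0:
        return [list(values)]
    kept = values[0::2]
    deltas = [b - a for a, b in zip(values[0::2], values[1::2])]
    return encode(kept, order - 1) + encode(deltas, order - 1)


def decode(arrays, order):
    """Invert encode() by rebuilding each level from its kept and delta halves."""
    if order == 0:
        return list(arrays[0])
    half = len(arrays) // 2
    kept = decode(arrays[:half], order - 1)
    deltas = decode(arrays[half:], order - 1)
    out = []
    for k, d in zip(kept, deltas):
        out += [k, k + d]
    return out


def field_widths(arrays):
    """Magnitude bits per array; a sign bit is extra if a field can go negative."""
    return [max(abs(v).bit_length() for v in a) for a in arrays]


table = [100 + k * k for k in range(16)]  # hypothetical smooth table
arrays = encode(table, 2)                 # order 2 -> [F, D2, D1, DD]
widths = field_widths(arrays)
restored = decode(arrays, 2)
```

For order 2 the four arrays come out in the order absolute value, double-spaced difference, single-spaced difference, second-order difference, matching the 24, 15, 14 and 6-bit fields described in the text.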

FIG. 4 shows the reconstruction logic for the second-order difference compression algorithm. The original method of storing 16384 24-bit words, shown in the upper diagram, comprises a lookup table to which a 14-bit address is applied. The lower diagram shows the second-order difference technique, in which the lookup table is reduced to 4096 59-bit words. These are addressed using only the most significant 12 bits of the original 14-bit address, which effectively selects the group of four values in which the target value lies. The 59 bits, comprising a 24-bit absolute value, a 15-bit double-spaced difference, a 14-bit single-spaced difference and a 6-bit second-order difference, are applied to respective adders as shown. The adders in the first column are enabled by the second-least-significant address bit: if that bit is '1', the 15-bit difference is added to the 24-bit value and the 6-bit second-order difference is added to the 14-bit difference; if it is '0', the 24-bit value and the 14-bit value pass through unchanged. The adder in the second column adds the outputs of the first two adders if the least significant bit of the original 14-bit address is '1', and otherwise passes the 24-bit result through unchanged. In this way, the two least significant bits of the original 14-bit address control which of the group of four values is generated.
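A minimal software model of the two adder columns of FIG. 4 may make the addressing clearer. The function names (`pack_group`, `reconstruct`, `lookup`) and the tuple layout are assumptions made for this sketch; the hardware described in the text uses gated adders rather than conditionals.

```python
def pack_group(f0, f1, f2, f3):
    """Form the four stored fields for one group of four table values."""
    d0 = f1 - f0                  # single-spaced difference (14-bit field)
    d2 = f3 - f2
    return (f0,                   # absolute value (24-bit field)
            f2 - f0,              # double-spaced difference (15-bit field)
            d0,
            d2 - d0)              # second-order difference (6-bit field)

def reconstruct(group, low2):
    """Model the two adder columns for the two LSBs of the address."""
    absolute, double_spaced, single_spaced, second_order = group
    bit1 = (low2 >> 1) & 1
    bit0 = low2 & 1
    # First column: both adders are enabled by the second-least-significant bit.
    a = absolute + double_spaced if bit1 else absolute
    b = single_spaced + second_order if bit1 else single_spaced
    # Second column: enabled by the least significant bit.
    return a + b if bit0 else a

def lookup(groups, address):
    """The upper address bits select the group; the low 2 bits select the value."""
    return reconstruct(groups[address >> 2], address & 3)

values = [1000, 1010, 1021, 1033, 1046, 1060, 1075, 1091]
groups = [pack_group(*values[i:i + 4]) for i in range(0, len(values), 4)]
print([lookup(groups, a) for a in range(len(values))] == values)
```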

A further saving can be made by noting that D2 (the 15-bit double-spaced difference) is almost twice D1 (the 14-bit single-spaced difference). It therefore suffices to store only the 15-bit D2, together with a correction to be applied to the 14-bit value INT(D2/2) = RightShift(D2) to obtain the bit-exact value of D1. This correction is a second-order difference, D1 - (D2/2), which direct calculation shows to lie in the range -1 to 8 and thus to need 4 bits of storage. It can further be seen that this second-order difference is nearly equal to one quarter of the value DD. The second-order difference can therefore be replaced by its difference from DD/4, which direct calculation shows to lie in the range -1 to +1, needing 2 bits of storage; this is in effect a third-order difference and may be denoted DDD.

Thus, to regenerate each of the 16384 24-bit function values bit-exactly, the stored values comprise 4096 sets of four values, as follows.

24-bit value: F(4i)

15-bit value: D2(4i) = F(4i + 2) - F(4i)

6-bit value: DD(4i) = (F(4i + 3) - F(4i + 2)) - (F(4i + 1) - F(4i))

2-bit value: DDD(4i) = INT(D2(4i)/2) - (F(4i + 1) - F(4i)) - INT(DD(4i)/4)

The four function values F(4i), F(4i + 1), F(4i + 2) and F(4i + 3) can then be reconstructed bit-exactly by solving these definitions for the unknown values: F(4i) is read directly, and

F(4i + 1) = F(4i) + INT(D2(4i)/2) - INT(DD(4i)/4) - DDD(4i)

F(4i + 2) = F(4i) + D2(4i)

F(4i + 3) = F(4i) + D2(4i) + INT(D2(4i)/2) + DD(4i) - INT(DD(4i)/4) - DDD(4i)

The group of four original 24-bit values has thus been replaced by a total of 47 bits instead of 96, a compression ratio slightly better than 2:1.
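The 47-bit scheme can be checked with a short round-trip sketch. The names `encode4` and `decode4` are hypothetical, and INT() is assumed here to round down (an arithmetic right shift). The decoder is obtained by solving the definitions of D2, DD and DDD for the four function values, so the round trip is exact whatever the input; only the field widths depend on the function being smooth.

```python
def encode4(f0, f1, f2, f3):
    """Return the stored fields (F, D2, DD, DDD) for one group of four values."""
    d1 = f1 - f0
    d2 = f2 - f0
    dd = (f3 - f2) - d1
    ddd = (d2 >> 1) - d1 - (dd >> 2)      # INT() taken as floor (right shift)
    return f0, d2, dd, ddd

def decode4(f0, d2, dd, ddd):
    """Recover the four values from the stored fields."""
    d1 = (d2 >> 1) - (dd >> 2) - ddd      # solve the DDD definition for D1
    return f0, f0 + d1, f0 + d2, f0 + d2 + d1 + dd

group = (1000, 1010, 1021, 1033)
print(encode4(*group))                    # stored fields for this group
print(decode4(*encode4(*group)) == group)
```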

Conversely, as an alternative, only the 14-bit D1 may be stored, together with a correction to be applied to 2D1 to obtain D2. Other arrangements are thus possible, such as storing the following.

24-bit value: F(4i)

14-bit value: D1 = F(4i + 1) - F(4i)

6-bit value: DD = (F(4i + 3) - F(4i + 2)) - D1

3-bit value: DDD = (F(4i + 2) - F(4i)) - (2D1 + INT(DD/4))

This can make reconstruction slightly simpler. It will be appreciated that operations such as INT(DD/4) are trivial logic functions that simply do not use the two LSBs of DD.

Before calculating a table showing the compression achieved using algorithms of different order, a more systematic version of higher-order compression is developed, based on a modified Walsh transform. The Walsh transform method forms the sum and the difference of pairs of adjacent values and replaces the original values with that sum and difference. For example, in the first iteration for compressing data comprising f0, f1, f2 and f3, f0 is replaced by s0 = f0 + f1, f1 is replaced by d0 = f1 - f0, f2 is replaced by s2 = f2 + f3, and f3 is replaced by d2 = f3 - f2. A circuit that computes the sum and difference of a pair of values is called a butterfly circuit.

All the sums are then arranged adjacently, followed by all the differences. As seen above, the differences are shorter words than the sums. The process is then repeated on the block of sum values and, separately, on the block of difference values (each block being half the length of the original) to obtain a second-order algorithm, and likewise for higher-order algorithms. The highest order of algorithm for 16384 values is 14. At that point the blocks each contain only one value and cannot be compressed further. The 14th-order algorithm is in fact a 16384-point Walsh transform, and the lower-order algorithms stop at intermediate stages of the full Walsh transform. The sums and differences associated with the final iteration are combined to form the final compressed values stored in the memory. It will be appreciated that the final iteration may be any selected iteration and need not be the 14th.

When the sum of two 24-bit values is formed, the word length can grow by one bit. However, the sum and the difference always have the same least significant bit, so that bit can be discarded from one of them without losing information. By discarding the least significant bit of the sum, i.e. by using the halved sum (Equation 2) instead, the sum is prevented from growing in length. On reconstruction, the difference is added to or subtracted from the sum, and the LSB of the difference is used twice, by also feeding it to the carry/borrow input of the first adder/subtractor stage to replace the missing LSB of the sum. Mathematically, the modified transform butterfly can be written as follows.

SUM(i) = INT((F(2i + 1) + F(2i))/2)

DIFF(i) = F(2i + 1) - F(2i)

The reconstruction can be written as follows, with INT() rounding down so that the discarded sum LSB, AND(1,DIFF(i)), is restored exactly once.

F(2i + 1) = SUM(i) + INT(DIFF(i)/2) + AND(1,DIFF(i))

F(2i) = SUM(i) - INT(DIFF(i)/2)

The above is a modified version of the well-known "butterfly" operation used in the Walsh-Fourier transform

SUM(i) = F(2i + 1) + F(2i)

DIFF(i) = F(2i + 1) - F(2i)

and its inverse

F(2i + 1) = (SUM(i) + DIFF(i))/2

F(2i) = (SUM(i) - DIFF(i))/2
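The modified butterfly and its inverse can be exercised with the sketch below. The function names are hypothetical, and INT() is assumed to round down, which is what Python's `>>` does for negative numbers too. The inverse follows from the identities a + b = 2·SUM + lsb and b - a = 2·INT(DIFF/2) + lsb, where lsb is the shared least significant bit of the full sum and of the difference.

```python
def butterfly(a, b):
    """Modified butterfly: halved sum (LSB discarded) and difference."""
    s = (a + b) >> 1          # INT((b + a) / 2), rounding down
    d = b - a
    return s, d

def inverse_butterfly(s, d):
    """Recover (a, b); the LSB of d stands in for the discarded sum LSB."""
    half = d >> 1             # INT(d / 2), rounding down
    lsb = d & 1               # AND(1, d)
    b = s + half + lsb
    a = s - half
    return a, b

# Exhaustive round-trip check on a small range, including negative values.
ok = all(inverse_butterfly(*butterfly(a, b)) == (a, b)
         for a in range(-8, 9) for b in range(-8, 9))
print(ok)
```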

A fast, base-2 pseudo-Walsh transform (pseudo-FWT) can be constructed using the modified butterfly above to transform a group of two adjacent values into one value and one delta value, a group of four adjacent values into one value, two delta values and one second-order delta value, and so on. This makes investigating the properties of higher-order algorithms, or implementing them, a systematic process. FORTRAN code for a suitable pseudo-Walsh transform is given below.

C     F IS THE GROUP OF M VALUES TO BE PSEUDO-WALSH TRANSFORMED
C     AND N=LOG2(M) IS THE ORDER OF THE ALGORITHM
      SUBROUTINE PSWLSH(F,M,N)
      INTEGER*4 F(*),SUM,DIFF
C     N1 = NUMBER OF BLOCKS, N2 = BUTTERFLY SPAN AT THIS STAGE
      N1=M/2
      N2=1
      DO 1 I=1,N
      L0=1
      DO 2 J=1,N1
      L=L0
      DO 3 K=1,N2
C     MODIFIED BUTTERFLY: HALVED SUM AND DIFFERENCE, IN PLACE
      SUM=(F(L+N2)+F(L))/2
      DIFF=F(L+N2)-F(L)
      F(L+N2)=DIFF
      F(L)=SUM
      L=L+1
    3 CONTINUE
      L0=L+N2
    2 CONTINUE
      N1=N1/2
      N2=2*N2
    1 CONTINUE
      RETURN
      END
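For readers who want to experiment, the listing can be ported to Python together with an inverse, as sketched below. This is an illustrative port, not part of the patent: the names `pswlsh` and `inverse_pswlsh` are assumptions, indices are 0-based, and the halving uses floor division (`>> 1`), whereas FORTRAN integer division truncates toward zero; the two agree for non-negative sums. The inverse simply undoes the stages in reverse order.

```python
def pswlsh(f, order):
    """In-place pseudo-Walsh transform of a list of 2**order integers."""
    m = len(f)
    assert m == 1 << order
    n1, n2 = m // 2, 1                    # blocks per stage, butterfly span
    for _ in range(order):
        l0 = 0
        for _ in range(n1):
            l = l0
            for _ in range(n2):
                a, b = f[l], f[l + n2]
                f[l] = (a + b) >> 1       # halved sum
                f[l + n2] = b - a         # difference
                l += 1
            l0 = l + n2
        n1 //= 2
        n2 *= 2
    return f

def inverse_pswlsh(f, order):
    """Undo pswlsh by applying the inverse butterflies in reverse stage order."""
    m = len(f)
    assert m == 1 << order
    n1, n2 = 1, m // 2
    for _ in range(order):
        l0 = 0
        for _ in range(n1):
            l = l0
            for _ in range(n2):
                s, d = f[l], f[l + n2]
                f[l] = s - (d >> 1)
                f[l + n2] = s + (d >> 1) + (d & 1)
                l += 1
            l0 = l + n2
        n1 *= 2
        n2 //= 2
    return f

data = [5, 9, 14, 20, 27, 35, 44, 54]
transformed = pswlsh(list(data), 3)
print(transformed)
print(inverse_pswlsh(list(transformed), 3) == data)
```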

The table below shows the degree of data compression achieved for the exemplary function using pseudo-Walsh algorithms of different order.

| Order of algorithm | Number of words | Total word length | Total bits (base-e) | Total bits (base-2) |
|---|---|---|---|---|
| 0 | 16384 | 24 | 393216 | 376832 |
| 1 | 8192 | 38 | 311296 (319488) | 294912 |
| 2 | 4096 | 59 | 241664 (258048) | 221184 |
| 3 | 2048 | 93 | 190464 (215040) | 169984 |
| 4 | 1024 | 147 | 150528 (182272) | 135168 |
| 5 | 512 | 238 | 121856 (161792) | 106496 |
| 6 | 256 | 381 | 97536 (142848) | 89088 |
| 7 | 128 | 624 | 79072 (135424) | 71680 |
| 8 | 64 | 1084 | 69376 | 59968 |
| 9 | 32 | 1947 | 62304 | 51136 |
| 10 | 16 | 3507 | 56112 | 45072 |
| 11 | 8 | 5738 | 45904 | 37096 |
| 12 | 4 | 10020 | 40080 | 30492 |
| 13 | 2 | 17782 | 35564 | 26296 |
| 14 | 1 | 32085 | 32085 | 23188 |

The bit counts in parentheses are those obtained when a Walsh transform using the standard butterfly operation is used. The poorer compression of the standard Walsh transform is due partly to the growth in word length of the sum terms and partly to the amplification of rounding errors in the original table values when higher-order differences are formed; the latter is less pronounced when the modified butterfly is used and the LSB of the sum term is discarded. In both cases, bit-exact reconstruction of the original values is achieved when the appropriate inverse algorithm is applied. For base 2, the function is computed to 23 binary places, and for base e to 24 places, which represents similar accuracy.

Thus, in the limit where a final transform of order 14 operates on the whole array, the lookup table reduces to a single word 32085 (23188) bits wide, which is on average less than 2 bits per table value, yet each value can be reconstructed to 24-bit (23-bit) accuracy. The single word is no longer a table and needs no address, since it can always be read; its values can simply be hardwired to the inputs of the reconstruction adders. However, 14 levels of reconstruction adders are required, and the number of single-bit adder cells needed is approximately 65000. Since an adder cell is larger than a memory bit, this does not represent the minimum silicon area. The minimum area is achieved at some compromise between the number of bits stored and the number of adder cells needed to reconstruct a value. Nevertheless, an overall silicon area reduction by a factor of about 4 or 5 appears feasible.

It can also be concluded that the above provides a method of building a machine that will compute any individual value of a smooth function. In the limit, the memory table disappears and is replaced by hardwired inputs to an adder tree. This can lead to further simplifications: an adder stage that adds zero can be reduced to handling only the carry propagated from the preceding stage, and an adder that always has a '1' on one of its inputs can be simplified in a similar way.

The pseudo-FWT table compression algorithm can also be applied to the 512-word table needed to store the values of the exponential function when base-2 logarithms are used and X₂ is 9 bits long. This function is an approximation to the exemplary function above that may suffice over part of the argument range. It is useful to use this approximation over the whole range and to store only the necessary corrections, which have a shorter word length.

With base 2, the bit pattern of this base-2 exponential function repeats, with a shift, as the argument changes from 0.X₂ to 1.X₂, 2.X₂ and so on, so only the principal range denoted 0.X₂ (where X₂ is a 9-bit value) needs to be synthesized. The table below gives the total number of bits that must be stored to synthesize the function for values of the argument X₂ from 0 to 511, rounded to the nearest 23 bits after the binary point.

| Order of algorithm | Number of words | Total word length | Total bits |
|---|---|---|---|
| 0 | 512 | 23 | 11776 |
| 1 | 256 | 37 | 9472 |
| 2 | 128 | 57 | 7296 |
| 3 | 64 | 89 | 5696 |
| 4 | 32 | 139 | 4448 |
| 5 | 16 | 219 | 3504 |
| 6 | 8 | 358 | 2864 |
| 7 | 4 | 585 | 2340 |
| 8 | 2 | 974 | 1948 |
| 9 | 1 | 1608 | 1608 |

The last row constitutes a machine that computes the function without memory, using only controlled adders with fixed, hardwired inputs.

In addition to the above bits, corrections to this exponential function are needed to obtain the values of either of the functions used in logarithmic arithmetic for addition and subtraction in the logarithmic domain.

FIG. 5 shows, for the base-2 case, the differences between the desired values of these functions and the exponential function. These differences can be stored in lookup tables that occupy less space than the original functions, the chip area being represented approximately by the triangular area under the curve at the lower right. Moreover, since the corrections also form a regular function, those tables too can be compressed by the techniques described herein.

In a variant of the compression algorithm, the reconstruction adders can be largely eliminated. This is based on the observation that adding a given 14-bit value to a given 24-bit value always yields the same 14 LSBs. It is therefore efficient to store, instead of the 14-bit delta d0, the 14 LSBs of the precomputed sum of the 24-bit value and the 14-bit value, i.e. of c0 = f0 + d0. However, it is also necessary to store an extra bit indicating whether adding the 14-bit delta to the 24-bit value caused a carry from the 14th bit into the 15th bit. By storing 15 bits instead of 14 in this way, the first 14 bits of the reconstruction adder can be eliminated. The 15th bit is applied to a carry propagation circuit for the most significant 10 bits. A carry propagation circuit is simpler than a full adder, and it can be eliminated altogether if the application has a later addition stage into which the 15th bit can be inserted. This is explained with reference to FIG. 6. In effect, the least significant bits of the original pair of adjacent values are retained, but are associated with a common most significant part. The extra bit indicates whether the most significant part must be incremented when reconstructing the second value of the pair.

FIG. 6A shows a conventional butterfly circuit, which adds the same 14-bit value to, or subtracts it from, a 24-bit value to produce one of two alternative values according to a SELECT control line. FIG. 6B shows that the butterfly circuit can be simplified to a 14-bit adder/subtractor, which adds the 14-bit value to, or subtracts it from, the 14 LSBs of the 24-bit value to produce one of two alternative 14-LSB results, together with a carry or borrow bit from the add or subtract operation, which is combined with the 10 MSBs of the 24-bit value to produce one of two alternative 10-bit MSB patterns. The MSB patterns are identical if no borrow or carry is generated by the 14-bit adder/subtractor. In FIG. 6C, the two alternative 14-bit results are simply pre-stored: the one corresponding to the difference (subtract) operation is stored with the 10 MSBs in place of the 24-bit value, while the one corresponding to the sum is stored in place of the 14-bit value. A 15th bit is nevertheless needed to indicate whether, in the sum case, the 10 MSBs are unchanged or incremented by 1. In the difference case the 15th bit is always 0, and it is shown as an input to the selector along with the 14 LSBs from the 24-bit word. If a 0 is implemented in the memory as the absence of a transistor, a column of 0s occupies no silicon area. In FIG. 6C, the carry propagator is still needed to increment the 10 MSBs when required.

The selector in FIG. 6C need not be provided explicitly. Selecting one or another of a number of stored values by means of an address bit is inherent in a memory. The two alternative 14-bit (or 15-bit) patterns are therefore simply selected by a select line, which is the least significant bit of the 14-bit address. The 10 MSBs are addressed by the 13 MSBs of the address. The memory thus comprises 8192 10-bit words and 16384 14- or 15-bit words, a total of 319488 bits, which is 8192 bits more than the 311296 bits of the first-order algorithm; this is the price of eliminating the 14-bit adder. Finally, FIG. 6D shows the carry propagator being omitted by deferring the incrementing of the 10 MSBs by the carry bit and simply feeding the carry bit forward to be used at a convenient opportunity in a subsequent adder.
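The adder-free storage of FIG. 6C can be modeled in a few lines. This is a sketch under assumptions: the names `encode_pair` and `read_pair` are hypothetical, and the pair is assumed to be non-decreasing with a difference below 2^14, as the text states for the exemplary function.

```python
LSB_BITS = 14
MASK = (1 << LSB_BITS) - 1

def encode_pair(f0, f1):
    """Split a pair into shared MSBs, two pre-stored LSB patterns and a carry bit."""
    msb = f0 >> LSB_BITS                 # common most significant part (10 bits)
    carry = (f1 >> LSB_BITS) - msb       # 1 if f1 crossed into the next MSB value
    assert carry in (0, 1), "pair must satisfy 0 <= f1 - f0 < 2**14"
    return msb, f0 & MASK, f1 & MASK, carry

def read_pair(stored, select):
    """Address LSB `select` picks the LSB pattern; the carry bit bumps the MSBs."""
    msb, lsb0, lsb1, carry = stored
    if select == 0:
        return (msb << LSB_BITS) | lsb0
    return ((msb + carry) << LSB_BITS) | lsb1

stored = encode_pair(0x3FFFF0, 0x400010)     # second value crosses a 14-bit boundary
print(stored[3])                             # carry bit for this pair
print(hex(read_pair(stored, 0)), hex(read_pair(stored, 1)))
```

No addition of a delta is needed at read time; only the optional increment of the MSBs remains, which is the role of the carry propagator in FIG. 6C.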

The division of the 24-bit word into a 10-bit part and a 14-bit part in FIG. 6 arises from the exemplary function, for which the delta between an even-addressed value and the adjacent odd-addressed value never exceeds a 14-bit value. The 10 MSBs of an even-addressed value and of the adjacent odd-addressed value are therefore always either identical or differ by just 1. Similarly, the difference between values two apart never exceeds a 15-bit value, and the difference between values four apart never exceeds a 16-bit value. A group of four (or even five) adjacent values therefore has the same most significant 8 bits, apart from a possible increment of 1, while having different least significant 16-bit parts. An alternative realization is thus to arrange the memory as 4096 8-bit values and 16384 17-bit values, the 17th bit indicating whether the 8 MSBs must be incremented by 1. This reduces the number of bits to 311296. A table showing the number of memory bits for different divisions is given below.

| Group size | Number of MSBs | Number of LSBs (+1) | Memory bits |
|---|---|---|---|
| 2 | 10 | 15 | 319488 |
| 4 | 8 | 17 | 311296 |
| 8 | 7 | 18 | 309248 |
| 16 | 6 | 19 | 317440 |
| 32 | 5 | 20 | 330240 |
| 64 | 4 | 21 | 345088 |

The table shows that there is an optimum division that minimizes the number of bits, here a group size of 8 with 309248 bits.
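The memory totals for group sizes 4 to 64 can be reproduced with a one-line formula, sketched below. The accounting is inferred from the figures in the text (one MSB word per group plus one LSB word, including its increment bit, per value); the function name `memory_bits` is hypothetical. The group-of-2 row is computed separately because the text counts a 14-bit pattern for the first value of each pair and a 15-bit pattern for the second.

```python
N_VALUES = 16384

def memory_bits(group_size, msb_bits, lsb_bits_plus_1):
    """One MSB word per group, plus one (LSB + increment bit) word per value."""
    return (N_VALUES // group_size) * msb_bits + N_VALUES * lsb_bits_plus_1

rows = [(4, 8, 17), (8, 7, 18), (16, 6, 19), (32, 5, 20), (64, 4, 21)]
for group_size, msbs, lsbs in rows:
    print(group_size, memory_bits(group_size, msbs, lsbs))
# Totals: 311296, 309248, 317440, 330240, 345088

# Group of 2: 10 MSBs, one 14-bit pattern and one 15-bit pattern per pair.
pair_total = (N_VALUES // 2) * (10 + 14 + 15)
print(2, pair_total)          # 319488

best = min(rows, key=lambda r: memory_bits(*r))
print("smallest total at group size", best[0])
```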

The process can now be iterated by recognizing that the LSB patterns within each group are monotonically increasing values. For example, within each group of 64 values in the last case, two adjacent values cannot differ by more than a 14-bit value. The group can therefore be divided into 32 groups of two, storing for each pair 7 MSBs (which must be capable of being incremented by 1) and two alternative 14- or 15-bit LSB patterns, the 15th bit indicating whether the 7 MSBs must be incremented by 1.

Other ways of representing a group of 64 21-bit values are shown below, together with the total number of memory bits each implies.

| Group size | Number of MSBs | Number of LSBs (+1) | Bits per 64-group | Total memory |
|---|---|---|---|---|
| 2 | 7 | 15 | 1184 | 304128 |
| 4 | 5 | 17 | 1168 | 300032 |
| 8 | 4 | 18 | 1184 | 304128 |

Repeating the procedure for the case of groups of 32 gives the following.

| Group size | Number of MSBs | Number of LSBs (+1) | Bits per 32-group | Total memory |
|---|---|---|---|---|
| 2 | 6 | 15 | 576 | 297472 |
| 4 | 4 | 17 | 576 | 297472 |
| 8 | 3 | 18 | 588 | 303616 |

For groups of 16, the following is obtained.

| Group size | Number of MSBs | Number of LSBs (+1) | Bits per 16-group | Total memory |
|---|---|---|---|---|
| 2 | 5 | 15 | 280 | 292864 |
| 4 | 3 | 17 | 284 | 296960 |

For any given table of a regular function, the above trials can be carried out to determine the arrangement that yields the minimum number of stored bits.
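The trial-and-compare procedure can be mechanized. The sketch below reproduces the two-level totals of the three tables above; the accounting (outer MSBs stored once per outer group, inner MSBs once per inner group, and one LSB word with increment bit per value) is inferred from the tabulated figures, and the name `nested_total` is hypothetical.

```python
def nested_total(outer_size, outer_msb, inner_size, inner_msb, lsb_plus_1,
                 n_values=16384):
    """Return (bits per outer group, total memory bits) for a two-level split."""
    inner_groups = outer_size // inner_size
    per_outer = inner_groups * (inner_msb + inner_size * lsb_plus_1)
    total = (n_values // outer_size) * (outer_msb + per_outer)
    return per_outer, total

# (outer size, outer MSBs, inner size, inner MSBs, LSBs + 1), taken from the tables.
candidates = [
    (64, 4, 2, 7, 15), (64, 4, 4, 5, 17), (64, 4, 8, 4, 18),
    (32, 5, 2, 6, 15), (32, 5, 4, 4, 17), (32, 5, 8, 3, 18),
    (16, 6, 2, 5, 15), (16, 6, 4, 3, 17),
]
for c in candidates:
    print(c, nested_total(*c))

best = min(candidates, key=lambda c: nested_total(*c)[1])
print("fewest bits:", best, nested_total(*best)[1])   # 292864 for (16, 6, 2, 5, 15)
```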

The above methods can be used not only for smooth functions but also for piecewise-smooth functions. For example, if groups of 32 values are compressed one at a time using any of the above techniques, only the values within each group of 32 need represent a smooth, i.e. continuous, function. Discontinuities between different groups do not matter.

To explain how the technique can be applied to more random data, the construction of a typical MOS ROM is first described. FIG. 7 shows the construction of a MOS transistor. A conductive gate electrode is made of a conductive material with a high melting point, so as to withstand processing temperatures, and is separated from the silicon substrate by a thin insulating layer, typically of silicon dioxide. The gate and the underlying oxide insulator are generally produced by first covering the whole substrate with the gate structure and then etching it away except where gates are wanted. Drain and source electrodes are then formed by implanting impurities on either side of the gate. The implanted impurities must be allowed to integrate into the crystal structure of the silicon substrate by raising the temperature to just below the melting point of silicon. Once the MOS transistors have been formed in this way, they can be interconnected by depositing aluminum interconnect tracks.

FIG. 8 shows a ROM constructed in this way. Alternating stripes of source and drain diffusion form a number of transistors with their sources and drains in parallel. A transistor is present wherever the gate structure has not been etched away. Transistors are placed where a binary 1 is to be stored, and no transistor is placed where a binary 0 is to be stored. In fact, in FIG. 8 each transistor is shown as comprising a gate from the source line above the drain line and a gate from the source line below the drain line. Both are enabled by the same gate signal and are thus in effect two transistors in parallel. The number of separate drain lines corresponds to the number of bits in a word. All the source lines for the same row of words are connected at the left to a word-row enable signal, which is pulled to ground to read words in that row.

Different columns of transistors correspond to different words in the row. An address decoder accepts a binary address input and applies an enable signal to all the gates in one column, thereby enabling all the bits of a particular word onto the drain lines. If the word contains a 1 in a given bit position, the corresponding drain line is pulled down by the turned-on transistor; if there is no transistor there (corresponding to a 0 in the word), the drain line is not pulled down.

Multiple rows of words can be fabricated, with the address lines running from top to bottom through all the rows. The source enable line for each word row is connected to a second, row-address decoder, and the full address is then the combination of the column address and the row address. A suitable row-address decoder, which pulls down the source line of the enabled word row, is shown in FIGS. 9A and 9B.

As shown in FIG. 9A, a cascadable building block comprises two MOS transistors whose sources are connected in common to a "cascoding input" (C). An address line is connected to the gate of one transistor, and the complementary address signal is connected to the gate of the other. As a result, one transistor or the other conducts between source and drain, depending on whether the address line is high or low. A cascade of these building blocks is shown in FIG. 9B, in which the drain connections (DR1, DR2) of a first device, controlled by address bit A1, are connected to the source connections of two identical building blocks higher up the chain, both controlled by address bit A2. Their drain connections are in turn connected to pairs of building blocks all controlled by address bit A3, providing eight drain connections, and so on. In this way, an N-bit address is decoded by a binary tree of building blocks to create a conducting path from one of the 2^N final drain connections to the first source connection, which is called "chip enable".

Pulling down the chip enable line pulls down the one drain connection that corresponds to the enabled conduction path, which in turn pulls down one of the word-row source lines, which in turn pulls down the drain lines for the word in that row whose word transistors have been turned on by the column address. Sense amplifiers (not shown) detect which bit lines are able to draw current from the supply and generate the output bit pattern corresponding to the addressed word. The sense amplifiers compensate for losses through the long chain of transistors and provide a buffered logic-level output to the processor external to the memory.

FIG. 10 shows how a memory containing random values can be arranged in order of numerical value. The words are simply sorted and laid out in numerical order while each remains connected to the same output line of the address decoder. This requires the decoded address lines to crisscross as shown, which can be achieved with a suitable multilayer interconnect pattern. The address lines can then be rearranged in a different crisscross pattern for connection to the words of the next row, which are likely to sort into an entirely different order. This by itself achieves no compression; it merely illustrates a possible way of arranging even random data on the chip in monotonically increasing order of numerical value while still reading out the correct value associated with each address.
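The arrangement of FIG. 10 can be modeled in software as a sort plus a permutation table that plays the role of the crisscrossed address lines. The sketch below is illustrative only; the names `sort_with_crisscross`, `wiring` and `read` are hypothetical and do not appear in the patent.

```python
def sort_with_crisscross(words):
    """Sort the stored words and record which physical word each address enables."""
    # order[position] = original address of the word now stored at that position
    order = sorted(range(len(words)), key=lambda addr: words[addr])
    sorted_words = [words[addr] for addr in order]
    # wiring[addr] = physical position reached by decoded address line addr
    wiring = [0] * len(words)
    for position, addr in enumerate(order):
        wiring[addr] = position
    return sorted_words, wiring


def read(sorted_words, wiring, addr):
    """Read the word for an address by following its crisscrossed line."""
    return sorted_words[wiring[addr]]
```

For example, the words `[7, 2, 9, 4]` are stored as `[2, 4, 7, 9]` with wiring `[2, 0, 3, 1]`, and every address still reads back its original value.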

FIG. 11 shows an implementation of a memory for first-order compression of random data using a decoder, a crisscross line pattern, and an encoder. Compared with FIG. 10, every other full word in the value-sorted order has been removed. The word preceding each removed word is now addressed both by its own address and by the address of the omitted word that was adjacent to it, by including an OR gate that ORs the two addresses. The address line of the omitted word now enables a shortened delta word, which the reconstruction logic adds to the preceding word to regenerate the omitted word. The number of bits of memory for every pair of original full words has thus been reduced to one full word and one delta word, at the cost of one two-input OR gate per pair. A two-input OR gate is at most four transistors, which may be counted as four bits. This must be added to the delta-word length when estimating the amount of silicon area saved.
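A minimal software sketch of this first-order scheme follows. It assumes the words have already been put in value-sorted order and models the OR gate and reconstruction adder as ordinary arithmetic; `compress_pairs`, `read_position` and `pair_cost_bits` are hypothetical names chosen for illustration.

```python
def compress_pairs(sorted_words):
    """Keep every other word in full and store its successor as a delta."""
    stored = []
    for i in range(0, len(sorted_words) - 1, 2):
        full = sorted_words[i]
        delta = sorted_words[i + 1] - full  # non-negative because input is sorted
        stored.append((full, delta))
    if len(sorted_words) % 2:
        # An unpaired last word is kept in full with no delta.
        stored.append((sorted_words[-1], None))
    return stored


def read_position(stored, position):
    """Regenerate the word at a sorted position from the stored pair."""
    full, delta = stored[position // 2]
    if position % 2 == 0:
        return full          # own address enables the full word only
    return full + delta      # omitted word's address also enables the delta word


def pair_cost_bits(word_bits, delta_bits, or_gate_bits=4):
    """Bits per original pair, counting the two-input OR gate as four bits."""
    return word_bits + delta_bits + or_gate_bits
```

Under these assumptions a pair of 16-bit words with a 5-bit delta costs 16 + 5 + 4 = 25 bits instead of 32.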

The address-line crisscross pattern also requires chip area. By relocating the OR gates beneath the full words, the crisscross connection pattern can pass over the area occupied by the full words using a suitable number of wiring layers that are insulated or separated from one another. Such layers can be fabricated using photolithographic techniques. For the purposes of the invention, it is assumed that an essentially unlimited number of wiring layers can be used, and that insulating layers separate the crisscrossed interconnect lines and/or patterns of crisscrossed interconnect lines from unwanted contact. In effect, stacking layer upon layer of interconnect patterns exploits the vertical dimension to store more data in the same chip area.

Care must be taken in extending the concept of higher-order difference algorithms to random data, because differences of order higher than the first may not exhibit any meaningful regularity. Instead, compression efficiency may be improved by selecting one full word as the base value for a group of words and replacing the other words in the group with their deltas from that base value. For example, a group of four values f0, f1, f2 and f3 may be replaced with f0, d1 = f1 - f0, d2 = f2 - f0 and d3 = f3 - f0. The logical OR of the four addresses must then address the value f0. A four-input OR gate is at most eight transistors. The silicon area occupied by this four-input OR gate is the price of reducing the three full words f1, f2 and f3 to the three delta words d1, d2 and d3. The four-input OR gate is only twice as complex as the two-input OR gate, but the memory reduction is now roughly three times as great. The OR-gate overhead has therefore decreased relative to the savings achieved. Whenever two data values to be stored are identical, and therefore appear next to each other in the value-sorted order, they are replaced with a single value and no delta need be stored. No address line corresponding to the zero delta is then needed, and the OR of the two address lines onto the single stored value can be absorbed into the last stage of address decoding by using an AND-OR gate, as shown in FIG. 12.
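The base-plus-delta grouping described above (f0, d1 = f1 - f0, d2 = f2 - f0, d3 = f3 - f0) can be sketched as follows. The functions `compress_groups` and `expand_groups` are hypothetical illustrations that assume value-sorted input and a group size of four; they model only the stored values, not the OR gates or address wiring.

```python
def compress_groups(sorted_words, group_size=4):
    """Replace each group with one base word and deltas from that base."""
    groups = []
    for start in range(0, len(sorted_words), group_size):
        chunk = sorted_words[start:start + group_size]
        base = chunk[0]                           # f0, kept as a full word
        deltas = [w - base for w in chunk[1:]]    # d1, d2, d3
        groups.append((base, deltas))
    return groups


def expand_groups(groups):
    """Reconstruct the sorted words by adding each delta back to its base."""
    words = []
    for base, deltas in groups:
        words.append(base)
        words.extend(base + d for d in deltas)
    return words
```

Because the input is sorted, every delta is non-negative and no larger than the spread of its group, which is what allows the delta words to be shorter than the full words.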

FIG. 12A shows two decoders of the FIG. 9 type, each decoding, for example, eight address bits to one of 256 address lines. Because the FIG. 9 decoder has open-drain outputs, pull-up transistors (or resistors) must be added to reset the outputs to a known state between memory reads. The 256 lines (A0...A255) of one decoder then cross over the 256 lines (B0...B255) of the other decoder, forming 65536 crossing points. When an individual one of the 65536 lines must be decoded, a two-input AND gate is placed at the junction as shown in FIG. 12B, giving an address decoder complexity of four transistors per output line, since the number of transistors in the 8-to-256 line decoders is negligible by comparison. When only the OR of two addresses is needed, however, FIG. 12C shows how the eight transistors of two AND gates can be reconnected to form an AND-OR without using additional transistors.

Some typical DSP programs used in mobile devices were analyzed to estimate the savings possible using the above techniques. A first program comprising 8464 16-bit instructions was found to contain only 911 different 16-bit values. Clearly, many values are repeated many times, and the above technique of storing each only once and using AND-OR address decoding can reduce chip area by a factor of approximately 9 without needing to use delta values. A second program of 2296 words was found to contain only 567 different 16-bit values, again allowing a substantial compression ratio. The combined programs compress from 10760 words to 1397 words. The address decoder of a conventional memory already requires the four transistors per address that are used in AND-OR address decoding, so complexity is not increased. A more accurate estimate of the size reduction adds the four transistors per address of the address decoder to the 16 transistors per word of the memory array, giving a total of 20 transistors per word for a conventional memory; this reduces to 4 + 16/9 transistors per word, which represents a reduction of approximately 3 times.
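The estimate above can be reproduced with a short calculation. The sketch below is an illustration only: `dedup` and `transistors_per_word` are hypothetical helper names, and the model simply assumes four decoder transistors per address plus one transistor per stored bit, as in the paragraph above.

```python
def dedup(words):
    """Map each distinct value to the addresses that must be ORed onto it."""
    table = {}
    for addr, word in enumerate(words):
        table.setdefault(word, []).append(addr)
    return table


def transistors_per_word(total_words, distinct_words,
                         word_bits=16, decoder_per_address=4):
    """Return (conventional, compressed) transistor counts per addressed word."""
    conventional = decoder_per_address + word_bits
    # Only the distinct values are stored, so the array cost is shared.
    compressed = decoder_per_address + word_bits * distinct_words / total_words
    return conventional, compressed


conventional, compressed = transistors_per_word(8464, 911)
reduction = conventional / compressed
```

Using the exact counts for the first program (8464 words, 911 distinct values) rather than the rounded factor of 9 gives a reduction between 3 and 4, consistent with the "approximately 3 times" stated above.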

When the number of words to be stored exceeds the number of possible values, for example when storing 1 megaword of 16-bit values, it is clear that no more than 65536 values need be stored, using AND-OR address decoding to combine all the addresses at which the same word is to be stored so as to enable that particular 16-bit word. This type of memory benefits from the use of more overlapping interconnect layers to handle the complex interconnect pattern of the AND-OR addressing. The technique is thus a way of using the vertical dimension on a chip (more interconnect layers) to realize more efficient memory.

In the latter case, the 65536 possible words do not even need to be stored. Once the appropriate address lines have been AND-ORed onto the one of 65536 lines that is required to cause the output of a value from 0 to 65535, a device called a priority encoder can be used as a 65536:16 line encoder, which is the inverse function of a 16:65536 line address decoder. A priority encoder can be built with at most six transistors per input, which is fewer than the 16 bits per word that would otherwise be required. A priority encoder is shown in FIG. 13.

A set of N inputs numbered 0 to N-1 is divided into a first half (0 to N/2-1) and a second half (N/2 to N-1). Pairs of corresponding lines are selected, one from the first half and one from the second half (for example, lines 0 and N/2 as shown), and ORed together. The NOR gate gives a logic '0' output if either of its inputs is 1. The N-type transistor conducts if the lower input is 1 and connects the upper input, which must then be 0, to output B. The P-type transistor conducts if either input is 1, and connects the upper input to output B when the upper input is 1. Output B therefore has the polarity of the upper input if either input is 1; otherwise output B is open circuit. All outputs B are paralleled to form a wired OR, giving a 1 output if the active input line lies in the upper half of the inputs and a '0' output otherwise. This is the most significant bit of the desired encoding. The process is then repeated on the N/2 outputs A0 to A(N/2-1) to produce the second most significant bit of the desired encoding, and so on. The number of stages needed to fully encode the input lines is therefore N/2 + N/4 + N/8 + ... = N-1, each stage comprising six transistors (a four-transistor NOR gate plus a P-type and an N-type transistor).
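The halving procedure just described can be checked with a behavioral sketch. It models only the logic, not the NOR gate and pass transistors of FIG. 13, and it assumes that the "upper half" is the higher-numbered half of the lines and that exactly one input line is active. The name `priority_encode` is hypothetical.

```python
def priority_encode(lines):
    """Encode a one-hot list of 2**k lines into k bits, most significant first.

    Returns the bits together with the number of pair stages used.
    """
    bits = []
    pair_stages = 0
    while len(lines) > 1:
        half = len(lines) // 2
        lower, upper = lines[:half], lines[half:]
        # Wired OR of all B outputs: 1 when the active line is in the upper half.
        bits.append(1 if any(upper) else 0)
        # Outputs A: the OR of each pair of corresponding lines feeds the next level.
        lines = [a | b for a, b in zip(lower, upper)]
        pair_stages += half  # one six-transistor stage per pair of lines
    return bits, pair_stages
```

For N = 16 inputs the sketch uses 8 + 4 + 2 + 1 = 15 = N-1 stages, matching the count given above.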

The combination of an address decoder and its inverse, a priority encoder, merely reproduces the input, as shown in FIG. 14. If the lines are crisscrossed, however, an arbitrary 1:1 mapping is created (also known as a substitution box or S-box, or as information-lossless coding). If some addresses are AND-ORed together, implying that some output values are missing, a many:one mapping, or information-lossy coding, is created. A read-only memory whose output word length equals the address word length can thus be created in this way. The information is evidently stored in the crisscross or permutation pattern of the address lines, which is the only difference between FIGS. 14A and 14B. Using the estimated four transistors per line in the address decoder and six in the priority encoder shows a saving for memory sizes above 1024 10-bit words. Using multiple interconnect layers, however, several different interconnect patterns can be created, all sharing the same priority encoder and address decoder, with a means provided for selecting the desired interconnect pattern. A series pass switch per line, consisting of a P-type and an N-type transistor in parallel as shown in FIG. 13, can be used for this purpose. M interconnect patterns can thus be selected, and a memory of M·2^N N-bit words can be created using 2 + 10/M transistors per word, which shows that efficiency increases as the number of layers increases. Single-transistor pass switches could also be used, for example by generating a supply of -V_t (for P-type switches) or V_cc + V_t (for N-type switches) to ensure that the switch conducts for both logic 1 and logic 0 levels.
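The decoder, crisscross wiring and encoder chain can be modeled as a lookup in which the only stored information is the wiring pattern. The sketch below is a behavioral illustration; `make_rom` and `transistors_per_word` are hypothetical names, and the cost formula simply restates the 2 + 10/M figure from the text (two pass-switch transistors per word plus the shared four decoder and six encoder transistors).

```python
def make_rom(mapping, n_bits):
    """Build a read function; mapping[i] is the encoder input wired to decoder line i."""
    lines = 2 ** n_bits

    def read(address):
        decoded = [int(i == address) for i in range(lines)]   # address decoder
        encoder_inputs = [0] * lines
        for line, active in enumerate(decoded):
            if active:
                encoder_inputs[mapping[line]] = 1              # crisscross wiring
        return encoder_inputs.index(1)                         # one-hot to binary

    return read


def transistors_per_word(m_patterns, decoder=4, encoder=6, pass_switch=2):
    """Transistors per stored word when M patterns share one decoder and encoder."""
    return pass_switch + (decoder + encoder) / m_patterns
```

A permutation such as `[2, 0, 3, 1]` gives a 1:1 (S-box) mapping, while a wiring such as `[1, 1, 3, 3]` gives a many:one mapping in which some output values never occur. With five selectable patterns the cost in this model falls to 4 transistors per word.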

In summary, it has been shown that a table of stored data representing a monotonic function can be compressed by storing only one base value for each group of numerically adjacent values, together with the deltas from the base value for the other values in the group. Alternatively, a group of values can be transformed using a Walsh transform employing a modified butterfly operation. In the limiting case of the latter, an arbitrary monotonic function can be synthesized with an array of adders having limited inputs, since the memory table disappears. An alternative to storing deltas was disclosed, which comprises storing the LSB part of the precomputed result of adding the delta, plus an extra bit indicating whether a carry into the MS part is required, thereby eliminating most of the reconstruction adder. The technique was extended to piecewise monotonic functions, where the function is monotonic over each compression group. The technique was then extended to random data, such as a computer program, by considering the data first to be sorted into numerical order by permuting the address lines.

When a delta is zero, it need not be stored, nor need an address line be generated to address it, which simplifies the OR of the addresses required for the base word. Analysis of a typical DSP program showed that most of the compression could be achieved using this technique alone. Furthermore, all possible output values can be generated, without storing them, by a priority encoder using six transistors per address line, which is more efficient when the number of possible output values is greater than 64. The information then resides in the interconnect pattern that connects the address decoder to the priority encoder. Finally, it was shown that a memory can store several sets of random data by exploiting the vertical dimension to construct several such interconnect patterns and selecting the desired pattern using pass switches, ultimately reducing the number of transistors used per word to one or two.

Of course, the invention may be carried out in ways other than those specifically set forth herein without departing from its essential characteristics. The present embodiments are to be considered in all respects as illustrative and not restrictive, and all changes coming within the meaning and equivalency range of the appended claims are intended to be embraced therein.

Claims

1. A method of compressing data for storage in a memory, comprising:

forming an ordered set of lookup table values by rearranging the lookup table values in monotonic order;

pairing adjacent values in the ordered set to form a plurality of non-overlapping pairs of values;

generating, for one or more of the paired values, a difference between the values in the pair;

reducing the number of bits needed to store the lookup table values by replacing one value in at least one pair of values with the corresponding difference to produce a compressed ordered set; and

storing the compressed ordered set in the memory.

2. The method of claim 1, wherein modifying the set further comprises retaining the other one of the values in the pair.

3. The method of claim 2, further comprising repeating the generating and replacing steps before the storing step.

4. The method of claim 3, wherein at least one of the pairs differs between iterations.

5. The method of claim 1, wherein the compressed ordered set comprises at least one of the lookup table values.

6. The method of claim 1, wherein the generating step comprises generating a difference and a sum of the values in the pair, and wherein the replacing step comprises:

replacing one of the values in the pair with a value based on the difference; and

replacing the other one of the values in the pair with a value based on the sum.

7. The method of claim 6, further comprising discarding the least significant portion of the sum.

8. The method of claim 6, further comprising repeating the generating and replacing steps before the storing step.

9. The method of claim 8, wherein the pairs of values differ between iterations.

10. The method of claim 6, further comprising generating combined values from a final iteration by concatenating each sum and difference corresponding to the paired values of the final iteration,

wherein the replacing comprises replacing the values in the paired values of the final iteration with the corresponding combined values.

11. The method of claim 6, further comprising:

repeating the generating and replacing steps until one of the sums comprises the sum of all the lookup table values rearranged in monotonic order; and

generating a combined value by combining the sums and differences from corresponding iterations,

wherein the replacing comprises replacing the values in the paired values from a final iteration with the combined value.

12. The method of claim 6, further comprising obtaining one or more synthesized values based on a combination of the sum and the difference generated for one or more of the paired values,

wherein replacing one of the values in the pair with the value based on the difference comprises replacing one of the values in the pair with the least significant portion of the synthesized value.

13. The method of claim 1, further comprising reconstructing one or more lookup table values, according to a predetermined reconstruction formula, from one or more values in the compressed ordered set stored in the memory.

14. The method of claim 1, wherein the lookup table values comprise values derived from a monotonic function.

15. The method of claim 14, further comprising applying each of the forming, generating, replacing and storing steps to separate monotonic segments of a piecewise monotonic function.

16. The method of claim 1, wherein the lookup table values comprise non-monotonic values arranged in monotonic order.

17. The method of claim 16, further comprising placing the values in the compressed ordered set in a first layer of the memory, and accessing one or more of the values in the compressed ordered set via one or more crisscrossed address lines disposed in a second layer of the memory.

18. A method of compressing data for storage in a memory, comprising:

forming an ordered set of lookup table values by rearranging the lookup table values in monotonic order;

generating, for one or more paired values, a difference and a sum of the values in the pair;

reducing the number of bits needed to store the lookup table values by replacing the values in at least one pair of values with the corresponding sum and difference to produce a compressed ordered set; and

storing the compressed ordered set in the memory.

19. The method of claim 18, further comprising generating combined values by combining the sum and the difference for each of the paired values,

wherein the replacing comprises replacing each sum and difference with the corresponding combined value.

20. The method of claim 18, further comprising repeating the generating and replacing steps before the storing step.

21. The method of claim 20, further comprising generating combined values from the last iteration by combining the sums and differences corresponding to the pairs of the last iteration,

22. The method of claim 18, further comprising rounding each sum to the nearest unit to form one or more modified sums,

wherein the replacing comprises replacing the values in at least one pair of values with the corresponding difference and modified sum.

23. The method of claim 22, further comprising:

repeating the generating, rounding and replacing steps before the storing step; and

generating combined values from the last iteration by combining the sums and differences corresponding to the pairs of the last iteration,

24. The method of claim 18, further comprising generating a synthesized value for one or more of the paired values by combining the difference with the sum,

wherein the replacing step comprises replacing the values in at least one pair of values with values based on the sum and on the least significant portion of the synthesized value.

25. The method of claim 18, wherein the lookup table values comprise values derived from a monotonic function.

26. The method of claim 25, further comprising applying the forming, generating and replacing steps to separate monotonic segments of a piecewise monotonic function.

27. The method of claim 18, wherein the lookup table values comprise non-monotonic values arranged in monotonic order.

28. The method of claim 27, further comprising placing the values in the compressed ordered set in a first layer of the memory, and accessing one or more of the values in the compressed ordered set via one or more crisscrossed address lines disposed in a second layer of the memory.

29. The method of claim 18, further comprising discarding the least significant portion of the sum before the replacing.

30. A method of compressing data for storage in a memory, comprising:

forming an ordered set of lookup table values by rearranging the lookup table values in monotonic order, the ordered set comprising non-overlapping pairs of values (f0, f1) and (f2, f3);

generating d0 by taking the difference between f0 and f1, and generating d2 by taking the difference between f2 and f3;

reducing the number of bits needed to store the lookup table values by replacing f1 with d0 and replacing f3 with d2 to produce a compressed ordered set; and

storing the compressed ordered set in the memory based on the set as modified by the replacing.

31. The method of claim 30, further comprising, before the storing step:

pairing values in the compressed ordered set to generate paired values (f0, f2) and (d0, d2);

generating d1 by taking the difference between f0 and f2, and generating d3 by taking the difference between d0 and d2; and

replacing f2 with d1 and replacing d2 with d3.

32. The method of claim 30, further comprising generating s0 by summing f0 and f1, and generating s2 by summing f2 and f3.

33. The method of claim 32, wherein the replacing comprises replacing f0 with s0, replacing f1 with d0, replacing f2 with s2, and replacing f3 with d2.

34. The method of claim 33, further comprising:

generating a first combined value by combining s0 and d0; and

generating a second combined value by combining s2 and d2,

wherein the replacing step further comprises replacing s0 and d0 with the first combined value, and replacing s2 and d2 with the second combined value.

35. The method of claim 33, further comprising generating c0 by synthesizing s0 with d0, and generating c2 by synthesizing s2 with d2,

wherein the replacing further comprises replacing f0 with s0, replacing f2 with s2, replacing d0 with the least significant portion of c0, and replacing d2 with the least significant portion of c2.

36. The method of claim 33, further comprising:

pairing values in the modified set to generate paired values (s0, s2) and (d0, d2);

generating s1 by summing s0 and s2, and generating s3 by summing d0 and d2; and

generating d1 by taking the difference between s0 and s2, and generating d3 by taking the difference between d0 and d2,

wherein the replacing step further comprises replacing s0 with s1, replacing s2 with d1, replacing d0 with s3, and replacing d2 with d3.

37. The method of claim 36, further comprising generating a combined value by combining s1, d1, s3 and d3,

wherein the replacing step further comprises replacing s1, d1, s3 and d3 with the combined value.

38-49. (Deleted)

50. A memory comprising:

compressed data representing original data compressed using the method of any one of claims 1 to 12 or 14 to 37; and

reconstruction logic to reconstruct the original data from the compressed data.

51. The memory of claim 50, wherein the memory is a read-only memory (ROM).

52. An electronic device comprising a memory, the memory comprising:

reconstruction logic to reconstruct the original data from the compressed data.

53. The electronic device of claim 52, wherein the memory is a read-only memory (ROM).

54. The electronic device of claim 52, wherein the electronic device is an electronic communication device.