KR0149998B1

KR0149998B1 - Real time discrete cosine transform

Info

Publication number: KR0149998B1
Application number: KR1019900021516A
Authority: KR
Inventors: 최도형
Original assignee: 정용문; 삼성전자주식회사
Priority date: 1990-12-22
Filing date: 1990-12-22
Publication date: 1998-10-15
Also published as: KR920013155A

Abstract

The invention relates to a circuit that performs the discrete cosine transform in real time. In implementing a one-dimensional FDCT processor that executes Lee's fast algorithm to reduce data redundancy in a digital image data processing system, the conventional two-memory scheme is abandoned and only one memory is used: the result of operating on data read from a given address is written back to that same address. This halves the number of memories needed to process data in the same amount of time, which reduces both the hardware count and the cost.

Description

Discrete cosine transform circuit for real-time processing

FIG. 1 shows Lee's algorithm.

FIG. 2 is Chen's block diagram.

FIG. 3 is a block diagram of the conventional circuit.

FIG. 4 is an operation timing diagram of the conventional circuit.

FIG. 5 is a block diagram of the present invention.

FIG. 6 is an operation timing diagram of the present invention.

The present invention relates to a discrete cosine transform circuit that reduces data redundancy in real time in a digital image data processing system. In particular, it relates to a discrete cosine transform circuit that, in implementing a one-dimensional FDCT processor executing Lee's algorithm, halves the number of memories used to process data in the same amount of time.

In general, when image data is encoded using a transform, the KLT (Karhunen-Loeve Transform) is optimal in terms of data reduction performance. However, the KLT is based on the covariance function of the image. When the covariance matrix differs from region to region, as in TV images, the covariance matrix must be computed every time, so a hardware or software implementation is practically impossible.

In contrast, the discrete cosine transform (hereinafter DCT), whose performance is closest to that of the KLT, can be implemented in software or hardware and has therefore been widely used for transform-based data reduction. Fast DCT algorithms include those of Chen and of Lee. Chen's fast algorithm greatly reduced the amount of computation and was suitable for hardware implementation. Lee's fast algorithm requires more additions than Chen's, but it reduces the number of multiplications, which are the more costly operation. This simplifies the computation and makes a real-time hardware implementation easier still.
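As an illustration of the trade-off described above, the following Python sketch implements a recursive forward DCT using the decomposition generally attributed to Lee and checks it against the textbook definition. The unnormalized scaling, the function names `dct_direct` and `dct_lee`, the sample values, and the `counter` argument are assumptions made for this example and do not come from the patent. The factor `1 / (2 cos(...))` stands in for a coefficient that a hardware design would read from a ROM.

```python
import math


def dct_direct(x):
    """Unnormalized DCT-II computed straight from the definition."""
    n = len(x)
    return [
        sum(x[i] * math.cos(math.pi * (2 * i + 1) * k / (2 * n)) for i in range(n))
        for k in range(n)
    ]


def dct_lee(x, counter=None):
    """Recursive forward DCT in the style of Lee's decomposition.

    len(x) must be a power of two. `counter` is a hypothetical one-element
    list used only to tally multiplications.
    """
    n = len(x)
    if n == 1:
        return list(x)
    half = n // 2
    # Butterfly: sums feed the even outputs, scaled differences the odd ones.
    sums = [x[i] + x[n - 1 - i] for i in range(half)]
    diffs = []
    for i in range(half):
        coef = 1.0 / (2.0 * math.cos(math.pi * (2 * i + 1) / (2 * n)))
        diffs.append((x[i] - x[n - 1 - i]) * coef)
        if counter is not None:
            counter[0] += 1
    even = dct_lee(sums, counter)
    odd = dct_lee(diffs, counter)
    out = [0.0] * n
    for k in range(half):
        out[2 * k] = even[k]
        # Each odd output adds two neighbouring terms; the last has no neighbour.
        out[2 * k + 1] = odd[k] + (odd[k + 1] if k + 1 < half else 0.0)
    return out


samples = [8.0, 16.0, 24.0, 32.0, 40.0, 48.0, 56.0, 64.0]
mults = [0]
fast = dct_lee(samples, mults)
slow = dct_direct(samples)
assert all(abs(a - b) < 1e-9 for a, b in zip(fast, slow))
print(mults[0])  # multiplications used by the recursive version
```

For eight inputs the recursive version performs 12 multiplications, compared with the 64 products in the direct double loop.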

FIGS. 1 and 2 show Lee's algorithm for eight inputs as FDCT (Forward DCT) and IDCT (Inverse DCT) signal flow graphs, respectively, and FIG. 3 shows a hardware implementation of it.

In FIG. 3, data is read from the first RAM (10), subjected to addition, subtraction, multiplication, and so on, and then stored in the second RAM (20). In the next stage, data is read from the second RAM (20), operated on in the same way, and the result is stored in the first RAM (10). Repeating this operation several times achieves the desired reduction of the image data.

That is, consider the timing relationships shown in FIG. 4 when, in FIG. 3, the A-port address of the first RAM (10) is 0 and the B-port address is 4. The write/read control signal (W/R1) supplied to the first RAM (10) is high, as shown in (4b), and a read operation is performed. The data stored at addresses 0 and 4, read through ports A and B, are summed in the adder (30), and the sum is then delivered by the first latch unit (50), on the first clock (CL1) shown in (4c), to the input port of the first RAM (10).

At this time the first output enable signal (OE1) is low, as shown in (4d).

Also at this moment the B-port address of the second RAM (20) is 0. At the instant the second write/read control signal (W/R2) changes from low to high, as shown in (4i), the sum produced by the adder (30) is written through the first latch unit (50) to the designated address 0.

In other words, the latch operation takes place over an interval in which two of the DCT clocks shown in (4a) occur.

Meanwhile, at the moment the sum from the adder (30) is latched into the first latch unit (50) by the first clock (CL1), the subtractor (40) subtracts the data read from addresses 0 and 4 of the first RAM (10). The subtraction result is selected in the multiplexer (70) according to the state of the selection signal (G1), is input to the multiplier (90) by the signals (X1, Y1) shown in (4e), and is then multiplied by the data from the ROM (80) under the signal (M) shown in (4f).

The product reaches the input port of the second RAM (20) through the second latch unit (100) at the instant the second clock signal (CL2) changes from low to high, as shown in (4g). Because the input-port address of the second RAM (20) is 4 at that time, the product is written to address 4.

The data that follow are processed sequentially in the same way and written to the required addresses.
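The conventional two-RAM sequence described above can be sketched in Python as follows. This is a minimal model, not the circuit itself: the function name `butterfly_two_ram`, the data values, and the ROM contents are placeholders, and pairing address `i` with `i + 4` for every butterfly is an assumption extrapolated from the single (0, 4) example in the text.

```python
def butterfly_two_ram(src, dst, addr_a, addr_b, coef):
    """One step of the conventional scheme: read one RAM, write the other."""
    a, b = src[addr_a], src[addr_b]   # read through ports A and B
    dst[addr_a] = a + b               # adder output, passed on by the first latch
    dst[addr_b] = (a - b) * coef      # subtractor, multiplier, second latch


ram1 = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0]
ram2 = [0.0] * 8
coefs = [0.5, 0.25, 2.0, 4.0]         # placeholder ROM contents, not DCT values

# This stage reads RAM 1 and writes RAM 2; the next stage would swap the roles.
for i in range(4):
    butterfly_two_ram(ram1, ram2, i, i + 4, coefs[i])

print(ram2)
```

Note that each result lands at the same address its operand was read from, only in the other RAM. That observation is what makes the second RAM a candidate for removal.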

However, when this method is applied to Lee's fast algorithm, the first RAM (10) and the second RAM (20) perform the same operation.

The conventional scheme therefore has the disadvantage of an unnecessary increase in hardware, and when the design is implemented as a custom IC, the additional external memory reduces efficiency.

Accordingly, an object of the present invention is to provide a discrete cosine transform circuit that, in implementing a one-dimensional FDCT processor executing Lee's algorithm, halves the number of memories used to process data in the same amount of time.

The present invention is described below with reference to the accompanying drawings.

FIG. 5 is a block diagram of the one-dimensional FDCT according to the present invention. It consists of: a RAM (10) that has two independent output ports through which two data items can be read simultaneously and that serves as a buffer for temporarily holding data, handling the interfacing between stages; an adder (30) that adds the data read from the RAM (10); a subtractor (40) that subtracts the data read from the RAM (10); a first latch unit (50) that latches the output of the adder (30) and supplies it to the RAM (10); a first multiplexer (70) that selectively outputs the subtraction result of the subtractor (40); a ROM (80) that generates predetermined coefficients; a multiplier (90) that multiplies the outputs of the first multiplexer (70) and the ROM (80); a second latch (100) that latches the product and delivers it to the RAM (10); and a second multiplexer (60) that selectively outputs the data from the two ports of the RAM (10).

FIG. 6 is a timing diagram of the FDCT processor according to the present invention: (6a) is the DCT clock waveform, (6b) is the RAM write/read control signal (W/R1) waveform, (6c) is the first clock signal (CL1) waveform, (6d) is the first output enable signal (OE1) waveform, (6e) is the multiplier input latch signal (X1, Y1) waveform, (6f) is the multiplier's multiply signal (M) waveform, (6g) is the second clock signal (CL2) waveform, and (6h) is the second output enable signal (OE2) waveform.

The present invention is now described in detail on the basis of the above configuration. The A-port and B-port addresses of the RAM (10) are set to 0 and 4, respectively, and the contents of those addresses are read out and input to the adder (30) and the subtractor (40). Because the processing speed is sufficient, the result from the adder (30) is latched at the rising edge of the first clock signal (CL1) after one DCT clock, shown in (6a), has occurred. At the same time, the subtraction result from the subtractor (40) is latched into the multiplier (90) by the signals (X1, Y1).

At the next instant the B-port address of the RAM (10) is set to 0, and the write/read control signal (W/R1) of the RAM (10) changes from low to high, as shown in (6b). At this time the data latched by the first clock signal (CL1), as described above, is written to address 0 of the RAM (10).

Meanwhile, the multiplier (90) multiplies the subtraction result by the coefficient output from the ROM (80), and the product is delivered to the input port of the RAM (10) by the second clock (CL2) shown in (6g). At this moment the B-port address of the RAM (10) is set to 4, and at the instant the write/read control signal (W/R1) changes from low to high, the product is written to the designated address 4. The data that follow are processed by repeating the same procedure.
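The single-RAM sequence above works because both operands are read and held in latches before either address is overwritten. The Python sketch below models that ordering. It is a toy model under stated assumptions: the class name `SingleRamFdct`, the data, the ROM contents, and the pairing of address `i` with `i + 4` for every butterfly are hypothetical, and clock-level timing is reduced to statement order.

```python
class SingleRamFdct:
    """Toy model of the single-RAM datapath; all names here are hypothetical."""

    def __init__(self, ram, rom):
        self.ram = ram        # RAM (10): two read ports, written back in place
        self.rom = rom        # ROM (80): coefficients
        self.latch1 = None    # first latch (50), clocked by CL1
        self.latch2 = None    # second latch (100), clocked by CL2

    def butterfly(self, addr_a, addr_b, rom_addr):
        # Read phase: both operands leave the RAM before anything is written.
        a = self.ram[addr_a]
        b = self.ram[addr_b]
        # CL1 holds the adder output; X1/Y1 hold the subtractor output.
        self.latch1 = a + b
        diff = a - b
        # First write: the sum returns to the address operand a came from.
        self.ram[addr_a] = self.latch1
        # CL2 holds the product, which is then written to operand b's address.
        self.latch2 = diff * self.rom[rom_addr]
        self.ram[addr_b] = self.latch2


data = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0]
rom = [0.5, 0.25, 2.0, 4.0]   # placeholder values, not real DCT coefficients

# Reference result computed out of place, as a two-RAM design would produce it.
expected = data[:]
for i in range(4):
    expected[i] = data[i] + data[i + 4]
    expected[i + 4] = (data[i] - data[i + 4]) * rom[i]

unit = SingleRamFdct(data[:], rom)
for i in range(4):
    unit.butterfly(i, i + 4, i)

assert unit.ram == expected
words_single, words_double = len(unit.ram), 2 * len(data)
print(words_single, words_double)  # storage words: one RAM versus two
```

Because each butterfly touches only its own pair of addresses, writing in place gives the same contents as the out-of-place reference while using half the storage.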

As described above, halving the number of RAMs used reduces both hardware and cost, which has the advantage of improving efficiency when the design is implemented as a custom IC.

Claims

A real-time discrete cosine transform circuit in a digital image processing system, comprising: a RAM (10) that has two independent output ports through which two data items can be read simultaneously and that serves as a buffer for temporarily holding data, handling the interfacing between stages; an adder (30) that adds the data read from the RAM (10); a subtractor (40) that subtracts the data read from the RAM (10); a first latch unit (50) that latches the output of the adder (30) and supplies it to the RAM (10); a first multiplexer (70) that selectively outputs the subtraction result of the subtractor (40); a ROM (80) that generates predetermined coefficients; a multiplier (90) that multiplies the outputs of the first multiplexer (70) and the ROM (80); a second latch (100) that latches the product and delivers it to the RAM (10); and a second multiplexer (60) that selectively outputs the data from the two ports of the RAM (10).