KR19980018215A - Video data processing method and device

Info

Publication number: KR19980018215A
Application number: KR1019970034995A
Authority: KR
Inventors: 리더 클리프; 손 재철; 쿼레시 암자드; 누옌 르
Original assignee: 윤종용; 삼성전자 주식회사
Priority date: 1996-08-19
Filing date: 1997-07-25
Publication date: 1998-06-05
Also published as: CN1189058A; DE19735880A1; TW436710B; JP4290775B2; JPH1093961A; CN1523895A; CN1145362C; KR100262453B1

Abstract

The computer system includes three processors that can operate concurrently: a scalar processor, a vector processor, and a bitstream processor. In encoding or decoding video data, the vector processor performs operations that a single-instruction multiple-data (SIMD) processor can carry out efficiently, for example the discrete cosine transform (DCT) and motion compensation.

The bitstream processor performs Huffman and RLC encoding or decoding. It can switch contexts so that the computer system can process several data streams concurrently. The scalar processor and the vector processor can be programmed to execute a single arithmetic or Boolean instruction. The bitstream processor cannot be programmed to execute a single arithmetic or Boolean instruction, but it can be programmed to perform an entire video data processing operation.

Description

Video data processing method and device

The present invention relates to data processing by computer, and more particularly to the processing of video data by computer.

Computers have been used to compress and decompress system data. System data includes video data, that is, still and/or moving images. System data may also include audio data, for example the soundtrack of a motion picture. It is desirable to provide methods and circuits that can process video data at high speed.

Accordingly, an object of the present invention is to provide a method and a circuit capable of processing video data at high speed. In some embodiments, a computer system according to the invention includes three processors that can operate concurrently: a scalar processor, a vector processor, and a bitstream processor. In encoding or decoding video data, the vector processor performs the operations that a single-instruction multiple-data (SIMD) processor can carry out efficiently. These include (1) linear data transforms such as the discrete cosine transform (DCT) and (2) motion compensation. The bitstream processor performs operations that act on individual bits rather than on words or half-words, for example the Huffman and RLC encoding and decoding used in MPEG-1, MPEG-2, H.261, and H.263. The scalar processor performs high-level video processing (for example, picture-level processing), synchronizes the operation of the vector and bitstream processors, and controls the interface with external devices.

In some embodiments, the computer system can process several data streams concurrently. As a result, a user of the computer system can hold a video conference with two or more parties. Multiple data streams can be processed concurrently because the bitstream processor can switch contexts, so that different bitstreams are encoded or decoded at the same time in real time.

In some embodiments, the scalar and vector processors are each programmable in the sense that the processor can be programmed to execute a single arithmetic or Boolean instruction. The bitstream processor is not programmable in that sense: it cannot be programmed to execute a single arithmetic or Boolean instruction. Rather, the bitstream processor can be programmed to perform an entire video data processing operation on a set of video data. Because the bitstream processor is not programmed at the level of single arithmetic or Boolean instructions, it can operate at high speed. Because the scalar and vector processors are programmable, the system is easy to adapt to changes in video data encoding and decoding standards.

FIG. 1 is a block diagram of a media card according to the present invention.

FIG. 2 is a block diagram of a multimedia processor according to the present invention.

FIG. 3 is a block diagram of a bitstream processor that is part of the processor of FIG. 2.

FIGS. 4 to 6 are block diagrams of computer systems according to the present invention.

FIG. 7 shows the firmware structure of the processor of FIG. 2.

FIGS. 8 and 9 show address maps for the system of FIG. 1.

FIG. 10 is a block diagram of the DSP core of the processor of FIG. 2.

FIG. 11 shows the pipeline used in a vector processor that is part of the processor of FIG. 2.

FIG. 12 is a functional block diagram of the vector processor of FIG. 11.

FIG. 13 shows the execution data path in the vector processor of FIG. 11.

FIG. 14 shows the load and store data paths in the vector processor of FIG. 11.

FIG. 15 is a block diagram of the cache system of the processor of FIG. 2.

FIG. 16 shows the instruction data cache in the cache system of FIG. 15.

FIG. 17 shows the data path pipeline of the cache control unit in the processor of FIG. 2.

FIG. 18 shows the data path for the address processing pipeline of the cache control unit in the system of FIG. 2.

FIGS. 19 to 22 show state machines in the processor of FIG. 2.

FIG. 23 shows the address format used in the cache system of FIG. 15.

FIG. 24 shows a bus in the processor of FIG. 2.

FIG. 25 shows an arbitration control unit in the processor of FIG. 2.

FIGS. 26 to 29 are timing diagrams for the processor of FIG. 2.

FIGS. 30 to 32 show memory request signals in the processor of FIG. 2.

FIG. 33 shows a bus arbitration control unit in the processor of FIG. 2.

FIGS. 34 to 36 are timing diagrams for the processor of FIG. 2.

FIGS. 37 and 38 show bus interface circuits in the processor of FIG. 2.

FIGS. 39 and 40 show a virtual frame buffer (VFB) for the system of FIG. 1.

FIG. 41 shows a bus interface circuit for the system of FIG. 1.

FIGS. 42 and 43 show a memory controller for the system of FIG. 2.

FIG. 44 shows an address controller for the system of FIG. 2.

FIGS. 45 and 46 show formats used in the system of FIG. 1.

FIG. 47 shows a state machine in the system of FIG. 1.

FIG. 48 is a block diagram of a data controller for the system of FIG. 1.

FIGS. 49 to 51 are timing diagrams for the system of FIG. 1.

FIGS. 52 and 53 show device interface circuits in the processor of FIG. 2.

FIGS. 54 to 56 are block diagrams of parts of the system of FIG. 1.

FIGS. 57 to 59 show registers in the system of FIG. 1.

FIG. 60 shows a frame buffer and a video window in the system of FIG. 1.

FIG. 61 is a timing diagram for the system of FIG. 1.

FIG. 62 shows a register in the system of FIG. 1.

FIG. 63 is a timing diagram for the system of FIG. 1.

FIGS. 64 to 66 show buffers used in the system of FIG. 1.

FIG. 1 shows a media card 100 that includes a multimedia processor 110. In one embodiment, multimedia processor 110 is a type MSP-1EX™ processor, whose specifications were produced by Samsung Semiconductor, Inc. of San Jose, California. The MSP-1EX processor is described in Appendix A below.

Processor 110 communicates with a host computer system (not shown) over a local bus 105. In some embodiments, bus 105 is a 32-bit, 33 MHz PCI bus.

Digital video data output from processor 110 is coupled to a D/A (digital-to-analog) converter 112. In addition to a video portion, the digital video data may include an audio portion, for example the soundtrack of a movie. The output of converter 112 can be coupled to a TV set (not shown) or to another system that processes analog data.

In some embodiments, processor 110 includes an input port for receiving digital video data output from an A/D (analog-to-digital) converter (see FIGS. 4 to 6).

Processor 110 is connected to a codec 114. Codec 114 receives analog audio data from a tape recorder (not shown) or another device, and receives analog telephone data from a telephone line (not shown). Codec 114 digitizes the analog data and sends it to processor 110. Codec 114 also receives digital data from processor 110, converts the data to analog form, and transmits the analog data as needed.

Processor 110 is connected to memory 120 by bus 122. In FIG. 1, memory 120 is a synchronous DRAM (SDRAM), and bus 122 is a 64-bit, 89 MHz bus. Other embodiments use other memories, bus widths, and bus speeds. Some embodiments use asynchronous memories and buses.

Some embodiments of card 100 are described in the U.S. patent application entitled "Multiprocessor Operation in a Multimedia Signal Processor" (attorney docket No. M-4364 US), filed by Le Nguyen on the same date as the present application, the entire contents of which are incorporated herein by reference.

FIG. 2 is a block diagram of one embodiment of processor 110. Processor 110 includes a scalar processor 210, a vector processor (VP) 220, and a bitstream processor (BP) 245. In some embodiments, processor 210 is a 32-bit RISC processor that operates at 40 MHz and supports the well-known standard ARM7 instruction set. Vector processor 220 is a single-instruction multiple-data (SIMD) processor that operates at 80 MHz and has 288-bit vector registers. One embodiment of VP 220 is described in the U.S. patent application entitled "Efficient Context Saving and Restoring in a Multitasking Computing System Environment" (attorney docket No. M-4365 US), filed by Song et al. on the same date as the present application, the entire contents of which are incorporated herein by reference. Processors 210 and 220 can be programmed to execute a single arithmetic or Boolean instruction, or a sequence of such instructions.

In some embodiments, so that video data can be processed at high speed, bitstream processor 245 is designed so that it is not programmed to execute a single arithmetic or Boolean instruction. In particular, BP 245 cannot be programmed to execute a single instruction such as ADD, OR, or ADD AND ACCUMULATE. Rather, BP 245 is programmed to perform the video data processing operations described in Chapter 10 of Appendix A. At the same time, scalar processor 210 and vector processor 220 can be programmed to execute a single arithmetic or Boolean instruction. Processor 110 can therefore accommodate changes in video standards.

As shown in FIG. 2, scalar processor 210 and vector processor 220 are connected to a cache subsystem 230. Cache subsystem 230 is connected to a bus 240 (IOBUS) and a bus 250 (FBUS). In some embodiments, IOBUS 240 is a 32-bit, 40 MHz bus, and FBUS 250 is a 64-bit, 80 MHz bus.
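The peak transfer rates implied by these bus widths and clock rates follow from simple arithmetic. The sketch below assumes one transfer per clock cycle, which the text does not state, and the helper name `peak_bandwidth_mb_per_s` is purely illustrative.

```python
def peak_bandwidth_mb_per_s(width_bits, clock_mhz):
    """Peak bandwidth in megabytes per second.

    Assumption (not from the text): one full-width transfer per clock cycle.
    """
    return width_bits / 8 * clock_mhz


# Widths and clock rates as given for IOBUS 240 and FBUS 250.
buses = {
    "IOBUS 240": (32, 40),
    "FBUS 250": (64, 80),
}

for name, (width, mhz) in buses.items():
    print(f"{name}: {peak_bandwidth_mb_per_s(width, mhz):.0f} MB/s")
```

Under that assumption the FBUS offers four times the peak bandwidth of the IOBUS, which is consistent with the memory controller and PCI interface being attached to the FBUS.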

IOBUS 240 is connected to bitstream processor 245, an interrupt controller 248, a full-duplex UART unit 243, and four timers 242. FBUS 250 is connected to a memory controller 258, which is connected to memory bus 122 (see FIG. 1). FBUS 250 is connected to a PCI bus interface circuit 255, which is connected to PCI bus 105. FBUS 250 is also connected to a device interface circuit 252 (also called the Customer ASIC), which includes circuitry that interfaces with video D/A converter 112 (see FIG. 1), codec 114, and, in some cases, a video A/D converter (as shown in FIGS. 4 to 6). Processor 110 also includes a memory data mover 290.

Processor 110 can process several data streams concurrently. For example, when a user of processor 110 holds a video conference with two or more parties, processor 110 performs the video and audio processing that lets the user see and hear all of the parties. To process multiple video data streams, processor 110 supports context switching, which means that BP 245 switches between the data streams. In a video conference, each data stream may come from a separate remote party. Alternatively, an additional data stream may come from a movie channel so that, in addition to the video conference, the user can watch a movie at the same time. Context switching is described in Section 10.12 of Appendix A. When the context is switched, scalar processor 210 saves the current context and initializes BP 245 to process another context.
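The save-and-initialize sequence described above can be illustrated with a small model. This is a minimal sketch, not the mechanism of Section 10.12 of Appendix A: the class names, the `bit_position` field, and the `switch_to` method are all hypothetical, and the real context presumably holds much more state.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional


@dataclass
class BPContext:
    """Hypothetical snapshot of bitstream-processor state for one stream."""
    stream_id: str
    bit_position: int = 0


@dataclass
class BitstreamProcessor:
    context: Optional[BPContext] = None

    def process(self, nbits: int) -> None:
        # Stand-in for real decoding: just advance through the stream.
        self.context.bit_position += nbits


@dataclass
class ScalarProcessor:
    bp: BitstreamProcessor
    saved: Dict[str, BPContext] = field(default_factory=dict)

    def switch_to(self, stream_id: str) -> None:
        # Save the context the BP is currently working on, if any.
        if self.bp.context is not None:
            self.saved[self.bp.context.stream_id] = self.bp.context
        # Restore a previously saved context, or initialize a fresh one.
        self.bp.context = self.saved.pop(stream_id, BPContext(stream_id))


sp = ScalarProcessor(BitstreamProcessor())
sp.switch_to("party_a")
sp.bp.process(100)
sp.switch_to("party_b")
sp.bp.process(40)
sp.switch_to("party_a")   # resumes where party_a left off
sp.bp.process(10)
print(sp.bp.context)
```

Each stream resumes from its own saved position, so the single bitstream processor appears to serve all parties at once.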

BP 245 can process the following video data formats:

1. MPEG-1, described in ISO/IEC standard 11172 (1992);

2. MPEG-2, described in document ISO/IEC JTC 1/SC 29 N 0981 Rev (March 31, 1995);

3. H.261, described in ITU-T Recommendation H.261 (March 1993); and

4. H.263, described in Draft ITU-T Recommendation H.263 (May 2, 1996).

Video data processing is divided among scalar processor 210, vector processor 220, and bitstream processor 245 so that high-speed processing is achieved. More specifically, vector processor 220 performs the linear transform (DCT or inverse DCT) and motion compensation.

These operations suit a vector processor because they call for the same instruction to be performed on several pieces of data. Bitstream processor 245 performs Huffman decoding and encoding and zigzag bitstream processing.

Scalar processor 210 performs video and audio demultiplexing, synchronization, and I/O interfacing tasks.
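Why the DCT maps well onto a SIMD processor can be seen from its matrix form: a 2-D DCT of an 8×8 block is two matrix products, and every output element is the same multiply-accumulate chain applied to different data. The following is a minimal pure-Python sketch of that formulation (orthonormal DCT-II); it is not the MSP firmware, and the function names are illustrative.

```python
import math


def dct_matrix(n=8):
    """Orthonormal DCT-II basis matrix: row k is the k-th cosine basis vector."""
    rows = []
    for k in range(n):
        scale = math.sqrt((1 if k == 0 else 2) / n)
        rows.append([scale * math.cos((2 * i + 1) * k * math.pi / (2 * n))
                     for i in range(n)])
    return rows


def matmul(a, b):
    """Plain matrix product; each output element is one multiply-accumulate chain."""
    return [[sum(a[r][k] * b[k][c] for k in range(len(b)))
             for c in range(len(b[0]))] for r in range(len(a))]


def transpose(m):
    return [list(col) for col in zip(*m)]


def dct2d(block):
    """Forward 2-D DCT: C * X * C^T."""
    c = dct_matrix(len(block))
    return matmul(matmul(c, block), transpose(c))


def idct2d(coeffs):
    """Inverse 2-D DCT: C^T * Y * C (valid because C is orthonormal)."""
    c = dct_matrix(len(coeffs))
    return matmul(matmul(transpose(c), coeffs), c)


flat = [[128] * 8 for _ in range(8)]
coeffs = dct2d(flat)
restored = idct2d(coeffs)
print(round(coeffs[0][0]))  # DC term of a flat block
```

For a flat block all of the energy lands in the DC coefficient, and applying `idct2d` recovers the original samples up to floating-point rounding.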

Examples of encoding and decoding operations are given in Sections 10.6.1 and 10.6.2 of Appendix A. In an encoding operation, uncompressed digital data arrives from frame memory 120 or from the host system (not shown) over bus 105. In some embodiments, device interface circuit 252 includes a video A/D converter, and the uncompressed data arrives from that converter. Vector processor 220 performs quantization, DCT, and motion compensation. Bitstream processor 245 receives the output of VP 220 and generates GOBs (groups of blocks) and slices. In particular, BP 245 performs Huffman and RLC encoding and zigzag bitstream processing. Scalar processor 210 receives the output of BP 245 and performs picture layer coding, GOP (group of pictures) coding, and sequence layer coding. Scalar processor 210 then multiplexes the audio and video data and sends the encoded data over bus 105 or 122 to a storage device or a network. In some embodiments, transmission to the network includes transmission to device interface circuit 252, which is connected to the network.
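The zigzag and run-length steps attributed to BP 245 can be sketched in a few lines. This is a simplified illustration under stated assumptions: it produces (run, level) pairs for the AC coefficients only, leaves the trailing zeros to an implied end-of-block marker, and omits the mapping of pairs to Huffman codewords. The function names are illustrative, not taken from Appendix A.

```python
def zigzag_order(n=8):
    """Coordinates of an n x n block in zigzag scan order.

    Cells are visited one anti-diagonal at a time; odd diagonals run
    top-to-bottom, even diagonals bottom-to-top.
    """
    cells = [(r, c) for r in range(n) for c in range(n)]
    return sorted(cells,
                  key=lambda rc: (rc[0] + rc[1],
                                  rc[0] if (rc[0] + rc[1]) % 2 else -rc[0]))


def run_length_encode(block):
    """Return the DC value and a list of (zero_run, level) pairs for the AC terms."""
    scanned = [block[r][c] for r, c in zigzag_order(len(block))]
    pairs, run = [], 0
    for level in scanned[1:]:
        if level == 0:
            run += 1
        else:
            pairs.append((run, level))
            run = 0
    return scanned[0], pairs  # trailing zeros are implied by end-of-block


block = [[0] * 8 for _ in range(8)]
block[0][0], block[0][1], block[2][0] = 50, 3, -1
dc, pairs = run_length_encode(block)
print(dc, pairs)
```

Because quantized high-frequency coefficients are mostly zero, the zigzag scan groups them at the end of the sequence, where they cost nothing in this representation.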

In decoding, the processing is performed in reverse. Scalar processor 210 demultiplexes the system data into video and audio components and performs sequence layer, GOP, and picture layer decoding of the video data. The resulting GOBs or slices are supplied to bitstream processor 245, which performs zigzag processing and Huffman and RLC decoding. VP 220 receives the output of BP 245 and performs dequantization, IDCT, and motion compensation. VP 220 performs any preprocessing that is needed (for example, to smooth the edges of picture images) and supplies the reconstructed digital pictures to device interface circuit 252 or to a storage device. Scalar processor 210, vector processor 220, and bitstream processor 245 can operate in parallel on different blocks of data.
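The parallel operation of the three processors during decoding can be pictured as a three-stage pipeline. The sketch below assumes a lock-step schedule in which every stage takes one time step per data unit; the text states only that the processors can work in parallel on different blocks, so the timing model and the function name are assumptions made for illustration.

```python
# Decode order as described in the text: scalar processor, then BP, then VP.
STAGES = ["scalar 210", "BP 245", "VP 220"]


def pipeline_schedule(num_units):
    """Return, per time step, which data unit each stage works on (None = idle)."""
    steps = []
    for t in range(num_units + len(STAGES) - 1):
        row = {}
        for depth, stage in enumerate(STAGES):
            unit = t - depth
            row[stage] = unit if 0 <= unit < num_units else None
        steps.append(row)
    return steps


schedule = pipeline_schedule(4)
for t, row in enumerate(schedule):
    print(t, row)
```

Once the pipeline is full, all three processors are busy in the same time step, each on a different unit of data.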

Having scalar processor 210 handle the picture layer and higher layers reduces communication within the processor. This is because the picture layer and higher layers contain information that scalar processor 210 uses for control and I/O functions but that vector processor 220 and bitstream processor 245 do not use. One example of such information is the frame rate, which scalar processor 210 uses when sending frames to device interface circuit 252.

FIG. 3 is a block diagram of one embodiment of bitstream processor 245. The signals shown in FIG. 3 are described in Section 10.5 of Appendix A. These signals provide the interface between bitstream processor 245 and IOBUS 240 (see FIG. 2). Within BP 245, the signals are handled by an IOBUS interface unit 310 that includes SRAM 320. BP 245 also includes a VLC FIFO unit 330, a VLC LUT ROM 340, a control state machine 350, and a BP core unit 360 that includes a register file and SRAM. The blocks of FIG. 3 are described in Section 10.4 of Appendix A.

ROM 340 holds the lookup tables used in Huffman encoding and decoding for four standards: MPEG-1, MPEG-2, H.261, and H.263. Despite the large amount of information stored in the tables, ROM 340 has a small size of 768 × 12 bits.

The small size is achieved by sharing tables and by other techniques described in Section 4 of Appendix A.
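To put the ROM size in perspective, the short sketch below works out the total capacity and shows a toy table-driven prefix-code decoder. The code table is invented for illustration and is not an MPEG or H.26x table; the table-sharing techniques of Appendix A are not reproduced here.

```python
# ROM dimensions as stated in the text.
ROM_WORDS, WORD_BITS = 768, 12
rom_bits = ROM_WORDS * WORD_BITS
print(rom_bits, rom_bits // 8)  # total bits and bytes: 9216 1152

# Toy prefix code (NOT a table from any video standard): codeword -> symbol.
TOY_TABLE = {"0": "A", "10": "B", "110": "C", "111": "D"}


def decode(bits, table=TOY_TABLE):
    """Decode a string of '0'/'1' characters using a prefix-code lookup table."""
    symbols, code = [], ""
    for bit in bits:
        code += bit
        if code in table:       # a complete codeword has been read
            symbols.append(table[code])
            code = ""
    if code:
        raise ValueError("trailing bits do not form a complete codeword")
    return symbols


print(decode("0101100111"))
```

The whole ROM therefore amounts to 1152 bytes, which is why fitting the tables of four standards into it depends on sharing entries between them.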

Although the present invention has been shown and described with reference to certain preferred embodiments, it is not limited to them. Those of ordinary skill in the art will readily appreciate that the invention can be modified and varied in many ways without departing from the spirit and scope of the invention as defined by the following claims. In particular, the invention is not limited by any particular circuit, clock rate, or timing of these embodiments.

As described above, according to the present invention, the bitstream processor can switch contexts so that different bitstreams are encoded or decoded at the same time in real time, and multiple data streams can therefore be processed concurrently. In addition, because the bitstream processor is not programmed at the level of single arithmetic or Boolean instructions, it can operate at high speed.

Claims

A system for encoding or decoding video data, the system comprising:

A vector processor for performing linear transformation on video data,

A bitstream processor for compressing the output of the vector processor or restoring video data for input to the vector processor, and

Control circuitry for synchronizing operations of the vector processor and the bitstream processor;

Wherein the bitstream processor can be interrupted by the control circuitry to stop processing one video data stream and to start processing another video data stream, so that the system can encode or decode the two video data streams in real time and the bitstream processor can process the two video data streams at about the same time.

The system of claim 1,

Wherein each video data stream represents a moving picture.

A system for encoding or decoding video data, the system comprising:

A vector processor for performing linear transformation on video data, and

A bitstream processor for compressing the output of the vector processor or for reconstructing video data for input to the vector processor;

Wherein the vector processor can be programmed to execute a single arithmetic or Boolean instruction, and the bitstream processor cannot be programmed to execute a single arithmetic or Boolean instruction.

A method for encoding or decoding video data, the method comprising:

A vector processor for performing linear transformation on the video data, and a bitstream processor for compressing the output of the vector processor or reconstructing the video data for input to the vector processor;

The method of claim 4,

Wherein each video data stream represents a moving picture.

A method for encoding or decoding video data, the method comprising:

A vector processor for performing linear transformation on video data, and

A bitstream processor for compressing the output of the vector processor or for reconstructing video data for input to the vector processor;

Appendix A

MSP-1EX System Specifications

Chapter 1 Technical Overview

This chapter provides a technical overview of the multimedia signal processor (MSP-x) for hardware and software designers.

1.1 Features

The Multimedia Signal Processors (MSP-x) form a family of single-chip VLSI devices that provide a wide range of functionality for personal computer and custom product applications.

The MSP family is based on a robust vector processor architecture that applies a single-instruction multiple-data (SIMD) model of computation for optimal cost/performance. Its characteristics are as follows.

* Full programmability

* Based on the ARM instruction set architecture

* 40 MHz ARM7 RISC CPU core

* 80 MHz vector processor for high-performance digital signal processing

* 2.56 Gops for 9-bit integer ALU operations

* 2.56 Gops for 16-bit integer multiply-accumulate operations

* 640 Mflops for 32-bit IEEE floating-point addition

* 1280 Mflops for 32-bit IEEE floating-point multiplication

* 10 Kgates left unused for optional customization or graphics capabilities

* Based on 0.65μm 3.3v / 5c CMOS technology

* 128 pin-128 pin cage
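The throughput figures in this list are consistent with the 80 MHz clock and the 288-bit vector registers described earlier, under some assumptions about lane counts. The sketch below is a back-of-the-envelope check, not a statement of how the figures were derived: only the 9-bit lane count follows directly from the register width, while the 16-bit and 32-bit lane counts and the operations-per-lane factors are assumptions chosen to match the stated numbers.

```python
CLOCK_HZ = 80_000_000   # vector processor clock, from the text
REGISTER_BITS = 288     # vector register width, from the text


def gops(lanes, ops_per_lane=1, clock_hz=CLOCK_HZ):
    """Billions of operations per second for a given number of SIMD lanes."""
    return lanes * ops_per_lane * clock_hz / 1e9


int9_lanes = REGISTER_BITS // 9          # 32 lanes of 9-bit data
print(gops(int9_lanes))                  # 9-bit integer ALU operations

# Assumption: 16 lanes of 16-bit data, multiply-accumulate counted as 2 ops.
print(gops(16, ops_per_lane=2))

# Assumption: 8 lanes of 32-bit floating-point data.
print(gops(8))                           # compare with 640 Mflops

# Assumption: the 1280 Mflops figure counts 2 operations per lane per cycle.
print(gops(8, ops_per_lane=2))           # compare with 1280 Mflops
```

With these assumptions the four results are 2.56, 2.56, 0.64, and 1.28 billion operations per second, matching the listed figures.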

The MSP initially supports four main functions.

* Video

* Audio / Sound

* Telecommunication

* 2D / 3D graphics (optional)

1.1.1 Video

* All functions are programmable in the firmware.

* Real-time MPEG-1 decoding and encoding

* Real-time MPEG-2 decoding

* Near-real-time MPEG-2 encoding

* Real-time H.324 decoding and encoding

* Image scaling for any screen size or resolution

* Color space conversion between RGB and YUV

* Image filtering for picture contour enhancement and noise reduction

* 4/3 pulldown conversion

1.1.2 Audio / Sound

* All functions are programmable in the firmware.

* Real-time MPEG-1 audio decoding and encoding

* Real-time MPEG-2 audio decoding and encoding

* Real-time H.320 and H.324 audio decoding and encoding

* Real-time G.728 and G.723 voice coding

* Real-time Sound Blaster emulation

* Wavetable synthesis

* FM synthesis

1.1.3 Telecommunication

1.1.3.1 Modem

* Standard asynchronous COM port interface (NS 16550A UART compatible)

* V.34 from 28.8 Kbps down to 2.4 Kbps

* CCITT V.32bis with data rates of 4800 bps, 9600 bps uncoded, and 9600 bps trellis coded

* Hayes AT command set compatibility

* Call progress monitor

* V.25bis auto dial

* DTMF and pulse dialing

* Asynchronous error recovery protocol

* V.42 error correction

1.1.3.2 Fax

* V.29 at 9600 bps or 7200 bps

* V.27 at 4800 bps or 2400 bps

* Call progress monitor

* DTMF and pulse dialing

* G3 transfers

* T.4 / T.30 operation

1.1.3.3 Answering a call

* Record greetings via phone set or microphone

* Respond to pre-recorded messages by automatically answering incoming calls

* Record message from caller

* Play message left by caller

1.1.4 2D / 3D graphics (optional)

* BITBLT

* 2D line and polygon drawing and shading

* Geometry mining calculations for 3D points, lines, and triangles

* 3D color calculation with texture mapping

* Blending

1.2 Hardware Structure

1.2.1 Overview

The MSP-1 family of multimedia coprocessors is designed to meet a variety of requirements, including density levels, cost, and performance. A block diagram that includes an MSP-1 processor is shown in FIG.

The MSP-1 family includes the following pin-out options.

* MSP-1 is designed for entry-level use without external SDRAM.

* MSP-1EX includes a 32-bit memory bus for interfacing with external SDRAM.

* MSP-1F includes a 64-bit memory bus for interfacing with external SDRAM.

* MSP-1G includes an integrated SVGA controller and RAMDAC with added 3D graphics acceleration.

FIG. 5 is a block diagram of a system including an MSP-1E processor.

1.2.2 External Codec

FIG. 6 is a block diagram of a system including an MSP-1 processor with an external codec.

1.2.2.1 MSP-1EX Bill of Materials

The following is the proposed bill of materials for the MSP-1EX.

* MSP-1EX

* 512K × 32 bit synchronous DRAM

* NTSC / PAL encoder (Samsung KS0119)

* Audio and telecommunication codec (AD1843 from Analog Devices)

* Other (capacitors, resistors, amplifiers, connectors, etc.)

* Printed circuit board

1.3 Microarchitecture

1.3.1 Overview

The MSP microarchitecture consists of a very powerful DSP core and a customer-defined memory and I/O service system (see Figure 2). The DSP core includes the following.

* 32-bit ARM7 RISC CPU that operates at 40 MHz and is used for general processing

* A vector processor that operates at 80 MHz and is used for signal processing

* A shared cache subsystem that operates at 80 MHz and has a 2 KB instruction cache, a 5 KB data cache, and a 16 KB ROM cache. The data cache can be controlled by hardware or software.

* High-speed 64-bit bus (FBUS) that operates at 80 MHz and interfaces with the internal FBUS peripherals

* Low-speed 32-bit bus (IOBUS) that operates at 40 MHz and interfaces with the IOBUS peripherals

Internal FBUS peripherals include:

* 32-bit 33 MHz PCI bus interface

* 64-bit SDRAM memory controller

* 8-channel DMA controller

* Custom ASIC logic blocks, which provide a total of 10 Kgates and include interfaces to various analog codecs and custom I/O devices. The interface logic supports Samsung's KS0119 NTSC encoder and the AD1843 codec from Analog Devices.

* Memory data mover used to DMA data from Pentium memory to MSP local SDRAM memory

* Bitstream processor that processes video bitstreams

* 16450 UART serial line

* 8254-compatible timer

* 8259-compatible interrupt controller

The MSP also contains special registers (MSP control registers) used for software-controlled initialization and interrupts.

1.4 MSP-1EX Pin Description

1.4.1 Total: 256 pins

1.4.2 PCI bus interface (53 pins)

CLK Clock input pin

RSTL Reset input pin, active low

AD[31:0] Address and data bus pins

C_BE0L Control byte 0 enable pin, active low

C_BE1L Control byte 1 enable pin, active low

C_BE2L Control byte 2 enable pin, active low

C_BE3L Control byte 3 enable pin, active low

PAR Parity pin

FRAMEL Cycle frame pin, active low

IRDYL Initiator ready pin, active low

TRDYL Target ready pin, active low

STOPL Stop pin, active low

LOCKL Lock pin, active low

IDSEL Initialization device select input pin

DEVSEL Device select pin, active low

REQL Bus request pin, active low

GNTL Bus grant pin, active low

PERRL Parity error pin, active low

SERRL System error pin, active low

INTAL Interrupt pin, active low

1.4.3 Others (6-pin)

TCK JTAG test clock input pin

TDI JTAG test data input pin

TDO JTAG test data output pin

TMS JTAG test mode select input pin

TRSTL JTAG test reset input pin

CLK Clock input pin (40 MHz)

1.4.4 KS0119 NTSC / PAL Encoder Interface (24-pin)

SFRS Frame synchronization output to KS0119 for the 3-wire host interface

SCLK Serial clock output to KS0119

SDAT Serial data I/O

BGHS Horizontal synchronization signal input to MSP

BGVS Vertical synchronization signal input to MSP

MSSEL Master selection

PD[15:0] Pixel data output to KS0119

BGCLK Pixel clock output to KS0119

PROMCSL BIOS PROM chip select

1.4.5 AD1843 Audio Telecommunication Codec Interface (6-Pin)

A43SCLK Serial clock input / output. SCLK is a bidirectional signal that feeds the clock as an output to the serial bus when the bus master (BM) pin is driven to HI and accepts the clock as an input when the BM pin is driven to LO.

A43SDFS Serial data frame synchronous input / output. SDFS is a bidirectional signal that feeds the frame sync signal as an output to the serial bus when the bus master (BM) pin is driven to HI, and accepts the frame sync signal as an input when the BM pin is driven to LO.

A43SDI Serial data input pin to the AD1843, output from the MSP. All control and playback transfers are 16 bits long, MSB first.

A43SDO Serial data output pin from the AD1843, input to the MSP. All control register read and playback transfers are 16 bits long, MSB first.

1.4.6 Memory Bus Interface (87-pin)

RAS1L Output pin (active low). This is the row address strobe, which latches the row address from MA[11:0] into the internal row address buffer of the selected SDRAM bank.

CAS1L Output pin (active low). This is the column address strobe, which latches the column address from MA[11:0] into the internal column address buffer of the selected SDRAM bank.

MWEL Output pin (active low). This is the write enable for SDRAM.

MA[11:0] Output pins. Multiplexed row and column address signals for SDRAM.

MD[63:0] Input/output pins. SDRAM data.

MA23 Output pin. Memory address bit 23.

MA24 Output pin. Memory address bit 24.

DQM Output pin. Places the SDRAM data outputs in high impedance after the clock and masks the output (this pin is used only for the synchronous DRAM interface).

MCKE Output pin. Masks the SDRAM system clock to stop operation from the next clock cycle.

MCS0L Output pin (active low). SDRAM chip select for the lower 32 bits.

MCS1L Output pin (active low). SDRAM chip select for the upper 32 bits.

MRDYH Output pin. SDRAM ready signal.

MEMCLK Output pin. This is the clock output pin for SDRAM.

1.4.7 Power

VDD 3.3 V power pin

VCC 5 V power pin

VSS Ground Pin

[Table 1]

MSP-1EX Pin Assignments

1.5 Firmware Structure

1.5.1 Overview

MSP provides a powerful and open application environment through the highly optimized combination of vectorized DSP firmware libraries (running on the vector processor) and system management functions (running on the ARM7).

MSP separates signal processing development from host application development, providing scalable performance, cost-effective multimedia communications, ease of use, and ease of handling. It also reduces application development and maintenance costs.

1.5.2 Firmware Structure

The MSP firmware system structure is as shown in FIG. The shaded areas represent MSP system elements, and the unshaded areas represent the underlying PC application and operating system.

1.5.2.1 MOSA (Multimedia Operation System Architecture)

The MSP's real-time operating system kernel is called MOSA, which is a subset of Microsoft's real-time kernel MMOSA.

MOSA is a real-time, robust, multitasking, preemptive operating system that is used for multimedia applications implemented on MSP. It performs the following main functions:

* Interfacing with the host Windows 95 and Windows NT operating systems

* Downloading of selected application firmware from the host

* Scheduling MSP tasks for execution in ARM7 and vector processors

* Management of all MSP system resources, including memory and I/O devices

* Synchronization of communication between MSP tasks

* Reporting of MSP related interrupts, exceptions and status conditions

MOSA runs exclusively on ARM7.

See the MMOSA real-time kernel specification for more details.

1.5.2.2 Multimedia Library Module

The multimedia library module provides a broad range of modules that perform functions such as data compression, MPEG video and audio, voice coding and synthesis, and Sound Blaster-compatible audio. Each module is optimized for the MSP environment and designed to run in a multitasking environment.

1.5.3 Telecom Library

1.5.3.1 Overview

With appropriate DSP firmware, the MSP can support telephone answering voice applications: it can answer incoming phone calls and store messages on the hard disk. In addition, the system speaker and a microphone can be used to provide a half-duplex speakerphone. Incoming and outgoing calls are detected and used by the system. Call progress tones can also be heard through the telephone handset, the system speaker, stereo headphones, or the selected audio output channels, under program control.

1.6 Programming Model

1.6.1 Overview

From a hardware point of view, MSP is a single-chip solution that includes two CPUs and many integrated peripherals. From a software perspective, the MSP is a high performance digital signal processing (DSP) device that resides on the PCI bus.

Control of the MSP by the host CPU can be realized by any of the following.

* Reads and writes of the MSP control and status registers via the PCI bus

* Shared data structures in host system memory

* Shared data structures in MSP local memory

MSP program execution always starts on the ARM7 CPU, which in turn can start a second, independent execution stream on the vector processor. Control synchronization between the ARM7 CPU and the vector processor is performed by special coprocessor instructions (STARTVP, INTVP, TESTVP) on the ARM7 and special instructions (VJOIN, VINT) on the vector processor. Data transfer between the ARM7 CPU and the vector processor is performed by data movement instructions executed on the ARM7.

The ARM7 CPU is generally responsible for handling most interrupts and exceptions, as well as the host interface, resource management, and I/O device processing. The vector processor is responsible for all digital signal processing and for a few special interrupts, such as coprocessor interrupts (raised in the vector processor by the ARM7) and hardware stack overflows (in the vector processor).

In addition, the MSP includes many integrated peripherals for interfacing to various I/O devices. All peripheral devices are memory mapped and can therefore be accessed with standard load and store instructions (by either the ARM7 CPU or the vector processor).

1.6.2 Power-On and Reset

After power is applied, the MSP automatically enters a self-test sequence to verify correct functionality. Self-test sequences include the following.

* Initialization of all internal MSP registers

* Execution of on-chip self-test diagnostics to verify all elements of the MSP

The self-test sequence is expected to last approximately tbd seconds. At the end of the self-test sequence, the MSP prepares to execute the MSP firmware, which includes:

* Load and run MSP initialization software

* Loading and running MSP's real-time operating system kernel MMOSA

MSP supports three types of reset:

* Hardware control system reset by PCI bus

* Software control system reset by PCI system reset bit in MSP control register

* Software-controlled restart via the restart bit in the MSP control register

1.6.3 PCI Configuration Registers

As an I/O device on the PCI bus, the MSP contains the set of configuration registers defined in PCI Rev 2.1 and shown in Table 2.

[Table 2]

PCI configuration registers

1.6.3.1 Device and Vendor Identifier Register

See PCI Bus Specification Rev 2.1 for more details.

1.6.3.2 Status and Command Register

See PCI Bus Specification Rev 2.1 for more details.

1.6.3.3 Class Code and Revision Identifier Register

See PCI Bus Specification Rev 2.1 for more details.

For MSP-1EX, the class code is defined as 03 and the subclass is zero.

1.6.3.4 Other Registers

See PCI Bus Specification Rev 2.1 for more details.

1.6.3.5 MSP Base Address Register (MSP BASE)

This register stores the base address for the MSP device. This address is written by the host system software (Windows 95 / NT) and used by the MSP hardware to address the memory.

1.6.3.6 VFB Base Address Register

This register stores the base address for the VGA virtual frame buffer. This address is written by the host system software (Windows 95 / NT) and used in MSP hardware to emulate the VGA frame buffer.

1.6.3.7 Expansion ROM Base Address

See PCI Bus Specification Rev 2.1 for more details.

1.6.3.8 Interrupt Line Register

See PCI Bus Specification Rev 2.1 for more details.

1.6.4 ARM7 CPUs

The ARM7 RISC CPU is the master processor of the MSP. It has a 32-bit data path and implements the standard ARM7 instruction set architecture. The ARM7 also includes special coprocessor instructions for interfacing with the vector processor.

1.6.5 Vector Processor

The vector processor is the DSP engine of the MSP. It includes a 288-bit data path and acts as a coprocessor for the ARM7. Its functions are described in the vector processor architecture document.

The vector processor 220 operates at 80 MHz with a six-stage pipeline: fetch, decode, issue, register access, execute, and write. It is optimized for DSP-related processing.

1.6.6 Virtual Memory Management

The MSP-1EX does not support virtual memory management.

1.6.7 Interrupt and Exception Processing

Most interrupt and exception processing in the MSP is done by the ARM7.

All internal I/O device interrupts enter the internal 8259 interrupt controller, which prioritizes them and sends the highest-priority interrupt to the ARM7 for further processing.

1.6.8 Physical Memory Address Map

ARM7 and vector processor programs see all memory-mapped MSP input/output devices according to the physical memory map shown in FIG.

The MSP address map seen by the ARM7 (or vector processor) extends from 0 to 4GB.

Addresses in the 2GB to 4GB range are mapped to 0 to 2GB host (Pentium) PCI addresses according to the following relationship:

Host PCI address = ARM7 address - 8000 0000 (hex)

This mapping allows the ARM7 (or vector processor) to use addresses from 2 GB to 4 GB to access host PCI memory addresses from 0 to 2 GB. ARM7 cannot access host PCI memory addresses larger than 2GB.
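The address relationship above can be sketched in a few lines of Python. This is only an illustration of the arithmetic; the function and constant names are hypothetical and do not come from the MSP documentation.

```python
# Illustrative sketch of the 2 GB..4 GB window described in the text.
# All names here are hypothetical; only the offset 8000 0000 (hex) is from the text.
HOST_WINDOW_BASE = 0x8000_0000      # start of the 2 GB..4 GB window
ADDRESS_SPACE_END = 0x1_0000_0000   # 4 GB, end of the MSP address map


def arm7_to_host_pci(arm7_addr: int) -> int:
    """Translate an ARM7 (or vector processor) address to a host PCI address."""
    if not HOST_WINDOW_BASE <= arm7_addr < ADDRESS_SPACE_END:
        raise ValueError("address is outside the 2 GB to 4 GB host window")
    return arm7_addr - HOST_WINDOW_BASE


def host_pci_to_arm7(host_addr: int) -> int:
    """Inverse mapping; host PCI addresses of 2 GB and above are not reachable."""
    if not 0 <= host_addr < HOST_WINDOW_BASE:
        raise ValueError("host PCI address is not reachable from the ARM7")
    return host_addr + HOST_WINDOW_BASE
```

For example, under these assumptions the ARM7 address 8000 1000 (hex) refers to host PCI address 0000 1000 (hex).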

Host (Pentium) programs also see all memory-mapped input/output devices, according to the somewhat more limited physical memory map shown in FIG.

From the host (Pentium),

MSP_BASE is the start of the MSP address map.

MSP_BASE + 7DFFFFF is the end of the MSP address map.

* MSP address map is defined only in the 128MB range.

[Table 3]

MSP I / O Device Address Map

1.6.9 MSP Host Control Register

The MSP-1EX contains special registers used for initialization and interruption by the host (Pentium processor).

[Table 4]

MSP Control Register Definition

bit 0: PCI system reset. This bit is used by the host (Pentium) to completely reset the entire MSP system hardware, including all MSP-related internal and external I/O devices. After a PCI system reset, the MSP goes through the standard reset sequence, which includes performing all on-chip self-test diagnostics for the ARM7, the vector processor, and the I/O devices. This reset has the same effect as a hardware system reset.

bit 1: ARM7 and vector processor restart. This bit is used by the host (Pentium) to restart the ARM7 and the vector processor. A restart differs from a complete PCI system reset in that the MSP does not go through the normal reset sequence and does not perform any on-chip self-test diagnostics. When this bit is set, the ARM7 starts execution at address 0 and the vector processor enters idle mode. No internal or external I/O devices are affected.

bit 2: MSP interrupt request from host (Pentium). This bit is used by the host (Pentium) to directly interrupt the MSP and is connected to one of the inputs of the internal 8259 programmable interrupt controller (PIC) used to interrupt the ARM7. This bit is set by the host (Pentium) and cleared by the ARM7.

bit 3: PCI host interrupt acknowledgment. This bit is used by the host (Pentium) to acknowledge the PCI host interrupt request that the MSP has issued. This bit is set by the host (Pentium) and cleared by the ARM7.

bits 31:4: Reserved
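As a reading aid, the bit assignments of the MSP host control register can be written as masks. The following is a minimal sketch: the constant and function names are hypothetical, and only the bit positions are taken from the definitions above.

```python
# Hypothetical names; bit positions follow the MSP host control register text.
PCI_SYSTEM_RESET = 1 << 0   # bit 0: full reset including self-test
RESTART = 1 << 1            # bit 1: restart ARM7 and vector processor
MSP_INT_REQUEST = 1 << 2    # bit 2: host interrupts the MSP (cleared by ARM7)
HOST_INT_ACK = 1 << 3       # bit 3: host acknowledges the MSP's interrupt
RESERVED_MASK = 0xFFFF_FFF0  # bits 31:4 are reserved

BIT_NAMES = {
    PCI_SYSTEM_RESET: "PCI system reset",
    RESTART: "ARM7/vector processor restart",
    MSP_INT_REQUEST: "MSP interrupt request",
    HOST_INT_ACK: "PCI host interrupt acknowledgment",
}


def set_bits(value: int, mask: int) -> int:
    """Return the register value with the given bits set, keeping reserved bits 0."""
    return (value | mask) & ~RESERVED_MASK & 0xFFFF_FFFF


def describe(value: int) -> list:
    """List the names of the defined bits that are set in a register value."""
    return [name for bit, name in BIT_NAMES.items() if value & bit]
```

A host driver would combine such masks with reads and writes over the PCI bus; how the register is actually accessed is outside the scope of this sketch.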

1.6.10 MSP ARM7 Control Registers

The MSP-1EX has a special register used to interrupt the host by the ARM7 processor.

[Table 5]

MSP ARM7 Control Register Definition

bit 0: PCI host interrupt from MSP. This bit is used by the MSP to interrupt the host by asserting the PCI INTA# pin on the PCI bus. This bit is set by the ARM7 and cleared by the host (Pentium) via the PCI bus.

bit 3: Reserved

1.6.11 MSP internal μROM

The internal ROM has a total of 16 KBytes and includes the following.

μROM initialization software

* Self-test diagnostic software

* Various system management software

Various library subroutines

* Cache for instruction and data constants

The address map is shown in Table 6 below.

[Table 6]

Internal μROM Address Map

1.6.12 MSP Internal SRAM

The internal SRAM performs the function of cache or local memory in accordance with the options determined by the MSP's Vector Control Status Register (VCSR).

In local memory mode, the address space is mapped to the internal SRAM starting at location MSP_BASE: 040 0000.

1.6.13 Peripherals inside MSP

The MSP also has many peripherals on two internal buses: Fbus running at 64 bits, 80 MHz and IObus running at 32 bits, 40 MHz.

The devices on the Fbus are:

* Memory controller for external synchronous DRAM

* Virtual frame buffer interface

* PCI bus controller for external PCI bus

* Customer ASIC interface

* 8-channel DMA controller

* Memory data mover (for transferring data between host memory and SDRAM)

* KS0122 codec serial line

* KS0119 codec serial line

* AD1843 codec serial line

On the other hand, the devices on IObus are as follows.

* 8254-compatible programmable interval timer

* 8259-compatible programmable interrupt controller (8 levels)

* 16450-compatible UART serial line

* Bitstream processor for MPEG bitstream decoding and encoding

The register address map of these peripherals is shown in Table 7.

[Table 7]

Internal Peripheral Register Address Map

1.6.14 IOBUS Peripherals

1.6.14.1 8254-compatible programmable interval timer

The MSP includes a standard 8254-compatible programmable interval timer for use with software with the following features:

* It has three independent 16-bit counters.

* Supports 6 programmable counter modes.

* All counters are programmed by writing a control word to the control word register, followed by an initial count.

* Control word register

This register holds various control information for the timer. The bit definitions for this register are shown in Table 8.

[Table 8]

Control word register

* Status register

This register holds status information for the timer.

* Counters 0, 1, 2

These three registers are the main counters of the timer. Each counter is 16 bits wide, presettable, and counts down in either binary or BCD mode. The input, gate, and output behavior of each counter is determined by the mode selected in the control word register. The three counters are completely independent.
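Because the contents of Table 8 are not reproduced here, the following sketch assumes that the MSP timer uses the control word layout of the standard Intel 8254 (counter select in bits 7:6, access mode in bits 5:4, counter mode in bits 3:1, BCD flag in bit 0). The function names are hypothetical.

```python
# Sketch only: assumes the standard Intel 8254 control word layout.
ACCESS_LSB_ONLY = 0b01
ACCESS_MSB_ONLY = 0b10
ACCESS_LSB_THEN_MSB = 0b11


def control_word(counter: int, mode: int, bcd: bool = False,
                 access: int = ACCESS_LSB_THEN_MSB) -> int:
    """Build an 8254 control word for one of the three counters."""
    if counter not in (0, 1, 2):
        raise ValueError("counter must be 0, 1, or 2")
    if not 0 <= mode <= 5:
        raise ValueError("the 8254 defines six modes, 0 through 5")
    if access not in (ACCESS_LSB_ONLY, ACCESS_MSB_ONLY, ACCESS_LSB_THEN_MSB):
        raise ValueError("unsupported access mode")
    return (counter << 6) | (access << 4) | (mode << 1) | int(bcd)


def count_bytes(initial_count: int) -> tuple:
    """Split a 16-bit initial count into (LSB, MSB) for the two data writes."""
    if not 0 <= initial_count <= 0xFFFF:
        raise ValueError("initial count must fit in 16 bits")
    return initial_count & 0xFF, initial_count >> 8
```

Programming a counter would then consist of writing the control word, followed by the low and high bytes of the initial count.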

1.6.14.2 8259-Compatible Programmable Interrupt Controller (PIC)

The MSP programmable interrupt controller is a standard 8259, as commonly used in all x86-based personal computers. Its features include:

* Supports 8 levels of priority.

* Programmable interrupt modes

* Individual request mask capability

In the MSP-1EX, eight levels of interrupt inputs are assigned to various I / O devices as follows.

* Level 0 (highest) is assigned to the 8254 timer.

* Level 1 is allocated to the virtual frame buffer (VFB).

* Level 2 is assigned to a customer ASIC logic block containing a DMA controller.

* Level 3 is assigned to the bitstream processor.

* Level 4 is assigned to the PCI bus interface.

* Level 5 is assigned to tbd.

* Level 6 is assigned to tbd.

* Level 7 is assigned to 16550 UART.

The output of the interrupt controller is coupled to the interrupt request line (nFIQ) of the ARM7 RISC CPU.
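The priority scheme above can be illustrated with a small model. It assumes fixed priority with level 0 highest, as listed, and a request mask in which a set bit disables the corresponding level; the names are hypothetical and the sketch ignores the other programmable 8259 modes.

```python
# Hypothetical model of fixed-priority resolution, level 0 highest.
LEVEL_SOURCES = {
    0: "8254 timer",
    1: "virtual frame buffer (VFB)",
    2: "customer ASIC logic block (DMA controller)",
    3: "bitstream processor",
    4: "PCI bus interface",
    5: "tbd",
    6: "tbd",
    7: "UART",
}


def highest_priority(pending: int, masked: int = 0):
    """Return the lowest-numbered pending, unmasked level, or None if there is none.

    Bit n of `pending` is set when level n requests service; bit n of
    `masked` is set when level n is individually masked.
    """
    active = pending & ~masked & 0xFF
    for level in range(8):
        if active & (1 << level):
            return level
    return None
```

With the bitstream processor (level 3) and the UART (level 7) both pending, this model selects level 3.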

* Register description

There are four 8-bit registers used to initialize the operation of the PIC:

* Initialization command word 1 (ICW1)

* Initialization command word 2 (ICW2): Not used for MSP-1EX.

* Initialization command word 3 (ICW3): Not used for MSP-1EX.

* Initialization command word 4 (ICW4)

In addition, there are three 8-bit registers used to control the operation of the PIC:

* Operation control word 1 (OCW1)

* Operation control word 2 (OCW2)

* Operation control word 3 (OCW3)

All these registers are specially encoded in both the address part (bit 0) and the data part. Refer to the standard 8259 specification for more details.

[Table 9]

8259 Register Description

1.6.14.3 16450-Compatible UART Serial Line

The MSP includes a 16450-compatible UART serial line that is used as an interface with external serial I / O devices. Refer to the standard 16450 specification for more details.

1.6.14.4 Bitstream Processor

A bitstream processor is a specialized logic block that processes video bitstream data. Its functions are as follows.

* Variable-length Huffman decoding and encoding

* Unpacking and packing of video data in zigzag storage format

* Various bit-level processing

The bitstream processor operates as a concurrent processing unit and is software controlled by the vector processor or ARM7. See the Bitstream Processor section for more details.

1.6.15 FBUS Peripherals

FBUS peripherals include:

* Customer ASIC logic interface

* 8-channel DMA controller

* Video encoder serial line interface for Samsung's KS0119

* Audio and telecom serial line interface for the Analog Devices AD1843

1.6.15.1 ASIC Logic Interface

This section contains the interface logic for all external codecs and the custom ASIC logic blocks. These blocks are all implemented in hardware and do not have program-visible registers. See the ASIC interface section for more details.

1.6.15.2 DMA Controller

The MSP-1EX has an on-chip DMA controller with the following features:

* 8 independent DMA channels

* Enable/disable control for individual DMA channels

* I/O-device-to-memory transfers and memory-to-I/O-device transfers

* Address increment and decrement

See the ASIC Interface section for more details.

1.6.15.3 Memory Data Mover

The MSP-1EX also has a special memory data mover. This memory data mover is used to move data between host (Pentium) memory and MSP local SDRAM memory. The memory data mover is basically a special DMA controller that contains the following registers.

MSP Current Address Register: This 32-bit register defines the SDRAM memory address at the beginning of the memory data transfer. This register can be written or read by ARM7 and the initial value must be loaded by ARM7. The address is incremented based on the data transfer size.

Host Current Address Register: This 32-bit register defines the host memory address at the beginning of the memory data transfer. This register can be written or read by ARM7 and the initial value must be loaded by ARM7. The address is incremented based on the data transfer size.

MSP Stop Address Register: This 32-bit register defines the SDRAM memory address at the end of the memory data transfer. This register can be written or read by the ARM7 and used in comparison to the MSP current address register. If they match, the memory data mover generates an MSP End-Of-Process signal.

Host Stop Address Register: This 32-bit register defines the host memory address at the end of the memory data transfer. This register can be written or read by the ARM7 and used in comparison to the host current address register. If they match, the memory data mover generates an end-of-process signal from the host.

Status register: This register contains status information related to the memory data mover. The bit encoding is as follows.

0: MSP EOP. This bit indicates whether the memory data mover has reached the MSP stop address. If ARM7 initializes the source current address register, ARM7 is reset to 0080 0000 (hex). This bit can only be read, not written, by ARM7.

1: HOST EOP. This bit indicates whether the memory data mover has reached the host stop address. If ARM7 initializes the host current address register, ARM7 is reset to 8000 000 (hex). This bit can only be read, not written, by ARM7.

Control register: This register contains control information related to the memory data mover. The bit encoding is as follows.

0: direction. This bit determines the direction of the data transfer. If this bit is 0 (default), the direction of data transfer is from the host (Pentium) memory to MSP SDRAM memory, and if this bit is 1, the direction of data transfer is from SDRAM to host memory. This bit must be written by ARM7.

1: Interrupt Enable. This bit determines whether the memory data mover interrupts ARM7 at the end of the data transfer. This bit must be written by ARM7.

2: DMA enable. This bit enables the memory data mover to operate. This bit must be written by ARM7.

3: data transfer size. If this bit is 0 (the default), the data transfer size of each memory is 32 bytes; if it is 1, it is 64 bytes. This bit must be written by ARM7.
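The register behavior described above can be summarized in a small software model. This is a hedged sketch, not the hardware implementation: the class and constant names are hypothetical, and it assumes that both current address registers advance by the transfer size on every transfer and that an EOP bit is set when a current address equals its stop address.

```python
from dataclasses import dataclass

# Control register bits (positions from the text, names hypothetical).
CTRL_DIRECTION = 1 << 0    # 0: host memory -> SDRAM, 1: SDRAM -> host memory
CTRL_INT_ENABLE = 1 << 1   # interrupt ARM7 at the end of the transfer
CTRL_DMA_ENABLE = 1 << 2   # enable the memory data mover
CTRL_SIZE_64 = 1 << 3      # 0: 32-byte transfers, 1: 64-byte transfers

# Status register bits.
STATUS_MSP_EOP = 1 << 0
STATUS_HOST_EOP = 1 << 1


@dataclass
class MemoryDataMover:
    msp_current: int
    host_current: int
    msp_stop: int
    host_stop: int
    control: int = 0
    status: int = 0

    def step(self) -> bool:
        """Model one transfer; return False when disabled or already finished."""
        if not self.control & CTRL_DMA_ENABLE:
            return False
        if self.status & (STATUS_MSP_EOP | STATUS_HOST_EOP):
            return False
        size = 64 if self.control & CTRL_SIZE_64 else 32
        self.msp_current += size
        self.host_current += size
        if self.msp_current == self.msp_stop:
            self.status |= STATUS_MSP_EOP
        if self.host_current == self.host_stop:
            self.status |= STATUS_HOST_EOP
        return True
```

In this model, moving 128 bytes with the default 32-byte transfer size takes four steps, after which both EOP bits are set.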

1.6.15.4 KS0119 Video Encoder Serial Line Interface

The KS0119 video encoder serial line interface includes:

* Double-buffered receive data buffer register containing data read from the codec

* Double-buffered transmit data buffer register containing data to be written to the codec

* A control and status register containing various control and status information for the serial line

[Table 10]

KS0119 Video Encoder Serial Line Interface Registers

The bit encoding of the control status register is as follows.

bit 0: Receive data buffer full. This bit is set when the serial line has received 8 bits of data from the KS0119 codec. If interrupt enable (bit 7) is set, an interrupt request is also issued to ARM7.

bit 1: Transmit data buffer empty. This bit is set when the serial line is ready to send data to the KS0119. If interrupt enable (bit 7) is set, an interrupt request is also issued to ARM7.

bit 7: Interrupt Enable. This bit is used by ARM7 to enable interrupt requests.
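The relationship between the three defined bits can be expressed as a short check. The names below are hypothetical; only bit positions 0, 1, and 7 come from the description above, and the remaining bits are left untouched because their meaning is not given here.

```python
# Hypothetical names for the KS0119 serial line control/status bits.
RX_FULL = 1 << 0      # bit 0: receive data buffer full
TX_EMPTY = 1 << 1     # bit 1: transmit data buffer empty
INT_ENABLE = 1 << 7   # bit 7: interrupt enable


def interrupt_requested(csr: int) -> bool:
    """True when interrupts are enabled and either data condition is flagged."""
    return bool(csr & INT_ENABLE) and bool(csr & (RX_FULL | TX_EMPTY))


def can_read(csr: int) -> bool:
    """True when a received byte is waiting in the receive data buffer."""
    return bool(csr & RX_FULL)


def can_write(csr: int) -> bool:
    """True when the transmit data buffer can accept another byte."""
    return bool(csr & TX_EMPTY)
```

With interrupts disabled, software would instead poll the register and use checks like `can_read` and `can_write` before touching the data buffers.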

1.6.15.5 AD1843 Audio Telecom Serial Line Interface

The AD1843 serial line interface includes:

* A set of double-buffered registers containing data read from the codec

* A set of double-buffered registers containing data to be written to the codec

See the AD1843 codec interface section for more details.

1.6.16 Instruction Performance

Table 11 shows instruction performance in vector processor cycle counts, where every cycle is 12.5 ns. The external memory bus is assumed to be 64 bits wide with a page-mode clock of 40 MHz. All instruction performance is given for 32-byte vector mode. The notation is as follows:

Ras: Number of cycles required for external memory to make first access. Typically 75 ns or 6 cycles are required.

Latency: The number of cycles to execute the first instruction.

Rate: The number of cycles existing between similar consecutive instruction executions. If the latency is equal to the rate, only one number is used.
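The notation above lends itself to a quick calculation. The sketch below converts between cycles and nanoseconds at 12.5 ns per cycle and estimates the cycles for a run of similar instructions as latency plus rate for each additional instruction; that formula is an assumption based on the definitions of latency and rate, and the function names are hypothetical.

```python
import math

CYCLE_NS = 12.5  # one vector processor cycle at 80 MHz


def cycles_to_ns(cycles: int) -> float:
    """Convert a cycle count to nanoseconds."""
    return cycles * CYCLE_NS


def ns_to_cycles(ns: float) -> int:
    """Convert a time to the number of whole cycles needed to cover it."""
    return math.ceil(ns / CYCLE_NS)


def total_cycles(count: int, latency: int, rate: int) -> int:
    """Estimated cycles for `count` similar consecutive instructions.

    Assumption: the first instruction takes `latency` cycles and each
    following one completes `rate` cycles after its predecessor.
    """
    if count < 1:
        return 0
    return latency + (count - 1) * rate
```

The 75 ns first-access time quoted for Ras corresponds to `ns_to_cycles(75)`, which is 6 cycles.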

[Table 11]

Instruction execution performance

Chapter 2 DSP Core

This chapter describes the specification of DSP cores as seen by hardware and software designers.

2.1 Overview

The DSP core is the heart of the MSP and is solely responsible for all computation. The DSP core consists of:

* A 32-bit ARM7 RISC CPU that operates at 40 MHz and is used for general purpose data processing such as real-time OS, interrupt and exception handling, and I / O device management.

* A vector processor that operates at 80 MHz and is used for digital signal processing, such as discrete cosine transform, FIR filtering, convolution, video motion estimation, and so on. This vector processor is initialized by ARM7, can run concurrently with ARM7, and is synchronized with ARM7 by special control instructions.

* A cache subsystem that operates at 80 MHz and consists of a 1 KB instruction cache and a 1 KB data cache for the ARM7, a 1 KB instruction cache and a 4 KB data cache for the vector processor, and a 16 KB integrated instruction and data cache ROM shared by the ARM7 and the vector processor. The data cache for the vector processor can be controlled by hardware or software. The cache subsystem interfaces with the ARM7 over a 32-bit data bus and with the vector processor over a 128-bit data bus.

* 32-bit, 40 MHz input/output bus (IOBUS) that interfaces with various internal peripherals such as the bitstream processor, interrupt controller, timer, and UART.

* 64-bit, 80MHz high-speed input / output bus (FBUS) that interfaces with PCI bus controllers, memory controllers, DMA controllers, and customer ASIC logic blocks.

The block diagram of the DSP core is as shown in FIG.

2.2 ARM7 RISC CPU

2.2.1 Overview

The ARM7 RISC CPU is a general purpose 32-bit RISC processor core. It interfaces with the vector processor through a standard coprocessor interface and is used to handle most non-computation-intensive tasks, such as the real-time OS, I/O device interrupt handling, and communication with the host CPU.

The ARM7 CPU has the following features:

* Fully static operation, ideal for power-sensitive applications.

* Low power consumption: 0.6 mA/MHz @ 3 V.

* High performance: 25 MIPS @ 40 MHz (40 MIPS peak) @ 3 V.

* Big-endian and little-endian operating modes.

* Fast interrupt response for real-time applications (22 clock cycles at 40 MHz)

* Simple but powerful instruction set.

* Very compact layout of about 6 mm².

2.2.2 Registers

The ARM7 has 31 general purpose registers and 6 status registers, for a total of 37 registers. At any one time, the programmer sees 16 general purpose registers and one or two status registers. R0 through R15 are directly accessible in all processor modes (User, Supervisor, IRQ, FIQ, Abort, and Undefined).

All registers except R15 are general purpose and are used to hold data or address values. R15 holds the program counter (PC). The Current Program Status Register (CPSR) contains the ALU flags and the current mode bits.

R14 is used as the subroutine link register and receives a copy of R15 when a branch and link instruction is executed. Otherwise, R14 can be used as a general purpose register.

[Table 12]

General purpose registers and program counter

[Table 13]

Program status registers

2.2.3 Exceptions

An exception is an abnormal condition that occurs during instruction processing and results in a change of control flow. The seven types of ARM7 exception, listed from highest to lowest priority, are:

* Reset (highest priority)

* Abort (data)

* FIQ

* IRQ

* Abort (prefetch)

* Undefined instruction trap, software interrupt (lowest priority)

[Table 14]

Exception vector table

2.2.4 Instruction Set

All ARM7 instructions are executed conditionally, which means that ARM7 instructions may or may not be executed depending on the values of the N, Z, C, and V flags in the CPSR register.
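To make conditional execution concrete, the following sketch evaluates the standard ARM condition mnemonics against the N, Z, C, and V flags. The mnemonics and their flag tests follow the ARM instruction set; the Python function name is only illustrative.

```python
# Standard ARM condition codes expressed as tests on the CPSR flags.
CONDITIONS = {
    "EQ": lambda n, z, c, v: z,                   # equal
    "NE": lambda n, z, c, v: not z,               # not equal
    "CS": lambda n, z, c, v: c,                   # carry set / unsigned higher or same
    "CC": lambda n, z, c, v: not c,               # carry clear / unsigned lower
    "MI": lambda n, z, c, v: n,                   # negative
    "PL": lambda n, z, c, v: not n,               # positive or zero
    "VS": lambda n, z, c, v: v,                   # overflow
    "VC": lambda n, z, c, v: not v,               # no overflow
    "HI": lambda n, z, c, v: c and not z,         # unsigned higher
    "LS": lambda n, z, c, v: (not c) or z,        # unsigned lower or same
    "GE": lambda n, z, c, v: n == v,              # signed greater or equal
    "LT": lambda n, z, c, v: n != v,              # signed less than
    "GT": lambda n, z, c, v: (not z) and n == v,  # signed greater than
    "LE": lambda n, z, c, v: z or n != v,         # signed less or equal
    "AL": lambda n, z, c, v: True,                # always
}


def condition_passed(cond: str, n, z, c, v) -> bool:
    """Return True if an instruction with this condition would execute."""
    return bool(CONDITIONS[cond](bool(n), bool(z), bool(c), bool(v)))
```

An instruction whose condition evaluates to False is skipped, which lets short conditional sequences avoid branches.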

ARM7 instructions can be divided into several categories:

* Branches and linked branches (B, BL)

* Data Processing (AND, EOR, SUB, RSB, ADD, ADC, SBC, RSC, TST, TEQ, CMP, CMN, ORR, MOV, BIC, MVN)

* PSR Transfer (MRS, MSR)

* Multiply and Multiply-Accumulate (MUL, MLA)

* Single Data Transfer (LDR, STR)

* Block Data Transfer (LDM, STM)

* Single Data Swap (SWP)

* Software Interrupt (SWI)

* Coprocessor Data Operation (CDP) (this is a group of instructions)

* Coprocessor Data Transfer (LDC, STC)

* Coprocessor Register Transfer (MRC, MCR)

2.3 Vector Processor

2.3.1 Overview

The vector processor is a powerful digital signal processor that uses a single instruction multiple data (SIMD) structure for maximum performance. It consists of a pipelined RISC engine operating in parallel on multiple data elements to achieve the best performance. Multiple data elements are packed into 576-bit vectors, which can be calculated at the following rates.

* 32 8/9-bit fixed-point arithmetic operations every 12.5 ns cycle, or

* 16 16-bit fixed-point arithmetic operations every 12.5 ns cycle, or

* 8 32-bit fixed-point or floating-point arithmetic operations every 12.5 ns cycle
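The rates above translate into peak throughput figures, since a 12.5 ns cycle corresponds to an 80 MHz clock. The short calculation below is illustrative only; the dictionary keys and function name are hypothetical, and real throughput depends on the instruction mix and memory behavior.

```python
CLOCK_HZ = 80_000_000  # 1 / 12.5 ns

# Elements processed per cycle for each data type, as listed in the text.
ELEMENTS_PER_CYCLE = {
    "8/9-bit": 32,
    "16-bit": 16,
    "32-bit": 8,
}


def peak_ops_per_second(data_type: str) -> int:
    """Peak arithmetic operations per second for a given element width."""
    return ELEMENTS_PER_CYCLE[data_type] * CLOCK_HZ
```

Under these assumptions the peak rates are 2.56 billion 8/9-bit, 1.28 billion 16-bit, and 640 million 32-bit operations per second.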

2.3.2 Execution Pipelines

The vector processor uses a six-stage pipeline, as shown in FIG. 11, to execute instructions. Most 32-bit scalar operations are pipelined at a rate of one instruction per cycle, while most 576-bit vector operations are pipelined at a rate of one instruction every two cycles. All loads and stores overlap with arithmetic operations and are executed independently by separate load/store hardware.

To balance design complexity and performance, the vector processor can issue and execute instructions out of order, using hardware interlocks for resource and data dependency checking. This feature significantly improves performance, especially during data cache misses caused by loads and stores.

2.3.3 Hardware Microstructure

The vector processor consists of four main functional blocks as described in FIG.

* Instruction Fetch Unit (IFU)

* Instruction decoder and issuer

* Instruction execution data path

* Load/Store Unit (LSU)

The instruction fetch unit is responsible for prefetching instructions and for processing control-flow instructions such as branches and jumps to subroutines. The IFU has a 16-entry queue of instructions prefetched for the current execution stream and an 8-entry queue of instructions prefetched for the branch target stream. The IFU can receive eight instructions from the instruction cache every cycle.

The instruction decoder and issuer is responsible for decoding and scheduling all instructions. Although the issuer can schedule instructions out of order according to execution resource availability and operand data validity, the decoder processes one instruction per cycle and always processes instructions in the order in which they arrive from the IFU.

The vector processor realizes most of its performance through several 288-bit data paths (see Figure 13) operating at 12.5 ns / cycle, including:

* Register file with four ports that can support two reads and two writes per cycle

* Eight 32 * 32 parallel multipliers that can produce, every 12.5 ns, eight 32-bit multiplications (integer or floating-point format), 16 16-bit multiplications, or 32 8-bit multiplications

* Eight 36-bit ALUs that can perform, every 12.5 ns, eight 36-bit ALU operations (integer or floating-point format), 16 16-bit ALU operations, or 32 8-bit ALU operations

The load/store unit is designed to interface with the data cache through separate read and write data buses, each 288 bits wide, as described in FIG.

2.3.4 Interrupts and Exceptions

The vector processor recognizes only two special conditions:

* A coprocessor interrupt (CPINT) instruction executed by an ARM7 program.

* A hardware stack overflow resulting from multiply nested jump-to-subroutine instructions executed by the vector processor program.

See the vector processor architecture document for more details on how the vector processor handles these two special conditions.

All other interrupt and exception conditions generated by the MCP are handled by ARM7.

2.4 Cache Subsystem

2.4.1 Overview

The cache control unit (CCU) interfaces with ARM7 cores, vector execution units (LSU, IFU), memory (MCU, PCI, DMA, CODEC) and IO devices (BP, UART, timers, interrupt controllers). The CCU interfaces with high speed (80MHz) FBUS and low speed (20MHz) IOBUS. The CCU is the central data transfer unit between virtually all internal CPU core units and peripheral IO devices. Refer to the block diagram (pp. 1-10) in the MSP-1E system specification for a detailed description of the CCU on the MSP chip.

To support a very high performance cache system, the CCU design uses a transaction-based protocol for all read and write operations. Any unit that needs to access memory can make a request to the CCU control unit. The arbiter in the control unit accepts the request based on a fixed priority and returns a 'transaction_id' to the requester. The requester stores this 'transaction_id' so that it can recognize the returned data when it actually arrives. While the CCU control is processing a request from one unit (which may require many cycles if a cache miss occurs), a new request from another unit can be accepted in the next cycle with another 'transaction_id'. Because requests are left pending in this way, successive requests from other units are not blocked, and high performance can be realized. Currently, the CCU can accept one read request and one write request simultaneously in one cycle.

The interface unit (FBUS) to the memory consists of a four-entry address queue and a one-entry write-back latch. In the best case, the FBUS interface can hold one pending refill (read) request from the ARM instruction cache, one pending refill (read) request from the VEC instruction cache, one write request from the VEC data cache, and one write-back request for a dirty cache line from the VEC data cache.

In addition, the cache memory itself is optimized for high performance. The MSP cache system has on-chip cache SRAM and cache ROM. Cache SRAM is divided into four different banks to prevent data thrashing between the ARM CPU and vector cores or between instructions and data. Cache ROM provides high-speed and high-density data storage for ARM7 and vector cores. Although the tag is not changed for the cache ROM, valid bits are not available and data is returned from external memory. In summary, the on-chip cache memory contains the following blocks.

* 1KB direct-mapped instruction cache and 1KB direct-mapped write-back data cache, with a 32-bit data bus interface to ARM7

* 1KB direct-mapped instruction cache with a 256-bit bus interface to the vector instruction fetch unit

* 4KB direct-mapped write-back data cache with a 256-bit bus interface to the vector execution units. The data cache is dual-ported and can provide 256 bits of read data and accept 256 bits of write data every 80 MHz cycle.

* The 4KB VEC data cache can be configured as a scratch pad under software control.

* Shared, integrated instruction/data ROM cache for use by ARM7 and the vector processor. The interface to ARM7 is through the same 32-bit bus as its instruction cache, and the interface to the vector processor is through the same 256-bit bus as its instruction cache.

* 5 ports:

Read/write port for ARM7

Read port for the instruction fetch unit of the vector processor

Read/write port for the load/store unit of the vector processor

Read/write port for IOBUS on vector processor

Read/write port for FBUS

* 32 * 256-bit SRAM (~1KB) for the ARM7 CPU instruction cache

* 32 * 256-bit SRAM (~1KB) for the ARM7 CPU data cache

* 128 * 256-bit SRAM (~4KB) for the vector processor data cache

* 32 * 256-bit SRAM (~1KB) for the vector processor instruction cache

* 512 * 256-bit SRAM (~16KB) for the data/instruction cache

Control of the vector data cache is performed by hardware control or software control.

2.4.2 Cache Subsystem Structure

FIG. 15 is a block diagram of the MSP cache system, which is composed of the following blocks: Instruction and Data Cache (IDC), Cache ROM, CCU_DATA_DP, CCU_ADR_DP, CCU_CTL, and CCU_SM. Each subblock is described in more detail below.

2.4.2.2. IDC

The instruction and data cache (IDC; see FIG. 16) is an on-chip SRAM memory used to provide instruction and data cache access. This cache consists of four banks in a single array: ARM_IC (1 KB), ARM_DC (1 KB), VEC_IC (1 KB) and VEC_DC (4 KB). In any cycle, this cache accepts one read request and one write request. The tag RAM has two read ports. The read port address and the write port address are compared with the internal cache tags to determine hit or miss conditions. The data RAM has only one read port, which is accessed by the read port address. In addition, the tag RAM and the data RAM are written using different sets of write addresses. Therefore, four sets of cache bank selection signals and three sets of line indexes are required to access the cache array.

IDC has the following characteristics:

* Direct-mapped, with a write-back policy.

* The cache line size is 64B, but the data width is 32B, which corresponds to the vector data width of the MSP chip.

* Each line has two valid bits, one for the high vector and one for the low vector. The data cache also has two dirty bits, one for each vector.

* The tag size for ARM_IC, ARM_DC and VEC_IC is 22 bits (address bits 10 to 31), and the tag size for VEC_DC is 20 bits (address bits 12 to 31).

* The line index for ARM_IC, ARM_DC and VEC_IC is 5 bits (address bits 5 to 9), and the line index for VEC_DC is 7 bits (address bits 5 to 11).

* VEC_DC (4KB) can be reconfigured as a scratch pad under software control.

* The V_CLEAR signal is used to reset all of the cache line valid bits at once. In the future, V_CLEAR will be able to selectively reset individual banks.
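
The tag and line index fields listed above can be illustrated with a short Python sketch that splits a 32-bit address into tag, line index and byte offset for each bank. The bit ranges for tag and index come from the text; treating address bits 0 to 4 as a byte offset within the 32B data width is an assumption, and the names `BANK_FIELDS` and `split_address` are hypothetical.

```python
# bank name -> (number of line index bits, lowest tag bit), per the text
BANK_FIELDS = {
    "ARM_IC": (5, 10),
    "ARM_DC": (5, 10),
    "VEC_IC": (5, 10),
    "VEC_DC": (7, 12),
}

OFFSET_BITS = 5  # assumption: bits 0 to 4 select a byte within the 32B vector


def split_address(addr, bank):
    """Return (tag, line_index, byte_offset) for a 32-bit address."""
    index_bits, tag_lsb = BANK_FIELDS[bank]
    offset = addr & ((1 << OFFSET_BITS) - 1)
    index = (addr >> OFFSET_BITS) & ((1 << index_bits) - 1)
    tag = (addr >> tag_lsb) & ((1 << (32 - tag_lsb)) - 1)
    return tag, index, offset
```

For example, under these assumptions the address 0x12345678 maps to line index 19 in ARM_IC and to line index 51 in VEC_DC, because VEC_DC uses two more index bits.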

2.4.2.3 Data Path Pipeline

See FIG. 17.

2.4.2.4 Address Path Pipeline

The data path for the address processing pipeline is as shown in FIG.

CCU ADDRESS DP

2.4.3 Interface

2.4.3.1 Data Type

The CCU handles different data types from the various requesting units described in Table 15.

Table 15

CCU behavior when handling different data types

2.4.3.2 ARM interface

The ARM7 CPU core runs at half the frequency of the MSP chip (40MHz), while the CCU runs at the MSP chip's frequency of 80MHz. Synchronization between these two clocks is important in design. In general, the clock generator unit switches MCLK on the rising edge of CLK1. In addition, the global reset signal connected to ARM7 is de-asserted when CLK1 and MCLK are low. In this way the two units are properly synchronized.

ARM7 has only one input bus (ARM_DATA[31:0]) for instructions and data, but the MSP chip has a dedicated instruction cache (ARM_IC, 1KB) and data cache (ARM_DC, 1KB). The CCU can use ARM_NOPC to distinguish between these two kinds of requests.

To further improve performance, the CCU adds a micro instruction cache (UI_CACHE, 32B) and a micro data cache (UD_CACHE, 32B) located between the main cache and the ARM7 core. Each of these caches has eight words of contiguous code and data. These micro caches have their own tags (27 bits), tag comparators and valid bits. Valid bits are all cleared during the system reset period.

The ARM7 micro caches act as prefetch buffers rather than as true caches. During an ARM7 read, the address (ARM_A[31:0]) is always compared with the tag. On a hit, the instruction or data is read back via ARM_DATA[31:0]. On a miss, the micro cache sends a request to the CCU along with the address, data type and other control information. The arbiter logic in the CCU arbitrates among the read requests from all units. Currently, ARM7 has the highest priority among all blocks in obtaining a grant. The reason is that ARM7 rarely makes a request unless the ARM7 micro cache has a miss. However, the CCU may have internal hold cycles to service multi-cycle requests or address-queue-full conditions. During this time, no external requests are accepted.

A write from ARM7 always invalidates UD_CACHE when the address hits UD_TAG. No attempt has been made to design UD_CACHE as a write-through or write-back cache. Invalidating UD_CACHE on a write hit keeps the data in ARM_DC and UD_CACHE consistent.

The CCU controls arm_nwait while sending read or write requests to ARM_IC or ARM_DC. In general, the CCU does not hold arm_nwait during a write. Once the write request is granted without ccu_write_hold2 being asserted, ARM7 simply places the data on ARM_DATA[31:0] in the next cycle. The CCU has an internal write buffer to store the data, so ARM7 can continue its instruction stream. For a read, however, the CCU always holds arm_nwait for one cycle, even if the data is in the main cache. If the read request misses the main cache, arm_nwait is held for more cycles until data is returned from the external main memory. The ARM_CCU interface state machine illustrated in FIG. 19 describes the conditions under which the CCU controls arm_nwait.

In Figure 19:

START: Start state of the state machine. It is entered when there is no request, when read data has been returned, or when a write request is issued without a hold.

HOLD: The CCU grants an ARM7 read or write request, then cancels the grant with a hold signal.

TAG: The CCU checks the tag against the read address.

MISS: The read address misses, and the CCU sends a refill request to the external DRAM.

DATA: Read data is returned, and the CCU sends the returned data to the micro data cache.

2.4.3.3. FBUS interface

The CCU_FBUS interface state machine F_SM is as shown in FIG. 20. In FIG. 20:

IDLE: Idle state.

REQ: Sends a read or write request to the FBUS arbiter.

GRT1: Granted size is larger than 8B.

GRT2: Granted size is larger than 16B.

GRT3: Granted size is larger than 24B.

GRT4: Drives data for the last cycle.

The data reception state machine D_SM is shown in FIG. 21. In Figure 21:

IDLE: Idle state.

ONE: Receives the first 8B of data from Fdata[63:0].

TWO: Receives the second 8B of data from Fdata[63:0].

THREE: Receives the third 8B of data from Fdata[63:0].

FOUR: Receives the fourth 8B of data from Fdata[63:0].

REFILL: Refills the IDC before returning data to the requester.

RDY: Ready to return the data to the requester.

2.4.4 Read and Write Operations

The read and write state machines are as shown in FIG.

2.4.4.1 Read Behavior

The Instruction and Data Cache (IDC) in the MSP operates in three pipeline cycles: request cycle, tag cycle, and data cycle. In a cache hit situation, the IDC can deliver instructions or data in every cycle.

The Cache Control Unit (CCU) is responsible for arbitration between ARM7, the vector processor unit, FBUS and IOBUS for cache SRAM access. The CCU monitors the bus requests from these four masters and grants the bus to the winner with a specific ID number. The CCU also generates the cache address bus and read/write control signals to access the cache and compare tags.

If there is a cache hit, the bus master that wins the arbitration can access the cache for read/write operations. If there is a cache miss, the CCU issues a refill request and then serves other bus masters without waiting for the missing data to be returned from main memory. A bus master with a cache miss must therefore keep its ID number. Later, when the requested data is in the cache, the CCU sends a GRANT signal with the same ID number to the bus master that missed. This bus master accepts or ignores the data.

When a cache miss occurs, a line fetch is performed to receive data from main memory. The line size is defined as 64 bytes, so the CCU executes eight consecutive memory accesses (64 bits each time) to feed data from main memory to the cache.
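
As a quick check of the arithmetic above, the following Python sketch computes how many 64-bit memory accesses a fetch of a given size requires. The function name `fbus_accesses` is hypothetical and introduced only for this illustration.

```python
FBUS_WIDTH_BYTES = 8  # each memory access moves 64 bits, per the text


def fbus_accesses(size_bytes, width_bytes=FBUS_WIDTH_BYTES):
    """Number of accesses needed to move size_bytes, rounded up."""
    return -(-size_bytes // width_bytes)
```

With a 64-byte line this gives eight accesses, matching the figure stated in the text.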

Request cycle:

The CCU accepts read requests from several units (ARM, IFU, LSU, IO) in CLK1. The requester asserts the request signal lsu_req and the read/write signal lsu_rw at the beginning of CLK1. At the end of CLK1, the CCU grants one of these read requests by driving ccu_grant_id[9:0]. If ccu_grant_id[9:6] matches the requester's unit_id, the request is granted. The requester must latch ccu_grant_id[5:0], because it is the transaction_id associated with the request.

If the request is granted, the requester sends the remaining control information to the CCU at CLK2: the address (lsu_adr[31:0]), the cache-off indication (lsu_ccu_off) and the data type (lsu_vec_type[1:0], lsu_data_type[2:0]).

If ccu_rd_hold_2 is not asserted at the end of CLK2, the request has passed completely to the CCU and the requested data will be returned some time later. However, if ccu_rd_hold_2 is asserted, the request granted in CLK1 is canceled and the requester must continue to drive the address and control information. Since all previous grant_id information is still valid, the requester does not need to issue the same read request again in the next cycle. ccu_rd_hold_2 remains constant in CLK1 until released by the CCU in CLK2.

ccu_rd_hold_2 is a timing-critical signal used to inform the requester that the CCU is busy processing other things in the current cycle, so that a granted request has not yet been processed.
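
The grant check described in the request cycle can be sketched in Python as follows. The field split (bits 9 to 6 for the unit_id, bits 5 to 0 for the transaction_id) comes from the text; the function names and the example unit_id value are hypothetical and serve only as an illustration.

```python
def decode_grant_id(grant_id):
    """Split a 10-bit ccu_grant_id into (unit_id, transaction_id)."""
    unit_id = (grant_id >> 6) & 0xF      # ccu_grant_id[9:6]
    transaction_id = grant_id & 0x3F     # ccu_grant_id[5:0]
    return unit_id, transaction_id


def latch_transaction_id(grant_id, my_unit_id):
    """Return the transaction_id to latch if this unit was granted, else None."""
    unit_id, transaction_id = decode_grant_id(grant_id)
    return transaction_id if unit_id == my_unit_id else None
```

A requester would keep the returned transaction_id and later compare it against the ID driven with the returned data.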

Tag cycle:

If the request is approved and not later canceled in the request cycle, the request enters the tag comparison phase of cache access. The CCU uses an address lsu_adr [11: 5] and a bank select signal (requester) to select a line for tag reading. The tag hit signal ccu_lsu_hit_2 is known at the end of CLK2. The data is returned in the next cycle for a hit situation. The read port tag is output and latched by CLK.

In addition, the address queue status is evaluated in this cycle. A tag miss together with the 'almost_full_address_queue' condition asserts the 'ccu_rd_hold_2' signal. In that case the CCU state machine does not process any new read requests, but retries the stalled tag comparison.

Since each 64B cache line contains two vectors, the valid bit of the accessed vector must be set to obtain a tag hit. For a double-vector (64B) data read, both valid bits must be set to obtain a tag hit. A cc_off operation always causes a tag miss, and the request is posted to the address queue.

Data cycle:

This is the cycle in which the CCU returns data to the requester. Data is driven on ccu_dout[127:0], with the lower 16B driven in CLK1 and the upper 16B driven in CLK2. For a 64B data request, one additional cycle is used to complete the transfer.

The CCU always drives ccu_data_id[9:0] in the initial half cycle of CLK2 to inform the requester that data will be returned in the next CLK1. The requester always compares ccu_data_id[9:0] to identify its return data. In addition, a tag hit is used as an indicator of returned data.

If there is a tag miss in the tag cycle and the address queue is not full, the CCU starts a cache line fetch by posting the missing address, ID information and other control information to the four-entry address queue in CLK1. Currently, each address queue entry contains approximately 69 bits of information. The memory address latch is loaded at CLK2, and an FBUS request is generated at the next CLK1.

2.4.4.2 Write operation

The write operation in IDC operates in three pipeline cycles: request cycle, tag cycle, and data write cycle. In a write address hit situation, the IDC may write data to the cache data array in every cycle.

Request cycle:

The CCU accepts write requests from several units (ARM, LSU, IO) in CLK1. The requester asserts the request signal lsu_req, the read/write signal lsu_rw and the vector type lsu_vec_type[1:0] at the beginning of CLK1. At the end of CLK1, the CCU grants one of these write requests. Write grants to the different units are signaled by asserting a grant signal, such as ccu_lsu_wr_grant, directly to the requesting unit. Since no data is returned, the requesting unit does not need to receive a transaction_id from the CCU. In CLK2, the requester must supply the address lsu_adr[31:0], the cc_off signal lsu_ccu_off and the data type lsu_data_type[2:0].

As in the read case, the CCU asserts ccu_wr_hold_2 at the end of CLK2 to inform the requester that the request has been granted but has not been processed in the current cycle. The requester continues to drive the address, cc_off signal, and data type information until ccu_wr_hold_2 is released. Then, in the next cycle, the requester supplies the write data to ccu_dout[127:0].

Tag cycle:

If the request is approved and not later canceled in the request cycle, the request enters the tag comparison phase of cache access. This cycle compares write port address tags. The CCU uses an address lsu_adr [11: 5] and a bank select signal (requester) to select a cache line. The tag hit signal ccu_lsu_hit_2 is known at the end of CLK2. cc_off writes always cause tag misses, and write data is loaded on the FBUS for external writes.

The requester starts driving data on ccu_din[143:0], with the lower 16B in CLK1 and the upper 16B in CLK2. For a 64B data transfer, the requester takes one additional cycle to drive the data. The CCU has an internal write data latch to hold this data. Whether the write hits the cache (one or two cycles are used to write the actual data to the cache) or misses the cache (very few cycles are used to write the data), the requester can consider the write complete.

Data write cycle:

This is the cycle in which the CCU writes the actual data to the cache in a cache hit situation. If there is a tag miss in the tag cycle, the CCU handles it differently depending on the data type.

If the data type is 32B and the line is clean (both vectors are clean), the CCU simply overwrites the current line with the new tag and the new data. It also marks the vector being accessed as valid and dirty, while leaving the other vector in the same line invalid.

If the data type is smaller than 32B, the write is a partial-data write. The partial data is stored in a temporary register. The CCU fetches the missing half line (32B) from memory and loads it into the cache. The partial data is then written to the cache line with the appropriate byte enable signals.

For every write miss on a dirty cache line, the CCU first writes back the dirty line. Since the dirty data is not yet available, the CCU asserts a hold to the grant logic so that no new read or write requests are accepted. An internal read is then started using the dirty line address to fetch the dirty cache line data. Finally, the write-back address and data are supplied to the memory.

2.4.5 Programming Model

The cache subsystem is all hardware controlled using load and store instructions, eliminating the need for software-visible registers.

2.4.5 The IDC and ROM address formats are as shown in FIG.

Chapter 3 IOBUS Description

This chapter describes the specification of IOBUS as presented by the hardware designer.

3.1 Overview

IOBUS is designed for the low-speed standard peripherals used in the system. The bus serves as the main interface between the MSP Cache Control Unit (CCU), the Bitstream Processor (BSP), the timer/interrupt controller and all other IO peripherals such as UARTs. The format of the bus is very similar to Intel's IO bus. The bus arbiter control logic always monitors the bus for requests and uses a round-robin scheme to generate the appropriate grant. A potential bus master always asserts its bus request and waits for the bus grant to be asserted before occupying the bus. The bus master always drives the address and control lines for the duration of the protocol.
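
The round-robin scheme mentioned above can be sketched in Python as follows. This is a minimal behavioral model, not the arbiter logic of the actual design; the class name and the rule that the search starts just after the most recently granted requester are assumptions made for illustration.

```python
class RoundRobinArbiter:
    """Grants one requester per call, rotating priority after each grant."""

    def __init__(self, requesters):
        self.requesters = list(requesters)
        # Start so that the first search begins at index 0.
        self.last = len(self.requesters) - 1

    def grant(self, requesting):
        """Return the next requester in rotation that is requesting, or None."""
        n = len(self.requesters)
        for step in range(1, n + 1):
            i = (self.last + step) % n
            if self.requesters[i] in requesting:
                self.last = i
                return self.requesters[i]
        return None
```

With two units requesting continuously, the grants alternate between them, so neither unit can starve the other.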

IOBUS is a synchronous bus that operates at 40MHz overall. All acknowledgments on MSP IOBUS occur in the first cycle after the request is sampled as active. The bus can handle up to 16 byte transfers for four cycles (four bursts). This is accomplished by using two size bits that inform the bus arbiter of the transfer size requested by the bus master.

IOBUS has a multiplexed 32-bit address and data bus. The address always appears before the data. The IOB_ALE (Address Latch Enable) signal is used by the receiving device to latch the address. Even if an 8-bit device is connected to the bus, all bus accesses assume 32-bit transfers. By convention, 8-bit devices use the lower 8 bits [7:0] of the bus, and 16-bit devices use the lower 16 bits [15:0] of the bus. If a 16-bit device wants to communicate with an 8-bit device, it must put the correct data on the lower 8 bits of the bus so that the 8-bit device can find and latch the data. If there are multiple requests in the same period, requesters that have not been granted should always hold their requests until granted by the IOBUS arbiter. A granted request may use several bus access cycles, up to 4 * 32-bit transfers (16 bytes). Block transfers are always divided into several 32-bit transfers.

All bus grants are generated by the IOBUS arbiter. In addition, there is parallel decoding logic that always monitors the address (if valid) and generates the appropriate chip select (for the next clock cycle) for the destination. For all read and write requests, the chip select is valid for only one cycle, the cycle after the address is asserted. Each IOBUS node has a dedicated chip select as input. See the pin descriptions and timing diagrams.

The 2-bit size information is generated by the master that has been granted the bus by the arbiter, and is then valid for two bus cycles. When CS is asserted to mark the bus transfer cycle, the selected slave must latch the size information. In addition, for a read or write, the IOBUS arbiter keeps track of the transfer size to determine when the bus cycle is over before it starts looking for a new request. There is no difference between the data during burst-to-bus transmissions (read or write).

In a data read transfer, the responding device tells the requester when the data is valid, and a READY signal is used to trigger latching of this data. This READY signal is generated by the bus master and slave.

To satisfy this protocol, all IOBUS nodes need to design an IOBUS interface before processing the request. This interface must meet the following specifications:

3.2 Pin Description

Hereinafter, the address, data and control signal definitions for the system IOBUS on the bus master side will be described. See FIG. 24 showing the IOBUS structure definition. As mentioned above, IOBUS is a multiplexed address / data bus.

xxx is a three character code representing the requester name (ccu, bsp, urt, tmr, int).

* System IOBUS Signal Definitions

3.3 Logic Definition

The IOBUS arbitration control unit is as shown in FIG.

3.4 IOBUS Timing

IOBUS read timing (transfer size = 1 word (4 bytes)) is shown in FIG. 26, IOBUS write timing (transfer size = 1 word (4 bytes)) is shown in FIG. 27, IOBUS read timing (transfer size = 4 words (16 bytes)) is shown in FIG. 28, and IOBUS write timing (transfer size = 4 words (16 bytes)) is shown in FIG.

Chapter 4 FBUS Description

This chapter describes the specification of FBUS from the hardware designer's point of view.

4.1 Overview

The memory controller, PCI, custom ASIC logic, and the cache subsystem interface with the system bus FBUS through non-multiplexed address and data bus lines.

One central FBUS arbitration control logic monitors the requests and issues grants using a priority scheme. The bus master (address and data source) always asserts its bus request and waits for the grant. In steady state, the grant occurs in the same cycle as the request, provided the bus is not being used by another master/slave (all grants are generated combinationally). Once the master receives the bus grant, it drives the address/data/control lines in the next cycle. The data ready signal always precedes the actual data, informing the receiver to latch the data in the next cycle.

In order to make the most of the bus bandwidth, four consecutive requests can be received/transmitted in a pipelined, back-to-back manner, which requires a request FIFO that holds four requests. The memory controller has a four-deep request FIFO and a two-deep data FIFO. Because of this protocol characteristic, the AF_FULL and DF_FULL signals are required. These indicate address FIFO full and data FIFO full, respectively. FBUS supports 8, 16 and 32 byte data transfers using a grant counter and the request size bus.

Each FBUS unit has control logic to request the bus. This logic varies from unit to unit depending on the application (memory/PCI/cache, etc.). However, the actual bus arbitration unit is the same for each unit and is replicated in all submodules. This unit acts as an intermediary between the external bus master/slave and the internal unit logic. For example, in the memory controller, once CAS is active, the memory controller asserts an internal request to the FBUS arbitration logic via internal signals indicating that FBUS needs to be used. In response, the FBUS controller asserts the request to the system outside the memory controller and waits for the grant. Once the grant is received, the address/data/control is sent from the first entries of the response and data FIFOs in the memory controller.

The system request size for the memory controller can range from 1 byte up to 32 bytes. For request sizes of 32 bytes or more, the source/requester uses the FBUS size bits to initiate several requests. This is due to a limitation of the SDRAM memory bus (1 or 2 SAMSUNG SDRAM 1M * 16).

The SDRAM is programmed for a wrap length of eight to deliver the full 32 bytes required by the rest of the system. For requests of 32 bytes or less, all 32 bytes are fetched from the SDRAM, but only the desired number of bytes are sent to the destination.

In addition, the ten-bit requester ID bus is valid together with the chip select signals (in the same cycle as the address/data).

Every FBUS node generates a 3-bit destination ID into the FBUS arbiter. These three bits are validated with the request and indicate the destination of the request. The destination ID bits [1: 0] are decoded from the requester ID input as follows.

Requester ID [9:6]    Source      Destination ID [1:0]
0000                  Reserved    N/A
0001                  ARM7        N/A
0010                  IFU         N/A
0011                  LSU         N/A
0100                  CCU         00
0101                  ASIC        11
0110                  MEM         01
0111                  PCI         10
1XXX                  Reserved

The destination ID bit [2] is used to indicate read / write request status. This helps FBUS to distinguish between address requests (reads) and address / data requests (writes).
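
The decoding of destination ID bits [1:0] from the requester ID can be sketched in Python as a simple lookup. The sketch reads the requester ID column of the table above as 4-bit binary values, which is an interpretation of the source; the function name `decode_requester_id` is hypothetical. Bit [2] is left out because the text does not state its polarity.

```python
# Requester ID [9:6] -> (source name, destination ID [1:0] or None for N/A)
REQUESTER_TABLE = {
    0b0001: ("ARM7", None),
    0b0010: ("IFU", None),
    0b0011: ("LSU", None),
    0b0100: ("CCU", 0b00),
    0b0101: ("ASIC", 0b11),
    0b0110: ("MEM", 0b01),
    0b0111: ("PCI", 0b10),
}


def decode_requester_id(requester_id):
    """Map a 10-bit requester ID to (source, destination ID [1:0])."""
    unit = (requester_id >> 6) & 0xF  # requester ID [9:6]
    return REQUESTER_TABLE.get(unit, ("Reserved", None))
```

The lower six bits of the requester ID do not affect the destination, so any transaction from the same source decodes to the same destination ID.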

In steady state, the grant counter bits grCNT[1:0] indicate the number of FBUS cycles for which the requester needs the bus. For back-to-back requests, the request informs the bus master of the length of the request. The FBUS master controller asserts the grant according to the two grant counter bits.

FBUS is a split-transaction bus that supports posted reads. The requester requests the bus and, once granted, drives the address and terminates the transaction. Some time later, the slave/data source requests the bus using the destination ID and returns the data together with the ID of the original request. This feature greatly improves bus bandwidth and allows other masters to use FBUS sooner.

See the timing chart for more details.

4.2 Pin Description

Hereinafter, the address, data, and control signal of the system FBUS will be described.

As mentioned above, FBUS is a non-multiplexed address / data bus.

xxx is a three character code representing the requester name (mem, pci, asc, ccu).

Table 16

System FBUS Signal Definitions

FIG. 30 shows the memory read request FBUS flow, FIG. 31 shows the memory write request FBUS flow, FIG. 32 shows the master/slave non-memory request FBUS flow, and FIG. 33 shows the central FBUS arbitration control unit.

FIGS. 34 through 36 are FBUS timing diagrams. FIG. 34 shows the memory request FBUS timing (an 8 byte data transfer is shown; multiple data cycles are used for 16/32/64/128 bytes), FIG. 35 shows the memory read request FBUS timing (transfer size = 8 bytes), and FIG. 36 shows the memory back-to-back write request (transfer size = 32 bytes).

Chapter 5 PCI Bus

This chapter describes the specification of PCI core and PCI glue logic that interfaces with internal FBUS.

5.1 Overview

The MSP_1E PCI controller is designed to meet PCI bus specification revision 2.1. See this standard specification for more details.

The PCI unit contains two main sections: the PCI core and the FBUS 'glue' logic.

The PCI core interfaces with external PCI devices, operating primarily at a PCI bus speed of 33MHz. The FBUS 'glue' logic interfaces with the Samsung FBUS operating at 80MHz and sits between the PCI core and FBUS. Rate synchronization is achieved using FIFOs at the two ends of the subblock.

Samsung's PCI Core also includes virtual frame buffer logic and all the VFB registers needed to interface with ARM7 via FBUS.

A unique feature of this PCI unit is its interrupt handling, both from the host CPU to the MSP chip and from the MSP chip to the host CPU. This is described in more detail below.

5.1.1 Samsung PCI core block diagram is as shown in FIG.

5.2 PCI FBUS Interface Logic (see Figure 38)

This subblock interfaces with the PCI core of the SAND micros and the MSP internal FBUS. Address and data are stored in FIFOs at the two ends. This subblock also serves to synchronize the PCI signals with the FBUS clock.

The PCI core logic can act as both an FBUS master and an FBUS slave. Most accesses are directed to local SDRAM memory via the 64-bit FBUS. See the FBUS chapter for a description of the FBUS protocol.

PCI FBUS control logic also includes virtual frame buffer registers and controls. This register is programmed by ARM via FBUS. See also block.

5.3 PCI VFB Logic

FIG. 39 is a VFB block diagram and FIG. 40 shows the VFB registers.

5.4 PCI Core Logic

MSP PCI Core fully satisfies the PCI 2.1 specification. Added is the number of registers added for interrupts and software MSP reset.

Software running on the ARM7 can interrupt the host CPU by setting the PCI host interrupt request from the MSP (bit 3) in the MSP control register. This causes the PCI core logic to interrupt the host CPU by asserting the interrupt pin on the PCI bus (INTA#). The host CPU then acknowledges the interrupt via the PCI host interrupt acknowledgment (bit 4) in the MSP control register, which causes the interrupt line to go inactive.

The MSP PCI core can also accept interrupts from the host CPU, which are in effect interrupts to the ARM7. Since the PCI specification does not provide any interrupt input pins, the MSP interrupt request from the host (bit 2) in the MSP control register is used to provide this functionality. The host CPU sets this bit to signal an interrupt to the ARM7. Once the host interrupt has been acknowledged, the ARM7 clears this bit. See the block diagram in FIG. 41.
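The two interrupt directions described above can be sketched as a small software model. This is only an illustration: the bit positions (2, 3 and 4) come from the text, while the class name, the method names, and the assumption that the host acknowledgment also clears the pending request are hypothetical.

```python
# Bit positions come from the text; all names below are hypothetical.
MSP_IRQ_FROM_HOST = 1 << 2   # bit 2: MSP interrupt request from the host
HOST_IRQ_FROM_MSP = 1 << 3   # bit 3: PCI host interrupt request from the MSP
HOST_IRQ_ACK = 1 << 4        # bit 4: PCI host interrupt acknowledgment


class MspControlRegister:
    """Toy model of the interrupt bits in the MSP control register."""

    def __init__(self):
        self.value = 0

    def arm7_request_host_interrupt(self):
        # ARM7 software sets bit 3; the PCI core then asserts INTA#.
        self.value &= ~HOST_IRQ_ACK
        self.value |= HOST_IRQ_FROM_MSP

    def host_acknowledge(self):
        # The host sets bit 4. Assumption: this also clears the pending
        # request, which makes the interrupt line go inactive.
        self.value |= HOST_IRQ_ACK
        self.value &= ~HOST_IRQ_FROM_MSP

    def inta_asserted(self):
        return bool(self.value & HOST_IRQ_FROM_MSP)

    def host_request_msp_interrupt(self):
        # The host sets bit 2 to signal an interrupt to the ARM7.
        self.value |= MSP_IRQ_FROM_HOST

    def arm7_interrupt_pending(self):
        return bool(self.value & MSP_IRQ_FROM_HOST)

    def arm7_clear_host_request(self):
        # The ARM7 clears bit 2 once the interrupt has been handled.
        self.value &= ~MSP_IRQ_FROM_HOST
```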

For FIG. 41, three registers are required that are mapped to the MSP region rather than the PCI space.

Refer to the PCI 2.1 specification for more details on the actual PCI cores.

Chapter 6 Memory Controller

6.1

This chapter describes the specifications of the memory controller from the point of view of hardware and software designers.

6.2 Overview

The MSP memory controller has several features and offers a level of programmability for trading off cost against performance. The memory controller interfaces between the main system bus (FBUS) and the DRAM chips, both operating at 80 MHz. To achieve the 80 MHz clock frequency, synchronous DRAM is used in the early design phase.

Ultimately, the memory subsystem supports standard fast page mode DRAM, extended data output (EDO) DRAM, and synchronous DRAM. The memory is limited to two external banks, which can be interleaved.

The initial synchronous DRAM memory controller has the minimum features needed to operate the DRAM. The basic first-pass memory controller features are as follows.

* Samsung synchronous DRAM support

* One memory bank using two SDRAM chips (1M * 16)

* CAS-before-RAS (CBR) refresh support

* Partial write support that initiates read-modify-write operations

* Internal bank interleave support (ping-pong via MA[11])

* 80 MHz memory and processor bus (1:1) frequency matching

* Programmable refresh rate

* Address and data queuing for efficient use of the system bus

* Manual two-bank precharge support

The MSP memory controller has two main subcomponents: a data controller and an address controller. The data controller has read and write data queues for storing data read from the DRAM and for writing data from the processor bus. The data controller also includes RMW logic for writing bytes. All control over the data controller comes from the address controller.

The address controller has a request queue, response ID queue, memory access decoding logic, lazy comparator logic, RAS / CAS state machine, refresh state machine, and all necessary control signals used by the data controller.

The SDRAM memory clock is the same as the system clock. The SDRAM receives each set of control signals.

6.2.1 The memory controller block diagram is as shown in FIG.

6.2.2 The memory controller flow is as shown in FIG.

6.3 Address Controller (AC)

In the memory controller, the address controller section not only manages the data controller but also generates all DRAM control. This section of the MSP memory controller also handles the address and control path of the FBUS interface. The following block diagram shows several sub-sections of the address controller unit.

6.3.1 An address controller block diagram is shown in FIG. 44.

6.3.2 Memory Controller Request FIFO

The MSP memory controller has a four-deep request FIFO that stores FBUS addresses and control information for dispatch to the actual memory controller state machine. Each entry in the request FIFO has a valid bit indicating that the particular entry is valid. The memory controller state machine always services the lowest entry in the FIFO, which is ENTRY_0. Once a request has been serviced and the column address strobe (CAS) is active, the memory controller asserts a clear signal to clear this entry. Depending on the FIFO FULL / EMPTY status, a barrel shift is initiated to shift the valid contents toward entry zero.
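The behavior of the request FIFO can be sketched in software as follows. This is a minimal model under stated assumptions: the class and method names are hypothetical, a `None` slot stands for a cleared valid bit, and the barrel shift is modeled as a simple one-position shift toward entry zero.

```python
class RequestFifo:
    """Toy model of the four-deep request FIFO; None means 'valid bit clear'."""

    DEPTH = 4

    def __init__(self):
        self.entries = [None] * self.DEPTH

    def is_full(self):
        return all(entry is not None for entry in self.entries)

    def is_empty(self):
        # Valid entries are always packed toward entry zero.
        return self.entries[0] is None

    def push(self, request):
        """Store a request in the lowest free entry; return False if full."""
        for index, entry in enumerate(self.entries):
            if entry is None:
                self.entries[index] = request
                return True
        return False

    def service(self):
        """Service ENTRY_0, clear it, and shift the rest toward entry zero."""
        request = self.entries[0]
        if request is None:
            return None
        self.entries = self.entries[1:] + [None]
        return request
```

The state machine in this model only ever looks at `entries[0]`, which mirrors the statement that the lowest entry is always the one serviced.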

The MSP memory controller request FIFO format is as shown in FIG.

6.3.3 Memory Controller Address Decode / Map

The address decoding logic mainly serves to generate the 11-bit SDRAM row address MA[10:0] and the 8-bit column address MA[7:0]. These address lines are driven directly to the SDRAM address inputs [11:0]. Memory address bit MA[11] is used to toggle between the internal SDRAM banks, which improves memory bus utilization and performance.

This memory address is generated using a programmable multiplexer controlled by a register that indicates:

* Current system cache line size

* Number of internal banks

* Internal bank interleaving

The system cache line offset is 5 bits for a 32-byte cache line. FIG. 46 shows the proposed memory address format derived from the FBUS system address for 16 MB of DRAM.

This multiplexed memory address is valid for one cycle, together with the RAS and CAS strobes asserted by the memory controller state machine.

The MCU may perform 8-byte writes that invoke read-modify-write operations. However, bit [2] of the FBUS address is always zero, so only even starting addresses occur. This bit is mapped to bit [0] of the SDRAM address, which is one of the three bits representing the starting address, as follows.

Faddr[4:2]   Write sequence (WRAP = 8)

000          0-1-2-3-4-5-6-7

010          2-3-4-5-6-7-0-1

100          4-5-6-7-0-1-2-3

110          6-7-0-1-2-3-4-5

These are all of the even starting addresses and sequences supported by the MCU.

All read operations assume 32 bytes, and the starting address is (000) = ma[2:0] = Faddr[4:2].
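The wrap sequences listed above follow a simple modulo rule, which can be sketched as below. The function names are hypothetical; the only facts taken from the text are the wrap length of 8 and the restriction to even starting addresses.

```python
def wrap_sequence(start, wrap=8):
    """Return the burst order for a wrapped access starting at 'start'."""
    return [(start + offset) % wrap for offset in range(wrap)]


def supported_write_sequences(wrap=8):
    """Map each even starting address Faddr[4:2] to its write sequence.

    Bit [2] of the FBUS address is always zero, so only even starting
    addresses are generated.
    """
    return {start: wrap_sequence(start, wrap) for start in range(0, wrap, 2)}
```

For example, `wrap_sequence(0b100)` walks 4-5-6-7 and then wraps around to 0-1-2-3, matching the third row of the list.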

6.3.4 Memory Controller State Machine

The MSP memory controller has one master controller state machine. This state machine is responsible for generating all timing (RAS / CAS / WE / CS / DQM) for the SDRAM control signals. The state machine continuously monitors the request FIFO for a valid entry in entry zero. Once the valid bit is detected, the state machine starts the SDRAM access sequence. In addition, the page_hit signal from the page comparator is monitored to determine whether a RAS precharge is required.

RAS precharge is performed on the currently active / open bank. The manual precharge sequence involves driving CS, RAS, WE and MA[10] to the active low state. The internal bank select bit MA[11] is used to select the bank to precharge. For reads, the precharge command is issued after the data has been received from the SDRAM, to avoid data conflicts. For writes, the precharge is issued after the last data has been written to the memory. Once the precharge command is complete, the particular bank is idle and ready for the next memory operation. According to the SDRAM specification, the precharge command can be issued at any time after tRAS (min) (here 60 ns) is satisfied. However, because of the wrap length, which is currently 4, the memory controller state machine issues the precharge command after the data has been read from or written to the memory.

The following shows the SDRAM parameters used with the MSP memory controller.

Table 17

SDRAM Parameters

tRAS can be set to 5 cycles (5 * 12.5 ns = 62.5 ns) to satisfy the 60 ns tRAS (min) of the synchronous DRAM. See the memory controller timing chart.

6.3.4.1 State machine diagram

FIG. 47 shows the SDRAM memory controller RAS / CAS state machine diagram.

6.4 Refresh Memory Controller

The synchronous DRAM must be refreshed every 32 ms (one row every 15.6 us) to retain the data in each storage cell. Synchronous DRAM supports two refresh modes: automatic refresh and self refresh.

6.4.1 SDRAM Auto Refresh

Using standard automatic refresh, the two internal banks are refreshed alternately by an internal counter. Since the number of rows is 4096, automatic refresh requires 2048 automatic refresh cycles to refresh the entire DRAM.

The automatic refresh command is issued by driving CKE and WE high and CS, RAS and CAS low. This command is issued only when both banks are in the idle state.

The time required to end automatic refresh is

tRC (min) / cycle time = 100 ns (spec) / 12.5 ns = 8 cycles (at 80 MHz)
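The conversion from a datasheet time to a number of clock cycles can be sketched as follows. The function name is hypothetical, and integer picosecond arithmetic is used only to avoid floating-point rounding; the 80 MHz clock, the 100 ns tRC (min) and the 60 ns tRAS (min) come from the text.

```python
def cycles_for(time_ns, clock_hz=80_000_000):
    """Smallest whole number of clock cycles that covers 'time_ns'."""
    period_ps = 10**12 // clock_hz       # 12,500 ps at 80 MHz
    time_ps = time_ns * 1000
    return -(-time_ps // period_ps)      # ceiling division on integers


TRC_CYCLES = cycles_for(100)   # automatic refresh time, tRC (min)
TRAS_CYCLES = cycles_for(60)   # tRAS (min), rounded up to whole cycles
```

Rounding up matters for tRAS: 60 ns is 4.8 clock periods, so 5 cycles are needed to meet the minimum.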

6.4.2 SDRAM Self Refresh

Self refresh is another mode used in Samsung's SDRAM. This is generally the preferred refresh mode for data retention and low power operation. Here, SDRAM disables all input buffers except the internal clock and CKE.

If CS, RAS, CAS and CKE are low and WE is high, the self refresh mode is entered. Self refresh mode is not used by the MSP memory controller because it requires stopping the SDRAM clock and restarting it with the CKE signal.

6.4.3 Manual Refresh

This refresh mode requires a state machine / counter design. The counter times out every 15.6 us and asserts the refresh strobe to the memory controller logic.

The memory controller then terminates the current refresh and immediately initiates the SDRAM refresh cycle. This cycle is exactly like the automatic refresh cycle unless it has a limit in the idle state.

6.5 Data Controller

The data controller section of the memory controller serves primarily as a data queue for data written from the processor and data read from the SDRAM. This controller also has write merge logic for every partial write (byte write). A partial write first starts a DRAM read, then merges the data, and finally writes the fully modified word back into memory. Therefore, any refresh following the partial write sequence must take a performance hit.
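The merge step of a partial write can be illustrated with a short sketch. It assumes a 64-bit data word and one byte-enable bit per byte lane, with bit 0 of the enable mask covering the least significant byte; the function name and the byte-enable encoding are assumptions, not taken from the text.

```python
WORD_MASK = 0xFFFF_FFFF_FFFF_FFFF  # 64-bit data word


def merge_partial_write(old_word, new_word, byte_enables):
    """Merge the enabled bytes of 'new_word' into 'old_word'.

    'old_word' is the value read back from DRAM (the read step),
    and the return value is what gets written back (the write step).
    """
    lane_mask = 0
    for lane in range(8):
        if byte_enables & (1 << lane):
            lane_mask |= 0xFF << (8 * lane)
    return (old_word & ~lane_mask & WORD_MASK) | (new_word & lane_mask)
```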

6.5.1 The data controller block diagram is as shown in FIG.

6.6 Pin Signature

This controller provides the following package pins:

* RAS_I: Output pin (active low). Row address strobe for latching the row address from MA[11:0] into the internal row address buffer of the selected DRAM bank.

* CAS_I: Output pin (active low). Column address strobe for latching the column address from MA[11:0] into the internal column address buffer of the selected DRAM bank.

* WE_I: Output pin (active low for writes). Drives the write enable input pin of the DRAM.

* MA[11:0]: Output pins. Multiplexed row and column address signals for the DRAM.

* DQM: Output pin. Places the SDRAM outputs in the high-impedance state after the clock and masks the output (this pin is used only for synchronous DRAM interfaces).

* CS_I: Output pin (active low). Enables or disables the selected SDRAM operation (this pin is used only for synchronous DRAM interfaces).

* CLK: Output pin. Clock output for synchronous DRAM; it is used only with SDRAM and has the same phase as the MSP system clock.

6.7 The memory controller timing diagrams are as shown in FIGS. 49 to 51. The assumptions for FIG. 49 are as follows.

* Samsung SDRAM is assumed.

* Memory and system operate at 80 MHz.

* One or two external SDRAMs (1M * 16).

* Programmable wrap length of 4 or 8 for fetching lines from memory.

* tRCD = 3.

* tCAS = 3.

* Internal delay = 2 clocks.

* Memory latency = 8 cycles (8 * 12.5 ns = 100 ns).

* System data from memory is delayed by two cycles for arbitration (read data).

6.8 Programmable Model

On the programmer's side, the control registers associated with the memory controller are:

6.8.1 SDRAM Reset Register (R / W)

This register is set after each system reset. It is a 1-bit register that carries the reset_sdram signal, which starts the SDRAM power-on sequence. On system reset this register is set to one. It must be cleared by software to operate the SDRAM.

Bit 0 is set on system reset and cleared to operate the SDRAM.

Programming address:

Faddr [31:20] = 12'h010

Faddr [3: 0] = 4'b1011

6.8.2 SDRAM Burst Type Register (R / W)

This register programs the SDRAM burst type. This is a 1-bit register that is programmed to zero for sequential burst types.

Programming address:

Faddr [31:20] = 12'h010

Faddr [3: 0] = 4'b1010

bit 0 is set to system reset and cleared to operate the SDRAM.

6.8.3 SDRAM Refresh Register (R / W)

This register programs the SDRAM refresh value. This is a 12-bit register programmed via FBUS.

Programming address:

Faddr [31:20] = 12'h010

Faddr [3: 0] = 4'b1001

Bits 11-0 are set on system reset and programmed with the refresh value 4E0 (hex).
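The value 4E0 is consistent with counting 80 MHz clock cycles over the 15.6 us refresh interval mentioned earlier. The sketch below shows that arithmetic; the function name is hypothetical, and the interpretation of the register as a cycle count is an inference from the numbers rather than something the text states.

```python
def refresh_count(interval_ns=15_600, clock_hz=80_000_000):
    """Clock cycles between refresh strobes, using integer arithmetic."""
    return interval_ns * clock_hz // 1_000_000_000


REFRESH_VALUE = refresh_count()          # 1248 cycles at 80 MHz
FITS_IN_12_BITS = REFRESH_VALUE < (1 << 12)
```

With the defaults this gives 1248, which is 4E0 in hexadecimal and fits in the 12-bit register.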

6.8.4 SDRAM RAS Precharge (tRP) Register (R / W)

This register programs the SDRAM RAS precharge value. This is a 3-bit register programmed via FBUS.

Programming address:

Faddr [31:20] = 12'h010

Faddr [3: 0] = 4'b1000

bit 2-0 is set to system reset and programmed as 1 or 2 or 3.

6.8.5 SDRAM CAS Latency (tCAC) Register (R / W)

This register programs the SDRAM CAS latency. This is a 3-bit register programmed via FBUS.

Programming address:

Faddr [31:20] = 12'h010

Faddr [3: 0] = 4'b0011

bit 2-0 is set to system reset and programmed as 1 or 2 or 3.

6.8.6 SDRAM RAS-to-CAS Delay (tRCD) Register (R / W)

This register programs the SDRAM RAS-to-CAS delay (tRCD). This is a 3-bit register programmed via FBUS.

Programming address:

Faddr [31:20] = 12'h010

Faddr [3: 0] = 4'b0010

bit 2-0 is set to system reset and programmed as 1 or 2 or 3.

6.8.7 SDRAM WRAP LENGTH Register (R / W)

This register programs the wrap length of the SDRAM for data. This is a 3-bit register programmed via FBUS.

Programming address:

Faddr [31:20] = 12'h010

Faddr [3: 0] = 4'b0001

bit 2-0 is set to system reset and programmed as 1 or 2 or 4 or 8.

6.8.8 SDRAM NOP TIME Register (R / W)

This register programs the NOP time of the SDRAM for the power-on sequence. This is a 16-bit register programmed via FBUS.

Programming address:

Faddr [31:20] = 12'h010

Faddr [3: 0] = 4'b0000

bit 15-0 is set to system reset and programmed to 200us depending on the clock frequency.
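The programming addresses in sections 6.8.1 to 6.8.8 share the same upper field and differ only in Faddr[3:0]. The sketch below collects them in one table. The register names and helper functions are hypothetical, and Faddr[19:4] is assumed to be zero because the text does not specify those bits. The NOP time helper likewise assumes that the register counts clock cycles.

```python
MCU_REGISTER_SELECT = 0x010  # Faddr[31:20] = 12'h010 for every register

# Faddr[3:0] values from sections 6.8.1 to 6.8.8 (names are hypothetical).
REGISTER_OFFSETS = {
    "SDRAM_RESET": 0b1011,
    "BURST_TYPE": 0b1010,
    "REFRESH": 0b1001,
    "TRP": 0b1000,
    "TCAC": 0b0011,
    "TRCD": 0b0010,
    "WRAP_LENGTH": 0b0001,
    "NOP_TIME": 0b0000,
}


def register_address(name):
    """Build the FBUS address, assuming Faddr[19:4] = 0."""
    return (MCU_REGISTER_SELECT << 20) | REGISTER_OFFSETS[name]


def decode_register(faddr):
    """Return the register name for an FBUS address, or None."""
    if (faddr >> 20) & 0xFFF != MCU_REGISTER_SELECT:
        return None
    for name, offset in REGISTER_OFFSETS.items():
        if faddr & 0xF == offset:
            return name
    return None


def nop_time_count(time_us=200, clock_hz=80_000_000):
    """Cycle count for the power-on NOP time (assumed to be in cycles)."""
    return time_us * clock_hz // 1_000_000
```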

Chapter 7 ASIC Interface

This chapter describes the specifications of the ASIC interface unit.

7.1 Overview

The ASIC interface unit (see FIG. 52) has one programmable 32-bit DMA controller, several FIFOs and control blocks. The ASIC interface block interfaces between the main system bus (FBUS), operating at 80 MHz, and the CODEC interface blocks that connect the MSP to the AD1843 (audio and telephone), the KS0122 (video capture), the KS0119 and VGA. The current assumption is that all CODEC interfaces and the DMA controller operate at full FBUS speed to avoid any synchronization problem.

The customer ASIC block has three main sections: the FBUS master / slave interface, the MSP 8-channel DMA controller and the actual CODEC. Data is passed from FBUS to CODEC or from CODEC to FBUS. However, the address is only generated from the DMA controller. This address may then be an FBUS mapped to the FBUS interface unit. All writes from other FBUS nodes only program the registers in the CODEC section. All other traffic must read an acknowledgment with size and ID information. See the FBUS specification.

The following are the features of the ASIC interface unit.

* Supports 32-bit native DMA (8 channels, one for each CODEC).

* Two 4-deep, 64-bit data FIFOs.

* One 1-deep, 52-bit request FIFO.

* One 2-deep, 52-bit reply FIFO.

* Master / slave support for the FBUS and CODEC interface blocks.

* Operating frequency: up to 80 MHz.

* Support for IO-to-memory and memory-to-IO access.

* Highest priority support for channel 0, which is used for the KS0119.

* Special address bus support for high performance with the KS0119.

This customer interface logic supports three different CODECs.

* Audio and telephone CODEC (AD1843). This CODEC has a bidirectional 64-bit data bus that communicates with the DMA controller. (Channel 4 → DAC1, Channel 5 → DAC2, Channel 6 → ADC Left, Channel 7 → ADC Right)

* Video capture CODEC (KS0122). This CODEC has a bidirectional 64-bit data bus and can initiate M → IO, IO → M requests for DMA (channel 2).

* Video backend CODEC (KS0119). This codec receives data directly from the memory controller (channel 0).

ASIC Interface Block

7.2 Direct Memory Access (DMA) Controller

The DMA controller has registers used for address generation and interpretation. It has eight independent channels, each with a current address register and a stop address register. The start and stop address registers are preprogrammed via the batch block. The current address register is loaded each time a DMA request arrives from one of the eight CODECs. Once the FBUS grants access, the DMA address is incremented every cycle until the current address matches the stop address register. At that point, the DMA controller generates an end of process (EOP) signal, which causes an interrupt to the processor. All eight DMA channels share a common arbitration unit that controls the multiplexer and the address comparison block.

This DMA controller supports IO-to-memory, memory-to-IO, and memory-to-memory access. Whenever a CODEC wants to communicate with the DMA controller, it asserts the DMA_REQ signal and waits for DACK, the DMA acknowledge signal, from the DMA controller. Once acknowledged, the CODEC drives the M-IO signals and data. The DMA controller selects the appropriate channel according to the granted DACK. See the block diagram.

7.3 DMA register description

7.3.1 Current Address Register

Each channel has a 29-bit current address register (bits 31:3), which requires all addresses to be aligned to 8 bytes. In effect, this register is a 29-bit counter. It can be read by the ARM7, and its initial value is loaded by the ARM7 via the FBUS. The address is incremented according to the data transfer size. The address held in the current address register is passed to the address generation block, which drives it onto the FBUS through the multiplexer. The current address register holds its address value in the idle state.

7.3.2 Stop Address Register

Each channel has a 29-bit stop address register (bits 31:3), which requires all addresses to be aligned to 8 bytes. This register is written by the ARM7 via the FBUS. Its value is compared with the current address in the compare block.

If the current address matches the stop address, the DMA controller generates an EOP signal for each channel.
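The interaction between the current address and stop address registers can be sketched as a small model. The class and method names are hypothetical, and the model assumes that the stop address lies a whole number of transfers above the start address so that the exact-match comparison described in the text is eventually satisfied.

```python
class DmaChannel:
    """Toy model of one DMA channel with 8-byte-aligned addresses."""

    def __init__(self, start_address, stop_address):
        if start_address % 8 or stop_address % 8:
            raise ValueError("addresses must be aligned to 8 bytes")
        self.current = start_address   # current address register
        self.stop = stop_address       # stop address register
        self.eop = False               # end of process signalled

    def transfer(self, size_bytes=8):
        """Perform one transfer; return False once EOP has been raised."""
        if self.eop:
            return False
        self.current += size_bytes
        if self.current == self.stop:
            # Current address matches the stop address: raise EOP.
            self.eop = True
        return True
```

A channel set up with a start address of 0x1000 and a stop address of 0x1020 raises EOP on its fourth 8-byte transfer in this model.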

7.3.3 Status register

This register stores information indicating whether each channel has reached its stop address. Bits 7:0 specify which channels have reached the stop address; each bit is reset when the ARM7 initializes the corresponding current address register via the CCU.

This register is read by ARM7 and ARM7 cannot write this register.

7.3.4 Control register

This register stores information about the operation of the DMA controller. Bits 7:0 specify which DMA channels are enabled for operation. These bits are reset whenever the corresponding channel reaches its stop address, and the ARM7 sets them to resume operation. If a channel enable bit is zero, the DMA controller does not send a DMA_ACK to that CODEC even if the CODEC sends DMA_REQ. Bits 19:16 specify which pairs of DMA channels are connected together to operate as a double buffer. For example, if channel 0 and channel 1 are connected as a double buffer, the DMA controller automatically switches to channel 1 when the current address of channel 0 reaches its stop address, and automatically switches back to channel 0 when the current address of channel 1 reaches its stop address. Bits 28:21 store information about the read / write mode of each channel. If any of these bits is set to 1 by the ARM7, that channel is used for read operations, and the remaining channels are used for write operations. Bit 31 specifies whether the DMA controller sends an EOP signal to the interrupt controller. If this bit is zero, the DMA controller does not send an EOP even if a channel reaches its stop address.
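The bit fields of the control register can be unpacked as in the sketch below. The field boundaries come from the text; the function name, the dictionary keys, and the assumption that bit 21 + n corresponds to channel n are hypothetical, since the encoding table is not reproduced here. The double-buffer field is returned as a raw 4-bit value for the same reason.

```python
def decode_control(value):
    """Split a 32-bit DMA control register value into its fields."""
    return {
        # Bits 7:0: one enable bit per channel.
        "enabled": [bool((value >> ch) & 1) for ch in range(8)],
        # Bits 19:16: double-buffer pairing (raw field, encoding not assumed).
        "double_buffer": (value >> 16) & 0xF,
        # Bits 28:21: 1 = channel used for reads (bit order is an assumption).
        "read_mode": [bool((value >> (21 + ch)) & 1) for ch in range(8)],
        # Bit 31: EOP is forwarded to the interrupt controller when set.
        "eop_enable": bool((value >> 31) & 1),
    }
```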

7.3.5 Mask Register

Each bit in the control register is associated with a mask bit in the mask register. If the mask bit is zero, the corresponding bit in the control register cannot be updated. Initially this register (bits 31:0) is set to FFFFFFFF (hex).
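The effect of the mask register on a control register write can be expressed in one line of bit arithmetic. The function name is hypothetical; the behavior (a zero mask bit blocks the update of the corresponding control bit) is as described above.

```python
def masked_update(control, write_value, mask=0xFFFF_FFFF):
    """Update only those control bits whose mask bit is one."""
    return ((control & ~mask) | (write_value & mask)) & 0xFFFF_FFFF
```

With the initial mask of FFFFFFFF every bit is writable, so the write value simply replaces the register contents.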

7.3.6 Programming

Start and stop addresses are programmed by ARM7 via FBUS.

The FBUS mapping values are as follows.

CCU → 0040_0000-007F_FFFF,

MCU → 0080_0000-047F_FFFF,

PCI → 0840_0000-FFFF_FFFF.

In address programming, Address [26: 0] is set based on Table 18.

Table 18

DMA register address map

Table 19

Encoding of Status Registers

Table 20

Encoding of Control Registers

7.4 CODEC initialization

The customer ASIC unit supports the initialization of each codec. In effect, ARM7 is responsible for CODEC initialization through the customer ASIC unit.

The customer ASIC unit has an address decoder for generating a request signal for each CODEC. Whenever the customer ASIC unit wishes to communicate with a CODEC, it sends a request signal to the CODEC and waits for an acknowledgment signal from the CODEC. After receiving the acknowledgment signal, the customer ASIC unit sends the data and address to the CODEC.

If the ARM7 wants to read batch data in any CODEC via the CCU, the customer ASIC unit sends the address to the CODEC. Upon receiving the data from the CODEC, the customer ASIC unit sends a transaction to the CCU. At this point, the batch data is sent to the ARM7 via the CCU.

Table 21

CODEC Batch Register FBUS Address Map

FIG. 53 shows a customer ASIC network.

4. I / O Pin Definitions

(Table 22)

I / O Pin Definitions for Customer ASIC Units

Chapter 8 AD1843 CODEC Interface

8.1

This chapter describes the AD1843 CODEC interface.

8.2 Overview

The AD1843 CODEC interface block provides the interface between the AD1843 serial bus and the MSP DMA module. The AD1843 sends and receives data and control / status information through the serial port. The AD1843 has four pins responsible for the serial interface: SDI, SDO, SCLK, and SDFS. The SDI pin is for serial data input to the AD1843, and the SDO pin is for serial data output from the AD1843. The SCLK pin is for the serial interface clock.

In both internal and external communication, the AD1843 requires that data bits be transmitted after the rising edge of SCLK and sampled on the falling edge of SCLK. The SDFS pin is for serial interface frame synchronization. The AD1843 CODEC interface is based on master mode, which means that the SCLK and SDFS signals are generated by the AD1843. The default SCLK frequency is 12.288 MHz and the frame rate is 48 kHz.

The basic structure of the CODEC interface is based on DMA. The AD1843 interface uses four different DMA channels: channel 4 for DAC1, channel 5 for DAC2, channel 6 for ADC left, and channel 7 for ADC right. Each transfer to or from the DMA is 64 bits. Therefore, DMA channels 4 and 5 each carry two different 32-bit data items (16 bits for left and 16 bits for right). DMA channels 6 and 7, on the other hand, send four different 16-bit data items from the CODEC interface to the SDRAM at one time.
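The 64-bit transfer format can be illustrated with a packing sketch. The bit ordering is an assumption made only for this example (first sample in the most significant 16 bits, left before right), and the function names are hypothetical; the text establishes only that a DAC word holds two 32-bit left / right pairs and an ADC word holds four 16-bit samples.

```python
def pack_adc_word(samples):
    """Pack four 16-bit samples into one 64-bit word (first sample highest)."""
    if len(samples) != 4:
        raise ValueError("an ADC word carries exactly four samples")
    word = 0
    for sample in samples:
        word = (word << 16) | (sample & 0xFFFF)
    return word


def unpack_adc_word(word):
    """Inverse of pack_adc_word."""
    return [(word >> shift) & 0xFFFF for shift in (48, 32, 16, 0)]


def pack_dac_word(frames):
    """Pack two (left, right) 16-bit pairs into one 64-bit word."""
    if len(frames) != 2:
        raise ValueError("a DAC word carries exactly two stereo frames")
    return pack_adc_word([sample for frame in frames for sample in frame])
```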

The DAC1 and DAC2 interfaces recognize that data is valid when the flag bit of each channel is set. The DAC1 and DAC2 interfaces check the flag bits and then request DMA. When the flag bit is reset, the DAC1 and DAC2 interfaces issue no DMA requests. The actual operation of the flag bit is controlled by the DMA clock. The DMA block does not generate a DMA acknowledge signal when the flag bit is reset. If the ADC left and right FIFOs are not full, no DMA request is generated. The software must check the ADC flag register in order to read the remaining data over the data bus. After this data has been read over the data bus, the FIFO is empty, and a DMA request is issued once the FIFO becomes full again.

The AD1843 control register is read and written by sending a read / write request with the control register address at the control word input. If a read is requested, the contents of the addressed control register are transmitted for the next frame, and if a write is requested, the data to be written must be sent to AD1843 slot 1. To improve the performance of the MSP, the programmer must check the control flag register before reading or writing the control register in the CODEC. When the flag bit of the control flag register is set, read and write operations of the CODEC register are possible.

8.3 DMA channel assignment

DMA Channel 4   DAC1 Left, Right

DMA Channel 5   DAC2 Left, Right

DMA Channel 6   ADC Left

DMA Channel 7   ADC Right

8.4 Data Format for DMA

The data size is 64 bits and is configured as follows.

8.5 Base Address

04C0_4000   DAC1 BASE

04C0_5000   DAC2 BASE

04C0_6000   ADCL BASE (Left Channel)

04C0_7000   ADCR BASE (Right Channel)

8.6 register map

8.7 Register Definition

8.7.1 Control register write data input

The most significant bit (MSB) is the first data input bit transmitted.

8.7.2 Control word input

r / w Read / write request. Reading from or writing to the control register occurs every frame. Setting to 1 indicates control register read, while resetting this bit to 0 indicates control register write.

ia4: 0 control address register for reading or writing

8.7.3 Control register data output

The contents of the control register addressed in the previous frame

8.7.4 ADC Flag Register

r4v-r1v Valid ADC right data is in the buffer. Indicates which data in the buffer is valid.

l4v-l1v Valid ADC left data is in the buffer. Indicates which data in the buffer is valid.

8.7.5 ADC Left First Data

ADC Left First Data in Buffer

8.7.6 ADC Left Second Data

ADC Left Second Data in Buffer

8.7.7 ADC Left Third Data

ADC Left Third Data in Buffer

8.7.8 ADC Left Fourth Data

ADC Left Fourth Data in Buffer

8.7.9 Control Flag Register

wfl Control register write flag. When set, the CODEC prepares to receive control register data.

rfl Control register read flag. When set, the CODEC prepares to transfer control register data.

Chapter 9 Video Codecs

9.1 Overview

The video codec logic interfaces to the KS0119 and KS0122 chips on the evaluation board and to the DMA module on the MSP chip. The KS0119 codec also provides screen refresh operation. For this operation a direct data path to the MCU module is implemented as shown in FIG.

9.2 Top-Level Module Definition

The top-level module has three submodules, as shown in FIG.

- KS0119 screen refresh module

- KS0122 video data capture module

- 3-wire serial host interface module, which accesses the KS0119 and KS0122 chip placement registers

9.3 DMA Channel Assignment

DMA CH0   KS0119 codec

DMA CH1   Reserved

DMA CH2   KS0122 codec

DMA CH3   Reserved

DMA CH4   AD1843 audio codec

DMA CH5   AD1843 audio codec

DMA CH6   AD1843 audio codec

DMA CH7   AD1843 audio codec

DMA CH8   Reserved

DMA CH9   Reserved

9.4 3-Wire Host Interface Module

This module interfaces to the KS0119 and KS0122 chips, where all registers within the chip are accessed through the serial interface. The three-wire serial interface module supports the functionality of the communications protocol on these chips and includes registers for the KS0119 and KS0122 interface logic. See FIG. 3.

9.5 EPROM Interface

The KS0119 IO pins are used to load program data immediately after a system reset and as an interface to an external EPROM that is part of the MSP-1EX boot initialization. See pin assignments for more details.

The EPROM is memory mapped to addresses from C0 000H to DF FFFH.

9.6 KS0119 Register Description

The KS0119 has a base address, CODEC_REQ0, of 04B0 0000, which extends to 04DBF FFFF.

9.6.1 KS0119 Register Address Map

KS0119 register address map

9.6.2 Frame Size Register

This register controls the frame size transmitted to the CODEC chip, as shown in FIG. 57, and the minimum frame length is 3 bytes.

9.6.3 Chip ID Register

This register stores the CODEC chip ID value, which stores 03H for KS0119 writes and 83H for KS0119 reads.

9.6.4 Control / Data Register

This register tells the CODEC chip KS0119 whether the next byte transferred is a register index or a data byte. For the KS0119, 08H indicates that the next byte is an index and 09H indicates that the next byte is data.

9.6.5 Index / Data 0 Register

This register stores either an index value or data byte 0 for the CODEC chip placement register, depending on the value transferred in the previous byte. See the communication protocol in the Programming Reference.

9.6.6 Data 1 register

This register stores data written to the CODEC register Index + 1.

9.6.7 Data 2 Register

This register stores data written to the CODEC register Index + 2.

9.6.8 Data 3 Register

This register stores data written to the CODEC register Index + 3.

9.6.9 KS0119 Logic Control Register

Bit designation for the KS0119 control register is as shown in FIG.

9.6.10 HS and VS Polarity

This register defines the polarity of the horizontal and vertical sync signals. A value of 0 is defined as active low, while a value of 1 is defined as active high. The bit designation is as follows.

Bit 0: VS Polarity

Bit 1: HS Polarity

9.6.11 HS offset

The active signal is generated after this offset value, which is defined as 00H.

9.6.12 VS Offset

9.6.13 The status register is as shown in FIG.

9.6.14 Read Data Serial Interface Register

This register stores valid data from the serial port once the read flag has changed from busy to ready.

9.6.15 Read PROM Data Register

This register stores valid data when the PROM flag is ready.

9.6.16 Programming Reference

9.6.16.1 Deployment and initialization

The video display hardware is controlled to operate in two modes: VGA overlay mode and VGA emulation mode.

This mode of operation is controlled by setting a bit in the logic control register.

MSSEL: 0 for VGA overlay mode, 1 for VGA emulation mode.

VGA overlay mode requires the presence of a VGA card on the PC system.

The monitor cable is connected to the MSP card.

Supported VGA resolution is up to 800 * 600

The display buffer is required to be the same size as in the VGA setting.

To set up the video window, the software fills a color-key rectangle in the VGA frame buffer, and the video data must be written in the MSP SDRAM to a rectangle of the same size and position as the rectangle in the VGA frame buffer. See FIG. 60.

The KS0119 chip recognizes the color key and switches from the VGA input port to the video input port. The software sets the DMA channel 0 start address to the upper left of the SDRAM video output buffer, and the DMA record length is set according to the bits per pixel used in the video data (4:2:2 = 16 bits per pixel) and the resolution set on the VGA card.
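As a rough illustration of how the buffer size follows from the resolution and pixel depth, the sketch below computes the size of one full frame. The function names are hypothetical, and treating the DMA record as one whole frame with the stop address placed directly after it is an assumption; the text states only that the length depends on bits per pixel and resolution.

```python
def frame_bytes(width, height, bits_per_pixel=16):
    """Size in bytes of one frame (4:2:2 video uses 16 bits per pixel)."""
    return width * height * bits_per_pixel // 8


def dma_stop_address(start_address, width, height, bits_per_pixel=16):
    """Stop address for a buffer holding one frame (assumed layout)."""
    if start_address % 8:
        raise ValueError("DMA addresses must be aligned to 8 bytes")
    return start_address + frame_bytes(width, height, bits_per_pixel)
```

At the maximum supported resolution of 800 * 600 with 16 bits per pixel, one frame occupies 960,000 bytes.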

9.6.16.2 Serial protocol 3-wire interface to KS0119

When setting the batch register in the KS0119 chip, the protocol is as follows.

At least two frames are required to be sent to the peripheral chip.

The first frame is for setting the index of the batch register.

The second frame is for reading or writing data (contents of register).

The software sets the frame size register to the appropriate length and sets the serial access bit to one. The software then loads all the bytes needed for the frame before changing the frame size register, and the CODEC interface logic waits until all bytes are loaded before the serialization of the frame begins.

The first transmitted frame is for setting an index, and the frame size is three. See FIG. 61.

The second frame is for setting registers, and the frame size is three.

After each data byte, the chip automatically increments the index by one, which makes it possible to set consecutive registers by sending multiple bytes of data to the CODEC interface logic supporting up to four data bytes.

When a read or write operation is performed, the software checks the read and the odd flags of the status register for valid data during the read operation, or checks whether the write flag = ready before sending the next frame.

The following example shows the steps to set up a KS0119 data sheet.

Since two registers have consecutive indices, these two bytes can be loaded in a single frame. First, the index should be set as follows.

- Load the frame size register (Address = 04B0_0000H) with the value 83H (frame size = 3, serial access bit set)

- Load the ID register (Address = 04B0_0001H) with the value 03H

- Load the data / control byte (Address = 04B0_0002H) with the value 08H, which indicates to the KS0119 that the next byte is an index

- Load the index register (Address = 04B0_0003H) with the index 6AH

The serial interface detects whether or not the contents in the frame size register match and starts frame transfer, and the write flag in the status register is set to a busy state. The software checks the flags in the status register before sending the next frame. If the flag is ready, the software can load the value for the next frame.
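The index-setting frame described above can be sketched as a list of register writes followed by a poll of the write flag. This is only an illustration: the function names, the `write_reg` and `write_flag_ready` callbacks, and the reading of 83H as "serial access bit (80H) plus frame size 3" are assumptions, not part of the KS0119 documentation.

```python
# Register addresses as quoted in the example above.
FRAME_SIZE_REG = 0x04B0_0000
ID_REG = 0x04B0_0001
CONTROL_REG = 0x04B0_0002
INDEX_REG = 0x04B0_0003

# Assumption: 83H decomposes into a serial access bit (80H) and frame size 3.
SERIAL_ACCESS_BIT = 0x80


def index_frame_writes(index, chip_id=0x03, control=0x08, frame_size=3):
    """Return the (address, value) writes that load one index-setting frame."""
    return [
        (FRAME_SIZE_REG, SERIAL_ACCESS_BIT | frame_size),
        (ID_REG, chip_id),
        (CONTROL_REG, control),
        (INDEX_REG, index),
    ]


def send_frame(writes, write_reg, write_flag_ready):
    """Load a frame, then poll until the write flag reports ready again.

    write_reg(address, value) and write_flag_ready() are hypothetical
    callbacks standing in for the real bus access.
    """
    for address, value in writes:
        write_reg(address, value)
    polls = 0
    while not write_flag_ready():
        polls += 1
    return polls
```

Calling `index_frame_writes(0x6A)` yields the four loads listed in the example; the polling loop corresponds to checking the status register before the next frame is loaded.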

9.7 KS0122 Register Description

The KS0122 has a base address corresponding to 04C0 2000, which extends to 0420 2FFF.

9.7.1 KS0122 Register Address Map

9.7.2 Frame Size Register

This register controls the frame size transmitted to the CODEC chip, as defined in FIG. 62, and the minimum frame length is 3 bytes.

9.7.3 Chip ID Register

This register stores the CODEC chip ID value, 04H for KS0122 writes and 84H for KS0122 reads.

9.7.4 Control / Data Register

This register tells the CODEC chip KS0122 whether the next byte sent is a register index or data. For KS0122, 00H indicates that the next byte is an index and 01H indicates that the next byte is data.

9.7.5 Index / Data 0 Register

9.7.6 Data 1 Register

This register stores data written to the CODEC register Index + 1.

9.7.7 Data 2 Register

This register stores data written to the CODEC register Index + 2.

9.7.8 Data 3 Register

This register stores data written to the CODEC register Index + 3.

9.7.9 KS0122 Logic Control Register

The bit designation for the KS0122 control register is as follows.

bits 1: 0

0: 4:2:2 format

1: 4:1:1 format

10: CCIR656 format

9.7.10 Status register

bits 1: field status

0: even field

1: odd field

bits 0: VS Status

0: VS from 1 to 0

1: VS from 0 to 1

9.7.11 Read Data Serial Interface Register

9.7.12 Serial Protocol 3-Wire Interface to KS0122

When setting the batch register for the KS0122 chip, the protocol is as follows.

At least two frames are required to be transmitted to the peripheral chip.

The first frame is for setting the index of the batch register.

The second frame is for reading or writing data (contents of register).

The first transmitted frame is for setting an index, and the frame size is three. See FIG. 63.

The second frame is for setting registers, and the frame size is three.

When a read or write operation is performed, the software checks the read and write flags of the status register for valid data during the read operation, or checks whether the write flag = ready before sending the next frame.

The following example shows the steps to set up a KS0122 register.

To set the values for chroma key byte 0 and byte 1, the index for this register is 6AH for byte 0 and 6BH for byte 1. See the KS0122 data sheet.

-Load frame size register with value 83H (frame size = 3, serial access bit set) (Address = 04B0_0000H)

-Load ID register with value 03H (Address = 04B0_0001H)

-Load data / control byte: 08H value indicating address of next byte to KS0122 (Address = 04B0_0002H)

-Load index register with a value of 6AH (Address = 04B0_0003H)

The serial interface detects whether or not the contents in the frame size register match and starts frame transfer, and the write flag in the status register is set to a busy state. The software checks the flags in the status register before sending the next frame. If the flag is ready, the software can load the value for the next frame.

10 Bitstream Processor

10.1

This chapter describes the functional requirements for designing a bitstream processor (BP), one of the major MSP processing engines for video data compression and decompression applications.

10.2 Abbreviations

A / V Audio and Video

BP Bitstream Processor (MSP Block)

CCU Cache Control Unit (MSP Block)

CIF Common Intermediate Format with luminance sample resolution of 352x288 at 29.97 Hz

DCT Discrete Cosine Transform

DMA direct memory access

DSM Digital Storage Media

FBUS Fast Bus (MSP Internal Data Bus)

GOB Group of Blocks

GSTN General Switched Telephone Network (also known as PSTN)

HDD Hard Disk Drive

I / F interface

IOBUS I / O Bus (MSP Internal Peripheral Bus)

ITU-T-601 Standard for digital coding of color television signals with sample resolutions of 720x480 at 29.97 Hz and 720x576 at 25 Hz, respectively (previously referred to as CCIR 601); the display resolution, however, can be 720x480 or 704x480.

LSB Least Significant Bit

LUT Look-up Table

MPEG Moving Picture Experts Group

MSB Most Significant Bit

MSP Samsung Multimedia Signal Processor

QCIF Quarter CIF with luminance resolution of 176x144 at 29.97 Hz

RLC run_length level code

SDRAM Synchronous Dynamic Random Access Memory

SIF MPEG-1 video standard compliant source input format with luminance resolution of 352x240 at 29.97 Hz for NTSC and 352x288 at 25 Hz for PAL

TBD To be defined

VLC Variable Length Code

VP vector processor (MSP block)

10.3 Key Features

* Supports encoding and decoding applications of MPEG-1, MPEG-2, H.261 and H.263, including the syntax to form and interpret slice (or GOB) layers

* Performs RLC processing in real time

* Performs Huffman code processing in real time using all Huffman tables in the MPEG-1, MPEG-2, H.261 and H.263 video standards

* Supports two forward / reverse zig-zag scan conversion methods

* IOBUS interface with a maximum transfer rate of 731.4 Mbits / sec (32-bit @ 40 MHz)

* Maximum operating clock frequency is 40 MHz

* Includes a 9.2 Kbit ROM for the Huffman codec look-up table

* Contains 320 bytes of internal SRAM

* Preemptive and cooperative context switching modes

* The target gate count for the control path is 6K gates plus RAM and ROM

10.4 Overview

The bitstream processor (BP) is one of four MSP internal peripherals. It is a hardware block that supports multiple bitstreams in video compression and decompression applications.

The BP is specifically designed for bit-level processing, because the VP and ARM7 inside the MSP do not have an architecture well suited to this kind of bit manipulation.

The BP sends and receives data over a 32-bit bus called IOBUS with a maximum transfer rate of 731.4 Mbits / sec.

The BP operates as an independent processor and is controlled by the software of ARM7 or VP.

More particularly, the BP encodes and decodes all information contained in the slice or GOB layer and below, and sends and receives data to and from the CCU. The BP also performs forward and reverse zig-zag conversion and encodes and decodes the differential DC coefficients.

Moreover, the BP reconstructs the motion vectors from differential motion vectors in decoding and performs the opposite operation in encoding, except in two special cases: dual_prime mode in MPEG-2 encoding and prediction mode in H.263 encoding and decoding.

When the BP operates in simple mode, it is started once for each slice or GOB and raises an interrupt after the slice or GOB processing is complete. Full-duplex operation is achieved in this mode by interleaving the encoding and decoding of slices or GOBs.

If the ARM7 wants to switch the BP to another task immediately, the BP supports a preemptive context switching mode that ends the BP process before the current slice or GOB is complete.

FIG. 3 shows a block diagram of the BP.

As shown in FIG. 3, the BP includes five blocks: an IOBUS interface unit, a VLC FIFO unit, a VLC LUT ROM, a control state machine and a BP core unit. Input and output data are handled by the IOBUS interface unit, which contains a 16x32-bit RAM. It supports all data movement and interrupt requests. The VLC FIFO unit prepares the next data word for the data decoding operation, and performs output data packing for the data encoding operation.

The VLC look-up table ROM has a size of 764x12 bits and stores all the information necessary for all Huffman code processing. The control state machine controls all encoding and decoding. The BP core unit is a small processor that includes an adder, a comparator, a barrel shifter, a register file and a 128x16-bit RAM. The core is used for bit manipulation.
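The memory sizes quoted for the individual blocks can be checked against the totals listed under Key Features. The short calculation below is a sketch that assumes the 320-byte internal SRAM figure is the sum of the IOBUS interface RAM and the BP core RAM, and that "Kbit" means 1000 bits; neither assumption is stated in the text.

```python
# Sizes quoted in the text, expressed in bits.
IOBUS_RAM_BITS = 16 * 32    # RAM in the IOBUS interface unit
CORE_RAM_BITS = 128 * 16    # RAM in the BP core unit
VLC_ROM_BITS = 764 * 12     # VLC look-up table ROM

# Assumption: the internal SRAM total is the sum of both RAMs.
sram_bytes = (IOBUS_RAM_BITS + CORE_RAM_BITS) // 8

# Assumption: "Kbit" is taken as 1000 bits.
rom_kbits = VLC_ROM_BITS / 1000

print(sram_bytes)  # 320
print(rom_kbits)   # 9.168
```

Under these assumptions the block sizes agree with the 320-byte SRAM and the approximately 9.2 Kbit ROM given in section 10.3.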

10.5 Signal Definition

The signals required for the BP external interface are shown in Table 23.

A signal whose name ends with the character 1 is active low.

In the direction column of Table 23, B, I and O denote bidirectional, input and output signals, respectively.

Table 23

BP signal definition

10.6 Data Flow for Encoding / Decoding

This section describes, by way of example, the data flow of representative video encoding and decoding applications. The audio data flow is not described in detail here.

10.6.1 Encoding Case

Step E1: (RAW) A / V Data Input

Typically the input video and audio signals are sampled and digitized by external codecs and supplied to the custom ASIC. In multimedia PC environments, however, some VGA control boards also include frame and sound capture. Therefore, raw A / V data is delivered from either the custom ASIC or the PCI bus interface. The custom ASIC and PCI bus interfaces each contain a small 32-byte buffer. The data in this buffer is transferred to external SDRAM via FBUS using DMA operation. This movement of data is initiated by the ARM7 after the power-on reset.

Step E2: Pre Filtering by VP

First, the VP fetches image data stored in the SDRAM into the VP data cache (typically the scratch pad area). The VP then filters these pixels temporally and scales them spatially. After pre-filtering, the resolution of the image is normally converted from ITU-T-601 size to CIF or QCIF size. The VP writes the pre-filtered results to external SDRAM.

Step E3: Data Compression by VP

The VP fetches the pre-filtered data of the SDRAM back into the VP data cache so that compression is performed according to the rules given in the corresponding standard. Normally, the VP performs forward DCT / forward adaptive quantization, motion prediction, macroblock type determination, and the like.

After performing this procedure, the VP must write the result, with the appropriate header information, back into the VP data cache. In practice, this VP data cache area is used as the BP input buffer. A flag signal is used to check the buffer status.

Step E4: Initialize BP by ARM7

In practice, before the BP can operate, ARM7 must initialize the BP's internal registers.

This initialization must not be performed during the 128 cycles after the power-on reset signal is applied. In particular, ARM7 must initialize the I / O buffer address and BP instruction registers and specify the number of macroblocks encoded within the slice or GOB.

After initializing these registers, ARM7 must set the BP enable flag to perform the BP process.

Step E5: Bitstream Process by BP

When either of the two input buffers is full, the BP starts reading data through IOBUS. That is, the BP can read data only when a buffer is full. The BP then converts the 8x8 block data into zig-zag order. The result is directly encoded with RLC and Huffman coding.

The Huffman-coded result may be transferred to either the ARM7 data cache or the SDRAM. The BP should write to an output buffer only if it is empty, so that the buffer does not overflow. At the end of this process, when the number of macroblocks processed equals the number of macroblocks specified by ARM7, the BP interrupts the ARM7 with the byte and bit position of the last data and terminates the current slice or GOB process.
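The zig-zag conversion and run-length stage of Step E5 can be illustrated in software. The sketch below uses the conventional 8x8 zig-zag order and emits (run, level) pairs; it is not the BP hardware algorithm, the function names are hypothetical, and separate DC handling, end-of-block signalling and the Huffman stage are deliberately left out.

```python
def zigzag_order(n=8):
    """Return (row, col) pairs of an n x n block in conventional zig-zag order."""
    cells = [(r, c) for r in range(n) for c in range(n)]
    # Walk the anti-diagonals; odd diagonals run top to bottom,
    # even diagonals run bottom to top.
    return sorted(
        cells,
        key=lambda rc: (rc[0] + rc[1],
                        rc[0] if (rc[0] + rc[1]) % 2 else -rc[0]),
    )


def run_level_encode(block):
    """Scan a square block in zig-zag order and emit (run, level) pairs.

    run is the number of zero coefficients preceding each nonzero level.
    Trailing zeros are dropped; a real coder would signal them with an
    end-of-block code, which is outside this sketch.
    """
    pairs, run = [], 0
    for r, c in zigzag_order(len(block)):
        level = block[r][c]
        if level == 0:
            run += 1
        else:
            pairs.append((run, level))
            run = 0
    return pairs
```

For a block whose only nonzero coefficients are 5 at (0,0), -2 at (1,0) and 3 at (1,1), the function returns `[(0, 5), (1, -2), (1, 3)]`.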

Step E6: Bitstream Formation and A / V Multiplexing by ARM7

ARM7 combines the Huffman-coded data and syntax variables to produce the final bitstream, and repeats the process.

ARM7 also handles the layers above the slice or GOB layer and multiplexes the audio and video bitstreams. The result is written to SDRAM by ARM7.

Step E7: Network Interface by VP (Optional, for Video Conferencing)

For video telephony or video conferencing applications, after step E6 above, the VP functions as a network interface, such as a V.34 modem for H.324 GSTN video telephony or an I.400 series interface for H.320 ISDN video conferencing.

Step E8: Output the Final Bitstream

The final bitstream stored in the SDRAM is sent to either the custom ASIC or the PCI bus. Normally the custom ASIC block is used for the network interface, and the PCI bus interface is used for data storage on a recording medium (e.g., HDD).

When this data is moved, the DMA data transfer initialized by ARM7 is used.

10.6.2 Decoding Case

Step D1: Bitstream Fetch

In a multimedia environment, the compressed bitstream is supplied from a CD-ROM drive, an HDD or a network interface.

Therefore, this bitstream can arrive through either the custom ASIC or the PCI bus. Data stored in the 32-byte buffer of the custom ASIC or PCI bus interface is transferred to SDRAM using DMA.

Step D2: Network Interface by VP (Optional, for Video Conferencing)

In video conferencing, the data is first processed by a V.34 or I.400 series network interface routine on the VP. The VP writes the result to SDRAM.

Step D3: A / V Demultiplexing and Header Analysis by ARM7

ARM7 moves data in SDRAM to the ARM7 data cache and performs A / V bitstream demultiplexing. For the video bitstream, ARM7 also searches for all start codes and parses the headers until a slice or GOB is detected. ARM7 stores the decoded bitstream syntax variables in a special area of SDRAM. The demultiplexed audio and video bitstreams are sent to their respective rate buffers in the SDRAM. The size of the rate buffer may differ for each application. For example, for video rate buffer size, MPEG-1 recommends 370 Kbits and MPEG-2 SMS 1.835 Mbits.

Step D4: BP Initialization by ARM7

This step is the same as step E4 in the previous subsection, except that the register for the number of coded macroblocks does not need to be initialized. As before, the initialization must not be performed during the 128 cycles after the power-on reset signal is applied.

Step D5: Bitstream Process by BP

After initializing the BP for a particular slice or GOB, the recovered data is sent to two buffers.

The BP reads data through IOBUS after checking the status of the full flags. If the input data contains header words, the BP parses the syntax variables.

When the BP then recognizes the following bits as a Huffman code, Huffman decoding is performed within at most four cycles for each Huffman code. If the decoded Huffman code is a DCT AC coefficient, the Huffman-decoded result is RLC-decoded to reconstruct the 64 components of the block.

The reconstructed components are then reordered by the reverse zig-zag conversion and finally transferred to the two output buffers so that the VP can perform inverse quantization. The BP continues this process until it detects a start code other than a slice or GOB start code. When such a start code is detected, the BP interrupts ARM7 with byte and bit position information for the last used data. ARM7 then searches for the next slice or GOB start code and repeats this process.
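The run-length expansion and reverse zig-zag conversion of Step D5 can be sketched in the same way. The code below assumes the conventional 8x8 zig-zag order and (run, level) pairs as input; the function names are hypothetical, and Huffman decoding and header parsing are not modelled.

```python
def zigzag_order(n=8):
    """Return (row, col) pairs of an n x n block in conventional zig-zag order."""
    cells = [(r, c) for r in range(n) for c in range(n)]
    return sorted(
        cells,
        key=lambda rc: (rc[0] + rc[1],
                        rc[0] if (rc[0] + rc[1]) % 2 else -rc[0]),
    )


def run_level_decode(pairs, n=8):
    """Expand (run, level) pairs into an n x n block (reverse zig-zag)."""
    order = zigzag_order(n)
    block = [[0] * n for _ in range(n)]
    pos = 0
    for run, level in pairs:
        pos += run  # skip the zero coefficients
        if pos >= n * n:
            raise ValueError("run/level data overruns the block")
        r, c = order[pos]
        block[r][c] = level
        pos += 1
    return block
```

The overrun check is one simple way a software model could flag invalid input; how the BP itself reports such a condition is governed by the error codes described in subsection 10.9.2.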

Step D6: Restore Data of VP

Using the result of step D5, the VP reconstructs the image using inverse quantization, inverse DCT and motion compensation. After completing the decoding process, the VP stores the result in SDRAM.

Step D7: Post Process of VP

Before the video and audio data is transferred to the digital / analog converter, the VP post-processes the pixels to obtain the desired output resolution and image quality.

This result will again be stored in SDRAM.

Step D8: (RAW) A / V Data Output

Finally, the reproduced audio and video data in the SDRAM are output using DMA. Again, this movement of data is initiated by ARM7. Current video overlay technology allows the PCI bus to send data to the video source, and finally the data will be sent to either a custom ASIC or PCI bus.

10.7 Programming Model

10.7.1 BP base device address

The BP has the following 32-bit base device addresses.

MSP_BASE BP_BASE Address_Offset

Here, MSP_BASE is 5 bits defined by the MSP base PCI device address, BP_BASE is 7 bits such as 7'b 1111100, and Address_Offset is 20 bits allocated to the BP internal register.

Therefore, the address range assigned to the BP in the entire MSP I / O device address map is 27'h 7C0_0000 to 27'h 7CF_FFFF.
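The address composition can be checked with a few lines of code. The sketch below assumes the three fields are concatenated in the order listed, with MSP_BASE in the most significant bits; the function names are hypothetical.

```python
BP_BASE = 0b1111100   # 7-bit BP_BASE from the text (7Ch)
BP_BASE_BITS = 7
OFFSET_BITS = 20      # width of Address_Offset
MSP_BASE_BITS = 5


def bp_io_address(offset):
    """27-bit MSP I/O address of a BP register: {BP_BASE, Address_Offset}."""
    if not 0 <= offset < (1 << OFFSET_BITS):
        raise ValueError("offset must fit in 20 bits")
    return (BP_BASE << OFFSET_BITS) | offset


def bp_device_address(msp_base, offset):
    """32-bit address {MSP_BASE, BP_BASE, Address_Offset}.

    Assumption: MSP_BASE occupies the top 5 bits.
    """
    if not 0 <= msp_base < (1 << MSP_BASE_BITS):
        raise ValueError("msp_base must fit in 5 bits")
    return (msp_base << (BP_BASE_BITS + OFFSET_BITS)) | bp_io_address(offset)
```

With offsets 0 and FFFFFh, `bp_io_address` reproduces the range 27'h 7C0_0000 to 27'h 7CF_FFFF given above.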

10.7.2 Internal Register Specifications

The internal register set is shown in Table 24. All registers shown in Table 24 can be written or read by ARM7 or VP.

(Table 24)

BP internal register

* BP_MODE [31: 0] (read-only, no default)-This register defines the video standard type and various picture level information, details of which can be found in subsection 10.8.1.

* BP_CONTROL [31: 0] (read / write, default value is 32'h0000_0000)-This register contains various control variables for BP operation. The ARM7 or VP sets each flag in this register, and some flags are reset by the BP. Bit specifications can be found in subsection 10.8.2.

* IBUF0_START [31: 0] (read / write, no default)-This register holds the start address, set by ARM7, of input buffer 0 of the BP input double buffer. The initialization value of IBUF0_START is always less than IBUF0_END, and IBUF0_START [3: 0] is equal to 4'b0000.

* IBUF0_END [31: 0] (read-only, no default)-This register defines the end address of input buffer 0 of the BP input double buffer, as described in section 10.11.

* IBUF1_START [31: 0] (read / write, no default)-This register holds the start address, set by ARM7, of input buffer 1 of the BP input double buffer. The initialization value of IBUF1_START is always less than IBUF1_END, and IBUF1_START [3: 0] is equal to 4'b0000. This is described in Section 10.11.

* IBUF1_END [31: 0] (read-only, no default)-This register defines the end address of input buffer 1 of the BP input double buffer. This is described in Section 10.11.

* OBUF0_START [31: 0] (read / write, no default)-This register holds the start address, set by ARM7, of output buffer 0 of the BP output double buffer. The initialization value of OBUF0_START is always less than OBUF0_END, and OBUF0_START [3: 0] is equal to 4'b0000. This is described in Section 10.11.

* OBUF0_END [31: 0] (read-only, no default)-This register defines the end address of output buffer 0 of the BP output double buffer. This is described in Section 10.11.

* OBUF1_START [31: 0] (read / write, no default)-This register holds the start address, set by ARM7, of output buffer 1 of the BP output double buffer. The initialization value of OBUF1_START is always less than OBUF1_END, and OBUF1_START [3: 0] is equal to 4'b0000. This is described in Section 10.11.

* OBUF1_END [31: 0] (read-only, no default)-This register defines the end address of output buffer 1 of the BP output double buffer. This is described in Section 10.11.
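The two constraints stated for every buffer register pair (the start address is less than the end address, and the low four bits of the start address are zero) lend themselves to a simple check before the BP is enabled. The helper below is a hypothetical sketch, not part of the BP programming interface.

```python
def check_buffer(start, end):
    """Check the constraints the text places on a *_START / *_END pair.

    Returns a list of problem descriptions; an empty list means the pair
    satisfies both constraints.
    """
    problems = []
    if start & 0xF:
        problems.append("START[3:0] must be 4'b0000")
    if not start < end:
        problems.append("START must be less than END")
    return problems
```

For example, `check_buffer(0x1000, 0x10FF)` returns an empty list, while a start address of `0x1004` is reported as misaligned.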

* SAVE_ADR [31: 0] (read-only, no default)-This register defines the start address in SDRAM at which the BP internal context is stored when preemptive context switching mode is required. See subsection 10.12.1 for further information.

* VALID_BYTE_ADR [31: 0] (read / write, no default)-This register indicates the position of the last valid data byte of the input double buffer in decoding or of the output double buffer in encoding. The purpose of this register is handshaking between ARM7 and BP. In general, the valid bit position within the valid data byte is also required; this additional information is contained in the BP_CONTROL [31: 0] register. Details are given in section 10.13.

* BP_STATUS [31: 0] (read / write, default value is 32'h0000_0000)-This register indicates the various internal states of the BP. Every bit position of the lower two bytes (i.e., BP_STATUS [15: 0]) is an interrupt condition that can set ARM7_IRQ to one. This register can be accessed in two ways. The ARM7 or VP can read or write the full 32-bit register using address 27'h7C0_0050. In general, however, ARM7 and VP preferably write (or reset) the contents of the BP_STATUS register bit by bit. The BP therefore also supports bit-wise access by assigning addresses in the range 27'h7C0_0030 to 27'h7C0_004F to the individual bits of BP_STATUS. These bit contents are described in subsection 10.8.3.

* BP_INT_MASK [15: 0] (read-only, default value is 16'hFFFF)-Each bit of this register corresponds to an interrupt condition in BP_STATUS [15: 0] and is AND-ed with that condition before it is recorded in BP_STATUS [15: 0]. If a mask bit is set to zero, the corresponding interrupt condition is unconditionally set to zero (i.e., disabled). Details about these interrupts are described in section 10.9.

* V_MB_SIZE [7: 0] (read-only, no default)-This register indicates the vertical size of the picture to be encoded or decoded, expressed as a number of macroblocks. For example, if the vertical size is 288 pels, then V_MB_SIZE [7: 0] = 288/16 = 18. ARM7 must set this register before starting each BP encoding or decoding operation.

* H_MB_SIZE [7: 0] (read-only, no default)-This register indicates the horizontal size of the picture to be encoded or decoded, expressed as a number of macroblocks. For example, if the horizontal size is 352 pels, then H_MB_SIZE [7: 0] = 352/16 = 22. ARM7 must set this register before starting each BP encoding or decoding operation.
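The macroblock counts for a CIF picture follow directly from the 16-pel macroblock size: 288/16 = 18 rows and 352/16 = 22 columns. The helper below sketches the calculation; rounding up for sizes that are not a multiple of 16 is an assumption, since the text only gives exact divisions.

```python
def mb_size(pixels, mb=16):
    """Number of 16-pel macroblocks covering `pixels` (V_MB_SIZE / H_MB_SIZE)."""
    # Ceiling division; rounding up is an assumption for sizes that are
    # not a multiple of 16.
    count = -(-pixels // mb)
    if not 1 <= count <= 0xFF:
        raise ValueError("value does not fit the 8-bit register")
    return count


print(mb_size(288))  # 18
print(mb_size(352))  # 22
```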

* ARM7_IRQ [0] (write-only, default is 0)-This register is a 1-bit flag for requesting an interrupt to ARM7 and is directly connected to the ARM7_IRQ output port. This flag is set if any bit of BP_STATUS [15: 0] is set to one. ARM7 resets this flag.
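The relationship between the interrupt conditions, BP_INT_MASK and ARM7_IRQ can be modelled as plain bit operations. The function names below are hypothetical, and the model covers only the behaviour stated in the register descriptions.

```python
def record_interrupts(raw_conditions, bp_int_mask=0xFFFF):
    """AND raw conditions with BP_INT_MASK; the result is what BP_STATUS[15:0] records."""
    return raw_conditions & bp_int_mask & 0xFFFF


def arm7_irq(bp_status):
    """ARM7_IRQ is 1 when any bit of BP_STATUS[15:0] is 1."""
    return 1 if bp_status & 0xFFFF else 0
```

With the default mask of FFFFh every condition reaches ARM7_IRQ; clearing a mask bit suppresses the matching condition, so `arm7_irq(record_interrupts(0b1, 0xFFFE))` is 0.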

10.8 BP I / O Data Word Format

This section contains instruction data and macroblock data word formats for BP I / O.

10.8.1 BP_MODE register format

The 32-bit BP_MODE register at address 27'h7C0_0000 has the following format given in Table 25. That is, BP_MODE [31] = PARAM_SET2 [7] and BP_MODE [0] = SF [0].

Table 25

BP_MODE register format

* standard_format (SF)-The video standard used is defined in Table 26. The SF must always be defined by ARM7 before the BP is enabled for all video encoding and decoding applications.

Table 26

Definition of SF

* picture_type (PT)-picture coding type is defined in Table 27.

The value 00 for PT is a special case for MPEG-1, MPEG-2 and H.263 applications. In particular, the D_picture is assigned as a picture type for MPEG-2 even though it is not used in MPEG-2. This is because the MPEG-1 bitstream is a subset of the MPEG-2 bitstream.

Table 27

Definition of PT

* picture_structure (PS)-Picture structure information is defined in Table 28. Again, the value 00 for PS is invalid and results in an error.

Table 28

Definition of PS

* parameter_set 0, 1 and 2 (PARAM_SET0, PARAM_SET1, PARAM_SET2)-These three bytes hold various variables used in MPEG-1, MPEG-2 and H.263. The definitions for each variable set are given in Tables 29 and 30.

Table 29

Definition of PARAM_SET0

* intra_dc_precision (IDP)-The 2-bit intra DC precision variable defined in MPEG-2. It should be set to 00 in MPEG-1 applications.

* top_field_first (TFF)-Flag for MPEG-2 used for motion vector encoding and decoding.

* frame_pred_frame_dct (FPFD)-Flag for MPEG-2 indicating that frame DCT and frame prediction are used.

* concealment_motion_vectors (CMV) or advanced_prediction_mode (AP)-In MPEG-2, this flag indicates that motion vectors are used for intra macroblocks. In H.263, this flag is set to 1 if the advanced prediction mode is ON; otherwise it is set to zero. For other standards this flag should be set to zero.

* intra_vlc_format (IVF)-Flag for MPEG-2 that determines the VLC table format for intra macroblocks.

* alternate_scan (AS)-Flag for MPEG-2 that determines the order of the coefficients to be encoded and decoded.

* vertical_size_flag (VSF) or continuous_presence_multipoint (CPM)-In MPEG-1 and MPEG-2, this flag should be set to 1 if the vertical size of the image exceeds 2800 lines; otherwise it should be set to 0. In H.263, this flag is set to 1 if the continuous presence multipoint mode is used and to 0 otherwise.

Table 30

Definition of PARAM_SET1 and PARAM_SET2

10.8.2 BP_CONTROL Register Format

The bit specifications for the BP_CONTROL [31: 0] register (address 27'h7C0_0004) are shown in Table 31.

Table 31

BP_CONTROL register format

* BP_enable (BP_EN)-When this flag is set to 1 by ARM7 or VP, the BP performs the process. Therefore, all other registers must be configured before this flag is set. When the BP has completed the process, this flag is cleared by the BP.

* software_reset (SOFT_RESET)-When this flag is set by ARM7 or VP, the BP stops the current process, returns all registers to their default values and becomes idle. ARM7 can restart the BP process by setting the BP_EN flag. The BP hardware reset signal is active low.

* pause (PAUSE)-When this flag is set to 1 by ARM7 or VP, the BP stops the current process operation. The user resumes the stopped operation by setting the BP_EN flag.

* detect_start_code (DETECT_START_CODE)-When this flag is set to 1 by ARM7 or VP, the BP finds the next start code in the data in IBUF0. Therefore, the user must set the appropriate addresses for IBUF0_START and IBUF0_END. This command works properly only if the BP is idle. Therefore, if the BP is not idle, ARM7 must first send a software reset instruction to the BP before issuing this command.

* step (STEP)-When this flag is set to 1 by ARM7 or VP, the BP performs one state of the current operation. This feature is very useful for debugging. ARM7 must first send a pause instruction to enable this step.

* context_switching_mode (CTX_MODE)-When this flag is set to 1 by ARM7 or VP while CTX_SWITCH is set to 1, the BP performs preemptive context switching. If it is set to 0 while CTX_SWITCH is set to 1, the BP performs cooperative context switching. Setting CTX_MODE without setting CTX_SWITCH to 1 does not affect the BP process. See section 10.12 for details on context switching.

* context_reload_request (CTX_RELOAD)-When this flag is set to 1 by ARM7 or VP, the BP reloads the context already stored in the SDRAM. The BP then reads the stored context from address SAVE_ADR [31: 0]. See section 10.12 for details on context switching.

* error_handle_mode (ERR_HANDLE_MODE)-This flag is used to perform error recovery in the BP when an error occurs in the transmitted compressed bitstream.

If the input bitstream contains invalid data, the BP interrupts ARM7 and checks the contents of this flag. When this flag is set to 1, the BP automatically finds the next start code. If the start code is a slice or GOB start code, the BP repeats this process. When this flag is set to 0, the BP becomes idle without looking for the next start code. Handshaking between BP and ARM7 is described in section 10.13.

* number_of_macroblocks_to_be_encoded (NO_MBS [15: 0])-This field contains 16 bits representing the number of macroblocks to be encoded in a slice or GOB. With this bit width, up to 65535 macroblocks can be encoded in a slice or GOB. A value of 0 is not allowed as the number of macroblocks.
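The interaction of the CTX_SWITCH and CTX_MODE flags described above can be summarized as a small decision function. This is only a sketch of the stated flag semantics; the function name and return values are hypothetical.

```python
def context_switch_mode(ctx_switch, ctx_mode):
    """Map the CTX_SWITCH / CTX_MODE flag pair to the resulting behaviour."""
    if not ctx_switch:
        # Setting CTX_MODE alone does not affect the BP process.
        return None
    return "preemptive" if ctx_mode else "cooperative"
```

The function returns `None` whenever CTX_SWITCH is 0, which reflects the statement that CTX_MODE has no effect on its own.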

10.8.3 BP_STATUS Register Format

The BP_STATUS [31: 0] register (address 27'h 7C0_0050) is shown in Table 32.

Table 32

BP_STATUS Register Format

* input_buffer_0_done (IBUF0_DONE)-This flag indicates that the data in input buffer 0 has been exhausted by the BP. This flag is set by the BP and cleared by ARM7 or VP. This flag indicates an interrupt status.

* input_buffer_1_done (IBUF1_DONE)-This flag indicates that the data in input buffer 1 has been exhausted by the BP. This flag is set by the BP and cleared by ARM7 or VP. This flag indicates an interrupt status.

* output_buffer_0_full (OBUF0_FULL)-This flag indicates that output buffer 0 has been filled by the BP. The flag is set by the BP and cleared by ARM7 or VP. This flag indicates an interrupt status.

* output_buffer_1_full (OBUF1_FULL)-This flag indicates that output buffer 1 has been filled by the BP. The flag is set by the BP and cleared by ARM7 or VP. This flag indicates an interrupt status.

* BP_processing_done (BP_DONE)-This flag indicates that the BP has finished encoding a slice or GOB or, in decoding, has detected a start code other than a slice or GOB start code. This flag is set by the BP and cleared by ARM7 or VP. This flag indicates an interrupt status.

* context_switching_done (CTX_SW_DONE)-This flag indicates that the BP, in context switching mode, is ready to switch to another task. This flag is set by the BP and cleared by ARM7 or VP. This flag indicates an interrupt status.

* context_reload_done (CTX_RELOAD_DONE)-This flag indicates that the BP has completed the reload operation for the context stored at address SAVE_ADR [31: 0]. This flag is set by the BP and cleared by ARM7 or VP. This flag indicates an interrupt status.

* BP_error_flag (BP_ERR)-This flag indicates that an error has occurred in the BP while processing the data. This flag is set when BP_ERR_CODE [7: 0] (= BP_STATUS [31: 24]) is not zero. Details can be found in subsection 10.9.2.

* input_buffer_0_full (IBUF0_FULL)-This flag indicates that input buffer 0 has been filled by ARM7 or VP. This flag is set by ARM7 or VP and cleared by BP.

* input_buffer_1_full (IBUF1_FULL)-This flag indicates that input buffer 1 has been filled by ARM7 or VP. This flag is set by ARM7 or VP and cleared by BP.

* output_buffer_0_done (OBUF0_DONE)-This flag indicates that the data in output buffer 0 has been exhausted by ARM7 or VP. This flag is set by ARM7 or VP and cleared by BP.

* output_buffer_1_done (OBUF1_DONE)-This flag indicates that the data in output buffer 1 has been exhausted by ARM7 or VP. This flag is set by ARM7 or VP and cleared by BP.

* valid_bit_position (VALID_BIT_POS [2: 0])-This 3-bit field indicates the valid bit position within the data byte addressed by VALID_BYTE_ADR [31: 0] for the next procedure. In video encoding, the BP must set the value, and ARM7 must continue from this bit position. In video decoding, ARM7 must set the value, and the BP must start processing from this bit position.

* BP_error_code (BP_ERR_CODE [7: 0])-This 8-bit field indicates which error occurred in the BP. A zero value indicates that no error occurred. Details are given in subsection 10.9.2.
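Only two field positions of BP_STATUS are stated explicitly in the text: the interrupt conditions in bits [15: 0] and BP_ERR_CODE in bits [31: 24]. The sketch below extracts just those fields from a 32-bit status word; the positions of the remaining flags come from Table 32 and are not modelled here, and the function name and dictionary keys are hypothetical.

```python
def decode_bp_status(bp_status):
    """Extract the fields of BP_STATUS whose positions are given in the text."""
    err_code = (bp_status >> 24) & 0xFF  # BP_ERR_CODE[7:0] = BP_STATUS[31:24]
    return {
        "interrupt_bits": bp_status & 0xFFFF,  # BP_STATUS[15:0]
        "err_code": err_code,
        "bp_err": err_code != 0,  # BP_ERR is set when the code is nonzero
    }
```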

10.8.4 Input Data Format for Decoding and Output Data Format for Encoding

In this case, the data is essentially a compressed bitstream. Such data includes start codes, header variables and data compressed according to the corresponding standard. This bitstream is packed byte by byte, but in some applications it need not be byte-aligned. This bitstream contains data for various slices or GOBs.

10.8.5 Input data format for encoding and output data format for decoding

In this case, the data substantially consists of macroblock header information, motion data and pixel coefficient data. This kind of data format is defined below.

10.8.5.1 Macroblock Header Word

The macroblock header always consists of 6 bytes and has the data format given in Table 33.

Table 33

Macroblock Header Word Format

Here, the variables shown in the table are defined as follows.

* vertical_macroblock_address (VMA) or group_number (GRN0)-This byte indicates the vertical macroblock position with a value from 1 to 255. The first vertical position is numbered 1, not 0. As an exception, in H.261 encoding this field carries the group_number information indicating the position of the group of blocks.

* horizontal_macroblock_address (HMA) or macroblock_position (MBPS)-This field indicates the horizontal macroblock position with a value from 1 to 255. The first horizontal position is numbered 1, not 0. As an exception, in H.261 encoding this field indicates one of the 33 possible positions of the macroblock in the GOB.

* macroblock_intra (I)-If the current macroblock is intra coded, it is set to 1, otherwise it is set to 0.

* macroblock_pattern (P)-If the current macroblock contains a coded block, it is set to 1, otherwise it is set to 0.

* macroblock_quant (Q)-If the current macroblock has a new quantizer scale value, it is set to 1, otherwise it is set to 0.

* macroblock_motion_forward (MF)-If the current macroblock uses forward prediction, it is set to 1, otherwise it is set to 0.

* macroblock_motion_backward (MB)-If the current macroblock uses backward prediction or contains B-blocks in H.263, it is set to 1, otherwise it is set to 0.

* dct_type (DT), loop_filter (LF), or advanced_prediction (M4)-Bit [5] of byte 2 has a different meaning in each standard. It is not used in MPEG-1. In MPEG-2, it means dct_type: if the macroblock is field DCT coded, this flag is set to one, and if it is frame DCT coded, it should be set to zero. In H.261, this flag is set to one if the loop filter is used in the current macroblock; otherwise, it is set to zero. In H.263, it is set to one if the current macroblock uses the advanced prediction mode; otherwise it is set to zero.

* motion_type (MT)-This 2-bit field indicates the frame_motion_type or field_motion_type used in MPEG-2, as given in Tables 34 and 35.

Table 34

meaning of frame_motion_type

Table 35

meaning of field_motion_type

quantizer_scale (Q_SCALE) - An unsigned integer in the range 1 to 31 used to scale the reconstruction level of the DCT coefficient levels. Every macroblock header must contain an appropriate value for this variable, even if the value is the same as that of the previous macroblock (i.e., macroblock_quant is zero). In encoding, the user is responsible for writing the appropriate value in this field. In decoding, the BP writes the Huffman decoded quantizer scale value in this field. If the current macroblock does not contain a Huffman code for this field, the BP uses the scale value of the previous macroblock.

coded_block_pattern_0 (CBP_0) - A 6-bit code indicating which blocks are coded in the current macroblock, where

CBP_0 [5] ⇒ luminance (Y) block 0

CBP_0 [4] ⇒ luminance (Y) block 1

CBP_0 [3] ⇒ luminance (Y) block 2

CBP_0 [2] ⇒ luminance (Y) block 3

CBP_0 [1] ⇒ chrominance blue (Cb) block

CBP_0 [0] ⇒ chrominance red (Cr) block
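
The bit assignment above can be illustrated with a short Python sketch. The function name `coded_blocks` and the block labels are hypothetical names chosen for this illustration; only the bit-to-block mapping comes from the text.

```python
# Bit assignment taken from the CBP_0 list above:
# bit 5 = Y block 0, bit 4 = Y block 1, bit 3 = Y block 2,
# bit 2 = Y block 3, bit 1 = Cb block, bit 0 = Cr block.
BLOCK_NAMES = ["Y0", "Y1", "Y2", "Y3", "Cb", "Cr"]  # labels are illustrative


def coded_blocks(cbp):
    """Return the labels of the blocks flagged as coded in a 6-bit pattern."""
    if not 0 <= cbp <= 0x3F:
        raise ValueError("coded_block_pattern must fit in 6 bits")
    return [name for i, name in enumerate(BLOCK_NAMES) if cbp & (1 << (5 - i))]
```

For example, a pattern of 0b100001 marks luminance block 0 and the Cr block as coded, and all other blocks as non-coded.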

coded_block_pattern_1 (CBP_1) - Additional coded_block_pattern for the B-blocks of PB frames in H.263, where

CBP_1 [5] ⇒ luminance (Y) block 0

CBP_1 [4] ⇒ luminance (Y) block 1

CBP_1 [3] ⇒ luminance (Y) block 2

CBP_1 [2] ⇒ luminance (Y) block 3

CBP_1 [1] ⇒ chrominance blue (Cb) block

CBP_1 [0] ⇒ chrominance red (Cr) block

logical_channel_indicator (LCI) - 2-bit logical channel information for the GOB, used only in the continuous presence multipoint mode of H.263.

frame_id (FID) - 2-bit GOB frame ID information for H.263.

macroblock_address_indicator (MBA_INC) - Two-byte information indicating the increment of the current macroblock address. This information is always provided by the BP as additional information, so the user does not need to set it in the input format. Any value specified in the input macroblock header word is ignored by the BP.

previous_dc_luminance (PRE_DC_Y) - Two-byte information holding the dc value of the luminance block in the previous macroblock. If the macroblock is skipped, the reset value is transmitted. This information is always provided by the BP as additional information, so the user does not need to set it in the input format. Any value specified in the input macroblock header word is ignored by the BP.

previous_dc_chrominance_blue (PRE_DC_Cb) - Two-byte information holding the dc value of the chrominance blue block in the previous macroblock. If the macroblock is skipped, the reset value is transmitted. This information is always provided by the BP as additional information, so the user does not need to set it in the input format. Any value specified in the input macroblock header word is ignored by the BP.

previous_dc_chrominance_red (PRE_DC_Cr) - Two-byte information holding the dc value of the chrominance red block in the previous macroblock. If the macroblock is skipped, the reset value is transmitted. This information is always provided by the BP as additional information, so the user does not need to set it in the input format. Any value specified in the input macroblock header word is ignored by the BP.

10.8.5.2 Motion Data Words

Each macroblock header is followed by additional motion words if the macroblock contains a motion vector. Consider first the case of MPEG-1 and MPEG-2. For these standards, an additional header word in the format shown in Table 36 is present for the motion vectors when either of the following conditions holds.

Condition 1) When MF = 1 or (I = 1 and CMV = 1)

Condition 2) When MB = 1

Table 36

Common motion vector data formats for MPEG-1 and MPEG-2

In Table 36, the values of all elements have half-pel precision. FS0, FS1, FS2 and FS3 are 1-bit flags that identify the field selection for each motion vector. If no field is selected, the flag should be set to zero. Because MPEG-1 does not use field selection information, these flags are set to zero for MPEG-1.

One exceptional case occurs in MPEG-2 encoding of dual prime motion vectors. In this case, the forward motion vector consists of 16 bytes (of which only 8 bytes are actually used), and the format is as in Table 37. Normally, the BP converts motion vector values to differential values in video encoding applications. However, the motion vector components in Table 37 are difference values that are input directly to the Huffman encoder. For MPEG-2 decoding applications, the dual prime motion vectors are handled by the BP.

Table 37

Motion vector data format in dual prime mode for MPEG-2 encoding

H.261 and H.263 use a different motion vector data format. In every case, one byte can represent any value of a motion vector component. Depending on the contents of the MF and M4 flags, a motion compensated macroblock has at least two and at most ten motion vector components. The format of the motion vector data is shown in Table 38.

Table 38

Motion vector data format for H.261 and H.263

10.8.5.3 Pixel Count Data Word

Four video compression standards have different maximum pixel bit lengths for quantization levels. This comparison is shown in Table 39.

Table 39

I / O pixel bit resolution

Therefore, the pixel data formats for MPEG and video conferencing standards differ, as can be seen in Table 40.

Table 40

Pixel Count Data Format

10.9 Interrupt Condition

The BP interrupts ARM7 by asserting the ARM7_IRQ signal when one of the interrupt conditions described in this section is met. The BP has two sets of interrupt conditions: default conditions and error conditions. These conditions are stored in BP_STATUS [15: 0]. If any of these bits is set by the BP, the ARM7_IRQ signal is activated. Each condition can be masked by setting the corresponding bit in the BP_INT_MASK [15: 0] register.
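
The relationship between the status bits, the mask bits and the interrupt request can be sketched in Python as follows. This is only an illustration under the reading that a mask bit of 1 suppresses the corresponding condition; the function name `arm7_irq` is hypothetical.

```python
def arm7_irq(bp_status, bp_int_mask):
    """Return True if any unmasked condition bit in BP_STATUS[15:0] is set.

    Assumption: a 1 in BP_INT_MASK suppresses the matching BP_STATUS bit,
    following "masked by setting the corresponding bits".
    """
    pending = bp_status & ~bp_int_mask & 0xFFFF
    return pending != 0
```

With this model, a status of 0b10010 and a mask of 0b00010 still raises the interrupt, because bit 4 remains unmasked.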

10.9.1 Default Interrupt Condition

Default condition 0 (BP_STATUS [0]) - When processing of input buffer 0 is finished, the BP sets the IBUF0_DONE flag and asserts ARM7_IRQ.

Default condition 1 (BP_STATUS [1]) - When processing of input buffer 1 is finished, the BP sets the IBUF1_DONE flag and asserts ARM7_IRQ.

Default condition 2 (BP_STATUS [2]) - When output buffer 0 is full, the BP sets the OBUF0_FULL flag and asserts ARM7_IRQ.

Default condition 3 (BP_STATUS [3]) - When output buffer 1 is full, the BP sets the OBUF1_FULL flag and asserts ARM7_IRQ.

Default condition 4 (BP_STATUS [4]) - In video encoding, when the BP finishes the slice or GOB designated by ARM7, or, in video decoding, when the BP reaches a start code that is not a slice or GOB start code, the BP sets the BP_DONE flag and asserts ARM7_IRQ.

Default condition 5 (BP_STATUS [5]) - When a context save operation completes in preemptive context switching mode, or when the current slice or GOB is finished in cooperative context switching mode, the BP sets the CTX_SW_DONE flag and asserts ARM7_IRQ.

Default condition 6 (BP_STATUS [6]) - When a context reload completes, the BP sets the CTX_RELOAD_DONE flag and asserts ARM7_IRQ.

Default condition 7 (BP_STATUS [7]) - BP_STATUS [7] is currently reserved, so this bit should be set to 0.

Normally, it is recommended that these default interrupt conditions be masked using BP_INT_MASK [7: 0]. However, in some operations the user may want to mask default condition 1.

10.9.2 Error Interrupt Condition

If an error occurs in the BP, the BP sets the BP_ERR flag to request an ARM7 interrupt. At the same time, the BP writes the appropriate non-zero value into the BP_ERR_CODE field of the BP_STATUS register. This 8-bit BP_ERR_CODE has the following meanings.

* BP_ERR_CODE = 8'b0000_0000; No error occurred

* BP_ERR_CODE = 8'b0000_0001; Improper setting in the BP_MODE register

* BP_ERR_CODE = 8'b0000_0010; Improper horizontal macroblock position

* BP_ERR_CODE = 8'b0000_0011; Improper vertical macroblock position

* BP_ERR_CODE = 8'b0000_0100; Invalid VLC for macroblock address increment

* BP_ERR_CODE = 8'b0000_0101; Invalid VLC for macroblock type

* BP_ERR_CODE = 8'b0000_0110; Invalid VLC for macroblock motion code

* BP_ERR_CODE = 8'b0000_0111; Improper concealment motion vector marker bit

* BP_ERR_CODE = 8'b0000_1000; Invalid VLC for coded block pattern

* BP_ERR_CODE = 8'b0000_1001; Invalid VLC for block DCT dc size

* BP_ERR_CODE = 8'b0000_1010; Improper DCT dc value

* BP_ERR_CODE = 8'b0000_1011; Invalid VLC for block DCT ac coefficients

* BP_ERR_CODE = 8'b0000_1100; Number of blocks in one macroblock exceeds 64

* BP_ERR_CODE = 8'b0000_1101; Improper f_CODE value (e.g. the value is 0)

* BP_ERR_CODE = 8'b0000_1110; Invalid VLC for block DCT ac coefficients

* BP_ERR_CODE = 8'b0000_1111; Improper IBUF and OBUF address setting

* BP_ERR_CODE = 8'b0001_0000; The least significant 4 bits of the start address for a BP I / O buffer are not zero

Other BP_ERR_CODE values are reserved.
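
As an illustration of how ARM7 software might report these codes, the following Python sketch maps codes 0x00 through 0x0F to short descriptions paraphrased from the list above. The names `BP_ERR_DESCRIPTIONS` and `describe_bp_error` are hypothetical, and the position of the BP_ERR_CODE field inside BP_STATUS is not modeled, so the function takes the already extracted 8-bit code.

```python
# Descriptions paraphrased from the error code list; codes 0x00 to 0x0F only.
BP_ERR_DESCRIPTIONS = {
    0x00: "no error",
    0x01: "improper setting in the BP_MODE register",
    0x02: "improper horizontal macroblock position",
    0x03: "improper vertical macroblock position",
    0x04: "invalid VLC for macroblock address increment",
    0x05: "invalid VLC for macroblock type",
    0x06: "invalid VLC for macroblock motion code",
    0x07: "improper motion vector marker bit",
    0x08: "invalid VLC for coded block pattern",
    0x09: "invalid VLC for block DCT dc size",
    0x0A: "improper DCT dc value",
    0x0B: "invalid VLC for block DCT ac coefficients",
    0x0C: "number of blocks in one macroblock exceeds 64",
    0x0D: "improper f_CODE value",
    0x0E: "invalid VLC for block DCT ac coefficients",
    0x0F: "improper IBUF and OBUF address setting",
}


def describe_bp_error(code):
    """Return a description for an 8-bit BP_ERR_CODE value."""
    if not 0 <= code <= 0xFF:
        raise ValueError("BP_ERR_CODE is an 8-bit field")
    # Codes outside this sketch's table are simply reported as unlisted.
    return BP_ERR_DESCRIPTIONS.get(code, "code not listed in this sketch")
```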

10.10 Detailed Functionality Requirements

10.10.1 IOBUS Interface

All data movement between the BP and the CCU is via IOBUS. The IOBUS is a 32-bit @ 40 MHz synchronous bus containing multiplexed addresses and data. Since at least 7 cycles are required to transfer 16 bytes of data over the IOBUS, the maximum transfer rate of the IOBUS is 91.4 Mbytes / sec (= 731.4 Mbits / sec).
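
The quoted peak rate follows directly from the three numbers in the paragraph above, as this small Python calculation shows. The constant and function names are chosen for the illustration only.

```python
BUS_CLOCK_HZ = 40_000_000      # 40 MHz IOBUS clock
BYTES_PER_TRANSFER = 16        # one transfer moves 16 bytes
MIN_CYCLES_PER_TRANSFER = 7    # at least 7 bus cycles per transfer


def iobus_peak_rate():
    """Return the peak IOBUS rate as (Mbytes per second, Mbits per second)."""
    bytes_per_sec = BYTES_PER_TRANSFER * BUS_CLOCK_HZ / MIN_CYCLES_PER_TRANSFER
    return bytes_per_sec / 1e6, bytes_per_sec * 8 / 1e6
```

Rounded to one decimal place, the two results are 91.4 Mbytes / sec and 731.4 Mbits / sec.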

The BP can be a master or a slave for all IOBUS read and write transfers. When the BP operates as a master, it must send a request signal to the IOBUS arbiter. If the IOBUS is free, the arbiter grants the bus to the BP and sends a device select signal.

The data transferred over the IOBUS falls into one of the following three categories: 32-bit pixel data containing two or four pixel elements, 32-bit compressed bit stream words, and syntax / control variables for encoding and decoding operations. For further information, such as timing charts for the IOBUS interface, the user is advised to review the MSP IOBUS specification.

10.10.2 Block Layer Process

10.10.2.1 Zig-zag scan specification

The BP supports the two zig-zag scan conversion matrices presented in the MPEG video standard. Each 8 * 8 block of data transmitted between the VP and the BP contains 64 components.
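
As an illustration of scan conversion, the following Python sketch generates the conventional zig-zag order for an 8 * 8 block and applies it in both directions. It covers only the classic zig-zag pattern; the alternate scan of MPEG-2 is table driven and is not reproduced here. All function names are hypothetical.

```python
def zigzag_order(n=8):
    """Return raster indices of an n * n block in conventional zig-zag order."""
    order = []
    for s in range(2 * n - 1):                 # s = row + column of a diagonal
        rows = range(max(0, s - n + 1), min(s, n - 1) + 1)
        if s % 2 == 0:
            rows = reversed(rows)              # even diagonals run up and right
        for r in rows:
            order.append(r * n + (s - r))
    return order


def zigzag_scan(block):
    """Reorder a 64-element raster block into zig-zag order."""
    return [block[i] for i in zigzag_order()]


def inverse_zigzag_scan(scanned):
    """Restore raster order from a 64-element zig-zag sequence."""
    block = [0] * 64
    for pos, idx in enumerate(zigzag_order()):
        block[idx] = scanned[pos]
    return block
```

The scan always visits all 64 positions, which matches the statement that every block exchanged between the VP and the BP carries 64 components.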

10.10.2.2 RLC code

For RLC decoding, the BP generates zero and level data according to the Huffman decoding result for the DCT ac coefficients. If an end_of_block code is detected before 64 pixel data have been produced for one 8 * 8 block, the RLC decoder produces the remaining zero data. For RLC encoding, the BP counts consecutive zero data and combines the count with the next non-zero data to generate run-length and level codes. If all of the remaining data is equal to zero, an end_of_block is generated instead of an RLC for the remaining data. The number of operating cycles for the RLC codec therefore depends on the number of zeros generated in this way.
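
The run-length behaviour described above can be sketched in Python as follows. This is a simplified model, not the BP hardware: it works on a plain list of coefficients, uses a string marker for end_of_block, and always terminates a block with that marker. The names `EOB`, `rlc_encode` and `rlc_decode` are hypothetical.

```python
EOB = "EOB"  # illustrative end_of_block marker


def rlc_encode(coeffs):
    """Turn a coefficient list into (run, level) pairs followed by EOB."""
    symbols = []
    run = 0
    for value in coeffs:
        if value == 0:
            run += 1                      # count consecutive zeros
        else:
            symbols.append((run, value))  # zero run plus the next non-zero level
            run = 0
    symbols.append(EOB)                   # trailing zeros collapse into EOB
    return symbols


def rlc_decode(symbols, size=64):
    """Expand (run, level) pairs and pad with zeros up to the block size."""
    out = []
    for sym in symbols:
        if sym == EOB:
            break
        run, level = sym
        out.extend([0] * run)
        out.append(level)
    if len(out) > size:
        raise ValueError("more than %d coefficients in one block" % size)
    out.extend([0] * (size - len(out)))   # remaining zero data after EOB
    return out
```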

10.10.2.3 Huffman Code

The BP Huffman codec supports all Huffman tables specified in the MPEG-1, MPEG-2, H.261 and H.263 video standards. With ROM words of 12 bits, most of the tables can be implemented as look-up tables. However, any Huffman table that is either very simple or very complex can be implemented using hardwired logic. The decoder tables that are implemented using the look-up table ROM are summarized in Table 41.

Table 41

Required ROM Size for Huffman Decoder Lookup Table

The contents of the encoder table requiring a larger ROM size than the decoder table are summarized in Table 42.

Table 42

ROM Size Required for Huffman Encoder Lookup Tables

From Tables 41 and 42, the total ROM size required for the Huffman encoder and decoder is 768 * 12 bits. The tables do not include the stuffing code, the escape_code, the sign bits of the DCT coefficients, and the end_of_block code, which are handled by the state machine.

The operating cycles for each Huffman code are listed in Table 43.

Table 43

Processing Cycles for Huffman Codes

Finally, note that the JPEG coding tables are not implemented in the scheme described above. However, a JPEG encoding application may use the dc_coeff_next_0 table.

10.10.2.4 Differential dc values

In the case of an intra block, the BP calculates the differential dc coefficient for the first element of the 8 * 8 block data when encoding, and reconstructs the dc value from the transmitted differential dc coefficient when decoding.
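
A minimal Python sketch of differential dc coding is given below. It assumes a single predictor that starts from a reset value and is updated with each block's dc value; the actual reset value and the per-component predictors depend on the standard and are not taken from this document, so `reset_value` is a hypothetical parameter.

```python
def dc_to_differential(dc_values, reset_value):
    """Encoder side: replace each dc value by its difference from the predictor."""
    predictor = reset_value            # assumed starting value of the predictor
    diffs = []
    for dc in dc_values:
        diffs.append(dc - predictor)
        predictor = dc                 # the current dc becomes the next predictor
    return diffs


def differential_to_dc(diffs, reset_value):
    """Decoder side: rebuild the dc values from the transmitted differences."""
    predictor = reset_value
    dc_values = []
    for diff in diffs:
        predictor += diff
        dc_values.append(predictor)
    return dc_values
```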

10.10.2.5 Non-coded Blocks

The BP does not process non-coded blocks; these blocks are handled by the VP and ARM7. So that the VP and ARM7 can process this kind of block, the BP indicates the non-coded blocks in the coded_block_pattern of the macroblock header word.

10.10.2.6 Sequence of block transmissions

The order of blocks in one macroblock sent for encoding and decoding is as follows; Luminance (Y) blocks 0, 1, 2 and 3, color blue (Cb) and color red (Cr) blocks.

10.10.3 Macroblock Layer Process

10.10.3.1 Differential motion vector

The BP calculates the differential motion vector from the motion estimation result when encoding and reconstructs the motion vector from the transmitted differential motion vector when decoding, with the following exceptions.

The first case is the dual prime mode for MPEG-2 video encoding. In this case, the motion vector transmitted to the BP is vector' [0] [0] [1: 0], not vector' [r] [0] [1: 0] (see section 7.6.3.6 of the MPEG-2 Video Standard).

The second case is the advanced prediction mode of H.263. In this case there are four motion vectors, and their values should be transferred to / from the BP as difference values.

10.10.3.2 Skip Macroblocks

The BP does not process skipped macroblocks; these macroblocks are handled by the VP and ARM7. So that the VP and ARM7 can process skipped macroblocks, the BP writes the horizontal and vertical macroblock addresses in the macroblock header word.

10.10.3.3 Macroblock Stuffing Code

In MPEG-1, if a macroblock stuffing code occurs, the BP discards it in one cycle. In MPEG encoding, however, the BP does not allow the user to include macroblock stuffing codes in the macroblock layer header. In general, this stuffing code is used to control the output video rate buffer. Therefore, instead of inserting the macroblock stuffing code, it is recommended to insert zero stuffing bits between the start codes.

For MPEG-1 and MPEG-2 applications, the bit stream output must be byte-aligned down to the slice layer. For H.263 applications, the bit stream output is byte-aligned at the picture layer and also at the GOB layer. The output of the H.261 encoder, however, is not byte-aligned. The bitstream-forming routine in ARM7 must therefore be programmed to account for this difference. If the amount of data for the last data transfer over the IOBUS is 16 bits or less in encoding, the BP automatically performs a zero-fill operation at the end of the slice.

10.10.4.2 Extra Slice Information

In decoding, the BP discards any extra slice information contained in the slice header of the MPEG-1 or MPEG-2 bitstream. In encoding, the BP does not insert any extra slice information requested by the user. If the user still wants to include this information in the MPEG-1 or MPEG-2 bitstream, the user may insert it into the bitstream after it has been encoded by the BP.

10.10.4.3 Intra slice

In the MPEG-2 slice layer bitstream, a parameter called intra_slice is used to indicate that the current slice is composed only of intra macroblocks. This information is not used during decoding and is intended to assist DSM applications when performing fast forward or fast reverse functions. Therefore, the BP discards this information in decoding applications and inserts 0 into intra_slice in the slice layer header in encoding applications.

10.10.4.4 Slice or GOB Start Code

In MPEG-1, MPEG-2 and H.261, a picture has at least one slice or GOB start code. H.263 pictures, however, need not contain GOB start codes and header information; in particular, the first GOB in any H.263 picture has no start code or header information. Therefore, if the incoming bitstream is for H.263, the BP state machine must process the macroblock layer immediately. In addition, if a GOB start code is found while the bitstream is being decoded, the BP decodes the start code and continues processing without interrupting ARM7.

10.11 Input / Output Double Buffer Interface

10.11.1 General Description

The input and output buffers are implemented as double buffers. Therefore, as shown in Figs. 64 and 65, four memory buffers, IBUF0, IBUF1, OBUF0, and OBUF1, are used.

As shown in Figs. 64 and 65, each buffer has a start address, an end address, a full flag and a done flag. To determine the size of each buffer, the user must enter appropriate values for its start and end addresses.

In general, when the source processor for a buffer completes writing to one bank, it sets the full flag and starts writing to the other bank. When the sink processor for the buffer finds that the bank it accesses is full, it reads the data. When the bank is empty, the sink sets the done flag and checks the full flag of the other bank.
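
The handshake between source and sink can be modeled with a small Python class. This is a behavioural sketch only: the class and method names are hypothetical, each bank holds a single payload, and the real buffers are memory regions delimited by start and end addresses.

```python
class DoubleBuffer:
    """Two banks, each with a full flag and a done flag."""

    def __init__(self):
        self.data = [None, None]
        self.full = [False, False]
        self.done = [True, True]      # assumption: both banks start out free
        self.write_bank = 0
        self.read_bank = 0

    def source_write(self, payload):
        """Source fills the current bank, sets full, then moves to the other bank."""
        bank = self.write_bank
        if self.full[bank]:
            return False              # both banks busy: the source has to wait
        self.data[bank] = payload
        self.full[bank] = True
        self.done[bank] = False
        self.write_bank ^= 1
        return True

    def sink_read(self):
        """Sink drains a full bank, sets done, then checks the other bank."""
        bank = self.read_bank
        if not self.full[bank]:
            return None               # nothing to read yet
        payload = self.data[bank]
        self.full[bank] = False
        self.done[bank] = True
        self.read_bank ^= 1
        return payload
```

In this model the source stalls only when both banks are full, and the sink stalls only when both are empty, which is the situation discussed in section 10.11.2.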

The four start addresses are updated by the BP as described in section 10.7.02. Each start address register stores the last byte address accessed by the BP whenever the BP accesses the input or output buffer. ARM7 must therefore reset the corresponding start address whenever one of the IBUF0_DONE, IBUF1_DONE, OBUF0_FULL and OBUF1_FULL flags is set.

In addition, the last four bits of each start address must always be set to zero by ARM7. This is due to the internal data alignment structure between the FBUS, the CCU and the IOBUS. Each end address must also be set such that the total number of bytes in each buffer is a multiple of 16. Finally, the recommended minimum buffer size is 64 bytes for MPEG-1 and MPEG-2 and 128 bytes for H.261 and H.263. This prevents performance degradation caused by frequent BP interrupts to ARM7.
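
These constraints can be checked in software before the buffers are programmed. The Python sketch below is one possible check; it assumes that the buffer size equals the end address minus the start address, which the text does not state explicitly, and the names `MIN_BUFFER_BYTES` and `check_buffer` are hypothetical.

```python
# Recommended minimum buffer sizes in bytes, as stated in the text.
MIN_BUFFER_BYTES = {"MPEG-1": 64, "MPEG-2": 64, "H.261": 128, "H.263": 128}


def check_buffer(start, end, standard):
    """Return a list of constraint violations for one BP buffer (empty if none)."""
    problems = []
    if start & 0xF:
        problems.append("start address is not 16-byte aligned")
    size = end - start                 # assumption: end address is exclusive
    if size <= 0 or size % 16:
        problems.append("buffer size is not a positive multiple of 16 bytes")
    if size < MIN_BUFFER_BYTES[standard]:
        problems.append("buffer is smaller than the recommended minimum")
    return problems
```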

10.11.2 Handling Abnormal Buffer Status

When both output buffers are full, the BP stops processing and enters the idle state regardless of the status of the input double buffer. When the OBUF0_DONE or OBUF1_DONE flag is set, the BP automatically leaves this idle state.

When both input buffers are empty, the BP does not need to stop immediately and continues processing until the data remaining inside it has been consumed. However, as soon as both input buffers are empty, the BP interrupts ARM7. If the input buffers are still empty after the remaining data has been processed, the BP enters the idle state. When the IBUF0_FULL or IBUF1_FULL flag is set, the BP automatically leaves this state.

The idle states described in this section are different from the other idle states described in this specification. This is because the control command of ARM7 is normally required to escape from other idle states.

10.11.3 Physical Implementation of the I / O Buffer: Example

In most cases, it is up to the user to determine the location and size of the BP input and output buffers. The user can implement the buffers in the scratch pad area of the VP data cache, in the ARM7 data cache, or in SDRAM. Although the implementation of the BP input and output double buffers is somewhat restricted, there is an efficient way to implement such buffers.

Here, a special example of the implementation of the rate buffer in video decoding applications will be given. In this case, the user wants to implement the BP input buffer as a circular buffer. Here, SDRAM is used, and a complete rate buffer is assumed to be divided into four blocks as shown in FIG.

Initially, the user can set Rate_Buffer_Block_0 and Rate_Buffer_Block_1 to IBUF0 and IBUF1, respectively. This is made possible by setting as follows.

IBUF0_START = Rate_Buffer_Address_0;

IBUF0_END = Rate_Buffer_Address_1;

IBUF1_START = Rate_Buffer_Address_2;

IBUF1_END = Rate_Buffer_Address_3.

When all the data in IBUF0 (that is, the data in Rate_Buffer_Block_0) has been used by the BP, the BP interrupts ARM7. ARM7 then assigns Rate_Buffer_Block_2 to IBUF0 by setting the following.

IBUF0_START = Rate_Buffer_Address_4;

IBUF0_END = Rate_Buffer_Address_5.

When the data in IBUF1 has been used up by the BP, the BP interrupts ARM7. ARM7 then assigns Rate_Buffer_Block_3 to IBUF1 by setting the following.

IBUF1_START = Rate_Buffer_Address_6;

IBUF1_END = Rate_Buffer_Address_7.

When all the data in Rate_Buffer_Block_2 has been used by the BP, ARM7 assigns Rate_Buffer_Block_0 back to IBUF0 by setting the addresses as in the first step.

Thus, a circular buffer can be implemented by simply repeating this complete process. This example shows that the use of the BP double buffer is very flexible according to the user's intention.
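
The rotation of rate buffer blocks through the two input buffers can be sketched in Python as follows. The class name `RateBufferRing` and its method are hypothetical, and each block is represented simply by a (start, end) pair; in the usage below the numbers stand for the indices of the Rate_Buffer_Address values in the example.

```python
class RateBufferRing:
    """Rotate a list of rate buffer blocks through IBUF0 and IBUF1."""

    def __init__(self, blocks):
        if len(blocks) < 2:
            raise ValueError("at least two blocks are needed")
        self.blocks = blocks                   # list of (start, end) pairs
        self.ibuf = [blocks[0], blocks[1]]     # initial IBUF0 and IBUF1 settings
        self.next_block = 2 % len(blocks)      # next block to hand to the BP

    def on_ibuf_done(self, n):
        """Called when IBUFn has been consumed; returns its new (start, end)."""
        self.ibuf[n] = self.blocks[self.next_block]
        self.next_block = (self.next_block + 1) % len(self.blocks)
        return self.ibuf[n]
```

With four blocks, the sequence of assignments is block 2 to IBUF0, block 3 to IBUF1, then block 0 to IBUF0 again, which reproduces the steps of the example.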

10.12 Context switching

If more than one application runs on the MSP, the ARM7 operating system instructs the BP to suspend the current task and switch to another task. This process is commonly referred to as context switching. The BP supports the two context switching modes described below.

10.12.1 Preemptive Context Switching

Preemptive context switching means that the BP completes processing of the current 8 * 8 pixel block and then stops normal processing. ARM7 commands the preemptive context switching mode by setting the CTX_SWITCH and CTX_MODE flags in the BP_CONTROL [6: 5] register to 11. Once the current block processing is complete, the BP saves its internal context to the external SDRAM for later processing.

When the BP completes saving the context, it interrupts ARM7 by setting the CTX_SW_DONE flag located at BP_STATUS [5]. ARM7 then saves all of the contents of the BP's I / O buffer and initializes the BP for other work.

This mode allows the BP to respond as quickly as possible to a context switching request from ARM7. In the worst case, the BP needs about 150 cycles (= 3.75 usec) to complete the current block processing. In normal cases, however, it is reasonable to assume that a few dozen cycles are required to complete the block processing.

10.12.2 Cooperative Context Switching

Cooperative context switching eliminates the context save process in the BP. This is possible because all BP internal states must be initialized anyway at the start of every slice or GOB layer. In this mode, the BP continues normal processing of the current slice or GOB until it is complete.

ARM7 commands the cooperative context switching mode by setting the CTX_SWITCH and CTX_MODE flags in the BP_CONTROL [6: 5] register to 10. When processing of the current slice or GOB is complete, the BP interrupts ARM7 by setting the CTX_SW_DONE flag located at BP_STATUS [5]. ARM7 then saves all of the contents of the BP's I / O buffers and initializes the BP for other work.

10.12.3 Reload Context

To switch back to the previous task, the BP reloads the context stored in SDRAM from address SAVE_ADR [31: 0]. The BP must be in an idle state for this context reload to be requested. This is the case when BP_DONE is set, when CTX_DONE is set, or when ARM7 has reset the BP by software. When ARM7 then sets the CTX_RELOAD flag in BP_CONTROL [7], the BP leaves the idle state and starts reading the stored context.

After the BP completes the context reload operation, it interrupts ARM7 by setting the CTX_RELOAD_DONE flag. ARM7 then initializes the internal registers of the BP and enables the BP for processing the previous job.
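
The control bits used in sections 10.12.1 to 10.12.3 can be summarized in a short Python sketch. The field values 11 and 10 in BP_CONTROL [6: 5] and the CTX_RELOAD flag in BP_CONTROL [7] come from the text; the assumption that CTX_SWITCH is bit 6 and CTX_MODE is bit 5, and the function names, are illustrative only.

```python
CTX_FIELD_SHIFT = 5          # BP_CONTROL[6:5] holds CTX_SWITCH and CTX_MODE
CTX_FIELD_MASK = 0b11 << CTX_FIELD_SHIFT
CTX_RELOAD_BIT = 1 << 7      # BP_CONTROL[7]


def request_context_switch(bp_control, preemptive):
    """Return BP_CONTROL with the context switch field set to 11 or 10."""
    field = 0b11 if preemptive else 0b10   # 11 = preemptive, 10 = cooperative
    return (bp_control & ~CTX_FIELD_MASK) | (field << CTX_FIELD_SHIFT)


def request_context_reload(bp_control):
    """Return BP_CONTROL with the CTX_RELOAD flag set."""
    return bp_control | CTX_RELOAD_BIT
```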

10.13 Task Handshaking

This section covers the detailed task handshaking process when the BP has finished processing. Here, updating the pointer to the last data means that the BP has written appropriate values in VALID_BYTE_ADR [31: 0] and VALID_BIT_POS [2: 0], respectively.

10.13.1 For Encoding

In steady state, input data for encoding is supplied by the VP. When one of the input double buffers has been filled by the VP, the BP starts reading data over the IOBUS. At the end of processing (i.e., when the number of macroblocks processed equals the number of macroblocks specified by ARM7), the BP sets the BP_DONE flag to interrupt ARM7 and enters the idle state.

The pointer to the valid data indicates the end of the compressed bit stream for the slice or GOB. VALID_BYTE_ADR [31: 0] indicates a position in one of the output double buffers.

ARM7 combines the compressed bit stream with the higher layer headers to form the final bit stream and repeats the process. If ARM7 wants to restart the BP before the data in the output double buffer has been completely consumed, it can leave the pointer as is, because at least one output buffer has been emptied and the pointer is updated by the BP when the BP is resumed.

10.13.2 For decoding

First, ARM7 looks for a slice or GOB start code (if present). When the start code is found, ARM7 initializes and enables the BP. After the BP has performed Huffman decoding, RLC decoding and inverse zig-zag scan conversion, the data is transferred to the output buffer for VP processing. The BP continues this processing routine until a non-slice or non-GOB start code is detected. When such a start code is detected, the BP sets the pointer to the last data used, which is the end of the non-slice or non-GOB start code, and interrupts ARM7. Next, ARM7 decodes the start code and performs header parsing until the next slice or GOB start code is found.

10.13.3 Errors Found in Compressed Bit Streams

In video telephony applications, where the data is actually transmitted over telephone lines and public switched networks, it is very likely that some invalid data will be included in the incoming bit stream. In this case, the BP interrupts ARM7 and checks the ERR_HANDLE_MODE flag. The user should determine the error handling mode for a particular application before the BP is enabled.

If the ERR_HANDLE_MODE flag is set to 1, the BP automatically searches for the next start code. If the start code is for a slice or GOB, the BP continues normal processing. This mode is very efficient because the BP can find the start code faster than ARM7, and ARM7 can perform other processing routines while the BP searches for the next start code. However, if a start code for a layer other than the slice or GOB layer is found, the BP sets the BP_DONE flag to interrupt ARM7 again and enters the idle state. In this case, the pointer to the last data used must point to the end of that start code.

If the ERR_HANDLE_MODE flag is set to 0, the BP enters the idle state without searching for the next start code. In this case, the pointer to the last data used should indicate where the error was found. This mode is useful for debugging a corrupted bit stream using ARM7 instructions.
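
The two error handling modes can be modeled in Python for the MPEG-1 / MPEG-2 case, where start codes are byte-aligned, begin with the prefix 0x000001, and slice start codes use the values 0x01 to 0xAF. The sketch does not cover H.261 or H.263 start codes. The function names, the returned state labels, and the behaviour when no start code is found are assumptions made for the illustration.

```python
START_CODE_PREFIX = b"\x00\x00\x01"   # MPEG-1 / MPEG-2 start code prefix


def find_start_code(data, pos=0):
    """Return (index, code value) of the next start code at or after pos, or None."""
    i = data.find(START_CODE_PREFIX, pos)
    if i < 0 or i + 3 >= len(data):
        return None
    return i, data[i + 3]


def is_slice_start_code(code):
    """MPEG-1 / MPEG-2 slice start codes use the values 0x01 to 0xAF."""
    return 0x01 <= code <= 0xAF


def handle_bitstream_error(data, error_pos, err_handle_mode):
    """Return (state, pointer) after an error at byte offset error_pos."""
    if err_handle_mode == 0:
        return "idle", error_pos              # pointer marks where the error was found
    found = find_start_code(data, error_pos)
    if found is None:
        return "searching", len(data)         # assumption: keep looking in later data
    index, code = found
    end_of_start_code = index + 4
    if is_slice_start_code(code):
        return "continue", end_of_start_code  # resume normal slice processing
    return "done", end_of_start_code          # BP_DONE: hand control back to ARM7
```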

Appendix B

MPC Bitstream Processor

The bit stream processor (BP) is one of the MSP processing cores and is important for video data encoding and decoding applications. The BP handles MPEG slice layer encoding and decoding, and H.261 / H.263 group of blocks (GOB) layer encoding and decoding. In decoding applications, the BP provides the vector processor and the ARM7 core with the full information contained in each macroblock.

The bit stream processor hardware is divided into the following functional blocks.

* IOBUS port interface with I / O control and decoding unit

* BP control state machine

* Codec core including the BP register multiplexer, registers, arithmetic logic unit (ALU) and multiplexer, and FIFO control unit

* VLC FIFO unit

* VLC codec with lookup ROM and codec address generator

A description of the VLC LUT ROM 340 (see FIG. 3) is as follows.

1.0 Methodology

The lookup table unit is the heart of Huffman encoding and decoding. The unit supports all VLC tables that are included in the MPEG-1, MPEG-2, H.261 and H.263 specifications and supported by the Samsung MSP. Most of these tables are implemented with a ROM that is 12 bits wide. However, if the lookup process is too simple or the table does not fit the size of the ROM table, special encoding and decoding is applied. All four specifications include many variable length codes of up to 17 bits at this layer. In addition to the encoded or decoded values, the code size and a valid code indicator are provided to ensure that encoding and decoding proceed correctly. If conventional methods were used to encode or decode the VLC tables, the ROM table and the address generator would be very large.

1.1 The implementation is as follows:

* Share the ROM table as much as possible, as long as designing the address generator does not become difficult.

* Reorder the VLC tables based on encoding or decoding.

* Decode the '0' count or the '1' count of the Huffman code first.

* Reduce the table size by using 1-bit flags such as sign or even / odd.

* If possible, split one ROM location into 'upper' and 'lower' parts.

* Simplify the address generator by using the least significant bits (LSBs) of the VLC to generate the ROM table address.

This method is very efficient. The final ROM table size is 768 * 12 bits, which is small for a problem of this scope. A lookup is performed by the ROM table address generator followed by the ROM table lookup itself. The address generator decodes input signals such as the table type, the mode and the VLC value to generate an address in the ROM table. The encoded or decoded data is then obtained from the ROM table value and other information. The decoding tables have two formats: one for DCT coefficients, with one VLC code per ROM location, and one for the other tables, in which each ROM location is divided into an upper 6 bits and a lower 6 bits so that each location holds two VLC codes. The encoding tables also have two formats: one for the TCOEF table of H.263 and one for the other tables. Each ROM location contains one Huffman code for encoding applications. The table map can be represented as follows:

[Table 1]

VLC Decode ROM Table Map

[Table 2]

VLC Encoding ROM Table Map

1.2 decoding

All tables for decoding are rearranged based on the '0' or '1' count. If the MSB of the VLC code is '0', a '0' count is applied; otherwise, a '1' count is used. For example, the code '00001xxx' has a '0' count of four, and the code '1110xxx' has a '1' count of three. The decoding process first determines the '0' / '1' count and outputs the count for the VLC code to the ROM table address generator.
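
The counting step can be illustrated with a few lines of Python operating on a bit string written MSB first. The function name `leading_run` is hypothetical, and the 'x' positions of the examples are filled with arbitrary bits in the usage below.

```python
def leading_run(code_bits):
    """Return (leading bit, length of its run) for an MSB-first bit string."""
    if not code_bits or set(code_bits) - {"0", "1"}:
        raise ValueError("expected a non-empty string of '0' and '1'")
    first = code_bits[0]                              # MSB selects '0' or '1' count
    count = len(code_bits) - len(code_bits.lstrip(first))
    return first, count
```

For instance, the bit string 00001010 yields a '0' count of four, and 1110101 yields a '1' count of three.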

The address generator then decodes the rest of the code to generate an address. The address consists of two parts: an offset, and a so-called masked address obtained from the VLC table. The address is the OR of the two parts. The other information provided by the address generator is as follows.

* VLC code size

* Special flag: A 2-bit flag that signals 'ESCAPE', 'END OF BLOCK', 'STUFFING', or 'START CODE' in H.261 to the decoding state machine.

* High data extract enable: The valid data is in the upper 6 bits.

* Sign / even enable: This flag indicates that the decoder should extract the LSB of the VLC as a sign bit or an even bit, depending on the table.

* Valid VLC

* Mask shift bits and mask: These two signals are applied to generate the masked address.

In the ROM table, except for Tables 14 and 15 of MPEG-2 and Table 12 of H.263, the data stored in each location is arranged in the upper and lower bit format.

1.2.1 Table 14 / MPEG-2

This table is the same as Table 2-B.5c / MPEG-1 and Table 5 / H.261.

ROM table format: bits 10 to 6: Run; Bit 5 to 0: level

1.2.2 Table 15 / MPEG-2

Most of this table is shared with Table 14 / MPEG-2 because it has the same run, level, and VLC code as Table 14 / MPEG-2.

ROM table format: bits 10 to 6: Run; Bit 5 to 0: level

1.2.3 Table 12 / H.263

Compared with Tables 14 and 15 of MPEG-2, this table has one more output value, 'LAST'.

ROM table format: bit 11: LAST; Bits 10 to 4: run; Bit 3 to 0: level
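
The three 12-bit ROM word layouts mentioned in this section can be unpacked as in the following Python sketch. The bit positions are taken from the "ROM table format" lines and from the upper / lower 6-bit split; the function names and the dictionary keys are hypothetical.

```python
def unpack_dct_word(word):
    """Tables 14 and 15 / MPEG-2 layout: bits 10..6 = run, bits 5..0 = level."""
    return {"run": (word >> 6) & 0x1F, "level": word & 0x3F}


def unpack_h263_tcoef_word(word):
    """Table 12 / H.263 layout: bit 11 = LAST, bits 10..4 = run, bits 3..0 = level."""
    return {"last": (word >> 11) & 0x1, "run": (word >> 4) & 0x7F, "level": word & 0xF}


def unpack_pair_word(word):
    """Upper / lower layout: bits 11..6 = upper data, bits 5..0 = lower data."""
    return {"upper": (word >> 6) & 0x3F, "lower": word & 0x3F}
```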

1.2.4 Movement code / macroblock increment

This clause covers Table 1 / MPEG-2, Table 10 / MPEG-2, Table 2-B.1 / MPEG-1, Table 2-B.4 / MPEG-1, Table 1 / H.261, Table 3 / H .261 and Table 10 / H. 263 is different.

For the motion code, it is an even value flag except when VLC = 1. Thus, half the table is decoded. Table 10 / H if the tile / excellent bit is ignored. Except for the upper part of 263, the two kinds of tables have the same VLC value and decoding value. The decoded value swings up to 6 bits, which means that two table values can be placed in one location. Although the decoding values of the lower part of Table 10 / H.263 are different from others, the tile binary values are the same because of the fixed point. In other words, we use 16 1/2 positions as fixed points to handle all these tables. Use one simple FSM to generate the ROM address. In an application, the ROM table provides an absolute value when the motion code is decoded. On the other hand, if the address generator enables the sign bit, the decoder extracts the LSB, in which case '1' means-and '0' means +. This algorithm can be written as

if (sign_enable == 1)

    increment_value = sign * ROM_table_value;

else

    increment_value = ROM_table_value;

If the macroblock address increment table is decoded, the result is obtained from the ROM table value and the even flag. For example, the ROM table provides a value of '5'. If the even flag is 'high', a result of '10' is obtained, and if the even flag is 'low', a value of '11' is obtained. This algorithm can be written as

if (even_enable == 1)

    increment_value = (ROM_table_value << 1) | (~even_bit & 1);

else

    increment_value = ROM_table_value;
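The sign rule for the motion code and the even-flag rule for the macroblock address increment can be checked with a short sketch. The function and parameter names here are illustrative assumptions; only the arithmetic follows the description above.

```python
def decode_motion_code(rom_value: int, sign_enable: bool, lsb: int) -> int:
    """Apply the sign taken from the VLC's LSB: 1 means negative, 0 means positive."""
    if sign_enable:
        sign = -1 if lsb else 1
        return sign * rom_value
    return rom_value


def decode_mb_increment(rom_value: int, even_enable: bool, even_bit: int) -> int:
    """Shift the ROM value left by one and append the inverted even bit."""
    if even_enable:
        return (rom_value << 1) | (~even_bit & 1)
    return rom_value


# ROM value 5 with the even flag high gives 10, with the flag low gives 11.
print(decode_mb_increment(5, True, 1))   # 10
print(decode_mb_increment(5, True, 0))   # 11
print(decode_motion_code(3, True, 1))    # -3
```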

ROM table format: bits 11 to 6: upper data; Bits 5 to 0: Lower data

1.2.5 Macroblock Pattern

This section covers Table 9 / MPEG-2, Table 2-B.3 / MPEG-1, and Table 4 / H.261 (CBP).

The decoded value is at most 6 bits wide, which means that two values can be stored in one location. Thus, 32 locations are used to handle all of these tables.

ROM table format: bits 11 to 6: upper data; Bits 5 to 0: Lower data
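Because two 6-bit values share one 12-bit location, the decoder has to select one half of the word. A minimal sketch of that selection follows, assuming a single flag chooses the upper half; the name `select_half` is hypothetical.

```python
def select_half(word: int, high: bool) -> int:
    """Return bits 11..6 of a 12-bit ROM word when high is set, else bits 5..0."""
    return (word >> 6) & 0x3F if high else word & 0x3F


# Hypothetical ROM word holding 0b101010 (upper) and 0b000111 (lower).
word = (0b101010 << 6) | 0b000111
print(select_half(word, True))    # 42
print(select_half(word, False))   # 7
```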

1.2.6 Macroblock Type

This section covers tables 2,3,4 / MPEG-2, tables 2-B.2 / MPEG-1, tables 2 / H.261 (MTYPE), and tables 3,4 / H.263 (MCBPC).

The decoded value is at most 5 bits wide. Again, the concept of upper / lower data is used. One simple FSM is used to generate the ROM address.

ROM table format: bits 11 to 6: upper data; Bits 5 to 0: Lower data

Although some bits have different meanings in each specification, the format of the macroblock type is defined globally for all specifications on the basis of MPEG. H.263 requires two-stage decoding because of the information required, as follows.

* Decode MCBPC to obtain a 3-bit macroblock type.

* Decode the detailed macroblock type based on the 3-bit macroblock type, the PB flag, and the picture type.

The format of the macroblock type in the VLC table is as follows.

[Table 3]

MPEG macroblock type format

[Table 4]

MCBPC format in H.263

[Table 5]

Macroblock type format of H.261

From Table 4, not only the 3-bit macroblock type but also the 2-bit chroma pattern (CBPC) is obtained. Here, the macroblock type is a 3-bit value in the range 0 to 4, inclusive. As described above, the detailed macroblock type is decoded in the second step.

[Table 6]

Macroblock Type Decoding Lookup Table of H.263

1.2.7 DCT DC size

This section covers Tables 12, 13 / MPEG-2, and Table 2-B.5 / MPEG-1. Due to the VLC structure a '1' count is used here instead of a '0' count.

ROM table format: bits 10 to 6: upper data: chroma; bits 5 to 0: lower data: luminance. Bits 11 and 5 are reserved.

1.2.8 CBPY

This section covers Table 9 / H.263. This table contains two sets of data, one for inter pictures and one for intra pictures. One set of values is the inverse of the other, so only one set of data needs to be stored in the ROM. Here, the intra data is stored in the ROM. One 4-bit value is used to represent the CBPY value.

ROM table format: bits 9 to 6: upper data; bits 3 to 0: lower data. Bits 11 to 10 and bits 5 to 4 are reserved.
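Since only the intra set is stored, an inter-picture CBPY value can be recovered by inverting the 4-bit ROM value. The sketch below assumes "inverse" means a bitwise complement over four bits; the function name and arguments are illustrative only.

```python
def cbpy_from_rom(word: int, high: bool, inter: bool) -> int:
    """Extract a 4-bit CBPY value (bits 9..6 or 3..0) and invert it for inter pictures."""
    value = (word >> 6) & 0xF if high else word & 0xF
    # Assumption: the inter set is the 4-bit complement of the stored intra set.
    return (~value) & 0xF if inter else value


# Hypothetical ROM word with 0b1100 in the upper field and 0b0101 in the lower field.
word = (0b1100 << 6) | 0b0101
print(cbpy_from_rom(word, high=True, inter=False))   # 12
print(cbpy_from_rom(word, high=False, inter=True))   # 10
```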

1.2.9 Dual prime and mode

This section covers Table 11 / MPEG-2 and Table 7 / H.263.

These two tables are very simple and small so they can be decoded directly.

1.3 Encoding

As in the decoding section, the encoding process uses the concept of '0' / '1' counts. The ROM table contains the '0' or '1' count, the size of the code following the first / last '1', and the VLC code following the first / last '1'. With this format, the width of the ROM table can be limited to 12 bits, except for four codes of Table 12 / H.263 that are handled by special encoding. The format is:

[Table 7]

Common Encoding Formats

Table 8

Table 12 / H.263 encoding format

In the tables, the VLC code size is the size of the VLC code following the first / last '1', and the VLC code is the code following the first / last '1'. In the case of a '0' count, the VLC code following the first '1' is extracted; otherwise, the VLC code is extracted from the bits following the last '1'. The '1' count is applied differently in encoding than in decoding. The '1' count applies only when the '1' count flag is enabled by the address generator. Therefore, if the MSB of the VLC is 1 but the '1' count flag is low, the '0' / '1' count portion of the ROM table will be 0, which means that the '0' count is applied.

The following example covers all possible cases for encoding.

Example 1: VLC = 0000011001, one_count_enable = 0

Results for common cases: 0101 100 01001

Results for table 12 / H.263: 101 100 001001

Example 2: VLC = 11001, one_count_enable = 0

Results for common cases: 0000 100 01001

Results for table 12 / H.263: 000 100 001001

Example 3: VLC = 11001, one_count_enable = 1

Results for common cases: 0010 011 00001

Results for Table 12 / H.263
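The bit patterns in Examples 1 and 3 can be reproduced with a short sketch. The field widths used here (4-bit count, 3-bit size, 5-bit code for the common format; 3-bit count, 3-bit size, 6-bit code for Table 12 / H.263) are inferred from the example results, not taken from Tables 7 and 8, and all function names are hypothetical.

```python
def split_vlc(vlc: str, one_count_enable: bool):
    """Return (count, size, code) for a VLC given as a string of '0'/'1' characters."""
    lead = "1" if one_count_enable else "0"
    count = len(vlc) - len(vlc.lstrip(lead))
    # '0' count: the code is what follows the first '1'.
    # '1' count: the code is what follows the last '1' of the leading run.
    tail = vlc[count:] if one_count_enable else vlc[count + 1:]
    code = int(tail, 2) if tail else 0
    return count, len(tail), code


def pack_common(count: int, size: int, code: int) -> int:
    """Assumed common layout: 4-bit count, 3-bit size, 5-bit code."""
    return (count << 8) | (size << 5) | code


def pack_h263(count: int, size: int, code: int) -> int:
    """Assumed Table 12 / H.263 layout: 3-bit count, 3-bit size, 6-bit code."""
    return (count << 9) | (size << 6) | code


fields = split_vlc("0000011001", one_count_enable=False)
print(format(pack_common(*fields), "012b"))   # 010110001001
print(format(pack_h263(*fields), "012b"))     # 101100001001
print(format(pack_common(*split_vlc("11001", True)), "012b"))   # 001001100001
```

Grouping the printed bits as count, size and code gives the same fields as the worked results for Examples 1 and 3.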

In general, the address is generated by adding the offset and the input value.

1.3.1 Table 14 / MPEG-2

This table is identical to Table 2-B.5c / MPEG-1 and Table 5 / H.261. This encoding handles 'RUN', 'FIRST DC', 'ESCAPE', and 'END OF BLOCK' inputs.

Encoding result: an offset address that is added to the level or run to generate an address

1.3.2 Table 15 / MPEG-2

Since this table has the same run, level, and VLC code as Table 14 / MPEG-2, most of it is shared with Table 14 / MPEG-2. In some special cases a '1' count is applied. This encoding handles 'RUN', 'LEVEL', 'FIRST DC', 'ESCAPE' and 'END OF BLOCK' inputs.

Encoding result: offset address and '1' count indicator

1.3.3 Table 12 / H.263

As mentioned above, this table is a special case, and a different format is used to handle it. There are some exceptions in which 12 bits cannot represent the VLC code. The exceptions are shown in Table 9. These exceptions can be specially encoded without using ROM tables.

Table 9

Exceptions to encoding in Table 12 / H.263

Encoding processes the 'RUN' and 'ESCAPE' inputs.

1.3.4 Motion code / macroblock address increment

This section covers Table 1 / MPEG-2, Table 10 / MPEG-2, Table 2-B.1 / MPEG-1, Table 2-B.4 / MPEG-1, Table 1 / H.261, Table 3 / H.261 and Table 10 / H.263.

As described in the decoding section, all of these tables can share one ROM table and one FSM. The VLC code obtained from the ROM table must be combined with the sign / even bit to form the complete VLC code. Therefore, the input values processed by this encoding FSM are the absolute value of the motion code, whose LSB is a fraction bit, and the macroblock address increment shifted one bit to the right.

Encoding handles 'STUFFING' and 'ESCAPE'.

1.3.5 Macroblock Pattern

This section covers Table 9 / MPEG-2 and Table 2-B.3 / MPEG-1.

The address is the sum of the offset and the pattern value.

1.3.6 Macroblock Type

This section covers Tables 2,3,4 / MPEG-2 and Table 2-B.2 / MPEG-1.

1.3.7 Tables 3, 4 / H.263 (MCBPC)

Information about the picture type, macroblock type, and stuffing flag is provided to generate a ROM table address offset. The address is the sum of the offset address and the CBPC.

1.3.8 Table 2 / H.261 (MTYPE)

The address generator for this table is so complex that its implementation is not considered worthwhile here.

1.3.9 CBPY

As discussed in the decoding section, only intra picture data is encoded. If the picture type is an inter picture, the data must first be inverted.

The address is the sum of the offset and the CBPY value.

1.3.10 DCT DC size

This section covers tables 12, 13 / MPEG-2 and 2-B.5 / MPEG-1.

Since several VLC codes for luminance and chroma are the same, several ROM tables are shared between them. The chroma flag and the value of several bits are used to generate the offset address. The ROM address is obtained by adding the offset and the actual value.

1.3.11 Dual prime and mode

This section covers Table 11 / MPEG-2 and Table 7 / H.263.

These two tables are very simple and small so they can be encoded directly.

2.0 Hardware Description

The hardware for VLC encoding / decoding is contained in the 'VLC' block. This block contains three subblocks, which generate a ROM table address or the decoded / encoded data. 'VLC_DEC' decodes the VLC to generate a ROM table address. 'VLC_ENC' is the block for encoding VLC and generates a ROM table address or a special encoding for the TCOEF table of H.263. 'LOOKUP' outputs VLC data based on ROM table values or specially encoded values.

2.1 VLC decoding address generator

The key to VLC_DEC is the decoding FSM. This FSM decodes the input information to control address generation. The inputs of the FSM are defined as follows.

* ZERO / ONE Count (15 bits): Provides the 0/1 count value.

* ZERO / ONE Count (4 bits): Provides the 0/1 count value. Two count signals of different widths are used to reduce the gate count through input data sharing. In most cases, 15 bits are used.

* ONE Count enable (1 bit): Starts the '1' count.

* Table Type (6 bits): Table type

Table 10

VLC_DEC FSM Table Type Format

* Mode (9 bits): Operation mode

Table 11

VLC_DEC FSM Mode Format

The definition of the specification and picture type will be described in the pin definition.

Special algorithms are used to generate ROM table addresses to simplify hardware and guarantee ROM access time. The process is as follows.

Step 1: Generate an offset address (OFFSET).

Step 2: Generate a 4-bit shift amount (MASK_SHFT) and right-shift the 16-bit FIFO_DATA by that amount. Then extract the four least significant bits (FOL_DATA).

Step 3: Reverse the order of the four bits obtained in step 2.

Step 4: Generate a 4-bit mask signal (MASK) to mask the data obtained in step 3.

Step 5: The result of step 4 is ORed with the offset address. The result is a ROM table address.

Combining these steps is as follows.

Address = OFFSET | (BITREVERSE(bits 3 to 0 of (FIFO_DATA >> MASK_SHFT)) & MASK)
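The five steps can be traced in software with a minimal sketch. It assumes the shift in step 2 is a right shift and that step 3 reverses the bit order, as the combined expression suggests; the function names are hypothetical.

```python
def bit_reverse4(value: int) -> int:
    """Reverse the order of the four least significant bits."""
    result = 0
    for i in range(4):
        if value & (1 << i):
            result |= 1 << (3 - i)
    return result


def rom_address(offset: int, fifo_data: int, mask_shft: int, mask: int) -> int:
    """Combine offset, shifted FIFO data and mask into a ROM table address."""
    fol_data = (fifo_data >> mask_shft) & 0xF       # step 2
    reversed_bits = bit_reverse4(fol_data)          # step 3
    masked = reversed_bits & (mask & 0xF)           # step 4
    return offset | masked                          # step 5


# Hypothetical inputs: FIFO_DATA = 0b110100, shift 2, mask 0b0111, offset 0x40.
print(hex(rom_address(0x40, 0b110100, 2, 0b0111)))   # 0x43
```

Because the masked value occupies only the low four bits, ORing is equivalent to adding as long as the low four bits of the offset are zero, which is presumably why no adder is needed on the decoding side.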

The output of the FSM is:

* MASK (4 bits): Mask data

* OFFSET (9 bits): ROM table offset address

* MASK_SHFT (4 bits): Shift amount

* SIZE (5 bits): VLC size

* SPECIAL_FLAG (3 bits): extra information for decoding

Table 12

Definition of special flags in VLC_DEC

* VALID_VLC (1 bit): Valid VLC code flag

* HIGH_DATA_INDICATOR (1 bit): Outputs the upper 6 bits of ROM data.

Input pins:

* FOL_DATA (4 bits): Shifted FIFO_DATA (see step 2 above)

* CNT (4 bits): 0/1 count

* ONE_CNT_EN (1 bit): '1' count indicator

* MODE (14 bit): table type and other information

The definition is as follows.

Table 13

MODE format in VLC_DEC

Specification: 00 = MPEG-1; 01 = MPEG-2; 10 = H.261; 11 = H.263;

Picture type: 00 = reserved; 01 = intra; 10 = prediction; 11 = bidirectional;

* FIFO_DATA (16 bits): Data containing the VLC

Output pins:

* ROM_ADR (10 bits): ROM table address

* MASK_SHFT (4 bits): Shift amount for FIFO_DATA (see step 2 above)

* SIZE (5 bits): VLC size

* SPECIAL_0 (3-bit): special flag (see FSM output)

* VALID_VLC (1 bit): Valid VLC flag

* HIGH_DATA (1 bit): Indicator to extract the LSB of the VLC as the sign or even flag

* FULL_DATA (1 bit): Complete 12-bit data structure flag; high when decoding DCT coefficients.

* TABLE (6 bit): Defined on FSM input.

* T_MODE (9 bit): Defined as MODE on the FSM input.

2.2 VLC_ENC

As the core of VLC encoding, VLC_ENC encodes variable length codes. The output of this part is a ROM address or a special VLC encoding. As described in section 1.0, the encoding data structure follows the 12-bit data format, except for some special cases of TCOEF in H.263. Although a 10-bit adder is used to generate the ROM table address, this part is much simpler than VLC_DEC from a hardware point of view.

Like VLC_DEC, the heart of this part is an FSM, called VLC_ENC. Another FSM, ENC_SP, is used for special encoding.

The input signals of the VLC_ENC FSM are the same as the input pins of this part.

* LAST (1 bit): value of LAST for TCOEF of H.263

* RUN / VALUE (6-bit): If the DCT coefficient table is being encoded, this input means RUN, otherwise it means a normal value, or pattern.

* LEVEL (6 bits): DCT coefficient level

* SPECIAL_FLAG (2 bits): Special flag defined in the VLC_DEC part.

* TABLE (6 bits): same as VLC_DEC

* MODE (9 bits): same as VLC_DEC

ROM address generation is very simple. The FSM provides an offset address that is added to the value (run), the level, or zero to generate an address. For special encoding, because these VLCs have the same size and '0' count, the output is only the two least significant bits, from which the code is restored.

The output pins are as follows.

* ONE_CNT_FLG (1 bit): Indicates that the VLC structure uses a '1' count.

* SIGN_EN_BIT: Signals that the VLC structure puts the sign / even bit into the VLC LSB.

* SPECIAL_ENCODE (1 bit): special encoding flag

* VLC (2-bit): specially encoded VLC code LSBs

* ADR_A (16 bits): Offset address, where the upper 6 bits are zero.

* ADR_B (16 bits): Another part of the address. The upper 10 bits are always zero.
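On the encoding side the two address parts are combined by addition rather than by ORing. The sketch below assumes that ADR_A carries a 10-bit offset, ADR_B carries a 6-bit run, level, or pattern value, and the 10-bit adder mentioned above wraps at 10 bits; the function name is hypothetical.

```python
def enc_rom_address(adr_a: int, adr_b: int) -> int:
    """Add the offset (ADR_A) and the value part (ADR_B) with a 10-bit adder."""
    if adr_a >> 10:
        raise ValueError("ADR_A must have its upper 6 bits clear")
    if adr_b >> 6:
        raise ValueError("ADR_B must have its upper 10 bits clear")
    # Assumption: the result is truncated to the 10-bit ROM address width.
    return (adr_a + adr_b) & 0x3FF


# Hypothetical offset 0x120 combined with a run value of 5.
print(hex(enc_rom_address(0x120, 5)))   # 0x125
```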

2.3 LOOKUP

This part outputs the encoded / decoded VLC data. This block handles the following situations:

* Output of regular 12-bit encoding / decoding ROM table values

* Output of upper / lower decoding data

* Restoration of specially encoded data

Where required, the output data is padded with zeros.

Input pins:

* D_ADR (10 bits): Decoding ROM address

* E_ADR (10 bits): Encoding ROM address

* ENCODE (1 bit): 1: encoding; 0: decoding

* HIGH (1 bit): Flag to extract the upper 6 bits

* ENABLE (1 bit): Complete 12-bit data flag

* VLC (2 bits): Special encoding code

* SPECIAL_ENCODE (1 bit): Special encoding flag

Output pins:

* LOOKUP (16 bits): VLC code