KR20210113574A

KR20210113574A - A process model using a single large linear register set, with new interfacing signals supporting FIFO-based I/O ports, and interrupt-driven burst transfers eliminating DMA, bridges and external I/O bus

Info

Publication number: KR20210113574A
Application number: KR1020210100392A
Authority: KR
Inventors: 나지르 빈 이브라힘 무하마드; 빈 아자리 나마지; 빈 바하룸 아담
Original assignee: 유니버시티 테크놀로지 말레이시아; 팔라완 마이크로
Priority date: 2014-09-26
Filing date: 2021-07-30
Publication date: 2021-09-16
Also published as: KR20160036796A

Abstract

The present invention relates to a processor or CPU architecture, and specifically to one that implements several proven technologies to enhance synchronous burst data transfer. Input-Output (I/O) is viewed and treated uniformly as individual First-In-First-Out (FIFO) devices. A plurality of memory areas are classified as user stack, kernel stack, interrupt stack and procedure call stack. The CPU model needs only one I/O arbiter, which arbitrates among a plurality of FIFOs; in an on-chip implementation these FIFOs take the place of data caches. Accordingly, conventional data transfer techniques using Direct-Memory-Access (DMA), bus control and lock signals, and the like may be eliminated. To support interrupt-driven, FIFO-based I/O and synchronous burst data transfer, the CPU employs a simple, large, linear register set without bank switching.

Description

A process model using a single large linear register set, with new interfacing signals supporting FIFO-based I/O ports, and interrupt-driven burst transfers eliminating DMA, bridges and external I/O bus

The present invention relates to microprocessors and computer architectures.

History of CPU Architecture

Many current processor or CPU architectures derive from the IBM 360 (announced in 1964), which had both CISC and RISC design characteristics. That design addressed multiprocessing requirements, with specific instructions to meet business and scientific needs. Most CPU designs were conceived to run the operating system (OS) software of their day. Computers up to that time were built from discrete logic gates, so that an entire system occupied several rooms of a building. Inventions relating to processors using discrete logic include U.S. Patent No. 3,401,376, filed by G.H. Barnes in 1964 and granted on September 10, 1968, and U.S. Patent No. 3,518,632, filed by Threadgold in 1967 and granted on June 30, 1970.

The history of the CPU can be said to have started in 1961, when Fairchild first commercialized the IC. The processor system introduced in U.S. Patent No. 3,462,742, filed by Henry S. Miller of RCA in 1966 and granted in 1969, consisted of many ICs of about 200 logic gates. This was a departure from all-discrete logic circuits.

It soon became apparent that, once certain modules of a processor had been reduced to a single IC, the next step was simply to put everything into one IC, which would be a microprocessor. In 1971 Intel produced the first microprocessor, the Intel 4004, integrated into a single 16-pin IC. It had a 4-bit data bus, a 12-bit address bus, sixteen 4-bit registers and four 12-bit registers, and ran at a 750 kHz clock. It is the simplest of all CPUs implementing the classic Von Neumann architecture, and became the starting point for CPU architecture.

In the 1970s, Unix was used on mainframe computers in universities and industry. In the late 1970s, when PCs began to be mass-produced, the dominant operating systems were Apple DOS and CP/M. Multitasking for mass use on the IBM PC came from Unix versions such as Berkeley's BSD Unix and Santa Cruz Operation's Xenix. Not long after Xenix came into use, IBM developed the IBM PC in 1981, and MS-DOS was used until the 1990s, when Windows arrived.

In 1988, POSIX (Portable Operating System Interface for Unix) was developed as a portable operating system interface. It uses a kernel with a fixed set of function calls that describe consistent process behavior and that can be interfaced to any high-performance CPU. It standardized the OS in terms of the kernel and the API (Application-Interface), both of which influence CPU architecture.

Almost all new and popular CPU architectures were designed starting with the Intel Pentium in the mid-1980s, followed by ARM in 1985, MIPS R2K in 1986, Sun Sparc in 1987, AMD 29K in 1988, Intel i960 in 1988, Motorola PowerPC in 1992, DEC Alpha in 1994, and Intel Itanium in 2002.

Almost all of these CPUs were complex designs, with the exception of the ARM and the AMD 29K. The later architectures are superscalar and superpipelined, with branch prediction and other complex schemes. Many concepts, such as Sun Sparc's rotating window, have proven to be bottlenecks and inefficient. Unfortunately, because these designs predate the establishment of Posix, and Linux in particular, they lack many desirable features.

Applying such CPUs to a Posix environment such as a PC, or to packet data flow and the addition of modules in a router, requires DMA, multiple bridges, multiple arbiters, and an I/O expansion bus such as PCI on the CPU board. Because these CPUs must carry these conventional elements, they have complex interfacing schemes.

The need for a simpler and more powerful CPU architecture in the Posix environment

Simplicity in this invention means being easy to learn and easy to implement within the Posix interface, which describes a powerful computing environment. Ease of learning comes from a streamlined architecture, concepts and models with few or no special cases, and also from a better mapping of CPU processes onto the Posix environment.

It is desirable that a specific system design can be implemented easily on an SOC (System-On-Chip) or a board with less complex interfacing signals, so that troubleshooting problems arising after fabrication are simpler to isolate. For example, troubleshooting a system with two arbiters and two bridges is much harder than troubleshooting a system with no bridges and one arbiter. Consider the complexity of a communications processor used in many high-end routers, such as the IXP435, which has three bus bridges and three bus arbiters. The IA-32 (Intel i86) of the PC, with its North bridge, South bridge and other arbiter-bridge pairs, is roughly as complex. For a CPU of the present architecture, it is necessary in every case to have no bridges at all and only one arbiter.

Most CPUs require DMA, Lock and Bus Request signals, as well as complex supporting protocols, to transfer data to block I/O devices, when an improvement based on commonly used and well-understood interrupt signals would suffice. These signals depend on a functioning interrupt mechanism, but interrupts themselves can avoid the need for them. This invention therefore relies heavily on an interrupt-driven mechanism.

Complex interface signals should be reduced by simplifying I/O devices, that is, by reducing everything to a bunch of FIFOs and handling them as such. This also simplifies the connection of all peripherals, including multi-core CPUs and parallel array architectures.

Another mechanism is the simple vector ID, which originates in the descriptor tables of the old Burroughs computers of the 1970s. It has proven to be an efficient means of accessing PCB (Process-Control-Block) descriptors, and is also valuable for fast interrupt response and fast access to parameters in a given memory frame.

As the above shows, high-performance computing throughput cannot be achieved without an efficient means of data transfer. This is achieved by synchronous burst data transfer through block memory (register-memory) transfers, as well as by I/O through FIFOs.

Summary of the invention

The present invention describes a CPU model that uses interfacing signals to support interrupt-driven processing of FIFO-based I/O with synchronous burst data transfer. The CPU model supports an architecture that efficiently maps the corresponding instruction set onto the Posix interface that underlies PC operating systems and embedded systems.

This CPU model makes creative use of several well-known techniques to achieve high data throughput and to use it for new kinds of data processing.

The proven techniques are synchronous burst data transfer, FIFOs, a separate memory region for each stack, autonomous local I/O registers (IOR), and vector descriptors. Complex techniques such as cache memory manipulation, register renaming, pipelining, and other data-dependent manipulation schemes are excluded. In essence, large data caches, whose effectiveness is probabilistic, are replaced by a deterministic array of FIFOs.

This CPU model is centered on an interrupt-driven handshaking scheme, and combines the commonly understood interrupt mechanism of the INT-INTA (Interrupt-Interrupt Acknowledge) signal pair with a new set of synchronous burst data transfer signals. This simplifies the concept of an interrupt-driven process. If all events are interrupt-driven, many existing bus control signals can be dispensed with, eliminating DMA, bus bridges, multiple arbiters, and I/O buses such as PCI and its variants.

Interrupt-driven processes should be well suited to multitasking kernels such as Linux, whose process scheduler is a daemon: interrupting the running process on a timer tick means that the kernel is itself interrupt-driven.

Fig. 1 is a block diagram showing the internal registers and buses forming the CPU model;
Fig. 2 is a block diagram of the interface signals, showing their relationship with the ISR, the vector table and the devices;
Fig. 3 is a block diagram of the I/O arbiter companion module of Fig. 2, showing a CPU model that has 9 control signals and eliminates DMA, bridges, more than one arbiter, and I/O buses such as PCI;
Fig. 4 is a block diagram of the autonomous counters and time-of-day implemented on the IOR independent local bus;
Fig. 5 is a block diagram showing the FSP (Frame-Pointer, Stack-Pointer pair), composed of a 20-bit FP and a 12-bit SP, used in function calls;
Fig. 6 shows the atomic block instruction pair atomic1 and atomic2, which toggles bit-31 from 0 to 1 in arbitrary instructions so that instructions 1 to 5 execute atomically, that is, without interruption.

Hereinafter, the present invention will be described in detail with reference to the accompanying drawings. It should be understood that this description is given by way of example only and does not limit the scope of the invention.

In Fig. 1, the CPU model provides a set of 256 working registers (100), denoted r₀ to r₂₅₅, which can be used as both data pointers and address pointers. Register usage is orthogonal: any instruction can be applied to any register. The registers are divided into local user-mode registers r₀ to r₁₅ (101), global user-mode registers r₁₆ to r₃₁ (102), local kernel-mode registers r₃₂ to r₄₇ (103), and the remaining registers (104) for interrupt mode. The interrupt-mode registers are partitioned and allocated between common, recurring tasks such as the process scheduler and interrupt sources (208, 220) such as USB devices or other peripherals. The single large register space (100) greatly reduces the need for parameter passing.

The rules governing the register set are as follows: user mode can see only r₀ to r₃₁ (101, 102), whereas kernel mode and interrupt mode (103, 104) can see all 256 registers without bank switching; a user-mode (101) context switch saves registers r₀ to r₁₅, and a kernel-mode context switch saves registers r₃₂ to r₄₇ (103), to their respective memory locations; registers r₁₆ to r₃₁ (102) are used to pass parameters for both user mode and kernel mode.
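The visibility and context-switch rules above can be restated as a small lookup. The following is only an illustrative sketch: the mode strings and the helper names `visible_registers`, `can_access` and `context_switch_save_set` are assumptions introduced here, not part of the patent.

```python
# Partition of the 256 working registers, as stated in the description.
LOCAL_USER = range(0, 16)      # r0..r15: saved on a user-mode context switch
GLOBAL_USER = range(16, 32)    # r16..r31: parameter passing, user and kernel
LOCAL_KERNEL = range(32, 48)   # r32..r47: saved on a kernel-mode context switch
INTERRUPT = range(48, 256)     # r48..r255: remaining registers, interrupt mode


def visible_registers(mode: str) -> range:
    """Return the register indices visible in the given mode (hypothetical helper)."""
    if mode == "user":
        return range(0, 32)            # user mode sees only r0..r31
    if mode in ("kernel", "interrupt"):
        return range(0, 256)           # all 256 registers, no bank switching
    raise ValueError(f"unknown mode: {mode}")


def can_access(mode: str, reg: int) -> bool:
    """True if register index `reg` is visible in `mode`."""
    return reg in visible_registers(mode)


def context_switch_save_set(mode: str) -> range:
    """Registers written to memory on a context switch in `mode`."""
    if mode == "user":
        return LOCAL_USER
    if mode == "kernel":
        return LOCAL_KERNEL
    raise ValueError(f"no context-switch rule stated for mode: {mode}")
```

Under these rules a context switch stores only 16 registers, while r₁₆ to r₃₁ stay in place as the shared parameter area.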

There are twelve control registers in total. Four are the main current control registers: the current FS (105; Flags/Status), the current FSP (106; Frame:Stack-Pointer), the current PC (107; Program Counter) and the second current PC1 (108; Program Counter). Four are Frame:Stack-Pointers: FSP1 (109) for the user-mode stack, FSP2 (110) for the kernel-mode stack, FSP3 (111) for the interrupt stack and FSP4 (112) for the function-call stack. The remaining four are the interrupt-mode copies FS-I (113), FSP-I (114), PC-I (115) and PC1-I (116) described with the third embodiment.

To support FIFO-buffer-oriented I/O, the I/O arbiter (201) is implemented either inside the CPU core (200, 306) or as a separate I/O arbiter chip (201, 300) that uses CPU synchronous burst transfers, together with a FIFO buffering mechanism that eliminates other arbiters, DMA mechanisms, bus control signals and I/O buses such as PCI.

In addition to the CPU control registers, there is a dual-port local I/O register file (IOR) (117, 400) with a total of 1024 words of addressable memory, one word being 4 bytes or 32 bits. The IOR (117, 400) file uses the same signals as external I/O addressing, namely the I/O read (IORD) signal (210) and the I/O write (IOWR) signal (211). The external I/O addressing signals are used to access the I/O arbiter chip (201, 300) and, through it, the FIFOs (303 to 307) of the devices (see Fig. 3).

In the first embodiment, data transfer to an I/O device in an ordinary computing process takes place through a FIFO, here FIFO-1 (207). The CPU (200), in kernel mode, performs some data operation on a data block in memory (202). For best efficiency, the data block is moved into the CPU registers (103), one register-set block at a time. When the operation is finished, part of the register block is sent to another memory location.

Therefore, before the set of CPU operations described above begins, the data block must be read into a memory frame, Frame-1 (229). This is done by an I/O-port-to-memory read using synchronous burst data transfer. The CPU instructions (203) read FIFO-1 (207) using the IORD (210), SYNCLK (214), SYNSTP1 (215) and SYNSTP2 (216) signals until FIFO-1 (207) is empty. The I/O arbiter (201) then asserts SYNSTP2 (216) to mark the end of the data, at which point the CPU (200) stops reading.

Similarly, writing to FIFO-1 (207), which is connected to the USB device (208), uses the IOWR (211), SYNSTP1 (215) and SYNSTP2 (216) signals. The write is a burst write of the memory frame from Frame-1 (229) to FIFO-1 (207); at the end of the given data block, the CPU (200) asserts the SYNSTP1 (215) signal to inform the I/O arbiter (201) that the transfer is complete.
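The burst read and burst write just described can be pictured with a small behavioral model. This is a sketch under simplifying assumptions, not the patented signal timing: a FIFO is modeled as a Python `deque`, each loop iteration stands for one transferred word, and the function names and trace strings are illustrative only.

```python
from collections import deque


def burst_read(fifo: deque) -> tuple[list, list]:
    """Model of a CPU burst read: drain the FIFO into a memory frame.

    Each iteration stands for one word transferred with IORD asserted on a
    SYNCLK cycle. When the FIFO is empty, the arbiter side asserts SYNSTP2
    to mark the end of the data and the CPU stops reading.
    """
    frame, trace = [], []
    while fifo:
        frame.append(fifo.popleft())
        trace.append("IORD/SYNCLK")
    trace.append("SYNSTP2")          # end of data, signalled by the arbiter
    return frame, trace


def burst_write(frame: list, fifo: deque) -> list:
    """Model of a CPU burst write: copy a memory frame into the FIFO.

    At the end of the block the CPU asserts SYNSTP1 to tell the arbiter
    that the transfer is complete.
    """
    trace = []
    for word in frame:
        fifo.append(word)
        trace.append("IOWR")
    trace.append("SYNSTP1")          # end of transfer, signalled by the CPU
    return trace
```

In this model the side that knows where the data ends is the one that raises the stop signal: the arbiter (SYNSTP2) on a read, the CPU (SYNSTP1) on a write.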

On the other hand, receiving data from a device such as the USB device (208) amounts to FIFO-1 (207) sending data, as described above. When FIFO-1 (207) signals the I/O arbiter (201), the arbiter asserts the INT (interrupt) signal (212). On receiving the INTA (interrupt acknowledge) signal (213), the I/O arbiter (201) places the vector ID of FIFO-1 (207) on the data bus (209). The CPU (200) reads this vector ID and jumps to the address corresponding to it. The CPU (200) enters kernel mode, and all 256 registers (100), r₀ to r₂₅₅, become available to it; at this point the CPU uses only those reserved for FIFO-1 (207), that is, for its ISR (203; interrupt-service-routine). The ISR recognizes the request from FIFO-1 (207) and burst-reads the data using the IORD (210), SYNCLK (214), SYNSTP1 (215) and SYNSTP2 (216) signals. When the burst read ends, the CPU (200) asserts the SYNSTP1 (215) signal to notify FIFO-1 (207) that the read has finished. The ISR (203) exits the interrupt by executing the reti (206) instruction.
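The receive sequence (INT, INTA, vector ID on the data bus, jump to the ISR, burst read, reti) can be traced with a toy model. Everything named here is an assumption for illustration: the `Arbiter` class, the dictionary used as a vector table, and the rule that the first non-empty FIFO is the one reported.

```python
from collections import deque


class Arbiter:
    """Toy I/O arbiter: one FIFO per vector ID (illustrative only)."""

    def __init__(self):
        self.fifos = {}                      # vector ID -> deque of words

    def attach(self, vector_id: int) -> deque:
        self.fifos[vector_id] = deque()
        return self.fifos[vector_id]

    def int_asserted(self) -> bool:
        """INT is raised while any FIFO holds data."""
        return any(self.fifos.values())

    def acknowledge(self):
        """On INTA, return the vector ID placed on the data bus."""
        for vector_id, fifo in self.fifos.items():
            if fifo:
                return vector_id
        return None


def fifo_isr(fifo: deque) -> list:
    """ISR body: burst-read the FIFO until empty, then return (reti)."""
    frame = []
    while fifo:
        frame.append(fifo.popleft())
    return frame


def service_interrupt(arbiter: Arbiter, vector_table: dict):
    """CPU side: respond to INT, read the vector ID, run the matching ISR."""
    if not arbiter.int_asserted():
        return None
    vector_id = arbiter.acknowledge()        # INTA -> vector ID on data bus
    isr = vector_table[vector_id]            # jump to the address for this ID
    return isr(arbiter.fifos[vector_id])
```

A short usage example: attach a FIFO with `arb.attach(7)`, push a few words into it, and call `service_interrupt(arb, {7: fifo_isr})`; the call returns the drained words, and a second call returns `None` because INT is no longer asserted.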

The second embodiment of the present invention mimics a DMA transfer. In a multiprocessor connection, another device or another CPU, CPU-2 (219), that wants access to the data bus asserts the INT signal (212) through the I/O arbiter by way of the signal (220), and the CPU (200) jumps to the address assigned to the vector ID of CPU-2 (219). In the ISR (203), the CPU instruction busd (205) disables all buses and executes a halt instruction, whereupon the CPU (200) asserts the HALT signal (217), stops executing instructions, and waits for the hardware UHALT signal (218) from CPU-2 (219). CPU-2 (219) then has access to the CPU address bus (231) and data bus (209) until it asserts the UHALT signal (218), which returns the CPU (200) to its normal state and lets it execute the next instruction, reti (206), by which the ISR exits the interrupt.

As another example of a DMA-like transfer, CPU-3 (221), connected to the I/O arbiter (201) through FIFO-4 (222), can access a memory block or memory buffer, Frame-1 (229). The I/O arbiter (201) asserts the CPU INT signal (212) and, after receiving the INTA signal (213), places the ID of the interrupting device (221) on the data bus (209). The CPU (200), now knowing the source, executes a routine that synchronously burst-transfers the memory buffer Frame-1 (229) to FIFO-4 (222), just as for any other device and without needing to disable the buses.

In a third embodiment of the present invention, the hardware interrupt mechanism is designed to need almost zero housekeeping instructions, for fast and efficient interrupt processing in the ISR (203). An interrupt request is raised by the INT signal (212) from an external source such as the I/O arbiter (201). The CPU (200) answers with the INTA signal (213), after which the interrupting device, such as the USB device (208), places its vector ID on the data bus (209) via the I/O arbiter (201). The CPU (200) then reads this vector ID and jumps directly to the vector ID address. In the vector 'va', 'a' is the ID number and can take up to 18 bits. The CPU (200) recognizes only the 256 hardware interrupt sources reserved in the hardware interrupt vector memory space (226).

Before the first instruction of the ISR (203) corresponding to the vector ID is executed, the current main control registers, namely FS (105), FSP (106), PC (107) and PC1 (108), are saved into a dedicated set of four interrupt-mode copies of those registers: FS-I (113; interrupt Flags/Status), FSP-I (114; interrupt Frame-Stack-Pointer), PC-I (115; interrupt Program-Counter), and PC1-I (116; second interrupt Program-Counter).

Next, the current kernel-mode FSP (106) is loaded with FSP3 (111), the PC (107) is loaded with the pointer held at the vector ID, and the ISR starts. No other instructions are needed: the first instruction can be actual ISR (203) code, and only the reti (206) instruction is required. In interrupt mode all 256 registers (100), r₂₅₅ to r₀, are visible, of which r₂₅₅ to r₄₈ (104) are reserved mostly by convention.

When the ISR (203) is exited with the reti (206) instruction, the previously saved interrupt registers FS-I (113), FSP-I (114), PC-I (115) and PC1-I (116) are copied back to the current main control registers FS (105), FSP (106), PC (107) and PC1 (108), restoring the previous machine state.
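The hardware save and restore around an ISR can be summarized in a few lines. This is a minimal sketch, assuming the control registers can be treated as plain integers; the `ControlRegs` dataclass and the functions `enter_interrupt` and `reti` are names chosen here for illustration and do not describe the actual hardware implementation.

```python
from dataclasses import dataclass


@dataclass
class ControlRegs:
    # Current main control registers (105 to 108 in the description).
    fs: int = 0
    fsp: int = 0
    pc: int = 0
    pc1: int = 0
    # Interrupt-stack frame:stack pointer, FSP3 (111).
    fsp3: int = 0
    # Dedicated interrupt-mode copies FS-I, FSP-I, PC-I, PC1-I (113 to 116).
    fs_i: int = 0
    fsp_i: int = 0
    pc_i: int = 0
    pc1_i: int = 0


def enter_interrupt(regs: ControlRegs, vector_table: dict, vector_id: int) -> None:
    """Interrupt entry: save the four current registers, then redirect."""
    regs.fs_i, regs.fsp_i = regs.fs, regs.fsp
    regs.pc_i, regs.pc1_i = regs.pc, regs.pc1
    regs.fsp = regs.fsp3                  # switch to the interrupt stack
    regs.pc = vector_table[vector_id]     # pointer held at the vector ID


def reti(regs: ControlRegs) -> None:
    """Interrupt exit: copy the saved copies back, restoring machine state."""
    regs.fs, regs.fsp = regs.fs_i, regs.fsp_i
    regs.pc, regs.pc1 = regs.pc_i, regs.pc1_i
```

Because the four registers are copied to dedicated shadow registers rather than pushed to memory, the ISR in this model needs no prologue or epilogue of its own beyond the final reti.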

In a fourth embodiment of the present invention, the CPU architecture is shown with the interaction mechanism of the interface lines and with FIFOs that eliminate DMA, bridges, more than one arbiter, and I/O buses such as PCI. The I/O arbiter (201) of Fig. 2 is shown as the I/O arbiter (300) in Fig. 3 to illustrate how these conventional CPU elements are eliminated. Although this specification describes a separate companion chip for the I/O arbiter (201), the chip may also be built into the CPU (306) itself, with the FIFOs replacing the on-chip data cache that is common in other processor modules.

In Fig. 3, the interface ports for I/O devices such as Device-1 are all identical, and they are consistent when presented to a port engine such as the I/O-1 port engine (302). The main function of a port engine is serial-to-parallel conversion on input, and the reverse on output to serial devices. Data passing between devices is stored in the memory buffer of a FIFO such as FIFO-1 (303). Many distributed devices can be connected to their respective I/O port engines (302 to 308), with data transfer handled autonomously by the engines. Among FIFOs such as FIFO-1 (303), the arbiter+interrupt engine (304) autonomously decides which FIFO is to be serviced, according to rules configured at initialization time. The CPU (306) uses the nine CPU control signals (305) to perform synchronous burst data transfers between FIFOs, in the manner of DMA, as directed by the arbiter+interrupt engine (304). Also during initialization, the arbiter+interrupt engine (304) is configured with the lengths of the data buffers of the FIFOs (303 to 307). For a very slow device the buffer can be very small, down to a minimum of 1. For a very fast device the buffer size can be set to its maximum value. A bridge is therefore unnecessary, because devices with different data rates can be accommodated through the buffer sizes of the FIFOs (303 to 307). In addition, even with many FIFOs only the single I/O arbiter of the arbiter+interrupt engine (304) is needed, so it can be implemented in one chip, eliminating I/O buses such as PCI.
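How per-FIFO buffer lengths and an initialization-time service rule might fit together can be sketched as follows. The patent only says that the rules and buffer lengths are configured at initialization, so the fixed-priority rule, the class names `FifoPort` and `ArbiterEngine`, and the device names in the usage note are assumptions made purely for illustration.

```python
from collections import deque


class FifoPort:
    """One device FIFO with a configured buffer length (illustrative)."""

    def __init__(self, name: str, depth: int, priority: int):
        if depth < 1:
            raise ValueError("buffer length must be at least 1")
        self.name = name
        self.depth = depth            # small for slow devices, large for fast ones
        self.priority = priority      # lower number = serviced first (assumed rule)
        self.words = deque()

    def push(self, word) -> bool:
        """Device side: store a word; returns False if the buffer is full."""
        if len(self.words) >= self.depth:
            return False
        self.words.append(word)
        return True


class ArbiterEngine:
    """Single arbiter choosing which FIFO the CPU services next."""

    def __init__(self, ports):
        # The service rule is fixed once, at initialization time.
        self.ports = sorted(ports, key=lambda p: p.priority)

    def next_to_service(self):
        """Return the highest-priority non-empty FIFO, or None if all are empty."""
        for port in self.ports:
            if port.words:
                return port
        return None
```

For example, a slow serial port could be given `depth=1` and a fast network port `depth=64`; both then present the same FIFO interface to the single arbiter, which is the sense in which differing data rates are absorbed by buffer size rather than by a bridge.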

In a fifth embodiment of the present invention, the local I/O register file (IOR) (117, 400), which may be dual-port, is used to create an independent local bus (406), on which autonomous hardware functions or engines, such as a time-of-day function, can be implemented. The IOR (117, 400) is addressed in kernel mode by a 10-bit word, for up to 1024 registers, and is also used to store local system variables.

In the embodiment of Fig. 4, a hardware real-time clock engine that periodically updates the time-of-day counter (404) is built from two other counters, counter-1 (402) and counter-2 (403).

A stable crystal clock source (401), at a fixed frequency in the megahertz range, drives the first counter, counter-1 (402). After a period determined by the word written to it, counter-1 updates the second counter, counter-2 (403), which in turn, after a period determined by the word written to it, updates the time-of-day counter (404).
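The cascade from crystal to counter-1 to counter-2 to time-of-day behaves like two chained dividers. The sketch below assumes that the word written to each counter is a reload value counted in input ticks; the class `DividerCounter` and the function `run_clock` are hypothetical names, and the real counters run in hardware on the local bus rather than in a software loop.

```python
class DividerCounter:
    """Counter that signals once every `reload` input ticks (assumed behavior)."""

    def __init__(self, reload: int):
        if reload < 1:
            raise ValueError("reload value must be at least 1")
        self.reload = reload      # the word written to the counter
        self.count = 0

    def tick(self) -> bool:
        """Advance by one input tick; return True when the period elapses."""
        self.count += 1
        if self.count == self.reload:
            self.count = 0
            return True
        return False


def run_clock(crystal_ticks: int, reload1: int, reload2: int) -> int:
    """Return how many times the time-of-day counter is updated."""
    counter1 = DividerCounter(reload1)
    counter2 = DividerCounter(reload2)
    time_of_day = 0
    for _ in range(crystal_ticks):
        # counter-2 only advances when counter-1 completes its period.
        if counter1.tick() and counter2.tick():
            time_of_day += 1
    return time_of_day
```

As a worked case under these assumptions, a 1 MHz crystal with both reload words set to 1000 gives $10^6 / (1000 \times 1000) = 1$ time-of-day update per second.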

A timer engine, timer-1 (405), is implemented in the same way as the time-of-day engine. The CPU (200) can read all of the counter registers (402 to 405) over the main data bus and thereby implement all timer-related functions. The time-of-day register (404) can raise interrupts for events as required by the Linux cron daemon. Although only two timers are shown in FIG. 4, a person skilled in the art can configure more than two timers using the IOR (117, 400), because the independent local data bus (406) provided through the IOR operates autonomously and concurrently with the active main data bus (407).

In a sixth embodiment of the present invention, multiple Frame-Stack-Pointers are implemented to support the procedure call instruction fn va and the system call instruction sys va, both of which use the vector 'va' recited in claim 3. In addition to the jump-and-link instruction (j1 a) and the local call instruction (call a), which is used within a task and uses the same FSP (106, 500) as that task, the CPU (200) implements the function call instruction fn va using the vector mechanism, where 'fn' is the mnemonic for function, 'v' denotes a vector, and 'a' is the vector number or ID. This function call can be invoked from any task or program and is reentrant.

In FIG. 5, the FSP (106, 500) is defined as a 20-bit FP (frame pointer) (501) and a 12-bit SP (502). This allows the SP (502) to span 1k words. The FP-0 (504) represented by this FP:SP word, or FSP (106, 500), is initialized to 0x00100000.

The basic mechanism of every function call is shown in FIG. 5. On every function call, a new FP (501) is created simply by incrementing the FP (501). The CPU (200) does this automatically by adding 1 at bit-12 (d₁₂) (503) or, equivalently, by adding 0x1000 to the FSP (500) word. The current FSP (500) is stored in FSP4 (112), which is designated as the FSP for the function call fn va. The current FS (105), FSP (106), PC (107) and PC1 (108) are saved at offsets from the start of FP-0 (504). The entire local register set (101) is saved at an offset from the location specified by FP-0 (504). The PC (107) is loaded with the pointer held at the function vector address, and program control is transferred.

Conversely, on a retf instruction (return from function call), the current FP (501) is decremented and all registers are reloaded from the saved set at offsets from the new FP (501). The CPU (200) does this automatically by subtracting 1 at bit-12 (d₁₂) (503) or, equivalently, by subtracting 0x1000 from the FSP (500) word. Program control is transferred back to the original PC (506). These operations nest step by step: on the next retf instruction the FP (501) is decremented again and the return process repeats.
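The FP:SP arithmetic for function call and return can be checked with a short Python sketch. It covers only the pointer arithmetic (a 20-bit FP in bits 31 to 12, a 12-bit SP in bits 11 to 0, and a step of 0x1000 per call); saving and restoring the registers is omitted. The helper names `split_fsp`, `call_fn` and `ret_fn` are hypothetical, and wrapping the result to 32 bits is an assumption.

```python
FP_SHIFT = 12                 # FP occupies bits 31..12 of the FSP word
SP_MASK = 0xFFF               # SP occupies bits 11..0
FRAME_STEP = 1 << FP_SHIFT    # 0x1000, i.e. adding 1 at bit-12
FSP_INIT = 0x00100000         # initial FP:SP word (FP-0)


def split_fsp(fsp):
    """Return the (FP, SP) fields of a 32-bit FSP word."""
    return fsp >> FP_SHIFT, fsp & SP_MASK


def call_fn(fsp):
    """Function call: create a new frame by incrementing the FP field."""
    return (fsp + FRAME_STEP) & 0xFFFFFFFF


def ret_fn(fsp):
    """retf: return to the previous frame by decrementing the FP field."""
    return (fsp - FRAME_STEP) & 0xFFFFFFFF


fsp = FSP_INIT
fsp = call_fn(fsp)            # first fn va
fsp = call_fn(fsp)            # nested fn va
depth2 = fsp
fsp = ret_fn(ret_fn(fsp))     # two retf instructions unwind both frames
```

Since the step only touches bit-12 and above, the SP field of the word is left unchanged by both operations.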

In a seventh embodiment of the present invention, the CPU architecture provides instructions that use a vector ID, or vector 'va'. Besides the interrupt descriptors for the ISR (203), vectors are used elsewhere, for example for descriptor tables such as the PCB (process-control-block) and for FIFO buffers in main memory for position-independent code; only a few such instructions are given here as examples. Vector IDs are reserved in the memory locations of address space 0 to 0x7ffff and are divided into hardware interrupt vectors (226) from 0x0 to 0xff, kernel call vectors (227) from 0x100 to 0x2ff, and the remainder (228) from 0x300 to 0x1ffff for function call vectors and other descriptor tables.
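As an illustration of this vector map, the Python sketch below classifies a vector ID into its range and computes its memory location. The multiplication by 4 follows the address arithmetic used elsewhere in the description (for example 0x700 × 4 = 0x1C00); the function names and the range labels are hypothetical.

```python
MAX_VECTOR = 0x1FFFF   # highest vector ID
VECTOR_BYTES = 4       # one vector entry per 4 bytes of the 0..0x7ffff space


def classify_vector(a):
    """Return the range a vector ID belongs to."""
    if not 0 <= a <= MAX_VECTOR:
        raise ValueError("vector ID out of range")
    if a <= 0xFF:
        return "hardware interrupt"
    if a <= 0x2FF:
        return "kernel call"
    return "function call / descriptor table"


def vector_address(a):
    """Return the memory location holding vector ID `a`."""
    if not 0 <= a <= MAX_VECTOR:
        raise ValueError("vector ID out of range")
    return a * VECTOR_BYTES
```

For example, `vector_address(0x1FFFF)` gives 0x7FFFC, the last entry inside the reserved 0 to 0x7ffff region.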

In the FIFO embodiment, Frame-1 (229), the memory buffer of FIFO-1 (207), can be denoted v0x700, which lies at memory location 0x700 × 4 = 0x1C00. Memory location 0x1C00 thus holds the entry for the FIFO-1 main memory buffer. Similarly, when a buffer is used, a data array can be denoted v0x900, marked Frame-2, with its frame pointer placed at memory location 0x900 × 4 = 0x2400. Accessing the 100th element of the array requires only the following CPU instructions:

m v0x700, r₁ → move vector v0x700 to register r₁

m @r₁, r₂ → move the data pointed to by register r₁ to register r₂

Here, v0x700 denotes vector 0x700, 'm' is short for 'move', '@' means 'at the location of', and 0x is the prefix for hexadecimal. Instructions are read from left to right.

r₂ now contains the actual pointer to the buffer array in Frame-1 (229).

The 100th member is:

m @r₂+100, r₃ → move the data pointed to by register r₂+100 to register r₃

or, for the 101st member: m @r₂+101, r₃

Once the vector ID has been used to obtain the actual pointer, r₂ serves as the base pointer, and variables within the vector ID or descriptor can be accessed by indexing from base pointer r₂.
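The instruction sequence above can be traced with a small Python model. Memory is a dictionary keyed by address, and the two helpers stand in for the `m va, r` and `m @r+offset, r` forms. The buffer location `BUFFER_BASE`, the stored values, and the helper names are hypothetical; the offset is added to the address exactly as written in the instruction, since the description does not state whether it counts bytes or words.

```python
VECTOR_BYTES = 4
BUFFER_BASE = 0x100000        # hypothetical location of the Frame-1 buffer array

memory = {0x700 * VECTOR_BYTES: BUFFER_BASE}   # vector entry holds the pointer
for i in range(128):
    memory[BUFFER_BASE + i] = 1000 + i         # illustrative buffer contents

regs = {}


def move_vector(vector_id, dst):
    """m va, r: load the memory location of vector `a` into a register."""
    regs[dst] = vector_id * VECTOR_BYTES


def move_indirect(src, dst, offset=0):
    """m @r+offset, r: load the data found at the address held in `src`."""
    regs[dst] = memory[regs[src] + offset]


move_vector(0x700, "r1")                 # m v0x700, r1
move_indirect("r1", "r2")                # m @r1, r2   (r2 = actual pointer)
move_indirect("r2", "r3", offset=100)    # m @r2+100, r3
```

After these three steps `r1` holds 0x1C00, `r2` holds the buffer pointer, and `r3` holds the indexed element, so code that only knows the vector ID never needs the buffer's absolute address.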

The vector notation can also be used in a jump instruction, jp va, where 'jp' means 'jump', 'v' denotes a vector, and 'a' is the vector number or ID.

On receiving such an instruction, the CPU transfers the PC (Program-Counter) via vector number 'a', which designates address 'a' × 4, the location containing the actual PC address to jump to.

Another example of the CPU architecture using a vector ID together with the synchronous burst transfer operation is loading a parameter frame from a memory location. A parameter frame is defined as 16 consecutive 32-bit words in memory and is mapped onto the user-mode or kernel-mode local register set.

In the CPU instruction ms @va, r_b, 'ms' means "synchronously (burst) move the 16-word data block starting at vector 'a' into the 16 registers starting at r_b", 'v' denotes a vector, and 'a' is the vector ID.

A data block frame, referred to simply as a frame, such as Frame-1 (229), begins at vector v0x700 and is a set, or frame, of sixteen 32-bit data words. The 16-word frame is transferred in burst fashion, clocked on every SYNCLK (214) cycle.
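A rough Python model of the `ms @va, r_b` burst load is given below. It rests on several assumptions that the description does not settle: the vector entry is taken to hold the base address of the frame, memory is byte-addressed with 4-byte words, and one word is moved per modeled SYNCLK cycle. `burst_move`, `FRAME_BASE` and the data values are hypothetical.

```python
FRAME_WORDS = 16       # a parameter frame is 16 consecutive 32-bit words
WORD_BYTES = 4         # assumed byte addressing with 32-bit words


def burst_move(memory, vector_id, regs, rb):
    """Model of `ms @va, r_b`: copy one 16-word frame into regs[rb:rb+16].

    Returns the number of modeled SYNCLK cycles (one word per cycle is an
    assumption of this sketch).
    """
    base = memory[vector_id * WORD_BYTES]   # vector entry -> frame base (assumption)
    for cycle in range(FRAME_WORDS):
        regs[rb + cycle] = memory[base + cycle * WORD_BYTES] & 0xFFFFFFFF
    return FRAME_WORDS


FRAME_BASE = 0x200000  # hypothetical location of Frame-1
memory = {0x700 * WORD_BYTES: FRAME_BASE}
for i in range(FRAME_WORDS):
    memory[FRAME_BASE + i * WORD_BYTES] = 0xA000 + i

regs = [0] * 256                       # the linear register set r0..r255
cycles = burst_move(memory, 0x700, regs, rb=32)   # load into r32..r47
```

Loading at `rb=32` fills the kernel-mode local registers r32 to r47 in one burst and leaves the neighbouring registers untouched.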

In an eighth embodiment of the present invention, the CPU executes all instructions required for synchronizing processes in a multiprocessing system that involves memory access or device access. The memory access is to a location or block of critical memory. Conventionally, atomic instruction operations, which preserve the consistency of a read-modify-write sequence as an uninterrupted operation at the CPU level, underpin other, more flexible synchronization mechanisms and support mutex (mutual exclusion) resources, whereby once one process has gained access the other processes must wait.

In the embodiment of FIG. 6, all mutex primitives are removed and replaced by a single block atomic instruction formed by the prefix/suffix pair atomic1 (600) and atomic2 (601). The atomic instruction pair encloses an instruction block (602), which is executed atomically. The instructions in the block (602), however long it is, execute sequentially without interruption. The atomic1 (600)-atomic2 (601) pair can make any instruction atomic simply by setting the MSb, bit31 (603), of the instruction word (604) to "1", which marks an atomic instruction as shown in FIG. 5. Existing atomic instructions such as xchg on the Intel 8088 and compare-exchange or compare-decrement on other CPUs can therefore be eliminated. The mechanism is also a superset of block-type atomic instructions such as those implemented in Intel x86 or IA-32, which can be applied to only a few selected instructions.

Like an assembler directive, the atomic1 (600)-atomic2 (601) pair is visible to the programmer only at the source level.
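How a source-level atomic1/atomic2 pair could turn into bit-31 markings can be sketched in Python. The toy `assemble` function below treats "atomic1" and "atomic2" as directives that emit no words and sets bit 31 on every instruction word between them. The function names, the example instruction words, the rejection of nested pairs, and the premise that bit 31 is otherwise zero are all assumptions for illustration.

```python
ATOMIC_BIT = 1 << 31   # MSb of the 32-bit instruction word


def assemble(source):
    """Encode a list of instruction words and 'atomic1'/'atomic2' directives.

    Directives emit nothing; words inside a pair get bit 31 set.
    """
    words = []
    in_block = False
    for item in source:
        if item == "atomic1":
            if in_block:
                raise ValueError("nested atomic1")   # nesting rejected (assumption)
            in_block = True
        elif item == "atomic2":
            if not in_block:
                raise ValueError("atomic2 without atomic1")
            in_block = False
        else:
            words.append(item | ATOMIC_BIT if in_block else item)
    if in_block:
        raise ValueError("atomic1 without atomic2")
    return words


def is_atomic(word):
    """True when the instruction word is marked for atomic execution."""
    return bool(word & ATOMIC_BIT)


program = [0x00000011, "atomic1", 0x00000022, 0x00000033, "atomic2", 0x00000044]
encoded = assemble(program)
```

The encoded program has four words, not six: the pair leaves no trace in the instruction stream other than the set bits, which matches the statement that it is visible only at the source level.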

Claims

A CPU comprising:
a. a single linear set of 256 operating registers that are not bank-switched and are classified by mode, the modes including user mode, kernel mode and interrupt mode;
b. FIFO buffer-oriented I/O (Input/Output), each I/O being an interrupt-driven device treated as if it were CPU-like;
c. an I/O arbiter, integrated into the CPU core or implemented as a separate I/O arbiter chip, which eliminates the CPU's need for direct-memory-access (DMA) mechanisms, bus control signals, bus bridges and I/O buses; and
d. local I/O, or an I/O register file (IOR), integrated within the CPU, with I/O addressing uniform with that of external I/O,
wherein the CPU performs separate memory allocation for four stack operations comprising a user stack, a kernel stack, an interrupt stack and a procedure call stack, and
wherein the CPU performs a consistent data transfer method using signals supporting block and synchronous burst modes.

The CPU according to claim 1, characterized in that the CPU uses a plurality of signals including a memory read signal (RD), a memory write signal (WR), an I/O read signal (IORD), an I/O write signal (IOWR), a synchronous data transfer clocking signal (SYNCLK), a first synchronous data transfer stop signal (SYNSTP1), a second synchronous data transfer stop signal (SYNSTP2), an interrupt signal (INT), an interrupt-acknowledge signal (INTA), a HALT signal and a UHALT signal.

The CPU according to claim 1, characterized in that the CPU uses an instruction set with a vector representation and convention applied consistently throughout the CPU, including instructions for hardware and software interrupts and a move (load-store) instruction that moves a vector va to a register r1 (where v denotes a vector and a denotes the vector number or ID).

The CPU according to claim 1, characterized in that, to make any instruction atomic, the CPU sets bit-31 (d31) of the instruction word, in little-endian order; every instruction executes normally when bit-31 (d31) = 0 and atomically when bit-31 (d31) = 1, and an instruction block whose instructions have bit-31 (d31) = 1 executes atomically, so that dedicated atomic instructions are unnecessary and any instruction block can be made atomic.

The CPU according to claim 1, characterized in that it has 256 linear registers operable by CPU instructions, the registers being numbered r0 to r255 and divided into the following four sets:
a. a first set (r0-r15) of 16 × 32-bit local registers for user mode;
b. a second set (r16-r31) of 16 × 32-bit user-mode global registers;
c. a third set (r32-r47) of 16 × 32-bit local registers for kernel mode; and
d. a fourth set (r48-r255) of 208 × 32-bit registers for kernel-mode global registers and interrupt service.

The CPU of claim 1, wherein the programming model provides the following among the linear 256 registers r0-r255:
a. user-mode registers r0-r15;
b. a user-mode local register set (r0-r15);
c. a register set (r16-r31) serving as user-mode global registers that remain unchanged across all processors;
d. all 256 registers, without bank switching, in kernel mode and in interrupt mode, which is also a kernel-mode process;
e. a register set (r32-r47) reserved for kernel local registers; and
f. the set of user-mode global registers (r16-r31) reserved for passing parameters from the user-mode local register set (r0-r15).

The CPU according to claim 1, wherein the CPU has 12 CPU control registers that are not visible to the programmer, the 12 CPU control registers being necessary to describe the current CPU execution state so as to ensure an accurate return to a previous function call; the 12 CPU control registers, except for the FS register, are not accessed or manipulated by software instructions; access to the FS register is permitted only through kernel-mode instructions that move it via the linear 256-register set, with the bits of the field that result from internal hardware traps being blocked; and the control registers are divided as follows:
I. four main CPU control registers, consisting of:
a. current FS;
b. current FSP;
c. current PC; and
d. PC1;
II. four FSPs, consisting of:
a. FSP1 for the user stack;
b. FSP2 for the kernel stack;
c. FSP3 for the interrupt stack; and
d. FSP4 for the procedure call or function call stack;
III. four copies of the main CPU control registers for interrupt mode, consisting of:
a. interrupt FS-I;
b. interrupt FSP-I;
c. interrupt PC-I; and
d. interrupt PC1-I.

The CPU of claim 7, wherein the CPU architecture implements FP:SP, defined as a pair of FP and SP held in a single internal CPU register denoted FSP; the FSP is defined as a 20-bit FP and a 12-bit SP; in any given FSP word the FP is defined as bit-31 through bit-12 (d31 through d12), where, under the addressing used in this CPU architecture, bit-1 and bit-0 (d1 and d0) are unused, so that the SP covers up to 1024 words of register set, passed parameters and pushed words; the SP is always an offset from the FP; and whenever the SP bits d11 to d2 all reach "1" and the SP is 0xfff, a hardware trap, which is an interrupt to vector ID V254, occurs with the appropriate bit set in the trap word.

The CPU of claim 3, wherein the CPU implements a maximum of 128k vectors, corresponding to 131,072 (0x20000) vectors from vector ID v0 to vector ID v0x1ffff, denoted v0x0 to v0x1ffff and including pointers to interrupts, system calls, descriptor tables and others, the vectors being divided as follows:
a. v0 to v0xff, assigned to kernel-mode hardware interrupts;
b. v0x100 to v0x2ff, assigned to system call vectors in kernel mode; and
c. v0x300 to v0x1ffff, divided among user mode, descriptor-table pointers for the PID (process ID), and function call vectors in the PCB (Process-Control-Block).

The CPU according to claim 2, wherein the CPU generates three interface signals, SYNCLK, SYNSTP1 and SYNSTP2, for synchronous burst data transfer; block transfers of all data arrays in main memory for the user register set and the kernel register set are performed in bursts using the three interface signals controlled by the CPU, in which a bit word is transmitted; and the SYNSTP1 and SYNSTP2 signals indicate the end of transmission.

The CPU according to claim 2, wherein the CPU uses only one main bus arbiter with the FIFOs, so that multiple bus arbiters are unnecessary and a single arbiter suffices even in a complex design.

The CPU of claim 2, wherein the CPU uses the INT-INTA signal pair together with the synchronization signals SYNCLK, SYNSTP1 and SYNSTP2 without disabling the memory bus.

The CPU according to claim 2, wherein, when the CPU disables the memory bus using the CPU instruction busd (bus disable), DMA can be performed in imitation of the conventional method, the DMA being performed using the INT-INTA signal pair together with the synchronization signals SYNCLK, SYNSTP1 and SYNSTP2 and two further signals, HALT and UHALT.

The CPU according to claim 1, wherein the CPU implements an IOR (I/O register file) consisting of a memory space of 1024 32-bit words, in which a word is accessed through the 12-bit byte-addressable lines A0 to A11; the IOR is an internal dual-port read-write memory for storing frequently used variables and for implementing autonomous I/O devices (including an event counter, a real-time clock and device configuration); the CPU avoids, and has no need for, polling over main memory bus accesses, so that hardware functions requiring an independent local bus that is not blocked by CPU memory bus operations can be implemented; the IOR is the portion of the processor I/O addressing space that uses IORD and IOWR to access devices including the I/O arbiter chip; and the IOR is implemented internally.

The CPU of claim 1, wherein the interrupt stack provides interrupt services; the current main CPU control registers consist of FS, FSP, PC and PC1; and each of these CPU registers is saved in its corresponding copy among the following interrupt storage registers:
a. interrupt FS-I;
b. interrupt FSP-I;
c. interrupt PC-I; and
d. interrupt PC1-I.