TW535107B

TW535107B - Data processing device

Info

Publication number: TW535107B
Application number: TW88119144A
Authority: TW
Inventors: Hiroshi Hatae; Hiromi Watanabe
Original assignee: Hitachi Ltd
Priority date: 1999-01-20
Filing date: 1999-11-03
Publication date: 2003-06-01
Also published as: JP3676237B2; WO2000043868A1

Abstract

The invention provides a data processing device that uses a processor to process large amounts of data quickly and efficiently, as in the computation for motion vector detection in images. A SIMD arithmetic unit 4 and a work RAM 12, both controlled by a CPU 2, are connected by a local bus 8 whose bus width is wider than that of the data bus 6 of the CPU 2. The SIMD arithmetic unit 4, the work RAM 12 and the CPU 2 are connected in common to an address bus 10, so that the CPU 2 manages the SIMD arithmetic unit 4 and the work RAM 12 in a unified manner and data is processed at high speed.

Description

Technical Field

The present invention relates to a data processing device, and more specifically to a data processing device that uses a processor to process large amounts of data at high speed and efficiently, as in the motion detection and motion compensation processing used for compressing and decompressing image signals.

Background Art

In compression and decompression of images and audio, the same arithmetic operation must be repeated at high speed over a large amount of data. Data processing devices are therefore known in which the part that performs this repeated operation is a dedicated arithmetic unit and, to make that unit operate at high speed, it is built as a SIMD (Single Instruction Multiple Data) arithmetic unit that has a plurality of processor elements (arithmetic units) arranged in parallel and operates them under the same program. SIMD arithmetic units are described in the journal "Interface", March 1998 issue, pages 111 to 113. A specific known example is the Pentium MMX technology of Intel Corporation of the United States.

In a SIMD arithmetic unit, supplying data from memory without interruption so as to raise the utilization of the arithmetic elements is a decisive factor in performance. However, in a conventional data processing device that combines a central processing unit (CPU) with a SIMD arithmetic unit, the CPU and the SIMD arithmetic unit are connected through a common data bus and address bus. Data is therefore first transferred from memory to the registers in the SIMD arithmetic unit, the operation is then performed, and the result in the registers is transferred back to memory before the next data processing operation starts. In this case, a processor element cannot use the data used by an adjacent processor element, so the operation efficiency cannot be raised.

One conceivable way to solve this problem, following the design practice of system LSIs, is to connect the SIMD arithmetic unit and an on-chip memory by a wide local bus that is independent of the system bus. With this approach the data transfer performance between the SIMD arithmetic unit and the memory improves. However, the traffic on the system bus over which the CPU passes operation instructions to the SIMD arithmetic unit becomes a problem, address generators are needed in both the CPU and the SIMD arithmetic unit, and the CPU cannot manage in a unified manner both the reading of data from the memory and the storing of data in the SIMD arithmetic unit. As a result, the high-speed performance of the SIMD arithmetic unit cannot be exploited effectively.

Summary of the Invention

The main object of the present invention is to realize a data processing device capable of processing data at high speed.

Another object of the present invention is to realize, in a data processing device having an arithmetic unit that is controlled by a central processing unit and connected to a memory by a local bus, unified management by the central processing unit of both the reading of data from the memory and the storing of data in the arithmetic unit.

A further object of the present invention is to realize a data processing device that supplies data without interruption to the processor elements constituting the arithmetic unit, so that an operation can be performed in as nearly every clock cycle as possible and data can be processed at high speed.

To achieve the above objects, the data processing device of the present invention comprises: an arithmetic unit controlled by a CPU; a first memory means; an address bus connected in common to the CPU, the arithmetic unit and the first memory means; and a local data bus that has a bus width wider than that of the data bus of the CPU and is coupled to the arithmetic unit.

In the present invention, providing the local data bus between the first memory means and the arithmetic unit raises the data transfer performance, and connecting control lines from the CPU to the arithmetic unit makes the operation instructions supplied to the arithmetic unit independent of the traffic on the system bus. Furthermore, because the address bus is connected in common to the CPU, the arithmetic unit and the first memory means, an address generator need only be provided in the CPU and is not needed in the arithmetic unit. The first memory means, together with the registers of the arithmetic unit, lies in the address space of the CPU, so the CPU can manage in a unified manner the address designation for both reading data from the first memory means and storing data in the registers of the arithmetic unit.

According to a preferred embodiment of the present invention, the arithmetic unit is a SIMD-controlled arithmetic unit having a plurality of processor elements, each of which has a first input terminal, a second input terminal and an output terminal. The arithmetic unit comprises: a first register whose bit width is the sum of the bit widths of the first input terminals of all the processor elements; a second register whose bit width is the sum of the bit widths of the second input terminals of all the processor elements; and a third register that has a bit width equal to or greater than the bit width of the second input terminal of a processor element and can shift data into the second register in units of the second-input-terminal bit width.
The data processing device of the present invention, described in detail in the embodiments below, is effective for the motion detection processing of image coding and the like, and is applicable to processing devices that must perform high-speed arithmetic in parallel with CPU processing.

Brief Description of the Drawings

Fig. 1 is a block diagram of a first embodiment of the data processing device of the present invention.
Fig. 2 is a circuit diagram of the internal structure of the SIMD arithmetic unit 4 of Fig. 1.
Fig. 3 is a diagram of the internal structure of the CPU 2 of Fig. 1.
Fig. 4 is a diagram of the internal structure of the processor element 38 of Fig. 2.
Fig. 5 is a diagram explaining the operation of the SIMD arithmetic unit 4 of Fig. 2.
Fig. 6 is a diagram explaining the operation of the SIMD arithmetic unit 4 of Fig. 2.
Fig. 7 is a diagram explaining the reference image data used in the first embodiment.
Fig. 8 is a diagram explaining the image data to be coded used in the first embodiment.
Fig. 9 is an address map of the DRAM 16 of Fig. 1.
Fig. 10 is an address map of the work RAM 12 of Fig. 1.
Fig. 11 is an operation flowchart of the first embodiment.
Fig. 12 is a diagram explaining data transfer to the registers of the SIMD arithmetic unit 4 in the first embodiment.
Fig. 13 is a diagram explaining the computation range for vector (0, 0) in the first embodiment.
Fig. 14 is a diagram explaining the computation range for vector (1, 0) in the first embodiment.
Fig. 15 is a block diagram of a second embodiment of the data processing device of the present invention.
Fig. 16 is a diagram of the internal structure of the CPU of the second embodiment.
Fig. 17 is an operation flowchart of the second embodiment.

Fig. 18 is a block diagram of a third embodiment of the data processing device of the present invention.
Fig. 19 is a block diagram of a fourth embodiment of the data processing device of the present invention.
Fig. 20 is a diagram of the internal structure of the VPU 160 of the fourth embodiment.

Preferred Embodiments of the Invention

<Embodiment 1>

Fig. 1 is a block diagram of the first embodiment of the data processing device of the present invention. In this embodiment, the arithmetic unit performs motion detection by the block matching method as part of image coding. The structure of the device is described first, followed by the motion detection operation.

As shown in the figure, the data processing device has: an arithmetic unit 4, which is a SIMD arithmetic unit controlled directly by a central processing unit (hereinafter CPU) 2 through control lines 3 and 5; a work random access memory (work RAM) 12, which serves as the memory means; an address bus 10 connected in common to CPU 2, arithmetic unit 4 and work RAM 12; and a local data bus 8 that has a bus width wider than that of the data bus 6 of CPU 2 and couples arithmetic unit 4 to work RAM 12.

CPU 2 decodes instructions and controls the whole device. This embodiment uses a RISC microprocessor. Reference numeral 20 denotes a ROM that stores programs and the like for CPU 2; 18, a RAM that stores data or programs for CPU 2; 12, the work RAM that temporarily holds operand data for SIMD arithmetic unit 4; 16, a DRAM that stores image data; 14, a DRAM interface circuit between DRAM 16 and work RAM 12; and 22, a DMA circuit that controls direct memory access (DMA) transfers between DRAM 16 and work RAM 12.

This embodiment has three kinds of bus. The data bus 6 of CPU 2 is 32 bits wide, the address bus 10 is 32 bits wide, and the data buses 8 and 24 are 144 bits wide. In the figure, the number marked with a slash on each bus indicates its bus width (bit width). The structure and operation of each part are described in detail below.

Fig. 2 is a circuit diagram of the internal structure of the SIMD arithmetic unit 4 of Fig. 1. Arithmetic unit 4 is a SIMD-controlled arithmetic unit having 16 processor elements 38, 40, ..., 42, 44 arranged in parallel. Each processor element has a first input terminal connected to register 30 through selector 32, a second input terminal connected to register 34, and an output terminal connected to data buses 6 and 8. Register 30 has a bit width equal to the sum of the first-input-terminal bit widths of all the processor elements 38, 40, ..., 42, 44. Register 34 has a bit width equal to the sum of the second-input-terminal bit widths of all the processor elements. There is also a third register 36, which has a bit width equal to or greater than the bit width of the second input terminal of a processor element and can shift data into register 34 in units of the second-input-terminal bit width.

Each of the processor elements 38, 40, ..., 42, 44 is controlled by CPU 2 through control lines 3 and 5. The data supplied from register 30 to the processor elements 38, 40, ..., 42, 44 can be changed by selector 32. Registers 30, 34 and 36 are written with data from local data bus 8 by write circuits 50, 46 and 48, respectively, which are controlled by address bus 10.
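
The register widths follow from the figures given above: 16 processor elements with 9-bit inputs give 16 x 9 = 144 bits, the width of local data bus 8. The sketch below models registers 34 and 36 as Python integers to illustrate a shift by one 9-bit unit. It is a minimal illustration, not the circuit of Fig. 2: the helper names `lane`, `pack` and `shift_into_b` are hypothetical, lane 0 is assumed to be the most significant 9 bits, and register 36 is assumed to be 144 bits wide and to shift together with register 34.

```python
NUM_PE = 16                      # processor elements in the SIMD arithmetic unit
LANE_BITS = 9                    # bit width of one processor-element input
REG_BITS = NUM_PE * LANE_BITS    # 16 * 9 = 144, the local data bus width
REG_MASK = (1 << REG_BITS) - 1
LANE_MASK = (1 << LANE_BITS) - 1


def lane(reg, index):
    """Return the 9-bit field seen by processor element `index` (0 = most significant)."""
    shift = REG_BITS - LANE_BITS * (index + 1)
    return (reg >> shift) & LANE_MASK


def pack(values):
    """Build a 144-bit register value from 16 lane values, lane 0 first."""
    reg = 0
    for value in values:
        reg = (reg << LANE_BITS) | (value & LANE_MASK)
    return reg


def shift_into_b(reg_b, reg_c):
    """Shift both registers up by one lane; C's top lane enters B's bottom lane."""
    carry = lane(reg_c, 0)
    new_b = ((reg_b << LANE_BITS) | carry) & REG_MASK
    new_c = (reg_c << LANE_BITS) & REG_MASK
    return new_b, new_c


reg_b = pack(range(1, 17))       # sample lane values 1..16 in register 34
reg_c = pack(range(17, 33))      # sample lane values 17..32 in register 36
reg_b, reg_c = shift_into_b(reg_b, reg_c)
print([lane(reg_b, i) for i in range(NUM_PE)])   # lanes now hold 2..17
```

After one shift every processor element sees the value its right-hand neighbour saw before, which is what lets adjacent elements reuse the same data without another memory access.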

Fig. 3 is a block diagram of the structure of the RISC microprocessor 2 of Fig. 1. Its structure is the same as that of a conventional microprocessor and comprises: an instruction decode circuit 58, which receives over line 72 the instruction fetched by instruction fetch circuit 60 and decodes it; an arithmetic circuit 64, which executes the instruction 68 from instruction decode circuit 58; a program counter 54; and general-purpose registers 56. In addition, instruction decode circuit 58 activates signal line 3 for, for example, an operation instruction for SIMD arithmetic unit 4, and activates signal line 5 for an instruction that reads out a result of SIMD arithmetic unit 4. Reference numerals 66, 68, 62, 73 and 74 denote instruction and data transfer lines.

Fig. 4 is a block diagram of the structure of a processor element. The 16 processor elements 38, 40, ..., 42, 44 of SIMD arithmetic unit 4 have identical structures, so processor element 38 is described here as representative. Processor element 38 comprises a register 82, which holds the operation result of arithmetic circuits 80 and 81, and a read control circuit, which controls readout to local data bus 8 or data bus 6. Arithmetic circuit 80 receives, over bus 37, 9 bits that are one part of the 144-bit width of register 30 and, over bus 35, 9 bits that are one part of the 144-bit width of register 34. Arithmetic circuit 80 operates on the two inputs (subtraction), and arithmetic circuit 81 adds the output of arithmetic circuit 80 to the value of register 82. The result of arithmetic circuit 81 is stored in register 82.

Figs. 5 and 6 illustrate the connection modes of selector 32.
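
The datapath of one processor element can be sketched as follows. This is a minimal model under stated assumptions: the class and method names are hypothetical, and the absolute value is taken between the subtraction and the accumulation because this embodiment uses the elements for block matching, which accumulates absolute differences. The text does not say which circuit forms the absolute value, nor how wide register 82 is.

```python
class ProcessorElement:
    """Toy model of one processor element (arithmetic circuits 80, 81 and register 82)."""

    def __init__(self):
        self.acc = 0                     # register 82: running result

    def step(self, a, b):
        """Consume one 9-bit input from register 30 (a) and one from register 34 (b)."""
        diff = a - b                     # arithmetic circuit 80: subtraction
        self.acc += abs(diff)            # assumed |diff|; circuit 81 adds it to register 82
        return self.acc


pe = ProcessorElement()
for a, b in [(200, 10), (0, 255), (128, 128)]:
    pe.step(a, b)
print(pe.acc)                            # 190 + 255 + 0 = 445

# Worst case, assuming 256 accumulations of 8-bit values (a 16 x 16 block):
max_sum = 16 * 16 * 255
print(max_sum, max_sum.bit_length())     # 65280 fits in 16 bits
```

The 9-bit inputs leave room for a sign bit on each 8-bit value, so the difference of two inputs always has a well-defined sign before the absolute value is taken.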

In the first connection mode, shown in Fig. 5, the 9 bits a0 at the most significant end of the 144 bits of register 30 are supplied in common to every processor element 38, 40, ..., 44, 42. In the second connection mode, shown in Fig. 6, the entire 144-bit content of register 30 is divided from the most significant end into 9-bit fields a0, a1, ..., a14, a15, which are supplied individually to processor elements 38, 40, ..., 44, 42. Thus, as shown in the figure, the 9-bit data a0 is supplied to processor element 38 (number 0), and the 9-bit data a1 to processor element 40 (number 1).

Next, image motion detection as performed in image signal coding under the MPEG-2 standard is described using the above data processing device. In MPEG-2, coding is performed in units of macroblocks of 16 pixels horizontally by 16 pixels vertically. Motion detection finds, within a search range of the reference picture used for comparison, the position of the macroblock most similar to the macroblock to be coded, and obtains the distance within the picture frame between the two macroblocks. Motion detection is usually performed by the block matching method.

In block matching, the absolute value of the difference between each pixel of the image to be coded and the corresponding pixel of the reference image is accumulated over all pixels in the macroblock, and the position of the macroblock that gives the smallest accumulated value is found.

Figs. 7 and 8 show, respectively, the pixels of the reference image data and the pixels of the macroblock of the image to be coded. Here the reference image data is assumed to be 352 pixels horizontally by 240 pixels vertically. In the figure, the circled labels ra1, ra2, ..., rb1, ..., rb17, ... identify pixels.
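
Block matching as just described can be written as a plain reference routine. The sketch below is a scalar illustration, not the SIMD implementation of this embodiment: the function names, the small synthetic images and the horizontal-only search over 16 candidates are assumptions made for the example.

```python
def sad(cur, ref, x0, y0, dx, dy, size=16):
    """Sum of absolute differences between the block of `cur` at (x0, y0)
    and the block of `ref` displaced by (dx, dy)."""
    total = 0
    for y in range(size):
        for x in range(size):
            total += abs(cur[y0 + y][x0 + x] - ref[y0 + dy + y][x0 + dx + x])
    return total


def best_vector(cur, ref, x0, y0, candidates, size=16):
    """Return (vector, sad) for the candidate displacement with the smallest SAD."""
    scores = {v: sad(cur, ref, x0, y0, v[0], v[1], size) for v in candidates}
    best = min(scores, key=scores.get)
    return best, scores[best]


# Synthetic data: the reference is a 16-line, 32-pixel-wide pattern and the
# block to be coded is the same pattern displaced 5 pixels to the right.
ref = [[(7 * x + 13 * y + x * y) % 256 for x in range(32)] for y in range(16)]
cur = [row[5:5 + 16] for row in ref]
candidates = [(dx, 0) for dx in range(16)]
print(best_vector(cur, ref, 0, 0, candidates))   # ((5, 0), 0)
```

Each candidate vector costs 256 subtract, absolute-value and add operations, all of the same form, which is why the search maps naturally onto parallel processor elements.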

The macroblock is 16 pixels horizontally by 16 pixels vertically; in the figure, the circled labels ta1, ..., ta16 and so on identify its pixels.

Fig. 9 shows the data stored in the DRAM 16 of Fig. 1. The labels ra1, ra2, ... in the figure denote the pixels carrying the same labels in Figs. 7 and 8. The reference image data area is allocated from address A000, and four horizontally adjacent pixels are stored in each 32-bit word of DRAM 16. The macroblock, that is, the area for the image data to be coded, is allocated from address B000.

Fig. 10 shows the image data to be coded and the reference image data stored in work RAM 12. The reference image data area is allocated from address C000. Each pixel is 9-bit data, and the 144 bits starting at address C000 hold the 16 horizontal pixels ra1 through ra16. The area for the image data to be coded is allocated from address D000. As with the reference image data, the 144 bits at address D000 hold the 16 horizontal pixels ta1 through ta16.

Fig. 11 is a flowchart of the motion detection processing of the data processing device.

First, the data in DRAM 16 (Fig. 9) is transferred to work RAM 12 through DRAM interface 14 (step 90). During this transfer, a sign bit is added to the 8-bit data of each pixel, extending it to 9-bit signed data per pixel. Four long words of data from DRAM 16 are placed side by side to produce one 144-bit data item. This transfer is repeated, and the data is stored in work RAM 12 via bus 24.

Next, reference image data is transferred from work RAM 12 to register 34 of SIMD arithmetic unit 4 over local data bus 8 (step 92).

Fig. 12 explains the operation of step 92 in detail. It shows, against time, the signal flow among the 16 processor elements 38, 40, ..., 42, 44 and the 144-bit registers A 30, B 34 and C 36. That is, it shows the times t along the vertical direction and how the contents of registers 30, 34 and 36 change at each time.

As described above, register A 30 holds data for a plurality of pixels of the image to be coded, and the most significant 9 bits of its 144-bit string are supplied in common to all the processor elements 38, 40, ..., 42, 44. Register B 34 holds a plurality of pixels of the reference image; its most significant 9 bits are supplied to processor element 38, the next 9 bits to processor element 40, and so on in 9-bit units to the remaining processor elements. Register C 36 shifts its data to supply register B 34. For a 9-bit shift, the most significant 9 bits of register C 36 are supplied to the least significant 9 bits of register B 34.

At time t = 0 (step 92), pixels ra1 through ra16 of the reference image data are transferred to register B 34 in a single 144-bit-wide transfer.

At time t = 1 (step 94), data is transferred from work RAM 12 to register C 36, so that pixels ra17 through ra32 of the reference image data are likewise transferred in a single 144-bit-wide transfer. As a result, one line of reference image data, 32 horizontal pixels, is held across register B 34 and register C 36.

At time t = 2 (step 96), pixels ta1 through ta16 of the macroblock of the image to be coded are transferred at once from work RAM 12 to register A 30.
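
The packing performed in step 90 can be illustrated as follows. This is a sketch under assumptions the text does not settle: pixels are taken to be unsigned 8-bit values, so the added sign bit is 0; the first pixel of a long word is taken to be its most significant byte; and the first pixel of a line is placed in the most significant 9 bits of the 144-bit word. The function names are hypothetical.

```python
LONG_WORDS_PER_LINE = 4      # four 32-bit long words, four pixels each, give 16 pixels
LANE_BITS = 9                # 8-bit pixel plus one sign bit


def unpack_long_word(word):
    """Split a 32-bit word into four 8-bit pixels, most significant byte first (assumed)."""
    return [(word >> shift) & 0xFF for shift in (24, 16, 8, 0)]


def pack_line(long_words):
    """Combine four long words into one 144-bit work RAM word of 16 nine-bit lanes."""
    assert len(long_words) == LONG_WORDS_PER_LINE
    packed = 0
    for word in long_words:
        for pixel in unpack_long_word(word):
            packed = (packed << LANE_BITS) | pixel   # bit 8 of each lane (sign) stays 0
    return packed


def lanes(packed):
    """Return the 16 nine-bit lanes of a 144-bit word, first pixel first."""
    return [(packed >> (LANE_BITS * (15 - i))) & 0x1FF for i in range(16)]


line = pack_line([0x01020304, 0x05060708, 0x090A0B0C, 0xFF0E0F10])
print(lanes(line))                  # [1, 2, ..., 12, 255, 14, 15, 16]
print(line.bit_length() <= 144)     # True
```

Four long words carry 4 x 4 = 16 pixels, and 16 pixels at 9 bits each fill exactly one 144-bit word, so one work RAM word feeds all 16 processor elements in a single transfer.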

[...] data transfer to the three registers 30, 34 and 36. First, in step 104, data is transferred from work RAM 12 to register B.

At time t = 20 (step 106), data is transferred from work RAM 12 to register C 36. The result is the state shown at time t = 20 in Fig. 12: the reference image data of the line one below the line processed previously, pixels rb1 through rb32, is held across register 34 and register 36.

At time t = 21 (step 108), data is transferred from work RAM 12 to register A. As a result, pixels tb1 through tb16 of the image to be coded, from the line one below the line processed previously, are stored in register A. All three registers 30, 34 and 36 now hold data. The operation is then performed in the same way as described above. This is repeated for 16 lines.

As a result, register 82 inside processor element 38 holds the accumulated sum of the absolute differences over all pixels. This value is the result of the block matching operation for vector (0, 0) of Fig. 13, that is, the degree of similarity corresponding to vector (0, 0).

Likewise, register 82 inside processor element 40 holds the result of the block matching operation for vector (1, 0) of Fig. 14. In the same way, the 16 processor elements 38, ..., 44 yield the block matching results for 16 motion vectors simultaneously.

In this embodiment, a large amount of data can be transferred at once from work RAM 12 to SIMD arithmetic unit 4 without passing through the system data bus of the data processing device. At the same time, no address generator is provided in SIMD arithmetic unit 4, and address management by CPU 2 allows the data transfers between work RAM 12 and SIMD arithmetic unit 4 to be managed in a unified manner. The device is therefore effective for data processing in which the same operation must be performed many times under a single instruction, such as motion detection in image processing by block matching.
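
The per-line dataflow described above (load registers B, C and A, then operate and shift) can be simulated to see why the 16 processor elements end up holding the results for 16 horizontal vectors. The sketch below is an illustration, not the hardware: registers are modeled as Python lists of lanes rather than 144-bit words, the function name is hypothetical, and it assumes that operation step k presents lane k of register A to every processor element, a detail this section does not spell out.

```python
def simd_block_match(cur, ref, num_pe=16):
    """Accumulate, in processor element i, the SAD for horizontal vector (i, 0).

    cur: 16 lines of 16 pixels (block to be coded)
    ref: 16 lines of 32 pixels (reference data for the same lines)
    """
    acc = [0] * num_pe                       # register 82 of each processor element
    for cur_line, ref_line in zip(cur, ref):
        reg_b = list(ref_line[:num_pe])      # register B: first 16 reference pixels
        reg_c = list(ref_line[num_pe:])      # register C: next 16 reference pixels
        reg_a = list(cur_line)               # register A: 16 pixels to be coded
        for k in range(num_pe):
            a = reg_a[k]                     # one lane of A, common to all elements (assumed order)
            for i in range(num_pe):
                acc[i] += abs(a - reg_b[i])  # element i sees lane i of register B
            # Shift by one lane: C's first lane enters B's last lane.
            reg_b = reg_b[1:] + [reg_c[0]]
            reg_c = reg_c[1:] + [0]
    return acc


ref = [[(7 * x + 13 * y + x * y) % 256 for x in range(32)] for y in range(16)]
cur = [row[5:5 + 16] for row in ref]         # block displaced 5 pixels to the right
acc = simd_block_match(cur, ref)
print(acc.index(min(acc)), min(acc))         # 5 0
```

In this model each line costs three register loads and 16 operation steps, which is consistent with the loads for the second line appearing at t = 20 and t = 21 in Fig. 12.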

以1個指令來作同樣多數運算的資動作檢測，對於有必要料處理變得有效。 <實施例2 > 圖。，二二f發明資料處理裝置之第2實施例的構造塊哭13(1貝：疋在圖1之資料處理裝置追加第2 SIMD運算 I。二Γ遗於此，又附加出自CPU 131之控制線134和一第2 SIMD運算器130之内部構造，是與圖2所 :相同，相同對應構造要素附有相同號碼，其說明在此 ^ ·在八他之構造因素，是有關於實質上與圖1所不者相同部分’亦是附有相同號碼來省略說明。圖16，是圖示第2實施例（圖15)之CPU 131構造的塊圖。CPU 131之構造，除了在圖3所示之實施例工的cpu 2，附加從指令編碼電路133而出之控制線η〕*。#之外，在實質上是與CPU 2相同。控制線132和134係用以控制第2 SIMD運算器130者。圖1 7係說明實施例2之資料處理裝置的動作之處理流程圖。在實施例2中，於SIMD運算器4之3個暫存器中儲存資料的動作，易言之，從DRAM 16將資料傳送到工作ram 12之動作（步驟90)，以至從工作ram 12將編碼圖像資料傳送到暫存器A之動作（步騾9 6 )，與圖丨1所示之附有相同步騾號碼之部分相同。Using the same instruction to detect the same number of operations, it becomes effective for necessary data processing. < Example 2 > The construction block of the second embodiment of the data processing device of the second and second inventions 13 (1): I added a second SIMD operation I to the data processing device of FIG. 1. Two Γ is left here, and the control from the CPU 131 is added. The internal structure of the line 134 and a second SIMD calculator 130 is the same as that shown in FIG. 2: the same corresponding structural elements are attached with the same numbers, and their explanations are explained here. The same parts as in FIG. 1 are also attached with the same numbers to omit description. FIG. 16 is a block diagram illustrating the structure of the CPU 131 of the second embodiment (FIG. 15). The structure of the CPU 131 is different from that shown in FIG. The CPU 2 shown in the example embodiment is added with a control line η] *. From the instruction encoding circuit 133, which is essentially the same as the CPU 2. The control lines 132 and 134 are used to control the second SIMD operation. 130 is a processing flowchart illustrating the operation of the data processing device of the embodiment 2. 
In the embodiment 2, the operation of storing data in the three registers of the SIMD calculator 4 is easy to say, Action of transferring data from DRAM 16 to work ram 12 (step 90), so as to edit data from work ram 12 The image data is sent to the operation of the register A (step 96 mule), identical with the one shown in FIG Shu portions of the same step number of mules.
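The way the three registers feed the processor elements can be illustrated with a small software model. The sketch below is only an illustration under assumptions drawn from the claims: register A holds coded-image pixels and presents its leading pixel to every processor element, register B gives each processor element its own reference pixel, and register C refills register B as it shifts. The function name `simd_row_sad`, the one-pixel-per-clock shift, and the row-at-a-time framing are assumptions made for illustration, not details confirmed by the drawings.

```python
from collections import deque

NUM_PE = 16  # number of processor elements, as in Embodiment 1


def simd_row_sad(coded_row, ref_row, num_pe=NUM_PE):
    """Model one row of block matching on the SIMD arithmetic unit.

    coded_row: pixels of the coded image (assumed contents of register A).
    ref_row:   reference pixels; must hold len(coded_row) + num_pe - 1 values.
               The first num_pe go to register B, the rest to register C.
    Returns a list whose entry k is the accumulated absolute difference
    for horizontal displacement k (one entry per processor element).
    """
    n = len(coded_row)
    if len(ref_row) != n + num_pe - 1:
        raise ValueError("ref_row must hold len(coded_row) + num_pe - 1 pixels")

    reg_a = deque(coded_row)          # leading pixel is shared by every PE
    reg_b = deque(ref_row[:num_pe])   # slot k feeds processor element k only
    reg_c = deque(ref_row[num_pe:])   # supplies the pixels shifted into B
    acc = [0] * num_pe                # one accumulator per processor element

    for _ in range(n):                # one iteration models one clock
        a = reg_a.popleft()           # shift register A by one pixel
        for k in range(num_pe):       # all PEs work in lockstep
            acc[k] += abs(a - reg_b[k])
        reg_b.popleft()               # shift register B by one pixel
        if reg_c:
            reg_b.append(reg_c.popleft())  # refill B from register C
    return acc
```

In this model, processor element k ends up holding the matching result for displacement k, so a single pass over one row yields one partial result per candidate vector. Accumulating over all rows of a block would give the 16 block-matching results mentioned above.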

After step 96, in this embodiment, data is stored in the registers of the second SIMD arithmetic unit 130. First, reference image data is transferred from the work RAM 12 to register B (step 140). Next, reference image data is transferred from the work RAM 12 to register C (step 142). Finally, the coded image data is transferred from the work RAM 12 to register A (step 144). Then, as in Embodiment 1, the operation using the processor elements (PE) is performed. As a result, 32 processor elements can be used at the same time to perform block matching for different vectors, which allows still faster processing.

< Embodiment 3 >

Fig. 18 is a block diagram showing the structure of a third embodiment of the data processing device of the present invention. This embodiment has two work RAMs 144 and 146, which can be switched between the DRAM 16 side and the SIMD arithmetic unit 4 side. While data is stored in the work RAM 144 and the SIMD arithmetic unit 4 performs signal processing on that data, the work RAM 144 is connected to the SIMD arithmetic unit 4 side via selectors 142 and 152. Meanwhile, the work RAM 146 is connected to the DMAC 122 side via selectors 148 and 150. The DMAC 122 then transfers from the DRAM 16 into the work RAM 146 the image data that the SIMD arithmetic unit 4 will use next. As soon as the SIMD arithmetic unit 4 finishes its signal processing on the work RAM 144, the work RAM 144 and the work RAM 146 are switched. In other words, the work RAM 144 is connected to the DMAC 122 side, and the work RAM 146 is connected to the SIMD arithmetic unit 4 side. With this structure, the data to be used has already been transferred from the DRAM 16 into the work RAM 146, so the SIMD arithmetic unit 4 can start operating immediately. Operation efficiency is improved accordingly.

< Embodiment 4 >

Fig. 19 is a diagram showing a fourth embodiment of the data processing device of the present invention. In this embodiment, the data processing device of the present invention is built into an image signal compression LSI. The structural element blocks are connected to the bus 184 of the microprocessor unit (MPU) 166. They include: a communication interface 168, which provides the interface to an external modem; an audio interface 170, which handles input and output of external audio signals; a video interface block 172, which handles input and output of external image signals; a variable-length encoding and decoding block 164, which encodes and decodes variable-length codes; a Q-DCT/IQ-IDCT block 162, which performs quantization, inverse quantization, DCT, and inverse DCT processing; a DRAM control block 174, which controls the DRAM 176; and a motion detection block 160. The motion detection block 160 is the same as that described in the first embodiment.

Compared with the device shown in Fig. 1, this embodiment differs in that the DRAM 176, which corresponds to the DRAM interface 14 and the DRAM 16, is located outside the LSI, and in that the MPU 166 has a control register 185 for controlling the motion detection block 160. Through this control register 185, the CPU 180 of the motion detection block 160 can be controlled.

The operation of this structure during image compression is as follows. The coded image data input via the video interface block 172 is temporarily stored in the DRAM 176. It is then read into the work RAM of the motion detection block 160 in units of macroblocks. At the same time, the reference image data of the corresponding search range is also read into the work RAM of the motion detection block 160. As described in the first embodiment, the absolute differences are accumulated for each motion vector. After the operations for all vectors are finished, the vector with the smallest accumulated absolute difference is taken as the motion vector for that macroblock. Then the difference between each pixel of the coded image and the corresponding pixel of the reference image at that vector is computed, and the result is sent to the Q-DCT/IQ-IDCT block 162. The Q-DCT/IQ-IDCT block 162 applies DCT processing and quantization to the result sent from the motion detection block 160 and passes it to the variable-length encoding and decoding block 164. There, variable-length encoding is performed, which completes the compression of the image data.

As described above, applying the present invention to an image signal compression LSI (large-scale integrated circuit) makes it possible to construct a highly programmable, high-performance image signal compression LSI.
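The motion-vector selection and pixel-difference steps of the compression flow above can be sketched in software as follows. This is a minimal illustration, not the LSI's implementation: the function names are hypothetical, a motion vector is represented as the offset (dx, dy) of the block inside the search window, and ties are resolved in favour of the first candidate examined. All of these are assumptions made for the example.

```python
def sad(cur, ref, dx, dy):
    """Accumulated absolute difference between block `cur` and the
    same-sized window of `ref` whose top-left corner is at (dx, dy)."""
    return sum(
        abs(cur[y][x] - ref[y + dy][x + dx])
        for y in range(len(cur))
        for x in range(len(cur[0]))
    )


def find_motion_vector(cur, ref):
    """Try every window position in the search range `ref` and return
    (best_vector, best_sad), where best_vector is an (dx, dy) offset."""
    h, w = len(cur), len(cur[0])
    candidates = [
        (dx, dy)
        for dy in range(len(ref) - h + 1)
        for dx in range(len(ref[0]) - w + 1)
    ]
    # The vector with the smallest accumulated absolute difference wins.
    best = min(candidates, key=lambda v: sad(cur, ref, v[0], v[1]))
    return best, sad(cur, ref, best[0], best[1])


def residual(cur, ref, dx, dy):
    """Per-pixel difference between `cur` and the reference window at
    (dx, dy); this is the data that would go on to DCT and quantization."""
    return [
        [cur[y][x] - ref[y + dy][x + dx] for x in range(len(cur[0]))]
        for y in range(len(cur))
    ]
```

A caller would run `find_motion_vector` once per macroblock and pass the chosen vector to `residual`. In the device described here, the candidate evaluations inside `find_motion_vector` are the part carried out in parallel by the processor elements.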
Industrial Applicability

As described in the above embodiments, the present invention can supply data without interruption to the processor elements that make up the SIMD-type arithmetic unit. In particular, it can improve operation efficiency in signal processing that repeats a large number of operations, such as the compression and decompression of image signals.

Description of Reference Numerals

2: central processing unit (CPU)
3: control line
4: operation unit
5: control line
6: data bus
8: local bus
10: address bus
12: work random access memory
14: work dynamic random access memory
16: dynamic random access memory
18: random access memory
20: read-only memory
22: direct memory access circuit
24: bus
30: register
32: selector
34: register
36: register
38: processor element
40: processor element
42: processor element
44: processor element
46: write circuit
48: write circuit
50: write circuit
54: program counter
56: general-purpose register
58: instruction decoding circuit
60: instruction fetch circuit
62: instruction and data transfer line
64: arithmetic circuit
66: instruction and data transfer line
68: instruction and data transfer line
72: line
73: instruction and data transfer line
74: instruction and data transfer line
80: arithmetic circuit
81: arithmetic circuit
82: register
84: read control circuit
122: direct memory access circuit
130: second SIMD arithmetic unit
131: central processing unit (CPU)
132: control line
133: instruction decoding circuit
134: control line
142: selector
144: work random access memory
146: work random access memory
148: selector
150: selector
152: selector
160: motion detection block
162: Q-DCT/IQ-IDCT block
164: variable-length encoding and decoding block
166: microprocessor unit
168: communication interface block
170: audio interface block
172: video interface block
174: dynamic random access memory control block
176: dynamic random access memory
180: central processing unit (CPU)
184: bus
185: control register

Claims


1. A data processing device comprising: a CPU; a first operation unit controlled by the CPU; a first memory means; a local data bus that has a bus width wider than the data bus width of the CPU and connects the first operation unit to the first memory means; and an address bus to which the CPU, the first operation unit, and the first memory means are commonly connected.

2. The data processing device according to claim 1, wherein the first operation unit is a SIMD-type arithmetic unit.

3. The data processing device according to claim 1, wherein a plurality of the first operation units are arranged in parallel.

4. The data processing device according to claim 1, wherein the first memory means has: a first memory; a second memory; and a DMA circuit that is connected to the address bus and the data bus and controls data transfer between the first memory and the second memory.

5. The data processing device according to claim 3 or 4, wherein the first memory means has means for performing sign extension when data is transferred from the second memory to the first memory by the DMA circuit.

6. The data processing device according to claim 4, wherein the first memory has first and second work memories, and the first memory means has means for alternately switching the first and second work memories between connection to the first operation unit and connection to the second memory.

7. The data processing device according to claim 1, wherein the first operation unit is a SIMD-control-type arithmetic unit that processes multiple data in parallel with a single instruction from the CPU.

8. The data processing device according to claim 1, wherein the first operation unit is a SIMD-control-type arithmetic unit comprising: a plurality of processor elements, each having a first input terminal, a second input terminal, and a first output terminal, and operated by control signals from the CPU; a first register whose bit width is the sum of the bit widths of all the first input terminals of the plurality of processor elements; a second register whose bit width is the sum of the bit widths of all the second input terminals of the plurality of processor elements, and whose full bit width is supplied to the second input terminals of the processor elements without overlap; a third register that has a bit width equal to or greater than the bit width of the second input terminal of a processor element and shifts data into the second register in units of the second input terminal bit width; a selector that selects data of the first register and supplies the first input terminal bit width, taken from the most significant position, to the first input terminals of all the processor elements; a write control circuit that is controlled from the address bus and writes data to the first, second, and third registers via the local data bus; and a circuit that outputs the data of the output terminals to the local data bus.

9. The data processing device according to claim 8, used for image processing, wherein each processor element is constituted by an arithmetic circuit that accumulates, over a certain period, the difference between the data at the first and second input terminals and outputs the result; pixel data of the coded image are stored in the first register, and pixel data of the reference image are stored in the second register; and the outputs of the plurality of processor elements are taken as degrees of approximation corresponding to a plurality of motion vectors.

10. A SIMD-control-type arithmetic unit comprising: a plurality of processor elements, each having a first input terminal, a second input terminal, and a first output terminal; a first register whose bit width is the sum of the bit widths of all the first input terminals of the plurality of processor elements; a second register whose bit width is the sum of the bit widths of all the second input terminals of the plurality of processor elements; and a third register that has a bit width equal to or greater than the bit width of the second input terminal of a processor element and shifts data into the second register in units of the second input terminal bit width.

11. The SIMD-control-type arithmetic unit according to claim 10, wherein the first register is provided with: connection circuits that commonly supply the first input terminal bit width, taken from the most significant bit, to all the processor elements; and connection circuits that supply the full bit width to all the processor elements without overlap.

12. The SIMD-control-type arithmetic unit according to claim 10, wherein the first register has a function of supplying the first input terminal bit width, taken from the most significant bit, to all the processor elements, and the arithmetic unit has means for, at each clock at which the processor elements perform arithmetic processing, shifting data in the first register in units of the first input terminal bit width and shifting data in the second and third registers in units of the second input terminal bit width.

13. The SIMD-control-type arithmetic unit according to claim 11 or 12, used for image processing, wherein pixel data of a first image are stored in the first register; pixel data of a second image are stored in the second and third registers; each processor element is constituted by an arithmetic circuit that accumulates the difference between the data applied to the first input terminal and the second input terminal; and each processor element outputs a degree of approximation corresponding to one of a plurality of motion vectors between the first image and the second image.

14. A data processing device comprising: a CPU; a first operation unit; a memory means; an address bus connecting the CPU and the memory means; and a local data bus connecting the first operation unit and the memory means; characterized in that the CPU has an instruction decoding circuit for decoding instructions, the first operation unit is controlled by the output of the instruction decoding circuit, and the local data bus has a bus width wider than the data bus width of the CPU.

15. The data processing device according to claim 14, wherein the first operation unit is a SIMD-type arithmetic unit.

16. A data processing device characterized by comprising: a CPU; a first operation unit controlled by the CPU; a memory means connected to the CPU by an address bus; a DMA circuit connected to the address bus and to the memory means; and a local data bus having a bus width wider than the data bus width of the CPU.

17. The data processing device according to claim 16, wherein the first operation unit is a SIMD-type arithmetic unit.

18. A data processing device characterized by comprising: a first memory for storing instructions; a CPU connected to the first memory through an address bus and a first data bus; a second memory connected to the CPU by the address bus; and an operation unit connected to the second memory through a second data bus; wherein the second data bus has a bus width wider than that of the first data bus.

19. The data processing device according to claim 18, wherein the operation unit is a SIMD-type arithmetic unit.

20. The data processing device according to claim 18 or 19, having a DMA circuit that is connected to the address bus, the first data bus, and the second memory.