JPS63291178A

JPS63291178A - Digital signal processor

Info

Publication number: JPS63291178A
Application number: JP12620787A
Authority: JP
Inventors: Haruyasu Yamada; 山田　晴保; Toshiki Mori; 俊樹森; Kunitoshi Aono; 邦年青野; Masakatsu Maruyama; 征克丸山; Maki Toyokura; 真木豊蔵
Original assignee: Matsushita Electric Industrial Co Ltd
Current assignee: Panasonic Holdings Corp
Priority date: 1987-05-22
Filing date: 1987-05-22
Publication date: 1988-11-29

Abstract

PURPOSE: To execute high-speed picture processing by providing a function for simultaneously loading a program into plural memories and processing the respective picture elements of a picture signal in parallel. CONSTITUTION: The processor is provided with a local picture shift register 2, plural picture memories 4-1 to 4-4 for storing the contents of the shift register 2, plural arithmetic blocks 1-1 to 1-4 for processing the picture based on the data of the picture memories 4-1 to 4-4, plural program memories and controllers 5-1 to 5-4 for controlling the arithmetic blocks 1-1 to 1-4, and a function for simultaneously loading the program into the plural program memories 5-1 to 5-4. The respective picture elements of the picture signal are processed in parallel. Because the parallel processing is performed per picture element, a higher-level instruction such as a conditional jump can be executed in the next instruction cycle without waiting time. More complicated picture processing can thereby be realized at high speed, and real-time processing is easily executed.

Description

DETAILED DESCRIPTION OF THE INVENTION

Field of the Invention

The present invention relates to a digital signal processing device capable of executing image signal processing and the like in parallel at high speed.

Prior Art

Thanks to ultra-LSI (large scale integrated circuit) technology, compact, high-speed processors and memories are used in many kinds of signal processing.

For particularly advanced processing, a processor generally called a DSP (digital signal processor) is used. In addition to an ALU (arithmetic logic unit), it has a dedicated multiplier and similar hardware, and can process data at high speed.

At present, these processors can handle signals up to the audio band in roughly real time.

The average instruction cycle of a DSP is about 100 ns. If audio is sampled at 20 kHz, one sampling period is 50 μs, so 500 instructions can be executed within that time. With this many instructions available, most processing is feasible, and speech recognition and synthesis, the various band-compression schemes used for digital transmission, and similar tasks can be performed in real time.

Now consider image processing, as used in medical imaging, pattern recognition, and similar fields. Audio signals are sampled at 50 kHz at most, whereas image signals are sampled at rates as high as 10 to 20 MHz. Real-time image processing therefore requires a processing speed at least two orders of magnitude higher than audio signal processing. For example, if a video signal is sampled at 10 MHz, then even if fewer processing steps are needed than for an audio signal, 100 or more instructions must be executed within one sampling period. In other words, real-time processing is impossible unless the instruction cycle time is 1 ns or less.
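The arithmetic behind these figures can be checked with a short Python sketch. The function names are illustrative, and the `parallel_blocks` parameter rests on an assumption not stated at this point in the text: that N units working on different pixels each have N sampling periods to finish one pixel.

```python
def instructions_per_sample(sample_rate_hz: float, cycle_time_s: float) -> float:
    """Number of instruction cycles that fit in one sampling period."""
    return (1.0 / sample_rate_hz) / cycle_time_s


def required_cycle_time(sample_rate_hz: float, instructions: int,
                        parallel_blocks: int = 1) -> float:
    """Longest cycle time (seconds) that still fits `instructions` per pixel.

    Assumption: with `parallel_blocks` units each handling a different pixel,
    every unit has `parallel_blocks` sampling periods to finish its pixel.
    """
    return parallel_blocks / (sample_rate_hz * instructions)


audio_budget = instructions_per_sample(20e3, 100e-9)   # 20 kHz audio, 100 ns cycle
video_cycle = required_cycle_time(10e6, 100)           # 10 MHz video, 100 instructions
video_cycle_4 = required_cycle_time(10e6, 100, parallel_blocks=4)

print(f"{audio_budget:.0f} instructions per audio sample")
print(f"{video_cycle * 1e9:.1f} ns cycle needed for video on one unit")
print(f"{video_cycle_4 * 1e9:.1f} ns cycle needed with 4 parallel units")
```

Under that assumption the required cycle time relaxes in proportion to the number of units, which is the motivation for the per-pixel parallelism described later.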

One way to achieve this is to improve device performance. Current DSPs are built as MOS LSIs, so implementing the DSP as a bipolar LSI would increase its speed. A system described in Japanese Patent Application No. Sho 59-118484 has already been proposed. To increase speed, it adds to the basic DSP configuration a register into which local image data can be input continuously, along with a program memory for high-speed program execution. Implemented as a bipolar LSI, this system can achieve roughly one order of magnitude higher speed than previous DSPs.

Problems to Be Solved by the Invention

However, the above system is only about one order of magnitude faster than a conventional DSP. Even a bipolar LSI is ultimately limited by device speed.

This system also gains speed from its new system configuration, but that too has limits. In view of these drawbacks of the prior art, the present invention provides a digital signal processing device that enables more advanced real-time image signal processing through parallel signal processing of image data pixel by pixel.

Means for Solving the Problems

The invention provides a digital signal processing device comprising a local image shift register, a plurality of image memories that store the contents of this shift register, a plurality of arithmetic blocks that perform image processing using the data in these image memories, a plurality of program memories and controllers that control these arithmetic blocks, and a function for loading a program into the plurality of memories simultaneously, so that high-speed image processing is performed by processing the pixels of an image signal in parallel.

Operation

With this configuration, operation faster than the speed limit of the device itself becomes possible. Unlike conventional parallel or pipeline processing, the parallelism is per pixel of the image signal, so higher-level instructions such as conditional jumps can also be executed in the next instruction cycle without waiting time. Moreover, because every pixel receives the same processing, the same program can be loaded into every program memory, and a single program loader suffices to enter it.
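The idea of one loader filling several program memories at once can be modelled in a few lines of Python. This is only a behavioural sketch: the class name, the memory depth, and the instruction mnemonics are hypothetical and do not come from the patent.

```python
class ProgramLoader:
    """Hypothetical model of a loader that writes one program to several memories."""

    def __init__(self, num_memories: int, depth: int):
        # One instruction memory per arithmetic block, initially empty (None).
        self.memories = [[None] * depth for _ in range(num_memories)]

    def load(self, program: list) -> None:
        # Each word goes to the same address of every memory. In hardware this
        # would be a single broadcast write, so the load time would not grow
        # with the number of memories; the inner loop only models that write.
        for address, word in enumerate(program):
            for memory in self.memories:
                memory[address] = word


program = ["LOAD", "MUL", "ADD", "JMPZ", "STORE"]   # made-up mnemonics
loader = ProgramLoader(num_memories=4, depth=8)
loader.load(program)
```

After `load`, all four memories hold identical contents, which matches the case in which every pixel receives the same processing.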

In addition, even MOS devices, which compute more slowly but are well suited to highly integrated semiconductor circuits, can with this system configuration form a system that is faster than a semiconductor integrated circuit built from fast bipolar devices, and that consumes less power.

Embodiment

FIG. 1 shows an embodiment of the present invention. 1-1 to 1-4 are arithmetic blocks that perform calculations such as addition, subtraction, logical operations, and multiplication; 2 is a register block that sequentially shifts local image data such as a 3×3 neighborhood; 3-1 and 3-2 are delay lines that each delay the image data by one line; 4-1 to 4-4 are memories that hold the 3×3 (or similar) local image data for the duration of one pixel's computation; 5-1 to 5-4 are program memories that store instructions, together with their control blocks; 6 is a program loader; 7 is a data register for inputting data from outside; and 8 is an output circuit that switches the outputs. 9 is the image data input terminal, 10 is the output terminal for processed image data, 11 is the program data input terminal, and 12 is the data input terminal.

The image processing program input at terminal 11 is stored, under control of program loader 6, in the program memories within 5-1 to 5-4. The same program is written into the memories of 5-1 to 5-4 simultaneously. Meanwhile, image data are input from terminal 9 and passed through local image shift register 2. In this embodiment, four pixels are processed in parallel using 3×3 local image data, so local image shift register 2 has the configuration shown in FIG. 2. Reference numeral 21 denotes a single register; the data pass through a shift register made of six such registers connected in cascade and are output to one-line shift register 3-1. If the image has 512 pixels in the horizontal direction, one-line shift register 3-1 needs only 512 - 6 = 506 stages.

The output of one-line shift register 3-1 is input to the second-row shift register of local image shift register 2, passes likewise through six stages, and is output to one-line shift register 3-2. One-line shift register 3-2 has the same number of stages as 3-1, and its output is input to the third-row shift register of local image shift register 2. Combining the outputs of these 18 registers yields the data for the four 3×3 local images shown in FIG. 3. These data are sent to image memories 4-1 to 4-4. If the pixel data are 8 bits wide, each of these shift registers is eight bits deep.
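The register arrangement described above (three six-stage rows joined by two line delays) can be simulated in Python as a sketch. It models the whole path as one delay chain, which gives the same tap values as separate registers would; the class and method names are illustrative, and the assumption that the four windows sit at adjacent column offsets is inferred from the text, since FIG. 3 is not reproduced here.

```python
from collections import deque

ROWS, STAGES = 3, 6   # 3 rows x 6 stages = 18 registers, as in the text


class LocalImageShiftRegister:
    def __init__(self, line_width: int):
        self.width = line_width
        # One chain models row 1, delay line 3-1, row 2, delay line 3-2, row 3.
        # Each delay line contributes line_width - STAGES stages (512 - 6 = 506).
        length = (ROWS - 1) * line_width + STAGES
        self.chain = deque([0] * length, maxlen=length)

    def shift(self, pixel: int) -> None:
        self.chain.appendleft(pixel)   # the oldest value falls off the far end

    def taps(self) -> list:
        """3 x 6 register contents, oldest line first and oldest pixel first."""
        w = self.width
        return [[self.chain[(ROWS - 1 - i) * w + (STAGES - 1 - j)]
                 for j in range(STAGES)] for i in range(ROWS)]

    def windows(self) -> list:
        """Four overlapping 3 x 3 neighbourhoods cut from the 3 x 6 taps."""
        t = self.taps()
        return [[row[c:c + 3] for row in t] for c in range(STAGES - 2)]


# Feed three lines of a small 8-pixel-wide test image, pixel value = 10*y + x.
sr = LocalImageShiftRegister(line_width=8)
for y in range(3):
    for x in range(8):
        sr.shift(10 * y + x)
windows = sr.windows()
print(windows[0])   # leftmost of the four 3 x 3 neighbourhoods
```

With a 512-pixel line the chain is 2 × 512 + 6 stages long, consistent with two 506-stage line delays plus the 18 tapped registers.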

The data stored in image memories 4-1 to 4-4 are selected by the program and output to arithmetic blocks 1-1 to 1-4. Each arithmetic block consists of an ALU (arithmetic logic unit), a multiplier, registers, data memory, and so on, and performs per-pixel processing such as edge detection and noise removal according to the program stored in the memories within 5-1 to 5-4. For instructions such as jumps, controller sections 5-1 to 5-4 check the results computed by arithmetic blocks 1-1 to 1-4 and then execute the instruction.
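As an illustration of one program running on every block with a data-dependent branch, the Python sketch below applies a simple Laplacian-style edge test to four 3×3 windows. The particular filter, the threshold, and the function names are assumptions chosen for the example; the patent names edge detection only as a kind of processing and does not specify an algorithm.

```python
def edge_program(window: list, threshold: int = 100) -> int:
    """Hypothetical per-pixel program: Laplacian-style edge test on a 3 x 3 window."""
    centre = window[1][1]
    neighbours = window[0][1] + window[1][0] + window[1][2] + window[2][1]
    response = abs(4 * centre - neighbours)
    # Data-dependent branch: each block decides from its own result only,
    # so no block has to wait for another block's condition.
    if response > threshold:
        return 255
    return 0


def run_blocks(windows: list, program=edge_program) -> list:
    """Apply the same program to every window, one window per arithmetic block.

    The list comprehension runs sequentially; it only stands in for hardware
    blocks that would work at the same time.
    """
    return [program(w) for w in windows]


flat = [[10, 10, 10], [10, 10, 10], [10, 10, 10]]
spike = [[10, 10, 10], [10, 90, 10], [10, 10, 10]]
outputs = run_blocks([flat, spike, flat, spike])
print(outputs)
```

Because each block has its own controller and its own window, the branch taken for one pixel does not affect the instruction stream of the others.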

External data used in the calculations are input from terminal 12 and fed to each arithmetic block through data register 7. The processed image data are output in order from terminal 10 under control of output circuit 8.

This embodiment describes four-way parallel processing, but the degree of parallelism can of course be changed according to the required speed. Different numbers of pixels per line can likewise be handled by adjusting the number of stages of the one-line shift registers (3-1 and 3-2), and the local image can of course be enlarged, for example to 6×6.

In the embodiment all arithmetic blocks run the same program, but by changing the program stored in the individual memories within 5-1 to 5-4, each block can also perform different image processing.

In this embodiment the image memory is divided into 4-1 to 4-4, but the four 3×3 local images consist largely of the same data, so the memory can be shared. FIG. 4 shows an example of a memory configuration in which image data are shared. 41-1 and 41-2 are dual-port memories, and 42-1 to 42-4 are memory address decoders. The four image memory blocks of FIG. 1 become two blocks, and each image memory stores 4×3 local image data. Decoder 42-1 outputs the 3×3 image data corresponding to image memory 4-1 of FIG. 1, and decoder 42-2 outputs the 3×3 image data corresponding to image memory 4-2. The same applies to image memory 41-2. With these dual-port memories, the number of memory cells is reduced from 3×3×4 = 36 to 4×3×2 = 24.
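The sharing scheme can be sketched in Python as follows. The split of the six tapped columns into two overlapping four-column memories is an assumption made so that each memory can serve two adjacent 3×3 windows; the text gives the sizes but not the exact column assignment, and the function names are illustrative.

```python
def split_into_shared_memories(taps: list) -> list:
    """Split 3 x 6 taps into two 3-row x 4-column memories (41-1 and 41-2).

    Assumed layout: columns 0-3 go to the first memory, columns 2-5 to the second.
    """
    return [[row[0:4] for row in taps], [row[2:6] for row in taps]]


def decode(memory: list, offset: int) -> list:
    """Address decoder: pick the 3 x 3 window starting at column `offset` (0 or 1)."""
    return [row[offset:offset + 3] for row in memory]


taps = [[c + 10 * r for c in range(6)] for r in range(3)]
mem_a, mem_b = split_into_shared_memories(taps)
# Two decoders per dual-port memory, matching decoders 42-1 to 42-4.
windows = [decode(mem_a, 0), decode(mem_a, 1), decode(mem_b, 0), decode(mem_b, 1)]

separate_cells = 3 * 3 * 4   # four private 3 x 3 memories
shared_cells = 4 * 3 * 2     # two shared 4 x 3 memories
print(separate_cells, shared_cells)
```

The four decoded windows are the same ones that four private 3×3 memories would hold, while the cell count falls from 36 to 24 as stated above.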

Effects of the Invention

According to the present invention, unlike conventional parallel computation or pipeline processing, image processing is performed in parallel pixel by pixel, so even complex image processing can be carried out at high speed. The greater the degree of parallelism, the higher the achievable speed, and real-time processing becomes easier.

Bipolar devices are normally used for high-speed image processing, but with this parallel system MOS devices are sufficient, so a high-speed, low-power semiconductor integrated circuit can be realized.

[Brief explanation of the drawing]

FIG. 1 is a block diagram of the digital signal processing device of the present invention, FIG. 2 is a detailed configuration diagram of the local image shift register, FIG. 3 is an explanatory diagram of the local image region, and FIG. 4 is a diagram of an alternative configuration of the image memory.

1: arithmetic block; 2: local image shift register; 4: image memory; 5: program memory and controller.

Claims

A digital signal processing device comprising: a local image shift register; a plurality of image memories that store the contents of the shift register; a plurality of arithmetic blocks that perform image processing based on the data in the image memories; a plurality of program memories and controllers that control the arithmetic blocks; and a function of simultaneously loading a program into the plurality of program memories, wherein the pixels of an image signal are processed in parallel.