JP4697419B2 - Video coding system that reduces frame memory access through a direct link between working memories

Info

Publication number: JP4697419B2
Application number: JP2005252509A
Authority: JP
Inventors: リースケ・ハンノ
Original assignee: NEC Corp
Current assignee: NEC Corp
Priority date: 2005-08-31
Filing date: 2005-08-31
Publication date: 2011-06-08
Anticipated expiration: 2025-08-31
Also published as: JP2007067945A

Description

The present invention relates to a hardware architecture for video encoding systems, and in particular to more efficient frame memory access in such systems.

In recent years, computer systems have come to be applied in a wide variety of fields.

In these new fields, such as video processing, the demand has grown not only for high processing performance but also for storage large enough to hold all of the application data.

To meet this requirement, the overall memory is divided into different hierarchy levels.

One level is a memory located close to the arithmetic unit, which is fast to access but small in capacity. The other is a memory located far from the arithmetic unit, which is large in capacity but slow to access.

A video encoding system is one such system; its configuration is shown in FIG. 1.

The video encoding system shown in FIG. 1 contains two different types of memory. One is a fast on-chip working memory located close to the arithmetic unit of the processor unit. The other, an off-chip frame memory, serves as the large-capacity data memory.

In the video encoding system, most of the video data is held in the frame memory, while the portion that is about to be processed is transferred to the working memory.

This often reduces processing time, because the large amount of temporary data generated by the arithmetic unit during processing is stored in the fast working memory and need not be stored in the slow frame memory.

Of course, the data in the working memory must continually be exchanged for new data, and processing in the arithmetic unit must be halted while this is done.

In many applications, this can be sped up by using a multi-port working memory. Alternatively, as shown in FIG. 2, data access can be accelerated further by providing two separate working memories: one for external data transferred from the frame memory and one for internal data accessed by the arithmetic unit. In that case, data transfer and data processing can run in parallel.

So that every processing unit can access every working memory, the data paths are routed through a switch.

The configuration shown in FIG. 2 requires a switch that toggles between two states. In one state, the frame memory transfers data to working memory 0 while the arithmetic unit accesses working memory 1. In the other state, the assignment is reversed.
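The two-state behaviour of this switch can be sketched as follows. This is only an illustrative model: the names `SwitchState`, `routing`, and `toggle` are hypothetical and do not appear in the patent.

```python
from enum import Enum


class SwitchState(Enum):
    """The two positions of the prior-art memory switch (hypothetical names)."""
    A = 0  # frame memory <-> working memory 0, arithmetic unit <-> working memory 1
    B = 1  # frame memory <-> working memory 1, arithmetic unit <-> working memory 0


def routing(state):
    """Return the index of the working memory each unit is connected to."""
    if state is SwitchState.A:
        return {"frame_memory": 0, "arithmetic_unit": 1}
    return {"frame_memory": 1, "arithmetic_unit": 0}


def toggle(state):
    """Flip the switch to its other position."""
    return SwitchState.B if state is SwitchState.A else SwitchState.A


state = SwitchState.A
before = routing(state)
after = routing(toggle(state))
```

In either state the two units never share a working memory, which is what lets transfer and processing overlap, but it also means the arithmetic unit loses access to whatever it left in the other memory.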

In many algorithms, however, temporary data is not reused immediately but only at a later point.

Once the position of the memory switch has been changed, this temporary data can no longer be accessed by the arithmetic unit.

FIG. 3 shows the task schedule of an example algorithm in which data must be transferred between the two working memories. To make this possible, the data must be saved to the frame memory and restored from it later.

First, the arithmetic unit must suspend execution.

Second, the switch position must be changed and the data transferred from the working memory to the frame memory.

Third, the memory switch must be changed back so that the data can be loaded from the frame memory into the other working memory, the one the arithmetic unit will use next.

Finally, the arithmetic unit is released from suspension, and both units swap working memories by changing the position of the memory switch once more.
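The store-then-load sequence above can be sketched in a few lines. The model below is a simplification under stated assumptions: memories are plain dictionaries, and the function name `transfer_via_frame_memory` is hypothetical. It only counts how many transfers cross the external bus.

```python
def transfer_via_frame_memory(src_wm, dst_wm, frame_memory, keys):
    """Move `keys` from one working memory to the other through the frame memory.

    Returns the number of transfers that cross the external (off-chip) bus.
    The arithmetic unit is assumed to be suspended for the whole call.
    """
    external_transfers = 0

    # Store: working memory -> frame memory (switch in its first position).
    for key in keys:
        frame_memory[key] = src_wm[key]
    external_transfers += 1

    # Load: frame memory -> other working memory (switch changed back).
    for key in keys:
        dst_wm[key] = frame_memory[key]
    external_transfers += 1

    return external_transfers


wm0 = {"tmp": [1, 2, 3]}   # temporary data produced by the arithmetic unit
wm1 = {}
frame = {}
count = transfer_via_frame_memory(wm0, wm1, frame, ["tmp"])
```

Both transfers touch the slow frame memory, so the stall grows with the frame memory access time rather than with the working memory access time.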

Because access to the frame memory is slow, the arithmetic unit is stalled throughout this sequence, and in many applications this stall can account for a significant part of the total processing time.

Patent Document 1: Srinivasan et al., "In-loop deblocking filter," US patent US 2005/0013494 A1, Jan. 2005.
Patent Document 2: Gomila et al., "Deblocking filter conditioned on pixel brightness," US patent US 2003/0206664 A1, Nov. 2003.
Patent Document 3: Sankaran, "Loop deblocking filtering of block coded video in a very long instruction word processor," US patent US 2005/0117653 A1, Jun. 2005.
Non-Patent Document 1: Draft ITU-T Recommendation and Final Draft International Standard of Joint Video Specification (ITU-T Rec. H.264 / ISO/IEC 14496-10 AVC), Mar. 2003.
Non-Patent Document 2: T. Wiegand, G. J. Sullivan, G. Bjøntegaard, and A. Luthra, "Overview of the H.264/AVC video coding standard," IEEE Trans. Circuits Syst. Video Technol., vol. 13, pp. 560-576, July 2003.
Non-Patent Document 3: P. List, A. Joch, J. Lainema, G. Bjøntegaard, and M. Karczewicz, "Adaptive deblocking filter," IEEE Trans. Circuits Syst. Video Technol., vol. 13, pp. 614-619, July 2003.
Non-Patent Document 4: Miao Sima, Yuanhua Zhou, and Wei Zhang, "An Efficient Architecture for Adaptive Deblocking Filter of H.264/AVC Video Coding," IEEE Transactions on Consumer Electronics, vol. 50, no. 1, February 2004.
Non-Patent Document 5: Yu-Wen Huang, To-Wei Chen, Bing-Yu Hsieh, Tu-Chih Wang, Te-Hao Chang, and Liang-Gee Chen, "Architecture Design for Deblocking Filter in H.264/JVT/AVC," Multimedia and Expo, 2003. ICME '03. Proceedings. 2003 International Conference on, vol. 1, pp. 693-696, 6-9 July 2003.
Non-Patent Document 6: G. Q. Zheng and L. Yu, "An efficient architecture design for deblocking loop filter," Picture Coding Symposium, 2004.
Non-Patent Document 7: Vivek Venkatraman, Shoba Krishnan, and Nam Ling, "Architecture for De-Blocking Filter in H.264," presented and published in Proceedings of the Picture Coding Symposium 2004 (PCS), San Francisco, California, USA, December 15-17, 2004.
Non-Patent Document 8: S.-C. Chang, W.-H. Peng, S.-H. Wang, and T. Chiang, "A Platform based Bus-interleaved Architecture for Deblocking Filter in H.264/MPEG-4 AVC," IEEE Trans. on Consumer Electronics, Feb. 2005.

In the prior art shown in FIG. 2, data transfer between the two working memories is slow, because the data must first be copied to the slow frame memory.

FIG. 3 shows the task schedule of an example application on the prior-art architecture.

As shown in FIG. 3, to transfer data between the two working memories, the data must first be stored from one working memory into the frame memory and then loaded from the frame memory into the other working memory. Both transfers thus go through the slow frame memory.

The present invention has been made to solve the above problem.

A first invention for solving the above problem is a video encoding system comprising:
an off-chip memory that stores video data;
a plurality of on-chip memories that temporarily store the video data;
a deblocking filter unit that processes the video data in the on-chip memories; and
a switching device that switches the connections among the off-chip memory, the deblocking filter unit, and the on-chip memories, and in which a direct connection between the on-chip memories is established,
wherein, when video data in one on-chip memory is to be stored in another on-chip memory after being processed by the deblocking filter unit, the transfer between these on-chip memories is performed using the direct connection established in the switching device, and
wherein, for deblocking filtering of a macroblock in the on-chip memory, video data stored in a given macroblock is copied to another on-chip memory using the direct connection established in the switching device.

The above problem, caused by the slow access time of the frame memory, can be eliminated by the approach set out in claim 1.

As an example with two working memories, the basic architecture of FIG. 2 has been extended in this way.

The result is shown in FIG. 4.

Here, the switch is extended with an additional data link that provides a direct data connection between the two working memories.

As FIG. 5 shows, the transfer of data that has to move between the working memories then no longer depends on the slow access time of the frame memory, but only on the access time of the working memories.

For the embodiment of claim 2, a deblocking filter for H.264 video coding was chosen, as shown in FIG. 6.

Here, the data connections from the deblocking filter unit inside the deblocking filter accelerator and from the frame memory are both routed through the switch to the two working memories.

When a macroblock-based deblocking filter algorithm is chosen, the working memory can be sized so that all the data needed to filter one macroblock at a time fits into a single working memory.

However, because the algorithm is macroblock-based, part of the data of two macroblocks that adjoin each other across a boundary overlaps.

With the new switch, this overlapping data can be transferred directly between the two working memories.

This speeds up the whole filtering process compared with a data transfer that involves the frame memory.

With the present invention, first, the number of memory-to-memory transfers is reduced from two to one. Second, the transfer time is reduced, because no transfer to the slow frame memory is needed.

Third, the data bandwidth required on the external bus that connects to the off-chip frame memory is reduced.

The problem addressed by the present invention is that, in the prior art shown in FIG. 2, data transfer between the two working memories is slow because the data must first be copied to the slow frame memory.

As shown in FIG. 3, to transfer data between the two working memories, the data must first be stored from one working memory into the frame memory and then loaded from the frame memory into the other working memory. Both transfers thus go through the slow frame memory.

The present invention solves this problem.

To do so, as shown in FIG. 4, the multiplexers inside the memory switch must be extended so that a direct link between the two working memories can be established.

Once this new data link is implemented inside the switch, data that must move between the two working memories can be transferred directly between them and no longer has to pass through the frame memory.
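As a rough illustration of the direct link, the sketch below copies data straight from one working memory to the other. Memories are modelled as dictionaries and the name `transfer_direct` is hypothetical; the real link is a hardware multiplexer path, not software.

```python
def transfer_direct(src_wm, dst_wm, keys):
    """Copy `keys` from one working memory to the other over the direct link.

    Returns transfer counts: one on-chip transfer, nothing on the external bus.
    """
    for key in keys:
        dst_wm[key] = src_wm[key]
    return {"on_chip_transfers": 1, "external_transfers": 0}


wm0 = {"tmp": [1, 2, 3]}   # temporary data left behind by the arithmetic unit
wm1 = {}
counts = transfer_direct(wm0, wm1, ["tmp"])
```

The frame memory does not appear in the function at all, which mirrors the point of the design: a single on-chip transfer replaces the store and the load across the external bus.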

As a result, first, the number of memory-to-memory transfers is reduced from two to one. Second, the transfer time is reduced, because no transfer to the slow frame memory is needed. Third, the data bandwidth required on the external bus that connects to the off-chip frame memory is reduced.

FIG. 5 shows the new task schedule for the embodiment shown in FIG. 4.

The time for which the arithmetic unit is stalled is reduced by the time of the two frame memory transfers minus the time of the direct transfer between the working memories.
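The saving described here can be written as a one-line calculation. The sketch assumes that the store and the load through the frame memory take the same time; the cycle counts are made-up values for illustration, not figures from the patent.

```python
def stall_reduction(t_frame_transfer, t_direct_transfer):
    """Stall time saved by the direct link.

    Prior art: two transfers through the frame memory (store, then load).
    Invention: one direct transfer between the working memories.
    """
    return 2 * t_frame_transfer - t_direct_transfer


# Hypothetical numbers: 400 cycles per frame memory transfer,
# 64 cycles for the direct on-chip transfer.
saved_cycles = stall_reduction(t_frame_transfer=400, t_direct_transfer=64)
print(saved_cycles)  # 736
```

The slower the frame memory is relative to the working memories, the larger this saving becomes.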

As a further embodiment, a configuration incorporating a deblocking filter is described.

Other deblocking algorithms exist, as in Patent Documents 1, 2 and 3, whereas the configuration shown in FIG. 6 refers to the H.264 deblocking filter of Non-Patent Documents 1, 2 and 3.

In contrast to the architecture described in Non-Patent Document 4, which performs frame-based filtering and therefore requires a frame buffer and incurs a larger system latency, this deblocking filter accelerator performs macroblock-based deblocking filtering, so the working memory only needs to be large enough to hold all the data required to process one macroblock at a time.

Other implementations either use a multi-port working memory that allows data transfer and processing at the same time, as described in Non-Patent Documents 5, 6 and 7, or, as in Non-Patent Document 8, use tight task coupling instead of loose task coupling via a working memory, loading data directly from the frame memory into the arithmetic unit without any intermediate buffering.

However, this comes at the cost of making the processing time highly dependent on the slow frame memory.

In contrast to a multi-port working memory, the configuration described here uses two single-port working memories and reduces frame memory accesses.

The two memories allow data transfer and data processing to run at the same time: the data held in one working memory is processed while the other working memory is used to store the data of the previously processed macroblock to the frame memory and to load the next macroblock from it.
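A simplified schedule for this alternating use of the two working memories might look like the sketch below. It is an assumption-laden illustration: macroblocks are simply numbered, even-numbered ones are assigned to working memory 0, and the function name `schedule` is hypothetical.

```python
def schedule(num_macroblocks):
    """List, per step, what is processed and what is transferred in parallel.

    Macroblock i is processed in working memory i % 2. Meanwhile the other
    working memory stores macroblock i - 1 to the frame memory and loads
    macroblock i + 1 from it.
    """
    steps = []
    for i in range(num_macroblocks):
        active = i % 2
        other = 1 - active
        transfers = []
        if i > 0:
            transfers.append(f"store MB{i - 1}")
        if i + 1 < num_macroblocks:
            transfers.append(f"load MB{i + 1}")
        steps.append({"process": (i, active), "transfer_wm": other,
                      "transfers": transfers})
    return steps


steps = schedule(3)
for step in steps:
    print(step)
```

The sketch leaves out the data that neighbouring macroblocks share, which cannot be preloaded this way.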

To perform the deblocking filtering, the arithmetic unit needs, in addition to the image data of the macroblock to be processed next, surrounding data from the neighbouring macroblocks.

These are data from the macroblock above and the macroblock to the left.

The required data of the upper macroblock is already in the frame memory and can be preloaded from there. The required data of the left macroblock, by contrast, is not yet available in the frame memory; it is being newly computed in the other working memory and therefore cannot be preloaded in parallel with the filter computation of the deblocking filter unit.

Instead, this data must be transferred between the working memories after the filter computation for the previously processed macroblock has finished.

A further embodiment is shown in FIG. 7. This embodiment is described in claim 3.

In FIG. 7, the macroblock data in working memory 0 is being processed, and the macroblock data in working memory 1 will be processed next.

The shaded area in FIG. 7, spanning 20 lines of 20 pixels each, shows the data used in a working memory: the 16 lines × 16 pixels of the macroblock to be processed next, 4 lines × 16 pixels from the last four lines of the upper macroblock, and 16 lines × 4 pixels from the last four columns of the left macroblock.

The textured area, one sixth the size of the used area, is the overlapping part. It is newly computed in working memory 0 and then used in working memory 1.
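The one-sixth figure can be checked from the block sizes given above, assuming the textured area is the 16 × 4 block taken from the left macroblock. The variable names are illustrative only.

```python
from fractions import Fraction

MB_LINES, MB_PIXELS = 16, 16   # macroblock size
BORDER = 4                     # lines/columns taken from each neighbour

current_mb = MB_LINES * MB_PIXELS    # 16 x 16 = 256 pixels
top_border = BORDER * MB_PIXELS      # 4 x 16  = 64 pixels
left_border = MB_LINES * BORDER      # 16 x 4  = 64 pixels

used_area = current_mb + top_border + left_border   # 384 pixels
overlap_share = Fraction(left_border, used_area)    # 64 / 384

print(used_area, overlap_share)  # 384 1/6
```

The used area is 384 pixels rather than the full 20 × 20 = 400, which is consistent with the 4 × 4 corner between the top and left borders not being counted.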

Thanks to the newly inserted connection of claim 2 inside the switch, this area can be transferred directly between the two fast working memories without passing through the frame memory.
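A minimal sketch of this copy is given below. The buffer layout is an assumption made for illustration: each working memory is a 20 × 20 array in which the first four rows hold the top border, the first four columns hold the left border, and the remaining 16 × 16 block holds the current macroblock. The function names are hypothetical.

```python
MB = 16            # macroblock width and height in pixels
BORDER = 4         # border width taken from a neighbouring macroblock
SIZE = MB + BORDER


def make_working_memory(fill=0):
    """Create a SIZE x SIZE working memory filled with `fill`."""
    return [[fill] * SIZE for _ in range(SIZE)]


def copy_left_border(src_wm, dst_wm):
    """Copy the last four columns of the macroblock in src_wm into the
    left-border columns of dst_wm. Returns the number of pixels copied."""
    for row in range(BORDER, SIZE):
        dst_wm[row][0:BORDER] = src_wm[row][SIZE - BORDER:SIZE]
    return MB * BORDER


# Fill the macroblock area of working memory 0 with recognisable values.
wm0 = make_working_memory()
for row in range(BORDER, SIZE):
    for col in range(BORDER, SIZE):
        wm0[row][col] = row * SIZE + col

wm1 = make_working_memory()
pixels_copied = copy_left_border(wm0, wm1)
```

Only 64 pixels move, and they stay on chip; the top border of working memory 1 is untouched because that data comes from the frame memory.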

This speeds up the data transfer, first because the number of memory-to-memory transfer operations is halved, and second because the transfer no longer involves the slow frame memory.

This measure also benefits overall system performance, because it reduces the occupancy of the external bus.

FIG. 1 is a schematic diagram of a typical video encoding system comprising a main processor and an accelerator with one single-port working memory that different units can access via a switch.
FIG. 2 is a schematic diagram of a typical video encoding system comprising a main processor and an accelerator with two single-port working memories, each accessible from different units via a switch.
FIG. 3 is a schematic diagram of a possible task schedule for the video encoding system of FIG. 2 when data must be transferred between the two working memories before the arithmetic unit can continue execution.
FIG. 4 is a schematic diagram of a typical video encoding system comprising a main processor and an accelerator with two single-port working memories, each accessible from different units via a switch that also connects the working memories to each other.
FIG. 5 is a schematic diagram of a possible task schedule for the video encoding system of FIG. 4 when data must be transferred between the two working memories before the arithmetic unit can continue execution.
FIG. 6 is a schematic diagram of an H.264 video encoding system comprising a main processor and a deblocking filter accelerator with two single-port working memories that the deblocking filter unit can access via a switch that also connects the working memories to each other.
FIG. 7 is a schematic diagram for claim 3, illustrating the advantage of claim 2: the macroblock data transfer required between the two working memories when a macroblock-based deblocking filter is executed.

Explanation of symbols

0 Working memory
1 Working memory

Claims

A video encoding system comprising:
an off-chip memory that stores video data;
a plurality of on-chip memories that temporarily store the video data;
a deblocking filter unit that processes the video data in the on-chip memories; and
a switching device that switches the connections among the off-chip memory, the deblocking filter unit, and the on-chip memories, and in which a direct connection between the on-chip memories is established,
wherein, when video data in one on-chip memory is to be stored in another on-chip memory after being processed by the deblocking filter unit, the transfer between these on-chip memories is performed using the direct connection established in the switching device, and
wherein, for deblocking filtering of a macroblock in the on-chip memory, video data stored in a given macroblock is copied to another on-chip memory using the direct connection established in the switching device.