CN202084032U - IP (Intellectual Property) core based on a two-dimensional (2D) IDCT (Inverse Discrete Cosine Transform) distributed arithmetic algorithm using SOPC (System on a Programmable Chip) technology - Google Patents

Publication number
CN202084032U
Authority
CN
Grant status
Grant
Prior art keywords
module
idct
data
output
sopc
Application number
CN 201120080618
Other languages
Chinese (zh)
Inventor
付扬
邓超
郭培源
Original Assignee
北京工商大学
Abstract

The utility model discloses an IP (Intellectual Property) core based on a two-dimensional (2D) IDCT (Inverse Discrete Cosine Transform) distributed arithmetic algorithm using SOPC (System on a Programmable Chip) technology. The IP core comprises an Avalon bus read module, a controller module, a 2D IDCT module, an Avalon bus write module, an Avalon bus and a control register. Control data is written into the control register. The controller module directs the Avalon bus read module to read the operation address from the control register and to read the data to be processed into an input buffer over the Avalon bus; after the 2D IDCT module has processed the data, the Avalon bus write module writes the result back to the original address. The hardware module works at high speed, so the system clock frequency can be lowered while high-quality decoding is still achieved, which reduces power consumption. The software module is flexible and extensible, which gives the decoder good compatibility. Because the computation is concentrated in the hardware accelerator, the load on the CPU (Central Processing Unit) is reduced, so the CPU can support more upper-layer applications while the requirements on decoding speed, functionality, flexibility, cost and development cycle are balanced.

Description

IP core based on a two-dimensional IDCT distributed arithmetic algorithm using SOPC technology

Technical Field

[0001] The utility model relates to the technical field of image decoding, and in particular to an IP core based on a two-dimensional IDCT distributed arithmetic algorithm using SOPC technology.

Background Art

[0002] In recent years, with the rapid development of semiconductor technology, the design performance and cost-effectiveness of modern high-density FPGA (Field-Programmable Gate Array) devices have become fully competitive with ASICs. Against this background, Altera Corporation of the United States proposed the System on a Programmable Chip (SOPC) technology in 2000 and released the corresponding development software, Quartus II, at the same time.

[0003] An SOPC system is a special kind of embedded system. SOPC technology implements as large and complete an electronic system as possible in a single FPGA, including the embedded processor system, interface system, DSP system, digital communication system, storage circuits and so on. In essence it packs more modules into a PLD. It is a system on chip (SoC) and at the same time a programmable system: it offers a flexible design approach, can be trimmed, extended and upgraded, and both its hardware and its software are in-system programmable.

[0004] SOPC technology mainly comprises hardware/software co-design, IP core reuse, and comprehensive analysis and verification of modules and of the interfaces between modules.

[0005] Hardware/software co-design emphasizes developing software and hardware in parallel with mutual feedback, to overcome the drawback of traditional methods in which separately designed software and hardware make the outcome of system integration unpredictable. The traditional method designs the hardware first and then designs the software according to the algorithm. In deep-submicron design the hardware cost is very high; if an error is found after the design is complete, correcting it takes a great deal of manpower, material resources and time, and the design cycle lengthens. Compared with the traditional method, FPGA-based hardware/software co-design is efficient and practical. Co-design establishes the mutual constraints between system software and hardware: the software design must take the hardware structure of the chip into account, and the chip architecture in turn requires coordination between software and hardware design, so that the whole system is optimized, the design cycle is greatly shortened and design efficiency improves. In an SOPC system, a function executed in hardware consumes FPGA logic elements and execution time, while a function executed in software consumes memory capacity and processor time. A software implementation occupies no hardware logic but takes longer to execute; a hardware implementation executes quickly but occupies hardware logic resources. A sound hardware/software partition strikes a good balance between FPGA logic-element usage and time cost. The basic principles are: high-speed, low-power functions are implemented in hardware; high-variety, low-volume functions are implemented in software; and processors are combined with dedicated hardware to raise processing speed and reduce power consumption.

[0006] The IP reuse concept of SOPC technology is expected to gain general acceptance and become the main design approach. An SOPC chip has to integrate a complex system, which gives it a complex structure; designing such a chip from scratch would clearly cost a great deal of manpower and resources. In addition, the life cycle of electronic products keeps shortening, which requires chip design to be completed in shorter cycles. To speed up chip design, existing IC circuits are invoked as modules in the SOPC chip design, which simplifies the design, shortens design time and improves design efficiency. An IP module is a pre-designed, verified, highly integrated circuit, device or component with a complete function, such as an MPU, DSP, DRAM or Flash module. Building a system is a complex process; with IP modules the designer can concentrate on the system as a whole without having to consider the correctness and performance of each module. Besides shortening chip design time, reusing IP modules greatly reduces design and manufacturing cost and improves reliability.

[0007] In recent years, research on video coding and decoding has made great progress. In particular, international bodies such as the International Organization for Standardization (ISO) and the International Telecommunication Union (ITU) have issued a series of international standards, which has greatly advanced video codec technology and promoted its widespread use.

[0008] As the performance of embedded systems keeps improving, implementing image and video decoding with SOPC technology can raise decoding performance considerably, and its hardware/software co-design approach is a strong advantage for video decoding.

[0009] The discrete cosine transform (DCT) and the inverse discrete cosine transform (IDCT) are widely used in image and video compression and decompression. The DCT removes correlation between data and concentrates the energy of the image, which makes the data easy to compress; it is the core of most current image and video coding standards, such as JPEG, the H.26x series and the MPEG-x series. In image and video decoding systems, the IDCT is used to restore the data. The IDCT is an important part of decoding. The commonly used 8×8 two-dimensional inverse discrete cosine transform (2D IDCT) is computationally heavy: it accounts for about 40% of the whole decoding computation and directly affects the real-time performance of image and video decoding systems. In an SOPC video decoding system built around the Nios II processor, studying the implementation of the two-dimensional IDCT is therefore particularly important.

Summary of the Utility Model

[0010] The purpose of the utility model is to solve the above problems by providing an IP core based on a two-dimensional IDCT distributed arithmetic algorithm using SOPC technology. The hardware module of the device works at high speed, so the system clock frequency can be lowered while real-time, high-quality decoding is still achieved, which greatly reduces power consumption; the flexibility and extensibility of the software module give the decoder good compatibility and make it easy to modify and to add new functions; and the computation is concentrated in the hardware accelerator, which greatly reduces the computational load on the CPU, so that the CPU can support more upper-layer applications while the requirements on decoding speed, power consumption, flexibility, cost and development cycle are balanced.

[0011] To achieve the above purpose, the technical solution adopted by the utility model is an IP core based on a two-dimensional IDCT distributed arithmetic algorithm using SOPC technology. The IP core has an Avalon bus read module, a controller module, a 2D IDCT module, an Avalon bus write module, an Avalon bus and a control register. Control data is written into the control register; the controller module directs the Avalon bus read module to read the operation address from the control register and to read the data to be processed into the input buffer over the Avalon bus; after processing by the 2D IDCT module, the Avalon bus write module writes the result back to the original address.

[0012] The 2D IDCT module has a 1D IDCT module, a serial-to-parallel converter, a multiplexer, a parallel-to-serial converter, a transposition memory and a controller. After the serial-to-parallel conversion buffer module receives a group of data, it outputs the group in parallel as one row to the multiplexer; the 1D IDCT module computes the 8-point inverse transform of each row and outputs it to the transposition memory, which feeds the multiplexer again; the 1D IDCT module then computes the 8-point inverse transform of each column and passes the result to the parallel-to-serial converter for output. The controller controls the whole process.

[0013] The 1D IDCT module has a shift register, 8 shift accumulators and a post-processing module. The shift register takes 13-bit input data and outputs 8 bits of data to the 8 shift accumulators; the 8 shift accumulators output 14-bit data, which the post-processing module extends to 16-bit precision before output.

[0014] The shift accumulator module is built from 4-input accumulators.

[0015] The IP core of the utility model first exploits the row-column separability of the 2D IDCT and splits it into two 1D IDCT passes: a 1D IDCT is applied to all rows and then to all columns, and the final result is the 2D IDCT. This decomposition has several benefits. It reduces the amount of computation and the complexity of the implementation; it makes the computation highly regular, which helps both software and hardware implementation; and in hardware the same 1D IDCT core can be reused, which saves hardware resources. The key to implementing the 2D IDCT in hardware is therefore the 1D IDCT. Many fast algorithms exist for the 1D DCT/IDCT, such as the Chen, Wang, Lee and Loeffler algorithms. These fast algorithms are mostly used for software implementations of the 1D DCT/IDCT and are not well suited to hardware, mainly because they do not lend themselves to parallel execution in hardware and because they need large multipliers, which occupy considerable hardware resources and are slow. In this design, the equations are first simplified with the Chen algorithm; the 1D IDCT hardware then implements the multiplications with distributed arithmetic (DA) and uses offset binary coding (OBC) to reduce the size of the lookup table (LUT). DA stores all possible results of the vector inner-product computation on the input data in a table, so that whenever a particular inner-product result is needed it can be obtained by looking up the LUT. This avoids the cumbersome computation, heavy workload and complex hardware of traditional serial algorithms, and it greatly improves system performance and speed.

[0016] An interface based on the Avalon bus standard and a set of control registers are then designed to form the two-dimensional IDCT IP core, and the IP core is added to the SOPC video decoder to provide the two-dimensional IDCT function.

[0017] The complete technical solution is discussed under five aspects: the IDCT algorithm, the one-dimensional IDCT hardware design, the two-dimensional IDCT hardware design, the design of the two-dimensional IDCT IP core with an Avalon bus interface, and the synthesis and testing of the two-dimensional IDCT IP core. The details are as follows:

[0018] (1) Decomposition of the two-dimensional IDCT

[0019] The 8×8 two-dimensional IDCT is defined as follows:

Figure CN202084032UD00051

[0021] where F_{x,y} are the coefficients after the DCT and f_{i,j} are the original data; C(n) = 2^(-1/2) when n = 0, and C(n) = 1 when n ≠ 0.
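The equation image for this definition is not reproduced in the text. As a reading aid, the standard 8×8 IDCT definition written with the symbols of [0021] is given below; it is a reconstruction from the textbook definition and is assumed, not confirmed, to match the original figure.

```latex
f_{i,j} = \frac{1}{4}\sum_{x=0}^{7}\sum_{y=0}^{7} C(x)\,C(y)\,F_{x,y}\,
          \cos\frac{(2i+1)x\pi}{16}\,\cos\frac{(2j+1)y\pi}{16},
\qquad i,j = 0,\dots,7,
\qquad C(n) = \begin{cases} 2^{-1/2}, & n = 0 \\ 1, & n \neq 0 \end{cases}
```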

[0022] Computing the 8×8 two-dimensional IDCT directly is very expensive, so the row-column separability of the 2D IDCT is used to split it into two 1D IDCT passes. A 1D IDCT is first applied to all rows and then to all columns; the final result is the 2D IDCT.
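The row-column decomposition can be checked numerically. The following Python sketch is a software model only, not the patented hardware; it assumes the standard 8-point IDCT definition, and all function names are illustrative. It compares the direct four-fold sum with a row pass, a transpose, and a column pass, which mirrors the role of the transposition memory.

```python
import math

N = 8

def c(n):
    # Normalisation factor C(n): 2^(-1/2) for n = 0, otherwise 1.
    return 1 / math.sqrt(2) if n == 0 else 1.0

def basis(i, x):
    return math.cos((2 * i + 1) * x * math.pi / 16)

def idct_1d(F):
    # 8-point 1D IDCT: f_i = 1/2 * sum_x C(x) * F_x * cos((2i+1) x pi / 16).
    return [0.5 * sum(c(x) * F[x] * basis(i, x) for x in range(N))
            for i in range(N)]

def idct_2d_direct(F):
    # Direct evaluation of the 2D definition (four nested loops).
    out = [[0.0] * N for _ in range(N)]
    for i in range(N):
        for j in range(N):
            s = 0.0
            for x in range(N):
                for y in range(N):
                    s += c(x) * c(y) * F[x][y] * basis(i, x) * basis(j, y)
            out[i][j] = s / 4
    return out

def transpose(m):
    return [list(row) for row in zip(*m)]

def idct_2d_rowcol(F):
    # Pass 1: 1D IDCT on every row; pass 2: 1D IDCT on every column.
    rows = [idct_1d(row) for row in F]
    cols = [idct_1d(col) for col in transpose(rows)]
    return transpose(cols)

def max_abs_diff(a, b):
    return max(abs(p - q) for ra, rb in zip(a, b) for p, q in zip(ra, rb))
```

The direct form needs 64 multiply-accumulate steps per output sample, while the two-pass form needs 16, which is the saving the paragraph refers to.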

[0023] (2) Distributed arithmetic for the one-dimensional IDCT

[0024] The key to implementing the 2D IDCT in hardware is the 1D IDCT. To make the 1D IDCT suitable for hardware, this design simplifies the equations with the Chen algorithm and then implements the 1D IDCT with distributed arithmetic.

[0025] The 8-point 1D IDCT is defined as follows:

Figure CN202084032UD00052

[0027] where F_x are the coefficients after the DCT and f_i are the original data; C(x) = 2^(-1/2) when x = 0, and C(x) = 1 when x ≠ 0.
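The equation image is again missing from the text. The standard 8-point 1D IDCT definition in the same symbols is shown below as a reconstruction; that it matches the original figure exactly is an assumption.

```latex
f_i = \frac{1}{2}\sum_{x=0}^{7} C(x)\,F_x\,\cos\frac{(2i+1)x\pi}{16},
\qquad i = 0,\dots,7
```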

[0028] Simplifying with the Chen algorithm, let C_i = cos(iπ/16); then:

Figure CN202084032UD00053

[0030] The 8-point 1D IDCT can then be computed with the following two expressions:

[0031]

Figure CN202084032UD00061

[0032] The problem of computing the 8-point 1D IDCT is thus converted into computing the two matrix expressions P and M. P and M each consist of 4 vector inner products, whose computation requires a multiplier-accumulator (MAC). How to implement the multiplier-accumulator therefore becomes the key problem in implementing the 1D IDCT in hardware.
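The P and M matrix figures are not reproduced in the text. The Python sketch below uses the standard even/odd split of the 8-point IDCT, derived here from the definition rather than copied from the figures: P is built from the even-indexed coefficients and M from the odd-indexed ones, and outputs are formed as (P ± M)/2. The placement of the factor 1/2 and the 0-based indexing are assumptions of this sketch; its first P row agrees with the P1 expression given in [0054].

```python
import math

# C[k] = cos(k * pi / 16), matching the text's C_i.
C = [math.cos(k * math.pi / 16) for k in range(8)]

# Rows multiply the even coefficients (F0, F2, F4, F6).
EVEN = [
    [C[4],  C[2],  C[4],  C[6]],
    [C[4],  C[6], -C[4], -C[2]],
    [C[4], -C[6], -C[4],  C[2]],
    [C[4], -C[2],  C[4], -C[6]],
]

# Rows multiply the odd coefficients (F1, F3, F5, F7).
ODD = [
    [C[1],  C[3],  C[5],  C[7]],
    [C[3], -C[7], -C[1], -C[5]],
    [C[5], -C[1],  C[7],  C[3]],
    [C[7], -C[5],  C[3], -C[1]],
]

def dot(a, b):
    return sum(p * q for p, q in zip(a, b))

def idct8_even_odd(F):
    # Four 4-term inner products for P and four for M, then a butterfly.
    P = [dot(row, F[0::2]) for row in EVEN]
    M = [dot(row, F[1::2]) for row in ODD]
    f = [0.0] * 8
    for k in range(4):
        f[k] = (P[k] + M[k]) / 2
        f[7 - k] = (P[k] - M[k]) / 2
    return f

def idct8_direct(F):
    # Reference: direct evaluation of the 8-point definition.
    def cn(x):
        return 1 / math.sqrt(2) if x == 0 else 1.0
    return [0.5 * sum(cn(x) * F[x] * math.cos((2 * i + 1) * x * math.pi / 16)
                      for x in range(8))
            for i in range(8)]
```

Each of the eight inner products has only four terms, which is what makes a 4-input lookup-table implementation practical.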

[0033] Distributed arithmetic (DA) solves the vector inner-product problem efficiently: precomputed partial sums are stored in a lookup table, and the result is obtained with a shift accumulator and the lookup table, without any multiplier.

[0034] Let F_i be a B-bit two's complement number; it can be expressed as:

[0035]

Figure CN202084032UD00062

[0036] where j denotes the j-th bit of F_i, bit B-1 is the most significant bit (MSB), i.e. the sign bit, and F_ij can only take the value 0 or 1.
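The figure for this expression is missing. One common way to write a B-bit two's complement value, using integer bit weights, is shown below. Whether the original figure uses integer or fractional weights is not recoverable from the text, so this form is an assumption.

```latex
F_i = -F_{i,B-1}\,2^{B-1} + \sum_{j=0}^{B-2} F_{ij}\,2^{j},
\qquad F_{ij} \in \{0, 1\}
```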

[0037] Write P in vector inner-product form

Figure CN202084032UD00063

(x = 1, 2, 3, 4). Substituting the above expression for F_i gives:

[0038]

Figure CN202084032UD00064

[0039] After rearrangement this can be written as:

[0040]

Figure CN202084032UD00065

[0041] where:

Figure CN202084032UD00066

[0042] The partial sum D_x(F_j) is a function of the bit position j. For the 4 input bits F_ij its output has only 2^4 = 16 possible values, so these 16 values can be stored in a lookup table and the computation carried out with additions and shifts, without any multiplication. Distributed arithmetic computes vector inner products efficiently, runs fast and has a simple structure that is easy to implement in hardware.
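A minimal bit-serial software model of distributed arithmetic is sketched below. It assumes integer two's complement inputs and integer (fixed-point scaled) coefficients; the function names, the example coefficients and the LSB-first processing order are illustrative choices, not details taken from the patent.

```python
def build_lut(coeffs):
    # Entry `addr` holds the sum of the coefficients whose input bit is 1.
    # For 4 coefficients this gives 2**4 = 16 precomputed partial sums.
    n = len(coeffs)
    return [sum(c for i, c in enumerate(coeffs) if (addr >> i) & 1)
            for addr in range(1 << n)]

def da_inner_product(coeffs, inputs, bits):
    """Compute sum(c_i * F_i) with table lookups, shifts and adds only.

    `inputs` are signed integers that fit in `bits`-bit two's complement.
    """
    lut = build_lut(coeffs)
    mask = (1 << bits) - 1
    words = [v & mask for v in inputs]      # two's complement bit patterns
    acc = 0
    for j in range(bits):
        # Gather bit j of every input into one LUT address.
        addr = 0
        for i, w in enumerate(words):
            addr |= ((w >> j) & 1) << i
        term = lut[addr] << j               # weight the partial sum by 2**j
        # The sign bit (j = bits - 1) carries negative weight.
        acc += -term if j == bits - 1 else term
    return acc
```

For example, with the hypothetical coefficients `[2896, 3784, 2896, 1567]` (roughly cos(kπ/16) scaled by 4096 for k = 4, 2, 4, 6) and 13-bit inputs, `da_inner_product` returns the same integer as the ordinary multiply-and-add dot product.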

[0043] (3) Distributed arithmetic with OBC coding

[0044] The size of the lookup table in distributed arithmetic depends on the vector length of the inner product. For an inner product of vectors of length N, the lookup table has 2^N entries. As the vector length grows, the lookup table grows with it; an oversized table slows down the accumulator's access to it and occupies more hardware resources.

[0045] Offset binary coding (OBC) halves the size of the lookup table. It maps the input bit values 0 and 1 to -1 and +1, so that the partial sum D_x becomes mirror-symmetric with respect to the sign of the input vector. Write the B-bit two's complement number F_i as:

[0046]

Figure CN202084032UD00067

[0047] Let

Figure CN202084032UD00068

Then F_i can be expressed as:

[0048]

Figure CN202084032UD00071

[0049] Substituting the above into the expression for P_x and simplifying gives:

[0050]

Figure CN202084032UD00072

[0051]

Figure CN202084032UD00073

[0052]

[0053] As the above expression shows, if the partial sums D_j are precomputed and stored in a lookup table, the vector inner product P_x can likewise be computed by shift-accumulate operations. Each d_ij can only be -1 or +1, and the partial sum D_j is mirror-symmetric with respect to the sign of the vector d. The computation of P1 is used below as an example to illustrate this symmetry. P1 is computed as:

[0054] P1 = C4·F0 + C2·F2 + C4·F4 + C6·F6

[0055] According to the definition of the partial sum D_j:

[0056]

Figure CN202084032UD00074

[0057]

[0058] The lookup table for computing P1 is shown in Table 1.

[0059] Table 1: Lookup table for computing P1

[0060]

Figure CN202084032UD00081

[0061] As Table 1 shows, the partial sums of P1 are symmetric with respect to the value of F_0j. When F_0j equals 0, the remaining three bits alone are used for the table lookup. When F_0j equals 1, it is XORed with each of the other three bits, the XOR result is used to look up the boxed part of the table, and the value obtained is negated to give the correct output. The actual lookup table therefore has 2^3 = 8 entries, half of the previous 2^4 = 16.
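The halved table can be modelled in software as follows. This Python sketch assumes integer two's complement inputs and uses the identity 2·F = Σ_j d_j·2^j − 1 with d_j = ±1 (the sign bit's d is inverted); the table layout, sign conventions and names are illustrative and are not taken from Table 1, whose image is not available here.

```python
def build_obc_half_lut(coeffs):
    # Partial sums for the case where the first input's bit is 0 (d = -1).
    # The address is formed from the bits of the remaining inputs, so
    # 4 coefficients need only 2**3 = 8 entries.
    c0, rest = coeffs[0], coeffs[1:]
    lut = []
    for addr in range(1 << len(rest)):
        s = -c0
        for i, c in enumerate(rest):
            s += c if (addr >> i) & 1 else -c
        lut.append(s)
    return lut

def obc_partial_sum(half_lut, bit0, addr):
    # If the first bit is 1, XOR it into the other bits (complement the
    # address), look up, and negate: the mirror symmetry of the table.
    if bit0 == 0:
        return half_lut[addr]
    return -half_lut[addr ^ (len(half_lut) - 1)]

def obc_inner_product(coeffs, inputs, bits):
    half_lut = build_obc_half_lut(coeffs)
    mask = (1 << bits) - 1
    words = [v & mask for v in inputs]
    acc = -sum(coeffs)                      # constant offset term
    for j in range(bits):
        bit0 = (words[0] >> j) & 1
        addr = 0
        for i, w in enumerate(words[1:]):
            addr |= ((w >> j) & 1) << i
        d = obc_partial_sum(half_lut, bit0, addr)
        # The sign bit position contributes with inverted sign.
        acc += -(d << j) if j == bits - 1 else (d << j)
    return acc // 2                         # acc is exactly twice the result
```

The price of the smaller table is one XOR stage on the address, one conditional negation and a constant offset, all of which are cheap compared with doubling the table.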

[0062] The hardware module of the device therefore works at high speed, so the system clock frequency can be lowered while real-time, high-quality decoding is still achieved, which greatly reduces power consumption; the flexibility and extensibility of the software module give the decoder good compatibility and make it easy to modify and to add new functions; and the computation is concentrated in the hardware accelerator, which greatly reduces the computational load on the CPU, so that the CPU can support more upper-layer applications while the requirements on decoding speed, power consumption, flexibility, cost and development cycle are balanced.

Brief Description of the Drawings:

[0063] 1. Figure 1 is a schematic diagram of the structural connections of the utility model;

[0064] 2. Figure 2 is a schematic diagram of the 2D structural connections of the utility model;

[0065] 3. Figure 3 is a schematic diagram of the 1D connections of the utility model;

[0066] 4. Figure 4 is a schematic diagram of the accumulator connections of the utility model.

Detailed Description:

[0067] To make the technical solution of the utility model easier to understand, it is further described below with reference to a specific embodiment.

[0068] Embodiment 1:

[0069] As shown in Figures 1, 2 and 3, the IP core based on a two-dimensional IDCT distributed arithmetic algorithm using SOPC technology has an Avalon bus read module, a controller module, a 2D IDCT module, an Avalon bus write module, an Avalon bus and a control register. Control data is written into the control register; the controller module directs the Avalon bus read module to read the operation address from the control register and to read the data to be processed into the input buffer over the Avalon bus; after processing by the 2D IDCT module, the Avalon bus write module writes the result back to the original address.

[0070] The 2D IDCT module has a 1D IDCT module, a serial-to-parallel converter, a multiplexer, a parallel-to-serial converter, a transposition memory and a controller. After the serial-to-parallel conversion buffer module receives a group of data, it outputs the group in parallel as one row to the multiplexer; the 1D IDCT module computes the 8-point inverse transform of each row and outputs it to the transposition memory, which feeds the multiplexer again; the 1D IDCT module then computes the 8-point inverse transform of each column and passes the result to the parallel-to-serial converter for output. The controller controls the whole process.

[0071] The 1D IDCT module has a shift register, 8 shift accumulators and a post-processing module. The shift register takes 13-bit input data and outputs 8 bits of data to the 8 shift accumulators; the 8 shift accumulators output 14-bit data, which the post-processing module extends to 16-bit precision before output.

[0072] The shift accumulator module is built from 4-input accumulators.

[0073] An SOPC development platform built around the Cyclone II EP2C35F672C8 FPGA is used. The hardware design is written in the Verilog HDL hardware description language and synthesized with the Quartus II software. The whole 2D IDCT occupies 4336 logic elements, and the core 1D IDCT module occupies only 632 logic elements. The 8 lookup table modules use the lookup tables (LUTs) inside the FPGA logic elements directly, with no registers or embedded RAM. This way of implementing the lookup table modules is simple and flexible and gives fast on-chip access. The maximum synthesizable operating frequency of the 2D IDCT IP core is 140.39 MHz.

[0074] An actual video decoding test is carried out in an SOPC system with Nios II as the processor. The IDCT IP core is added in SOPC Builder, the encoded video test file is programmed into FLASH, the decoder program is ported to the Nios II IDE, the original software IDCT function is removed, and a driver function for the 2D IDCT IP core is written in C. After decoding, the video is played on an LCD with a VGA interface.

[0075] After the IDCT IP core is added to the system, the LCD picture is clear and the decoding quality of the system is not reduced. With the 2D IDCT IP core, the system decoding time drops by about 11 ms and the frame rate rises by 6 frames.

[0076] For this IP core, the two-dimensional IDCT algorithm studied for the hardware design proves effective: using distributed arithmetic raises the maximum operating frequency of the chip, and combining it with OBC coding greatly reduces the logic resources occupied. The synthesis results show that the design uses few chip resources and offers fast access, with a maximum synthesizable operating frequency of 140.39 MHz, so the programmable FPGA hardware design of the two-dimensional IDCT is achieved.

[0077] The two-dimensional IDCT IP core design based on an SOPC system is achieved. Test results show that decoding with the IP core is faster than software decoding, by more than 20% on average, which verifies the real-time performance and effectiveness of the design.

[0078] Thanks to the IP core reuse technology of SOPC, the IP core can be applied in related image and video processing systems, and it is practical, general-purpose and extensible.

[0079] The above is only a preferred embodiment of the utility model and does not limit it in form or substance. Any equivalent variation involving minor changes, modifications or developments that a person skilled in the art makes using the technical content disclosed above, without departing from the scope of the technical solution of the utility model, is an equivalent embodiment of the utility model; likewise, any equivalent change, modification or development made to the above embodiment on the basis of the essential technique of the utility model still falls within the scope of its technical solution.

Claims (4)

  1. An IP core based on a two-dimensional IDCT distributed algorithm using SOPC technology, characterized in that: the IP core device has an Avalon bus read module, a controller module, a 2D IDCT module, an Avalon bus write module, an Avalon bus, and a control register; control data is written into the control register; the controller module controls the Avalon bus read module to read the operation address in the control register and to read the data to be processed into an input buffer via the Avalon bus; after the data is processed by the 2D IDCT module, the Avalon bus write module writes the result back to the original address.
  2. The IP core based on a two-dimensional IDCT distributed algorithm using SOPC technology according to claim 1, characterized in that: the 2D IDCT module has a 1D IDCT module, a serial-to-parallel converter, a multiplexer, a parallel-to-serial converter, a transpose memory, and a controller; after receiving a group of data, the serial-to-parallel conversion buffer module outputs the group simultaneously, as one row of data, to the multiplexer; the 1D IDCT module computes the 8-point inverse transform of each row and outputs it to the transpose memory, which in turn outputs to the multiplexer; the 1D IDCT module then computes the 8-point inverse transform of each column and outputs it to the parallel-to-serial converter for output; the controller controls the whole process.
  3. The IP core based on a two-dimensional IDCT distributed algorithm using SOPC technology according to claim 1, characterized in that: the 1D IDCT module has a shift register, 8 shift accumulators, and a post-processing module; the shift register receives 13-bit input data and outputs 8-bit data to the 8 shift accumulators; the 8 shift accumulators output 14-bit data, which the post-processing module extends to 16-bit precision before output.
  4. The IP core based on a two-dimensional IDCT distributed algorithm using SOPC technology according to claim 1, characterized in that: the shift accumulator module is composed of 4-input accumulators.
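
As a reading aid for claims 2 and 3, the following Python sketch models the two ideas they rely on: a 2D IDCT computed as a row pass, a transpose, and a column pass, and a bit-serial distributed-arithmetic inner product that replaces multipliers with a lookup table and shift-accumulation. It is a minimal floating-point illustration under stated assumptions, not the claimed hardware. All function names are hypothetical; the orthonormal scaling, the final transpose back to the input orientation, and the single 256-entry lookup table are assumptions that the claims do not state. Only the 8-point size and the 13-bit input width are taken from the claims, and the 4-input accumulator structure of claim 4 and the 14-bit and 16-bit stages of claim 3 are not modeled.

```python
import math

N = 8  # transform size named in the claims (8-point rows and columns)


def idct_weights(n):
    """Fixed weights of an orthonormal 8-point IDCT for output sample n.

    The orthonormal scaling is an assumption; the claims do not specify it.
    """
    return [
        (math.sqrt(1.0 / N) if k == 0 else math.sqrt(2.0 / N))
        * math.cos((2 * n + 1) * k * math.pi / (2 * N))
        for k in range(N)
    ]


def idct_1d(coeffs):
    """Direct 1D IDCT: each output is an inner product with fixed weights."""
    return [sum(w * c for w, c in zip(idct_weights(n), coeffs)) for n in range(N)]


def transpose(block):
    """Software stand-in for the transpose memory."""
    return [list(col) for col in zip(*block)]


def idct_2d_row_column(block):
    """2D IDCT as two 1D passes with a transpose in between."""
    row_pass = [idct_1d(row) for row in block]                # 8-point IDCT per row
    col_pass = [idct_1d(col) for col in transpose(row_pass)]  # 8-point IDCT per column
    # Assumption: transpose back so the result has the input's orientation.
    return transpose(col_pass)


def da_inner_product(weights, samples, bits=13):
    """Bit-serial distributed-arithmetic inner product sum(w * x).

    Each sample must be an integer that fits in `bits`-bit two's complement.
    """
    # Lookup table: for every bit pattern across the inputs, the sum of the
    # weights whose input bit is 1 (256 entries for 8 inputs).
    lut = [
        sum(w for i, w in enumerate(weights) if (addr >> i) & 1)
        for addr in range(1 << len(weights))
    ]
    acc = 0.0
    for j in range(bits):
        # Gather bit j of every sample into one table address.
        addr = 0
        for i, x in enumerate(samples):
            addr |= ((x >> j) & 1) << i
        # Weight the table entry by 2**j; hardware would shift the accumulator.
        term = lut[addr] * (1 << j)
        # The most significant bit is the sign bit, so its term is subtracted.
        acc += -term if j == bits - 1 else term
    return acc
```

For example, `da_inner_product(idct_weights(0), row)` computes the same value as `idct_1d(row)[0]` for a row of 13-bit signed integers, up to floating-point rounding, without any multiplication of a weight by a sample.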
CN 201120080618 2011-03-24 2011-03-24 IP (Internet Protocol) core based on two-dimensional (2D) IDCT (Inverse Discrete Cosine Transformation) distributed algorithm of SOPC (System on Programmable Chip) technology CN202084032U (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN 201120080618 CN202084032U (en) 2011-03-24 2011-03-24 IP (Internet Protocol) core based on two-dimensional (2D) IDCT (Inverse Discrete Cosine Transformation) distributed algorithm of SOPC (System on Programmable Chip) technology

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN 201120080618 CN202084032U (en) 2011-03-24 2011-03-24 IP (Internet Protocol) core based on two-dimensional (2D) IDCT (Inverse Discrete Cosine Transformation) distributed algorithm of SOPC (System on Programmable Chip) technology

Publications (1)

Publication Number Publication Date
CN202084032U (en) 2011-12-21

Family

ID=45344655

Family Applications (1)

Application Number Title Priority Date Filing Date
CN 201120080618 CN202084032U (en) 2011-03-24 2011-03-24 IP (Internet Protocol) core based on two-dimensional (2D) IDCT (Inverse Discrete Cosine Transformation) distributed algorithm of SOPC (System on Programmable Chip) technology

Country Status (1)

Country Link
CN (1) CN202084032U (en)

Cited By (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103019973A (en) * 2012-11-23 2013-04-03 华为技术有限公司 Data interaction system and method
CN103019973B (en) * 2012-11-23 2015-08-26 华为技术有限公司 Interactive data system and method
WO2015131511A1 (en) * 2014-03-06 2015-09-11 京东方科技集团股份有限公司 Video decoding method and device thereof
US9838704B2 (en) 2014-03-06 2017-12-05 Boe Technology Group Co., Ltd. Method and apparatus for decoding video

Similar Documents

Publication Publication Date Title
Wu et al. A high-performance and memory-efficient pipeline architecture for the 5/3 and 9/7 discrete wavelet transform of JPEG2000 codec
US7536430B2 (en) Method and system for performing calculation operations and a device
Gupta et al. IMPACT: imprecise adders for low-power approximate computing
Veredas et al. Custom implementation of the coarse-grained reconfigurable ADRES architecture for multimedia purposes
Han et al. EIE: efficient inference engine on compressed deep neural network
Lin et al. A dynamic scaling FFT processor for DVB-T applications
Han et al. ESE: Efficient speech recognition engine with sparse LSTM on FPGA
Gupta et al. Low-power digital signal processing using approximate adders
Hameed et al. Understanding sources of inefficiency in general-purpose chips
Shams et al. NEDA: A low-power high-performance DCT architecture
Benes et al. A fast asynchronous Huffman decoder for compressed-code embedded processors
Chungan et al. A 250MHz optimized distributed architecture of 2D 8x8 DCT
US7725516B2 (en) Fast DCT algorithm for DSP with VLIW architecture
CN101330616A (en) Hardware implementing apparatus and method for inverse discrete cosine transformation during video decoding process
Shen et al. A unified 4/8/16/32-point integer IDCT architecture for multiple video coding standards
JPH0646269A (en) Expansion method and compression method for still picture data or device executing the methods
Shams et al. A low power high performance distributed DCT architecture
CN101834723A (en) RSA (Rivest-Shamir-Adleman) algorithm and IP core
Kuang et al. Low-error configurable truncated multipliers for multiply-accumulate applications
Li et al. Architecture design for H.264/AVC integer motion estimation with minimum memory bandwidth
Wang et al. A reconfigurable multi-transform VLSI architecture supporting video codec design
US20070250557A1 (en) Method and circuit for performing cordic based loeffler discrete cosine transformation (dct) for signal processing
Ahmed et al. N point DCT VLSI architecture for emerging HEVC standard
Park et al. Low power reconfigurable DCT design based on sharing multiplication
Judd et al. Proteus: Exploiting numerical precision variability in deep neural networks

Legal Events

Date Code Title Description
C14 Granted
C17 Cessation of patent right