CN104581172A - Hardware structure for realizing SVC macroblock-level algorithm - Google Patents

Publication number
CN104581172A
Authority
CN
China
Prior art keywords
module
prediction
data
memory
hardware configuration
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201410743580.3A
Other languages
Chinese (zh)
Inventor
张鹏
钟俊华
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Zhongxing Technology Co Ltd
Vimicro Corp
Original Assignee
Vimicro Corp
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Vimicro Corp filed Critical Vimicro Corp
Priority to CN201410743580.3A priority Critical patent/CN104581172A/en
Publication of CN104581172A publication Critical patent/CN104581172A/en
Pending legal-status Critical Current

Abstract

The embodiment of the invention provides a hardware structure for realizing an SVC macroblock-level algorithm, and aims at improving the encoding efficiency. The hardware structure comprises a memory, an arbitration module, a data reading module, a prediction module and a data sending module, wherein the data reading module, the prediction module and the data sending module realize access to the memory through the arbitration module; the memory saves base layer data, and the data reading module, the prediction module and the data sending module access the memory simultaneously or non-simultaneously; the arbitration module is used for judging the reading and writing priority of the data reading module, the prediction module and the data sending module to the memory; the prediction module is used for carrying out luminance sampling interpolation operation and chroma sampling interpolation operation on the data read from the memory by the data reading module to obtain luminance and chroma prediction values; SAD of the prediction values with the luminance information of the current frame is calculated, and finally the prediction values are saved in the memory.

Description

A hardware structure for implementing an SVC macroblock-level algorithm
Technical field
The present invention relates to the SVAC standard, and in particular to a hardware structure for implementing the SVC macroblock-level algorithm of the SVAC standard.
Technical background
In existing video encoding and decoding, SVC is mostly implemented by software algorithms and seldom in hardware. The performance of the hardware is therefore not fully exploited, and encoding efficiency is low.
Summary of the invention
In view of this, embodiments of the present invention provide a hardware structure for implementing an SVC macroblock-level algorithm, so as to improve encoding efficiency.
To achieve the above object, an embodiment of the present invention provides a hardware structure for implementing an SVC macroblock-level algorithm, comprising: a memory, an arbitration module, a read data module, a prediction module, and a send data module, wherein the read data module, the prediction module, and the send data module access the memory through the arbitration module;
The memory stores base layer data, and the read data module, the prediction module, and the send data module access the memory either simultaneously or at different times;
The arbitration module determines the read/write priority of the read data module, the prediction module, and the send data module with respect to the memory;
The prediction module performs a luma sample interpolation operation and a chroma sample interpolation operation on the data read from the memory by the read data module to obtain luma and chroma prediction values; calculates the SAD (sum of absolute differences) against the luma information of the current frame; and finally saves the prediction values to the memory.
This hardware structure is built on the SVAC standard. By implementing SVC macroblock-level coding on an FPGA in a hardware description language, it can effectively improve encoding efficiency.
Brief description of the drawings
Fig. 1 is a schematic diagram of a hardware structure for implementing an SVC macroblock-level algorithm in an embodiment of the present invention.
Fig. 2 shows the priority order determined by the arbitration module in an embodiment of the present invention.
Fig. 3 is a schematic structural diagram of the prediction module in an embodiment of the present invention.
Fig. 4 is a flow diagram of the serial processing of luma prediction and chroma prediction in an embodiment of the present invention.
Fig. 5 is a flow diagram of the parallel processing of luma prediction and chroma prediction in an embodiment of the present invention.
Detailed description of the embodiments
To make the objects, technical solutions, and advantages of the present invention clearer, the invention is described in further detail below with reference to the accompanying drawings.
Fig. 1 is a schematic diagram of a hardware structure for implementing an SVC macroblock-level algorithm in an embodiment of the present invention. As shown in Fig. 1, the hardware structure comprises: a memory, an arbitration (Arbitrate) module, a read data (Read Data) module, a prediction (Prediction) module, and a send data (Send Data) module.
The memory stores base layer data. The read data module, the prediction module, and the send data module need to access the memory, simultaneously or at different times, either to obtain the data they require or to store processed data in the memory. In an embodiment of the present invention, the memory may be implemented as SRAM (Static Random Access Memory).
The arbitration module determines the read/write priority of the read data module, the prediction module, and the send data module with respect to the memory. Fig. 2 shows the priority order determined by the arbitration module in an embodiment of the present invention. As shown in Fig. 2, the read data module has higher priority than the send data module, which in turn has higher priority than the prediction module.
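The priority order above can be illustrated with a small software model. This is only a sketch: the patent describes a hardware arbiter and gives no implementation, so the function names, the one-access-per-cycle assumption, and the request encoding below are all hypothetical. Only the order (read data, then send data, then prediction) comes from the text.

```python
# Fixed-priority arbiter sketch. The priority order is taken from the text;
# the names and the one-grant-per-cycle model are assumptions.
PRIORITY = ("read_data", "send_data", "prediction")  # highest priority first


def arbitrate(requests):
    """Return the module granted the memory port this cycle, or None.

    `requests` is the set of module names currently requesting access.
    """
    for name in PRIORITY:
        if name in requests:
            return name
    return None


def simulate(pending):
    """Serve one access per cycle until no accesses remain.

    `pending` maps a module name to its number of outstanding accesses.
    Returns the list of grants in the order they were issued.
    """
    pending = dict(pending)
    grants = []
    while any(pending.values()):
        winner = arbitrate({m for m, n in pending.items() if n > 0})
        pending[winner] -= 1
        grants.append(winner)
    return grants


print(simulate({"prediction": 2, "read_data": 1, "send_data": 1}))
# ['read_data', 'send_data', 'prediction', 'prediction']
```

In this model the prediction module only reaches the memory once the two higher-priority modules have no outstanding requests, which matches the ordering described for Fig. 2.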
The read data module obtains base layer data (Read Data) from the Busife module and, once the arbitration module has resolved the priority, saves it to the memory.
The prediction module performs a luma sample interpolation operation (Luma up sample) and a chroma sample interpolation operation (Chroma up sample) on the data read from the memory by the read data module to obtain luma and chroma prediction values; calculates the SAD against the luma information of the current frame; and finally saves the prediction values to the memory.
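As a rough illustration of what an up-sampling interpolation step does, the sketch below doubles the resolution of a small block. The patent does not state the interpolation filter or the scaling ratio, so the 2x ratio and the simple two-tap rounded average used here are stand-in assumptions, not the SVAC filter; the function names are hypothetical.

```python
def upsample_row_2x(row):
    """Double the length of a row: keep each sample, then insert the rounded
    average of it and its right neighbour (the last sample is repeated)."""
    out = []
    for i, s in enumerate(row):
        nxt = row[i + 1] if i + 1 < len(row) else s
        out.append(s)
        out.append((s + nxt + 1) >> 1)
    return out


def upsample_block_2x(block):
    """Apply the row filter horizontally, then vertically (separable)."""
    wide = [upsample_row_2x(r) for r in block]
    cols = [upsample_row_2x(list(c)) for c in zip(*wide)]
    return [list(r) for r in zip(*cols)]


base = [[10, 20],
        [30, 40]]
for r in upsample_block_2x(base):
    print(r)
# [10, 15, 20, 20]
# [20, 25, 30, 30]
# [30, 35, 40, 40]
# [30, 35, 40, 40]
```

The same routine can be applied to luma and chroma blocks alike, which is the property the description relies on when it processes both in 4*4 units.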
Fig. 3 is a schematic structural diagram of the prediction module in an embodiment of the present invention. As shown in Fig. 3, the prediction module comprises:
A luma prediction subunit, which performs the luma sample interpolation operation on the data read from the memory by the read data module to obtain the luma prediction value.
A chroma prediction subunit, which performs the chroma sample interpolation operation on the data read from the memory by the read data module to obtain the chroma prediction value.
An SAD subunit, which calculates the SAD between the luma prediction value and the luma information of the current frame. In an embodiment of the present invention, sharing the prediction value data reduces the amount of computation.
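The SAD computed by the SAD subunit is the sum of absolute differences between corresponding samples. A minimal sketch follows; the function name and the sample values are illustrative only.

```python
def sad(pred, cur):
    """Sum of absolute differences between two equally sized 2-D blocks."""
    return sum(
        abs(p - c)
        for pred_row, cur_row in zip(pred, cur)
        for p, c in zip(pred_row, cur_row)
    )


luma_pred = [[10, 12],
             [14, 16]]
luma_cur = [[11, 12],
            [10, 20]]
print(sad(luma_pred, luma_cur))  # 1 + 0 + 4 + 4 = 9
```

A lower SAD means the up-sampled base layer block is a closer match to the current frame's luma.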
Within the prediction module, because neighbouring macroblocks are related, sharing the relevant data (including adjacent data and intermediate calculation results) reduces the hardware resources required.
In an embodiment of the present invention, before the luma prediction subunit performs the up-sampling interpolation operation on the base layer luma data, the data may be divided into four 4*4 blocks so that luma is processed in the same way as chroma. This reduces register storage space and effectively reduces the hardware resources occupied by the design. In this case, the prediction module further comprises a division submodule for dividing the luma data into 4*4 blocks.
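The division step can be sketched as follows. The 8x8 input size is an assumption inferred from the phrase "four 4*4 blocks"; the patent does not state the luma region size explicitly, and the function name is hypothetical.

```python
def split_into_4x4(block, n=4):
    """Split a 2-D block into n x n sub-blocks in raster order."""
    rows, cols = len(block), len(block[0])
    if rows % n or cols % n:
        raise ValueError("block dimensions must be multiples of n")
    return [
        [row[c:c + n] for row in block[r:r + n]]
        for r in range(0, rows, n)
        for c in range(0, cols, n)
    ]


# Hypothetical 8x8 luma region whose sample value encodes its position.
luma = [[8 * r + c for c in range(8)] for r in range(8)]
blocks = split_into_4x4(luma)
print(len(blocks))   # 4
print(blocks[1][0])  # [4, 5, 6, 7]  (top-right block, first row)
```

Each resulting 4*4 block can then go through the same interpolation datapath as a chroma block, which is how the design avoids a separate, larger luma buffer.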
In an embodiment of the present invention, before performing the luma sample interpolation operation (Luma up sample) and the chroma sample interpolation operation (Chroma up sample), the prediction module further performs border extension on the data read from the memory by the read data module, so that the data forms a standard data matrix model. The luma and chroma sample interpolation operations are then performed on this standard matrix model. In this case, the prediction module further comprises a data processing subunit that performs the border extension on the data read from the memory by the read data module to form the standard data matrix model. In an embodiment of the present invention, part of the data of a standard matrix model can be reused directly in the next model to be processed, which reduces the work of re-reading data from the memory.
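One common way to perform border extension is to replicate the edge samples outward so that every block has the same padded shape. The patent does not say which extension rule or padding width is used, so the edge replication and the `pad` parameter below are assumptions for illustration.

```python
def extend_border(block, pad):
    """Replicate edge samples `pad` times on every side of a 2-D block."""
    rows = [[r[0]] * pad + list(r) + [r[-1]] * pad for r in block]
    top = [list(rows[0]) for _ in range(pad)]
    bottom = [list(rows[-1]) for _ in range(pad)]
    return top + rows + bottom


for r in extend_border([[1, 2], [3, 4]], pad=1):
    print(r)
# [1, 1, 2, 2]
# [1, 1, 2, 2]
# [3, 3, 4, 4]
# [3, 3, 4, 4]
```

With a uniform padded shape, the interpolation logic never needs special cases at block edges, and overlapping columns or rows of one padded block can be carried over to the next.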
In one embodiment, the prediction module may also pass the prediction result, via the send data module, to a mode decision module for decision-making.
In an embodiment of the present invention, when there are enough clock cycles to process luma and chroma serially in SVC, the luma prediction subunit and the chroma prediction subunit in the prediction module operate serially. Fig. 4 is a flow diagram of the serial processing of luma prediction and chroma prediction in an embodiment of the present invention. In Fig. 4, the up-sampling interpolation operation is performed on the base layer luma data first, and then on the base layer chroma data.
In an embodiment of the present invention, when clock cycles are tight or the SVC computation needs to be accelerated, luma prediction and chroma prediction may be processed in parallel, i.e. the luma prediction subunit and the chroma prediction subunit in the prediction module operate in parallel. Fig. 5 is a flow diagram of the parallel processing of luma prediction and chroma prediction. In Fig. 5, when the memory is a single-port design, reads from and writes to the memory must still be sequenced, but the luma and chroma up-sampling interpolation operations can run at the same time. Processing luma prediction and chroma prediction in parallel can greatly shorten the processing period.
In an embodiment of the present invention, the prediction module may support both the parallel and the serial processing scheme. In this case, the prediction module further comprises a control port for selecting the parallel or the serial processing scheme according to the overall design requirements.
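The trade-off between the two schemes can be sketched with a simple cycle model. Everything here is hypothetical: the `Mode` values stand in for the control port, and the cycle counts are invented for illustration, since the patent gives no timing figures. The one property taken from the text is that memory accesses stay sequential on a single-port memory while the two interpolation operations may overlap.

```python
from enum import Enum


class Mode(Enum):
    """Hypothetical encoding of the control port setting."""
    SERIAL = 0
    PARALLEL = 1


def prediction_cycles(mode, load, luma, chroma, store):
    """Estimate cycles to predict one block.

    `load` and `store` are memory access cycles and never overlap
    (single-port memory); `luma` and `chroma` are interpolation cycles,
    which overlap only in PARALLEL mode.
    """
    mem = load + store
    if mode is Mode.SERIAL:
        return mem + luma + chroma
    return mem + max(luma, chroma)


# Illustrative numbers only, not measurements.
print(prediction_cycles(Mode.SERIAL, load=20, luma=64, chroma=32, store=20))    # 136
print(prediction_cycles(Mode.PARALLEL, load=20, luma=64, chroma=32, store=20))  # 104
```

Under this model the parallel scheme saves at most the shorter of the two interpolation times, at the cost of keeping both subunits active at once.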
The send data module passes the final prediction values to the relevant module (TFE) for mode decision.
In an embodiment of the present invention, the hardware structure is implemented on an FPGA (Field-Programmable Gate Array). In this case the memory may be either the FPGA's built-in memory or an external memory.
In an embodiment of the present invention, the hardware structure may also be implemented by an embedded system or by other feasible hardware means.
The foregoing are merely preferred embodiments of the present invention and are not intended to limit it. Any modification, equivalent replacement, or the like made within the spirit and principles of the present invention shall fall within its scope of protection.

Claims (9)

1. A hardware structure for implementing an SVC macroblock-level algorithm, characterized by comprising: a memory, an arbitration module, a read data module, a prediction module, and a send data module, wherein the read data module, the prediction module, and the send data module access the memory through the arbitration module;
the memory stores base layer data, and the read data module, the prediction module, and the send data module access the memory either simultaneously or at different times;
the arbitration module determines the read/write priority of the read data module, the prediction module, and the send data module with respect to the memory;
the prediction module performs a luma sample interpolation operation and a chroma sample interpolation operation on the data read from the memory by the read data module to obtain luma and chroma prediction values; calculates the SAD against the luma information of the current frame; and finally saves the prediction values to the memory.
2. The hardware structure according to claim 1, characterized in that the prediction module comprises:
a luma prediction subunit, which performs the luma sample interpolation operation on the data read from the memory by the read data module to obtain the luma prediction value;
a chroma prediction subunit, which performs the chroma sample interpolation operation on the data read from the memory by the read data module to obtain the chroma prediction value;
an SAD subunit, which calculates the SAD between the luma prediction value and the luma information of the current frame.
3. The hardware structure according to claim 2, characterized in that the prediction module further comprises:
a division submodule for dividing the luma data into 4*4 blocks.
4. The hardware structure according to claim 2 or 3, characterized in that the prediction module further comprises:
a data processing subunit, which performs border extension on the data read from the memory by the read data module so that the data forms a standard data matrix model.
5. The hardware structure according to claim 2 or 3, characterized in that the luma prediction subunit and the chroma prediction subunit operate in parallel or serially.
6. The hardware structure according to claim 2 or 3, characterized in that the prediction module further comprises a control port for controlling whether the luma prediction subunit and the chroma prediction subunit operate in parallel or serially.
7. The hardware structure according to claim 1, 2 or 3, characterized in that the priority order determined by the arbitration module is: the read data module has higher priority than the send data module, and the send data module has higher priority than the prediction module.
8. The hardware structure according to claim 1, 2 or 3, characterized in that the memory is a static random access memory.
9. The hardware structure according to claim 1, 2 or 3, characterized in that the hardware structure is implemented on an FPGA.
CN201410743580.3A 2014-12-08 2014-12-08 Hardware structure for realizing SVC macroblock-level algorithm Pending CN104581172A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201410743580.3A CN104581172A (en) 2014-12-08 2014-12-08 Hardware structure for realizing SVC macroblock-level algorithm

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201410743580.3A CN104581172A (en) 2014-12-08 2014-12-08 Hardware structure for realizing SVC macroblock-level algorithm

Publications (1)

Publication Number Publication Date
CN104581172A true CN104581172A (en) 2015-04-29

Family

ID=53096231

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201410743580.3A Pending CN104581172A (en) 2014-12-08 2014-12-08 Hardware structure for realizing SVC macroblock-level algorithm

Country Status (1)

Country Link
CN (1) CN104581172A (en)

Citations (17)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN1090404A (en) * 1992-11-24 1994-08-03 布尔有限公司 Be distributed in the system of the unit in the network
CN1387644A (en) * 1999-08-31 2002-12-25 英特尔公司 SDRAM controller for parallel processor architecture
CN1387641A (en) * 1999-08-31 2002-12-25 英特尔公司 Execution of multiple threads in parallel processor
CN1426560A (en) * 2000-12-28 2003-06-25 皇家菲利浦电子有限公司 System integrating agents having different resource-accessing schemes
CN1558325A (en) * 2004-02-03 2004-12-29 智慧第一公司 Device and method for invalidating redundant items in branch target address cache
CN1808412A (en) * 2005-01-17 2006-07-26 松下电器产业株式会社 Bus arbitration method and semiconductor apparatus
CN101169771A (en) * 2007-11-30 2008-04-30 华为技术有限公司 Multiple passage internal bus external interface device and its data transmission method
CN101443734A (en) * 2006-05-17 2009-05-27 Nxp股份有限公司 Multi-processing system and a method of executing a plurality of data processing tasks
CN101562741A (en) * 2009-05-11 2009-10-21 华为技术有限公司 Multi-layer coding rate control method and device
CN101647002A (en) * 2007-03-28 2010-02-10 Nxp股份有限公司 Multiprocessing system and method
CN102203752A (en) * 2008-07-29 2011-09-28 Vl有限公司 Data processing circuit with arbitration between a plurality of queues
CN102414671A (en) * 2009-04-29 2012-04-11 超威半导体公司 Hierarchical memory arbitration technique for disparate sources
CN102948149A (en) * 2010-04-16 2013-02-27 Sk电信有限公司 Video encoding/decoding apparatus and method
CN103077141A (en) * 2012-12-26 2013-05-01 西安交通大学 AMBA (Advanced Microcontroller Bus Architecture) bus based self-adaption real-time weighting prior arbitration method and arbitrator
CN103260007A (en) * 2012-02-21 2013-08-21 清华大学 Intelligent monitoring system based on on-chip multi-port memory controller
US20140146883A1 (en) * 2012-11-29 2014-05-29 Ati Technologies Ulc Bandwidth saving architecture for scalable video coding spatial mode
CN103916673A (en) * 2013-01-06 2014-07-09 华为技术有限公司 Coding method and decoding method and device based on bidirectional forecast

Similar Documents

Publication Publication Date Title
Veredas et al. Custom implementation of the coarse-grained reconfigurable ADRES architecture for multimedia purposes
CN105684036B (en) Parallel hardware block processing assembly line and software block handle assembly line
KR101522985B1 (en) Apparatus and Method for Image Processing
JP2010527194A (en) Dynamic motion vector analysis method
JP2015534169A (en) Method and system for multimedia data processing
CN101729893B (en) MPEG multi-format compatible decoding method based on software and hardware coprocessing and device thereof
Huang et al. Memory-hierarchical and mode-adaptive HEVC intra prediction architecture for quad full HD video decoding
CN102148990B (en) Device and method for predicting motion vector
CN111985456A (en) Video real-time identification, segmentation and detection architecture
CN112534819A (en) Method and apparatus for predicting video image component, and computer storage medium
Roh et al. Prediction complexity-based HEVC parallel processing for asymmetric multicores
US10863200B2 (en) Techniques for performing a forward transformation by a video encoder using a forward transform matrix
Sanny et al. Energy-efficient median filter on FPGA
Azgin et al. A computation and energy reduction technique for HEVC intra prediction
Wang et al. VLSI implementation of HEVC motion compensation with distance biased direct cache mapping for 8K UHDTV applications
JP6412589B2 (en) Apparatus, computer program, and computer-implemented method
JP2022520922A (en) Chroma Intra prediction methods and devices, as well as computer storage media
Grellert et al. A multilevel data reuse scheme for Motion Estimation and its VLSI design
Kim et al. MESIP: A configurable and data reusable motion estimation specific instruction-set processor
CN104581172A (en) Hardware structure for realizing SVC macroblock-level algorithm
CN112911285B (en) Hardware encoder intra mode decision circuit, method, apparatus, device and medium
Sampaio et al. Hybrid scratchpad video memory architecture for energy-efficient parallel hevc
Song et al. Hybrid scratchpad and cache memory management for energy-efficient parallel HEVC encoding
US20100074336A1 (en) Fractional motion estimation engine
KR20160011782A (en) Video encoding circuit and video encoding method therewith

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
TA01 Transfer of patent application right
TA01 Transfer of patent application right

Effective date of registration: 20171208

Address after: 519000 Guangdong city of Zhuhai province Hengqin Baohua Road No. 6, room 105, -23898 (central office)

Applicant after: Zhongxing Technology Co., Ltd.

Applicant after: Vimicro Electronics Co., Ltd.

Address before: 100083 Haidian District, Xueyuan Road, No. 35, the world building, the second floor of the building on the ground floor, No. 16

Applicant before: Beijing Vimicro Corporation

Applicant before: Vimicro Electronics Co., Ltd.

CB02 Change of applicant information
CB02 Change of applicant information

Address after: 519031 Guangdong city of Zhuhai province Hengqin Baohua Road No. 6, room 105, -23898 (central office)

Applicant after: Mid Star Technology Limited by Share Ltd

Applicant after: Vimicro Electronics Co., Ltd.

Address before: 519000 Guangdong city of Zhuhai province Hengqin Baohua Road No. 6, room 105, -23898 (central office)

Applicant before: Zhongxing Technology Co., Ltd.

Applicant before: Vimicro Electronics Co., Ltd.

RJ01 Rejection of invention patent application after publication
RJ01 Rejection of invention patent application after publication

Application publication date: 20150429