CN112422984A - Code stream preprocessing device, system and method of multi-core decoding system - Google Patents

Code stream preprocessing device, system and method of multi-core decoding system

Info

Publication number
CN112422984A
CN112422984A CN202011154927.2A CN202011154927A
Authority
CN
China
Prior art keywords
code stream
decoding
code
macro block
competition
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202011154927.2A
Other languages
Chinese (zh)
Other versions
CN112422984B (en)
Inventor
雷理
张云
韦虎
蔡浩
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Mouxin Technology Shanghai Co ltd
Original Assignee
Mouxin Technology Shanghai Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Mouxin Technology Shanghai Co ltd filed Critical Mouxin Technology Shanghai Co ltd
Priority to CN202011154927.2A priority Critical patent/CN112422984B/en
Publication of CN112422984A publication Critical patent/CN112422984A/en
Application granted granted Critical
Publication of CN112422984B publication Critical patent/CN112422984B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04NPICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N19/00Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
    • H04N19/42Methods or arrangements for coding, decoding, compressing or decompressing digital video signals characterised by implementation details or hardware specially adapted for video compression or decompression, e.g. dedicated software implementation
    • H04N19/436Methods or arrangements for coding, decoding, compressing or decompressing digital video signals characterised by implementation details or hardware specially adapted for video compression or decompression, e.g. dedicated software implementation using parallelised computational arrangements
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04NPICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N19/00Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
    • H04N19/42Methods or arrangements for coding, decoding, compressing or decompressing digital video signals characterised by implementation details or hardware specially adapted for video compression or decompression, e.g. dedicated software implementation
    • H04N19/423Methods or arrangements for coding, decoding, compressing or decompressing digital video signals characterised by implementation details or hardware specially adapted for video compression or decompression, e.g. dedicated software implementation characterised by memory arrangements
    • H04N19/426Methods or arrangements for coding, decoding, compressing or decompressing digital video signals characterised by implementation details or hardware specially adapted for video compression or decompression, e.g. dedicated software implementation characterised by memory arrangements using memory downsizing methods
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04NPICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N19/00Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
    • H04N19/44Decoders specially adapted therefor, e.g. video decoders which are asymmetric with respect to the encoder
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04NPICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N19/00Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
    • H04N19/90Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using coding techniques not provided for in groups H04N19/10-H04N19/85, e.g. fractals
    • H04N19/91Entropy coding, e.g. variable length coding [VLC] or arithmetic coding

Abstract

The invention discloses a code stream preprocessing device, system and method for a multi-core decoding system, and relates to the technical field of video decoding. The code stream preprocessing device comprises an anti-competition code removing module, a code stream caching module and an entropy decoding module. The anti-competition code removing module is used to obtain original code stream data from memory and remove the anti-competition codes to obtain code stream payload data; the code stream caching module is used to store the code stream payload data and the anti-competition code information corresponding to it; the entropy decoding module is used to read the code stream payload data from the code stream caching module and perform entropy decoding. During entropy decoding, the total number of code stream bits including the anti-competition codes is calculated so that the decoding single core corresponding to a macroblock row in the multi-core decoding system can locate the position of the start field of the code stream of that macroblock row. The invention realizes segmented positioning of the code stream by macroblock row in hardware, so that the multi-core decoding system can decode directly against the original code stream data, with low resource occupation and reduced memory requirements.

Description

Code stream preprocessing device, system and method of multi-core decoding system
Technical Field
The invention relates to the technical field of video decoding, in particular to a code stream preprocessing device, system and method of a multi-core decoding system.
Background
The development of the ultra-high-definition video industry is of great significance for driving the intelligent transformation of industries such as 5G and security that take video as an important carrier, and for improving the overall strength of China's information and cultural industries. Ultra-high-definition video naturally places higher demands on video coding and decoding. For high-definition video, the current mainstream video compression standards include Advanced Video Coding (AVC) and High Efficiency Video Coding (HEVC). Because software decoding imposes a large CPU and power-consumption overhead, the industry usually adopts a dedicated hardware accelerator as the Video Decoder (VDEC). Taking a common single-core hardware decoder as an example, most such decoders adopt a pipeline design with the macroblock (MB, Macro Block, in AVC; CTB, Coding Tree Block, in HEVC) as the pipeline unit. For ultra-high-definition video, the decoder needs to achieve 8K@30fps/4K@120fps performance, which a single-core hardware decoder can hardly meet within a 600 MHz clock frequency. Therefore, a multi-core video decoder design has been proposed in the prior art to realize efficient parallel decoding; the good scalability of multi-core parallelism can meet higher performance requirements.
Currently, in AVC/HEVC, when single-threaded decoding is employed, the processing order of macroblocks within a frame (MB in AVC, CTB in HEVC) is generally from left to right and from top to bottom. During entropy decoding, intra prediction, inter prediction, deblocking filtering and so on, a macroblock usually depends on the information of its left and upper neighboring macroblocks. To process macroblocks within a frame in parallel, data-independent macroblocks must be found. A 2D-Wave row-wise parallel decoding algorithm is provided in the prior art, which is similar to the HEVC Wavefront concept. Specifically, according to the preset protocol of the 2D-Wave row-wise parallel decoding algorithm, the nearest macroblock that is independent of a given macroblock lies at the relative position "2 to the right, 1 up". The dependency relationship of a typical macroblock is shown in fig. 1, where the diagonally hatched macroblock (x, y) and the diagonally hatched macroblock (x+2, y-1) are mutually independent macroblocks, and the macroblocks on which macroblock (x, y) depends are the dotted macroblock (x, y-1) and the dotted macroblock (x+1, y-1). Therefore, as long as a synchronization constraint is added among the threads so that the decoding thread of the upper row leads the thread of the lower row by at least two macroblocks, the rows can be decoded simultaneously. Referring to fig. 2, macroblocks 10 for which decoding is complete, macroblocks 20 currently being decoded (shown latticed), and undecoded macroblocks 30 (shown blank) are distinguished. When the video is decoded, each thread is responsible for decoding one row of macroblocks, and adjacent upper and lower rows progress rightwards in a front-and-back wavefront fashion, thus realizing parallel processing of macroblocks. In a 4K/8K ultra-high-definition application scenario, the parallelism can be greatly improved by the 2D-Wave algorithm, accelerating high-definition video decoding.
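For illustration only, the synchronization rule just described (the upper row must lead the lower row by at least two macroblocks) can be sketched in C as below. The struct, field and function names are assumptions introduced for this sketch and are not taken from the patent; x is a 0-indexed macroblock column, and progress[y] counts the macroblocks of row y that have already been decoded.

```c
#include <stdbool.h>

/* Hypothetical per-row progress table for a 2D-Wave scheduler. */
typedef struct {
    volatile int *progress;  /* progress[y] = decoded macroblocks in row y */
    int mb_cols;             /* macroblocks per row                        */
} wave_state_t;

/* 2D-Wave rule from the text: macroblock (x, y) may start once the row
 * above has decoded at least up to column x + 1, i.e. the upper row leads
 * the lower row by at least two macroblocks ("2 right, 1 up" is the nearest
 * independent macroblock).                                                 */
static bool mb_ready(const wave_state_t *s, int x, int y)
{
    if (y == 0)
        return true;                 /* top row has no upper-row dependency */
    int needed = x + 2;              /* count of upper-row MBs that must be done */
    if (needed > s->mb_cols)
        needed = s->mb_cols;         /* near row end: wait for full upper row */
    return s->progress[y - 1] >= needed;
}
```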
The 2D-Wave row-wise parallel decoding algorithm needs to know in advance the start position of the code stream field corresponding to each macroblock row. However, in AVC/HEVC coding, the macroblock rows within a frame are entropy-coded and compressed in order from top to bottom, and the resulting compressed stream is stored in memory byte by byte; since the code stream length of each macroblock row differs, the start of the code stream data corresponding to the first macroblock of the next macroblock row can only be located after the last macroblock of the previous row has been decoded. Accordingly, in the prior art a preprocessor is usually provided to pre-decode the code stream so as to implement the 2D-Wave row-wise parallel decoding algorithm. An existing preprocessor generally includes an Entropy Decoder module and a BitStream Buffer module; as shown in fig. 3, after entropy decoding of the end-of-row macroblock of each macroblock row is completed, the preprocessor records the bit count of the code stream fully parsed by the current macroblock row, and uses the recorded information to mark and locate the offset address of the start field of the code stream of the next macroblock row.
On the one hand, in the AVC/HEVC video coding standards, the overall framework is divided into two layers: a Video Coding Layer (VCL) and a Network Abstraction Layer (NAL). The former is responsible for efficiently representing the video content, while the latter is responsible for formatting the data and providing header information so that the data is suitable for transmission over various channels and storage media. In order to separate two adjacent NAL units and form a boundary (so that the decoder can distinguish the start position of each NALU in the video stream data), a Start Code is inserted between two adjacent NAL units to identify the start of a new NAL unit; the Start Code is usually the fixed sequence "0x00_00_00_01" or "0x00_00_01". Meanwhile, in order to distinguish the Start Code from "0x00_00_01"-like sequences that may occur in the elementary stream data inside a NAL unit, the encoder inserts an anti-competition code, usually "0x03", before the final 0x01 byte of such a sequence in the elementary stream data so that it is not mistaken for a Start Code. Referring to fig. 4, the left side shows the elementary stream data in the NAL unit and the right side shows the encapsulated stream data after the anti-competition code "0x03" has been inserted. When the decoder detects the sequence "0x00_00_03" inside a NAL unit, it discards the 0x03 and restores the data to the aforementioned elementary code stream data (i.e. code stream payload data). However, because entropy decoding parses the code stream in variable-length bit units rather than byte by byte, the original byte alignment of the code stream data is destroyed, so entropy decoding itself cannot detect the anti-competition byte sequence (i.e., 0x00_00_03) in the code stream data and therefore cannot remove the invalid anti-competition code data.
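The removal operation described above can be illustrated with a short, hedged C sketch (buffer layout and function name are assumptions for this illustration, not the patent's hardware logic): whenever the byte sequence 0x00 0x00 0x03 is found inside a NAL unit, the 0x03 is dropped and the count of dropped bytes is recorded.

```c
#include <stddef.h>
#include <stdint.h>

/* Strip anti-competition (emulation prevention) bytes from a NAL payload.
 * Returns the payload length and reports how many 0x03 bytes were removed.
 * Illustrative software sketch only. */
static size_t strip_anti_competition(const uint8_t *src, size_t len,
                                     uint8_t *dst, size_t *removed)
{
    size_t out = 0, zeros = 0;
    *removed = 0;
    for (size_t i = 0; i < len; i++) {
        if (zeros >= 2 && src[i] == 0x03) {
            (*removed)++;          /* drop the 0x03 inserted by the encoder */
            zeros = 0;             /* the 0x03 ends the run of zero bytes   */
            continue;
        }
        zeros = (src[i] == 0x00) ? zeros + 1 : 0;
        dst[out++] = src[i];
    }
    return out;
}
```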
On the other hand, the prior art has also proposed a solution in which the original code stream data (the code stream with anti-competition codes inserted) and the payload code stream data (the code stream with anti-competition codes discarded) are stored separately. Referring to fig. 5, the original code stream data (BitStream Source) is stored in memory space S; after the anti-competition codes are discarded from the original code stream data in memory space S, the resulting code stream payload data is written into memory space D, so that entropy decoding and subsequent parallel decoding operate directly on the payload data in memory space D. However, this solution incurs extra memory read/write bandwidth and consumes additional memory space.
In summary, how to provide a code stream preprocessing scheme for a multi-core decoding system that suits parallel decoding algorithms which need to locate the start position of each code stream field, while occupying few resources and little memory, is a technical problem urgently needing to be solved.
Disclosure of Invention
The invention aims to: overcome the defects of the prior art and provide a code stream preprocessing device, system and method for a multi-core decoding system. The invention realizes segmented positioning of the code stream by macroblock row in hardware and can accurately obtain the total number of code stream bits, including the anti-competition codes, consumed by each macroblock row, so that the multi-core decoding system can decode directly against the original code stream data, with low resource occupation and reduced memory requirements. The invention suits various parallel decoding algorithms that need to predict the start field position of the code stream in order to perform parallel processing, such as the 2D-Wave row-wise parallel decoding algorithm.
In order to achieve the above object, the present invention provides the following technical solutions:
a code stream preprocessing device of a multi-core decoding system comprises a competition code removing module, a code stream caching module and an entropy decoding module;
the anti-competition code removing module is connected with the memory and used for acquiring original code stream data in the memory and performing anti-competition code removing processing on the original code stream data to acquire code stream payload data;
the code stream cache module is connected with the anti-competition code removing module and used for storing the code stream payload data and the corresponding anti-competition code information;
the entropy decoding module is connected with the code stream caching module and is used for reading code stream payload data in the code stream caching module to perform entropy decoding; in the entropy decoding process, when the end-row macro block of a macro block row completes the entropy decoding, the total bit number of the code stream including the anti-competition code, which is consumed accumulatively by the macro block row, is obtained, and the total bit number of the code stream is used for positioning the start field position of the code stream of the macro block row by a decoding single core corresponding to the macro block row.
Further, the anti-competition code removing module is configured to:
reading original code stream data from a memory, and carrying out anti-competition code detection on the original code stream data; when a byte sequence with a preset format and related to the anti-competition codes is detected, the anti-competition codes in the byte sequence are removed to obtain code stream payload data, and the code stream payload data and the removed anti-competition code information are transmitted to a code stream cache module.
Further, the anti-competition code information is the number of anti-competition code bytes, and the code stream cache module uses a first-in first-out register to read and write the code stream payload data and the anti-competition code byte count.
Further, the entropy decoding module is configured to: read code stream payload data for entropy decoding, and after the current segment of code stream payload data has been parsed, read the next segment of code stream payload data from the code stream cache module in sequence; during entropy decoding of any macroblock row, obtain the number of code stream payload bits BitsNum_bp cumulatively consumed in parsing up to that macroblock row and the cumulative number of removed anti-competition code bytes BytesNum_03; and, when entropy decoding of the end-of-row macroblock of the macroblock row is finished, calculate the total number of code stream bits, referred to the original code stream data, cumulatively consumed by the macroblock row by the following formula: BitsOffset = BitsNum_bp + (BytesNum_03 × 8), and write the total code stream bit number BitsOffset into a Line Queue.
And the decoding single core corresponding to the macro block Line in the multi-core decoding system locates the start field position of the code stream of the macro block Line by reading the total bit number information of the code stream in the Line Queue.
Further, the byte sequence corresponding to the anti-competition code is 0x03, and correspondingly, the preset format of the byte sequence associated with the anti-competition code is 0x00_00_03.
The invention also provides a general multi-core parallel decoder system, which comprises a decoding firmware and a multi-core hardware decoding accelerator which are in communication connection;
the decoding firmware is used for analyzing non-entropy coding data on the upper layer of the video code stream; the multi-core hardware decoding accelerator is used for processing decoding tasks of a macro block layer in a video code stream; slice level data in the video code stream is used as an interactive unit between the decoding firmware and the multi-core hardware decoding accelerator, and Slice parallel processing is carried out through Slice Queue;
the multi-core hardware decoding accelerator comprises a preprocessor module and at least two isomorphic full-function hardware decoders;
the preprocessor module is the code stream preprocessing device and is used for an entropy decoding task of a video code stream;
each full-function hardware decoder is responsible for decoding one macroblock row, including the steps of inverse DCT transformation, inverse quantization, intra-frame and inter-frame prediction and pixel reconstruction, and the macroblocks being decoded in two vertically adjacent rows are kept at least two macroblocks apart.
Further, the multi-core hardware decoding accelerator comprises a first full-function hardware decoder and a second full-function hardware decoder, where the first full-function hardware decoder processes decoding of the 2N-th macroblock rows and the second full-function hardware decoder processes decoding of the (2N+1)-th macroblock rows, N being an integer greater than 0; the two full-function hardware decoders share a group of line caches to form dual-core sharing, and the dependency information between macroblocks is stored in the line caches;
the full-function hardware decoder is configured to: check whether its Line Queue contains an instruction, and when it does, start decoding a macroblock row according to the content of the instruction; and, during decoding, monitor the working position of the full-function hardware decoder handling the previous macroblock row, keeping its own processing position at least two macroblocks behind the processing position of that previous row.
The invention also provides a 2D-Wave row-wise parallel decoding system for decoding video code streams in parallel, comprising multi-threaded decoding formed by a plurality of decoding single cores, wherein each decoding single core is responsible for decoding one row of macroblocks and adjacent upper and lower macroblock rows progress rightwards in a front-and-back wavefront fashion; the system further comprises the above code stream preprocessing device.
Further, the video is an AVC video or an HEVC video.
The invention also provides a code stream preprocessing method of the multi-core decoding system, which comprises the following steps:
acquiring original code stream data in a memory, and performing anti-competition code removal processing on the original code stream data to obtain code stream payload data;
writing the code stream payload data and the corresponding anti-competition code information into a cache;
reading the cached information through an entropy decoding module to perform entropy decoding; in the entropy decoding process, when the end-row macro block of a macro block row completes the entropy decoding, the total bit number of the code stream including the anti-competition code, which is consumed accumulatively by the macro block row, is obtained, and the total bit number of the code stream is used for positioning the start field position of the code stream of the macro block row by a decoding single core corresponding to the macro block row.
By adopting the above technical solution, the invention has, by way of example, the following advantages and positive effects compared with the prior art: the code stream preprocessing device of the multi-core decoding system realizes segmented positioning of the code stream by macroblock row in hardware and can accurately obtain the total number of code stream bits, including the anti-competition codes, consumed by each macroblock row, so that the whole multi-core decoding system can decode directly against the original code stream data, with low resource occupation and reduced memory requirements.
On the one hand, the code stream preprocessing device is provided with a hardware anti-competition code removing module, so that the entropy decoding module can accurately count the total number of code stream bits, including the anti-competition codes (0x03), consumed by each macroblock row, and the whole multi-core decoding system decodes directly against the original code stream data in the NAL units without consuming extra memory space. On the other hand, the code stream preprocessing device of the multi-core decoding system provided by the invention realizes segmented positioning of the code stream by macroblock row, can be used for row-wise code stream preprocessing in the 2D-Wave parallel decoding algorithm, and suits the current mainstream video compression standards AVC and HEVC.
Drawings
Fig. 1 is a schematic diagram illustrating dependency relationships of macro blocks in the prior art.
Fig. 2 is a schematic diagram illustrating a parallel decoding process performed by multiple threads in the prior art.
Fig. 3 is a schematic block diagram of a preprocessor in the prior art.
Fig. 4 is a schematic diagram illustrating the operation of performing anti-contention code processing on a code stream of a NAL unit in the prior art.
Fig. 5 is a schematic diagram illustrating the operation of code stream processing by software in the prior art.
Fig. 6 is a schematic block diagram of a preprocessor according to an embodiment of the present invention.
Fig. 7 is a schematic diagram of data reading and writing of the contention code elimination module according to an embodiment of the present invention.
Fig. 8 is a timing diagram corresponding to fig. 7.
Description of reference numerals:
decoded macroblock 10, macroblock being decoded 20, undecoded macroblock 30.
Detailed Description
The following describes the code stream preprocessing apparatus, system and method of the multi-core decoding system in further detail with reference to the accompanying drawings and specific embodiments. It should be noted that technical features or combinations of technical features described in the following embodiments should not be considered as being isolated, and they may be combined with each other to achieve better technical effects. In the drawings of the embodiments described below, the same reference numerals appearing in the respective drawings denote the same features or components, and may be applied to different embodiments. Thus, once an item is defined in one drawing, it need not be further discussed in subsequent drawings.
It should be noted that the structures, proportions, sizes, and other dimensions shown in the drawings and described in the specification are only for the purpose of understanding and reading the present disclosure, and are not intended to limit the scope of the invention, which is defined by the claims, and any modifications of the structures, changes in the proportions and adjustments of the sizes and other dimensions, should be construed as falling within the scope of the invention unless the function and objectives of the invention are affected. The scope of the preferred embodiments of the present invention includes additional implementations in which functions may be executed out of order from that described or discussed, including substantially concurrently or in reverse order, depending on the functionality involved, as would be understood by those reasonably skilled in the art of the embodiments of the present invention.
Techniques, methods, and apparatus known to those of ordinary skill in the relevant art may not be discussed in detail but are intended to be part of the specification where appropriate. In all examples shown and discussed herein, any particular value should be construed as merely illustrative, and not limiting. Thus, other examples of the exemplary embodiments may have different values.
Examples
Referring to fig. 6, a code stream preprocessing apparatus of a multi-core decoding system is provided for an embodiment of the present invention.
The code stream preprocessing device (pre-parser or preprocessor) comprises an anti-competition code removing module (Descrambling Block), a code stream cache (BitStream Buffer) module and an entropy decoding (Entropy Decoder) module.
The anti-competition code removing module is connected with the memory and used for obtaining original code stream data in the memory and carrying out anti-competition code removing processing on the original code stream data to obtain code stream payload data.
The code stream cache module is connected with the anti-competition code removing module and is used for storing the code stream payload data and the anti-competition code information corresponding to the code stream payload data.
The entropy decoding module is connected with the code stream caching module and used for reading code stream payload data in the code stream caching module to perform entropy decoding. In the entropy decoding process, when the end-row macro block of a macro block row completes the entropy decoding, the total bit number of the code stream including the anti-competition code, which is consumed accumulatively by the macro block row, is obtained, and the total bit number of the code stream is used for positioning the start field position of the code stream of the macro block row by a decoding single core corresponding to the macro block row.
Specifically, the anti-competition code removing module (Descrambling Block) is configured to: read original code stream (BitStream Source) data from the memory (DDR) (a single segment of BitStream data can be 64-bit data), and perform anti-competition code detection on the original code stream data; when a byte sequence in the preset format associated with the anti-competition code is detected, remove the anti-competition code from the byte sequence to obtain code stream payload data (64-bit data), and transmit the code stream payload data together with the removed anti-competition code information to the code stream cache module.
By way of example and not limitation, the anti-competition code may correspond to the byte sequence 0x03, and the preset format of the byte sequence associated with the anti-competition code is 0x00_00_03. In this case, the original code stream data is read, and the 0x03 is removed whenever the byte sequence 0x00_00_03 is detected. Then, the code stream payload data from which the "0x03" has been removed and the number of removed "0x03" bytes are written together into the code stream cache (BitStream Buffer) module, as shown in fig. 7. The right side of fig. 7 is the input original code stream data, including the data "0x11_22_00_00_03_33_44_55_66_77_88_99_99_99" containing the anti-competition code 0x03 byte sequence. The left side of fig. 7 shows the output information processed by the anti-competition code removing module, which includes the code stream payload data (BitStream Payload) "0x11_22_00_00_33_44_55_66" and the number of removed anti-competition code bytes "0x1". The corresponding timing diagram is shown in fig. 8.
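A self-contained worked example in C, using exactly the sample data quoted above for fig. 7, can make the byte accounting concrete. This is only a software stand-in for the hardware behavior; the variable names and the printf framing are assumptions for this sketch.

```c
#include <stdint.h>
#include <stdio.h>

/* Worked example for the fig. 7 data: the raw segment contains one
 * anti-competition byte (0x03), so the first 64-bit payload word is
 * 0x1122000033445566 and the removed-byte count is 1. */
int main(void)
{
    const uint8_t raw[] = { 0x11, 0x22, 0x00, 0x00, 0x03, 0x33, 0x44,
                            0x55, 0x66, 0x77, 0x88, 0x99, 0x99, 0x99 };
    uint8_t payload[sizeof raw];
    unsigned removed = 0, out = 0, zeros = 0;

    for (unsigned i = 0; i < sizeof raw; i++) {
        if (zeros >= 2 && raw[i] == 0x03) { removed++; zeros = 0; continue; }
        zeros = (raw[i] == 0x00) ? zeros + 1 : 0;
        payload[out++] = raw[i];
    }

    uint64_t word = 0;
    for (unsigned i = 0; i < 8; i++)      /* pack the first 64-bit payload word */
        word = (word << 8) | payload[i];

    printf("payload word = 0x%016llx, bytes removed = %u\n",
           (unsigned long long)word, removed);   /* 0x1122000033445566, 1 */
    return 0;
}
```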
The code stream cache (BitStream Buffer) module is configured to cache the output information of the anti-competition code removing module, which includes the code stream payload data (BitStream Payload) and the number of removed anti-competition code (0x03) bytes. In this embodiment, the code stream cache module is implemented with a first-in first-out register (FIFO).
The Entropy decoding (Entropy Decoder) module is configured to:
and reading code stream payload data for entropy decoding, and reading the next section of code stream payload data in the code stream cache module in sequence after the current code stream payload data is analyzed. After the current code stream payload data is analyzed and read, the next section of data in the code stream cache module is read for entropy decoding.
During entropy decoding of any macroblock row, obtain the number of code stream payload bits BitsNum_bp cumulatively consumed in parsing up to that macroblock row and the cumulative number of removed anti-competition code bytes BytesNum_03; that is, while the macroblock row is being entropy-decoded, continuously update the count BitsNum_bp of consumed code stream payload bits and the count BytesNum_03 of 0x03 bytes that were removed from the consumed code stream payload data.
When entropy decoding of the end-of-row macroblock of the macroblock row is finished, calculate the total number of code stream bits, referred to the original code stream data, cumulatively consumed by the macroblock row by the following formula: BitsOffset = BitsNum_bp + (BytesNum_03 × 8); that is, the total code stream bit number is the sum of the cumulatively consumed payload bits and the bits corresponding to the cumulative number of removed anti-competition code bytes. The total code stream bit number BitsOffset is then written into a line cache queue (Line Queue); the decoding single core corresponding to the macroblock row can locate the start field position of the code stream of that macroblock row by reading the total code stream bit number BitsOffset from the line cache queue (Line Queue).
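A minimal C sketch of the bookkeeping described by this formula is given below. The struct, function and parameter names are assumptions for illustration (the real device performs this accumulation in hardware and writes to a hardware Line Queue).

```c
#include <stdint.h>

/* Per-row accumulation, as described in the text. Illustrative only. */
typedef struct {
    uint64_t bits_num_bp;   /* payload bits consumed so far by this row (BitsNum_bp) */
    uint64_t bytes_num_03;  /* anti-competition bytes removed so far (BytesNum_03)   */
} row_acc_t;

/* Hypothetical line-queue write standing in for the hardware Line Queue. */
extern void line_queue_push(int row, uint64_t bits_offset);

static void on_mb_decoded(row_acc_t *acc, int row,
                          uint64_t bits_consumed, uint64_t bytes03_in_span,
                          int is_end_of_row)
{
    acc->bits_num_bp  += bits_consumed;      /* payload bits parsed for this macroblock */
    acc->bytes_num_03 += bytes03_in_span;    /* 0x03 bytes stripped inside that span    */

    if (is_end_of_row) {
        /* BitsOffset = BitsNum_bp + BytesNum_03 * 8: total bits of the *original*
         * code stream (anti-competition codes included) consumed by this row.      */
        uint64_t bits_offset = acc->bits_num_bp + acc->bytes_num_03 * 8;
        line_queue_push(row, bits_offset);
    }
}
```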
The code stream preprocessing device provided by the invention can be used for code stream preprocessing in a 2D-Wave row-wise parallel decoding system. The 2D-Wave row-wise parallel decoding system performs parallel decoding of a video code stream and comprises multi-threaded decoding formed by a plurality of decoding single cores; each decoding single core is responsible for decoding one row of macroblocks, adjacent upper and lower macroblock rows progress rightwards in a front-and-back wavefront fashion, code stream preprocessing is performed by the above code stream preprocessing device, and the code stream is positioned segment by segment according to macroblock rows.
Specifically, after entropy decoding of the last macroblock of each row is completed, the code stream preprocessing device can accurately count and record the total number of code stream bits, including the anti-competition codes (0x03), consumed by that macroblock row; this information is written into the corresponding line cache queue and exchanged with each macroblock-row decoding core in the 2D-Wave parallel decoding system, and it can be used by each macroblock-row decoding core to locate the start field position of the code stream corresponding to its macroblock row. In this way, the whole multi-core decoding system decodes directly against the original code stream data in the NAL units, saving the extra bandwidth and memory required by software processing in the prior art, and it suits the current mainstream video compression standards AVC and HEVC.
In another embodiment of the present invention, a general multi-core parallel decoder system is also provided.
The system includes communicatively coupled decode firmware and a multi-core hardware decode accelerator.
The decoding firmware is used for analyzing non-entropy coding data on the upper layer of the video code stream.
The multi-core hardware decoding accelerator is used for handling the decoding tasks of the macroblock layer in the video code stream. In this embodiment, the multi-core hardware decoding accelerator includes a preprocessor module and at least two homogeneous full-function hardware decoders. The full-function hardware decoder is at least capable of handling the steps of inverse DCT transformation, inverse quantization, intra-frame and inter-frame prediction and pixel reconstruction that are necessary for decoding a macroblock row.
The preprocessor module is used for entropy decoding tasks of video code streams and adopts the code stream preprocessing device in the previous embodiment. The code stream preprocessing device comprises a competition code removing module, a code stream caching module and an entropy decoding module. The anti-competition code removing module is connected with the memory and used for obtaining original code stream data in the memory and carrying out anti-competition code removing processing on the original code stream data to obtain code stream payload data. And the code stream cache module is connected with the competition code removing module and is used for storing the code stream payload data and the competition code preventing information corresponding to the code stream payload data. The entropy decoding module is connected with the code stream caching module and used for reading code stream payload data in the code stream caching module to perform entropy decoding. In the entropy decoding process, when the end-row macro block of a macro block row completes the entropy decoding, the total bit number of the code stream including the anti-competition code, which is consumed accumulatively by the macro block row, is obtained, and the total bit number of the code stream is used for positioning the start field position of the code stream of the macro block row by a decoding single core corresponding to the macro block row.
Each full-function hardware decoder is responsible for decoding one macroblock row, including the steps of inverse DCT (discrete cosine transform), inverse quantization, intra-frame and inter-frame prediction and pixel reconstruction, and the macroblocks being decoded in two vertically adjacent rows are kept at least two macroblocks apart so as to realize multi-core synchronous decoding.
Slice level data in the video code stream is used as an interactive unit between the decoding firmware and the multi-core hardware decoding accelerator, and Slice parallel processing is carried out through Slice Queue.
The code streams of AVC video and HEVC video adopt a layered structure: most of the syntax shared within the GOP layer and the Slice layer is separated out to form the Video Parameter Set (VPS), Sequence Parameter Set (SPS) and Picture Parameter Set (PPS); because this part of the data accounts for a small proportion of the stream and is simple to parse, it is very suitable for software parsing. Based on these characteristics of the code stream data, the decoder system provided by this embodiment divides the video decoder VDEC into two parts, a decoding firmware VDEC_FW and a multi-core hardware decoding accelerator VDEC_MCORE: the decoding firmware, as the software part, is used to parse the non-entropy-coded data on the upper layer of the video code stream, and the multi-core hardware decoding accelerator, as the hardware part, is used to handle all decoding work of the macroblock layer in the video code stream in a centralized manner.
Preferably, non-entropy-coded data at the upper layer of the code stream, such as the video parameter set VPS, sequence parameter set SPS, picture parameter set PPS and Slice header information, is parsed by the decoding firmware VDEC_FW; the multi-core hardware decoding accelerator handles all decoding work of the macroblock layer in the video code stream in a centralized manner, which can include the fully hardware-implemented steps of code stream reading, entropy decoding, inverse DCT transformation, inverse quantization, intra-frame and inter-frame prediction, pixel reconstruction, deblocking filtering and the like.
In this embodiment, software and hardware take the Slice level of the video code stream as the interaction unit and exchange data through a Slice Queue (i.e., a slice queue) inside the video decoder. The interaction flow between the decoding firmware VDEC_FW and the multi-core hardware decoding accelerator VDEC_MCORE may be as follows:
1) after the decoding firmware VDEC _ FW finishes the upper layer analysis task of the code stream, the Slice upper layer parameter information is packed and pressed into a Slice Queue, namely the information is put (push) into the Slice Queue for queuing.
At this time, the decoding firmware is configured to: and after the upper layer of the video code stream is analyzed, the Slice upper layer parameter information is packed and pressed into Slice Queue.
2) The multi-core hardware decoding accelerator VDEC_MCORE queries the ready information (ready state) of the Slice Queue data, reads the queue information and completes configuration; the hardware then parses the macroblocks in the current Slice until the end, sends an interrupt signal when finished, and releases the Slice Queue, i.e., pops the corresponding entry from the Slice Queue.
At this point, the multi-core hardware decoding accelerator is configured to: query the ready information of the Slice Queue data; after reading the queue and completing configuration, parse the macroblocks in the current Slice until all macroblocks in the Slice have been parsed; send an interrupt signal when parsing is finished, and release the Slice Queue.
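The push/pop handshake outlined in steps 1) and 2) could be sketched as below. This is a software model with assumed queue and callback names; the actual interaction uses hardware queues and interrupt signals.

```c
#include <stdbool.h>
#include <stdint.h>

/* Assumed slice descriptor and queue API, standing in for the hardware Slice Queue. */
typedef struct { uint32_t slice_params[16]; } slice_desc_t;

extern bool slice_queue_ready(void);                  /* data waiting in the queue?     */
extern slice_desc_t slice_queue_peek(void);           /* read queue entry for configuration */
extern void slice_queue_pop(void);                    /* release (pop) the entry        */
extern void decode_slice_macroblocks(const slice_desc_t *d); /* parse all MBs in the slice */
extern void raise_interrupt(void);                    /* signal firmware on completion  */

/* VDEC_MCORE side of the handshake: poll for ready slices, decode, interrupt, release. */
void mcore_service_slice_queue(void)
{
    while (slice_queue_ready()) {
        slice_desc_t d = slice_queue_peek();  /* read queue info and complete configuration */
        decode_slice_macroblocks(&d);         /* parse the macroblocks of the current slice */
        raise_interrupt();                    /* notify VDEC_FW that the slice is finished  */
        slice_queue_pop();                    /* release the Slice Queue entry              */
    }
}
```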
Two or more homogeneous full-function hardware decoders can be provided: with two, the accelerator is called a dual-core hardware decoding accelerator; with three, a tri-core hardware decoding accelerator; with four, a quad-core hardware decoding accelerator; and so on. Each full-function hardware decoder is responsible for decoding one macroblock row, so the dual-core hardware decoding accelerator can decode two macroblock rows in parallel at the same time, the tri-core hardware decoding accelerator can decode three macroblock rows in parallel, and so on.
Taking a dual-core hardware decoding accelerator as an example, the multi-core hardware decoding accelerator includes a first full-function hardware decoder and a second full-function hardware decoder; the first handles decoding of the 2N-th macroblock rows and the second handles decoding of the (2N+1)-th macroblock rows, where N is an integer greater than 0. The two full-function hardware decoders share a group of line caches to form dual-core sharing, and the dependency information among macroblocks is stored in the line caches.
When performing the decoding task, each full-function hardware decoder may check whether there is an instruction in its respective Line Queue and then initiate the decoding of a macroblock row based on the content of the instruction. Meanwhile, during decoding, each full-function hardware decoder monitors the working position of the full-function hardware decoder handling the previous macroblock row and must stay at least two macroblocks behind the processing position of that previous row.
Two full-function hardware decoders may share a set of line buffers to form a dual core share. All dependency information among the macro blocks is stored in the line buffer. The two full-function hardware decoders may commonly maintain the line buffer, which may include, for example, read and update operations of the line buffer.
At this point, the full-function hardware decoder is configured to: check whether its Line Queue contains an instruction, and when it does, start decoding a macroblock row according to the content of the instruction; and, during decoding, monitor the working position of the full-function hardware decoder handling the previous macroblock row, keeping its own processing position at least two macroblocks behind the processing position of that previous row. In a specific implementation, a multi-core synchronization mechanism can be set to coordinate the processing positions of the two full-function hardware decoders so that the macroblocks being decoded in the upper and lower rows remain at least two macroblocks apart. In this embodiment, dual-core synchronous decoding is preferably implemented by having a line cache arbiter grant a line-cache ready flag: since a full-function hardware decoder needs to read the line cache information first when each macroblock is started, the first full-function hardware decoder and the second full-function hardware decoder are communicatively connected with the line cache through the line cache arbiter, and when a full-function hardware decoder needs to read line cache information it must first obtain the line-cache ready flag granted by the line cache arbiter. Therefore, the processing progress of the first and second full-function hardware decoders can be coordinated by presetting the arbitration rule of the line cache arbiter.
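As a hedged illustration of one possible arbitration rule (not the patent's exact circuit), the C sketch below grants the line-cache ready flag to the lower-row core only while it remains at least two macroblocks behind the upper-row core; the struct and function names are assumptions introduced for this sketch.

```c
#include <stdbool.h>

/* Hypothetical arbiter state: decoded-macroblock counters for the two cores
 * working on vertically adjacent rows (upper = 2N row, lower = 2N+1 row). */
typedef struct {
    int pos_upper;   /* macroblocks finished in the upper (2N) row   */
    int pos_lower;   /* macroblocks finished in the lower (2N+1) row */
} arbiter_t;

/* Grant the line-cache ready flag only while the lower row stays at least
 * two macroblocks behind the upper row, as required for dual-core sync. */
static bool grant_line_cache(const arbiter_t *a, bool request_from_lower)
{
    if (!request_from_lower)
        return true;                           /* upper row is never blocked by the lower */
    return (a->pos_upper - a->pos_lower) >= 2; /* enforce the two-macroblock lead */
}
```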
Other technical features of the code stream preprocessing device refer to the foregoing embodiments, and are not described herein again.
The invention further provides a code stream preprocessing method of the multi-core decoding system. The method comprises the following steps:
step 100, obtaining original code stream data in the memory, and performing anti-competition code removing processing on the original code stream data to obtain code stream payload data. This step may be performed by the de-contention code module in the previous embodiment.
And 200, writing the code stream payload data and the corresponding anti-competition code information into a cache. This step may be performed by the codestream caching module in the previous embodiment.
Step 300, reading the cached information through an entropy decoding module to perform entropy decoding; in the entropy decoding process, when the end-row macro block of a macro block row completes the entropy decoding, the total bit number of the code stream including the anti-competition code, which is consumed accumulatively by the macro block row, is obtained, and the total bit number of the code stream is used for positioning the start field position of the code stream of the macro block row by a decoding single core corresponding to the macro block row. This step may be performed by the entropy decoding module in the previous embodiment.
For other technical features of the contention code removing module, the code stream caching module and the entropy decoding module, reference is made to the foregoing embodiment, which is not described herein again.
In the foregoing description, the disclosure of the present invention is not intended to limit itself to these aspects. Rather, the various components may be selectively and operatively combined in any number within the intended scope of the present disclosure. In addition, terms like "comprising," "including," and "having" should be interpreted as inclusive or open-ended, rather than exclusive or closed-ended, by default, unless explicitly defined to the contrary. All technical, scientific, or other terms used herein have the same meaning as commonly understood by one of ordinary skill in the art to which this invention belongs unless defined otherwise. Common terms found in dictionaries should not be interpreted too ideally or too realistically in the context of related art documents unless the present disclosure expressly limits them to that. Any changes and modifications of the present invention based on the above disclosure will be within the scope of the appended claims.

Claims (10)

1. A code stream preprocessing device of a multi-core decoding system, characterized in that: the device comprises an anti-competition code removing module, a code stream caching module and an entropy decoding module;
the anti-competition code removing module is connected with the memory and used for acquiring original code stream data in the memory and performing anti-competition code removing processing on the original code stream data to acquire code stream payload data;
the code stream cache module is connected with the anti-competition code removing module and used for storing the code stream payload data and the corresponding anti-competition code information;
the entropy decoding module is connected with the code stream caching module and is used for reading code stream payload data in the code stream caching module to perform entropy decoding; in the entropy decoding process, when the end-row macro block of a macro block row completes the entropy decoding, the total bit number of the code stream including the anti-competition code, which is consumed accumulatively by the macro block row, is obtained, and the total bit number of the code stream is used for positioning the start field position of the code stream of the macro block row by a decoding single core corresponding to the macro block row.
2. The code stream preprocessing device according to claim 1, wherein: the de-contention code module is configured to,
reading original code stream data from a memory, and carrying out anti-competition code detection on the original code stream data; when a byte sequence with a preset format and related to the anti-competition codes is detected, the anti-competition codes in the byte sequence are removed to obtain code stream payload data, and the code stream payload data and the removed anti-competition code information are transmitted to a code stream cache module.
3. The code stream preprocessing device according to claim 2, wherein: the code stream cache module adopts a first-in first-out register to read and write code stream payload data and the anti-competition code byte number.
4. The code stream preprocessing device according to claim 3, wherein: the entropy decoding module is configured to read code stream payload data for entropy decoding and, after the current segment of code stream payload data has been parsed, sequentially read the next segment of code stream payload data from the code stream cache module; and, during entropy decoding of any macroblock row, obtain the number of code stream payload bits BitsNum_bp cumulatively consumed in parsing up to that macroblock row and the cumulative number of removed anti-competition code bytes BytesNum_03; and, when entropy decoding of the end-of-row macroblock of the macroblock row is finished, calculate the total number of code stream bits, referred to the original code stream data, cumulatively consumed by the macroblock row by the following formula: BitsOffset = BitsNum_bp + (BytesNum_03 × 8), and write the total code stream bit number BitsOffset into a Line Queue;
and the decoding single core corresponding to the macro block Line in the multi-core decoding system locates the start field position of the code stream of the macro block Line by reading the total bit number information of the code stream in the Line Queue.
5. The code stream preprocessing device according to claim 1, wherein: the byte sequence corresponding to the anti-competition code is 0x03, and the preset format of the byte sequence associated with the anti-competition code is 0x00_00_03.
6. A general multi-core parallel decoder system, characterized in that: it comprises a decoding firmware and a multi-core hardware decoding accelerator that are communicatively connected;
the decoding firmware is used for analyzing non-entropy coding data on the upper layer of the video code stream; the multi-core hardware decoding accelerator is used for processing decoding tasks of a macro block layer in a video code stream; slice level data in the video code stream is used as an interactive unit between the decoding firmware and the multi-core hardware decoding accelerator, and Slice parallel processing is carried out through Slice Queue;
the multi-core hardware decoding accelerator comprises a preprocessor module and at least two isomorphic full-function hardware decoders;
the preprocessor module is the code stream preprocessing device of any one of claims 1 to 5, and is used for entropy decoding tasks of video code streams;
each full-function hardware decoder is responsible for decoding one macroblock row, including the steps of inverse DCT transformation, inverse quantization, intra-frame and inter-frame prediction and pixel reconstruction, and the macroblocks being decoded in two vertically adjacent rows are kept at least two macroblocks apart.
7. The generalized multi-core parallel decoder system according to claim 6, wherein: the multi-core hardware decoding accelerator comprises a first full-function hardware decoder and a second full-function hardware decoder, wherein the first full-function hardware decoder is used for processing the decoding of the 2N macroblock line, the second full-function hardware decoder is used for processing the decoding of the 2N +1 macroblock line, and N is an integer more than 0; two full-function hardware decoders share a group of line caches to form dual-core sharing, and dependency relationship information between macro blocks is stored in the line caches;
the full-function hardware decoder is configured to: check whether its Line Queue contains an instruction, and when it does, start decoding a macroblock row according to the content of the instruction; and, during decoding, monitor the working position of the full-function hardware decoder handling the previous macroblock row, keeping its own processing position at least two macroblocks behind the processing position of that previous row.
8. A 2D-Wave row-wise parallel decoding system for decoding video code streams in parallel, comprising multi-threaded decoding formed by a plurality of decoding single cores, wherein each decoding single core is responsible for decoding one row of macroblocks and adjacent upper and lower macroblock rows progress rightwards in a front-and-back wavefront fashion, characterized in that: the system further comprises the code stream preprocessing device of any one of claims 1-5.
9. The system of claim 8, wherein: the video is an AVC video or an HEVC video.
10. A code stream preprocessing method of a multi-core decoding system is characterized by comprising the following steps:
acquiring original code stream data in a memory, and performing anti-competition code removal processing on the original code stream data to obtain code stream payload data;
writing the code stream payload data and the corresponding anti-competition code information into a cache;
reading the cached information through an entropy decoding module to perform entropy decoding; in the entropy decoding process, when the end-row macro block of a macro block row completes the entropy decoding, the total bit number of the code stream including the anti-competition code, which is consumed accumulatively by the macro block row, is obtained, and the total bit number of the code stream is used for positioning the start field position of the code stream of the macro block row by a decoding single core corresponding to the macro block row.
CN202011154927.2A 2020-10-26 2020-10-26 Code stream preprocessing device, system and method of multi-core decoding system Active CN112422984B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202011154927.2A CN112422984B (en) 2020-10-26 2020-10-26 Code stream preprocessing device, system and method of multi-core decoding system

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202011154927.2A CN112422984B (en) 2020-10-26 2020-10-26 Code stream preprocessing device, system and method of multi-core decoding system

Publications (2)

Publication Number Publication Date
CN112422984A true CN112422984A (en) 2021-02-26
CN112422984B CN112422984B (en) 2023-02-28

Family

ID=74841725

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202011154927.2A Active CN112422984B (en) 2020-10-26 2020-10-26 Code stream preprocessing device, system and method of multi-core decoding system

Country Status (1)

Country Link
CN (1) CN112422984B (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113660496A (en) * 2021-07-12 2021-11-16 珠海全志科技股份有限公司 Multi-core parallel-based video stream decoding method and device
CN115866254A (en) * 2022-11-24 2023-03-28 亮风台(上海)信息科技有限公司 Method and equipment for transmitting video frame and camera shooting parameter information

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US6829299B1 (en) * 1997-10-02 2004-12-07 Kabushiki Kaisha Toshiba Variable length decoder and decoding method
US20050259688A1 (en) * 2004-05-21 2005-11-24 Stephen Gordon Multistandard video decoder
CN102469344A (en) * 2010-11-16 2012-05-23 腾讯科技(深圳)有限公司 Video stream encryption and decryption method, video stream encryption and decryption device, communication terminal and storage terminal
US20180084284A1 (en) * 2016-09-22 2018-03-22 Canon Kabushiki Kaisha Method, apparatus and system for encoding and decoding video data

Patent Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US6829299B1 (en) * 1997-10-02 2004-12-07 Kabushiki Kaisha Toshiba Variable length decoder and decoding method
US20050259688A1 (en) * 2004-05-21 2005-11-24 Stephen Gordon Multistandard video decoder
CN102469344A (en) * 2010-11-16 2012-05-23 腾讯科技(深圳)有限公司 Video stream encryption and decryption method, video stream encryption and decryption device, communication terminal and storage terminal
US20180084284A1 (en) * 2016-09-22 2018-03-22 Canon Kabushiki Kaisha Method, apparatus and system for encoding and decoding video data

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
王智: "AVS视频编码器中熵编码的研究", 《中国优秀硕士学位论文全文数据库》 *

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113660496A (en) * 2021-07-12 2021-11-16 珠海全志科技股份有限公司 Multi-core parallel-based video stream decoding method and device
CN115866254A (en) * 2022-11-24 2023-03-28 亮风台(上海)信息科技有限公司 Method and equipment for transmitting video frame and camera shooting parameter information

Also Published As

Publication number Publication date
CN112422984B (en) 2023-02-28

Similar Documents

Publication Publication Date Title
US20240080482A1 (en) System And Method For Decoding Using Parallel Processing
JP6808341B2 (en) Coding concepts that allow parallel processing, transport demultiplexers and video bitstreams
US20160080756A1 (en) Memory management for video decoding
CN102388616B (en) Image signal decoding device, image signal decoding method, image signal encoding device, and image signal encoding method
US9148670B2 (en) Multi-core decompression of block coded video data
JP7343663B2 (en) How to identify random access points and picture types
CN112422984B (en) Code stream preprocessing device, system and method of multi-core decoding system
TW201742452A (en) Decoder and method for reconstructing a picture from a datastream, encoder and method for coding a picture into a datastream, and related computer program and machine accessible medium
KR101710001B1 (en) Apparatus and Method for JPEG2000 Encoding/Decoding based on GPU
CN107277505B (en) AVS-2 video decoder device based on software and hardware partition
JP7177270B2 (en) Identifying tiles from network abstraction unit headers
US20110216827A1 (en) Method and apparatus for efficient encoding of multi-view coded video data
US20190356911A1 (en) Region-based processing of predicted pixels
US9020284B2 (en) Image encoding apparatus
US8443413B2 (en) Low-latency multichannel video port aggregator
US9344720B2 (en) Entropy coding techniques and protocol to support parallel processing with low latency
US20140092987A1 (en) Entropy coding techniques and protocol to support parallel processing with low latency
CN112422983B (en) Universal multi-core parallel decoder system and application thereof
WO2020215216A1 (en) Image decoding method, decoder and storage medium
CN112449196A (en) Decoding method of concurrent video session IP frame image group
RU2787711C1 (en) Managing a buffer of decoded images for encoding video signals
KR20170053031A (en) Enhanced data processing apparatus using multiple-block based pipeline and operation method thereof
WO2018223353A1 (en) Video coding method, video decoding method, and related device

Legal Events

Date Code Title Description
PB01 Publication
PB01 Publication
SE01 Entry into force of request for substantive examination
SE01 Entry into force of request for substantive examination
GR01 Patent grant
GR01 Patent grant