CN112449199A

CN112449199A - One-dimensional DCT/IDCT converter for parallel bit vector conversion and partial product addition

Info

Publication number: CN112449199A
Application number: CN202011300100.8A
Authority: CN
Inventors: 陈朝阳
Original assignee: Henan Institute of Engineering
Current assignee: Henan Institute of Engineering
Priority date: 2020-11-19
Filing date: 2020-11-19
Publication date: 2021-03-05
Anticipated expiration: 2040-11-19
Also published as: CN112449199B

Abstract

The invention provides a one-dimensional DCT/IDCT converter based on parallel bit vector transformation and partial product addition, which addresses the problem that existing DCT/IDCT converters occupy too many logic unit resources. The converter comprises a bit vector parallel outputter, a bit vector parallel exchanger, a numeric bit vector parallel converter, a sign bit vector converter and a partial product adder. The bit vector parallel outputter is connected with the bit vector parallel exchanger; the bit vector parallel exchanger is connected with the numeric bit vector parallel converter and the sign bit vector converter respectively; the numeric bit vector parallel converter and the sign bit vector converter are both connected with the partial product adder. The transform is implemented by parallel bit vector transformation followed by left-shift addition of partial products, based on a bit-AND-and-add algorithm. No direct multiplication is used and no specific common factors need to be reused; the transform period is short and a large number of logic units can be saved. The DCT and the IDCT can be implemented in a unified structure.

Description

One-dimensional DCT/IDCT converter for parallel bit vector conversion and partial product addition

Technical Field

The invention relates to the technical field of digital video compression coding and decoding, and in particular to a one-dimensional DCT/IDCT (discrete cosine transform/inverse discrete cosine transform) converter based on parallel bit vector transformation and partial product addition. It is intended for the integer-approximated DCT/IDCT of the High Efficiency Video Coding standard (HEVC) implemented on an FPGA (field programmable gate array) or an ASIC (application specific integrated circuit).

Background

The high efficiency video coding standard (HEVC) employs a block-based hybrid video compression coding framework that specifies two-dimensional approximation DCT transforms with core transform matrix sizes of 4 × 4, 8 × 8, 16 × 16, and 32 × 32, and similarly two-dimensional approximation IDCT transforms with core transform matrix sizes of 4 × 4, 8 × 8, 16 × 16, and 32 × 32. The two-dimensional transformation is separable, i.e. it can be achieved by performing a row transformation and a column transformation of the N points in one dimension separately.

The one-dimensional N-point DCT is computed as $y = Cx$, where C is the HEVC core transform matrix of size 4 × 4, 8 × 8, 16 × 16, or 32 × 32; x is a row or column data vector of the motion compensation residual of an image block, of length 4, 8, 16, or 32; and y is the corresponding one-dimensional transform result vector of the same length. The one-dimensional N-point IDCT is computed as $w = C^{T}u$, where $C^{T}$ is the transpose of the transform matrix C, u is the input vector of length 4, 8, 16, or 32, and w is the output vector of the same length.
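As a minimal sketch of the two operations above, the following Python computes y = Cx and w = C^T u by direct matrix multiplication. The matrix `C4` holds the standard HEVC 4 × 4 core transform coefficients; the function names are illustrative only, and the scaling and rounding stages of a full HEVC transform are deliberately left out.

```python
# HEVC 4 x 4 core transform matrix (integer approximation of the DCT).
C4 = [
    [64,  64,  64,  64],
    [83,  36, -36, -83],
    [64, -64, -64,  64],
    [36, -83,  83, -36],
]

def dct_1d(C, x):
    """Forward 1-D transform y = C x (no scaling or rounding)."""
    return [sum(c * v for c, v in zip(row, x)) for row in C]

def idct_1d(C, u):
    """Inverse 1-D transform w = C^T u (no scaling or rounding)."""
    n = len(C)
    return [sum(C[k][i] * u[k] for k in range(n)) for i in range(n)]

if __name__ == "__main__":
    print(dct_1d(C4, [1, 1, 1, 1]))   # constant input: only the first coefficient is non-zero
    print(idct_1d(C4, [1, 0, 0, 0]))  # first basis vector of C4
```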

If the matrix multiplication is performed directly, a one-dimensional transform or inverse transform requires N² multiplications and N(N-1) additions. A two-dimensional transform requires 2N³ multiplications and 2N²(N-1) additions, which is a large amount of computation. An FPGA or ASIC implementation would need a large number of dedicated multipliers, each multiplying by a fixed coefficient, and these occupy many resources. The existing mainstream schemes are therefore multiplierless designs based on constant matrix multiplication (CMM) and multiple constant multiplication (MCM). These schemes use the transform matrix to process the input data vector: in the FPGA or ASIC implementation, multiplication by the integer coefficients of the transform matrix is realized by combinations of shifts and additions, and the required shift and addition resources are reduced by reusing common factors as much as possible. There is no uniform method for determining the common factors, so the efficiency of the algorithm cannot be guaranteed; and because the basis vectors of the forward and inverse transform matrices have different symmetries, only some components can be shared between the DCT and the IDCT.
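To make the operation counts concrete, the short sketch below evaluates the formulas stated above for the four HEVC block sizes. The function name `direct_op_counts` is hypothetical and chosen only for this illustration.

```python
def direct_op_counts(n):
    """Operation counts for direct matrix multiplication with an n x n core matrix.

    Returns ((mults_1d, adds_1d), (mults_2d, adds_2d)) using the formulas
    N^2, N(N-1) for one dimension and 2N^3, 2N^2(N-1) for two dimensions.
    """
    one_d = (n * n, n * (n - 1))
    two_d = (2 * n ** 3, 2 * n * n * (n - 1))
    return one_d, two_d

if __name__ == "__main__":
    for n in (4, 8, 16, 32):
        (m1, a1), (m2, a2) = direct_op_counts(n)
        print(f"N={n}: 1-D {m1} mult / {a1} add, 2-D {m2} mult / {a2} add")
```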

Disclosure of Invention

In view of the technical problem that the integer-approximated DCT/IDCT of HEVC is generally implemented on an FPGA or ASIC, where a large number of dedicated multipliers occupy excessive logic unit resources, the invention provides a one-dimensional DCT/IDCT converter based on parallel bit vector transformation and partial product addition. It implements the DCT and the IDCT of HEVC in a unified structure, reduces the logic resources occupied by the DCT/IDCT, and has a short transform period.

To achieve this purpose, the technical scheme of the invention is as follows: a one-dimensional DCT/IDCT converter based on parallel bit vector transformation and partial product addition comprises a bit vector parallel outputter, a bit vector parallel exchanger, a numeric bit vector parallel converter, a sign bit vector converter and a partial product adder. The bit vector parallel outputter is connected with the bit vector parallel exchanger; the bit vector parallel exchanger is connected with the numeric bit vector parallel converter and the sign bit vector converter respectively; the numeric bit vector parallel converter and the sign bit vector converter are both connected with the partial product adder.

The bit vector parallel outputter outputs M N-dimensional bit vectors to the bit vector parallel exchanger. It comprises an N-dimensional M-bit data vector memory, an N-dimensional 0th numeric bit vector outputter, an N-dimensional 1st numeric bit vector outputter, ..., an N-dimensional (M-2)th numeric bit vector outputter, and an N-dimensional sign bit vector outputter. The N-dimensional M-bit data vector memory is connected with each of the N-dimensional 0th, 1st, ..., (M-2)th numeric bit vector outputters and with the N-dimensional sign bit vector outputter.

The N-dimensional M-bit data vector memory stores an N-dimensional vector of row or column data of the motion compensation residual of an N × N image block, each element represented in M-bit two's complement. The N-dimensional 0th numeric bit vector outputter takes bit 0 of each element of the N-dimensional vector in the memory, forms an N-dimensional vector of 1-bit elements, and outputs it to the bit vector parallel exchanger. The N-dimensional 1st numeric bit vector outputter does the same with bit 1 of each element, and so on, up to the N-dimensional (M-2)th numeric bit vector outputter, which takes bit M-2 of each element. The N-dimensional sign bit vector outputter takes bit M-1 of each element, forms an N-dimensional vector of 1-bit elements as the sign bit vector, and outputs it to the bit vector parallel exchanger.

The numeric bit vector parallel converter performs the DCT/IDCT of the M-1 numeric bit vectors. It comprises an N × N core transform matrix memory, an N-dimensional 0th numeric bit vector expander, an N-dimensional 1st numeric bit vector expander, ..., an N-dimensional (M-2)th numeric bit vector expander, a parallel vector bit-AND operator and a parallel partial product generator. The N-dimensional 0th numeric bit vector expander corresponds to the N-dimensional 0th numeric bit vector outputter, the N-dimensional 1st numeric bit vector expander corresponds to the N-dimensional 1st numeric bit vector outputter, ..., and the N-dimensional (M-2)th numeric bit vector expander corresponds to the N-dimensional (M-2)th numeric bit vector outputter. The N × N core transform matrix memory and the N-dimensional 0th, 1st, ..., (M-2)th numeric bit vector expanders are all connected with the parallel vector bit-AND operator; the parallel vector bit-AND operator is connected with the parallel partial product generator; and the parallel partial product generator is connected with the partial product adder.

The sign bit vector converter comprises an N × N negative core transform matrix memory, an N-dimensional sign bit vector expander, a vector bit-AND operator and a partial product generator. The N-dimensional sign bit vector expander is connected with the N-dimensional sign bit vector outputter; the N × N negative core transform matrix memory and the N-dimensional sign bit vector expander are both connected with the vector bit-AND operator; the vector bit-AND operator is connected with the partial product generator; and the partial product generator is connected with the partial product adder.

The partial product adder comprises a sign extender, a left shifter and an adder. The sign extender is connected with the parallel partial product generator and with the partial product generator; the sign extender is connected with the left shifter, and the left shifter is connected with the adder.

The N × N core transform matrix memory stores the DCT/IDCT core transform matrix. The elements of the transform matrix are 8-bit integers in two's complement and are sent in parallel to the parallel vector bit-AND operator. The N-dimensional 0th, 1st, ..., (M-2)th numeric bit vector expanders respectively expand the M-1 N-dimensional vectors of 1-bit elements into M-1 N-dimensional vectors of 8-bit elements whose bits are all identical; the expanded N-dimensional 8-bit data vectors, called bit expansion vectors, are sent in parallel to the parallel vector bit-AND operator. The parallel vector bit-AND operator bit-ANDs the N elements of each bit expansion vector, bit by bit, with the N elements of each row vector of the N × N core transform matrix, obtaining an N × N × (M-1) dimensional bit-AND result matrix. The parallel partial product generator adds the elements of the N × N × (M-1) dimensional bit-AND result matrix along the row direction to obtain an N × (M-1) dimensional partial product matrix.

The N × N negative core transform matrix memory stores the negative matrix of the DCT/IDCT core transform matrix, i.e. the product of -1 and the DCT/IDCT core transform matrix. The elements of the negative matrix are 8-bit integers in two's complement and are sent in parallel to the vector bit-AND operator. The N-dimensional sign bit vector expander expands the sign bit vector output by the N-dimensional sign bit vector outputter into an N-dimensional vector of 8-bit elements whose bits are all identical, and sends the expanded vector, called the sign bit expansion vector, to the vector bit-AND operator. The vector bit-AND operator bit-ANDs the N elements of the sign bit expansion vector, bit by bit, with the N elements of each row vector of the N × N negative core transform matrix, obtaining an N × N dimensional bit-AND result matrix. The partial product generator adds the elements of the N × N dimensional bit-AND result matrix along the row direction to obtain an N-dimensional partial product vector.

The sign extender outputs an N × M dimensional sign-extended partial product matrix. Its 0th column is the 0th column of the N × (M-1) dimensional partial product matrix output by the parallel partial product generator, sign-extended by M-1 bits; its 1st column is the 1st column of that matrix, sign-extended by M-2 bits; ...; its (M-2)th column is the (M-2)th column of that matrix, sign-extended by 1 bit; and its (M-1)th column is the N-dimensional partial product vector output by the partial product generator. The left shifter outputs an N × M dimensional left-shifted matrix. Its 0th column is the 0th column of the sign-extended partial product matrix; its 1st column is the 1st column of the sign-extended partial product matrix with each element shifted left by 1 bit; its 2nd column is the 2nd column with each element shifted left by 2 bits; ...; and its (M-1)th column is the (M-1)th column with each element shifted left by M-1 bits. The adder adds the elements of the N × M dimensional left-shifted matrix along the row direction to obtain the N-dimensional DCT/IDCT result of the data vector.

The N-dimensional sign bit vector expander and the N-dimensional 0th, 1st, ..., (M-2)th numeric bit vector expanders all use the same method to expand an N-dimensional vector of 1-bit elements into an N-dimensional vector of 8-bit elements with identical bits: if the nth component of the original vector is 1, the nth component of the expanded vector is 11111111; if the nth component of the original vector is 0, the nth component of the expanded vector is 00000000, where n = 1, 2, ..., N.
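The expansion rule can be sketched in a few lines of Python. The function name `expand_bits` and the `width` parameter are assumptions made for this illustration; the text fixes the width at 8 bits.

```python
def expand_bits(bit_vector, width=8):
    """Replicate each 1-bit element across `width` bits: 1 -> 0b11111111, 0 -> 0b00000000."""
    all_ones = (1 << width) - 1
    return [all_ones if bit else 0 for bit in bit_vector]

if __name__ == "__main__":
    # Print each expanded element as an 8-bit binary string.
    print([format(v, "08b") for v in expand_bits([1, 0, 1, 1])])
```

ANDing an expanded element with an 8-bit coefficient either passes the coefficient through unchanged or forces it to zero, which is what lets the bit-AND stand in for multiplication by a single bit.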

Compared with the prior art, the invention has the following beneficial effects. The DCT/IDCT is implemented by parallel bit vector transformation followed by left-shift addition of partial products, based on a bit-AND-and-add algorithm; it uses no direct multiplication and does not need to reuse specific common factors. The input data vector is used to control the processing of the transform matrix: the DCT/IDCT result of each bit vector of the input data vector is obtained by the parallel bit-AND-and-add algorithm and forms a partial product, and the multiplication of the input data by the transform matrix is realized by shifting the partial products and adding them as vectors. The number of parallel bit transform units depends only on the number of two's complement bits of the input data and is independent of the variety of coefficients in transform matrices of different sizes. Because the processing is parallel, the DCT/IDCT is completed in only 3 clock cycles, so the transform period is short. A large number of FPGA or ASIC logic units can be saved; implementing the 4 × 4 DCT requires only one FPGA logic unit (LE). When the input data have the same number of two's complement bits, the DCT and the IDCT can be implemented in a unified structure.

Drawings

In order to more clearly illustrate the embodiments of the present invention or the technical solutions in the prior art, the drawings used in the description of the embodiments or the prior art will be briefly described below, it is obvious that the drawings in the following description are only some embodiments of the present invention, and for those skilled in the art, other drawings can be obtained according to the drawings without creative efforts.

FIG. 1 is a schematic structural diagram of the present invention.

Fig. 2 is a block diagram of a structure of the bit vector parallel outputter shown in fig. 1.

Fig. 3 is a block diagram showing the configuration of the numeric bit vector parallel converter shown in fig. 1.

Fig. 4 is a block diagram showing the structure of the sign bit vector converter shown in fig. 1.

Fig. 5 is a block diagram of a partial product adder shown in fig. 1.

FIG. 6 is a simulation timing diagram of the present invention.

In the figures, 10 is the bit vector parallel outputter, 11 is the bit vector parallel exchanger, 12 is the numeric bit vector parallel converter, 13 is the sign bit vector converter, 14 is the partial product adder, 1001 is the N-dimensional M-bit data vector memory, 1002 is the N-dimensional 0th numeric bit vector outputter, 1003 is the N-dimensional 1st numeric bit vector outputter, 1009 is the N-dimensional (M-2)th numeric bit vector outputter, 1010 is the N-dimensional sign bit vector outputter, 1201 is the N × N core transform matrix memory, 1202 is the N-dimensional 0th numeric bit vector expander, 1203 is the N-dimensional 1st numeric bit vector expander, 1209 is the N-dimensional (M-2)th numeric bit vector expander, 1210 is the parallel vector bit-AND operator, 1211 is the parallel partial product generator, 1301 is the N × N negative core transform matrix memory, 1302 is the N-dimensional sign bit vector expander, 1310 is the vector bit-AND operator, 1311 is the partial product generator, 1401 is the sign extender, 1402 is the left shifter, and 1403 is the adder.

Detailed Description

The technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are only a part of the embodiments of the present invention, and not all of the embodiments. All other embodiments, which can be obtained by a person skilled in the art without inventive effort based on the embodiments of the present invention, are within the scope of the present invention.

As shown in fig. 1, a one-dimensional DCT/IDCT converter based on parallel bit vector transformation and partial product addition includes a bit vector parallel outputter 10, a bit vector parallel exchanger 11, a numeric bit vector parallel converter 12, a sign bit vector converter 13, and a partial product adder 14. The bit vector parallel outputter 10 is connected with the bit vector parallel exchanger 11; the bit vector parallel exchanger 11 is connected with the numeric bit vector parallel converter 12 and the sign bit vector converter 13 respectively; the numeric bit vector parallel converter 12 and the sign bit vector converter 13 are both connected with the partial product adder 14. For an N-dimensional vector of row or column data of the motion compensation residual of an N × N image block, represented in M-bit two's complement, the bit vector parallel outputter 10 outputs M N-dimensional bit vectors to the bit vector parallel exchanger 11. The bit vector parallel exchanger 11 routes the N-dimensional 0th to (M-2)th numeric bit vectors to the numeric bit vector parallel converter 12 and the N-dimensional sign bit vector to the sign bit vector converter 13. The numeric bit vector parallel converter 12 performs the DCT/IDCT of the M-1 numeric bit vectors, and the sign bit vector converter 13 performs the DCT/IDCT of the sign bit vector. The partial product adder 14 performs the sign extension, left shift and addition of the partial products to obtain the DCT/IDCT result of the data vector.

The invention uses the input N-dimensional M-bit data vector to control the processing of the N × N HEVC DCT/IDCT core transform matrix, in three steps. First, an N × N × M dimensional bit-AND result matrix is obtained with the parallel bit-AND algorithm. Second, the elements of each row of the bit-AND result matrix are added to obtain an N × M dimensional matrix; this matrix holds the DCT/IDCT result of each bit vector of the input data vector and is called the partial product matrix. Third, each partial product element is shifted and the row vectors of the partial product matrix are added, which completes the multiplication of the input data vector by the transform matrix and yields the N-dimensional DCT/IDCT result vector of the input data vector.
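The three steps can be modelled in software as follows. This is a behavioural sketch under stated assumptions, not the hardware design: the function names are hypothetical, `C4` is the standard HEVC 4 × 4 core matrix, and Python's unbounded integers stand in for the sign extension that fixed-width hardware would need before shifting.

```python
C4 = [
    [64,  64,  64,  64],
    [83,  36, -36, -83],
    [64, -64, -64,  64],
    [36, -83,  83, -36],
]

def to_twos(value, bits):
    """Encode a signed integer as a `bits`-wide two's complement word."""
    return value & ((1 << bits) - 1)

def from_twos(word, bits):
    """Decode a `bits`-wide two's complement word into a signed integer."""
    return word - (1 << bits) if word & (1 << (bits - 1)) else word

def bit_and_add_transform(C, x, M=9, W=8):
    """Compute C x using only bit-AND, addition and left shifts."""
    N = len(x)
    neg_C = [[-c for c in row] for row in C]      # negative matrix for the sign bit
    words = [to_twos(v, M) for v in x]
    all_ones = (1 << W) - 1
    partial = [[0] * M for _ in range(N)]         # N x M partial product matrix
    for j in range(M):
        matrix = neg_C if j == M - 1 else C       # bit M-1 is the sign bit
        masks = [all_ones if (w >> j) & 1 else 0 for w in words]
        for r in range(N):
            # Step 1: bit-AND each W-bit coefficient with the expanded input bit.
            anded = [to_twos(matrix[r][n], W) & masks[n] for n in range(N)]
            # Step 2: add along the row to form one partial product.
            partial[r][j] = sum(from_twos(a, W) for a in anded)
    # Step 3: shift column j left by j bits and add along each row.
    return [sum(partial[r][j] << j for j in range(M)) for r in range(N)]

if __name__ == "__main__":
    x = [-255, 255, 100, -1]
    direct = [sum(c * v for c, v in zip(row, x)) for row in C4]
    print(bit_and_add_transform(C4, x), direct)   # the two lists should match
```

The sketch works because an M-bit two's complement number equals the sum of its numeric bits weighted by 2^j minus the sign bit weighted by 2^(M-1); storing the negated matrix for the sign bit turns that subtraction into an ordinary addition.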

As shown in fig. 2, the bit vector parallel outputter 10 includes an N-dimensional M-bit data vector memory 1001, an N-dimensional 0th numeric bit vector outputter 1002, an N-dimensional 1st numeric bit vector outputter 1003, ..., an N-dimensional (M-2)th numeric bit vector outputter 1009, and an N-dimensional sign bit vector outputter 1010. The N-dimensional M-bit data vector memory 1001 is connected with each of the outputters 1002, 1003, ..., 1009 and 1010. The N-dimensional M-bit data vector memory 1001 stores an N-dimensional vector of row or column data of the motion compensation residual of an N × N image block, represented in M-bit two's complement. The N-dimensional 0th numeric bit vector outputter 1002 extracts bit 0 of each element of the N-dimensional vector in the memory 1001, forms a vector of N 1-bit elements, and outputs it to the bit vector parallel exchanger 11. The N-dimensional 1st numeric bit vector outputter 1003 extracts bit 1 of each element, forms a vector of N 1-bit elements, and outputs it to the bit vector parallel exchanger 11. By analogy, the N-dimensional (M-2)th numeric bit vector outputter 1009 extracts bit M-2 of each element, forms a vector of N 1-bit elements, and outputs it to the bit vector parallel exchanger 11. The N-dimensional sign bit vector outputter 1010 extracts bit M-1 of each element, forms a vector of N 1-bit elements, and outputs it to the bit vector parallel exchanger 11.
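A software analogue of the bit vector parallel outputter is a bit-plane split of the input vector. The sketch below is illustrative only; the name `bit_planes` is an assumption, and the list index plays the role of the outputter number (index M-1 is the sign bit vector).

```python
def bit_planes(x, M=9):
    """Split a vector of signed integers into M bit vectors.

    planes[j][n] is bit j of the M-bit two's complement encoding of x[n];
    planes[M-1] is the sign bit vector.
    """
    words = [v & ((1 << M) - 1) for v in x]       # two's complement encoding
    return [[(w >> j) & 1 for w in words] for j in range(M)]

if __name__ == "__main__":
    planes = bit_planes([5, -1, 0, -255], M=9)
    for j, plane in enumerate(planes):
        print(f"bit {j}: {plane}")
```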

As shown in fig. 3, the numeric bit vector parallel converter 12 includes an N × N core transform matrix memory 1201, an N-dimensional 0th numeric bit vector expander 1202, an N-dimensional 1st numeric bit vector expander 1203, ..., an N-dimensional (M-2)th numeric bit vector expander 1209, a parallel vector bit-AND operator 1210, and a parallel partial product generator 1211. The N-dimensional 0th numeric bit vector expander 1202 corresponds to the N-dimensional 0th numeric bit vector outputter 1002, the N-dimensional 1st numeric bit vector expander 1203 corresponds to the N-dimensional 1st numeric bit vector outputter 1003, ..., and the N-dimensional (M-2)th numeric bit vector expander 1209 corresponds to the N-dimensional (M-2)th numeric bit vector outputter 1009. The N × N core transform matrix memory 1201 and the expanders 1202, 1203, ..., 1209 are all connected to the parallel vector bit-AND operator 1210; the parallel vector bit-AND operator 1210 is connected to the parallel partial product generator 1211; and the parallel partial product generator 1211 is connected to the partial product adder 14. The N × N core transform matrix memory 1201 stores the HEVC DCT/IDCT core transform matrix; the elements of the transform matrix are 8-bit integers in two's complement and are sent in parallel to the parallel vector bit-AND operator 1210. The expanders 1202, 1203, ..., 1209 respectively expand the M-1 N-dimensional vectors of 1-bit elements into M-1 N-dimensional vectors of 8-bit elements with identical bits, and send the expanded N-dimensional 8-bit data vectors, as bit expansion vectors, in parallel to the parallel vector bit-AND operator 1210. The parallel vector bit-AND operator 1210 bit-ANDs the N elements of each bit expansion vector, bit by bit, with the N elements of each row vector of the N × N core transform matrix to obtain an N × N × (M-1) dimensional bit-AND result matrix. The parallel partial product generator 1211 adds the elements of the N × N × (M-1) dimensional bit-AND result matrix along the row direction to obtain an N × (M-1) dimensional partial product matrix.

As shown in fig. 4, the sign bit vector converter 13 includes an N × N negative core transform matrix memory 1301, an N-dimensional sign bit vector expander 1302, a vector bit-AND operator 1310, and a partial product generator 1311. The N × N negative core transform matrix memory 1301 and the N-dimensional sign bit vector expander 1302 are both connected to the vector bit-AND operator 1310, and the vector bit-AND operator 1310 is connected to the partial product generator 1311. The N-dimensional sign bit vector expander 1302 is connected to the N-dimensional sign bit vector outputter 1010, and the partial product generator 1311 is connected to the partial product adder 14. The N × N negative core transform matrix memory 1301 stores the negative matrix of the HEVC DCT/IDCT core transform matrix, i.e. the product of -1 and the HEVC DCT/IDCT core transform matrix; the elements of the negative matrix are 8-bit integers in two's complement and are sent in parallel to the vector bit-AND operator 1310. The N-dimensional sign bit vector expander 1302 expands the sign bit vector of N 1-bit elements from the N-dimensional sign bit vector outputter 1010 into an N-dimensional vector of 8-bit elements with identical bits: if the original nth component is 1, the expanded nth component is 11111111; if the original nth component is 0, the expanded nth component is 00000000. It sends the expanded N-dimensional 8-bit sign bit expansion vector to the vector bit-AND operator 1310. The vector bit-AND operator 1310 bit-ANDs the N elements of the sign bit expansion vector, bit by bit, with the N elements of each row vector of the N × N negative core transform matrix to obtain an N × N dimensional bit-AND result matrix. The partial product generator 1311 adds the elements of the N × N dimensional bit-AND result matrix along the row direction to obtain an N-dimensional partial product vector.
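The sign path can be sketched on its own as below. The function name `sign_partial_product` is hypothetical and `C4` is the standard HEVC 4 × 4 core matrix; the negation of the matrix is done in software here, whereas the text describes the negative matrix as stored in memory 1301.

```python
C4 = [
    [64,  64,  64,  64],
    [83,  36, -36, -83],
    [64, -64, -64,  64],
    [36, -83,  83, -36],
]

def sign_partial_product(C, sign_bits, W=8):
    """Partial product vector of the sign bit vector, using the negative matrix -C."""
    all_ones = (1 << W) - 1
    result = []
    for row in C:
        total = 0
        for c, bit in zip(row, sign_bits):
            # W-bit two's complement word of -c, ANDed with the expanded sign bit.
            word = (-c & all_ones) & (all_ones if bit else 0)
            # Reinterpret the W-bit word as a signed value before adding.
            total += word - (1 << W) if word >> (W - 1) else word
        result.append(total)
    return result

if __name__ == "__main__":
    # Elements 0 and 3 of the input vector are negative.
    print(sign_partial_product(C4, [1, 0, 0, 1]))
```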

The bit vector parallel exchanger 11 routes the N-dimensional 0th to (M-2)th numeric bit vectors to the numeric bit vector parallel converter 12 and the N-dimensional sign bit vector to the sign bit vector converter 13. Specifically, the output of the N-dimensional 0th numeric bit vector outputter 1002 is switched to the input of the N-dimensional 0th numeric bit vector expander 1202, the output of the N-dimensional 1st numeric bit vector outputter 1003 is switched to the input of the N-dimensional 1st numeric bit vector expander 1203, ..., the output of the N-dimensional (M-2)th numeric bit vector outputter 1009 is switched to the input of the N-dimensional (M-2)th numeric bit vector expander 1209, and the output of the N-dimensional sign bit vector outputter 1010 is switched to the input of the N-dimensional sign bit vector expander 1302.

As shown in fig. 5, the partial product adder 14 includes a sign extender 1401, a left shifter 1402, and an adder 1403. The sign extender 1401 is connected to the parallel partial product generator 1211 and to the partial product generator 1311; the sign extender 1401 is connected to the left shifter 1402, and the left shifter 1402 is connected to the adder 1403. The sign extender 1401 outputs an N × M dimensional sign-extended partial product matrix. Its 0th column is the 0th column of the N × (M-1) dimensional partial product matrix output by the parallel partial product generator 1211, sign-extended by M-1 bits; its 1st column is the 1st column of that matrix, sign-extended by M-2 bits; ...; its (M-2)th column is the (M-2)th column of that matrix, sign-extended by 1 bit; and its (M-1)th column is the N-dimensional partial product vector output by the partial product generator 1311. The sign extension in the sign extender 1401 works as follows: if the original two's complement data is 0XXXXXXXXX, then after sign extension by 2 bits it becomes 000XXXXXXXXX; if the original two's complement data is 1XXXXXXXXX, then after sign extension by 2 bits it becomes 111XXXXXXXXX, where X represents a binary digit 0 or 1. The sign extender 1401 can extend the elements of the partial product matrix by 0 bits, 1 bit, ..., up to M-1 bits.
The left shifter 1402 outputs an N × M dimensional left-shifted matrix. Its 0th column is the 0th column of the sign-extended partial product matrix; its 1st column is the 1st column of the sign-extended partial product matrix with each element shifted left by 1 bit; its 2nd column is the 2nd column with each element shifted left by 2 bits; ...; and its (M-1)th column is the (M-1)th column with each element shifted left by M-1 bits. The adder 1403 adds the elements of the N × M dimensional left-shifted matrix along the row direction to obtain the N-dimensional DCT/IDCT result of the data vector.
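The sign extension, left shift and addition can be modelled with fixed-width words as sketched below. The 10-bit partial product width follows the 10-digit pattern used in the sign extension example and is an assumption here, as are the function names. With this rule every shifted term has the same width, `width + M - 1` bits, so the terms can be added directly.

```python
def sign_extend(word, width, extra):
    """Prepend `extra` copies of the sign bit to a `width`-bit two's complement word."""
    sign = (word >> (width - 1)) & 1
    extension = ((1 << extra) - 1) << width if sign else 0
    return word | extension

def add_partial_products(partials, width=10):
    """Combine an N x M matrix of signed partial products; column j has weight 2**j."""
    M = len(partials[0])
    total_width = width + M - 1
    modulus = 1 << total_width
    results = []
    for row in partials:
        acc = 0
        for j, p in enumerate(row):
            word = p & ((1 << width) - 1)                    # width-bit two's complement
            extended = sign_extend(word, width, M - 1 - j)   # column j gains M-1-j bits
            acc = (acc + (extended << j)) % modulus          # fixed-width addition
        # Reinterpret the accumulated word as a signed value.
        results.append(acc - modulus if acc >> (total_width - 1) else acc)
    return results

if __name__ == "__main__":
    print(format(sign_extend(0b1000000001, 10, 2), "012b"))
    print(add_partial_products([[47, 0, -83, 0, 0, 0, 0, 0, 0]]))
```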

The structure of the invention is explained using the 4-point integer-approximated DCT as an example. Assume that the motion compensation residual data are integers between -255 and 255, which can be represented as 9-bit signed two's complement numbers, i.e. N = 4 and M = 9. The input data vector can then be expressed as:

$$x = \begin{bmatrix} x_0 \\ x_1 \\ x_2 \\ x_3 \end{bmatrix} = \begin{bmatrix} x_{0,8} & x_{0,7} & \cdots & x_{0,0} \\ x_{1,8} & x_{1,7} & \cdots & x_{1,0} \\ x_{2,8} & x_{2,7} & \cdots & x_{2,0} \\ x_{3,8} & x_{3,7} & \cdots & x_{3,0} \end{bmatrix}$$

where $x_{0,8}$, $x_{1,8}$, $x_{2,8}$, $x_{3,8}$ are the sign bits of $x_0$, $x_1$, $x_2$, $x_3$ respectively; without loss of generality, the sign bit of a positive number is defined as 0 and the sign bit of a negative number as 1. The remaining bits are numeric bits; for example, $x_{0,7}$, $x_{0,6}$, $x_{0,5}$, $x_{0,4}$, $x_{0,3}$, $x_{0,2}$, $x_{0,1}$, $x_{0,0}$ are bit 7 to bit 0 of $x_0$. Since the input data are 9-bit binary numbers, the bit vector parallel outputter 10 outputs 9 four-dimensional bit vectors to the bit vector parallel exchanger 11.

The bit vectors output by the bit vector parallel outputter 10 are defined as:

$$\mathrm{Xbit}_j = [x_{0,j}\;\; x_{1,j}\;\; x_{2,j}\;\; x_{3,j}]^T$$

Since the bit width of the input data is assumed to be 9, $j = 0, 1, \ldots, 8$. The DCT/IDCT transformations of the bit vectors are processed in parallel: $\mathrm{Xbit}_0, \ldots, \mathrm{Xbit}_7$ are transformed in parallel in the numerical bit vector parallel converter 12, and $\mathrm{Xbit}_8$ is transformed in the sign bit vector converter 13.
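
The decomposition of a data vector into bit vectors can be illustrated with a minimal Python sketch. The helper name `bit_vectors` is hypothetical, and the input values are chosen only for illustration; the sketch assumes 9-bit two's complement data as stated above.

```python
def bit_vectors(x, m_bits=9):
    """Return [Xbit_0, ..., Xbit_{M-1}] for a vector of signed integers.

    Xbit_j collects bit j of every element; Xbit_{M-1} is the sign bit vector.
    """
    mask = (1 << m_bits) - 1
    patterns = [v & mask for v in x]  # M-bit two's complement patterns
    return [[(p >> j) & 1 for p in patterns] for j in range(m_bits)]


vectors = bit_vectors([1, 2, 4, -248])
for j, xbit in enumerate(vectors):
    print(f"Xbit_{j} = {xbit}")
```

In this sketch all nine bit vectors are available at once, which mirrors the parallel output of 9 channels of 4-dimensional bit vectors rather than a bit-serial scheme.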

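The 4 × 4 DCT transform coefficient matrix of HEVC is:

$$C_4 = \begin{bmatrix} 64 & 64 & 64 & 64 \\ 83 & 36 & -36 & -83 \\ 64 & -64 & -64 & 64 \\ 36 & -83 & 83 & -36 \end{bmatrix}$$

The 4 × 4 DCT transform coefficient matrix stored in the N × N core transform matrix memory 1201 is:

$$C_{4,value} = \begin{bmatrix} 64 & 64 & 64 & 64 \\ 83 & 36 & -36 & -83 \\ 64 & -64 & -64 & 64 \\ 36 & -83 & 83 & -36 \end{bmatrix}$$

The negative matrix of the 4 × 4 DCT transform coefficients stored in the N × N negative core transform matrix memory 1301 is:

$$C_{4,sign} = \begin{bmatrix} -64 & -64 & -64 & -64 \\ -83 & -36 & 36 & 83 \\ -64 & 64 & 64 & -64 \\ -36 & 83 & -83 & 36 \end{bmatrix}$$

Each element of the above matrices $C_{4,value}$ and $C_{4,sign}$ is represented as an 8-bit two's complement number.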
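
For illustration only, the 8-bit two's complement representation of the matrix elements can be sketched in Python. The sketch assumes the standard HEVC 4 × 4 core transform values; the names `C4_VALUE`, `C4_SIGN` and `to_bits8` are hypothetical.

```python
# Standard HEVC 4x4 core transform matrix (assumed values for this sketch).
C4_VALUE = [
    [64, 64, 64, 64],
    [83, 36, -36, -83],
    [64, -64, -64, 64],
    [36, -83, 83, -36],
]

# Negative matrix: the product of -1 and the core transform matrix.
C4_SIGN = [[-c for c in row] for row in C4_VALUE]


def to_bits8(value):
    """Return the 8-bit two's complement bit string of a small integer."""
    if not -128 <= value <= 127:
        raise ValueError("value does not fit in 8-bit two's complement")
    return format(value & 0xFF, "08b")


for c in (64, 83, 36, -36, -64, -83):
    print(c, to_bits8(c))
```

All coefficient magnitudes of the 4-point matrix are at most 83, so both the matrix and its negative fit in the signed 8-bit range of -128 to 127.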

The 4-point DCT of a numerical bit vector performed in the numerical bit vector parallel converter 12 can be expressed as:

$$\begin{bmatrix} r_0^k \\ r_1^k \\ r_2^k \\ r_3^k \end{bmatrix} = C_{4,value}\,\mathrm{Xbit}_k, \qquad k = 0, 1, \ldots, 7$$

Eight such transformations are performed in parallel in the numerical bit vector parallel converter 12. Since $\mathrm{Xbit}_k$ is a vector of 1-bit binary elements, in order to be bitwise ANDed with the elements of $C_{4,value}$, which are 8-bit two's complement numbers, it is first expanded into a vector of 8-bit same-valued data, in which all eight bits of an element equal the original bit. For the 4-dimensional vectors, the expansion is performed by the 4-dimensional 0th numerical bit vector expander 1202, the 4-dimensional 1st numerical bit vector expander 1203, ..., and the 4-dimensional 7th numerical bit vector expander 1209. The parallel vector bit AND operator 1210 performs a bitwise AND operation on 144 data to obtain a 4 × 4 × 8-dimensional bit-AND result matrix, and the parallel partial product generator 1211 adds the 4 × 4 × 8-dimensional bit-AND result matrix by rows to generate a 4 × 8-dimensional partial product matrix.
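
The expansion, bitwise AND and row addition applied to one numerical bit vector can be sketched as follows. This is a minimal Python model under stated assumptions: the matrix holds the standard HEVC 4 × 4 values, and the names `expand_bit`, `to_signed8` and `transform_bit_vector` are hypothetical.

```python
# Standard HEVC 4x4 core transform matrix (assumed values for this sketch).
C4_VALUE = [
    [64, 64, 64, 64],
    [83, 36, -36, -83],
    [64, -64, -64, 64],
    [36, -83, 83, -36],
]


def expand_bit(bit):
    """Expand a 1-bit element to 8 identical bits: 1 -> 11111111, 0 -> 00000000."""
    return 0xFF if bit else 0x00


def to_signed8(pattern):
    """Interpret an 8-bit pattern as a two's complement integer."""
    return pattern - 256 if pattern & 0x80 else pattern


def transform_bit_vector(core, xbit):
    """Multiply a core matrix by a bit vector using only AND and addition."""
    result = []
    for row in core:
        # Bitwise AND of each 8-bit coefficient with the expanded bit keeps
        # the coefficient when the bit is 1 and yields 0 otherwise.
        anded = [(c & 0xFF) & expand_bit(b) for c, b in zip(row, xbit)]
        # Adding the AND results along the row gives one partial product.
        result.append(sum(to_signed8(a) for a in anded))
    return result


print(transform_bit_vector(C4_VALUE, [1, 1, 0, 0]))  # [128, 119, 0, -47]
```

Each partial product is a sum of at most four signed 8-bit numbers, so it fits in 10 bits in this model, which is consistent with the 10-bit patterns used in the sign extension example.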

The 4 × 4 DCT of the sign bit vector can be expressed as:

$$\begin{bmatrix} r_0^8 \\ r_1^8 \\ r_2^8 \\ r_3^8 \end{bmatrix} = C_{4,sign}\,\mathrm{Xbit}_8$$

In order to be bitwise ANDed with the elements of $C_{4,sign}$, which are 8-bit two's complement numbers, the 4-dimensional 1-bit binary vector $\mathrm{Xbit}_8$ is first expanded into a vector of 8-bit same-valued data, in which all eight bits of an element equal the original bit. For the 4-dimensional vector, the expansion is performed by the 4-dimensional sign bit vector expander 1302. The vector bit AND operator 1310 performs a bitwise AND operation on 16 data to obtain a 4 × 4-dimensional bit-AND result matrix, and the partial product generator 1311 adds the 4 × 4-dimensional bit-AND result matrix by rows to generate a 4-dimensional partial product vector.
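
A corresponding sketch for the sign bit path is given here for illustration. It assumes that the negative matrix is the element-wise negation of the standard HEVC 4 × 4 core matrix; the names `C4_SIGN` and `sign_partial_products` are hypothetical.

```python
# Negative of the standard HEVC 4x4 core matrix (assumed values).
C4_SIGN = [
    [-64, -64, -64, -64],
    [-83, -36, 36, 83],
    [-64, 64, 64, -64],
    [-36, 83, -83, 36],
]


def sign_partial_products(xbit8):
    """Partial product vector of the sign bit vector, by AND and addition."""
    out = []
    for row in C4_SIGN:
        total = 0
        for c, b in zip(row, xbit8):
            mask = 0xFF if b else 0x00      # sign bit expanded to 8 bits
            anded = (c & 0xFF) & mask       # bitwise AND with the coefficient
            total += anded - 256 if anded & 0x80 else anded
        out.append(total)
    return out


# Only the fourth input element is negative in this illustrative case.
print(sign_partial_products([0, 0, 0, 1]))  # [-64, 83, -64, 36]
```

Storing the negated matrix lets the sign bit, whose two's complement weight is negative, be handled by the same AND-and-add structure as the numerical bits, with no subtractor in the later stages.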

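The 4-point one-dimensional integer approximation DCT of the data vector can be expressed as:

$$y = \begin{bmatrix} y_0 \\ y_1 \\ y_2 \\ y_3 \end{bmatrix} = \sum_{m=0}^{7} 2^m\, C_{4,value}\,\mathrm{Xbit}_m + 2^8\, C_{4,sign}\,\mathrm{Xbit}_8 = \sum_{m=0}^{8} 2^m \begin{bmatrix} r_0^m \\ r_1^m \\ r_2^m \\ r_3^m \end{bmatrix}$$

where $r_i^m$ denotes the $i$-th component of the partial product vector obtained from the bit vector $\mathrm{Xbit}_m$. In the above formula, the factor $2^m$ means that $r_0^m$, $r_1^m$, $r_2^m$, $r_3^m$ are sign-extended and then shifted left by $m$ bits, which is implemented by the partial product adder 14.

The sign extender 1401 extends $r_0^0$, $r_1^0$, $r_2^0$, $r_3^0$ by 8 sign bits, extends $r_0^1$, $r_1^1$, $r_2^1$, $r_3^1$ by 7 sign bits, and so on, extends $r_0^7$, $r_1^7$, $r_2^7$, $r_3^7$ by 1 sign bit, and does not extend $r_0^8$, $r_1^8$, $r_2^8$, $r_3^8$.

The left shifter 1402 does not shift $r_0^0$, $r_1^0$, $r_2^0$, $r_3^0$, which have been extended by 8 sign bits; shifts $r_0^1$, $r_1^1$, $r_2^1$, $r_3^1$, which have been extended by 7 sign bits, left by 1 bit; and so on, shifts $r_0^7$, $r_1^7$, $r_2^7$, $r_3^7$, which have been extended by 1 sign bit, left by 7 bits; and shifts $r_0^8$, $r_1^8$, $r_2^8$, $r_3^8$ left by 8 bits.

The addition operator 1403 adds the sign-extended and left-shifted $r_0^0, r_0^1, \ldots, r_0^7, r_0^8$ to obtain $y_0$; adds the sign-extended and left-shifted $r_1^0, r_1^1, \ldots, r_1^7, r_1^8$ to obtain $y_1$; adds the sign-extended and left-shifted $r_2^0, r_2^1, \ldots, r_2^7, r_2^8$ to obtain $y_2$; adds the sign-extended and left-shifted $r_3^0, r_3^1, \ldots, r_3^7, r_3^8$ to obtain $y_3$; and outputs the DCT result $[y_0, y_1, y_2, y_3]^T$ of the data vector.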
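
The role of the negative core transform matrix can be checked with a short derivation, given here as an explanatory sketch. It assumes only the standard weighting of a 9-bit two's complement number, in which the sign bit has weight $-2^8$, and the relation $C_{4,sign} = -C_{4,value}$:

$$x_i = -2^{8}\, x_{i,8} + \sum_{m=0}^{7} 2^{m}\, x_{i,m}, \qquad i = 0, 1, 2, 3$$

$$x = \sum_{m=0}^{7} 2^{m}\,\mathrm{Xbit}_m - 2^{8}\,\mathrm{Xbit}_8$$

$$y = C_{4,value}\,x = \sum_{m=0}^{7} 2^{m}\, C_{4,value}\,\mathrm{Xbit}_m + 2^{8}\, C_{4,sign}\,\mathrm{Xbit}_8$$

The subtraction required by the negative weight of the sign bit is thus absorbed into the stored matrix, so that the partial product adder 14 only needs to sign-extend, shift left and add.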

As shown in fig. 6, the simulation input data vector is $x = [x_0\;\; x_1\;\; x_2\;\; x_3]^T = [1\;\; 2\;\; 4\;\; -248]^T$. The values 1, 2 and 4 were chosen for simulation because they are the most probable values of the video frame motion compensation residual, while the value -248 checks the reliability and validity of the algorithm for large values.

The numerical bit vector $\mathrm{Xbit}_0 = [x_{0,0}\;\; x_{1,0}\;\; x_{2,0}\;\; x_{3,0}]^T = [1\;\; 0\;\; 0\;\; 0]^T$ is input at clock cycle 0. Similarly, the numerical bit vector $\mathrm{Xbit}_1 = [x_{0,1}\;\; x_{1,1}\;\; x_{2,1}\;\; x_{3,1}]^T = [0\;\; 1\;\; 0\;\; 0]^T$ is input at clock cycle 0; the numerical bit vector $\mathrm{Xbit}_2 = [x_{0,2}\;\; x_{1,2}\;\; x_{2,2}\;\; x_{3,2}]^T = [0\;\; 0\;\; 1\;\; 0]^T$ is input at clock cycle 0; the numerical bit vector $\mathrm{Xbit}_3 = [x_{0,3}\;\; x_{1,3}\;\; x_{2,3}\;\; x_{3,3}]^T = [0\;\; 0\;\; 0\;\; 1]^T$ is input at clock cycle 0; the numerical bit vectors $\mathrm{Xbit}_4$, $\mathrm{Xbit}_5$, $\mathrm{Xbit}_6$, $\mathrm{Xbit}_7$ are all zero vectors and are input at clock cycle 0; and the sign bit vector $\mathrm{Xbit}_8 = [x_{0,8}\;\; x_{1,8}\;\; x_{2,8}\;\; x_{3,8}]^T = [0\;\; 0\;\; 0\;\; 1]^T$ is input at clock cycle 0.

At the rising edge in the middle of clock cycle 3, the transform result $y = [y_0\;\; y_1\;\; y_2\;\; y_3]^T = [-15424\;\; 20595\;\; -16192\;\; 9130]^T$ is output.
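
The simulated result can be cross-checked in software. The following Python sketch is a behavioural model only, not the simulated circuit; it assumes the standard HEVC 4 × 4 core matrix, and the names `dct4_bit_vector` and `dct4_direct` are hypothetical.

```python
# Standard HEVC 4x4 core transform matrix (assumed values for this sketch).
C4 = [
    [64, 64, 64, 64],
    [83, 36, -36, -83],
    [64, -64, -64, 64],
    [36, -83, 83, -36],
]


def dct4_bit_vector(x, m_bits=9):
    """4-point integer DCT built from bit vectors, shifts and additions."""
    mask = (1 << m_bits) - 1
    patterns = [v & mask for v in x]             # M-bit two's complement
    y = [0] * len(C4)
    for m in range(m_bits):
        xbit = [(p >> m) & 1 for p in patterns]  # bit vector Xbit_m
        is_sign = m == m_bits - 1
        for i, row in enumerate(C4):
            # Keeping a coefficient only where the bit is 1 models the AND
            # with the expanded bit; the sign bit uses the negative matrix.
            partial = sum((-c if is_sign else c) for c, b in zip(row, xbit) if b)
            y[i] += partial << m                 # left shift by m bits
    return y


def dct4_direct(x):
    """Reference result by ordinary matrix-vector multiplication."""
    return [sum(c * v for c, v in zip(row, x)) for row in C4]


print(dct4_bit_vector([1, 2, 4, -248]))  # [-15424, 20595, -16192, 9130]
```

The model reproduces the output vector stated above and agrees with direct matrix-vector multiplication for inputs in the range -255 to 255.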

By using multiplication-free parallel bit vector transformation with left-shift addition of the partial products, the invention can save a large number of FPGA or ASIC logic units. For the 4 × 4 DCT, the conventional shift-add multiple constant multiplication (MCM) scheme with common-factor multiplexing needs 34 FPGA logic elements (LEs) and 34 registers, whereas the invention needs only 1 FPGA logic element (LE) and 1 register. The bit vector converter of the invention uses a bit AND-addition algorithm to realize the matrix-by-bit-vector multiplication of an input data bit vector with the DCT core transform matrix or the IDCT core transform matrix, obtaining a partial product matrix, and then adds the partial product matrix along the row direction to realize the DCT/IDCT of the data vector. The structure is uniform and regular and is not affected by the diversity of the element values of the core transform matrix.

The above description covers only preferred embodiments of the present invention and is not to be construed as limiting the invention; any modifications, equivalent substitutions, improvements and the like made within the spirit and principle of the present invention shall be included in its scope of protection.

Claims

1. A one-dimensional DCT/IDCT converter for parallel bit vector conversion and partial product addition, characterized by comprising a bit vector parallel outputter (10), a bit vector parallel exchanger (11), a numerical bit vector parallel converter (12), a sign bit vector converter (13) and a partial product adder (14); the bit vector parallel outputter (10) is connected with the bit vector parallel exchanger (11); the bit vector parallel exchanger (11) is connected with the numerical bit vector parallel converter (12) and the sign bit vector converter (13), respectively; the numerical bit vector parallel converter (12) and the sign bit vector converter (13) are both connected with the partial product adder (14).

2. The one-dimensional DCT/IDCT converter for parallel bit vector conversion and partial product addition according to claim 1, characterized in that the bit vector parallel outputter (10) outputs M channels of N-dimensional bit vectors to the bit vector parallel exchanger (11); the bit vector parallel outputter (10) comprises an N-dimensional M-bit data vector memory (1001), an N-dimensional 0th numerical bit vector outputter (1002), an N-dimensional 1st numerical bit vector outputter (1003), ..., an N-dimensional (M-2)th numerical bit vector outputter (1009) and an N-dimensional sign bit vector outputter (1010); the N-dimensional M-bit data vector memory (1001) is connected with the N-dimensional 0th numerical bit vector outputter (1002), the N-dimensional 1st numerical bit vector outputter (1003), ..., the N-dimensional (M-2)th numerical bit vector outputter (1009) and the N-dimensional sign bit vector outputter (1010).

3. The one-dimensional DCT/IDCT converter for parallel bit vector conversion and partial product addition according to claim 2, characterized in that the N-dimensional M-bit data vector memory (1001) stores an N-dimensional vector of row/column data of the motion compensation residual of an N×N image block, each element being expressed as an M-bit two's complement number; the N-dimensional 0th numerical bit vector outputter (1002) takes the 0th bit of each element of the N-dimensional vector in the N-dimensional M-bit data vector memory (1001) to form an N-dimensional vector of 1-bit elements and outputs it to the bit vector parallel exchanger (11); the N-dimensional 1st numerical bit vector outputter (1003) takes the 1st bit of each element of the N-dimensional vector in the N-dimensional M-bit data vector memory (1001) to form an N-dimensional vector of 1-bit elements and outputs it to the bit vector parallel exchanger (11); and so on, the N-dimensional (M-2)th numerical bit vector outputter (1009) takes the (M-2)th bit of each element of the N-dimensional vector in the N-dimensional M-bit data vector memory (1001) to form an N-dimensional vector of 1-bit elements and outputs it to the bit vector parallel exchanger (11); the N-dimensional sign bit vector outputter (1010) takes the (M-1)th bit of each element of the N-dimensional vector in the N-dimensional M-bit data vector memory (1001) to form an N-dimensional vector of 1-bit elements as the sign bit vector, and outputs the sign bit vector to the bit vector parallel exchanger (11).

4. The one-dimensional DCT/IDCT converter for parallel bit vector conversion and partial product addition according to claim 2 or 3, characterized in that the numerical bit vector parallel converter (12) performs the DCT/IDCT of M-1 numerical bit vectors; the numerical bit vector parallel converter (12) comprises an N×N core transform matrix memory (1201), an N-dimensional 0th numerical bit vector expander (1202), an N-dimensional 1st numerical bit vector expander (1203), ..., an N-dimensional (M-2)th numerical bit vector expander (1209), a parallel vector bit AND operator (1210) and a parallel partial product generator (1211); the N-dimensional 0th numerical bit vector expander (1202) corresponds to the N-dimensional 0th numerical bit vector outputter (1002), the N-dimensional 1st numerical bit vector expander (1203) corresponds to the N-dimensional 1st numerical bit vector outputter (1003), ..., and the N-dimensional (M-2)th numerical bit vector expander (1209) corresponds to the N-dimensional (M-2)th numerical bit vector outputter (1009); the N×N core transform matrix memory (1201), the N-dimensional 0th numerical bit vector expander (1202), the N-dimensional 1st numerical bit vector expander (1203), ..., and the N-dimensional (M-2)th numerical bit vector expander (1209) are all connected with the parallel vector bit AND operator (1210); the parallel vector bit AND operator (1210) is connected with the parallel partial product generator (1211), and the parallel partial product generator (1211) is connected with the partial product adder (14).

5. The one-dimensional DCT/IDCT converter for parallel bit vector conversion and partial product addition according to claim 4, characterized in that the sign bit vector converter (13) comprises an N×N negative core transform matrix memory (1301), an N-dimensional sign bit vector expander (1302), a vector bit AND operator (1310) and a partial product generator (1311); the N-dimensional sign bit vector expander (1302) corresponds to the N-dimensional sign bit vector outputter (1010); the N×N negative core transform matrix memory (1301) and the N-dimensional sign bit vector expander (1302) are both connected with the vector bit AND operator (1310), and the vector bit AND operator (1310) is connected with the partial product generator (1311); the partial product generator (1311) is connected with the partial product adder (14).

6. The one-dimensional DCT/IDCT converter for parallel bit vector conversion and partial product addition according to claim 5, characterized in that the partial product adder (14) comprises a sign extender (1401), a left shifter (1402) and an addition operator (1403); the sign extender (1401) is connected with the parallel partial product generator (1211) and the partial product generator (1311); the sign extender (1401) is connected with the left shifter (1402), and the left shifter (1402) is connected with the addition operator (1403).

7. The one-dimensional DCT/IDCT converter for parallel bit vector conversion and partial product addition according to claim 4, characterized in that the N×N core transform matrix memory (1201) stores the DCT/IDCT core transform matrix, whose elements are expressed as 8-bit two's complement integers, and sends the elements of the transform matrix in parallel to the parallel vector bit AND operator (1210); the N-dimensional 0th numerical bit vector expander (1202), the N-dimensional 1st numerical bit vector expander (1203), ..., and the N-dimensional (M-2)th numerical bit vector expander (1209) respectively expand the M-1 N-dimensional vectors of 1-bit elements into M-1 N-dimensional vectors of 8-bit same-valued data, and send the expanded N-dimensional 8-bit data vectors in parallel to the parallel vector bit AND operator (1210) as bit expansion vectors; the parallel vector bit AND operator (1210) performs, in data bit order, a bitwise AND between the N data of each N-dimensional bit expansion vector and the N data of each row vector of the N×N core transform matrix to obtain an N×N×(M-1)-dimensional bit-AND result matrix; the parallel partial product generator (1211) adds the N×N×(M-1)-dimensional bit-AND result matrix along the row direction to obtain an N×(M-1)-dimensional partial product matrix.

8. The one-dimensional DCT/IDCT converter for parallel bit vector conversion and partial product addition according to claim 7, characterized in that the N×N negative core transform matrix memory (1301) stores the negative matrix of the DCT/IDCT core transform matrix, the negative matrix being the product of -1 and the DCT/IDCT core transform matrix; the elements of the negative matrix are expressed as 8-bit two's complement integers and are sent in parallel to the vector bit AND operator (1310); the N-dimensional sign bit vector expander (1302) expands the sign bit vector output by the N-dimensional sign bit vector outputter (1010) into an N-dimensional vector of 8-bit same-valued data, and sends the expanded N-dimensional 8-bit sign vector to the vector bit AND operator (1310) as the sign bit expansion vector; the vector bit AND operator (1310) performs, in data bit order, a bitwise AND between the N data of the sign bit expansion vector and the N data of each row vector of the N×N negative core transform matrix to obtain an N×N-dimensional bit-AND result matrix; the partial product generator (1311) adds the N×N-dimensional bit-AND result matrix along the row direction to obtain an N-dimensional partial product vector.

9. The one-dimensional DCT/IDCT converter for parallel bit vector conversion and partial product addition according to claim 8, characterized in that the sign extender (1401) outputs an N×M-dimensional sign-extended partial product matrix; the 0th column of the N×M-dimensional sign-extended partial product matrix is obtained by sign-extending the 0th column of the N×(M-1)-dimensional partial product matrix output by the parallel partial product generator (1211) by M-1 bits, the 1st column is obtained by sign-extending the 1st column of the N×(M-1)-dimensional partial product matrix output by the parallel partial product generator (1211) by M-2 bits, ..., the (M-2)th column is obtained by sign-extending the (M-2)th column of the N×(M-1)-dimensional partial product matrix output by the parallel partial product generator (1211) by 1 bit, and the (M-1)th column is the N-dimensional partial product vector output by the partial product generator (1311); the left shifter (1402) outputs an N×M-dimensional sign-extended, left-shifted partial product matrix, whose 0th column is the 0th column of the N×M-dimensional sign-extended partial product matrix, whose 1st column is the 1st column of the N×M-dimensional sign-extended partial product matrix with each element shifted left by 1 bit, whose 2nd column is the 2nd column with each element shifted left by 2 bits, ..., and whose (M-1)th column is the (M-1)th column with each element shifted left by M-1 bits; the addition operator (1403) adds the elements of the N×M-dimensional sign-extended, left-shifted partial product matrix along each row to obtain the DCT/IDCT transform result of the N-dimensional data vector.

10. The one-dimensional DCT/IDCT converter for parallel bit vector conversion and partial product addition according to claim 8, characterized in that the N-dimensional sign bit vector expander (1302), the N-dimensional 0th numerical bit vector expander (1202), the N-dimensional 1st numerical bit vector expander (1203), ..., and the N-dimensional (M-2)th numerical bit vector expander (1209) all expand an N-dimensional vector of 1-bit elements into an N-dimensional vector of 8-bit same-valued data by the following method: if the n-th component of the original data is 1, the n-th component of the expanded data is 11111111; if the n-th component of the original data is 0, the n-th component of the expanded data is 00000000; where n = 1, 2, ..., N.