CN105828071A - Deblocking filtering vectorization realization method facing vector processor - Google Patents


Info

Publication number
CN105828071A
CN105828071A CN201610194300.7A CN201610194300A CN105828071A CN 105828071 A CN105828071 A CN 105828071A CN 201610194300 A CN201610194300 A CN 201610194300A CN 105828071 A CN105828071 A CN 105828071A
Authority
CN
China
Prior art keywords
vector
filtering
block
result
register
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN201610194300.7A
Other languages
Chinese (zh)
Other versions
CN105828071B (en)
Inventor
陈胜刚
万江华
刘胜
王耀华
陈小文
刘仲
陈海燕
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
National University of Defense Technology
Original Assignee
National University of Defense Technology
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by National University of Defense Technology filed Critical National University of Defense Technology
Priority to CN201610194300.7A priority Critical patent/CN105828071B/en
Publication of CN105828071A publication Critical patent/CN105828071A/en
Application granted granted Critical
Publication of CN105828071B publication Critical patent/CN105828071B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Links

Classifications

    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04NPICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N19/00Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
    • H04N19/10Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding
    • H04N19/102Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding characterised by the element, parameter or selection affected or controlled by the adaptive coding
    • H04N19/117Filters, e.g. for pre-processing or post-processing
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04NPICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N19/00Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
    • H04N19/10Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding
    • H04N19/102Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding characterised by the element, parameter or selection affected or controlled by the adaptive coding
    • H04N19/124Quantisation
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04NPICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N19/00Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
    • H04N19/42Methods or arrangements for coding, decoding, compressing or decompressing digital video signals characterised by implementation details or hardware specially adapted for video compression or decompression, e.g. dedicated software implementation
    • H04N19/436Methods or arrangements for coding, decoding, compressing or decompressing digital video signals characterised by implementation details or hardware specially adapted for video compression or decompression, e.g. dedicated software implementation using parallelised computational arrangements

Abstract

The present invention provides a vectorized implementation method of deblocking filtering for a vector processor. The method comprises the following steps. S1, data preparation: an n × m block of video data to be filtered is loaded into the vector memory and vectorized. S2: horizontal filtering. S3, result storage: according to the results of step S2, the final results of (p3, p2, p1, p0, q0, q1, q2, q3) are selected for each PE and, together with the unmodified values of p3 and q3, stored as (p3, p2', p1', p0', q0', q1', q2', q3) in the matrix register file. S4: steps S2 and S3 are repeated until all boundaries in the horizontal direction have been filtered. S5: vertical filtering. S6, result storage: according to the results of step S5, the final results of (p3, p2, p1, p0, q0, q1, q2, q3) are selected for each PE and, together with the unmodified values of p3 and q3, stored as (p3, p2', p1', p0', q0', q1', q2', q3) directly in the vector memory. S7: steps S5 and S6 are repeated until all boundaries in the vertical direction have been filtered. The method computes efficiently, makes full use of the cooperation among the multiple PEs of the vector processor, and shortens the operation time.

Description

Vectorized implementation method of deblocking filtering for a vector processor
Technical field
The present invention relates generally to the fields of vector processors and video coding and decoding, and in particular to a vectorized implementation method of deblocking filtering for a vector processor.
Background art
In video coding and decoding algorithms, block-based prediction, compensation, transform and quantization cause blocking artifacts, which seriously degrade the subjective quality of the reconstructed image. To remove these artifacts, the reconstructed image usually has to be deblock-filtered, and international standards place the deblocking filter inside the coding loop, where it is called in-loop deblocking filtering. Because every boundary of every coded block requires a filtering decision, computation, and repeated updates to storage, the deblocking filter accounts for more than one third of the decoder's computational complexity. Speeding up deblocking filtering is therefore important for real-time high-definition video coding and decoding.
The usual way to accelerate deblocking filtering is parallelization. Researchers often use dedicated hardware to accelerate the algorithm; the drawbacks are poor flexibility and a high cost when standards are updated frequently. In addition, a dedicated transpose circuit is needed to handle the row and column data accesses of the deblocking filter. Programmable solutions are therefore more attractive in the market.
However, a conventional single-core processor can hardly meet the computational demands that a real-time decoder places on deblocking filtering, while a multi-core processor is too loosely coupled and incurs a large inter-core data transfer overhead, so it is also poorly suited to parallel acceleration of deblocking filtering. In this situation a vector processor becomes the preferred choice. A vector processor generally consists of multiple tightly coupled processing elements (PEs). Each PE contains several independent functional units, such as multipliers, adders and shifters. Every PE executes very long instruction word (VLIW) instructions, each containing multiple execution packets; functional units that do not share a pipeline can execute several execution packets at the same time. Each PE contains a set of local registers, and the local registers with the same number across all PEs logically form a vector register. In Fig. 1, for example, the R0 registers of PE_0~PE_M-1 logically form the vector register VR0, and the R0 of each PE is called an element of VR0. A vector processor also often provides a matrix register file that supports row and column access to a matrix, which effectively meets the storage access requirements of deblocking filtering in different directions.
However, the deblocking filter is highly adaptive: the execution path at adjacent boundaries differs with the source data, and the same data must be read and written discontinuously and repeatedly. How to vectorize the deblocking filter on a vector processor and accelerate its computation is therefore a difficult problem.
Summary of the invention
The technical problem to be solved by the present invention is as follows: in view of the technical problems of the prior art, the present invention provides a vectorized implementation method of deblocking filtering for a vector processor that is simple in principle, easy to operate, computes efficiently, makes full use of the cooperation among the multiple PEs of the vector processor, and shortens the operation time.
To solve the above technical problem, the present invention adopts the following technical solution:
A vectorized implementation method of deblocking filtering for a vector processor, comprising the steps of:
S1: data preparation; load an n × m block of video data to be filtered into the vector memory and vectorize it;
S2: horizontal filtering; select the horizontal boundary that currently needs filtering, and have each PE read from the vector memory the image data (p3, p2, p1, p0, q0, q1, q2, q3) to be filtered; use the image data (p3, p2, p1, p0, q0, q1, q2, q3) and constants to compute the decision conditions, and store them in the vector condition registers; compute all candidate results of (p3, p2, p1, p0, q0, q1, q2, q3) according to the rules of the deblocking filter algorithm, and store them in separate local vector registers;
S3: result storage; according to the results of step S2, select for each PE the final results of (p3, p2, p1, p0, q0, q1, q2, q3), which together with the unmodified values of p3 and q3 form (p3, p2', p1', p0', q0', q1', q2', q3), and store them in the matrix register file;
S4: repeat steps S2 and S3 until all boundaries in the horizontal direction have been filtered;
S5: vertical filtering; select the boundary that currently needs filtering, and have each PE read from the matrix register file the image data (p3, p2, p1, p0, q0, q1, q2, q3) to be filtered, using the horizontally filtered data in the matrix register file as source data; use the image data (p3, p2, p1, p0, q0, q1, q2, q3) in the vertical direction and constants to compute the decision conditions, and store them in the vector condition registers; compute all candidate results of (p3, p2, p1, p0, q0, q1, q2, q3) according to the rules of the deblocking filter algorithm, and store them in separate local vector registers;
S6: result storage; according to the results of step S5, select for each PE the final results of (p3, p2, p1, p0, q0, q1, q2, q3), which together with the unmodified values of p3 and q3 form (p3, p2', p1', p0', q0', q1', q2', q3), and store them directly in the vector memory;
S7: repeat steps S5 and S6 until all boundaries in the vertical direction have been filtered.
As a further improvement of the present invention, the operation of selecting the final results in steps S3 and S6 comprises the following steps:
S100: assume that the candidate results computed for pi on the boundary handled by each PE are R0~Rk-1; these results form a complete or incomplete binary tree. For any PE, the final result of pi is necessarily among R0~Rk-1. Expanded over the M PEs, R0~Rk-1 can be written as the matrix $R = (R_{j,n})$ with $0 \le j \le k-1$ and $0 \le n \le M-1$. From the corresponding decision conditions of the deblocking filter algorithm, the condition matrix of the conditional operations $C = (C_{j,n})$ is obtained, where $C_{j,n} \in \{0, 1\}$;
S200: according to the condition matrix, the final result Pi of pi is obtained with k vector conditional MOV operations; that is, $P_i = \sum_{j=0}^{k-1} R_j \circ C_j$, where $R_j$ and $C_j$ are the j-th rows of R and C and $\circ$ is the element-wise product over the PEs;
S300: repeat steps S100 and S200 until the results of p2, p1, p0, q0, q1 and q2 have all been selected.
As a further improvement of the present invention, the conditional operation in steps S3 and S6 works as follows: assume that the vector processor is currently executing a vector instruction Inst under the condition register R0 = {R0_0, R0_1, ..., R0_{M-1}}, whose elements correspond to PE_0~PE_{M-1} respectively. If R0_i == 1, PE_i executes the instruction Inst; otherwise PE_i executes a no-op.
As a further improvement of the present invention, the vector memory comprises M banks, which correspond one-to-one, in order, to the M vector PEs. The M banks are uniformly addressed and interleaved by bank; that is, the first word is stored in the first bank, the second word in the second bank, and so on until the M-th word is stored in the M-th bank; the (M+1)-th word is then stored in the first bank again, and so forth. Each bank is divided into an upper memory block and a lower memory block and supports two vector access operations at the same time.
As a further improvement of the present invention, the vector matrix register file consists of M × M storage cells, each generally 4, 8, 12, 16 or 32 bits wide. Logically, the array consists of M row vector registers VR_0~VR_{M-1} or M column vector registers CVR_0~CVR_{M-1}. Each row vector register contains M elements E_{i,0}~E_{i,M-1}, where i = 0, 1, 2, ..., M-1, and each column vector register contains M elements E_{0,i}~E_{M-1,i}, where i = 0, 1, 2, ..., M-1. The matrix register reads and writes row and column vectors under the control of the read/write enable, the read/write address and the row/column select signal.
Compared with the prior art, the advantages of the present invention are:
1. The vectorized implementation method of deblocking filtering for a vector processor according to the present invention can perform the filtering of M boundaries simultaneously, which effectively speeds up deblocking filtering and leaves more time for the other functional modules of real-time high-definition video coding and decoding. This vectorized parallel method makes full use of the vector computing capability of the vector processor, exploits its parallelism as well as the data-level parallelism of the deblocking filter algorithm, and can substantially improve computing performance.
2. The vectorized implementation method of deblocking filtering for a vector processor according to the present invention uses conditional operations to let the vector processor handle the multi-branch program flow triggered by the source data. All PEs of a vector processor receive the same instruction from the dispatch unit, so the processor generally cannot execute branch instructions whose outcome differs from PE to PE. Through conditional operations, the present invention enables the vector processor to execute code with multiple branch targets without difficulty.
Brief description of the drawings
Fig. 1 is a schematic diagram of the general structure of a vector processor.
Fig. 2 is a schematic diagram of the pixels filtered at vertical and horizontal boundaries in the present invention.
Fig. 3 is a flow chart of the method of the present invention.
Fig. 4 is a schematic diagram of the vector memory structure in a specific application example of the present invention.
Fig. 5 is a schematic diagram of one layout of the data to be filtered in the vector memory in a specific application example of the present invention.
Fig. 6 is a schematic diagram of the storage cell array structure of the matrix register used in the present invention.
Detailed description of the embodiments
The present invention is described in further detail below with reference to the accompanying drawings and specific embodiments.
As shown in Fig. 3, the vectorized implementation method of deblocking filtering for a vector processor according to the present invention comprises the following steps:
S1: data preparation;
S101: load an n × m block of video data to be filtered into the vector memory.
S102: load the constants required by the algorithm from the vector memory and vectorize them.
S2: horizontal filtering;
S201: select the horizontal boundary that currently needs filtering, and have each PE read from the vector memory the image data (p3, p2, p1, p0, q0, q1, q2, q3) to be filtered;
S202: use the image data (p3, p2, p1, p0, q0, q1, q2, q3) and constants to compute the decision conditions, and store them in the vector condition registers;
S203: compute all candidate results of (p3, p2, p1, p0, q0, q1, q2, q3) according to the rules of the deblocking filter algorithm, and store them in separate local vector registers.
S3: result storage;
According to the contents of the decision condition registers of the deblocking filter algorithm, select for each PE the final results of (p3, p2, p1, p0, q0, q1, q2, q3), which together with the unmodified values of p3 and q3 form (p3, p2', p1', p0', q0', q1', q2', q3), and store them in the matrix register file.
S4: repeat steps S2 and S3 until all boundaries in the horizontal direction have been filtered.
S5: vertical filtering;
S501: select the boundary that currently needs filtering, and have each PE read from the matrix register file the image data (p3, p2, p1, p0, q0, q1, q2, q3) to be filtered.
S502: using the horizontally filtered data in the matrix register file as source data, use the image data (p3, p2, p1, p0, q0, q1, q2, q3) in the vertical direction and constants to compute the decision conditions, and store them in the vector condition registers.
S503: compute all candidate results of (p3, p2, p1, p0, q0, q1, q2, q3) according to the rules of the deblocking filter algorithm, and store them in separate local vector registers.
S6: result storage;
According to the contents of the decision condition registers of the deblocking filter algorithm, select for each PE the final results of (p3, p2, p1, p0, q0, q1, q2, q3), which together with the unmodified values of p3 and q3 form (p3, p2', p1', p0', q0', q1', q2', q3), and store them directly in the vector memory.
S7: repeat steps S5 and S6 until all boundaries in the vertical direction have been filtered.
The above method of the present invention supports efficient vectorized computation of the deblocking filter, makes full use of the parallel computing capability of all PEs of the vector processor, effectively improves the execution efficiency of the vector processor, and shortens the operation time.
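The control flow of steps S1 to S7 can be sketched in plain Python as follows. This is only an illustration under stated assumptions: the function name `deblock_two_pass`, the caller-supplied `filter_line` callback and the choice of interior edges at offsets 4, 8 and 12 are hypothetical and not taken from the patent. The inner loops over `i` and `j` stand in for PEs that would run in lockstep on a real vector processor.

```python
from typing import Callable, List, Sequence

Line = List[int]  # eight samples in the order p3, p2, p1, p0, q0, q1, q2, q3


def deblock_two_pass(block: List[Line],
                     filter_line: Callable[[Line], Line],
                     edges: Sequence[int] = (4, 8, 12)) -> List[Line]:
    """Filter a square block horizontally, then vertically (hypothetical helper)."""
    n = len(block)
    # S2-S4: horizontal pass. Lane i owns row i; the results go to a copy
    # that stands in for the matrix register file.
    mrf = [row[:] for row in block]
    for x in edges:
        for i in range(n):
            mrf[i][x - 4:x + 4] = filter_line(mrf[i][x - 4:x + 4])
    # S5-S7: vertical pass. Lane j owns column j and reads it column-wise
    # from the horizontally filtered data.
    out = [row[:] for row in mrf]
    for y in edges:
        for j in range(n):
            column = [out[r][j] for r in range(y - 4, y + 4)]
            for k, value in enumerate(filter_line(column)):
                out[y - 4 + k][j] = value
    return out


def toy_filter(s: Line) -> Line:
    """Placeholder filter (an assumption): average p0 and q0, keep the rest."""
    mid = (s[3] + s[4]) // 2
    return s[:3] + [mid, mid] + s[5:]


step = [[0] * 8 + [8] * 8 for _ in range(16)]  # step edge at column 8
smoothed = deblock_two_pass(step, toy_filter)
```

Any function that maps eight samples to eight samples can replace `toy_filter`; the point of the sketch is only the order of the two passes and where each pass reads and writes its data.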
In a specific application example, the above vector memory comprises M banks, which correspond one-to-one, in order, to the M vector PEs. The M banks are uniformly addressed and interleaved by bank (that is, the first word is stored in the first bank, the second word in the second bank, and so on until the M-th word is stored in the M-th bank; the (M+1)-th word is then stored in the first bank again, and so forth). Each bank is divided into an upper memory block and a lower memory block and supports two vector access operations at the same time.
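The interleaved addressing described here amounts to a modulo mapping from word index to bank. A minimal sketch, assuming 0-based word and bank indices and a hypothetical helper name `bank_of`:

```python
from typing import List, Tuple

M = 16  # number of banks and PEs; 16 matches the embodiment, other values work too


def bank_of(word_index: int, num_banks: int = M) -> Tuple[int, int]:
    """Map a linear word index to (bank number, offset inside that bank)."""
    return word_index % num_banks, word_index // num_banks


# Words 0..15 land in banks 0..15; word 16 wraps around to bank 0, offset 1.
layout: List[Tuple[int, int]] = [bank_of(w) for w in range(2 * M)]
```

With this mapping, M consecutive words always fall into M different banks, which is what lets the M PEs fetch one word each in a single vector access.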
In a specific application example, the vector matrix register file consists of M × M storage cells, each generally 4, 8, 12, 16 or 32 bits wide. Logically, the array can be viewed as M row vector registers VR_0~VR_{M-1} or M column vector registers CVR_0~CVR_{M-1}. Each row vector register contains M elements (storage cells) E_{i,0}~E_{i,M-1} (i = 0, 1, 2, ..., M-1), and each column vector register contains M elements E_{0,i}~E_{M-1,i} (i = 0, 1, 2, ..., M-1). The matrix register reads and writes row and column vectors under the control of the read/write enable, the read/write address and the row/column select signal.
In a specific application example, the conditional operation in steps S3 and S6 works as follows: assume that the vector processor is currently executing a vector instruction Inst under the condition register R0 = {R0_0, R0_1, ..., R0_{M-1}}, whose elements correspond to PE_0~PE_{M-1} respectively. If R0_i == 1, PE_i executes the instruction Inst; otherwise PE_i executes a no-op.
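This per-PE conditional execution can be modeled lane by lane. The sketch below is an assumption-laden illustration: `predicated_exec` is a hypothetical name, and a Python callable stands in for the vector instruction Inst.

```python
from typing import Callable, List


def predicated_exec(inst: Callable[[int], int],
                    cond: List[int],
                    dest: List[int],
                    src: List[int]) -> List[int]:
    """Apply `inst` on lanes whose condition bit is 1; other lanes keep `dest`."""
    return [inst(s) if c == 1 else d for c, d, s in zip(cond, dest, src)]


# Four PEs: lanes 0 and 2 execute the instruction, lanes 1 and 3 do nothing.
result = predicated_exec(lambda x: x + 1, [1, 0, 1, 0], [9, 9, 9, 9], [1, 2, 3, 4])
```

Every lane sees the same instruction, so no per-PE branch is needed; the condition bit alone decides whether the lane's destination changes.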
In a specific application example, the operation of selecting the final results in steps S3 and S6 comprises the following steps:
S100: assume that the candidate results computed for pi on the boundary handled by each PE are R0~Rk-1; these results form a complete or incomplete binary tree. For any PE, the final result of pi is necessarily among R0~Rk-1. Expanded over the M PEs, R0~Rk-1 can be written as the matrix $R = (R_{j,n})$ with $0 \le j \le k-1$ and $0 \le n \le M-1$. From the corresponding decision conditions of the deblocking filter algorithm, the condition matrix of the conditional operations $C = (C_{j,n})$ is obtained, where $C_{j,n} \in \{0, 1\}$.
S200: according to the condition matrix, the final result Pi of pi is obtained with k vector conditional MOV operations. That is, $P_i = \sum_{j=0}^{k-1} R_j \circ C_j$, where $R_j$ and $C_j$ are the j-th rows of R and C and $\circ$ is the element-wise product over the PEs.
S300: repeat steps S100 and S200 until the results of p2, p1, p0, q0, q1 and q2 have all been selected.
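Steps S100 and S200 can be illustrated with a small selection routine. This is a sketch under the assumption that exactly one candidate is selected per PE, so that a chain of conditional moves gives the same result as the sum of element-wise products; the name `select_final` and the sample values are hypothetical.

```python
from typing import List


def select_final(candidates: List[List[int]], conditions: List[List[int]]) -> List[int]:
    """candidates[j][n] is candidate Rj on PE n; conditions[j][n] is 1 if PE n picks Rj."""
    num_pe = len(candidates[0])
    result = [0] * num_pe
    # One vector conditional MOV per candidate, k in total.
    for r_j, c_j in zip(candidates, conditions):
        result = [r if c == 1 else old for r, c, old in zip(r_j, c_j, result)]
    return result


R = [[10, 11, 12, 13],   # R0 on PE 0..3
     [20, 21, 22, 23],   # R1
     [30, 31, 32, 33]]   # R2
C = [[1, 0, 0, 1],
     [0, 1, 0, 0],
     [0, 0, 1, 0]]
final = select_final(R, C)
```

Each PE ends up with the one candidate whose condition bit is set, without any PE taking a different instruction path from its neighbors.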
Fig. 2 is a schematic diagram of the vertical and horizontal filtering boundaries in a specific application example of the present invention. The dotted lines labeled a, b, c, d, e, f, g and h are the filtering boundaries of the block. In H.264, the image block in Fig. 2 is 16 × 16 pixels, and the pixel block inside each dotted frame is 4 × 4 pixels. Each filtering operation involves the 4 pixels (p3, p2, p1, p0) and the 4 pixels (q0, q1, q2, q3) on the two sides of the filtering boundary.
With reference to Fig. 3, taking the in-loop deblocking filter of the H.264 standard as an example, the present invention comprises the following steps in a specific example:
(1) Input data: in H.264, every 16 × 16 macroblock, as shown in Fig. 2, undergoes deblocking filtering. The 16 × 16 macroblock is divided into 16 sub-blocks of 4 × 4, and the boundaries of each 4 × 4 sub-block are filtering boundaries, i.e. the boundaries a to h shown in Fig. 2.
As shown in Fig. 4, the vector memory consists of M = 16 banks (BANK_0~BANK_15), which correspond one-to-one to PE_0~PE_15 of the vector processing unit. The 16 banks are uniformly addressed and interleaved by bank, so data can be shared and the 16 PEs are provided with high-bandwidth data access. Each bank supports multi-port access through an interleaved organization of two groups of multi-bank memory (the ports include two vector access ports as well as a DMA port and a scalar access port); that is, each bank is divided into an upper and a lower memory block and can serve two vector access operations at the same time.
In this embodiment, the location of the 16 × 16 macroblock data in the vector memory is shown in Fig. 5. The 16 data items to be filtered in each row can be loaded into local registers at once to take part in the computation; the final filter results can be stored to the vector memory one row at a time and then transferred to external memory by DMA.
(2) Compute the filter results of (p3, p2, p1, p0, q0, q1, q2, q3): the 16 PEs can process 16 row boundaries or 16 column boundaries at once. This vector mode of operation greatly reduces the filtering time: only 4 passes are needed to filter all row boundaries or all column boundaries of a 16 × 16 macroblock.
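The figure of 4 passes follows from simple counting, sketched below. The sketch assumes that each PE handles one line of eight samples per pass and that a macroblock has one filtered edge per 4-pixel step in each direction; the function name `passes_per_direction` is hypothetical.

```python
def passes_per_direction(mb_size: int = 16, sub_block: int = 4, num_pe: int = 16) -> int:
    """Number of vector passes needed to filter all edges in one direction."""
    edges = mb_size // sub_block          # 4 edges per direction in a 16 x 16 macroblock
    lines_per_edge = mb_size              # each edge is 16 pixels long
    total_lines = edges * lines_per_edge  # 64 independent filter lines
    return -(-total_lines // num_pe)      # ceiling division


passes = passes_per_direction()
```

With 16 PEs the 64 lines take 4 passes per direction; a scalar processor (`num_pe=1`) would need 64.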
In this embodiment, the possible values of the filter results of (p3, p2, p1, p0, q0, q1, q2, q3), denoted by the capital letters P2, P1, P0, Q0, Q1, Q2, are:
P0 = Min(Max(0, p0 + d), 255)
P0 = (p2 + 2*p1 + 2*p0 + 2*q0 + q1 + 4) >> 3
P1 = (p2 + p1 + p0 + q0 + 2) >> 2
P1 = p1 + Min(Max(-C0, (p2 + ((p0 + q0) >> 1) - (p1 << 1)) >> 1), C0)
P2 = (2*p3 + 3*p2 + p1 + p0 + q0 + 4) >> 3
Q0 = Min(Max(0, q0 - d), 255)
Q0 = (q2 + 2*q1 + 2*q0 + 2*p0 + p1 + 4) >> 3
Q1 = q1 + Min(Max(-C0, (q2 + ((p0 + q0) >> 1) - (q1 << 1)) >> 1), C0)
Q1 = (q2 + q1 + q0 + p0 + 2) >> 2
Q2 = (2*q3 + 3*q2 + q1 + q0 + p0 + 4) >> 3
d = Min(Max(-C, (((q0 - p0) << 2) + (p1 - q1) + 4) >> 3), C)
where C and C0 are preset parameters of the in-loop deblocking filter.
In this embodiment, p3 and q3 are never modified by the filtering operation, and (p2, p1, p0, q0, q1, q2) may also be left unmodified.
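The candidate values listed above can be computed for one line of samples as in the following sketch. Several things here are assumptions rather than statements of the patent: the function and key names are hypothetical, the shift operations are parenthesized explicitly, and the clipped Q1 candidate is written with q2 and q1 by symmetry with the P1 candidate, as in H.264.

```python
from typing import Dict, Sequence


def clip(lo: int, x: int, hi: int) -> int:
    """Clamp x to [lo, hi]; written Min(Max(lo, x), hi) in the text."""
    return min(max(lo, x), hi)


def candidate_results(p: Sequence[int], q: Sequence[int], c: int, c0: int) -> Dict[str, int]:
    """All candidates for one line; p = (p0, p1, p2, p3), q = (q0, q1, q2, q3)."""
    p0, p1, p2, p3 = p
    q0, q1, q2, q3 = q
    d = clip(-c, (((q0 - p0) << 2) + (p1 - q1) + 4) >> 3, c)
    avg = (p0 + q0) >> 1
    return {
        "P0_weak": clip(0, p0 + d, 255),
        "P0_strong": (p2 + 2 * p1 + 2 * p0 + 2 * q0 + q1 + 4) >> 3,
        "P1_strong": (p2 + p1 + p0 + q0 + 2) >> 2,
        "P1_weak": p1 + clip(-c0, (p2 + avg - (p1 << 1)) >> 1, c0),
        "P2_strong": (2 * p3 + 3 * p2 + p1 + p0 + q0 + 4) >> 3,
        "Q0_weak": clip(0, q0 - d, 255),
        "Q0_strong": (q2 + 2 * q1 + 2 * q0 + 2 * p0 + p1 + 4) >> 3,
        "Q1_weak": q1 + clip(-c0, (q2 + avg - (q1 << 1)) >> 1, c0),  # assumed symmetric to P1_weak
        "Q1_strong": (q2 + q1 + q0 + p0 + 2) >> 2,
        "Q2_strong": (2 * q3 + 3 * q2 + q1 + q0 + p0 + 4) >> 3,
    }


flat = candidate_results((100,) * 4, (100,) * 4, c=4, c0=2)
edge = candidate_results((100,) * 4, (108,) * 4, c=4, c0=2)
```

On a vector processor each dictionary entry would be one local vector register holding that candidate for all PEs; the selection among them happens afterwards.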
(3) Compute the decision conditions:
The filtering decision conditions are computed from the relationship between the filter control parameters and the source data being filtered. Each decision condition determines whether the value of one status word (PSW) is true or false.
In this embodiment, the filtering decision conditions include the filtering strength, the differences between neighboring pixels, and the thresholds on those differences.
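As an illustration of such conditions, the sketch below computes one condition bit per PE using the basic H.264 edge test. The threshold names `alpha` and `beta` and the strength vector `bs` follow the H.264 standard rather than the patent text, and `condition_vector` is a hypothetical name.

```python
from typing import List, Sequence


def condition_vector(bs: Sequence[int],
                     lines: Sequence[Sequence[int]],
                     alpha: int,
                     beta: int) -> List[int]:
    """Return 1 for each PE whose line should be filtered, else 0.

    lines[n] = (p3, p2, p1, p0, q0, q1, q2, q3) for PE n; bs[n] is its filtering strength.
    """
    bits = []
    for strength, (p3, p2, p1, p0, q0, q1, q2, q3) in zip(bs, lines):
        on = (strength != 0
              and abs(p0 - q0) < alpha   # step across the edge is small enough
              and abs(p1 - p0) < beta    # p side is locally smooth
              and abs(q1 - q0) < beta)   # q side is locally smooth
        bits.append(1 if on else 0)
    return bits


lines = [(100,) * 8,
         (100, 100, 100, 100, 160, 160, 160, 160),
         (100, 100, 100, 100, 104, 104, 104, 104)]
bits = condition_vector([0, 2, 2], lines, alpha=20, beta=10)
```

The resulting bit vector plays the role of a vector condition register: a large step (the second line) is treated as a real image edge and left alone, while a small step (the third line) is filtered.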
(4) Select the final results:
According to the conditions of step (3), one conditional MOV operation is performed for each possible filter result, which finally yields the correct filter result.
(5) Store the filter results:
For row filtering, the filter results are stored in the matrix register file; otherwise they are stored directly in the vector memory and then saved to user space by an external data transfer engine such as DMA.
(6) If filtering is not yet complete, repeat steps (2) to (5) until all filtering boundaries have been filtered.
Fig. 6 is a schematic diagram of the storage cell array structure of the matrix register in a specific application example of the present invention. The storage cell array of the matrix register generally consists of N × N storage cells, where N is usually a power of 2. The bit width of each storage cell is W, which is generally 4, 8, 12, 16 or 32. Logically, the array can be viewed as N row vector registers VR_0~VR_{N-1} or N column vector registers CVR_0~CVR_{N-1}, and each row vector register contains N elements (storage cells) E_{i,0}~E_{i,N-1} (i = 0, 1, 2, ..., N-1). Taking VR_0 as an example, this row vector register comprises the storage cells E_{0,0}~E_{0,N-1}. The storage cell array is divided by column into N storage cell columns of N × W bits, each composed of the N elements in the same column. These N storage cell columns correspond one-to-one to the N column vector registers CVR_0~CVR_{N-1} and implement access to the corresponding column vector register. Taking CVR_{N-1} as an example, this column vector register comprises the last element E_{i,N-1} (i = 0, 1, 2, ..., N-1) of every row vector register VR_0~VR_{N-1}.
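The row and column views of the storage cell array can be modeled in software as below. This is a behavioral sketch only: the class and method names are hypothetical, and the enable, address and select signals of the hardware are reduced to ordinary method calls.

```python
from typing import List


class MatrixRegisterFile:
    """N x N array of storage cells with row-wise and column-wise access."""

    def __init__(self, n: int) -> None:
        self.n = n
        self.cells: List[List[int]] = [[0] * n for _ in range(n)]

    def write_row(self, i: int, values: List[int]) -> None:
        """Write row vector register VR_i, i.e. elements E[i][0..n-1]."""
        assert len(values) == self.n
        self.cells[i] = list(values)

    def read_row(self, i: int) -> List[int]:
        return list(self.cells[i])

    def write_col(self, j: int, values: List[int]) -> None:
        """Write column vector register CVR_j, i.e. elements E[0..n-1][j]."""
        assert len(values) == self.n
        for i, v in enumerate(values):
            self.cells[i][j] = v

    def read_col(self, j: int) -> List[int]:
        return [self.cells[i][j] for i in range(self.n)]


mrf = MatrixRegisterFile(4)
for i in range(4):
    mrf.write_row(i, [10 * i + j for j in range(4)])
last_column = mrf.read_col(3)  # the last element of every row register
```

Writing results by row after the horizontal pass and reading them by column for the vertical pass gives the effect of a transpose without moving any data.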
The above are only preferred embodiments of the present invention. The scope of protection of the present invention is not limited to the above embodiments; all technical solutions under the concept of the present invention fall within its scope of protection. It should be noted that, for those of ordinary skill in the art, improvements and modifications that do not depart from the principle of the present invention shall also be regarded as falling within the scope of protection of the present invention.

Claims (5)

1. A vectorized implementation method of deblocking filtering for a vector processor, characterized in that the steps are:
S1: data preparation; load an n × m block of video data to be filtered into the vector memory and vectorize it;
S2: horizontal filtering; select the horizontal boundary that currently needs filtering, and have each PE read from the vector memory the image data (p3, p2, p1, p0, q0, q1, q2, q3) to be filtered; use the image data (p3, p2, p1, p0, q0, q1, q2, q3) and constants to compute the decision conditions, and store them in the vector condition registers; compute all candidate results of (p3, p2, p1, p0, q0, q1, q2, q3) according to the rules of the deblocking filter algorithm, and store them in separate local vector registers;
S3: result storage; according to the results of step S2, select for each PE the final results of (p3, p2, p1, p0, q0, q1, q2, q3), which together with the unmodified values of p3 and q3 form (p3, p2', p1', p0', q0', q1', q2', q3), and store them in the matrix register file;
S4: repeat steps S2 and S3 until all boundaries in the horizontal direction have been filtered;
S5: vertical filtering; select the boundary that currently needs filtering, and have each PE read from the matrix register file the image data (p3, p2, p1, p0, q0, q1, q2, q3) to be filtered, using the horizontally filtered data in the matrix register file as source data; use the image data (p3, p2, p1, p0, q0, q1, q2, q3) in the vertical direction and constants to compute the decision conditions, and store them in the vector condition registers; compute all candidate results of (p3, p2, p1, p0, q0, q1, q2, q3) according to the rules of the deblocking filter algorithm, and store them in separate local vector registers;
S6: result storage; according to the results of step S5, select for each PE the final results of (p3, p2, p1, p0, q0, q1, q2, q3), which together with the unmodified values of p3 and q3 form (p3, p2', p1', p0', q0', q1', q2', q3), and store them directly in the vector memory;
S7: repeat steps S5 and S6 until all boundaries in the vertical direction have been filtered.
2. The vectorized implementation method of deblocking filtering for a vector processor according to claim 1, characterized in that the operation of selecting the final results in steps S3 and S6 comprises the following steps:
S100: assume that the candidate results computed for pi on the boundary handled by each PE are R0~Rk-1; these results form a complete or incomplete binary tree. For any PE, the final result of pi is necessarily among R0~Rk-1. Expanded over the M PEs, R0~Rk-1 can be written as the matrix $R = (R_{j,n})$ with $0 \le j \le k-1$ and $0 \le n \le M-1$. From the corresponding decision conditions of the deblocking filter algorithm, the condition matrix of the conditional operations $C = (C_{j,n})$ is obtained, where $C_{j,n} \in \{0, 1\}$;
S200: according to the condition matrix, the final result Pi of pi is obtained with k vector conditional MOV operations; that is, $P_i = \sum_{j=0}^{k-1} R_j \circ C_j$, where $R_j$ and $C_j$ are the j-th rows of R and C and $\circ$ is the element-wise product over the PEs;
S300: repeat steps S100 and S200 until the results of p2, p1, p0, q0, q1 and q2 have all been selected.
3. The vectorized implementation method of deblocking filtering for a vector processor according to claim 1, characterized in that the conditional operation in steps S3 and S6 works as follows: assume that the vector processor is currently executing a vector instruction Inst under the condition register R0 = {R0_0, R0_1, ..., R0_{M-1}}, whose elements correspond to PE_0~PE_{M-1} respectively; if R0_i == 1, PE_i executes the instruction Inst; otherwise PE_i executes a no-op.
4. The vectorized implementation method of deblocking filtering for a vector processor according to any one of claims 1 to 3, characterized in that the vector memory comprises M banks, which correspond one-to-one, in order, to the M vector PEs; the M banks are uniformly addressed and interleaved by bank; that is, the first word is stored in the first bank, the second word in the second bank, and so on until the M-th word is stored in the M-th bank; the (M+1)-th word is then stored in the first bank again, and so forth; each bank is divided into an upper memory block and a lower memory block and supports two vector access operations at the same time.
5. The vectorized implementation method of deblocking filtering for a vector processor according to any one of claims 1 to 3, characterized in that the vector matrix register file consists of M × M storage cells, each generally 4, 8, 12, 16 or 32 bits wide; logically, the array consists of M row vector registers VR_0~VR_{M-1} or M column vector registers CVR_0~CVR_{M-1}; each row vector register contains M elements E_{i,0}~E_{i,M-1}, where i = 0, 1, 2, ..., M-1, and each column vector register contains M elements E_{0,i}~E_{M-1,i}, where i = 0, 1, 2, ..., M-1; the matrix register reads and writes row and column vectors under the control of the read/write enable, the read/write address and the row/column select signal.
CN201610194300.7A 2016-03-31 2016-03-31 Vector-processor-oriented deblocking filtering vectorization implementation method Active CN105828071B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201610194300.7A CN105828071B (en) 2016-03-31 2016-03-31 Vector-processor-oriented deblocking filtering vectorization implementation method

Publications (2)

Publication Number Publication Date
CN105828071A (en) 2016-08-03
CN105828071B (en) 2019-05-24

Family

ID=56525347

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201610194300.7A Active CN105828071B (en) Vector-processor-oriented deblocking filtering vectorization implementation method 2016-03-31 2016-03-31

Country Status (1)

Country Link
CN (1) CN105828071B (en)

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102012802A (en) * 2010-11-25 2011-04-13 National University of Defense Technology of the Chinese People's Liberation Army Vector processor-oriented data exchange method and device
CN102231202A (en) * 2011-07-28 2011-11-02 National University of Defense Technology of the Chinese People's Liberation Army SAD (sum of absolute difference) vectorization realization method oriented to vector processor
CN102801973A (en) * 2012-07-09 2012-11-28 Zhuhai Allwinner Technology Co., Ltd. Video image deblocking filter method and device

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
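Li Yong (李勇): "Design and Implementation of H.264 Core Algorithms on a SIMD Vector DSP", China Master's Theses Full-text Database, Information Science and Technology *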

Also Published As

Publication number Publication date
CN105828071B (en) 2019-05-24

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
GR01 Patent grant