Summary of the invention
The objective of the invention is to overcome the deficiencies of the prior art by providing a motion vector search method and device for video coding that increase the speed of motion vector search.
To achieve the above objective, the invention provides the following technical solution:
A motion vector search method for video coding comprises the following steps:
Step S1: perform a matching operation on the search window data of the current coding macroblock, which are stored in the on-chip memory, to obtain the motion vector of the current coding macroblock;
Step S2: read from the off-chip memory the search window data of the next coding macroblock that do not overlap the search window of the current coding macroblock, and store them in the on-chip memory;
Step S3: perform a matching operation on the search window data shared by the next coding macroblock and the current coding macroblock together with the non-overlapping search window data, to obtain the motion vector of the next coding macroblock.
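A short Python sketch quantifies the off-chip traffic that steps S1 to S3 save along one macroblock row. It assumes one byte per pixel, 16-pixel-wide macroblocks and ignores frame borders. The function name `offchip_bytes_per_row` is hypothetical. The sketch compares reloading every full M × N window with loading one full window followed by 16 × N strips.

```python
def offchip_bytes_per_row(mb_per_row: int, M: int, N: int, mb_width: int = 16) -> tuple[int, int]:
    """Off-chip bytes read for one macroblock row, assuming 1 byte per pixel.

    naive:       every macroblock reloads its full M x N search window.
    incremental: the first macroblock loads M x N (step S1); every later one
                 loads only the mb_width x N strip that does not overlap the
                 previous window (step S2).
    """
    naive = mb_per_row * M * N
    incremental = M * N + (mb_per_row - 1) * mb_width * N
    return naive, incremental


# CIF width: 352 / 16 = 22 macroblocks per row; search range +/-16 gives M = N = 48
print(offchip_bytes_per_row(22, 48, 48))  # (50688, 18432)
```

Under these assumptions, a CIF-width row needs 18432 bytes instead of 50688 bytes of off-chip reads.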
Further, the search window data in step S1 are stored contiguously row by row.
Further, the matching criterion of the matching operation in step S1 is the minimum sum of absolute differences (SAD) of pixel values.
Further, step S2 also comprises buffering the non-overlapping search window data in the on-chip memory.
Further, step S2 also comprises combining, in the on-chip memory, the data in the overlap between the search windows of the next coding macroblock and the current macroblock with the non-overlapping search window data to form the search window of the next coding macroblock, stored contiguously row by row.
Further, the start address in the on-chip memory of the row-contiguous search window of the next coding macroblock is the start address of the current window plus the horizontal width of a macroblock.
Further, the row-contiguous search window of the next coding macroblock is formed by controlling the storage position of the non-overlapping search window data so that, row by row, they overwrite parts of the search window data of the current coding macroblock.
Further, each row of the non-overlapping search window data is stored starting at:
S+i·M
where S is the start address of the current search window, M is the horizontal width of the search window, i is the row number, ranging from 1 to N, and N is the vertical height of the search window.
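The formula follows from row-contiguous storage. The next window starts at S' = S + w, where w is the width of the non-overlapping strip (16 in this document), and has row pitch M. Its row i therefore begins at S' + (i - 1)·M. The w new pixels are the last w pixels of that row, so they begin at S + w + (i - 1)·M + (M - w) = S + i·M. The Python check below is a sketch of this derivation. The function name `strip_row_start` and the general strip width `w` are illustrative.

```python
def strip_row_start(S: int, M: int, w: int, i: int) -> int:
    """On-chip address for row i (1..N) of the w-pixel-wide non-overlapping strip.

    The next window starts at S' = S + w and is stored row-contiguously with
    row pitch M, so its row i begins at S' + (i - 1) * M; the w new pixels are
    the last w pixels of that row, i.e. they start (M - w) further on.
    """
    s_next = S + w
    row_start = s_next + (i - 1) * M
    return row_start + (M - w)


# The derivation collapses to S + i*M for every row, whatever the strip width.
for w in (4, 16):
    for i in range(1, 49):
        assert strip_row_start(S=1000, M=48, w=w, i=i) == 1000 + i * 48
```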
Further, the next coding macroblock in step S2 and the current coding macroblock lie in the same macroblock row.
Further, step S2 also comprises judging whether the current coding macroblock is the last coding macroblock of its macroblock row. If it is, the next macroblock row is processed: the search window data of the first coding macroblock of that row are read from the off-chip memory into the on-chip memory, and steps S1 to S3 are executed.
Further, before step S1 the method also comprises dividing the picture frame to be encoded into a plurality of coding macroblocks.
A motion vector search device comprises:
a processing unit, configured to control input/output data and the motion vector search computation;
a high-speed storage unit, connected to the processing unit and configured to store search window data;
wherein the processing unit reads the search window data of the next coding macroblock that do not overlap the search window of the current coding macroblock and stores them in the high-speed storage unit.
Further, the processing unit comprises a row-contiguous storage module. This module combines the data, stored in the high-speed storage unit, in the overlap between the search windows of the current coding macroblock and the next coding macroblock with the non-overlapping data. The result is the search window data of the next coding macroblock, stored contiguously row by row.
Further, the motion vector search device also comprises an input unit and an output unit. The input unit is connected to the processing unit and receives external data. The output unit is connected to the processing unit and outputs data externally.
Further, the high-speed storage unit comprises a first high-speed memory and a second high-speed memory. The first high-speed memory stores the search window data. The second high-speed memory buffers the data of the next coding macroblock's search window that do not overlap the current search window.
Further, the capacity of the first high-speed memory is:
C1 = M × N + W (unit: bytes);
where M is the horizontal width of the search window, N is its vertical height, and W is the width of the image currently being encoded.
Further, the capacity of the second high-speed memory is:
C2 = 16 × N (unit: bytes);
where N is the vertical height of the search window.
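A quick numeric check of these capacities is sketched below in Python. It assumes one byte per pixel and an image width W that is a multiple of 16. It also assumes the window start advances by 16 bytes per macroblock from address P = 0 and is reset at each row. The 720-pixel width is only an example, and both function names are hypothetical. Under these assumptions the highest address a window occupies along a row is M × N + W - 17, which fits within C1.

```python
def capacities(M: int, N: int, W: int, mb: int = 16) -> tuple[int, int]:
    """C1 = M*N + W for the first memory, C2 = 16*N for the second (bytes)."""
    return M * N + W, mb * N


def highest_address_used(M: int, N: int, W: int, mb: int = 16, P: int = 0) -> int:
    """Highest on-chip address occupied by a search window during one macroblock row.

    Assumes W is a multiple of mb, the first window starts at P, and the window
    start advances by mb bytes per macroblock (S' = S + mb).
    """
    last_start = P + (W // mb - 1) * mb
    return last_start + M * N - 1


C1, C2 = capacities(48, 48, 720)          # 720-pixel-wide frame, M = N = 48
print(C1, C2)                             # 3024 768
print(highest_address_used(48, 48, 720))  # 3007, so the row fits in C1 bytes
```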
The beneficial effects of the present invention are as follows:
The present invention makes full use of the reference frame data in the overlap between the search windows of the current macroblock and the adjacent previous macroblock. These overlapping data need not be transferred again from off-chip memory to on-chip memory, nor moved within on-chip memory, and the data in the search window are always stored contiguously in on-chip memory. Specifically, the present invention has the following three advantages:
1. For adjacent macroblocks in the same row, repeated reads of the overlapping search window data from slow off-chip memory are avoided, which significantly reduces the number of off-chip memory accesses made by the search.
2. The processor reads reference frame data from off-chip memory in advance by direct memory access (DMA), in parallel with processing, so the transfer takes up little processor time.
3. For adjacent macroblocks in the same row, the overlapping search window data need not be moved within on-chip memory. By controlling where newly loaded data are placed, the search window data are still stored contiguously.
Through these three advantages, the present invention greatly improves the efficiency of the motion vector search process. It can be used in the design of video coding chips and of embedded video encoding software.
Embodiment
To make the objectives, technical solutions and advantages of the present invention clearer, the motion vector search method and device for video coding of the present invention are described in further detail below with reference to the drawings and embodiments. It should be understood that the specific embodiments described here only illustrate the present invention and do not limit it.
In one implementable embodiment, the motion vector search method of the present invention is a block matching method. The picture frame to be encoded is first divided into a number of fixed-size macroblocks, for example 16 × 16 pixels. The macroblocks are then encoded one by one in macroblock-row order, so encoding a video frame amounts to repeatedly encoding one row of macroblocks. For each macroblock, the best-matching block is searched for within a predefined search region (the search window) of a previously encoded frame, and the resulting displacement is the motion vector. The previously encoded frame is also called the reference frame. The motion vector search method for video coding of the present invention comprises the following steps:
Step S1: perform a matching operation on the search window data of the current coding macroblock, which are stored in the on-chip memory, to obtain the motion vector of the current coding macroblock.
The search window data are part of the reference frame image, which is stored in the processor's slow off-chip memory. To search for a motion vector, the processor first reads the search window data of the current macroblock from slow off-chip memory into fast on-chip memory. The size of the search window is predefined: it is a rectangular area of M × N pixels. For a 16 × 16 block whose horizontal and vertical search ranges are both -16 to +16, M = N = 48.
Preferably, the search window data of the reference frame are transferred to the on-chip memory and stored there contiguously row by row, starting at address P. The start address S of the current search window is set to S = P.
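The row-contiguous load of step S1 can be sketched in Python as follows. It assumes 8-bit pixels, a reference frame held as a flat row-major `bytes` object of width W, and a window whose top-left pixel in the frame is (x0, y0). A slice copy stands in for the DMA transfer. The names `load_window` and `window_pixel` are hypothetical.

```python
def load_window(frame: bytes, W: int, x0: int, y0: int, M: int, N: int,
                onchip: bytearray, P: int) -> int:
    """Copy the M x N search window whose top-left pixel is (x0, y0) from a
    row-major frame of width W into on-chip memory, row after row from P.

    Returns the start address S of the current search window (S = P).
    """
    for r in range(N):
        src = (y0 + r) * W + x0          # start of window row r in the frame
        dst = P + r * M                  # rows packed with pitch M on chip
        onchip[dst:dst + M] = frame[src:src + M]
    return P


def window_pixel(onchip: bytearray, S: int, M: int, r: int, c: int) -> int:
    """Pixel (r, c) of the window stored row-contiguously from S."""
    return onchip[S + r * M + c]
```

Because the rows are packed with pitch M, pixel (r, c) of the window is always at S + r·M + c, with no per-row address table.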
Preferably, step S1 also comprises performing matching on a plurality of sub-blocks of the macroblock to obtain the motion vector of each sub-block. For more accurate matching, a macroblock is partitioned into sub-blocks. A 16 × 16 macroblock may be divided into two 16 × 8 sub-blocks, two 8 × 16 sub-blocks, or four 8 × 8 sub-blocks, and each 8 × 8 block may be subdivided further. Matching is performed for each sub-block of every partition mode to obtain its motion vector. The matching criterion is the minimum sum of absolute differences (SAD) of pixel values. For example, suppose the current partition mode is a single 16 × 16 block. The 256 pixels of the current macroblock are then subtracted pixel by pixel from each 16 × 16 candidate block of the reference frame within the search window. The absolute differences are summed, and the candidate block with the smallest sum is the best match. To choose the best partition mode, the minimum SADs of the sub-blocks are summed for each mode, and the mode with the smallest total is selected as the current optimal coding mode.
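The full-search SAD matching and the partition-mode choice described above can be sketched in Python. This is an illustration rather than the patented implementation. It assumes the search window is a (16 + 2·rng)-pixel square with the co-located macroblock at offset (rng, rng), which gives M = N = 48 for rng = 16. Ties keep the first candidate scanned. The names `PARTITIONS`, `sad`, `search` and `best_partition` are hypothetical, and only the four partition modes named in the text are modelled.

```python
Block = list[list[int]]

# Partition modes of a 16x16 macroblock: (y, x, height, width) of each sub-block.
PARTITIONS = {
    "16x16": [(0, 0, 16, 16)],
    "16x8": [(0, 0, 8, 16), (8, 0, 8, 16)],
    "8x16": [(0, 0, 16, 8), (0, 8, 16, 8)],
    "8x8": [(y, x, 8, 8) for y in (0, 8) for x in (0, 8)],
}


def sad(cur: Block, ref: Block, top: int, left: int) -> int:
    """Sum of absolute pixel differences between cur and ref at (top, left)."""
    return sum(abs(cur[r][c] - ref[top + r][left + c])
               for r in range(len(cur)) for c in range(len(cur[0])))


def search(cur: Block, window: Block, rng: int, dy: int = 0, dx: int = 0) -> tuple[int, int, int]:
    """Full search over +/-rng around the co-located position (rng + dy, rng + dx).

    Returns (min_sad, mv_x, mv_y); ties keep the first candidate scanned.
    """
    best = None
    for my in range(-rng, rng + 1):
        for mx in range(-rng, rng + 1):
            cost = sad(cur, window, rng + dy + my, rng + dx + mx)
            if best is None or cost < best[0]:
                best = (cost, mx, my)
    return best


def best_partition(mb: Block, window: Block, rng: int = 16) -> tuple[str, dict[str, int]]:
    """Pick the partition mode whose summed per-sub-block minimum SAD is smallest."""
    costs = {}
    for name, parts in PARTITIONS.items():
        total = 0
        for y, x, h, w in parts:
            sub = [row[x:x + w] for row in mb[y:y + h]]
            total += search(sub, window, rng, y, x)[0]
        costs[name] = total
    return min(costs, key=costs.get), costs
```

Real encoders usually add a rate term to the SAD before comparing partition modes. The sketch follows the text and compares summed SAD only.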
Selecting among the partition modes of a macroblock and performing matching to obtain the motion vectors for each partition mode are techniques well known to those skilled in the art and are not described further here.
Step S2: read from the off-chip memory the data of the next macroblock's search window that do not overlap the search window of the current macroblock, and store them in the on-chip memory.
Preferably, step S2 also comprises buffering the non-overlapping search window data that have been read.
Preferably, step S2 also comprises combining the data shared by the search windows of the next macroblock and the current macroblock with the non-overlapping data to form the search window of the next macroblock, stored contiguously row by row.
More preferably, in step S2 the row-contiguous search window of the next macroblock is formed by controlling where the non-overlapping data are placed within the memory that holds the current search window. So that the search computation reads the search window data of the next macroblock, the start address S' of the next macroblock's search window is set to the start address S of the current search window plus the horizontal width of the non-overlapping data block.
Specifically, with direct memory access (DMA) (DMA) mode with next macro block search window and current search window not the data read of lap to on-chip memory, next macro block is the right adjacent block of current macro, as shown in Figure 2, the search window of current block is rectangle ABCD, size is a M * N pixel, the search window of its next macro block is rectangle A ' B ' C ' D ', with the search window of current macro not lap be rectangle BB ' C ' C, its size is the rectangular area of 16 * N pixel, be that every row has 16 pixels, N is capable altogether.The original position that capable 16 pixels of i of rectangle BB ' C ' C are deposited in on-chip memory is:
S+i·M
In the formula, S is the start address of the current search window, M is the horizontal width of the search window, i ranges from 1 to N, and N is the vertical height of the search window. Thus the first row of the strip is stored from S + M, the second row from S + 2M, and so on up to row N. After storage by this method, the search window data of the next macroblock are stored contiguously row by row in the on-chip memory, starting at address S + 16. This speeds up the processor's reads of search window data during matching. No overlapping data need to be moved or address-remapped within the on-chip memory. To illustrate the method further, Figs. 3 to 5 show where the data of the first three rows of the search window are stored and which positions they overwrite. Without loss of generality, the example uses an image width of 88 and a search window horizontal width M of 16. The part of the next search window that does not overlap the current window is 4 pixels wide. Figs. 3 and 4 show the positions of the current and next search windows and the data inside them. The upper diagram of Fig. 5 shows the current block's search window data stored contiguously in the on-chip high-speed memory. The lower diagram shows the data layout after the non-overlapping part of the next search window has been moved in and combined into the new search window. As Fig. 5 shows, the first row of non-overlapping data in Fig. 3 (16, 17, 18, 19) is stored at the positions previously holding current-window data 88, 89, 90, 91. The second row (104, 105, 106, 107) is stored at the positions of 176, 177, 178, 179, and so on.
After storage by this method, the position marked S' in Fig. 5 is the start address of the new search window. Here S' = S + 4, the width of the non-overlapping part in this example. As Fig. 4 confirms, the data of the new search window are stored contiguously in the on-chip high-speed memory starting at S'.
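The example of Figs. 3 to 5 can be replayed in a few lines of Python. In this sketch each pixel holds its own raster index, as the figures suggest, and the on-chip memory is modelled as a Python list sized with C1 = M × N + W. The name `slide_window` is hypothetical. With a 4-pixel strip, the next window begins w = 4 addresses after S.

```python
def slide_window(onchip: list, S: int, frame: list, W: int,
                 x0: int, y0: int, M: int, N: int, w: int) -> int:
    """Bring in the w x N strip to the right of the current window (top-left x0, y0).

    Row i (1..N) of the strip is written at S + i*M, overwriting the first w
    pixels of the current window's row i, which the next window no longer needs.
    Returns S' = S + w, where the next window starts, row-contiguous.
    """
    for i in range(1, N + 1):
        src = (y0 + i - 1) * W + x0 + M  # row i of the strip in the frame
        dst = S + i * M
        onchip[dst:dst + w] = frame[src:src + w]
    return S + w


W, M, N, w = 88, 16, 3, 4            # values used in the example of Figs. 3 to 5
frame = list(range(W * N))           # each pixel holds its own raster index
onchip = [None] * (M * N + W)        # sized with C1 = M*N + W
S = 0
for r in range(N):                   # current window (x0 = y0 = 0), row-contiguous
    onchip[S + r * M:S + (r + 1) * M] = frame[r * W:r * W + M]

S_next = slide_window(onchip, S, frame, W, 0, 0, M, N, w)
print(onchip[16:20], onchip[32:36])  # [16, 17, 18, 19] [104, 105, 106, 107]
print(S_next)                        # 4
```

Reading M × N consecutive entries from S_next yields exactly the next window's columns 4 to 19 of each row, without moving any overlapping data.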
Step S3: perform a matching operation on the search window data shared by the next macroblock and the current macroblock together with the non-overlapping data read in step S2, to obtain the motion vector of the next macroblock.
Preferably, step S2 also comprises judging whether the current macroblock is the last macroblock of its macroblock row. If it is, step S3 is skipped. The search window data of the first coding macroblock of the next macroblock row are then read from the off-chip memory into the on-chip memory, and steps S1 to S3 are executed to process that row. In other words, the current macroblock in step S1 and the next macroblock in step S2 always lie in the same macroblock row, so the overlapping part of their search windows can be reused.
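The resulting traversal order can be listed with a small Python sketch. It assumes 16-pixel macroblocks and a window start address that restarts at P = 0 for every row. Each entry records the operation, the macroblock position and the on-chip start address S of the window in use. The function name `schedule` is hypothetical. The list shows the logical order only; in the device the strip read may overlap the preceding search via DMA.

```python
def schedule(mb_cols: int, mb_rows: int, mb: int = 16, P: int = 0) -> list[tuple[str, int, int, int]]:
    """Order of operations for one frame in macroblock raster order.

    Each entry is (operation, mb_row, mb_col, S), where S is the on-chip start
    address of the search window for that macroblock:
      'load'   - full M x N window from off-chip memory (first macroblock of a row)
      'search' - matching for that macroblock (step S1, or S3 after a strip)
      'strip'  - read only the non-overlapping 16 x N strip (step S2)
    """
    ops = []
    for row in range(mb_rows):
        S = P                                  # each row restarts at address P
        ops.append(("load", row, 0, S))
        for col in range(mb_cols):
            ops.append(("search", row, col, S))
            if col + 1 < mb_cols:              # not the last macroblock of the row
                S += mb                        # S' = S + 16
                ops.append(("strip", row, col + 1, S))
    return ops


for op in schedule(mb_cols=3, mb_rows=2):
    print(op)
```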
Fig. 1 shows the motion vector search device for video coding of the present invention, which comprises:
A processing unit 101, configured to control input/output data and the motion vector search computation.
An on-chip high-speed storage unit 102, connected to the processing unit 101 and configured to store search window data. The on-chip high-speed storage unit also stores constants, intermediate data and results of the encoding process.
An input unit 103, connected to the processing unit 101 and configured to input data from off-chip.
The processing unit 101 controls the input unit 103 to read the data of the next macroblock's search window that do not overlap the current macroblock's search window, and stores them in the on-chip high-speed storage unit 102.
Preferably, the processing unit 101 also comprises a row-contiguous storage module. This module combines the data shared by the search windows of the current macroblock and the next macroblock, stored in the on-chip high-speed storage unit 102, with the non-overlapping data that have been read. The result is the search window data of the next macroblock, stored contiguously row by row.
Preferably, the motion vector search device also comprises an output unit 104, connected to the processing unit 101 and configured to output data off-chip.
In one implementable embodiment, the on-chip high-speed storage unit 102 is the level-1 or level-2 cache (CACHE) controlled by the processing unit 101 in a DSP or general-purpose processor (CPU). In another implementable embodiment, the on-chip high-speed storage unit 102 is an on-chip high-speed register bank controlled by the processing unit 101 in a video coding chip.
Preferably, the on-chip high-speed storage unit 102 comprises a first high-speed memory A and a second high-speed memory B. Memory A stores the search window. Memory B buffers the data, read from off-chip, of the next macroblock's search window that do not overlap the current search window. The benefit of the second high-speed memory B is that data reading can proceed in parallel with processing by the processing unit. Without buffer B, the non-overlapping data of the next macroblock's search window would be read from the reference frame directly into the first high-speed memory A. That transfer could only start after the search for the current macroblock had finished, otherwise it would overwrite data of the current search window that are still needed. Providing the second high-speed memory removes this restriction and increases the parallelism of the processing unit. Engineers in this field can readily decide, according to the system being designed, whether to provide the second high-speed memory B.
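The role of buffer B can be illustrated with a small Python model. It is sequential, whereas in hardware filling B would be a DMA transfer that runs concurrently with the search. The model assumes the strip is copied from B into A, at the S + i·M positions, only after the current search has finished. The class and method names are hypothetical.

```python
class SearchWindowMemory:
    """Hypothetical model of memory A (row-contiguous window) plus buffer B."""

    def __init__(self, M: int, N: int, W: int, mb: int = 16):
        self.M, self.N, self.mb = M, N, mb
        self.a = [0] * (M * N + W)   # first high-speed memory A, C1 = M*N + W
        self.b = [0] * (mb * N)      # second high-speed memory B, C2 = 16*N
        self.S = 0                   # start address of the current window in A

    def window(self) -> list[list[int]]:
        """Current search window, read from A as N rows of M pixels."""
        return [self.a[self.S + r * self.M:self.S + (r + 1) * self.M]
                for r in range(self.N)]

    def prefetch(self, strip_rows: list[list[int]]) -> None:
        """Fill B with the next strip (N rows of mb pixels); A is untouched."""
        for i, row in enumerate(strip_rows):
            self.b[i * self.mb:(i + 1) * self.mb] = row

    def commit(self) -> None:
        """After the current search: copy B row i to S + i*M in A, then S += mb."""
        for i in range(1, self.N + 1):
            dst = self.S + i * self.M
            self.a[dst:dst + self.mb] = self.b[(i - 1) * self.mb:i * self.mb]
        self.S += self.mb
```

Because `prefetch` writes only to B, the window seen by the ongoing search stays intact until `commit` is called.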
In one implementable embodiment, owing to the storage mechanism of the present invention, the capacity of the first high-speed memory A is:
C1 = M × N + W (unit: bytes);
where M is the horizontal width of the search window, N is its vertical height, and W is the width of the image currently being encoded.
In one implementable embodiment, the capacity of the second high-speed memory B is:
C2 = 16 × N (unit: bytes);
where N is the vertical height of the search window.
In a DSP or general-purpose processor (CPU), the processing unit 101 is the general arithmetic logic unit of the processor. In a video coding chip, the processing unit 101 may be an arithmetic logic unit dedicated to motion vector computation. Engineers in this field can devise different feasible implementations of the processing unit 101 using known techniques, which are not described further here.
The present invention has improved the efficient of motion-vector search, thereby has improved the coding rate of encoder, especially requires the many encoder performance of motion vector number high, search to promote higher to picture quality.
Above said content; only for the concrete execution mode of the present invention, but protection scope of the present invention is not limited thereto, and anyly is familiar with those skilled in the art in the technical scope that the present invention discloses; the variation that can expect easily or replacement all should be encompassed in protection scope of the present invention.