CN100508607C

CN100508607C - block matching method and device

Info

Publication number: CN100508607C
Application number: CN 200410057549
Authority: CN
Inventors: 刘琨
Original assignee: Realtek Semiconductor Corp
Current assignee: Realtek Semiconductor Corp
Priority date: 2004-08-16
Filing date: 2004-08-16
Publication date: 2009-07-01
Anticipated expiration: 2024-08-16
Also published as: CN1738427A

Abstract

A block matching device comprises a plurality of computing modules and a plurality of adding units. Each computing module calculates pixel differences between a plurality of target pixels in a target block and a plurality of reference pixels in a reference block, and includes a plurality of computing units, each of which calculates the pixel difference between one of the target pixels and one of the reference pixels. The adding units are respectively coupled to the computing modules, and each adding unit sums the calculation results of the computing units in the corresponding computing module.

Description

Method and device for block matching

Technical field

The invention relates to a block matching method and device, and in particular to a method and device for calculating pixel differences between blocks.

Background

Many video processing operations, such as motion estimation in MPEG-2/MPEG-4, rely on the results of image block matching. For example, when a target block in a frame is encoded, the encoding is based on the difference between the target block and the most similar reference macroblock in the previous or next frame on the time axis. In general, block matching compares the target block one by one with every candidate macroblock of similar size within a search range of the previous or next frame, in order to find the reference block most similar to the target block.
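The exhaustive search described above can be sketched in a few lines of Python. This is only an illustration, not the circuit of the invention: it assumes frames are plain lists of rows, it uses the sum of absolute differences (the cost the detailed description adopts later) as the similarity measure, and the names `sad` and `full_search` are hypothetical.

```python
def sad(target, ref, ox, oy):
    """Sum of absolute differences between `target` and the same-sized
    window of `ref` whose top-left corner is at column ox, row oy."""
    return sum(
        abs(target[y][x] - ref[oy + y][ox + x])
        for y in range(len(target))
        for x in range(len(target[0]))
    )


def full_search(target, ref):
    """Try every candidate position in `ref`; return (best_sad, ox, oy)."""
    h, w = len(target), len(target[0])
    best = None
    for oy in range(len(ref) - h + 1):
        for ox in range(len(ref[0]) - w + 1):
            cost = sad(target, ref, ox, oy)
            if best is None or cost < best[0]:
                best = (cost, ox, oy)
    return best


# Tiny made-up example: a 2x2 target that appears exactly at column 1, row 1.
ref = [
    [9, 9, 9, 9],
    [9, 1, 2, 9],
    [9, 3, 4, 9],
    [9, 9, 9, 9],
]
target = [[1, 2], [3, 4]]
print(full_search(target, ref))  # (0, 1, 1)
```

Every candidate position costs one full pass over the block, which is why the amount of computation grows quickly with the size of the search range.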

Different video standards allow different target block sizes, such as 8 x 8, 8 x 16, 16 x 8, or 16 x 16. In the prior art, however, target blocks of different sizes require different circuits for block matching, which increases the cost and complexity of the circuit implementation. Because block matching requires a large amount of computation, how to implement block matching algorithms more efficiently is one of the issues that concern the industry.

Summary of the invention

Therefore, one objective of the present invention is to provide a block pixel difference computing device capable of processing target blocks of different sizes.

According to an embodiment of the present invention, a block matching device is disclosed. It includes a plurality of computing modules, each used to calculate pixel differences between a plurality of target pixels in a target block and a plurality of reference pixels in a reference block, wherein each computing module includes a plurality of computing units, each used to calculate the pixel difference between one of the target pixels and one of the reference pixels; and a plurality of adding units respectively coupled to the computing modules, wherein each adding unit is used to sum the calculation results of the computing units in the corresponding computing module and to accumulate the calculation results of the corresponding computing module over a plurality of computing cycles.

According to an embodiment of the present invention, a block pixel difference computing device is also disclosed, which is used to calculate the differences between a target block of a target frame and a first reference block and a second reference block of a reference frame. The target block includes a first pixel and a second pixel, the first reference block includes a first reference pixel and a second reference pixel, and the second reference block includes the second reference pixel and a third reference pixel. The block pixel difference computing device includes: a first computing unit, used to calculate the difference between the first pixel and the first reference pixel; a second computing unit, used to calculate the difference between the first pixel and the second reference pixel; a third computing unit, used to calculate the difference between the second pixel and the second reference pixel; a fourth computing unit, used to calculate the difference between the second pixel and the third reference pixel; a first adding unit, coupled to the first and third computing units and used to sum their results; and a second adding unit, coupled to the second and fourth computing units and used to sum their results.
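The four-unit arrangement above can be illustrated with a minimal sketch. It assumes the "difference" is an absolute difference, as in the detailed embodiment, and the function name `two_block_differences` and the pixel values are made up for the example.

```python
def two_block_differences(c1, c2, r1, r2, r3):
    """Model of four computing units and two adding units.

    c1, c2     : first and second pixels of the target block
    r1, r2, r3 : reference pixels; the first reference block is (r1, r2),
                 the second reference block is (r2, r3)
    """
    pe1 = abs(c1 - r1)  # first computing unit
    pe2 = abs(c1 - r2)  # second computing unit
    pe3 = abs(c2 - r2)  # third computing unit
    pe4 = abs(c2 - r3)  # fourth computing unit
    first_sum = pe1 + pe3   # first adding unit: target vs first reference block
    second_sum = pe2 + pe4  # second adding unit: target vs second reference block
    return first_sum, second_sum


print(two_block_differences(10, 20, 12, 25, 18))  # (7, 17)
```

Note that the shared reference pixel `r2` is used by two computing units at once, which is what lets both reference blocks be evaluated in parallel.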

According to an embodiment of the present invention, a block pixel difference computing method is further disclosed, which is used to calculate the differences between a target block of a target frame and a first reference block and a second reference block of a reference frame. The target block includes a first pixel and a second pixel, the first reference block includes a first reference pixel and a second reference pixel, and the second reference block includes the second reference pixel and a third reference pixel. The method includes: (a) calculating the difference between the first pixel and the first reference pixel; (b) calculating the difference between the first pixel and the second reference pixel; (c) calculating the difference between the second pixel and the second reference pixel; (d) calculating the difference between the second pixel and the third reference pixel; (e) summing the results of steps (a) and (b); and (f) summing the results of steps (c) and (d).

Description of drawings

FIG. 1 is a schematic diagram of a target frame.

FIG. 2 is a schematic diagram of a reference frame.

FIG. 3 is a schematic diagram of a block pixel difference computing device according to an embodiment of the present invention.

FIG. 4 is a data flow diagram of an embodiment in which the block pixel difference computing device of FIG. 3 matches an 8 x 8 target block.

FIG. 5 is a schematic diagram of a target block with a size of 16 x 8.

FIG. 6 is a data flow diagram of an embodiment in which the block pixel difference computing device of FIG. 3 matches a 16 x 8 target block.

FIG. 7 is a schematic diagram of a target block with a size of 16 x 16.

FIG. 8 is a data flow diagram of an embodiment in which the block pixel difference computing device of FIG. 3 matches a 16 x 16 target block.

Description of reference symbols

100, 500, 700: target frame
110, 510, 710: target block
200: reference frame
210: search area
300: block pixel difference computing device
302, 304, 306, 308, 310, 312, 314, 316: computing module
322, 324, 326, 328, 330, 332, 334, 336: adding unit
402, 404, 406, 408: computing unit
512, 514, 712, 714, 716, 718: sub-block

Detailed description

Please refer to FIG. 1, which is a schematic diagram of a target frame 100. The target frame 100 includes a target block 110 with a size of 8 x 8. For convenience of description, each pixel in the target block 110 is labeled with a corresponding coordinate. In the following description, each pixel in the target block 110 is referred to as C(x, y), where (x, y) is the coordinate of the pixel.

FIG. 2 is a schematic diagram of a reference frame 200. As known to those skilled in the art, the reference frame 200 is usually the frame before or after the target frame 100, but is not limited thereto. The reference frame 200 includes a search area 210 with a size of n x m. In the following description, each pixel in the search area 210 is referred to as R(x, y), where (x, y) is the coordinate of the pixel.

FIG. 3 is a schematic diagram of a block pixel difference computing device 300 according to an embodiment of the present invention. The block pixel difference computing device 300 includes eight computing modules 302-316 and eight adding units 322-336. Each computing module includes eight computing units (processing elements, PEs), each used to calculate the difference between one pixel of the target block 110 and one pixel of the search area 210. In this embodiment, each computing unit (PE) calculates the absolute difference (AD) between the two pixels. As shown in FIG. 3, all computing units in the same computing module are coupled to the corresponding adding unit. In this embodiment, each adding unit sums the results of all computing units in the corresponding computing module, that is, it functions as an adder. Each adding unit also sums the results that the corresponding computing module produces over a plurality of computing cycles, that is, it also functions as an accumulator.
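The dual role of an adding unit (adder within a cycle, accumulator across cycles) can be modeled with a short sketch. The class name `AddingUnit`, the helper `computing_module`, and the pixel values are hypothetical and only serve the illustration; the patent describes hardware, not this software model.

```python
class AddingUnit:
    """Adder plus accumulator for one computing module (assumed model)."""

    def __init__(self):
        self.total = 0

    def add_cycle(self, pe_results):
        # Adder role: sum the PE outputs of this cycle.
        # Accumulator role: add that sum to the running total.
        self.total += sum(pe_results)
        return self.total


def computing_module(target_pixels, reference_pixels):
    """One module: each PE outputs |target - reference| for its pixel pair."""
    return [abs(c - r) for c, r in zip(target_pixels, reference_pixels)]


unit = AddingUnit()
# Two cycles for a single module with eight PEs (made-up pixel values).
unit.add_cycle(computing_module([1, 2, 3, 4, 5, 6, 7, 8], [1, 1, 1, 1, 1, 1, 1, 1]))
unit.add_cycle(computing_module([0, 0, 0, 0, 0, 0, 0, 0], [2, 2, 2, 2, 2, 2, 2, 2]))
print(unit.total)  # 44
```

The first cycle contributes 28 and the second contributes 16, so the accumulated value after two cycles is 44.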

Please refer to FIG. 4, which is a data flow diagram 400 of an embodiment in which the block pixel difference computing device 300 matches the target block 110 against a plurality of reference blocks in the search area 210 of the reference frame 200. For convenience of description, each reference block in the search area 210 is identified by the coordinate of its top-left pixel. For example, the reference block in the top-left corner of the search area 210 is defined as reference block RB_8x8(1, 1), the reference block obtained by shifting RB_8x8(1, 1) one pixel to the right is defined as reference block RB_8x8(2, 1), and so on. To keep the figure simple, the block pixel difference computing device 300 is simplified in FIG. 4 and its internal connections are not shown.

In FIG. 4, the eight horizontal dotted lines passing through the block pixel difference computing device 300 represent the data flows of the eight pixels in one row of the target block 110, and the fifteen oblique dotted lines passing through the device represent the data flows of fifteen pixels in one row of the search area 210. Note that in this embodiment each pixel value is input simultaneously (that is, within the same computing cycle) to all computing units on its corresponding dotted line, so loading pixel data into the block pixel difference computing device 300 introduces no delay.

In the first computing cycle, the first row of pixel data in the target block 110, namely pixels C(1, 1), C(2, 1), ..., C(7, 1), and C(8, 1), is input simultaneously to all computing units on the corresponding horizontal dotted lines. For example, in the first computing cycle, pixel C(1, 1) is input simultaneously to the eight computing units in the first row of computing units, which include computing units 402 and 404, while pixel C(2, 1) is input simultaneously to the eight computing units in the second row, which include computing units 406 and 408, and so on. At the same time, the first 15 pixels of the first row of the search area 210, namely pixels R(1, 1), R(2, 1), ..., R(14, 1), and R(15, 1), are input simultaneously to all computing units on the corresponding oblique dotted lines. For example, in the first computing cycle, pixel R(1, 1) is input to computing unit 402, while pixel R(2, 1) is input simultaneously to computing units 404 and 406, and so on.
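The broadcast pattern along the oblique lines can be made concrete with a small sketch. It assumes, consistently with the examples given for computing units 402, 404, and 406, that the PE handling target pixel x in the k-th computing module receives reference pixel x + k - 1 of the current row; the function name `pes_fed_by_reference_pixel` and the 1-based (module, position) numbering are hypothetical.

```python
N = 8  # number of modules, and PEs per module, as in the embodiment


def pes_fed_by_reference_pixel(j):
    """Return the (module k, PE position x) pairs, both 1-based, that receive
    reference pixel number j (1..15) of the current row.
    PE x of module k pairs target pixel x with reference pixel x + k - 1."""
    return [(k, x) for k in range(1, N + 1) for x in range(1, N + 1)
            if x + k - 1 == j]


print(pes_fed_by_reference_pixel(1))  # [(1, 1)]
print(pes_fed_by_reference_pixel(2))  # [(1, 2), (2, 1)]
print([len(pes_fed_by_reference_pixel(j)) for j in range(1, 16)])
# [1, 2, 3, 4, 5, 6, 7, 8, 7, 6, 5, 4, 3, 2, 1]
```

Under this assumption, the first reference pixel feeds a single PE, the second feeds two, and the fan-out peaks at eight for the eighth pixel before shrinking again, which accounts for all 64 PEs.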

In the second computing cycle, the second row of pixel data in the target block 110, namely pixels C(1, 2), C(2, 2), ..., C(7, 2), and C(8, 2), is input simultaneously to all computing units on the corresponding horizontal dotted lines. At the same time, the first 15 pixels of the second row of the search area 210, namely pixels R(1, 2), R(2, 2), ..., R(14, 2), and R(15, 2), are input simultaneously to all computing units on the corresponding oblique dotted lines. By analogy, in the eighth computing cycle, the eighth row of pixel data in the target block 110, namely pixels C(1, 8), C(2, 8), ..., C(7, 8), and C(8, 8), is input simultaneously to all computing units on the corresponding horizontal dotted lines, and the first 15 pixels of the eighth row of the search area 210, namely pixels R(1, 8), R(2, 8), ..., R(14, 8), and R(15, 8), are input simultaneously to all computing units on the corresponding oblique dotted lines.

In each computing cycle, every computing unit simultaneously calculates the absolute difference (AD) of the two pixel values loaded into it. For example, in the first computing cycle, computing unit 402 in computing module 302 calculates the absolute difference between pixel C(1, 1) and pixel R(1, 1), and computing unit 406 calculates the absolute difference between pixel C(2, 1) and pixel R(2, 1). Meanwhile, computing unit 404 in computing module 304 calculates the absolute difference between pixel C(1, 1) and pixel R(2, 1), and computing unit 408 calculates the absolute difference between pixel C(2, 1) and pixel R(3, 1). In the second computing cycle, computing unit 402 calculates the absolute difference between pixel C(1, 2) and pixel R(1, 2), computing unit 406 calculates the absolute difference between pixel C(2, 2) and pixel R(2, 2), computing unit 404 calculates the absolute difference between pixel C(1, 2) and pixel R(2, 2), and computing unit 408 calculates the absolute difference between pixel C(2, 2) and pixel R(3, 2). By analogy, in the eighth computing cycle, computing unit 402 calculates the absolute difference between pixel C(1, 8) and pixel R(1, 8), computing unit 406 calculates the absolute difference between pixel C(2, 8) and pixel R(2, 8), computing unit 404 calculates the absolute difference between pixel C(1, 8) and pixel R(2, 8), and computing unit 408 calculates the absolute difference between pixel C(2, 8) and pixel R(3, 8).

From the foregoing, the sum of the results of the eight computing units of computing module 302 in the first computing cycle can be expressed as $\sum_{x=1}^{8} |C(x, 1) - R(x, 1)|$, the sum in the second computing cycle can be expressed as $\sum_{x=1}^{8} |C(x, 2) - R(x, 2)|$, and so on, up to the sum in the eighth computing cycle, which can be expressed as $\sum_{x=1}^{8} |C(x, 8) - R(x, 8)|$.

In other words, the value that adding unit 322 obtains by summing and accumulating the results of computing module 302 from the first to the eighth computing cycle can be expressed as:

$\sum_{y=1}^{8} \sum_{x=1}^{8} |C(x, y) - R(x, y)| \qquad (1)$

As known to those skilled in the art, the value of equation (1) is the sum of absolute differences (SAD) between the target block 110 and the reference block RB_8x8(1, 1) in the top-left corner of the search area 210.

Similarly, the sum of the results of the eight computing units of computing module 304 in the first computing cycle can be expressed as $\sum_{x=1}^{8} |C(x, 1) - R(x + 1, 1)|$, the sum in the second computing cycle can be expressed as $\sum_{x=1}^{8} |C(x, 2) - R(x + 1, 2)|$, and so on, up to the sum in the eighth computing cycle, which can be expressed as $\sum_{x=1}^{8} |C(x, 8) - R(x + 1, 8)|$.

In other words, the value that adding unit 324 obtains by summing and accumulating the results of computing module 304 from the first to the eighth computing cycle can be expressed as:

$\sum_{y=1}^{8} \sum_{x=1}^{8} |C(x, y) - R(x + 1, y)| \qquad (2)$

The value of equation (2) is the sum of absolute differences between the target block 110 and the reference block RB_8x8(2, 1) in the search area 210.

By analogy, the value that adding unit 336 obtains by summing the results of computing module 316 from the first to the eighth computing cycle can be expressed as:

$\sum_{y=1}^{8} \sum_{x=1}^{8} |C(x, y) - R(x + 7, y)| \qquad (3)$

The value of equation (3) is the sum of absolute differences between the target block 110 and the reference block RB_8x8(8, 1) in the search area 210.

Therefore, after the first eight computing cycles, the values accumulated in the eight adding units are the sums of absolute differences between the target block 110 and the eight corresponding reference blocks in the search area 210, namely reference blocks RB_8x8(1, 1), RB_8x8(2, 1), ..., and RB_8x8(8, 1).
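The eight-cycle schedule can be checked with a small simulation. This is a software sketch under stated assumptions, not the hardware itself: pixel blocks are lists of rows with 0-based indices, the pixel values are random test data, and the names `run_eight_cycles` and `direct_sad` are hypothetical.

```python
import random

N = 8  # 8 modules x 8 PEs, as in the embodiment


def run_eight_cycles(target, search, first_col=0):
    """target: 8 rows x 8 pixels; search: rows of at least first_col + 15 pixels.
    Returns the eight accumulator values after eight cycles (0-based indices)."""
    acc = [0] * N                       # one adding unit per module
    for y in range(N):                  # one row of pixels per cycle
        c_row = target[y]
        r_row = search[y][first_col:first_col + 2 * N - 1]  # 15 reference pixels
        for k in range(N):              # module k handles horizontal offset k
            acc[k] += sum(abs(c_row[x] - r_row[x + k]) for x in range(N))
    return acc


def direct_sad(target, search, ox):
    """Reference computation: SAD against the 8x8 window starting at column ox."""
    return sum(abs(target[y][x] - search[y][ox + x])
               for y in range(N) for x in range(N))


random.seed(0)
target = [[random.randrange(256) for _ in range(N)] for _ in range(N)]
search = [[random.randrange(256) for _ in range(24)] for _ in range(N)]
acc = run_eight_cycles(target, search)
print(acc == [direct_sad(target, search, k) for k in range(N)])  # True
```

Calling `run_eight_cycles(target, search, first_col=8)` models a further pass that starts at the ninth reference pixel of each row and yields the SADs for the next eight horizontal offsets.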

Next, in the ninth computing cycle, the first row of pixel data in the target block 110, namely pixels C(1, 1), C(2, 1), ..., C(7, 1), and C(8, 1), is input simultaneously to all computing units on the corresponding horizontal dotted lines, as described above. At the same time, the 15 pixels of the first row of the search area 210 starting from pixel R(9, 1), namely pixels R(9, 1), R(10, 1), ..., R(22, 1), and R(23, 1), are input simultaneously to all computing units on the corresponding oblique dotted lines. In the tenth computing cycle, the second row of pixel data in the target block 110, namely pixels C(1, 2), C(2, 2), ..., C(7, 2), and C(8, 2), is input simultaneously to all computing units on the corresponding horizontal dotted lines. At the same time, the 15 pixels of the second row of the search area 210 starting from pixel R(9, 2), namely pixels R(9, 2), R(10, 2), ..., R(22, 2), and R(23, 2), are input simultaneously to all computing units on the corresponding oblique dotted lines. By analogy, after the ninth to sixteenth computing cycles, the values accumulated in the eight adding units are the sums of absolute differences between the target block 110 and the next eight reference blocks in the search area 210, namely reference blocks RB_8x8(9, 1), RB_8x8(10, 1), ..., and RB_8x8(16, 1).

As can be seen from the above, every eight computing cycles the block pixel difference computing device 300 of this embodiment calculates the sums of absolute differences (SAD) between the target block 110 and eight reference blocks in the search area 210. In other words, calculating the sum of absolute differences between the target block 110 and a single reference block takes only one computing cycle on average. In addition, no delay arises during the block matching operation, so optimal computational efficiency can be achieved.

In the description above, the block pixel difference computing device 300 loads, in a given computing cycle, the pixel values of one row of the target block 110 and the pixel values of one row of the search area 210. This is only one embodiment of the present invention. In practice, the block pixel difference computing device 300 can instead load, in a given computing cycle, the pixel values of one column of the target block 110 and the pixel values of one column of the search area 210.

In addition, the block pixel difference computing device 300 of the present invention has the advantage of supporting target blocks of different sizes. Assuming the target block has a size of 16 x 8, as shown in FIG. 5, the block pixel difference computing device 300 only needs to divide the target block 510 in the target frame 500 of FIG. 5 into two equal 8 x 8 sub-blocks 512 and 514, and can then perform block matching in the same manner as described above.

FIG. 6 is a data flow diagram of an embodiment in which the block pixel difference computing device 300 matches the target block 510 against a plurality of reference blocks in the search area 210 of the reference frame 200.

In the first computing cycle, the first row of pixel data in sub-block 512, namely pixels C(1, 1), C(2, 1), ..., C(7, 1), and C(8, 1), is input simultaneously to all computing units on the corresponding horizontal dotted lines. At the same time, the first 15 pixels of the first row of the search area 210, namely pixels R(1, 1), R(2, 1), ..., R(14, 1), and R(15, 1), are input simultaneously to all computing units on the corresponding oblique dotted lines. In the second computing cycle, the second row of pixel data in sub-block 512, namely pixels C(1, 2), C(2, 2), ..., C(7, 2), and C(8, 2), is input simultaneously to all computing units on the corresponding horizontal dotted lines. At the same time, the first 15 pixels of the second row of the search area 210, namely pixels R(1, 2), R(2, 2), ..., R(14, 2), and R(15, 2), are input simultaneously to all computing units on the corresponding oblique dotted lines.

By analogy, in the ninth computing cycle, the first row of pixel data in sub-block 514, namely pixels C(9, 1), C(10, 1), ..., C(15, 1), and C(16, 1), is input simultaneously to all computing units on the corresponding horizontal dotted lines. At the same time, the 15 pixels of the first row of the search area 210 starting from pixel R(9, 1), namely pixels R(9, 1), R(10, 1), ..., R(22, 1), and R(23, 1), are input simultaneously to all computing units on the corresponding oblique dotted lines. Next, in the tenth computing cycle, the second row of pixel data in sub-block 514, namely pixels C(9, 2), C(10, 2), ..., C(15, 2), and C(16, 2), is input simultaneously to all computing units on the corresponding horizontal dotted lines. At the same time, the 15 pixels of the second row of the search area 210 starting from pixel R(9, 2), namely pixels R(9, 2), R(10, 2), ..., R(22, 2), and R(23, 2), are input simultaneously to all computing units on the corresponding oblique dotted lines.

By analogy, after sixteen computing cycles, the values accumulated in the eight adding units are the sums of absolute differences between the target block 510 and eight consecutive reference blocks in the search area 210, namely reference blocks RB_16x8(1, 1), RB_16x8(2, 1), ..., and RB_16x8(8, 1). Calculating the sum of absolute differences between the target block 510 and a single reference block therefore takes only two computing cycles on average. In practice, the block pixel difference computing device 300 can instead load, in a given computing cycle, the pixel values of one column of sub-block 512 or 514 and the pixel values of one column of the search area 210.
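The sixteen-cycle schedule for a 16 x 8 target block can be sketched as two eight-cycle passes over the same accumulators. The sketch assumes 0-based list indices and made-up deterministic pixel values, and the names `match_16x8` and `direct_sad_16x8` are hypothetical.

```python
N = 8  # 8 modules x 8 PEs, as in the embodiment


def match_16x8(target, search):
    """target: 8 rows x 16 pixels; search: 8 rows x at least 23 pixels.
    Sixteen cycles: eight for the left sub-block, eight for the right one.
    The accumulators are not cleared between the two passes."""
    acc = [0] * N
    for sub in range(2):                      # left sub-block, then right sub-block
        x0 = sub * N                          # left edge of the sub-block
        for y in range(N):                    # one cycle per row
            c_row = target[y][x0:x0 + N]
            r_row = search[y][x0:x0 + 2 * N - 1]   # 15 reference pixels
            for k in range(N):                # module k handles offset k
                acc[k] += sum(abs(c_row[x] - r_row[x + k]) for x in range(N))
    return acc


def direct_sad_16x8(target, search, ox):
    """Reference computation: SAD against the 16x8 window starting at column ox."""
    return sum(abs(target[y][x] - search[y][ox + x])
               for y in range(8) for x in range(16))


# Deterministic made-up pixel values.
target = [[(3 * x + 5 * y) % 256 for x in range(16)] for y in range(8)]
search = [[(7 * x + 11 * y + 1) % 256 for x in range(23)] for y in range(8)]
result = match_16x8(target, search)
print(result == [direct_sad_16x8(target, search, k) for k in range(8)])  # True
```

Because the second pass shifts both the target pixels and the reference pixels by eight columns, each accumulator keeps adding terms for the same candidate offset, so no extra hardware is needed for the wider block.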

If the target block has a size of 16 x 16, as shown in FIG. 7, the block pixel difference computing device 300 likewise only needs to divide the target block 710 in the target frame 700 of FIG. 7 into four equal 8 x 8 sub-blocks 712, 714, 716, and 718 to perform block matching. FIG. 8 is a data flow diagram of an embodiment in which the block pixel difference computing device 300 matches the target block 710 against a plurality of reference blocks in the search area 210 of the reference frame 200. Since the block pixel difference computing device 300 operates in substantially the same way as in the foregoing embodiments, the details are not repeated here.

In this embodiment, after thirty-two computing cycles, the values accumulated in the eight adding units of the block pixel difference computing device 300 are the sums of absolute differences between the target block 710 and eight consecutive reference blocks in the search area 210, namely reference blocks RB_16x16(1, 1), RB_16x16(2, 1), ..., and RB_16x16(8, 1). In other words, calculating the sum of absolute differences between the target block 710 and a single reference block takes only four computing cycles on average. Similarly, in practice, the block pixel difference computing device 300 can instead load, in a given computing cycle, the pixel values of one column of sub-block 712, 714, 716, or 718 and the pixel values of one column of the search area 210.
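The cycle counts quoted for the different block sizes follow from simple arithmetic, sketched below. The function name `cycles_per_sad` is hypothetical, and the formula assumes that the block divides evenly into sub-blocks matching the PE array, that each sub-block takes as many cycles as it has rows, and that one candidate per module is evaluated in parallel.

```python
def cycles_per_sad(block_w, block_h, n=8):
    """Average cycles per candidate SAD for an n x n PE array (assumed model)."""
    if block_w % n or block_h % n:
        raise ValueError("block size must be a multiple of the array size")
    sub_blocks = (block_w // n) * (block_h // n)
    total_cycles = sub_blocks * n      # cycles needed to finish n candidate SADs
    return total_cycles / n


for size in [(8, 8), (16, 8), (8, 16), (16, 16)]:
    print(size, cycles_per_sad(*size))
# (8, 8) 1.0
# (16, 8) 2.0
# (8, 16) 2.0
# (16, 16) 4.0
```

The values for 8 x 8, 16 x 8, and 16 x 16 match the averages of one, two, and four computing cycles stated in the description; the 8 x 16 figure is derived from the same assumed model rather than quoted from the text.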

As can be seen from the above, the block pixel difference calculation device 300 of the present invention can process target blocks of different sizes, such as 8 X 8, 16 X 8, 8 X 16, and 16 X 16, using the same operation unit (PE) matrix, so that the utilization of the circuit is greatly improved.
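The cycle counts for these block sizes follow from simple arithmetic, sketched below. The sketch assumes that each 8 X 8 sub-block takes eight operation cycles and that eight reference blocks are matched in parallel, as in the embodiments above. The text states the 16 X 8 and 16 X 16 figures; the 8 X 8 and 8 X 16 figures here are extrapolated from the same arithmetic. The function name `cycle_budget` is hypothetical.

```python
N = 8  # side of the operation-unit matrix described in the text

def cycle_budget(width, height, n=N):
    """Return (total cycles, average cycles per reference block)."""
    sub_blocks = (width // n) * (height // n)   # n x n sub-blocks in the target block
    total = sub_blocks * n                      # n cycles per sub-block (assumed)
    return total, total / n                     # n reference blocks share the pass

sizes = [(8, 8), (16, 8), (8, 16), (16, 16)]
budget = {size: cycle_budget(*size) for size in sizes}
```

With these assumptions, `budget[(16, 8)]` is sixteen cycles in total and two per reference block, and `budget[(16, 16)]` is thirty-two in total and four per reference block, matching the figures given in the embodiments.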

In addition, in the foregoing embodiments, the operation unit matrix of the block pixel difference calculation device 300 is 8 X 8 in size. This is only a preferred embodiment of the present invention and does not limit its scope of application. In practice, an operation unit matrix of size 4 X 4, or even 2 X 2, is sufficient to implement the aforementioned block matching operations for the different block sizes.
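How a smaller operation unit matrix could produce the same results can be sketched as follows. The text does not describe the schedule for 4 X 4 or 2 X 2 matrices, so the tiling here is an assumption: an n x n matrix matches n horizontally shifted reference blocks per pass and consumes n target pixels of one row per cycle. The function name `sads_with_matrix` is hypothetical.

```python
import random

def sads_with_matrix(target, search, n):
    """SADs for n horizontally consecutive candidates using an n x n unit matrix.

    Returns (accumulators, cycles); the schedule is an illustrative assumption.
    """
    height, width = len(target), len(target[0])
    acc = [0] * n                       # one adding unit per module
    cycles = 0
    for y in range(height):
        for x0 in range(0, width, n):   # n target pixels enter per cycle
            for m in range(n):          # module m compares against shift m
                acc[m] += sum(abs(target[y][x0 + i] - search[y][x0 + m + i])
                              for i in range(n))
            cycles += 1
    return acc, cycles

rng = random.Random(1)
target = [[rng.randrange(256) for _ in range(16)] for _ in range(16)]   # 16 X 16 block
search = [[rng.randrange(256) for _ in range(16 + 7)] for _ in range(16)]
results = {n: sads_with_matrix(target, search, n) for n in (8, 4, 2)}
```

Under this assumed schedule, the smaller matrices yield the same sums of absolute differences for the candidates they cover, at the cost of more cycles: 32, 64, and 128 cycles for n = 8, 4, and 2 respectively on a 16 X 16 target block.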

The above descriptions are only preferred embodiments of the present invention. All equivalent changes and modifications made within the scope of the claims of the present invention shall fall within the scope of the present patent.

Claims

1. A block matching device, comprising:

a plurality of operation modules, each operation module being used to calculate the pixel differences between a plurality of target pixels in a target block and a plurality of reference pixels in a reference block, wherein each operation module comprises a plurality of operation units, each operation unit being used to calculate the pixel difference between one of the target pixels and one of the reference pixels; and

a plurality of adding units respectively coupled to the operation modules, each adding unit being used to sum the calculation results of the plurality of operation units in the corresponding operation module, and to accumulate the calculation results of the corresponding operation module over multiple operation cycles.

2. The block matching device as claimed in claim 1, wherein one of the target pixels is synchronously input to a plurality of first operation units among the operation units.

3. The block matching device as claimed in claim 2, wherein the first operation units belong to different operation modules.

4. The block matching device as claimed in claim 1, wherein each operation unit is used to calculate an absolute difference between one of the target pixels and one of the reference pixels.

5. The block matching device as claimed in claim 1, wherein the target pixels are located in the same column or row in the target block.

6. The block matching device as claimed in claim 1, wherein the reference pixels are located in the same column or row in the reference block.

7. A block pixel difference calculation method, used to calculate the differences between a target block in a target frame and a first reference block and a second reference block in a reference frame, wherein the target block includes a first pixel and a second pixel, the first reference block includes a first reference pixel and a second reference pixel, and the second reference block includes the second reference pixel and a third reference pixel, the block pixel difference calculation method comprising:

calculating the first pixel and the first reference pixel to obtain a first difference value;

calculating the first pixel and the second reference pixel to obtain a second difference value;

calculating the second pixel and the second reference pixel to obtain a third difference value;

calculating the second pixel and the third reference pixel to obtain a fourth difference value;

summing the first difference value and the second difference value; and

summing the third difference value and the fourth difference value.

8. The block pixel difference calculation method according to claim 7, wherein each calculation step is performed synchronously.

9. The block pixel difference calculation method according to claim 7, wherein each calculation step is to calculate the absolute difference between pixels.