WO2009136923A2 - Efficient implementations of kernel computation - Google Patents

Efficient implementations of kernel computation

Info

Publication number
WO2009136923A2
Authority
WO
WIPO (PCT)
Prior art keywords
matrix
kernel
product
entry
pixel
Prior art date
Application number
PCT/US2008/062949
Other languages
English (en)
Other versions
WO2009136923A3 (fr)
Inventor
Hari Chakravarthula
Christopher Loo
Jose Mendez
Avi Amrami
Noy Cohen
David Ya'ara
Original Assignee
Tessera, Inc.
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Tessera, Inc.
Priority to PCT/US2008/062949
Publication of WO2009136923A2
Publication of WO2009136923A3

Links

Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F17/00: Digital computing or data processing equipment or methods, specially adapted for specific functions
    • G06F17/10: Complex mathematical operations
    • G06F17/15: Correlation function computation including computation of convolution operations
    • G06F17/153: Multidimensional correlation or convolution

Definitions

  • the present invention relates to digital signal processing.
  • embodiments of the present invention relate to techniques for minimizing gate-level resources that are required to perform various signal processing applications that involve applying a kernel matrix.
  • FIG. 1 depicts a portion of a pixel grid 100 of a typical CCD or CMOS image sensor.
  • the pixel grid 100 is in raw Bayer format in which each pixel is represented by a discrete color and in which rows contain either red and green pixels or blue and green pixels.
  • Several 13 x 13 kernel areas are depicted overlaying the pixel grid 100. Kernels (sometimes referred to as masks, templates, or windows) are often used in image processing to perform neighborhood operations. In one example, each kernel operation updates the value of the center pixel of the respective kernel area. Therefore, the kernel that is used for each area is based on the color of the center pixel.
  • the entire CMOS image sensor could contain up to a million or even millions of pixels.
  • FIG. 2 depicts coefficient locations for exemplary green 200a, red 200b and blue 200c kernel matrices.
  • the red 200b and blue 200c kernel matrices each have a 7 x 7 pattern.
  • the green kernel matrix 200a has the 7 x 7 pattern overlaid by a 6 x 6 pattern.
  • the patterns in FIG. 2 correspond to the pixel grid 100 of the image sensor of FIG. 1.
  • each kernel coefficient is multiplied by the value of the corresponding pixel in the region of the pixel grid 100 that is overlaid by the kernel ("kernel region").
  • the center pixel in each kernel region is updated. For example, the multiplication products are summed together and the sum is used to replace the center pixel value.
  • Table I shows the number of operations that could be used to perform the kernel computations.
  • Table I shows that performing the kernel computations requires one multiplier and about one adder for every kernel coefficient, therefore undesirably utilizing considerable gate-level resources.
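The baseline that the rest of the description improves on can be sketched in a few lines. The sketch below is illustrative only: the function name `apply_kernel_naive` and the 3 x 3 sample values are assumptions made here, not data from Table I. It multiplies every kernel coefficient by its corresponding pixel, sums the products, and counts one multiplication per coefficient.

```python
def apply_kernel_naive(region, kernel):
    """Return (new_center_value, multiplication_count) for one kernel region.

    Every coefficient is multiplied by its pixel individually, so the
    multiplication count equals the number of kernel coefficients.
    """
    total = 0
    multiplications = 0
    for region_row, kernel_row in zip(region, kernel):
        for pixel, coefficient in zip(region_row, kernel_row):
            total += coefficient * pixel
            multiplications += 1
    return total, multiplications


# Hypothetical 3 x 3 kernel region and kernel (not taken from the patent).
region = [[1, 2, 3],
          [4, 5, 6],
          [7, 8, 9]]
kernel = [[0, 1, 0],
          [1, 2, 1],
          [0, 1, 0]]

new_center, mults = apply_kernel_naive(region, kernel)
```

Here the sum of products replaces the center pixel, which is only one of the update rules the text allows.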
  • FIG. 1 is an illustration of a pixel grid of a typical CMOS image sensor
  • FIG. 2 depicts three color kernels that may be used to process and update pixels of the pixel grid of FIG. 1;
  • FIG. 3 depicts examples of radially symmetric color kernels
  • FIG. 4 depicts an example kernel matrix in which the coefficients are symmetric about the middle column of coefficients
  • FIG. 5 depicts a kernel matrix in which kernel coefficients are distributed without any symmetry
  • FIG. 6 is a flowchart of a process for grouping kernel coefficients to efficiently perform kernel computations, in accordance with one embodiment
  • FIG. 7 shows how the kernel computation of FIG. 8 might be performed inefficiently using nine multiplications over nine clock cycles;
  • FIG. 8 depicts kernel computations in accordance with an embodiment
  • FIG. 9A depicts a conventional multiplier that is used to multiply pixel data by one kernel coefficient at a time
  • FIG. 9B depicts a system in accordance with an embodiment, which is used to achieve the effect of the kernel operations depicted in FIG. 9A;
  • FIG. 10 is a flowchart illustrating steps of a process of updating a first entry in a signal matrix, I, based on a kernel matrix and a succession of first order difference equations, in accordance with an embodiment
  • FIGS. 11A and 11B depict hardware for performing kernel computations in accordance with an embodiment
  • FIG. 12 is a block diagram that illustrates a computer system upon which an embodiment of the invention may be implemented.
  • FIG. 13 illustrates a block diagram for an example mobile device in which embodiments of the present invention may be implemented.
  • kernel matrix computations are simplified by grouping similar kernel coefficients together.
  • kernel coefficients that share a common value are grouped together such that coefficient groups are identified.
  • Each coefficient group contains only coefficients having the same value.
  • At least one of the coefficient groups has at least two coefficients.
  • a coefficient group could have only a single coefficient.
  • Each kernel coefficient has a corresponding entry in a data matrix to which the kernel is to be applied.
  • each entry in the data matrix could represent a pixel value, wherein the pixels collectively represent a signal.
  • the matrix entries are not limited to representing pixel values.
  • For each coefficient group, the values of the signal matrix entries that correspond to coefficients in the coefficient group are summed. Then, the sum is multiplied by the coefficient value for the coefficient group.
  • the kernel coefficients are symmetrically distributed in the kernel matrix.
  • the coefficients may be radially symmetric with respect to a central coefficient in the kernel matrix.
  • the coefficients could possess a different symmetry, such as being symmetric about a line in the kernel matrix.
  • each kernel coefficient is a sum of powers of two, which allows a multiplier to be replaced by an adder.
  • gate-level resources are reduced.
  • Techniques are disclosed herein to efficiently apply successive first order difference operations to a data signal.
  • the techniques allow for a low gate count.
  • the techniques allow for a reduction of the number of multipliers without increasing clock frequency, in an embodiment.
  • the techniques update pixels of a data signal at a rate of two clock cycles per each pixel, in an embodiment.
  • the techniques allow hardware that is used to process a first pixel to be re-used to start the processing of a second pixel while the first pixel is still being processed.
  • FIG. 3 depicts several example kernel matrices 300a - 300c, which are used when processing in accordance with embodiments of the present invention.
  • the kernel coefficients are "radially symmetric" about the center coefficient, in this example.
  • By "radially symmetric," it is meant that coefficients that are the same "distance" from the center coefficient have the same value.
  • Measuring distance in the kernel is based on the data sensor for which the kernel applies. For example, if the data sensor is a CMOS image sensor, then the distance may be the physical distance between pixels in the sensor.
  • a kernel having radially symmetric coefficients may be especially useful for situations such as compensating for a lens point spread function (PSF), which may be radially symmetric.
  • selected pixel data is summed prior to multiplying the pixel data by kernel coefficients.
  • For the example kernels in FIG. 3, pixel data that is equidistant from the center pixel is summed prior to multiplication by the corresponding kernel coefficient, which significantly reduces the number of multipliers required.
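The distance-based grouping can be sketched as follows. The geometry is an assumption made for illustration: a 13 x 13 kernel area with offsets from -6 to 6, same-color red or blue pixels at even offsets (the 7 x 7 pattern), and green pixels at offsets that are both even or both odd (the 7 x 7 pattern overlaid by a 6 x 6 pattern), with distance measured as Euclidean distance on the pixel grid. The function names are hypothetical.

```python
from collections import defaultdict


def radial_groups(offsets):
    """Group (dy, dx) offsets by squared Euclidean distance from the center."""
    groups = defaultdict(list)
    for dy, dx in offsets:
        groups[dy * dy + dx * dx].append((dy, dx))
    return dict(groups)


def apply_radial_kernel(pixel_at, coeff_for, groups):
    """Sum the pixels of each distance group, then multiply once per group."""
    return sum(coeff_for[d2] * sum(pixel_at[o] for o in members)
               for d2, members in groups.items())


# Assumed Bayer geometry inside a 13 x 13 kernel area.
even = range(-6, 7, 2)  # 7 positions per axis
odd = range(-5, 6, 2)   # 6 positions per axis

red_blue_offsets = [(dy, dx) for dy in even for dx in even]
green_offsets = red_blue_offsets + [(dy, dx) for dy in odd for dx in odd]

red_blue_groups = radial_groups(red_blue_offsets)
green_groups = radial_groups(green_offsets)

# One multiplier per group instead of one per coefficient.
print(len(red_blue_offsets), "coefficients ->", len(red_blue_groups), "groups")
print(len(green_offsets), "coefficients ->", len(green_groups), "groups")
```

Under these assumptions the number of groups, and therefore the number of multipliers, is much smaller than the number of coefficients, while the additions inside each group are comparatively cheap.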
  • Table II depicts gate-level resources used for kernel computations, in accordance with an embodiment of the present invention that processes the kernels of FIG. 3.
  • the red and blue kernels each have 10 unique coefficient values (1 - A), which in this example corresponds to the 10 unique distances coefficients are located from the center coefficient. Thus, only 10 multipliers are used when applying the red and blue kernels.
  • the green kernel has 16 unique coefficients. Thus, only 16 multipliers are used when applying the green kernel. Thus, the number of multipliers is greatly reduced when compared to the multipliers used for the technique described in Table I. Moreover, because the gate count of an adder may be linearly proportional to the bit width, whereas the gate count of a multiplier may be proportional to the square of the bit width, the reduction in gate level resources is substantial.
  • There is no requirement that the kernel coefficients be radially symmetric.
  • the coefficients could possess a different symmetry, such as being symmetric about a line in the kernel.
  • the coefficients may be symmetric about horizontal, vertical, or diagonal directions.
  • FIG. 4 depicts an example kernel matrix 400 in which the coefficients are symmetric about the middle column of coefficients.
  • this symmetry line does not need to correspond to coefficient locations.
  • the symmetry line could be between two columns or rows of coefficients.
  • Other examples of symmetry are symmetry about a horizontal line and symmetry about one or more diagonal lines.
  • FIG. 5 depicts a kernel matrix 500 in which kernel coefficients are distributed without any symmetry. However, there are at least two kernel coefficients that have the same value as each other. Coefficients having the same value as each other are referred to herein as a "coefficient group."
  • FIG. 6 is a flowchart of a process 600 for grouping kernel coefficients to efficiently perform kernel computations, in accordance with one embodiment.
  • Process 600 will be discussed using the following example kernel computations to demonstrate how: 1) gate level resources are saved, and 2) the number of clock cycles to process a single pixel is reduced.
  • Table III depicts example pixel data and Table IV depicts an example kernel matrix to be applied.
  • kernel coefficients that have a common value are identified as being members of coefficient groups.
  • the coefficient groups are groups A, B, and C.
  • At least one of the coefficient groups includes at least two kernel coefficients. However, it is not required that each group has multiple coefficients.
  • Each kernel coefficient has a corresponding entry in a data matrix of Table III.
  • In step 604, for each coefficient group, the values of the entries in the data matrix that correspond to kernel coefficients in the coefficient group are summed to generate a summed value for each coefficient group.
  • In step 606, for each coefficient group, the summed value is multiplied by the coefficient of the coefficient group to determine a final value.
  • In step 608, a new value is determined for a first entry of the data matrix based on the final value for each coefficient group.
  • FIG. 8 depicts the pixel values from TABLE III being summed prior to multiplication by the kernel coefficients, in accordance with an embodiment.
  • Each of the multiplications and each of the additions in FIG. 8 are implemented in hardware by gate-level resources, in an embodiment.
  • the alpha-numeric characters represent inputs to the gate-level resources. Note that only three multipliers and eight adders are required.
  • the gate-level resources can calculate the output in only three clock cycles, in the embodiment depicted in FIG. 8.
  • FIG. 7 shows how the kernel computation might be performed inefficiently using nine multiplications over nine clock cycles.
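The grouping process can be sketched in software as follows. Tables III and IV are not reproduced in this text, so the 3 x 3 data and the kernel values A = 2, B = 3, C = 5 below are hypothetical, as is the function name. The sketch sums the data entries of each coefficient group, multiplies each sum once by the group's coefficient, and reports how many multipliers and adders that arrangement would need.

```python
from collections import defaultdict


def apply_kernel_grouped(data, kernel):
    """Sum data entries per coefficient group, then multiply once per group.

    Returns (result, multiplier_count, adder_count).
    """
    group_sums = defaultdict(int)   # summed data value per coefficient value
    group_sizes = defaultdict(int)  # number of entries in each group
    for data_row, kernel_row in zip(data, kernel):
        for value, coefficient in zip(data_row, kernel_row):
            group_sums[coefficient] += value
            group_sizes[coefficient] += 1

    # One multiplication per coefficient group.
    finals = [coefficient * total for coefficient, total in group_sums.items()]

    multipliers = len(finals)
    # Adders inside each group plus adders that combine the group products.
    adders = sum(size - 1 for size in group_sizes.values()) + (len(finals) - 1)
    return sum(finals), multipliers, adders


# Hypothetical stand-ins for Table III (data) and Table IV (kernel).
data = [[1, 2, 3],
        [4, 5, 6],
        [7, 8, 9]]
A, B, C = 2, 3, 5
kernel = [[A, B, A],
          [B, C, B],
          [A, B, A]]

result, multipliers, adders = apply_kernel_grouped(data, kernel)
print(result, multipliers, adders)
```

For nine entries split into three groups, the count works out to three multipliers and eight adders regardless of which entries fall in which group, since six additions form the group sums and two more combine the three products.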
  • each kernel coefficient is represented as a sum of powers of two, which eliminates multipliers and replaces them with adders. That is, the step of multiplying by a kernel coefficient can be replaced by parallel bit shift operations followed by a single sum operation.
  • FIG. 9A depicts a conventional multiplier that is used to multiply pixel data by one kernel coefficient at a time.
  • FIG. 9B depicts a system in accordance with an embodiment, which is used to achieve the effect of the kernel multiplication operations depicted in FIG. 9A.
  • the pixel data is input to each shifter 902(1) - 902(m).
  • the outputs of the shifters 902 are summed by the adder 904 to generate the final result.
  • the pixel data that is input to each shifter may represent intensity data for a single pixel or, as described above, represent previously summed pixel intensity data corresponding to a coefficient group.
  • the shifters 902 can be implemented without any gate level resources.
  • the shifters 902 are implemented by shifting the pixel data bits one or more places by a wire shift and padding one or more zeroes to the pixel data.
  • the adder 904 can be implemented with a fraction of the gates of a multiplier. Therefore, the system of FIG. 9B can be implemented with far fewer gate-level resources than the system of FIG. 9A.
  • the selection of which shifts are performed can be based on the bits of the kernel coefficient.
  • a kernel coefficient of 11 may be represented as the sum of 8 + 2 + 1, which is equivalent to $2^3 + 2^1 + 2^0$ (i.e., the sum of three different numbers that are powers of 2).
  • the input pixel data is separately left-shifted a total of three bits, one bit, and zero bits, with the sum of all three shifts yielding the output.
  • the pixel data is input to a multiplexer having an output to each shifter 902(1) - 902(m), where 1 through m represent sequentially increasing integer values.
  • the bits of the kernel coefficient are used to select which shifter 902 receives the pixel data.
  • the shifters 902 representing 3, 1, and 0 bits shifts are used, while shifters 902 representing 2, 4, ..., m bit shifts are omitted.
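The shift-and-add replacement for a multiplier can be sketched as follows, assuming non-negative integer coefficients and pixel values. The function names are hypothetical; the set bits of the coefficient select which shifted copies of the pixel data are summed, which mirrors the role of the shifters 902 and the adder 904.

```python
def shift_amounts(coefficient):
    """Return the bit positions that are set in a non-negative integer coefficient."""
    return [bit for bit in range(coefficient.bit_length())
            if (coefficient >> bit) & 1]


def multiply_by_shifts(pixel_value, coefficient):
    """Multiply using only left shifts and one summation."""
    return sum(pixel_value << shift for shift in shift_amounts(coefficient))


print(shift_amounts(11))          # bit positions 0, 1 and 3
print(multiply_by_shifts(5, 11))  # same value as 5 * 11
```

In hardware the shifts are fixed wiring, so only the summation consumes gates; in this software sketch the shifts are ordinary integer operations.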
  • the following section describes methods and devices for efficiently performing successive difference operations.
  • the section provides an example in which the successive difference operations are successive kernel update delta operations performed on a signal.
  • the signal may be a matrix of pixel data, wherein the center pixel is updated.
  • A is a kernel matrix that is to be applied to the signal (e.g., represented as a signal matrix)
  • $\alpha_0$ through $\alpha_{n-1}$ are parameters
  • C is a Kronecker delta matrix.
  • a Kronecker delta matrix may be defined as a matrix consisting of zeros at all positions except for a position of interest, which consists of a one. The position of interest may be the center of a matrix or other position as desired.
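A minimal sketch of such a matrix, with a hypothetical function name and the center chosen as the default position of interest:

```python
def kronecker_delta_matrix(rows, cols, position=None):
    """Build a matrix of zeros with a single one at the position of interest."""
    if position is None:
        position = (rows // 2, cols // 2)  # default: center entry
    return [[1 if (r, c) == position else 0 for c in range(cols)]
            for r in range(rows)]


delta = kronecker_delta_matrix(3, 3)

# Piecewise multiplying by a signal matrix and summing picks out one entry.
signal = [[1, 2, 3],
          [4, 5, 6],
          [7, 8, 9]]
picked = sum(d * s for d_row, s_row in zip(delta, signal)
             for d, s in zip(d_row, s_row))
```

The last lines show why the matrix is useful here: applying it to a signal matrix returns only the entry at the position of interest.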
  • the term "matrix" includes any n by m matrix, wherein "n" and "m" are any integers.
  • the term matrix includes a vector.
  • the kernel matrix is updated by each successive difference equation.
  • $A_0$ is the initial kernel matrix and $A_n$ is a final matrix.
  • the desired processing is that of applying the final matrix $A_n$ to the signal matrix.
  • the parameter in each equation may be different from each other.
  • the parameter in one equation may be a suitable value to reduce noise in the signal, whereas the parameter in another equation may be a suitable value to compensate for lens shading properties.
  • the parameters may change. For example, if the parameter is based on an amount of noise in the signal, the amount of noise could be different in a different region of the signal.
  • the application of the final kernel $A_n$ to the signal matrix improves image contrast, in an embodiment.
  • FIG. 10 is a flowchart illustrating steps of a process 1000 of updating a first entry in a signal matrix, I, based on a kernel matrix and a succession of first order difference equations, in accordance with an embodiment.
  • the first order difference equations (Eq. 1 - 3 above) will be used to help illustrate process 1000.
  • process 1000 is not limited to those particular difference equations.
  • In step 1002, a parameter from each of the first order difference equations is multiplied together to generate a "parameter product". For example, the alphas are multiplied together to form the parameter product.
  • In step 1004, the initial kernel matrix is multiplied by the signal matrix, I, to generate an intermediate matrix, M.
  • Steps 1002 and 1004 are both performed during a first clock cycle, in an embodiment. Steps 1002 and 1004 can be implemented using n+1 multipliers.
  • In step 1006, the entries of the intermediate matrix are summed to generate a matrix sum. The summing is performed during a second clock cycle, in an embodiment. The summing can be implemented using only a single adder.
  • In step 1008, a value that is used to update the first entry of the signal matrix is determined. The value is based on the matrix sum, the parameter product, and the initial value for the first entry. The new value is determined during a third clock cycle, in an embodiment.
  • processing of the next signal matrix begins by performing steps 1002 and 1004 using the next signal matrix and, possibly, new difference equations.
  • the parameters in the difference equations are allowed to change for each new signal matrix.
  • one of the parameters might be based on an amount of noise in the signal, in which case the noise can change from one signal matrix to the next.
  • one of the parameters might be based on a lens shading profile of the lens used to capture the data represented in the signal matrix.
  • the same hardware that was used during the first clock cycle is re-used for processing the second pixel, in an embodiment.
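The steps of process 1000 can be sketched in software. Equations 1 through 3 are not reproduced in this text, so the sketch assumes each difference equation has the form A_{k+1} = alpha_k * A_k + (1 - alpha_k) * C, with C a Kronecker delta matrix centered on the entry being updated; that form is an assumption, and the function names and sample values are hypothetical. The second function applies the assumed equations one at a time so the two results can be compared.

```python
from math import prod


def update_entry(kernel, signal, alphas):
    """Collapsed update: parameter product, piecewise multiply, sum, combine."""
    rows, cols = len(kernel), len(kernel[0])
    center = signal[rows // 2][cols // 2]
    x = prod(alphas)                                         # step 1002
    m = [[kernel[r][c] * signal[r][c] for c in range(cols)]  # step 1004
         for r in range(rows)]
    matrix_sum = sum(sum(row) for row in m)                  # step 1006
    return x * matrix_sum + (1 - x) * center                 # step 1008


def update_entry_stepwise(kernel, signal, alphas):
    """Reference: apply each assumed difference equation to the kernel in turn."""
    rows, cols = len(kernel), len(kernel[0])
    cr, cc = rows // 2, cols // 2
    current = [row[:] for row in kernel]
    for a in alphas:
        current = [[a * current[r][c] + (1 - a) * (1 if (r, c) == (cr, cc) else 0)
                    for c in range(cols)] for r in range(rows)]
    return sum(current[r][c] * signal[r][c]
               for r in range(rows) for c in range(cols))


# Hypothetical inputs.
kernel = [[0.125] * 3 for _ in range(3)]
signal = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]
alphas = [0.5, 0.25]

fast = update_entry(kernel, signal, alphas)
slow = update_entry_stepwise(kernel, signal, alphas)
```

Under the assumed equation form, the collapsed version needs the kernel multiplications only once no matter how many difference equations are chained, which is the property the pipeline relies on.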
  • each of the first order difference equations computes an update to a kernel matrix.
  • the successive operations may be any first order difference equation. Examples of the first order operations include, but are not limited to, first order filtering operations, infinite impulse response (IIR) filters, and averagers.
  • the operations could be part of a multi-stage "smudging" algorithm which blurs an image given some parameters.
  • the operations could be a weighted multiple stage moving average.
  • the signal matrix is not limited to containing pixel data.
  • the signal matrix represents audio data, in an embodiment.
  • the operations are a weighted audio high-pass filter.
  • the signal matrix is a vector of audio data sampled over time.
  • K is an n x n kernel matrix
  • I is a matrix containing image data
  • C is a Kronecker delta matrix (contains all zeroes except for a one at the center).
  • Alpha and beta are two parameters used to modify the kernel matrix.
  • alpha could be used to modify the kernel based on the amount of noise in the image data
  • beta could be used to modify the kernel based on a shading profile of the lens used to capture the image data.
  • alpha and beta could be any parameters.
  • the intermediate kernel could be modified based on beta to generate a final kernel, $K_{final}$.
  • the kernel may be modified by additional parameters (e.g., gamma) to adjust other characteristics in the image matrix I. Then, the final kernel could be piecewise multiplied by the image matrix, I, to generate a modified image matrix M.
  • K is an n x n matrix
  • the above could potentially be performed with n+1 products for Equations 4 and 5, and n products for Equation 6. Therefore, for real time implementation, this would require more than 3n parallel multipliers to implement. Alternatively, this could be performed with a tripling of clock frequency and re-using hardware. However, such a high frequency may be impractical or impossible to achieve.
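A short derivation shows why the successive kernel updates need not be computed one after another. Equations 4 through 6 are not reproduced in this text, so the form below is an assumption: each update is taken to blend the kernel with the Kronecker delta matrix C, a form chosen to agree with the terms $(1 - x)\,i_{cent}$ and $x\,i$ that appear in the hardware description. Treat it as an illustration rather than the patent's exact equations.

```latex
\begin{align*}
K_1 &= \alpha K + (1-\alpha)\,C \\
K_{final} &= \beta K_1 + (1-\beta)\,C
          = \alpha\beta\,K + (1-\alpha\beta)\,C \\
\sum_{j,k} (K_{final})_{jk}\, I_{jk}
  &= \alpha\beta \sum_{j,k} K_{jk} I_{jk} + (1-\alpha\beta)\, i_{cent}
\end{align*}
```

Under this assumption only the product $\alpha\beta$ and the n entry products $K_{jk} I_{jk}$ are needed, so the kernel never has to be rewritten between stages.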
  • FIGS. 11A and 11B depict hardware for performing kernel computations in accordance with an embodiment.
  • one multiplier 1102A is used to compute the product, "x", of alpha and beta.
  • N multipliers 1104A are used to piecewise multiply the entries of the initial kernel matrix, $K_{jk}$, by the entries of the signal matrix, $I_{jk}$, to generate entries of an intermediate matrix "$N_{jk}$".
  • the matrix sum i is multiplied by the value of x determined during clock cycle 0 by multiplier 1110A.
  • a new value, "$i_{new}$", is determined for the first entry by summing (at adder 1112A) the product of the matrix sum (i) and the parameter product (x) with the intermediate product "z".
  • the new pixel value, $i_{new}$, for the center pixel may be calculated as: $i_{new} = x \cdot i + (1 - x) \cdot i_{cent}$
  • the hardware depicted in FIG. 11A comprises n+3 multipliers and n+1 adders.
  • a first stage includes "n + 1" multipliers 1102A, 1104A, wherein the first stage is operable to multiply a parameter (e.g., alpha, beta) from each of the first order difference equations to generate a parameter product ("x"), and multiply the kernel matrix (K) by the signal matrix (I) to generate "n" entry products ($N_{jk}$).
  • a second stage is coupled to the first stage and includes n adders 1108A, wherein the second stage is operable to sum the "n" entry products $N_{jk}$ to generate a matrix sum (i).
  • a multiplier 1106A generates intermediate product z from the product of $i_{cent}$ and the term $1 - x$.
  • a third stage is coupled to the second stage and includes an adder 1112A. The third stage is operable to determine a new value ($i_{new}$) for the signal, based on the matrix sum, the parameter product, and the initial value for the signal. Note that the initial value of the signal is processed by the second stage to produce "z", which is input to the adder 1112A of the third stage.
  • Multiplier 1110A generates the other input to adder 1112A as the product of the matrix sum i and the product x.
  • FIG. 11B is similar to FIG. 11A, except that equation 9A is changed, resulting in equation 9B: $i_{new} = x \cdot (i - i_{cent}) + i_{cent}$
  • FIG. 11B also differs from FIG. 11A in that FIG. 11B uses one fewer multiplier and includes a subtractor to calculate a difference at 1126B during Clock Cycle 1.
  • one multiplier 1102B is used to compute the product, "x", of alpha, beta, and a third parameter gamma.
  • N multipliers 1104B are used to piecewise multiply the entries of the initial kernel matrix, $K_{jk}$, by the entries of the signal matrix, $I_{jk}$, to generate entries of an intermediate matrix "$N_{jk}$".
  • the entries of the intermediate matrix, $N_{jk}$, are summed by the adder 1108B to generate a matrix sum "i". The summation of the entries produces: $i = \sum_{j,k} N_{jk}$
  • a new value, "$i_{new}$", is determined for the first entry by summing (at adder 1112B) the intermediate product z from multiplier 1110B and the initial value of "$i_{cent}$". Based on clock cycles 1 and 2, the new pixel value, $i_{new}$, for the center pixel may be calculated as: $i_{new} = x \cdot (i - i_{cent}) + i_{cent}$
  • the hardware depicted in FIG. 11B comprises n+2 multipliers and n+2 adders (or n+1 adders and a subtractor).
  • the difference operation 1126B may be implemented with a dedicated subtractor or an adder with the $i_{cent}$ input complemented.
  • a first stage includes "n + 1" multipliers 1102B, 1104B, wherein the first stage is operable to multiply a parameter (e.g., alpha, beta, gamma) from each of the first order difference equations to generate a parameter product ("x"), and multiply the kernel matrix (K) by the signal matrix (I) to generate "n" entry products ($N_{jk}$).
  • a second stage is coupled to the first stage and includes the aforementioned subtractor 1126B and adders 1108B that are operable to sum the "n" entry products $N_{jk}$ to generate a matrix sum (i).
  • a third stage is coupled to the second stage and includes a multiplier 1110B to generate intermediate product z from the product of the values x and $i - i_{cent}$.
  • an adder 1112B is operable to determine a new value ($i_{new}$) for the signal, based on the matrix sum, the parameter product, and the initial value for the signal. Note that the initial value of the signal is processed by the third stage to produce intermediate product "z", which is input to the adder 1112B of the third stage.
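The two datapaths differ only in how the final combination is arranged, which a small software model can illustrate. The function names and input values below are hypothetical. The first function follows the FIG. 11A description (z is the product of the center value and 1 - x, added to the product of i and x); the second follows the FIG. 11B description (z is the product of x and the difference i - i_cent, added to the center value).

```python
from math import prod


def new_value_11a_style(entry_products, i_cent, params):
    """Two multiplications after the parameter product: i * x and i_cent * (1 - x)."""
    x = prod(params)
    i = sum(entry_products)
    z = i_cent * (1 - x)
    return i * x + z


def new_value_11b_style(entry_products, i_cent, params):
    """One multiplication after the parameter product: x * (i - i_cent)."""
    x = prod(params)
    i = sum(entry_products)
    z = x * (i - i_cent)
    return z + i_cent


# Hypothetical entry products K_jk * I_jk, center value, and parameters.
entry_products = [0.125 * v for v in range(1, 10)]
i_cent = 5
params = (0.5, 0.25)

value_a = new_value_11a_style(entry_products, i_cent, params)
value_b = new_value_11b_style(entry_products, i_cent, params)
```

Both arrangements are algebraically the same expression, so the rearranged form trades one multiplier for a subtractor without changing the result, apart from rounding differences in floating-point or fixed-point arithmetic.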
  • FIG. 13 illustrates a block diagram for an example mobile device 1300 in which embodiments of the present invention may be implemented.
  • Mobile device 1300 comprises a camera assembly 1302, camera and graphics interface 1380, and a communication circuit 1390.
  • Camera assembly 1370 includes camera lens 1336, image sensor 1372, and image processor 1374.
  • Camera lens 1336, comprising a single lens or a plurality of lenses, collects and focuses light onto image sensor 1372.
  • Image sensor 1372 captures images formed by light collected and focused by camera lens 1336.
  • Image sensor 1372 may be any conventional image sensor, such as a charge-coupled device (CCD) or a complementary metal oxide semiconductor (CMOS) image sensor.
  • Image processor 1374 processes raw image data captured by image sensor 1372 for subsequent storage in memory 1396, output to a display 1326, and/or for transmission by communication circuit 1390.
  • the image processor 1374 may be a conventional digital signal processor programmed to process image data, which is well known in the art.
  • Image processor 1374 interfaces with communication circuit 1390 via camera and graphics interface 1380.
  • Communication circuit 1390 comprises antenna 1312, transceiver 1393, memory 1396, microprocessor 1392, input/output circuit 1394, audio processing circuit 1306, and user interface 1397.
  • Transceiver 1393 is coupled to antenna 1312 for receiving and transmitting signals.
  • Transceiver 1393 is a fully functional cellular radio transceiver, which may operate according to any known standard, including the standards known generally as the Global System for Mobile Communications (GSM), TIA/EIA-136, cdmaOne, cdma2000, UMTS, and Wideband CDMA.
  • the image processor 1374 may process images acquired by the sensor 1372 using one or more embodiments described herein.
  • the image processor 1374 can be implemented in hardware, software, or some combination of software and hardware.
  • the image processor 1374 could be implemented as part of an application specific integrated circuit (ASIC).
  • the image processor 1374 may be capable of accessing instructions that are stored on a computer readable medium and executing those instructions on a processor, in order to implement one or more embodiments of the present invention.
  • Microprocessor 1392 controls the operation of mobile device 1300, including transceiver 1393, according to programs stored in memory 1396. Microprocessor 1392 may further execute portions or the entirety of the image processing embodiments disclosed herein. Processing functions may be implemented in a single microprocessor, or in multiple microprocessors. Suitable microprocessors may include, for example, both general purpose and special purpose microprocessors and digital signal processors. Memory 1396 represents the entire hierarchy of memory in a mobile communication device, and may include both random access memory (RAM) and read-only memory (ROM). Computer program instructions and data required for operation are stored in non-volatile memory, such as EPROM, EEPROM, and/or flash memory, which may be implemented as discrete devices, stacked devices, or integrated with microprocessor 1392.
  • Input/output circuit 1394 interfaces microprocessor 1392 with image processor
  • Camera and graphics interface 1380 may also interface image processor 1374 with user interface 1397 according to any method known in the art.
  • input/output circuit 1394 interfaces microprocessor 1392, transceiver 1393, audio processing circuit 1306, and user interface 1397 of communication circuit 1390.
  • User interface 1397 includes a display 1326, speaker 1328, microphone 1338, and keypad 1340.
  • Display 1326, disposed on the back of the display section, allows the operator to see dialed digits, images, call status, menu options, and other service information.
  • Keypad 1340 includes an alphanumeric keypad and may optionally include a navigation control, such as joystick control (not shown) as is well known in the art.
  • keypad 1340 may comprise a full QWERTY keyboard, such as those used with palmtop computers or smart phones. Keypad 1340 allows the operator to dial numbers, enter commands, and select options.
  • Microphone 1338 converts the user's speech into electrical audio signals. Audio processing circuit 1306 accepts the analog audio inputs from microphone 1338, processes these signals, and provides the processed signals to transceiver 1393 via input/output 1394. Audio signals received by transceiver 1393 are processed by audio processing circuit 1306. The basic analog output signals produced by audio processing circuit 1306 are provided to speaker 1328. Speaker 1328 then converts the analog audio signals into audible signals that can be heard by the user.
  • camera and graphics interface 1380 may be combined.
  • camera and graphics interface 1380 may be incorporated with input/output circuit 1394.
  • microprocessor 1392, input/output circuit 1394, audio processing circuit 1306, image processor 1374, and/or memory 1396 may be incorporated into a specially designed application-specific integrated circuit (ASIC) 1391.
  • FIG. 12 is a block diagram that illustrates a computer system 1200 upon which an embodiment of the invention may be implemented.
  • Computer system 1200 includes a bus 1202 or other communication mechanism for communicating information, and a processor 1204 coupled with bus 1202 for processing information.
  • Computer system 1200 also includes a main memory 1206, such as a random access memory (RAM) or other dynamic storage device, coupled to bus 1202 for storing information and instructions to be executed by processor 1204.
  • Main memory 1206 also may be used for storing temporary variables or other intermediate information during execution of instructions to be executed by processor 1204.
  • Computer system 1200 further includes a read only memory (ROM) 1208 or other static storage device coupled to bus 1202 for storing static information and instructions for processor 1204.
  • a storage device 1210 such as a magnetic disk or optical disk, is provided and coupled to bus 1202 for storing information and instructions.
  • Computer system 1200 may be coupled via bus 1202 to a display 1212, such as a cathode ray tube (CRT), for displaying information to a computer user.
  • An input device 1214 is coupled to bus 1202 for communicating information and command selections to processor 1204.
  • Another type of user input device is cursor control 1216, such as a mouse, a trackball, or cursor direction keys, for communicating direction information and command selections to processor 1204 and for controlling cursor movement on display 1212.
  • This input device typically has two degrees of freedom in two axes, a first axis (e.g., x) and a second axis (e.g., y), that allows the device to specify positions in a plane.
  • the computer system 1200 may further include an audio/video input device 1215 such as a microphone or camera to supply audible sounds, still images, or motion video, any of which may be processed using the
  • Various processing techniques disclosed herein may be implemented to process data on a computer system 1200. According to one embodiment of the invention, those techniques are performed by computer system 1200 in response to processor 1204 executing one or more sequences of one or more instructions contained in main memory 1206. Such instructions may be read into main memory 1206 from another machine-readable medium, such as storage device 1210. Execution of the sequences of instructions contained in main memory 1206 causes processor 1204 to perform the process steps described herein.
  • hard-wired circuitry may be used in place of or in combination with software instructions to implement the invention. Thus, embodiments of the invention are not limited to any specific combination of hardware circuitry and software.
  • The term "machine-readable medium" refers to any medium that participates in providing data that causes a machine to operate in a specific fashion.
  • various machine-readable media are involved, for example, in providing instructions to processor 1204 for execution.
  • Such a medium may take many forms, including but not limited to storage media and transmission media.
  • Storage media includes both non-volatile media and volatile media.
  • Non-volatile media includes, for example, optical or magnetic disks, such as storage device 1210.
  • Volatile media includes dynamic memory, such as main memory 1206.
  • Transmission media includes coaxial cables, copper wire and fiber optics, including the wires that comprise bus 1202.
  • Transmission media can also take the form of acoustic or light waves, such as those generated during radio-wave and infra-red data communications. All such media must be tangible to enable the instructions carried by the media to be detected by a physical mechanism that reads the instructions into a machine.
  • Machine-readable media include, for example, a floppy disk, a flexible disk, hard disk, magnetic tape, or any other magnetic medium, a CD-ROM, any other optical medium, punch cards, paper tape, any other physical medium with patterns of holes, a RAM, a PROM, an EPROM, a FLASH-EPROM, any other memory chip or cartridge, a carrier wave as described hereinafter, or any other medium from which a computer can read.
  • Various forms of machine-readable media may be involved in carrying one or more sequences of one or more instructions to processor 1204 for execution. For example, the instructions may initially be carried on a magnetic disk of a remote computer.
  • the remote computer can load the instructions into its dynamic memory and send the instructions over a telephone line using a modem.
  • a modem local to computer system 1200 can receive the data on the telephone line and use an infra-red transmitter to convert the data to an infra-red signal.
  • An infra-red detector can receive the data carried in the infra-red signal and appropriate circuitry can place the data on bus 1202.
  • Bus 1202 carries the data to main memory 1206, from which processor 1204 retrieves and executes the instructions.
  • the instructions received by main memory 1206 may optionally be stored on storage device 1210 either before or after execution by processor 1204.
  • Computer system 1200 also includes a communication interface 1218 coupled to bus 1202.
  • Communication interface 1218 provides a two-way data communication coupling to a network link 1220 that is connected to a local network 1222.
  • communication interface 1218 may be an integrated services digital network (ISDN) card or a modem to provide a data communication connection to a corresponding type of telephone line.
  • communication interface 1218 may be a local area network (LAN) card to provide a data communication connection to a compatible LAN.
  • Wireless links may also be implemented.
  • communication interface 1218 sends and receives electrical, electromagnetic or optical signals that carry digital data streams representing various types of information.
  • Network link 1220 typically provides data communication through one or more networks to other data devices.
  • network link 1220 may provide a connection through local network 1222 to a host computer 1224 or to data equipment operated by an Internet Service Provider (ISP) 1226.
  • ISP 1226 in turn provides data communication services through the world wide packet data communication network now commonly referred to as the "Internet" 1228.
  • Internet 1228 uses electrical, electromagnetic or optical signals that carry digital data streams.
  • the signals through the various networks and the signals on network link 1220 and through communication interface 1218, which carry the digital data to and from computer system 1200, are exemplary forms of carrier waves transporting the information.
  • Computer system 1200 can send messages and receive data, including program code, through the network(s), network link 1220 and communication interface 1218.
  • a server 1230 might transmit a requested code for an application program through Internet 1228, ISP 1226, local network 1222 and communication interface 1218.
  • the received code may be executed by processor 1204 as it is received, and/or stored in storage device 1210, or other non-volatile storage for later execution. In this manner, computer system 1200 may obtain application code in the form of a carrier wave.
  • Data that is processed by the embodiments of program code as described herein may be obtained from a variety of sources, including but not limited to an A/V input device 1215, storage device 1210, and communication interface 1218.

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Mathematical Physics (AREA)
  • Computational Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • Mathematical Analysis (AREA)
  • Mathematical Optimization (AREA)
  • Pure & Applied Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • Algebra (AREA)
  • Databases & Information Systems (AREA)
  • Software Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • Computing Systems (AREA)
  • Image Processing (AREA)

Abstract

A method and apparatus for efficiently performing digital signal processing are provided. In one embodiment, kernel matrix computations are simplified by grouping similar kernel coefficients together. Each coefficient group contains only coefficients that have the same value. At least one of the coefficient groups has at least two coefficients. Techniques are also provided for efficiently applying successive first-order difference operations to a data signal. The techniques reduce the number of gates. In particular, the techniques reduce the number of multipliers without increasing the clock frequency, in one embodiment. The techniques update pixels of a data signal at a rate of two clock cycles per pixel, in one embodiment. The techniques allow hardware that is used to process a first pixel to be reused to begin processing a second pixel while the first pixel is still being processed.
PCT/US2008/062949 2008-05-07 2008-05-07 Efficient implementations of kernel computation WO2009136923A2 (fr)
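
The coefficient grouping described in the abstract can be illustrated with a small software sketch. This is not the gate-level design of the patent; it only shows the arithmetic idea under stated assumptions: the 3x3 kernel, the pixel window, and the function names `group_coefficients`, `apply_kernel_grouped`, and `apply_kernel_direct` are all hypothetical and chosen for illustration. Entries that share a coefficient value are summed first, so one multiplication is needed per group rather than one per kernel entry.

```python
from collections import defaultdict


def group_coefficients(kernel):
    """Map each distinct coefficient value to the (row, col) positions holding it."""
    groups = defaultdict(list)
    for r, row in enumerate(kernel):
        for c, coeff in enumerate(row):
            groups[coeff].append((r, c))
    return dict(groups)


def apply_kernel_grouped(window, groups):
    """Sum the window entries of each group, then multiply once per group."""
    total = 0
    for coeff, positions in groups.items():
        group_sum = sum(window[r][c] for r, c in positions)
        total += coeff * group_sum  # one multiplication per group
    return total


def apply_kernel_direct(window, kernel):
    """Reference computation: one multiplication per kernel entry."""
    return sum(
        k * w
        for kernel_row, window_row in zip(kernel, window)
        for k, w in zip(kernel_row, window_row)
    )


# Hypothetical symmetric kernel with three distinct values (1, 2, 4).
kernel = [
    [1, 2, 1],
    [2, 4, 2],
    [1, 2, 1],
]
# Hypothetical 3x3 pixel window.
window = [
    [10, 20, 30],
    [40, 50, 60],
    [70, 80, 90],
]

groups = group_coefficients(kernel)
grouped_result = apply_kernel_grouped(window, groups)
direct_result = apply_kernel_direct(window, kernel)
```

For this assumed kernel, nine entries collapse into three groups, so the grouped path performs three multiplications where the direct path performs nine, and both paths produce the same sum. Integer coefficients are used here so that equal values can be grouped by exact comparison.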

Priority Applications (1)

Application Number Priority Date Filing Date Title
PCT/US2008/062949 WO2009136923A2 (fr) 2008-05-07 2008-05-07 Efficient implementations of kernel computation

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
PCT/US2008/062949 WO2009136923A2 (fr) 2008-05-07 2008-05-07 Efficient implementations of kernel computation

Publications (2)

Publication Number Publication Date
WO2009136923A2 (fr) 2009-11-12
WO2009136923A3 (fr) 2011-09-09

Family

ID=41265199

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/US2008/062949 WO2009136923A2 (fr) 2008-05-07 2008-05-07 Efficient implementations of kernel computation

Country Status (1)

Country Link
WO (1) WO2009136923A2 (fr)

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20050114418A1 (en) * 2003-11-24 2005-05-26 John E. Rosenstengel Efficient convolution method with radially-symmetric kernels
WO2008124741A2 (fr) * 2007-04-09 2008-10-16 Tessera, Inc. Efficient implementations of kernel computation
WO2008128772A2 (fr) * 2007-04-24 2008-10-30 Tessera Technologies Hungary Kft. Techniques for adjusting the effect of applying kernels to signals to achieve a desired effect on signals
US20090077359A1 (en) * 2007-09-18 2009-03-19 Hari Chakravarthula Architecture re-utilizing computational blocks for processing of heterogeneous data streams

Non-Patent Citations (5)

* Cited by examiner, † Cited by third party
Title
CASTRO-PAREJA C R ET AL: "FPGA-based real-time anisotropic diffusion filtering of 3D ultrasound images", REAL-TIME IMAGING IX, PROCEEDINGS OF SPIE-IS&T ELECTRONIC IMAGING VOL. 5671, 18 January 2005 (2005-01-18), pages 123-131, XP002635470, *
HON KWEUNG KWAN ET AL: "Design of multidimensional spherically symmetric and constant group delay recursive digital filters with sum of powers-of-two coefficients", IEEE TRANSACTIONS ON CIRCUITS AND SYSTEMS, vol. 37, no. 8, August 1990 (1990-08), pages 1027-1035, XP002635471, *
JIHONG KIM ET AL: "Efficient 2-D convolution algorithm with the Single-Data Multiple Kernel approach", CVGIP GRAPHICAL MODELS AND IMAGE PROCESSING, vol. 57, no. 2, March 1995 (1995-03), pages 175-182, XP004419048, ISSN: 1077-3169, DOI: DOI:10.1006/GMIP.1995.1017 *
LEE Y H ET AL: "GA-based design of multiplierless 2-D state-space digital filters with low roundoff noise", IEE PROCEEDINGS: CIRCUITS, DEVICES AND SYSTEMS, vol. 145, no. 2, 8 April 1998 (1998-04-08) , pages 118-124, XP006010755, ISSN: 1350-2409, DOI: DOI:10.1049/IP-CDS:19981845 *
PORIKLI F: "Reshuffling: a fast algorithm for filtering with arbitrary kernels", REAL-TIME IMAGE PROCESSING 2008, PROCEEDINGS OF SPIE-IS&T ELECTRONIC IMAGING VOL. 6811, 28 January 2008 (2008-01-28), pages 68110M-1-68110M-10, XP002524200, *

Also Published As

Publication number Publication date
WO2009136923A3 (fr) 2011-09-09

Similar Documents

Publication Publication Date Title
US8417759B2 (en) Efficient implementations of kernel computations
US20090077359A1 (en) Architecture re-utilizing computational blocks for processing of heterogeneous data streams
KR101526031B1 (ko) Techniques for reducing noise while preserving contrast in an image
CN103561206B (zh) Image processing method and apparatus
EP1601184A1 (fr) Method and apparatus for locally adaptive image processing filters
CN109978788B (zh) Convolutional neural network generation method, image demosaicing method, and related apparatus
US20070071353A1 (en) Denoising method, apparatus, and program
US8503828B2 (en) Image processing device, image processing method, and computer program for performing super resolution
CN101466046A (zh) Method and apparatus for removing color noise from an image signal
US8073282B2 (en) Scaling filter for video sharpening
US9715720B1 (en) System and method for reducing image noise
CN111429458B (zh) Image restoration method, apparatus, and electronic device
WO2009136923A2 (fr) Efficient implementations of kernel computation
JP2008523489A (ja) Method and apparatus for changing image size
CN113793358B (zh) Target tracking and positioning method, apparatus, and computer-readable medium
JP4323808B2 (ja) Two-dimensional pyramid filter architecture
US8666172B2 (en) Providing multiple symmetrical filters
CN111383171B (zh) Picture processing method, system, and terminal device
CN101133430A (zh) Image contrast and sharpness enhancement
KR101775273B1 (ko) Method and system for obtaining correlation between images
US20230368496A1 (en) Super resolution device and method
Karthik et al. Design and implementation of adaptive Gaussian filters for the removal of salt and pepper noise on FPGA
CN101421760A (zh) Image scaling method and apparatus
CN113610725A (zh) Picture processing method, apparatus, electronic device, and storage medium
GB2356506A (en) Multiple dimension interpolator

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 08769321

Country of ref document: EP

Kind code of ref document: A2

NENP Non-entry into the national phase in:

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 08769321

Country of ref document: EP

Kind code of ref document: A2