CN1338090A

CN1338090A - Digital camera using programmed parallel computer for image processing functions and control

Info

Publication number: CN1338090A
Application number: CN99816096.2A
Authority: CN
Inventors: 托德·E·罗克奥夫; 罗伯特·兰格; 默里·华莱士
Original assignee: INTENSYS Corp
Current assignee: INTENSYS Corp
Priority date: 1998-12-15
Filing date: 1999-12-15
Publication date: 2002-02-27
Also published as: EP1141891A1; TW429331B; WO2000036562A9; JP2002532810A; WO2000036562A1; AU2362200A

Abstract

A digital camera apparatus includes a sensor that generates image data. The apparatus further includes a parallel processor to process the sensed image data. Programmed parallel computing circuitry accomplishes compute-intensive image processing functions on the generated image data. Using semiconductor-efficient programmed parallel computing structures, the ratio of performance to hardware cost is maximized in the digital imaging apparatus, while enabling a great degree of functional flexibility and product diversity both within and across digital imaging product categories. In particular embodiments, the programmed parallel computing structures are instruction-cached SIMD computers.

Description

Digital camera using a programmed parallel computer for image processing functions and control

The present invention relates to digital cameras and, in particular, to a digital camera that uses a programmed parallel computer to implement image processing functions.

Digital cameras offer significant advantages to both amateurs and professionals in functionality, reliability, convenience, and cost. For example, exposed film generally must be chemically developed before the image can be seen, which is a time-consuming and expensive process. Digital camera images, by contrast, can be viewed directly on the camera's LCD, viewed on a computer, printed on a color printer, or shared over the Internet. Despite these advantages, digital cameras are still used relatively little compared with mainstream consumer products and account for only a small share of market sales. Digital cameras have not been adopted rapidly because current models have relatively high cost and relatively low image quality.

Digital processing of image sequences (video) is especially computation-intensive compared with processing still images. At 30 frames per second, the computing power required for real-time digital video processing is tens of times that required for still-image processing. Although digital video is currently used in video conferencing and in consumer products including DVD and camcorders, the relatively low image quality and high cost currently associated with digital video, compared with digital still photography, mean that its widespread adoption is still far in the future.

Likewise, conventional solutions for digital imaging applications generally include expensive special-purpose circuits that perform specific computation-intensive image processing. For example, Fig. 1 is a block diagram of the DCAM-101 "single chip for digital cameras" made by LSI Logic of Milpitas, California. As Fig. 1 shows, the DCAM-101 uses separate hardware circuits to perform gamma correction, color space conversion, and JPEG encoding and decoding. Fig. 2 illustrates another (generic) digital camera image processing scheme, in which a separate hardware circuit is used for each of several image processing functions. This fragmentation of digital imaging functions has hindered their development, because it prevents these products from easily exchanging image data and limits the economies of scale available to product manufacturers.

Moreover, even where digital camera image processing functions are programmed (rather than built as special-purpose hardware circuits), and even where they are programmed for a parallel computer, conventional programmed image processors still use one processing element for each pixel of an image scan line. See, for example, Allan L. Fisher, Peter T. Highnam, and Todd E. Rockoff, "A Four-Processor Building Block for SIMD Processor Arrays," IEEE Journal of Solid-State Circuits, vol. 25, no. 2, April 1990, pp. 369-375. That article discloses a scan line array processor (SLAP) architecture (though not for use in digital cameras) consisting of a linear array of processors (processing elements, or "PEs") controlled in single instruction, multiple data (SIMD) fashion by broadcast instructions. Fig. 3 is a block diagram of the resulting generalized SLAP topology. Scan-line data is shifted serially into the stages of pixel data shift register 302, one stage per scan-line pixel, and is then transferred in parallel to the PEs, PE1 through PEP, which operate concurrently according to the broadcast instructions.

Fig. 4 is a block diagram illustrating the distribution of image data to the PEs of a SLAP, and Figs. 5A-5E illustrate the three-stage pipeline operation of the SLAP. Unfortunately, as the resolution of the processed image increases, the number of processors required also increases, and a large number of processors is unsuitable for portable devices such as digital cameras.

The present invention is a digital camera device. The device includes a detector that generates image data and a parallel processor that processes the detected image data. Programmed parallel computing circuitry performs computation-intensive image processing functions on the generated image data. By using semiconductor-efficient programmed parallel computing structures, the invention maximizes the ratio of performance to hardware cost in digital imaging devices, while providing great functional flexibility and enabling many different products within the digital imaging field. In certain embodiments, the programmed parallel computing structure is an instruction-cached SIMD computer.

According to another aspect of the present invention, the parallel computer used to process the image data has fewer processing elements than there are pixels in a scan line of the image.

Fig. 1 is a block diagram illustrating a conventional single-chip processor for a digital camera;

Fig. 2 is a block diagram illustrating a conventional digital image processing arrangement for a digital camera;

Fig. 3 is a block diagram illustrating the topology of a conventional scan line array processor suitable for various image computations;

Fig. 4 illustrates the distribution of image data to the PEs in a conventional scan line array processor, such as the SLAP shown in Fig. 3;

Figs. 5A-5E illustrate the conventional operation of the three-stage image data pipeline of a scan line array processor;

Fig. 6 is a functional block diagram of a digital imaging device in accordance with an embodiment of the invention, including a programmed parallel computer applied to a digital camera;

Fig. 7 is a functional block diagram of a single-chip digital imager with a single instruction-cached SIMD PE module having fewer PEs than there are pixels in a scan line;

Fig. 8 is a functional block diagram of a single-chip digital imager with multiple instruction-cached SIMD PE modules, at least some of which have fewer PEs than the scan-line pixels assigned to that module;

Fig. 9 is a functional block diagram of a digital imaging system that includes multiple instances of an instruction-cached SIMD chip, presented as an example for a higher-priced digital video camcorder;

Fig. 10 illustrates one embodiment of an enhanced scan line array processor image data pipeline having fewer PEs than pixels in each scan line; and

Fig. 11 illustrates a second embodiment of an enhanced scan line array processor image data pipeline having fewer PEs than pixels in each scan line.

According to a broad aspect of the invention, the image processing functions of a digital camera are implemented by a programmed parallel computer. An important fact underlying this aspect, one not yet fully exploited, is that digital imaging functions are scalably data-parallel. This makes them well suited to efficient implementation on programmed parallel computers with hundreds or thousands of processing elements (PEs). One measure of efficiency is the speedup a parallel computer achieves relative to a single processor; the maximum speedup of a parallel computer with N PEs is N.

One embodiment of a digital camera that uses a programmed parallel computer for image processing is shown in Fig. 6. Referring to Fig. 6, an image is focused through lens 602 onto detector 604, for example a charge-coupled device (CCD), which generates analog signals corresponding to the image. The analog signals pass through A/D converter circuit 606, producing a digitized form of the image. Digital image (pixel) data from A/D converter circuit 606 is provided through multiplexer 610 to input data port 612 of parallel computer 608. Parallel computer 608 operates on the digitized pixel data and then provides the processed data to image data output port 614. In addition, pixel data operated on by parallel computer 608 can be used to control the digital camera itself, for example to control image acquisition.

Output data port 614 and input data port 612 of parallel computer 608 are connected to bus 616 through multiplexer 610. Various other circuits are also connected to bus 616, including microprocessor 618 (with associated ROM 620 and RAM 621), general-purpose I/O circuitry 622 connected to external devices, serial I/O circuitry 624 connected to the serial port of a personal computer, electrical interface 630 connected to LCD 632, and analog converter interface 634, which sends NTSC/PAL video signals to a television. Bus 616 is also connected to the control/status port of parallel computer 608 and to electrical interface 636, which controls image detector 604.

Finally, parallel computer 608 also includes memory controller 638, which is connected to DRAM 640 (or other RAM), and inter-PE ("PE" meaning processing element) communication interface 642, which connects the inter-PE communication network across multiple chips. In general, parallel computer 608 consists of many PEs connected by an internal communication network. Although the specific structure of the inter-PE communication network is usually considered a key feature of a parallel computer architecture, this aspect of the invention does not concentrate on it. One topology that appears suitable for a parallel computer 608 used for digital imaging, however, is a linear array, as exemplified by the scan line array processor (SLAP). For background on SLAP, the reader may consult Allan L. Fisher, Peter T. Highnam, and Todd E. Rockoff, "A Four-Processor Building Block for SIMD Processor Arrays," IEEE Journal of Solid-State Circuits, vol. 25, no. 2, April 1990, pp. 369-375.

Parallel computer 608 can perform image analysis, manipulation, and enhancement functions. The set of functions provided, the image size, and the speed at which the functions are applied are the main distinguishing features among digital imaging products. Because parallel computer 608 is programmed (as opposed to a hard-wired ASIC), it not only executes digital camera image processing functions efficiently but also greatly simplifies the development and updating of digital cameras.

Imaging tasks that can be provided include compensating for image detector characteristics (including resolution, aspect ratio, pixel shape, etc.), compensating for image display characteristics (including resolution, aspect ratio, pixel shape, etc.), color correction and color space conversion, image quality enhancement, producing an enhanced viewfinder display, compression and decompression for storage and exchange, and encryption and decryption for image communication.

As discussed in the background section, SLAP provides an image data shifter with one stage for each pixel along the horizontal dimension of the image. The SLAP design matches the serial scan output of image detectors well. It yields a low-cost three-stage image data pipeline in which one scan line of output pixels is shifted out while the output values of a second scan line are computed and a third scan line of image detector data is shifted into the parallel computer for processing.

Some functions that parallel computer 608 can perform in a digital imaging device are listed below, roughly in the order in which they act on the data from detector 604 on its way to processed image data at output port 614.

1) pixel data correction

Functions applied to the digital pixel data received from image detector 604.

A) pixel adjustment

Pixel adjustment requires a calibration value for each pixel. It alters the detected pixel values to compensate for imperfect response characteristics of individual elements of the detector array. Calibration information is obtained by measuring the response to a known image (for example, an image provided on the inside of the lens cap). The adjustment of each pixel in the image depends only on the detected pixel value and the calibration value of the corresponding image detector element.
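A minimal sketch of per-pixel adjustment, assuming (hypothetically) that each detector element has its own linear gain, so that one frame of a known uniform target yields a gain table. The names `calibrate_gains` and `adjust_pixels`, the gain-only model, and the 10-bit output range are illustrative assumptions, not the method of this document. Each output pixel uses only its own raw value and its own calibration value, which is what makes the step trivially data-parallel.

```python
from typing import List

Image = List[List[float]]


def calibrate_gains(flat: Image, target_level: float) -> Image:
    """Per-element gains from the detector's response to a known uniform image.

    Hypothetical gain-only model; elements that read zero (dead) keep gain 1.0.
    """
    return [[target_level / v if v > 0 else 1.0 for v in row] for row in flat]


def adjust_pixels(raw: Image, gains: Image, max_value: int = 1023) -> List[List[int]]:
    """Correct each pixel using only its own raw value and its own gain."""
    return [
        [min(max_value, max(0, round(v * g))) for v, g in zip(r_row, g_row)]
        for r_row, g_row in zip(raw, gains)
    ]
```

A practical detector model would probably also subtract a per-element dark offset; the gain-only form keeps the sketch short.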

B) gamma correction

The response characteristic of an image detector over its dynamic range differs from that of the human eye. Gamma correction nonlinearly transforms the measured pixel values so as to maximize the subjective significance of the least significant bit of the pixel value. Gamma correction of each pixel in the image depends only on the detected pixel value and the desired shape of the response curve; the target response curve is shared among all pixels and does not vary from image to image.
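Because the target curve is shared by all pixels and fixed across images, it can be tabulated once, after which each pixel needs a single table lookup. The sketch below assumes a 10-bit detector, 8-bit output, and a power-law curve with exponent 1/2.2; these parameters and the function names are illustrative assumptions, since the text does not fix a particular curve.

```python
def build_gamma_lut(in_bits: int = 10, out_bits: int = 8, gamma: float = 1 / 2.2) -> list:
    """Tabulate the shared target response curve once for every possible input value."""
    in_max = (1 << in_bits) - 1
    out_max = (1 << out_bits) - 1
    return [round(out_max * (v / in_max) ** gamma) for v in range(in_max + 1)]


def gamma_correct(image, lut):
    """Per-pixel lookup: each output depends only on that pixel's detected value."""
    return [[lut[v] for v in row] for row in image]
```

With an exponent below 1, mid-range detector values are lifted, so more output codes are spent on darker tones, where the eye is more sensitive to steps.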

C) color space transformation

Image detectors generally provide an integer intensity value for each primary color (RGB). From the standpoint of linear algebra, the "basis vectors" R, G, and B are not orthogonal, which means that changing the R value of a pixel also changes its G and B values. Image processing is generally more effective in a representation based on the YCbCr space, where Y represents the pure luminance (brightness) of a pixel and Cb and Cr represent its position in a two-dimensional color plane. YCbCr is an orthogonal basis. Transforming an RGB image into a YCbCr image requires multiplying a 3 × 1 vector by a 3 × 3 transformation matrix at each pixel of the image. The color space transformation of each pixel depends only on the detected pixel value and the values in the transformation matrix, which are fixed and shared by all pixels.
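The per-pixel 3 × 3 matrix product can be sketched as below. The text does not specify the matrix, so this sketch assumes the full-range JFIF (ITU-R BT.601) coefficients, which also add an offset of 128 to the two chroma components; strictly, that makes the step an affine transform rather than a pure matrix product.

```python
# Assumed JFIF / full-range BT.601 coefficients; the document does not give a matrix.
RGB_TO_YCBCR = (
    (0.299, 0.587, 0.114),
    (-0.168736, -0.331264, 0.5),
    (0.5, -0.418688, -0.081312),
)
CHROMA_OFFSET = (0.0, 128.0, 128.0)


def rgb_to_ycbcr(pixel):
    """3x3 matrix times the 3x1 (R, G, B) vector, plus the JFIF chroma offset."""
    return tuple(
        sum(m * comp for m, comp in zip(row, pixel)) + off
        for row, off in zip(RGB_TO_YCBCR, CHROMA_OFFSET)
    )


def convert_image(image):
    """Apply the same fixed matrix independently at every pixel."""
    return [[rgb_to_ycbcr(p) for p in row] for row in image]
```

Since each row of the chroma matrix sums to zero, any gray pixel (R = G = B) maps to Cb = Cr = 128.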

2) image optimization: scene analysis and processing

These functions adjust the detected image so as to improve the quality of the output image.

A) over-sampling (digital zoom)

When an image with higher resolution than that obtained by the image detector is needed, values between the detected pixels can be produced by interpolation. Typical digital cameras use linear interpolation, in which each new pixel is represented by a weighted average of its neighboring pixels. One aspect of the present invention is to use the power of the parallel computer to apply higher-order interpolation algorithms.
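To illustrate the difference, here is a one-dimensional 2× upsampler in two variants (a 2-D zoom would apply it along rows and then columns). The higher-order variant uses Catmull-Rom cubic midpoints, whose 4-tap weights are (-1, 9, 9, -1)/16. This is only one example of a higher-order method; the document does not name a specific algorithm, and edge handling by clamping is an assumption.

```python
def upsample2_linear(row):
    """Insert the average of each neighboring pair (linear interpolation)."""
    out = []
    for i in range(len(row) - 1):
        out += [row[i], (row[i] + row[i + 1]) / 2]
    out.append(row[-1])
    return out


def upsample2_cubic(row):
    """Insert Catmull-Rom cubic midpoints: (-p0 + 9*p1 + 9*p2 - p3) / 16.

    Samples beyond either end are clamped to the nearest edge sample.
    """
    n = len(row)

    def at(i: int):
        return row[min(max(i, 0), n - 1)]

    out = []
    for i in range(n - 1):
        mid = (-at(i - 1) + 9 * at(i) + 9 * at(i + 1) - at(i + 2)) / 16
        out += [row[i], mid]
    out.append(row[-1])
    return out
```

On samples of a smooth curve such as x² at x = 0..4, the cubic midpoint between x = 1 and x = 2 is 2.25, which is exactly 1.5², while linear interpolation gives 2.5.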

B) digital image stabilization

Digital image stabilization compensates for camera motion while a still image is being framed. Camera motion produces an offset that can be compensated by shifting pixels between frames. Elimination of frame-to-frame motion is discussed under the MPEG functions below. Given a motion vector, digital image stabilization of each pixel depends only on that motion vector and on a set of values from a small neighborhood of pixels in the previous frame centered on the pixel.
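A minimal sketch of the compensation step, assuming the motion vector has already been estimated and is a whole number of pixels. With a fractional vector, each output pixel would instead be interpolated from the small neighborhood the text mentions. The function name, sign convention, and edge-replication policy are illustrative assumptions.

```python
def shift_frame(frame, dx: int, dy: int):
    """Return a frame whose pixel (x, y) is taken from (x + dx, y + dy) of `frame`.

    Hypothetical sketch: integer motion vectors only; coordinates that fall
    outside the frame are clamped to the border (edge replication).
    """
    h, w = len(frame), len(frame[0])

    def clamp(v: int, size: int) -> int:
        return min(max(v, 0), size - 1)

    return [[frame[clamp(y + dy, h)][clamp(x + dx, w)] for x in range(w)]
            for y in range(h)]
```

Every output pixel reads from a fixed offset, so all pixels can be computed in the same SIMD step.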

C) advanced functions, for example flicker elimination and framing

Ideally, these functions are performed in the imaging device before the electronic shutter is triggered and the image is committed to memory, so that the image is captured at the desired moment. They analyze various properties of the particular scene to determine how to capture the image. Although the definition of these functions is itself outside the scope of the invention, they require computation-intensive, scalable data-parallel computation performed at high speed.

3) JPEG compression

Many standard methods exist for reducing the bitwise representation of an image. JPEG has several modes of operation: some preserve all of the original detected image data ("lossless" modes), while others discard some information so that the recovered compressed image differs from the original ("lossy" modes). The psycho-visual premise of JPEG's lossy modes is that the human eye is relatively insensitive to the high-frequency spatial components of an image. In other words, when viewing an image with speckles, the eye emphasizes edge information. The lossy mode of the JPEG standard works by analyzing the spatial frequency spectrum of the image and then selectively removing resolution from the high-frequency components, thereby achieving a more compact image representation. The main computational work of JPEG compression is applied to 8 × 8 blocks of pixels, so the intermediate results produced for a given pixel during JPEG compression depend only on the 8 × 8 block containing that pixel.

A) raster-to-block conversion

The first step converts the line-by-line (raster) scan output of the image detector into the 8 × 8 block representation used by JPEG. When the entire image is stored in memory, raster-to-block conversion is done by accessing the stored pixel values appropriately. On-the-fly raster-to-block conversion requires buffering 16 scan lines of pixel values, and 8 × 8 blocks become available at the converter's output only after 8 scan lines of raster data have been received. This buffering can easily be implemented with a SLAP-style linear array computer.
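The streaming behavior can be sketched as a generator that accumulates scan lines and emits blocks only once a full 8-line band is present. This sketch keeps a single band; reading the 16-line figure as double buffering (one band filling while the other is read out) is an interpretation, not something the text states. The function name and the multiple-of-8 width requirement are assumptions.

```python
from typing import Iterable, Iterator, List

Block = List[List[int]]


def raster_to_blocks(scan_lines: Iterable[List[int]], n: int = 8) -> Iterator[Block]:
    """Consume raster scan lines and yield n x n blocks, left to right.

    Blocks for a band become available only after its n-th scan line arrives.
    Assumes the line width is a multiple of n; an incomplete final band is dropped.
    """
    band: List[List[int]] = []
    for line in scan_lines:
        band.append(line)
        if len(band) < n:
            continue
        width = len(band[0])
        if width % n:
            raise ValueError("scan-line width must be a multiple of the block size")
        for x0 in range(0, width, n):
            yield [row[x0:x0 + n] for row in band]
        band = []  # start collecting the next band
```

For a 16 × 16 image this yields four blocks, two after line 8 and two after line 16.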

B) block discrete cosine transform (DCT)

The DCT applied to 8 × 8 blocks of pixels resembles a signal processing function and is computation-intensive. The DCT transforms the spatial representation of the color values into a frequency representation. The frequency representation is the key to applying the psychophysical principle of JPEG compression, which holds that the resolution of high-frequency information matters less to the human eye than that of low-frequency information.

The 8 × 8 DCT is given by the following formula, where C(k) = 1/√2 for k = 0 and C(k) = 1 otherwise:

F(u, v) = \frac{1}{4} C(u) C(v) \sum_{x=0}^{7} \sum_{y=0}^{7} f(x, y) \cos\frac{(2x + 1) u \pi}{16} \cos\frac{(2y + 1) v \pi}{16}

See Gregory K. Wallace, "The JPEG Still Picture Compression Standard," Communications of the ACM, vol. 34, no. 4, April 1991, pp. 30-44.

Like the 2-D FFT, the block DCT is a separable transform. This means that an 8 × 8 DCT consists of eight 1-D DCTs along the columns and eight more 1-D DCTs along the rows. An 8-element 1-D DCT is estimated to need about 20 multiply/add steps. Therefore, the number of multiply/accumulate (MAC) steps in an 8 × 8 DCT is:

20 \frac{\text{MACs}}{\text{1-D DCT}} \times 8 \text{ columns} + 20 \frac{\text{MACs}}{\text{1-D DCT}} \times 8 \text{ rows} = 320 \frac{\text{MACs}}{\text{2-D DCT}}
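Separability can be checked numerically: evaluating the double sum directly, and applying sixteen 1-D DCTs (eight over y, then eight over x), give the same coefficients. In this sketch blocks are indexed `f[x][y]`, and the 1-D transform is the plain O(N²) sum (64 multiplies per transform). It is not one of the fast factorizations that the ~20-MAC estimate above assumes. Each 1-D pass carries the factor C(k)/2, so two passes reproduce the overall 1/4.

```python
import math


def c(k: int) -> float:
    """Normalization factor C(k): 1/sqrt(2) for k = 0, otherwise 1."""
    return 1 / math.sqrt(2) if k == 0 else 1.0


def dct2_direct(f):
    """8x8 DCT evaluated straight from the double-sum formula; returns F[u][v]."""
    return [[0.25 * c(u) * c(v) * sum(
                f[x][y]
                * math.cos((2 * x + 1) * u * math.pi / 16)
                * math.cos((2 * y + 1) * v * math.pi / 16)
                for x in range(8) for y in range(8))
             for v in range(8)]
            for u in range(8)]


def dct1(vec):
    """8-point 1-D DCT scaled by C(k)/2, so two passes give the overall 1/4."""
    return [0.5 * c(k) * sum(val * math.cos((2 * x + 1) * k * math.pi / 16)
                             for x, val in enumerate(vec))
            for k in range(8)]


def dct2_separable(f):
    """Row-column DCT: eight 1-D DCTs over y, then eight over x; returns F[u][v]."""
    g = [dct1(f[x]) for x in range(8)]                              # g[x][v]
    cols = [dct1([g[x][v] for x in range(8)]) for v in range(8)]    # cols[v][u]
    return [[cols[v][u] for v in range(8)] for u in range(8)]
```

For a flat block of value 128, both versions give F(0, 0) = 1024 and (to rounding error) zero for every other coefficient.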

C) quantization

In the quantization step, the JPEG algorithm deliberately discards some information from the compressed image in a visually innocuous way.

Quantization is applied to each coefficient of an 8 × 8 pixel block, using a set of quantization parameters Q(u, v) shared by all blocks in the image. The quantization algorithm is expressed as follows:

F^{Q}(u, v) = \mathrm{IntegerRound}\left(\frac{F(u, v)}{Q(u, v)}\right)

The formula above shows that quantization requires one division per coefficient (that is, per pixel).
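A small sketch of the quantization formula and its decoder-side inverse. The table built by `hypothetical_qtable` is purely illustrative (it only grows coarser toward high frequencies) and is not one of the example tables in the JPEG standard. IntegerRound is taken here as round-to-nearest with halves rounded away from zero, which is an assumption about the tie rule.

```python
import math


def round_half_away(x: float) -> int:
    """Round to the nearest integer, with halves rounded away from zero."""
    return int(math.copysign(math.floor(abs(x) + 0.5), x))


def hypothetical_qtable(step: int = 2):
    """Illustrative table only: coarser quantization steps at higher frequencies."""
    return [[1 + step * (u + v) for v in range(8)] for u in range(8)]


def quantize(F, Q):
    """F^Q(u, v) = IntegerRound(F(u, v) / Q(u, v)): one division per coefficient."""
    return [[round_half_away(F[u][v] / Q[u][v]) for v in range(8)] for u in range(8)]


def dequantize(FQ, Q):
    """Decoder side: the fraction discarded by rounding cannot be recovered."""
    return [[FQ[u][v] * Q[u][v] for v in range(8)] for u in range(8)]
```

With this table, a high-frequency coefficient of 14.0 at (7, 7) has step 29 and quantizes to 0, while the DC term (step 1) is kept almost exactly.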

D) differential pulse code modulation

In the middle of image, the DC of DCT (zero frequency) parameter F (0,0) is carried out differential code.This step need communicate in the middle of the block of pixels of direct neighbor.

E) entropy coding

The quantized DCT coefficients are represented compactly, for example by using Huffman coding. Entropy coding proceeds in two steps: a first, intra-block step in which symbols are assigned to the coefficients, and a second, intra- and inter-block step in which the symbols are converted into bit sequences of varying length. The first step requires no communication among pixel blocks, while the second requires communication between directly adjacent pixel blocks.
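As an illustration of the first, intra-block step, the sketch below follows baseline JPEG. The 63 AC coefficients are read in zigzag order and turned into (zero-run, size category, value) symbols, with a special symbol for a run of 16 zeros (ZRL) and an end-of-block marker (EOB). The document does not prescribe this exact scheme, and the later Huffman bit-packing step is not shown.

```python
def zigzag_order(n: int = 8):
    """(row, col) positions in JPEG zigzag order: by anti-diagonal, alternating direction."""
    cells = [(r, c) for r in range(n) for c in range(n)]
    return sorted(cells, key=lambda rc: (rc[0] + rc[1],
                                         rc[0] if (rc[0] + rc[1]) % 2 else -rc[0]))


def ac_symbols(block):
    """Map quantized AC coefficients to (run, size, value) symbols.

    (15, 0, 0) stands for a run of 16 zeros (ZRL); (0, 0, 0) marks end of block (EOB).
    Size is the bit length of |value|, i.e. the JPEG magnitude category.
    """
    symbols, run = [], 0
    for r, c in zigzag_order()[1:]:  # skip the DC coefficient at (0, 0)
        v = block[r][c]
        if v == 0:
            run += 1
            continue
        while run > 15:
            symbols.append((15, 0, 0))
            run -= 16
        symbols.append((run, abs(v).bit_length(), v))
        run = 0
    if run:
        symbols.append((0, 0, 0))
    return symbols
```

Because quantization zeroes most high-frequency coefficients, a typical block ends in a long zero run that collapses into a single EOB symbol.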

4) MPEG compression

MPEG is a compression standard commonly applied to video. Its core is the same as the JPEG algorithm: it relies on quantizing frequency-domain information obtained through the DCT, thereby removing information from the compressed image unobtrusively. MPEG defines the following additional functions, which exploit the fact that a sequence of video images of a single scene shares a large amount of common information.

A) motion estimation

The purpose of motion estimation is, for a given block of pixels in a given video frame, to determine the block of pixels in the previous frame from which it moved into the successive frame. Blocks of pixels associated with fragments of visible objects in the image appear to move discontinuously when those objects move or when the camera moves.

Motion estimation generally operates on macroblocks (16 × 16 pixels in MPEG), attempting to compute the difference between a given macroblock of the current frame and neighboring macroblocks in an adjacent frame. The extent of the search, both spatially (within an adjacent frame) and temporally (the number of frames searched), is limited by the available processing power.

Motion estimation requires only local communication among pixels, and its effectiveness is directly proportional to the processing power applied. MPEG compression therefore appears to be a form of compression that can profitably use arbitrarily large amounts of computation, more processing power even than users are likely to have available in the next 20 years.
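A minimal full-search block-matching sketch shows why effectiveness scales with processing power: the cost grows with the square of the search radius. The sum of absolute differences (SAD) is used as the match criterion; that choice, the function names, and the default 16-pixel block size are assumptions, since the text does not prescribe a criterion or search strategy.

```python
def sad(cur, ref, bx: int, by: int, dx: int, dy: int, n: int) -> int:
    """Sum of absolute differences between the n x n block of `cur` at (bx, by)
    and the block of `ref` displaced by (dx, dy)."""
    return sum(abs(cur[by + y][bx + x] - ref[by + dy + y][bx + dx + x])
               for y in range(n) for x in range(n))


def best_motion_vector(cur, ref, bx: int, by: int, n: int = 16, search: int = 4):
    """Exhaustive search over +/-search pixels; returns (dx, dy, sad) of the best match."""
    h, w = len(ref), len(ref[0])
    best = None
    for dy in range(-search, search + 1):
        for dx in range(-search, search + 1):
            # Skip candidate blocks that would fall outside the reference frame.
            if not (0 <= by + dy and by + dy + n <= h and 0 <= bx + dx and bx + dx + n <= w):
                continue
            cost = sad(cur, ref, bx, by, dx, dy, n)
            if best is None or cost < best[2]:
                best = (dx, dy, cost)
    return best
```

Each candidate SAD needs only pixels within a bounded neighborhood, which is consistent with the local-communication observation above.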

5) display interface 630

Some conventional cameras include circuitry for driving an LCD screen, such as LCD screen 632.

A) undersampling

The pixel resolution of an available LCD is usually lower than the resolution of the image. Conventional cameras undersample the image simply, either by ignoring the extra pixels in the output of detector 604 or by averaging. Undersampling is an algorithm that requires communication among neighboring pixels.
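The two simple undersampling approaches described above can be sketched as follows: skipping pixels, and averaging each k × k neighborhood (a box filter). The function names are hypothetical, and the averaging version assumes image dimensions are multiples of k.

```python
def downsample_skip(image, k: int):
    """Ignore extra pixels: keep every k-th pixel in each direction."""
    return [row[::k] for row in image[::k]]


def downsample_average(image, k: int):
    """Average each k x k neighborhood into one display pixel (box filter).

    Assumes the image height and width are multiples of k.
    """
    h, w = len(image), len(image[0])
    return [[sum(image[y + j][x + i] for j in range(k) for i in range(k)) / (k * k)
             for x in range(0, w, k)]
            for y in range(0, h, k)]
```

Averaging needs all k × k neighbors of each output pixel, which is where the neighbor communication arises; skipping needs none but is more prone to aliasing.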

B) color space transformation

LCD screens do not accept as input the YCbCr values that are convenient for image manipulation algorithms. An inverse transform is therefore performed to convert the pixel values to an RGB representation for display. This transform requires multiplying a 3 × 1 vector by a 3 × 3 matrix at each pixel.
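A sketch of the per-pixel inverse transform, assuming the same full-range JFIF (BT.601) convention as a typical forward transform. The document does not specify the matrix, so these coefficients are an assumption. The chroma offset of 128 is removed before the 3 × 3 product, and results are rounded and clamped to the 8-bit range a display expects.

```python
# Assumed JFIF / full-range BT.601 inverse coefficients; not given in the document.
YCBCR_TO_RGB = (
    (1.0, 0.0, 1.402),
    (1.0, -0.344136, -0.714136),
    (1.0, 1.772, 0.0),
)


def ycbcr_to_rgb(y: float, cb: float, cr: float):
    """3x3 matrix times (Y, Cb - 128, Cr - 128), rounded and clamped to 0..255."""
    vec = (y, cb - 128.0, cr - 128.0)
    return tuple(
        min(255, max(0, round(sum(m * v for m, v in zip(row, vec)))))
        for row in YCBCR_TO_RGB
    )
```

For example, the JFIF encoding of pure red, (Y, Cb, Cr) ≈ (76.245, 84.97232, 255.5), maps back to (255, 0, 0).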

C) anti-aliasing

LCD displays commonly exhibit aliasing artifacts (jagged lines). Applying an anti-aliasing algorithm produces a clear LCD display.

The discussion above shows that many, if not all, of the digital imaging algorithms used in conventional still and motion digital cameras can be carried out as scalable data-parallel processing. This is the basis for asserting that a large class of digital imaging products runs scalable data-parallel algorithms, and such algorithms are well suited to parallel processing.

Using a programmable parallel computer instead of fixed-function circuitry increases the flexibility of camera functions and minimizes the time needed to deploy new functions.

Besides being a cost-effective replacement for fixed-function circuitry, the present invention can also provide the following valuable capabilities:

1) Use of imperfect image detectors, which cost less to manufacture than flawless detector elements, without loss of image quality: automatic detector correction achieves the best image quality, so camera manufacturers can reduce cost by using cheaper image detectors.

2) Accommodation of a wide range of detector sizes and image formats.

3) Letting the camera user trade image resolution against compression ratio over a wide range.

4) Considerably higher-quality digital zoom using interpolation algorithms from the field of computer graphics (compared with the relatively low-quality linear interpolation between nearest neighbors used in conventional digital cameras).

5) Cost-effective implementation of both still-camera and video-camera functions. In a still camera, the additional processing power that a video camera would spend on heavy computation can instead be used for quality optimization. For example, compression, decompression, and correction can be run continuously in the background to determine experimentally the best quantization table for the current scene.

6) Clear LCD screen image display.

7) A general-purpose device for importing any digital image file or data stream format, for example as a universal display device.

8) A general-purpose device for exporting any digital image file or data stream format, for example as a universal display device.

9) A general-purpose device that lets manufacturers adapt quickly to the rapidly evolving standards of the digital imaging product field.

10) A general-purpose device with which manufacturers can add or remove product functions in software, thereby rationalizing digital imaging product lines and reducing their manufacturing cost.

According to another aspect of the present invention, embodiments implement parallel computer 608 as an instruction-cached SIMD computer, as described in U.S. Patent No. 5,511,212 (the '212 patent) and as shown in Figs. 7 and 8. The '212 patent is incorporated herein by reference in its entirety. The '212 patent discloses a method of implementing SIMD computers that maximizes the ratio of performance (measured as total pixel operations per second) to hardware cost (measured as chip area).

In general, small digital imaging products include a microcontroller 618 (Fig. 6) for coordinating and controlling system functions. According to this aspect of the invention, microcontroller 618 (sometimes called the "microprocessor" or "embedded microprocessor") serves as the system controller of the instruction-cached SIMD computer. Microcontroller bus 616 serves as the global instruction transmission network and response network. As discussed with reference to Fig. 21 of the '212 patent, and as shown in Figs. 7 and 8, each PE module, which comprises a plurality of PEs, is provided with a local controller 705. The number of PE modules in the system depends, for example, on the total number of PEs required, the logic complexity of each PE, and the die area permitted by the VLSI technology used to implement the digital camera processing device.

Fig. 7 shows a single-module instruction-cached SIMD computer, and Fig. 8 shows a multi-module instruction-cached SIMD computer (elements of the Fig. 7 computer are duplicated in Fig. 8 and distinguished by the suffixes a and b).

The main features are listed below:

1) The device can be implemented as a single chip or as multiple chips. A single chip suits still cameras and low-end video cameras, while multiple chips suit very high-performance video cameras.

2) For whatever VLSI technology is used to implement the digital camera processing device, the device maximizes the ratio of performance to hardware cost.

3) The device is suitable for integration with inexpensive CMOS image detector elements.

The instruction-cached SIMD computer is described in detail in the '212 patent. Here, pixel data buffer 702 is added, along with external interfaces (for example, local external memory interface 704) that help in forming systems with multiple such chips. Each PE is specialized for image computation. A suitable PE would have a 16-bit ALU, a 128-word register file, and communication interface circuitry for SIMD operation.

Most computation-intensive image functions (including those listed above) are characterized by producing an output image in which each pixel is determined as a function of its spatial neighbors. Such a function is typically described by a fairly short sequence of instructions applied to each pixel. In this case, the instruction stream delivered to the PE array over local instruction transmission network 706 is highly repetitive, because the same instruction sequence is repeated for each pixel. An SIMD instruction cache is therefore very effective in this environment.

A linear array topology for inter-PE communication is well suited to image data from a detector with serial output. A linear array topology is not required, however. If advanced semiconductor fabrication allows the detector and the processing device to be integrated on one chip, more interconnect can be provided on the chip to support a two-dimensional inter-PE communication network topology.

Continuing to refer to Figs. 7 and 8 (and also to Fig. 6), embedded microprocessor 618 serves as the system controller of the instruction-cached SIMD computer. The instruction-cached SIMD computer shown uses the linear-array inter-PE communication topology of Fig. 3, although the linear array topology is not an essential choice. According to embodiments of the invention, although pixel data shifter 702 has one stage for each pixel of a scan line, the "scan line array processor" portion of Figs. 7 and 8 has fewer PEs than there are pixels in the scan line, as shown in Fig. 9. In other words, each PE handles more than one pixel of the scan line, that is, a contiguous run of pixels.

Taking Fig. 9 as an example, the pixel data shifter is divided into columns (904a-904c) corresponding to the respective PEs (PE1, PE2, and PE3), and the pixel data of each column is delivered to a corresponding column buffer (906a-906c). Each PE (PE1 through PE P) then operates on the pixels of its own column. In some embodiments, the parameter L (the number of pixels of each scan line handled by each PE) is configurable, so that the width of the pixel column assigned to each PE can be programmed for the application.
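With a configurable L, mapping a scan-line position to its PE is plain integer arithmetic. The sketch below assumes each PE receives one contiguous run of L pixels in scan-line order (PE index = pixel index // L); the function names are hypothetical.

```python
def pe_for_pixel(pixel_index, pixels_per_pe):
    """Return (pe_index, offset_within_column) for a pixel of a scan line."""
    return divmod(pixel_index, pixels_per_pe)


def column_ranges(scan_line_width, pixels_per_pe):
    """List the half-open pixel range [start, end) assigned to each PE."""
    if pixels_per_pe <= 0:
        raise ValueError("pixels_per_pe (L) must be positive")
    return [(start, min(start + pixels_per_pe, scan_line_width))
            for start in range(0, scan_line_width, pixels_per_pe)]
```

For example, `column_ranges(1024, 64)` yields 16 ranges of 64 pixels, and `pe_for_pixel(130, 64)` returns `(2, 2)`. If the scan-line width is not a multiple of L, the last range in this sketch is simply shorter.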

To illustrate how pixel data is distributed among PEs in the embodiment of Figs. 7 and 8, suppose each sensor scan line has 1024 pixels and the instruction-cached SIMD computer has 16 PEs. In this case, each PE is assigned an image column 8 pixel blocks wide (64 pixels). With 2 bytes per pixel, each PE needs 128 KB of on-chip DRAM to store the megapixel image of this example. Storing 16 such frames on chip, for single-chip MPEG encoding, may require a total of 32 MB (256 Mb) of on-chip RAM.
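The sizing figures above can be checked with a short calculation. This sketch assumes the "megapixel" frame is 1024 scan lines of 1024 pixels and that the scan line divides evenly among the PEs; the function name is hypothetical.

```python
def on_chip_memory(width, height, num_pes, bytes_per_pixel, frames):
    """Return (column width, bytes per PE per frame, total bytes for all frames)."""
    if width % num_pes:
        raise ValueError("this sketch assumes the scan line divides evenly among PEs")
    column_width = width // num_pes
    bytes_per_pe = column_width * height * bytes_per_pixel
    total = bytes_per_pe * num_pes * frames
    return column_width, bytes_per_pe, total


if __name__ == "__main__":
    column_width, per_pe, total = on_chip_memory(1024, 1024, 16, 2, 16)
    print(column_width, "pixels per column;", per_pe // 1024, "KB per PE;",
          total // 2**20, "MB =", total * 8 // 2**20, "Mb for 16 frames")
```

This reproduces the figures in the text: 64-pixel columns, 128 KB (131,072 bytes) per PE per frame, and 32 MB (256 Mb) for 16 frames.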

Figure 11 shows another embodiment, which uses fewer PEs to process a scan line of pixels. The pixel data shifter 1102 of the Fig. 11 embodiment has one stage per PE (as opposed to one stage per pixel, as in Figs. 7 and 8). Each stage can hold one pixel. In most cases the scan-line width exceeds the number of PEs processing that scan line, so each PE processes several pixels, either by processing each received pixel before receiving the next one, or by storing pixels locally (i.e., in storage the PE can access) until the required pixels arrive.

Still referring to Figure 11, pixel data shifter 1102 has an input scan-line ordering buffer (SLOB) 1103 prepended to it and an output SLOB 1104 appended to it. Each SLOB 1103, 1104 has enough memory to hold the pixel values of at least two scan lines. Once the first scan line is held in the memory of input SLOB 1103, the SLOB reorders it on output; while that line is being reordered, the second scan line is held in the memory of input SLOB 1103. In one embodiment, the pixels of a scan line are stored in consecutive memory locations, and the memory is "read" out of sequential order so that all adjacent pixels are provided to the same PE.
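The two-line buffering can be modeled as a ping-pong buffer: one bank is written with the incoming scan line at sequential addresses while the other bank, holding the previous line, is read out in a permuted address order. This Python sketch is a sequential model of what the hardware does concurrently; the class name and the idea of passing the read order in as a parameter are assumptions for illustration.

```python
class ScanLineOrderingBuffer:
    """Two-bank (ping-pong) model of an input SLOB."""

    def __init__(self, read_order):
        self.read_order = list(read_order)  # permuted read addresses
        self.banks = [None, None]
        self.write_bank = 0

    def push_line(self, line):
        """Store `line`; return the previously stored line in read order, or None."""
        if len(line) != len(self.read_order):
            raise ValueError("scan line length must match the read order length")
        self.banks[self.write_bank] = list(line)
        # Switch banks: the bank switched to holds the previous line, which is
        # read out now and overwritten by the next push.
        self.write_bank ^= 1
        previous = self.banks[self.write_bank]
        return None if previous is None else [previous[a] for a in self.read_order]

    def flush(self):
        """Read out the most recently stored line (e.g. at the end of a frame)."""
        last = self.banks[self.write_bank ^ 1]
        return None if last is None else [last[a] for a in self.read_order]
```

For example, with read order `[0, 2, 1, 3]`, the first `push_line` returns `None`, and each later call returns the preceding line permuted into that order.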

For example, in one embodiment, if 4 PEs are to process a 16-pixel scan line, PE0 processes pixels 0-3, PE1 processes pixels 4-7, PE2 processes pixels 8-11, and PE3 processes pixels 12-15. Input SLOB 1103 reorders the pixels so that pixel data shifter 1102 shifts them in the order 0, 4, 8, 12; 1, 5, 9, 13; 2, 6, 10, 14; 3, 7, 11, 15, which is the order in which the pixels are provided to the PEs. Note that the stride is the same (4) for each of the 4 PEs. After the pixel data has been processed by the PEs, the output SLOB reorders it at the output end of pixel data shifter 1102.

If the number of PEs does not evenly divide the number of pixels per scan line, the reordering is more complicated. In this case, the "extra" pixels can be assigned to one or more PEs. In one embodiment, if there are N extra pixels, one extra pixel is assigned to each of the first N PEs. For example, if 4 PEs are to process an 18-pixel scan line, PE0 processes pixels 0-4, PE1 processes pixels 5-9, PE2 processes pixels 10-13, and PE3 processes pixels 14-17. Pixel data shifter 1102 then shifts the pixels in the order 0, 5, 10, 14; 1, 6, 11, 15; 2, 7, 12, 16; 3, 8, 13, 17; 4, 9, which is the order in which the pixels are provided to the PEs. In this case the stride is not uniform across the PEs: it is sometimes 5 and sometimes 4. The last two pixels shifted (4 and 9) are received by two PEs (PE0 and PE1), and the other PEs receive no pixel in that final group.
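The reordering rule in the two examples above can be captured in a few lines. This Python sketch assumes, as in the text, that when the PEs do not divide the scan line evenly, the first (pixels mod PEs) PEs each take one extra pixel; the function names are hypothetical, and the output SLOB is modeled simply as the inverse permutation.

```python
def pe_column_sizes(num_pixels, num_pes):
    """Pixels per PE: the first (num_pixels % num_pes) PEs get one extra pixel."""
    base, extra = divmod(num_pixels, num_pes)
    return [base + 1 if pe < extra else base for pe in range(num_pes)]


def slob_shift_order(num_pixels, num_pes):
    """Order in which the input SLOB emits scan-line pixels to the shifter.

    Group g holds the g-th pixel of every PE column that has one, in PE order,
    so the k-th entry of a full group is destined for PE k.
    """
    sizes = pe_column_sizes(num_pixels, num_pes)
    starts = [sum(sizes[:pe]) for pe in range(num_pes)]
    order = []
    for g in range(max(sizes)):
        for pe in range(num_pes):
            if g < sizes[pe]:
                order.append(starts[pe] + g)
    return order


def restore_scan_line_order(shifted, order):
    """Output SLOB: undo the permutation so pixels return to scan-line order."""
    line = [None] * len(order)
    for value, position in zip(shifted, order):
        line[position] = value
    return line
```

`slob_shift_order(16, 4)` gives the uniform-stride order of the first example, and `slob_shift_order(18, 4)` gives 0, 5, 10, 14; 1, 6, 11, 15; 2, 7, 12, 16; 3, 8, 13, 17; 4, 9.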

One object of the present invention is to make maximum use of computational resources. To that end, it is generally desirable to place as many PEs as possible in the available chip area. As chip area grows and circuit geometries shrink, the diameter of the synchronous region becomes significantly smaller than the linear dimension of the chip. Past trends in VLSI scaling therefore tend to require digital imaging chips containing an instruction-cached SIMD computer to have multiple PE modules, and hence multiple local controller circuits. A multi-controller chip differs from the single-controller chip in that it includes a response arbiter. The response arbiter connects the multiple instruction-cached SIMD local controllers to a control/status port on the microprocessor bus, so that the embedded microprocessor can detect conditional and unconditional responses from among the PEs.

Figure 10 illustrates an arrangement containing multiple digital imaging chips, each based on the instruction-cached SIMD computer described above, suitable for high-end cameras. Note that the image data shift register is a series of shift registers spanning a chip set, and the functions of the single-chip solution are distributed among the chips of the chip set. The inter-PE communication network topology is a design parameter, although the preferred embodiment extends the linear-array topology used in the preferred single-chip embodiment.

Claims

1. A digital camera apparatus, comprising:

a sensor for generating image data corresponding to an image;

a processor for processing the image data, the processor comprising a plurality of processor elements (PEs) connected by an inter-PE network, the PEs being configured for parallel operation to process the acquired image; and

a memory for holding the image produced by the processing.

2. The digital camera apparatus of claim 1, wherein:

the processor is configured to process an N-pixel scan line of image pixel data;

the processor comprises M PEs, where M < N;

the processor comprises a pixel data buffer through which the N pixels are provided to the PEs;

the M PEs operate at least partly in parallel to process the N pixels of the scan line from the pixel data buffer; and

at least some of the M PEs operate on more than one pixel of the scan line;

whereby fewer than N processing elements are needed to process an N-pixel scan line.

3. The digital camera apparatus of claim 2, wherein the processor further comprises:

a local controller for outputting decoded instructions; and

a local instruction transmission network over which the decoded instructions are transmitted to the PEs for execution by the PEs, so that the pixels are operated on in parallel.

4. The digital camera apparatus of claim 3, wherein

each PE of the processor comprises an instruction register, and

the instruction register is connected to the local instruction transmission network to receive the decoded instructions.

5. The digital camera apparatus of claim 3, wherein

the pixel data buffer comprises N stages, and

at least some particular PEs are each connected to more than one of the N stages and are thereby arranged to receive more than one pixel of the scan line, so that those PEs operate on more than one pixel of the scan line.

6. The digital camera apparatus of claim 2, wherein the processor further comprises:

local buffers connected to at least some of the PEs and to the pixel data buffer, the local buffer for a particular PE temporarily holding the more than one pixel of the scan line on which that PE operates.

7. The digital camera apparatus of claim 2, wherein

the pixel data buffer comprises M stages;

the processor further comprises a scan-line ordering circuit prepended to the pixel data shifter, for reordering the input pixel data so that it is provided to the pixel data buffer in an order different from the order of the pixels in the scan line; and

each of the M stages of the pixel data buffer is arranged to provide a pixel to one PE as the reordered pixel data is shifted through the pixel data shifter.

8. The digital camera apparatus of claim 7, wherein:

the scan-line ordering circuit is a first scan-line ordering circuit, and

the processor further comprises a second scan-line ordering circuit, appended to the pixel data shifter, for reordering the processed pixel data.

9. The digital camera apparatus of claim 8, wherein:

the second scan-line ordering circuit reorders the processed pixel data so that it corresponds to the original order of the input pixel data in the scan line.

10. A digital image processor for processing an N-pixel scan line of image pixel data, the processor comprising:

M processing elements (PEs), where M < N;

a pixel data buffer through which pixels are provided to the PEs, wherein

at least some of the PEs operate on more than one pixel of the scan line;

11. The digital image processor of claim 10, further comprising:

a local controller for outputting decoded instructions; and

12. The digital image processor of claim 10, wherein

each PE of the processor comprises an instruction register, and

13. The digital image processor of claim 10, wherein

the pixel data buffer comprises N stages, and

14. The digital image processor of claim 13, wherein the processor further comprises:

15. The digital image processor of claim 10, wherein

the pixel data buffer comprises M stages;

16. The digital image processor of claim 14, wherein:

17. The digital image processor of claim 16, wherein: