IMAGE PROCESSING METHOD AND APPARATUS FOR
SYNTHESISING A REPRESENTATION FROM A PLURALITY OF
SYNCHRONISED MOVING IMAGE CAMERAS
The present invention relates to an image processing method and apparatus. The invention relates particularly but not exclusively to methods and apparatus for three-dimensional image processing.
Various computer-based image processing arrangements are known for use in production line inspection and security environments, for example.
Thus DE-A-4,417,128 discloses a security monitoring arrangement for detecting movement in two dimensions. The described embodiment utilises only one camera, although there is a reference to the possibility of using more than one camera.
EP-A-820,039 discloses an automatic inspection arrangement in which the images from a single camera are fed selectively into two frame stores for subsequent processing.
JP 10267649 discloses an arrangement employing multiple analogue video cameras whose outputs are digitised and multiplexed. The images appear to be combined into one image prior to processing.
An object of the present invention is to provide a method and apparatus in which the outputs of two or more (preferably three or more) cameras can be repeatedly captured in synchronism.
In one aspect the invention provides an image processing arrangement comprising video multiplexer means arranged to multiplex a plurality of video input signals and to generate an output sync signal for synchronising retrieval of the respective video input signals, and image processing means arranged to process groups of two or more corresponding portions of the respective video input signals to produce a combined output signal. The portions are preferably frames but could for example be fields or lines.
The arrangement enables the outputs of two or more (preferably three or more) video cameras to be fed via one port into a computer for subsequent processing, for example demultiplexing and stereoscopic processing. The arrangement can be scaled
up to accommodate the outputs of more video cameras without placing further demands on the computer hardware.
Preferably the video multiplexer means is arranged to generate and feed to the image processing means a video output comprising cycles of n synchronised frames, fields or lines of the respective video input signals wherein n is the number of video input signals.
Preferably the arrangement further comprises frame grabber means arranged to receive and store corresponding frames, fields or lines of the multiplexed video output of the video multiplexer means, the image processing means being arranged to read and process the stored frames, fields or lines.
In one embodiment the image processing means is arranged to generate a three- dimensional representation from two or more corresponding overlapping images.
In this and other embodiments the image processing means can optionally be arranged to process two or more overlapping images to generate a further image corresponding to a viewpoint different from the respective viewpoints of the overlapping images.
Suitable image processing algorithms for correlating image regions of such overlapping images to enable the shape of the object common to the region of overlap to be regenerated are already known - eg Gruen's algorithm (see Gruen, A W "Adaptive least squares correlation: a powerful image matching technique" S Afr J of Photogrammetry, Remote Sensing and Cartography Vol 14 No 3 (1985) and Gruen, A W and Baltsavias, E P "High precision image matching for digital terrain model generation" Int Arch Photogrammetry Vol 25 No 3 (1986) p254) and particularly the "region-growing" modification thereto which is described in Otto and Chau "Region-growing algorithm for matching terrain images" Image and Vision Computing Vol 7 No 2 May 1989 p83, all of which are incorporated herein by reference.
Preferably the video multiplexer means is arranged to generate a video output stream comprising cycles of n frames, fields or lines, wherein n is the number of cameras, the frame, field or line rate of the cameras and video multiplexer means being
determined by the output sync signal.
In another aspect the invention provides a method of processing a group of video signals wherein the signals are read in synchronism by means of a common sync signal and fed to an image processing means arranged to process groups of two or more corresponding frames or parts thereof of the respective video signals to produce a combined output signal.
Preferably cycles of n synchronised frames, fields or lines of the respective video input signals are fed to the image processing means, wherein n is the number of video input signals.
Preferably the video signals are acquired by respective cameras having overlapping fields of view.
Other preferred features of the invention are defined in the dependent claims.
A preferred embodiment of the invention is described below by way of example only with reference to Figures 1 to 5 of the accompanying drawings, wherein:
Figure 1 is a block diagram showing an arrangement in accordance with the invention;
Figure 2 is a diagram of the signal waveforms and video frame structure generated in the arrangement of Figure 1;
Figure 3 is a diagrammatic sketch perspective view of a surveillance system incorporating the arrangement of Figure 1;
Figure 4 is a ray diagram conceptually illustrating the image processing performed in the computer of the arrangement of Figure 1, and
Figure 5 is a screenshot of an image manipulation program suitable for manipulating the 3D object representations generated by the computer of Figure 1.
Referring to Figure 1, the arrangement comprises up to six Pulnix TM 9701 or TMC
9700 progressive scan digital video cameras of which only three, namely cameras 1,
1A and 1B are shown. Both monochrome and colour cameras can be mixed in the one installation. Camera 1 is representative of the others and its circuitry is shown in some detail. The analogue video output ports and timing input ports of the cameras are connected to a multiplexer module 2 which in turn has its video output port connected to a frame grabber FG of a computer 3 and has a control input port connected to an RS 232 output port of the computer.
Referring to camera 1, each camera comprises timing circuitry 70 (comprising conventional async generator, sync generator, phase-locked loop and timing generator blocks) arranged to receive an initialisation pulse VINIT which causes its electronic shutter (not shown) and the electronic shutters of all the other cameras to open so that all the cameras simultaneously acquire an image focussed on their CCD arrays 20 by their lenses 10. The VINIT pulse also causes the cameras to synchronise their vertical sync pulses VSYNC to VINIT.
All the cameras receive a common horizontal sync signal HSYNC from a sync generator block 110 of the multiplexer module 2 and continually output their stored frames at 30 frames/second as analogue video from a signal processing block 80. The latter receives the analogue output of a digital to analogue converter which in turn is connected to frame stores 50. These are controlled by a memory control block 40 and can be arranged to store either interlaced fields or complete frames, under the control of a switch input INTERLACE/NON-INTERLACE to memory control block 40. In the described embodiment complete frames are stored in frame stores 50. The signal to the frame stores 50 is derived from the digitised output of a CDS and automatic gain control block 60 which is connected to the output of CCD array 20. The timing generator of block 70 controls the CCD readout by means of timing signals sent to both CDS/AGC block 60 and a CCD driver block 30.
Multiplexer module 2 comprises a sync generator 110 which is provided with a clock CLK and sends a common sync signal not only to the cameras but also to a multiplexer block 90. The overall operation is controlled by a programmed microcontroller 100 which receives control signals (for selecting and triggering the shutters of the cameras) over an RS 232 interface from an RS 232 port (COM 1) of a PC 3. Microcontroller 100 is connected to multiplexer block 90 for this purpose. The
microcontroller is preferably firmware controlled and has a port (not shown) for downloading suitable control software. Microcontroller 100 also has an external
TTL trigger input which provides an alternative (to the PC) camera shutter control.
Camera shutter speeds of from 1/125 second to 1/6000 second are supported.
Additionally the microcontroller 100 has bidirectional ports for interfacing with a) an auxiliary control interface (not shown) and b) other slave multiplexer modules (not shown) for controlling further cameras. The auxiliary control interface can for example be a hard-wired controller which can supplement or be substituted for the PC 3. The output port for connection to slave multiplexer modules carries timing signals for synchronising all the multiplexer modules and hence all the cameras irrespective of the multiplexer module to which they are connected. If a slave multiplexer module is used, its multiplexed video output signal is fed to one of the video inputs of multiplexer 90 of the master multiplexer module, which is therefore connected to up to five cameras in this mode.
In addition to the sync and trigger signals output from the multiplexer module(s) 2, these multiplexer modules are also each arranged to generate four projector control signals for switching on up to four pattern projectors P (Figure 3) as will be discussed subsequently.
The signals from the ports to the auxiliary control interface, slave multiplexer modules and pattern projectors are all controlled via the RS 232 interface from the PC 3 at a baud rate of 9600 bits/second. The following commands are provided for:
WRITE COMMANDS
RESET (to the default state in which no cameras are selected - the multiplexer responds with an OK or ERROR status indication)
NO CAMERAS SELECTED
TRIGGER (triggers electronic shutters of all selected cameras)
SELECT CAMERA n OF MULTIPLEXER MODULE m (n = 1 - 6, m = 1, 2, 3...)
READ COMMAND
RETURN AUXILIARY REGISTER STATUS (indicates status of auxiliary interface, if used).
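The effect of the write commands on the selection state can be illustrated with a minimal sketch. The byte-level encoding of the commands on the RS 232 link is not given in this description, so the sketch models only the state changes; the class name MultiplexerModel, its method names, and the OK/ERROR reply to SELECT are illustrative assumptions (the text states an OK or ERROR reply only for RESET).

```python
from dataclasses import dataclass, field

CAMERAS_PER_MODULE = 6  # the text gives n = 1 - 6 for each multiplexer module


@dataclass
class MultiplexerModel:
    """Toy model (hypothetical) of the camera selection state."""

    selected: set = field(default_factory=set)  # (module m, camera n) pairs

    def reset(self) -> str:
        # RESET: return to the default state in which no cameras are selected.
        self.selected.clear()
        return "OK"

    def select(self, n: int, m: int = 1) -> str:
        # SELECT CAMERA n OF MULTIPLEXER MODULE m.
        # Replying ERROR to an out-of-range request is an assumption.
        if not (1 <= n <= CAMERAS_PER_MODULE) or m < 1:
            return "ERROR"
        self.selected.add((m, n))
        return "OK"

    def trigger(self) -> list:
        # TRIGGER: fire the electronic shutters of all selected cameras.
        # Returns the (module, camera) pairs that would be triggered.
        return sorted(self.selected)
```

For example, selecting cameras 1 and 2 of module 1 and then calling trigger() returns both pairs, whereas calling trigger() directly after reset() returns an empty list.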
A status indicator in the form of a seven segment LED (not shown) on the multiplexer module 2 indicates the status of the module (and optionally provides diagnostic information).
A typical physical arrangement of the cameras 1, 1A, 1B and 1C in eg an observation or surveillance system is shown in Figure 3. An optical projector P switched by multiplexer module 2 projects a pattern (eg a speckle pattern or other fractal pattern of visible or, preferably, infra-red radiation) onto the scene of interest and the cameras 1, 1A and 1B, which may be supported in elevated positions, are focussed onto regions which overlap with each other and with the region illuminated by the pattern. Thus the fields of view of cameras 1A and 1B overlap in region Q1, the fields of view of cameras 1 and 1A overlap in region Q2 and the fields of view of cameras 1A and 1C overlap in region Q3. One or more of the cameras (eg camera 1B, as shown) may have its attitude and/or position controllable remotely by eg a motor M in order to enable one or more of the fields of view and regions of overlap to be varied.
Referring again to Figure 1, the multiplexer module 2 operates in two modes as follows:
i) in SETUP mode the PC 3 selects one or more cameras eg 1 and 1A (which are then rapidly synchronised to the HSYNC signal of the multiplexer module 2) and repeatedly instructs the multiplexer module to send a TRIGGER signal to operate the electronic shutters of the selected cameras. This enables the cameras to be adjusted and focussed in real time.
ii) in CAPTURE mode the PC 3 instructs the multiplexer module 2 to send a single
TRIGGER signal to the electronic shutters of the selected camera(s) which consequently each acquire a single frame which is stored in the frame stores 50. The stored frame of each camera is read out repeatedly to the respective video input of multiplexer 90.
Having selected the desired regions of overlap and focussed the cameras on the scene of interest illuminated by the projected pattern in the SETUP mode, the arrangement is switched to CAPTURE mode and handles the following waveforms
as shown in Figure 2:
In plot i) the horizontal sync signal HSYNC is shown. This (and also the vertical sync pulses, not shown) is common to all the cameras.
In plot ii) the composite video output signal of one camera (CAMERA 1 VIDEO) is shown to the same timescale, lines L1, L2, L3, L4 ... LN being shown which make up one frame F. Plot iii) shows a typical composite video output signal for a second camera (CAMERA 2 VIDEO) and plot iv) shows a typical composite video output signal for a sixth camera (CAMERA 6 VIDEO). The composite video waveforms
(not shown) of up to three further cameras can be similarly synchronised by one multiplexer module 2 and yet further cameras can be synchronised if a master and slave arrangement of multiplexer modules is employed, as mentioned above.
The number of lines per frame F can be in accordance with any of the normal video standards.
The corresponding frame sequences are shown (to a different timescale) in plots v), vi) and vii) and it will be seen that in an arrangement employing six cameras, CAMERA 1 reads out from its frame stores the same frame F1_1 for six frames, CAMERA 2 reads out from its frame stores the same frame F1_2 for six frames, CAMERA 6 reads out from its frame stores the same frame F1_6 for six frames and in general CAMERA N reads out from its frame stores the same frame F1_N for six frames. The cameras then read out the next frame F2_1, F2_2, F2_6 and in general F2_N for the next six frames. The process continues with the reading out of further cycles of new frames, each cycle being of six frames or, more generally where K cameras are employed, K frames.
Plot viii) (to the same timescale as plots v) to vii)) shows the multiplexed output of the multiplexer module 2 which is transmitted to the PC 3. This sequence comprises successive cycles of six frames simultaneously acquired by the six cameras (or more generally, successive cycles of K frames simultaneously acquired by K cameras).
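The structure of the multiplexed sequence, and its separation into groups of simultaneously acquired frames in the PC, can be illustrated with a minimal sketch. The function names are hypothetical, and the ordering of cameras within each cycle (camera 1 first) is an assumption made for illustration. Frames are represented simply as (capture index, camera number) pairs.

```python
def multiplex(num_captures, num_cameras):
    """Build the multiplexed frame sequence as (capture, camera) pairs.

    During each capture every camera repeats its stored frame for
    num_cameras frame periods; the multiplexer is assumed to pass
    camera k in slot k of the cycle.
    """
    return [
        (capture, camera)
        for capture in range(1, num_captures + 1)
        for camera in range(1, num_cameras + 1)
    ]


def demultiplex(stream, num_cameras):
    """Split the multiplexed sequence into cycles of num_cameras frames."""
    if len(stream) % num_cameras:
        raise ValueError("stream does not contain a whole number of cycles")
    return [
        stream[i:i + num_cameras]
        for i in range(0, len(stream), num_cameras)
    ]


def cycles_per_second(frame_rate, num_cameras):
    """Rate at which complete sets of simultaneous frames arrive."""
    return frame_rate / num_cameras
```

If the 30 frames/second readout rate of the cameras also applies to the multiplexed output, which is an inference rather than a stated figure, six cameras would deliver five complete sets of frames per second.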
Referring again to Figure 1, the hardware and software of PC 3 which process the sequence of plot viii) will now be described briefly. The computer 3 is suitably equipped with a Pentium® microprocessor 120 and, as noted above, sends control
signals to the multiplexer module(s) 2 from its COM 1 port via an RS 232 interface.
A frame grabber FG receives the multiplexed video signals from the multiplexer module 2 (or the master multiplexer module if more than one multiplexer module is used), processes them in an ANALOGUE processing module and strips out the sync signals in a SYNC module before converting the images to digital form in an analogue to digital converter (A/D) which feeds the digitised images to video memory (RAM) whence they can be accessed and processed by the microprocessor 120. These digitised images are also reconverted to analogue form for display on a monitor 50. Suitable frame grabbers are commercially available.
The microprocessor 120 runs a Windows 95® operating system from hard disc 130 and is provided with conventional RAM and ROM. The PC 3 is provided with a conventional keyboard and a pointing device eg mouse 60. The hard disc 130 is loaded with software:
a) to display images acquired from the multiplexer module 2;
b) to correlate overlapping regions of the images derived from corresponding frames of the output of frame grabber FG, optionally with the assistance of a pattern projected onto the object surface;
c) to generate at least a partial 3-dimensional reconstruction of object surfaces in the region of overlap of the fields of view of respective cameras from the above correlations and (preferably) information on the viewpoints (ie positions and orientations) of the cameras, as illustrated in Figure 4;
d) to combine one or more such 3D reconstructions derived from the images acquired by different pairs of cameras into an overall 3D reconstruction;
e) to process the 3D reconstructions of c) or d) to generate 2D images of the object scene as viewed from a viewpoint different from that of the cameras, and
f) to generate higher resolution images eg by combining two or more 3D reconstructions prior to deriving a 2D image from the combination, whether from a viewpoint identical to that of one of the cameras or from another viewpoint.
The software to carry out function a) can be any suitable graphics program and the software to carry out function b) can be based on the algorithms disclosed in Hu et al "Matching Point Features with ordered Geometric, Rigidity and Disparity Constraints" IEEE Transactions on Pattern Analysis and Machine Intelligence Vol 16 No 10, 1994 pp1041-1049 (and references cited therein). One suitable algorithm is the Gruen algorithm, although we have found a number of improvements, as follows:
i) the additive radiometric shift employed in the algorithm can be dispensed with;
ii) if during successive iterations, a candidate matched point moves by more than a certain amount (eg 3 pixels) per iteration then it is not a valid matched point and should be rejected;
iii) during the growing of a matched region it is useful to check for sufficient contrast at at least three of the four sides of the region in order to ensure that there is sufficient data for a stable convergence - in order to facilitate this it is desirable to make the algorithm configurable to enable the parameters (eg required contrast) to be optimised for different environments, and
iv) in order to quantify the validity of the correspondences between respective patches of one image and points in the other image it has been found useful to re-derive the original grid point in the starting image by applying the algorithm to the matched point in the other image (ie reversing the stereo matching process) and measuring the distance between the original grid point and the new grid point found in the starting image from the reverse stereo matching. The smaller the distance the better the correspondence.
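Improvements ii) and iv) can be illustrated with a minimal sketch. It does not implement the Gruen algorithm itself: the matching step is passed in as a function, and the names moved_too_far, round_trip_error, match_forward and match_backward are hypothetical. The 3 pixel limit is the example value given in ii).

```python
import math


def moved_too_far(previous, current, max_step=3.0):
    """Improvement ii): flag a candidate matched point that moves by more
    than max_step pixels between successive iterations."""
    return math.dist(previous, current) > max_step


def round_trip_error(grid_point, match_forward, match_backward):
    """Improvement iv): match from the starting image to the other image,
    match back again, and return the distance in pixels between the
    original grid point and the re-derived one (smaller is better)."""
    matched = match_forward(grid_point)
    recovered = match_backward(matched)
    return math.dist(grid_point, recovered)
```

As a usage example with stand-in matchers that merely shift a point horizontally, round_trip_error((100.0, 50.0), lambda p: (p[0] + 10.0, p[1]), lambda p: (p[0] - 10.5, p[1])) gives a residual of half a pixel. A threshold on this residual for accepting a correspondence would have to be chosen for the application and is not specified here.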
The software to carry out process d) is described in principle in our UK patent GB 2,292,605B and references therein and the software needed to carry out the processes e) and f) (to the extent that it is not commercially available) can be written by graphics programmers of normal skill in this field on the basis of the principles illustrated in Figure 4.
One or more projectors P generate a pattern (eg a speckle pattern) which provides
artificial texture on the scene viewed by the cameras and aids the stereo matching process. The pattern is preferably an infra-red pattern. Control signals for the projector(s) are received from the multiplexer 2.
Referring to Figure 4, a person X (eg a member of the public or a player in a sports match or an athlete) is assumed to be in the scene illuminated by the projector pattern in Figure 3 and within the area of overlap of the fields of view of two cameras 1 and 1A and also within the further area of overlap of the fields of view of cameras 1A and 1B. Three representative points P1, P2 and P3 on the surface of the person's face are assumed to be imaged by all three cameras through their perspective centres O1, O2 and O3 respectively onto respective conjugate points in their image planes.
It is assumed that the locations and orientations of the perspective centres O1 to O3 are known. The correlation software (eg based on the Gruen algorithm) in PC 3 correlates the respective cameras' simultaneous images (eg the pixels of the images acquired by cameras 1 and 1A corresponding to P1) and thereby enables a 3D representation of the face X to be reconstructed. This can be visualised as a projection of the correlated points from virtual projectors PR1, PR2 and PR3 in a virtual 3D space. If the virtual projectors have the same optical characteristics as the cameras and are located at the same points in the virtual space as the cameras in the real space, with the same orientations, then the representation will be lifesize and undistorted. This process will be performed for all correlated pixels.
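The projection of a pair of correlated points from two virtual projectors can be illustrated with a minimal sketch. The description does not specify how the two projected rays are combined, so the sketch assumes one common choice: the reconstructed point is taken as the midpoint of the shortest segment joining the two rays. The function names are hypothetical. Each ray is given by a perspective centre and a direction through the conjugate image point.

```python
def dot(u, v):
    """Dot product of two 3-vectors."""
    return sum(a * b for a, b in zip(u, v))


def triangulate_midpoint(o1, d1, o2, d2):
    """Return the midpoint of the shortest segment between the rays
    o1 + s*d1 and o2 + t*d2 (an assumed reconstruction rule)."""
    w = tuple(a - b for a, b in zip(o1, o2))
    a, b, c = dot(d1, d1), dot(d1, d2), dot(d2, d2)
    d, e = dot(d1, w), dot(d2, w)
    denom = a * c - b * b
    if abs(denom) < 1e-12:
        raise ValueError("rays are parallel; no unique closest point")
    s = (b * e - c * d) / denom  # parameter of closest point on ray 1
    t = (a * e - b * d) / denom  # parameter of closest point on ray 2
    p1 = tuple(o + s * k for o, k in zip(o1, d1))
    p2 = tuple(o + t * k for o, k in zip(o2, d2))
    return tuple((x + y) / 2 for x, y in zip(p1, p2))
```

With perspective centres at (0, 0, 0) and (1, 0, 0) and rays directed along (0.5, 0, 2) and (-0.5, 0, 2), the two rays meet and the function returns the point (0.5, 0, 2). With noisy correlations the rays would generally not intersect, which is why a midpoint or similar rule is needed.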
Since the region of overlap of the fields of view of cameras 1 and 1A will not be the same as the region of overlap of the fields of view of cameras 1A and 1B, the resulting 3D representations will not be coincident and further software in the PC 3 is arranged to fit these together, optionally under the control of the user. The resulting overall 3D representation can then be viewed from different angles in virtual space by eg the software program COSMO PLAYER, a web browser plug-in produced by Cosmo Software Inc, of California USA. This enables a person or other subject moving in the region viewed by the cameras to be viewed from another viewpoint, eg that of the virtual viewer V shown in Figure 4. In this manner an image corresponding to eg an intermediate viewpoint can be generated from the camera images. This feature is useful not only in surveillance systems (in which a front view or profile of a suspicious individual may be generated) but also in sports
events where a view of the game from a different viewpoint may be required.
Figure 5 shows a screen shot of the user interface generated by a program for manipulating the overall 3D representations generated by the process of Figure 4. Buttons BN are provided which can be clicked on by the mouse under the control of the user and, when thus selected, enable the user to drag portions of the displayed representation so as to zoom, rotate and pan the view of the object, as well as to come closer to and move away from the object ("nearer" and "further" buttons respectively). As described thus far, the interface is similar to the publicly available interface of the COSMO PLAYER web browser plug-in. However, in accordance with a feature of the present invention, "wheels" W1 and W2 are provided which are rotatable by the mouse and enable the user to adjust the separation between the virtual projectors and to vary the distance to the object respectively. The latter control is effectively a perspective control. Optionally, further buttons or other means (not shown) may be provided to enable distortions to be applied in a graphical fashion.