GB2408875A - Image mis-alignment determination using image edge detection


Info

Publication number
GB2408875A
Authority
GB
United Kingdom
Prior art keywords
image
terminal
camera
terminal according
images
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
GB0328305A
Other versions
GB2408875B (en)
GB0328305D0 (en)
Inventor
Terence Edwin Dodgson
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Samsung Electronics Co Ltd
Original Assignee
Samsung Electronics Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Samsung Electronics Co Ltd
Priority to GB0328305A
Publication of GB0328305D0
Priority to KR1020040100481A
Publication of GB2408875A
Application granted
Publication of GB2408875B
Anticipated expiration
Expired - Fee Related


Classifications

    • H04N 7/141 — Systems for two-way working between two video terminals, e.g. videophone
    • G06T 7/32 — Determination of transform parameters for the alignment of images (image registration) using correlation-based methods
    • G06T 7/0024 — Image registration (legacy code under G06T 7/30)
    • G06T 7/12 — Edge-based segmentation
    • H04N 23/68 — Control of cameras or camera modules for stable pick-up of the scene, e.g. compensating for camera body vibrations
    • H04N 23/6811 — Motion detection based on the image signal
    • H04N 5/142 — Picture signal circuitry for the video frequency region: edging; contouring
    • H04N 7/15 — Conference systems
    • G06T 2207/10016 — Image acquisition modality: video; image sequence
    • G06T 2207/20021 — Dividing image into blocks, subimages or windows
    • H04M 1/72439 — User interfaces for mobile telephones with interactive means for internal management of messages, for image or video messaging
    • H04M 2250/52 — Details of telephonic subscriber devices including functional features of a camera
    • H04N 2007/145 — Handheld terminals (two-way video working)

Abstract

A mobile terminal for use in a cellular telecommunications network comprises a video camera for receiving video images and indication means, the terminal being adapted to: i) provide a first image for reference; ii) provide a second image for determining whether the image is aligned; iii) process said first and second images, the processing including edge detection; iv) correlate the processed first and second images; and v) determine whether the image is misaligned. If misalignment is present, an indication of such can be made to the user in graphical, audio or tactile form, along with an indication of how the misalignment may be corrected. A second, independently claimed embodiment identifies misalignment by dividing the edge-detected images into sub-images and identifying the strongest edge in each sub-image.

Description

Mobile Communications

This invention relates to mobile communications terminals and methods of performing mobile communications. More particularly, but not exclusively, it relates to a terminal and method for aligning a video image in a video call.
Background Art
Video capability is one of the advantages that mobile phones of the third generation (3G) can offer. Some terminals of the third generation are designed to allow a full two-way video phone call.
Teleconferencing systems often provide a so-called picture-in-picture feature that allows monitoring of the pictures or video images captured by the camera together with the images transmitted via the network.
The monitoring image is usually provided as an inset in one corner of the display, simultaneously with displaying the images transmitted via the network. In this way the user can view the transmitted images in a larger format, whereas the monitoring image is small compared to the main display.
In an earlier application of the present applicant (UK patent application GB 0322513.3, filed 26 September 2003, agent's reference J45670GB), a terminal is described which analyses video data captured by the camera to determine whether the image is misaligned and which, if it detects that the image is misaligned, indicates the misalignment to the user. This earlier application is hereby incorporated by reference.
The present application relates to alternative methods of determining whether the image is misaligned.
According to one aspect of the present invention, there is provided a mobile terminal for use in a cellular telecommunications network, the terminal comprising a video camera for receiving video images and indication means, the terminal being adapted to: i) provide a first image for reference; ii) provide a second image for determining whether the image is aligned; iii) process said first and second image, whereby the processing includes performing edge detection; iv) correlate the processed first and second image; and v) determine whether the image is misaligned.
In this way the terminal can compare an image captured by the camera to a reference image, determine correlations between the images and determine whether the captured image is properly aligned.
Using edge detection before correlating the images is an efficient and effective processing step which significantly enhances the performance of the correlation process.
According to another aspect of the present application, there is provided a mobile terminal for use in a cellular telecommunications network, the terminal comprising a video camera for receiving video images and means for determining whether an image received by the camera is aligned or misaligned, the terminal being adapted to: i) process the image by performing edge detection; ii) divide the image into one or more subimages; and iii) determine the strongest edge in each sub-image.
In this way an accurate matching through correlation, or through multiple correlations, is possible, as the applied technique is less sensitive to environmental changes than many other techniques.
Also, the applied technique makes the process more reliable, particularly in poor lighting conditions, as the most salient, i.e. strongest, features are selected for determining whether an image received by the camera is aligned or misaligned.
According to another aspect of the present application, there is provided a method of operating a mobile terminal including a camera, the method comprising the steps of: i) providing a first image for reference; ii) providing a second image for determining whether the image is aligned; iii) processing said first and second image, whereby the processing includes performing edge detection; iv) correlating the processed first and second image; and v) determining whether the image is misaligned.
According to another aspect of the present application, there is provided a method of detecting salient features of an image, the method comprising the step of: i) processing the image by performing edge detection; ii) dividing the image into one or more sub-images; and iii) determining the strongest edge in each sub-image.
The invention will now be described, by way of example only, with reference to the accompanying drawings, in which:
Figure 1 is a schematic front view of a videophone in which the present invention can be implemented;
Figure 2 is a schematic block diagram of the functional elements of a videophone in accordance with the invention;
Figure 3 is a flow chart illustrating the process of aligning images according to one embodiment of the present invention;
Figures 4A and 4B are schematic diagrams illustrating a convolution process according to one embodiment of the present invention;
Figure 5 is a flow chart illustrating a quadrant based salient feature correlation technique according to another embodiment of the present invention;
Figures 6A, 6B and 6C are schematic views of the technique described in relation to Figure 5;
Figure 6D is a schematic view of a terminal display according to one embodiment of the present invention; and
Figure 7 is a schematic illustration of determining a salient feature mask according to one embodiment of the present invention.
Figure 1 is a schematic illustration of a mobile communication terminal 10. The terminal 10 includes a display 26, a camera 24, a microphone 16, speakers 18, a keypad 21 and navigation keys 23.

Figure 2 is a schematic block diagram of the main functional elements common to the different embodiments of the invention; these elements are each individually known and will not be described in detail herein. A main processor 36 may be a conventional programmable microprocessor (for example an Intel 80386, 80486, etc.); alternatively, a special purpose or specially configured unit (e.g. a digital signal processor) could be used. A read-only memory (ROM) 38 is connected to the processor 36 for the storage of control programs, data and images. The ROM 38 can be implemented by any appropriate technology, for example by a flash PROM. A random-access memory (RAM) 40 is connected to the processor 36 via a bus 42 and is used as working storage and for the storage of data and images captured using a CCD video camera 24.
Signals relating to the data captured by the camera are passed via a camera interface 44 to the processor 36 to be processed. The camera interface 44 also provides the video codec 46 with a digital representation of the captured data from the camera 24, where it can be suitably processed for display and/or transmission to the mobile communications system.
The camera interface 44 carries out all the necessary signal conditioning as required on receiving images from the camera 24. Signal conditioning will depend on the exact configuration of the camera but preferably comprises signal conditioning to enable accurate analogue to digital conversion with sufficient buffering of the captured data. The camera 24 will include all the necessary support circuitry to produce a fully functional camera delivering a fully formatted video signal. The camera 24 may also include circuitry to regulate the voltage for power supply control and a suitable output buffer to directly drive a standard VDU should the videophone be connected to an external device.
A display interface 52 connects the display 26 via the bus 42 to the processor 36. The display interface 52 responds to instructions from the processor 36 to drive the built-in display 26 in a conventional manner.
The display 26 is provided with a touch-screen 56. A touch-screen interface couples the touch-sensitive display 26 to the processor 36 via the bus 42. The touch-screen is a mechanism independent of the video display 26, for example, a transparent touch-screen membrane which is placed over the display 26 and connected appropriately.
The processor 36 can be arranged to transmit to the display 26 a menu of user-selectable items, and to respond to the location at which the screen is touched as input of the user's selection of an item. The touch-sensitive screen can thus be used as a dynamic and reconfigurable user interface.
Touch-screen entry can be used in place of, or in addition to, the entry of commands from an external keyboard or by voice command, if appropriate.
Additionally, the touch-screen area can be configured as a general purpose area to allow entry of data and written commands.
An audio interface 56 connects the audio receiver means, consisting of a microphone 16, and audio transmitter means, such as an ear-piece and/or speaker 18, to the processor 36, and carries out all the necessary signal conditioning required to output audio signals and to receive audio signals.
A radio-frequency (RF) interface 62 is also connected via the bus 42 to convert any data to be transmitted into signals for driving an RF transmitter 64, and converts signals from an RF receiver 66 into data to be passed via the bus to the relevant interfaces. The RF transmitter 64 and the RF receiver 66 are connected to a radio antenna 28. This RF interface 62 consequently enables wireless communications between the videophone and the mobile communications system.
The processor 36 is programmed by means of control programs and data stored in the ROM 38 and in use, the RAM 40, to receive signals from the camera 24 via camera interface 44, to interpret those signals and to derive data therefrom which are displayed on display 26 and which can be stored in the RAM 40 or any other suitable memory device.
Depending on the refresh rate used and the number of pixels used in the images, video image data transmitted and received by the videophone may require compression for transfer via a low data rate radio channel, such as those currently available in known cellular radio networks. The video data may be compressed using the MPEG-4 standard. Alternatively, the video images captured may be compressed into a different format suitable for transmitting the data derived across the mobile communications system, such as that disclosed in WO95/20296.
In the following an embodiment of the present invention will be described with reference to Figure 3.
In step 101 the terminal is provided with a so-called "ideal image" for teleconferencing; the ideal image is usually an image of the user's head. In order to provide the ideal image, the user may take a picture of himself with the camera of the terminal, whereby the user can carefully choose the position and/or size of the user's head.
Alternatively, the user may choose the head together with the upper part of the body as an ideal image.
In step 103, the ideal image is processed to extract prominent features of the ideal image. This is achieved by using an edge detection process.
In this way prominent features of the ideal image can be stored in a convenient and space saving way. Methods of edge detection are described in more detail below.
The process then continues in step 105 by storing the processed ideal image for reference.
Steps 101 to 105 are performed before the user starts a teleconferencing call.
Steps 107 to 117 are performed during an ongoing video or teleconferencing call.
When the camera is activated, the instantaneous image of the user, as seen through the lens of the camera, is captured using a frame store device.
These instantaneous images of the user may vary in quality depending on lighting conditions, even if correction systems such as Automatic Gain and/or Offset Control are used. It is for this reason that edge images are used in the method described, since edges can be regarded as less variant than the grey-level pixel values themselves.
In step 107, the terminal receives the current image taken by the camera. In step 109, the instantaneous or current image held in the frame store is processed in essentially the same way as the ideal image in step 103: the prominent features of the current image are extracted using an edge detection process.
In step 111, the terminal processor receives both the processed "ideal" image and the processed current image, and a correlation process is performed to compare the two images. A correlation process suitable for the present invention is described in more detail below. In step 113 the processor determines whether the current image shows the desired cut-out as defined by the ideal image.
If this is the case, no correction is required in the positioning of the terminal's camera and the processor continues in step 107 with receiving the next "current" image taken by the camera.
If the current image does not show the desired cut-out in step 113, the terminal's processor determines in which direction the camera or the terminal needs to be moved by the terminal's user in order to improve the current image and provide the other party of the video call with the desired image (step 115).
In step 117, the terminal then indicates the required movement to the user. Subsequently, the terminal is ready to receive and process the next current image in step 107 as long as the video call is in progress.
It is understood that not every single video image needs to be processed in order to perform the process as described above with reference to Figure 3. It might be enough to process one "current" image within a predetermined time period, for example once a second.
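By way of illustration only, the loop of steps 107 to 117 might be sketched in Python as follows. Everything here is an assumption made for the sake of the sketch: the camera object and the helpers edge_detect, correlate_and_check and indicate_movement are hypothetical stand-ins for the processes described in the remainder of this description.

    import time

    def alignment_loop(camera, ideal_edges, edge_detect, correlate_and_check,
                       indicate_movement, interval=1.0):
        """Steps 107 to 117 of Figure 3, checking one 'current' image per interval."""
        while camera.call_in_progress():              # run for the duration of the call
            current = camera.capture_frame()          # step 107: receive the current image
            current_edges = edge_detect(current)      # step 109: extract prominent features
            movement = correlate_and_check(ideal_edges, current_edges)  # steps 111 to 115
            if movement is not None:                  # step 113: desired cut-out not shown
                indicate_movement(movement)           # step 117: indicate the required movement
            time.sleep(interval)                      # e.g. one "current" image per second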
In the following, an edge detection process suitable for implementing the present invention is described. The same process is used for edge detection of the ideal image in step 103 and of the received current image in step 109.
Subsequently, the correlation process of step 111 and the determination of the required movement to align the camera (step 115) will be described.

Edge Detection

In the following, a method of detecting edges using a set of eight relatively small edge detection masks is described. Such an implementation is suitable for use in a mobile terminal. Below, a set of eight 3 by 3 pixel masks is shown as an example.
     1  0 -1      0  1  1     -1 -1 -1
     1  0 -1     -1  0  1      0  0  0
     1  0 -1     -1 -1  0      1  1  1

     1  1  0     -1  0  1      0 -1 -1
     1  0 -1     -1  0  1      1  0 -1
     0 -1 -1     -1  0  1      1  1  0

     1  1  1     -1 -1  0
     0  0  0     -1  0  1
    -1 -1 -1      0  1  1

These masks are known in the art and correspond to the eight basic directions, denoted as east, south-east, south, south-west, west, north-west, north and north-east in the following.
The masks are convolved with the captured image. According to a first embodiment, this is done by transforming the image into the Fourier domain and then multiplying the Fourier domain representation of the image with the Fourier domain representation of each of the masks. This is achieved by embedding the mask under consideration into a larger image containing zero entries where no mask values are present. The mask values are input into the top right hand corner of the larger image. Through this process of embedding the mask values, the two resulting images have the same dimensions, so that when transforming them to the Fourier domain the Fourier representations can simply be multiplied on a pixel-by-pixel basis. It is noted that each pixel in the Fourier domain will be represented by a complex value, i.e. having a real and an imaginary part. After each mask has been multiplied in this pixel-by-pixel manner with the image representation in the Fourier domain, there will be eight such intermediate results in the Fourier domain.
An inverse Fourier transformation is now performed to obtain eight images in the spatial domain. The eight images can then be summed and normalised to give the final edge-image of the captured image.
This final image can be stored directly in its present form.
Alternatively, and particularly if memory space is limited, the edge image can be processed to save storage memory. The values of each pixel can for example be compared with a predetermined threshold and the image can be transformed into a binary image in this way.
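A minimal numpy sketch of this Fourier-domain procedure, using the eight masks reconstructed above, might look as follows; the names MASKS and edge_image_fourier are illustrative only, and the optional threshold parameter implements the binary-image storage option just described.

    import numpy as np

    # The eight 3 by 3 masks shown above, in the same order.
    MASKS = [np.array(m) for m in (
        [[ 1,  0, -1], [ 1,  0, -1], [ 1,  0, -1]],
        [[ 0,  1,  1], [-1,  0,  1], [-1, -1,  0]],
        [[-1, -1, -1], [ 0,  0,  0], [ 1,  1,  1]],
        [[ 1,  1,  0], [ 1,  0, -1], [ 0, -1, -1]],
        [[-1,  0,  1], [-1,  0,  1], [-1,  0,  1]],
        [[ 0, -1, -1], [ 1,  0, -1], [ 1,  1,  0]],
        [[ 1,  1,  1], [ 0,  0,  0], [-1, -1, -1]],
        [[-1, -1,  0], [-1,  0,  1], [ 0,  1,  1]],
    )]

    def edge_image_fourier(image, threshold=None):
        """Sum of the eight mask responses, formed as products in the Fourier domain."""
        h, w = image.shape
        f_img = np.fft.fft2(image)
        total = np.zeros((h, w))
        for mask in MASKS:
            embedded = np.zeros((h, w))
            embedded[:3, w - 3:] = mask          # embed the mask in the top right-hand corner
            response = np.fft.ifft2(f_img * np.fft.fft2(embedded))  # pixel-by-pixel product
            total += np.abs(response)            # magnitude of the complex-valued result
        total /= total.max()                     # normalise the summed edge image
        if threshold is not None:
            return (total > threshold).astype(np.uint8)  # store as a binary image instead
        return total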
An alternative way of obtaining the perfect edge-image is to work in the spatial domain itself. This might be particularly useful when the edge detection masks are relatively small, for example if they are no bigger than 8 pixels by 8 pixels. Each mask is again convolved with the original image, in the spatial domain. However, in the spatial domain the mask is moved to the left across the whole image in a raster-scan type movement. The convolution process is illustrated in Figures 4A and 4B for a 3 by 3 pixel mask. Figure 4A illustrates the original image 70 and the mask 72, marked by hatching. Figure 4B illustrates the original image 70 and the output image 80, which is marked by hatching. The process starts by placing the mask 72 in the top right hand corner of the original image 70 and performing appropriate processing to produce one edge value, which is placed into the output image 80. The mask 72 is then moved across the original image by one pixel, the same processing is performed, and again one output value is obtained and written to the output image, adjacent to the previous value. The movement of the mask 72 over the image 70 is illustrated by arrows 74. The processing is repeated until the left boundary of the original image is reached. The mask is then dropped one pixel down and sent back to the opposite image boundary, where processing is again repeated. In this way the mask covers the whole of the original image in a raster-scan type procedure.
The processing that is performed at each mask location includes multiplying, for each pixel, the mask values with the image pixel values at the position at which the mask has been placed. For example, for a 3 by 3 pixel mask this means nine such multiplications per position of the mask. The values resulting from the multiplications are summed together, and this sum is the one output value written to the output image 80 for the particular position of the mask. It is noted that the output image will be slightly smaller than the original image, since the mask is not allowed to overlap the original image boundaries. As can be seen from Figure 4B, for a 3 by 3 edge detection mask the output edge image boundary 82 will be a one pixel border. For a 5 by 5 edge detection mask the boundary will be a two pixel border. This must be taken into account during the next stage of image processing.
Again, using the process in the spatial domain results in eight output images corresponding to the eight edge detection masks. These images are then summed and normalised to give the final edge-image of the captured image.
Also, the image processed in this way in the spatial domain can be stored either directly or be transformed into a binary image as described above.
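The spatial-domain variant might be sketched as below, reusing the MASKS list from the previous fragment; as discussed above, the output is smaller than the input because the mask may not overlap the image boundaries. Names are again illustrative.

    import numpy as np

    def edge_image_spatial(image, masks):
        """Raster-scan convolution of each mask over the image; responses summed, then normalised."""
        h, w = image.shape
        m = masks[0].shape[0]                    # mask size, e.g. 3
        out = np.zeros((h - m + 1, w - m + 1))   # output shrinks by an (m - 1) pixel border
        for mask in masks:
            for y in range(h - m + 1):           # raster-scan over every mask position
                for x in range(w - m + 1):
                    window = image[y:y + m, x:x + m]
                    out[y, x] += np.sum(window * mask)  # nine multiplications for a 3 by 3 mask
        out = np.abs(out)
        return out / out.max()                   # final edge-image of the captured image

As with the Fourier-domain variant, the result can be stored directly or thresholded into a binary image.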
Correlation Process

As described above with reference to Figure 3, in step 111 the perfect edge-image is correlated with the instantaneous edge-image of the user.
Multiple methods of correlation, such as subtractive, multiplicative and various hybrid techniques known in the art, are suitable for this process. In the following, a new technique is described, referred to as quadrant based salient feature correlation. This technique is particularly suitable as it is less sensitive to environmental changes, leading to a more accurate correlation output.
The method involves four independent correlations of the stored perfect edge-image and the received instantaneous edge-image, carried out in such a way as to focus on the centre of the stored image.
Referring now to the flowchart of Figure 5, the method is described in more detail. The centre of the stored edge image is taken as the start "aimpoint" (step 201). A schematic illustration of a frame 300 and its centre 320 is shown in Figure 6A.
In step 203, the stored edge image is then divided into four quadrants.
The quadrants have uncertainty boundaries that ensure that correlation is not carried out across, or very close to, image boundaries. After performing edge detection (step 204) as described above, it is determined in step 205 for each quadrant which of the edges is the strongest edge. This can be achieved for example by finding the highest edge value in each quadrant, or by finding the highest value from local groups of pixels. The area including the highest values is taken to represent an area in the image which has the strongest features. The assumption is that these are the more salient, i.e. the more reliable features for each quadrant. In step 207, these salient features are mapped into so-called feature masks. For each quadrant, one feature mask is extracted.
Figure 7 illustrates the process of identifying an area including the strongest feature and mapping that feature into the feature mask. Figure 7 shows a portion 400 of the edge image as determined in the edge detection process. The edge values found are grouped into groups 410, 412, 414 and 416, each representing a single identified edge of the original image.
In the process of salient feature recognition, the group 412 including the highest edge values, is identified as the strongest or salient feature 420. The area 430 including the salient feature 420 is now extracted as the salient feature mask 440.
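A sketch of steps 201 to 207 follows, under the simplifying assumptions of a fixed square mask size and of uncertainty boundaries reduced to a plain margin parameter; the names salient_feature_masks, mask_size and margin are not taken from the description.

    import numpy as np

    def salient_feature_masks(edge_image, mask_size=8, margin=4):
        """Steps 203 to 207: one feature mask and its centre per quadrant."""
        h, w = edge_image.shape
        cy, cx = h // 2, w // 2                  # the aimpoint (step 201)
        masks = []
        for qy, qx in ((0, 0), (0, cx), (cy, 0), (cy, cx)):   # four quadrants (step 203)
            quad = edge_image[qy + margin:qy + cy - margin,
                              qx + margin:qx + cx - margin]   # stay clear of the boundaries
            y, x = np.unravel_index(np.argmax(quad), quad.shape)  # strongest edge (step 205)
            half = mask_size // 2
            y0 = min(max(y - half, 0), quad.shape[0] - mask_size)  # clip the surrounding area
            x0 = min(max(x - half, 0), quad.shape[1] - mask_size)
            feature = quad[y0:y0 + mask_size, x0:x0 + mask_size].copy()  # step 207: feature mask
            centre = (qy + margin + y0 + half, qx + margin + x0 + half)  # centre in image coords
            masks.append((feature, centre))
        return masks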
Referring again to Figure 5, in step 209, the geometric relationship of each feature mask with the centre of the image, i.e. the aimpoint, is determined and stored in memory.
This is achieved by first determining the position of the centre of the mask, and subsequently determining the angle and intersections of straight lines drawn between each mask centre and the aimpoint of the frame.
In step 211, the next frame, i.e. the instantaneous image, is received and edge detection is performed (step 212). The processed edge image is then subjected to a similar division into quadrants (step 213). However, once this has been done, it is the edge feature masks of the previous image, stored in step 207, that are correlated with the received edge image quadrants (step 215). The position in which the masks match the edges best is determined. Once these best match positions are found, the geometric relationship of these mask positions with the centre of the total image itself is determined and compared to that stored in step 209. The differences between the geometric relationships determined in step 215 and those stored in step 209 can now be used to indicate which way the phone should be moved to correctly align the camera (step 217).
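The matching of step 215 can be sketched as an exhaustive multiplicative correlation of each stored mask over its quadrant, with the geometric relationship of step 209 reduced to the angle of the line from the mask centre to the aimpoint. This is one possible reading, not the only one, and all names are illustrative.

    import numpy as np

    def best_match_centre(quadrant, feature_mask, origin):
        """Slide the mask over the quadrant; return the centre of the best-matching position."""
        m = feature_mask.shape[0]
        best, best_pos = -np.inf, (0, 0)
        for y in range(quadrant.shape[0] - m + 1):
            for x in range(quadrant.shape[1] - m + 1):
                score = np.sum(quadrant[y:y + m, x:x + m] * feature_mask)  # multiplicative correlation
                if score > best:
                    best, best_pos = score, (y + m // 2, x + m // 2)
        return (origin[0] + best_pos[0], origin[1] + best_pos[1])  # in whole-image coordinates

    def angle_to_aimpoint(centre, aimpoint):
        """Angle of the straight line drawn between a mask centre and the aimpoint."""
        return np.arctan2(aimpoint[0] - centre[0], aimpoint[1] - centre[1])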
In addition to determining the best matches and geometric relationships, the process continues in step 205 with determining the strongest edge for each quadrant as described above. In step 207 the strongest edges are again mapped onto edge feature masks corresponding to the image received in step 211, and the new masks are stored for future use.
In this way reference is not always made to the original image, but rather to the ongoing, frame-by-frame received, images.
Figure 6B illustrates the process of finding the new aimpoint after the previously determined masks have been correlated with the next image.
In a first step the best match is determined between the previously determined masks 301 to 304 and the edges of the image currently processed.
Then the geometric relationship between the new positions of the masks is determined.
This can, for example, be done by drawing a line (311 to 314) from each newly placed mask at the same angle as determined from the geometric relationship of the original placement of the masks to the image centre.
In the next step the intersection 321 of these four lines 311 to 314 is determined. If the four lines do not intersect at a single point, an intersection point 321 is approximated, taking into account all intersection points between each pair of lines.
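One conventional way to realise this approximation is a least-squares point that minimises the summed squared perpendicular distances to all four lines; the sketch below assumes each line is given by a point and an angle, and that the lines are not all parallel.

    import numpy as np

    def approximate_intersection(points, angles):
        """Least-squares approximation of the common intersection of several lines."""
        A = np.zeros((2, 2))
        b = np.zeros(2)
        for (x, y), theta in zip(points, angles):
            n = np.array([-np.sin(theta), np.cos(theta)])  # unit normal of the line
            A += np.outer(n, n)                            # each line imposes n . q = n . p
            b += n * (n @ np.array([x, y]))
        return np.linalg.solve(A, b)                       # the approximated aimpoint 321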
Figure 6C illustrates the frame 300, the centre 320 of the frame, and the new aimpoint 321 determined as described above with reference to Figure 6B. The direction in which the camera needs to be moved in order to align the image can then be determined by connecting new aimpoint 321 with image centre 320. The terminal then indicates the direction in which the camera needs to be moved by displaying arrow 330 to the user.
Alternatively, the direction may be indicated to the user by other means. The terminal's display 26 may, for example, include an array of arrows 340 such as that illustrated in Figure 6D. In order to indicate a certain direction, the terminal highlights the corresponding arrow or a combination of arrows. For example, in order to indicate the direction "south-east" (shown in Figure 6C by arrow 330), the terminal may highlight both the arrow pointing downward (south) and the arrow pointing to the right (east).
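The choice of arrows to highlight can be sketched by quantising the centre-to-aimpoint vector to one of the eight basic directions. The coordinate convention (rows growing downwards) and all names are assumptions of the sketch, not of the description.

    import numpy as np

    COMPASS = ["east", "north-east", "north", "north-west",
               "west", "south-west", "south", "south-east"]

    def arrows_to_highlight(centre, aimpoint):
        """E.g. 'south-east' highlights both the 'south' and the 'east' arrow."""
        dx = aimpoint[1] - centre[1]             # columns grow to the right
        dy = centre[0] - aimpoint[0]             # rows grow downwards on screen
        angle = np.degrees(np.arctan2(dy, dx)) % 360
        direction = COMPASS[int((angle + 22.5) // 45) % 8]
        return direction.split("-")              # component arrows, e.g. ['south', 'east']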
It is noted that it has been assumed in the foregoing that the camera or the terminal will not be rotated by a significant amount. However, it is appreciated that the process can be modified to take rotational movements of the camera into account. In this case the correlation procedure is modified; it may, for example, be performed in polar coordinates, or rotational processes may be introduced in any other suitable form.
Although graphical indications have been described in the embodiments above, it is appreciated that the indications of the required movement of the camera to align the image may be provided in other ways, for example by audio or tactile indications. The terminal may, for example, vibrate if the image captured by the camera is misaligned, or may indicate a misaligned image by one or more predetermined tones, melodies or voice output generated by the terminal.
Although the present invention has been described by way of example only and with reference to the possible embodiments thereof, it is to be appreciated that improvements and/or modifications may be made thereto without departing from the scope of the invention as set out in the appended claims.

Claims (23)

1. A mobile terminal for use in a cellular telecommunications network, the terminal comprising a video camera for receiving video images and indication means, the terminal being adapted to: i) provide a first image for reference; ii) provide a second image for determining whether the image is aligned; iii) process said first and second image, whereby the processing includes performing edge detection; iv) correlate the processed first and second image; and v) determine whether the image is misaligned.
2. A terminal according to claim 1, wherein the terminal is further adapted to vi) indicate, if the terminal detects in step v) that the image is misaligned, to the user the misalignment of the image via the indication means.
3. A terminal according to claim 1 or 2, wherein in step vi) the terminal is adapted to indicate to the user how alignment of the image can be achieved.
4. A terminal according to claim 3, wherein the terminal is adapted to indicate the direction of movement of the camera required to align the image.
5. A terminal according to any of claims 1 to 4, wherein the terminal is adapted to indicate whether the image is aligned.
6. A terminal according to any of claims 1 to 5, wherein the terminal is adapted to indicate misalignment with graphical, audio or tactile indications.
7. A terminal according to any of claims 1 to 5, wherein the first image is an image provided for reference only.
8. A terminal according to any of claims 1 to 5, wherein the first image is an earlier frame obtained in a video sequence containing also the second image.
9. A terminal according to any of claims 1 to 8, wherein in step iii) the terminal is further adapted to divide the first and/or second image into one or more sub-images and to find the strongest edge in each sub-image.
10. A terminal according to claim 9, wherein the terminal is further adapted to map the strongest edge in each sub-image into a mask.
11. A terminal according to claim 10, wherein the terminal is adapted to store the masks for future use.
12. A terminal according to claim 11, wherein the terminal is adapted to correlate the mask or masks obtained from one image with another image in order to determine whether the image is misaligned.
13. A mobile terminal for use in a cellular telecommunications network, the terminal comprising a video camera for receiving video images and means for determining whether an image received by the camera is aligned or misaligned, the terminal being adapted to: i) process the image by performing edge detection; ii) divide the image into one or more sub-images; and iii) determine the strongest edge in each sub-image.
14. A terminal according to claim 13, wherein in step ii) the terminal is further adapted to map the strongest edge in each sub-image into a mask.
15. A terminal according to claim 13 or 14, wherein the terminal is further adapted to store the masks for future use.
16. A terminal according to claims 13, 14 or 15, wherein the terminal is adapted to correlate the masks obtained from one image with another image in order to determine whether an image is aligned or misaligned.
17. A method of operating a mobile terminal including a camera, the method comprising the steps of: i) providing a first image for reference; ii) providing a second image for determining whether the image is aligned; iii) processing said first and second image, whereby the processing includes performing edge detection; iv) correlating the processed first and second image; and v) determining whether the image is misaligned.
18. The method according to claim 17, further including step vi) of determining whether a correction to the alignment of the second image is required.
19. The method according to claim 18, further comprising the step of indicating to the user the required movement of the terminal to align the second image.
20. The method according to claim 19, further including the step of determining how the camera and/or terminal needs to be moved in order to align the camera.
21. A method of detecting salient features of an image, the method comprising the step of: i) processing the image by performing edge detection; ii) dividing the image into one or more sub-images; and iii) determining the strongest edge in each sub-image.
22. A program running on a processor, the program adapted to perform the method of claim 17.
23. A program adapted to perform the method of claim 21 when running on a processor of a computer or terminal or the like.
GB0328305A 2003-12-05 2003-12-05 Mobile communications Expired - Fee Related GB2408875B (en)

Priority Applications (2)

Application Number Priority Date Filing Date Title
GB0328305A GB2408875B (en) 2003-12-05 2003-12-05 Mobile communications
KR1020040100481A KR101075619B1 (en) 2003-12-05 2004-12-02 Mobile terminal and method for aligning a video image in a video call

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
GB0328305A GB2408875B (en) 2003-12-05 2003-12-05 Mobile communications

Publications (3)

Publication Number Publication Date
GB0328305D0 GB0328305D0 (en) 2004-01-07
GB2408875A (en) 2005-06-08
GB2408875B GB2408875B (en) 2008-05-28

Family

ID=29764698

Family Applications (1)

Application Number Title Priority Date Filing Date
GB0328305A Expired - Fee Related GB2408875B (en) 2003-12-05 2003-12-05 Mobile communications

Country Status (2)

Country Link
KR (1) KR101075619B1 (en)
GB (1) GB2408875B (en)

Patent Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JPH07225848A (en) * 1994-02-10 1995-08-22 Fujitsu General Ltd Image extracting method
EP0884905A2 (en) * 1997-06-13 1998-12-16 Nokia Mobile Phones Ltd. A method for producing an image to be transmitted from a terminal and the terminal
US6373970B1 (en) * 1998-12-29 2002-04-16 General Electric Company Image registration using fourier phase matching
US6594378B1 (en) * 1999-10-21 2003-07-15 Arch Development Corporation Method, system and computer readable medium for computerized processing of contralateral and temporal subtraction images using elastic matching
US20020176638A1 (en) * 2001-03-30 2002-11-28 Nec Research Institute, Inc. Method for blind cross-spectral image registration
EP1250005A1 (en) * 2001-04-12 2002-10-16 BRITISH TELECOMMUNICATIONS public limited company Video communication with feedback of the caller's position relative to the camera
WO2003009232A2 (en) * 2001-07-16 2003-01-30 Hewlett-Packard Company Method and apparatus for sub-pixel edge detection

Also Published As

Publication number Publication date
KR101075619B1 (en) 2011-10-21
KR20050054833A (en) 2005-06-10
GB2408875B (en) 2008-05-28
GB0328305D0 (en) 2004-01-07


Legal Events

Date Code Title Description
PCNP Patent ceased through non-payment of renewal fee

Effective date: 20141205