WO2003001467A1

WO2003001467A1 - Method and device for monitoring movement

Info

Publication number: WO2003001467A1
Application number: PCT/SE2002/001264
Authority: WO
Inventors: Henrik Benckert; Mats Elfving; Per Nilsson-Stig; Eva SJÖ
Original assignee: Wespot Ab
Priority date: 2001-06-25
Filing date: 2002-06-25
Publication date: 2003-01-03

Abstract

A method, a computer program product and a system for image processing for detection of movement in a monitored area comprising the steps of recording (100) an image of the monitored area, calculating (110) a difference image between the recorded image and a previously recorded image, and determining (140) whether movement has been detected by identifying (130) an object appearing in the difference image. In addition, the difference image is filtered (120) with a filter that has the property of strengthening the edges of the object in one direction. Use of the method for controlling an automatic door-opener.

Description

TITLE OF INVENTION:

METHOD AND DEVICE FOR MONITORING MOVEMENT

10 FIELD OF INVENTION

The present invention relates to a method in image processing for detection of movement in a monitored area, comprising the steps of recording an image of the monitored area, calculating a difference image between the recorded image

15 and a previously recorded image, and determining whether a movement has been detected by identifying an object appearing in the difference image.

BACKGROUND ART

20. Digital monitoring is used today in many different applications. One example of such an application is monitoring of an area to give an alarm in the event of an intrusion.

Monitoring aims to detect various types of movement within the monitored area. Analysis of difference images is a well-

25 established method for automatic detection of movement in digital monitoring.

A common difference method is near time difference (NTD) , that is a difference is calculated, pixel by pixel, between the current digital image and a previous image near in time.

30 Another common method is to calculate the difference between the current image and a fixed reference image, where the reference image comprises the scene without an object, such as a person. If there has been no change between the two images, then the difference will ideally be zero. However, in

35 digital images there is always random noise and this appears as noise also in the difference image. In order to distinguish the noise from actual movement, the next step is usually to carry out a threshold operation on the difference image. A threshold operation results in that pixels with an absolute value exceeding a threshold value being said to be a detection. The threshold value may be chosen as a value slightly above the normal noise level, so that a large number of false detections due to noise are eliminated.

Under good lighting conditions and if the movement that is to be detected is sufficiently clear, a traditional difference image method is usually sufficient. However, if the scene is dark, which is often the case in monitoring situations, and with a person wearing dark clothing moving slowly and perhaps at a great distance, the movement can easily be confused with noise and will not be detected using a threshold operation. One example is when the contrast between the person and the background is less than the threshold value. In such situations, the person may never be detected. This means that traditional near time difference cannot be used in situations with poor lighting and/or high noise levels.

Another way of detecting moving objects in a sequence of images is to use so-called optical flow, see for example B . K. P. Horn, "Robot Vision " , The MIT Press , Cambridge, MA, 1986. This is closely related to near time difference, as optical flow is defined by the derivatives in the vertical, horizontal and time dimensions of the intensities in the sequence of images . The derivatives in the time dimension must be approximated by differences, which in its simplest form resambles near time difference. The traditional use of optical flow in computer vision is to use these derivatives, that define the optical flow, to estimate the field of movement in the image. The field of movement defines how various 3D points move in the planes of the image. This algorithm is based on the assumption that the scene is stationary and that the camera is moving, or vice versa .

If optical flow is used in the form of derivatives in space and time dimensions, this is equivalent to deriving a near time difference, for which reason the same disadvantages remain. If the traditional algorithm is used for estimating the field of movement from optical flow, which is incorrect, but evens out the information from several pixels in the image and gives a result that is less sensitive to noise, there are other disadvantages. Firstly, the algorithm is complex and processor- consuming, which means that it may not be possible to use it in real-time applications, which are very important in a monitoring system. Secondly, the algorithm is based on incorrect assumptions (stationary scene) , which may render the results difficult to interpret.

Another method for detecting movement in sequences of images is so-called Normalized Vector Distance, NVD, which is based on comparison with a reference image, see Matsuyama et al . (2000) Background subtraction for non-stationary scenes, In Proceedings of the 4 th Asian conference on computer vision . Instead of comparing the current image directly with the reference image, the images are divided into blocks, for example 8x8 pixels in size. The intensities in these pixels define a vector, and several vectors are compared by analysing the vector distances after normalization. This is essentially equivalent to considering the angle between the vectors. The method means that proportionally linear changes in intensity in the image will not affect the detection.

A problem with using NVD is that the detection of movement may be insensitive. This is because the method may even out the information too much by processing the image in blocks. If, for example, two events take place in a block, these can counteract each other. As the method assembles everything in the block, the information in the block is not so detailed. In addition, the insensitivity in the detection of movement is due to a recorded image being compared with a reference image and not with a previously recorded image. The edges of objects that move in the image show up more clearly in near time difference than in a comparison with a reference image. A further disadvantage is that the reference image must be updated when the scene is changed permanently. This occurs, for example, if the position of a flowerpot is changed. If the reference image is not updated, a detection of movement will be constantly obtained, in spite of the scene being stationary. SUMMARY OF INVENTION

An object of the present invention is to provide a method for detection of movement with high sensitivity and reliability of detection.

According to a first aspect of the invention, this relates to a method in image processing for detection of movement in a monitored area, comprising the steps of recording an image of the monitored area, calculating a difference image between the recorded image and a previously recorded image, deciding whether movement has been detected by identifying an object appearing in the difference image, and filtering the difference image with a filter that has the property of strengthening the edges of the object in one direction.

A sensor can be arranged to record continually images of a monitored area. The time between each recorded image can depend upon how much processing power is available and what types of movement occur in the monitored area. Near time difference images are used in the invention, that is the recorded image is compared with an image recorded close to it in time. In these images a movement is seen principally as a pair of "movement outlines", in the same direction, one at the front edge and one at the rear edge of the movement. If the person moves towards the camera, the increase in size appears as a horizontal "movement" in different directions on both sides of the person, as the effects of perspective mean that the person becomes larger in the image. This has the advantage that the same algorithm can also be used to detect movements in a direction towards or away from the sensor. The method according to the invention makes it possible to set the required direction, that is the vertical direction or horizontal direction, in which edges are to be strengthened. The filtering results in that variations in intensity is strengthened and becomes clearer in one direction. In this way, moving edges are detected more clearly. By edges is meant edges around a moving object, even if, for example, an edge of a desk can also be seen if a movement takes place across an edge of the desk. The edges around a moving person can, in certain parts, show up as thick lines. The thickness of the lines correspond to the distance the detected object has moved since the previously recorded image. Other sharp lines in the detected object also appear more clearly. An advantage of this information being amplified is that the danger of missing important information in the recorded image is reduced. Another advantage is that the subsequent image processing is made easier, because of the fact that the interesting information being amplified. This means that decisions concerning whether there is an alarm situation or not are more certain and the danger of false alarms is reduced. In intruder monitoring, for example, this is important as each false alarm is very costly. In certain types of monitoring, it is also important that no object escapes detection. Another advantage is that the image processing requires less complex algorithms for the subsequent processing, and that it is easier to determine what a detected object represents.

A third advantage of this method is that it offers a more certain detection of slow movements and movements in dark scenes, in which there is little contrast between the moving object/person and the background. Special emphasis is placed on detecting moving people, by utilizing a person's typical shape and pattern of movements .

A further advantage of the method is that it is quick and does not require a lot of memory. This has the advantage that a program for implementing the method in a processor can be executed in real time in, for example, embedded systems.

It is possible to investigate whether the value of the filtered difference image is equal to or less than a predetermined value, and, if such is the case, to determine that no movement has been detected. This means that calculations are not carried out unnecessarily. If there is no movement to detect, there is no point in continuing the analysis . It is an advantage to filter the image before the investigation is carried out concerning whether the image processing is to continue, as otherwise there is a risk of missing important information. The method can be used for monitoring for intruders, but the monitoring can also have other purposes. For example, it can be a case of monitoring an area near a door that is opened and closed automatically based on the image analysis. The filter may be a high pass filter that lets through high frequencies that are perpendicular to said direction.

The high pass filter can also be expressed as a derivative filter. The high pass filter has the advantage that it strengthens the edges of an object in one direction which can be selected. The direction can be in the vertical direction or in the horizontal direction. The filter operates perpendicular to the edge-strengthening direction, that is the high pass filter lets through high frequencies in a direction perpendicular to the edge-strengthening direction. By high frequencies is meant in this connection how quickly something varies .

Alternatively, or in combination, the filter may be a low pass filter that lets through low frequencies in said direction. An advantage of this is that there is a noise-reducing effect. This noise-reducing effect also affects the edges of the object, that are evened out. As the filter lets through low frequencies in said direction, the detection sensitivity also increases . In the first and the second embodiment of the method, said direction can be vertical.

An advantage of strengthening vertical movement outlines is that this improves the sensitivity of the detection of moving persons. The facts are used that a person is an elongated vertical shape and that the movement appears essentially to be horizontal, which give typical movement outlines .

The method may further comprise a threshold operation on the filtered difference image. The threshold operation makes possible an indication of the strength of any detection and where in the image it is to be found.

After the threshold operation, the value of the difference image can be normalized. The advantage of normalizing is that the calculations are made easier. The values become integers and it can be determined that zero means no detection, while other values mean detection of movement. The higher the value, the stronger the detection of movement. The resolution of the difference image in both dimensions may be reduced. An advantage of reducing the resolution is that it increases the processing speed. Another advantage is that the noise is reduced, as the reduction in resolution has a smoothing effect. For example, the resolution of the image can be reduced by a factor of 2.

The calculation of the detection map may be performed by creating a cluster map which comprises clusters that consist of adjacent detection pixels in the difference image that exceed a predetermined value . The detection pixels can appear when the difference image is subjected to a threshold operation. A cluster can, in addition to being strong adjacent detection pixels, also be two or more weaker detections that have the same sign and are located at the most two pixels apart in a chosen direction. Individual detections can also constitute a cluster if they are sufficiently strong, that is if they exceed a predetermined detection value.

Moreover, the cluster map may be compared with a memory map that contains information about detections of movement in the most recently recorded images and only allowing such clusters to be included in the detection map that lie within a predetermined distance from a cluster in the memory map.

The method according to the invention can also comprise a particular memory that remembers movements and indications of movements a particular distance back in time. This means that criteria such as direction of movement and speed of movement can be used in the decision concerning whether a movement has been detected. In this way, a more certain decision is reached. According to a second aspect of the invention, this relates to a method in image processing for detection of movement in a monitored area, comprising the steps of recording an image of the monitored area, forming a first reference image as a recent representation of the monitored area, and creating a first detection map based on the recorded image and the first reference image, the method further comprising the steps of forming a second reference image, which represents the monitored area further back in time compared with the first reference image, creating a second detection map based on the recorded image and the second reference image, combining the first detection map and the second detection map into a combined detection map, and deciding whether a detection has taken place by identifying an object appearing in the combined detection map.

The basic idea is to combine two or more different algorithms for the detection of movement. The results of the different algorithms are then combined into a final detection. The combination aims to emphasize the respective strengths of the different algorithms, while any weaknesses can be compensated for by another algorithm. For example, one algorithm is chosen with high sensitivity and another that is more robust against changes in the lighting.

The monitored area can be said to be a scene that consists of a number of surfaces with different reflective properties. A change in the scene changes the set of surfaces, for example by an object coming into the area or moving in the monitored area, between the time when the reference image was recorded and the time when the current image was recorded. A change in the lighting means that the incident light in the scene is changed, for example by the sun going behind a cloud or by a lamp being switched on. A change in the lighting affects different surfaces differently, depending upon their reflective properties . The recorded image is processed partly with an NTD (near time difference) algorithm, and partly with an LTD (long time difference) algorithm. NTD acts as a high-pass filter, that is rapid changes remain, while slower changes are filtered out. The image is then processed in such a way that, for example, changes in the lighting are filtered out and moving physical objects remain. This means that an indication is obtained of a physical object or a change in the scene. Using LTD, changes in the lighting are also filtered out, and a physical object remains. LTD and NTD are combined and to the greatest possible extent only moving physical objects remain, while other false detections can be filtered out. In this way, a movement can be detected. The monitored area can, for example, be an area that is to be protected against intruders or an area in front of a door with an automatic opening function. The method according to the invention can be used in most connections in which there is image processing for detection of movement. By means of the method according to the second aspect of the invention, the advantages of both near time difference algorithms and long time difference algorithms are obtained. A near time difference algorithm uses a reference image which can be the previous recorded image or an image recent in time to the recorded image. It is also possible for the reference image to be a combination of information from a number of previously recorded images. When a near time difference algorithm is used, slow changes are filtered out, while rapid changes remain. The NTD algorithm can be adjusted so that it is less sensitive, which means that larger changes are required in order for a detection to occur.

The long time difference algorithm uses a reference image that represents the monitored area further back in time in relation to the recorded image. The reference image can also be updated with a very small part of a new recorded image, while the major part of the information in the reference image comes from images recorded further back in time. With the long time difference algorithm both quick and slow changes are detected.

This algorithm detects to a greater extent a whole object, in comparison with the near time difference algorithm which to a greater extent detects the edges of an object, as this is where the change has occurred. This means that the detection becomes clear and that the whole of the detected object is outlined. A disadvantage of this algorithm is that it is more difficult to filter out high frequency changes in the lighting, that is rapid changes in the lighting. Here the advantage of the near time difference algorithm is used, that filters out these rapid changes in the lighting. A rapid change can, for example, occur when a lamp is lit in the monitored area. The long time difference algorithm is, on the other hand, suitable for filtering out slow changes in the lighting. Thus, the method according to the invention can handle both slow and rapid changes in the lighting.

It is possible to use the same algorithms to create both the first and the second detection map. What is different can instead be that different reference images are used. For calculating the first detection map, the algorithm uses a reference image taken recently in relation to the recording of the current image. For calculating the second detection map, the algorithm uses a reference image taken further back in time . A further advantage of the method according to the invention is that the detection of movement is more reliable and the number of false detections is reduced. This can result in large savings in costs, as false alarms in various systems can create a lot of additional work. For example, if the method is used in a monitoring system that is designed to detect intruders, and there is a detection of movement caused by a change in the lighting, this can mean that security personnel have to go out to the monitored area in vain in response to the alarm. It is also possible to use a third algorithm, that calculates a third detection map, in order in this way to make possible even more reliable detection of movement.

The method according to the second aspect of the invention makes possible rapid image processing and can be carried out in such a way that it does not require a lot of memory. This enables all calculations to be executed in real time in embedded systems.

Using this method, movements can detected with high sensitivity in a poorly lit environment. The method can also handle both slow and sudden changes in the lighting.

According to an alternative embodiment of the method, a normalized vector distance algorithm (NVD) is used as the long time difference algorithm in order to form the second reference image and to create the second detection map.

The advantage of using an NVD for calculating the second detection map is that NVD is adapted for using reference images that represent the monitored area some while back in time, compared with the time of recording the current image. NVD thereby usually detects whole objects.

According to the alternative embodiment, the near time difference algorithm utilizes, in addition, the steps of forming a first difference image between the recorded image and the first reference image and filtering the first difference image with a filter that has the property of strengthening in one direction the edges of an object appearing in the first difference image. In another embodiment of the invention, near time difference images are used, as described, to create the first detection map, that is the recorded image is compared with a recently recorded image. In these images a movement is seen principally as a pair of "movement outlines", one at the front edge and one at the rear edge of the movement. If the person moves towards the camera, the increase in size appears as a horizontal "movement" on both sides of the person, as the effects of perspective mean that the person becomes larger in the image. This has the advantage that the same algorithm can also be used to detect movements in a direction towards or away from the sensor.

The method according to the invention makes it possible to set the required direction, that is the vertical direction or horizontal direction, in which edges are to be strengthened. The filtering results in variations in intensity being strengthened and becoming clearer in one direction. In this way, edges are detected more clearly. By edges is meant edges around a moving object, even if, for example, an edge of a desk can also be seen if a movement takes place close to the edge of the desk. The edges around a moving person can, in certain parts, show as thick lines. The thickness of the lines can correspond to the distance the detected object has moved since the previously recorded image. Other sharp lines in the detected object also appear clearly.

An advantage of this information being amplified is that the danger of missing important information in the recorded image is reduced. Another advantage is that the subsequent image processing is made easier, because of the interesting information being amplified. This means that decisions concerning whether there is an alarm situation or not are more certain and the danger of false alarms is reduced. In intruder monitoring, for example, this is important as each false alarm is costly. In certain types of monitoring, it is also important that no object escapes detection.

Another advantage is that the image processing requires less complex algorithms for the subsequent processing, and that it is easier to determine what a detected object represents. An advantage of this method is that it offers a more certain detection of slow movements and movements in dark scenes, in which there is little contrast between the moving object/person and the background. Special emphasis is placed on detecting moving people, by utilizing a person's typical shape and pattern of movements .

A further advantage of the method is that it is quick and does not require a lot of memory. This has the great advantage that it is possible to execute the method in real time in, for example, embedded systems.

It is possible to investigate whether the value of the filtered difference image is equal to or less than a predetermined value, and, if such is the case, to determine that no movement has been detected. This means that cal- culations are not carried out unnecessarily. If there is no movement to detect, there is no point in continuing the analysis . It is an advantage to filter the image before the investigation is carried out concerning whether the image processing is to continue, as otherwise there is a risk of missing important information.

The method may comprise the use of a high-pass filter that lets through high frequencies perpendicular to said direction. The high-pass filter can also be expressed as a derivative filter. The high-pass filter has the advantage that it strengthens the edges of an object in a direction that can be selected. The direction can be in the vertical direction or in the horizontal direction. The filter operates perpendicular to the edge-strengthening direction, that is the high-pass filter lets through high frequencies in a direction perpendicular to the edge-strengthening direction. By high frequencies is meant in this connection how quickly something varies. The filter can, in addition, constitute a low pass filter that lets through low frequencies in said direction. An advantage of this is that there is a noise-reducing effect. As the filter lets through low frequencies in said direction, the detection sensitivity also increases, as the signal-to-noise ratio is increased.

The direction may be vertical. An advantage of strengthening vertical movement outlines is that this improves the sensitivity of the detection of moving persons. The facts are used that a person is an elongated vertical shape and that the movement appears essentially to be horizontal, which gives typical movement outlines. In addition, mean values can be calculated in the horizontal direction without losing valuable information, as people are elongated vertically.

The method may further comprise carrying out a threshold operation on the value of the filtered difference image.

The threshold operation results in an indication of the strength of any detection and where in the image it is to be found. After the threshold operation, the value of the difference image can be normalized. The advantage of normalizing is that the calculations are made easier. The values become integers and it can be determined that zero means no detection, while other values mean detection of movement. The higher the value, the stronger the detection of movement. The first reference image in the near time difference algorithm may consist of the immediately preceding recorded image. An advantage of this is that rapid changes of movement can be detected. Objects appearing in the second detection map that are not adjacent to an object appearing in the first detection map may be removed from the combined detection map. An advantage of this is that many false detections in the second detection map are removed from the combined detection map.

The method may further comprise updating the second reference image with the recorded image in the part of the image in which there is an object that appears in the second detection map and which object is not adjacent to an object appearing in the first detection map.

In this way, permanent changes in the scene can be incorporated into the second reference image. An example of such a permanent change can consist of a flower pot • that has fallen down on the floor. The method may also comprise updating the second reference image with a particular percentage of the recorded image. This subset can, for example, be 0.1 percent of the recorded image.

Objects appearing in the first detection map that do not overlap an object appearing in the second detection map may be removed from the combined detection map. In this way, most false detections in the second detection map, that is the detections from the LTD algorithm, are eliminated.

The method may also comprise carrying out a check on changes in the lighting in the recorded image, in order to determine whether there has been any change in the lighting that affects a major part of the image.

A previous first detection map may be used instead of said first detection map to create the combined detection map, in the event that it has been determined that there has been a change in the lighting that affects a major part of the image.

An advantage of using the previous first detection map is that the algorithm will not be completely "blind" in the detection which is otherwise common with such cases of total change in the lighting. When the next image arrives, the lighting is normally stable again and the algorithm can continue as before .

Parameters that control the detection in the second detection map may be chosen in such a way that they are more sensitive to changes in the recorded image than the parameters that control the detection in the first detection map.

The respective parameters in the two detection algorithms are chosen in such a way that the LTD algorithm is more sensitive than the NTD algorithm. It is also characteristic of the two algorithms that the NTD algorithm principally detects the edges of the object, while the LTD algorithm rather detects the whole object. On account of this, the NTD detections are more fragmented, but on the other hand LTD gives more false alarms, particularly with the selected high sensitivity. The advantages of both methods are thus combined.

According to a third aspect of the invention, this comprises a computer program product comprising program code, which when loaded into a computer carries out the method in accordance with the above.

According to a fourth aspect of the invention, this comprises a system for monitoring a monitored area, comprising at least one sensor unit for recording images of the area and a processing unit, which is arranged to carry out the method in accordance with the above on the basis of said recorded images.

The processing unit, which carries out the algorithms in accordance with the method above, and the memory in which the algorithms can be stored, can be arranged in a separate unit, for example an ordinary personal computer. Alternatively, the processing unit and the memory are arranged in the sensor unit. The processing unit is then less powerful than it would be if it were arranged in a separate computer. This is because the space in the sensor unit is limited and the power consumption should be low. As the computer program that carries out the method according to the invention is efficient and requires little memory, it is, however, possible to arrange the processing unit in the sensor unit.

An advantage of arranging the processing unit in the sensor unit is that no transmission of image information to a separate unit is required, which transmission can have large capacity requirements and can impose great demands on the transmission medium. In addition, there can be interruptions in the transmission, which can result in the monitoring being put out of action.

According to a fifth aspect of the invention, the method in accordance with the above can be used with an automatic door-opener. A sensor unit can be arranged to continually record images of a monitored area in front of a door. The door can, for example, be a sliding door or a side-hung door or a revolving door. A processing unit can be arranged to carry out the above-mentioned method. If a person moves into the monitored area in front of the door, the person is detected as an object and a decision can be taken concerning whether the detected object is to cause the door to open. The image processing that is used as the basis for the decision concerning the opening of the door can have different degrees of intelligence level. The image processing can, for example, be very simple and the decision that the door is to open can be made for all objects that cause movement to be detected. It can also be very advanced and only cause the door to open in the event that the detected object has, for example, a particular shape, size or direction of movement. If it is decided that the door is to open, a signal that the door is to open can be transmitted to a door-opening device, that physically opens the door.

Automatic door-openers are, for example, used at the main entrances to various companies. Just inside the door there is usually a manned reception area. If the door is opened frequently, this affects the temperature inside the reception area, with resultant often costly heat losses. In addition, the people working there are exposed to draughts and cold air. It is therefore very important that the door is not opened in error. By the use of the above-mentioned method, the risk is reduced of the door being opened in error, in, for example, difficult weather conditions, such as snow and rain, and different light and shade conditions that can arise when, for example, the sun goes behind a cloud. The automatic door-opener is also more reliable when the monitored area is dark, as with the method above it is able more effectively to identify persons moving into the monitored area and can thereby decide in a reliable way whether the door is to open.

BRIEF DESCRIPTION OF THE DRAWINGS

Further objects, features and advantages of the invention will become appearent from the following detailed description of various embodiments of the invention with reference to the drawings, in which: Fig. 1 is a perspective view of a monitoring system according to the invention,

Fig. 2 is a schematic block diagram of a sensor unit according to the invention,

Fig. 3 is a recorded current image of a person whose movement is in the vertical direction in the image,

Fig. 4 is a previous image of the person relative to Fig. 3,

Fig. 5 is a difference image between the current image, Fig. 3, and the previous image, Fig. 4, Fig. 6 is an image that shows the difference image according to Fig. 5 after carrying out filtering and a threshold operation,

Fig. 7 is a current image, recorded by a sensor, of a person who is moving towards the sensor, Fig. 8 is a previous image of the person in Fig. 7,

Fig. 9 is a difference image between the current image, Fig. 7, and the previous image, Fig. 8,

Fig. 10 is an image that shows the difference image according to Fig. 9 after carrying out filtering and a threshold operation,

Fig. 11 is a schematic flow chart for the invention, and

Figs 12a and 12b are a flow chart for another embodiment of the invention.

Fig. 13 is a schematic flow chart of how the recorded image is processed in order to detect a movement in a second embodiment of the invention.

Fig. 14 is a flow chart that describes schematically the method according to Fig. 13. Fig. 15 is a flow chart of another embodiment of the method of Fig. 13.

Fig. 16 is a current recorded image.

Fig. 17 is a first reference image. Fig. 18 is a first difference image.

Fig. 19 is an image that shows the result of a threshold operation.

Fig. 20 is an image that constitutes a first detection map. Fig. 21 is an image that constitutes a second detection map.

Fig. 22 is an image that constitutes a combined detection map.

DETAILED DESCRIPTION OF EMBODIMENTS OF THE INVENTION In the embodiment described below with reference to Fig. 1, the invention will be used in a monitoring system comprising at least one, sensor unit 1 that monitors a monitored area 2. The monitored area 2 can be an area in which no object, such as. a person, should be found. The sensor unit 1 continually records digital images of the monitored area 2 in order to detect whether, for example, a person 3 is within the monitored area 2. If a person 3 is detected within the monitored area 2, the sensor unit can output an alarm signal, that may be sent to an alarm center 4. The alarm signal, that is sent to the alarm center 4, can consist of a signal that a movement has been detected, but it can also comprise a recorded image or an image of the moving object that caused the alarm. This image can be displayed on a screen in the alarm center 4 and a person in the alarm center 4 can then carry out a further check on what caused the alarm. In one embodiment, the alarm center 4 can be a device that emits a sound signal when it receives an alarm signal from the sensor unit 1.

Fig. 2 shows a block diagram of the hardware in the sensor unit 1. The sensor unit 1 is supplied with power at a voltage connection 10. In addition, the sensor unit 1 comprises a processing unit 11. The sensor unit 1 also comprises a communication unit 12. The communication unit can be arranged to send an alarm signal to the alarm center 4 in the event of the detection of a movement. In addition, the sensor unit 1 comprises a light-sensitive sensor 13, for example a CMOS sensor or a CCD sensor, for recording images. The sensor 13 is integrated with a lens arrangement 14 on a chip. In addition, the sensor unit 1 comprises a RAM memory 15. The sensor unit 1 uses an operating system and can carry out advanced image processing. The sensor unit 1 also comprises a permanent memory 16 for processing code and other data to be saved in a non- volatile memory. All the components in the sensor unit 1 are advantageously integrated on a circuit board. The advantage of this is that the sensor unit 1 is very robust, that is to say that it is less sensitive to sources of interference and has fewer points where sabotage can be carried out. The algorithms that are used in the method according to the invention are stored in the permanent memory 16.

An embodiment of the method according to the invention will now be described with reference to Figs 1-10 and the flow charts in Figs 11 and 12 a-b. The overall method of operation is shown in Fig. 11 and is as follows. A current image of the monitored area is recorded in step 100. A difference image is calculated in step 110 as the difference between the current image and a previous image. Thereafter, in step 120, the difference image is filtered in a way that will be described in greater detail below. In step

130, any object is identified in the filtered difference image. This is used as the basis for a decision whether movement has been detected in step 140.

The method according to the embodiment is shown in greater detail in Figs 12 a-b and is as follows.

In step 200, the sensor 13 records a current image of the monitored area 2. As is well known, an image consists of a number of pixels, the locations of which can be expressed using coordinates. Such a current image is shown in Figs 3 and 7. In both these images a moving object is to be found, which moving object is in this case a person. In Fig. 3, the direction of movement of the person is horizontal in the image. In Fig. 7, the person is moving towards the sensor unit 1. In step 210, the processing unit calculates a difference image ΔI=I( t) -I( t-2) , see Figs 5 and 9, where I(t) is the current digital image, Figs 3 and 7, while I (t-2) is the previous digital image, Figs 4 and 8. In Fig. 3, the person has moved slightly to the left in the image in comparison with his position in Fig. 4. In Fig. 7, the person is closer to the sensor unit 1 than in Fig. 8. The difference image in Fig. 5 is thus the difference between the current image in Fig. 3 and the previous image in Fig. 4, and Fig. 9 is thus the difference between the current image in Fig. 7 and the previous image in Fig. 8.

Al (r, c) is the value of the difference image in the pixel having coordinates (r, c) . The time between each recording depends, for example, upon the processing power and the field of application. For example, the time between each recording can be 0.1 second.

In the next step 220, the difference image ΔI is filtered to strengthen vertical edges in the same. The edge strengthening, or, more accurately, the intensity variation strengthening filter, has the following representation:

The filter g is a high pass filter in the horizontal direction and a mean value filter in the vertical direction. This filter is suitable for the detection of movement in the horizontal direction. If a detection in the vertical direction is required, the filter is instead a high pass filter in the vertical direction and a mean value filter in the horizontal direction. Other types of filter that strengthen variations in intensity, such as edges, are also possible. These can be produced experimentally and can depend upon the field of application for the monitoring and on the distance to people that should be detected. The rows in the filter g can be varied, depending upon the resolution and what smoothing effect we want to achieve. An advantage of using the filter above is that it results in simple calculations. In step 220, the filter g is applied twice to the difference image according to the following formulas:

Cι(r,c) = ∑∑Al(2r+ j -3,2c + k -ϊ) g(j,k),

C2(r,c) = _j∑Al(2r+ j -3,2c + k) g(j,k).

J=l k=l

The result is set to zero for rows and columns where the difference image is indexed outside its definition range, for example 2r+j-3≤0, in step 230. Ci and C₂ have half as many rows and columns as the original difference image. The filter is thus applied more tightly in the horizontal direction than in the vertical direction, as we do not want to miss the important information in the horizontal direction. In step 240, the two filtered images Cι (r, c) and C₂ (r, c) are combined to give a resulting filtered image:

A so-called primary activity test is carried out in order not to continue the image processing unnecessarily, that is to say to abort the calculations if there is nothing to detect in the image. This means that the values in the filtered difference image C (r, c) are investigated in step 250 and, if they are less than a predetermined low value, the algorithm is aborted here and the result is "No detection" . If the value exceeds the predetermined value, then a threshold operation is carried out on the filtered difference image in step 260, and the values are normalized. In Figs 6 and 10, the threshold operation has been carried out on the filtered difference images in Fig. 5 and Fig. 9 respectively. The threshold operation is carried out as follows . Let σ denote the integer component of the standard deviation of C and let S be a sensitivity parameter with standard value 3. Then the image on which the threshold operation is carried out is given by

~ \C(r,c)lσ if|C(r,c)| ≥ σS, C (r,c) = ' _

[0 otherwise.

In practice, this means that the pixels take any one of the values 0,±3,+4,±5, ... (when the standard value of S is used) . There can be said to be a detection when the value is different from zero. The further from zero the value, the stronger the detection can be said to be.

Isolated detection pixels are deleted in step 270, if they are not very strong, and otherwise only such detection pixels are retained as belong to a cluster of detections. Let r:r+3 in indexing of matrices mean the four rows between row r and row r+3 and let S denote the same parameter as above. The cluster map is then defined according to

+ ;,c >) > 2S,

The formula above is valid for all rows except the last three, but detections in these rows still occur as members in clusters, if any, that start one or two rows above. The definition of the cluster map means that two or more weaker detections, that have the same sign and are located at the most two pixels apart vertically, constitute a cluster, but also individual detections constitute a cluster if they are sufficiently strong.

Finally, in step 280, the detection map is determined by comparing the cluster map C_k( r,c ) with a memory map. The memory map indicates where there has been activity in the most recent images in the sequence. Only those clusters that are in the vicinity of a cluster in the memory map are included in the final detection map. Other detections are removed in step 290. In the final detection map, any objects can be identified in step 300 and a decision can be taken in step 310 concerning whether movement has been detected. It is possible to decide immediately that a movement has been detected, if an object appears in the final detection map. It is also possible to follow the movement of an object appearing in the final detection map by comparison with one or more previous final detection maps. By means of the comparison, the direction of movement and speed of the object can be calculated, which can form the basis for the decision concerning whether a movement has actually been detected.

Finally, in step 320, the memory map is updated with clusters from the most recent image. Those clusters that were not included in the final detection map are also memorized. The lifetime (in memory) of a detection is equal to its absolute value in the cluster map C_k (r,c), that is the stronger the detection, the longer it remains in the memory map. Finally, clusters whose lifetimes have expired are deleted.

In another embodiment, the above method is combined with the same algorithm applied to images taken with larger time intervals (for example every second and/or every fourth image) in order to detect very slow movements.

A further embodiment of the invention is described below. Fig. 13 shows very schematically the concept of the embodiment and show the processing of all changes, such as changes in the lighting and changes in the scene, that can occur in a recorded image. All changes that can occur in a monitored area is processed by a NTS system and a LTS system in parallel . The NTS system comprises a high pass filter that passes a moving physical object and gives an indication on a moving physical object. The LTS system passes a physical object. The combination of these systems give a physical object that is moving, resulting in a movement detection.

The invention will be described in further detail below with reference to the images in Figs 16-22 and the flow charts in Figs 14 and 15.

The overall method of operation is shown in Fig. 14 and is as follows. A current image of the monitored area 2 is recorded in step 400. A first reference image is formed in step 410 as a recent representation of the monitored area. Thereafter, in step 420, a first detection map is created based on the current image and the first reference image. In addition, in step 430, a second reference image is formed, which represents the monitored area further back in time compared with the first reference image. Thereafter, in step 440, a second detection map is created based on the current image and the second reference image. In step 450, the first and the second detection map are combined into a combined detection map. In step 460, this is used as the basis for a decision concerning whether movement has been detected.

The method according to the preferred embodiment is shown in greater detail in Fig. 15 and is as follows. A near time difference algorithm (NTD) is used to calculate the first detection map, and an NVD ("Normalized Vector Distance") is used to calculate the second detection map. In step 500, the sensor 13 records a current image of the monitored area 2. As is well known, an image consists of a number of pixels, the locations of which can be expressed using coordinates. Such a . current image is shown in Fig. 16. In the image, a person is moving towards the sensor 13. Both the algorithms then process this current image in parallel.

NTD can be carried out according to the following method. In step 510, the processing unit calculates a first difference image ΔI=I ( ) -I( t-1) , see Fig. 18, where I(t) is the current digital image, Fig. 16, and I (t-2) is the first reference image. In this embodiment, the first reference image is formed quite simply from the immediately preceding digital image, Fig. 17. In the current image , Fig. 16, the person has moved closer to the sensor 13 in comparison with his position at the time of recording the previous image, Fig. 17. The first difference image in Fig. 18 is thus the difference between the current image in Fig. 16 and the previous image in Fig. 17.

Al (r, c) is the value of the first difference image in the pixel with the coordinate (r, c) . The time between each recording depends, for example, upon the processor power and the field of application. For example, the time between each recording can be 0.1 second. In the next step 520, the first difference image ΔI is filtered to strengthen vertical edges in the same. The edge strengthening, or, more accurately, the intensity variation strengthening filter, has the following representation:

The filter g is a high-pass filter in the horizontal direction and a mean value filter in the vertical direction. This filter is suitable for detection of movement in the horizontal direction. If a detection in the vertical direction is required, the filter is instead a high-pass filter in the vertical direction and a mean value filter in the horizontal direction. Other types of filter that strengthen variations in intensity, such as edges, are also possible. These can be produced experimentally and can depend upon the field of application for the monitoring and on the distance to people that we want to detect.

The rows in the filter g can be varied, depending upon the resolution and what smoothing effect we want to achieve. An advantage of using the filter above is that it results in simple calculations. The filter g is applied twice to the difference image according to the following formulas:

6 2

Cι (r, c) = ∑∑M (2r+j-3 , 2 c+k-l ) g(j, k) , y=ι k=ι

Cι (r, c) = ∑∑AI (2r+j-3 2 c+k) g(j, k) y=ι /=ι

For rows and columns where the first difference image is indexed outside its definition range, for example 2r+j-3≤0 , the result is set to zero.

Ci and C₂ have half as many rows and columns as the original difference image. The filter is thus applied more tightly in the horizontal direction than in the vertical direction, as we do not want to miss the important information in the horizontal direction.

The two filtered images Cι (r, c) and C₂ (r, c) are taken together to give a resulting filtered image by:

(r, c) if \cι (r, c)\ ≥\C2 (r, c)\,

(r, c) otherwise.

A so-called primary activity test is carried out in order not to continue the image processing unnecessarily, that is to say to abort the calculations if there is nothing to detect in the image. This means that the values in the filtered first difference image C (r, c) are investigated, and if they are less than a predetermined low value, the algorithm is aborted here and the result is "No detection" . If the value exceeds the predetermined value, then a threshold operation is carried out on the filtered first difference image in step 530, and the values are normalized. In Fig. 19, the threshold operation has been carried out on the filtered first difference image in Fig. 18. The threshold operation is carried out as follows. Let σ denote the integer component of the standard deviation of C and let S be a sensitivity parameter with standard value 3. Then the image on which a threshold operation is carried out is given by:

C (r, c) /σ if \c (r, c)\≥ σS

C (r, c) = 0 otherwise.

In practice, this means that the pixels take any one of the values 0, ±3, ±4, ±5, ... (when the standard value of S is used) .

There can be said to be a detection when the value is different from zero. The further from zero the value, the stronger the detection can be said to be.

Isolated detection pixels are deleted, if they are not very strong, and otherwise only such detection pixels are retained as belong to a cluster of detections. Let r:r+3 in indexing of matrices mean the four rows between row r and row r+3 and let S denote the same parameter as above. The cluster map is then defined according to:

C (r : r + 3, c) if Σ_{j = 0}C(r ÷j, c)\≥ 2S,

Ck (r: r+3 , c) - otherwise.

The formula above is valid for all rows except the last three, but detections in these rows still occur as members in clusters, if any, that start one or two rows above. The definition of the cluster map means that two or more weaker detections, that have the same sign and are located at the most two pixels apart vertically, constitute a cluster, but also individual detections constitute a cluster if they are sufficiently strong. The clusters are four pixels high and one pixel wide and can partially overlap each other. Finally, in step 540, a first detection map, see Fig. 20, is created by comparing the cluster map Ck (r, c) with a memory map. The memory map indicates where there has been activity in the most recent images in the sequence. Only those clusters that are in the vicinity of a cluster in the memory map are included in the final first detection map. Other detections are removed.

Finally, the memory map is updated with clusters from the most recent image. Those clusters that were not included in the final first detection map are also memorized. The life-time (in memory) of a detection is equal to its absolute value in the cluster map Ck (r, c) , that is the stronger the detection, the longer it remains in the memory map. Finally, clusters whose life-times have expired are deleted.

The second detection map is created using NVD ("Normalized Vector Distance"), supplemented by updating of the reference image and adaptive parameters. A basic NVD algorithm is presented in Matsuyama, T., Ohya, T., Habe, H. (2000), Background subtraction for non- stationary scenes (In: Proceedings of the 4 th Asian Conf rence on Computer Vision . ) , to which in its entirety reference is made here. NVD is a method in which the recorded image, see Fig. 16, is compared with a second reference image. The first time the algorithm is executed, the second reference image is calculated, in step 560, as the pixel by pixel mean value of the 20 first images. The images are divided into blocks, for example 8x8 pixels in size, which blocks do not overlap. The pixels in each block are regarded as a vector, they are normalized to the length 1, and then in step 270 the distance is measured between the vectors in the current image and corresponding vectors in the second reference image. The idea is that the normalization is to make the algorithm robust against multiplicative changes in intensity in the image, that is against certain types of changes in the light. In step 280, the calculated vector distance (the NVD value) is compared with an expected value, from which it is allowed to deviate by a certain given tolerance. The expected value is calculated as the mean value (the expected value) of the NVD values in, for example, the 20 first images compared with the second reference image, and the tolerance is determined by the corresponding standard deviation. In this way, the second detection map is created, see Fig. 21.

In order for the algorithm to work well, the second reference image, the expected value and the standard deviation must be updated gradually. This is partly in order to be able to handle different changes in the light that are not covered by the NVD model, and partly in order to be able to handle permanent changes in the scene, such as a chair that has been moved. The updating of the reference image and the parameters is suitably carried out after the first and second detection maps have been combined, see below, since the result of the NTD algorithm can then also determine where and how much updating is required. In order to avoid a massive false alarm in the event of a sudden change in the lighting that affects a major part of the recorded image, a check is carried out on changes in the lighting in each new recorded image in the sequence, irrespective of the other detection algorithms. If

max(∑iγ λ∑l - r

for a threshold T ( = 0.15 as standard), then it can be said to be a question of a total change in the lighting (∑- denotes here the sum of all the pixels) . The measures that are taken in such a case will be described below.

The respective sensitivity parameters in the two detection algorithms are chosen so that NVD is more sensitive than NTD. It is also characteristic of the two algorithms that NTD principally detects the edges of the object, while NVD rather detects the whole object. On account of this, the NTD detections are more fragmented, but on the other hand NVD gives more false alarms, particularly with the selected high sensitivity. The advantages of both methods are combined as follows.

Firstly, in step 550, the NTD detections must be transferred to 8x8 blocks in order to be able to be compared with the NVD detections. The NTD detections are, as mentioned above, grouped in clusters of four pixels. A cluster gives a detection indication in the 8x8 block that overlaps the uppermost pixel of the cluster.

The block detection maps from NTD, Fig. 20, and NVD, Fig. 21, are combined into a single combined detection map, see Fig. 22. Based on blocks with both NTD and NVD detection, in step 590, a so-called "flood fill" is carried out, so that all NVD detections that are 4-neighbors are combined into one object. By 4-neighbors is meant detection blocks that lie immediately above, below, to the right and to the left of a given detection block. NVD detections, 622, that are not adjacent to any NTD detection are not included in the final map, and nor are NTD detections 620 and 621 that do not overlap an NVD detection. In this way, most of the false NVD detections are eliminated, while the fragmentary NTD detections are filled in by the NVD detections. If the check on changes in the light produces a result, both NTD and NVD will presumably give many false detections. In order to avoid a false alarm, in the event of a change in the light being ascertained, the current NTD detection map is not used, but instead the one from the previous image is reused. With the current NVD map and the previous NTD map, the combination map is calculated in precisely the same way as described above. The result will not have such high precision as if the light had not changed, but the algorithm will not be completely "blind", which is otherwise common with changes in the lighting. When the next image arrives, the lighting is hopefully stable again and the algorithm continues as normal.

After each iteration, in step 560, the second reference image and the parameters in the NVD algorithm are updated before the next iteration of the same. This is carried out in such a way that for blocks with NVD detections that are not adjacent to any NTD detection, that is NVD detections that have not been included in the final detection map, the reference image is updated completely with the current image in the block in question. In all other blocks, the reference image is updated by the current image being blended in with a (very) small weight, which can be a small percentage or thousandth. The new reference image can be expressed as Reference image_new = Reference image_previous⁺ (l-oa) Recorded image. The expected value and the standard deviation are handled in a corresponding way. In the combined detection map, any objects can be identified in step 600 and a decision can be taken in step 610 concerning whether a movement has been detected. It is possible to decide immediately that a movement has been detected, if an object appears in the final detection map. It is also possible to follow the movement of an object appearing in the final detection map by comparison with one or more previous final detection maps. By means of the comparison, the direction of movement and speed of the object can be calculated, which can form the basis for the decision concerning whether a movement has actually been detected.

Even though several embodiments of the invention has been described above, it is obvious to those skilled in the art that many alternatives, modifications and variations can be achieved in the light of the above description. The embodiments described do not limit the invention, which is only limited by the appended patent claims .

Claims

PATENT CLAIMS

1. A method in image processing for detection of movement in a monitored area, comprising the steps of recording an image of the monitored area, calculating a difference image between the recorded image and a previously recorded image, and determining whether movement has been detected by identifying an object appearing in the difference image, c h a r a c t e r i z e d by filtering the difference image with a filter for strengthening the edges of the object in one direction.

2. A method according to claim 1, c h a ra c t e r i z e d in that the filter is a high pass filter that lets through high frequencies perpendicular to said direction.

3. A method according to any one of the preceding claims, c ha r a c t e r i z e d in that the filter is a low pass filter that lets through low frequencies in said direction.

4. A method according to any one of the preceding claims, c ha r a c t e r i z e d in that said direction is vertical.

5. A method according to any one of the preceding claims, c h a r a c t e r i z e d by carrying out a threshold operation on the filtered difference image.

6. A method according to any one of the preceding claims, c h a r a c t e r i z e d in that the previously recorded image is the immediately preceding recorded image.

7. A method according to any one of the preceding claims, c h a r a c t e r i z e d by reducing the resolution of the difference image in both dimensions.

8. A method according to any one of the preceding claims, c h a r a c t e r i z e d in that the determination whether a movement has been detected comprises: calculating a detection map and identifying said object that appears based on the detection map.

9. A method according to claim 8, ch a r a c t e r i z e d in that the calculation of the detection map further comprises: creating a cluster map which comprises clusters that consist of adjacent detection pixels in the difference image that exceed a predetermined value .

10. A method according to claim 9, c h a r a c t e r i z e d by comparing the cluster map with a memory map that contains information about detections of movement in the most recently recorded images and only allowing such clusters to be included in the detection map that lie within a predetermined distance from a cluster in the memory map.

11. A method in image processing for detection of movement in a monitored area, comprising the steps of recording an image of the monitored area, forming a first reference image as a recent representation of the monitored area, creating a first detection map based on the recorded image and the first reference image, c h a r a c t e r i z e d by forming a second reference image, which represents the monitored area further back in time in comparison with the first reference image, creating a second detection map based on the recorded image and the second reference image, combining the first detection map and the second detection map into a combined detection map, and deciding whether there has been a detection by identifying an object appearing in the combined detection map.

12. A method according to claim 11, c ha r a c t e r i z e d in that the formation of the second reference image and creating the second detection map are carried out using a normalized vector distance algorithm.

13. A method according to claim 11 or 12, c h a r a c t e r i z e d in that the formation of the first reference image and the creation of the first detection map comprise, in addition: forming a first difference image between the recorded image and the first reference image; and filtering the first difference image with a filter ^'that has the property of strengthening in one direction the edges of an object appearing in the first difference image.

14. A method according to claim 13 , c h a r - a c t e r i z e d in that the filter is a high-pass filter that lets through high frequencies perpendicular to said direction.

15. A method according to claim 13 or 14, c ha r a c t e r i z e d in that said direction is vertical.

16. A method according to claim 13, 14 or 15, c h a r - a c t e r i z e d by carrying out a threshold operation on the filtered first difference image.

17. A method according to any one of claims 11 to 16, c h a r a c t e r i z e d in that the first reference image is formed from the immediately preceding recorded image.

18. A method according to any one of claims 11 to 17, ch a r a c t e r i z e d in that said combination comprises: removing from the combined detection map objects appearing in the second detection map that are not adjacent to an object appearing in the first detection map.

19. A method according to any one of claims 11 to 18, c h a r a c t e r i z e d by updating the second reference image with the recorded image in the part of the image in which there is an object that appears in the second detection map and which object is not adjacent to an object appearing in the first detection map.

20. A method according to any one of claims 11 to 19, ch a r a c t e r i z e d by updating the second reference image with a particular percentage of the recorded image.

21. A method according to any one of claims 11 to 20, ch a r a c t e r i z e d in that said combination comprises: removing from the combined detection map objects appearing in the first detection map that do not overlap an object appearing in the second detection map.

22. A method according to any one of claims 11 to 21, c h a r a c t e r i z e d by carrying out a check on changes in the lighting in the recorded image, in order to decide whether there has been any change in the lighting that affects a major part of the image.

23. A method according to claim 22, c h a r - a c t e r i z e d in that a previous first detection map is used instead of said first detection map during said combination, in the event that it has been determined that there has been a change in the lighting that affects a major part of the image .

24. A method according to any one of claims 11 to 23, ch a r a c t e r i z e d in that parameters that control detection in the second detection map are chosen in such a way that they are more sensitive to changes in the recorded image than the parameters that control detection in the first detec- tion map.

25. A computer program product comprising program code, which when loaded into a computer carries out the method according to any one of claims 1-24.

26. A system for monitoring a monitored area (2), comprising at least one sensor unit (1) for recording images of the area and a processing unit (11) , which is arranged to carry out the method according to any one of claims 1-24 on the basis of said recorded images .

27. Use of a method according to any one of claims 1-24, for controlling an automatic door-opener.