WO2016136214A1 - Identifier learning device, remaining object detection system, identifier learning method, remaining object detection method, and program recording medium - Google Patents


Info

Publication number: WO2016136214A1
Authority: WO (WIPO (PCT))
Prior art keywords: images, analysis, staying, detection target, image
Application number: PCT/JP2016/000869
Other languages: French (fr), Japanese (ja)
Inventor: 有紀江 海老山
Original Assignee: 日本電気株式会社
Application filed by 日本電気株式会社
Priority to JP2017501921A (patent JP6784254B2)
Publication of WO2016136214A1

Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06T: IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T 7/00: Image analysis
    • G06T 7/20: Analysis of motion

Definitions

  • The present disclosure relates to a system, a method, and a program recording medium for detecting a person or object staying in a monitoring area, and to an apparatus, a method, and a program recording medium for learning a discriminator that identifies such a staying person or object.
  • Techniques for detecting an object are known (for example, see Patent Documents 1 to 4). In video surveillance, for example, identifying an object left behind or a person who stays for a certain period of time or longer has been considered.
  • Patent Document 1 describes a method for detecting an object left behind from the scene of an image taken by a camera.
  • In the method described in Patent Document 1, motion in a scene is analyzed on a plurality of time scales: a long-term background model is generated based on the appearance frequency of pixel values in a plurality of images photographed over a long period, and this long-term background model is compared with a short-term background model generated from a plurality of images photographed over a shorter period.
  • If an image is generated from the images captured within a certain period using the pixels with high appearance frequency, the pixels of a moving object that quickly leaves the frame appear infrequently, while the pixels of a stationary object appear frequently; both background models therefore tend to capture the background and stationary objects.
  • Comparing the two models, in the long-term background model the observation time of the mainly stationary background is long relative to the observation time of a left-behind object that is stationary only briefly, so the background pixels become dominant.
  • In the short-term background model, on the other hand, the pixels of the briefly stationary left-behind object become dominant in addition to the background. The two models therefore differ in the appearance frequency of the pixel values belonging to the briefly stationary left-behind object.
  • In this way, in the analyzed scene, pixels belonging to the mainly stationary background and pixels belonging to an object left stationary for a short time are distinguished from each other.
  • Patent Document 2 describes an abandoned object detection device that detects an abandoned object based on captured images of a target area. The abandoned object detection device described in Patent Document 2 likewise analyzes motion in the scene on a plurality of time scales. Specifically, it distinguishes foreground regions from background regions based on pixel-value variation over the most recent frames, and compares the pixel values of the currently obtained background region with those of a background region obtained in the past.
  • In a region through which a moving object has passed, pixels of the moving object, the background, and stationary objects are mixed, so the pixel-value variation is large, whereas in background and stationary-object regions the variation is small; foreground and background regions are distinguished on this basis. Focusing on the background regions with small pixel-value variation and comparing the current background pixel values with past ones, a difference arises in the pixel values belonging to a stationary object between before and after the object appears.
  • In this way, in the analyzed scene, pixels belonging to the dynamic foreground, pixels belonging to the mainly stationary background, and pixels belonging to an object left stationary for a short time are distinguished from one another.
  • In general, then, methods have been proposed that compare the results of analyzing images on a plurality of time scales and judge that a staying object is present in the regions where a difference is found (difference-based methods).
  • Patent Document 1: Japanese Patent No. 5058010; Patent Document 2: Japanese Patent No. 4852355; Patent Document 3: JP 2010-176206 A; Patent Document 4: JP 2014-126942 A
  • However, the difference-based methods described in Patent Documents 1 and 2 are prone to false detection when the shooting environment changes. Difference-based methods compare pixel information obtained at a plurality of time scales to extract difference regions, so when the shooting environment changes between the compared time scales, false detections occur in the changed regions. Specific examples of changes in the shooting environment include differences in sunshine and lighting conditions due to the time of day or the weather, movement of objects, changes in posters and signage such as digital signage, contamination of the camera lens, and shifts in the camera's angle of view caused by wind, vibration, or contact.
  • An exemplary object of the present disclosure is to provide a technique capable of suitably detecting a staying object, as well as a technique for suitably identifying a staying object.
  • The discriminator learning device according to the present disclosure includes a learning unit that learns a discriminator for identifying staying objects, using a set of a plurality of images containing the same detection target as a positive example indicating a staying state and a set of a plurality of images not containing the same detection target as a negative example indicating a non-staying state.
  • The staying object detection system according to the present disclosure includes target image selection means for selecting, from a plurality of detection target images captured at different times, a plurality of detection target images captured with a time difference suitable for stay analysis; analysis image generation means for extracting images showing the same analysis region from the selected detection target images and generating an analysis image, which is the set of the extracted images; and staying object detection means for detecting staying objects from the generated analysis image using a discriminator that identifies staying objects from a plurality of images. The target image selection means determines the time difference suitable for stay analysis based on at least one of a movement model of the detection target and the size of the analysis region.
  • The discriminator learning method according to the present disclosure is a method in which a computer learns a discriminator for identifying staying objects, using a set of a plurality of images containing the same detection target as a positive example indicating a staying state and a set of a plurality of images not containing the same detection target as a negative example indicating a non-staying state.
  • The staying object detection method according to the present disclosure selects, from a plurality of detection target images captured at different times, a plurality of detection target images captured with a time difference suitable for stay analysis; extracts images showing the same analysis region from the selected detection target images to generate an analysis image, which is the set of the extracted images; and detects staying objects from the generated analysis image using a discriminator that identifies staying objects from a plurality of images. When the detection target images are selected, the time difference suitable for stay analysis is determined based on at least one of a movement model of the detection target and the size of the analysis region.
  • The discriminator learning program according to the present disclosure is applied to a computer and causes the computer to execute a learning process of learning a discriminator for identifying staying objects, using a set of a plurality of images containing the same detection target as a positive example indicating a staying state and a set of a plurality of images not containing the same detection target as a negative example indicating a non-staying state.
  • The staying object detection program according to the present disclosure causes a computer to execute: target image selection processing for selecting, from a plurality of detection target images captured at different times, a plurality of detection target images captured with a time difference suitable for stay analysis; analysis image generation processing for extracting images showing the same analysis region from the selected detection target images and generating an analysis image, which is the set of the extracted images; and staying object detection processing for detecting staying objects from the generated analysis image using a discriminator that identifies staying objects from a plurality of images. In the target image selection processing, the time difference suitable for stay analysis is determined based on at least one of a movement model of the detection target and the size of the analysis region.
  • According to the present disclosure, a staying object can be suitably detected.
  • FIG. 1 is a block diagram illustrating a configuration example of an embodiment of a staying object detection system according to the present disclosure.
  • FIG. 2 is a block diagram illustrating a configuration example of the analysis image acquisition unit.
  • FIG. 3 is an explanatory diagram illustrating an example of selecting an analysis region.
  • FIG. 4 is an explanatory diagram illustrating an example of a method for detecting a person who stays.
  • FIG. 5 is an explanatory diagram showing an example of another method for detecting a person who stays.
  • FIG. 6 is an explanatory diagram illustrating an operation example of the staying object detection system.
  • FIG. 7 is a flowchart illustrating an operation example of learning a classifier.
  • FIG. 8 is a block diagram illustrating an overview of a classifier learning device according to the present disclosure.
  • FIG. 9 is a block diagram illustrating an outline of a staying object detection system according to the present disclosure.
  • FIG. 10 is a block diagram illustrating a configuration example of a computer device according to the present disclosure.
  • FIG. 1 is a block diagram illustrating an embodiment of a staying object detection system according to the present disclosure.
  • the staying object detection system includes an image input unit 1, a stay detection unit 2, an output unit 3, and a classifier learning unit 4.
  • the image input unit 1 sequentially inputs time-series images obtained by photographing a predetermined monitoring area to the stay detection unit 2.
  • The input image provided by the image input unit 1 can be regarded as an image in which the detection target is photographed, and may therefore be referred to as a "detection target image" below.
  • As the image input unit 1, a photographing device such as a surveillance camera may be used.
  • the image input unit 1 may sequentially input time-series images obtained by reading video data stored in a storage device (not shown) to the stay detection unit 2.
  • the type of object to be detected in the present disclosure is not particularly limited, and may be a human, an animal, a car, a robot, or the like.
  • the stay detection unit 2 analyzes images sequentially input from the image input unit 1 and detects staying objects present in the image.
  • the stay detection unit 2 includes an analysis image acquisition unit 21, a stay identifier storage unit 22, a stay degree calculation unit 23, and a stay determination unit 24.
  • The analysis image acquisition means 21 holds the past several frames of images input from the image input unit 1, and acquires sets of local-region images subdivided based on the size of the detection target appearing in the input image.
  • These sets of local-region images are used to calculate the staying degree described later.
  • FIG. 2 is a block diagram illustrating a configuration example of the analysis image acquisition unit 21 of the present embodiment.
  • the analysis image acquisition unit 21 of the present embodiment includes an analysis region selection unit 211, an analysis time selection unit 212, and an analysis image selection unit 213.
  • the analysis area selection unit 211 selects a local area as a unit for analyzing the staying state from the input image.
  • the local region selected by the analysis region selection unit 211 is referred to as an analysis region.
  • the size of the analysis region is arbitrary, and may be determined based on the size of the detection target, for example.
  • For example, the analysis region selection means 211 may select analysis regions by moving a region of a predetermined size across the image at predetermined intervals.
  • The size and interval of the analysis regions may be determined by the administrator of the staying object detection system based on the apparent size of the detection target in the image, and the analysis region selection means 211 may select analysis regions using the size and interval values determined in this way.
  • Alternatively, the analysis region selection means 211 may calculate the apparent size of the detection target at each position on the image, using camera parameters, obtained in advance, that indicate the camera pose. The analysis region selection means 211 may then determine the size of the analysis region according to the calculated apparent size.
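  • For illustration only, the following Python sketch shows one way such an apparent-size calculation could look under a simple level pinhole-camera model. The disclosure does not specify the camera model, so the model, the function name, and the 1.7 m default target height are all assumptions made here.

```python
def apparent_height_px(foot_row_px, horizon_row_px, focal_px,
                       camera_height_m, target_height_m=1.7):
    """Apparent height in pixels of a target whose feet appear at image row
    foot_row_px, assuming a level pinhole camera at height camera_height_m.
    Illustrative assumption: the disclosure only says camera parameters
    indicating the camera pose are used; the exact model is not given."""
    dy = foot_row_px - horizon_row_px
    if dy <= 0:
        return 0.0  # at or above the horizon: not on the visible ground plane
    ground_distance_m = focal_px * camera_height_m / dy   # image row -> distance
    return focal_px * target_height_m / ground_distance_m  # size ~ 1 / distance
```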
  • The analysis region selection means 211 may continue to use the analysis regions selected when the staying object detection system was started, or may select analysis regions again, with newly changed positions or sizes, each time a new image is input. In either case, the analysis region selection means 211 selects the same region across a plurality of images as the analysis region.
  • FIG. 3 is an explanatory diagram illustrating an example in which the analysis region selection unit 211 selects an analysis region.
  • For example, the analysis region selection means 211 may select region R1 at times t1 and t2, and select region R2 at the time another image is input (time t11).
  • When images are compared, regions having the same coordinates are used; for example, when the image at time t11 is compared with the image at time t12, region R2 is used in both.
  • The analysis time selection means 212 selects, for each analysis region selected by the analysis region selection means 211, the images photographed at times suitable for stay analysis (that is, photographed with a time difference suitable for stay analysis) from among the past several frames input from the image input unit 1.
  • the analysis time selection unit 212 may calculate a time difference suitable for the analysis of the stay using, for example, a movement model of a detection target.
  • For example, suppose the detection target is a person and a range 0.6 m wide centered on the person is cut out as the local region (analysis region). If, under the assumed movement model, a moving person crosses this region in 0.5 seconds, the analysis time selection means 212 may select images taken at intervals of 0.5 seconds or more. This is because only a staying person is photographed in common at the same position, while a moving person passes through the analysis region and is therefore not photographed in common at the same position. The time difference suitable for stay analysis in this case is thus 0.5 seconds, and the analysis time selection means 212 may select images taken at intervals of 0.5 seconds or more from the images input from the image input unit 1.
  • More generally, the analysis time selection means 212 may calculate the time required for the detection target to pass through the analysis region based on the movement model of the detection target, and select input images taken at intervals equal to or longer than the calculated time, as sketched below. The size of the analysis region used in this calculation may be a fixed size defined in advance.
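  • As an illustrative sketch (the disclosure prescribes no implementation; the function names and the 1.2 m/s walking speed are assumptions made here), the time-difference calculation and the corresponding frame selection could look like:

```python
def min_time_gap_s(region_width_m, speed_m_per_s):
    # Time for the target to cross the analysis region. With the 0.6 m
    # region of the example and an assumed walking speed of 1.2 m/s
    # (our assumption; the text fixes only the 0.6 m / 0.5 s pair),
    # this yields the 0.5 s gap.
    return region_width_m / speed_m_per_s


def select_frames(frames, min_gap_s):
    """Keep frames so that consecutive kept frames are >= min_gap_s apart.
    frames: list of (timestamp_s, image) pairs, sorted by timestamp."""
    selected, last_t = [], None
    for t, image in frames:
        if last_t is None or t - last_t >= min_gap_s:
            selected.append((t, image))
            last_t = t
    return selected
```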
  • One example of the movement model is a model of the movement speed of the detection target. The movement model may also model movement speed together with movement direction, or may be any model from which the movement direction of the detection target and the movement speed assumed for that direction can be derived. The movement direction and movement speed of the detection target may be fixed values defined in advance.
  • the analysis time selection unit 212 may determine the time difference suitable for the stay analysis by using one or both of the movement model to be detected and the size of the analysis region.
  • The administrator of the staying object detection system may define the movement model of the detection target in advance so that its values can be used. Note that the apparent movement speed of the detection target in the image may change depending on the position of the detection target. In that case, the analysis time selection means 212 may calculate the apparent movement distance of the detection target between frame images for each position on the image, using the camera parameters, obtained in advance, that indicate the camera pose, and may then select only images in which a moving object would not appear in the same analysis region in the preceding and succeeding frames. The analysis time selection means 212 inputs the images selected for each analysis region to the analysis image selection means 213.
  • the analysis image selection unit 213 selects a combination of images used for calculating the staying degree from the images for each analysis region input from the analysis time selection unit 212.
  • the staying degree is an index indicating the probability that the detection target is staying.
  • the image selected by the analysis image selection unit 213 is referred to as an analysis image.
  • FIG. 4 is an explanatory diagram illustrating an example of a method for detecting a person who stays.
  • a method of detecting a person staying in a surveillance camera image taken on the street will be described.
  • FIG. 4 shows an example of detecting a staying person by paying attention to the upper body of the person.
  • Assume that the image input unit 1 sequentially inputs the images at times t1, t2, and t3 illustrated in FIG. 4, and that the analysis image acquisition means 21 holds the past two images. That is, when input images are obtained in the order of time t1, time t2, and time t3, the analysis image acquisition means 21 acquires a set of analysis images based on the images at times t1 and t2 at time t2, and acquires another set of analysis images based on the images at times t2 and t3 at time t3.
  • the analysis region selection means 211 selects the analysis region based on the size in the image of the person to be detected.
  • FIG. 4 shows an example in which three analysis regions, a predetermined region 1, region 2, and region 3, are set to simplify the description.
  • The analysis time selection means 212 determines, for each selected analysis region, whether a person could move across the analysis region within the shooting time interval of the input images. If such movement is possible, the analysis time selection means 212 treats the images taken at that time interval as analysis image candidates. In the example of FIG. 4, it is assumed that a person could move across every analysis region.
  • The analysis image selection means 213 then obtains the image of each analysis region from each input image. At time t2, for example, the pair of the image of region 1 captured at time t1 and the image of region 1 captured at time t2 forms one set of analysis images. In this way, the analysis image selection means 213 treats the images acquired from the same analysis region as one set of analysis images and inputs all of the obtained sets to the staying degree calculation means 23. In other words, the analysis image selection means 213 extracts images showing the same analysis region from the plurality of input images selected by the analysis time selection means 212 and generates an analysis image that is the set of the extracted images, as sketched below.
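  • A minimal sketch of this region extraction and set generation, assuming numpy-style images and hypothetical helper names (none of which come from the disclosure):

```python
def crop(image, region):
    """Cut one analysis region out of a frame; region = (x, y, w, h)."""
    x, y, w, h = region
    return image[y:y + h, x:x + w]


def analysis_image_sets(frames, regions, set_size=2):
    """For each analysis region, crop it from every selected frame and group
    consecutive crops into sets of analysis images (pairs by default).
    frames: list of (timestamp_s, image); regions: {region_id: (x, y, w, h)}."""
    sets = {}
    for region_id, region in regions.items():
        crops = [crop(image, region) for _, image in frames]
        sets[region_id] = [tuple(crops[i:i + set_size])
                           for i in range(len(crops) - set_size + 1)]
    return sets
```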
  • FIG. 4 shows an example in which three analysis regions are set, but the number of analysis regions is arbitrary, and analysis regions may be set so that they overlap on the image.
  • FIG. 4 also shows an example in which the upper body of the person to be detected is included in the analysis region, but the analysis region may be set to include any part of the detection target, or the whole detection target. Likewise, although the analysis regions in FIG. 4 are squares, the shape of an analysis region is not limited to a square and may be any rectangle.
  • The number of local images in one set is not limited to two; a set of analysis images may consist of any number of images greater than or equal to two.
  • The analysis image selection means 213 may also select a plurality of sets of analysis images.
  • FIG. 5 is an explanatory diagram showing an example of another method for detecting a person who stays. Note that the example shown in FIG. 5 is the same as the example shown in FIG. 4 except that the analysis image acquisition unit 21 holds the past three images.
  • In this case, the analysis image selection means 213 selects two images from the three images at times t1 to t3 to form each set of analysis images, so three sets of analysis images, (t1, t2), (t2, t3), and (t1, t3), are selected for each analysis region.
  • the analysis image selection unit 213 inputs the set of analysis images thus selected for each analysis region to the staying degree calculation unit 23.
  • In this example, the analysis image selection means 213 selects analysis images from all combinations of images. However, it does not necessarily need to use all combinations and may select analysis images by other methods.
  • As another way of selecting sets of analysis images, suppose that sets are generated by choosing two images from among five frames taken at times t1 to t5.
  • The analysis image selection means 213 may generate sets of analysis images from adjacent frames, such as (t1, t2), (t2, t3), ..., (t4, t5).
  • Alternatively, the analysis image selection means 213 may pair the latest frame with each of the past frames, forming sets such as (t5, t1), (t5, t2), ..., (t5, t4). These selection strategies are sketched below.
  • the analysis image selection means 213 of the present embodiment selects a plurality of sets of analysis images for calculating the staying degree with respect to the past several frames of images obtained from the image input unit 1.
  • the analysis image selection means 213 inputs the selected set of analysis images to the staying degree calculation means 23.
  • The staying discriminator storage unit 22 stores the discriminator that the staying degree calculation means 23, described later, uses to calculate the staying degree for the sets of analysis images input from the analysis image acquisition means 21. This discriminator is constructed in advance, before the staying object detection system performs the process of detecting staying objects.
  • the staying classifier storage unit 22 may store a classifier generated by the classifier learning unit 4 described later, or may store a classifier generated by an administrator or the like.
  • the discriminator learning unit 4 learns a discriminator that identifies a staying object from a plurality of images.
  • Here, identifying a staying object includes not only determining whether something is a staying object, but also calculating an index (the staying degree) indicating the probability that the detection target is staying, in order to make that determination.
  • For example, the discriminator learning unit 4 may generate a discriminator that outputs a staying degree as the determination result for a plurality of images. Specifically, the discriminator learning unit 4 may generate a discriminator that calculates a higher staying degree the more the same detection target is included in the plurality of input images.
  • the classifier learning unit 4 learns a classifier using positive and negative learning images. Specifically, the classifier learning unit 4 uses a set of a plurality of images including the same detection target as a positive example indicating the staying state. The classifier learning unit 4 uses a set of a plurality of images that do not include the same detection target as a negative example indicating a non-staying state.
  • The classifier learning unit 4 constructs, by machine learning, a classifier suited to discriminating between the positive and negative examples. Specifically, the classifier learning unit 4 learns a classifier that, when the same number of images as are included in a positive or negative example set is input, identifies a staying object from those images.
  • the learning image will be specifically described with an example in which the detection target is a person.
  • The positive example may be any set of images that include the same detection target.
  • The images of a positive example need not include the same detection target in exactly the same state.
  • For example, assuming the monitoring environment of the intended application, a positive example may be a set of images in which different people, such as passersby, appear in front of or behind the staying person, or a set of images in which the lighting conditions around the staying person have changed.
  • A learning image may also be given perturbation processing that reflects the influence of lighting direction, brightness, shadows, and the like on the detection target or the background of at least one of the images included in the positive example set (see the sketch below). Doing so makes it possible to maintain the accuracy of identifying staying objects even when the shooting environment changes.
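  • A minimal sketch of such perturbation processing, assuming numpy images. The concrete perturbations (a global brightness change and a darkened band standing in for a cast shadow) are illustrative choices made here, since the text only names lighting, brightness, and shadows as the effects to reflect:

```python
import numpy as np


def perturb(image, rng, max_gain=0.2, shadow_strength=0.3):
    """Photometric perturbation of one learning image: a random global
    brightness change plus, half the time, a darkened vertical band that
    stands in for a cast shadow."""
    out = image.astype(np.float32)
    out *= 1.0 + rng.uniform(-max_gain, max_gain)      # lighting / gain change
    if rng.random() < 0.5:                             # synthetic shadow band
        height, width = out.shape[:2]
        x0, x1 = sorted(rng.integers(0, width, size=2))
        out[:, x0:x1] *= 1.0 - shadow_strength
    return np.clip(out, 0.0, 255.0).astype(np.uint8)
```

It could be applied with, for example, rng = np.random.default_rng(0) and perturbed = perturb(image, rng).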
  • the positive example may be a set of a plurality of images including the same detection target and at least a part of the same background image.
  • The classifier learning unit 4 can determine staying images more appropriately by learning the classifier using, as positive examples, sets of a plurality of images that include at least part of the same background image together with the same detection target. The above-described perturbation processing may also be applied to the background image.
  • The negative example may be any set of images that do not include the same detection target. For example, sets of images of different persons, assuming passersby, or sets of background images such as the ground or buildings may be used as learning images. Perturbation processing may also be applied to negative examples, in the same manner as to positive examples; doing so makes it possible to suppress false detection even when lighting and shadows change with the shooting environment.
  • The discriminator learning unit 4 learns the discriminator using such positive and negative examples, collected in large numbers, as learning images. That is, the discriminator learning unit 4 may learn the discriminator using sets of images in which at least one image in a positive or negative example set has been given perturbation processing.
  • The target of the perturbation processing is arbitrary and may be, for example, the detection target or the background included in a positive or negative example.
  • A learning image may be an image cut out from a real image, an image obtained by compositing the background of one real image with the foreground (detection target) of another, or an image generated artificially by CG (Computer Graphics).
  • The discriminator learning unit 4 constructs a discriminator suited to discriminating between the positive and negative examples using the prepared learning images.
  • For example, the discriminator learning unit 4 may construct such a discriminator using a machine learning method such as a CNN (Convolutional Neural Network). A discriminator generated in this way yields, for an arbitrary input image, the probability of belonging to the positive or the negative class. A minimal sketch follows.
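  • The following sketch assumes PyTorch and stacks the N images of one set along the channel axis; the architecture and input size are illustrative assumptions, since the disclosure only names CNN as one possible method:

```python
import torch
import torch.nn as nn


class StayDiscriminator(nn.Module):
    """Minimal sketch of a CNN discriminator: the N images of one analysis
    set are stacked along the channel axis (N * 3 channels for RGB crops)
    and mapped to a staying probability."""

    def __init__(self, n_images=2, side=32):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(3 * n_images, 16, kernel_size=3, padding=1), nn.ReLU(),
            nn.MaxPool2d(2),
            nn.Conv2d(16, 32, kernel_size=3, padding=1), nn.ReLU(),
            nn.MaxPool2d(2),
            nn.Flatten(),
            nn.Linear(32 * (side // 4) ** 2, 1),
        )

    def forward(self, x):  # x: (batch, 3 * n_images, side, side)
        # Sigmoid output in [0, 1] doubles as the staying degree.
        return torch.sigmoid(self.net(x)).squeeze(1)
```

Such a network would typically be trained with binary cross-entropy, labeling positive example sets 1 and negative example sets 0, so that its output can serve directly as the staying degree described above.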
  • The learning method used by the discriminator learning unit 4 is not limited to CNN; any method can be used as long as it can construct a discriminator that outputs the probability of belonging to the positive or the negative class for an arbitrary input image.
  • A method of learning over a plurality of images with a CNN is also known. However, that method learns from images taken very close together at regular intervals, and differs from the approach of the discriminator learning unit 4 of the present embodiment, which uses images taken some time apart.
  • The staying degree calculation means 23 calculates a staying degree for each set of analysis images input from the analysis image acquisition means 21, using the discriminator stored in the staying discriminator storage unit 22. That is, a staying degree is calculated for each analysis region.
  • The staying degree calculation means 23 inputs pairs of analysis region coordinates and calculated staying degrees to the stay determination means 24.
  • When a plurality of sets of analysis images are input for one analysis region, the staying degree calculation means 23 calculates the staying degree for all sets of analysis images and integrates the calculated staying degrees for each analysis region.
  • FIG. 5 shows an example in which input images for the past three frames are held, and two of the local images are used for calculating the staying degree.
  • In this case, the analysis image selection means 213 selects the three sets of analysis images (t1, t2), (t2, t3), and (t1, t3) shown in FIG. 5, so the staying degree calculation means 23 calculates three staying degrees for these three sets. The staying degree calculation means 23 then integrates the calculated staying degrees by computing, for example, the average, median, maximum, or minimum of the three values, and may treat that value as the integration result.
  • The stay determination means 24 performs stay determination using the pairs of analysis region coordinates and calculated staying degrees input from the staying degree calculation means 23, and outputs stay occurrence coordinates for the input image. In other words, the staying degree calculation means 23 and the stay determination means 24 together execute the staying object detection process of detecting staying objects from the generated sets of analysis images.
  • For example, the stay determination means 24 may compare a preset threshold with the staying degree and determine that staying has occurred in any analysis region whose staying degree is equal to or greater than the threshold.
  • When analysis regions overlap, the stay determination means 24 may perform the stay determination on the overlapping areas and judge that staying has occurred if any of the average, median, maximum, or minimum of the staying degrees calculated for the overlapping analysis regions is equal to or greater than a predetermined threshold.
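  • A minimal sketch of this integration and threshold determination; the 0.8 threshold and the function names are illustrative assumptions, not values from the disclosure:

```python
import statistics

AGGREGATE = {"mean": statistics.fmean, "median": statistics.median,
             "max": max, "min": min}


def integrate(degrees, how="mean"):
    """Integrate the staying degrees of all analysis-image sets of one
    analysis region (average / median / maximum / minimum, as in the text)."""
    return AGGREGATE[how](degrees)


def is_staying(degrees_per_region, threshold=0.8, how="mean"):
    """Stay determination for one location covered by possibly overlapping
    analysis regions: staying if any region's integrated degree reaches the
    threshold."""
    return any(integrate(d, how) >= threshold for d in degrees_per_region)
```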
  • Further, the staying degree calculation means 23 may specify in advance a background image that contains no staying detection target and calculate the staying degree for the specified background image portion.
  • Using this, the stay determination means 24 may apply a correction that lowers the staying degree in areas where a high staying degree is easily calculated against the background (areas prone to false detection).
  • Alternatively, the stay determination means 24 may calculate a reliability based on the staying degree obtained for the background image that contains no staying detection target: areas where a high staying degree is easily calculated against the background (areas prone to false detection) receive low reliability, and areas where the staying degree against the background is low receive high reliability. In this way, the reliability is calculated from the staying degree.
  • In this case, the stay determination means 24 outputs the calculated reliability to the output unit 3 together with the staying degree for each region.
  • the stay determination means 24 may use coordinates on the screen as the stay occurrence coordinates to be output, or may use coordinates converted to real world coordinates.
  • the output unit 3 outputs the stay occurrence coordinates input from the stay detection unit 2.
  • One output mode of the output unit 3 is, for example, display.
  • the output unit 3 may include a display device (not shown) and display on the display device.
  • the output mode of the output unit 3 is not limited to display, and may be other modes.
  • The analysis image acquisition means 21 (more specifically, the analysis region selection means 211, the analysis time selection means 212, and the analysis image selection means 213), the staying degree calculation means 23, and the stay determination means 24 in the stay detection unit 2 are realized by a CPU (Central Processing Unit) of a computer operating according to a program (a staying object detection program).
  • the program may be stored in a storage unit (not shown) included in the staying object detection system.
  • the CPU reads the program, and according to the program, the analysis image acquisition means 21 (more specifically, the analysis region selection means 211, the analysis time selection means 212, and the analysis image selection means 213), the staying degree calculation means 23, and The residence determination unit 24 may operate.
  • the analysis image acquisition means 21 (more specifically, the analysis region selection means 211, the analysis time selection means 212, and the analysis image selection means 213), the staying degree calculation means 23, and the stay determination means 24 are: Each may be realized by dedicated hardware.
  • the classifier learning unit 4 is realized by a CPU of a computer that operates according to a program (a classifier learning program).
  • the classifier learning unit 4 may also be realized by dedicated hardware.
  • FIG. 6 is an explanatory diagram illustrating an operation example of the staying object detection system according to the present embodiment. Note that the order of the processing steps described below may be arbitrarily changed within a range in which there is no contradiction in processing content, or may be executed in parallel. Further, other steps may be added between the processing steps. Further, a step described as one step for convenience can be divided into a plurality of steps, and a step described as divided for convenience can be executed as one step.
  • The analysis image acquisition means 21 acquires a captured image and its capture time from the image input unit 1 (step S1). Next, the analysis image acquisition means 21 updates the image history by discarding the image with the oldest capture time among the stored past several frames and newly storing the latest input image acquired in step S1 (step S2).
  • the analysis area selection unit 211 selects a plurality of analysis areas from the image (step S3).
  • If, among the plurality of analysis regions selected in step S3, there is an unprocessed analysis region for which the staying degree has not yet been calculated (yes in step S4), the analysis image acquisition means 21 (specifically, the analysis region selection means 211) selects one unprocessed analysis region (step S5).
  • the analysis time selection unit 212 calculates the photographing time interval of each image from the image history updated in step S2 based on the movement model of the detection target defined in advance in the analysis region selected in step S5. Then, the analysis time selection unit 212 determines whether or not the detection target is movable on the target analysis region at the time interval, and selects an image that is determined to be movable (step S6).
  • the analysis image selection means 213 selects a combination of analysis images to be used for calculating the staying degree from the images of the respective history selected in step S6 (step S7).
  • If there is an unprocessed set of analysis images (yes in step S8), the staying degree calculation means 23 selects one set (step S9) and calculates the staying degree for the set of analysis images selected in step S9, using the discriminator held in the staying discriminator storage unit 22 (step S10).
  • After step S10, the staying degree calculation means 23 repeats the processing from step S8. If it is determined in step S8 that there is no unprocessed set of analysis images (no in step S8), the staying degree calculation means 23 integrates the plurality of staying degrees calculated for one analysis region into a single value (step S11), calculating, for example, the average, median, maximum, or minimum of the staying degrees as the integrated value.
  • After step S11, the analysis image acquisition means 21 repeats the processing from step S4. If it is determined in step S4 that there is no unprocessed analysis region (no in step S4), the stay determination means 24 performs stay determination using the staying degree calculated for each analysis region (step S12), for example determining that staying has occurred if the staying degree calculated for an analysis region is equal to or greater than a predetermined threshold.
  • When analysis regions overlap, the stay determination means 24 performs the stay determination on the overlapping region and may determine that an object is staying if, for example, any of the average, median, maximum, or minimum of the staying degrees calculated for the overlapping analysis regions is equal to or greater than a predetermined threshold.
  • the output unit 3 outputs the stay detection result output from the stay determination unit 24 (step S13).
  • The output unit 3 may output the stay detection result to an application, or to an external module such as a storage medium. The sketch below ties the flow of steps S1 to S13 together.
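  • A minimal end-to-end sketch of the flow of FIG. 6 (steps S1 to S13). It reuses the hypothetical helpers sketched earlier (select_frames, analysis_image_sets, integrate) and treats the discriminator as a callable mapping one set of analysis images to a staying degree in [0, 1]; none of these names come from the disclosure:

```python
def detect_staying(image_stream, regions, staying_degree, history_len=3,
                   min_gap_s=0.5, threshold=0.8):
    """Yield (timestamp, list of staying region ids) per input frame."""
    history = []
    for t, image in image_stream:                           # S1: acquire image
        history = (history + [(t, image)])[-history_len:]   # S2: update history
        hits = []
        for region_id, region in regions.items():           # S3-S5: per region
            frames = select_frames(history, min_gap_s)      # S6: time selection
            sets = analysis_image_sets(frames, {region_id: region},
                                       set_size=2)[region_id]  # S7: image sets
            if not sets:
                continue
            degrees = [staying_degree(s) for s in sets]     # S8-S10: per set
            if integrate(degrees) >= threshold:             # S11-S12: decide
                hits.append(region_id)
        yield t, hits                                       # S13: output
```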
  • FIG. 7 is a flowchart showing an operation example of the classifier learning unit 4 of the present embodiment.
  • The discriminator learning unit 4 reads positive and negative example learning images stored in a storage unit (not shown) (step S21). Specifically, the discriminator learning unit 4 reads sets of a plurality of images containing the same detection target as positive examples indicating the staying state, and sets of a plurality of images not containing the same detection target as negative examples indicating the non-staying state.
  • From these positive and negative example learning images, the discriminator learning unit 4 learns a discriminator that identifies staying objects from the same number of input images as are included in a positive or negative example set (step S22).
  • As described above, in the present embodiment, the analysis time selection means 212 selects, from a plurality of input images taken at different times, input images taken with a time difference suitable for stay analysis, and the analysis image selection means 213 extracts images showing the same analysis region from the selected input images and generates sets of analysis images.
  • The staying degree calculation means 23 and the stay determination means 24 detect staying objects from the sets of analysis images using the above-described discriminator. It is therefore possible to detect staying objects stably, without being affected by the increase in false detections caused by changes in the shooting environment, such as illumination changes in the monitoring area, contamination of the surveillance camera lens, and movement of objects.
  • Also, in the present embodiment, the discriminator learning unit 4 learns a discriminator that identifies staying objects, using sets of a plurality of images containing the same detection target as positive examples indicating a staying state and sets of a plurality of images not containing the same detection target as negative examples indicating a non-staying state. By using this discriminator, staying objects can be suitably detected.
  • FIG. 8 is a block diagram illustrating an overview of a classifier learning device according to the present disclosure.
  • the classifier learning device 90 includes a learning unit 91 (for example, the classifier learning unit 4) that learns a classifier that identifies a staying object.
  • The learning unit 91 learns a discriminator that identifies staying objects, using a set of a plurality of images containing the same detection target as a positive example indicating a staying state and a set of a plurality of images not containing the same detection target as a negative example indicating a non-staying state.
  • With such a configuration, a staying object can be suitably detected.
  • the learning unit 91 may learn a discriminator that identifies a staying object from the same number of detection target images as the number of images included in the positive or negative example set.
  • The learning unit 91 may learn the discriminator using, as positive examples, sets of a plurality of images that include at least part of the same background image together with the same detection target. When a plurality of images covering the same analysis region are compared, the same background image is likely to appear in the compared regions, so a discriminator learned with such positive examples can determine staying images more appropriately.
  • The learning unit 91 may also learn the discriminator using sets of images in which the detection target of at least one image in a positive or negative example set has been given perturbation processing (for example, processing reflecting the influence of lighting direction, brightness, or shadows on the detection target). Using such images as positive or negative examples makes it possible to learn a discriminator that maintains its accuracy in identifying staying objects even when the shooting environment changes.
  • FIG. 9 is a block diagram illustrating an outline of the stagnant object detection system according to the present disclosure.
  • a staying object detection system 80 includes target image selection means 81, analysis image generation means 82, and staying object detection means 83.
  • the target image selection unit 81 (for example, the analysis time selection unit 212) selects a plurality of detection target images that are captured with a time difference suitable for stay analysis from a plurality of detection target images that are captured at different times. .
  • The analysis image generation means 82 (for example, the analysis image selection means 213) extracts images showing the same analysis region from the plurality of detection target images selected by the target image selection means 81 and generates an analysis image, which is the set of the extracted images.
  • the staying object detection means 83 (for example, staying degree calculation means 23, stay determination means 24) uses a discriminator for identifying staying objects from a plurality of images, and uses a set of analysis images generated by the analysis image generation means 82. Detect stagnant objects.
  • The target image selection means 81 determines the time difference suitable for stay analysis based on at least one of a movement model of the detection target (for example, a model of the detection target's movement speed and direction) and the size of the analysis region.
  • The staying object detection means 83 may detect staying objects from the generated sets of analysis images using a discriminator that calculates a higher staying degree, the index of the probability that the detection target is staying, the more the same detection target is included in the plurality of input images.
  • the analysis image generation means 82 may generate a plurality of sets of analysis images in the same analysis region.
  • The staying object detection means 83 may acquire the staying degree calculated by the discriminator for each of the plurality of generated analysis images, and calculate at least one of the average, median, maximum, or minimum of the obtained staying degrees as the staying degree of the analysis region.
  • the staying object detection means 83 may detect a staying object based on the calculated staying degree.
  • The staying object detection means 83 may specify the background image portion from the detection target image and correct the staying degree of the region corresponding to the specified background image portion downward. Such a configuration improves detection accuracy in regions where a high staying degree is easily calculated against the background (regions prone to false detection).
  • the target image selection unit 81 calculates a time required for the detection target to pass through the analysis region based on the movement model of the detection target, and detects the detection target image captured at an interval equal to or longer than the calculated time. You may choose.
  • FIG. 10 is a block diagram illustrating a hardware configuration of the computer device 200 that implements the discriminator learning device 90 or the staying object detection system 80.
  • the computer device 200 includes a CPU 201, a ROM (Read Only Memory) 202, a RAM (Random Access Memory) 203, a storage device 204, a drive device 205, a communication interface 206, and an input / output interface 207.
  • The discriminator learning device 90 or the staying object detection system 80 can be realized by the configuration (or a part thereof) illustrated in FIG. 10.
  • the CPU 201 executes the program 208 using the RAM 203.
  • the program 208 may be stored in the ROM 202.
  • the program 208 may be recorded on a recording medium 209 such as a flash memory and read by the drive device 205 or transmitted from an external device via the network 210.
  • the communication interface 206 exchanges data with an external device via the network 210.
  • the input / output interface 207 exchanges data with peripheral devices (such as an input device and a display device).
  • the communication interface 206 and the input / output interface 207 can function as means for acquiring or outputting data.
  • the classifier learning device 90 or the staying object detection system 80 may be configured by a single circuit (such as a processor) or may be configured by a combination of a plurality of circuits.
  • the circuit here may be either dedicated or general purpose.
  • the present disclosure can be suitably applied to a system that detects an object such as a person or an abandoned object staying in the monitoring area.
  • In the present disclosure, the characteristics of staying images, the specific detection target, are learned. Therefore, even in outdoor environments where difference-based methods were difficult to apply, the present disclosure can suitably be applied to detect only the intended staying objects, without being affected by the increase in false detections caused by illumination fluctuations, lens contamination, movement of objects, and the like.
  • Furthermore, compared with difference-based stay detection methods, the present disclosure does not need to generate a background image in advance. This makes it easy to introduce the staying object detection system in environments where detection targets are constantly coming and going and acquiring or generating a background image is difficult.

Abstract

[Problem] To provide a remaining object detection system, remaining object detection method, and remaining object detection program capable of preferably detecting a remaining object, and an identifier learning device, identifier learning method, and identifier learning program for learning an identifier for preferably identifying a remaining object. [Solution] In an identifier learning device that learns an identifier for identifying a remaining object, a learning unit 91 of the identifier learning device learns the identifier for identifying the remaining object with a set of a plurality of images including the same detection target as a positive example that indicates a remaining state and with a set of a plurality of images not including the same detection target as a negative example that indicates a non-remaining state.

Description

識別器学習装置、滞留物体検出システム、識別器学習方法、滞留物体検出方法およびプログラム記録媒体Discriminator learning apparatus, stagnant object detection system, discriminator learning method, stagnant object detection method, and program recording medium
 本開示は、監視領域内に滞留している人や物を検出するためのシステム、方法およびプログラム記録媒体、並びに、滞留している人や物を識別する識別器を学習するための装置、方法およびプログラム記録媒体に関する。 The present disclosure relates to a system, a method and a program recording medium for detecting a person or an object staying in a monitoring area, and an apparatus and a method for learning an identifier for identifying the person or an object staying. And a program recording medium.
 物体を検出する技術が知られている(例えば、特許文献1~4参照)。また、例えばビデオ監視などにおいて、置き去りにされた物体や一定時間以上滞留する人物を特定することが考えられている。 A technique for detecting an object is known (for example, see Patent Documents 1 to 4). For example, in video surveillance, it is considered to specify an object left behind or a person who stays for a certain period of time.
 特許文献1には、カメラで撮影された画像のシーンから置き去りにされた物体を検出する方法が記載されている。特許文献1に記載された方法では、シーン中の動きを複数の時間スケールで解析し、長期間にわたって撮影された複数の撮影画像を用いて、画素値の出現頻度に基づいて長期背景モデルを生成する。そして、この長期背景モデルと、それよりも短い期間にわたって撮影された複数の撮影画像を用いて生成された短期背景モデルが比較される。 Patent Document 1 describes a method for detecting an object left behind from a scene of an image taken by a camera. In the method described in Patent Document 1, a motion in a scene is analyzed on a plurality of time scales, and a long-term background model is generated based on the frequency of appearance of pixel values using a plurality of photographed images photographed over a long period of time. To do. Then, this long-term background model is compared with a short-term background model generated using a plurality of photographed images photographed over a shorter period.
 このとき、一定期間内の撮影画像から出現頻度が高い画素を用いて画像が生成されれば、例えばすぐにフレームアウトするような移動物体の画素の出現頻度は低く、静止物体の画素の出現頻度は高くなる。そのため、長期背景モデルおよび短期背景モデルでは、背景および静止物体が抽出されやすくなる。 At this time, if an image is generated from a captured image within a certain period using pixels with a high appearance frequency, for example, the appearance frequency of a pixel of a moving object that is immediately out of frame is low, and the appearance frequency of a pixel of a stationary object Becomes higher. Therefore, in the long-term background model and the short-term background model, the background and the stationary object are easily extracted.
 そして、長期背景モデルと短期背景モデルとを比較すると、長期背景モデルでは、短い時間静止している置き去り物体の観測時間に比べて主に静止している背景の観測時間が長いため、背景画素が支配的になる。一方で、短期背景モデルでは、背景に加え、短い時間に亘って静止している置き去り物体の画素も支配的になる。そのため、長期背景モデルと短期背景モデルとでは、短い時間に亘って静止している置き去り物体に属する画素値の出現頻度に差分が生じる。 And comparing the long-term background model with the short-term background model, the long-term background model has a long background observation time compared to the observation time of the left object that is stationary for a short time. Become dominant. On the other hand, in the short-term background model, in addition to the background, the pixel of the left object that remains stationary for a short time becomes dominant. Therefore, the long-term background model and the short-term background model have a difference in the appearance frequency of the pixel values belonging to the left object that is stationary for a short time.
 これにより、解析シーンにおいて、主に静止している背景部分と、ある短い時間に亘って静止している置き去り物体とに属する画素がそれぞれ区別される。 Thus, in the analysis scene, the pixels belonging to the background portion that is mainly stationary and the left object that is stationary for a short time are distinguished from each other.
 また、特許文献2には、対象領域の撮影画像に基づいて放置物を検出する放置物検出装置が記載されている。特許文献2に記載された放置物検出装置も、同様に、シーン中の動きを複数の時間スケールで解析している。具体的には、特許文献2に記載された放置物検出装置は、直近の複数フレームの撮影画像を用いて、画素値のばらつきに基づいて前景領域と背景領域を区別し、現在得られた背景領域と過去に得られた背景領域における画素値を比較する。 Patent Document 2 describes an abandoned object detection device that detects an abandoned object based on a captured image of a target area. Similarly, the abandoned object detection device described in Patent Document 2 also analyzes the movement in the scene on a plurality of time scales. Specifically, the abandoned object detection device described in Patent Document 2 distinguishes a foreground region and a background region based on pixel value variations using the latest captured images of a plurality of frames, and obtains a currently obtained background. The pixel values in the region and the background region obtained in the past are compared.
 このとき、移動体が通過した領域では、移動体および背景や静止物体の画素が混在するため画素値のばらつきが大きくなり、背景や静止物体の領域では画素値のばらつきが小さくなることから、前景領域と背景領域とが区別される。そして、画素値のばらつきが小さい背景領域に注目し、現在の背景領域の画素値と過去の背景領域の画素値とを比較することで、静止物体が出現する前後では静止物体に属する画素値に差分が生まれる。 At this time, in the area through which the moving object has passed, pixels of the moving object and the background and the stationary object are mixed, so that the pixel value variation increases, and in the background and the stationary object region, the pixel value variation decreases. A region and a background region are distinguished. Then, paying attention to the background area where the variation of the pixel value is small, and comparing the pixel value of the current background area with the pixel value of the past background area, the pixel value belonging to the stationary object is obtained before and after the stationary object appears. A difference is born.
 これにより、解析シーンにおいて、動的な前景部分に属する画素と、主に静止している背景に属する画素と、ある短い時間に亘って静止している置き去り物体に属する画素とをそれぞれ区別している。 Thereby, in the analysis scene, a pixel belonging to a dynamic foreground part, a pixel belonging to a stationary background, and a pixel belonging to a left object that has been stationary for a short time are distinguished from each other. .
 このように、一般的には、ある複数の時間スケールで画像を解析した結果を比較し、差分が得られた領域に滞留物体が存在すると判断する手法(差分ベースの手法)が提案されている。 In this way, generally, a method (difference-based method) for comparing the results of analyzing images on a plurality of time scales and determining that a stagnant object is present in the region where the difference is obtained has been proposed. .
特許第5058010号公報Japanese Patent No. 5058010 特許第4852355号公報Japanese Patent No. 4852355 特開2010-176206号公報JP 2010-176206 A 特開2014-126942号公報JP 2014-126942 A
However, the difference-based methods described in Patent Documents 1 and 2 are prone to false detections when the shooting environment changes. In a difference-based method, pixel information obtained on a plurality of time scales is compared and a difference region is extracted. Consequently, if the shooting environment changes between the compared time scales, false detections occur in the changed region.
Specific examples of changes in the shooting environment include differences in sunlight and lighting conditions depending on the time of day or the weather, movement of objects, changes in posted material such as posters and digital signage, dirt on the camera lens, and shifts in the camera's angle of view caused by wind, vibration, or contact.
An exemplary object of the present disclosure is to provide a technique for suitably detecting a staying object, as well as a technique for suitably identifying a staying object.
A classifier learning device according to the present disclosure includes a learning unit that learns a classifier for identifying a staying object, using sets of images that contain the same detection target as positive examples indicating a staying state and sets of images that do not contain the same detection target as negative examples indicating a non-staying state.
A staying object detection system according to the present disclosure includes: target image selection means for selecting, from a plurality of detection target images captured at different times, a plurality of detection target images captured with a time difference suitable for stay analysis; analysis image generation means for extracting an image showing the same analysis region from each of the selected detection target images and generating an analysis image set from the extracted images; and staying object detection means for detecting a staying object from the generated analysis image set using a classifier that identifies a staying object from a plurality of images. The target image selection means determines the time difference suitable for stay analysis based on at least one of a movement model of the detection target and the size of the analysis region.
A classifier learning method according to the present disclosure is a method for learning a classifier that identifies a staying object, in which a computer learns the classifier using sets of images containing the same detection target as positive examples indicating a staying state and sets of images not containing the same detection target as negative examples indicating a non-staying state.
A staying object detection method according to the present disclosure selects, from a plurality of detection target images captured at different times, a plurality of detection target images captured with a time difference suitable for stay analysis; extracts an image showing the same analysis region from each of the selected images to generate an analysis image set; and detects a staying object from the generated set using a classifier that identifies a staying object from a plurality of images. When selecting the detection target images, the time difference suitable for stay analysis is determined based on at least one of a movement model of the detection target and the size of the analysis region.
A classifier learning program according to the present disclosure is applied to a computer that learns a classifier for identifying a staying object, and causes the computer to execute a learning process of learning the classifier using sets of images containing the same detection target as positive examples indicating a staying state and sets of images not containing the same detection target as negative examples indicating a non-staying state.
A staying object detection program according to the present disclosure causes a computer to execute: a target image selection process of selecting, from a plurality of detection target images captured at different times, a plurality of detection target images captured with a time difference suitable for stay analysis; an analysis image generation process of extracting an image showing the same analysis region from each of the selected images and generating an analysis image set; and a staying object detection process of detecting a staying object from the generated set using a classifier that identifies a staying object from a plurality of images. In the target image selection process, the time difference suitable for stay analysis is determined based on at least one of a movement model of the detection target and the size of the analysis region.
According to the present disclosure, a staying object can be suitably detected.
FIG. 1 is a block diagram showing a configuration example of an embodiment of a staying object detection system according to the present disclosure.
FIG. 2 is a block diagram showing a configuration example of the analysis image acquisition means.
FIG. 3 is an explanatory diagram showing an example of selecting analysis regions.
FIG. 4 is an explanatory diagram showing an example of a method for detecting a staying person.
FIG. 5 is an explanatory diagram showing an example of another method for detecting a staying person.
FIG. 6 is an explanatory diagram showing an operation example of the staying object detection system.
FIG. 7 is a flowchart showing an operation example of learning a classifier.
FIG. 8 is a block diagram showing an overview of a classifier learning device according to the present disclosure.
FIG. 9 is a block diagram showing an overview of a staying object detection system according to the present disclosure.
FIG. 10 is a block diagram showing a configuration example of a computer device according to the present disclosure.
Embodiments of the present disclosure will be described below with reference to the drawings. In the present disclosure, the terms "unit", "means", "device", and "system" do not simply mean physical means or devices; they also cover cases where the functions of a "unit", "means", "device", or "system" are realized by software. Furthermore, the functions of one "unit", "means", "device", or "system" may be realized by two or more physical means or devices, and the functions of two or more "units", "means", "devices", or "systems" may be realized by a single physical means or device.
FIG. 1 is a block diagram showing an embodiment of a staying object detection system according to the present disclosure. As shown in FIG. 1, the staying object detection system of this embodiment includes an image input unit 1, a stay detection unit 2, an output unit 3, and a classifier learning unit 4.
The image input unit 1 sequentially inputs time-series images of a predetermined monitoring area to the stay detection unit 2. Since an input image from the image input unit 1 can be regarded as an image in which a detection target has been captured, it may be referred to below as a "detection target image". A capture device such as a surveillance camera may be used to acquire the images. The image input unit 1 may also sequentially input time-series images obtained by reading video data stored in a storage device (not shown) to the stay detection unit 2.
The type of object to be detected in the present disclosure is not particularly limited and may be, for example, a person, an animal, a car, or a robot.
The stay detection unit 2 analyzes the images sequentially input from the image input unit 1 and detects staying objects present in them. The stay detection unit 2 includes analysis image acquisition means 21, a stay classifier storage unit 22, staying degree calculation means 23, and stay determination means 24.
The analysis image acquisition means 21 holds the images input from the image input unit 1 for the past several frames, and acquires sets of local-region images subdivided based on the size of the detection target appearing in the input images. These sets of local-region images are used for calculating the staying degree, described later.
FIG. 2 is a block diagram showing a configuration example of the analysis image acquisition means 21 of this embodiment. The analysis image acquisition means 21 includes analysis region selection means 211, analysis time selection means 212, and analysis image selection means 213.
The analysis region selection means 211 selects, from an input image, a local region that serves as the unit for analyzing the staying state. Hereinafter, a local region selected by the analysis region selection means 211 is called an analysis region. The size of an analysis region is arbitrary and may be determined, for example, based on the size of the detection target.
The analysis region selection means 211 may, for example, select analysis regions by sliding a region of a predetermined size across the image at predetermined intervals. The size and spacing of the analysis regions may be determined by the administrator of the staying object detection system based on the apparent size of the detection target in the image, and the analysis region selection means 211 may select analysis regions using these values.
When the apparent size of the detection target changes depending on its position, the analysis region selection means 211 may calculate the apparent size of the detection target for each position on the image using camera parameters, obtained in advance, that represent the camera's pose, and may determine the size of the analysis region according to the calculated apparent size.
The analysis region selection means 211 may keep using the analysis regions first selected when the staying object detection system starts, or it may reselect analysis regions with different positions and sizes each time a new image is input. That is, the analysis region selection means 211 may use a newly selected analysis region to select the same region of a plurality of images as the analysis region.
FIG. 3 is an explanatory diagram showing an example in which the analysis region selection means 211 selects analysis regions. In the example shown in FIG. 3, different analysis regions are selected for images at different times. For example, the analysis region selection means 211 may select region R1 at times t1 and t2, and select region R2 when another image is input (time t11). When images are compared, however, regions at the same coordinates are used; for example, when comparing the image at time t11 with the image at time t12, region R2 is used in both.
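As an illustrative, non-limiting sketch, grid-based selection of analysis regions can be written as follows. The window size and stride are assumed values; as described above, they may instead be derived from the apparent size of the detection target or from camera parameters.

```python
# A minimal sketch of grid-based analysis-region selection (an assumed
# implementation, not the patent's).  The window size and stride are
# illustrative values only.

def select_analysis_regions(img_w: int, img_h: int,
                            win: int = 64, stride: int = 32):
    """Yield (x, y, w, h) analysis regions tiled over the image."""
    for y in range(0, img_h - win + 1, stride):
        for x in range(0, img_w - win + 1, stride):
            yield (x, y, win, win)
```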
For each analysis region selected by the analysis region selection means 211, the analysis time selection means 212 selects, from the past several frames of images input from the image input unit 1, images captured with a time difference suitable for stay analysis.
The analysis time selection means 212 may calculate this time difference from, for example, a movement model of the detection target. As a concrete example, suppose the detection target is a person and a range 0.6 m wide centered on the person is cut out as the local region (analysis region). Assume a typical walking speed of 1.2 m/s and use this as the movement model of the detection target. In this case, the analysis time selection means 212 should select images captured at intervals of 0.5 s or more: only a staying person is captured at the same position in both images, whereas a moving person passes through the analysis region and does not appear at the same position in both. The time difference suitable for stay analysis in this case is therefore 0.5 s, and the analysis time selection means 212 selects, from the images input from the image input unit 1, images captured at intervals of 0.5 s or more.
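The worked example above can be expressed compactly in code. The following sketch assumes the 0.6 m region width and 1.2 m/s walking speed given in the text; the function names and the frame-selection loop are illustrative assumptions.

```python
# Minimal sketch of the time-difference calculation in the example above
# (0.6 m analysis region, 1.2 m/s walking speed).  Names are illustrative.

def min_time_difference(region_width_m: float = 0.6,
                        target_speed_mps: float = 1.2) -> float:
    """Smallest capture interval (s) after which a moving target is
    guaranteed to have left the analysis region, so that only a staying
    target can appear at the same position in both images."""
    return region_width_m / target_speed_mps

def select_frames(frames, min_gap_s: float):
    """From (timestamp, image) pairs sorted by time, keep frames at
    least min_gap_s apart from the previously kept frame."""
    kept = []
    for ts, img in frames:
        if not kept or ts - kept[-1][0] >= min_gap_s:
            kept.append((ts, img))
    return kept

print(min_time_difference())  # 0.5 s, matching the example in the text
```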
In this way, the analysis time selection means 212 may calculate, based on the movement model of the detection target, the time required for the detection target to pass through the analysis region, and select input images captured at intervals equal to or longer than the calculated time. The size of the analysis region may be a fixed size defined in advance.
In the example above, the movement model models the moving speed of the detection target. However, the movement model may model both moving speed and moving direction; specifically, it may be a model from which the moving direction of the detection target and the moving speed assumed for that direction can be derived. Alternatively, instead of using such a movement model, the moving direction and moving speed of the detection target may be fixed values defined in advance. Thus, the analysis time selection means 212 may determine the time difference suitable for stay analysis using either or both of the movement model of the detection target and the size of the analysis region.
The administrator of the staying object detection system may determine the movement model of the detection target in advance, and its values may be used. The apparent moving speed of the detection target in the image may also change depending on the target's position. In that case, the analysis time selection means 212 may calculate the apparent movement distance of the detection target between frame images for each position on the image, using camera parameters obtained in advance that represent the camera's pose, and may select only images in which a moving object is not contained in the same analysis region in consecutive frames. The analysis time selection means 212 inputs the images for each selected analysis region to the analysis image selection means 213.
The analysis image selection means 213 selects, from the images for each analysis region input from the analysis time selection means 212, the combinations of images used for calculating the staying degree. Here, the staying degree is an index indicating the likelihood that the detection target is staying. Hereinafter, images selected by the analysis image selection means 213 are called analysis images.
A method for acquiring analysis images will now be described concretely. FIG. 4 is an explanatory diagram showing an example of a method for detecting a staying person. With reference to FIG. 4, a method of detecting a staying person from surveillance camera footage captured on a street is described below.
FIG. 4 shows an example of detecting a staying person by focusing on the person's upper body. In this example, the image input unit 1 sequentially inputs the images at times t1, t2, and t3 illustrated in FIG. 4, and the analysis image acquisition means 21 holds the past two images. That is, when input images are obtained in the order of times t1, t2, and t3, the analysis image acquisition means 21 acquires one set of analysis images from the images at times t1 and t2 at time t2, and acquires a further set from the images at times t2 and t3 at time t3.
At this time, the analysis region selection means 211 selects analysis regions based on the size of the target person in the image. For simplicity, FIG. 4 shows an example in which three predetermined analysis regions, region 1, region 2, and region 3, are set.
These analysis regions are set at the same coordinates (that is, as the same analysis regions) in each of the input images captured at different times. For each selected analysis region, the analysis time selection means 212 then determines whether a person can move across the analysis region within the shooting interval of the input images. If so, the analysis time selection means 212 takes the images captured at that interval as analysis image candidates. In the example of FIG. 4, a person is assumed to be able to move across all of the analysis regions.
The analysis image selection means 213 then acquires the image of each analysis region from each input image. That is, at time t2, the pair consisting of the region 1 image captured at time t1 and the region 1 image captured at time t2 forms one set of analysis images. In this way, the analysis image selection means 213 treats a set of images acquired from the same analysis region as one analysis image set, and inputs all of the obtained sets to the staying degree calculation means 23. In other words, the analysis image selection means 213 extracts an image showing the same analysis region from each of the input images selected by the analysis time selection means 212, and generates an analysis image set from the extracted images.
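A minimal sketch of this analysis-image generation step follows, assuming the frames are held as numpy arrays; the function name is illustrative.

```python
# Sketch of analysis-image generation: the same region is cropped from
# each selected frame, and the crops form one analysis-image set.
# Frames are assumed to be numpy arrays of shape (H, W, C).
import numpy as np

def make_analysis_set(frames: list, region: tuple) -> list:
    """Crop the same (x, y, w, h) region out of every frame."""
    x, y, w, h = region
    return [f[y:y + h, x:x + w] for f in frames]
```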
Although FIG. 4 shows an example in which three analysis regions are set, the number of analysis regions is arbitrary. Analysis regions may also be set so that they overlap on the image.
FIG. 4 also shows an example in which the analysis regions contain the upper body of the target person. However, an analysis region may be set so as to contain any part of the detection target, or so as to contain the entire detection target.
FIG. 4 shows square analysis regions, but the shape of an analysis region is not limited to a square and may be any rectangle.
In this example, one set of analysis images is generated from two local images, but the number of local images is not limited to two; any number of two or more images may form one analysis image set.
The example of FIG. 4 also assumes that the number of images in one analysis image set equals the number of past images held by the analysis image acquisition means 21. When the analysis image acquisition means 21 holds more past images than the number of images in one set, the analysis image selection means 213 may select a plurality of analysis image sets.
The procedure by which the analysis image selection means 213 acquires a plurality of analysis image sets for one analysis region is described concretely with reference to FIG. 5. FIG. 5 is an explanatory diagram showing an example of another method for detecting a staying person. The example shown in FIG. 5 uses the same conditions as the example shown in FIG. 4, except that the analysis image acquisition means 21 holds the past three images.
At time t3 shown in FIG. 5, three images from times t1 to t3 have been obtained for each of regions 1 to 3. In this example, the analysis image selection means 213 selects two images from the three images at times t1 to t3 to form one analysis image set, and therefore selects the three sets (t1, t2), (t2, t3), and (t1, t3) for each analysis region. The analysis image selection means 213 inputs the analysis image sets selected in this way for each analysis region to the staying degree calculation means 23.
In the example shown in FIG. 5, the analysis image selection means 213 selects analysis images from all possible combinations. However, it is not necessary to select every combination; analysis images may be selected by other methods. For example, suppose two images are selected from five frames captured at times t1 to t5 to generate analysis image sets. The analysis image selection means 213 may generate sets from adjacent frames, as in (t1, t2), (t2, t3), ..., (t4, t5). Alternatively, it may pair the latest frame with each past frame, as in (t5, t1), (t5, t2), ..., (t5, t4).
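The three pairing strategies described above can be sketched as follows for five frames t1 to t5; the variable names are illustrative.

```python
# Sketch of the three pairing strategies above for frames t1..t5.
from itertools import combinations

frames = ["t1", "t2", "t3", "t4", "t5"]

all_pairs      = list(combinations(frames, 2))           # every 2-image subset
adjacent_pairs = list(zip(frames, frames[1:]))           # (t1,t2) ... (t4,t5)
latest_vs_past = [(frames[-1], f) for f in frames[:-1]]  # (t5,t1) ... (t5,t4)
```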
The advantages of selecting a plurality of analysis image sets are as follows.
When many moving objects exist in the monitoring area, comparing an image at one time with an image at another time makes it likely that different moving objects happen to appear in the same analysis region. If those different moving objects look similar, a high staying degree is erroneously obtained for that analysis region, and false detections of staying readily occur.
In contrast, the analysis image selection means 213 of this embodiment selects a plurality of analysis image sets for calculating the staying degree from the past several frames of images obtained from the image input unit 1. Selecting a plurality of sets reduces the chance that different moving objects happen to appear in the same analysis region, and thereby reduces false detections.
The analysis image selection means 213 inputs the selected analysis image sets to the staying degree calculation means 23.
The stay classifier storage unit 22 stores the classifier that the staying degree calculation means 23, described later, uses to calculate the staying degree for the analysis image sets input from the analysis image acquisition means 21. This classifier is constructed in advance, before the staying object detection system performs the process of detecting staying objects.
The stay classifier storage unit 22 may store a classifier generated by the classifier learning unit 4, described later, or a classifier generated by an administrator or the like.
The classifier learning unit 4 learns a classifier that identifies a staying object from a plurality of images. Here, identifying a staying object includes not only determining whether an object is staying, but also calculating an index (the staying degree) indicating the likelihood that the detection target is staying.
The classifier learning unit 4 may, for example, generate a classifier that outputs the staying degree as its judgment result for a plurality of images. Specifically, it may generate a classifier that calculates a higher staying degree the more the plurality of input images contain the same detection target.
A concrete method by which the classifier learning unit 4 of this embodiment learns the classifier is described below. The classifier learning unit 4 learns the classifier using positive and negative learning images. Specifically, it uses sets of images containing the same detection target as positive examples indicating the staying state, and sets of images not containing the same detection target as negative examples indicating the non-staying state.
The classifier learning unit 4 then constructs, by machine learning, a classifier suited to discriminating between these positive and negative examples. Specifically, it learns a classifier that, when given the same number of images as are contained in a positive or negative example set, identifies a staying object from those images.
The learning images are now described concretely, taking a person as the detection target. A positive example may be any set of images containing the same detection target, and the same detection target need not appear in the same state in each image. Assuming the monitoring environment of the intended deployment, a positive example may be, for instance, a set of images in which different people such as passersby appear in front of or behind the staying person, or a set of images in which the lighting conditions around the staying person change.
That is, in a learning image set used as a positive example, at least one image may have undergone perturbation processing that reflects effects such as lighting direction, brightness, and shadows on the detection target or the background. This makes it possible to maintain the accuracy of identifying a staying object even when the shooting environment changes.
A positive example may also be a set of images that contain the same detection target together with background images that are at least partly identical. When comparing images of the same analysis region, as in this embodiment, the compared regions are likely to contain the same background. Therefore, by learning the classifier with positive examples consisting of images that contain the same detection target and at least partly identical backgrounds, the classifier learning unit 4 can judge staying images more appropriately. The perturbation processing described above may also be applied to the background images.
A negative example may be any set of images that do not contain the same detection target; examples of learning images include sets of images of different people, assumed to be passersby, and sets of background images such as the ground or buildings. Like positive examples, negative examples may also undergo the perturbation processing described above. Applying perturbation to negative example images makes it possible to suppress false detections even when lighting and shadows change with the shooting environment.
The classifier learning unit 4 learns the classifier using a large collection of such positive and negative examples as learning images. That is, it learns the classifier using image sets in which at least one image in a positive or negative example set has undergone perturbation processing. The target of the perturbation processing is arbitrary; it may be, for example, the detection target or the background contained in the positive or negative examples.
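As one possible realization of the perturbation processing described above, brightness changes and a crude shadow can be applied to a learning image as in the following sketch; the specific transforms and parameter ranges are illustrative assumptions, not the patent's processing.

```python
# Sketch of brightness/shadow perturbation on an image in [0, 255].
# The global gain change and the shadow band are illustrative assumptions.
import numpy as np

def perturb(img: np.ndarray, rng: np.random.Generator) -> np.ndarray:
    out = img.astype(np.float32)
    out *= rng.uniform(0.7, 1.3)             # illumination / brightness change
    x0 = int(rng.integers(0, img.shape[1]))  # left edge of a shadow band
    out[:, x0:] *= rng.uniform(0.5, 0.9)     # darken one side as a crude shadow
    return np.clip(out, 0, 255).astype(np.uint8)
```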
A learning image may be an image cropped from a real image, an image combining a real background with a real foreground (detection target), or an image generated artificially by CG (computer graphics).
The classifier learning unit 4 constructs a classifier suited to discriminating between positive and negative examples using the prepared learning images. It may construct such a classifier using a machine learning method such as a CNN (Convolutional Neural Network). Using a classifier generated in this way, the likelihood that arbitrary input images belong to the positive or negative examples can be obtained.
However, the learning method used by the classifier learning unit 4 is not limited to CNNs; any method capable of constructing a classifier that outputs, for arbitrary input images, the likelihood of belonging to the positive or negative examples may be used. Methods of learning from multiple images with a CNN are also known; however, such methods learn from images that are very closely spaced at regular intervals, and differ from the approach of the classifier learning unit 4 of this embodiment, which uses images captured some time apart.
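For illustration only, a classifier of this kind could be realized as a small CNN that takes an analysis-image pair as input, for example by stacking the two crops on the channel axis, and outputs the staying degree. The following PyTorch sketch is an assumed architecture, not the patent's.

```python
# Illustrative sketch: a pair of RGB crops is channel-concatenated into a
# 6-channel input, and the network outputs the staying degree in [0, 1].
# The architecture is an assumption for illustration.
import torch
import torch.nn as nn

class StayClassifier(nn.Module):
    def __init__(self, in_channels: int = 6):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(in_channels, 16, 3, padding=1), nn.ReLU(),
            nn.MaxPool2d(2),
            nn.Conv2d(16, 32, 3, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1),
        )
        self.head = nn.Linear(32, 1)

    def forward(self, pair: torch.Tensor) -> torch.Tensor:
        # pair: (batch, 6, H, W) = two RGB crops stacked on the channel axis
        z = self.features(pair).flatten(1)
        return torch.sigmoid(self.head(z))  # staying degree in [0, 1]

# Training on positive (same target) / negative (no common target) pairs
# would then use a binary cross-entropy loss such as nn.BCELoss().
```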
The number of images contained in each positive and negative example used to learn the classifier stored in the stay classifier storage unit 22 is assumed to match the number of images contained in each analysis image set acquired by the analysis image acquisition means 21.
The staying degree calculation means 23 calculates the staying degree for each analysis image set input from the analysis image acquisition means 21, using the classifier stored in the stay classifier storage unit 22. The staying degree is thus calculated for each analysis region. The staying degree calculation means 23 inputs pairs of analysis region coordinates and calculated staying degrees to the stay determination means 24.
As illustrated in FIG. 5, when the analysis image selection means 213 selects a plurality of analysis image sets for one analysis region, the staying degree calculation means 23 calculates a staying degree for every set and integrates the calculated staying degrees for each analysis region.
FIG. 5 shows an example in which input images for the past three frames are held and two local images are used for calculating the staying degree. In this example, the analysis image selection means 213 selects the three analysis image sets (t1, t2), (t2, t3), and (t1, t3) shown in FIG. 5 from region 1, so the staying degree calculation means 23 calculates three staying degrees for these three sets. To integrate them, the staying degree calculation means 23 may calculate, for example, the mean, median, maximum, or minimum of the three values and use it as the integrated staying degree.
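The integration step can be sketched as follows; which statistic to use is a design choice, as stated above, and the function name is illustrative.

```python
# Sketch of integrating the staying degrees of several analysis-image
# sets for one region.  The choice of statistic is a design decision.
import statistics

def integrate(degrees: list, how: str = "mean") -> float:
    return {"mean": statistics.fmean,
            "median": statistics.median,
            "max": max,
            "min": min}[how](degrees)

print(integrate([0.9, 0.7, 0.8]))            # mean   -> ~0.8
print(integrate([0.9, 0.7, 0.8], "median"))  # median -> 0.8
```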
The stay determination means 24 performs stay determination using the pairs of analysis region coordinates and calculated staying degrees input from the staying degree calculation means 23, and outputs the coordinates where staying occurs in the input image. In other words, the staying degree calculation means 23 and the stay determination means 24 execute the staying object detection process of detecting a staying object from the generated analysis image sets.
The stay determination means 24 may, for example, compare the staying degree with a preset threshold and determine that staying has occurred in an analysis region whose staying degree is equal to or greater than the threshold.
When analysis regions overlap on the image, the stay determination means 24 may, for the overlapping area, determine that staying has occurred if the mean, median, maximum, or minimum of the staying degrees calculated for the overlapping analysis regions is equal to or greater than a predetermined threshold.
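A sketch of this judgment for overlapping regions follows; the choice of the median as the statistic and the threshold value of 0.8 are illustrative assumptions.

```python
# Sketch of stay judgment over overlapping analysis regions: their staying
# degrees are reduced to one statistic and compared with a preset threshold.
import statistics

def stay_detected(overlap_degrees: list, threshold: float = 0.8) -> bool:
    return statistics.median(overlap_degrees) >= threshold
```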
When staying object detection is performed on images captured by a fixed surveillance camera, the staying degree calculation means 23 may identify in advance a background image that contains no staying detection target, and calculate staying degrees for that background image. The stay determination means 24 may then apply a correction that lowers the staying degree in regions where high staying degrees are routinely calculated (regions prone to false detection).
The stay determination means 24 may also calculate a reliability based on the staying degrees calculated in advance for the background image containing no staying detection target. In this case, the stay determination means 24 calculates the reliability from the staying degree so that regions where a high staying degree tends to be calculated for the background (regions prone to false detection) receive low reliability, and regions where a low staying degree is calculated for the background receive high reliability. The stay determination means 24 outputs the calculated reliability to the output unit 3 together with the staying degree of each region.
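A minimal sketch of such a reliability mapping follows, assuming a simple linear relation that the text does not specify.

```python
# Sketch of deriving reliability from the staying degree pre-computed on a
# background image with no staying target: regions whose background already
# scores high (false-detection-prone regions) get low reliability.  The
# linear mapping is an illustrative assumption.
def reliability(background_staying_degree: float) -> float:
    return max(0.0, 1.0 - background_staying_degree)
```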
As the output staying occurrence coordinates, the stay determination means 24 may use coordinates on the screen or coordinates converted into real-world coordinates.
The output unit 3 outputs the staying occurrence coordinates input from the stay detection unit 2. One output mode of the output unit 3 is display; in that case, the output unit 3 includes a display device (not shown) and performs display on it. However, the output mode of the output unit 3 is not limited to display and may be another mode.
The analysis image acquisition means 21 (more specifically, the analysis region selection means 211, the analysis time selection means 212, and the analysis image selection means 213), the staying degree calculation means 23, and the stay determination means 24 in the stay detection unit 2 are realized by the CPU (Central Processing Unit) of a computer operating according to a program (a staying object detection program).
For example, the program may be stored in a storage unit (not shown) of the staying object detection system. The CPU may read the program and, according to it, operate as the analysis image acquisition means 21 (more specifically, the analysis region selection means 211, the analysis time selection means 212, and the analysis image selection means 213), the staying degree calculation means 23, and the stay determination means 24.
Alternatively, the analysis image acquisition means 21 (more specifically, the analysis region selection means 211, the analysis time selection means 212, and the analysis image selection means 213), the staying degree calculation means 23, and the stay determination means 24 may each be realized by dedicated hardware.
The classifier learning unit 4 is realized by the CPU of a computer operating according to a program (a classifier learning program). The classifier learning unit 4 may also be realized by dedicated hardware.
Next, the operation of the staying object detection system according to this embodiment is described. FIG. 6 is an explanatory diagram showing an operation example of the staying object detection system of this embodiment. The order of the processing steps described below may be changed arbitrarily, or the steps may be executed in parallel, as long as the processing content remains consistent. Other steps may be added between the processing steps. Furthermore, a step described as one step for convenience may be executed as a plurality of steps, and steps described separately for convenience may be executed as one step.
The analysis image acquisition means 21 acquires a captured image and its capture time from the image input unit 1 (step S1). Next, the analysis image acquisition means 21 discards the image with the oldest capture time among the past several frames it holds and newly holds the latest input image acquired in step S1, thereby updating the image history (step S2).
Next, the analysis region selection means 211 selects a plurality of analysis regions from the image (step S3). If, among the analysis regions selected in step S3, there remains an unprocessed analysis region for which the staying degree has not yet been calculated (yes in step S4), the analysis image acquisition means 21 (specifically, the analysis region selection means 211) selects one unprocessed analysis region (step S5).
For the analysis region selected in step S5, the analysis time selection means 212 calculates the capture interval of each image in the image history updated in step S2, based on the movement model of the detection target defined in advance. It then determines whether the detection target can move across the target analysis region within that interval, and selects the images for which movement is judged possible (step S6).
The analysis image selection means 213 selects, from the history images selected in step S6, the combinations of analysis images to be used for calculating the staying degree (step S7).
If, among the analysis image sets selected in step S7, there remains an unprocessed set for which the staying degree has not been calculated (yes in step S8), the staying degree calculation means 23 selects one unprocessed analysis image set (step S9).
The staying degree calculation means 23 then calculates the staying degree for the analysis image set selected in step S9, using the classifier held in the stay classifier storage unit 22 (step S10).
When step S10 ends, the staying degree calculation means 23 repeats the processing from step S8. If it is determined in step S8 that no unprocessed analysis image set remains (no in step S8), the staying degree calculation means 23 calculates a value integrating the plurality of staying degrees calculated for the one analysis region (step S11). For example, it calculates the mean, median, maximum, or minimum of the calculated staying degrees as the integrated value.
After step S11, the analysis image acquisition means 21 repeats the processing from step S4. If it is determined in step S4 that no unprocessed analysis region remains (no in step S4), the stay determination means 24 performs the stay determination process using the staying degree calculated for each analysis region (step S12). For example, the stay determination means 24 judges that staying has occurred if the staying degree calculated for an analysis region is equal to or greater than a predetermined threshold.
When analysis regions overlap on the image, the stay determination means 24 may, for the overlapping area, determine that staying has occurred if, for example, the mean, median, maximum, or minimum of the staying degrees calculated for the overlapping analysis regions is equal to or greater than a predetermined threshold.
The output unit 3 outputs the stay detection result output from the stay determination means 24 (step S13). For example, the output unit 3 may output the stay detection result to an application, or to an external module such as a storage medium.
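The overall flow of steps S1 to S13 can be summarized in the following sketch, where the helper callables stand in for the components described above; all names are illustrative assumptions and the patent does not prescribe this structure.

```python
# High-level sketch of the loop in FIG. 6 (steps S1 to S13).
from collections import deque

def run(stream, select_regions, select_times, select_sets,
        classify, integrate, threshold=0.8, history_len=3):
    history = deque(maxlen=history_len)                      # past few frames
    for timestamp, image in stream:                          # S1
        history.append((timestamp, image))                   # S2
        detections = []
        for region in select_regions(image):                 # S3-S5
            frames = select_times(history, region)           # S6
            sets = select_sets(frames, region)               # S7
            degrees = [classify(s) for s in sets]            # S8-S10
            if degrees and integrate(degrees) >= threshold:  # S11-S12
                detections.append(region)
        yield timestamp, detections                          # S13
```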
Next, the operation by which the classifier learning unit 4 of this embodiment learns the classifier is described. FIG. 7 is a flowchart showing an operation example of the classifier learning unit 4 of this embodiment.
The classifier learning unit 4 reads the positive and negative learning images stored in a storage unit (not shown) (step S21). Specifically, it reads sets of images containing the same detection target as positive examples indicating the staying state, and sets of images not containing the same detection target as negative examples indicating the non-staying state.
From the positive and negative learning images, the classifier learning unit 4 then learns a classifier that identifies a staying object from the same number of input images as are contained in a positive or negative example set (step S22).
 以上のように、本実施形態では、解析時刻選択手段212が、撮影された時間が異なる複数の入力画像から、滞留の解析に適した時間差をおいて撮影された複数の入力画像を選択する。また、解析画像選択手段213が、選択された複数の入力画像から同一の解析領域を示す画像をそれぞれ抽出して、抽出した画像の組である解析画像の組を生成する。そして、滞留度算出手段23および滞留判定手段24が、複数の画像から滞留物体を識別する識別器を用いて、生成された解析画像の組から滞留物体を検出する。その際、解析時刻選択手段212は、検出対象の移動モデルまたは解析領域の大きさの少なくとも一方に基づいて、滞留の解析に適した時間差を決定する。そのため、滞留する物体を好適に検出できる。 As described above, in the present embodiment, the analysis time selection unit 212 selects a plurality of input images taken with a time difference suitable for stay analysis from a plurality of input images with different taken times. Moreover, the analysis image selection means 213 extracts the image which shows the same analysis area | region from the selected several input image, respectively, and produces | generates the set of the analysis image which is a set of the extracted image. Then, the staying degree calculating unit 23 and the staying determining unit 24 detect the staying object from the generated set of analysis images using an identifier that identifies the staying object from the plurality of images. At this time, the analysis time selection unit 212 determines a time difference suitable for the stay analysis based on at least one of the movement model to be detected or the size of the analysis region. Therefore, the staying object can be detected suitably.
 特に、本実施形態では、滞留度算出手段23および滞留判定手段24が、上述した識別器を用いて、解析画像の組から滞留物体を検出する。そのため、監視領域の照明変動や、監視カメラのレンズ汚れ、物の移動などに代表される撮影環境の変化による誤検出増加の影響を受けずに、安定して滞留物を検出可能になる。 In particular, in this embodiment, the staying degree calculating unit 23 and the staying determining unit 24 detect the staying object from the set of analysis images using the above-described discriminator. Therefore, it is possible to detect a stagnant object stably without being affected by an increase in false detection caused by a change in photographing environment represented by a change in illumination in the monitoring area, lens contamination of the monitoring camera, movement of an object, and the like.
 また、本実施形態では、識別器学習部4が、同一の検出対象を含む複数の画像の組を滞留状態を示す正例とし、同一の検出対象を含まない複数の画像の組を非滞留状態を示す負例として、滞留物体を識別する識別器を学習する。この識別器を用いることで、滞留する物体を好適に検出できる。 In the present embodiment, the classifier learning unit 4 sets a plurality of image sets including the same detection target as a positive example indicating the staying state, and sets a plurality of image sets not including the same detection target as the non-staying state. As a negative example, a discriminator for identifying a staying object is learned. By using this discriminator, it is possible to suitably detect a staying object.
 次に、本実施形態の概要を説明する。図8は、本開示による識別器学習装置の概要を示すブロック図である。本開示による識別器学習装置90は、滞留物体を識別する識別器を学習する学習部91(例えば、識別器学習部4)を備える。学習部91は、同一の検出対象を含む複数の画像の組を滞留状態を示す正例とし、同一の検出対象を含まない複数の画像の組を非滞留状態を示す負例として、滞留物体を識別する識別器を学習する。 Next, the outline of this embodiment will be described. FIG. 8 is a block diagram illustrating an overview of a classifier learning device according to the present disclosure. The classifier learning device 90 according to the present disclosure includes a learning unit 91 (for example, the classifier learning unit 4) that learns a classifier that identifies a staying object. The learning unit 91 sets a plurality of images including the same detection target as a positive example indicating a staying state, and sets a plurality of images not including the same detection target as a negative example indicating a non-staying state. Learn classifiers to identify.
 そのような構成により生成された識別器を用いることで、滞留する物体を好適に検出できる。 By using the discriminator generated with such a configuration, the staying object can be suitably detected.
 また、学習部91は、正例または負例の組に含まれる画像の数と同数の検出対象画像から滞留物体を識別する識別器を学習してもよい。 Further, the learning unit 91 may learn a discriminator that identifies a staying object from the same number of detection target images as the number of images included in the positive or negative example set.
 また、学習部91は、同一の検出対象と共に少なくとも一部が同一の背景画像を含む複数の画像の組を正例として識別器を学習してもよい。同一の解析領域を対象とした複数の画像を比較する場合には、比較する解析領域には同一の背景画像が映り込む可能性が高いため、このような識別器を用いることで、より適切に滞留画像を判断できる。 Further, the learning unit 91 may learn the discriminator using a set of a plurality of images including at least a part of the same background image together with the same detection target as a positive example. When comparing multiple images targeting the same analysis area, it is more likely that the same background image will be reflected in the analysis area to be compared. A staying image can be determined.
 The learning unit 91 may also learn the classifier using sets of images in which a perturbation process (for example, a process that reflects effects such as lighting conditions, brightness, or shadows on the detection target) has been applied to the detection target in at least one image of a positive or negative example set. Using such images as positive or negative examples makes it possible to learn a classifier that maintains its accuracy in identifying staying objects even when the imaging environment changes.
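 One possible realization of such a perturbation process is sketched below; the specific transforms (random gain, gamma shift, synthetic shadow band) and their parameter ranges are illustrative assumptions, not values prescribed by the disclosure.

```python
import numpy as np

def perturb(patch, rng=np.random.default_rng()):
    """Apply an illustrative illumination perturbation to one patch:
    a random gain (brightness), a random gamma shift, and a dimmed
    horizontal band standing in for a cast shadow."""
    img = patch.astype(np.float32) / 255.0
    img = np.clip(img * rng.uniform(0.7, 1.3), 0.0, 1.0)  # brightness gain
    img = img ** rng.uniform(0.8, 1.25)                    # gamma shift
    top = rng.integers(0, img.shape[0])                    # synthetic shadow
    img[top:top + img.shape[0] // 4] *= 0.6
    return (img * 255.0).astype(np.uint8)
```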
 FIG. 9 is a block diagram showing an overview of a staying object detection system according to the present disclosure. The staying object detection system 80 according to the present disclosure includes target image selection means 81, analysis image generation means 82, and staying object detection means 83. The target image selection means 81 (for example, the analysis time selection unit 212) selects, from a plurality of detection target images captured at different times, a plurality of detection target images captured with a time difference suited to stay analysis. The analysis image generation means 82 (for example, the analysis image selection unit 213) extracts, from each of the detection target images selected by the target image selection means 81, an image showing the same analysis region, and generates a set of analysis images made up of the extracted images. The staying object detection means 83 (for example, the staying degree calculation unit 23 and the stay determination unit 24) detects staying objects from the set of analysis images generated by the analysis image generation means 82, using a classifier that identifies staying objects from a plurality of images.
 The target image selection means 81 determines the time difference suited to stay analysis based on at least one of a movement model of the detection target (for example, a model of the detection target's movement speed and direction) and the size of the analysis region.
 With such a configuration, staying objects can be detected reliably.
 The staying object detection means 83 may also detect staying objects from the generated set of analysis images using a classifier that outputs a higher staying degree, representing the likelihood that a detection target is staying, the more consistently the same detection target appears across the input images.
 The analysis image generation means 82 may also generate a plurality of sets of analysis images for the same analysis region. In this case, the staying object detection means 83 may obtain the staying degree that the classifier calculates for each generated set, and calculate at least one of the average, median, maximum, or minimum of the obtained staying degrees as the staying degree of that analysis region. The staying object detection means 83 may then detect staying objects based on the calculated staying degree.
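 A minimal sketch of this aggregation step, assuming the classifier exposes a scoring function that returns one staying degree per analysis-image set:

```python
import numpy as np

def region_staying_degree(pair_scores, mode="median"):
    """Aggregate the staying degrees computed for several analysis-image
    sets of the same region into a single per-region staying degree."""
    scores = np.asarray(pair_scores, dtype=np.float32)
    agg = {"mean": np.mean, "median": np.median,
           "max": np.max, "min": np.min}[mode]
    return float(agg(scores))

# e.g. region_staying_degree([0.9, 0.7, 0.95], mode="median") -> 0.9
# A staying object is then reported when this value exceeds a threshold.
```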
 The staying object detection means 83 may also identify a background image portion in a detection target image and correct the staying degree of the region corresponding to the identified background portion downward. Such a configuration improves detection accuracy in regions where the staying degree tends to be overestimated relative to the background (regions prone to false detection).
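 A sketch of this correction follows, assuming a boolean background mask is available from however the system identifies background portions; the multiplicative correction and its factor are assumptions for illustration.

```python
import numpy as np

def suppress_background(degree_map, background_mask, factor=0.5):
    """Lower the staying degree where the detection target image was judged
    to be background. `background_mask` is a boolean array aligned with
    `degree_map`; `factor` (an assumed constant) controls the correction."""
    corrected = degree_map.astype(np.float32).copy()
    corrected[background_mask] *= factor
    return corrected
```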
 The target image selection means 81 may also calculate, based on the movement model of the detection target, the time the detection target needs to pass through the analysis region, and select detection target images captured at intervals equal to or longer than the calculated time.
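 Sketched below under a simple constant-velocity movement model (an assumption; the disclosure does not fix the model's form), the passage time is the region width divided by the expected speed, and frames are then selected greedily at intervals of at least that time:

```python
def passage_time(region_size_px, speed_px_per_s):
    """Time an object moving at `speed_px_per_s` (from the movement model)
    needs to cross an analysis region `region_size_px` wide."""
    return region_size_px / speed_px_per_s

def select_frames(timestamps, min_gap_s):
    """Pick timestamps spaced at least `min_gap_s` apart (greedy)."""
    selected = []
    for t in sorted(timestamps):
        if not selected or t - selected[-1] >= min_gap_s:
            selected.append(t)
    return selected

# e.g. a pedestrian at ~80 px/s crossing a 160 px region needs ~2 s, so
# frames at least 2 s apart avoid pairing the same passer-by with itself.
```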
 FIG. 10 is a block diagram illustrating an example hardware configuration of a computer device 200 that implements the classifier learning device 90 or the staying object detection system 80. The computer device 200 includes a CPU 201, a ROM (Read Only Memory) 202, a RAM (Random Access Memory) 203, a storage device 204, a drive device 205, a communication interface 206, and an input/output interface 207. The classifier learning device 90 or the staying object detection system 80 can be realized by the configuration shown in FIG. 10, or by a part of it.
 The CPU 201 executes a program 208 using the RAM 203. The program 208 may be stored in the ROM 202, may be recorded on a recording medium 209 such as a flash memory and read by the drive device 205, or may be transmitted from an external device via a network 210. The communication interface 206 exchanges data with external devices via the network 210. The input/output interface 207 exchanges data with peripheral devices (input devices, display devices, and the like). The communication interface 206 and the input/output interface 207 can function as means for acquiring or outputting data.
 The classifier learning device 90 or the staying object detection system 80 may be configured as a single circuit (such as a processor) or as a combination of a plurality of circuits. The circuitry may be either dedicated or general purpose.
 The present disclosure can be suitably applied to systems that detect objects staying in a monitored area, such as loitering persons or abandoned objects.
 Further, the present disclosure learns the features of staying images of a specific detection target. It can therefore be suitably applied to detecting only the intended staying objects in outdoor environments, where difference-based methods have been difficult to apply, without the increase in false detections caused by illumination variations, lens contamination, the movement of objects, and so on.
 In addition, unlike difference-based stay detection methods, the present disclosure does not require a background image to be generated in advance. This makes it easy to introduce a staying object detection system into environments where acquiring or generating a background image is difficult, such as places where detection targets are constantly coming and going.
 The present disclosure has been described above using the above embodiment as an exemplary example. However, the present disclosure is not limited to this embodiment; various aspects that those skilled in the art can understand may be applied within the scope of the present disclosure.
 This application claims priority based on Japanese Patent Application No. 2015-037926, filed on February 27, 2015, the entire disclosure of which is incorporated herein.
 1 Image input unit
 2 Stay detection unit
 3 Output unit
 4 Classifier learning unit
 21 Analysis image acquisition means
 22 Stay classifier storage unit
 23 Staying degree calculation means
 24 Stay determination means
 211 Analysis region selection means
 212 Analysis time selection means
 213 Analysis image selection means

Claims (13)

  1.  A classifier learning device comprising a learning unit that learns a classifier for identifying a staying object, using a set of a plurality of images containing the same detection target as a positive example indicating a staying state, and a set of a plurality of images not containing the same detection target as a negative example indicating a non-staying state.
  2.  The classifier learning device according to claim 1, wherein the learning unit learns a classifier that identifies a staying object from the same number of detection target images as the number of images included in a positive example or a negative example.
  3.  The classifier learning device according to claim 1 or 2, wherein the learning unit learns the classifier using, as a positive example, a set of a plurality of images containing the same detection target together with at least partially the same background image.
  4.  The classifier learning device according to any one of claims 1 to 3, wherein the learning unit learns the classifier using a set of images in which a perturbation process has been applied to at least one of the images included in a positive or negative example set.
  5.  A staying object detection system comprising:
     target image selection means for selecting, from a plurality of detection target images captured at different times, a plurality of detection target images captured with a time difference suited to stay analysis;
     analysis image generation means for extracting, from each of the selected detection target images, an image showing the same analysis region, and generating a set of analysis images that is a set of the extracted images; and
     staying object detection means for detecting a staying object from the generated set of analysis images using a classifier that identifies staying objects from a plurality of images,
     wherein the target image selection means determines the time difference suited to the stay analysis based on at least one of a movement model of the detection target and a size of the analysis region.
  6.  The staying object detection system according to claim 5, wherein the staying object detection means detects a staying object from the generated set of analysis images using a classifier that calculates a higher staying degree, representing the likelihood that the detection target is staying, the more consistently the same detection target is included in the plurality of input images.
  7.  The staying object detection system according to claim 6, wherein the analysis image generation means generates a plurality of sets of analysis images for the same analysis region, and the staying object detection means obtains the staying degree calculated by the classifier for each of the generated sets of analysis images, calculates at least one of the average, median, maximum, or minimum of the obtained staying degrees as the staying degree of the same analysis region, and detects a staying object based on the calculated staying degree.
  8.  The staying object detection system according to claim 6 or 7, wherein the staying object detection means identifies a background image portion in a detection target image and corrects the staying degree of the region corresponding to the identified background image portion downward.
  9.  The staying object detection system according to any one of claims 5 to 8, wherein the target image selection means calculates, based on a movement model of the detection target, the time required for the detection target to pass through the analysis region, and selects detection target images captured at intervals equal to or longer than the calculated time.
  10.  A classifier learning method comprising learning a classifier for identifying a staying object, using a set of a plurality of images containing the same detection target as a positive example indicating a staying state, and a set of a plurality of images not containing the same detection target as a negative example indicating a non-staying state.
  11.  A staying object detection method comprising:
     selecting, from a plurality of detection target images captured at different times, a plurality of detection target images captured with a time difference suited to stay analysis;
     extracting, from each of the selected detection target images, an image showing the same analysis region, and generating a set of analysis images that is a set of the extracted images; and
     detecting a staying object from the generated set of analysis images using a classifier that identifies staying objects from a plurality of images,
     wherein, when selecting the detection target images, the time difference suited to the stay analysis is determined based on at least one of a movement model of the detection target and a size of the analysis region.
  12.  A program recording medium storing a program for causing a computer to execute a learning process of learning a classifier that identifies a staying object, using a set of a plurality of images containing the same detection target as a positive example indicating a staying state, and a set of a plurality of images not containing the same detection target as a negative example indicating a non-staying state.
  13.  A program recording medium storing a program for causing a computer to execute:
     a target image selection process of selecting, from a plurality of detection target images captured at different times, a plurality of detection target images captured with a time difference suited to stay analysis;
     an analysis image generation process of extracting, from each of the selected detection target images, an image showing the same analysis region, and generating a set of analysis images that is a set of the extracted images; and
     a staying object detection process of detecting a staying object from the generated set of analysis images using a classifier that identifies staying objects from a plurality of images,
     wherein, in the target image selection process, the time difference suited to the stay analysis is determined based on at least one of a movement model of the detection target and a size of the analysis region.
PCT/JP2016/000869 2015-02-27 2016-02-18 Identifier learning device, remaining object detection system, identifier learning method, remaining object detection method, and program recording medium WO2016136214A1 (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
JP2017501921A JP6784254B2 (en) 2015-02-27 2016-02-18 Retained object detection system, retained object detection method and program

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
JP2015037926 2015-02-27
JP2015-037926 2015-02-27

Publications (1)

Publication Number Publication Date
WO2016136214A1 true WO2016136214A1 (en) 2016-09-01

Family

ID=56788212

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/JP2016/000869 WO2016136214A1 (en) 2015-02-27 2016-02-18 Identifier learning device, remaining object detection system, identifier learning method, remaining object detection method, and program recording medium

Country Status (2)

Country Link
JP (2) JP6784254B2 (en)
WO (1) WO2016136214A1 (en)

Patent Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JPH0520461A (en) * 1991-07-11 1993-01-29 Nippon Telegr & Teleph Corp <Ntt> Object recognition processing method
JP2002157596A (en) * 2000-11-17 2002-05-31 Sony Corp Robot unit and face identifying method
JP2006178790A (en) * 2004-12-22 2006-07-06 Ricoh Co Ltd State detection apparatus, state detection method, program, and recording medium
JP2014085795A (en) * 2012-10-23 2014-05-12 Toshiba Corp Learning image collection device, learning device and object detection device

Cited By (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2019045091A1 (en) * 2017-09-04 2019-03-07 日本電気株式会社 Information processing device, counter system, counting method, and program storage medium
WO2019069629A1 (en) * 2017-10-06 2019-04-11 富士フイルム株式会社 Image processing device and learned model
JPWO2019069629A1 * 2017-10-06 2020-10-15 FUJIFILM Corporation Image processor and trained model
US11574189B2 (en) 2017-10-06 2023-02-07 Fujifilm Corporation Image processing apparatus and learned model

Also Published As

Publication number Publication date
JP7074174B2 (en) 2022-05-24
JPWO2016136214A1 (en) 2017-12-14
JP6784254B2 (en) 2020-11-11
JP2021007055A (en) 2021-01-21

Similar Documents

Publication Publication Date Title
US20200364443A1 (en) Method for acquiring motion track and device thereof, storage medium, and terminal
CN112560999B (en) Target detection model training method and device, electronic equipment and storage medium
US8744125B2 (en) Clustering-based object classification
US11295139B2 (en) Human presence detection in edge devices
CN111325769B (en) Target object detection method and device
EP4035070B1 (en) Method and server for facilitating improved training of a supervised machine learning process
JP6095817B1 (en) Object detection device
CN111191535B (en) Pedestrian detection model construction method based on deep learning and pedestrian detection method
US11871125B2 (en) Method of processing a series of events received asynchronously from an array of pixels of an event-based light sensor
WO2024051067A1 (en) Infrared image processing method, apparatus, and device, and storage medium
JP6157165B2 (en) Gaze detection device and imaging device
JP2013152669A (en) Image monitoring device
JP7074174B2 (en) Discriminator learning device, discriminator learning method and computer program
CN113396423A (en) Method of processing information from event-based sensors
Viraktamath et al. Comparison of YOLOv3 and SSD algorithms
US11605220B2 (en) Systems and methods for video surveillance
JP2016095701A (en) Image processor, image processing method, and program
KR102218255B1 (en) System and method for analyzing image based on artificial intelligence through learning of updated areas and computer program for the same
JP2009123150A (en) Object detection apparatus and method, object detection system and program
JPWO2018179119A1 (en) Video analysis device, video analysis method, and program
JP2024516642A (en) Behavior detection method, electronic device and computer-readable storage medium
CN110602487B (en) Video image jitter detection method based on TSN (time delay network)
Han et al. Object Detection based on Combination of Visible and Thermal Videos using A Joint Sample Consensus Background Model.
JP6767788B2 (en) Information processing equipment, control methods and programs for information processing equipment
CN111046724B (en) Pedestrian retrieval method based on area matching network

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 16754965

Country of ref document: EP

Kind code of ref document: A1

ENP Entry into the national phase

Ref document number: 2017501921

Country of ref document: JP

Kind code of ref document: A

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 16754965

Country of ref document: EP

Kind code of ref document: A1