WO2014103673A1 - Information processing system, information processing method, and program - Google Patents

Information processing system, information processing method, and program

Info

Publication number
WO2014103673A1
WO2014103673A1 (PCT/JP2013/082914)
Authority
WO
WIPO (PCT)
Prior art keywords
moving body
area
information processing
video
imaging device
Prior art date
Application number
PCT/JP2013/082914
Other languages
French (fr)
Japanese (ja)
Inventor
Ryoma Oami (大網 亮磨)
Original Assignee
NEC Corporation (日本電気株式会社)
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by NEC Corporation (日本電気株式会社)
Priority to JP2014554282A (granted as JP6292540B2)
Publication of WO2014103673A1


Classifications

    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04NPICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N7/00Television systems
    • H04N7/18Closed-circuit television [CCTV] systems, i.e. systems in which the video signal is not broadcast
    • H04N7/181Closed-circuit television [CCTV] systems, i.e. systems in which the video signal is not broadcast for receiving images from a plurality of remote sources
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V20/00Scenes; Scene-specific elements
    • G06V20/50Context or environment of the image
    • G06V20/52Surveillance or monitoring of activities, e.g. for recognising suspicious objects
    • GPHYSICS
    • G08SIGNALLING
    • G08BSIGNALLING OR CALLING SYSTEMS; ORDER TELEGRAPHS; ALARM SYSTEMS
    • G08B13/00Burglar, theft or intruder alarms
    • G08B13/18Actuation by interference with heat, light, or radiation of shorter wavelength; Actuation by intruding sources of heat, light, or radiation of shorter wavelength
    • G08B13/189Actuation by interference with heat, light, or radiation of shorter wavelength; Actuation by intruding sources of heat, light, or radiation of shorter wavelength using passive radiation detection systems
    • G08B13/194Actuation by interference with heat, light, or radiation of shorter wavelength; Actuation by intruding sources of heat, light, or radiation of shorter wavelength using passive radiation detection systems using image scanning and comparing systems
    • G08B13/196Actuation by interference with heat, light, or radiation of shorter wavelength; Actuation by intruding sources of heat, light, or radiation of shorter wavelength using passive radiation detection systems using image scanning and comparing systems using television cameras
    • G08B13/19602Image analysis to detect motion of the intruder, e.g. by frame subtraction
    • G08B13/19608Tracking movement of a target, e.g. by detecting an object predefined as a target, using target direction and or velocity to predict its new position
    • GPHYSICS
    • G08SIGNALLING
    • G08BSIGNALLING OR CALLING SYSTEMS; ORDER TELEGRAPHS; ALARM SYSTEMS
    • G08B13/00Burglar, theft or intruder alarms
    • G08B13/18Actuation by interference with heat, light, or radiation of shorter wavelength; Actuation by intruding sources of heat, light, or radiation of shorter wavelength
    • G08B13/189Actuation by interference with heat, light, or radiation of shorter wavelength; Actuation by intruding sources of heat, light, or radiation of shorter wavelength using passive radiation detection systems
    • G08B13/194Actuation by interference with heat, light, or radiation of shorter wavelength; Actuation by intruding sources of heat, light, or radiation of shorter wavelength using passive radiation detection systems using image scanning and comparing systems
    • G08B13/196Actuation by interference with heat, light, or radiation of shorter wavelength; Actuation by intruding sources of heat, light, or radiation of shorter wavelength using passive radiation detection systems using image scanning and comparing systems using television cameras
    • G08B13/19639Details of the system layout
    • G08B13/19645Multiple cameras, each having view on one of a plurality of scenes, e.g. multiple cameras for multi-room surveillance or for tracking an object by view hand-over

Definitions

  • Some aspects of the present invention relate to an information processing system, an information processing method, and a program.
  • Patent Document 1 discloses an apparatus that can appropriately track (monitor) a person across cameras using information on the connection relationships between the cameras. This apparatus obtains the correspondence between persons according to the similarity of person feature amounts at the point where a person appears in the camera field of view (In point) and the point where the person disappears from it (Out point).
  • Some aspects of the present invention have been made in view of the above problems, and one of their purposes is to provide an information processing system, an information processing method, and a program capable of suitably estimating the correspondence between persons across a plurality of imaging devices.
  • An information processing system according to one aspect of the present invention includes input means for receiving input of videos captured by a plurality of imaging devices, and determination means for determining, according to the similarity of feature amounts, whether a moving body shown in a video captured by a first imaging device of the plurality of imaging devices and a moving body located in an appropriate area (an area in which feature extraction works better than in other areas) in a video captured by a second imaging device of the plurality of imaging devices are the same moving body.
  • An information processing method according to one aspect of the present invention includes a step of receiving input of videos captured by a plurality of imaging devices, and a step of determining, according to similarity, whether a moving body shown in a video captured by a first imaging device of the plurality of imaging devices and a moving body located in a predetermined appropriate area in a video captured by a second imaging device of the plurality of imaging devices are the same moving body; the information processing system performs these steps.
  • A program according to one aspect of the present invention causes a computer to execute a process of receiving input of videos captured by a plurality of imaging devices, and a process of determining, according to similarity, whether a moving body shown in a video captured by a first imaging device of the plurality of imaging devices and a moving body located in a predetermined appropriate area in a video captured by a second imaging device of the plurality of imaging devices are the same moving body.
  • An information processing system according to another aspect of the present invention includes input means for receiving input of videos captured by a plurality of imaging devices, and determination means for determining whether a first moving body shown in a video captured by a first imaging device of the plurality of imaging devices and a second moving body shown in a video captured by a second imaging device of the plurality of imaging devices are similar, the determination being made when the second moving body enters an appropriate area of the video captured by the second imaging device, in which the determination can be made more reliably than in other areas.
  • An information processing method according to another aspect of the present invention includes a step of receiving input of videos captured by a plurality of imaging devices, and a step of determining whether a first moving body shown in a video captured by a first imaging device of the plurality of imaging devices and a second moving body shown in a video captured by a second imaging device of the plurality of imaging devices are similar, the determination being made when the second moving body enters an appropriate area of the video captured by the second imaging device, in which the determination can be made more reliably than in other areas; the information processing system performs these steps.
  • A program according to another aspect of the present invention causes a computer to execute a process of receiving input of videos captured by a plurality of imaging devices, and a process of determining whether a first moving body shown in a video captured by a first imaging device of the plurality of imaging devices and a second moving body shown in a video captured by a second imaging device of the plurality of imaging devices are similar, the determination being performed when the second moving body enters an appropriate area of the video captured by the second imaging device, in which the determination can be made more reliably than in other areas.
  • In the present invention, "unit", "means", "apparatus", and "system" do not simply mean physical means; they also cover cases where the functions of the "unit", "means", "apparatus", or "system" are realized by software. Further, the functions of one "unit", "means", "apparatus", or "system" may be realized by two or more physical means or devices, and the functions of two or more "units", "means", "apparatuses", or "systems" may be realized by a single physical means or device.
  • According to the present invention, it is possible to provide an information processing system, an information processing method, and a program capable of suitably estimating the correspondence between persons across a plurality of imaging devices.
  • Brief description of the drawings: FIG. 1 is a diagram showing the schematic configuration of the monitoring system according to the first embodiment. FIG. 2 is a diagram showing a specific example of a captured video. FIG. 3 is a functional block diagram showing the functional configuration of the monitoring system shown in FIG. 1. FIG. 4 is a flowchart showing the processing flow of the information processing server shown in FIG. 1. FIG. 5 is a block diagram showing the hardware configuration on which the information processing server shown in FIG. 1 can be implemented. FIG. 6 is a functional block diagram showing the functional configuration of the monitoring system according to the second embodiment. FIG. 7 is a flowchart showing the processing flow of the information processing server shown in FIG. 6. FIG. 8 is a functional block diagram showing the schematic configuration of the monitoring device according to the third embodiment.
  • FIG. 1 is a block diagram showing a system configuration of the monitoring system 1.
  • The monitoring system 1 is broadly composed of an information processing server 100 and a plurality of video cameras 200 (video cameras 200A to 200N are collectively referred to as video cameras 200) that capture videos (moving images).
  • Although the monitoring system 1 is described below as a system for monitoring persons captured by the video cameras 200, the monitoring target is not limited to persons; it may be any moving object (object / moving body), such as a car or a motorcycle.
  • Each video camera 200 captures video, determines whether a person appears in the captured video, and transmits information such as the position and feature amount of that person to the information processing server 100 together with the captured video. The video camera 200 can also track persons within the video it captures.
  • Note that processes such as person detection, feature extraction, and in-camera person tracking may instead be performed on the information processing server 100 or on another information processing apparatus (not shown). In the following description, the video camera 200 is assumed to perform these processes.
  • The information processing server 100 analyzes the video captured by the video cameras 200 to perform various processes such as detecting persons, registering persons to be tracked, and tracking registered persons.
  • Although the following description focuses on person monitoring based on real-time video captured by the video cameras 200, this is not a limitation. For example, it is also possible to monitor (analyze) video that was captured by the video cameras 200 and then stored in a storage device such as an HDD (Hard Disk Drive) or a VCR (Video Cassette Recorder). It is further possible to play the stored video back in reverse order and monitor the reverse-played video. In general, when a person behaves suspiciously, it is necessary to examine what that person did leading up to the behavior, so having such reverse-playback monitoring is extremely effective.
  • In person monitoring, the information processing server 100 outputs, for example, a monitoring screen to a display device (not shown) and can output to that screen information such as whether a person registered as a tracking target appears in the video. To do so, the information processing server 100 has a function of determining whether a person captured by one video camera 200 (for example, a person registered as a tracking target) is the same as a person captured by another video camera 200 (a function of determining the correspondence between persons). Note that the information processing server 100 may instead announce, by audio output means (not shown), whether a person registered as a tracking target appears in the video; the output method is not limited.
  • One conceivable method is to extract a feature amount from the person image associated with each person and, when the similarity of the feature amounts exceeds a threshold, determine that the persons are the same.
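  • As a rough illustration of this threshold test, the comparison might look like the following sketch; the cosine metric and the threshold value 0.8 are our assumptions, since the patent fixes neither:

```python
import numpy as np

def cosine_similarity(a: np.ndarray, b: np.ndarray) -> float:
    """Similarity of two person feature vectors (e.g., color histograms)."""
    return float(np.dot(a, b) /
                 (np.linalg.norm(a) * np.linalg.norm(b) + 1e-9))

def is_same_person(feat_a: np.ndarray, feat_b: np.ndarray,
                   threshold: float = 0.8) -> bool:
    # Associate two detections as the same person when the feature
    # similarity exceeds a threshold, as the text describes.
    return cosine_similarity(feat_a, feat_b) > threshold
```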
  • In contrast, the information processing server 100 according to the present embodiment extracts a feature amount when a person is in a region whose area appropriateness is higher than that of other regions (also referred to as an appropriate region) in the video captured by the video camera 200, and determines whether persons are the same based on that feature amount.
  • FIG. 2 is a diagram showing a specific example of a video 20 captured by the video camera 200. Assume that the captured video 20 shows a person P moving in the traveling direction a. Here, the peripheral region 22 of the video 20 is assumed to be unsuitable for feature extraction: the feature amount of the person P is likely to vary there, for example because persons can easily change their traveling direction at that position or because the lighting is dim. In such a situation, the information processing server 100 according to the present embodiment treats the region 21 as an appropriate region suited to feature extraction, and holds off on the association process while the person P is in the peripheral region 22.
  • When the person P then moves onto the region 21, a feature amount is extracted from the person image of the person P, its similarity to moving bodies shown in previously captured videos is determined, and the persons are then associated.
  • Alternatively, while the person P is in the region 22, a provisional association may be made by performing a similarity determination using the feature amount of the person image of the person P (a feature amount considered to be of low accuracy), and the association may be redone when the person P moves to the region 21. In the following description, it is assumed that a provisional association is made in the region 22 and the association is made again after the person moves to the region 21.
  • To make this possible, the information processing server 100 has a function of dividing the video captured by each video camera 200 into a plurality of areas and evaluating whether each area is suited to feature extraction. There can be several methods for determining the appropriate region 21. For example, after associating persons who can be reliably matched between video cameras 200, the system can learn how the feature amounts extracted from their person images change, and identify as the appropriate region 21 (a region of high area appropriateness) those regions from which feature amounts highly similar to the feature amounts obtained by other video cameras 200 can be extracted.
  • There may also be more than one appropriate region 21. Whether a region is an appropriate region 21 can be determined by comparing the feature amounts acquired in each region and examining whether their similarity is sufficient for deciding that two detections show the same person; specifically, a region can be judged to be an appropriate region 21 when the similarity of the feature amounts extracted in it is equal to or greater than a certain threshold. Alternatively, a reference feature amount (for example, a reference color in the case of a color feature) may be prepared, the feature amount acquired in a given region may be compared with it, and the region may be judged to be an appropriate region 21 when this similarity is sufficiently high (for example, equal to or greater than a certain threshold).
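  • A minimal sketch of the reference-feature variant, assuming the video is divided into grid cells; the cell indexing, names, and threshold are illustrative:

```python
import numpy as np

def find_appropriate_regions(region_feats: dict[tuple[int, int], np.ndarray],
                             reference_feat: np.ndarray,
                             threshold: float = 0.8) -> set[tuple[int, int]]:
    """Mark grid cells whose extracted features stay close to a reference.

    region_feats maps a grid-cell index to the feature amount observed
    there; reference_feat could be a reference color histogram.
    """
    appropriate = set()
    for cell, feat in region_feats.items():
        sim = float(np.dot(feat, reference_feat) /
                    (np.linalg.norm(feat) * np.linalg.norm(reference_feat) + 1e-9))
        if sim >= threshold:          # similarity high enough -> region 21
            appropriate.add(cell)
    return appropriate
```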
  • The learning for identifying the appropriate region 21 may be performed by having persons with a variety of feature amounts walk through the scene when the system is installed, or the system may learn after installation, during operation, in situations where the association can be made reliably. Whether a reliable association is possible may be determined automatically (for example, by counting the number of moving persons and deciding that association is possible when there is only one), or an operator may specify it manually.
  • In the following, the appropriateness of each area is described as having two levels (whether or not the area is an appropriate region 21), but the appropriateness may be set in multiple levels. In that case, the determination may be made again whenever the object moves to a region with a higher degree of appropriateness; the other operations are basically the same as in the two-level case.
  • The appropriateness determined for each area may also be switched according to time of day or similar conditions. For example, when lighting conditions differ between day and night, an appropriateness value may be obtained for each lighting condition and switched when the lighting condition changes. The switch may be made automatically according to time, or automatically upon detecting a change in lighting conditions, which can be detected by checking whether the brightness or color values of a specific area have changed. If the current lighting condition is found to be one for which no appropriateness has yet been obtained, the appropriateness for that condition may be learned and registered on the spot, and reused whenever the same lighting condition recurs.
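  • The condition-dependent switching described above can be as simple as keeping one appropriateness map per condition; the string keys such as "day" and "night" are our illustrative assumption:

```python
from dataclasses import dataclass, field

@dataclass
class AppropriatenessStore:
    """One appropriate-region map per lighting condition."""
    maps: dict[str, set[tuple[int, int]]] = field(default_factory=dict)

    def lookup(self, condition: str) -> set[tuple[int, int]] | None:
        # None means this condition has not been learned yet; the caller
        # can then learn and register it on the spot, as the text suggests.
        return self.maps.get(condition)

    def register(self, condition: str, regions: set[tuple[int, int]]) -> None:
        self.maps[condition] = regions
```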
  • Similarly, the appropriateness may be switched according to changes in conditions such as the white balance of a video camera; this switching works in the same way as the switching for lighting-condition changes described above.
  • The monitoring system 1 includes image acquisition units 101 (image acquisition units 101A to 101N are collectively referred to as the image acquisition unit 101), object detection/tracking units 110 (object detection/tracking units 110A to 110N are collectively referred to as the object detection/tracking unit 110), an object tracking information DB (database) 120, a next camera prediction unit 130, camera arrangement information 140, an area appropriateness calculation unit 150, an area appropriateness information DB 160, and a correspondence relationship prediction unit 170.
  • The image acquisition unit 101 acquires captured video either as the video camera 200 captures an actual scene, or, after video captured by the video camera 200 has been recorded in a storage device such as an HDD, by playing that video back (in the case of a VCR, by capturing the played-back analog signal).
  • Here, playback means decoding the encoded moving-image data (video data) to generate the original picture (frame) data; it does not extend to displaying the generated result on a display screen. The playback speed need not be the actual recorded speed, and where possible, playback (decoding) may run faster than real time. It is also possible to skip frames instead of decoding every video frame: for example, when the video is encoded with a scheme such as MPEG-2, the video data contains I, P, and B pictures, and of these, only the I pictures, or only the I and P pictures, may be decoded.
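  • One way to realize such faster-than-real-time scanning, sketched here with OpenCV rather than an MPEG-2-specific decoder; the stride value is arbitrary:

```python
import cv2

def scan_recorded_video(path: str, stride: int = 5):
    """Yield only every `stride`-th frame of a recorded video.

    A rough stand-in for the I-picture-only decoding the text describes:
    grab() advances one frame without the full retrieve/convert step, so
    skipped frames cost little.
    """
    cap = cv2.VideoCapture(path)
    idx = 0
    while cap.grab():
        if idx % stride == 0:
            ok, frame = cap.retrieve()   # decode only the frames we keep
            if ok:
                yield idx, frame
        idx += 1
    cap.release()
```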
  • The object detection/tracking unit 110 includes an object detection unit 111 (object detection units 111A to 111N are collectively referred to as the object detection unit 111), an object tracking unit 113 (object tracking units 113A to 113N are collectively referred to as the object tracking unit 113), and an object feature amount extraction unit 115 (object feature amount extraction units 115A to 115N are collectively referred to as the object feature amount extraction unit 115).
  • The object detection unit 111 of the object detection/tracking unit 110 detects persons as objects from the video (moving image) acquired by each image acquisition unit 101, and the object feature amount extraction unit 115 calculates feature amounts for the persons from the person regions (person images) detected by the object detection unit 111. More specifically, for example, persons can be extracted by a background subtraction method that takes the difference between a previously generated background image and a frame image, and then applying, to the extracted regions, a detector that has learned features such as the shape of a person or of its parts. As the person feature amounts, for example, the color of the clothes or of the pattern worn by the person can be extracted in the form of a color histogram or an edge histogram.
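  • A minimal sketch of such a clothing-color feature, assuming a cropped BGR person image; the hue channel, bin count, and normalization are our choices:

```python
import cv2
import numpy as np

def clothing_color_histogram(person_bgr: np.ndarray,
                             bins: int = 16) -> np.ndarray:
    """Normalized hue histogram of a cropped person image.

    One concrete form of the color-histogram feature mentioned in the
    text; an edge histogram could be built analogously from gradients.
    """
    hsv = cv2.cvtColor(person_bgr, cv2.COLOR_BGR2HSV)
    hist = cv2.calcHist([hsv], [0], None, [bins], [0, 180])  # hue range
    return cv2.normalize(hist, None).flatten()
```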
  • The object tracking unit 113 tracks each person extracted as an object within the same angle of view (within the video captured by one video camera 200) by comparing time-series images (frames), and generates object tracking information (time-series data of the position and feature amount of each detected and tracked person) for each person. For tracking a person between frames, for example, mean-shift tracking or particle-filter tracking may be used.
  • The object tracking unit 113 stores the generated object tracking information in the object tracking information DB 120 and also outputs it to the next camera prediction unit 130.
  • When a person leaves the angle of view of a video (goes out of frame), the next camera prediction unit 130 predicts, from the object tracking information generated by the object tracking unit 113 and the camera arrangement information 140, in which image acquisition unit 101's video the person is most likely to appear next, and generates next camera prediction information indicating the result.
  • The camera arrangement information 140 describes the spatial relationships between the deployed video cameras 200: specifically, for example, the adjacency relationships between video cameras 200 and the distances between them (or the average time required to move between them). The adjacency information is described in association with the angles of view of the video cameras 200.
  • Using this information, the next camera prediction unit 130 can select the adjacent video camera 200 (that is, a video camera 200 in which the person may appear) according to the direction in which the person went out of frame.
  • The next camera prediction information generated by the next camera prediction unit 130 contains, for each person being tracked, the calculated appearance probability, the predicted appearance position within the angle of view, and the predicted appearance time for each image acquisition unit 101 (for each video camera 200). For example, suppose person A was shown by camera 01 and went out of frame in the direction of camera 02; if the prediction uses the average movement time between the cameras, the appearance probability can be calculated with a probability distribution that peaks at the frame-out time plus the average movement time. Instead of the average movement time, the arrival time at camera 02 may be predicted by computing the movement speed before the frame-out from the tracking result of camera 01 and basing the probability distribution on that time. Various shapes, such as a Gaussian distribution, can be used for the probability distribution. The information on the variation (spread) of the distribution can be obtained by measuring in advance, or by newly learning it from person-correspondence information provided by the user. If video cameras 200 other than camera 02 are adjacent to camera 01, the probability of the person moving in the direction of each adjacent camera is estimated, and the appearance probability may be multiplied by this value; a result measured in advance can be used for this estimation.
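  • Under the Gaussian assumption this becomes, for instance (the parameter names, and treating the branch probability as a simple multiplier, are our assumptions):

```python
import math

def appearance_probability(t: float, frame_out_time: float,
                           mean_travel: float, std_travel: float,
                           branch_prob: float = 1.0) -> float:
    """Gaussian appearance-time model for one candidate next camera.

    The density peaks at frame_out_time + mean_travel, as in the text;
    std_travel would be measured or learned in advance, and branch_prob
    is the estimated chance the person headed toward this camera at all.
    """
    mu = frame_out_time + mean_travel
    density = math.exp(-0.5 * ((t - mu) / std_travel) ** 2) / (
        std_travel * math.sqrt(2.0 * math.pi))
    return branch_prob * density
```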
  • For each person (object / moving body), the correspondence relationship prediction unit 170 compares the feature amount contained in the next camera prediction information with the feature amount of each person detected in the video of the video camera 200 in which the person may appear next; when the distance between the feature amounts is smaller than a threshold (or the similarity between the feature amounts is higher than a threshold), the persons are associated with each other as the same person, and the association information is output.
  • In doing so, the correspondence relationship prediction unit 170 refers to the area appropriateness information DB 160 and associates persons using the feature amount obtained while the person is located on an appropriate region 21 whose area appropriateness is higher than that of other regions in the video of the video camera 200.
  • The correspondence information created by the correspondence relationship prediction unit 170 can be processed as necessary and presented to the user as person tracking information on a display device (not shown).
  • The area appropriateness calculation unit 150 divides each video acquired by each image acquisition unit 101 into a plurality of regions and calculates the area appropriateness, a measure indicating whether each region is suited to extracting a person's feature amounts.
  • As the calculation method, for example, as described above, after associating persons who can be reliably matched between video cameras 200 (for example, when only one person can possibly appear, or when a supervisor inputs the person correspondence manually), the system learns how the feature amounts extracted from the person images change, and sets/calculates the area appropriateness so that it is high in regions from which feature amounts highly similar to those of person images from other video cameras 200 can be extracted.
  • The area appropriateness calculated by the area appropriateness calculation unit 150 is stored in the area appropriateness information DB 160 and later referred to by the correspondence relationship prediction unit 170.
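  • One simple, statistical way to realize this learning, under our own assumptions about the data layout (grid cells and precomputed cross-camera similarities):

```python
from collections import defaultdict

def learn_area_appropriateness(samples):
    """Average cross-camera feature similarity per grid cell.

    `samples` is an iterable of (cell, sim) pairs, where `sim` is the
    similarity between a feature extracted at `cell` and the feature of
    the reliably matched same person seen on another camera.
    """
    sums = defaultdict(float)
    counts = defaultdict(int)
    for cell, sim in samples:
        sums[cell] += sim
        counts[cell] += 1
    return {cell: sums[cell] / counts[cell] for cell in sums}

# Cells whose average similarity clears a threshold can then be treated
# as appropriate regions 21:
# appropriateness = learn_area_appropriateness(samples)
# region_21 = {c for c, v in appropriateness.items() if v >= 0.8}
```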
  • FIG. 4 is a flowchart showing a processing flow of the information processing server 100 according to the present embodiment.
  • Each processing step described below can be executed in an arbitrary order or in parallel as long as no contradiction arises in the processing content, and other steps may be added between processing steps. Furthermore, a step described as a single step for convenience can be executed as a plurality of sub-steps, and steps described as separate for convenience can be executed as a single step.
  • The object detection unit 111 detects whether a person, as a detection-target object, appears in the video acquired by the image acquisition unit 101 (S401). When a person is detected, the object feature amount extraction unit 115 calculates the person's feature amount, and the feature amount is registered in the object tracking information DB 120 together with the person tracking result produced by the object tracking unit 113 (S403).
  • The feature amount registered in the object tracking information DB 120 is preferably one extracted while the person was in the appropriate region 21 of the image acquisition unit 101 concerned.
  • When the object tracking unit 113 detects that the person has gone out of frame (S405), the next camera prediction unit 130 predicts, based on the object tracking information received from the object tracking unit 113 and the camera arrangement information 140, in which image acquisition unit 101's video the out-of-frame tracking target is likely to appear next (S407).
  • When a person who may correspond to the tracking target is detected, the correspondence relationship prediction unit 170 determines whether that person's position is within the appropriate region 21 of the image acquisition unit 101 (S411). If the detected position is within the appropriate region 21 (Yes in S411), the correspondence relationship prediction unit 170 compares the feature amount extracted within the appropriate region 21 with the feature amount of the person captured by camera A and calculates their similarity, thereby determining whether the two persons are the same (whether the persons correspond) (S413).
  • If the detected position is outside the appropriate region 21 (No in S411), a provisional association is determined by comparing the feature amount detected in that region with the feature amount of the person captured by camera A (S415). Thereafter, when the person moves into an appropriate region 21 of high area appropriateness (Yes in S417, No in S419, Yes in S411), the association is determined again using the feature amount extracted at the position within the appropriate region 21 (S413).
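  • The S411/S413/S415 branching reduces to a small decision function; the person structure, names, and threshold here are ours, not the patent's:

```python
def associate(person, camera_a_feat, region_21, similarity, threshold=0.8):
    """Rough outline of the association flow of S411-S415.

    `person` is assumed to expose .cell (grid position in the current
    video) and .feat (its extracted feature amount).
    """
    same = similarity(person.feat, camera_a_feat) > threshold
    if person.cell in region_21:          # S411 Yes -> S413: final decision
        return {"match": same, "provisional": False}
    # S411 No -> S415: keep only a provisional association and redo the
    # determination once the person reaches an appropriate region (S417/S411).
    return {"match": same, "provisional": True}
```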
  • The information processing server 100 includes a processor 501, a memory 503, a storage device 505, an input interface (I/F) 507, a data I/F 509, a communication I/F 511, and a display device 513.
  • The processor 501 controls various processes in the information processing server 100 by executing programs stored in the memory 503. For example, the processes of the next camera prediction unit 130, the correspondence relationship prediction unit 170, and the area appropriateness calculation unit 150 described with reference to FIG. 3 can be realized as programs that are temporarily stored in the memory 503 and then run mainly on the processor 501.
  • the memory 503 is a storage medium such as a RAM (Random Access Memory).
  • the memory 503 temporarily stores a program code of a program executed by the processor 501 and data necessary for executing the program. For example, a stack area necessary for program execution is secured in the storage area of the memory 503.
  • the storage device 505 is a nonvolatile storage medium such as an HDD, a flash memory, or a VCR.
  • The storage device 505 stores various programs for realizing the operating system, the next camera prediction unit 130, the correspondence relationship prediction unit 170, and the area appropriateness calculation unit 150, as well as various data including the object tracking information DB 120, the camera arrangement information 140, and the area appropriateness information DB 160.
  • Programs and data stored in the storage device 505 are loaded into the memory 503 as necessary and referred to by the processor 501.
  • The input I/F 507 is a device for receiving input from the user. Specific examples of the input I/F 507 include a keyboard, a mouse, a touch panel, and various sensors. The input I/F 507 may be connected to the information processing server 100 via an interface such as USB (Universal Serial Bus).
  • the data I / F 509 is a device for inputting data from outside the information processing server 100.
  • Specific examples of the data I / F 509 include a drive device for reading data stored in various storage media.
  • The data I/F 509 may be connected to the information processing server 100 via an interface such as USB, for example.
  • The communication I/F 511 is a device for wired or wireless data communication with devices external to the information processing server 100, such as the video cameras 200. The communication I/F 511 may also be provided outside the information processing server 100; in that case, it is connected to the information processing server 100 via an interface such as USB.
  • the display device 513 is a device for displaying various information such as a monitoring screen.
  • the display device 513 may display the monitoring video illustrated in FIG.
  • Specific examples of the display device 513 include a liquid crystal display and an organic EL (Electro-Luminescence) display.
  • the display device 513 may be provided outside the information processing server 100. In that case, the display device 513 is connected to the information processing server 100 via, for example, a display cable.
  • As described above, when tracking a person (object / moving body) as a monitoring target, the monitoring system 1 according to the present embodiment obtains the correspondence between persons using the feature amounts of the person to be tracked. Since a suitable feature amount cannot always be extracted at every position in the video, due to the influence of lighting and the like, the monitoring system 1 according to this embodiment defines an appropriate region 21 suited to feature extraction and estimates the correspondence between persons using the feature amount obtained while a person is in the appropriate region 21. This makes it possible to suitably estimate the correspondence between persons.
  • In addition, since a provisional correspondence is estimated using feature amounts obtained outside the appropriate region 21, a correspondence can be estimated even for a person who never enters the appropriate region 21.
  • FIGS. 6 and 7 are diagrams for explaining the second embodiment.
  • the second embodiment will be described focusing on differences from the first embodiment.
  • the same components as those in the first embodiment are denoted by the same reference numerals as those in the first embodiment and description thereof is omitted.
  • description of the same function and effect as in the first embodiment is also omitted.
  • The outline of the system configuration is the same as that of the first embodiment shown in FIG. 1, and a specific example of a hardware configuration on which the information processing server 100 according to this embodiment can be implemented is likewise the same as in the first embodiment; the description of these is therefore omitted.
  • In addition to the functions of the first embodiment, the information processing server 100 according to this embodiment calculates correction information for feature extraction for each area, and has a function of correcting the feature amounts extracted from person images using that correction information.
  • The monitoring system 1 according to the second embodiment includes a correction information generation unit 180 and a correction information DB 190 in addition to the functions of the monitoring system 1 according to the first embodiment. The functions carried over from the first embodiment operate as in the first embodiment, so their description is omitted here.
  • The correction information generation unit 180 generates, according to the area appropriateness calculated by the area appropriateness calculation unit 150, the correction information used when the object feature amount extraction unit 115 extracts feature amounts in each region of the video acquired by the image acquisition unit 101. More specifically, the correction information is, for example, a brightness correction value when brightness information is corrected, the gain value of each RGB channel when white balance is corrected, or, when the overall color tone is corrected, a correction conversion formula (for example, RGB affine transformation parameters). These pieces of correction information are calculated in association with coordinates for each camera.
  • The correction information DB 190 is a database that stores, for each area of each video, the correction information generated by the correction information generation unit 180.
  • By referring to the correction information DB 190, the object feature amount extraction unit 115 can correct, using the correction information, the raw feature amount extracted for a detected person. As a result, a suitable feature amount can be calculated even when the person P is not in the appropriate region 21, which improves the accuracy of the provisional correspondence estimation described in the first embodiment.
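  • A minimal sketch of applying such correction before comparing features; the parameter shapes, the y = Ax + b form of the color-tone correction, and all names are illustrative assumptions (the patent associates such parameters with coordinates per camera):

```python
import numpy as np

def correct_feature(raw_rgb: np.ndarray,
                    rgb_gain: np.ndarray,
                    affine_a: np.ndarray | None = None,
                    affine_b: np.ndarray | None = None) -> np.ndarray:
    """Apply per-region correction to a raw RGB-based feature.

    rgb_gain models the per-channel white-balance gain the text mentions;
    the optional affine transform models the overall color-tone correction
    (RGB affine conversion parameters).
    """
    corrected = raw_rgb * rgb_gain                    # white-balance gain
    if affine_a is not None and affine_b is not None:
        corrected = affine_a @ corrected + affine_b   # color-tone correction
    return corrected
```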
  • The area appropriateness calculation unit 150 calculates the area appropriateness for each region of the video of each video camera 200 as needed (S701). The area appropriateness can be calculated at various timings; for example, when a person whose correspondence is unambiguous is detected, the changes in that person's feature amounts can be used as learning data to calculate the area appropriateness.
  • Next, the correction information generation unit 180 generates correction information according to the area appropriateness calculated for each area by the area appropriateness calculation unit 150 (S703).
  • As mentioned above, for example, a parameter (correction amount) that statistically fills the gap between the most suitably extracted feature amount and the feature amount actually extracted in each region can be used as the correction information.
  • The information processing server 100 then has the object detection/tracking unit 110 correct feature amounts using the correction information. More specifically, the object feature amount extraction unit 115 corrects the feature amounts it subsequently extracts from the video according to the correction information acquired from the correction information DB 190, and outputs the corrected feature amounts to the object tracking unit 113. Thereby, a suitable feature amount can be obtained even outside the appropriate region 21.
  • FIG. 8 is a block diagram illustrating the functional configuration of a monitoring device 800, which is an information processing system according to the third embodiment.
  • the monitoring device 800 includes an input unit 810 and a determination unit 820.
  • the input unit 810 can receive an image captured by a video camera (imaging device) (not shown).
  • The determination unit 820 determines, according to the similarity of feature amounts, whether a moving body shown in a video captured by one video camera (first imaging device) and a moving body located, in a video captured by another video camera (second imaging device), in an appropriate area better suited to feature extraction than other areas are the same moving body.
  • specific examples of the moving body include a human, a car, a bicycle, a motorcycle, and the like.
  • With this implementation, the monitoring device 800 can suitably estimate the correspondence between persons across a plurality of imaging devices.
  • (Appendix 1) An information processing system comprising: input means for receiving input of videos captured by a plurality of imaging devices; and determination means for determining, according to the similarity of feature amounts, whether a moving body shown in a video captured by a first imaging device of the plurality of imaging devices and a moving body located in an appropriate area, better suited to feature extraction than other areas, in a video captured by a second imaging device of the plurality of imaging devices are the same moving body.
  • (Appendix 2) The information processing system according to appendix 1, further comprising calculation means for calculating, for each area in the video, an area appropriateness for extracting the feature amount of the moving body, wherein the appropriate area is an area whose area appropriateness is higher than that of other areas in the video.
  • (Appendix 3) The information processing system according to appendix 2, wherein the calculation means statistically calculates the area appropriateness of each region in the video according to the change in the feature amount as the moving body moves through the video.
  • (Appendix 5) The information processing system according to any one of appendices 1 to 4, wherein the determination means determines whether a moving body in a region other than the appropriate region in the video captured by the second imaging device is the same as the moving body captured by the first imaging device, and, when the moving body captured by the second imaging device moves into the appropriate region, determines again whether it is the same as the moving body captured by the first imaging device.
  • (Appendix 6) The information processing system according to any one of appendices 2 to 5, wherein the calculation means calculates the area appropriateness of each region in the video under a plurality of different conditions, and the determination means determines whether moving bodies are the same using the appropriate region defined by the area appropriateness corresponding to the current condition, from among the plurality of area appropriatenesses calculated by the calculation means.
  • (Appendix 8) An information processing method in which an information processing system performs: a step of receiving input of videos captured by a plurality of imaging devices; and a step of determining, according to similarity, whether a moving body shown in a video captured by a first imaging device of the plurality of imaging devices and a moving body in a predetermined appropriate area in a video captured by a second imaging device of the plurality of imaging devices are the same moving body.
  • (Appendix 9) The information processing method according to appendix 8, further comprising a step of calculating, for each area in the video, an area appropriateness for extracting the feature amount of the moving body, wherein the appropriate area is an area whose area appropriateness is higher than that of other areas in the video.
  • (Appendix 10) The information processing method according to appendix 9, wherein the area appropriateness of each region in the video is statistically calculated according to the change in the feature amount as the moving body moves through the video.
  • (Appendix 11) The information processing method according to any one of appendices 8 to 10, further comprising a step of generating, for each region, correction information for correcting the feature amount used to calculate the similarity of moving bodies.
  • (Appendix 14) The information processing method according to any one of appendices 8 to 13, wherein a result of the determination as to whether the moving bodies are the same is reported.
  • (Appendix 15) A program causing a computer to execute: a process of receiving input of videos captured by a plurality of imaging devices; and a process of determining, according to similarity, whether a moving body shown in a video captured by a first imaging device of the plurality of imaging devices and a moving body in a predetermined appropriate area in a video captured by a second imaging device of the plurality of imaging devices are the same moving body.
  • (Appendix 17) The program according to appendix 16, wherein, in the calculation process, the area appropriateness of each area in the video is statistically calculated according to the change in the feature amount as the moving body moves through the video.
  • (Appendix 18) The program according to any one of appendices 15 to 17, further causing the computer to execute a process of generating, for each region, correction information for correcting the feature amount used to calculate the similarity of moving bodies.
  • (Appendix 21)
  • (Appendix 22) An information processing system comprising: input means for receiving input of videos captured by a plurality of imaging devices; and determination means for determining whether a first moving body shown in a video captured by a first imaging device of the plurality of imaging devices and a second moving body shown in a video captured by a second imaging device of the plurality of imaging devices are similar, the determination being made when the second moving body enters an appropriate area of the video captured by the second imaging device, in which the determination can be made more reliably than in other areas.
  • (Appendix 23) The information processing system according to appendix 22, further comprising display means for visibly displaying the appropriate area.
  • (Appendix 24) An information processing method in which an information processing system performs: a step of receiving input of videos captured by a plurality of imaging devices; and a step of determining whether a first moving body shown in a video captured by a first imaging device of the plurality of imaging devices and a second moving body shown in a video captured by a second imaging device of the plurality of imaging devices are similar, the determination being made when the second moving body enters an appropriate area of the video captured by the second imaging device, in which the determination can be made more reliably than in other areas.

Landscapes

  • Engineering & Computer Science (AREA)
  • Multimedia (AREA)
  • Signal Processing (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • Closed-Circuit Television Systems (AREA)
  • Image Analysis (AREA)
  • Image Processing (AREA)

Abstract

[Problem] To provide an information processing system, an information processing method, and a program whereby correspondence relationships among images of a person captured by a plurality of image capture devices can be suitably inferred. [Solution] The system is provided with: object detection/tracking units (110) that receive input of videos captured by a plurality of image capture devices; and a correspondence relationship prediction unit (170) that determines, according to the degree of similarity of a feature amount, whether a moving body appearing in video captured by a first image capture device among the plurality of image capture devices and a moving body situated, in video captured by a second image capture device among the plurality, in a suitable area affording better feature-amount extraction than other areas are the same moving body.

Description

Information processing system, information processing method, and program
 Some aspects of the present invention relate to an information processing system, an information processing method, and a program.
 In recent years, systems that perform wide-area monitoring using videos captured by a plurality of video cameras (imaging devices) have been considered. For example, Patent Document 1 discloses an apparatus that can appropriately track (monitor) a person across cameras using information on the connection relationships between the cameras. This apparatus obtains the correspondence between persons according to the similarity of person feature amounts at the point where a person appears in the camera field of view (In point) and the point where the person disappears from it (Out point).
JP 2008-219570 A
 However, a method that extracts person feature amounts at the point of appearance in, or disappearance from, the camera field of view, as in the method described in Patent Document 1, cannot always extract suitable feature amounts, and may therefore fail to obtain the correspondence between persons suitably. For example, when the point where a person appears in or disappears from the camera field of view is backlit, various features such as color cannot be extracted suitably, and the evaluation of the correspondence may be erroneous.
 Some aspects of the present invention have been made in view of the above problems, and one of their purposes is to provide an information processing system, an information processing method, and a program capable of suitably estimating the correspondence between persons across a plurality of imaging devices.
 An information processing system according to one aspect of the present invention includes input means for receiving input of videos captured by a plurality of imaging devices, and determination means for determining, according to the similarity of feature amounts, whether a moving body shown in a video captured by a first imaging device of the plurality of imaging devices and a moving body located in an appropriate area (an area in which feature extraction works better than in other areas) in a video captured by a second imaging device of the plurality of imaging devices are the same moving body.
 An information processing method according to one aspect of the present invention includes a step of receiving input of videos captured by a plurality of imaging devices, and a step of determining, according to similarity, whether a moving body shown in a video captured by a first imaging device of the plurality of imaging devices and a moving body located in a predetermined appropriate area in a video captured by a second imaging device of the plurality of imaging devices are the same moving body; the information processing system performs these steps.
 A program according to one aspect of the present invention causes a computer to execute a process of receiving input of videos captured by a plurality of imaging devices, and a process of determining, according to similarity, whether a moving body shown in a video captured by a first imaging device of the plurality of imaging devices and a moving body located in a predetermined appropriate area in a video captured by a second imaging device of the plurality of imaging devices are the same moving body.
 An information processing system according to another aspect of the present invention includes input means for receiving input of videos captured by a plurality of imaging devices, and determination means for determining whether a first moving body shown in a video captured by a first imaging device of the plurality of imaging devices and a second moving body shown in a video captured by a second imaging device of the plurality of imaging devices are similar, the determination being made when the second moving body enters an appropriate area of the video captured by the second imaging device, in which the determination can be made more reliably than in other areas.
 An information processing method according to another aspect of the present invention includes a step of receiving input of videos captured by a plurality of imaging devices, and a step of determining whether a first moving body shown in a video captured by a first imaging device of the plurality of imaging devices and a second moving body shown in a video captured by a second imaging device of the plurality of imaging devices are similar, the determination being made when the second moving body enters an appropriate area of the video captured by the second imaging device, in which the determination can be made more reliably than in other areas; the information processing system performs these steps.
 A program according to another aspect of the present invention causes a computer to execute a process of receiving input of videos captured by a plurality of imaging devices, and a process of determining whether a first moving body shown in a video captured by a first imaging device of the plurality of imaging devices and a second moving body shown in a video captured by a second imaging device of the plurality of imaging devices are similar, the determination being performed when the second moving body enters an appropriate area of the video captured by the second imaging device, in which the determination can be made more reliably than in other areas.
 In the present invention, "unit", "means", "apparatus", and "system" do not simply mean physical means; they also cover cases where the functions of the "unit", "means", "apparatus", or "system" are realized by software. Further, the functions of one "unit", "means", "apparatus", or "system" may be realized by two or more physical means or devices, and the functions of two or more "units", "means", "apparatuses", or "systems" may be realized by a single physical means or device.
 According to the present invention, it is possible to provide an information processing system, an information processing method, and a program capable of suitably estimating the correspondence between persons across a plurality of imaging devices.
FIG. 1 is a diagram showing the schematic configuration of the monitoring system according to the first embodiment. FIG. 2 is a diagram showing a specific example of a captured video. FIG. 3 is a functional block diagram showing the functional configuration of the monitoring system shown in FIG. 1. FIG. 4 is a flowchart showing the processing flow of the information processing server shown in FIG. 1. FIG. 5 is a block diagram showing the hardware configuration on which the information processing server shown in FIG. 1 can be implemented. FIG. 6 is a functional block diagram showing the functional configuration of the monitoring system according to the second embodiment. FIG. 7 is a flowchart showing the processing flow of the information processing server shown in FIG. 6. FIG. 8 is a functional block diagram showing the schematic configuration of the monitoring device according to the third embodiment.
 Embodiments of the present invention will be described below. In the following description and in the drawings referred to, the same or similar components are denoted by the same or similar reference numerals.
 (1 First Embodiment)
 FIGS. 1 to 5 are diagrams for explaining the first embodiment. The present embodiment is described below along the following flow with reference to these drawings. First, Section 1.1 outlines the system configuration and gives an overview of the entire first embodiment. Section 1.2 then explains the functional configuration of the system, and Section 1.3 explains the flow of processing. Section 1.4 shows a specific example of a hardware configuration capable of realizing this system. Finally, Section 1.5 and the following sections describe the effects of the present embodiment.
 (1.1 System Configuration and Overview)
 The system configuration of a monitoring system 1, which is an information processing system according to the present embodiment, will be described with reference to FIG. 1. FIG. 1 is a block diagram showing the system configuration of the monitoring system 1.
 The monitoring system 1 is broadly composed of an information processing server 100 and a plurality of video cameras 200 (video cameras 200A to 200N are collectively referred to as video cameras 200) that capture videos (moving images).
 In the following, the monitoring system 1 is described as a system for monitoring persons captured by the video cameras 200, but the monitoring target is not limited to persons. For example, it may be a moving object (object / moving body) such as a car or a motorcycle.
 Each video camera 200 captures a video, determines whether a person appears in the captured video, and transmits information such as the position and feature amount of that person to the information processing server 100 together with the captured video. The video camera 200 can also track a person within the captured video.
 Note that processes such as person detection, feature amount extraction, and in-camera person tracking may instead be performed on, for example, the information processing server 100 or another information processing apparatus (not shown). In the following description, it is assumed that the video camera 200 performs these processes.
 The information processing server 100 analyzes the videos captured by the video cameras 200 to perform various processes such as detecting persons, registering persons to be tracked, and tracking registered persons.
 The following description centers on the case where person monitoring is performed on real-time video captured by the video cameras 200, but the present invention is not limited to this. For example, video captured by a video camera 200 and then stored in a storage device (for example, an HDD (Hard Disk Drive) or a VCR (Video Cassette Recorder)) may be monitored (analyzed). Furthermore, the video stored in the storage device may be played back in reverse order (reverse playback) and then monitored. Normally, when a person behaves suspiciously, it is necessary to investigate what actions that person took leading up to the behavior, so having such a monitoring means based on reverse playback is extremely useful.
 In person monitoring by the information processing server 100, the information processing server 100 outputs, for example, a monitoring screen to a display device (not shown) and can output to that monitoring screen information such as whether a person registered as a tracking target has appeared in the video. For this purpose, the information processing server 100 has a function of determining whether a person captured by one video camera 200 (for example, a person registered as a tracking target) is the same as a person captured by another video camera 200 (a function of determining the correspondence of persons).
 Note that the information processing server 100 may also announce, by sound through a sound output means (not shown), whether a person registered as a tracking target has appeared in the video; the method of notifying the observer is not limited.
 Several methods are conceivable for determining whether a person in a video is the same person as one captured by another video camera 200. One of them is, for example, to extract a feature amount from the person image of each person and to judge that they are the same person when the similarity between the feature amounts exceeds a threshold.
 Feature amounts extracted from a person image may relate to, for example, color information, posture, or height. However, depending on the position of the person within the image, a suitable feature amount may not be extractable: for example, where the lighting is backlit, where the scene is so dim that the person image is hard to recognize, where illumination of a specific color such as orange is nearby, or in a region where the person easily falls into the shadow of some object (so that the whole person is hard to see). Therefore, the information processing server 100 according to the present embodiment extracts a feature amount when the person is in a region of the video captured by the video camera 200 whose area appropriateness is higher than that of other regions (also called an appropriate region), and determines whether the persons are the same based on that feature amount.
 This point is explained with reference to FIG. 2. FIG. 2 is a diagram showing a specific example of a video 20 captured by a video camera 200. The captured video 20 of FIG. 2 shows a person P, who is assumed to be moving in a traveling direction a. Here, the peripheral region 22 of the video 20 is assumed to be a region unsuitable for feature amount extraction, where the feature amount of the person P tends to vary, for example because it is a position where the person P easily changes traveling direction or because the lighting there is dim. In such a situation, the information processing server 100 according to the present embodiment recognizes the region 21 as an appropriate region suited to feature amount extraction and suspends the association process while the person P is in the peripheral region 22. Then, when the person P enters the appropriate region 21, the server extracts a feature amount from the person image of the person P, judges the similarity with moving bodies appearing in previously captured videos, and associates the persons. Alternatively, while the person P is in the peripheral region 22, a provisional similarity judgment may be made using the feature amount of the person image of the person P (a feature amount considered to be of low accuracy) and a provisional association made according to that result, after which the association is performed again once the person P moves into the region 21. In the following description, it is assumed that a provisional association is made in the region 22 and the association is performed again when the person moves into the region 21.
 To identify such an appropriate region 21, the information processing server 100 has a function of dividing the video captured by each video camera 200 into a plurality of regions and evaluating, for each region, whether it is suited to feature amount extraction. Several methods are conceivable for defining the appropriate region 21. For example, after associating persons who can be reliably matched between video cameras 200, the change in the feature amounts extracted from the person images of such a person can be learned, and a region where feature amounts highly similar to those of person images from other video cameras 200 can be extracted is identified as an appropriate region 21 with high area appropriateness. For example, when only one moving person is present in the monitored area, that person can be reliably matched as the same person, so whether a region is an appropriate region 21 can be judged by learning the change in the feature amounts for that person. In this case, the judgment is made by comparing the feature amounts acquired in each region and checking whether the similarity between them is sufficient to judge the persons identical. Specifically, if the similarity of the feature amounts extracted in each region is at or above a certain threshold, the region can be judged to be an appropriate region 21. Alternatively, a reference feature amount (for example, a reference color in the case of a color feature) can be compared with the feature amount acquired in a region, and the region judged to be an appropriate region 21 when this similarity is sufficiently high (for example, at or above a certain threshold).
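 As an illustration of this learning step, the following is a minimal Python sketch. The helper names (`histogram_similarity`, `learn_appropriate_regions`), the normalized-histogram feature representation, and the fixed threshold are all assumptions for illustration; the embodiment does not prescribe any particular implementation.

```python
import numpy as np

def histogram_similarity(f1, f2):
    # Bhattacharyya-style overlap between two L1-normalized histograms.
    return float(np.sum(np.sqrt(f1 * f2)))

def learn_appropriate_regions(samples, reference_feature, threshold=0.8):
    """samples: dict mapping region id -> list of feature vectors observed
    there for a person whose cross-camera identity is certain.
    A region is kept as "appropriate" when the average similarity of its
    observed features to the reference feature clears the threshold,
    mirroring the threshold test described in the text."""
    appropriate = set()
    for region, feats in samples.items():
        sims = [histogram_similarity(f, reference_feature) for f in feats]
        if sims and np.mean(sims) >= threshold:
            appropriate.add(region)
    return appropriate
```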
 Learning for identifying the appropriate region 21 may be performed when the system is installed, by having persons with various feature amounts walk through the scene, or it may be performed during operation after installation, in situations where persons can be reliably matched. Whether reliable matching is possible may be determined automatically (for example, by counting moving persons and judging matching possible when there is only one), or may be specified manually by an operator.
 In the following, the case where the appropriateness of each region has two levels (whether or not it is an appropriate region 21) is described, but the appropriateness may be set in multiple levels. In that case, the judgment is simply redone when the object moves into a region of higher appropriateness; the other operations are basically the same as in the two-level case.
 The appropriateness judged for each region may also switch with time or other factors. For example, when lighting conditions differ between day and night, an appropriateness may be obtained for each lighting condition and switched when the lighting condition changes. This switching may occur automatically according to the time, or automatically upon detecting a change in lighting conditions, which can be determined by detecting whether the brightness or color values of a specific region have changed. If the current lighting condition is judged to be one for which no appropriateness has yet been obtained, the appropriateness for that condition may be learned and registered on the spot; it then becomes usable whenever the same lighting condition recurs.
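 As a sketch of such condition-dependent switching, the following hypothetical class keeps one appropriateness map per lighting condition and picks a map by a crude brightness test on a fixed reference patch. The class name, the two-condition split, and the brightness heuristic are assumptions for illustration only, not the embodiment's method.

```python
class ConditionedAppropriateness:
    """One appropriateness map per lighting condition; the condition is
    detected from the mean brightness of a fixed reference patch, a simple
    stand-in for the condition-change detection described in the text."""

    def __init__(self):
        self.maps = {}  # condition label -> {region id: appropriateness}

    def condition_of(self, frame, patch=(0, 0, 32, 32), night_level=60):
        x, y, w, h = patch
        return "night" if frame[y:y+h, x:x+w].mean() < night_level else "day"

    def register(self, condition, region_scores):
        # Store (or learn on the spot) the map for a newly seen condition.
        self.maps[condition] = region_scores

    def lookup(self, frame, region):
        # Return the appropriateness for this region under the current condition.
        return self.maps.get(self.condition_of(frame), {}).get(region, 0.0)
```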
 Alternatively, the appropriateness may be switched according to changes in conditions such as the white balance of the video camera. This switching is performed in the same manner as for the change in lighting conditions described above.
 (1.2 Functional Configuration of the System)
 The functional configuration of the monitoring system 1 will now be described with reference to FIG. 3.
 As shown in FIG. 3, the monitoring system 1 includes image acquisition units 101 (image acquisition units 101A to 101N are collectively referred to as the image acquisition unit 101), object detection and tracking units 110 (object detection and tracking units 110A to 110N are collectively referred to as the object detection and tracking unit 110), an object tracking information DB (database) 120, a next camera prediction unit 130, camera arrangement information 140, an area appropriateness calculation unit 150, an area appropriateness information DB 160, and a correspondence prediction unit 170.
 The image acquisition unit 101 acquires captured video as the video camera 200 films an actual scene. Alternatively, after video captured by the video camera 200 has been recorded in a storage device such as an HDD, the image may be acquired by playing that video back (in the case of a VCR, by capturing the played-back analog signal).
 Here, playback means decoding encoded moving image data (video data) to generate the data of the original pictures (frames); displaying the generated result on a display screen is not included in playback. The playback speed need not be the actual speed (the recorded real speed); where possible, playback (decoding) may be performed faster than real time. It is also conceivable to play back while skipping frames, without decoding all video frames. For example, when the video is encoded with an encoding scheme such as MPEG-2, the video data contains I, P, and B pictures; of these, only the I pictures, or only the I and P pictures, may be decoded.
 When playing back video recorded in a storage device, the video may be acquired by forward playback or by reverse playback. The following description centers on the example of processing video captured by the video camera 200 in real time and in the forward direction.
 The object detection and tracking unit 110 includes an object detection unit 111 (object detection units 111A to 111N are collectively referred to as the object detection unit 111), an object tracking unit 113 (object tracking units 113A to 113N are collectively referred to as the object tracking unit 113), and an object feature amount extraction unit 115 (object feature amount extraction units 115A to 115N are collectively referred to as the object feature amount extraction unit 115). In the object detection and tracking unit 110, the object detection unit 111 detects persons as objects from the video (moving images) acquired by each image acquisition unit 101, and the object feature amount extraction unit 115 calculates the feature amount of each person from the person region (person image) detected by the object detection unit 111. More specifically, a person can be extracted by, for example, first extracting candidates with a background subtraction method that takes the difference between a pre-generated background image and the frame image, and then applying to the extracted region a detector that has learned features such as the shape of a person or of parts of a person. As the feature amount of a person, for example, the color and pattern features of the clothes the person is wearing can be extracted in the form of a color histogram or an edge histogram.
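 A minimal sketch of this detection and feature-extraction stage, using OpenCV background subtraction and an HSV color histogram as the clothing-color feature, might look as follows. The learned person-shape verifier mentioned above is omitted, and the function name and parameter values are illustrative assumptions, not part of the embodiment.

```python
import cv2

bg = cv2.createBackgroundSubtractorMOG2()  # background-difference stage

def detect_and_describe(frame, min_area=500):
    """Extract foreground person candidates and a color-histogram feature
    for each, roughly following the background-subtraction approach in
    the text (the trained person-shape detector is omitted here)."""
    mask = bg.apply(frame)
    contours, _ = cv2.findContours(mask, cv2.RETR_EXTERNAL,
                                   cv2.CHAIN_APPROX_SIMPLE)
    results = []
    for c in contours:
        if cv2.contourArea(c) < min_area:
            continue  # discard small blobs unlikely to be a person
        x, y, w, h = cv2.boundingRect(c)
        roi = cv2.cvtColor(frame[y:y+h, x:x+w], cv2.COLOR_BGR2HSV)
        # Hue/saturation histogram as a simple clothing-color feature.
        hist = cv2.calcHist([roi], [0, 1], None, [30, 32],
                            [0, 180, 0, 256])
        cv2.normalize(hist, hist, 1.0, 0.0, cv2.NORM_L1)
        results.append(((x, y, w, h), hist.flatten()))
    return results
```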
 The object tracking unit 113 tracks each person extracted as an object within the same angle of view (within the same video captured by one video camera 200) by comparing time-series images (frames), and generates, for each detected and tracked person, object tracking information (time-series data of the position and feature amount information of the person as an object). For tracking a person between frames, for example, tracking by the mean-shift method or tracking using a particle filter may be used. The object tracking unit 113 stores the generated object tracking information in the object tracking information DB 120 and also outputs it to the next camera prediction unit 130.
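 For the in-camera tracking, a mean-shift realization along the lines mentioned above could be sketched as follows, using OpenCV's `meanShift` on a hue back-projection; a particle filter would be a drop-in alternative. The helper name and the single-channel hue histogram are assumptions for illustration.

```python
import cv2

def track_person(frames, init_box):
    """Follow one person across a list of frames with mean-shift, one
    possible realization of the in-camera tracking described above.
    init_box is an (x, y, w, h) tuple for the first frame."""
    x, y, w, h = init_box
    roi = cv2.cvtColor(frames[0][y:y+h, x:x+w], cv2.COLOR_BGR2HSV)
    hist = cv2.calcHist([roi], [0], None, [30], [0, 180])
    cv2.normalize(hist, hist, 0, 255, cv2.NORM_MINMAX)
    term = (cv2.TERM_CRITERIA_EPS | cv2.TERM_CRITERIA_COUNT, 10, 1)
    track, box = [init_box], init_box
    for frame in frames[1:]:
        hsv = cv2.cvtColor(frame, cv2.COLOR_BGR2HSV)
        # Back-project the target's hue histogram, then shift the window.
        prob = cv2.calcBackProject([hsv], [0], hist, [0, 180], 1)
        _, box = cv2.meanShift(prob, box, term)
        track.append(box)
    return track  # per-frame (x, y, w, h) positions of the person
```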
 From the object tracking information generated by the object tracking unit 113 and the camera arrangement information 140, the next camera prediction unit 130 predicts, when a person has left the angle of view of a video (framed out), in which image acquisition unit 101's video the person is most likely to appear next, and generates next camera prediction information indicating the result. Here, the camera arrangement information 140 is information describing the spatial positional relationship between the plurality of arranged video cameras 200; specifically, it includes information such as the adjacency relationships between the video cameras 200 and the distances between them (or the average time required to move between the video cameras 200). The adjacency information is described in association with the angle of view of each video camera 200. This allows the next camera prediction unit 130 to select the adjacent video cameras 200 (that is, the video cameras 200 in which the person may appear) according to the direction in which the person framed out.
 The next camera prediction information generated by the next camera prediction unit 130 contains, for each image acquisition unit 101 (for each video camera 200), the calculated appearance probability, predicted appearance position within the angle of view, and predicted appearance time of the person, together with the person's feature amount, and is generated for each tracked person. For example, when person A appears on camera 01 and frames out toward camera 02, and prediction uses the average travel time between cameras, the appearance probability can be calculated using a probability distribution that peaks at the time obtained by adding the average travel time to the frame-out time. Instead of using the average travel time, the time of arrival at camera 02 may be predicted by calculating the movement speed before framing out from the tracking result of camera 01, with the probability distribution then calculated based on that time. Various shapes such as a Gaussian distribution can be used for the probability distribution; when determining its parameters, information on the variation in arrival times from camera 01 to camera 02 is important. This variation information can be obtained by methods such as measuring it in advance and storing it as data, or learning it anew from the user's person-association information. When video cameras 200 other than camera 02 are also adjacent to camera 01, the probability may be calculated by estimating the likelihood that the person moves toward each adjacent camera and multiplying the above appearance probability by this value; results measured in advance can be used for this estimation.
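 The arrival-time model described here can be sketched with a Gaussian centered on the frame-out time plus the mean transit time. The function names and the optional branch-probability weighting below are illustrative assumptions under that reading.

```python
import math

def arrival_probability(t_now, t_out, mean_transit, sigma):
    """Gaussian model of the appearance time at an adjacent camera:
    highest around (frame-out time + mean transit time), with sigma
    capturing the measured spread of transit times between the cameras."""
    mu = t_out + mean_transit
    return math.exp(-0.5 * ((t_now - mu) / sigma) ** 2) / (
        sigma * math.sqrt(2.0 * math.pi))

def appearance_score(t_now, t_out, mean_transit, sigma, branch_prob=1.0):
    # Optionally weight by the estimated chance the person headed toward
    # this particular adjacent camera, as the text suggests.
    return branch_prob * arrival_probability(t_now, t_out, mean_transit, sigma)
```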
 For each person (object / moving body), the correspondence prediction unit 170 compares the feature amount contained in the next camera prediction information with the feature amount of the person detected in the video of the video camera 200 in which the person may appear next, and when the distance between the feature amounts is smaller than a threshold (or the similarity between the feature amounts is higher than a threshold), associates those persons as the same person and outputs association information. Here, as described above, the correspondence prediction unit 170 refers to the area appropriateness information DB 160 and associates persons using the feature amount obtained when the person is located in an appropriate region 21, whose area appropriateness is higher than that of other regions of the video camera 200's video. The association information created by the correspondence prediction unit 170 can be processed as necessary and displayed to the user as person tracking information on a display device (not shown).
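 A hedged sketch of this matching step, including the provisional-association behavior for detections outside the appropriate region 21 described earlier, could look as follows. The similarity measure (a Bhattacharyya-style histogram overlap), the threshold value, and all names are assumptions.

```python
import numpy as np

def match_person(candidate_feat, candidate_region, predicted,
                 appropriate_regions, threshold=0.7):
    """Compare a newly detected person against next-camera predictions.
    `predicted` is a list of (person_id, feature) pairs. Returns
    (person_id, provisional_flag): a match made outside an appropriate
    region is only provisional, to be redone once the person enters one."""
    best_id, best_sim = None, -1.0
    for pid, feat in predicted:
        sim = float(np.sum(np.sqrt(candidate_feat * feat)))  # histogram overlap
        if sim > best_sim:
            best_id, best_sim = pid, sim
    if best_sim < threshold:
        return None, False  # no sufficiently similar predicted person
    provisional = candidate_region not in appropriate_regions
    return best_id, provisional
```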
 The area appropriateness calculation unit 150 divides each video acquired by each image acquisition unit 101 into a plurality of regions and calculates, for each region, an area appropriateness, which is a measure indicating whether the region is suited to extracting a person's feature amount. As a concrete example of this calculation, as described above, after associating persons who can be reliably matched between video cameras 200 (for example, when only one person can possibly appear, or when an observer has manually input the correspondence of persons), the change in the feature amounts extracted from the person images of such a person is learned, and the area appropriateness is set/calculated so that regions from which feature amounts highly similar to those of person images from other video cameras 200 can be extracted receive a high value. The area appropriateness calculated by the area appropriateness calculation unit 150 is stored in the area appropriateness information DB 160 and referred to by the correspondence prediction unit 170.
 (1.3 Flow of Processing)
 Next, the flow of processing of the monitoring system 1 will be described with reference to FIG. 4. FIG. 4 is a flowchart showing the flow of processing of the information processing server 100 according to the present embodiment.
 The processing steps described below can be executed in any order or in parallel as long as no contradiction arises in the processing content, and other steps may be added between the processing steps. Furthermore, a step described as a single step for convenience can be executed divided into a plurality of steps, and steps described as divided for convenience can be executed as one step.
 First, the object detection unit 111 detects whether a person, as an object to be detected, appears in the image acquired by the image acquisition unit 101 (S401). If a person is detected (Yes in S401), the object feature amount extraction unit 115 calculates the feature amount of that person, and the feature amount is registered in the object tracking information DB 120 together with the person tracking result from the object tracking unit 113 (S403). At this time, the feature amount registered in the object tracking information DB 120 is preferably one extracted while the person is within the appropriate region 21 of that image acquisition unit 101.
 Thereafter, when the object tracking unit 113 detects that the person has framed out of the video (S405), the next camera prediction unit 130 predicts, based on the object tracking information received from the object tracking unit 113 and the camera arrangement information 140, in which image acquisition unit 101's video the tracked person who framed out of the acquired video is most likely to appear next (S407).
 Thereafter, when the object detection unit 111 detects a new person in the video of one of the image acquisition units 101 predicted as the next camera (Yes in S409), the correspondence prediction unit 170 determines whether the position of that person is within the appropriate region 21 of that image acquisition unit 101 (S411). If the position of the detected person is within the appropriate region 21 (Yes in S411), the correspondence prediction unit 170 compares the feature amount extracted within the appropriate region 21 with the feature amount of the person captured by camera A, and by calculating their similarity determines whether the two persons are the same person (whether the persons correspond) (S413).
 If in S411 the position of the detected person is not within the appropriate region 21 (No in S411), a provisional association is judged by comparing the feature amount detected in that region with the feature amount of the person captured by camera A (S415). Thereafter, when the person moves into an appropriate region 21 of high area appropriateness (Yes in S417, No in S419, Yes in S411), the association is judged using the feature amount extracted at the position of the appropriate region 21 (S413).
 (1.4 Specific Example of Hardware Configuration)
 An example of a hardware configuration for realizing the information processing server 100 described above with a computer will be described below with reference to FIG. 5. Note that the functions of the information processing server 100 can also be realized by a plurality of information processing apparatuses (for example, a server and a client).
 As shown in FIG. 5, the information processing server 100 includes a processor 501, a memory 503, a storage device 505, an input interface (I/F) 507, a data I/F 509, a communication I/F 511, and a display device 513.
 The processor 501 controls various processes in the information processing server 100 by executing programs stored in the memory 503. For example, the processes of the next camera prediction unit 130, the correspondence prediction unit 170, and the area appropriateness calculation unit 150 described with reference to FIG. 3 can be realized as programs that are temporarily stored in the memory 503 and run mainly on the processor 501.
 The memory 503 is a storage medium such as a RAM (Random Access Memory). The memory 503 temporarily stores the program code of programs executed by the processor 501 and data needed while the programs run. For example, a stack area needed for program execution is reserved in the storage area of the memory 503.
 The storage device 505 is a nonvolatile storage medium such as an HDD or a flash memory, or a VCR. The storage device 505 stores the operating system, the various programs for realizing the next camera prediction unit 130, the correspondence prediction unit 170, and the area appropriateness calculation unit 150, and various data including the object tracking information DB 120, the camera arrangement information 140, and the area appropriateness information DB 160. Programs and data stored in the storage device 505 are loaded into the memory 503 as needed and referred to by the processor 501.
 The input I/F 507 is a device for receiving input from the user. Specific examples of the input I/F 507 include a keyboard, a mouse, a touch panel, and various sensors. The input I/F 507 may be connected to the information processing server 100 via an interface such as USB (Universal Serial Bus).
 The data I/F 509 is a device for inputting data from outside the information processing server 100. Specific examples of the data I/F 509 include drive devices for reading data stored in various storage media. In that case, the data I/F 509 is connected to the information processing server 100 via an interface such as USB.
 The communication I/F 511 is a device for wired or wireless data communication with devices external to the information processing server 100, such as the video cameras 200. The communication I/F 511 may also be provided outside the information processing server 100; in that case, the communication I/F 511 is connected to the information processing server 100 via an interface such as USB.
 The display device 513 is a device for displaying various information such as a monitoring screen. For example, the monitoring video illustrated in FIG. 2 may be displayed by the display device 513. Specific examples of the display device 513 include a liquid crystal display and an organic EL (Electro-Luminescence) display. The display device 513 may be provided outside the information processing server 100; in that case, the display device 513 is connected to the information processing server 100 via, for example, a display cable.
 (1.5 Effects of the Present Embodiment)
 As described above, when tracking a person (object / moving body) to be tracked (monitored), the monitoring system 1 according to the present embodiment determines the correspondence of persons using the feature amounts of the tracked person. At this time, because a suitable feature amount may not be extractable at some positions in the video due to influences such as lighting, the monitoring system 1 according to the present embodiment defines an appropriate region 21 suited to feature amount extraction and estimates the correspondence of persons with emphasis on the feature amounts obtained while a person is within that appropriate region 21. This enables suitable estimation of the correspondence between persons.
 Furthermore, in the present embodiment, even when a person is not within the appropriate region 21, a provisional correspondence is estimated using feature amounts obtained outside the appropriate region 21. As a result, correspondence can be estimated even if a person never enters the appropriate region 21.
 (2 Second Embodiment)
 The second embodiment will be described below with reference to FIGS. 6 and 7, which are diagrams for explaining the second embodiment. The description focuses on differences from the first embodiment. In the following, components identical to those in the first embodiment are given the same reference numerals as in the first embodiment and their description is omitted; descriptions of operations and effects identical to those of the first embodiment are likewise omitted.
 The outline of the system configuration is the same as in the first embodiment shown in FIG. 1. A specific example of a hardware configuration capable of implementing the information processing server 100 according to the present embodiment is also the same as in the first embodiment. Their descriptions are therefore omitted.
 In addition to the functions of the information processing server 100 according to the first embodiment, the information processing server 100 according to the second embodiment has a function of calculating, for each area, correction information used when extracting feature amounts, and of correcting the feature amounts extracted from person images using that correction information.
 (2.1 Functional Configuration of the System)
 The functional configuration of the monitoring system 1 according to the present embodiment will be described below with reference to FIG. 6. In addition to the functions of the monitoring system 1 according to the first embodiment, the monitoring system 1 according to the second embodiment has a correction information generation unit 180 and a correction information DB 190. The operation of the functions shared with the first embodiment is the same as in the first embodiment, so its description is omitted here.
 The correction information generation unit 180 generates, according to the area appropriateness calculated by the area appropriateness calculation unit 150, correction information used when the object feature amount extraction unit 115 extracts feature amounts in each region of the video acquired by the image acquisition unit 101. More specifically, the correction information generation unit 180 generates, for brightness information, a brightness correction value; for white balance correction, gain values for the R, G, and B channels; or, for correcting the overall color tone, a correction conversion formula (for example, RGB affine transformation parameters). This correction information is calculated for each camera in association with coordinates.
 Several methods are conceivable for generating this correction information. For example, based on the difference between the feature amounts extracted in the appropriate region 21 with the highest area appropriateness calculated by the area appropriateness calculation unit 150 and the feature amounts extracted in each other region, a correction amount that can statistically reduce that difference can be used as the correction information. Such correction information can be calculated for each region of each video acquired by each image acquisition unit 101 (video camera 200).
 The correction information DB 190 is a database for storing the correction information generated by the correction information generation unit 180 for each region of each video.
 The object feature amount extraction unit 115 according to the present embodiment can refer to the correction information DB 190 and use the correction information to correct the raw feature amount extracted for a detected person. As a result, a suitable feature amount can be calculated even when the person P is not in the appropriate region 21, which makes it possible to raise the accuracy of the provisional correspondence estimation described in the first embodiment.
 (2.2 Flow of Processing)
 The flow of processing when the monitoring system 1 corrects feature amounts will be described below with reference to FIG. 7. The flow of processing for estimating the correspondence of persons, described with reference to FIG. 4 in the first embodiment, is the same in the second embodiment, so its description is omitted. The second embodiment differs, however, in that the feature amounts are corrected with reference to the correction information DB 190 at the time of feature amount extraction.
 The area appropriateness calculation unit 150 calculates, as needed, the area appropriateness for each region in the video of each video camera 200 (S701). Several timings are conceivable for this calculation; for example, when a person whose correspondence is unambiguous is detected, the area appropriateness can be calculated using the change in that person's feature amounts as learning data.
 The correction information generation unit 180 generates correction information according to the area appropriateness of each region calculated by the area appropriateness calculation unit 150 (S703). As one method for this, as described above, a parameter (correction amount) that can statistically close the gap between the most suitably extracted feature amounts and the feature amounts actually extracted in each region can be used as the correction information.
 The information processing server 100 requests the object detection and tracking unit 110 to correct feature amounts using this correction information. More specifically, the object feature amount extraction unit 115 corrects the feature amounts subsequently extracted from the video according to the correction information acquired from the correction information DB 190, and outputs the corrected feature amounts to the object tracking unit 113. This makes it possible to calculate suitable feature amounts even outside the appropriate region 21, and as a result the correspondence prediction unit 170 can also suitably estimate the correspondence.
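 Applying the stored correction at extraction time then reduces to a lookup, as in the following sketch (again with hypothetical names, paired with the `build_correction_info` sketch above):

```python
def corrected_feature(raw_feat, region, correction_db):
    # Apply the stored per-region gain before matching; regions without
    # learned correction pass the raw feature through unchanged.
    gain = correction_db.get(region)
    return raw_feat if gain is None else raw_feat * gain
```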
 (2.3 Effects of the Present Embodiment)
 As described above, in addition to the functions of the monitoring system 1 according to the first embodiment, the monitoring system 1 according to the present embodiment has a function of calculating correction information for feature amount extraction for each region and correcting feature amounts using that correction information. As a result, a suitable feature amount can be calculated even when the person P is not in the appropriate region 21, which makes it possible to raise the accuracy of the provisional correspondence estimation described in the first embodiment.
 (3 Third Embodiment)
 The third embodiment will be described below with reference to FIG. 8. FIG. 8 is a block diagram showing the functional configuration of a monitoring apparatus 800, which is an information processing system. As shown in FIG. 8, the monitoring apparatus 800 includes an input unit 810 and a determination unit 820.
 The input unit 810 can receive input of video captured by a video camera (imaging device), not shown.
 The determination unit 820 determines, according to the similarity of feature amounts, whether a moving body appearing in the video captured by one video camera (a first imaging device) and a moving body located in an appropriate region, superior to other regions for feature amount extraction, within the video captured by a video camera (a second imaging device), which may be that same camera or another one, are the same moving body. Specific examples of moving bodies include, besides humans, cars, bicycles, and motorcycles.
 By implementing it in this way, the monitoring apparatus 800 according to the present embodiment can suitably estimate the correspondence of persons across a plurality of imaging devices.
 (4 Supplementary Notes)
 The configurations of the embodiments described above may be combined, or some of their components may be replaced. The configuration of the present invention is not limited to the embodiments described above, and various modifications may be made without departing from the gist of the present invention.
 Part or all of the embodiments described above may also be described as in the following supplementary notes, but they are not limited to these. The program of the present invention may be any program that causes a computer to execute the operations described in each of the above embodiments.
 (Appendix 1)
 An information processing system comprising: input means for receiving input of videos captured by a plurality of imaging devices; and determination means for determining, according to the similarity of feature amounts, whether a moving body appearing in a video captured by a first imaging device among the plurality of imaging devices and a moving body located in an appropriate region, superior to other regions for feature amount extraction, within a video captured by a second imaging device among the plurality of imaging devices are the same moving body.
 (Appendix 2)
 The information processing system according to Appendix 1, further comprising calculation means for calculating, for each region in a video, an area appropriateness for extracting the feature amount of a moving body, wherein the appropriate region is a region whose area appropriateness is higher than that of other regions in the video.
 (Appendix 3)
 The information processing system according to Appendix 2, wherein the calculation means statistically calculates the area appropriateness of each region in a video according to the change in feature amounts accompanying the movement of a moving body within the video.
 (Appendix 4)
 The information processing system according to any one of Appendices 1 to 3, further comprising means for generating, for each region, correction information for correcting the feature amounts used to calculate the similarity of moving bodies.
 (Appendix 5)
 The information processing system according to any one of Appendices 1 to 4, wherein, after determining whether a moving body located in a region other than the appropriate region within the video captured by the second imaging device and the moving body captured by the first imaging device are the same, the determination means determines again, when the moving body captured by the second imaging device moves into the appropriate region, whether that moving body is the same as the moving body captured by the first imaging device.
 (Appendix 6)
 The information processing system according to any one of Appendices 2 to 5, wherein the calculation means calculates, for each region in a video, an area appropriateness under each of a plurality of different conditions, and the determination means determines whether the moving bodies are the same using the appropriate region defined by the area appropriateness corresponding to the current condition among the plurality of area appropriatenesses calculated by the calculation means.
 (Appendix 7)
 The information processing system according to any one of Appendices 1 to 6, which announces the determination result of the determination means.
 (Appendix 8)
 An information processing method performed by an information processing system, comprising: a step of receiving input of videos captured by a plurality of imaging devices; and a step of determining, according to similarity, whether a moving body appearing in a video captured by a first imaging device among the plurality of imaging devices and a moving body located in a predetermined appropriate region within a video captured by a second imaging device among the plurality of imaging devices are the same moving body.
 (Appendix 9)
 The information processing method according to Appendix 8, further comprising a step of calculating, for each region in a video, an area appropriateness for extracting the feature amount of a moving body, wherein the appropriate region is a region whose area appropriateness is higher than that of other regions in the video.
 (Appendix 10)
 The information processing method according to Appendix 9, wherein the area appropriateness of each region in a video is statistically calculated according to the change in feature amounts accompanying the movement of a moving body within the video.
 (Appendix 11)
 The information processing method according to any one of Appendices 8 to 10, further comprising a step of generating, for each region, correction information for correcting the feature amounts used to calculate the similarity of moving bodies.
 (Appendix 12)
 The information processing method according to any one of Appendices 8 to 11, wherein, after determining whether a moving body located in a region other than the appropriate region within the video captured by the second imaging device and the moving body captured by the first imaging device are the same, it is determined again, when the moving body captured by the second imaging device moves into the appropriate region, whether that moving body is the same as the moving body captured by the first imaging device.
(Appendix 13)
The information processing method according to any one of Appendix 9 to Appendix 12, wherein an area appropriateness is calculated for each region in the video under each of a plurality of different conditions, and whether the moving bodies are identical is determined by using the appropriate area defined by the area appropriateness that corresponds to the applicable condition, selected from among the calculated area appropriateness values.
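Keeping one appropriateness map per condition and selecting by the condition currently in effect could look like the sketch below, reusing the AreaAppropriatenessEstimator sketched under Appendix 10; the condition keys ("daytime", "night") and the gating threshold are assumed:

    # Hypothetical: one statistically learned map per capture condition.
    maps_by_condition = {
        "daytime": AreaAppropriatenessEstimator(),
        "night": AreaAppropriatenessEstimator(),
    }

    def in_appropriate_area(condition, position, threshold=0.5):
        # Gate the identity decision on the map matching the current condition.
        return maps_by_condition[condition].appropriateness(position) >= threshold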
(Appendix 14)
The information processing method according to any one of Appendix 8 to Appendix 13, wherein the result of the determination as to whether the moving bodies are identical is reported.
(Appendix 15)
A program causing a computer to execute: a process of receiving input of videos captured by a plurality of imaging devices; and a process of determining, according to a similarity, whether a moving body appearing in a video captured by a first imaging device among the plurality of imaging devices and a moving body located in a predetermined appropriate area in a video captured by a second imaging device among the plurality of imaging devices are the same moving body.
(Appendix 16)
The program according to Appendix 15, further causing the computer to execute a calculation process of calculating, for each region in the video, an area appropriateness for extracting feature values of a moving body, wherein the appropriate area is a region whose area appropriateness is higher than that of the other regions in the video.
(Appendix 17)
The program according to Appendix 16, wherein the calculation process statistically calculates the area appropriateness of each region in the video from changes in the feature values that accompany the movement of moving bodies within the video.
(Appendix 18)
The program according to any one of Appendix 15 to Appendix 17, further causing the computer to execute a process of generating, for each region, correction information for correcting the feature values used to calculate the similarity of moving bodies.
(Appendix 19)
The program according to any one of Appendix 15 to Appendix 18, wherein, in the determination process, after it is determined whether a moving body located in an area other than the appropriate area in the video captured by the second imaging device is identical to the moving body captured by the first imaging device, it is determined again whether that moving body is identical to the moving body captured by the first imaging device once it moves into the appropriate area.
(Appendix 20)
The program according to any one of Appendix 16 to Appendix 19, wherein an area appropriateness is calculated for each region in the video under each of a plurality of different conditions, and whether the moving bodies are identical is determined by using the appropriate area defined by the area appropriateness that corresponds to the applicable condition, selected from among the calculated area appropriateness values.
(Appendix 21)
The program according to any one of Appendix 15 to Appendix 20, which reports the result of the determination as to whether the moving bodies are identical.
(Appendix 22)
An information processing system comprising: input means for receiving input of videos captured by a plurality of imaging devices; and determining means for determining whether a first moving body appearing in a video captured by a first imaging device among the plurality of imaging devices and a second moving body appearing in a video captured by a second imaging device among the plurality of imaging devices are similar, the determination being made when the second moving body enters an appropriate area in which such a determination can be made more reliably than in the other areas of the video captured by the second imaging device.
(Appendix 23)
The information processing system according to Appendix 22, further comprising display means for visibly displaying the appropriate area.
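Such display means could be as simple as tinting the appropriate area on each rendered frame so that an operator sees where identity decisions are reliable. The OpenCV-based sketch below is one common way to draw such an overlay; the colour and opacity are arbitrary choices:

    import cv2
    import numpy as np

    def draw_appropriate_area(frame: np.ndarray, mask: np.ndarray) -> np.ndarray:
        # Return a copy of `frame` with the appropriate area tinted green.
        overlay = frame.copy()
        overlay[mask] = (0, 255, 0)  # BGR green wherever the mask is True
        return cv2.addWeighted(overlay, 0.3, frame, 0.7, 0)  # 30% tint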
(Appendix 24)
An information processing method performed by an information processing system, comprising: a step of receiving input of videos captured by a plurality of imaging devices; and a step of determining whether a first moving body appearing in a video captured by a first imaging device among the plurality of imaging devices and a second moving body appearing in a video captured by a second imaging device among the plurality of imaging devices are similar, the determination being made when the second moving body enters an appropriate area in which such a determination can be made more reliably than in the other areas of the video captured by the second imaging device.
(Appendix 25)
The information processing method according to Appendix 24, wherein the appropriate area is visibly displayed.
(Appendix 26)
A program causing a computer to execute: a process of receiving input of videos captured by a plurality of imaging devices; and a process of determining whether a first moving body appearing in a video captured by a first imaging device among the plurality of imaging devices and a second moving body appearing in a video captured by a second imaging device among the plurality of imaging devices are similar, the determination being made when the second moving body enters an appropriate area in which such a determination can be made more reliably than in the other areas of the video captured by the second imaging device.
(Appendix 27)
The program according to Appendix 26, wherein the appropriate area is visibly displayed.
This application claims priority based on Japanese Patent Application No. 2012-287759 filed on December 28, 2012, the entire disclosure of which is incorporated herein.
Description of reference signs: 1: surveillance system; 20: captured video; 21: appropriate area; 22: peripheral area; 101: image acquisition unit; 110: object detection and tracking unit; 111: object detection unit; 113: object tracking unit; 115: object feature value extraction unit; 120: object tracking information database; 130: next-camera prediction unit; 140: camera arrangement information; 150: area appropriateness calculation unit; 160: area appropriateness information database; 170: correspondence prediction unit; 180: correction information generation unit; 190: correction information database; 800: monitoring device; 810: input unit; 820: determination unit
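Read together, these reference signs outline a processing pipeline. Purely for orientation, the skeleton below mirrors that wiring in Python; the class name, method names, and the order of calls are assumptions, not the embodiment's actual interfaces:

    class MonitoringPipeline:
        # Illustrative wiring of the components named by the reference signs.
        def __init__(self, acquire, detect, track, extract,
                     predict_next_camera, score_area, correct, determine):
            self.acquire = acquire                          # 101: image acquisition
            self.detect = detect                            # 111: object detection
            self.track = track                              # 113: object tracking
            self.extract = extract                          # 115: feature extraction
            self.predict_next_camera = predict_next_camera  # 130: next-camera prediction
            self.score_area = score_area                    # 150: area appropriateness
            self.correct = correct                          # 180: correction information
            self.determine = determine                      # 820: identity determination

        def step(self):
            frame, camera_id = self.acquire()
            tracks = self.track(self.detect(frame))
            features = [self.extract(frame, t) for t in tracks]
            # Downstream, features feed area scoring, correction, next-camera
            # prediction, and finally the identity determination (820).
            return camera_id, tracks, features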

Claims (12)

  1.  An information processing system comprising:
     input means for receiving input of videos captured by a plurality of imaging devices; and
     determining means for determining, according to a similarity of feature values, whether a moving body appearing in a video captured by a first imaging device among the plurality of imaging devices and a moving body located in an appropriate area, which is better suited to feature value extraction than the other areas, in a video captured by a second imaging device among the plurality of imaging devices are the same moving body.
  2.  The information processing system according to Claim 1, further comprising:
     calculating means for calculating, for each region in the video, an area appropriateness for extracting feature values of a moving body,
     wherein the appropriate area is a region whose area appropriateness is higher than that of the other regions in the video.
  3.  The information processing system according to Claim 2, wherein the calculating means statistically calculates the area appropriateness of each region in the video from changes in the feature values that accompany the movement of moving bodies within the video.
  4.  The information processing system according to any one of Claims 1 to 3, further comprising means for generating, for each region, correction information for correcting the feature values used to calculate the similarity of moving bodies.
  5.  The information processing system according to any one of Claims 1 to 4, wherein, after determining whether a moving body located in an area other than the appropriate area in the video captured by the second imaging device is identical to the moving body captured by the first imaging device, the determining means determines again whether that moving body is identical to the moving body captured by the first imaging device once it moves into the appropriate area.
  6.  The information processing system according to any one of Claims 2 to 5, wherein the calculating means calculates an area appropriateness for each region in the video under each of a plurality of different conditions, and the determining means determines whether the moving bodies are identical by using the appropriate area defined by the area appropriateness that corresponds to the applicable condition, selected from among the plurality of area appropriateness values calculated by the calculating means.
  7.  An information processing method performed by an information processing system, comprising:
     a step of receiving input of videos captured by a plurality of imaging devices; and
     a step of determining, according to a similarity, whether a moving body appearing in a video captured by a first imaging device among the plurality of imaging devices and a moving body located in a predetermined appropriate area in a video captured by a second imaging device among the plurality of imaging devices are the same moving body.
  8.  A program causing a computer to execute:
     a process of receiving input of videos captured by a plurality of imaging devices; and
     a process of determining, according to a similarity, whether a moving body appearing in a video captured by a first imaging device among the plurality of imaging devices and a moving body located in a predetermined appropriate area in a video captured by a second imaging device among the plurality of imaging devices are the same moving body.
  9.  An information processing system comprising:
     input means for receiving input of videos captured by a plurality of imaging devices; and
     determining means for determining whether a first moving body appearing in a video captured by a first imaging device among the plurality of imaging devices and a second moving body appearing in a video captured by a second imaging device among the plurality of imaging devices are similar, the determination being made when the second moving body enters an appropriate area in which such a determination can be made more reliably than in the other areas of the video captured by the second imaging device.
  10.  The information processing system according to Claim 9, further comprising display means for visibly displaying the appropriate area.
  11.  An information processing method performed by an information processing system, comprising:
     a step of receiving input of videos captured by a plurality of imaging devices; and
     a step of determining whether a first moving body appearing in a video captured by a first imaging device among the plurality of imaging devices and a second moving body appearing in a video captured by a second imaging device among the plurality of imaging devices are similar, the determination being made when the second moving body enters an appropriate area in which such a determination can be made more reliably than in the other areas of the video captured by the second imaging device.
  12.  A program causing a computer to execute:
     a process of receiving input of videos captured by a plurality of imaging devices; and
     a process of determining whether a first moving body appearing in a video captured by a first imaging device among the plurality of imaging devices and a second moving body appearing in a video captured by a second imaging device among the plurality of imaging devices are similar, the determination being made when the second moving body enters an appropriate area in which such a determination can be made more reliably than in the other areas of the video captured by the second imaging device.
PCT/JP2013/082914 2012-12-28 2013-12-09 Information processing system, information processing method, and program WO2014103673A1 (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
JP2014554282A JP6292540B2 (en) 2012-12-28 2013-12-09 Information processing system, information processing method, and program

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
JP2012-287759 2012-12-28
JP2012287759 2012-12-28

Publications (1)

Publication Number Publication Date
WO2014103673A1

Family

ID=51020765

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/JP2013/082914 WO2014103673A1 (en) 2012-12-28 2013-12-09 Information processing system, information processing method, and program

Country Status (2)

Country Link
JP (1) JP6292540B2 (en)
WO (1) WO2014103673A1 (en)

Patent Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP2004336127A (en) * 2003-04-30 2004-11-25 Matsushita Electric Ind Co Ltd Monitoring system
JP2008219570A (en) * 2007-03-06 2008-09-18 Matsushita Electric Ind Co Ltd Camera topology information generating device
JP2009032116A (en) * 2007-07-27 2009-02-12 Toshiba Corp Face authentication apparatus, face authentication method, and access management apparatus
JP2011215804A (en) * 2010-03-31 2011-10-27 Nohmi Bosai Ltd Smoke detection device

Cited By (13)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2016132772A1 (en) * 2015-02-19 2016-08-25 シャープ株式会社 Information management device, information management method, and control program
EP3435665A4 (en) * 2016-03-25 2019-03-20 Panasonic Intellectual Property Management Co., Ltd. Monitoring device and monitoring system
JP7235049B2 (en) 2018-07-31 2023-03-08 日本電気株式会社 Evaluation device, evaluation method, and computer program
WO2020026325A1 (en) * 2018-07-31 2020-02-06 日本電気株式会社 Evaluation device, derivation device, monitoring method, monitoring device, evaluation method, computer program, and derivation method
JPWO2020026325A1 (en) * 2018-07-31 2021-08-02 日本電気株式会社 Evaluation device, derivation device, evaluation method, and computer program
US11328404B2 (en) 2018-07-31 2022-05-10 Nec Corporation Evaluation apparatus, evaluation method, and non-transitory storage medium
CN111340856A (en) * 2018-12-19 2020-06-26 杭州海康威视系统技术有限公司 Vehicle tracking method, device, equipment and storage medium
CN111340856B (en) * 2018-12-19 2024-04-02 杭州海康威视系统技术有限公司 Vehicle tracking method, device, equipment and storage medium
WO2020179730A1 (en) * 2019-03-04 2020-09-10 日本電気株式会社 Information processing device, information processing method, and program
US20220139087A1 (en) * 2019-03-04 2022-05-05 Nec Corporation Information processing apparatus, information processing method, and program
JP2022050430A (en) * 2020-06-15 2022-03-30 日本電気株式会社 Tracking system, tracking method and tracking program
CN114660097B (en) * 2022-03-23 2023-06-02 成都智元汇信息技术股份有限公司 Synchronous correction method and system based on double sources and double visual angles
CN114660097A (en) * 2022-03-23 2022-06-24 成都智元汇信息技术股份有限公司 Synchronous correction method and system based on double sources and double visual angles

Also Published As

Publication number Publication date
JPWO2014103673A1 (en) 2017-01-12
JP6292540B2 (en) 2018-03-14

Similar Documents

Publication Publication Date Title
JP6741130B2 (en) Information processing system, information processing method, and program
JP6292540B2 (en) Information processing system, information processing method, and program
JP6213843B2 (en) Image processing system, image processing method, and program
JP6622894B2 (en) Multifactor image feature registration and tracking method, circuit, apparatus, system, and associated computer-executable code
EP2549738B1 (en) Method and camera for determining an image adjustment parameter
JP7131599B2 (en) Information processing system, information processing method and program
CN102833478B (en) Fault-tolerant background model
US10719946B2 (en) Information processing apparatus, method thereof, and computer-readable storage medium
CN105144705B (en) Object monitoring system, object monitoring method, and program for extracting object to be monitored
CN107438173A (en) Video process apparatus, method for processing video frequency and storage medium
JP6210234B2 (en) Image processing system, image processing method, and program
JP6924064B2 (en) Image processing device and its control method, and image pickup device
JP6638723B2 (en) Image analysis device, image analysis method, and image analysis program
KR20110074107A (en) Method for detecting object using camera
EP3432575A1 (en) Method for performing multi-camera automatic patrol control with aid of statistics data in a surveillance system, and associated apparatus
KR20160048428A (en) Method and Apparatus for Playing Video by Using Pan-Tilt-Zoom Camera
KR20130062489A (en) Device for tracking object and method for operating the same
KR101362630B1 (en) Method for chasing object moving path in digital video recorder
KR101272631B1 (en) Apparatus for detecting a moving object and detecting method thereof
Fauzi et al. The importance of bounding box in motion detection
CN107547851A (en) Big data management system
JP5336017B2 (en) Imaging apparatus and imaging method
CN107547835A (en) Big data management system

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application
    Ref document number: 13868161; Country of ref document: EP; Kind code of ref document: A1
ENP Entry into the national phase
    Ref document number: 2014554282; Country of ref document: JP; Kind code of ref document: A
NENP Non-entry into the national phase
    Ref country code: DE
122 Ep: pct application non-entry in european phase
    Ref document number: 13868161; Country of ref document: EP; Kind code of ref document: A1