WO2022030546A1 - Information processing device, information processing method, and program - Google Patents
Information processing device, information processing method, and program
- Publication number
- WO2022030546A1 (application PCT/JP2021/028961)
- Authority
- WO
- WIPO (PCT)
- Prior art keywords
- image
- tracking
- target person
- video data
- information processing
- Prior art date
Classifications
- G—PHYSICS
- G08—SIGNALLING
- G08B—SIGNALLING OR CALLING SYSTEMS; ORDER TELEGRAPHS; ALARM SYSTEMS
- G08B25/00—Alarm systems in which the location of the alarm condition is signalled to a central station, e.g. fire or police telegraphic systems
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04N—PICTORIAL COMMUNICATION, e.g. TELEVISION
- H04N7/00—Television systems
- H04N7/18—Closed-circuit television [CCTV] systems, i.e. systems in which the video signal is not broadcast
Definitions
- An embodiment of the present invention relates to an information processing device, an information processing method, and a program for detecting an image of a target person by analyzing video data from a surveillance camera, for example.
- An information processing device such as a general-purpose personal computer receives video data from a camera, stores the received video data in a storage unit, analyzes the video data, and detects an image of a target person. In addition, the information processing device displays the detected image of the target person on a monitor or the like.
- The information processing device receives the video data from each camera, analyzes the video data from each camera, and detects the image of the target person. The image of the target person detected from each camera's video data is displayed on the corresponding monitor. The observer visually confirms the target person displayed on each monitor.
- In order to monitor a person, a technique for detecting the target person with high accuracy is required, and a related technique has been proposed (see, for example, Patent Document 1).
- When the analysis results (images of the target person) of the video data from multiple cameras are divided among and displayed on multiple monitors, the observer has to visually track the target person across those monitors. In this case, the observer must follow the displays of a plurality of monitors at the same time, which not only imposes a heavy monitoring burden on the observer but may also cause oversights.
- The present invention has been made in view of the above circumstances, and aims to provide a technique for reducing the monitoring burden on an observer who works from video data from a plurality of cameras.
- To solve the above problem, the information processing apparatus according to one aspect of the present invention includes a detection unit that comprehensively analyzes video data from a plurality of cameras to detect a target person image, a tracking unit that tracks the target person based on the detection result of the target person image, and an output unit that outputs the tracking result of the target person.
- FIG. 1 is a diagram showing an example of a configuration of a monitoring system including a monitoring information processing apparatus according to an embodiment of the present invention.
- FIG. 2 is a block diagram showing an example of a hardware configuration of a Web server device used as a monitoring information processing device according to an embodiment of the present invention.
- FIG. 3 is a block diagram showing an example of a software configuration of a Web server device used as a monitoring information processing device according to an embodiment of the present invention.
- FIG. 4 is a flowchart showing an example of tracking processing by the system according to the embodiment of the present invention.
- FIG. 5 is a flowchart showing a first example of tracking processing by the Web server device according to the embodiment of the present invention.
- FIG. 6 is a flowchart showing a second example of tracking processing by the Web server device according to the embodiment of the present invention.
- FIG. 7 is a conceptual diagram showing an example of video analysis by the video analysis engine according to the embodiment of the present invention.
- FIG. 8 is a conceptual diagram showing an example of integrated video analysis by a Web server device used as a monitoring information processing device according to an embodiment of the present invention.
- FIG. 1 is a diagram showing an overall configuration of a system including a monitoring information processing apparatus according to an embodiment of the present invention.
- For example, a plurality of surveillance cameras C1 to Cn are distributed and arranged in the aisles and sales floors of large-scale stores such as shopping malls and department stores.
- The surveillance cameras C1 to Cn are attached to, for example, a ceiling or a wall surface, capture images of their respective surveillance areas, and output the video data.
- For example, the surveillance cameras C1 to Cn are equipped with video analysis engines VE1 to VEn, respectively.
- The video analysis engines VE1 to VEn correspond to a video analysis unit, which analyzes the video data from the surveillance cameras C1 to Cn.
- For example, each of the video analysis engines VE1 to VEn performs intra-angle-of-view tracking on the plurality of image frames included in the video data output from the corresponding one of the surveillance cameras C1 to Cn, and determines images of the same person across those image frames based on position information within the frames.
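- The publication does not spell out the same-person determination itself, so the following is only a rough sketch of such per-camera, position-based association: consecutive detections are joined into one in-angle-of-view track when their bounding boxes overlap strongly. The IoU rule, the 0.5 threshold, and all names are illustrative assumptions, not taken from the publication.

```python
from typing import List, Tuple

Box = Tuple[float, float, float, float]  # (x1, y1, x2, y2)

def iou(a: Box, b: Box) -> float:
    """Intersection-over-union of two axis-aligned boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    union = ((a[2] - a[0]) * (a[3] - a[1])
             + (b[2] - b[0]) * (b[3] - b[1]) - inter)
    return inter / union if union else 0.0

def assign_in_angle_track_ids(frames: List[List[Box]],
                              iou_threshold: float = 0.5) -> List[List[int]]:
    """frames: per-frame person boxes from ONE camera; returns, per frame,
    the in-angle-of-view tracking ID assigned to each box."""
    next_id, prev, out = 0, [], []   # prev: (track_id, box) of the last frame
    for boxes in frames:
        current, free = [], list(prev)
        for box in boxes:
            best = max(free, key=lambda t: iou(t[1], box), default=None)
            if best is not None and iou(best[1], box) >= iou_threshold:
                free.remove(best)               # same person continues
                current.append((best[0], box))
            else:
                current.append((next_id, box))  # a new person enters the view
                next_id += 1
        out.append([tid for tid, _ in current])
        prev = current
    return out
```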
- The video analysis engines VE1 to VEn need not be arranged one-to-one with the surveillance cameras C1 to Cn; a smaller number of video analysis engines may be arranged for the plurality of cameras, with each engine processing the video data of several surveillance cameras collectively.
- Further, the system of one embodiment includes a Web server device SV used as a monitoring information processing device. The video analysis engines VE1 to VEn can communicate data with the Web server device SV via a network NW, and transmit the generated video analysis results to the Web server device SV over the network NW. For the network NW, for example, a wired LAN (Local Area Network) or a wireless LAN is used, but any other network may be used.
- Alternatively, the Web server device SV may itself include the video analysis engines VE1 to VEn, or a single video analysis engine, which receives the video data from the surveillance cameras C1 to Cn via the network NW and analyzes the received video data.
- FIGS. 2 and 3 are block diagrams showing examples of the hardware configuration and the software configuration of the Web server device SV, respectively.
- The Web server device SV includes a control unit 1 having a hardware processor such as a central processing unit (CPU). A storage unit having a program storage unit 2 and a data storage unit 3, an input/output interface (input/output I/F) 4, and a communication interface (communication I/F) 5 are connected to the control unit 1 via a bus 6.
- For example, a monitor device MT and an administrator terminal OT are connected to the input/output I/F 4. The monitor device MT is used by the observer to visually monitor the monitoring area, and displays the images from the surveillance cameras C1 to Cn and information indicating the detection or tracking results for the query to be monitored.
- The administrator terminal OT is used by the system administrator for system management and maintenance. It displays various setting screens and information indicating the operating status of the system, and when the system administrator inputs data necessary for managing and operating the system, it accepts that data and sets it in the Web server device SV.
- Under the control of the control unit 1, the communication I/F 5 transmits data to and from the video analysis engines VE1 to VEn using the communication protocol defined by the network NW, and is composed of interfaces corresponding to, for example, a wired LAN or a wireless LAN.
- The program storage unit 2 combines, as storage media, a non-volatile memory that can be written and read at any time, such as an HDD (Hard Disk Drive) or SSD (Solid State Drive), with a non-volatile memory such as a ROM (Read Only Memory). In addition to middleware such as an OS (Operating System), it stores the programs necessary for executing the various control processes according to an embodiment of the present invention.
- The data storage unit 3 combines, as storage media, a non-volatile memory that can be written and read at any time, such as an HDD or SSD, with a volatile memory such as a RAM (Random Access Memory). As the main storage required for implementing an embodiment of the present invention, it includes a camera information table 31, a setting information table 32, and a tracking result table 33.
- The camera information table 31 stores, for each of the surveillance cameras C1 to Cn, information representing, for example, the camera's name, performance, and installation position, in association with its identification information (hereinafter referred to as the camera ID). The information representing performance includes, for example, resolution and aspect ratio. The information indicating the installation position includes, for example, latitude/longitude, imaging direction, and imaging angle.
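- As a rough illustration, the camera information table 31 could be held as records like the following; the field names, types, and sample values are assumptions based only on the attributes listed above.

```python
from dataclasses import dataclass

@dataclass
class CameraInfo:
    camera_id: str            # identification information unique to the camera
    name: str                 # camera name
    resolution: tuple         # performance, e.g. (1920, 1080)
    aspect_ratio: str         # performance, e.g. "16:9"
    latitude: float           # installation position
    longitude: float
    imaging_direction: float  # degrees (assumed unit)
    imaging_angle: float      # degrees (assumed unit)

# hypothetical record for surveillance camera C1
camera_info_table = {
    "C1": CameraInfo("C1", "aisle-east", (1920, 1080), "16:9",
                     35.68, 139.76, 90.0, 30.0),
}
```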
- The setting information table 32 stores the image feature amount of the query. For example, it stores the image feature amount of a query input from the administrator terminal OT via the input/output I/F 4, or the image feature amount of a query detected from the video data transmitted from the surveillance cameras C1 to Cn via the communication I/F 5. The setting information table 32 also stores alert determination conditions input via the administrator terminal OT or the like, for example the first or second alert determination condition.
- Here are some examples of how a query image is registered. For example, based on an alert obtained in real time, the administrator presses the tracking button on the administrator terminal OT for the person (image) to be tracked, and the control unit 1 automatically registers the latest set of detected images (face image and whole-body image) as a query image (query image feature amount) and starts tracking. Alternatively, the administrator presses the history button on the administrator terminal OT for the person (image) to be tracked, and the control unit 1 registers an arbitrary image selected from the history list as a query image and starts tracking. The control unit 1 may also perform a person search over the surveillance camera images according to a history search from the administrator, register an image the administrator selects from the search results as a query image, and start tracking. Further, the administrator may select a person (image) included in surveillance image data obtained in real time, in which case the control unit 1 registers the selected person as a query image and starts tracking, or the administrator may import an image provided by a requester from the administrator terminal OT, register it as a query image, and start tracking.
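- A minimal sketch of the common core of these registration paths might look as follows; extract_features and the table layout are hypothetical stand-ins, since the publication does not define them.

```python
setting_info_table = []  # stands in for the setting information table 32

def register_query(face_image, body_image, extract_features):
    """Register one (face, whole-body) image set as a query and return it.

    extract_features is a hypothetical stand-in for whatever feature
    extractor the video analysis engines actually use.
    """
    query = {
        "face_feature": extract_features(face_image),
        "body_feature": extract_features(body_image),
    }
    setting_info_table.append(query)  # tracking then starts against this query
    return query
```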
- The tracking result table 33 stores the tracking results of tracked persons. For example, it stores the tracking results for each tracking target person, edited in chronological order.
- The control unit 1 includes an information acquisition unit 11, a detection unit 12, a tracking unit 13, and an output unit 14 as processing functions according to an embodiment of the present invention. Each unit is realized by causing the hardware processor of the control unit 1 to execute a program stored in the program storage unit 2.
- The information acquisition unit 11 acquires video data, video analysis results, and the like from the video analysis engines VE1 to VEn connected to the surveillance cameras C1 to Cn, or from the video analysis engines VE1 to VEn provided in the Web server device SV. For example, each of the video analysis engines VE1 to VEn determines the same person across the plurality of image frames included in the video data output from the corresponding one of the surveillance cameras C1 to Cn, based on position information within the image frames, and outputs a video analysis result including the determination result.
- The detection unit 12 comprehensively analyzes the video analysis results and the video data from the surveillance cameras C1 to Cn to detect the tracked person image. For example, based on the image feature amount of a query given in advance (the feature amount of the tracked person image), the video analysis engines VE1 to VEn extract, from the plurality of image frames included in the video data from the surveillance cameras C1 to Cn, person images (tracked person images) whose image feature amounts are similar to that of the query. For example, a plurality of queries are given in advance, and a plurality of person images having image feature amounts similar to those of the plurality of queries are extracted.
- The video analysis engines VE1 to VEn also generate a video analysis result including information indicating the degree of similarity between each extracted person image and the query image, the camera IDs of the surveillance cameras C1 to Cn, the in-angle-of-view tracking ID, and the shooting time (date, hour, minute, and second).
- The person image includes a face image and a whole-body image, and the similarity information includes a similarity corresponding to each of the face image and the whole-body image. The camera ID is identification information unique to each surveillance camera. The in-angle-of-view tracking ID is an ID for tracking images regarded as the same person within the view of the same surveillance camera.
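- Collecting the fields just listed, one video analysis result could be represented as a record like this sketch; the structure and names are assumptions for illustration.

```python
from dataclasses import dataclass
from datetime import datetime

@dataclass
class AnalysisResult:
    camera_id: str           # unique to the surveillance camera
    in_angle_track_id: int   # same-person track within one camera's view
    face_similarity: float   # similarity of the face image to the query
    body_similarity: float   # similarity of the whole-body image to the query
    shot_at: datetime        # shooting time (date, hour, minute, second)
```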
- The detection unit 12 may also determine whether a tracked-person alert is necessary based on the average similarity between the feature amount of a candidate image and the feature amount of the tracked person image over a plurality of consecutive frames included in the video data from the surveillance cameras C1 to Cn. For example, when a somewhat higher alert frequency is acceptable, the detection unit 12 follows the first alert determination condition input via the administrator terminal OT or the like, and determines the necessity of the tracked-person alert based on the average similarity between the feature amounts of the candidate image and of the tracked person image over a predetermined number of consecutive frames. When this average similarity exceeds the first average similarity threshold, the detection unit 12 determines that the tracked-person alert is necessary; when it does not, the detection unit 12 determines that the alert is unnecessary.
- When it is desired to suppress alerts, the detection unit 12 instead follows the second alert determination condition input via the administrator terminal OT or the like, again based on the average similarity between the feature amounts of the candidate image and of the tracked person image over a predetermined number of consecutive frames. When the average similarity exceeds the second average similarity threshold, which is set higher than the first average similarity threshold, the detection unit 12 determines that the tracked-person alert is necessary; when it does not, the detection unit 12 determines that the alert is unnecessary.
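- The decision just described reduces to averaging per-frame similarities and comparing against the threshold of the active determination condition, roughly as in this sketch; the two concrete threshold values are illustrative assumptions.

```python
FIRST_AVG_THRESHOLD = 25.0   # first condition: standard alert frequency
SECOND_AVG_THRESHOLD = 30.0  # second condition: higher, so fewer alerts

def alert_needed(similarities, suppress_alerts=False):
    """similarities: per-frame similarity between the candidate image's
    feature amount and the tracked person image's feature amount."""
    if not similarities:
        return False
    average = sum(similarities) / len(similarities)
    threshold = SECOND_AVG_THRESHOLD if suppress_alerts else FIRST_AVG_THRESHOLD
    return average > threshold   # True means the tracked-person alert is set
```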
- The tracking unit 13 tracks the tracking target person based on the detection result of the tracked person image. For example, the tracking unit 13 generates a time-series tracking result for the tracked person based on the time information of the plurality of frames included in the video data from the surveillance cameras C1 to Cn.
- The output unit 14 outputs the tracking result of the tracked person. For example, it outputs the tracking result for display on the monitor device MT and outputs the tracking result to be stored in the tracking result table 33. The output tracking result is, for example, a time-series tracking result for the tracked person.
- The above description has taken as an example the case where the tables 31 to 33 of the data storage unit 3 are provided in the Web server device SV. However, the present invention is not limited to this; they may be provided in a database server or a file server arranged outside the Web server device SV. In this case, the Web server device SV accesses each of the tables 31 to 33 in the database server or file server and performs each process by acquiring the necessary information.
- FIG. 4 is a flowchart showing an example of tracking processing by the system according to the embodiment of the present invention.
- Surveillance cameras C1 to Cn start shooting and output video data (ST1).
- The video analysis engines VE1 to VEn each analyze the video data from the corresponding surveillance cameras C1 to Cn (ST2).
- For example, each of the video analysis engines VE1 to VEn performs intra-angle-of-view tracking on the plurality of image frames included in the video data output from the corresponding one of the surveillance cameras C1 to Cn, and determines the same person across those image frames based on position information within the frames. The video analysis engines VE1 to VEn also detect the feature amounts of candidate images obtained from the plurality of frames included in the video data from the surveillance cameras C1 to Cn, and calculate the similarity between the feature amount of each candidate image and the feature amount of the tracked person image. The video analysis engines VE1 to VEn then output video analysis results including the video data and the same-person determination results.
- The communication I/F 5 of the Web server device SV receives the video data and the same-person determinations from the video analysis engines VE1 to VEn. The information acquisition unit 11 acquires the video data and the same-person determinations from the video analysis engines VE1 to VEn (ST3). The detection unit 12 comprehensively analyzes the video data and the same-person determinations from the video analysis engines VE1 to VEn, and detects the tracked person image from the plurality of frames included in that video data (ST4). The tracking unit 13 tracks the tracked person based on the detection result of the tracked person image (ST5). The output unit 14 outputs the tracking result of the tracked person (ST6).
- FIG. 5 is a flowchart showing a first example of tracking processing by the Web server device according to the embodiment of the present invention.
- The detection (ST4) of the tracked person image by the detection unit 12 shown in FIG. 4 will now be described in more detail.
- As shown in FIG. 5, the detection unit 12 detects the tracked person image based on the video analysis results and the respective video data from the surveillance cameras C1 to Cn. For example, the detection unit 12 compares the similarity threshold with the similarities calculated by the video analysis engines VE1 to VEn (ST412), extracts the candidate images exceeding the similarity threshold (ST413), further extracts the candidate images of the same person (ST414), and detects the extracted candidate images as tracked person images (ST415).
- For example, as shown in FIG. 8, assume that "25" is set as the similarity threshold, that a similarity of "29" is detected between the feature amount of a candidate image in a given frame from surveillance camera C1 and the feature amount of the tracked person image, and that a similarity of "27" is detected for a candidate image in a given frame from surveillance camera C2. The detection unit 12 detects candidate images whose similarity, such as "29", exceeds the similarity threshold as tracked person images.
- The tracking unit 13 tracks the tracked person based on the detection result of the tracked person image. The output unit 14 outputs the tracking result of the tracked person; when images of a plurality of tracked persons are detected, it outputs a tracking result for each of them. The tracking result includes the tracked person image, the ID of the camera that captured it, and its shooting time. The output unit 14 outputs the tracking results in time series based on the time information of the plurality of frames included in the video data from the surveillance cameras C1 to Cn.
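- A sketch of the ST412 to ST415 flow, using plain dictionaries shaped like the analysis-result record above; the grouping rule (same camera ID and in-angle tracking ID means same person) follows the text, while the threshold value and names are assumptions.

```python
def detect_tracked_person_images(results, similarity_threshold=25.0):
    """results: iterable of dicts with the keys camera_id, in_angle_track_id,
    face_similarity, body_similarity, and shot_at."""
    # ST412/ST413: keep candidates whose similarity exceeds the threshold
    candidates = [r for r in results
                  if max(r["face_similarity"], r["body_similarity"])
                  > similarity_threshold]
    # ST414: records sharing a camera ID and an in-angle tracking ID are
    # treated as the same person within that camera's view
    grouped = {}
    for r in candidates:
        key = (r["camera_id"], r["in_angle_track_id"])
        grouped.setdefault(key, []).append(r)
    # ST415: each group, ordered by shooting time, is one detected sequence
    return {key: sorted(group, key=lambda r: r["shot_at"])
            for key, group in grouped.items()}
```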
- FIG. 6 is a flowchart showing a second example of tracking processing by the Web server device according to the embodiment of the present invention.
- In the second example, a case where the number of target frames is dynamically changed will be described. The detection (ST4) of the tracked person image by the detection unit 12 shown in FIG. 4 is again described in more detail.
- The first alert determination condition is applied when alert output at the standard rate is acceptable, and the second alert determination condition is applied when it is desired to suppress alerts.
- If alert suppression is not specified, that is, if the first alert determination condition is set (ST421, YES), the detection unit 12 detects the feature amount of the candidate image in a first predetermined number of consecutive frames (images captured over a relatively short time), and calculates the similarity between the feature amount of the candidate image and the feature amount of the tracked person image in that first predetermined number of consecutive frames (ST423).
- Further, the detection unit 12 compares the similarity threshold with the calculated similarities (ST425), extracts the candidate images exceeding the similarity threshold (ST426), and further extracts the candidate images of the same person (ST427).
- The detection unit 12 then calculates the average similarity of the candidate images over the first predetermined number of frames (ST428) and compares it with the average similarity threshold (ST429). When the calculated average similarity exceeds the average similarity threshold, the detection unit 12 determines that the tracked-person alert is necessary and sets the tracked-person alert (ST430). When it is determined that the tracked-person alert is necessary, the output unit 14 outputs the tracking result including the tracked-person alert.
- If alert suppression is specified, that is, if the second alert determination condition is set (ST422, YES), the detection unit 12 detects the feature amount of the candidate image in a second predetermined number of consecutive frames, larger than the first predetermined number (images captured over a relatively long time), and calculates the similarity between the feature amount of the candidate image and the feature amount of the tracked person image in that second predetermined number of consecutive frames (ST424). Further, the detection unit 12 compares the similarity threshold with the calculated similarities (ST425), extracts the candidate images exceeding the similarity threshold (ST426), and further extracts the candidate images of the same person (ST427).
- The detection unit 12 then calculates the average similarity of the candidate images over the second predetermined number of frames (ST428) and compares it with the average similarity threshold (ST429). When the calculated average similarity exceeds the average similarity threshold, the detection unit 12 determines that the tracked-person alert is necessary and sets the tracked-person alert (ST430). When it is determined that the tracked-person alert is necessary, the output unit 14 outputs the tracking result including the tracked-person alert.
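- The second flow differs from the first mainly in the number of consecutive frames averaged, which the following sketch switches on the active condition; the concrete frame counts are illustrative assumptions.

```python
FIRST_FRAME_COUNT = 5    # first condition: a relatively short run of frames
SECOND_FRAME_COUNT = 15  # second condition: more consecutive frames, so the
                         # average is harder to push over the threshold

def alert_with_dynamic_window(per_frame_similarity, suppress_alerts,
                              avg_threshold=25.0):
    """per_frame_similarity: similarities for consecutive frames, oldest first."""
    n = SECOND_FRAME_COUNT if suppress_alerts else FIRST_FRAME_COUNT
    window = per_frame_similarity[-n:]        # the latest n consecutive frames
    if len(window) < n:
        return False                          # not enough frames observed yet
    return sum(window) / n > avg_threshold    # ST428 to ST430
```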
- As described above, the Web server device SV of the present embodiment comprehensively analyzes the video data from the plurality of surveillance cameras C1 to Cn in real time to detect the tracked person image, tracks the tracked person based on the detection result of the tracked person image, and outputs the tracking result of the tracked person. The integrated analysis here is video analysis based on both the per-camera video analysis results obtained by analyzing the video data from the surveillance cameras C1 to Cn and the video data itself. The video analysis engines VE1 to VEn are responsible for the per-camera analysis of the video data, and the Web server device SV detects the tracked person image from the video data of the surveillance cameras C1 to Cn using the per-camera video analysis results produced by the video analysis engines VE1 to VEn.
- For example, the monitor device MT displays the tracking results in chronological order for each tracked person. Even when a tracked person is photographed across a plurality of the surveillance cameras C1 to Cn, the monitor device MT collectively displays the images of that same tracked person in chronological order. As a result, the observer does not have to visually chase the same tracked person across a plurality of monitors, and the observer's monitoring burden is reduced.
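- The chronological per-person display could be produced by merging each camera's detections of one person into a single time-ordered list, roughly as sketched here under the same assumed record shape as above.

```python
def chronological_timeline(detections_by_camera):
    """detections_by_camera: per-camera lists of detection records (dicts with
    at least shot_at and camera_id) already attributed to ONE tracked person."""
    merged = [r for per_camera in detections_by_camera for r in per_camera]
    merged.sort(key=lambda r: r["shot_at"])   # one timeline across all cameras
    return [(r["shot_at"], r["camera_id"]) for r in merged]
```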
- Further, the Web server device SV of the present embodiment outputs a tracking result including a tracked-person alert, such as an image or a sound. For example, the monitor device MT highlights the detected image of the tracked person with a symbol, mark, frame, or the like indicating the alert. As a result, the observer can visually confirm the tracked person without overlooking them.
- Further, the Web server device SV determines whether a target-person alert is necessary based on the average similarity between the feature amount of the candidate image and the feature amount of the target person image over a plurality of consecutive frames included in the video data from the surveillance cameras C1 to Cn. When it determines that the target-person alert is necessary, the Web server device SV outputs the tracking result including the alert. For example, the Web server device SV decides whether the alert is necessary depending on whether the calculated average similarity exceeds the average similarity threshold. If the average similarity threshold is set high, the sensitivity of the target-person alert is suppressed; conversely, if it is set low, the sensitivity is increased.
- Further, the Web server device SV may control the sensitivity of the target-person alert by dynamically setting the number of frames over which the average similarity is calculated. For example, when there is no particular need to suppress the sensitivity and the first alert determination condition is set, the Web server device SV determines the necessity of the alert based on the average similarity between the feature amounts of the candidate image and of the target person image over a first predetermined number of consecutive frames. Conversely, when the sensitivity needs to be suppressed and the second alert determination condition is set, the Web server device SV determines the necessity of the alert based on that average similarity over a second predetermined number of consecutive frames, larger than the first. In this way, the sensitivity of the target-person alert is controlled by dynamically changing the number of frames over which the average similarity is calculated according to the determination condition.
- The program according to the present embodiment may be transferred while stored in an electronic device, while stored in a storage medium, or by downloading via a network or the like. The recording medium is a non-transitory computer-readable storage medium such as a magnetic disk, an optical disk, or a flash memory.
- The present invention is not limited to the above embodiment and can be variously modified at the implementation stage without departing from its gist. The embodiments may also be combined as appropriate, in which case the combined effects are obtained. Furthermore, the above embodiment includes various inventions, which can be extracted by combinations selected from the plurality of disclosed constituent elements. For example, even if some constituent elements are deleted from all of those shown in the embodiment, a configuration from which those elements are deleted can be extracted as an invention as long as the problem can be solved and the effects are obtained.
Landscapes
- Business, Economics & Management (AREA)
- Emergency Management (AREA)
- Physics & Mathematics (AREA)
- General Physics & Mathematics (AREA)
- Engineering & Computer Science (AREA)
- Multimedia (AREA)
- Signal Processing (AREA)
- Alarm Systems (AREA)
- Closed-Circuit Television Systems (AREA)
Abstract
Provided is a technique for alleviating the burden of monitoring performed by monitoring personnel on the basis of video data from a plurality of cameras. An information processing device according to one aspect of the present invention is provided with: a detection unit that detects an image of a subject by comprehensively analyzing video data from a plurality of cameras; a tracking unit that tracks the subject on the basis of the result of detecting the image of the subject; and an output unit that outputs the result of tracking the subject.
Description
An embodiment of the present invention relates to an information processing device, an information processing method, and a program for detecting an image of a target person by analyzing video data from a surveillance camera, for example.
In recent years, cameras have been installed in various places as part of crime prevention measures. The camera captures the monitored area and outputs video data. An information processing device such as a general-purpose personal computer receives video data from a camera, stores the received video data in a storage unit, analyzes the video data, and detects an image of a target person. In addition, the information processing device displays the detected image of the target person on a monitor or the like.
For example, in a facility used by many people, such as a large-scale store, an office building, or a station yard, multiple cameras are installed. Further, in the monitoring room or the like, a plurality of monitors corresponding to a plurality of cameras are installed. The information processing device receives the video data from each camera, analyzes each of the video data from each camera, and detects the image of the target person. An image of the target person detected from the video data from each camera is displayed on each monitor. The observer visually confirms the target person displayed on each monitor.
In order to monitor a person, a technique for detecting the target person with high accuracy is required, and a related technique has been proposed (see, for example, Patent Document 1).
Although several techniques for detecting a target person with high accuracy have been proposed, there is also a demand to reduce the burden of monitoring a person.
As described above, when the analysis results (images of the target person) of the video data from multiple cameras are divided among and displayed on multiple monitors, the observer has to visually track the target person across those monitors. In this case, the observer must follow the displays of a plurality of monitors at the same time, which not only imposes a heavy monitoring burden on the observer but may also cause oversights.
The present invention has been made in view of the above circumstances, and aims to provide a technique for reducing the monitoring burden on an observer who works from video data from a plurality of cameras.
To solve the above problem, the information processing apparatus according to one aspect of the present invention includes a detection unit that comprehensively analyzes video data from a plurality of cameras to detect a target person image, a tracking unit that tracks the target person based on the detection result of the target person image, and an output unit that outputs the tracking result of the target person.
According to one aspect of the present invention, it is possible to provide a technique for reducing the monitoring burden on an observer who works from video data from a plurality of cameras.
Hereinafter, embodiments according to the present invention will be described with reference to the drawings.
[One Embodiment]
(Configuration example)
(1) System
FIG. 1 is a diagram showing an overall configuration of a system including a monitoring information processing apparatus according to an embodiment of the present invention.
For example, a plurality of surveillance cameras C1 to Cn are distributed and arranged in the aisles and sales floors of large-scale stores such as shopping malls and department stores. The surveillance cameras C1 to Cn are attached to, for example, a ceiling or a wall surface, capture images of their respective surveillance areas, and output the video data.
For example, the surveillance cameras C1 to Cn are equipped with video analysis engines VE1 to VEn, respectively. The video analysis engines VE1 to VEn correspond to a video analysis unit, which analyzes the video data from the surveillance cameras C1 to Cn. For example, each of the video analysis engines VE1 to VEn performs intra-angle-of-view tracking on the plurality of image frames included in the video data output from the corresponding one of the surveillance cameras C1 to Cn, and determines images of the same person across those image frames based on position information within the frames.
The video analysis engines VE1 to VEn need not be arranged one-to-one with the surveillance cameras C1 to Cn; a smaller number of video analysis engines may be arranged for the plurality of cameras, with each engine processing the video data of several surveillance cameras collectively.
Further, the system of one embodiment includes a Web server device SV used as a monitoring information processing device. The video analysis engines VE1 to VEn can communicate data with the Web server device SV via a network NW, and transmit the generated video analysis results to the Web server device SV over the network NW. For the network NW, for example, a wired LAN (Local Area Network) or a wireless LAN is used, but any other network may be used.
Alternatively, the Web server device SV may itself include the video analysis engines VE1 to VEn, or a single video analysis engine, which receives the video data from the surveillance cameras C1 to Cn via the network NW and analyzes the received video data.
(2) Web server device SV
FIGS. 2 and 3 are block diagrams showing examples of the hardware configuration and the software configuration of the Web server device SV, respectively.
The Web server device SV includes a control unit 1 having a hardware processor such as a central processing unit (CPU). A storage unit having a program storage unit 2 and a data storage unit 3, an input/output interface (input/output I/F) 4, and a communication interface (communication I/F) 5 are connected to the control unit 1 via a bus 6.
For example, a monitor device MT and an administrator terminal OT are connected to the input/output I/F 4. The monitor device MT is used by the observer to visually monitor the monitoring area, and displays the images from the surveillance cameras C1 to Cn and information indicating the detection or tracking results for the query to be monitored.
The administrator terminal OT is used by the system administrator for system management and maintenance. It displays various setting screens and information indicating the operating status of the system, and when the system administrator inputs data necessary for managing and operating the system, it accepts that data and sets it in the Web server device SV.
Under the control of the control unit 1, the communication I/F 5 transmits data to and from the video analysis engines VE1 to VEn using the communication protocol defined by the network NW, and is composed of interfaces corresponding to, for example, a wired LAN or a wireless LAN.
The program storage unit 2 combines, as storage media, a non-volatile memory that can be written and read at any time, such as an HDD (Hard Disk Drive) or SSD (Solid State Drive), with a non-volatile memory such as a ROM (Read Only Memory). In addition to middleware such as an OS (Operating System), it stores the programs necessary for executing the various control processes according to an embodiment of the present invention.
The data storage unit 3 combines, as storage media, a non-volatile memory that can be written and read at any time, such as an HDD or SSD, with a volatile memory such as a RAM (Random Access Memory). As the main storage required for implementing an embodiment of the present invention, it includes a camera information table 31, a setting information table 32, and a tracking result table 33.
The camera information table 31 stores, for each of the surveillance cameras C1 to Cn, information representing, for example, the camera's name, performance, and installation position, in association with its identification information (hereinafter referred to as the camera ID). The information representing performance includes, for example, resolution and aspect ratio. The information indicating the installation position includes, for example, latitude/longitude, imaging direction, and imaging angle.
The setting information table 32 stores the image feature amount of the query. For example, it stores the image feature amount of a query input from the administrator terminal OT via the input/output I/F 4, or the image feature amount of a query detected from the video data transmitted from the surveillance cameras C1 to Cn via the communication I/F 5. The setting information table 32 also stores alert determination conditions input via the administrator terminal OT or the like, for example the first or second alert determination condition.
Here are some examples of how a query image is registered. For example, based on an alert obtained in real time, the administrator presses the tracking button on the administrator terminal OT for the person (image) to be tracked, and the control unit 1 automatically registers the latest set of detected images (face image and whole-body image) as a query image (query image feature amount) and starts tracking. Alternatively, the administrator presses the history button on the administrator terminal OT for the person (image) to be tracked, and the control unit 1 registers an arbitrary image selected from the history list as a query image and starts tracking. The control unit 1 may also perform a person search over the surveillance camera images according to a history search from the administrator, register an image the administrator selects from the search results as a query image, and start tracking. Further, the administrator may select a person (image) included in surveillance image data obtained in real time, in which case the control unit 1 registers the selected person as a query image and starts tracking, or the administrator may import an image provided by a requester from the administrator terminal OT, register it as a query image, and start tracking.
The tracking result table 33 stores the tracking results of tracked persons. For example, it stores the tracking results for each tracking target person, edited in chronological order.
The control unit 1 includes an information acquisition unit 11, a detection unit 12, a tracking unit 13, and an output unit 14 as processing functions according to an embodiment of the present invention. Each unit is realized by causing the hardware processor of the control unit 1 to execute a program stored in the program storage unit 2.
The information acquisition unit 11 acquires video data, video analysis results, and the like from the video analysis engines VE1 to VEn connected to the surveillance cameras C1 to Cn, or from the video analysis engines VE1 to VEn provided in the Web server device SV. For example, each of the video analysis engines VE1 to VEn determines the same person across the plurality of image frames included in the video data output from the corresponding one of the surveillance cameras C1 to Cn, based on position information within the image frames, and outputs a video analysis result including the determination result.
The detection unit 12 comprehensively analyzes the video analysis results and the video data from the surveillance cameras C1 to Cn to detect the tracked person image. For example, based on the image feature amount of a query given in advance (the feature amount of the tracked person image), the video analysis engines VE1 to VEn extract, from the plurality of image frames included in the video data from the surveillance cameras C1 to Cn, person images (tracked person images) whose image feature amounts are similar to that of the query. For example, a plurality of queries are given in advance, and a plurality of person images having image feature amounts similar to those of the plurality of queries are extracted.
The video analysis engines VE1 to VEn also generate a video analysis result including information indicating the degree of similarity between each extracted person image and the query image, the camera IDs of the surveillance cameras C1 to Cn, the in-angle-of-view tracking ID, and the shooting time (date, hour, minute, and second). The person image includes a face image and a whole-body image, and the similarity information includes a similarity corresponding to each of the face image and the whole-body image. The camera ID is identification information unique to each surveillance camera. The in-angle-of-view tracking ID is an ID for tracking images regarded as the same person within the view of the same surveillance camera.
The detection unit 12 may also determine whether a tracked-person alert is necessary based on the average similarity between the feature amount of a candidate image and the feature amount of the tracked person image over a plurality of consecutive frames included in the video data from the surveillance cameras C1 to Cn. For example, when a somewhat higher alert frequency is acceptable, the detection unit 12 follows the first alert determination condition input via the administrator terminal OT or the like, and determines the necessity of the tracked-person alert based on the average similarity between the feature amounts of the candidate image and of the tracked person image over a predetermined number of consecutive frames. When this average similarity exceeds the first average similarity threshold, the detection unit 12 determines that the tracked-person alert is necessary; when it does not, the detection unit 12 determines that the alert is unnecessary.
When it is desired to suppress alerts, the detection unit 12 instead follows the second alert determination condition input via the administrator terminal OT or the like, again based on the average similarity between the feature amounts of the candidate image and of the tracked person image over a predetermined number of consecutive frames. When the average similarity exceeds the second average similarity threshold, which is set higher than the first average similarity threshold, the detection unit 12 determines that the tracked-person alert is necessary; when it does not, the detection unit 12 determines that the alert is unnecessary.
The tracking unit 13 tracks the tracking target person based on the detection result of the tracked person image. For example, the tracking unit 13 generates a time-series tracking result for the tracked person based on the time information of the plurality of frames included in the video data from the surveillance cameras C1 to Cn.
The output unit 14 outputs the tracking result of the tracked person. For example, it outputs the tracking result for display on the monitor device MT and outputs the tracking result to be stored in the tracking result table 33. The output tracking result is, for example, a time-series tracking result for the tracked person.
The above description has taken as an example the case where the tables 31 to 33 of the data storage unit 3 are provided in the Web server device SV. However, the present invention is not limited to this; they may be provided in a database server or a file server arranged outside the Web server device SV. In this case, the Web server device SV accesses each of the tables 31 to 33 in the database server or file server and performs each process by acquiring the necessary information.
(Operation example)
Next, an operation example of the system configured as described above will be described.
FIG. 4 is a flowchart showing an example of tracking processing by the system according to the embodiment of the present invention.
The surveillance cameras C1 to Cn start shooting and output video data (ST1). The video analysis engines VE1 to VEn each analyze the video data from the corresponding surveillance cameras C1 to Cn (ST2). For example, each of the video analysis engines VE1 to VEn performs intra-angle-of-view tracking on the plurality of image frames included in the video data output from the corresponding one of the surveillance cameras C1 to Cn, and determines the same person across those image frames based on position information within the frames. The video analysis engines VE1 to VEn also detect the feature amounts of candidate images obtained from the plurality of frames included in the video data from the surveillance cameras C1 to Cn, and calculate the similarity between the feature amount of each candidate image and the feature amount of the tracked person image. The video analysis engines VE1 to VEn then output video analysis results including the video data and the same-person determination results.
The communication I/F 5 of the Web server device SV receives the video data and the same-person determinations from the video analysis engines VE1 to VEn. The first information acquisition unit 11 acquires the video data and the same-person determinations from the video analysis engines VE1 to VEn (ST3). The detection unit 12 comprehensively analyzes the video data from the video analysis engines VE1 to VEn together with the same-person determinations, and detects the tracking target person image from the plurality of frames included in that video data (ST4). The tracking unit 13 tracks the tracking target person based on the detection result of the tracking target person image (ST5). The output unit 14 outputs the tracking result of the tracking target person (ST6).
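By way of illustration only, the flow of ST3 to ST6 can be sketched in code as follows. The interfaces (fetch, detect, update, emit) and the Python form are assumptions made for this sketch; the patent does not specify any concrete implementation.

```python
def run_tracking_cycle(engines, detector, tracker, output):
    """One pass of ST3-ST6. `engines` stands in for the video analysis
    engines VE1..VEn; `detector`, `tracker`, and `output` stand in for
    units 12, 13, and 14. All four interfaces are assumed for illustration.
    """
    # ST3: acquire the video data and same-person determinations per camera
    analysis_results = [engine.fetch() for engine in engines]
    # ST4: comprehensively analyze the results of all cameras together
    target_images = detector.detect(analysis_results)
    # ST5: track the target person across cameras from the detections
    tracking_result = tracker.update(target_images)
    # ST6: output the tracking result (e.g., to the monitor device MT)
    output.emit(tracking_result)
    return tracking_result
```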
FIG. 5 is a flowchart showing a first example of the tracking processing performed by the Web server device according to the embodiment of the present invention. The detection of the tracking target person image by the detection unit 12 shown in FIG. 4 (ST4) will be described in more detail.
As shown in FIG. 5, the detection unit 12 detects the tracking target person image based on the video analysis results and the respective video data from the surveillance cameras C1 to Cn. For example, the detection unit 12 compares the similarities calculated by the video analysis engines VE1 to VEn with the similarity threshold (ST412), extracts candidate images exceeding the similarity threshold (ST413), further extracts the candidate images determined to be the same person (ST414), and detects the extracted candidate images as tracking target person images (ST415).
For example, as shown in FIG. 8, it is assumed that "25" is set as the similarity threshold. Further, suppose that "29" is detected as the similarity between the feature amount of a candidate image in a given frame of the video data from the surveillance camera C1 and the feature amount of the tracking target person image, and that "27" is detected as the similarity for a candidate image in a given frame of the video data from the surveillance camera C2. The detection unit 12 detects the candidate images exceeding the similarity threshold "25" as tracking target person images.
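By way of illustration only, the extraction in ST412 to ST415 together with the FIG. 8 numbers above can be sketched as follows; the dictionary schema and the function name are assumptions made for this sketch, not the patent's implementation.

```python
from collections import defaultdict

def detect_target_images(candidates, similarity_threshold):
    """Sketch of ST412-ST415: compare similarities against the threshold,
    extract candidates above it, group them by the same-person
    determination, and treat each group as tracking target images."""
    # ST412-ST413: extract candidate images exceeding the similarity threshold
    above = [c for c in candidates if c["similarity"] > similarity_threshold]
    # ST414: collect the surviving candidates judged to be the same person
    by_person = defaultdict(list)
    for c in above:
        by_person[(c["camera"], c["person"])].append(c)
    # ST415: each group is detected as tracking target images
    return dict(by_person)

# The FIG. 8 example: with threshold 25, the candidates with similarity
# 29 (camera C1) and 27 (camera C2) both survive; the third entry is an
# invented low-similarity candidate that is filtered out.
candidates = [
    {"camera": "C1", "person": 1, "similarity": 29},
    {"camera": "C2", "person": 1, "similarity": 27},
    {"camera": "C2", "person": 2, "similarity": 12},
]
print(detect_target_images(candidates, similarity_threshold=25))
```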
The tracking unit 13 tracks the tracking target person based on the detection result of the tracking target person image. The output unit 14 outputs the tracking result of the tracking target person. When images of a plurality of tracking target persons are detected, the output unit 14 outputs a tracking result for each tracked person. The tracking result includes the tracking target person image, the ID of the camera that captured the image, and the capture time of the image. The output unit 14 outputs the tracking results in time series based on the time information of the plurality of frames included in the video data from the surveillance cameras C1 to Cn.
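By way of illustration only, the per-person, time-ordered tracking result described above can be pictured with the following sketch; the record fields mirror the items listed (image, camera ID, capture time), but the concrete structure is an assumption.

```python
from dataclasses import dataclass

@dataclass
class TrackingRecord:
    person: str         # identifier of the tracked person (assumed field)
    camera_id: str      # ID of the camera that captured the image
    captured_at: float  # capture time taken from the frame's time information
    image: bytes        # detected tracking target person image

def group_tracking_results(records: list[TrackingRecord]) -> dict[str, list[TrackingRecord]]:
    """Merge detections from all cameras C1..Cn into one chronological
    stream per tracked person, as the output unit 14 is described to do."""
    per_person: dict[str, list[TrackingRecord]] = {}
    for record in sorted(records, key=lambda r: r.captured_at):
        per_person.setdefault(record.person, []).append(record)
    return per_person
```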
FIG. 6 is a flowchart showing a second example of the tracking processing performed by the Web server device according to the embodiment of the present invention. The second example describes a case in which the number of frames subject to detection is changed dynamically. The output of the detection of the tracking target person image by the detection unit 12 shown in FIG. 4 (ST4) will be described in more detail.
When an alert is output in response to detection of the tracking target person, it may be desirable to suppress the output frequency of that alert. Moreover, the detection accuracy for the tracking target person is not always 100%, so false detections are possible. Taking these detection-accuracy considerations into account, the alert output frequency may be suppressed. For this purpose, alert determination conditions that control the degree of alert output are used. For example, the first alert determination condition is applied when alerts should be output at the standard rate, and the second alert determination condition is applied when alerts should be suppressed.
If alert suppression is not specified, that is, if the first alert determination condition is set (ST421, YES), the detection unit 12 detects the feature amounts of the candidate images in a first predetermined number of consecutive frames (images captured over a relatively short time) and calculates the similarity between the feature amounts of the candidate images in those frames and the feature amount of the tracking target person image (ST423). The detection unit 12 then compares the calculated similarities with the similarity threshold (ST425), extracts candidate images exceeding the similarity threshold (ST426), and further extracts the candidate images of the same person (ST427).
Further, the detection unit 12 calculates the average of the similarities of the candidate images over the first predetermined number of frames (ST428), compares the calculated average similarity with the average similarity threshold (ST429), and, when the calculated average similarity exceeds the average similarity threshold, determines that a tracking target person alert is necessary and sets the alert (ST430). When the tracking target person alert is determined to be necessary in this way, the output unit 14 outputs a tracking result including the alert.
If alert suppression is specified, that is, if the second alert determination condition is set (ST422, YES), the detection unit 12 detects the feature amounts of the candidate images in a second predetermined number of consecutive frames larger than the first predetermined number (images captured over a relatively long time) and calculates the similarity between the feature amounts of the candidate images in those frames and the feature amount of the tracking target person image (ST424). The detection unit 12 then compares the calculated similarities with the similarity threshold (ST425), extracts candidate images exceeding the similarity threshold (ST426), and further extracts the candidate images of the same person (ST427).
Further, the detection unit 12 calculates the average of the similarities of the candidate images over the second predetermined number of frames (ST428), compares the calculated average similarity with the average similarity threshold (ST429), and, when the calculated average similarity exceeds the average similarity threshold, determines that a tracking target person alert is necessary and sets the alert (ST430). When the tracking target person alert is determined to be necessary in this way, the output unit 14 outputs a tracking result including the alert.
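Taken together, ST421 to ST430 amount to averaging the similarity over a window whose length is selected by the alert determination condition. The following sketch assumes invented window sizes and names; only the averaging-and-threshold logic reflects the steps above.

```python
SHORT_WINDOW = 5   # first predetermined number of frames (assumed value)
LONG_WINDOW = 20   # second, larger predetermined number (assumed value)

def needs_alert(similarities, suppress_alerts, average_threshold):
    """Sketch of ST421-ST430: pick the frame window from the alert
    determination condition, average the candidate-image similarities
    over that many consecutive frames, and compare the average with
    the average similarity threshold."""
    window = LONG_WINDOW if suppress_alerts else SHORT_WINDOW
    if len(similarities) < window:
        return False                                # too few consecutive frames yet
    average = sum(similarities[-window:]) / window  # ST428
    return average > average_threshold              # ST429-ST430
```

Since a longer window requires the similarity to stay high over more consecutive frames, a momentary spurious match is averaged away, which is precisely the suppression intended by the second alert determination condition.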
According to this embodiment, it is possible to provide a system, apparatus, method, and program that reduce the monitoring burden on a watchman based on video data from a plurality of cameras. The Web server device SV of this embodiment comprehensively analyzes, in real time, the video data from the plurality of surveillance cameras C1 to Cn to detect the tracking target person image, tracks the tracking target person based on the detection result of the tracking target person image, and outputs the tracking result of the tracking target person.
The integrated analysis is a video analysis based on both the per-camera video analysis results, obtained by analyzing the video data from the surveillance cameras C1 to Cn camera by camera, and the video data from the surveillance cameras C1 to Cn themselves. For example, the video analysis engines VE1 to VEn are responsible for analyzing the video data for each camera, and the Web server device SV detects the tracking target person image from the video data from the surveillance cameras C1 to Cn using the per-camera video analysis results produced by the video analysis engines VE1 to VEn.
Such integrated analysis makes it possible to output tracking results per tracked person rather than per camera. For example, the monitor device MT displays the tracking results for each tracked person in chronological order. Even when the tracking target person is captured across a plurality of the surveillance cameras C1 to Cn, the monitor device MT collects the images of that same person and displays them in chronological order. As a result, the observer does not have to visually chase the same tracked person across a plurality of monitors, and the monitoring burden on the observer is reduced.
Further, the Web server device SV of this embodiment outputs a tracking result including a tracking target person alert, such as an image or a sound. For example, the monitor device MT highlights the detected image of the tracked person with a symbol, mark, or frame indicating the alert. This allows the observer to visually confirm the tracked person without overlooking them.
The Web server device SV also determines whether a target person alert is necessary based on the average similarity between the feature amounts of the candidate images in a plurality of consecutive frames included in the video data from the surveillance cameras C1 to Cn and the feature amount of the target person image. When the Web server device SV determines that the target person alert is necessary, it outputs a tracking result including the alert. For example, the Web server device SV decides the necessity of the alert according to whether the calculated average similarity exceeds the average similarity threshold. Setting the average similarity threshold high lowers the sensitivity of the target person alert; conversely, setting it low raises the sensitivity.
The Web server device SV may also control the sensitivity of the target person alert by dynamically setting the number of frames over which the average similarity is calculated. For example, when the sensitivity of the alert does not particularly need to be suppressed and the first alert determination condition is set, the Web server device SV determines the necessity of the alert based on the average similarity between the feature amounts of the candidate images in a first predetermined number of consecutive frames and the feature amount of the target person image. Conversely, when the sensitivity of the alert needs to be suppressed and the second alert determination condition is set, the Web server device SV determines the necessity of the alert based on the average similarity over a second predetermined number of consecutive frames larger than the first. By dynamically changing the number of frames used for the average similarity according to the determination condition in this way, the sensitivity of the target person alert can be controlled.
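Continuing the needs_alert sketch above with invented numbers, the effect of switching the window length is visible on a stream that contains only a brief burst of high similarity:

```python
# Assumes needs_alert, SHORT_WINDOW, and LONG_WINDOW from the sketch above.
# 15 frames of low similarity followed by a 5-frame burst of high similarity.
stream = [10] * 15 + [30, 29, 31, 28, 30]

# First condition: the short window sees only the burst, so the alert fires.
print(needs_alert(stream, suppress_alerts=False, average_threshold=25))  # True

# Second condition: the long window averages the burst away and suppresses it.
print(needs_alert(stream, suppress_alerts=True, average_threshold=25))   # False
```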
The program according to this embodiment may be transferred while stored in an electronic device, transferred while stored in a storage medium, or transferred by downloading via a network or the like. The recording medium is a non-transitory computer-readable storage medium such as a magnetic disk, an optical disk, or a flash memory.
The present invention is not limited to the above embodiment and can be modified in various ways at the implementation stage without departing from its gist. The embodiments may also be combined as appropriate, in which case the combined effects are obtained. Furthermore, the above embodiment includes various inventions, and various inventions can be extracted by combinations selected from the plurality of disclosed constituent elements. For example, even if some constituent elements are deleted from all the constituent elements shown in the embodiment, the configuration with those elements deleted can be extracted as an invention as long as the problem can be solved and the effects obtained.
1 ... Control unit
2 ... Program storage unit
3 ... Data storage unit
4 ... Input/output interface (input/output I/F)
5 ... Communication interface (communication I/F)
6 ... Bus
11 ... Information acquisition unit
12 ... Detection unit
13 ... Tracking unit
14 ... Output unit
31 ... Camera information table
32 ... Setting information table
33 ... Tracking result table
C1, C2, Cn ... Surveillance camera
MT ... Monitor device
NW ... Network
OT ... Administrator terminal
SV ... Server device
VE1, VEn ... Video analysis engine
Claims (8)
- An information processing device comprising: a detection unit that comprehensively analyzes video data from a plurality of cameras to detect a target person image; a tracking unit that tracks the target person based on the detection result of the target person image; and an output unit that outputs the tracking result of the target person.
- The information processing device according to claim 1, wherein the detection unit detects the target person image based on same-person determination results in the respective video data from the plurality of cameras.
- The information processing device according to claim 1 or 2, wherein the detection unit detects the target person image based on the similarity between feature amounts of candidate images obtained from the video data from the plurality of cameras and the feature amount of a designated target person image.
- The information processing device according to claim 3, wherein the detection unit determines whether a target person alert is necessary based on the average similarity between the feature amounts of the candidate images in a plurality of consecutive frames included in the video data from each camera and the feature amount of the target person image, and the output unit outputs a tracking result including the target person alert when the alert is determined to be necessary.
- The information processing device according to claim 1, wherein the output unit outputs tracking results in time series based on time information of a plurality of frames included in the video data from the plurality of cameras.
- An information processing system comprising: a video analysis unit that analyzes the respective video data from a plurality of cameras; a detection unit that, based on the analysis results from the video analysis unit, comprehensively analyzes the video data from the plurality of cameras to detect a target person image; a tracking unit that tracks the target person based on the detection result of the target person image; and an output unit that outputs the tracking result of the target person.
- An information processing method comprising: comprehensively analyzing video data from a plurality of cameras to detect a target person image; tracking the target person based on the detection result of the target person image; and outputting the tracking result of the target person.
- A program for causing a processor to execute the processing performed by each unit included in the information processing device according to any one of claims 1 to 5.
Applications Claiming Priority (2)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
JP2020135111A JP7479987B2 (en) | 2020-08-07 | 2020-08-07 | Information processing device, information processing method, and program |
JP2020-135111 | 2020-08-07 |
Publications (1)
Publication Number | Publication Date |
---|---|
WO2022030546A1 true WO2022030546A1 (en) | 2022-02-10 |
Family
ID=80117513
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
PCT/JP2021/028961 WO2022030546A1 (en) | 2020-08-07 | 2021-08-04 | Information processing device, information processing method, and program |
Country Status (2)
Country | Link |
---|---|
JP (1) | JP7479987B2 (en) |
WO (1) | WO2022030546A1 (en) |
Citations (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
JP2012208610A (en) * | 2011-03-29 | 2012-10-25 | Secom Co Ltd | Face image authentication apparatus |
JP2016131288A (en) * | 2015-01-13 | 2016-07-21 | 東芝テック株式会社 | Information processing apparatus and program |
JP2020047069A (en) * | 2018-09-20 | 2020-03-26 | 株式会社日立製作所 | Information processing system, and method and program for controlling information processing system |
Family Cites Families (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
KR101434768B1 (en) * | 2010-02-19 | 2014-08-27 | Kabushiki Kaisha Toshiba | Moving object tracking system and moving object tracking method |
US9697420B2 (en) * | 2013-02-15 | 2017-07-04 | Nec Corporation | Information processing system, information processing method, and computer-readable recording medium |
JP7039409B2 (en) * | 2018-07-18 | 2022-03-22 | 株式会社日立製作所 | Video analysis device, person search system and person search method |
- 2020
- 2020-08-07 JP JP2020135111A patent/JP7479987B2/en active Active
- 2021
- 2021-08-04 WO PCT/JP2021/028961 patent/WO2022030546A1/en active Application Filing
Patent Citations (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
JP2012208610A (en) * | 2011-03-29 | 2012-10-25 | Secom Co Ltd | Face image authentication apparatus |
JP2016131288A (en) * | 2015-01-13 | 2016-07-21 | 東芝テック株式会社 | Information processing apparatus and program |
JP2020047069A (en) * | 2018-09-20 | 2020-03-26 | 株式会社日立製作所 | Information processing system, and method and program for controlling information processing system |
Also Published As
Publication number | Publication date |
---|---|
JP7479987B2 (en) | 2024-05-09 |
JP2022030832A (en) | 2022-02-18 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
US11157778B2 (en) | Image analysis system, image analysis method, and storage medium | |
JP7497853B2 (en) | Face Detection System | |
JP2021072475A (en) | Monitoring system and monitoring system setting program | |
WO2018179202A1 (en) | Information processing device, control method, and program | |
CN113850849A (en) | Object tracking using multi-camera system | |
JP2019153986A (en) | Monitoring system, management apparatus, monitoring method, computer program, and storage medium | |
JP6485978B2 (en) | Image processing apparatus and image processing system | |
WO2022030546A1 (en) | Information processing device, information processing method, and program | |
JP7329967B2 (en) | IMAGE PROCESSING APPARATUS, SYSTEM, IMAGE PROCESSING APPARATUS CONTROL METHOD, AND PROGRAM | |
WO2022030548A1 (en) | Monitoring information processing device, method, and program | |
JP7210163B2 (en) | Image processing device, image processing method and program | |
KR101082026B1 (en) | Apparatus and method for displaying event moving picture | |
JP2020166590A (en) | Monitoring system, monitoring device, monitoring method, and monitoring program | |
EP4125002A2 (en) | A video processing apparatus, method and computer program | |
JP2022185634A (en) | Terminal device, information processing device, information processing method, information processing program and information processing system | |
JP2008085832A (en) | Monitoring camera, control method of monitoring camera, and monitoring camera system | |
JP7520662B2 (en) | Information processing device, information processing method, and program | |
JP7520663B2 (en) | Information processing device, information processing method, and program | |
WO2022030549A1 (en) | Information retrieval device, information retrieval method, and program | |
JP5618366B2 (en) | Monitoring system, monitoring device, monitoring method, and program | |
JP7479988B2 (en) | Monitoring information processing device, monitoring information processing method, and monitoring information processing program | |
GB2572007A (en) | A method and user device for displaying video data, a method and apparatus for streaming video data and a video surveillance system | |
JP2020078030A (en) | System, information processing device, information processing method, and program | |
JP2020102676A (en) | Information processing device, information processing method, and program | |
JP7520664B2 (en) | Information processing device, information processing method, and information processing program |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
121 | Ep: the epo has been informed by wipo that ep was designated in this application |
Ref document number: 21854532 Country of ref document: EP Kind code of ref document: A1 |
|
NENP | Non-entry into the national phase |
Ref country code: DE |
|
122 | Ep: pct application non-entry in european phase |
Ref document number: 21854532 Country of ref document: EP Kind code of ref document: A1 |