WO2022030546A1 - Information processing device, information processing method, and program - Google Patents
Information processing device, information processing method, and program
- Publication number
- WO2022030546A1 (application PCT/JP2021/028961)
- Authority
- WO
- WIPO (PCT)
- Prior art keywords
- image
- tracking
- target person
- video data
- information processing
- Prior art date
Classifications
- G—PHYSICS
- G08—SIGNALLING
- G08B—SIGNALLING OR CALLING SYSTEMS; ORDER TELEGRAPHS; ALARM SYSTEMS
- G08B25/00—Alarm systems in which the location of the alarm condition is signalled to a central station, e.g. fire or police telegraphic systems
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04N—PICTORIAL COMMUNICATION, e.g. TELEVISION
- H04N7/00—Television systems
- H04N7/18—Closed-circuit television [CCTV] systems, i.e. systems in which the video signal is not broadcast
Definitions
- An embodiment of the present invention relates to an information processing device, an information processing method, and a program for detecting an image of a target person by analyzing video data from a surveillance camera, for example.
- An information processing device such as a general-purpose personal computer receives video data from a camera, stores the received video data in a storage unit, analyzes the video data, and detects an image of a target person. In addition, the information processing device displays the detected image of the target person on a monitor or the like.
- The information processing device receives the video data from each camera, analyzes the video data from each camera, and detects the image of the target person. The image of the target person detected from each camera's video data is displayed on the corresponding monitor. The observer visually confirms the target person displayed on each monitor.
- In order to monitor a person, a technique for detecting the target person with high accuracy is required, and a related technique has been proposed (see, for example, Patent Document 1).
- When the analysis results (images of the target person) of the video data from multiple cameras are divided among and displayed on multiple monitors, the observer has to visually track the target person across those monitors. In this case, the observer must follow the displays of a plurality of monitors at the same time, which not only imposes a heavy monitoring burden on the observer but may also cause oversights.
- The present invention has been made in view of the above circumstances, and aims to provide a technique for reducing the monitoring burden on an observer who works from video data from a plurality of cameras.
- To solve the above problem, the information processing apparatus according to one aspect of the present invention includes a detection unit that comprehensively analyzes video data from a plurality of cameras to detect a target person image, a tracking unit that tracks the target person based on the detection result of the target person image, and an output unit that outputs the tracking result of the target person.
- FIG. 1 is a diagram showing an example of a configuration of a monitoring system including a monitoring information processing apparatus according to an embodiment of the present invention.
- FIG. 2 is a block diagram showing an example of a hardware configuration of a Web server device used as a monitoring information processing device according to an embodiment of the present invention.
- FIG. 3 is a block diagram showing an example of a software configuration of a Web server device used as a monitoring information processing device according to an embodiment of the present invention.
- FIG. 4 is a flowchart showing an example of tracking processing by the system according to the embodiment of the present invention.
- FIG. 5 is a flowchart showing a first example of tracking processing by the Web server device according to the embodiment of the present invention.
- FIG. 6 is a flowchart showing a second example of tracking processing by the Web server device according to the embodiment of the present invention.
- FIG. 7 is a conceptual diagram showing an example of video analysis by the video analysis engine according to the embodiment of the present invention.
- FIG. 8 is a conceptual diagram showing an example of integrated video analysis by a Web server device used as a monitoring information processing device according to an embodiment of the present invention.
- FIG. 1 is a diagram showing an overall configuration of a system including a monitoring information processing apparatus according to an embodiment of the present invention.
- For example, a plurality of surveillance cameras C1 to Cn are distributed and arranged in the aisles and sales floors of large-scale stores such as shopping malls and department stores.
- The surveillance cameras C1 to Cn are attached to, for example, a ceiling or a wall surface, capture images of their respective surveillance areas, and output the video data.
- For example, the surveillance cameras C1 to Cn are equipped with video analysis engines VE1 to VEn, respectively.
- The video analysis engines VE1 to VEn correspond to a video analysis unit, which analyzes the video data from the surveillance cameras C1 to Cn.
- For example, each of the video analysis engines VE1 to VEn performs intra-angle-of-view tracking on the plurality of image frames included in the video data output from the corresponding one of the surveillance cameras C1 to Cn, and determines images of the same person across those image frames based on position information within the frames.
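- The publication does not spell out the same-person determination itself, so the following is only a rough sketch of such per-camera, position-based association: consecutive detections are joined into one in-angle-of-view track when their bounding boxes overlap strongly. The IoU rule, the 0.5 threshold, and all names are illustrative assumptions, not taken from the publication.

```python
from typing import List, Tuple

Box = Tuple[float, float, float, float]  # (x1, y1, x2, y2)

def iou(a: Box, b: Box) -> float:
    """Intersection-over-union of two axis-aligned boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    union = ((a[2] - a[0]) * (a[3] - a[1])
             + (b[2] - b[0]) * (b[3] - b[1]) - inter)
    return inter / union if union else 0.0

def assign_in_angle_track_ids(frames: List[List[Box]],
                              iou_threshold: float = 0.5) -> List[List[int]]:
    """frames: per-frame person boxes from ONE camera; returns, per frame,
    the in-angle-of-view tracking ID assigned to each box."""
    next_id, prev, out = 0, [], []   # prev: (track_id, box) of the last frame
    for boxes in frames:
        current, free = [], list(prev)
        for box in boxes:
            best = max(free, key=lambda t: iou(t[1], box), default=None)
            if best is not None and iou(best[1], box) >= iou_threshold:
                free.remove(best)               # same person continues
                current.append((best[0], box))
            else:
                current.append((next_id, box))  # a new person enters the view
                next_id += 1
        out.append([tid for tid, _ in current])
        prev = current
    return out
```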
- The video analysis engines VE1 to VEn need not be arranged one-to-one with the surveillance cameras C1 to Cn; a smaller number of video analysis engines may be arranged for the plurality of cameras, with each engine processing the video data of several surveillance cameras collectively.
- Further, the system of one embodiment includes a Web server device SV used as a monitoring information processing device. The video analysis engines VE1 to VEn can communicate data with the Web server device SV via a network NW, and transmit the generated video analysis results to the Web server device SV over the network NW. For the network NW, for example, a wired LAN (Local Area Network) or a wireless LAN is used, but any other network may be used.
- Alternatively, the Web server device SV may itself include the video analysis engines VE1 to VEn, or a single video analysis engine, which receives the video data from the surveillance cameras C1 to Cn via the network NW and analyzes the received video data.
- FIGS. 2 and 3 are block diagrams showing examples of the hardware configuration and the software configuration of the Web server device SV, respectively.
- The Web server device SV includes a control unit 1 having a hardware processor such as a central processing unit (CPU). A storage unit having a program storage unit 2 and a data storage unit 3, an input/output interface (input/output I/F) 4, and a communication interface (communication I/F) 5 are connected to the control unit 1 via a bus 6.
- For example, a monitor device MT and an administrator terminal OT are connected to the input/output I/F 4. The monitor device MT is used by the observer to visually monitor the monitoring area, and displays the images from the surveillance cameras C1 to Cn and information indicating the detection or tracking results for the query to be monitored.
- The administrator terminal OT is used by the system administrator for system management and maintenance. It displays various setting screens and information indicating the operating status of the system, and when the system administrator inputs data necessary for managing and operating the system, it accepts that data and sets it in the Web server device SV.
- Under the control of the control unit 1, the communication I/F 5 transmits data to and from the video analysis engines VE1 to VEn using the communication protocol defined by the network NW, and is composed of interfaces corresponding to, for example, a wired LAN or a wireless LAN.
- The program storage unit 2 combines, as storage media, a non-volatile memory that can be written and read at any time, such as an HDD (Hard Disk Drive) or SSD (Solid State Drive), with a non-volatile memory such as a ROM (Read Only Memory). In addition to middleware such as an OS (Operating System), it stores the programs necessary for executing the various control processes according to an embodiment of the present invention.
- The data storage unit 3 combines, as storage media, a non-volatile memory that can be written and read at any time, such as an HDD or SSD, with a volatile memory such as a RAM (Random Access Memory). As the main storage required for implementing an embodiment of the present invention, it includes a camera information table 31, a setting information table 32, and a tracking result table 33.
- The camera information table 31 stores, for each of the surveillance cameras C1 to Cn, information representing, for example, the camera's name, performance, and installation position, in association with its identification information (hereinafter referred to as the camera ID). The information representing performance includes, for example, resolution and aspect ratio. The information indicating the installation position includes, for example, latitude/longitude, imaging direction, and imaging angle.
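- As a rough illustration, the camera information table 31 could be held as records like the following; the field names, types, and sample values are assumptions based only on the attributes listed above.

```python
from dataclasses import dataclass

@dataclass
class CameraInfo:
    camera_id: str            # identification information unique to the camera
    name: str                 # camera name
    resolution: tuple         # performance, e.g. (1920, 1080)
    aspect_ratio: str         # performance, e.g. "16:9"
    latitude: float           # installation position
    longitude: float
    imaging_direction: float  # degrees (assumed unit)
    imaging_angle: float      # degrees (assumed unit)

# hypothetical record for surveillance camera C1
camera_info_table = {
    "C1": CameraInfo("C1", "aisle-east", (1920, 1080), "16:9",
                     35.68, 139.76, 90.0, 30.0),
}
```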
- The setting information table 32 stores the image feature amount of the query. For example, it stores the image feature amount of a query input from the administrator terminal OT via the input/output I/F 4, or the image feature amount of a query detected from the video data transmitted from the surveillance cameras C1 to Cn via the communication I/F 5. The setting information table 32 also stores alert determination conditions input via the administrator terminal OT or the like, for example the first or second alert determination condition.
- Here are some examples of how a query image is registered. For example, based on an alert obtained in real time, the administrator presses the tracking button on the administrator terminal OT for the person (image) to be tracked, and the control unit 1 automatically registers the latest set of detected images (face image and whole-body image) as a query image (query image feature amount) and starts tracking. Alternatively, the administrator presses the history button on the administrator terminal OT for the person (image) to be tracked, and the control unit 1 registers an arbitrary image selected from the history list as a query image and starts tracking. The control unit 1 may also perform a person search over the surveillance camera images according to a history search from the administrator, register an image the administrator selects from the search results as a query image, and start tracking. Further, the administrator may select a person (image) included in surveillance image data obtained in real time, in which case the control unit 1 registers the selected person as a query image and starts tracking, or the administrator may import an image provided by a requester from the administrator terminal OT, register it as a query image, and start tracking.
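- A minimal sketch of the common core of these registration paths might look as follows; extract_features and the table layout are hypothetical stand-ins, since the publication does not define them.

```python
setting_info_table = []  # stands in for the setting information table 32

def register_query(face_image, body_image, extract_features):
    """Register one (face, whole-body) image set as a query and return it.

    extract_features is a hypothetical stand-in for whatever feature
    extractor the video analysis engines actually use.
    """
    query = {
        "face_feature": extract_features(face_image),
        "body_feature": extract_features(body_image),
    }
    setting_info_table.append(query)  # tracking then starts against this query
    return query
```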
- The tracking result table 33 stores the tracking results of tracked persons. For example, it stores the tracking results for each tracking target person, edited in chronological order.
- The control unit 1 includes an information acquisition unit 11, a detection unit 12, a tracking unit 13, and an output unit 14 as processing functions according to an embodiment of the present invention. Each unit is realized by causing the hardware processor of the control unit 1 to execute a program stored in the program storage unit 2.
- The information acquisition unit 11 acquires video data, video analysis results, and the like from the video analysis engines VE1 to VEn connected to the surveillance cameras C1 to Cn, or from the video analysis engines VE1 to VEn provided in the Web server device SV. For example, each of the video analysis engines VE1 to VEn determines the same person across the plurality of image frames included in the video data output from the corresponding one of the surveillance cameras C1 to Cn, based on position information within the image frames, and outputs a video analysis result including the determination result.
- The detection unit 12 comprehensively analyzes the video analysis results and the video data from the surveillance cameras C1 to Cn to detect the tracked person image. For example, based on the image feature amount of a query given in advance (the feature amount of the tracked person image), the video analysis engines VE1 to VEn extract, from the plurality of image frames included in the video data from the surveillance cameras C1 to Cn, person images (tracked person images) whose image feature amounts are similar to that of the query. For example, a plurality of queries are given in advance, and a plurality of person images having image feature amounts similar to those of the plurality of queries are extracted.
- The video analysis engines VE1 to VEn also generate a video analysis result including information indicating the degree of similarity between each extracted person image and the query image, the camera IDs of the surveillance cameras C1 to Cn, the in-angle-of-view tracking ID, and the shooting time (date, hour, minute, and second).
- The person image includes a face image and a whole-body image, and the similarity information includes a similarity corresponding to each of the face image and the whole-body image. The camera ID is identification information unique to each surveillance camera. The in-angle-of-view tracking ID is an ID for tracking images regarded as the same person within the view of the same surveillance camera.
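- Collecting the fields just listed, one video analysis result could be represented as a record like this sketch; the structure and names are assumptions for illustration.

```python
from dataclasses import dataclass
from datetime import datetime

@dataclass
class AnalysisResult:
    camera_id: str           # unique to the surveillance camera
    in_angle_track_id: int   # same-person track within one camera's view
    face_similarity: float   # similarity of the face image to the query
    body_similarity: float   # similarity of the whole-body image to the query
    shot_at: datetime        # shooting time (date, hour, minute, second)
```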
- The detection unit 12 may also determine whether a tracked-person alert is necessary based on the average similarity between the feature amount of a candidate image and the feature amount of the tracked person image over a plurality of consecutive frames included in the video data from the surveillance cameras C1 to Cn. For example, when a somewhat higher alert frequency is acceptable, the detection unit 12 follows the first alert determination condition input via the administrator terminal OT or the like, and determines the necessity of the tracked-person alert based on the average similarity between the feature amounts of the candidate image and of the tracked person image over a predetermined number of consecutive frames. When this average similarity exceeds the first average similarity threshold, the detection unit 12 determines that the tracked-person alert is necessary; when it does not, the detection unit 12 determines that the alert is unnecessary.
- When it is desired to suppress alerts, the detection unit 12 instead follows the second alert determination condition input via the administrator terminal OT or the like, again based on the average similarity between the feature amounts of the candidate image and of the tracked person image over a predetermined number of consecutive frames. When the average similarity exceeds the second average similarity threshold, which is set higher than the first average similarity threshold, the detection unit 12 determines that the tracked-person alert is necessary; when it does not, the detection unit 12 determines that the alert is unnecessary.
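- The decision just described reduces to averaging per-frame similarities and comparing against the threshold of the active determination condition, roughly as in this sketch; the two concrete threshold values are illustrative assumptions.

```python
FIRST_AVG_THRESHOLD = 25.0   # first condition: standard alert frequency
SECOND_AVG_THRESHOLD = 30.0  # second condition: higher, so fewer alerts

def alert_needed(similarities, suppress_alerts=False):
    """similarities: per-frame similarity between the candidate image's
    feature amount and the tracked person image's feature amount."""
    if not similarities:
        return False
    average = sum(similarities) / len(similarities)
    threshold = SECOND_AVG_THRESHOLD if suppress_alerts else FIRST_AVG_THRESHOLD
    return average > threshold   # True means the tracked-person alert is set
```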
- The tracking unit 13 tracks the tracking target person based on the detection result of the tracked person image. For example, the tracking unit 13 generates a time-series tracking result for the tracked person based on the time information of the plurality of frames included in the video data from the surveillance cameras C1 to Cn.
- The output unit 14 outputs the tracking result of the tracked person. For example, it outputs the tracking result for display on the monitor device MT and outputs the tracking result to be stored in the tracking result table 33. The output tracking result is, for example, a time-series tracking result for the tracked person.
- The above description has taken as an example the case where the tables 31 to 33 of the data storage unit 3 are provided in the Web server device SV. However, the present invention is not limited to this; they may be provided in a database server or a file server arranged outside the Web server device SV. In this case, the Web server device SV accesses each of the tables 31 to 33 in the database server or file server and performs each process by acquiring the necessary information.
- FIG. 4 is a flowchart showing an example of tracking processing by the system according to the embodiment of the present invention.
- Surveillance cameras C1 to Cn start shooting and output video data (ST1).
- The video analysis engines VE1 to VEn each analyze the video data from the corresponding surveillance cameras C1 to Cn (ST2).
- For example, each of the video analysis engines VE1 to VEn performs intra-angle-of-view tracking on the plurality of image frames included in the video data output from the corresponding one of the surveillance cameras C1 to Cn, and determines the same person across those image frames based on position information within the frames. The video analysis engines VE1 to VEn also detect the feature amounts of candidate images obtained from the plurality of frames included in the video data from the surveillance cameras C1 to Cn, and calculate the similarity between the feature amount of each candidate image and the feature amount of the tracked person image. The video analysis engines VE1 to VEn then output video analysis results including the video data and the same-person determination results.
- The communication I/F 5 of the Web server device SV receives the video data and the same-person determinations from the video analysis engines VE1 to VEn. The information acquisition unit 11 acquires the video data and the same-person determinations from the video analysis engines VE1 to VEn (ST3). The detection unit 12 comprehensively analyzes the video data and the same-person determinations from the video analysis engines VE1 to VEn, and detects the tracked person image from the plurality of frames included in that video data (ST4). The tracking unit 13 tracks the tracked person based on the detection result of the tracked person image (ST5). The output unit 14 outputs the tracking result of the tracked person (ST6).
- FIG. 5 is a flowchart showing a first example of tracking processing by the Web server device according to the embodiment of the present invention.
- The detection (ST4) of the tracked person image by the detection unit 12 shown in FIG. 4 will now be described in more detail.
- As shown in FIG. 5, the detection unit 12 detects the tracked person image based on the video analysis results and the respective video data from the surveillance cameras C1 to Cn. For example, the detection unit 12 compares the similarity threshold with the similarities calculated by the video analysis engines VE1 to VEn (ST412), extracts the candidate images exceeding the similarity threshold (ST413), further extracts the candidate images of the same person (ST414), and detects the extracted candidate images as tracked person images (ST415).
- For example, as shown in FIG. 8, assume that "25" is set as the similarity threshold, that a similarity of "29" is detected between the feature amount of a candidate image in a given frame from surveillance camera C1 and the feature amount of the tracked person image, and that a similarity of "27" is detected for a candidate image in a given frame from surveillance camera C2. The detection unit 12 detects candidate images whose similarity, such as "29", exceeds the similarity threshold as tracked person images.
- The tracking unit 13 tracks the tracked person based on the detection result of the tracked person image. The output unit 14 outputs the tracking result of the tracked person; when images of a plurality of tracked persons are detected, it outputs a tracking result for each of them. The tracking result includes the tracked person image, the ID of the camera that captured it, and its shooting time. The output unit 14 outputs the tracking results in time series based on the time information of the plurality of frames included in the video data from the surveillance cameras C1 to Cn.
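- A sketch of the ST412 to ST415 flow, using plain dictionaries shaped like the analysis-result record above; the grouping rule (same camera ID and in-angle tracking ID means same person) follows the text, while the threshold value and names are assumptions.

```python
def detect_tracked_person_images(results, similarity_threshold=25.0):
    """results: iterable of dicts with the keys camera_id, in_angle_track_id,
    face_similarity, body_similarity, and shot_at."""
    # ST412/ST413: keep candidates whose similarity exceeds the threshold
    candidates = [r for r in results
                  if max(r["face_similarity"], r["body_similarity"])
                  > similarity_threshold]
    # ST414: records sharing a camera ID and an in-angle tracking ID are
    # treated as the same person within that camera's view
    grouped = {}
    for r in candidates:
        key = (r["camera_id"], r["in_angle_track_id"])
        grouped.setdefault(key, []).append(r)
    # ST415: each group, ordered by shooting time, is one detected sequence
    return {key: sorted(group, key=lambda r: r["shot_at"])
            for key, group in grouped.items()}
```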
- FIG. 6 is a flowchart showing a second example of tracking processing by the Web server device according to the embodiment of the present invention.
- In the second example, a case where the number of target frames is dynamically changed will be described. The detection (ST4) of the tracked person image by the detection unit 12 shown in FIG. 4 is again described in more detail.
- The first alert determination condition is applied when alert output at the standard rate is acceptable, and the second alert determination condition is applied when it is desired to suppress alerts.
- If alert suppression is not specified, that is, if the first alert determination condition is set (ST421, YES), the detection unit 12 detects the feature amount of the candidate image in a first predetermined number of consecutive frames (images captured over a relatively short time), and calculates the similarity between the feature amount of the candidate image and the feature amount of the tracked person image in that first predetermined number of consecutive frames (ST423).
- Further, the detection unit 12 compares the similarity threshold with the calculated similarities (ST425), extracts the candidate images exceeding the similarity threshold (ST426), and further extracts the candidate images of the same person (ST427).
- The detection unit 12 then calculates the average similarity of the candidate images over the first predetermined number of frames (ST428) and compares it with the average similarity threshold (ST429). When the calculated average similarity exceeds the average similarity threshold, the detection unit 12 determines that the tracked-person alert is necessary and sets the tracked-person alert (ST430). When it is determined that the tracked-person alert is necessary, the output unit 14 outputs the tracking result including the tracked-person alert.
- If alert suppression is specified, that is, if the second alert determination condition is set (ST422, YES), the detection unit 12 detects the feature amount of the candidate image in a second predetermined number of consecutive frames, larger than the first predetermined number (images captured over a relatively long time), and calculates the similarity between the feature amount of the candidate image and the feature amount of the tracked person image in that second predetermined number of consecutive frames (ST424). Further, the detection unit 12 compares the similarity threshold with the calculated similarities (ST425), extracts the candidate images exceeding the similarity threshold (ST426), and further extracts the candidate images of the same person (ST427).
- The detection unit 12 then calculates the average similarity of the candidate images over the second predetermined number of frames (ST428) and compares it with the average similarity threshold (ST429). When the calculated average similarity exceeds the average similarity threshold, the detection unit 12 determines that the tracked-person alert is necessary and sets the tracked-person alert (ST430). When it is determined that the tracked-person alert is necessary, the output unit 14 outputs the tracking result including the tracked-person alert.
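- The second flow differs from the first mainly in the number of consecutive frames averaged, which the following sketch switches on the active condition; the concrete frame counts are illustrative assumptions.

```python
FIRST_FRAME_COUNT = 5    # first condition: a relatively short run of frames
SECOND_FRAME_COUNT = 15  # second condition: more consecutive frames, so the
                         # average is harder to push over the threshold

def alert_with_dynamic_window(per_frame_similarity, suppress_alerts,
                              avg_threshold=25.0):
    """per_frame_similarity: similarities for consecutive frames, oldest first."""
    n = SECOND_FRAME_COUNT if suppress_alerts else FIRST_FRAME_COUNT
    window = per_frame_similarity[-n:]        # the latest n consecutive frames
    if len(window) < n:
        return False                          # not enough frames observed yet
    return sum(window) / n > avg_threshold    # ST428 to ST430
```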
- As described above, the Web server device SV of the present embodiment comprehensively analyzes the video data from the plurality of surveillance cameras C1 to Cn in real time to detect the tracked person image, tracks the tracked person based on the detection result of the tracked person image, and outputs the tracking result of the tracked person. The integrated analysis here is video analysis based on both the per-camera video analysis results obtained by analyzing the video data from the surveillance cameras C1 to Cn and the video data itself. The video analysis engines VE1 to VEn are responsible for the per-camera analysis of the video data, and the Web server device SV detects the tracked person image from the video data of the surveillance cameras C1 to Cn using the per-camera video analysis results produced by the video analysis engines VE1 to VEn.
- For example, the monitor device MT displays the tracking results in chronological order for each tracked person. Even when a tracked person is photographed across a plurality of the surveillance cameras C1 to Cn, the monitor device MT collectively displays the images of that same tracked person in chronological order. As a result, the observer does not have to visually chase the same tracked person across a plurality of monitors, and the observer's monitoring burden is reduced.
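- The chronological per-person display could be produced by merging each camera's detections of one person into a single time-ordered list, roughly as sketched here under the same assumed record shape as above.

```python
def chronological_timeline(detections_by_camera):
    """detections_by_camera: per-camera lists of detection records (dicts with
    at least shot_at and camera_id) already attributed to ONE tracked person."""
    merged = [r for per_camera in detections_by_camera for r in per_camera]
    merged.sort(key=lambda r: r["shot_at"])   # one timeline across all cameras
    return [(r["shot_at"], r["camera_id"]) for r in merged]
```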
- Further, the Web server device SV of the present embodiment outputs a tracking result including a tracked-person alert, such as an image or a sound. For example, the monitor device MT highlights the detected image of the tracked person with a symbol, mark, frame, or the like indicating the alert. As a result, the observer can visually confirm the tracked person without overlooking them.
- Further, the Web server device SV determines whether a target-person alert is necessary based on the average similarity between the feature amount of the candidate image and the feature amount of the target person image over a plurality of consecutive frames included in the video data from the surveillance cameras C1 to Cn. When it determines that the target-person alert is necessary, the Web server device SV outputs the tracking result including the alert. For example, the Web server device SV decides whether the alert is necessary depending on whether the calculated average similarity exceeds the average similarity threshold. If the average similarity threshold is set high, the sensitivity of the target-person alert is suppressed; conversely, if it is set low, the sensitivity is increased.
- Further, the Web server device SV may control the sensitivity of the target-person alert by dynamically setting the number of frames over which the average similarity is calculated. For example, when there is no particular need to suppress the sensitivity and the first alert determination condition is set, the Web server device SV determines the necessity of the alert based on the average similarity between the feature amounts of the candidate image and of the target person image over a first predetermined number of consecutive frames. Conversely, when the sensitivity needs to be suppressed and the second alert determination condition is set, the Web server device SV determines the necessity of the alert based on that average similarity over a second predetermined number of consecutive frames, larger than the first. In this way, the sensitivity of the target-person alert is controlled by dynamically changing the number of frames over which the average similarity is calculated according to the determination condition.
- The program according to the present embodiment may be transferred while stored in an electronic device, while stored in a storage medium, or by downloading via a network or the like. The recording medium is a non-transitory computer-readable storage medium such as a magnetic disk, an optical disk, or a flash memory.
- The present invention is not limited to the above embodiment and can be variously modified at the implementation stage without departing from its gist. The embodiments may also be combined as appropriate, in which case the combined effects are obtained. Furthermore, the above embodiment includes various inventions, which can be extracted by combinations selected from the plurality of disclosed constituent elements. For example, even if some constituent elements are deleted from all of those shown in the embodiment, a configuration from which those elements are deleted can be extracted as an invention as long as the problem can be solved and the effects are obtained.
Landscapes
- Business, Economics & Management (AREA)
- Emergency Management (AREA)
- Physics & Mathematics (AREA)
- General Physics & Mathematics (AREA)
- Engineering & Computer Science (AREA)
- Multimedia (AREA)
- Signal Processing (AREA)
- Alarm Systems (AREA)
- Closed-Circuit Television Systems (AREA)
Abstract
Provided is a technique for alleviating the burden of monitoring performed by monitoring personnel on the basis of video data from a plurality of cameras. An information processing device according to one aspect of the present invention is provided with: a detection unit that detects an image of a subject by comprehensively analyzing video data from a plurality of cameras; a tracking unit that tracks the subject on the basis of the result of detecting the image of the subject; and an output unit that outputs the result of tracking the subject.
Description
An embodiment of the present invention relates to an information processing device, an information processing method, and a program for detecting an image of a target person by analyzing video data from a surveillance camera, for example.
In recent years, cameras have been installed in various places as part of crime prevention measures. The camera captures the monitored area and outputs video data. An information processing device such as a general-purpose personal computer receives video data from a camera, stores the received video data in a storage unit, analyzes the video data, and detects an image of a target person. In addition, the information processing device displays the detected image of the target person on a monitor or the like.
For example, in a facility used by many people, such as a large-scale store, an office building, or a station yard, multiple cameras are installed. Further, in the monitoring room or the like, a plurality of monitors corresponding to a plurality of cameras are installed. The information processing device receives the video data from each camera, analyzes each of the video data from each camera, and detects the image of the target person. An image of the target person detected from the video data from each camera is displayed on each monitor. The observer visually confirms the target person displayed on each monitor.
In order to monitor a person, a technique for detecting the target person with high accuracy is required, and a related technique has been proposed (see, for example, Patent Document 1).
Although several techniques for detecting a target person with high accuracy have been proposed, there is also a demand to reduce the burden of monitoring a person.
As described above, when the analysis results (images of the target person) of the video data from multiple cameras are divided among and displayed on multiple monitors, the observer has to visually track the target person across those monitors. In this case, the observer must follow the displays of a plurality of monitors at the same time, which not only imposes a heavy monitoring burden on the observer but may also cause oversights.
The present invention has been made in view of the above circumstances, and aims to provide a technique for reducing the monitoring burden on an observer who works from video data from a plurality of cameras.
To solve the above problem, the information processing apparatus according to one aspect of the present invention includes a detection unit that comprehensively analyzes video data from a plurality of cameras to detect a target person image, a tracking unit that tracks the target person based on the detection result of the target person image, and an output unit that outputs the tracking result of the target person.
According to one aspect of the present invention, it is possible to provide a technique for reducing the monitoring burden on an observer who works from video data from a plurality of cameras.
Hereinafter, embodiments according to the present invention will be described with reference to the drawings.
[One Embodiment]
(Configuration example)
(1) System
FIG. 1 is a diagram showing an overall configuration of a system including a monitoring information processing apparatus according to an embodiment of the present invention.
For example, a plurality of surveillance cameras C1 to Cn are distributed and arranged in the aisles and sales floors of large-scale stores such as shopping malls and department stores. The surveillance cameras C1 to Cn are attached to, for example, a ceiling or a wall surface, capture images of their respective surveillance areas, and output the video data.
For example, the surveillance cameras C1 to Cn are equipped with video analysis engines VE1 to VEn, respectively. The video analysis engines VE1 to VEn correspond to a video analysis unit, which analyzes the video data from the surveillance cameras C1 to Cn. For example, each of the video analysis engines VE1 to VEn performs intra-angle-of-view tracking on the plurality of image frames included in the video data output from the corresponding one of the surveillance cameras C1 to Cn, and determines images of the same person across those image frames based on position information within the frames.
The video analysis engines VE1 to VEn need not be arranged one-to-one with the surveillance cameras C1 to Cn; a smaller number of video analysis engines may be arranged for the plurality of cameras, with each engine processing the video data of several surveillance cameras collectively.
Further, the system of one embodiment includes a Web server device SV used as a monitoring information processing device. The video analysis engines VE1 to VEn can communicate data with the Web server device SV via a network NW, and transmit the generated video analysis results to the Web server device SV over the network NW. For the network NW, for example, a wired LAN (Local Area Network) or a wireless LAN is used, but any other network may be used.
Alternatively, the Web server device SV may itself include the video analysis engines VE1 to VEn, or a single video analysis engine, which receives the video data from the surveillance cameras C1 to Cn via the network NW and analyzes the received video data.
(2) Web server device SV
FIGS. 2 and 3 are block diagrams showing examples of the hardware configuration and the software configuration of the Web server device SV, respectively.
The Web server device SV includes a control unit 1 having a hardware processor such as a central processing unit (CPU). A storage unit having a program storage unit 2 and a data storage unit 3, an input/output interface (input/output I/F) 4, and a communication interface (communication I/F) 5 are connected to the control unit 1 via a bus 6.
For example, a monitor device MT and an administrator terminal OT are connected to the input/output I/F 4. The monitor device MT is used by the observer to visually monitor the monitoring area, and displays the images from the surveillance cameras C1 to Cn and information indicating the detection or tracking results for the query to be monitored.
The administrator terminal OT is used by the system administrator for system management and maintenance. It displays various setting screens and information indicating the operating status of the system, and when the system administrator inputs data necessary for managing and operating the system, it accepts that data and sets it in the Web server device SV.
Under the control of the control unit 1, the communication I/F 5 transmits data to and from the video analysis engines VE1 to VEn using the communication protocol defined by the network NW, and is composed of interfaces corresponding to, for example, a wired LAN or a wireless LAN.
The program storage unit 2 combines, as storage media, a non-volatile memory that can be written and read at any time, such as an HDD (Hard Disk Drive) or SSD (Solid State Drive), with a non-volatile memory such as a ROM (Read Only Memory). In addition to middleware such as an OS (Operating System), it stores the programs necessary for executing the various control processes according to an embodiment of the present invention.
The data storage unit 3 combines, as storage media, a non-volatile memory that can be written and read at any time, such as an HDD or SSD, with a volatile memory such as a RAM (Random Access Memory). As the main storage required for implementing an embodiment of the present invention, it includes a camera information table 31, a setting information table 32, and a tracking result table 33.
The camera information table 31 stores, for each of the surveillance cameras C1 to Cn, information representing, for example, the camera's name, performance, and installation position, in association with its identification information (hereinafter referred to as the camera ID). The information representing performance includes, for example, resolution and aspect ratio. The information indicating the installation position includes, for example, latitude/longitude, imaging direction, and imaging angle.
The setting information table 32 stores the image feature amount of the query. For example, it stores the image feature amount of a query input from the administrator terminal OT via the input/output I/F 4, or the image feature amount of a query detected from the video data transmitted from the surveillance cameras C1 to Cn via the communication I/F 5. The setting information table 32 also stores alert determination conditions input via the administrator terminal OT or the like, for example the first or second alert determination condition.
Here are some examples of how a query image is registered. For example, based on an alert obtained in real time, the administrator presses the tracking button on the administrator terminal OT for the person (image) to be tracked, and the control unit 1 automatically registers the latest set of detected images (face image and whole-body image) as a query image (query image feature amount) and starts tracking. Alternatively, the administrator presses the history button on the administrator terminal OT for the person (image) to be tracked, and the control unit 1 registers an arbitrary image selected from the history list as a query image and starts tracking. The control unit 1 may also perform a person search over the surveillance camera images according to a history search from the administrator, register an image the administrator selects from the search results as a query image, and start tracking. Further, the administrator may select a person (image) included in surveillance image data obtained in real time, in which case the control unit 1 registers the selected person as a query image and starts tracking, or the administrator may import an image provided by a requester from the administrator terminal OT, register it as a query image, and start tracking.
The tracking result table 33 stores the tracking results of tracked persons. For example, it stores the tracking results for each tracking target person, edited in chronological order.
The control unit 1 includes an information acquisition unit 11, a detection unit 12, a tracking unit 13, and an output unit 14 as processing functions according to an embodiment of the present invention. Each unit is realized by causing the hardware processor of the control unit 1 to execute a program stored in the program storage unit 2.
The information acquisition unit 11 acquires video data, video analysis results, and the like from the video analysis engines VE1 to VEn connected to the surveillance cameras C1 to Cn, or from the video analysis engines VE1 to VEn provided in the Web server device SV. For example, each of the video analysis engines VE1 to VEn determines the same person across the plurality of image frames included in the video data output from the corresponding one of the surveillance cameras C1 to Cn, based on position information within the image frames, and outputs a video analysis result including the determination result.
The detection unit 12 comprehensively analyzes the video analysis results and the video data from the surveillance cameras C1 to Cn to detect the tracked person image. For example, based on the image feature amount of a query given in advance (the feature amount of the tracked person image), the video analysis engines VE1 to VEn extract, from the plurality of image frames included in the video data from the surveillance cameras C1 to Cn, person images (tracked person images) whose image feature amounts are similar to that of the query. For example, a plurality of queries are given in advance, and a plurality of person images having image feature amounts similar to those of the plurality of queries are extracted.
The video analysis engines VE1 to VEn also generate a video analysis result including information indicating the degree of similarity between each extracted person image and the query image, the camera IDs of the surveillance cameras C1 to Cn, the in-angle-of-view tracking ID, and the shooting time (date, hour, minute, and second). The person image includes a face image and a whole-body image, and the similarity information includes a similarity corresponding to each of the face image and the whole-body image. The camera ID is identification information unique to each surveillance camera. The in-angle-of-view tracking ID is an ID for tracking images regarded as the same person within the view of the same surveillance camera.
The detection unit 12 may also determine whether a tracked-person alert is necessary based on the average similarity between the feature amount of a candidate image and the feature amount of the tracked person image over a plurality of consecutive frames included in the video data from the surveillance cameras C1 to Cn. For example, when a somewhat higher alert frequency is acceptable, the detection unit 12 follows the first alert determination condition input via the administrator terminal OT or the like, and determines the necessity of the tracked-person alert based on the average similarity between the feature amounts of the candidate image and of the tracked person image over a predetermined number of consecutive frames. When this average similarity exceeds the first average similarity threshold, the detection unit 12 determines that the tracked-person alert is necessary; when it does not, the detection unit 12 determines that the alert is unnecessary.
When it is desired to suppress alerts, the detection unit 12 instead follows the second alert determination condition input via the administrator terminal OT or the like, again based on the average similarity between the feature amounts of the candidate image and of the tracked person image over a predetermined number of consecutive frames. When the average similarity exceeds the second average similarity threshold, which is set higher than the first average similarity threshold, the detection unit 12 determines that the tracked-person alert is necessary; when it does not, the detection unit 12 determines that the alert is unnecessary.
The tracking unit 13 tracks the tracking target person based on the detection result of the tracked person image. For example, the tracking unit 13 generates a time-series tracking result for the tracked person based on the time information of the plurality of frames included in the video data from the surveillance cameras C1 to Cn.
The output unit 14 outputs the tracking result of the tracked person. For example, it outputs the tracking result for display on the monitor device MT and outputs the tracking result to be stored in the tracking result table 33. The output tracking result is, for example, a time-series tracking result for the tracked person.
The above description has taken as an example the case where the tables 31 to 33 of the data storage unit 3 are provided in the Web server device SV. However, the present invention is not limited to this; they may be provided in a database server or a file server arranged outside the Web server device SV. In this case, the Web server device SV accesses each of the tables 31 to 33 in the database server or file server and performs each process by acquiring the necessary information.
(Operation example)
Next, an operation example of the system configured as described above will be described.
FIG. 4 is a flowchart showing an example of tracking processing by the system according to the embodiment of the present invention.
The surveillance cameras C1 to Cn start shooting and output video data (ST1). The video analysis engines VE1 to VEn each analyze the video data from the corresponding surveillance cameras C1 to Cn (ST2). For example, each of the video analysis engines VE1 to VEn performs intra-angle-of-view tracking on the plurality of image frames included in the video data output from the corresponding one of the surveillance cameras C1 to Cn, and determines the same person across those image frames based on position information within the frames. The video analysis engines VE1 to VEn also detect the feature amounts of candidate images obtained from the plurality of frames included in the video data from the surveillance cameras C1 to Cn, and calculate the similarity between the feature amount of each candidate image and the feature amount of the tracked person image. The video analysis engines VE1 to VEn then output video analysis results including the video data and the same-person determination results.
The communication I/F 5 of the Web server device SV receives the video data and the same-person determinations from the video analysis engines VE1 to VEn. The first information acquisition unit 11 acquires the video data and the same-person determinations from the video analysis engines VE1 to VEn (ST3). The detection unit 12 comprehensively analyzes the video data from the video analysis engines VE1 to VEn together with the same-person determinations, and detects the tracking target person image from the plurality of frames included in that video data (ST4). The tracking unit 13 tracks the tracking target person based on the detection result of the tracking target person image (ST5). The output unit 14 outputs the tracking result of the tracking target person (ST6).
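By way of illustration only, the flow of ST3 to ST6 can be sketched in code as follows. The interfaces (fetch, detect, update, emit) and the Python form are assumptions made for this sketch; the patent does not specify any concrete implementation.

```python
def run_tracking_cycle(engines, detector, tracker, output):
    """One pass of ST3-ST6. `engines` stands in for the video analysis
    engines VE1..VEn; `detector`, `tracker`, and `output` stand in for
    units 12, 13, and 14. All four interfaces are assumed for illustration.
    """
    # ST3: acquire the video data and same-person determinations per camera
    analysis_results = [engine.fetch() for engine in engines]
    # ST4: comprehensively analyze the results of all cameras together
    target_images = detector.detect(analysis_results)
    # ST5: track the target person across cameras from the detections
    tracking_result = tracker.update(target_images)
    # ST6: output the tracking result (e.g., to the monitor device MT)
    output.emit(tracking_result)
    return tracking_result
```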
FIG. 5 is a flowchart showing a first example of the tracking processing performed by the Web server device according to the embodiment of the present invention. The detection of the tracking target person image by the detection unit 12 shown in FIG. 4 (ST4) will be described in more detail.
As shown in FIG. 5, the detection unit 12 detects the tracking target person image based on the video analysis results and the respective video data from the surveillance cameras C1 to Cn. For example, the detection unit 12 compares the similarities calculated by the video analysis engines VE1 to VEn with the similarity threshold (ST412), extracts candidate images exceeding the similarity threshold (ST413), further extracts the candidate images determined to be the same person (ST414), and detects the extracted candidate images as tracking target person images (ST415).
For example, as shown in FIG. 8, it is assumed that "25" is set as the similarity threshold. Further, suppose that "29" is detected as the similarity between the feature amount of a candidate image in a given frame of the video data from the surveillance camera C1 and the feature amount of the tracking target person image, and that "27" is detected as the similarity for a candidate image in a given frame of the video data from the surveillance camera C2. The detection unit 12 detects the candidate images exceeding the similarity threshold "25" as tracking target person images.
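By way of illustration only, the extraction in ST412 to ST415 together with the FIG. 8 numbers above can be sketched as follows; the dictionary schema and the function name are assumptions made for this sketch, not the patent's implementation.

```python
from collections import defaultdict

def detect_target_images(candidates, similarity_threshold):
    """Sketch of ST412-ST415: compare similarities against the threshold,
    extract candidates above it, group them by the same-person
    determination, and treat each group as tracking target images."""
    # ST412-ST413: extract candidate images exceeding the similarity threshold
    above = [c for c in candidates if c["similarity"] > similarity_threshold]
    # ST414: collect the surviving candidates judged to be the same person
    by_person = defaultdict(list)
    for c in above:
        by_person[(c["camera"], c["person"])].append(c)
    # ST415: each group is detected as tracking target images
    return dict(by_person)

# The FIG. 8 example: with threshold 25, the candidates with similarity
# 29 (camera C1) and 27 (camera C2) both survive; the third entry is an
# invented low-similarity candidate that is filtered out.
candidates = [
    {"camera": "C1", "person": 1, "similarity": 29},
    {"camera": "C2", "person": 1, "similarity": 27},
    {"camera": "C2", "person": 2, "similarity": 12},
]
print(detect_target_images(candidates, similarity_threshold=25))
```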
The tracking unit 13 tracks the tracking target person based on the detection result of the tracking target person image. The output unit 14 outputs the tracking result of the tracking target person. When images of a plurality of tracking target persons are detected, the output unit 14 outputs a tracking result for each tracked person. The tracking result includes the tracking target person image, the ID of the camera that captured the image, and the capture time of the image. The output unit 14 outputs the tracking results in time series based on the time information of the plurality of frames included in the video data from the surveillance cameras C1 to Cn.
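By way of illustration only, the per-person, time-ordered tracking result described above can be pictured with the following sketch; the record fields mirror the items listed (image, camera ID, capture time), but the concrete structure is an assumption.

```python
from dataclasses import dataclass

@dataclass
class TrackingRecord:
    person: str         # identifier of the tracked person (assumed field)
    camera_id: str      # ID of the camera that captured the image
    captured_at: float  # capture time taken from the frame's time information
    image: bytes        # detected tracking target person image

def group_tracking_results(records: list[TrackingRecord]) -> dict[str, list[TrackingRecord]]:
    """Merge detections from all cameras C1..Cn into one chronological
    stream per tracked person, as the output unit 14 is described to do."""
    per_person: dict[str, list[TrackingRecord]] = {}
    for record in sorted(records, key=lambda r: r.captured_at):
        per_person.setdefault(record.person, []).append(record)
    return per_person
```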
FIG. 6 is a flowchart showing a second example of the tracking processing performed by the Web server device according to the embodiment of the present invention. The second example describes a case in which the number of frames subject to detection is changed dynamically. The output of the detection of the tracking target person image by the detection unit 12 shown in FIG. 4 (ST4) will be described in more detail.
When an alert is output in response to detection of the tracking target person, it may be desirable to suppress the output frequency of that alert. Moreover, the detection accuracy for the tracking target person is not always 100%, so false detections are possible. Taking these detection-accuracy considerations into account, the alert output frequency may be suppressed. For this purpose, alert determination conditions that control the degree of alert output are used. For example, the first alert determination condition is applied when alerts should be output at the standard rate, and the second alert determination condition is applied when alerts should be suppressed.
If alert suppression is not specified, that is, if the first alert determination condition is set (ST421, YES), the detection unit 12 detects the feature amounts of the candidate images in a first predetermined number of consecutive frames (images captured over a relatively short time) and calculates the similarity between the feature amounts of the candidate images in those frames and the feature amount of the tracking target person image (ST423). The detection unit 12 then compares the calculated similarities with the similarity threshold (ST425), extracts candidate images exceeding the similarity threshold (ST426), and further extracts the candidate images of the same person (ST427).
Further, the detection unit 12 calculates the average of the similarities of the candidate images over the first predetermined number of frames (ST428), compares the calculated average similarity with the average similarity threshold (ST429), and, when the calculated average similarity exceeds the average similarity threshold, determines that a tracking target person alert is necessary and sets the alert (ST430). When the tracking target person alert is determined to be necessary in this way, the output unit 14 outputs a tracking result including the alert.
If alert suppression is specified, that is, if the second alert determination condition is set (ST422, YES), the detection unit 12 detects the feature amounts of the candidate images in a second predetermined number of consecutive frames larger than the first predetermined number (images captured over a relatively long time) and calculates the similarity between the feature amounts of the candidate images in those frames and the feature amount of the tracking target person image (ST424). The detection unit 12 then compares the calculated similarities with the similarity threshold (ST425), extracts candidate images exceeding the similarity threshold (ST426), and further extracts the candidate images of the same person (ST427).
Further, the detection unit 12 calculates the average of the similarities of the candidate images over the second predetermined number of frames (ST428), compares the calculated average similarity with the average similarity threshold (ST429), and, when the calculated average similarity exceeds the average similarity threshold, determines that a tracking target person alert is necessary and sets the alert (ST430). When the tracking target person alert is determined to be necessary in this way, the output unit 14 outputs a tracking result including the alert.
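Taken together, ST421 to ST430 amount to averaging the similarity over a window whose length is selected by the alert determination condition. The following sketch assumes invented window sizes and names; only the averaging-and-threshold logic reflects the steps above.

```python
SHORT_WINDOW = 5   # first predetermined number of frames (assumed value)
LONG_WINDOW = 20   # second, larger predetermined number (assumed value)

def needs_alert(similarities, suppress_alerts, average_threshold):
    """Sketch of ST421-ST430: pick the frame window from the alert
    determination condition, average the candidate-image similarities
    over that many consecutive frames, and compare the average with
    the average similarity threshold."""
    window = LONG_WINDOW if suppress_alerts else SHORT_WINDOW
    if len(similarities) < window:
        return False                                # too few consecutive frames yet
    average = sum(similarities[-window:]) / window  # ST428
    return average > average_threshold              # ST429-ST430
```

Since a longer window requires the similarity to stay high over more consecutive frames, a momentary spurious match is averaged away, which is precisely the suppression intended by the second alert determination condition.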
According to this embodiment, it is possible to provide a system, apparatus, method, and program that reduce the monitoring burden on a watchman based on video data from a plurality of cameras. The Web server device SV of this embodiment comprehensively analyzes, in real time, the video data from the plurality of surveillance cameras C1 to Cn to detect the tracking target person image, tracks the tracking target person based on the detection result of the tracking target person image, and outputs the tracking result of the tracking target person.
The integrated analysis is a video analysis based on both the per-camera video analysis results, obtained by analyzing the video data from the surveillance cameras C1 to Cn camera by camera, and the video data from the surveillance cameras C1 to Cn themselves. For example, the video analysis engines VE1 to VEn are responsible for analyzing the video data for each camera, and the Web server device SV detects the tracking target person image from the video data from the surveillance cameras C1 to Cn using the per-camera video analysis results produced by the video analysis engines VE1 to VEn.
Such integrated analysis makes it possible to output tracking results per tracked person rather than per camera. For example, the monitor device MT displays the tracking results for each tracked person in chronological order. Even when the tracking target person is captured across a plurality of the surveillance cameras C1 to Cn, the monitor device MT collects the images of that same person and displays them in chronological order. As a result, the observer does not have to visually chase the same tracked person across a plurality of monitors, and the monitoring burden on the observer is reduced.
Further, the Web server device SV of this embodiment outputs a tracking result including a tracking target person alert, such as an image or a sound. For example, the monitor device MT highlights the detected image of the tracked person with a symbol, mark, or frame indicating the alert. This allows the observer to visually confirm the tracked person without overlooking them.
The Web server device SV also determines whether a target person alert is necessary based on the average similarity between the feature amounts of the candidate images in a plurality of consecutive frames included in the video data from the surveillance cameras C1 to Cn and the feature amount of the target person image. When the Web server device SV determines that the target person alert is necessary, it outputs a tracking result including the alert. For example, the Web server device SV decides the necessity of the alert according to whether the calculated average similarity exceeds the average similarity threshold. Setting the average similarity threshold high lowers the sensitivity of the target person alert; conversely, setting it low raises the sensitivity.
The Web server device SV may also control the sensitivity of the target person alert by dynamically setting the number of frames over which the average similarity is calculated. For example, when the sensitivity of the alert does not particularly need to be suppressed and the first alert determination condition is set, the Web server device SV determines the necessity of the alert based on the average similarity between the feature amounts of the candidate images in a first predetermined number of consecutive frames and the feature amount of the target person image. Conversely, when the sensitivity of the alert needs to be suppressed and the second alert determination condition is set, the Web server device SV determines the necessity of the alert based on the average similarity over a second predetermined number of consecutive frames larger than the first. By dynamically changing the number of frames used for the average similarity according to the determination condition in this way, the sensitivity of the target person alert can be controlled.
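Continuing the needs_alert sketch above with invented numbers, the effect of switching the window length is visible on a stream that contains only a brief burst of high similarity:

```python
# Assumes needs_alert, SHORT_WINDOW, and LONG_WINDOW from the sketch above.
# 15 frames of low similarity followed by a 5-frame burst of high similarity.
stream = [10] * 15 + [30, 29, 31, 28, 30]

# First condition: the short window sees only the burst, so the alert fires.
print(needs_alert(stream, suppress_alerts=False, average_threshold=25))  # True

# Second condition: the long window averages the burst away and suppresses it.
print(needs_alert(stream, suppress_alerts=True, average_threshold=25))   # False
```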
The program according to this embodiment may be transferred while stored in an electronic device, transferred while stored in a storage medium, or transferred by downloading via a network or the like. The recording medium is a non-transitory computer-readable storage medium such as a magnetic disk, an optical disk, or a flash memory.
The present invention is not limited to the above embodiment and can be modified in various ways at the implementation stage without departing from its gist. The embodiments may also be combined as appropriate, in which case the combined effects are obtained. Furthermore, the above embodiment includes various inventions, and various inventions can be extracted by combinations selected from the plurality of disclosed constituent elements. For example, even if some constituent elements are deleted from all the constituent elements shown in the embodiment, the configuration with those elements deleted can be extracted as an invention as long as the problem can be solved and the effects obtained.
1 ... Control unit
2 ... Program storage unit
3 ... Data storage unit
4 ... Input/output interface (input/output I/F)
5 ... Communication interface (communication I/F)
6 ... Bus
11 ... Information acquisition unit
12 ... Detection unit
13 ... Tracking unit
14 ... Output unit
31 ... Camera information table
32 ... Setting information table
33 ... Tracking result table
C1, C2, Cn ... Surveillance camera
MT ... Monitor device
NW ... Network
OT ... Administrator terminal
SV ... Server device
VE1, VEn ... Video analysis engine
Claims (8)
- An information processing device comprising: a detection unit that comprehensively analyzes video data from a plurality of cameras to detect a target person image; a tracking unit that tracks the target person based on the detection result of the target person image; and an output unit that outputs the tracking result of the target person.
- The information processing device according to claim 1, wherein the detection unit detects the target person image based on same-person determination results in the respective video data from the plurality of cameras.
- The information processing device according to claim 1 or 2, wherein the detection unit detects the target person image based on the similarity between feature amounts of candidate images obtained from the video data from the plurality of cameras and the feature amount of a designated target person image.
- The information processing device according to claim 3, wherein the detection unit determines whether a target person alert is necessary based on the average similarity between the feature amounts of the candidate images in a plurality of consecutive frames included in the video data from each camera and the feature amount of the target person image, and the output unit outputs a tracking result including the target person alert when the alert is determined to be necessary.
- The information processing device according to claim 1, wherein the output unit outputs tracking results in time series based on time information of a plurality of frames included in the video data from the plurality of cameras.
- An information processing system comprising: a video analysis unit that analyzes the respective video data from a plurality of cameras; a detection unit that, based on the analysis results from the video analysis unit, comprehensively analyzes the video data from the plurality of cameras to detect a target person image; a tracking unit that tracks the target person based on the detection result of the target person image; and an output unit that outputs the tracking result of the target person.
- An information processing method comprising: comprehensively analyzing video data from a plurality of cameras to detect a target person image; tracking the target person based on the detection result of the target person image; and outputting the tracking result of the target person.
- A program for causing a processor to execute the processing performed by each unit included in the information processing device according to any one of claims 1 to 5.
Applications Claiming Priority (2)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
JP2020135111A JP7479987B2 (en) | 2020-08-07 | 2020-08-07 | Information processing device, information processing method, and program |
JP2020-135111 | 2020-08-07 |
Publications (1)
Publication Number | Publication Date |
---|---|
WO2022030546A1 true WO2022030546A1 (en) | 2022-02-10 |
Family
ID=80117513
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
PCT/JP2021/028961 WO2022030546A1 (en) | 2020-08-07 | 2021-08-04 | Information processing device, information processing method, and program |
Country Status (2)
Country | Link |
---|---|
JP (1) | JP7479987B2 (en) |
WO (1) | WO2022030546A1 (en) |
Citations (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
JP2012208610A (en) * | 2011-03-29 | 2012-10-25 | Secom Co Ltd | Face image authentication apparatus |
JP2016131288A (en) * | 2015-01-13 | 2016-07-21 | 東芝テック株式会社 | Information processing apparatus and program |
JP2020047069A (en) * | 2018-09-20 | 2020-03-26 | 株式会社日立製作所 | Information processing system, and method and program for controlling information processing system |
Family Cites Families (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
KR101434768B1 (en) * | 2010-02-19 | 2014-08-27 | Kabushiki Kaisha Toshiba | Moving object tracking system and moving object tracking method |
US9697420B2 (en) * | 2013-02-15 | 2017-07-04 | Nec Corporation | Information processing system, information processing method, and computer-readable recording medium |
JP7039409B2 (en) * | 2018-07-18 | 2022-03-22 | 株式会社日立製作所 | Video analysis device, person search system and person search method |
- 2020
- 2020-08-07 JP JP2020135111A patent/JP7479987B2/en active Active
- 2021
- 2021-08-04 WO PCT/JP2021/028961 patent/WO2022030546A1/en active Application Filing
Patent Citations (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
JP2012208610A (en) * | 2011-03-29 | 2012-10-25 | Secom Co Ltd | Face image authentication apparatus |
JP2016131288A (en) * | 2015-01-13 | 2016-07-21 | 東芝テック株式会社 | Information processing apparatus and program |
JP2020047069A (en) * | 2018-09-20 | 2020-03-26 | 株式会社日立製作所 | Information processing system, and method and program for controlling information processing system |
Also Published As
Publication number | Publication date |
---|---|
JP7479987B2 (en) | 2024-05-09 |
JP2022030832A (en) | 2022-02-18 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
US11157778B2 (en) | Image analysis system, image analysis method, and storage medium | |
JP7497853B2 (en) | Face Detection System | |
JP2021072475A (en) | Monitoring system and monitoring system setting program | |
WO2018179202A1 (en) | Information processing device, control method, and program | |
CN113850849A (en) | Object tracking using multi-camera system | |
JP2019153986A (en) | Monitoring system, management apparatus, monitoring method, computer program, and storage medium | |
JP6485978B2 (en) | Image processing apparatus and image processing system | |
WO2022030546A1 (en) | Information processing device, information processing method, and program | |
JP7329967B2 (en) | IMAGE PROCESSING APPARATUS, SYSTEM, IMAGE PROCESSING APPARATUS CONTROL METHOD, AND PROGRAM | |
WO2022030548A1 (en) | Monitoring information processing device, method, and program | |
JP7210163B2 (en) | Image processing device, image processing method and program | |
KR101082026B1 (en) | Apparatus and method for displaying event moving picture | |
JP2020166590A (en) | Monitoring system, monitoring device, monitoring method, and monitoring program | |
EP4125002A2 (en) | A video processing apparatus, method and computer program | |
JP2022185634A (en) | Terminal device, information processing device, information processing method, information processing program and information processing system | |
JP2008085832A (en) | Monitoring camera, control method of monitoring camera, and monitoring camera system | |
JP7520662B2 (en) | Information processing device, information processing method, and program | |
JP7520663B2 (en) | Information processing device, information processing method, and program | |
WO2022030549A1 (en) | Information retrieval device, information retrieval method, and program | |
JP5618366B2 (en) | Monitoring system, monitoring device, monitoring method, and program | |
JP7479988B2 (en) | Monitoring information processing device, monitoring information processing method, and monitoring information processing program | |
GB2572007A (en) | A method and user device for displaying video data, a method and apparatus for streaming video data and a video surveillance system | |
JP2020078030A (en) | System, information processing device, information processing method, and program | |
JP2020102676A (en) | Information processing device, information processing method, and program | |
JP7520664B2 (en) | Information processing device, information processing method, and information processing program |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
121 | Ep: the epo has been informed by wipo that ep was designated in this application |
Ref document number: 21854532 Country of ref document: EP Kind code of ref document: A1 |
|
NENP | Non-entry into the national phase |
Ref country code: DE |
|
122 | Ep: pct application non-entry in european phase |
Ref document number: 21854532 Country of ref document: EP Kind code of ref document: A1 |