WO2022030549A1 - Information retrieval device, information retrieval method, and program - Google Patents

Information retrieval device, information retrieval method, and program Download PDF

Info

Publication number
WO2022030549A1
WO2022030549A1 (PCT/JP2021/028964)
Authority
WO
WIPO (PCT)
Prior art keywords
image
images
output
target person
unit
Prior art date
Application number
PCT/JP2021/028964
Other languages
French (fr)
Japanese (ja)
Inventor
鮎美 松本
哲希 柴田
育弘 宇田
真一 根本
篤 佐藤
知也 児玉
貴司 塩崎
Original Assignee
エヌ・ティ・ティ・コミュニケーションズ株式会社
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by エヌ・ティ・ティ・コミュニケーションズ株式会社 filed Critical エヌ・ティ・ティ・コミュニケーションズ株式会社
Publication of WO2022030549A1 publication Critical patent/WO2022030549A1/en

Links

Images

Classifications

    • G — PHYSICS
    • G06 — COMPUTING; CALCULATING OR COUNTING
    • G06F — ELECTRIC DIGITAL DATA PROCESSING
    • G06F 16/00 — Information retrieval; Database structures therefor; File system structures therefor
    • G06F 16/50 — Information retrieval; Database structures therefor; File system structures therefor of still image data
    • G06F 16/53 — Querying
    • G06F 16/70 — Information retrieval; Database structures therefor; File system structures therefor of video data
    • G06F 16/73 — Querying
    • G06T — IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T 7/00 — Image analysis
    • G06T 7/20 — Analysis of motion
    • G06T 7/246 — Analysis of motion using feature-based methods, e.g. the tracking of corners or segments
    • H — ELECTRICITY
    • H04 — ELECTRIC COMMUNICATION TECHNIQUE
    • H04N — PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N 5/00 — Details of television systems
    • H04N 5/76 — Television signal recording
    • H04N 5/765 — Interface circuits between an apparatus for recording and another apparatus
    • H04N 5/77 — Interface circuits between a recording apparatus and a television camera
    • H04N 7/00 — Television systems
    • H04N 7/18 — Closed-circuit television [CCTV] systems, i.e. systems in which the video signal is not broadcast

Definitions

  • An embodiment of the present invention relates to, for example, an information retrieval device, an information retrieval method, and a program for analyzing video data from a surveillance camera and searching for an image of a target person.
  • An information processing device such as a general-purpose personal computer receives video data from a camera, stores the received video data in a storage unit, analyzes the video data, and detects an image of a target person. In addition, the information processing device displays the detected image of the target person on a monitor or the like.
  • Video data contains a large number of frames, and the processing burden of extracting features from the faces contained in these frames is heavy; a technique for realizing high-speed processing with a low-cost device has been proposed (see, for example, Patent Document 1).
  • Since the information processing device detects the target person from a plurality of frames included in the stored video data, many similar images of the target person may be output as the detection result.
  • The present invention has been made in view of the above circumstances, and aims to provide a technique for improving the visibility of detected images of the same person.
  • The information retrieval device includes: a search unit that searches, based on a search condition, for target person images from a database in which detected images detected from a plurality of frames included in video data from one or more cameras are registered; a selection unit that selects all target person images found by the search unit when a first output designation is set, and selects, when a second output designation is set, a first predetermined number of target person images that satisfy an output condition and are regarded as the same person from among all the target person images found by the search unit; and an output unit that outputs the target person images selected by the selection unit.
  • FIG. 1 is a diagram showing an example of a configuration of a monitoring system including a monitoring information processing apparatus according to an embodiment of the present invention.
  • FIG. 2 is a block diagram showing an example of a hardware configuration of a Web server device used as a monitoring information processing device according to an embodiment of the present invention.
  • FIG. 3 is a block diagram showing an example of a software configuration of a Web server device used as a monitoring information processing device according to an embodiment of the present invention.
  • FIG. 4 is a flowchart showing an example of a search process by the system according to the embodiment of the present invention.
  • FIG. 5 is a flowchart showing an example of a target image selection process in the search process by the system according to the embodiment of the present invention.
  • FIG. 6 is a flowchart showing an example of a target image selection process for each tracking ID within the angle of view in the search process by the system according to the embodiment of the present invention.
  • FIG. 1 is a diagram showing an overall configuration of a system including a monitoring information processing apparatus according to an embodiment of the present invention.
  • a plurality of surveillance cameras C1 to Cn are distributed and arranged in the aisles and sales floors of large-scale stores such as shopping malls and department stores.
  • surveillance cameras C1 to Cn are attached to, for example, a ceiling or a wall surface, capture images of each surveillance area, and output video data thereof.
  • the surveillance cameras C1 to Cn are equipped with video analysis engines VE1 to VEn, respectively.
  • the video analysis engines VE1 to VEn correspond to the video analysis unit, and the video analysis unit analyzes each video data from the surveillance cameras C1 to Cn.
  • The video analysis engines VE1 to VEn each perform in-angle-of-view tracking on the plurality of image frames included in the video data output from the corresponding surveillance cameras C1 to Cn, and determine images of the same person across those frames based on position information within each image frame.
  • Alternatively, the video analysis engines need not be arranged one-to-one with the surveillance cameras C1 to Cn; a smaller number of video analysis engines may be provided for the plurality of cameras, with each engine collectively processing the video data of several surveillance cameras.
  • the system of one embodiment includes a Web server device SV used as a monitoring information processing device.
  • the video analysis engines VE1 to VEn are capable of data communication with the Web server device SV via the network NW, and transmit the generated video analysis result to the Web server device SV via the network NW.
  • As the network NW, for example, a wired LAN (Local Area Network) or a wireless LAN is used, but any other network may be used.
  • Alternatively, the Web server device SV may itself include the video analysis engines VE1 to VEn (or a single video analysis engine), which receive the video data from the surveillance cameras C1 to Cn via the network NW and analyze the received video data.
  • FIGS. 2 and 3 are block diagrams showing examples of the hardware configuration and the software configuration of the Web server device SV, respectively.
  • The Web server device SV includes a control unit 1 having a hardware processor such as a CPU (Central Processing Unit). A storage unit comprising a program storage unit 2 and a data storage unit 3, an input/output interface (input/output I/F) 4, and a communication interface (communication I/F) 5 are connected to the control unit 1 via a bus 6.
  • a monitor device MT and an administrator terminal OT are connected to the input / output I / F4.
  • The monitor device MT is used by an observer to visually monitor the surveillance area, and displays images from the surveillance cameras C1 to Cn, information indicating detection or tracking results for a monitored query, and the like.
  • The administrator terminal OT is used by the system administrator for system management and maintenance. It displays various setting screens and information indicating the operating status of the system, and when the system administrator inputs data necessary for managing and operating the system, it accepts the data and sets it in the Web server device SV.
  • The communication I/F 5 exchanges data with the video analysis engines VE1 to VEn under the control of the control unit 1, using a communication protocol defined by the network NW, and comprises interfaces corresponding to, for example, a wired LAN or a wireless LAN.
  • The program storage unit 2 comprises, as storage media, for example, a non-volatile memory such as an HDD (Hard Disk Drive) or SSD (Solid State Drive) that can be written and read at any time, combined with a non-volatile memory such as a ROM (Read Only Memory), and stores, in addition to middleware such as an OS (Operating System), the programs necessary for executing the processes according to the embodiment.
  • The data storage unit 3 is, for example, a combination of a non-volatile memory such as an HDD or SSD capable of being written and read at any time and a volatile memory such as a RAM (Random Access Memory) as storage media, and includes a camera information table 31 and a setting information table 32 as the main storage units necessary for implementing an embodiment of the present invention.
  • the camera information table 31 stores, for example, information representing the name, performance, and installation position of the surveillance camera in association with the identification information (hereinafter referred to as the camera ID) for each of the surveillance cameras C1 to Cn.
  • Information representing performance includes, for example, resolution and aspect ratio.
  • the information indicating the installation position includes, for example, latitude / longitude, imaging direction, and imaging angle.
  • The camera information table 31 also stores a database in which tracked person images (detected images) detected from the video data from the surveillance cameras C1 to Cn are registered.
  • the setting information table 32 stores the image feature amount of the query.
  • the setting information table 32 stores the image feature amount of the query input from the administrator terminal OT via the input / output I / F4.
  • the setting information table 32 stores the image feature amount of the query detected from the video data transmitted from the surveillance cameras C1 to Cn via the communication I / F5.
  • the setting information table 32 stores the alert determination condition input via the administrator terminal OT or the like.
  • the setting information table 32 stores the first or second alert determination condition input via the administrator terminal OT or the like.
  • the administrator presses the tracking button on the administrator terminal OT for the person (image) to be tracked.
  • the control unit 1 automatically registers the latest set of detected images (face image and whole body image) as a query image (query image feature amount) in response to pressing the tracking button, and starts tracking.
  • the administrator presses the history button on the administrator terminal OT for the person (image) that the administrator wants to track.
  • the control unit 1 selects an arbitrary image from the history list, registers it as a query image, and starts tracking.
  • The control unit 1 performs a person search in the surveillance camera images in response to a history search from the administrator, registers the image selected by the administrator from the search results as a query image, and starts tracking. Further, when the administrator selects a person (image) included in surveillance video obtained in real time, the control unit 1 registers the selected person as a query image and starts tracking. In addition, the administrator may import an image provided by a requester from the administrator terminal OT, register it as a query image, and start tracking.
  • The control unit 1 has an information acquisition unit 11, an image detection unit 12, a search condition setting unit 13, an image search unit 14, an image selection unit (filter unit) 15, and an output unit 16 as processing functions according to an embodiment of the present invention. Each unit is realized by causing the hardware processor of the control unit 1 to execute a program stored in the program storage unit 2.
  • the information acquisition unit 11 acquires video data and video analysis results from the video analysis engines VE1 to VEn connected to the surveillance cameras C1 to Cn or the video analysis engines VE1 to VEn provided in the Web server device SV.
  • The video analysis engines VE1 to VEn each determine the same person across the plurality of image frames included in the video data output from the corresponding surveillance cameras C1 to Cn, based on position information within the image frames, and output a video analysis result including the determination result.
  • the information acquisition unit 11 can acquire not only the video data from the surveillance cameras C1 to Cn but also the moving image file input via the input / output I / F4 or the communication I / F5.
  • the information acquisition unit 11 transfers a moving image file having a file name in the specified format to the specified folder.
  • The information acquisition unit 11 registers the camera name and shooting time contained in the file name of the moving image file. This makes it possible to search for moving image files by specifying the camera name or shooting time. Searching based on video files allows image search without real-time tracking; since no real-time tracking processing is required, video files can be imported at high speed.
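As an illustration, registering the camera name and shooting time parsed from a file name might look like the following Python sketch. The patent does not specify the actual file-name format, so the pattern `<camera name>_<YYYYMMDDHHMMSS>.mp4` and the function name are assumptions for illustration only.

```python
import re
from datetime import datetime

# Hypothetical file-name convention: "<camera name>_<YYYYMMDDHHMMSS>.mp4".
# The patent only states that the file name has a specified format
# containing the camera name and shooting time.
FILENAME_PATTERN = re.compile(r"^(?P<camera>.+)_(?P<ts>\d{14})\.mp4$")

def parse_video_filename(filename):
    """Extract the camera name and shooting time from a moving image file name."""
    m = FILENAME_PATTERN.match(filename)
    if m is None:
        raise ValueError(f"unrecognized file name: {filename}")
    shot_at = datetime.strptime(m.group("ts"), "%Y%m%d%H%M%S")
    return m.group("camera"), shot_at

camera, shot_at = parse_video_filename("entrance1_20210801093015.mp4")
```

The parsed pair can then be stored as searchable attributes of the imported file, enabling lookup by camera name or shooting time without any real-time tracking.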
  • the image search process based on the moving image file is substantially the same as the image search process based on the video data from the surveillance cameras C1 to Cn, except that the real-time tracking process is not required.
  • the image detection unit 12 comprehensively analyzes the video analysis result and the video data from the surveillance cameras C1 to Cn to detect the tracked person image.
  • Based on the image feature amount of a query given in advance (the feature amount of a tracked person image), the video analysis engines VE1 to VEn extract, from the plurality of image frames included in the video data from the surveillance cameras C1 to Cn, person images (tracked person images) whose image feature amounts are similar to that of the query. For example, when a plurality of queries are given in advance, a plurality of person images having image feature amounts similar to those of the respective queries are extracted.
  • The video analysis engines VE1 to VEn output a video analysis result including information indicating the degree of similarity between each extracted person image and the query image, the camera IDs of the surveillance cameras C1 to Cn, the in-angle-of-view tracking ID, and the shooting time (date, hour, minute, and second).
  • The person image includes a face image and a whole-body image,
  • the similarity information includes the similarity corresponding to each of the face image and the whole body image.
  • the camera ID is identification information unique to the surveillance camera.
  • The in-angle-of-view tracking ID is an ID for tracking images regarded as the same person within the same surveillance camera.
  • the image detection unit 12 registers the detected image information including the detected tracked person image (detected image) in the database of the camera information table 31.
  • The image detection unit 12 detects an enormous number of tracked person images from the plurality of frames included in each video data stream from the surveillance cameras C1 to Cn, and stores an enormous amount of detected image information in the camera information table 31.
  • the detected image information includes the detected image and the attribute information.
  • The attribute information includes a camera ID, an in-angle-of-view tracking ID, a person-likeness score, an image type (face image or whole-body image), a shooting date and time, and the like.
  • In the database of the camera information table 31, not only tracked person images but also various other person images are registered as search targets.
  • the search condition setting unit 13 sets the search conditions to be read from the setting information table 32.
  • the search condition includes a period (start date and time and end date and time). Further, the search condition may include a camera ID, or may include a search target image (face image, whole body image, or face and whole body image).
  • The image search unit 14 searches for target person images according to the search condition from the database in which detected image information, including the detected images detected from the plurality of frames included in the video data from the surveillance cameras C1 to Cn, is registered.
  • the image search unit 14 searches for the detected image detected in the designated period as a target person image (image of a plurality of people).
  • the search result includes a plurality of images of each person.
  • the image search unit 14 searches for an image similar to the image of a certain person among the detected images detected in the designated period as a target person image (image of a certain person).
  • the search results include multiple images of a person.
  • The image selection unit 15 selects all target person images found by the image search unit 14 when the first output designation (for example, filter function OFF) is set in the setting information table 32. When the second output designation (for example, filter function ON) is set in the setting information table 32, the image selection unit 15 selects, from among all the target person images found by the image search unit 14, a first predetermined number of target person images that satisfy the output condition and are regarded as the same person. The first predetermined number is an arbitrarily set number of one or more.
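The behavior of the selection unit can be sketched in Python as follows. Each search result is assumed to be a dict with hypothetical keys `camera_id`, `tracking_id` (the in-angle-of-view tracking ID), and `person_score` (the person-likeness score); the patent does not prescribe a data format, and ranking same-person groups by person-likeness score follows the examples given later in the description.

```python
from collections import defaultdict

def select_images(results, filter_on, first_predetermined_number,
                  score_key="person_score"):
    """Filter OFF (first output designation): return every searched image.
    Filter ON (second output designation): for each group of images regarded
    as the same person, keep only the top images by person-likeness score."""
    if not filter_on:
        return list(results)
    groups = defaultdict(list)
    for r in results:
        # Images with the same camera ID and the same in-angle-of-view
        # tracking ID are regarded as the same person.
        groups[(r["camera_id"], r["tracking_id"])].append(r)
    selected = []
    for imgs in groups.values():
        imgs.sort(key=lambda r: r[score_key], reverse=True)
        selected.extend(imgs[:first_predetermined_number])
    return selected
```

With the filter off, the many near-duplicate detections are all displayed; with the filter on, each same-person group is collapsed to at most the first predetermined number of images.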
  • The output unit 16 outputs the target person images selected by the image selection unit 15.
  • the output unit 16 outputs a target person image to be displayed on the monitor device MT.
  • When the first output designation is set in the setting information table 32, all the target person images found by the image search unit 14 are displayed on the monitor device MT. When the second output designation is set in the setting information table 32, a first predetermined number of target person images that satisfy the output condition and are regarded as the same person, from among all the target person images found by the image search unit 14, are displayed on the monitor device MT.
  • The above description takes as an example the case where the tables 31 and 32 of the data storage unit 3 are provided in the Web server device SV, but the present invention is not limited to this; they may instead be provided in a database server or file server arranged outside the Web server device SV. In that case, the Web server device SV accesses the tables 31 and 32 in the database server or file server and acquires the information necessary for each process.
  • FIG. 4 is a flowchart showing an example of a search process by the system according to the embodiment of the present invention.
  • Surveillance cameras C1 to Cn start shooting and output video data (ST1).
  • The video analysis engines VE1 to VEn each analyze the video data from the corresponding surveillance cameras C1 to Cn (ST2).
  • The video analysis engines VE1 to VEn each perform in-angle-of-view tracking on the plurality of image frames included in the video data output from the corresponding surveillance cameras C1 to Cn, and determine the same person across those frames based on position information within each image frame.
  • The video analysis engines VE1 to VEn output the video data and the same-person determination results.
  • The communication I/F 5 of the Web server device SV receives the video data and the same-person determination results from the video analysis engines VE1 to VEn.
  • The information acquisition unit 11 acquires the video data and the same-person determination results from the video analysis engines VE1 to VEn (ST3).
  • The image detection unit 12 comprehensively analyzes the video data and the same-person determinations from the video analysis engines VE1 to VEn, and detects person images from the plurality of frames included in the video data (ST4). The number of frames is typically enormous, and so is the number of detected person images.
  • the image detection unit 12 registers the detected image information including the detected person image in the database of the camera information table 31 (ST5).
  • the search condition setting unit 13 reads the search condition from the setting information table 32 according to the instruction from the administrator terminal OT, and sets the search condition in the image search unit 14.
  • the search condition includes a start date and time D1 and an end date and time D2.
  • The search condition setting unit 13 also reads the first or second output designation from the setting information table 32 according to the instruction from the administrator terminal OT, and sets it in the image selection unit 15.
  • the image search unit 14 searches the database registered in the camera information table 31 for the target person image according to the set search condition (ST6).
  • the image search unit 14 refers to the attribute information included in the detected image information registered in the database, and searches for the target person image included in the period of the start date / time D1 and the end date / time D2.
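As a minimal sketch, the period search over the registered attribute information could look like this in Python. The record layout, with a `shot_at` field holding the shooting date and time, is a hypothetical stand-in for the attribute information described above.

```python
from datetime import datetime

def search_target_images(database, start, end):
    """Return detected-image records whose shooting date/time falls within
    the period from start date/time D1 to end date/time D2, inclusive,
    by referring to the attribute information of each record."""
    return [rec for rec in database if start <= rec["shot_at"] <= end]
```

In practice this filter would run as an indexed database query rather than a linear scan, and could be combined with the other search conditions (camera ID, image type).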
  • the image selection unit 15 selects all or part of the target person images searched by the image search unit 14 based on the first or second output designation (ST7).
  • the output unit 16 outputs the target person image selected by the image selection unit 15 to the monitoring device MT via the input / output I / F4 or the like (ST8).
  • FIG. 5 is a flowchart showing an example of a target image selection process in the search process by the system according to the embodiment of the present invention.
  • FIG. 5 is a flowchart illustrating the details of ST7 shown in FIG. 4.
  • Based on the first output designation, the image selection unit 15 selects all the target person images found by the image search unit 14 (ST702).
  • Based on the second output designation, the image selection unit 15 selects, from all the target person images found by the image search unit 14, a first predetermined number of target person images that satisfy the output condition and are regarded as the same person (ST704).
  • For example, when the first predetermined number is one, the image selection unit 15 selects, based on the second output designation, one target person image that satisfies the output condition and is regarded as the same person, and the output unit 16 outputs the selected target person image. In this case, the target person image having the highest person-likeness score satisfies the output condition.
  • Similarly, when the first predetermined number is two, the image selection unit 15 selects, based on the second output designation, two target person images that satisfy the output condition and are regarded as the same person, and the output unit 16 outputs the two selected target person images. In this case, among the face images, the two target person images having the highest person-likeness scores satisfy the output condition.
  • When all the target person images found by the image search unit 14 are selected based on the first output designation, the output unit 16 outputs all the selected target person images to the monitor device MT. When a first predetermined number of target person images that satisfy the output condition and are regarded as the same person are selected based on the second output designation, the output unit 16 outputs the selected first predetermined number of target person images to the monitor device MT.
  • Based on the second output designation, the image selection unit 15 may select, for the video data from one camera, target person images regarded as the same person, up to the first predetermined number.
  • In this case, the output unit 16 outputs, for each camera, up to the first predetermined number of target person images regarded as the same person, and the monitor device MT displays the output target person images.
  • Alternatively, the image selection unit 15 may select, for a plurality of frames spanning a continuous predetermined time included in the video data from one camera, target person images regarded as the same person, with the first predetermined number as the upper limit.
  • In this case, the output unit 16 outputs up to the first predetermined number of target person images regarded as the same person, and the monitor device MT displays them. This makes it possible to display up to the first predetermined number of images of the same person for each predetermined time interval, which offers excellent visibility and enables detailed confirmation.
  • FIG. 6 is a flowchart showing an example of a target image selection process for each tracking ID within the angle of view in the search process by the system according to the embodiment of the present invention.
  • FIG. 6 is a flowchart illustrating the details of ST7 shown in FIG. 4.
  • the image selection unit 15 receives the search result from the image search unit 14 (ST711) and reads the search result file (ST712).
  • the search result file includes the target person image information
  • the target person image information includes the target person image and the attribute information.
  • The attribute information includes a camera ID, an in-angle-of-view tracking ID, a person-likeness score, an image type (face image or whole-body image), a shooting date and time, and the like.
  • The image selection unit 15 sorts the target person images by in-angle-of-view tracking ID (ST713). For example, the image selection unit 15 sorts the target person images with the date and time as the first priority and the person-likeness score as the second priority.
  • the image selection unit 15 sorts the images in descending order of similarity.
  • Alternatively, the image selection unit 15 sorts images whose face-image similarity exceeds a threshold in descending order of face-image similarity, and sorts images whose face-image similarity does not exceed the threshold in descending order of whole-body-image similarity.
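This two-tier sort can be sketched in Python as follows, with hypothetical `face_sim` and `body_sim` fields standing in for the face-image and whole-body-image similarities (the actual field names are not specified by the patent):

```python
def sort_results(results, face_threshold):
    """Place images whose face-image similarity exceeds the threshold first,
    in descending order of face similarity; the remaining images follow in
    descending order of whole-body-image similarity."""
    above = [r for r in results if r["face_sim"] > face_threshold]
    below = [r for r in results if r["face_sim"] <= face_threshold]
    above.sort(key=lambda r: r["face_sim"], reverse=True)
    below.sort(key=lambda r: r["body_sim"], reverse=True)
    return above + below
```

This keeps reliable face matches at the top of the list while still ranking images with poor face visibility by their whole-body similarity.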
  • If the in-angle-of-view tracking ID filter flag is False (filter function OFF) (ST714, NO), the image selection unit 15 selects all the searched target person images. If the flag is True (filter function ON) (ST714, YES), the image selection unit 15 selects a subset of the searched target person images (ST715 to ST720).
  • The image selection unit 15 examines each in-angle-of-view tracking ID in order, and registers target person images in the display list for each in-angle-of-view tracking ID.
  • the output unit 16 outputs a display list, and the monitoring device MT displays a target person image based on the display list.
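A minimal sketch of building the display list, capping each in-angle-of-view tracking ID at the first predetermined number, could look like this (the dict keys are hypothetical, and the input is assumed to be already sorted as in ST713):

```python
def build_display_list(sorted_results, first_predetermined_number):
    """Walk the sorted search results in order and register at most the
    first predetermined number of images per (camera ID, in-angle-of-view
    tracking ID) pair in the display list."""
    counts = {}
    display_list = []
    for r in sorted_results:
        key = (r["camera_id"], r["tracking_id"])
        n = counts.get(key, 0)
        if n < first_predetermined_number:
            display_list.append(r)
            counts[key] = n + 1
    return display_list
```

Because the input is pre-sorted, the images kept for each tracking ID are the highest-priority ones, and the monitor device MT can display the list directly.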
  • According to the present embodiment, it is possible to provide a system, an apparatus, a method, and a program that improve the visibility of detected images of the same person.
  • Images of the same person may be included in a plurality of frames of the video data from one camera, and may also be included in frames of the video data from a plurality of cameras.
  • the Web server device SV displays a large number of images of the same person by the setting of the first output designation, and displays a small number of narrowed-down images of the same person by the setting of the second output designation.
  • If the images displayed under the second output designation do not match the purpose, the setting may be changed to the first output designation. For example, with the first output designation, a person can be confirmed in detail from many images; with the second output designation, a person can be confirmed from a small number of images in a short time. Further, the second output designation shortens the time required for display processing.
  • the program according to the present embodiment may be transferred in a state of being stored in an electronic device, may be transferred in a state of being stored in a storage medium, or may be transferred by downloading via a network or the like.
  • the recording medium is a non-temporary computer-readable storage medium such as a magnetic disk, an optical disk, or a flash memory.
  • the present invention is not limited to the above embodiment, and can be variously modified at the implementation stage without departing from the gist thereof.
  • each embodiment may be carried out in combination as appropriate, in which case the combined effect can be obtained.
  • The above-described embodiment encompasses various inventions, and various inventions can be extracted by combinations selected from the plurality of disclosed constituent elements. For example, even if some constituent elements are deleted from all the constituent elements shown in the embodiment, as long as the problem can be solved and the effects are obtained, the configuration from which those constituent elements have been deleted can be extracted as an invention.

Landscapes

  • Engineering & Computer Science (AREA)
  • Multimedia (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Signal Processing (AREA)
  • General Engineering & Computer Science (AREA)
  • Databases & Information Systems (AREA)
  • Data Mining & Analysis (AREA)
  • Computational Linguistics (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Image Analysis (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
  • Closed-Circuit Television Systems (AREA)

Abstract

Provided is a technique for efficiently outputting detected images of the same person. An information retrieval device according to one aspect of the present invention includes: a retrieval unit that retrieves a subject image matching a retrieval condition from a database in which detected images, detected from a plurality of frames included in video data from at least one camera, are registered; a selection unit that, under a first output designation, selects all subject images retrieved by the retrieval unit and, under a second output designation, selects a first predetermined number of subject images, among all subject images retrieved by the retrieval unit, that satisfy an output condition and are considered to be of the same person; and an output unit that outputs the subject images selected by the selection unit.

Description

Information retrieval device, information retrieval method, and program
 An embodiment of the present invention relates to an information retrieval device, an information retrieval method, and a program for, for example, analyzing video data from a surveillance camera and searching for images of a target person.
 In recent years, cameras have been installed in various places as part of crime prevention measures. A camera captures its monitored area and outputs video data. An information processing device such as a general-purpose personal computer receives the video data from the camera, stores it in a storage unit, analyzes it, and detects images of a target person. The information processing device also displays the detected images of the target person on a monitor or the like.
 Video data contains a large number of frames, and the processing load of extracting feature amounts for the faces contained in these frames is heavy; a technique for realizing high-speed processing with a low-cost device has therefore been proposed (see, for example, Patent Document 1).
Japanese Patent No. 6568476
 Because the information processing device detects the target person from a plurality of frames included in the stored video data, many similar images of the target person may be output as the detection result.
 This tendency becomes stronger when the target person stays in the same place for a long time. Moreover, outputting many images increases the load of the display processing, which may take a long time.
 When many images of the same person are output in this way, images of other target persons may be buried among them.
 The present invention has been made in view of the above circumstances and aims to provide a technique for improving the visibility of detected images of the same person.
 To solve the above problem, an information retrieval device according to one aspect of the present invention includes: a retrieval unit that retrieves, from a database in which detected images detected from a plurality of frames included in video data from one or more cameras are registered, target person images matching a search condition; a selection unit that selects all target person images retrieved by the retrieval unit when a first output designation is set, and selects, when a second output designation is set, a first predetermined number of target person images that satisfy an output condition and are regarded as the same person among all target person images retrieved by the retrieval unit; and an output unit that outputs the target person images selected by the selection unit.
 According to one aspect of the present invention, a technique for improving the visibility of detected images of the same person can be provided.
FIG. 1 is a diagram showing an example of the configuration of a monitoring system including a monitoring information processing apparatus according to an embodiment of the present invention.
FIG. 2 is a block diagram showing an example of the hardware configuration of a Web server device used as the monitoring information processing apparatus according to an embodiment of the present invention.
FIG. 3 is a block diagram showing an example of the software configuration of the Web server device used as the monitoring information processing apparatus according to an embodiment of the present invention.
FIG. 4 is a flowchart showing an example of search processing by the system according to an embodiment of the present invention.
FIG. 5 is a flowchart showing an example of the target image selection processing within the search processing by the system according to an embodiment of the present invention.
FIG. 6 is a flowchart showing an example of the selection processing of target images for each in-angle-of-view tracking ID within the search processing by the system according to an embodiment of the present invention.
 Hereinafter, embodiments of the present invention will be described with reference to the drawings.
 [One Embodiment]
 (Configuration example)
 (1) System
 FIG. 1 is a diagram showing the overall configuration of a system including a monitoring information processing apparatus according to an embodiment of the present invention.
 For example, a plurality of surveillance cameras C1 to Cn are distributed across the aisles and sales floors of a large-scale store such as a shopping mall or a department store. The surveillance cameras C1 to Cn are attached to, for example, a ceiling or a wall surface, capture images of their respective surveillance areas, and output the video data.
 For example, video analysis engines VE1 to VEn are attached to the surveillance cameras C1 to Cn, respectively. The video analysis engines VE1 to VEn correspond to a video analysis unit, which analyzes the video data from the surveillance cameras C1 to Cn. For example, each of the video analysis engines VE1 to VEn performs in-angle-of-view tracking on a plurality of image frames included in the video data output from the corresponding surveillance camera, and determines images of the same person across the plurality of image frames based on position information within the frames and the like.
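The in-angle-of-view tracking described above can be sketched as follows. The document does not prescribe an association algorithm, so this is a minimal illustrative sketch assuming a simple nearest-centroid association between consecutive frames; the function name, box format, and distance threshold are all hypothetical.

```python
# Hypothetical sketch: assign an in-angle-of-view tracking ID to each
# detection by associating it with the nearest track from the previous
# frame (bounding-box centroid distance). Illustration only; the actual
# embodiment does not specify the association method.

def centroid(box):
    # box = (x, y, w, h) in pixels
    x, y, w, h = box
    return (x + w / 2.0, y + h / 2.0)

def assign_track_ids(frames, max_dist=50.0):
    """frames: list of per-frame lists of boxes.
    Returns a parallel structure of tracking IDs."""
    next_id = 0
    prev = []        # (track_id, centroid) pairs from the previous frame
    all_ids = []
    for boxes in frames:
        ids, used, cur = [], set(), []
        for box in boxes:
            cx, cy = centroid(box)
            best, best_d = None, max_dist
            for tid, (px, py) in prev:
                if tid in used:
                    continue
                d = ((cx - px) ** 2 + (cy - py) ** 2) ** 0.5
                if d < best_d:
                    best, best_d = tid, d
            if best is None:        # no nearby track: start a new one
                best = next_id
                next_id += 1
            used.add(best)
            ids.append(best)
            cur.append((best, (cx, cy)))
        all_ids.append(ids)
        prev = cur
    return all_ids
```

A detection that moves only slightly between frames keeps its ID (same person), while a detection far from every existing track starts a new ID.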
 Note that the video analysis engines VE1 to VEn need not be arranged one-to-one with the surveillance cameras C1 to Cn; a smaller number of video analysis engines may be arranged for the plurality of cameras so that the video data of multiple surveillance cameras is processed collectively.
 The system of one embodiment also includes a Web server device SV used as the monitoring information processing apparatus. The video analysis engines VE1 to VEn can communicate data with the Web server device SV via a network NW, and transmit the generated video analysis results to the Web server device SV via the network NW. The network NW is, for example, a wired LAN (Local Area Network) or a wireless LAN, but any other network may be used.
 Alternatively, the Web server device SV may include the video analysis engines VE1 to VEn or a single video analysis engine, which receives the respective video data from the surveillance cameras C1 to Cn via the network NW and analyzes the received video data.
 (2) Web server device SV
 FIGS. 2 and 3 are block diagrams showing examples of the hardware configuration and software configuration of the Web server device SV, respectively.
 The Web server device SV includes a control unit 1 having a hardware processor such as a central processing unit (CPU). A storage unit having a program storage unit 2 and a data storage unit 3, an input/output interface (input/output I/F) 4, and a communication interface (communication I/F) 5 are connected to the control unit 1 via a bus 6.
 For example, a monitor device MT and an administrator terminal OT are connected to the input/output I/F 4. The monitor device MT is used by an observer to visually monitor the surveillance area, and displays the video from the surveillance cameras C1 to Cn, information representing the detection or tracking results for a query to be monitored, and the like.
 The administrator terminal OT is used by a system administrator for system management, maintenance, and the like. It displays various setting screens and information representing the operating state of the system, and has a function of accepting various data required for system management and operation entered by the system administrator and setting the data in the Web server device SV.
 Under the control of the control unit 1, the communication I/F 5 transmits data to and from the video analysis engines VE1 to VEn using the communication protocol defined by the network NW, and is configured by an interface compatible with, for example, a wired LAN or a wireless LAN.
 The program storage unit 2 is configured by combining, as storage media, a non-volatile memory that can be written and read at any time, such as an HDD (Hard Disk Drive) or an SSD (Solid State Drive), with a non-volatile memory such as a ROM (Read Only Memory). In addition to middleware such as an OS (Operating System), it stores the programs necessary for executing the various control processes according to an embodiment of the present invention.
 The data storage unit 3 is configured by combining, as storage media, a non-volatile memory that can be written and read at any time, such as an HDD or an SSD, with a volatile memory such as a RAM (Random Access Memory). As the main storage units necessary for implementing an embodiment of the present invention, it includes a camera information table 31 and a setting information table 32.
 The camera information table 31 stores, for each of the surveillance cameras C1 to Cn, information representing, for example, the name, performance, and installation position of the surveillance camera in association with its identification information (hereinafter referred to as camera ID). The information representing performance includes, for example, the resolution and aspect ratio. The information indicating the installation position includes, for example, the latitude and longitude, imaging direction, and imaging angle. The camera information table 31 also stores a database, in which tracking target person images (detected images) detected from the video data from the surveillance cameras C1 to Cn are registered.
 The setting information table 32 stores the image feature amounts of queries. For example, the setting information table 32 stores the image feature amount of a query input from the administrator terminal OT via the input/output I/F 4. Alternatively, the setting information table 32 stores the image feature amount of a query detected from the video data transmitted from the surveillance cameras C1 to Cn via the communication I/F 5. The setting information table 32 also stores alert determination conditions input via the administrator terminal OT or the like, for example a first or second alert determination condition.
 Here, examples of query image registration are supplemented. For example, based on an alert obtained in real time, the administrator presses a tracking button on the administrator terminal OT for the person (image) to be tracked; in response, the control unit 1 automatically registers the latest set of detected images (a face image and a whole-body image) as the query image (the query's image feature amount) and starts tracking. Alternatively, based on an alert obtained in real time, the administrator presses a history button on the administrator terminal OT for the person (image) to be tracked; in response, the control unit 1 registers an arbitrary image selected from the history list as the query image and starts tracking. The control unit 1 may also perform a person search on the surveillance camera images according to a history search from the administrator, register the image the administrator selects from the person search results as the query image, and start tracking. The administrator may also select a person (image) included in surveillance image data obtained in real time, in which case the control unit 1 registers the selected person as the query image and starts tracking. Furthermore, the administrator may import an image provided by a requester from the administrator terminal OT, register it as the query image, and start tracking.
 As processing functions according to an embodiment of the present invention, the control unit 1 includes an information acquisition unit 11, an image detection unit 12, a search condition setting unit 13, an image search unit 14, an image selection unit (filter unit) 15, and an output unit 16. Each unit is realized by causing the hardware processor of the control unit 1 to execute a program stored in the program storage unit 2.
 The information acquisition unit 11 acquires video data, video analysis results, and the like from the video analysis engines VE1 to VEn connected to the surveillance cameras C1 to Cn or provided in the Web server device SV. For example, each of the video analysis engines VE1 to VEn determines the same person from a plurality of image frames included in the video data output from the corresponding surveillance camera based on position information within the frames and the like, and outputs a video analysis result including the determination result.
 The information acquisition unit 11 can also acquire not only the video data from the surveillance cameras C1 to Cn but also video files input via the input/output I/F 4 or the communication I/F 5. The information acquisition unit 11 transfers a video file whose file name follows a specified format to a specified folder. For example, the information acquisition unit 11 registers the camera name and shooting time contained in the file name of the video file. This makes it possible to search for video files by specifying a camera name or shooting time. A search based on video files allows image search without requiring real-time tracking processing, so video files can be imported at high speed. The image search processing based on video files is substantially the same as the image search processing based on the video data from the surveillance cameras C1 to Cn, except that real-time tracking processing is not required.
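The registration of a camera name and shooting time from a video file name might be sketched as follows. The actual file-name format is not given in the document, so the format `<camera-name>_<YYYYMMDDhhmmss>.mp4` below is purely a hypothetical assumption for illustration.

```python
# Hypothetical sketch: parse the camera name and shooting time that the
# information acquisition unit registers from a video file name.
# Assumed format (not specified in the document): "<camera>_<YYYYMMDDhhmmss>.mp4"
import re
from datetime import datetime

FILENAME_RE = re.compile(r"^(?P<camera>.+)_(?P<ts>\d{14})\.mp4$")

def parse_video_filename(name):
    m = FILENAME_RE.match(name)
    if m is None:
        raise ValueError(f"unexpected file name format: {name}")
    shot_at = datetime.strptime(m.group("ts"), "%Y%m%d%H%M%S")
    return m.group("camera"), shot_at
```

Once parsed, the pair can be indexed so that video files are searchable by camera name or shooting time, as the text describes.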
 The image detection unit 12 analyzes the video analysis results and the video data from the surveillance cameras C1 to Cn in an integrated manner and detects tracking target person images. Based on, for example, the image feature amount of a query given in advance (the feature amount of the tracking target person image), the video analysis engines VE1 to VEn extract, from the plurality of image frames included in the video data from the surveillance cameras C1 to Cn, person images (tracking target person images) having image feature amounts similar to the query's image feature amount. For example, when a plurality of queries are given in advance, a plurality of person images having image feature amounts similar to those of the plurality of queries are extracted.
 The video analysis engines VE1 to VEn also generate video analysis results including information representing the similarity between the extracted person image and the query image, the camera ID of the surveillance camera, the in-angle-of-view tracking ID, and the shooting time (date, hour, minute, and second). A person image includes a face image and a whole-body image, and the similarity information includes the similarity corresponding to each of them. The camera ID is identification information unique to a surveillance camera. The in-angle-of-view tracking ID is an ID for tracking images regarded as the same person within the same surveillance camera.
 The image detection unit 12 registers detected image information including the detected tracking target person images (detected images) in the database of the camera information table 31. For example, the image detection unit 12 detects an enormous number of tracking target person images from the plurality of frames included in the respective video data from the surveillance cameras C1 to Cn, and registers the corresponding detected image information in the database of the camera information table 31. The detected image information includes the detected image and attribute information. The attribute information includes the camera ID, the in-angle-of-view tracking ID, a person-likeness score, the image type (face image or whole-body image), the shooting date and time, and the like. In the database of the camera information table 31, not only tracking target person images but also various other person images are registered as search targets.
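The detected image information described above (detected image plus attribute information) could be modeled as in the following minimal sketch. The document names the attributes but not a concrete schema, so all field names and the in-memory "database" are illustrative assumptions.

```python
# Minimal sketch of a detected-image record: the detected image plus its
# attribute information (camera ID, in-angle-of-view tracking ID,
# person-likeness score, image type, shooting date and time).
# Field names are assumptions; the document does not define a schema.
from dataclasses import dataclass
from datetime import datetime

@dataclass
class DetectionRecord:
    camera_id: str
    track_id: int          # in-angle-of-view tracking ID
    person_score: float    # person-likeness score
    image_type: str        # "face" or "body"
    captured_at: datetime  # shooting date and time
    image: bytes = b""     # detected image data (placeholder)

database: list = []        # stand-in for the camera information table's DB

def register(record: DetectionRecord) -> None:
    """Register one piece of detected image information in the database."""
    database.append(record)
```

In the embodiment this registration happens for an enormous number of detections per camera; a real deployment would use a persistent store rather than a Python list.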
 The search condition setting unit 13 sets search conditions read from the setting information table 32. A search condition includes a period (start date/time and end date/time). A search condition may also include a camera ID, or a search target image type (face image, whole-body image, or face and whole-body images).
 The image search unit 14 searches for target person images matching the search condition in the database in which detected image information, including detected images detected from the plurality of frames included in the video data from the surveillance cameras C1 to Cn, is registered. For example, the image search unit 14 retrieves the detected images detected in a specified period as target person images (images of a plurality of persons); the search result then contains a plurality of images of each person. Alternatively, the image search unit 14 retrieves, among the detected images detected in the specified period, those similar to an image of a certain person as target person images (images of that person); the search result then contains a plurality of images of that person.
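The period-based search performed by the image search unit can be sketched as a simple filter over the registered records. This is illustrative only; the record fields (`captured_at`, `camera_id`, `image_type`) are assumed names, and a real system would issue a database query instead of scanning a list.

```python
# Hedged sketch of the search unit: return all registered records whose
# shooting time falls within the period [d1, d2] given by the search
# condition, optionally narrowed by camera ID and image type.
from datetime import datetime

def search_by_period(records, d1, d2, camera_id=None, image_type=None):
    hits = []
    for r in records:
        if not (d1 <= r["captured_at"] <= d2):
            continue
        if camera_id is not None and r["camera_id"] != camera_id:
            continue
        if image_type is not None and r["image_type"] != image_type:
            continue
        hits.append(r)
    return hits
```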
 The image selection unit 15 selects all target person images retrieved by the image search unit 14 when the first output designation (for example, filter function OFF) is set in the setting information table 32. When the second output designation (for example, filter function ON) is set in the setting information table 32, the image selection unit 15 selects, among all target person images retrieved by the image search unit 14, a first predetermined number of target person images that satisfy the output condition and are regarded as the same person. The first predetermined number is an arbitrarily set number of one or more. For example, to narrow down the displayed images regarded as the same person, the first predetermined number is set to 1 or a number close to it; to display relatively many images regarded as the same person, it is set to 3 or more.
 The output unit 16 outputs the target person images selected by the image selection unit 15. For example, the output unit 16 outputs target person images for display on the monitor device MT. When the first output designation is set in the setting information table 32, all target person images retrieved by the image search unit 14 are displayed on the monitor device MT; when the second output designation is set, the first predetermined number of target person images that satisfy the output condition and are regarded as the same person among all retrieved target person images are displayed on the monitor device MT.
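The two selection modes can be sketched as follows. This is a minimal sketch under stated assumptions: same-person grouping is approximated by the in-angle-of-view tracking ID, the output condition is taken to be the person-likeness score, and the field names are illustrative.

```python
# Hedged sketch of the selection unit's two modes:
#  - first output designation (filter OFF): select every retrieved image;
#  - second output designation (filter ON): per same-person group (here,
#    per tracking ID), keep only the first predetermined number of images
#    with the highest person-likeness scores (the assumed output condition).

def select_images(results, filter_on, first_predetermined_number=1):
    if not filter_on:                       # first output designation
        return list(results)
    groups = {}                             # second output designation
    for r in results:
        groups.setdefault(r["track_id"], []).append(r)
    selected = []
    for imgs in groups.values():
        imgs.sort(key=lambda r: r["person_score"], reverse=True)
        selected.extend(imgs[:first_predetermined_number])
    return selected
```

With the filter ON and the predetermined number set to 1, each person contributes exactly one image, which matches the narrowed-down display the embodiment describes.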
 In the above description, the tables 31 and 32 of the data storage unit 3 are provided in the Web server device SV as an example. However, they may instead be provided in a database server or file server arranged outside the Web server device SV. In that case, the Web server device SV performs each process by accessing the tables 31 and 32 in the database server or file server and acquiring the necessary information.
 (Operation example)
 Next, an operation example of the system configured as described above will be described.
 FIG. 4 is a flowchart showing an example of search processing by the system according to an embodiment of the present invention.
 The surveillance cameras C1 to Cn start shooting and output video data (ST1). The video analysis engines VE1 to VEn analyze the video data from the corresponding surveillance cameras C1 to Cn (ST2). For example, each of the video analysis engines VE1 to VEn performs in-angle-of-view tracking on the plurality of image frames included in the video data output from the corresponding surveillance camera, and determines the same person across the plurality of image frames based on position information within the frames and the like. The video analysis engines VE1 to VEn output the video data and the same-person determinations.
 Webサーバ装置SVの通信I/F5は、映像解析エンジンVE1~VEnから映像データ及び同一人物判定を受信する。情報取得部11は、映像解析エンジンVE1~VEnからの映像データ及び同一人物判定を取得する(ST3)。画像検出部12は、映像解析エンジンVE1~VEnからの映像データと同一人物判定とを統合的に解析し、映像解析エンジンVE1~VEnからの映像データに含まれる複数のフレームから人物画像を検出する(ST4)。例えば、フレームの数は膨大であり、検出される人物画像も膨大になる。画像検出部12は、検出された人物画像を含む検出画像情報をカメラ情報テーブル31のデータベースへ登録する(ST5)。 The communication I / F5 of the Web server device SV receives the video data and the same person determination from the video analysis engines VE1 to VEn. The information acquisition unit 11 acquires video data from the video analysis engines VE1 to VEn and determination of the same person (ST3). The image detection unit 12 comprehensively analyzes the video data from the video analysis engines VE1 to VEn and the same person determination, and detects the person image from a plurality of frames included in the video data from the video analysis engines VE1 to VEn. (ST4). For example, the number of frames is enormous, and the number of detected human images is enormous. The image detection unit 12 registers the detected image information including the detected person image in the database of the camera information table 31 (ST5).
 検索条件設定部13は、管理者端末OTからの指示に従い設定情報テーブル32から検索条件を読み出し、画像検索部14に対して検索条件を設定する。例えば、検索条件は、開始日時D1及び終了日時D2を含む。また、検索条件設定部13は、管理者端末OTからの指示に従い設定情報テーブル32から第1又は第2の出力指定を読み出し、画像選択部15に対して第1又は第2の出力指定を設定する。 The search condition setting unit 13 reads the search condition from the setting information table 32 according to the instruction from the administrator terminal OT, and sets the search condition in the image search unit 14. For example, the search condition includes a start date and time D1 and an end date and time D2. Further, the search condition setting unit 13 reads the first or second output designation from the setting information table 32 according to the instruction from the administrator terminal OT, and sets the first or second output designation to the image selection unit 15. do.
 画像検索部14は、カメラ情報テーブル31に登録されたデータベースから、設定された検索条件に応じた対象者画像を検索する(ST6)。例えば、画像検索部14は、データベースに登録された検出画像情報に含まれる属性情報を参照し、開始日時D1及び終了日時D2の期間に含まれる対象者画像を検索する。 The image search unit 14 searches the database registered in the camera information table 31 for the target person image according to the set search condition (ST6). For example, the image search unit 14 refers to the attribute information included in the detected image information registered in the database, and searches for the target person image included in the period of the start date / time D1 and the end date / time D2.
 画像選択部15は、第1又は第2の出力指定に基づき画像検索部14により検索された全部又は一部の対象者画像を選択する(ST7)。出力部16は、入出力I/F4等を介して、モニタ装置MTに対して、画像選択部15により選択される対象者画像を出力する(ST8)。 The image selection unit 15 selects all or part of the target person images searched by the image search unit 14 based on the first or second output designation (ST7). The output unit 16 outputs the target person image selected by the image selection unit 15 to the monitoring device MT via the input / output I / F4 or the like (ST8).
 FIG. 5 is a flowchart showing an example of the target image selection process within the search process performed by the system according to an embodiment of the invention. FIG. 5 details ST7 shown in FIG. 4.
 When the first output designation is set (filter function OFF) (ST701, YES), the image selection unit 15 selects all the target person images retrieved by the image search unit 14, based on the first output designation (ST702).
 When the second output designation is set (filter function ON) (ST701, NO) (ST703), the image selection unit 15 selects, from among all the target person images retrieved by the image search unit 14, a first predetermined number of target person images that satisfy the output condition and are regarded as the same person, based on the second output designation (ST704).
 For example, if the first predetermined number entered via the administrator terminal OT or the like is "1" and the output condition is "person-likeness score", the image selection unit 15 selects, based on the second output designation, one target person image that satisfies the output condition and is regarded as the same person, and the output unit 16 outputs the selected image. In this case, the target person image with the highest person-likeness score satisfies the output condition.
 If the first predetermined number entered via the administrator terminal OT or the like is "2" and the output conditions are "person-likeness score" and "face image", the image selection unit 15 selects, based on the second output designation, two target person images that satisfy the output conditions and are regarded as the same person, and the output unit 16 outputs the two selected images. In this case, the top two face images with the highest person-likeness scores satisfy the output conditions.
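 The top-N selection under the second output designation can be sketched as follows. This is a minimal illustration under assumed names; the dictionary keys `type` and `score` are placeholders, not identifiers from the specification.

```python
def select_top_n(images, n, image_type=None):
    """Pick the n images with the highest person-likeness score,
    optionally restricted to one image type (e.g. 'face')."""
    candidates = [img for img in images
                  if image_type is None or img["type"] == image_type]
    # Descending order of person-likeness score
    candidates.sort(key=lambda img: img["score"], reverse=True)
    return candidates[:n]

# Images already judged to show the same person
same_person = [
    {"type": "face", "score": 0.91},
    {"type": "face", "score": 0.78},
    {"type": "body", "score": 0.95},
    {"type": "face", "score": 0.84},
]
print(select_top_n(same_person, 1))          # single highest-scoring image
print(select_top_n(same_person, 2, "face"))  # top two face images
```

With n = 1 and no type restriction this reproduces the first example above; with n = 2 and the type restricted to face images it reproduces the second.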
 When all the target person images retrieved by the image search unit 14 are selected based on the first output designation, the output unit 16 outputs all the selected target person images to the monitor device MT. When the first predetermined number of target person images that satisfy the output condition and are regarded as the same person are selected based on the second output designation, the output unit 16 outputs the selected first predetermined number of target person images to the monitor device MT.
 Alternatively, based on the second output designation, the image selection unit 15 may select, from the video data of a single camera, target person images regarded as the same person, either exactly the first predetermined number or up to the first predetermined number. In this case, the output unit 16 outputs, for each camera, the target person images up to that number, and the monitor device MT displays the output target person images.
 Alternatively, based on the second output designation, the image selection unit 15 may select, for a plurality of frames covering a continuous predetermined time span in the video data of a single camera, target person images regarded as the same person, either exactly the first predetermined number or up to the first predetermined number. In this case, the output unit 16 outputs those target person images, and the monitor device MT displays them. The same person can thus be displayed up to the first predetermined number of times per predetermined time span, which provides good visibility while still allowing detailed confirmation.
 FIG. 6 is a flowchart showing an example of the process, within the search process performed by the system according to an embodiment of the invention, of selecting a target image for each tracking ID within the angle of view. FIG. 6 details ST7 shown in FIG. 4.
 The image selection unit 15 receives the search result from the image search unit 14 (ST711) and reads the search result file (ST712). For example, the search result file contains target person image information, which includes the target person images and their attribute information. The attribute information includes a camera ID, a tracking ID within the angle of view, a person-likeness score, an image type (face image or whole-body image), a shooting date and time, and the like.
 The image selection unit 15 sorts the target person images by tracking ID within the angle of view (ST713). For example, the image selection unit 15 sorts the target person images using the date and time as the first priority and the person-likeness score as the second priority. When only a face image or only a whole-body image is specified, the image selection unit 15 sorts the images in descending order of similarity. When both a face image and a whole-body image are specified, the image selection unit 15 sorts images whose face-image similarity exceeds a threshold in descending order of face-image similarity, and sorts images whose face-image similarity does not exceed the threshold in descending order of whole-body-image similarity.
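 The two-tier ordering of ST713 for the case where both image types are specified can be sketched as follows; the key names `face_sim` and `body_sim` are assumptions introduced for this illustration.

```python
def sort_for_display(images, face_threshold):
    """Images whose face similarity exceeds the threshold come first,
    ordered by face similarity; the remainder follow, ordered by
    whole-body similarity (both descending)."""
    above = [i for i in images if i["face_sim"] > face_threshold]
    below = [i for i in images if i["face_sim"] <= face_threshold]
    above.sort(key=lambda i: i["face_sim"], reverse=True)
    below.sort(key=lambda i: i["body_sim"], reverse=True)
    return above + below

imgs = [
    {"id": "a", "face_sim": 0.3, "body_sim": 0.9},
    {"id": "b", "face_sim": 0.8, "body_sim": 0.2},
    {"id": "c", "face_sim": 0.9, "body_sim": 0.1},
    {"id": "d", "face_sim": 0.1, "body_sim": 0.7},
]
print([i["id"] for i in sort_for_display(imgs, 0.5)])  # ['c', 'b', 'a', 'd']
```

A reliable face match is thus always ranked above a body-only match, which matters because face similarity is generally the stronger identity signal.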
 If the within-angle-of-view tracking ID filter flag is not True (filter function OFF) (ST714, NO), the image selection unit 15 selects all the retrieved target person images. If the flag is True (filter function ON) (ST714, YES), the image selection unit 15 selects a subset of the retrieved target person images (ST715 to ST720).
 The image selection unit 15 examines each tracking ID within the angle of view in turn. The image selection unit 15 sets N = 1 (ST715) and acquires the target person image of the N-th tracking ID within the angle of view (ST716). If no target person image with the same tracking ID exists in the display list (ST717, NO), the image selection unit 15 adds the acquired target person image to the display list (ST718). If the last tracking ID within the angle of view has not yet been checked (ST719, NO), the image selection unit 15 sets N = N + 1 (ST720) and repeats ST716 to ST719.
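 The ST715 to ST720 loop amounts to keeping the first image seen for each tracking ID. A minimal sketch, assuming the input list is already sorted as in ST713 so the first occurrence per ID is the best one (`track_id` and `score` are placeholder names):

```python
def build_display_list(sorted_images):
    """Keep only the first image encountered for each tracking ID.
    Because the input is pre-sorted, this is the best image per ID."""
    display_list = []
    seen_ids = set()
    for img in sorted_images:                 # ST716: next tracking ID's image
        if img["track_id"] not in seen_ids:   # ST717: not yet in display list?
            display_list.append(img)          # ST718: add to display list
            seen_ids.add(img["track_id"])
    return display_list

# Search results already sorted per ST713
results = [
    {"track_id": 7, "score": 0.9},
    {"track_id": 7, "score": 0.6},
    {"track_id": 3, "score": 0.8},
    {"track_id": 3, "score": 0.5},
]
display = build_display_list(results)
print([d["track_id"] for d in display])  # [7, 3]
```

The result is exactly one representative image per person tracked within the angle of view, which is what the display list described next contains.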
 As a result, one target person image per tracking ID within the angle of view is registered in the display list. The output unit 16 outputs the display list, and the monitor device MT displays the target person images based on the display list.
 According to the present embodiment, a system, an apparatus, a method, and a program can be provided that improve the visibility of detected images of the same person. When one camera captures the same person for a long time, many frames of that camera's video data contain images of the person; likewise, when multiple cameras capture the same person, many frames of their video data contain images of the person. The Web server device SV displays many images of the same person when the first output designation is set, and a narrowed-down small number of images of the same person when the second output designation is set. If the images displayed under the second output designation do not match the operator's purpose, the setting can simply be changed to the first output designation. For example, the first output designation allows a person to be examined in detail across many images, while the second output designation allows a person to be confirmed quickly from a few images. The second output designation also shortens the time required for display processing.
 The program according to the present embodiment may be distributed while stored in an electronic device, distributed while stored in a storage medium, or distributed by download via a network or the like. The recording medium is a non-transitory computer-readable storage medium such as a magnetic disk, an optical disk, or a flash memory.
 The present invention is not limited to the above embodiment and can be modified in various ways at the implementation stage without departing from its gist. The embodiments may also be combined as appropriate, in which case combined effects are obtained. Furthermore, the above embodiment encompasses various inventions, and various inventions can be extracted through combinations selected from the disclosed constituent elements. For example, even if some constituent elements are removed from all those shown in the embodiment, the resulting configuration can be extracted as an invention, provided the problem can still be solved and the effects still obtained.
1 ... Control unit
2 ... Program storage unit
3 ... Data storage unit
4 ... Input/output interface (input/output I/F)
5 ... Communication interface (communication I/F)
6 ... Bus
11 ... Information acquisition unit
12 ... Image detection unit
13 ... Search condition setting unit
14 ... Image search unit
15 ... Image selection unit
16 ... Output unit
31 ... Camera information table
32 ... Setting information table
C1, C2, Cn ... Surveillance cameras
MT ... Monitor device
NW ... Network
OT ... Administrator terminal
SV ... Server device
VE1, VEn ... Video analysis engines

Claims (7)

  1.  An information retrieval device comprising:
     a search unit that searches a database, in which detected images detected from a plurality of frames included in video data from one or more cameras are registered, for target person images matching a search condition;
     a selection unit that selects all the target person images retrieved by the search unit based on a first output designation, and selects, based on a second output designation, a first predetermined number of target person images that satisfy an output condition and are regarded as the same person from among all the target person images retrieved by the search unit; and
     an output unit that outputs the target person images selected by the selection unit.
  2.  The information retrieval device according to claim 1, wherein the search unit searches a database, in which detected images detected from a plurality of frames included in video data from a plurality of cameras are registered, for target person images matching the search condition.
  3.  The information retrieval device according to claim 2, wherein the selection unit selects, based on the second output designation, the first predetermined number of target person images regarded as the same person from the video data of a single camera.
  4.  The information retrieval device according to claim 2, wherein the selection unit selects, based on the second output designation, the first predetermined number of target person images regarded as the same person from a plurality of frames covering a continuous predetermined time span from a single camera.
  5.  The information retrieval device according to claim 1, wherein the selection unit selects, based on the second output designation, the first predetermined number of target person images in descending order of person-likeness score.
  6.  An information retrieval method comprising:
     searching a database, in which detected images detected from a plurality of frames included in video data from one or more cameras are registered, for target person images matching a search condition for a target person;
     selecting all the retrieved target person images based on a first output designation, and selecting, based on a second output designation, a first predetermined number of target person images that satisfy an output condition and are regarded as the same person from among all the retrieved target person images; and
     outputting the selected target person images.
  7.  A program for causing a processor to execute the processing performed by each unit included in the information retrieval device according to any one of claims 1 to 6.
PCT/JP2021/028964 2020-08-07 2021-08-04 Information retrieval device, information retrieval method, and program WO2022030549A1 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
JP2020135168A JP2022030865A (en) 2020-08-07 2020-08-07 Information retrieval device, information retrieval method, and program
JP2020-135168 2020-08-07

Publications (1)

Publication Number Publication Date
WO2022030549A1 true WO2022030549A1 (en) 2022-02-10

Family

ID=80117521

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/JP2021/028964 WO2022030549A1 (en) 2020-08-07 2021-08-04 Information retrieval device, information retrieval method, and program

Country Status (2)

Country Link
JP (1) JP2022030865A (en)
WO (1) WO2022030549A1 (en)

Citations (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2018163398A1 (en) * 2017-03-10 2018-09-13 株式会社日立国際電気 Similar image search system

Patent Citations (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2018163398A1 (en) * 2017-03-10 2018-09-13 株式会社日立国際電気 Similar image search system

Also Published As

Publication number Publication date
JP2022030865A (en) 2022-02-18

Similar Documents

Publication Publication Date Title
US11210504B2 (en) Emotion detection enabled video redaction
US9560323B2 (en) Method and system for metadata extraction from master-slave cameras tracking system
KR101417548B1 (en) Method and system for generating and labeling events in photo collections
CN111131902B (en) Method for determining target object information and video playing equipment
TWI747341B (en) File application method, device and storage medium
JP2022518459A (en) Information processing methods and devices, storage media
KR101777238B1 (en) Method and system for image trend detection and curation of image
JP2008165701A (en) Image processing device, electronics equipment, image processing method, and program
JP2022022298A (en) Investigation support system, investigation support method, and computer program
CN106844654A (en) Towards the massive video distributed search method of police service practical
JP2015228564A (en) Monitoring camera system
WO2022030549A1 (en) Information retrieval device, information retrieval method, and program
JP2017049733A (en) Case Evidence Management System and Case Evidence Management Method
JP2007213183A (en) Device, method, and program for classifying digital image data
US10599928B2 (en) Method and system for enabling information in augmented reality applications
JP5523122B2 (en) Image management apparatus, method and program
JP6341843B2 (en) Image search apparatus and image search system
JP7497853B2 (en) Face Detection System
JP7389955B2 (en) Information processing device, information processing method and program
JP7235612B2 (en) Person search system and person search method
US20110085696A1 (en) Image data management apparatus, method and program
WO2022030546A1 (en) Information processing device, information processing method, and program
JP2020078030A (en) System, information processing device, information processing method, and program
US20230396741A1 (en) Computer-implemented method, computer program and apparatus for video processing and for generating a thumbnail from a video sequence, and video surveillance system comprising such an apparatus
JP2019212068A (en) Information processing apparatus, information processing method, and program

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 21852173

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 21852173

Country of ref document: EP

Kind code of ref document: A1