CN113474769A - Image retrieval device and supervised data extraction method - Google Patents

Image retrieval device and supervised data extraction method

Info

Publication number
CN113474769A
Authority
CN
China
Prior art keywords
image
search
feature
feature amount
classification item
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202080015638.6A
Other languages
Chinese (zh)
Inventor
三井留以
小味弘典
五十岚跃一
筱本将央
关村贤司
菊池博幸
泷直人
村井泰裕
德田洋介
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Hitachi Industry and Control Solutions Co Ltd
Original Assignee
Hitachi Industry and Control Solutions Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Hitachi Industry and Control Solutions Co., Ltd.
Publication of CN113474769A
Legal status: Pending

Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F 16/00: Information retrieval; Database structures therefor; File system structures therefor
    • G06F 16/50: Information retrieval of still image data
    • G06N: COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 20/00: Machine learning
    • G06T: IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T 7/00: Image analysis

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • General Engineering & Computer Science (AREA)
  • Software Systems (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Databases & Information Systems (AREA)
  • Artificial Intelligence (AREA)
  • Medical Informatics (AREA)
  • Evolutionary Computation (AREA)
  • Computing Systems (AREA)
  • Mathematical Physics (AREA)
  • Image Analysis (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The invention reduces the cost of collecting supervised data and facilitates additional learning. A person detection unit (112) detects a person from an image, and a feature extraction unit (113) extracts feature amounts of the person and stores them in a feature quantity database (130). An image search unit (115) searches for images, stores them in a search result database (150), and transmits them to a terminal (320) as search results. A classification item registration unit (117) acquires the classification items (such as match and mismatch) assigned to the images of the search results and stores them in the search result database (150). A supervised data extraction unit (118) extracts supervised data from the search result database (150) based on the correlation between the feature amounts and the classification items.

Description

Image retrieval device and supervised data extraction method
Technical Field
The present invention relates to an image search device and a supervised data extraction method for performing image search using machine learning techniques.
Background
In recent years, in technical fields such as image recognition and image classification, the use of machine learning techniques typified by deep learning has advanced. In image recognition and image classification by machine learning, instead of a developer designing an algorithm and programming it as in the past, the machine learning model itself learns and classifies based on input data. In detail, when pairs of image data and their correct classification results (correct labels) are input, the machine learning model adjusts its internal parameters so that it can accurately output the classification result for input image data. The pairs of input image data and their classification results are called supervised data (learning data), and machine learning using supervised data is called supervised machine learning.
In supervised machine learning, a large amount of supervised data is put into a machine learning model to perform learning, thereby improving the accuracy of the machine learning model. Generally, the more the supervised data used for learning, the higher the accuracy of the machine learning model, and the more accurate classification results can be obtained.
However, supervised machine learning requires a large amount of supervised data, so the preparation work, that is, collecting image data and associating each collected image with its correct classification result (correct label), becomes enormous and costly. Furthermore, even a machine learning model that has been put into actual use rarely achieves 100% accuracy, so additional learning to improve accuracy is desirable. However, collecting supervised data for additional learning and assigning correct labels during daily system operation increases system downtime and raises maintenance costs.
In the additional learning of patent document 1, distance information is obtained between a first feature vector, which is the feature vector of image data to which a previously prepared correct label (classification result) has been assigned, and a second feature vector, which is the feature vector of an unlabeled image generated by dividing that correctly labeled image. Based on the distance information, unlabeled image data to be presented to the user is selected, labeled by the user, and used as a second correctly labeled image for additional learning.
Thus, a new learning image whose content differs from that of the original first correctly labeled image can be generated from the first correctly labeled image. For example, the unlabeled image most dissimilar to the original first correctly labeled image can be selected as the image to be presented to the user. It is described that this increases the rate of accuracy improvement of the classification model per learning image and improves learning efficiency.
Documents of the prior art
Patent document
Patent Document 1: Japanese Patent Laid-Open Publication No. 2013-125322
Disclosure of Invention
Problems to be solved by the invention
The targets of additional learning vary with the user's environment. Patent document 1 does not mention a technique for efficiently collecting the necessary data from an image database in which a large number of images are accumulated so that appropriate machine learning can be performed for the operating environment of each machine learning model. In addition, even if images for learning can be collected, they still must be classified (assigned correct labels), so the cost problem is not solved.
The present invention has been made in view of such a background, and an object thereof is to provide an image search device and a supervised data extraction method that can reduce the cost of collecting and creating supervised data and facilitate additional learning.
Means for solving the problems
In order to solve the above problem, an image search device according to the present invention includes: a feature extraction unit that extracts a feature amount from an acquired image using a machine learning model; an image search unit that searches for images using the feature amount and outputs a search result; a classification item acquisition unit that acquires, for each image of the search result, a classification item indicating the classification result assigned to it; and a supervised data extraction unit that extracts images to serve as supervised data for additional learning of the machine learning model, based on the correlation between the feature amount and the classification item.
Effects of the invention
According to the present invention, it is possible to provide an image search device and a supervised data extraction method that can reduce the cost of collecting and creating supervised data and facilitate additional learning.
Drawings
Fig. 1 is a diagram showing an overall configuration of an image search system including an image search device according to the present embodiment.
Fig. 2 is a configuration diagram of an image search screen displayed on the terminal of the image search device according to the present embodiment.
Fig. 3 is a block diagram of the functional blocks of the image search device according to the present embodiment.
Fig. 4 is a diagram for explaining the operations of the image acquisition unit, the person detection unit, and the feature extraction unit according to the present embodiment.
Fig. 5 is a diagram for explaining a data structure of the feature quantity table included in the feature quantity database according to the present embodiment.
Fig. 6 is a diagram showing a data structure of a search condition table included in the search result database according to the present embodiment.
Fig. 7 is a diagram showing a data structure of a search result table included in the search result database according to the present embodiment.
Fig. 8 is a diagram showing a data configuration of the supervision data extraction condition table according to the present embodiment.
Fig. 9 is a graph for explaining, for extraction target data, the classification items, the thresholds A and B, and the necessity of additional learning according to the present embodiment.
Fig. 10 is a diagram showing a data configuration of the supervised data extraction result table according to the present embodiment.
Fig. 11 is a flowchart of the supervisory data extraction process executed by the supervisory data extraction unit according to the present embodiment.
Fig. 12 is a configuration diagram of a supervisory data extraction condition setting screen displayed on a terminal of an image search device according to a modification of the present embodiment.
Fig. 13 is a graph showing the correlation between the feature amount and the classification item according to the modification of the present embodiment.
Fig. 14 is a graph showing the correlation between the feature amount and the classification item according to the modification of the present embodiment.
Detailed Description
Hereinafter, an image search device according to an embodiment for carrying out the present invention will be described. Specifically, the description covers an image search device in a system that searches images captured by cameras installed in facilities such as shopping malls and office buildings for images of persons meeting predetermined conditions. The image search device extracts, from the stored image data, image data to serve as supervised data for additional learning. The present system assumes uses such as searching for a child lost in the facility, or searching for a person involved in a reported problem in the facility based on information obtained from the reporter, but it is not limited to these uses and can be applied to a wide range of applications.
Overall Structure of image search System
Fig. 1 is a diagram showing the overall configuration of an image search system 10 including an image search device 100 according to the present embodiment. The image search system 10 includes: image search device 100, camera 310, network 330 for transferring the image captured by camera 310 to image search device 100, additional learning device 300, and terminal 320.
Camera 310 is installed in a facility and transmits captured video to the image search device 100 via the network 330. The terminal 320 is a terminal used by a person (hereinafter also referred to as a user) who monitors people in the facility using the image search system 10. The operation method and display screen of the terminal 320 will be described with reference to fig. 2 described later. The image search device 100 extracts, from the video of the camera 310, persons matching the characteristics (search condition) of the person entered from the terminal 320, and outputs them to the terminal 320.
Further, the person who monitors the inside of the facility using the image search system 10 is referred to as the user, and the person who maintains the machine learning model 114 (see fig. 3 described later) of the image search device 100 and manages additional learning is referred to as the administrator. The administrator may also use the terminal 320.
The additional learning device 300 executes additional learning for generating a new machine learning model 420 that replaces the machine learning model 114 used for extracting the feature of the person. The additional learning device 300 acquires additional learning data 410 (supervised data) from the image search device 100, and executes additional learning to generate a new machine learning model 420.
Image search screen of image search device
Fig. 2 is a configuration diagram of an image search screen 500 displayed on the terminal 320 of the image search device 100 according to the present embodiment. In the present embodiment, the user operates the image search apparatus 100 using a Web browser. The image search screen 500 includes a search setting area 510, a search condition area 520, and a search result area 530.
The user sets the characteristics (search condition) of the person to be searched for (the search target person) in the search condition area 520. Specifically, the color of the person's head (head color) is selected from the list box 521. In the present embodiment, the head color is selected from black, gray, and white. Similarly, the color of the upper-body clothing and the color of the lower-body clothing are each selected from black, gray, and blue. When the condition clear button 522 is pressed, the selected head color, upper-body clothing color, and lower-body clothing color are all cleared (deselected). When the search button 523 is pressed, images of persons meeting the set conditions are searched for and displayed in the search result area 530.
The search identification information 511 in the search setting area 510 is identification information assigned to the search condition and the search result, and is automatically assigned by the image search apparatus 100. When the save button 512 is pressed, the search condition set in the search condition area 520 and the search result displayed in the search result area 530 are saved in the image search device 100. When the user inputs the search identification information 511 and presses the load button 513, the search condition stored in association with the search identification information 511 is displayed in the search condition area 520, and the search result is displayed in the search result area 530.
The search result area 530 is an area in which the search results 531 are displayed in descending order of the degree of matching with the search condition (also referred to as average degree of similarity). In fig. 2, 6 search results are displayed. Each search result 531 is composed of 3 areas, i.e., an image confirmation area 532, an average similarity bar 533, and a classification item setting area 534.
A sample image including a person image is displayed in the image confirmation area 532. The image identification information assigned by the image retrieval apparatus 100 is displayed in the sample image. In addition, the region in which the person is detected is displayed surrounded by a rectangle. The image confirmation area 532 may include the shooting time, the camera to be shot, and identification information of the area where the camera is installed.
The average similarity bar 533 displays, in the form of a bar graph, the average of the similarities (the feature amounts described later) for the head color, the upper-body clothing color, and the lower-body clothing color. The further the bar (shaded rectangle) extends to the right (the larger its area), the higher the average similarity; the left end represents 0 and the right end represents 100.
The classification item setting area 534 is an area in which the user enters the result (hereinafter also referred to as a classification item) of judging whether the person surrounded by a rectangle in an image (hereinafter also referred to as the person of the image, or simply the image) is the person being searched for. If the user judges that the image shows the search target person, the user selects match. If the user judges that it does not, the user selects mismatch. If the user withholds judgment, the user selects hold. If none of match, hold, or mismatch is selected, the result remains unclassified.
Assumed uses of hold include cases where no judgment can be made because the image is unclear, cases where it cannot be determined whether the person is the search target, cases where judgment is deferred until visual confirmation, and cases where the user marks the result so as not to forget that a judgment is still pending.
In the present embodiment the terms match and mismatch are used, but other terms with the same meaning of matching or not matching the search target may be used, such as target person / non-target person, confirmation required / confirmation not required, or attention required / attention not required.
The display filter 536 sets whether search results classified as match, hold, mismatch, or unclassified are displayed in the search result area 530. In fig. 2, all four are selected, so all search results are displayed in the search result area 530. After classifying images into match, hold, mismatch, or unclassified, the user can select through the display filter which classification items are displayed. The user can thus display only the images to be confirmed, reducing the number of displayed search results and making it easier to compare persons across images in detail, make judgments, and identify the person being searched for.
The image identification information of the search result at the upper left of the search result area 530 is "I3483"; as the average similarity bar indicates, the average similarity is about 90%, and the user has selected match. The image in the image confirmation area contains 2 persons, but the person found by the search is the one surrounded by the rectangle on the right.
A scroll bar 535 is provided on the right side of the search result area 530. The user can view search results 531 that are not currently displayed by operating the knob of the scroll bar 535 or the arrows above and below it.
Overall configuration of the image search device
Fig. 3 is a block diagram of the functional blocks of the image search device 100 according to the present embodiment. The image search device 100 is a computer and includes a CPU (Central Processing Unit) that operates as functional units such as the feature extraction unit 113 and the image search unit 115, a storage unit (a hard disk, an SSD (Solid State Drive), or the like) that stores the databases, temporary processing data, and the program that causes the CPU to operate as the functional units, and a communication unit, none of which are shown.
Image acquisition unit, person detection unit, and feature extraction unit
The image acquisition unit 111 acquires the video captured by the camera 310 and outputs each frame image of the video to the human detection unit 112.
The person detection unit 112 detects a person in a frame image (also simply referred to as an image or image data), and stores the detected region, the imaging time, the identification information of the camera that captured it, and image identification information in an image database (referred to as an image DB (database) in fig. 3) 120. The person detection unit 112 outputs the image of the region in which the person was detected to the feature extraction unit 113 together with the image identification information.
Conventional person detection techniques include Fast R-CNN and the like, and US9858496 (B2) and other documents describe person detection algorithms using DNNs (Deep Neural Networks).
The feature extraction unit 113 extracts a plurality of feature amounts from the image of the person's region using the machine learning model 114, and stores them in the feature quantity database 130 (see fig. 5 described later). A DNN that extracts a plurality of feature amounts is called a multi-label DNN; for example, Japanese Patent Application Publication No. 2018-503161 discloses a DNN technique that analyzes a plurality of feature amounts and outputs their detection results. The machine learning model 114 is not limited to a DNN and may be a machine learning model based on another machine learning technique such as an SVM (Support Vector Machine).
Fig. 4 is a diagram for explaining the operations of the image acquisition unit 111, the person detection unit 112, and the feature extraction unit 113 according to the present embodiment. Referring to fig. 4, the data output from the image acquisition unit 111, the human detection unit 112, and the feature extraction unit 113 will be described in a supplementary manner.
The image 431 is a frame image of the video image output by the image acquisition unit 111, and includes 3 persons. The images 432 to 434 are images of the regions in which the human figures are detected in the image 431, which are output from the human figure detection unit 112, and are images obtained by cutting out 3 human figures included in the image 431.
The feature data 435 is the data output by the feature extraction unit 113, obtained by analyzing each of the images 432 to 434 with the machine learning model 114. The machine learning model 114 is a multi-label DNN that extracts a plurality of feature amounts. The feature amounts form a 9-dimensional vector: the degree to which the head color is black, gray, or white, the degree to which the upper-body clothing is black, gray, or blue, and the degree to which the lower-body clothing is black, gray, or blue. Each feature amount is normalized to a maximum of 100 and a minimum of 0 before being output. For example, the degree (feature amount) to which the head color of the image 432, whose image identification information is "I0014", is black is 80.
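As a rough illustration of the data flow just described, the following Python sketch (not the patent's implementation; the label names, the model interface, and the assumption that the model returns nine scores in the range 0 to 1 are all hypothetical) shows how the nine feature amounts could be produced and normalized to the 0-100 range:

    # Minimal sketch, assuming the multi-label model returns nine scores in [0, 1]
    # for a cropped person image; label names are illustrative only.
    import numpy as np

    FEATURE_LABELS = [
        "head_black", "head_gray", "head_white",
        "upper_black", "upper_gray", "upper_blue",
        "lower_black", "lower_gray", "lower_blue",
    ]

    def extract_features(model, person_image):
        """Return the nine feature amounts, each normalized to the range 0-100."""
        raw = np.asarray(model.predict(person_image), dtype=float)  # shape (9,)
        scores = np.clip(raw, 0.0, 1.0) * 100.0                     # normalize to 0-100
        return dict(zip(FEATURE_LABELS, scores.round().astype(int).tolist()))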
Feature quantity database
Fig. 5 is a diagram for explaining a data structure of the feature quantity table 131 included in the feature quantity database 130 according to the present embodiment. The feature amount database 130 is constituted by 1 or more feature amount tables 131. The feature table 131 includes feature table identification information 132, a machine learning model version 133, and 1 or more image records. The feature table identification information 132 is identification information of the feature table 131. The machine learning model version 133 is a version of the machine learning model 114 when the feature amount included in the feature amount table 131 is calculated.
The image record includes attributes of image identification information (described as an image ID in fig. 5) 134, a feature quantity 135 of a degree that the head color is black, a feature quantity 136 of a degree that the head color is gray, a feature quantity 137 of a degree that the head color is white, a feature quantity 138 of a degree that the clothes of the upper body are black, a feature quantity 139 of a degree that the clothes of the upper body are gray, a feature quantity 140 of a degree that the clothes of the upper body are blue, a feature quantity 141 of a degree that the clothes of the lower body are black, a feature quantity 142 of a degree that the clothes of the lower body are gray, and a feature quantity 143 of a degree that the clothes of the lower body are blue.
There is one feature quantity table 131 for each machine learning model 114. That is, when the machine learning model 114 is updated to the new machine learning model 420 (see fig. 1), a new feature quantity table is started. The feature quantity table identification information 132 and the machine learning model version 133 therefore correspond one-to-one. As long as the feature extraction unit 113 extracts feature amounts using the same machine learning model 114, image records are added to the same feature quantity table 131.
The feature quantity table 131 may also be divided per facility area, per camera, or per day. In that case, the feature quantity table identification information 132 and the machine learning model version 133 correspond N-to-1.
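The relationship between feature quantity tables and model versions described above can be sketched roughly as follows (a hypothetical in-memory structure, not the patent's storage format):

    # Minimal sketch: one feature quantity table per machine learning model version;
    # image records accumulate in the table belonging to the model that produced them.
    from dataclasses import dataclass, field
    from typing import Dict, List

    @dataclass
    class ImageRecord:
        image_id: str              # image identification information 134
        features: Dict[str, int]   # the nine feature amounts 135-143, each 0-100

    @dataclass
    class FeatureTable:
        table_id: str              # feature quantity table identification information 132
        model_version: str         # machine learning model version 133
        records: List[ImageRecord] = field(default_factory=list)

    # When the model is replaced by a new version, a new FeatureTable is started,
    # so table_id and model_version correspond one-to-one (or N-to-1 if the table
    # is further divided per area, camera, or day).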
The image retrieval unit: action at search time
The explanation returns to fig. 3. The Web server 116 receives an instruction from the user of the terminal 320 or transmits the instructed processing result to the terminal 320. When the user sets a search condition in the search condition area 520 (see fig. 2) and presses the search button 523 to instruct a search, the Web server 116 receives the search condition and outputs the search condition to the image search unit 115. The image search unit 115 stores the search condition in a search condition table 151 (see fig. 6 described later) of the search result database 150.
Next, the image search unit 115 searches the image records of the feature quantity table 131 for records in which the average value (average similarity) of the feature amounts specified in the search condition is equal to or greater than a predetermined value, and stores the results in the search result table 164 (see fig. 7 described later) of the search result database 150. The image search unit 115 then sorts the search results in descending order of average similarity, acquires the image data corresponding to each sorted record from the image database 120, and outputs the image data, the average similarity, and the display data of the classification item setting area 534 to the Web server 116 as the search result for the search condition. The Web server 116 transmits the search result to the terminal 320, and the Web browser of the terminal 320 displays it in the search result area 530 (see fig. 2).
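A simplified sketch of this search step (assumed helper code, not the patent's implementation; the record layout and the threshold value are illustrative) could look as follows:

    # Minimal sketch: rank image records by the average of the feature amounts
    # named in the search condition and keep those at or above a threshold.
    def search_images(records, condition_features, min_average=50.0):
        """records: iterable of dicts like
        {"image_id": "I0014", "features": {"head_gray": 20, "upper_blue": 30, ...}}.
        Returns (image_id, average_similarity) pairs in descending order."""
        results = []
        for rec in records:
            avg = sum(rec["features"][name] for name in condition_features) / len(condition_features)
            if avg >= min_average:
                results.append((rec["image_id"], avg))
        results.sort(key=lambda r: r[1], reverse=True)  # highest average similarity first
        return results

    # Example: head color gray, upper-body clothing blue, lower-body clothing black.
    # hits = search_images(records, ["head_gray", "upper_blue", "lower_black"])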
Search result database: search Condition Table
Fig. 6 is a diagram showing a data structure of the search condition table 151 included in the search result database 150 according to the present embodiment. The search result database 150 includes a search condition table 151 and a search result table 164 (see fig. 7 described later).
The search condition table 151 is, for example, data in a table format, and 1 record (row) represents 1 search condition, and includes: the attributes of the search identification information (described as search ID in fig. 6) 152, flags 153 to 155 indicating whether the head color includes black/gray/white in the search condition, flags 156 to 158 indicating whether the color of the upper body clothing includes black/gray/blue in the search condition, flags 159 to 161 indicating whether the color of the lower body clothing includes black/gray/blue in the search condition, feature-scale identification information 162, and a Machine Learning Model version 163 (described as MLM (Machine Learning Model) version in fig. 6).
Flags 153 to 161 are "1" if they are included in the search condition, and "0" if they are not included. For example, the record of the search identification information 152 of "S018" indicates the search condition in which the head color is gray, the upper body clothing color is blue, and the lower body clothing color is black. The feature-quantity-table identification information 162 and the machine learning model version 163 correspond to the feature-quantity-table identification information 132 and the machine learning model version 133 of the feature quantity table 131 (see fig. 5), respectively, and show the feature quantity table 131 to be searched and the version of the machine learning model 114 at the time of calculating the feature quantity.
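As an illustration of how such a flag-based condition record might be represented (the field names and the table/version values are assumptions, not part of the patent), the example record "S018" described above could be encoded like this:

    # Minimal sketch: the nine inclusion flags of search condition "S018"
    # (head color gray, upper-body clothing blue, lower-body clothing black).
    SEARCH_CONDITION_S018 = {
        "search_id": "S018",
        "head_black": 0, "head_gray": 1, "head_white": 0,
        "upper_black": 0, "upper_gray": 0, "upper_blue": 1,
        "lower_black": 1, "lower_gray": 0, "lower_blue": 0,
        "feature_table_id": "T01",   # hypothetical value
        "mlm_version": "1",          # hypothetical value
    }
    # The feature amounts whose flags are 1 are the ones averaged during a search.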
Search result database: search results Table
Fig. 7 is a diagram showing a data structure of the search result table 164 included in the search result database 150 according to the present embodiment. The search result table 164 is, for example, data in a table format, and is composed of search result records as search results. The search result record includes the search identification information 165 corresponding to the search identification information 152 of the search condition table 151, the image identification information 166 corresponding to the image identification information 134 of the feature quantity table 131, the feature quantities 167 to 175 corresponding to the feature quantities 135 to 143 of the feature quantity table 131, the average similarity 176, the classification item 177, and the attribute of the machine learning model version 178 corresponding to the machine learning model version 163 of the search condition table 151.
The classification item 177 is an attribute storing the classification item (see the classification item setting area 534 of fig. 2) set by the user for each search result: "1" if the classification item is match, "2" if it is hold, "3" if it is mismatch, and "0" if it is unclassified. In the initial state, before the user sets a classification item, the classification item 177 is "0" because no classification item has been set.
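The four classification item codes can be summarized in a small enumeration (the names are illustrative; the numeric codes follow the description above):

    # Minimal sketch of the classification item codes stored in attribute 177.
    from enum import IntEnum

    class ClassificationItem(IntEnum):
        UNCLASSIFIED = 0  # initial state, no judgment entered yet
        MATCH = 1         # judged to be the person being searched for
        HOLD = 2          # judgment withheld
        MISMATCH = 3      # judged not to be the person being searched for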
In the search result table 164, a search result record in which the image identification information 166 is "I0014" will be described. The search identification information 165 of the search result record is "S018", and indicates the result of the search by the search condition, which is the record in which the search identification information 152 is "S018" in the search condition table 151. The feature quantities 167 to 175 correspond to the feature quantities 135 to 143 of the records in which the image identification information 134 in the feature quantity table 131 is "I0014", respectively.
The search condition whose search identification information 152 is "S018" specifies that the head color is gray, the upper-body clothing color is blue, and the lower-body clothing color is black. The corresponding feature amounts of this record are 20 and 30, and the average similarity 176 is 27.
Classification item registration unit: action when saving search results
The explanation returns to fig. 3. When the user enters classification items in the classification item setting area 534 (see fig. 2) and presses the save button 512, the Web browser of the terminal 320 transmits the classification item (match, hold, mismatch, or unclassified) for each search result to the Web server 116, and the Web server 116 outputs them to the classification item registration unit 117. The classification item registration unit 117 stores the classification items, which are the user's judgment results for the search results, in the classification item 177 of the search result table 164 of the search result database 150.
The image retrieval unit: action when search result is loaded
When the user inputs search identification information 511 (see fig. 2) and presses a load button 513, the Web browser of the terminal 320 transmits the search identification information to the Web server 116, and the Web server 116 outputs the search identification information to the image search unit 115. The image search unit 115 acquires a record of a search condition in which the search identification information 152 matches the search identification information output by the Web server 116, among the records in the search condition table 151 in the search result database 150.
Further, the image search unit 115 acquires, among the search records in the search result table 164 in the search result database 150, a search result record in which the search identification information 165 matches the search identification information output by the Web server 116. The image search unit 115 outputs the record of the search condition and the record of the search result to the Web server 116, and the Web server 116 transmits the records to the terminal 320. The Web browser of the terminal 320 displays the received search condition record in the search condition area 520 (see fig. 2), and displays the search result record in the search result area 530.
Supervised data extraction result database
Before describing the operation of the supervised data extraction unit 118, the supervised data extraction result database 180 will be described. The supervised data extraction result database 180 includes a supervised data extraction condition table 181 (see fig. 8 described later) and a supervised data extraction result table 191 (see fig. 10 described later).
Supervised data extraction result database: supervised data extraction condition table
Fig. 8 is a diagram showing the data configuration of the supervised data extraction condition table 181 according to the present embodiment. The supervised data extraction condition table 181 is, for example, data in table format; 1 record (row) represents 1 extraction condition and includes extraction condition identification information 182, a feature amount 183, a correct/incorrect flag 184, a threshold A185, a threshold B186, an additional-learning necessity 187, a collected data count 188, and a machine learning model version 189. An extraction condition is the search condition used when searching the search result table 164 (see fig. 7) for the search result records corresponding to image data. The machine learning model version 189 indicates the machine learning model version 178 of the search result records of the search result table 164 to be extracted.
The extraction condition identification information 182 is identification information of an extraction condition.
The feature amount 183 indicates the feature amount serving as the key of the extraction condition, and is any one of the feature amounts for the degree to which the head color is black/gray/white, the upper-body clothing is black/gray/blue, or the lower-body clothing is black/gray/blue.
The correct/incorrect flag 184 takes the value correct, incorrect, or both. Correct means that search results whose classification item 177 is set to match are extracted as correct data, incorrect means that search results whose classification item 177 is set to mismatch are extracted as incorrect data, and both means that both correct and incorrect data are extracted.
The additional-learning necessity 187 indicates whether to extract data that requires additional learning, data that does not require additional learning, or both. Additional learning is required when the user's judgment (classification item) is match but the value of the feature amount indicated by the feature amount 183 of the extraction target data is at or below the threshold B186, or when the user's judgment is mismatch but that feature amount is at or above the threshold A185. Additional learning is not required when the judgment is match and the feature amount is at or above the threshold A185, or when the judgment is mismatch and the feature amount is at or below the threshold B186. When the feature amount of the extraction target data lies between the threshold A185 and the threshold B186, or the classification item is hold or unclassified, the data is neither required nor not required.
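The necessity judgment described above can be sketched as a small function (assumed names; the default thresholds use the values of the example extraction condition "SC01" described below):

    # Minimal sketch: additional learning is required when the feature amount
    # contradicts the user's judgment (inverse correlation), not required when
    # they agree, and neither when the data falls between the thresholds or the
    # classification item is hold/unclassified.
    def needs_additional_learning(feature_value, classification,
                                  threshold_a=80, threshold_b=30):
        """Return True (required), False (not required), or None (neither)."""
        if classification == "match":
            if feature_value >= threshold_a:
                return False      # judgment and feature amount agree
            if feature_value <= threshold_b:
                return True       # judged match, but the feature amount is low
        elif classification == "mismatch":
            if feature_value >= threshold_a:
                return True       # judged mismatch, but the feature amount is high
            if feature_value <= threshold_b:
                return False      # judgment and feature amount agree
        return None               # hold/unclassified, or between the thresholds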
The additional-learning necessity 187 will be described again with reference to fig. 9 described later.
The collected data count 188 is the number of search result records in the search result table 164 (see fig. 7) that match the extraction condition and are suitable as supervised data for additional learning.
The record whose extraction condition identification information 182 is "SC01" represents an extraction condition in which the feature amount 183 serving as the key is "head color black", the correct/incorrect flag 184 is both, the threshold A185 is 80, the threshold B186 is 30, and data requiring additional learning is to be extracted.
Fig. 9 is a graph 450 for explaining, for the extraction target data (search result records), the correct/incorrect flag 184, the threshold A185, the threshold B186, and the additional-learning necessity 187 according to the present embodiment. The vertical axis of the graph 450 represents the feature amount and the horizontal axis represents the classification item. The broken line 455 indicates the threshold A185, and the broken line 456 indicates the threshold B186.
Extraction target data that the user judged to match and whose feature amount is at or above the threshold A185 is plotted in the region 451. Extraction target data judged to match and whose feature amount is at or below the threshold B186 is plotted in the region 452. Extraction target data judged to mismatch and whose feature amount is at or above the threshold A185 is plotted in the region 454. Extraction target data judged to mismatch and whose feature amount is at or below the threshold B186 is plotted in the region 453.
Extraction target data requiring additional learning is the data plotted in the region 452 or the region 454, where the feature amount contradicts the classification item (the feature amount and the classification item are inversely correlated). Extraction target data not requiring additional learning is the data plotted in the region 451 or the region 453, where the feature amount agrees with the classification item (the feature amount and the classification item are correlated).
Extraction target data matching correct in the correct/incorrect flag is data whose classification item is match, plotted in the region 451, the region 452, or between them. Data matching incorrect is data whose classification item is mismatch, plotted in the region 453, the region 454, or between them.
A supervision data extraction result database: supervision data extraction results Table
Fig. 10 is a diagram showing the data configuration of the supervised data extraction result table 191 according to the present embodiment. The supervised data extraction result table 191 is, for example, data in table format; 1 record (row) represents a search result record (image data) of the search result table 164 (see fig. 7) extracted under one of the extraction conditions in the supervised data extraction condition table 181, and includes the attributes supervised data identification information 192, feature amount 193, classification item 194, additional-learning necessity 195, image identification information 196, similarity 197, and machine learning model version 198.
The supervisory data identification information 192 is identification information of image data extracted as supervisory data.
The feature amount 193 corresponds to the feature amount 183 of the extraction condition in the supervised data extraction condition table 181, and indicates which feature amount serves as the key.
The classification item 194 is the classification item resulting from the user's judgment on the image data extracted as supervised data, and is either match or mismatch.
The additional-learning necessity 195 indicates whether additional learning is required for the image data extracted as supervised data.
The image identification information 196 is identification information of image data extracted as supervision data, and corresponds to the image identification information 166 (see fig. 7).
The similarity 197 is a value of a feature indicated by the feature 193 of the image data.
The machine learning model version 198 represents the machine learning model version 178 of the image data extracted as the supervised data.
The records whose supervised data identification information 192 is "LD01" and "LD02" are data matching the extraction condition whose extraction condition identification information is "SC01". For the record whose supervised data identification information 192 is "LD01", the user judged the image to match, but the feature amount for the degree to which the head color is black is 15, at or below the threshold B186, so the image data is judged to require additional learning. For the record whose supervised data identification information 192 is "LD02", the user judged the image not to match, but the feature amount for the degree to which the head color is black is 85, at or above the threshold A185, so this image data is also judged to require additional learning.
Supervised data extraction unit
The explanation returns to fig. 3. When the administrator instructs supervised data extraction from the terminal 320, the Web server 116 notifies the supervised data extraction unit 118. The supervised data extraction unit 118 extracts, from among the search result records (image data) in the search result table 164 (see fig. 7), the records that match any one of the extraction conditions in the supervised data extraction condition table 181, and generates the supervised data extraction result table 191. The extracted supervised data is transmitted to the additional learning device 300 (see fig. 1) as additional learning data, together with the image data stored in the image database 120 that corresponds to the image identification information 196. The details of the extraction process will be described with reference to fig. 11 described later.
Machine learning model update unit
The explanation returns to fig. 3. The machine learning model updating unit 119 receives the new machine learning model 420 (see fig. 1) generated by the additional learning device 300, and replaces the machine learning model 114 of the feature extracting unit 113 (updates the machine learning model 114 with the new machine learning model 420).
Supervised data extraction process
Fig. 11 is a flowchart of the supervisory data extraction process executed by the supervisory data extraction unit 118 according to the present embodiment. The details of the supervised data extraction process will be described with reference to fig. 11.
In step S101, the supervisory data extraction unit 118 initializes the supervisory data extraction result table 191 and sets the number of records to 0.
In step S102, the supervisory data extraction unit 118 repeats steps S103 to S109 for the extraction conditions indicated by the records in the supervisory data extraction condition table 181. Hereinafter, the extraction conditions indicated by the record selected in step S102 will be referred to as present extraction conditions.
In step S103, the supervisory data extraction unit 118 repeats steps S104 to S108 for each search result record (image data) included in the latest search result table 164 (see fig. 7). Hereinafter, the image data indicated by the search result record selected in step S103 is referred to as extraction target data.
In step S104, the supervised data extraction unit 118 determines whether or not the machine learning model version 178 of the extraction target data matches the machine learning model version 189 of the present extraction condition, and if so (step S104 → yes), it proceeds to step S105, and if not (step S104 → no), it proceeds to step S109.
In step S105, the supervised data extraction unit 118 determines whether or not the extraction target data matches the correct/incorrect flag 184 of the present extraction condition; if so (step S105 → Yes), the process proceeds to step S106, and if not (step S105 → No), it proceeds to step S109.
Matching correct in the correct/incorrect flag 184 means that the classification item 177 (see fig. 7) of the extraction target data is match. Matching incorrect means that the classification item 177 is mismatch. Matching both means that the classification item 177 is either match or mismatch.
In step S106, the supervised data extraction unit 118 determines whether additional learning is necessary based on the correlation between the feature amount of the extraction target data and the classification item. Specifically, if the value of the feature amount is at or above the threshold A185 and the classification item is match, the supervised data extraction unit 118 determines that there is a correlation and additional learning is not necessary. If the feature amount is at or above the threshold A185 and the classification item is mismatch, it determines that there is an inverse correlation and additional learning is necessary. If the feature amount is at or below the threshold B186 and the classification item is match, it determines that there is an inverse correlation and additional learning is necessary. If the feature amount is at or below the threshold B186 and the classification item is mismatch, it determines that there is a correlation and additional learning is not necessary.
In step S107, the supervised data extraction unit 118 determines whether or not the extraction target data matches the additional-learning necessity 187 of the present extraction condition; if so (step S107 → Yes), the process proceeds to step S108, and if not (step S107 → No), it proceeds to step S109. Matching means that the necessity determined in step S106 satisfies the condition (required, not required, or both) indicated by the additional-learning necessity 187 of the present extraction condition.
In step S108, the supervised data extraction unit 118 adds the extraction target data to the supervised data extraction result table 191. A new record is appended to the supervised data extraction result table 191: new identification information is stored in the supervised data identification information 192, the feature amount 183 of the present extraction condition in the feature amount 193, the classification item 177 of the extraction target data in the classification item 194, the necessity determined in step S106 in the additional-learning necessity 195, the image identification information 166 of the extraction target data in the image identification information 196, the value of the extraction target data's feature amount corresponding to the feature amount 183 of the present extraction condition in the similarity 197, and the machine learning model version 178 of the extraction target data in the machine learning model version 198.
In step S109, if the supervised data extraction unit 118 has executed steps S104 to S108 for all the search result records in the latest search result table 164, the process proceeds to step S110; otherwise the next search result record is taken as the extraction target data and steps S104 to S108 are executed.
In step S110, if the supervised data extraction unit 118 has executed steps S103 to S109 for all the extraction conditions in the supervised data extraction condition table 181, the supervised data extraction process ends; otherwise the next extraction condition is taken as the present extraction condition and steps S103 to S109 are executed.
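Putting steps S101 to S110 together, the extraction process can be sketched roughly as follows (assumed record layout; it reuses the needs_additional_learning sketch shown earlier and is not the patent's actual code):

    # Minimal sketch of the supervised data extraction flow (S101-S110).
    def extract_supervised_data(extraction_conditions, search_results):
        supervised = []                                                     # S101
        for cond in extraction_conditions:                                  # S102
            for rec in search_results:                                      # S103
                if rec["mlm_version"] != cond["mlm_version"]:               # S104
                    continue
                if not matches_flag(rec["classification"], cond["flag"]):   # S105
                    continue
                need = needs_additional_learning(                           # S106
                    rec["features"][cond["feature"]], rec["classification"],
                    cond["threshold_a"], cond["threshold_b"])
                if not matches_necessity(need, cond["necessity"]):          # S107
                    continue
                supervised.append({                                         # S108
                    "feature": cond["feature"],
                    "classification": rec["classification"],
                    "needs_additional_learning": need,
                    "image_id": rec["image_id"],
                    "similarity": rec["features"][cond["feature"]],
                    "mlm_version": rec["mlm_version"],
                })
        return supervised                                      # loops close in S109/S110

    def matches_flag(classification, flag):
        """flag is 'correct', 'incorrect', or 'both'."""
        if flag == "both":
            return classification in ("match", "mismatch")
        return classification == ("match" if flag == "correct" else "mismatch")

    def matches_necessity(need, necessity):
        """necessity is 'required', 'not required', or 'both'."""
        if need is None:
            return False
        return necessity == "both" or (necessity == "required") == need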
Characteristics of the supervised data extraction process
The image data targeted for supervised data extraction are search results obtained by setting the conditions of the person to be searched for, to which the user has already assigned the result (classification item) of judging whether the person matches the search target. Since a classification item has already been assigned, the data can be used as supervised data without newly assigning a correct label. For example, the data whose supervised data identification information 192 is "LD01" shown in fig. 10 can be used for additional learning as data matching the feature amount 193 "the head color is black". There is no need to collect data or newly assign correct labels as in the conventional technique, so the cost of collecting supervised data and assigning correct labels can be reduced.
The extracted supervised data is output to the outside of the image search device 100. This makes the task of assigning correct labels efficient and shortens the additional learning period. Additional learning may also be executed inside the image search device 100.
Additional learning for deep learning is described in many documents and can be performed in various ways with machine learning frameworks such as Caffe; the effects of the present embodiment are not limited to a specific framework. In particular, when additional learning is to be concentrated on feature amounts for which it is most needed, supervised data can be extracted with the additional-learning necessity 187 set to required (the feature amount and the classification item are inversely correlated) and used for learning based on the collected data, so that additional learning is performed efficiently. When overall accuracy is to be improved instead, the correct/incorrect flag 184 can be set to both so that both correct and incorrect supervised data are extracted and learned based on the collected data, which also makes additional learning efficient.
Modification example 1: machine learning model version
By referring to the machine learning model versions 178 and 198 added to the search result table 164 (see fig. 7) and the supervised data extraction result table 191 (see fig. 10), the image search device 100 may issue a warning to a user who loads and refers to a search result table 164 containing feature amounts computed by an old version of the machine learning model 114, and to an administrator who performs supervised data extraction on such a table. This prevents supervised data from being extracted from search result records of mixed versions and enables consistent additional learning.
Modification example 2: supervised data extraction conditions
In the embodiment described above, the extraction conditions stored in the supervised data extraction condition table 181 (see fig. 8) were treated as preset conditions. The extraction conditions may instead be changed by the administrator of the image search device 100. For example, instead of extracting supervised data for all feature amounts, extraction can be limited to feature amounts considered to have low accuracy (for example, the degree to which the head color or the upper-body clothing is gray). Alternatively, the number of pieces of supervised data can be increased or decreased by adjusting the correct/incorrect flag 184, the threshold A185, and the threshold B186.
Fig. 12 is a configuration diagram of a supervisory data extraction condition setting screen 600 displayed on the terminal 320 of the image search device 100 according to the modification of the present embodiment. The supervisory data extraction condition setting screen 600 includes an extraction setting area 610 and an extraction condition area 620.
The administrator sets, in the extraction condition area 620, the content to be reflected in the supervised data extraction condition table 181 (see fig. 8). Specifically, the values to be set in the feature amount 183, the correct/incorrect flag 184, the threshold A185, the threshold B186, and the additional-learning necessity 187 of the supervised data extraction condition table 181 are entered in the list box 621, the list box 622, the text box 623, the text box 624, and the list box 625, respectively. For example, the correct/incorrect flag 184 is selected from "correct", "incorrect", and "both" in the list box 622.
The extraction condition identification information 611 of the extraction setting area 610 corresponds to the extraction condition identification information 182 of the supervised data extraction condition table 181. When the administrator inputs the extraction condition identification information 611 and presses the load button 613, the contents of a record in which the extraction condition identification information 182 coincides with the extraction condition identification information 611 input by the administrator among the records of the supervised data extraction condition table 181 are displayed in the extraction condition area 620.
When the manager inputs the extraction condition identification information 611 and presses the save button 612, the contents set in the extraction condition area 620 are reflected in a record in which the extraction condition identification information 182 coincides with the extraction condition identification information 611 input by the manager among the records of the supervised data extraction condition table 181. If there is no record matching the extraction condition identification information 611 input by the administrator, a record is added to the supervised data extraction condition table 181, reflecting the contents set in the extraction condition area 620.
Modification 3: additional necessity of learning
By visualizing the correlation between feature amounts and classification items, it is possible to determine which feature amounts need additional learning. Figs. 13 and 14 are graphs 470 and 480 showing the correlation between a feature amount and the classification item according to a modification of the present embodiment. Graph 470 plots the search result records (see fig. 7) obtained under a search condition on a certain feature amount, for example that the color of the upper-body clothes is blue. Graph 480 plots the search result records obtained under a search condition that, for example, the color of the upper-body clothes is black. The "r" value shown at the upper right of each graph is the correlation coefficient.
In graph 470, many search results are plotted in the region 471, where the classification item is a match and the feature amount is large, and in the region 472, where the classification item is a non-match and the feature amount is small; that is, the feature amount and the classification item are positively correlated. The necessity of additional learning for the feature amount "the color of the upper-body clothes is blue" can therefore be said to be low.
In graph 480, on the other hand, many search results are plotted in the region 481, where the classification item is a match and the feature amount is small, and in the region 482, where the classification item is a non-match and the feature amount is large; that is, the feature amount and the classification item are inversely correlated. The necessity of additional learning for the feature amount "the color of the upper-body clothes is black" can therefore be said to be relatively high.
The image search device 100 may further include a graph generating unit that presents to the administrator graphs showing the correlation between feature amounts and classification items, such as graphs 470 and 480. The administrator can then preferentially apply additional learning to feature amounts whose correlation on the graph is weak or inverse (negative).
When presenting the graph to the administrator, the graph generating unit may also calculate and display the correlation coefficient. This makes it easier for the administrator to decide whether additional learning is needed.
In addition to the correlation coefficient, the graph generating unit may calculate and display the average and standard deviation of the feature amounts of the matching and non-matching search result records. The administrator can then preferentially apply additional learning to, for example, feature amounts whose matching average is low, whose non-matching average is high, or whose matching or non-matching standard deviation is large, determining the feature amounts to be additionally learned with reference to the correlation coefficient, the average, and the standard deviation.
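As an illustration of the statistics such a graph generating unit could display, the following Python sketch computes a correlation coefficient and per-item mean and standard deviation; it assumes (hypothetically) that each search result record exposes a numeric feature amount and a classification item of "match" / "non-match".

```python
# Minimal sketch: correlation coefficient and per-classification-item mean and
# standard deviation of the feature amount, as could accompany graphs 470/480.
from statistics import mean, stdev

def correlation_statistics(records):
    xs = [r["feature_amount"] for r in records]
    ys = [1.0 if r["classification_item"] == "match" else 0.0 for r in records]
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    var_x = sum((x - mx) ** 2 for x in xs)
    var_y = sum((y - my) ** 2 for y in ys)
    r = cov / (var_x * var_y) ** 0.5 if var_x and var_y else 0.0
    matched = [x for x, y in zip(xs, ys) if y == 1.0]
    unmatched = [x for x, y in zip(xs, ys) if y == 0.0]
    return {
        "correlation_coefficient": r,
        "match_mean": mean(matched) if matched else None,
        "match_std": stdev(matched) if len(matched) > 1 else None,
        "non_match_mean": mean(unmatched) if unmatched else None,
        "non_match_std": stdev(unmatched) if len(unmatched) > 1 else None,
    }
```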
The supervision data used for additional learning may be extracted by setting extraction conditions on the supervision data extraction condition setting screen 600 (see fig. 12), or may be the search result records designated by the administrator on a graph. For example, the image search device 100 may extract as supervision data the search result records in a region that the administrator specifies on a graph.
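A minimal sketch of that region-based extraction, assuming (hypothetically) that the region is given as a feature amount range plus a classification item, could look like this:

```python
# Minimal sketch (hypothetical names): gather, as supervision data, the search
# result records falling inside a region the administrator specifies on the graph.
def records_in_region(records, feature_min, feature_max, classification_item):
    return [
        r for r in records
        if feature_min <= r["feature_amount"] <= feature_max
        and r["classification_item"] == classification_item
    ]
```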
Modification example 4: timing of supervision data extraction
In the above-described embodiment, the image search apparatus 100 extracts supervision data for additional learning when instructed by the administrator. Instead, the image search device 100 may, for example, extract supervision data at a predetermined cycle and output it to the additional learning device 300. The image search device 100 may also repeat the extraction periodically and output the extraction result to the additional learning device 300 when the number of extracted records becomes equal to or greater than the collected data number 188 (see fig. 8).
Alternatively, supervision data may be extracted every time the user sets a classification item in the classification item setting area 534. Specifically, each time the user sets a classification item, the Web browser on the terminal 320 transmits the image identification information and the classification item to the Web server 116. The Web server 116 outputs the image identification information and the classification item to the classification item registration unit 117 and the supervision data extraction unit 118.
The classification item registration unit 117 changes the classification item 177 of the search result record corresponding to the image identification information, which is a record in the search result table 164 (see fig. 7) of the search result database 150, to the classification item output by the Web server 116.
If the search result record corresponding to the image identification information, i.e. a record in the search result table 164, matches any of the extraction conditions in the supervised data extraction condition table 181 (see fig. 8), the supervised data extraction unit 118 stores that search result record in the supervised data extraction result table 191 (see fig. 10). When the user changes a classification item, the supervised data extraction unit 118 updates the classification item of the stored record accordingly, or deletes the record if, as a result of the change, it no longer matches any extraction condition of the supervised data extraction condition table 181 (see fig. 8).
In this way, supervision data is extracted immediately on the basis of the classification items set by the user, and as soon as the number of extracted records reaches the collected data number 188, the supervision data can be output to the additional learning device 300 (see fig. 1). As a result, system downtime is reduced, the maintenance cost of daily operation is lowered, and the machine learning model 114 is updated more quickly.
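The following Python sketch illustrates this event-driven flow; all names are hypothetical, and the interpretation of an extraction condition inside matches() is an assumption made only for the sketch.

```python
# Minimal sketch of extraction triggered by a classification item being set:
# the record is re-evaluated against the extraction conditions, and the
# accumulated supervision data is output once the collected data number 188
# is reached.
def on_classification_item_set(image_id, classification_item,
                               search_result_table, extraction_result_table,
                               extraction_conditions, collected_data_number,
                               output_to_additional_learning_device):
    record = search_result_table[image_id]
    record["classification_item"] = classification_item       # classification item registration unit 117
    if any(matches(record, cond) for cond in extraction_conditions):
        extraction_result_table[image_id] = record            # supervision data extraction unit 118
    else:
        extraction_result_table.pop(image_id, None)            # no longer satisfies any condition
    if len(extraction_result_table) >= collected_data_number:
        output_to_additional_learning_device(list(extraction_result_table.values()))

def matches(record, condition):
    # Assumed, simplified reading of an extraction condition from table 181:
    # keep the record when its feature amount lies above threshold A or below
    # threshold B (the error flag 184 is ignored in this sketch).
    value = record["feature_amount"]
    return value >= condition["threshold_a"] or value <= condition["threshold_b"]
```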
The image search device 100 may also periodically calculate the correlation coefficient, average, and standard deviation of the feature amounts and classification items in the search result records, and output supervision data to the additional learning device 300 when these values satisfy a predetermined condition. For example, the image search apparatus 100 may do so when the correlation coefficient is smaller than a predetermined value.
In this way, the image search apparatus 100 can trigger additional learning as needed without an instruction from the administrator, or can prompt the administrator to perform additional learning.
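A minimal sketch of such a periodic check, reusing the correlation_statistics() sketch above, might be as follows; the threshold value and function names are assumptions, not values from the embodiment.

```python
# Minimal sketch: a scheduler is assumed to call this check periodically; when
# the correlation coefficient for a feature amount falls below an assumed
# threshold, supervision data is output to the additional learning device 300
# without waiting for an administrator's instruction.
CORRELATION_THRESHOLD = 0.2  # assumed value

def periodic_additional_learning_check(records, extract_supervision_data,
                                       output_to_additional_learning_device):
    stats = correlation_statistics(records)
    if stats["correlation_coefficient"] < CORRELATION_THRESHOLD:
        output_to_additional_learning_device(extract_supervision_data(records))
```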
Modification example 5: determination target data, feature quantity
The machine learning model 114 in the above-described embodiment extracts and classifies, from image data of a person, the feature amounts of head color, upper-body clothing color, and lower-body clothing color. The colors are not necessarily limited to black, gray, white, and blue. Nor are the feature amounts limited to colors: worn articles such as glasses and hats, carried articles such as bags and smartphones, height, and the like may also be extracted and classified as feature amounts. Supporting such a variety of feature amounts for a person improves the accuracy of the search performed by the image search device 100.
The object (input data) of the machine learning model 114 in the above-described embodiment is an image of a person, but it is not limited to this and may be, for example, an article. The search is also not limited to images; it may be a document search, for example. In a device that searches documents by type, category, or the like, the supervision data of the machine learning model may be extracted using the type or category as the feature amount. The machine learning model is not limited to deep learning and may be another machine learning model such as an SVM.
Modification example 6: search conditions
In the above-described embodiment, the search conditions for a person are the head color, the color of the upper-body clothes, and the color of the lower-body clothes. The search may additionally be performed on conditions such as the shooting time, the camera 310 that captured the image, and the shooting area.
Alternatively, image data of the person to be searched for may be used as the search condition instead of the head color, the color of the upper-body clothes, and the color of the lower-body clothes. The image search unit 115 extracts the head color, the color of the upper-body clothes, and the color of the lower-body clothes from the image data given as the search condition, and searches the feature quantity table 131 (see fig. 5) of the feature quantity database 130 using the extracted colors as the condition. Thus, the user can search for images of the target person without specifying the head color, the color of the upper-body clothes, and the color of the lower-body clothes.
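A minimal sketch of this search-by-example flow, with hypothetical field and key names, could look like the following:

```python
# Minimal sketch: the colors are first extracted from the query image with the
# machine learning model and then used as the condition for searching the
# feature quantity table 131.
def search_by_example_image(query_image, feature_extractor, feature_table):
    q = feature_extractor(query_image)  # e.g. {"head": "black", "upper": "blue", "lower": "gray"}
    return [
        record for record in feature_table
        if record["head_color"] == q["head"]
        and record["upper_body_color"] == q["upper"]
        and record["lower_body_color"] == q["lower"]
    ]
```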
Modification example 7: display order of search results
In the above-described embodiment, the search results 531 are displayed in the search result area 530 in descending order of the degree of matching with the search condition (average similarity). The images may instead be displayed in an order based on other information, such as the camera, its installation area, or the shooting time.
Other modifications
The present invention is not limited to the above-described embodiments and can be modified within a range not departing from its gist. In the embodiments, the image search device 100 executes image acquisition, search for a target person, and extraction of supervision data on a single computer, but these may be executed by a plurality of computers. The user interface is a Web browser on the terminal 320, but it is not limited to this.
While some embodiments of the present invention have been described above, these embodiments are merely examples and do not limit the technical scope of the present invention. The present invention can take various other forms, and various modifications such as omissions and replacements can be made without departing from its scope. The order of the processes may be changed, or the processes may be performed in parallel.
For example, the feature table 131 (see fig. 5), the search result table 164 (see fig. 7), and the supervised data extraction result table 191 (see fig. 10) include the image identification information 134, 166, and 196, but may instead include the images themselves. The image search unit 115, the classification item registration unit 117, and the supervision data extraction unit 118 may be combined into a single functional unit. Steps S104 and S105 may be exchanged.
These embodiments and modifications thereof are included in the scope and gist of the invention described in the present specification and the like, and are included in the invention described in the patent claims and the scope equivalent thereto.
Description of the reference numerals
100 image search device
111 image obtaining part
112 human detection unit
113 feature extraction unit
114 machine learning model
115 image search unit
117 classification item registration unit (classification item acquisition unit)
118 supervisory data extracting part
119 machine learning model updating unit
300 additional learning device
510 search setting area
520 search for conditional area
530 search results area
531 search results
532 image confirmation area
533 average similarity bar
534 classification item setting area

Claims (12)

1. An image search device is characterized by comprising:
a feature extraction unit that extracts a feature amount from the acquired image using a machine learning model;
an image search unit that searches for the image using the feature amount and outputs a search result;
a classification item acquisition unit that acquires, for each image of the search result, a classification item indicating the classification result assigned to that image; and
a supervised data extraction unit that extracts, based on a correlation between the feature amount and the classification item, an image to serve as supervised data for additional learning of the machine learning model.
2. The image retrieval device according to claim 1,
the image search unit acquires a feature amount of a target object, searches for the image by comparing the feature amount of the target object with the feature amount extracted from the image, and outputs a search result, and
the classification items include a match, indicating that the image includes the target object, and a non-match, indicating that the image does not include the target object.
3. The image retrieval device according to claim 1 or 2,
the supervision data extraction unit outputs the extracted image, together with at least one of the classification item and the feature amount of the image, to the outside of the image retrieval device.
4. The image retrieval device according to any one of claims 1 to 3,
the supervised data extraction unit extracts either an image in which the feature amount and the classification item are inversely correlated or an image in which the feature amount and the classification item are correlated.
5. The image retrieval device according to claim 4,
the supervision data extraction unit extracts any one of: an image in which the feature amount is lower than a predetermined value and the classification item is a match; an image in which the feature amount is higher than a predetermined value and the classification item is a non-match; an image in which the feature amount is lower than a predetermined value and the classification item is a non-match; and an image in which the feature amount is higher than a predetermined value and the classification item is a match.
6. The image retrieval device according to claim 1,
the image search device further comprises a graph generating unit that generates a graph in which the images are plotted, using the feature amount and the classification item as axes.
7. The image retrieval device according to claim 6,
the graph generating unit calculates and displays any one of: the correlation coefficient between the feature amount and the classification item of the images plotted on the graph; and the average and standard deviation of the feature amount for each classification item.
8. The image retrieval device according to claim 1,
the supervision data extraction unit extracts an image to be used as supervision data, outputs it to the outside of the image retrieval device, or notifies that a condition is satisfied, at any of the following times: when a predetermined time has elapsed since the last extraction; when the number of extracted images satisfies a predetermined condition; when the correlation between the feature amount and the classification item of the images satisfies a predetermined condition; when the average of the feature amount for each classification item of the images satisfies a predetermined condition; and when the standard deviation of the feature amount for each classification item of the images satisfies a predetermined condition.
9. The image retrieval device according to claim 1,
a version, which is changed when the machine learning model is updated with the result of the additional learning, is assigned to the machine learning model, and
the supervision data extraction unit extracts images as supervision data based on the correlation between the feature amount extracted by the current version of the machine learning model and the classification item assigned on the basis of a search using that feature amount.
10. The image retrieval device according to claim 9,
the supervision data extraction unit issues a warning when instructed to extract, as supervision data, an image based on its correlation with a classification item that was assigned on the basis of a search using a feature amount extracted by a machine learning model of a version different from the current version.
11. The image retrieval device according to claim 1,
the image is an image containing a person,
the feature amount is any one of the color of the head of the person, the color of the clothes of the upper body, the color of the clothes of the lower body, a feature amount related to an article carried by the person, and a feature amount related to an article worn by the person.
12. A method for extracting supervision data of an image retrieval device is characterized by executing the following steps:
a step of extracting a feature amount from the acquired image using a machine learning model;
a step of retrieving the image using the feature amount and outputting a retrieval result;
a step of acquiring, for each image of the search result, a classification item indicating the classification result assigned to that image; and
a step of extracting, based on the correlation between the feature amount and the classification item, an image to serve as supervision data for additional learning of the machine learning model.
CN202080015638.6A 2019-02-20 2020-02-18 Image retrieval device and supervised data extraction method Pending CN113474769A (en)

Applications Claiming Priority (3)

Application Number Priority Date Filing Date Title
JP2019028920A JP7018408B2 (en) 2019-02-20 2019-02-20 Image search device and teacher data extraction method
JP2019-028920 2019-02-20
PCT/JP2020/006286 WO2020171066A1 (en) 2019-02-20 2020-02-18 Image search device and training data extraction method

Publications (1)

Publication Number Publication Date
CN113474769A true CN113474769A (en) 2021-10-01

Family

ID=72144325

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202080015638.6A Pending CN113474769A (en) 2019-02-20 2020-02-18 Image retrieval device and supervised data extraction method

Country Status (3)

Country Link
JP (1) JP7018408B2 (en)
CN (1) CN113474769A (en)
WO (1) WO2020171066A1 (en)

Families Citing this family (4)

Publication number Priority date Publication date Assignee Title
JPWO2023007956A1 (en) * 2021-07-30 2023-02-02
JP2023163443A (en) * 2022-04-28 2023-11-10 キヤノン株式会社 Image processing device, imaging device, and control method thereof
WO2024111429A1 (en) * 2022-11-25 2024-05-30 日本電気株式会社 Posture evaluation device, posture evaluation system, posture evaluation method, and program
JP7465025B1 (en) 2023-06-14 2024-04-10 株式会社Revox Information processing device, inference device, machine learning device, information processing method, inference method, and machine learning method

Patent Citations (8)

Publication number Priority date Publication date Assignee Title
JP2000132554A (en) * 1998-10-21 2000-05-12 Sharp Corp Image retrieval device and method
CN101059844A (en) * 2006-04-19 2007-10-24 索尼株式会社 Learning apparatus and method
JP2009259250A (en) * 2008-04-18 2009-11-05 Nec (China) Co Ltd Method for generating document classifier and system thereof
CN103955462A (en) * 2014-03-21 2014-07-30 南京邮电大学 Image marking method based on multi-view and semi-supervised learning mechanism
CN104794489A (en) * 2015-04-23 2015-07-22 苏州大学 Deep label prediction based inducing type image classification method and system
CN107949841A (en) * 2015-08-31 2018-04-20 国立研究开发法人情报通信研究机构 Put question to the training device of answering system and the computer program of the training device
CN109074642A (en) * 2016-06-16 2018-12-21 株式会社日立制作所 machine learning device
CN108229526A (en) * 2017-06-16 2018-06-29 北京市商汤科技开发有限公司 Network training, image processing method, device, storage medium and electronic equipment

Non-Patent Citations (3)

Title
POLIKAR, R. et al.: "Ensemble of classifiers based incremental learning with dynamic voting weight update", The International Joint Conference on Neural Networks 2003, 1 January 2003, pages 2770-2775
CUI, Peng et al.: "A face gender classification algorithm based on correlation projection", Journal of Optoelectronics·Laser, vol. 28, no. 9, 15 September 2017, pages 1036-1044
SU, Zhitong et al.: "Research on an improved incremental Bayesian model", Computer Applications and Software, vol. 33, no. 8, 15 August 2016, pages 254-259

Also Published As

Publication number Publication date
JP7018408B2 (en) 2022-02-10
WO2020171066A1 (en) 2020-08-27
JP2020135494A (en) 2020-08-31

Similar Documents

Publication Publication Date Title
CN113474769A (en) Image retrieval device and supervised data extraction method
CN110046586A (en) A kind of data processing method, equipment and storage medium
WO2021000644A1 (en) Video processing method and apparatus, computer device and storage medium
CN110795584B (en) User identifier generation method and device and terminal equipment
JP7380567B2 (en) Information processing device, information processing method, and information processing program
WO2007129474A1 (en) Object recognition device, object recognition program, and image search service providing method
JP2015106300A (en) Image search device, control method for image search device, and program
CN110598017A (en) Self-learning-based commodity detail page generation method
CN110569918A (en) sample classification method and related device
Jenni et al. Pre-processing image database for efficient Content Based Image Retrieval
KR20210006662A (en) Animaiton contents resource service system and method based on intelligent informatin technology
JP2010231254A (en) Image analyzing device, method of analyzing image, and program
CN110232331A (en) A kind of method and system of online face cluster
CN114140696A (en) Commodity identification system optimization method, commodity identification system optimization device, commodity identification equipment and storage medium
CN113051324A (en) User portrait construction method and device based on big data and storage medium
CN107622071B (en) Clothes image retrieval system and method under non-source-retrieval condition through indirect correlation feedback
US20220075804A1 (en) Method and device for providing guide information for enhancement of artist's reputation
CN111767880B (en) Living body identity recognition method and device based on facial features and storage medium
JP2020095757A (en) Information processing device, information processing method, and program
JP2007304771A (en) Subject recognition device, subject recognition program and image retrieval service provision method
KR20210006661A (en) Animaiton contents resource service system based on intelligent informatin technology
Xia et al. Face clustering in photo album
CN112508135B (en) Model training method, pedestrian attribute prediction method, device and equipment
CN115619245A (en) Portrait construction and classification method and system based on data dimension reduction method
Meessen et al. Progressive learning for interactive surveillance scenes retrieval

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination