WO2024195069A1 - Search device, search method, and search program - Google Patents
- Publication number
- WO2024195069A1 (PCT/JP2023/011355)
- Authority
- WO
- WIPO (PCT)
- Prior art keywords
- feature
- search
- threshold
- target
- cluster
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Ceased
Classifications
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/50—Information retrieval; Database structures therefor; File system structures therefor of still image data
- G06F16/55—Clustering; Classification
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/50—Information retrieval; Database structures therefor; File system structures therefor of still image data
- G06F16/53—Querying
- G06F16/535—Filtering based on additional data, e.g. user or group profiles
Definitions
- This disclosure relates to a technology for searching for an object captured in image data obtained by a camera whose shooting area is a target space, using an image of the object as a search key.
- a means for finding a specific person is required. For example, such a means is necessary when searching for a lost child, a wanderer, or a person separated from their companion at the request of a space user. Also, such a means is necessary when searching for a user who does not appear at the designated location at the reserved time or entry time. Also, such a means is necessary when searching for a user who is found to have left something behind or to have not completed the necessary procedures after leaving the store. From the perspective of crime prevention, such a means is necessary when locating and apprehending an escaped shoplifter, molester, or assaulter, or when analyzing the behavior of a key witness in a crime investigation.
- Live footage refers to real-time footage.
- the features of a person that can be extracted from a camera image include the following (1) to (4).
- Image features such as HoG. HoG stands for Histograms of Oriented Gradients.
- When searching for a person, a person identification process is used that determines that two person images show the same person if the distance between the features of the two images is equal to or less than a threshold value.
- In person search processes that use features, the distance between features varies with differences in people's appearances and camera shooting conditions. As a result, there is a possibility of "mis-searches", where the wrong person is returned, and "missed searches", where the person being searched for is omitted from the search results.
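The identification rule described above can be sketched in a few lines. This is a minimal illustration, assuming Euclidean distance between feature vectors; the source does not name a distance metric, and the function name `is_same_person` is hypothetical.

```python
import math


def is_same_person(feature_a, feature_b, threshold):
    """Treat two images as the same person when their feature distance
    is equal to or less than the threshold (Euclidean distance assumed)."""
    return math.dist(feature_a, feature_b) <= threshold


# The two vectors below are exactly 5.0 apart.
print(is_same_person([0.0, 0.0], [3.0, 4.0], threshold=5.0))
print(is_same_person([0.0, 0.0], [3.0, 4.0], threshold=4.9))
```

A threshold that is too large for the data tends to produce mis-searches, and one that is too small tends to produce missed searches, which is why the choice of threshold matters.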
- Patent Document 1 describes a technology for solving problems caused by differences in shooting conditions.
- the problem addressed in Patent Document 1 is that, in face identification processing, the appropriate threshold for the similarity of facial features varies depending on the combination of cameras.
- Patent Document 1 identifies a person using a different logic, then calculates the error rate of facial feature matching using the identification result as the correct answer, and adjusts the threshold so that the error rate is constant for each combination of cameras.
- the technology described in Patent Document 1 is a technology for setting a threshold value for each combination of cameras.
- the optimal threshold value differs depending on the appearance of the target person. For example, the feature values of a person wearing dark-colored clothing on both the top and bottom may be narrowly distributed, while the feature values of a person wearing a light-colored top and a dark-colored bottom may be widely distributed.
- the threshold value can therefore be set smaller for a person wearing dark-colored clothing on both the top and bottom than for a person wearing a light-colored top and a dark-colored bottom. Because the technology described in Patent Document 1 sets a threshold only for each combination of cameras, it may not be able to prevent mis-searches or missed searches, and may not be able to properly search for a person.
- An object of the present disclosure is to enable appropriate searching for objects captured in image data.
- the search device comprises: a threshold derivation unit that takes each of a plurality of clusters, obtained by clustering a plurality of feature quantities stored in a feature database, as a target cluster and derives a threshold for the target cluster from the distribution of the feature quantities in the target cluster; and
- a search unit that takes, as a target threshold, the threshold derived by the threshold derivation unit for the cluster to which a search feature belongs, the search feature being a feature of an image in a search request, and uses the target threshold to identify a feature corresponding to the search feature from the plurality of feature quantities stored in the feature database.
- a threshold is derived for each cluster obtained by clustering features, and a search is performed using the threshold for the cluster that corresponds to the search feature. This allows searches to be performed using an appropriate threshold that corresponds to the search feature, making it possible to properly search for the target object.
- FIG. 1 is a configuration diagram of a search system 100 according to a first embodiment.
- FIG. 2 is a hardware configuration diagram of a feature extraction device 30 and a search device 40 according to the first embodiment.
- FIG. 3 is a flowchart of a collection process according to the first embodiment.
- FIG. 4 is a flowchart of a search process according to the first embodiment.
- FIG. 5 is an explanatory diagram of a threshold database 49 according to the first embodiment.
- FIG. 6 is an explanatory diagram of a cluster according to the first embodiment.
- FIG. 7 is a flowchart of a threshold value derivation process according to the first embodiment.
- FIG. 8 is a diagram illustrating the effect of the search system 100 according to the first embodiment.
- Embodiment 1: In the first embodiment, a case where the target object is a person, that is, a case where a person is searched for, will be described.
- the target object is not limited to a person, and may be an animal such as a dog or a cat, or an object such as a bag.
- the search system 100 includes a plurality of cameras 10, a hub 20, a feature extraction device 30, and a search device 40.
- the search system 100 includes N cameras 10, namely, camera 10-1 to camera 10-N, where N is an integer of 2 or more.
- Each camera 10 and the hub 20 are connected via a transmission line.
- the hub 20 and the feature extraction device 30 are connected via a transmission line.
- the feature extraction device 30 and the search device 40 are connected via a transmission line.
- the cameras 10 are installed in various locations in a target space for performing person search.
- the cameras 10 capture images of people moving in the target space.
- the cameras 10 transmit the captured images to the hub 20 via a transmission path such as an IP network.
- IP is an abbreviation for Internet Protocol.
- the cameras 10 may be arranged without sharing a field of view. In other words, there may be blind spots in the target space that are not captured by the cameras 10.
- the camera 10 is assumed to be an IP camera that compresses video and transmits the video over an IP network.
- the camera 10 may be a camera that transmits uncompressed video signals over a coaxial cable, or may be a camera that uses another transmission method.
- the hub 20 receives the video data transmitted by the camera 10 and transmits it to the feature extraction device 30.
- the feature extraction device 30, if it is also connected to the Internet, may receive the video data via the Internet.
- in that case, the Internet corresponds to the hub 20.
- when the camera 10 uses another transmission method, the hub 20 is an aggregation device that supports that protocol.
- the feature extraction device 30 is a computer that extracts features that can be used to identify people from people captured in the video data obtained by the camera 10.
- the feature extraction device 30 includes, as functional components, a video data acquisition unit 31, an object detection unit 32, and a feature extraction unit 33.
- the search device 40 is a computer that searches for a person in response to a search request from a user.
- the search device 40 has a database function that manages the feature amounts of people for searching.
- the database function may be realized by a device external to the search device 40.
- the search device 40 includes, as functional components, a feature acquisition unit 41, a database registration unit 42, a request acquisition unit 43, a search unit 44, an output unit 45, a feature extraction unit 46, and a threshold derivation unit 47.
- the search device 40 also includes, as database functions, a feature database 48 and a threshold database 49.
- the hardware configuration of the feature extraction device 30 and the search device 40 according to the first embodiment will be described with reference to FIG. 2.
- the feature extraction device 30 and the search device 40 each include the following hardware components: a processor 101, a memory 102, a storage 103, and a communication interface 104.
- the processor 101 is connected to other hardware components via signal lines and controls the other hardware components.
- the processor 101 is an IC that performs processing.
- IC stands for Integrated Circuit.
- Specific examples of the processor 101 include a CPU, DSP, and GPU.
- CPU stands for Central Processing Unit.
- DSP stands for Digital Signal Processor.
- GPU stands for Graphics Processing Unit.
- Memory 102 is a storage device that temporarily stores data. Specific examples of memory 102 include SRAM and DRAM. SRAM stands for Static Random Access Memory. DRAM stands for Dynamic Random Access Memory.
- Storage 103 is a storage device that stores data.
- a specific example of storage 103 is an HDD.
- HDD is an abbreviation for Hard Disk Drive.
- Storage 103 may also be a portable recording medium such as an SD (registered trademark) memory card, CompactFlash (registered trademark), NAND flash, a flexible disk, an optical disk, a compact disk, a Blu-ray (registered trademark) disk, or a DVD.
- SD is an abbreviation for Secure Digital.
- DVD is an abbreviation for Digital Versatile Disc.
- the communication interface 104 is an interface for communicating with external devices. Specific examples of the communication interface 104 are Ethernet (registered trademark), USB, and HDMI (registered trademark) ports. USB is an abbreviation for Universal Serial Bus. HDMI is an abbreviation for High-Definition Multimedia Interface.
- the functions of the functional components of the feature extraction device 30 and the search device 40 are realized by software.
- the storage 103 of the feature extraction device 30 stores a program that realizes the function of each functional component of the feature extraction device 30. In the feature extraction device 30, this program is loaded into the memory 102 by the processor 101 and executed by the processor 101. In this way, the function of each functional component of the feature extraction device 30 is realized.
- the storage 103 of the search device 40 stores a program that realizes the function of each functional component of the search device 40. In the search device 40, this program is loaded into the memory 102 by the processor 101 and executed by the processor 101. In this way, the function of each functional component of the search device 40 is realized.
- the storage 103 of the search device 40 realizes a database function.
- the feature extraction device 30 and the search device 40 may each include multiple processors 101, and the multiple processors 101 may cooperate to execute programs that realize the respective functions.
- the operation of the search system 100 includes a collection process for collecting features, a search process for performing a search, and a threshold derivation process for deriving a threshold.
- the collection process according to the first embodiment will be described with reference to FIG. 3.
- the collection process is always running while the search system 100 is in operation.
- Step S11 Transmission standby process
- the video data acquisition unit 31 of the feature extraction device 30 waits for the transmission of video data from the camera 10 via the hub 20.
- the search device 40 may be always running, or may be started simultaneously with the feature extraction device 30.
- the camera ID can be identified by referring to a table that is stored in advance in the feature extraction device 30 and shows the correspondence between the IP address of the camera 10 and the camera ID.
- the IP address of the camera 10 itself may be used as the camera ID.
- any information that is unique to the camera 10 and that can be used in some way to link the actual camera 10 with the video data sent can be used as the camera ID.
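The camera ID lookup described above can be sketched as a simple table lookup. This is only an illustration: the table contents, the ID strings, and the function name `resolve_camera_id` are hypothetical, and falling back to the IP address is one way to apply the option of using the IP address itself as the camera ID.

```python
# Hypothetical correspondence table between camera IP addresses and camera IDs.
# The addresses are from the documentation range 192.0.2.0/24.
CAMERA_ID_BY_IP = {
    "192.0.2.11": "camera-10-1",
    "192.0.2.12": "camera-10-2",
}


def resolve_camera_id(source_ip, table=CAMERA_ID_BY_IP):
    """Return the camera ID registered for the sender's IP address.

    When the address is not in the table, the IP address itself is used
    as the camera ID."""
    return table.get(source_ip, source_ip)


print(resolve_camera_id("192.0.2.11"))
print(resolve_camera_id("192.0.2.99"))
```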
- Step S13 Target extraction process
- the object detection unit 32 of the feature extraction device 30 detects, from the decoded image output in step S12, a person, which is an object appearing in the decoded image. Then, the object detection unit 32 outputs the detection result of the person, which is the detected object, together with the camera ID and the shooting time associated with the decoded image, to the feature extraction unit 33.
- the detection of the object is performed by a method using image analysis techniques such as HoG.
- the detection of the object may be performed by a method using a machine learning approach such as CNN, Faster R-CNN, or SSD.
- CNN is an abbreviation for Convolutional Neural Network.
- Faster R-CNN is an abbreviation for Faster-Region-based CNN.
- SSD is an abbreviation for Single Shot Detector.
- the target to be detected needs to match the feature extracted in the process of step S14 described later. For example, if the feature requires a whole-body image of a person, the object detection unit 32 needs to detect a whole-body image of the person. If the feature requires a facial feature, the object detection unit 32 needs to detect a facial image.
- the detection result is an image of the detected person cut out from the decoded video.
- the detection result may be a set of the decoded video and position information in the video where the person is detected. If the feature extraction unit 33 has a means for accessing the recorded decoded video, the detection result may be a set of information for identifying the frame number of the recorded decoded video and position information in the video where the person is detected.
- multiple consecutive frames may be required. For example, when a feature of a person's movements is to be extracted, multiple consecutive frames are required. In this case, the object detection unit 32 needs to continuously detect the same person across multiple frames and output the result as the detection result.
- the feature extraction unit 33 of the feature extraction device 30 extracts feature amounts from the detection result output in step S13.
- the feature amount extracted here is a feature amount that allows the similarity of a person to be calculated.
- the feature amount is an image feature such as HoG.
- the feature amount is vector data obtained by applying deep learning to convert image features of a person's entire body into a comparable form.
- the feature amount may be a gait feature that is a characteristic of the person's way of walking.
- the gait feature includes, for example, the period and amplitude of limb swing, the period and amplitude of upper-body sway, body proportions, and posture.
- the feature amount may also be a set of per-frame feature amounts, each extracted from a single frame of the multiple frames.
- Step S15 Registration process
- the feature extraction section 33 of the feature extraction device 30 outputs to the search device 40 the feature amount extracted in step S14 together with the camera ID and the shooting time output in step S13.
- the feature acquisition unit 41 of the search device 40 outputs a set of the feature amount output by the feature extraction unit 33, the camera ID, and the shooting time to the database registration unit 42.
- the database registration unit 42 registers the set of the feature amount output by the feature acquisition unit 41, the camera ID, and the shooting time in the feature database 48 as a new feature amount record.
- the database registration unit 42 may appropriately delete records from the feature database 48 that have been registered for a certain period of time. When registering a new record, the database registration unit 42 may overwrite the old record and save it. Alternatively, the database registration unit 42 may delete records from the feature database 48 based on other rules.
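The registration and deletion behaviour described above can be sketched as follows. This is a minimal in-memory illustration, not the feature database 48 itself: the class and field names are hypothetical, times are assumed to be seconds, and only the "delete after a certain period since registration" rule is shown.

```python
from dataclasses import dataclass


@dataclass
class FeatureRecord:
    feature: list               # feature amount (vector)
    camera_id: str
    shot_at: float              # shooting time in seconds (assumed unit)
    registered_at: float = 0.0  # filled in at registration time


class FeatureDatabase:
    """Hypothetical stand-in for a feature store with a retention period."""

    def __init__(self, max_age_seconds):
        self.max_age_seconds = max_age_seconds
        self.records = []

    def register(self, record, now):
        """Add a new record, then drop records registered too long ago."""
        record.registered_at = now
        self.records.append(record)
        cutoff = now - self.max_age_seconds
        self.records = [r for r in self.records if r.registered_at >= cutoff]


db = FeatureDatabase(max_age_seconds=60.0)
db.register(FeatureRecord([0.1, 0.2], "camera-10-1", shot_at=0.0), now=0.0)
db.register(FeatureRecord([0.3, 0.4], "camera-10-1", shot_at=100.0), now=100.0)
print(len(db.records))  # the first record is past the retention period
```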
- Step S16 End determination process
- the database registration unit 42 of the search device 40 determines whether or not the end condition is satisfied.
- the end condition is, for example, that there is a termination request from the user.
- the end condition may also be that there is a termination trigger generated by a mechanism other than the search system 100, such as a timer. If the end condition is satisfied, the database registration unit 42 ends the process. On the other hand, if the end condition is not satisfied, the database registration unit 42 returns the process to step S11.
- the search process according to the first embodiment will be described with reference to FIG. 4.
- the search process is triggered by a request from the user.
- Step S21 Input waiting process
- the request acquisition unit 43 of the search device 40 waits for input of a search request.
- the search request is input by a user.
- the search request includes image data of a person to be searched, which is an object to be searched.
- the search request may include at least one of a camera ID that captured the image data of the person to be searched and a capture time of the image data of the person to be searched.
- Step S22 Input determination process
- the request acquisition unit 43 of the search device 40 returns the process to step S21.
- the request acquisition unit 43 acquires the search request and outputs image data of the person to be searched for that is included in the search request to the feature extraction unit 46.
- the request acquisition unit 43 outputs the information included in the search request to the search unit 44.
- the image data of the person to be searched for must be an image from which the feature quantities used in person search can be extracted.
- the image data of the person to be searched for must be a full-body image that satisfies the conditions for extracting the full-body image features.
- the image data of the person to be searched for may also be a set of multiple image data.
- the image data of the person to be searched for may be a set of image data taken from multiple angles, or a set of images of the person in various clothing.
- the camera ID and the shooting time are used to identify the starting point of the search. For example, if image data of the person to be searched for was captured by any of the cameras 10 in the target area, the camera ID and the shooting time are the camera ID of the camera 10 that captured the image data and the time of capture. The camera ID and the shooting time may also be those of a camera 10 covering a location estimated from eyewitness testimony about the person to be searched for, together with the estimated time. The camera ID and the shooting time may also be identified from some kind of electronic log information linked to the person to be searched for, such as IC card touch information, two-dimensional code reading information, or a beacon reception record.
- Step S23 Feature extraction process
- the feature extraction unit 46 of the search device 40 extracts feature amounts as search features from the image data of the person to be searched output in step S22.
- the feature extraction unit 46 extracts feature amounts from each piece of image data.
- the feature amounts extracted here are the same as the feature amounts extracted in step S14 of FIG. 3.
- the feature extraction unit 46 outputs the extracted search features to the search unit 44.
- the processes in steps S24 and S25 are executed with each camera 10 as the target camera 10.
- Step S24 Threshold extraction process
- the search unit 44 of the search device 40 uses the feature amount output by the feature extraction unit 46 to obtain a threshold value to be used for search from a threshold value database 49 as a target threshold value.
- the threshold database 49 stores one record for each combination of camera 10 and cluster, with each camera 10 taken as a target camera 10 and each cluster for the target camera 10 taken as a target cluster. Specifically, each record includes a camera ID, a cluster ID, a cluster center point, a cluster size, and a threshold. As shown in FIG. 6, the clusters for each camera 10 are obtained by clustering a plurality of feature amounts for persons, which are the objects, captured in image data obtained by that camera 10. Here, the number of clusters for camera 10-i is assumed to be $D_i$ (two in FIG. 6).
- the cluster center point is the average value of the features belonging to the cluster.
- the cluster center point may be the center of gravity of the features belonging to the cluster.
- the cluster center point may also be the feature that has the smallest average distance to other features among the features belonging to the cluster.
- the cluster size is the average value of the distance between the cluster center point and the feature values belonging to the cluster, or may be an index such as the variance or standard deviation of the feature values belonging to the cluster.
- the cluster center point and cluster size are information that enable identification of the location and range of a cluster in the feature space. Therefore, in addition to the cluster center point and cluster size, each record may also contain area information for each area obtained by Voronoi division of the feature space based on the cluster center point.
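As an illustration of the record layout and of two of the definitions above (center point as the average of the features, size as the average distance from the center point), a minimal sketch follows. The class and function names are hypothetical, and Euclidean distance is an assumption.

```python
import math
from dataclasses import dataclass


@dataclass
class ThresholdRecord:
    camera_id: str
    cluster_id: int
    center: list      # cluster center point
    size: float       # cluster size
    threshold: float


def cluster_center(features):
    """Average of the feature vectors, taken dimension by dimension."""
    n = len(features)
    return [sum(column) / n for column in zip(*features)]


def cluster_size(features, center):
    """Average Euclidean distance between the center point and each feature."""
    return sum(math.dist(f, center) for f in features) / len(features)


features = [[0.0, 0.0], [2.0, 0.0]]
center = cluster_center(features)
size = cluster_size(features, center)
record = ThresholdRecord("camera-10-1", 1, center, size, threshold=0.0)
print(record.center, record.size)
```

The `threshold` field is left at a placeholder value here because it is filled in by the threshold derivation process.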
- the search unit 44 identifies the cluster to which the search feature output in step S23 belongs, among the multiple clusters for the target camera 10-i.
- the search unit 44 obtains the threshold value in the record corresponding to the target camera 10-i and the identified cluster from the threshold database 49 as the target threshold value.
- the search unit 44 identifies the cluster to which the search feature belongs by the following method 1 or method 2.
- Method 1 The search unit 44 calculates the distance between the cluster center point of each of the multiple clusters for the target camera 10-i and the search feature. The search unit 44 identifies the cluster with the shortest calculated distance as the cluster to which the search feature belongs.
- Method 2 The search unit 44 sets each of the multiple clusters for the target camera 10-i as a cluster to be calculated.
- the search unit 44 calculates the distance between the cluster center point of the cluster to be calculated and the search feature.
- the search unit 44 divides the calculated distance by the cluster size of the cluster to be calculated.
- the search unit 44 identifies the cluster with the smallest calculated value as the cluster to which the search feature belongs. If the Voronoi-divided region information is included in the records of the threshold database 49, the search unit 44 may specify the cluster to which the search feature belongs based on the region information.
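Method 1 and Method 2 can be compared with a short sketch. The dictionary keys, function names, and sample values are hypothetical, Euclidean distance is assumed, and cluster sizes are assumed to be greater than zero.

```python
import math


def identify_by_distance(search_feature, clusters):
    """Method 1: pick the cluster whose center point is nearest."""
    return min(clusters, key=lambda c: math.dist(search_feature, c["center"]))


def identify_by_normalized_distance(search_feature, clusters):
    """Method 2: pick the cluster with the smallest distance / cluster size."""
    return min(
        clusters,
        key=lambda c: math.dist(search_feature, c["center"]) / c["size"],
    )


clusters = [
    {"id": 1, "center": [0.0, 0.0], "size": 0.5},   # compact cluster
    {"id": 2, "center": [10.0, 0.0], "size": 4.0},  # widely spread cluster
]
query = [4.0, 0.0]
print(identify_by_distance(query, clusters)["id"])             # distances 4.0 and 6.0
print(identify_by_normalized_distance(query, clusters)["id"])  # ratios 8.0 and 1.5
```

In this example the two methods disagree: the query is closer to the compact cluster in absolute terms, but relative to each cluster's spread it fits the wide cluster better.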
- Step S25 Neighborhood search process
- the search unit 44 of the search device 40 performs a neighborhood search based on the search feature, and identifies a record having a feature close to the search feature from among a plurality of records for the target camera 10-i stored in the feature database 48. Specifically, the search unit 44 sets the threshold value acquired in step S24 as the target threshold value. The search unit 44 identifies one or more feature amounts corresponding to the search feature from the feature amounts of the multiple records for the target camera 10-i stored in the feature database 48. At this time, the search unit 44 identifies one or more feature amounts, among the feature amounts of the multiple records for the camera 10-i, whose distance from the search feature is equal to or less than the target threshold value. The search unit 44 identifies the record corresponding to the identified feature amount as a record having a feature close to the search feature.
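The neighborhood search in step S25 can be sketched as a linear scan over the records for one camera. The linear scan, the dictionary layout, and the function name are assumptions made for illustration; the source does not say how the search is implemented, and a real system might use an index structure instead.

```python
import math


def neighborhood_search(search_feature, records, target_threshold):
    """Return (distance, record) pairs whose feature lies within the
    target threshold of the search feature (Euclidean distance assumed)."""
    hits = []
    for record in records:
        distance = math.dist(search_feature, record["feature"])
        if distance <= target_threshold:
            hits.append((distance, record))
    return hits


records = [
    {"camera_id": "camera-10-1", "feature": [0.0, 0.0]},
    {"camera_id": "camera-10-1", "feature": [3.0, 4.0]},
    {"camera_id": "camera-10-1", "feature": [6.0, 8.0]},
]
hits = neighborhood_search([0.0, 0.0], records, target_threshold=5.0)
print([distance for distance, _ in hits])
```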
- Step S26 Camera determination process
- the search unit 44 of the search device 40 determines whether or not the processes of steps S24 and S25 have been performed on all cameras 10 as the target cameras 10. If the processes have been performed, the search unit 44 advances the process to step S27. On the other hand, if the processes have not been performed, the search unit 44 returns the process to step S24 and performs the process on a new camera 10 as the target camera 10.
- Step S27 Output process
- the output unit 45 of the search device 40 outputs the records identified in step S25 with each camera 10 as the target camera 10. At this time, the output unit 45 outputs the records identified in step S25 after sorting, integrating, or converting them into a form that is easy to handle as a search result.
- An example of sorting is reordering the records.
- the search unit 44 sorts records in descending order of similarity to the search feature. By sorting records in descending order of similarity, it becomes possible to present records to the user in descending order of reliability.
- the search unit 44 may sort the images in ascending or descending order of the shooting time instead of the similarity.
- the search unit 44 may also sort the images in order of a value that combines the similarity, time, and other information with a priority.
- An example of integration is the extraction of representative records.
- the records identified in step S25 may include multiple records of video data captured at similar times by cameras 10 in the same or nearby locations.
- the search unit 44 keeps only a representative portion of these multiple records and excludes the rest.
- the output unit 45 outputs only the remaining records that were not excluded.
- Examples of conversion include extracting necessary information from a record and adding necessary information.
- the search unit 44 extracts, from the information contained in the record, the shooting time, the camera ID, and an image of a person, or an image in which a rectangle surrounding a person is superimposed on a video frame showing the person.
- the search unit 44 then adds the person's search reliability score to the extracted information and outputs it.
- the search reliability score may be, for example, the similarity described above or the distance.
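The sorting, integration, and conversion steps above can be combined in one small sketch. The rule used for choosing a representative record (the best match per camera and time window) and all names are assumptions; the source leaves the choice of representative open. The distance is used as the search reliability score, so a smaller score means a closer match.

```python
def summarize_hits(hits, window_seconds=10.0):
    """hits: list of (distance, record) pairs, where each record is a dict
    with "camera_id" and "shot_at" keys (hypothetical layout)."""
    # Sorting: smallest distance first, i.e. descending similarity.
    ordered = sorted(hits, key=lambda hit: hit[0])
    seen = set()
    results = []
    for distance, record in ordered:
        # Integration: keep one representative per camera and time window.
        key = (record["camera_id"], int(record["shot_at"] // window_seconds))
        if key in seen:
            continue
        seen.add(key)
        # Conversion: keep only the needed fields and attach the score.
        results.append({
            "camera_id": record["camera_id"],
            "shot_at": record["shot_at"],
            "score": distance,
        })
    return results


hits = [
    (0.5, {"camera_id": "camera-10-1", "shot_at": 3.0}),
    (0.2, {"camera_id": "camera-10-1", "shot_at": 5.0}),
    (0.4, {"camera_id": "camera-10-2", "shot_at": 4.0}),
]
results = summarize_hits(hits)
print(results)
```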
- Step S28 End determination process
- the search unit 44 of the search device 40 determines whether or not a termination condition is satisfied.
- the termination condition is, for example, that a termination request is received from a user.
- the termination condition may also be that a termination trigger is generated by a mechanism other than the search system 100, such as a timer. If the termination condition is satisfied, the search unit 44 ends the process. On the other hand, if the termination condition is not satisfied, the search unit 44 returns the process to step S21.
- the threshold value derivation process operates when a condition is met.
- Step S31 Execution waiting process
- the threshold value derivation unit 47 of the search device 40 waits for the condition to be met.
- the condition is any one of the following (A) to (D) or a combination of any two or more of them: (A) A certain amount of time has passed since the previous threshold derivation process was executed. (B) The number of records stored in the feature database 48 has exceeded a certain number. (C) The number of records in the feature database 48 that include a specific camera ID has exceeded a certain number. (D) A user has requested that a threshold derivation process be executed.
- Step S32 Condition determination process
- If the condition is not met, the threshold value derivation unit 47 of the search device 40 returns the process to step S31. On the other hand, if the condition is met, the threshold value derivation unit 47 advances the process to step S33.
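The condition checked in steps S31 and S32 can be sketched as a single predicate. This sketch assumes the simplest reading, in which any one of conditions (A) to (D) is enough to start the derivation; the parameter names and default limits are hypothetical.

```python
def should_derive_thresholds(
    now,
    last_run_at,
    total_records,
    records_per_camera,
    user_requested,
    min_interval=3600.0,   # hypothetical limit for (A), in seconds
    max_total=10000,       # hypothetical limit for (B)
    max_per_camera=1000,   # hypothetical limit for (C)
):
    """Return True when any of conditions (A) to (D) holds."""
    elapsed = (now - last_run_at) >= min_interval                 # (A)
    too_many = total_records > max_total                          # (B)
    camera_full = any(
        count > max_per_camera for count in records_per_camera.values()
    )                                                             # (C)
    return elapsed or too_many or camera_full or user_requested   # (D)


print(should_derive_thresholds(100.0, 0.0, 10, {"camera-10-1": 5}, False))
print(should_derive_thresholds(100.0, 0.0, 10, {"camera-10-1": 2000}, False))
```

A combination of conditions could be expressed by replacing `or` with `and` for the conditions that must hold together.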
- the threshold derivation unit 47 of the search device 40 reads out a record for the target camera 10-i from the feature database 48.
- the threshold derivation unit 47 may read all records for the target camera 10-i.
- the threshold derivation unit 47 may read only a portion of records randomly sampled from the records for the target camera 10-i.
- the threshold derivation unit 47 may read only records limited to specific conditions, such as a time period (e.g., nighttime) and a season, from among the records for the target camera 10-i.
- Step S34 Clustering process
- the threshold derivation unit 47 of the search device 40 clusters the features in the records read out in step S33 in a feature space.
- the threshold derivation unit 47 can perform clustering using existing algorithms such as the k-Means algorithm, the Mean Shift, and the Gaussian Mixture Model.
- the threshold derivation unit 47 may divide the space of the feature quantities into subspaces of fixed sizes and treat each subspace as one cluster.
- it is assumed that the features of the target camera 10-i are clustered into k clusters, from cluster $D_{i1}$ to cluster $D_{ik}$.
- the process in step S35 is executed with each cluster as a target cluster, repeating until the determination process in step S36 finds that all clusters have been processed.
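Of the clustering options above, the fixed-size subspace approach is the simplest to sketch without external libraries. The following illustration assigns each feature to the grid cell that contains it and treats each non-empty cell as one cluster. The function name and the cell size are hypothetical; k-Means, Mean Shift, or a Gaussian Mixture Model would normally come from an existing library instead.

```python
import math
from collections import defaultdict


def grid_clusters(features, cell_size):
    """Group feature vectors by the fixed-size grid cell they fall into.

    Returns a dict mapping a cell index tuple to the features in that cell."""
    clusters = defaultdict(list)
    for feature in features:
        cell = tuple(math.floor(value / cell_size) for value in feature)
        clusters[cell].append(feature)
    return dict(clusters)


features = [[0.1, 0.2], [0.4, 0.3], [1.5, 0.1]]
clusters = grid_clusters(features, cell_size=1.0)
print(clusters)
```

With this approach the number of clusters k is not chosen in advance; it is the number of non-empty cells.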
- Step S35 Threshold calculation process
- the threshold derivation unit 47 of the search device 40 calculates a threshold for the target cluster $D_{ij}$ from the distribution of the target cluster $D_{ij}$.
- the purpose of deriving this threshold is to solve the problem that the distance between features varies for each location in the feature space. Therefore, an index value corresponding to the distance between features for each cluster is specified, and a threshold is calculated according to the index value.
- the threshold derivation unit 47 specifies the variance or standard deviation from the cluster center point of the target cluster $D_{ij}$ as the index value.
- the threshold derivation unit 47 may use the average value of the distance between each feature belonging to the target cluster $D_{ij}$ and the nearest neighboring feature as the index value.
- the threshold derivation unit 47 calculates the threshold by multiplying the index by a fixed coefficient.
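The threshold calculation in step S35 can be sketched with both index values mentioned above. This is an illustration under stated assumptions: Euclidean distance, the mean of the features as the cluster center point, and a coefficient of 2.0, none of which are specified in the source. The function names are hypothetical.

```python
import math


def std_from_center(features):
    """Standard deviation of the feature distances about the cluster center
    point, with the center taken as the mean of the features."""
    n = len(features)
    center = [sum(column) / n for column in zip(*features)]
    variance = sum(math.dist(f, center) ** 2 for f in features) / n
    return math.sqrt(variance)


def mean_nearest_neighbor_distance(features):
    """Average distance from each feature to its nearest other feature.
    Requires at least two features."""
    total = 0.0
    for i, f in enumerate(features):
        total += min(math.dist(f, g) for j, g in enumerate(features) if j != i)
    return total / len(features)


def derive_threshold(index_value, coefficient=2.0):
    """Threshold = index value multiplied by a fixed coefficient."""
    return coefficient * index_value


tight = [[0.0, 0.0], [2.0, 0.0]]
print(derive_threshold(std_from_center(tight)))
print(derive_threshold(mean_nearest_neighbor_distance([[0.0, 0.0], [2.0, 0.0], [5.0, 0.0]])))
```

A compact cluster yields a small index value and therefore a small threshold, while a widely spread cluster yields a larger one.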
- Step S36 Cluster determination process
- the threshold derivation unit 47 of the search device 40 determines whether or not the process of step S35 has been performed on all clusters as target clusters. If the process has been performed, the threshold derivation unit 47 advances the process to step S37. On the other hand, if the process has not been performed, the threshold derivation unit 47 returns the process to step S35 and performs the process on a new cluster as a target cluster.
- Step S37 Threshold value update process
- The threshold derivation unit 47 of the search device 40 updates the thresholds for the target camera 10-i in the threshold database 49 with the thresholds calculated for each cluster in step S35.
- Step S38 Camera determination process
- The threshold derivation unit 47 of the search device 40 determines whether or not the processes from step S33 to step S37 have been performed with every camera 10 as the target camera 10. If they have, the threshold derivation unit 47 advances the process to step S39. If they have not, the threshold derivation unit 47 returns the process to step S33 and performs the processes with a new camera 10 as the target camera 10.
- Step S39 End determination process
- The threshold derivation unit 47 of the search device 40 determines whether or not the termination condition is satisfied.
- The termination condition is, for example, that there is a termination request from the user.
- The termination condition may also be that a termination trigger is generated by a mechanism outside the search system 100, such as a timer. If the termination condition is satisfied, the threshold derivation unit 47 terminates the process. If the termination condition is not satisfied, the threshold derivation unit 47 returns the process to step S31.
- As described above, the search system 100 derives a threshold value for each cluster obtained by clustering features, and performs a search using the threshold value for the cluster corresponding to the search feature. This allows a search to be performed using a threshold value suited to the search feature, making it possible to search for the target object appropriately.
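A search that uses the threshold of the cluster corresponding to the search feature might look like the sketch below. It assumes that the corresponding cluster is the one whose center is nearest to the search feature, which the text does not specify; `nearest_cluster`, `search`, the cluster IDs, and all data values are hypothetical.

```python
import math

def nearest_cluster(search_feature, centers):
    """Pick the cluster whose center is closest to the search feature (assumption)."""
    return min(centers, key=lambda cid: math.dist(search_feature, centers[cid]))

def search(search_feature, records, centers, thresholds):
    """Return the chosen cluster ID and the IDs of records within its threshold.

    `records` is a list of (record_id, feature) pairs; hits are ordered by distance.
    """
    cluster_id = nearest_cluster(search_feature, centers)
    limit = thresholds[cluster_id]
    hits = []
    for record_id, feature in records:
        d = math.dist(search_feature, feature)
        if d <= limit:
            hits.append((d, record_id))
    return cluster_id, [record_id for _, record_id in sorted(hits)]

# Made-up per-camera data: a compact cluster D1 and a widely spread cluster D2.
centers = {"D1": (0.0, 0.0), "D2": (10.0, 10.0)}
thresholds = {"D1": 1.0, "D2": 4.0}
records = [("r1", (0.5, 0.0)), ("r2", (2.0, 0.0)),
           ("r3", (10.0, 13.0)), ("r4", (10.0, 15.0))]

print(search((0.0, 0.0), records, centers, thresholds))
print(search((10.0, 10.0), records, centers, thresholds))
```

The same distance of 2 to 3 units is rejected for a query in D1 but accepted for a query in D2, which is the behavior the per-cluster threshold is meant to provide.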
- FIG. 8 shows an image of feature amounts plotted in a feature amount space G1.
- Feature amount group G51 and feature amount group G52 are each a distribution made up of feature amounts of people wearing similar clothes.
- Feature amount group G51 is the distribution for people wearing dark colors on both the top and bottom.
- Feature amount group G52 is the distribution for people wearing light colors on the top and dark colors on the bottom.
- Feature amount group G51 has a small variance, while feature amount group G52 has a large variance. In this case, a person wearing clothes corresponding to feature amount group G51 can be identified with a relatively small threshold value.
- In contrast, a person wearing clothes corresponding to feature amount group G52 cannot be identified unless a relatively large threshold value is used.
- In this way, the appropriate threshold value may differ depending on the appearance of the object.
- If a uniform threshold value is set for the camera 10, the identification accuracy will vary depending on the appearance of the object.
- In the search system 100, a search is performed using the threshold value for the cluster corresponding to the search feature. Therefore, the threshold value used differs depending on whether the search feature belongs to feature amount group G51 or feature amount group G52. This makes it possible to search for an object appropriately.
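The effect described for feature amount groups G51 and G52 can be illustrated with a toy one-dimensional example. All values here are invented for illustration: persons A and B stand in for a compact group like G51, persons C and D for a spread-out group like G52, and the rule that assigns a query to a group is a hypothetical stand-in for cluster lookup.

```python
# (person, feature value) pairs; A and B are close together, C and D are spread out.
observations = [("A", 0.0), ("A", 0.1), ("B", 0.6), ("B", 0.7),
                ("C", 10.0), ("C", 11.5), ("D", 14.0), ("D", 15.5)]

def matches(query, threshold):
    """Observations whose feature lies within `threshold` of the query."""
    return [(p, v) for p, v in observations if abs(v - query) <= threshold]

# A uniform small threshold finds person A but misses one observation of person C.
print(matches(0.0, 0.2))   # both observations of A
print(matches(10.0, 0.2))  # only one of the two observations of C

# A uniform large threshold finds person C but also returns person B for a query on A.
print(matches(10.0, 2.0))  # both observations of C
print(matches(0.0, 2.0))   # A and, wrongly, B

# Per-cluster thresholds (assumed values) avoid both failure modes.
per_cluster = {"G51": 0.2, "G52": 2.0}

def search_with_cluster_threshold(query):
    cluster = "G51" if query < 5.0 else "G52"  # hypothetical cluster assignment
    return matches(query, per_cluster[cluster])

print(search_with_cluster_threshold(0.0))
print(search_with_cluster_threshold(10.0))
```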
- The output unit 45 outputs the identified records as a search result after sorting, integrating, or converting them into a form that is easy to handle.
- The output unit 45 may also estimate a movement route of the search target person from the identified records and output the estimated movement route, for example as follows.
- (1) Among records of video data captured by the cameras 10 that were shot close together in time and at the same or nearby locations, the output unit 45 keeps only a representative portion of the records and excludes the rest.
- (2) The output unit 45 sorts the remaining records in order of their shooting times.
- (3) The output unit 45 plots the installation positions of the cameras 10 identified from the camera IDs of the records, and connects them with arrows in the sorted order of the records. The route indicated by the plotted points and arrows is the movement route of the search target person.
- (4) The output unit 45 calculates the likelihood of the movement route by statistical processing based on the reliability of each record identified in step S25 and the realization probability of each movement between cameras 10 on the movement route. The output unit 45 then outputs the likelihood together with the movement route.
- The reliability of a record is, for example, the distance between the feature amount of the record and the search feature.
- The reliability of a record may instead be the similarity described in the example of sorting in step S27 of FIG. 4.
- The realization probability of a movement is calculated based on, for example, whether the movement is possible without being captured by other cameras 10 and whether it is possible given the shooting times.
- The output unit 45 may estimate multiple movement routes for one search target person by, for example, changing the method of selecting a representative portion of records in (1). The output unit 45 may then output each movement route together with its likelihood.
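The route estimation described above can be sketched as follows. Several simplifications are assumptions, not part of the disclosure: "same or nearby location" is reduced to "same camera", reliability is mapped from distance with `exp(-distance)`, and the likelihood is taken as a plain product of reliabilities and transition probabilities. The record fields, camera positions, and probabilities are made-up values.

```python
import math

def representative_records(records, min_gap):
    """Sort by shooting time and drop records from the same camera that were
    shot less than `min_gap` seconds after the previously kept record."""
    kept = []
    for rec in sorted(records, key=lambda r: r["time"]):
        last = kept[-1] if kept else None
        if last and last["camera_id"] == rec["camera_id"] \
                and rec["time"] - last["time"] < min_gap:
            continue
        kept.append(rec)
    return kept

def route_segments(kept, positions):
    """Connect camera installation positions in shooting-time order."""
    points = [positions[r["camera_id"]] for r in kept]
    return list(zip(points, points[1:]))

def route_likelihood(kept, transition_prob):
    """One possible statistic: product of per-record reliabilities and of the
    realization probabilities of each camera-to-camera movement."""
    score = 1.0
    for rec in kept:
        score *= math.exp(-rec["distance"])  # assumed distance-to-reliability mapping
    for a, b in zip(kept, kept[1:]):
        score *= transition_prob[(a["camera_id"], b["camera_id"])]
    return score

records = [
    {"camera_id": "c2", "time": 30, "distance": 0.3},
    {"camera_id": "c1", "time": 0, "distance": 0.1},
    {"camera_id": "c1", "time": 2, "distance": 0.2},
    {"camera_id": "c3", "time": 65, "distance": 0.2},
]
positions = {"c1": (0, 0), "c2": (5, 0), "c3": (5, 4)}
transition_prob = {("c1", "c2"): 0.9, ("c2", "c3"): 0.8}

kept = representative_records(records, min_gap=10)
segments = route_segments(kept, positions)
likelihood = route_likelihood(kept, transition_prob)
print([r["camera_id"] for r in kept], segments, likelihood)
```

Changing `min_gap`, or the rule for which record of a group is kept, yields alternative candidate routes, each of which can be scored with the same likelihood function.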
- In the first embodiment, each functional component is realized by software.
- However, as a second modification, each functional component may be realized by hardware. The following describes the second modification, focusing on the differences from the first embodiment.
- When each functional component is realized by hardware, the feature extraction device 30 and the search device 40 each have an electronic circuit instead of the processor 101, memory 102, and storage 103.
- The electronic circuit is a dedicated circuit that realizes the functions of each functional component, the memory 102, and the storage 103.
- The electronic circuit may be a single circuit, a composite circuit, a programmed processor, a parallel programmed processor, a logic IC, a GA, an ASIC, or an FPGA.
- GA is an abbreviation for Gate Array.
- ASIC is an abbreviation for Application Specific Integrated Circuit.
- FPGA is an abbreviation for Field-Programmable Gate Array.
- Each functional component may be realized by one electronic circuit, or each functional component may be realized by distributing it among a plurality of electronic circuits.
- <Modification 3> As a third modification, some of the functional components may be realized by hardware, and the other functional components may be realized by software.
- The processor 101, memory 102, storage 103, and electronic circuit are collectively referred to as the processing circuit.
- That is, the functions of each functional component are realized by the processing circuit.
- In addition, the word "unit" in the above explanation may be interpreted as "circuit," "process," "procedure," "processing," or "processing circuit."
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- Data Mining & Analysis (AREA)
- Databases & Information Systems (AREA)
- Physics & Mathematics (AREA)
- General Engineering & Computer Science (AREA)
- General Physics & Mathematics (AREA)
- Image Analysis (AREA)
Priority Applications (4)
| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| GB2511153.5A GB2641627A (en) | 2023-03-23 | 2023-03-23 | Search device, search method, and search program |
| PCT/JP2023/011355 WO2024195069A1 (ja) | 2023-03-23 | 2023-03-23 | Search device, search method, and search program |
| JP2025508036A JP7781342B2 (ja) | 2023-03-23 | 2023-03-23 | Search device, search method, and search program |
| US19/277,111 US20250348531A1 (en) | 2023-03-23 | 2025-07-22 | Search device, search method, and computer readable medium |
Applications Claiming Priority (1)
| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| PCT/JP2023/011355 WO2024195069A1 (ja) | 2023-03-23 | 2023-03-23 | Search device, search method, and search program |
Related Child Applications (1)
| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| US19/277,111 Continuation US20250348531A1 (en) | 2023-03-23 | 2025-07-22 | Search device, search method, and computer readable medium |
Publications (1)
| Publication Number | Publication Date |
|---|---|
| WO2024195069A1 (ja) | 2024-09-26 |
Family
ID=92841477
Family Applications (1)
| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| PCT/JP2023/011355 Ceased WO2024195069A1 (ja) | 2023-03-23 | 2023-03-23 | Search device, search method, and search program |
Country Status (4)
| Country | Link |
|---|---|
| US (1) | US20250348531A1 |
| JP (1) | JP7781342B2 |
| GB (1) | GB2641627A |
| WO (1) | WO2024195069A1 |
Citations (2)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| WO2013147170A1 (ja) * | 2012-03-29 | 2013-10-03 | 楽天株式会社 | Image search device, image search method, program, and computer-readable storage medium |
| JP2022531594A (ja) * | 2019-05-03 | 2022-07-07 | サービスナウ, インコーポレイテッド | Clustering and dynamic re-clustering of similar text documents |
2023
- 2023-03-23 GB GB2511153.5A patent/GB2641627A/en active Pending
- 2023-03-23 WO PCT/JP2023/011355 patent/WO2024195069A1/ja not_active Ceased
- 2023-03-23 JP JP2025508036A patent/JP7781342B2/ja active Active
2025
- 2025-07-22 US US19/277,111 patent/US20250348531A1/en active Pending
Also Published As
| Publication number | Publication date |
|---|---|
| US20250348531A1 (en) | 2025-11-13 |
| JPWO2024195069A1 | 2024-09-26 |
| GB202511153D0 (en) | 2025-08-27 |
| JP7781342B2 (ja) | 2025-12-05 |
| GB2641627A (en) | 2025-12-10 |
Legal Events
| Date | Code | Title | Description |
|---|---|---|---|
| | 121 | Ep: the epo has been informed by wipo that ep was designated in this application | Ref document number: 23928648; Country of ref document: EP; Kind code of ref document: A1 |
| | WWE | Wipo information: entry into national phase | Ref document number: 2025508036; Country of ref document: JP |
| | ENP | Entry into the national phase | Ref document number: 202511153; Country of ref document: GB; Kind code of ref document: A; Free format text: PCT FILING DATE = 20230323 |
| | WWE | Wipo information: entry into national phase | Ref document number: 2511153.5; Country of ref document: GB |
| | NENP | Non-entry into the national phase | Ref country code: DE |
| | WWP | Wipo information: published in national office | Ref document number: 2511153.5; Country of ref document: GB |
| | 122 | Ep: pct application non-entry in european phase | Ref document number: 23928648; Country of ref document: EP; Kind code of ref document: A1 |