WO2021237967A1 - A target retrieval method and device - Google Patents
A target retrieval method and device
- Publication number
- WO2021237967A1 (PCT/CN2020/112221)
- Authority
- WO
- WIPO (PCT)
- Prior art keywords
- target
- feature vector
- image
- image area
- area corresponding
- Prior art date
Classifications
- G—PHYSICS; G06—COMPUTING; CALCULATING OR COUNTING; G06F—ELECTRIC DIGITAL DATA PROCESSING; G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/583—Retrieval of still image data characterised by using metadata automatically derived from the content
- G06F16/532—Query formulation for still image data, e.g. graphical querying
- G06F16/7335—Graphical querying of video data, e.g. query-by-region, query-by-sketch, query-by-trajectory, GUIs for designating a person/face/object as a query predicate
- G06F16/7837—Retrieval of video data characterised by using metadata automatically derived from the content, using objects detected or recognised in the video content
- G06F16/784—Retrieval of video data using objects detected or recognised in the video content, the detected or recognised objects being people
Definitions
- This application relates to the field of computer technology, in particular to a target retrieval method and device.
- The embodiments of the present application provide a target retrieval method and device to improve the efficiency and versatility of target retrieval.
- A target retrieval method includes:
- The image area corresponding to the target is detected from the object to be retrieved. If image areas corresponding to multiple targets are detected, the image areas corresponding to the multiple targets are displayed to the user through the terminal, and a selection instruction input by the user is received from the terminal, where the selection instruction includes at least the image area corresponding to the selected target;
- A search is performed in a retrieval database to obtain objects whose target feature vector similarity is greater than a threshold, where the retrieval database includes at least a plurality of associated objects and the target feature vector of each target included in those objects.
- obtaining the target feature vector of the target according to the image area corresponding to the target in the selection instruction specifically includes:
- The image area corresponding to the target is used as an input parameter, feature extraction is performed on it to obtain an image feature vector of the target, and the image feature vector is used as the target feature vector of the target.
- detecting and obtaining the image area corresponding to the target from the object to be retrieved specifically includes:
- If the type of the object to be retrieved is video, determine the position information of the image area corresponding to the target in the video frame of the object to be retrieved;
- the target is tracked, and the position information of the target on each tracked video frame and the corresponding image area are determined.
- obtaining the target feature vector of the target according to the image area corresponding to the target in the selection instruction specifically includes:
- searching in a retrieval database according to the target feature vector of the target to obtain objects whose target feature vector similarity is greater than a threshold specifically includes:
- The retrieval database includes multiple clusters; each cluster includes the target feature vectors of multiple targets, and the clusters are obtained by clustering the targets in the retrieval database based on a clustering algorithm;
- The targets whose target feature vector similarity is greater than the threshold are obtained.
- the method for obtaining the retrieval database is:
- the object sample set includes a plurality of object samples, and the type of the object samples is an image or a video;
- the target feature vector of each target is stored in association with the corresponding target sample, and updated to the retrieval database.
- a target retrieval device includes:
- the acquisition module is used to acquire the object to be retrieved
- The first processing module is used to detect the image area corresponding to the target from the object to be retrieved; if image areas corresponding to multiple targets are detected, the image areas corresponding to the multiple targets are displayed to the user through the terminal, and a selection instruction input by the user is received from the terminal, where the selection instruction includes at least the image area corresponding to the selected target;
- the second processing module is configured to obtain the target feature vector of the target according to the image area corresponding to the target in the selection instruction;
- The retrieval module is used to search in the retrieval database according to the target feature vector of the target to obtain objects whose target feature vector similarity is greater than a threshold, where the retrieval database includes at least a plurality of associated objects and the target feature vector of each target included in those objects.
- the second processing module is specifically configured to:
- The image area corresponding to the target is used as an input parameter, feature extraction is performed on it to obtain an image feature vector of the target, and the image feature vector is used as the target feature vector of the target.
- the second processing module is specifically configured to:
- If the type of the object to be retrieved is video, determine the position information of the image area corresponding to the target in the video frame of the object to be retrieved;
- the target is tracked, and the position information of the target on each tracked video frame and the corresponding image area are determined.
- the second processing module is specifically configured to:
- When the retrieval is performed in the retrieval database to obtain objects whose target feature vector similarity is greater than a threshold, the retrieval module is specifically configured to:
- The retrieval database includes multiple clusters; each cluster includes the target feature vectors of multiple targets, and the clusters are obtained by clustering the targets in the retrieval database based on a clustering algorithm;
- The targets whose target feature vector similarity is greater than the threshold are obtained.
- For obtaining the retrieval database, the device further includes a building module for:
- the object sample set includes a plurality of object samples, and the type of the object samples is an image or a video;
- the target feature vector of each target is stored in association with the corresponding target sample, and updated to the retrieval database.
- An electronic device includes a memory, a processor, and a computer program that is stored on the memory and can run on the processor.
- The processor implements the steps of any of the above target retrieval methods when executing the program.
- a computer-readable storage medium has a computer program stored thereon, and when the computer program is executed by a processor, the steps of any of the above-mentioned target retrieval methods are realized.
- The object to be retrieved is obtained, the image area corresponding to the target is detected from it, and the target feature vector of the target is obtained; then, objects whose similarity is greater than the threshold are obtained by searching in the retrieval database according to the target feature vector. In this way, retrieval can be achieved through target detection and feature extraction.
- Figure 1 is a flowchart of a target retrieval method in an embodiment of the application
- FIG. 2 is a schematic diagram of the image detection and feature extraction process in an embodiment of this application.
- FIG. 3 is a schematic diagram of the detection and feature extraction process for videos in an embodiment of the application
- FIG. 4 is a schematic diagram of the structure of a target retrieval device in an embodiment of the application.
- FIG. 5 is a schematic diagram of the structure of an electronic device in an embodiment of the application.
- Image or video retrieval is applied in more and more scenarios. For example, a photo is taken to identify the object contained in it and retrieve images similar to that object; as another example, in product search on an e-commerce platform, a product image is taken and then similar product images are searched for.
- The retrieval method in the prior art mainly recognizes the object category based on a classification model, and then retrieves similar objects of the same category. In this way, given an image containing a specific object, the user can retrieve object images of the same category.
- However, this method requires pre-training for different categories of objects, so the trained classification model is only suitable for a single category; for example, a classification model trained with cat images is not suitable for classifying and retrieving dogs, and versatility is poor. To retrieve objects in other categories, retraining is needed, which reduces efficiency and increases cost.
- The embodiment of the present application provides a target retrieval method.
- The retrieval database stores multiple associated objects and the target feature vector of each target included in those objects.
- The image area corresponding to the target is detected from the object to be retrieved; when multiple targets are detected, the target selected by the user is determined according to the user's selection instruction, which better satisfies the user's retrieval needs, and the target feature vector of the target in the selection instruction is then obtained.
- A search is performed in the retrieval database according to the target feature vector to obtain objects whose target feature similarity is greater than the threshold. In this way, there is no need for pre-training for different types of target objects and no need to determine the category first during retrieval, which achieves a general object retrieval method that can meet various retrieval application scenarios, improving versatility, flexibility, and efficiency.
- the target retrieval method in the embodiments of the present application is mainly applied to the server.
- The user inputs an image including a target object through the terminal and clicks to search; the terminal then sends the image to the server, and the server extracts the target object.
- The server searches the retrieval database for images with higher similarity and returns them to the terminal, which displays the retrieved images.
- FIG. 1 is a flowchart of a target retrieval method in an embodiment of this application, the method includes:
- Step 100: Obtain the object to be retrieved.
- the type of the object to be retrieved may be an image or video, which is not limited in the embodiment of the present application, that is, the user can retrieve an image or video that is similar to the target in the image or video.
- Step 110: Detect and obtain the image area corresponding to the target from the object to be retrieved, and obtain the target feature vector of the target according to the image area corresponding to the target.
- Step 110 specifically includes:
- Obtaining the image area corresponding to the target can be divided into the following two cases:
- The first case: if the type of the object to be retrieved is an image, it specifically includes: based on the trained detection model, the object to be retrieved is used as the input parameter to obtain the detection frame of the detected target, and the detection frame of the target is used as the image area of the target.
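Using the detection frame as the image area amounts to cropping a rectangular region out of the image array. A minimal sketch, assuming numpy image arrays and a hypothetical `(x1, y1, x2, y2)` pixel-coordinate box format (the patent does not fix a box encoding):

```python
import numpy as np

def crop_detection(image, box):
    """Crop the image area corresponding to a detected target.

    `image` is an H x W x C array; `box` is a hypothetical
    (x1, y1, x2, y2) detection frame in pixel coordinates.
    """
    x1, y1, x2, y2 = box
    return image[y1:y2, x1:x2]

# Toy 8x8 three-channel image standing in for the object to be retrieved.
image = np.arange(8 * 8 * 3).reshape(8, 8, 3)
area = crop_detection(image, (2, 1, 6, 5))
print(area.shape)  # (4, 4, 3)
```

The cropped array is what would be passed on to the feature extraction model described below.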
- The detection model can be a region-convolutional neural network (Region-Convolutional Neural Networks, R-CNN), a fast region-convolutional neural network (Fast Region-Convolutional Neural Networks, Fast R-CNN) algorithm, a multi-category single-shot detector (Single Shot MultiBox Detector, SSD), or You Only Look Once version 3 (yolov3), etc.
- The SSD network usually uses the Visual Geometry Group Network (VGG-16) as the basic network; the backbone is VGG-16, which includes multiple convolutional layers.
- yolov3 uses the first 52 layers of darknet-53 (without a fully connected layer); yolov3 is a fully convolutional network that uses a large number of residual skip connections, and, to reduce the negative gradient effects caused by pooling, it uses strided convolutional layers (conv) to achieve downsampling.
- an image area of the target can be detected.
- The second case: if the type of the object to be retrieved is a video, it specifically includes: determining the position information of the image area corresponding to the target in a video frame of the object to be retrieved, tracking the target according to the position information, and determining the position information and corresponding image area of the target on each tracked video frame.
- A tracking algorithm can then be used to determine, in each subsequent video frame, whether the target is tracked, and to determine the position information of the target on each tracked video frame and obtain the corresponding image area.
- The tracking algorithm can be one based on the Open Source Computer Vision Library (OpenCV), for example the kernel correlation filter (Kernel Correlation Filter, KCF) algorithm or the deepsort algorithm, which is not restricted in the embodiments of this application.
- the tracking algorithm is faster than the detection algorithm, thus improving the computational efficiency and speed.
- Each tracked video frame may be each consecutive video frame starting from the frame after the one in which the target is detected, until the frame where tracking is lost, which is not limited in the embodiment of the present application.
- A target may appear continuously in multiple video frames, disappear for a period of time, and then appear again; only the target that appears continuously is considered to be the same target.
- For example, if a person is detected in the first video frame and tracked until the tenth video frame, after which tracking ends, the image areas corresponding to the person are obtained from the first to the tenth video frame.
- If the person is detected again in the 20th video frame, the tracking algorithm can be triggered from the 21st video frame until tracking ends. For example, if the person is tracked from the 21st to the 25th video frame, that person is considered another target, and the corresponding image areas are those from the 20th to the 25th video frame.
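The rule above — a target that reappears after a tracking gap is counted as a new target — can be sketched as segmenting the sorted frame indices at every gap. This is a hypothetical illustration of the bookkeeping only, not the tracking algorithm itself:

```python
def split_into_targets(frames):
    """Group sorted frame indices in which a target was detected or
    tracked into contiguous segments; after a gap, the reappearing
    target is treated as a new target, per the scheme above."""
    segments = []
    current = []
    for f in frames:
        if current and f != current[-1] + 1:  # gap -> close the segment
            segments.append(current)
            current = []
        current.append(f)
    if current:
        segments.append(current)
    return segments

# The person appears in frames 1-10, disappears, reappears in frames 20-25.
frames = list(range(1, 11)) + list(range(20, 26))
print(split_into_targets(frames))
# Two targets: one spanning frames 1-10, one spanning frames 20-25.
```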
- The first implementation: if the type of the object to be retrieved is an image, obtaining the target feature vector of the target according to the image area corresponding to the target specifically includes:
- The image area corresponding to the target is used as the input parameter, and feature extraction is performed on it to obtain the image feature vector of the target.
- the image feature vector is used as the target feature vector of the target.
- The feature extraction model may adopt a residual network (Residual Network, resnet) or a VGG network, which is not limited in the embodiment of the present application; the VGG network structure is mainly composed of convolutional and fully connected layers.
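The feature extraction step maps an image area to a fixed-length vector. As a toy stand-in for a trained CNN such as resnet or VGG (which cannot be reproduced here), the sketch below uses global average pooling over the spatial dimensions followed by L2 normalisation, so the resulting vectors can be compared by cosine similarity:

```python
import numpy as np

def extract_feature(image_area):
    """Toy stand-in for a CNN feature extractor (e.g. resnet or VGG):
    global average pooling over height and width yields one value per
    channel; the vector is L2-normalised for similarity comparison."""
    vec = image_area.mean(axis=(0, 1)).astype(float)
    norm = np.linalg.norm(vec)
    return vec / norm if norm > 0 else vec

area = np.random.rand(32, 24, 3)       # image area of the target
target_vector = extract_feature(area)  # used as the target feature vector
print(target_vector.shape)  # (3,)
```

A real extractor would produce a much longer embedding (e.g. hundreds of dimensions), but the downstream similarity search works the same way.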
- For example, the object to be retrieved is an image including a puppy; the detection frame of the puppy is obtained through the detection model, and the detection frame is input to the feature extraction model to obtain the target feature vector of the puppy.
- The second implementation: if the type of the object to be retrieved is a video, obtaining the target feature vector of the target according to the image area corresponding to the target specifically includes:
- All the video frames in which the target appears can be determined, along with the image area of the target in each of those frames; the feature extraction model is then used to perform feature extraction on each image area separately, obtaining an image feature vector for each.
- For example, the puppy appears from the fifth to the fifteenth video frame of the video, and the corresponding image areas are image area 1, image area 2, ..., image area 11.
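As described later for the database-building case, the per-frame image feature vectors of a tracked target are averaged to obtain its single target feature vector. A minimal numpy sketch of that averaging step, with hypothetical 128-dimensional vectors for the 11 image areas above:

```python
import numpy as np

# Hypothetical image feature vectors extracted from image areas 1..11,
# one per video frame in which the puppy appears (frames 5-15).
frame_vectors = np.random.rand(11, 128)

# The target feature vector of the puppy is the element-wise mean of
# the per-frame image feature vectors.
target_vector = frame_vectors.mean(axis=0)
print(target_vector.shape)  # (128,)
```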
- Step 120: According to the target feature vector of the target, a search is performed in the retrieval database to obtain objects whose target feature vector similarity is greater than a threshold, where the retrieval database includes at least a plurality of associated objects and the target feature vector of each target included in those objects.
- the target feature vector of the target is compared with the target feature vector of each target included in the retrieval database to retrieve similar images.
- the target feature vector of the target can be directly compared with the target feature vector of each target in the search database to determine an object with a similarity greater than a threshold.
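The direct comparison described above can be sketched as a brute-force scan. The cosine similarity measure and the `id -> vector` database layout are assumptions for illustration; the patent does not fix a similarity metric:

```python
import numpy as np

def retrieve(query, database, threshold=0.8):
    """Brute-force variant of step 120: compare the query target
    feature vector with every target feature vector in the retrieval
    database and return (id, similarity) pairs above the threshold.
    `database` is a hypothetical mapping of object id -> vector."""
    hits = []
    qn = query / np.linalg.norm(query)
    for obj_id, vec in database.items():
        sim = float(qn @ (vec / np.linalg.norm(vec)))
        if sim > threshold:
            hits.append((obj_id, sim))
    return sorted(hits, key=lambda h: -h[1])  # most similar first

db = {
    "img_a": np.array([1.0, 0.0, 0.0]),
    "img_b": np.array([0.9, 0.1, 0.0]),
    "img_c": np.array([0.0, 1.0, 0.0]),
}
print(retrieve(np.array([1.0, 0.0, 0.0]), db))
```

With the toy vectors above, `img_a` and `img_b` clear the threshold while the dissimilar `img_c` does not.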
- step 120 specifically includes:
- The retrieval database includes multiple clusters, and each cluster includes multiple target feature vectors.
- The multiple clusters are obtained by clustering the targets in the retrieval database based on a clustering algorithm.
- The central target feature vector is the target feature vector of the center point of the cluster; it can also be determined by comparison with the target feature vectors of the other targets in the cluster, for example as the vector of the target with the highest average feature similarity, which is not restricted in the embodiments of this application. The target feature vector of the target is first compared with the central target feature vector of each cluster to determine the cluster with the highest similarity.
- the target feature vector of the target is compared with the target feature vector of each target included in the cluster with the highest similarity to obtain the target with the target feature vector similarity greater than the threshold.
- each target in the search database can also be clustered in advance, for example, clustering is performed according to the target feature vector of each target.
- In this way, objects whose similarity is greater than the threshold are obtained; because only the central target feature vectors of the clusters need to be compared when selecting a cluster, efficiency can be improved and retrieval time reduced.
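The two-stage cluster retrieval can be sketched as follows: compare the query against each cluster's central vector, then run the threshold comparison only inside the most similar cluster. The `(center_vector, members)` layout and cosine similarity are illustrative assumptions:

```python
import numpy as np

def cluster_retrieve(query, clusters, threshold=0.8):
    """Two-stage retrieval sketch: stage 1 picks the cluster whose
    central target feature vector is most similar to the query;
    stage 2 compares the query only with that cluster's members.
    `clusters` is a hypothetical list of (center_vector, {id: vector})."""
    def cos(a, b):
        return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

    # Stage 1: only the cluster centers are compared.
    best = max(clusters, key=lambda c: cos(query, c[0]))
    # Stage 2: threshold comparison within the chosen cluster only.
    return [oid for oid, v in best[1].items() if cos(query, v) > threshold]

clusters = [
    (np.array([1.0, 0.0]), {"dog1": np.array([0.95, 0.05]),
                            "dog2": np.array([0.5, 0.5])}),
    (np.array([0.0, 1.0]), {"table1": np.array([0.1, 0.9])}),
]
print(cluster_retrieve(np.array([1.0, 0.0]), clusters))  # ['dog1']
```

With many clusters, stage 1 touches only one vector per cluster, which is the source of the efficiency gain the text describes.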
- The search result can then be sent to the terminal, which displays the retrieved objects according to preset rules or methods. For example, the terminal can display the top preset number of retrieved objects in order of similarity; as another example, the terminal can display the retrieved objects in descending order of similarity.
- the search database can store the image and the detection frame of each target included in the image, as well as the target feature vector of each target.
- For a video, the retrieval database can store the video, the key frames of the video, and the image area and target feature vector of each target included in the video; of course, this is not limiting, and other information can also be stored in the retrieval database as required.
- The server can return the key frame of the video, return the entire video, or determine the link address of the video and return it, so that the user can view the retrieved video after clicking the link through the terminal; this is not limited in the embodiment of this application.
- an image may include a table and a puppy at the same time.
- In the embodiment of the application, the user can select the target so that the detection result actually required by the user is determined. A possible implementation is provided: after the image area corresponding to the target is detected from the object to be retrieved, the method further includes:
- If image areas corresponding to multiple targets are detected from the object to be retrieved, the image areas corresponding to the multiple targets are displayed to the user through the terminal.
- For example, the image includes a puppy and a table, and the image areas corresponding to the puppy and the table are respectively determined.
- The image area of the puppy and the image area of the table can be sent to the terminal, and the terminal displays both image areas.
- The user can then select the target that he actually wants to retrieve, for example the image area of the puppy.
- The user can also select multiple targets at the same time.
- The terminal sends the selection instruction input by the user to the server, and the server then searches only for images similar to the puppy; in this way, retrieval accuracy can be improved.
- The multiple detected targets can also be retrieved separately, with similar images found in the retrieval database for each target. For example, the puppy and the table are retrieved separately, and the corresponding retrieved images, those similar to the puppy and those similar to the table, are sent to the terminal, which can simultaneously display the images retrieved for both.
- The image area corresponding to the target can be obtained from the object to be retrieved and the target feature vector of the target determined; retrieval can then be performed directly according to the target feature vector to obtain objects whose target feature vector similarity is greater than the threshold.
- General target detection and feature extraction can be used to achieve general target retrieval, which can meet various retrieval application scenarios without pre-training for different categories of objects; this is highly versatile and can also improve efficiency and reduce costs.
- The retrieval database in the embodiment of the present application includes at least a plurality of associated objects and the target feature vector of each target included in those objects. Further, for the method of obtaining the retrieval database, a possible implementation is provided in the embodiment of the present application:
- the method of obtaining the object sample is not limited in the embodiment of the present application, and can be obtained according to different application scenarios and business requirements.
- object samples can be obtained from a network knowledge base.
- product images uploaded by various businesses can be obtained as image samples.
- the video stream of each security device can be accessed as a video sample.
- the image area corresponding to each target is detected from each object sample, and the target feature vector of each target is obtained according to the image area corresponding to each target.
- The first case: if the type of the object sample is an image, the object sample is detected by the detection model to obtain the detection frames (i.e. image areas) of all targets in the sample; the image areas of all targets are then input to the feature extraction model, and feature extraction is performed to obtain the target feature vectors of all targets.
- For example, detection is performed on an image, the detection frames of a person, a tree, and a puppy are detected from the image, and feature extraction is performed on each to obtain the target feature vector of the person, the target feature vector of the tree, and the target feature vector of the puppy.
- The second case: if the type of the object sample is a video, target detection and tracking are performed on each video frame. For any target, when a video frame containing that target is detected, the tracking algorithm is triggered according to the position information of the target in that frame; the target is tracked, its position information and image area in each tracked video frame are determined, the image areas of the target in those frames are input to the feature extraction model to obtain multiple image feature vectors, and these are averaged to determine the average image feature of the target, which is used as its target feature vector.
- For example, if a person and a chair are detected in the first video frame, the person and the chair are tracked separately from the second video frame. If the person is tracked from the second to the tenth video frame and is not tracked in the eleventh, the tracking process for the person is stopped, and the average image feature vector of the person's image areas in the first to tenth video frames is calculated; if the chair is tracked from the second to the fifth video frame and is not tracked in the sixth, the chair's tracking process ends, and the average image feature vector corresponding to the chair in the first to fifth video frames is calculated.
- During tracking, detection can also be performed on the video frames at the same time to detect targets other than those already being tracked. For example, if a new target, a kitten, is detected in the second video frame, the kitten is tracked from the third video frame until tracking ends, and the target feature vector of the kitten is obtained.
- the image can be stored, as well as the detection frame (ie image area) and target feature vector of each target in the image.
- For a video, the video and the target feature vector of each target included in it can be stored for ease of management and retrieval; the key frames of the video and the image areas of the targets can also be stored.
- the target feature vector of each target is calculated and stored in advance instead of real-time calculation during retrieval, which can improve retrieval efficiency.
- the target feature vector of each target may not be pre-stored in the retrieval database, but is calculated separately during retrieval, which is not limited in the embodiment of the present application.
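The pre-computed variant of database construction described above can be sketched as a loop over object samples: detect image areas, extract a vector per area, and store each vector in association with its sample. The `detect`/`extract` callables and the list-of-dicts layout are illustrative stand-ins for the detection and feature extraction models:

```python
import numpy as np

def build_retrieval_database(samples, detect, extract):
    """Sketch of database construction: for each object sample, detect
    the image areas of its targets, extract a feature vector for each,
    and store the vector in association with the sample id."""
    database = []
    for sample_id, image in samples:
        for area in detect(image):
            database.append({"sample": sample_id,
                             "vector": extract(area)})
    return database

# Toy stand-ins: "detection" splits the image into top and bottom
# halves; "extraction" average-pools each half into a channel vector.
detect = lambda img: [img[: img.shape[0] // 2], img[img.shape[0] // 2 :]]
extract = lambda area: area.mean(axis=(0, 1))

samples = [("img_1", np.random.rand(8, 8, 3))]
db = build_retrieval_database(samples, detect, extract)
print(len(db), db[0]["vector"].shape)  # 2 (3,)
```

Storing the vectors alongside the sample ids at build time is what allows retrieval to skip feature extraction for database entries, as the text notes.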
- Each target in the retrieval database can also be clustered in advance to obtain multiple clusters, and the center point and central target feature vector of each cluster are determined; in subsequent retrieval, the query is first compared with the central target feature vector of each cluster, and then with the target feature vector of each target in the selected cluster, to obtain retrieved objects with high similarity, which improves retrieval efficiency.
- a retrieval database including images and videos can be established, and targets of any category can be detected, without the need to establish separate retrieval databases for different specific categories. This provides strong versatility and realizes universal object detection and recognition, making it more convenient to integrate with different application scenarios, such as security video streams or other video streams, and suitable for various scenarios.
- FIG. 2 is a schematic diagram of the image detection and feature extraction process in the embodiment of this application.
- for example, the image includes a puppy and a table
- the detection frame of the puppy and the detection frame of the table are obtained through the detection model; then, based on the feature extraction model, feature extraction is performed on the detection frame of the puppy and the detection frame of the table respectively, obtaining the target feature vector 1 of the puppy and the target feature vector 2 of the table.
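The two-model pipeline of FIG. 2 can be sketched as below; `fake_detector` and `fake_feature_extractor` are toy stand-ins for the trained detection and feature extraction models, which the source does not specify.

```python
import numpy as np

def fake_detector(image):
    # stand-in detection model: returns (label, box) pairs, box = (y0, x0, y1, x1)
    h, w = image.shape
    return [("puppy", (0, 0, h // 2, w // 2)), ("table", (h // 2, w // 2, h, w))]

def fake_feature_extractor(crop, dim=8):
    # stand-in for a CNN embedding: a coarse intensity histogram, L2-normalized
    hist, _ = np.histogram(crop, bins=dim, range=(0.0, 1.0))
    vec = hist.astype(float)
    return vec / (np.linalg.norm(vec) + 1e-12)

# detect each target's frame, then extract one feature vector per frame
image = np.random.rand(64, 64)
features = {label: fake_feature_extractor(image[y0:y1, x0:x1])
            for label, (y0, x0, y1, x1) in fake_detector(image)}
```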
- FIG. 3 is a schematic diagram of the video detection and feature extraction process in an embodiment of this application.
- a video includes a puppy and a table, and each video frame of the video is detected.
- the puppy and the table are tracked separately. For example, the puppy may keep moving and is therefore tracked across multiple video frames, and the image area of the puppy is determined in each detected and tracked video frame; the table may not move, and if it is not tracked in subsequent video frames, only the image area of the frame in which the table was detected is determined.
- based on the feature extraction model, feature extraction is performed on the image area corresponding to the table to obtain the target feature vector 2 of the table, and on the multiple image areas corresponding to the puppy to obtain the image feature vector of each image area of the puppy; these image feature vectors are then averaged to obtain the target feature vector 1 of the puppy.
- target detection and feature extraction are realized based on the above process, and can then be applied to the database construction process or the retrieval process in the embodiments of this application.
- for the database construction process, the target feature vector 1 of the puppy and the target feature vector 2 of the table can be stored in association with the corresponding video or image, and the retrieval database is updated;
- for the retrieval process, images or videos with similarity greater than the threshold are sent to the terminal for display. In addition, there can be other implementations: for example, the image area of the puppy and the image area of the table are sent to the terminal and displayed to the user, and the user selects the target to be retrieved; for example, if the user selects the puppy, the server retrieves only images or videos whose similarity to the puppy's target feature vector 1 is greater than the threshold, and returns them to the terminal for display.
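A hedged sketch of the server-side threshold comparison just described; the database layout (object id mapped to a list of target feature vectors) and the cosine-similarity measure are assumptions for illustration.

```python
import numpy as np

def retrieve(query_vec, database, threshold=0.8):
    """Return ids of stored objects containing a target similar to the query."""
    hits = []
    for obj_id, target_vecs in database.items():
        sims = [float(query_vec @ v / (np.linalg.norm(query_vec) * np.linalg.norm(v)))
                for v in target_vecs]
        # an object matches if any of its targets clears the threshold
        if sims and max(sims) > threshold:
            hits.append(obj_id)
    return hits
```

With the puppy's target feature vector as `query_vec`, only objects whose stored vectors pass the threshold would be returned to the terminal.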
- object retrieval is realized through object detection and feature extraction without limiting object categories; it can be applied to all objects, providing a general object retrieval method that is more flexible and versatile in application and improves efficiency.
- due to its universality, the target retrieval method in the embodiment of the present application is suitable for various business scenarios and can be easily applied to different business scenarios. Based on the foregoing embodiment, several specific application scenarios are described below.
- an embodiment of the present application also provides a target retrieval device.
- the target retrieval device may be, for example, the server in the foregoing embodiment.
- the target retrieval device may be a hardware structure, a software module, or a hardware structure plus a software module.
- the target retrieval device in the embodiment of the present application specifically includes:
- the obtaining module 40 is configured to obtain the object to be retrieved;
- the first processing module 41 is configured to detect and obtain the image area corresponding to a target from the object to be retrieved; if image areas corresponding to multiple targets are detected from the object to be retrieved, the image areas corresponding to the multiple targets are displayed to the user through the terminal, and a selection instruction input by the user and returned by the terminal is received, where the selection instruction includes at least the image area corresponding to the selected target;
- the second processing module 42 is configured to obtain the target feature vector of the target according to the image area corresponding to the target in the selection instruction;
- the retrieval module 43 is configured to perform retrieval in a retrieval database according to the target feature vector of the target, to obtain objects whose target feature vector similarity is greater than a threshold, where the retrieval database includes at least a plurality of associated objects and the target feature vectors of the targets included in those objects.
- the second processing module 42 is specifically configured to:
- based on the trained feature extraction model, the image area corresponding to the target is used as an input parameter to perform feature extraction on the image area corresponding to the target, obtaining the image feature vector of the target, and the image feature vector is used as the target feature vector of the target.
- the second processing module 42 is specifically configured to:
- If the type of the object to be retrieved is a video, determine the position information of the image area corresponding to the target in the video frames of the object to be retrieved;
- and, according to the position information, track the target and determine the position information and the corresponding image area of the target in each tracked video frame.
- further, when the type of the object to be retrieved is a video, the second processing module 42 is specifically configured to: perform feature extraction on the image areas of the target in the corresponding video frames based on the trained feature extraction model to obtain the image feature vectors of the target in those video frames; and determine the average image feature vector of the target from these image feature vectors, using it as the target feature vector of the target.
- when performing retrieval in the retrieval database according to the target feature vector of the target to obtain objects whose target feature vector similarity is greater than a threshold, the retrieval module 43 is specifically configured to:
- compare the target feature vector of the target with the center target feature vector of each cluster in the retrieval database, respectively, to determine the cluster with the highest similarity, where the retrieval database includes multiple clusters, each cluster includes the target feature vectors of multiple targets, and the multiple clusters are obtained by clustering the targets in the retrieval database based on a clustering algorithm;
- compare the target feature vector of the target with the target feature vectors of the targets included in the cluster with the highest similarity, to obtain targets whose target feature vector similarity is greater than the threshold; and obtain, according to the association between targets and objects, objects whose target feature vector similarity is greater than the threshold.
- for obtaining the retrieval database, the device further includes an establishing module 44, configured to:
- obtain an object sample set, where the object sample set includes a plurality of object samples, and the type of each object sample is an image or a video;
- detect the image area corresponding to each target from each object sample, obtain the target feature vector of each target according to the corresponding image area, store the target feature vector of each target in association with the corresponding object sample, and update the retrieval database.
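The establishing module's offline flow might look like the following sketch; `detect` and `extract` stand for the trained detection and feature extraction models, which are not specified here, so trivial stubs are used.

```python
import numpy as np

def build_retrieval_database(samples, detect, extract):
    """Store each sample's per-target feature vectors keyed by the sample id."""
    database = {}
    for sample_id, sample in samples.items():
        image_areas = detect(sample)               # one image area per target
        database[sample_id] = [extract(a) for a in image_areas]
    return database

# toy stand-ins: "detection" splits the sample, "extraction" casts to a vector
toy_detect = lambda s: [s[:2], s[2:]]
toy_extract = lambda area: np.asarray(area, dtype=float)
db = build_retrieval_database({"img_1": [1, 2, 3, 4]}, toy_detect, toy_extract)
```

Each stored entry keeps the association between an object sample and its targets' feature vectors, which the retrieval step relies on.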
- FIG. 5 shows a schematic structural diagram of an electronic device in an embodiment of this application.
- the embodiment of the present application provides an electronic device.
- the electronic device may include a processor 510 (Central Processing Unit, CPU), a memory 520, an input device 530, an output device 540, etc.
- the input device 530 may include a keyboard, a mouse, a touch screen, etc.
- the output device 540 may include a display device, such as a liquid crystal display (LCD), a cathode ray tube (Cathode Ray Tube, CRT), and so on.
- the memory 520 may include a read only memory (ROM) and a random access memory (RAM), and provides the processor 510 with program instructions and data stored in the memory 520.
- the memory 520 may be used to store the program of any target retrieval method in the embodiment of the present application.
- the processor 510 calls the program instructions stored in the memory 520, and the processor 510 is configured to execute any target retrieval method in the embodiments of the present application according to the obtained program instructions.
- an embodiment of the present application also provides a computer-readable storage medium on which a computer program is stored; when the computer program is executed by a processor, the target retrieval method in any of the foregoing method embodiments is implemented.
- this application can be provided as a method, a system, or a computer program product. Therefore, this application may take the form of an entirely hardware embodiment, an entirely software embodiment, or an embodiment combining software and hardware. Moreover, this application may take the form of a computer program product implemented on one or more computer-usable storage media (including but not limited to disk storage, CD-ROM, optical storage, etc.) containing computer-usable program code.
- These computer program instructions can also be stored in a computer-readable memory that can direct a computer or other programmable data processing equipment to work in a specific manner, so that the instructions stored in the computer-readable memory produce an article of manufacture including an instruction device, and the instruction device implements the functions specified in one or more flows of the flowchart and/or one or more blocks of the block diagram.
- These computer program instructions can also be loaded onto a computer or other programmable data processing equipment, so that a series of operation steps are executed on the computer or other programmable equipment to produce computer-implemented processing; the instructions executed on the computer or other programmable equipment thus provide steps for implementing the functions specified in one or more flows of the flowchart and/or one or more blocks of the block diagram.
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- Physics & Mathematics (AREA)
- Multimedia (AREA)
- Library & Information Science (AREA)
- General Engineering & Computer Science (AREA)
- Databases & Information Systems (AREA)
- General Physics & Mathematics (AREA)
- Data Mining & Analysis (AREA)
- Mathematical Physics (AREA)
- Human Computer Interaction (AREA)
- Computational Linguistics (AREA)
- Image Analysis (AREA)
- Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
Abstract
Description
Claims (14)
- A target retrieval method, characterized by comprising: obtaining an object to be retrieved; detecting and obtaining an image area corresponding to a target from the object to be retrieved, and, if image areas corresponding to multiple targets are detected from the object to be retrieved, displaying the image areas corresponding to the multiple targets to a user through a terminal and receiving a selection instruction input by the user and returned by the terminal, wherein the selection instruction includes at least the image area corresponding to the selected target; obtaining a target feature vector of the target according to the image area corresponding to the target in the selection instruction; and performing retrieval in a retrieval database according to the target feature vector of the target to obtain objects whose target feature vector similarity is greater than a threshold, wherein the retrieval database includes at least a plurality of associated objects and the target feature vectors of the targets included in the objects.
- The method according to claim 1, characterized in that obtaining the target feature vector of the target according to the image area corresponding to the target in the selection instruction specifically comprises: if the type of the object to be retrieved is an image, based on a trained feature extraction model, taking the image area corresponding to the target as an input parameter, performing feature extraction on the image area corresponding to the target to obtain an image feature vector of the target, and using the image feature vector as the target feature vector of the target.
- The method according to claim 1, characterized in that detecting and obtaining the image area corresponding to the target from the object to be retrieved specifically comprises: if the type of the object to be retrieved is a video, determining position information of the image area corresponding to the target in the video frames of the object to be retrieved; and tracking the target according to the position information, and determining the position information and the corresponding image area of the target in each tracked video frame.
- The method according to claim 3, characterized in that obtaining the target feature vector of the target according to the image area corresponding to the target in the selection instruction specifically comprises: based on a trained feature extraction model, performing feature extraction on the image areas of the target in the corresponding video frames respectively, to obtain image feature vectors of the target in the corresponding video frames; and determining an average image feature vector of the target according to the obtained image feature vectors of the target in the corresponding video frames, and using the average image feature vector as the target feature vector of the target.
- The method according to any one of claims 1-4, characterized in that performing retrieval in the retrieval database according to the target feature vector of the target to obtain objects whose target feature vector similarity is greater than a threshold specifically comprises: comparing the target feature vector of the target with the center target feature vector of each cluster in the retrieval database respectively, to determine the cluster with the highest similarity, wherein the retrieval database includes multiple clusters, each cluster includes target feature vectors of multiple targets, and the multiple clusters are obtained by clustering the targets in the retrieval database based on a clustering algorithm; comparing the target feature vector of the target with the target feature vectors of the targets included in the cluster with the highest similarity, to obtain targets whose target feature vector similarity is greater than the threshold; and obtaining, according to the association between targets and objects, objects whose target feature vector similarity is greater than the threshold.
- The method according to claim 1, characterized in that the retrieval database is obtained by: obtaining an object sample set, wherein the object sample set includes a plurality of object samples, and the type of each object sample is an image or a video; detecting the image area corresponding to each target from each object sample respectively, and obtaining the target feature vector of each target according to the image area corresponding to each target; and storing the target feature vector of each target in association with the corresponding object sample, and updating the retrieval database.
- A target retrieval device, characterized by comprising: an obtaining module, configured to obtain an object to be retrieved; a first processing module, configured to detect and obtain an image area corresponding to a target from the object to be retrieved, and, if image areas corresponding to multiple targets are detected from the object to be retrieved, display the image areas corresponding to the multiple targets to a user through a terminal and receive a selection instruction input by the user and returned by the terminal, wherein the selection instruction includes at least the image area corresponding to the selected target; a second processing module, configured to obtain a target feature vector of the target according to the image area corresponding to the target in the selection instruction; and a retrieval module, configured to perform retrieval in a retrieval database according to the target feature vector of the target to obtain objects whose target feature vector similarity is greater than a threshold, wherein the retrieval database includes at least a plurality of associated objects and the target feature vectors of the targets included in the objects.
- The device according to claim 7, characterized in that, when obtaining the target feature vector of the target according to the image area corresponding to the target in the selection instruction, the second processing module is specifically configured to: if the type of the object to be retrieved is an image, based on a trained feature extraction model, take the image area corresponding to the target as an input parameter, perform feature extraction on the image area corresponding to the target to obtain an image feature vector of the target, and use the image feature vector as the target feature vector of the target.
- The device according to claim 7, characterized in that, when detecting and obtaining the image area corresponding to the target from the object to be retrieved, the second processing module is specifically configured to: if the type of the object to be retrieved is a video, determine position information of the image area corresponding to the target in the video frames of the object to be retrieved; and track the target according to the position information, and determine the position information and the corresponding image area of the target in each tracked video frame.
- The device according to claim 9, characterized in that, when obtaining the target feature vector of the target according to the image area corresponding to the target in the selection instruction, the second processing module is specifically configured to: based on a trained feature extraction model, perform feature extraction on the image areas of the target in the corresponding video frames respectively, to obtain image feature vectors of the target in the corresponding video frames; and determine an average image feature vector of the target according to the obtained image feature vectors of the target in the corresponding video frames, and use the average image feature vector as the target feature vector of the target.
- The device according to any one of claims 7-10, characterized in that, when performing retrieval in the retrieval database according to the target feature vector of the target to obtain objects whose target feature vector similarity is greater than a threshold, the retrieval module is specifically configured to: compare the target feature vector of the target with the center target feature vector of each cluster in the retrieval database respectively, to determine the cluster with the highest similarity, wherein the retrieval database includes multiple clusters, each cluster includes target feature vectors of multiple targets, and the multiple clusters are obtained by clustering the targets in the retrieval database based on a clustering algorithm; compare the target feature vector of the target with the target feature vectors of the targets included in the cluster with the highest similarity, to obtain targets whose target feature vector similarity is greater than the threshold; and obtain, according to the association between targets and objects, objects whose target feature vector similarity is greater than the threshold.
- The device according to claim 7, characterized by further comprising, for obtaining the retrieval database, an establishing module configured to: obtain an object sample set, wherein the object sample set includes a plurality of object samples, and the type of each object sample is an image or a video; detect the image area corresponding to each target from each object sample respectively, and obtain the target feature vector of each target according to the image area corresponding to each target; and store the target feature vector of each target in association with the corresponding object sample, and update the retrieval database.
- An electronic device, comprising a memory, a processor, and a computer program stored in the memory and executable on the processor, characterized in that the processor, when executing the program, implements the steps of the method according to any one of claims 1-6.
- A computer-readable storage medium on which a computer program is stored, characterized in that the computer program, when executed by a processor, implements the steps of the method according to any one of claims 1-6.
Applications Claiming Priority (2)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202010472146.1A CN111581423B (zh) | 2020-05-29 | 2020-05-29 | Target retrieval method and device |
CN202010472146.1 | 2020-05-29 |
Publications (1)
Publication Number | Publication Date |
---|---|
WO2021237967A1 true WO2021237967A1 (zh) | 2021-12-02 |
Family
ID=72111215
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
PCT/CN2020/112221 WO2021237967A1 (zh) | 2020-08-28 | Target retrieval method and device |
Country Status (2)
Country | Link |
---|---|
CN (1) | CN111581423B (zh) |
WO (1) | WO2021237967A1 (zh) |
Cited By (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN114529751A (zh) * | 2021-12-28 | 2022-05-24 | 国网四川省电力公司眉山供电公司 | Automatic screening method for sample data for intelligent recognition in power scenes |
CN116401392A (zh) * | 2022-12-30 | 2023-07-07 | 以萨技术股份有限公司 | Image retrieval method, electronic device and storage medium |
CN117194698A (zh) * | 2023-11-07 | 2023-12-08 | 清华大学 | Task processing system and method based on an OAR semantic knowledge base |
Families Citing this family (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN111581423B (zh) * | 2020-05-29 | 2021-10-26 | 上海依图网络科技有限公司 | Target retrieval method and device |
CN113704534A (zh) * | 2021-04-13 | 2021-11-26 | 腾讯科技(深圳)有限公司 | Image processing method and apparatus, and computer device |
CN113239217B (zh) * | 2021-06-04 | 2024-02-06 | 图灵深视(南京)科技有限公司 | Image index library construction method and system, and image retrieval method and system |
Citations (6)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN110069648A (zh) * | 2017-09-25 | 2019-07-30 | 杭州海康威视数字技术股份有限公司 | Image retrieval method and device |
CN110188719A (zh) * | 2019-06-04 | 2019-08-30 | 北京字节跳动网络技术有限公司 | Target tracking method and device |
CN110209866A (zh) * | 2019-05-30 | 2019-09-06 | 苏州浪潮智能科技有限公司 | Image retrieval method, apparatus, device and computer-readable storage medium |
CN110245714A (zh) * | 2019-06-20 | 2019-09-17 | 厦门美图之家科技有限公司 | Image recognition method and device, and electronic device |
WO2020051704A1 (en) * | 2018-09-12 | 2020-03-19 | Avigilon Corporation | System and method for improving speed of similarity based searches |
CN111581423A (zh) * | 2020-05-29 | 2020-08-25 | 上海依图网络科技有限公司 | Target retrieval method and device |
Family Cites Families (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN109033308A (zh) * | 2018-07-16 | 2018-12-18 | 安徽江淮汽车集团股份有限公司 | Image retrieval method and device |
CN110297935A (zh) * | 2019-06-28 | 2019-10-01 | 京东数字科技控股有限公司 | Image retrieval method and apparatus, medium and electronic device |
CN111143597B (zh) * | 2019-12-13 | 2023-06-20 | 浙江大华技术股份有限公司 | Image retrieval method, terminal and storage device |
2020
- 2020-05-29 CN CN202010472146.1A patent/CN111581423B/zh active Active
- 2020-08-28 WO PCT/CN2020/112221 patent/WO2021237967A1/zh active Application Filing
Cited By (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN114529751A (zh) * | 2021-12-28 | 2022-05-24 | 国网四川省电力公司眉山供电公司 | Automatic screening method for sample data for intelligent recognition in power scenes |
CN116401392A (zh) * | 2022-12-30 | 2023-07-07 | 以萨技术股份有限公司 | Image retrieval method, electronic device and storage medium |
CN116401392B (zh) * | 2022-12-30 | 2023-10-27 | 以萨技术股份有限公司 | Image retrieval method, electronic device and storage medium |
CN117194698A (zh) * | 2023-11-07 | 2023-12-08 | 清华大学 | Task processing system and method based on an OAR semantic knowledge base |
CN117194698B (zh) * | 2023-11-07 | 2024-02-06 | 清华大学 | Task processing system and method based on an OAR semantic knowledge base |
Also Published As
Publication number | Publication date |
---|---|
CN111581423A (zh) | 2020-08-25 |
CN111581423B (zh) | 2021-10-26 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
WO2021237967A1 (zh) | 一种目标检索方法及装置 | |
US10872424B2 (en) | Object tracking using object attributes | |
US10032072B1 (en) | Text recognition and localization with deep learning | |
US9922271B2 (en) | Object detection and classification | |
US9760792B2 (en) | Object detection and classification | |
US10831814B2 (en) | System and method for linking multimedia data elements to web pages | |
Feng et al. | Attention-driven salient edge (s) and region (s) extraction with application to CBIR | |
US11036790B1 (en) | Identifying visual portions of visual media files responsive to visual portions of media files submitted as search queries | |
US9886762B2 (en) | Method for retrieving image and electronic device thereof | |
JP2000123184A (ja) | 動画内のイベントを検出する方法 | |
CN114241548A (zh) | 一种基于改进YOLOv5的小目标检测算法 | |
US20170352162A1 (en) | Region-of-interest extraction device and region-of-interest extraction method | |
Rashmi et al. | Video shot boundary detection using block based cumulative approach | |
CN113766330A (zh) | 基于视频生成推荐信息的方法和装置 | |
TW201426353A (zh) | 互動式關聯物件檢索方法與系統 | |
US20230060211A1 (en) | System and Method for Tracking Moving Objects by Video Data | |
CN111539257B (zh) | 人员重识别方法、装置和存储介质 | |
CN115115825B (zh) | 图像中的对象检测方法、装置、计算机设备和存储介质 | |
US20170357853A1 (en) | Large Scale Video Search Using Queries that Define Relationships Between Objects | |
US20220300774A1 (en) | Methods, apparatuses, devices and storage media for detecting correlated objects involved in image | |
Obeso et al. | Comparative study of visual saliency maps in the problem of classification of architectural images with Deep CNNs | |
Krishna et al. | Hybrid method for moving object exploration in video surveillance | |
KR20160012901A (ko) | 이미지를 검색하는 방법 및 그 전자 장치 | |
CN111382628B (zh) | 同行判定方法及装置 | |
Wang et al. | Smoke recognition network based on dynamic characteristics |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
121 | Ep: the epo has been informed by wipo that ep was designated in this application |
Ref document number: 20937684 Country of ref document: EP Kind code of ref document: A1 |
|
NENP | Non-entry into the national phase |
Ref country code: DE |
|
122 | Ep: pct application non-entry in european phase |
Ref document number: 20937684 Country of ref document: EP Kind code of ref document: A1 |
|
32PN | Ep: public notification in the ep bulletin as address of the adressee cannot be established |
Free format text: NOTING OF LOSS OF RIGHTS PURSUANT TO RULE 112(1) EPC (EPO FORM 1205A DATED 27.06.2023) |
|
122 | Ep: pct application non-entry in european phase |
Ref document number: 20937684 Country of ref document: EP Kind code of ref document: A1 |