WO2021237967A1 - Target retrieval method and device

Target retrieval method and device

Info

Publication number
WO2021237967A1
Authority
WO
WIPO (PCT)
Prior art keywords
target
feature vector
image
image area
area corresponding
Prior art date
Application number
PCT/CN2020/112221
Other languages
English (en)
French (fr)
Inventor
贾蕴哲
Original Assignee
上海依图网络科技有限公司
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 上海依图网络科技有限公司
Publication of WO2021237967A1

Classifications

    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06F - ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 - Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/50 - Information retrieval; Database structures therefor; File system structures therefor of still image data
    • G06F16/58 - Retrieval characterised by using metadata, e.g. metadata not derived from the content or metadata generated manually
    • G06F16/583 - Retrieval characterised by using metadata, e.g. metadata not derived from the content or metadata generated manually using metadata automatically derived from the content
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06F - ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 - Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/50 - Information retrieval; Database structures therefor; File system structures therefor of still image data
    • G06F16/53 - Querying
    • G06F16/532 - Query formulation, e.g. graphical querying
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06F - ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 - Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/70 - Information retrieval; Database structures therefor; File system structures therefor of video data
    • G06F16/73 - Querying
    • G06F16/732 - Query formulation
    • G06F16/7335 - Graphical querying, e.g. query-by-region, query-by-sketch, query-by-trajectory, GUIs for designating a person/face/object as a query predicate
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06F - ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 - Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/70 - Information retrieval; Database structures therefor; File system structures therefor of video data
    • G06F16/78 - Retrieval characterised by using metadata, e.g. metadata not derived from the content or metadata generated manually
    • G06F16/783 - Retrieval characterised by using metadata, e.g. metadata not derived from the content or metadata generated manually using metadata automatically derived from the content
    • G06F16/7837 - Retrieval characterised by using metadata, e.g. metadata not derived from the content or metadata generated manually using metadata automatically derived from the content using objects detected or recognised in the video content
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06F - ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 - Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/70 - Information retrieval; Database structures therefor; File system structures therefor of video data
    • G06F16/78 - Retrieval characterised by using metadata, e.g. metadata not derived from the content or metadata generated manually
    • G06F16/783 - Retrieval characterised by using metadata, e.g. metadata not derived from the content or metadata generated manually using metadata automatically derived from the content
    • G06F16/7837 - Retrieval characterised by using metadata, e.g. metadata not derived from the content or metadata generated manually using metadata automatically derived from the content using objects detected or recognised in the video content
    • G06F16/784 - Retrieval characterised by using metadata, e.g. metadata not derived from the content or metadata generated manually using metadata automatically derived from the content using objects detected or recognised in the video content the detected or recognised objects being people

Definitions

  • This application relates to the field of computer technology, in particular to a target retrieval method and device.
  • the embodiments of the present application provide a target retrieval method and device to improve the efficiency and versatility of target retrieval.
  • a target retrieval method including:
  • the image area corresponding to the target is detected from the object to be retrieved; if image areas corresponding to multiple targets are detected from the object to be retrieved, the image areas corresponding to the multiple targets are displayed to the user through the terminal, and a selection instruction input by the user is received from the terminal, wherein the selection instruction includes at least the image area corresponding to the selected target;
  • a search is performed in a retrieval database according to the target feature vector to obtain objects whose target feature vector similarity is greater than a threshold, wherein the retrieval database includes at least a plurality of associated objects and the target feature vector of each target included in those objects.
  • obtaining the target feature vector of the target according to the image area corresponding to the target in the selection instruction specifically includes:
  • the image area corresponding to the target is used as an input parameter, and feature extraction is performed on the image area corresponding to the target to obtain the image feature vector of the target, and the image feature vector is used as the target feature vector of the target.
  • detecting and obtaining the image area corresponding to the target from the object to be retrieved specifically includes:
  • If the type of the object to be retrieved is video, determine the position information of the image area corresponding to the target in the video frame of the object to be retrieved;
  • the target is tracked, and the position information of the target on each tracked video frame and the corresponding image area are determined.
  • obtaining the target feature vector of the target according to the image area corresponding to the target in the selection instruction specifically includes:
  • searching in a retrieval database according to the target feature vector of the target to obtain objects whose target feature vector similarity is greater than a threshold specifically includes:
  • the retrieval database includes multiple clusters; each cluster includes the target feature vectors of multiple targets, and the multiple clusters are obtained by clustering each target in the retrieval database based on a clustering algorithm;
  • the targets whose target feature vector similarity is greater than the threshold are obtained.
  • the method for obtaining the retrieval database is:
  • the object sample set includes a plurality of object samples, and the type of the object samples is an image or a video;
  • the target feature vector of each target is stored in association with the corresponding target sample, and updated to the retrieval database.
  • a target retrieval device includes:
  • the acquisition module is used to acquire the object to be retrieved
  • the first processing module is used to detect the image area corresponding to the target from the object to be retrieved; if image areas corresponding to multiple targets are detected from the object to be retrieved, the image areas corresponding to the multiple targets are displayed to the user through the terminal, and a selection instruction input by the user is received from the terminal, wherein the selection instruction includes at least the image area corresponding to the selected target;
  • the second processing module is configured to obtain the target feature vector of the target according to the image area corresponding to the target in the selection instruction;
  • the retrieval module is used to search in the retrieval database according to the target feature vector of the target to obtain objects whose target feature vector similarity is greater than a threshold, wherein the retrieval database includes at least a plurality of associated objects and the target feature vector of each target included in those objects.
  • the second processing module is specifically configured to:
  • the image area corresponding to the target is used as an input parameter, and feature extraction is performed on the image area corresponding to the target to obtain the image feature vector of the target, and the image feature vector is used as the target feature vector of the target.
  • the second processing module is specifically configured to:
  • If the type of the object to be retrieved is video, determine the position information of the image area corresponding to the target in the video frame of the object to be retrieved;
  • the target is tracked, and the position information of the target on each tracked video frame and the corresponding image area are determined.
  • the second processing module is specifically configured to:
  • when performing retrieval in the retrieval database to obtain objects whose target feature vector similarity is greater than a threshold, the retrieval module is specifically configured to:
  • the retrieval database includes multiple clusters; each cluster includes the target feature vectors of multiple targets, and the multiple clusters are obtained by clustering each target in the retrieval database based on a clustering algorithm;
  • the targets whose target feature vector similarity is greater than the threshold are obtained.
  • the device further includes a building module for obtaining the retrieval database, configured to:
  • the object sample set includes a plurality of object samples, and the type of the object samples is an image or a video;
  • the target feature vector of each target is stored in association with the corresponding target sample, and updated to the retrieval database.
  • An electronic device includes a memory, a processor, and a computer program that is stored on the memory and can run on the processor.
  • when the processor executes the program, the steps of any of the above-mentioned target retrieval methods are implemented.
  • a computer-readable storage medium has a computer program stored thereon, and when the computer program is executed by a processor, the steps of any of the above-mentioned target retrieval methods are realized.
  • the object to be retrieved is obtained, the image area corresponding to the target is detected from the object to be retrieved, and the target feature vector of the target is obtained; then, according to the target feature vector of the target, a search is performed in the retrieval database to obtain objects whose similarity is greater than the threshold. In this way, retrieval can be achieved through target detection and feature extraction.
  • Figure 1 is a flowchart of a target retrieval method in an embodiment of the application
  • FIG. 2 is a schematic diagram of the image detection and feature extraction process in an embodiment of this application.
  • FIG. 3 is a schematic diagram of the detection and feature extraction process for videos in an embodiment of the application
  • FIG. 4 is a schematic diagram of the structure of a target retrieval device in an embodiment of the application.
  • FIG. 5 is a schematic diagram of the structure of an electronic device in an embodiment of the application.
  • Image or video retrieval is applied in more and more scenarios: for example, taking a photo to identify the object contained in the photo and retrieve images similar to that object; or, in product search on an e-commerce platform, taking a product image and searching for product images similar to that product.
  • The retrieval method in the prior art mainly recognizes the object category based on a classification model and then retrieves similar objects of the same category. In this way, given an image containing a specific object, the user can retrieve images of objects in the same category, but this method requires pre-training for different categories of objects. Therefore, the trained classification model is only suitable for a single category of objects; for example, a classification model trained with cat images is not suitable for the classification and retrieval of dogs, so versatility is poor. To retrieve objects of other categories, retraining is needed, which reduces efficiency and increases costs.
  • the embodiment of the present application provides a target retrieval method.
  • the retrieval database stores multiple associated objects and the target feature vector of each target included in the object.
  • the image area corresponding to the target is detected from the object to be retrieved, and when multiple targets are detected, the target selected by the user is determined according to the user's selection instruction, which better satisfies the user's retrieval needs; the target feature vector of the target in the selection instruction is then obtained.
  • the target feature vector is used to search the retrieval database for objects whose target feature similarity is greater than the threshold. In this way, there is no need to pre-train for different categories of target objects, and there is no need to determine the category first during retrieval; this achieves a general object retrieval method that can meet various retrieval application scenarios, improving versatility, flexibility, and efficiency.
  • the target retrieval method in the embodiments of the present application is mainly applied to the server.
  • the user inputs an image including a target object through the terminal and clicks search; the terminal then sends the image to the server, and the server extracts the target object's features, searches the retrieval database for images with higher similarity, and returns them to the terminal, which displays the retrieved images.
  • FIG. 1 is a flowchart of a target retrieval method in an embodiment of this application, the method includes:
  • Step 100 Obtain the object to be retrieved.
  • the type of the object to be retrieved may be an image or video, which is not limited in the embodiment of the present application, that is, the user can retrieve an image or video that is similar to the target in the image or video.
  • Step 110 Detect and obtain the image area corresponding to the target from the object to be retrieved, and obtain the target feature vector of the target according to the image area corresponding to the target.
  • In step 110, the method specifically includes:
  • obtaining the image area corresponding to the target can be divided into the following two situations:
  • The first case: if the type of the object to be retrieved is an image, it specifically includes: based on the trained detection model, the object to be retrieved is used as the input parameter to obtain the detection frame of the detected target, and the detection frame of the target is used as the image area of the target.
  • the detection model can be a region-convolutional neural network (Region-Convolutional Neural Networks, R-CNN), a fast region-convolutional neural network (Fast Region-Convolutional Neural Networks, Fast R-CNN), a multi-category single-shot detector (Single Shot MultiBox Detector, SSD), or the third edition of You Only Look Once (yolov3), which is not limited in the embodiments of the present application.
  • the SSD network usually uses the Visual Geometry Group network (VGG-16) as its basic (backbone) network; VGG-16 includes multiple convolutional layers.
  • yolov3 uses the first 52 layers of darknet-53 (without a fully connected layer); yolov3 is a fully convolutional network that uses a large number of residual skip connections, and, to reduce the negative gradient effects caused by pooling, it uses the stride of the convolutional (conv) layers to achieve downsampling.
  • an image area of the target can be detected.
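As an illustrative sketch (not part of the original disclosure), the post-processing of a detector's output into target image areas can be written as follows; the detection model itself (SSD, yolov3, etc.) is assumed and replaced here by hard-coded example boxes:

```python
def crop_regions(image, detections, score_threshold=0.5):
    """Keep detection boxes above the confidence threshold and crop each
    region out of the image.

    image: 2D list of pixel rows; detections: list of (x, y, w, h, score)
    boxes, the assumed output format of the detection model.
    """
    regions = []
    for x, y, w, h, score in detections:
        if score < score_threshold:
            continue
        # Crop rows y..y+h and, within each row, columns x..x+w.
        region = [row[x:x + w] for row in image[y:y + h]]
        regions.append(((x, y, w, h), region))
    return regions

# A 4x6 toy "image" and two hypothetical detections; the low-confidence
# second box is filtered out.
image = [[r * 10 + c for c in range(6)] for r in range(4)]
boxes = [(0, 0, 2, 2, 0.9), (3, 1, 2, 2, 0.3)]
regions = crop_regions(image, boxes)
```

Each surviving `(box, region)` pair corresponds to one target's image area, which is then passed to feature extraction.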
  • The second case: if the type of the object to be retrieved is a video, it specifically includes: determining the position information of the image area corresponding to the target in the video frame of the object to be retrieved, tracking the target according to the position information, and determining the position information and corresponding image area of the tracked target on each video frame.
  • Specifically, the tracking algorithm can be used to determine, in each subsequent video frame, whether the target is still tracked; the position information of the target on each tracked video frame is then determined and the corresponding image area obtained.
  • the tracking algorithm can be one based on the Open Source Computer Vision Library (OpenCV), for example, the kernel correlation filter (Kernel Correlation Filter, KCF) algorithm or the deepsort algorithm, which is not limited in the embodiments of the present application.
  • the tracking algorithm is faster than the detection algorithm, thus improving the computational efficiency and speed.
  • Each tracked video frame may be each continuous video frame starting from the video frame after the one in which the target was detected, until the video frame where tracking is lost, which is not limited in the embodiment of the present application.
  • A target may appear continuously in multiple video frames, disappear for a period of time, and then appear again; only the target tracked over continuous video frames is considered the same target.
  • For example, if a person is detected in the first video frame and tracked up to the tenth video frame but not in the eleventh, the tracking algorithm ends, and the image areas corresponding to that person are those from the first to the tenth video frame.
  • If the person is detected again in the 21st video frame, the tracking algorithm is triggered again from the 21st video frame until tracking ends. For example, if the person is tracked from the 21st to the 25th video frame, this occurrence is considered another target, and the corresponding image areas are those from the 21st to the 25th video frame.
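The rule above — a target tracked over continuous frames is one target instance, while a reappearance after a gap starts another — can be sketched as follows (a toy illustration, not the patent's implementation; the actual per-frame tracking by KCF or deepsort is assumed):

```python
def split_tracks(frames_present):
    """Group the frame indices where a target appears into continuous runs.

    A gap in the frame numbers ends one track; reappearance starts a new
    target instance, mirroring the example of a person seen in frames
    1-10 and again in frames 21-25.
    """
    tracks, current = [], []
    for f in sorted(frames_present):
        if current and f != current[-1] + 1:
            tracks.append(current)   # gap found: close the current track
            current = []
        current.append(f)
    if current:
        tracks.append(current)
    return tracks

# The person from the example: frames 1-10, then frames 21-25.
frames = list(range(1, 11)) + list(range(21, 26))
tracks = split_tracks(frames)
```

Here `tracks` contains two runs, so the two appearances are treated as two separate targets.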
  • the first implementation manner if the type of the object to be retrieved is an image, the target feature vector of the target is obtained according to the image area corresponding to the target, which specifically includes:
  • Based on the feature extraction model, the image area corresponding to the target is used as the input parameter, and feature extraction is performed on it to obtain the image feature vector of the target.
  • the image feature vector is used as the target feature vector of the target.
  • the feature extraction model may adopt a residual network (Residual Network, resnet) or a VGG network, which is not limited in the embodiment of the present application.
  • the VGG network structure is mainly composed of convolution and fully connected layers.
  • For example, if the object to be retrieved is an image including a puppy, the detection frame of the puppy is obtained through the detection model, and the detection frame of the puppy is then input into the feature extraction model to obtain the target feature vector of the puppy.
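A minimal stand-in for the feature-extraction step is sketched below. A real embodiment would use a trained resnet or VGG network; here a cropped region is simply flattened and L2-normalized, which is enough to illustrate that the output is a fixed feature vector suitable for later similarity comparison:

```python
import math

def extract_feature(region):
    """Toy stand-in for a CNN feature extractor (resnet/VGG in the text):
    flatten the cropped image region and L2-normalize it. Normalization is
    shown because it makes a later dot-product comparison equivalent to
    cosine similarity.
    """
    flat = [float(p) for row in region for p in row]
    norm = math.sqrt(sum(v * v for v in flat)) or 1.0
    return [v / norm for v in flat]

# A 2x2 hypothetical image region standing in for the puppy's crop.
vec = extract_feature([[3, 4], [0, 0]])
```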
  • The second implementation: if the type of the object to be retrieved is a video, obtaining the target feature vector of the target according to the image area corresponding to the target specifically includes:
  • determining all the video frames in which the target appears and the target's image area in each of those video frames, and then using the feature extraction model to perform feature extraction on each image area separately, obtaining each image feature vector.
  • the puppy appears in the fifth video frame to the fifteenth video frame in the video.
  • the corresponding image areas are image area 1, image area 2, ... image area 11.
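For the video case, the per-frame image feature vectors of one target are averaged into a single target feature vector, as the embodiments describe for database construction. A minimal sketch:

```python
def average_features(vectors):
    """Average the per-frame image feature vectors of one tracked target;
    the element-wise mean is used as that target's single feature vector."""
    n = len(vectors)
    return [sum(v[i] for v in vectors) / n for i in range(len(vectors[0]))]

# Hypothetical per-frame feature vectors for the puppy's image areas.
frame_vectors = [[1.0, 0.0], [0.0, 1.0]]
target_vector = average_features(frame_vectors)
```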
  • Step 120 According to the target feature vector of the target, perform a search in the retrieval database to obtain objects whose target feature vector similarity is greater than a threshold, wherein the retrieval database includes at least a plurality of associated objects and the target feature vector of each target included in those objects.
  • the target feature vector of the target is compared with the target feature vector of each target included in the retrieval database to retrieve similar images.
  • the target feature vector of the target can be directly compared with the target feature vector of each target in the search database to determine an object with a similarity greater than a threshold.
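The direct comparison can be sketched as a similarity search over all stored target feature vectors; cosine similarity and the threshold value are illustrative choices, not mandated by the text:

```python
def cosine(a, b):
    """Cosine similarity between two feature vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = sum(x * x for x in a) ** 0.5
    nb = sum(y * y for y in b) ** 0.5
    return dot / (na * nb)

def search(query_vec, database, threshold=0.8):
    """Compare the query target feature vector against every stored target
    feature vector and return the associated objects whose similarity
    exceeds the threshold, most similar first."""
    scored = [(obj, cosine(query_vec, vec)) for obj, vec in database]
    hits = [(obj, sim) for obj, sim in scored if sim > threshold]
    return sorted(hits, key=lambda h: h[1], reverse=True)

# Hypothetical database: (associated object, target feature vector) pairs.
db = [("img_a", [1.0, 0.0]), ("img_b", [0.0, 1.0]), ("img_c", [0.9, 0.1])]
results = search([1.0, 0.0], db)
```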
  • step 120 specifically includes:
  • the search database includes multiple clusters, and each cluster includes multiple target feature vectors.
  • the multiple clusters are obtained by clustering each target in the search database based on a clustering algorithm.
  • The central target feature vector is the target feature vector of the cluster's center point; for example, the center point may be the target whose average feature similarity to the other targets in the cluster is highest, which is not limited in the embodiments of the present application.
  • the target feature vector of the target is compared with the target feature vector of each target included in the cluster with the highest similarity to obtain the target with the target feature vector similarity greater than the threshold.
  • each target in the search database can also be clustered in advance, for example, clustering is performed according to the target feature vector of each target.
  • In this way, when selecting a cluster, only the central target feature vector of each cluster needs to be compared before the comparison that obtains objects whose similarity is greater than the threshold, so efficiency can be improved and retrieval time reduced.
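The two-stage, cluster-first comparison can be sketched as follows (an illustration under the assumption that vectors are L2-normalized, so a dot product equals cosine similarity; the cluster contents are hypothetical):

```python
def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def cluster_search(query_vec, clusters, threshold=0.8):
    """Two-stage retrieval sketch: first compare the query only against each
    cluster's central target feature vector, then compare against the
    members of the most similar cluster."""
    best = max(clusters, key=lambda c: dot(query_vec, c["center"]))
    return [obj for obj, vec in best["members"]
            if dot(query_vec, vec) > threshold]

# Two hypothetical clusters with normalized feature vectors.
clusters = [
    {"center": [1.0, 0.0],
     "members": [("dog_1", [1.0, 0.0]), ("dog_2", [0.6, 0.8])]},
    {"center": [0.0, 1.0],
     "members": [("table_1", [0.0, 1.0])]},
]
hits = cluster_search([1.0, 0.0], clusters)
```

Only one cluster's members are compared in full, which is where the efficiency gain comes from.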
  • After the search result is obtained, it can be sent to the terminal, and the terminal displays the retrieved objects according to preset rules or methods. For example, the terminal can display a preset number of retrieved objects ranked by similarity; for another example, the terminal can display the retrieved objects in order of similarity from high to low.
  • the search database can store the image and the detection frame of each target included in the image, as well as the target feature vector of each target.
  • For a video, the retrieval database can store the video, the key frames of the video, and the image area and target feature vector of each target included in the video; of course, this is not limiting, and other information can also be stored in the retrieval database according to requirements.
  • When the retrieved object is a video, the key frame of the video or the entire video can be returned to the terminal, or the link address of the video can be determined and returned, so that the user can view the retrieved video after clicking the link address through the terminal; this is not limited in the embodiments of the present application.
  • an image may include a table and a puppy at the same time.
  • Therefore, in the embodiments of the present application, the user can select the target so as to determine the detection result the user actually requires. Specifically, a possible implementation manner is provided: after detecting the image area corresponding to the target from the object to be retrieved, the method further includes:
  • if image areas corresponding to multiple targets are detected from the object to be retrieved, the image areas corresponding to the multiple targets are displayed to the user through the terminal.
  • the image includes a puppy and a table
  • the image areas corresponding to the puppy and the table are respectively determined
  • the image area of the puppy and the image area of the table can be sent to the terminal, and the terminal displays the image area of the puppy and the table
  • the user can select the target that he actually wants to retrieve, for example, the image area of the puppy is selected.
  • the user can also select multiple targets at the same time.
  • The terminal sends the selection instruction input by the user to the server, and the server can then search only for images similar to the puppy; in this way, retrieval accuracy can be improved.
  • In another embodiment, the multiple detected targets can also be retrieved separately, with each target matched to similar images in the retrieval database. For example, the puppy and the table are retrieved separately, the images similar to the puppy and the images similar to the table are sent to the terminal, and the terminal can simultaneously display the images retrieved for the puppy and for the table.
  • In the embodiments of the present application, the image area corresponding to the target can be obtained from the object to be retrieved and the target feature vector of the target determined; retrieval can then be performed directly according to the target feature vector of the target to obtain objects whose target feature vector similarity is greater than the threshold.
  • General target object detection and feature extraction can thus be used to achieve general target object retrieval, which can meet various retrieval application scenarios without pre-training for different categories of objects; this is highly versatile and can also improve efficiency and reduce costs.
  • The retrieval database in the embodiments of the present application includes at least a plurality of associated objects and the target feature vector of each target included in those objects. Further, for the method of obtaining the retrieval database, a possible implementation manner is provided in the embodiments of the present application:
  • the method of obtaining the object sample is not limited in the embodiment of the present application, and can be obtained according to different application scenarios and business requirements.
  • object samples can be obtained from a network knowledge base.
  • product images uploaded by various businesses can be obtained as image samples.
  • the video stream of each security device can be accessed as a video sample.
  • the image area corresponding to each target is detected from each object sample, and the target feature vector of each target is obtained according to the image area corresponding to each target.
  • The first case: if the type of the object sample is an image, the object sample is detected according to the detection model to obtain the detection frames (i.e. image areas) of all targets in the object sample; the image areas of all targets are then input into the feature extraction model for feature extraction, obtaining the target feature vectors of all targets.
  • For example, detection is performed on an image, the detection frames of a person, a tree, and a puppy are detected from the image, and feature extraction is performed on each respectively to obtain the target feature vector of the person, the target feature vector of the tree, and the target feature vector of the puppy.
  • The second case: if the type of the object sample is a video, target detection and tracking are performed on each video frame in the video. For any target, when a video frame containing that target is detected, the tracking algorithm is triggered with the target's position information in that video frame; the target is tracked, its position information and image area in each tracked video frame are determined, and the image areas of the target in each video frame are input into the feature extraction model to obtain multiple image feature vectors, which are averaged to determine the average image feature of the target as the target feature vector of that target.
  • For example, if a person and a chair are detected in the first video frame, the person and the chair are tracked separately from the second video frame. If the person is tracked from the second video frame to the tenth video frame but not in the eleventh video frame, the tracking process for the person is stopped, and the average image feature vector of the person's image areas in the first to tenth video frames is calculated; if the chair is tracked from the second video frame to the fifth video frame but not in the sixth video frame, the tracking process for the chair ends, and the average image feature vector of the chair's image areas in the first to fifth video frames is calculated.
  • While tracking, detection can also be performed on each video frame at the same time to detect targets other than the tracked ones. For example, if a new target, a kitten, is detected in the second video frame, the kitten is tracked from the third video frame until its tracking ends, and the target feature vector of the kitten is obtained.
  • the image can be stored, as well as the detection frame (ie image area) and target feature vector of each target in the image.
  • For a video, the video and the target feature vector of each target included in the video can be stored; to make management and retrieval easier, the key frames of the video and the image areas of the targets can also be stored.
  • the target feature vector of each target is calculated and stored in advance instead of real-time calculation during retrieval, which can improve retrieval efficiency.
  • the target feature vector of each target may not be pre-stored in the retrieval database, but is calculated separately during retrieval, which is not limited in the embodiment of the present application.
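The offline database-construction flow described above (detect targets in each object sample, extract each target's feature vector, store everything in association) can be sketched as follows; `detect` and `extract` are hypothetical stand-ins for the detection and feature-extraction models:

```python
def build_database(samples, detect, extract):
    """Database construction sketch: for each object sample, detect the
    targets, extract each target's feature vector, and store the sample,
    boxes, and vectors in association, as the embodiments describe."""
    database = []
    for obj in samples:
        entry = {"object": obj, "targets": []}
        for box in detect(obj):
            entry["targets"].append({"box": box,
                                     "vector": extract(obj, box)})
        database.append(entry)
    return database

# Toy stand-ins: one fixed detection box per sample and a constant
# 2-D feature vector, purely for illustration.
samples = ["image_1", "video_1"]
db = build_database(
    samples,
    detect=lambda obj: [(0, 0, 4, 4)],
    extract=lambda obj, box: [1.0, 0.0],
)
```

Because the feature vectors are computed once here rather than at query time, the retrieval step only has to compare vectors, matching the efficiency argument in the text.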
  • In addition, each target in the retrieval database can be clustered in advance to obtain multiple clusters, with the center point and central target feature vector of each cluster determined; then, in subsequent retrieval applications, the query can first be compared with the central target feature vector of each cluster, and then with the target feature vector of each target included in the determined cluster, to obtain retrieved objects with high similarity, which can improve retrieval efficiency.
  • In this way, a retrieval database including both images and videos can be established, and targets of any category can be detected without separately establishing retrieval databases for different specific categories. This is highly versatile, realizes universal object detection and recognition, and makes it more convenient to access different application scenarios, such as security or other video streams.
  • FIG. 2 is a schematic diagram of the image detection and feature extraction process in the embodiment of this application.
  • the image includes, for example, a dog and a table; the detection model yields the puppy's detection frame and the table's detection frame, and the feature extraction model then extracts features from each detection frame, yielding the puppy's target feature vector 1 and the table's target feature vector 2.
  • FIG. 3 is a schematic diagram of the video detection and feature extraction process in an embodiment of this application.
  • a video includes, for example, a puppy and a table, and each video frame of the video is detected. When a frame containing the puppy and the table is detected, the puppy and the table are tracked separately. For example, the puppy may keep moving and be tracked across multiple video frames; its image area in each detected and tracked frame is determined. The table may not move and may not be tracked in subsequent frames, in which case its image area in the frame where it was detected is determined. Then, based on the feature extraction model, feature extraction is performed on the single image area corresponding to the table to obtain the table's target feature vector 2, and on the multiple image areas corresponding to the puppy to obtain the image feature vector of each; averaging them yields the puppy's target feature vector 1.
  • the target detection and feature extraction are realized based on the above process, which can then be applied to the database construction process or retrieval process in the embodiment of this application.
  • 1) In the database-building process, the puppy's target feature vector 1 and the table's target feature vector 2 can be stored in association with the corresponding video or image, and the retrieval database is updated; 2) in the retrieval process, images or videos whose similarity with target feature vector 1 or target feature vector 2 is greater than the threshold are retrieved and sent to the terminal for display. In addition, the retrieval process has other implementations: for example, the image area of the puppy and the image area of the table are sent to the terminal and displayed to the user; the user selects the target he needs to retrieve, for example the puppy, and the server retrieves only images or videos whose similarity with the puppy's target feature vector 1 is greater than the threshold and returns them to the terminal for display.
  • object retrieval is realized through object detection and feature extraction, without limiting object categories, and can be applied to all objects, providing a general object retrieval method, which is more flexible and versatile in application, and improves efficiency.
  • the target retrieval method in the embodiment of the present application, due to its universality, is suitable for various business scenarios and can easily be applied to different ones. Based on the foregoing embodiment, several specific application scenarios are described below.
  • an embodiment of the present application also provides a target retrieval device.
  • the target retrieval device may be, for example, the server in the foregoing embodiment.
  • the target retrieval device may be a hardware structure, a software module, or a hardware structure plus a software module.
  • the target retrieval device in the embodiment of the present application specifically includes:
  • the obtaining module 40 is used to obtain the object to be retrieved
  • the first processing module 41 is configured to detect, from the object to be retrieved, the image area corresponding to a target; if image areas corresponding to multiple targets are detected from the object to be retrieved, display the image areas corresponding to the multiple targets to the user through a terminal, and receive a selection instruction input by the user and returned by the terminal, wherein the selection instruction includes at least the image area corresponding to the selected target;
  • the second processing module 42 is configured to obtain the target feature vector of the target according to the image area corresponding to the target in the selection instruction;
  • the retrieval module 43 is configured to perform a retrieval in a retrieval database according to the target feature vector of the target to obtain objects whose target-feature-vector similarity is greater than a threshold, wherein the retrieval database includes at least a plurality of associated objects and the target feature vectors of the targets included in the objects.
  • the second processing module 42 is specifically configured to:
  • if the type of the object to be retrieved is an image, perform, based on the trained feature extraction model and with the image area corresponding to the target as an input parameter, feature extraction on the image area corresponding to the target to obtain the image feature vector of the target, and use the image feature vector as the target feature vector of the target.
  • the second processing module 42 is specifically configured to:
  • If the type of the object to be retrieved is video, determine the position information of the image area corresponding to the target in the video frame of the object to be retrieved;
  • and, according to the position information, track the target and determine the position information and corresponding image area of the target in each tracked video frame.
  • when obtaining the target feature vector of the target according to the image area corresponding to the target in the selection instruction, the second processing module 42 is specifically configured to: based on the trained feature extraction model, perform feature extraction on the image areas of the target in the corresponding video frames to obtain the target's image feature vector in each frame, determine the average image feature vector, and use it as the target feature vector.
  • when the retrieval is performed in the retrieval database to obtain objects whose target-feature-vector similarity is greater than a threshold, the retrieval module 43 is specifically configured to: compare the target feature vector of the target with the center target feature vector of each cluster in the retrieval database to determine the most similar cluster, wherein the retrieval database includes multiple clusters, each cluster includes the target feature vectors of multiple targets, and the multiple clusters are obtained by clustering the targets in the retrieval database based on a clustering algorithm;
  • compare the target feature vector of the target with the target feature vectors of the targets included in the most similar cluster to obtain targets whose similarity is greater than the threshold, and obtain, according to the target-object associations, the objects whose target-feature-vector similarity is greater than the threshold.
  • the device further includes an establishing module 44, configured to:
  • obtain an object sample set, wherein the object sample set includes multiple object samples whose type is image or video; detect the image area corresponding to each target from each object sample, and obtain the target feature vector of each target according to the corresponding image area;
  • store the target feature vector of each target in association with the corresponding object sample, and update the retrieval database.
  • FIG. 5 shows a schematic structural diagram of an electronic device in an embodiment of this application.
  • the embodiment of the present application provides an electronic device.
  • the electronic device may include a processor 510 (Central Processing Unit, CPU), a memory 520, an input device 530, an output device 540, etc.
  • the input device 530 may include a keyboard, a mouse, a touch screen, etc.
  • the output device 540 may include a display device, such as a liquid crystal display (LCD), a cathode ray tube (Cathode Ray Tube, CRT), and so on.
  • the memory 520 may include a read only memory (ROM) and a random access memory (RAM), and provides the processor 510 with program instructions and data stored in the memory 520.
  • the memory 520 may be used to store the program of any target retrieval method in the embodiment of the present application.
  • the processor 510 calls the program instructions stored in the memory 520, and the processor 510 is configured to execute any target retrieval method in the embodiments of the present application according to the obtained program instructions.
  • a computer-readable storage medium on which a computer program is stored, and when the computer program is executed by a processor, the target retrieval method in any of the foregoing method embodiments is implemented.
  • this application can be provided as a method, a system, or a computer program product. Therefore, this application may take the form of an entirely hardware embodiment, an entirely software embodiment, or an embodiment combining software and hardware. Moreover, this application may take the form of a computer program product implemented on one or more computer-usable storage media (including but not limited to disk storage, CD-ROM, optical storage, etc.) containing computer-usable program code.
  • These computer program instructions can also be stored in a computer-readable memory capable of directing a computer or other programmable data processing equipment to work in a specific manner, so that the instructions stored in the computer-readable memory produce an article of manufacture including an instruction device that implements the functions specified in one or more flows of the flowchart and/or one or more blocks of the block diagram.
  • These computer program instructions can also be loaded onto a computer or other programmable data processing equipment, so that a series of operation steps are executed on the computer or other programmable equipment to produce computer-implemented processing; the instructions executed on the computer or other programmable equipment thus provide steps for implementing the functions specified in one or more flows of the flowchart and/or one or more blocks of the block diagram.

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Multimedia (AREA)
  • Library & Information Science (AREA)
  • General Engineering & Computer Science (AREA)
  • Databases & Information Systems (AREA)
  • General Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • Mathematical Physics (AREA)
  • Human Computer Interaction (AREA)
  • Computational Linguistics (AREA)
  • Image Analysis (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

A target retrieval method and device: obtain an object to be retrieved; detect, from the object to be retrieved, the image area corresponding to a target; if image areas corresponding to multiple targets are detected, display them to the user through a terminal and receive a selection instruction input by the user and returned by the terminal; obtain the target feature vector of the target according to the image area corresponding to the target in the selection instruction; and, according to the target feature vector of the target, search a retrieval database for objects whose target-feature-vector similarity is greater than a threshold, wherein the retrieval database includes at least a plurality of associated objects and the target feature vectors of the targets included in the objects. In this way, universal object retrieval is realized, efficiency and versatility are improved, and various retrieval application scenarios are supported.

Description

Target retrieval method and device
Cross-reference to related applications
This application claims priority to the Chinese patent application filed with the Chinese Patent Office on May 29, 2020, with application number 202010472146.1 and the title "Target retrieval method and device", the entire contents of which are incorporated herein by reference.
Technical field
This application relates to the field of computer technology, and in particular to a target retrieval method and device.
Background
With the development of technology, image retrieval is applied in more and more scenarios, and users need to retrieve images or particular objects. In the prior art, a classification model is usually trained separately for each category of object; at retrieval time, the object's category is recognized with the classification model, and similar objects of the same category are then retrieved. However, this approach requires advance training for objects of different categories, so each trained classification model applies only to a single category. For example, a classification model trained on cat images does not apply to the classification and retrieval of dogs. Versatility is poor: retrieving objects of other categories requires retraining, which also reduces efficiency and increases cost.
Summary
The embodiments of this application provide a target retrieval method and device to improve the efficiency and versatility of target retrieval.
The specific technical solutions provided by the embodiments of this application are as follows:
A target retrieval method, comprising:
obtaining an object to be retrieved;
detecting, from the object to be retrieved, an image area corresponding to a target; if image areas corresponding to multiple targets are detected from the object to be retrieved, displaying the image areas corresponding to the multiple targets to a user through a terminal, and receiving a selection instruction input by the user and returned by the terminal, wherein the selection instruction includes at least the image area corresponding to the selected target;
obtaining a target feature vector of the target according to the image area corresponding to the target in the selection instruction;
performing a retrieval in a retrieval database according to the target feature vector of the target to obtain objects whose target-feature-vector similarity is greater than a threshold, wherein the retrieval database includes at least a plurality of associated objects and the target feature vectors of the targets included in the objects.
Optionally, obtaining the target feature vector of the target according to the image area corresponding to the target in the selection instruction specifically includes:
if the type of the object to be retrieved is an image, performing, based on a trained feature extraction model and with the image area corresponding to the target as an input parameter, feature extraction on the image area corresponding to the target to obtain an image feature vector of the target, and using the image feature vector as the target feature vector of the target.
Optionally, detecting the image area corresponding to the target from the object to be retrieved specifically includes:
if the type of the object to be retrieved is a video, determining position information of the image area corresponding to the target in a video frame of the object to be retrieved;
and tracking the target according to the position information, and determining the position information and the corresponding image area of the target in each tracked video frame.
Optionally, obtaining the target feature vector of the target according to the image area corresponding to the target in the selection instruction specifically includes:
performing, based on a trained feature extraction model, feature extraction on the image areas of the target in the corresponding video frames to obtain the image feature vectors of the target in the corresponding video frames;
determining an average image feature vector of the target according to the obtained image feature vectors of the target in the corresponding video frames, and using the average image feature vector as the target feature vector of the target.
Optionally, performing a retrieval in the retrieval database according to the target feature vector of the target to obtain objects whose target-feature-vector similarity is greater than a threshold specifically includes:
comparing the target feature vector of the target with the center target feature vector of each cluster in the retrieval database to determine the cluster with the highest similarity, wherein the retrieval database includes multiple clusters, each cluster includes the target feature vectors of multiple targets, and the multiple clusters are obtained by clustering the targets in the retrieval database based on a clustering algorithm;
comparing the target feature vector of the target with the target feature vectors of the targets included in the cluster with the highest similarity to obtain targets whose target-feature-vector similarity is greater than the threshold;
obtaining, according to the association relationship between targets and objects, the objects whose target-feature-vector similarity is greater than the threshold.
Optionally, the retrieval database is obtained as follows:
obtaining an object sample set, wherein the object sample set includes multiple object samples, and the type of the object samples is image or video;
detecting the image area corresponding to each target from each object sample, and obtaining the target feature vector of each target according to the image area corresponding to each target;
storing the target feature vector of each target in association with the corresponding object sample, and updating the retrieval database accordingly.
A target retrieval device, comprising:
an obtaining module, configured to obtain an object to be retrieved;
a first processing module, configured to detect, from the object to be retrieved, an image area corresponding to a target; if image areas corresponding to multiple targets are detected from the object to be retrieved, display the image areas corresponding to the multiple targets to a user through a terminal, and receive a selection instruction input by the user and returned by the terminal, wherein the selection instruction includes at least the image area corresponding to the selected target;
a second processing module, configured to obtain a target feature vector of the target according to the image area corresponding to the target in the selection instruction;
a retrieval module, configured to perform a retrieval in a retrieval database according to the target feature vector of the target to obtain objects whose target-feature-vector similarity is greater than a threshold, wherein the retrieval database includes at least a plurality of associated objects and the target feature vectors of the targets included in the objects.
Optionally, when obtaining the target feature vector of the target according to the image area corresponding to the target in the selection instruction, the second processing module is specifically configured to:
if the type of the object to be retrieved is an image, perform, based on a trained feature extraction model and with the image area corresponding to the target as an input parameter, feature extraction on the image area corresponding to the target to obtain an image feature vector of the target, and use the image feature vector as the target feature vector of the target.
Optionally, when detecting the image area corresponding to the target from the object to be retrieved, the second processing module is specifically configured to:
if the type of the object to be retrieved is a video, determine position information of the image area corresponding to the target in a video frame of the object to be retrieved;
and track the target according to the position information, and determine the position information and the corresponding image area of the target in each tracked video frame.
Optionally, when obtaining the target feature vector of the target according to the image area corresponding to the target in the selection instruction, the second processing module is specifically configured to:
based on a trained feature extraction model, perform feature extraction on the image areas of the target in the corresponding video frames to obtain the image feature vectors of the target in the corresponding video frames;
determine an average image feature vector of the target according to the obtained image feature vectors of the target in the corresponding video frames, and use the average image feature vector as the target feature vector of the target.
Optionally, when performing a retrieval in the retrieval database according to the target feature vector of the target to obtain objects whose target-feature-vector similarity is greater than a threshold, the retrieval module is specifically configured to:
compare the target feature vector of the target with the center target feature vector of each cluster in the retrieval database to determine the cluster with the highest similarity, wherein the retrieval database includes multiple clusters, each cluster includes the target feature vectors of multiple targets, and the multiple clusters are obtained by clustering the targets in the retrieval database based on a clustering algorithm;
compare the target feature vector of the target with the target feature vectors of the targets included in the cluster with the highest similarity to obtain targets whose target-feature-vector similarity is greater than the threshold;
obtain, according to the association relationship between targets and objects, the objects whose target-feature-vector similarity is greater than the threshold.
Optionally, for the manner of obtaining the retrieval database, the device further includes an establishing module, configured to:
obtain an object sample set, wherein the object sample set includes multiple object samples, and the type of the object samples is image or video;
detect the image area corresponding to each target from each object sample, and obtain the target feature vector of each target according to the image area corresponding to each target;
store the target feature vector of each target in association with the corresponding object sample, and update the retrieval database accordingly.
An electronic device, including a memory, a processor, and a computer program stored in the memory and executable on the processor, wherein the processor implements the steps of any one of the above target retrieval methods when executing the program.
A computer-readable storage medium on which a computer program is stored, wherein the computer program, when executed by a processor, implements the steps of any one of the above target retrieval methods.
In the embodiments of this application, an object to be retrieved is obtained; the image area corresponding to a target is detected from the object to be retrieved, and the target feature vector of the target is obtained; then, according to the target feature vector of the target, objects whose similarity is greater than a threshold are retrieved from the retrieval database. In this way, retrieval can be realized through target detection and feature extraction alone, without distinguishing categories or training separately for targets of different categories, which improves efficiency; the method applies to objects of any category, has strong versatility, and can satisfy various retrieval application scenarios. Moreover, if image areas corresponding to multiple targets are detected from the object to be retrieved, the user can make a selection, so that the object the user actually needs is retrieved, which better meets user needs and improves accuracy.
Brief description of the drawings
FIG. 1 is a flowchart of the target retrieval method in an embodiment of this application;
FIG. 2 is a schematic diagram of the image detection and feature extraction process in an embodiment of this application;
FIG. 3 is a schematic diagram of the video detection and feature extraction process in an embodiment of this application;
FIG. 4 is a schematic structural diagram of the target retrieval device in an embodiment of this application;
FIG. 5 is a schematic structural diagram of the electronic device in an embodiment of this application.
Detailed description
The technical solutions in the embodiments of this application will be described clearly and completely below with reference to the drawings in the embodiments of this application. Obviously, the described embodiments are only some of the embodiments of this application, not all of them. Based on the embodiments in this application, all other embodiments obtained by those of ordinary skill in the art without creative work fall within the protection scope of this application.
At present, image and video retrieval is applied in more and more scenarios. For example, a photo is taken to recognize the object it contains and retrieve images similar to that object; or, in product search on an e-commerce platform, a product image is taken and images of similar products are searched. Retrieval methods in the prior art mainly recognize the object category based on a classification model at retrieval time and then retrieve similar objects of the same category. Thus, given an image containing a specific object, images of objects of the same category can be retrieved. However, this approach requires advance training for objects of different categories, so each trained classification model applies only to a single category; for example, a classification model trained on cat images does not apply to the classification and retrieval of dogs. Versatility is poor, and retrieving objects of other categories requires retraining, which reduces efficiency and increases cost.
Therefore, to solve the above problems, the embodiments of this application provide a target retrieval method. The retrieval database stores a plurality of associated objects and the target feature vectors of the targets included in the objects. At retrieval time, the image area corresponding to a target is detected from the object to be retrieved; when multiple targets are detected, the target selected by the user is determined according to a user selection instruction, better meeting the user's retrieval needs. The target feature vector of the target in the selection instruction is then obtained, and a retrieval is performed in the retrieval database according to that vector to obtain objects whose target-feature similarity is greater than a threshold. In this way, no separate pre-training is needed for target objects of different categories, and the category does not need to be determined first at retrieval time; a universal object retrieval approach is realized that satisfies various retrieval application scenarios and improves versatility, flexibility, and efficiency.
It should be noted that the target retrieval method in the embodiments of this application is mainly applied on the server side. For example, a user inputs an image containing a target object through a terminal and clicks search; the terminal sends the image to the server; after extracting the object, the server performs a retrieval in the retrieval database according to the object's target feature vector, matches images with high similarity, and returns them to the terminal, which displays the retrieved images.
Based on the above embodiments, referring to FIG. 1, which is a flowchart of the target retrieval method in an embodiment of this application, the method includes:
Step 100: Obtain an object to be retrieved.
The type of the object to be retrieved may be an image or a video, which is not limited in the embodiments of this application; that is, the user may retrieve images or videos similar to a target in an image or a video.
Step 110: Detect, from the object to be retrieved, the image area corresponding to a target, and obtain the target feature vector of the target according to the image area corresponding to the target.
When performing step 110, the method specifically includes:
S1. Detect, from the object to be retrieved, the image area corresponding to the target.
Specifically, depending on the type of the object to be retrieved, obtaining the image area corresponding to the target falls into the following two cases:
Case 1: If the type of the object to be retrieved is an image, the step specifically includes: based on a trained detection model, with the object to be retrieved as an input parameter, obtaining the detection frame of the detected target, and using the detection frame of the target as the image area of that target.
For example, the detection model may adopt Region-Convolutional Neural Networks (R-CNN), the Fast Region-Convolutional Neural Networks (Fast R-CNN) algorithm, the Single Shot MultiBox Detector (SSD) network, the You Only Look Once v3 (yolov3) network, etc., which is not limited in the embodiments of this application.
The SSD network usually uses the Visual Geometry Group Network (VGG-16) as its base network; the backbone is VGG-16, which contains multiple convolutional layers. yolov3 uses the first 52 layers of darknet-53 (without fully connected layers); it is a fully convolutional network that uses a large number of residual skip connections and, to reduce the negative gradient effects of pooling, implements downsampling with the stride of convolutional layers (conv).
That is, for one image, one image area of the target can be detected.
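The detection frame is simply the sub-region of the image that the detector returns, used as the target's image area. A minimal pure-Python sketch of that cropping step follows; the `(x, y, w, h)` box convention and the nested-list "image" are illustrative assumptions, not part of the application:

```python
def crop_box(image, box):
    # image: list of rows of pixel values; box: (x, y, w, h) in pixels.
    x, y, w, h = box
    return [row[x:x + w] for row in image[y:y + h]]

# A 6x4 toy "image" where pixel value = column + 10 * row.
image = [[c + 10 * r for c in range(6)] for r in range(4)]
region = crop_box(image, (2, 1, 3, 2))  # detected target's image area
```

In a real system the detector (e.g., SSD or yolov3, as the text notes) would supply the box, and the cropped region would be passed on to the feature extraction model.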
Case 2: If the type of the object to be retrieved is a video, the step specifically includes: determining the position information of the image area corresponding to the target in a video frame of the object to be retrieved, tracking the target according to the position information, and determining the position information and corresponding image area of the target in each tracked video frame.
That is, for a video, there may be multiple image areas of the target: the detection model and tracking model determine the multiple video frames in which the target appears and obtain the image area in each detected and tracked video frame.
In the embodiments of this application, for a video, the same target may appear in multiple video frames; for example, a puppy in a video that walks from one place to another appears in multiple frames. To improve retrieval accuracy, all video frames in which the target appears can be determined. To further improve efficiency, once the target is detected in some video frame, a tracking algorithm can be applied to the subsequent frames to determine whether the target is tracked in each of them, determine the target's position information in each tracked frame, and obtain the corresponding image areas.
The tracking algorithm may be based on the Open Source Computer Vision Library (OpenCV), for example the Kernel Correlation Filter (KCF) algorithm or the deepsort algorithm, which is not limited in the embodiments of this application.
In this way, by combining detection and tracking, the image areas corresponding to the target in the video frames where it appears are determined. Because tracking is easier and a tracking algorithm is faster than a detection algorithm, computational efficiency and speed are improved.
The tracked video frames may be the consecutive frames starting from the frame after the one in which the target is detected, until the frame in which tracking is lost is determined; this is not limited in the embodiments of this application.
Of course, a target may appear in several consecutive frames, disappear for a while, and then appear again. In this case, to ensure accuracy, i.e., to ensure it is the same target, the embodiments of this application may regard only a continuously appearing target as the same target; after it disappears and is detected again, it is regarded as another target and its target feature vector is recalculated. For example, if a person is detected in the first video frame, the tracking algorithm is applied from the second frame onward to determine whether the person is tracked in each consecutive frame. If the person is tracked from the second through the tenth frame but not in the eleventh, the tracking algorithm ends, and the person's image areas are those in the first through tenth frames. If the person is detected again in the twentieth frame, tracking is triggered again from the twenty-first frame until it ends; for example, if the person is tracked from the twenty-first through the twenty-fifth frame, the person is regarded as another target whose image areas are those in the twentieth through twenty-fifth frames.
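The bookkeeping just described — one identity per unbroken run of frames, with a reappearance after a gap treated as a new target — can be sketched as follows. The per-frame boolean `presence` list is a stand-in for a real detector/tracker and is purely illustrative:

```python
def collect_tracks(presence):
    # presence[i] is True when the target is detected/tracked in frame i.
    # Each maximal run of consecutive True frames becomes a separate target.
    tracks, current = [], []
    for frame_idx, seen in enumerate(presence):
        if seen:
            current.append(frame_idx)   # same identity continues
        elif current:
            tracks.append(current)      # tracking lost: close out this target
            current = []
    if current:
        tracks.append(current)
    return tracks

# Seen in frames 0-9, gone in 10-18, reappears in 19-24 -> two targets.
presence = [i <= 9 or 19 <= i <= 24 for i in range(25)]
tracks = collect_tracks(presence)
```

Each run in `tracks` would then get its own target feature vector, matching the passage's rule that a target which disappears and reappears is recomputed as a new target.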
S2. Obtain the target feature vector of the target according to the image area corresponding to the target.
In the embodiments of this application, different implementations are provided for obtaining the target feature vector depending on the type of the object to be retrieved:
First implementation: If the type of the object to be retrieved is an image, obtaining the target feature vector of the target according to the image area corresponding to the target specifically includes:
based on a trained feature extraction model, with the image area corresponding to the target as an input parameter, performing feature extraction on the image area corresponding to the target to obtain the image feature vector of the target, and using the image feature vector as the target feature vector of the target.
The feature extraction model may adopt a Residual Network (resnet) or a VGG network, which is not limited in the embodiments of this application.
The VGG network structure consists mainly of convolutional and fully connected layers.
For example, if the object to be retrieved is an image containing a puppy, the detection model obtains the puppy's detection frame, which is input into the feature extraction model to obtain the puppy's target feature vector.
Second implementation: If the type of the object to be retrieved is a video, obtaining the target feature vector of the target according to the image area corresponding to the target specifically includes:
1) Based on a trained feature extraction model, performing feature extraction on the image areas of the target in the corresponding video frames to obtain the image feature vectors of the target in those frames.
That is, in the embodiments of this application, for a video, all video frames in which the target appears can be determined, together with the target's image area in each of those frames; the feature extraction model then extracts features from each image area to obtain the corresponding image feature vectors.
2) Determining the average image feature vector of the target according to the obtained image feature vectors of the target in the corresponding video frames, and using the average image feature vector as the target feature vector of the target.
For example, the target is a puppy that appears in the fifth through fifteenth video frames, with corresponding image areas 1 through 11. After feature extraction on image areas 1 through 11, the corresponding image feature vectors are a1, a2, …, a11. Averaging these 11 image feature vectors, i.e., (a1+a2+…+a11)/11, yields the average image feature vector, which is used as the puppy's target feature vector.
In this way, the same target is represented by the average image feature vector over the multiple frames in which it appears, which improves accuracy and performance; moreover, one target is ultimately represented by one target feature vector rather than several, which also reduces the number of feature vectors.
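The averaging step above is an element-wise mean over the per-frame image feature vectors. A few lines of Python make it concrete; the toy two-dimensional vectors are illustrative only:

```python
def average_feature(vectors):
    # Element-wise mean of equal-length per-frame feature vectors.
    n = len(vectors)
    return [sum(v[k] for v in vectors) / n for k in range(len(vectors[0]))]

# Image feature vectors of one target in three tracked frames.
per_frame = [[1.0, 2.0], [3.0, 4.0], [5.0, 0.0]]
target_vec = average_feature(per_frame)  # one vector for the whole track
```

As the text notes, this leaves exactly one target feature vector per tracked target, regardless of how many frames it appeared in.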
Step 120: Perform a retrieval in the retrieval database according to the target feature vector of the target to obtain objects whose target-feature-vector similarity is greater than a threshold, wherein the retrieval database includes at least a plurality of associated objects and the target feature vectors of the targets included in the objects.
Specifically, the target feature vector of the target is compared with the target feature vectors of the targets in the retrieval database to retrieve similar images.
That is, in the embodiments of this application, the target feature vector can be compared directly with the target feature vector of every target in the retrieval database to determine the objects whose similarity is greater than the threshold.
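A minimal sketch of this direct comparison follows. Cosine similarity is used as the similarity measure and the database is a toy mapping from object IDs to their targets' vectors; both are illustrative assumptions, since the application does not fix a specific measure or storage layout:

```python
import math

def cosine(a, b):
    # Cosine similarity between two equal-length feature vectors.
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) *
                  math.sqrt(sum(x * x for x in b)))

def retrieve(query, database, threshold):
    # database maps object_id -> list of target feature vectors in that object.
    # An object matches when any of its targets exceeds the threshold.
    hits = set()
    for obj_id, targets in database.items():
        if any(cosine(query, t) > threshold for t in targets):
            hits.add(obj_id)
    return hits

db = {"img1": [[1.0, 0.0]], "img2": [[0.0, 1.0]], "vid1": [[0.8, 0.2]]}
matches = retrieve([1.0, 0.05], db, threshold=0.9)
```

This brute-force scan touches every stored vector; the cluster-based variant described next avoids that.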
In addition, the embodiments of this application provide another possible implementation. Performing step 120 specifically includes:
1) Comparing the target feature vector of the target with the center target feature vector of each cluster in the retrieval database to determine the cluster with the highest similarity.
The retrieval database includes multiple clusters, each containing the target feature vectors of multiple targets; the clusters are obtained by clustering the targets in the retrieval database based on a clustering algorithm.
The center target feature vector is the target feature vector of the cluster's center point. When determining the center point, it may be the target whose average feature similarity, compared against the target feature vectors of the other targets in the cluster, is highest; this is not limited in the embodiments of this application.
2) Comparing the target feature vector of the target with the target feature vectors of the targets included in the cluster with the highest similarity to obtain targets whose target-feature-vector similarity is greater than the threshold.
3) Obtaining, according to the association relationship between targets and objects, the objects whose target-feature-vector similarity is greater than the threshold.
That is, the embodiments of this application may also cluster the targets in the retrieval database in advance, for example according to their target feature vectors. During comparison, the most similar cluster is found first, and comparison within it then yields the objects whose similarity exceeds the threshold. Because comparing against clusters only requires comparing with each cluster's center target feature vector, this improves efficiency and reduces retrieval time.
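The two-stage comparison can be sketched as follows; cosine similarity and the `(center, members)` cluster layout are illustrative assumptions, not part of the application:

```python
import math

def cosine(a, b):
    # Cosine similarity between two equal-length feature vectors.
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) *
                  math.sqrt(sum(x * x for x in b)))

def two_stage_search(query, clusters, threshold):
    # clusters: list of (center_vector, member_vectors) pairs.
    # Stage 1: pick the cluster whose center is most similar to the query.
    best = max(clusters, key=lambda c: cosine(query, c[0]))
    # Stage 2: compare only against that cluster's members.
    return [m for m in best[1] if cosine(query, m) > threshold]

clusters = [
    ([1.0, 0.0], [[1.0, 0.1], [0.9, 0.0], [0.5, 0.5]]),
    ([0.0, 1.0], [[0.1, 1.0]]),
]
hits = two_stage_search([1.0, 0.0], clusters, threshold=0.95)
```

With k clusters, stage 1 costs k comparisons instead of a full database scan, which is the efficiency gain the passage describes.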
The retrieval results can then be sent to the terminal, which can display the retrieved objects according to a preset rule or manner. For example, the terminal may display the top preset number of retrieved objects in order of similarity, or display all retrieved objects in descending order of similarity.
In the embodiments of this application, for images, the retrieval database may store the image, the detection frames of the targets in the image, and the target feature vectors of those targets; for videos, it may store the video, the video's key frames, and the image areas and target feature vectors of the targets in the video. Of course, this is not a limitation, and other information may be stored in the retrieval database as required.
In this way, if an image associated with a target whose target-feature-vector similarity exceeds the threshold is retrieved, the image associated with the target is returned to the terminal; if a video associated with such a target is retrieved, the video's key frame may be returned to the terminal, the whole video may be returned, or the video's link address may be determined and returned so that the user can view the retrieved video after clicking the link on the terminal. This is not limited in the embodiments of this application.
Further, when target detection is performed on the object to be retrieved, multiple targets may be detected; for example, an image may contain both a table and a puppy. To improve retrieval accuracy in this case, the embodiments of this application may determine the detection result the user actually needs through user selection, and specifically provide a possible implementation. After detecting the image areas corresponding to targets from the object to be retrieved, the method further includes:
S1. If image areas corresponding to multiple targets are detected from the object to be retrieved, display the image areas corresponding to the multiple targets to the user through the terminal.
S2. Receive the selection instruction input by the user and returned by the terminal, and perform the step of obtaining the target feature vector of the target according to the image area corresponding to the target in the selection instruction, wherein the selection instruction includes at least the image area corresponding to the selected target.
For example, if the image is detected to contain a puppy and a table, and their corresponding image areas are determined, both image areas can be sent to the terminal, which displays them; the user can then select the target he actually wants to retrieve, for example the puppy's image area (the user may of course select several at once). The terminal sends the user's selection instruction to the server, which can then retrieve only images similar to the puppy, improving retrieval accuracy.
Of course, in the embodiments of this application, retrieval can also be performed separately for each detected target, and similar images can be found in the retrieval database for each of them. For example, retrieval is performed for the puppy and the table respectively; images similar to the puppy and images similar to the table are retrieved and sent to the terminal, which can display the images retrieved for each at the same time.
In the embodiments of this application, an object to be retrieved is obtained; the image area corresponding to a target can be obtained from it and the target's target feature vector determined; retrieval can then be performed directly according to the target feature vector to obtain objects whose similarity exceeds a threshold. In this way, no category recognition is needed at retrieval time: retrieval of arbitrary target objects can be achieved through universal object detection and feature extraction, satisfying various retrieval application scenarios without pre-training for different object categories, with strong versatility, improved efficiency, and reduced cost.
Based on the above embodiments, the retrieval database in the embodiments of this application is described below. The retrieval database includes at least a plurality of associated objects and the target feature vectors of the targets included in the objects. Further, for how the retrieval database is obtained, the embodiments of this application provide a possible implementation:
S1. Obtain an object sample set, where the object sample set includes multiple object samples whose type is image or video.
The way in which object samples are obtained is not limited in the embodiments of this application and can depend on the application scenario and business requirements.
For example, object samples can be obtained from a network knowledge base.
For another example, for an e-commerce platform, product images uploaded by merchants can be obtained as image samples.
For yet another example, for a security business scenario, the video streams of security devices can be accessed as video samples.
S2. Detect the image area corresponding to each target from each object sample, and obtain the target feature vector of each target according to the corresponding image area.
Specifically, depending on the type of the object sample, there are two cases:
Case 1: If the type of the object sample is an image, the sample is detected with the detection model to obtain the detection frames (i.e., image areas) of all objects (targets) in the sample; the image areas of all objects are then input into the feature extraction model for feature extraction to obtain the target feature vectors of all objects.
For example, detection on an image yields the detection frames of a person, a tree, and a puppy; feature extraction is performed on each, yielding the target feature vectors of the person, the tree, and the puppy.
Case 2: If the type of the object sample is a video, target detection and tracking are performed on each video frame. For any target, when a frame containing that target is detected, the tracking algorithm is triggered; the target is tracked according to its position information in that frame, and its position information and image area in each tracked frame are determined; the target's image areas in those frames are input into the feature extraction model to obtain multiple image feature vectors, which are averaged to determine the target's average image feature, used as its target feature vector.
For example, each frame of a video is detected. If the first frame is detected to contain a person and a chair, the person and the chair are tracked separately from the second frame onward. If the person is tracked from the second through the tenth frame but not in the eleventh, the tracking process for the person stops, and the average image feature vector of the person's image areas in the first through tenth frames is calculated. If the chair is tracked from the second through the fifth frame but not in the sixth, the tracking process for the chair ends, and the chair's average image feature vector over the first through fifth frames is calculated. It should also be noted that, while a target is being tracked in a video frame, the frame can simultaneously be detected to find targets other than the tracked one. For example, if a new target such as a kitten is detected in the second frame, the kitten is tracked from the third frame until tracking ends, and the kitten's target feature vector is obtained.
S3. Store the target feature vector of each target in association with the corresponding object sample, and update the retrieval database.
For example, when the type of the object is an image, the image can be stored together with the detection frames (i.e., image areas) and target feature vectors of the targets in the image.
For another example, if the type of the object is a video, the video can be stored together with the target feature vector of each target in the video (i.e., the average image feature vector of the image areas in the detected and tracked frames); to facilitate management and retrieval, the video's key frames and the targets' image areas can also be stored in association. In this way, the target feature vector of each target is computed and stored in advance rather than computed in real time at retrieval, which improves retrieval efficiency.
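One possible record layout for this pre-computed storage is sketched below; the field names and dictionary structure are illustrative assumptions, not part of the application:

```python
def build_record(sample_id, sample_type, targets):
    # targets: list of (image_area, feature_vector) pairs for one sample.
    return {
        "sample_id": sample_id,
        "type": sample_type,          # "image" or "video"
        "targets": [
            {"area": area, "feature": vec} for area, vec in targets
        ],
    }

retrieval_db = {}
rec = build_record("img_001", "image",
                   [((10, 20, 50, 80), [0.1, 0.9]),    # dog's box + vector
                    ((200, 40, 90, 60), [0.7, 0.3])])  # table's box + vector
retrieval_db[rec["sample_id"]] = rec
```

Because each target's feature vector is stored with its sample, a query only needs vector comparisons at retrieval time, with no re-running of detection or feature extraction over the stored data.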
It should also be noted that, in the embodiments of this application, the retrieval database may instead not pre-store the target feature vectors, which are then computed separately at retrieval time; this is not limited in the embodiments of this application.
Further, when establishing the retrieval database, the targets in it can be clustered in advance to obtain multiple clusters, and each cluster's center point and center target feature vector can be determined. In subsequent retrieval, comparison is first made with each cluster's center target feature vector, and then with the target feature vectors of the targets in the determined cluster, to obtain retrieved objects with high similarity, which improves retrieval efficiency.
In this way, in the embodiments of this application, by detecting and extracting features from the targets in images, and by detecting and tracking each video frame of videos, a retrieval database containing both images and videos can be established, and targets of any category can be detected, without establishing separate retrieval databases for different specific categories. The approach has strong versatility, realizes universal object detection and recognition, and is easier to integrate into different application scenarios, such as security or other video streams; it suits various scenarios.
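The cluster center point mentioned above can, as the text suggests, be the member whose average feature similarity to the other members is highest (a medoid). A minimal sketch of that selection, with cosine similarity as an illustrative assumption:

```python
import math

def cosine(a, b):
    # Cosine similarity between two equal-length feature vectors.
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) *
                  math.sqrt(sum(x * x for x in b)))

def cluster_center(vectors):
    # Pick the member with the highest mean similarity to the others.
    best, best_score = None, -2.0
    for i, v in enumerate(vectors):
        sims = [cosine(v, w) for j, w in enumerate(vectors) if j != i]
        score = sum(sims) / len(sims)
        if score > best_score:
            best, best_score = v, score
    return best

cluster = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0]]
center = cluster_center(cluster)  # this member's vector becomes the
                                  # cluster's center target feature vector
```

The chosen member's vector then serves as the cluster's center target feature vector in the two-stage retrieval described earlier; the application leaves the exact center-selection rule open, so this is only one option.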
The implementation principle of the target retrieval method in the above embodiments is briefly explained below with specific application scenarios.
Referring to FIG. 2, a schematic diagram of the image detection and feature extraction process in an embodiment of this application: as shown in FIG. 2, the image contains, for example, a dog and a table. The detection model yields the puppy's detection frame and the table's detection frame; then, based on the feature extraction model, features are extracted from each detection frame, yielding the puppy's target feature vector 1 and the table's target feature vector 2.
Referring to FIG. 3, a schematic diagram of the video detection and feature extraction process in an embodiment of this application: as shown in FIG. 3, the video contains, for example, a puppy and a table, and each video frame is detected. When a frame containing the puppy and the table is detected, the puppy and the table are tracked separately. The puppy may keep moving and be tracked across multiple frames; its image area in each detected and tracked frame is determined. The table may not move and may not be tracked in subsequent frames, in which case its image area in the frame where it was detected is determined. Then, based on the feature extraction model, features are extracted from the table's single image area to obtain the table's target feature vector 2, and from the puppy's multiple image areas to obtain the image feature vector of each; their average is the puppy's target feature vector 1.
Target detection and feature extraction are thus realized through the above process, which can then be applied to the database-building process or the retrieval process in the embodiments of this application. 1) In the database-building process, the puppy's target feature vector 1 and the table's target feature vector 2 can be stored in association with the corresponding video or image, updating the retrieval database. 2) In the retrieval process, images or videos whose similarity with the puppy's target feature vector 1 or the table's target feature vector 2 exceeds the threshold can be retrieved from the retrieval database and sent to the terminal for display. The retrieval process also has other implementations: for example, the image areas of the puppy and the table are sent to the terminal and displayed to the user, who selects the target he needs to retrieve; if the user selects the puppy, the server can retrieve only images or videos whose similarity with the puppy's target feature vector 1 exceeds the threshold and return them to the terminal for display.
In this way, the embodiments of this application realize object retrieval through object detection and feature extraction, without limiting object categories; the method applies to all objects, providing a universal object retrieval method that is more flexible and versatile in application and improves efficiency.
Because of its universality, the target retrieval method in the embodiments of this application suits various business scenarios and can easily be applied to different ones. Based on the above embodiments, several specific application scenarios are described below.
1) A retrieval database is established from the video streams collected by security devices, for example outdoor roadside cameras. If a household loses its dog, a photo of the dog can be uploaded through the terminal; the server detects the dog's image area in the photo, obtains the dog's target feature vector, and then searches the retrieval database to find the dog.
2) For another example, if a person's suitcase is lost and nearby cameras support this function, with a retrieval database built on them, a photo of the suitcase can be uploaded through the terminal; whether the suitcase is stationary somewhere or being pulled along by someone, there is a chance of finding it.
In this way, the embodiments of this application do not require training and designing specifically for dogs or suitcases to realize this function; a large number of images or videos can be acquired to build the retrieval database, and the target retrieval method in the embodiments of this application can be used directly.
Based on the same inventive concept, the embodiments of this application also provide a target retrieval device, which may be, for example, the server in the foregoing embodiments; the device may be a hardware structure, a software module, or a hardware structure plus a software module. Based on the above embodiments, referring to FIG. 4, the target retrieval device in the embodiments of this application specifically includes:
an obtaining module 40, configured to obtain an object to be retrieved;
a first processing module 41, configured to detect, from the object to be retrieved, an image area corresponding to a target; if image areas corresponding to multiple targets are detected from the object to be retrieved, display the image areas corresponding to the multiple targets to a user through a terminal, and receive a selection instruction input by the user and returned by the terminal, wherein the selection instruction includes at least the image area corresponding to the selected target;
a second processing module 42, configured to obtain a target feature vector of the target according to the image area corresponding to the target in the selection instruction;
a retrieval module 43, configured to perform a retrieval in a retrieval database according to the target feature vector of the target to obtain objects whose target-feature-vector similarity is greater than a threshold, wherein the retrieval database includes at least a plurality of associated objects and the target feature vectors of the targets included in the objects.
Optionally, when obtaining the target feature vector of the target according to the image area corresponding to the target in the selection instruction, the second processing module 42 is specifically configured to:
if the type of the object to be retrieved is an image, perform, based on a trained feature extraction model and with the image area corresponding to the target as an input parameter, feature extraction on the image area corresponding to the target to obtain an image feature vector of the target, and use the image feature vector as the target feature vector of the target.
Optionally, when detecting the image area corresponding to the target from the object to be retrieved, the second processing module 42 is specifically configured to:
if the type of the object to be retrieved is a video, determine position information of the image area corresponding to the target in a video frame of the object to be retrieved;
and track the target according to the position information, and determine the position information and the corresponding image area of the target in each tracked video frame.
Optionally, when obtaining the target feature vector of the target according to the image area corresponding to the target in the selection instruction, the second processing module 42 is specifically configured to:
based on a trained feature extraction model, perform feature extraction on the image areas of the target in the corresponding video frames to obtain the image feature vectors of the target in the corresponding video frames;
determine an average image feature vector of the target according to the obtained image feature vectors of the target in the corresponding video frames, and use the average image feature vector as the target feature vector of the target.
Optionally, when performing a retrieval in the retrieval database according to the target feature vector of the target to obtain objects whose target-feature-vector similarity is greater than a threshold, the retrieval module 43 is specifically configured to:
compare the target feature vector of the target with the center target feature vector of each cluster in the retrieval database to determine the cluster with the highest similarity, wherein the retrieval database includes multiple clusters, each cluster includes the target feature vectors of multiple targets, and the multiple clusters are obtained by clustering the targets in the retrieval database based on a clustering algorithm;
compare the target feature vector of the target with the target feature vectors of the targets included in the cluster with the highest similarity to obtain targets whose target-feature-vector similarity is greater than the threshold;
obtain, according to the association relationship between targets and objects, the objects whose target-feature-vector similarity is greater than the threshold.
Optionally, for the manner of obtaining the retrieval database, the device further includes an establishing module 44, configured to:
obtain an object sample set, wherein the object sample set includes multiple object samples, and the type of the object samples is image or video;
detect the image area corresponding to each target from each object sample, and obtain the target feature vector of each target according to the corresponding image area;
store the target feature vector of each target in association with the corresponding object sample, and update the retrieval database accordingly.
Based on the above embodiments, refer to FIG. 5, a schematic structural diagram of the electronic device in an embodiment of this application.
The embodiments of this application provide an electronic device, which may include a processor 510 (Central Processing Unit, CPU), a memory 520, an input device 530, an output device 540, etc. The input device 530 may include a keyboard, a mouse, a touch screen, etc.; the output device 540 may include a display device such as a Liquid Crystal Display (LCD) or a Cathode Ray Tube (CRT).
The memory 520 may include a Read-Only Memory (ROM) and a Random Access Memory (RAM), and provides the processor 510 with the program instructions and data stored in the memory 520. In the embodiments of this application, the memory 520 may be used to store the program of any target retrieval method in the embodiments of this application.
By calling the program instructions stored in the memory 520, the processor 510 is configured to execute any target retrieval method in the embodiments of this application according to the obtained program instructions.
Based on the above embodiments, the embodiments of this application provide a computer-readable storage medium on which a computer program is stored; when the computer program is executed by a processor, it implements the target retrieval method in any of the above method embodiments.
Those skilled in the art should understand that the embodiments of this application may be provided as a method, a system, or a computer program product. Therefore, this application may take the form of an entirely hardware embodiment, an entirely software embodiment, or an embodiment combining software and hardware. Moreover, this application may take the form of a computer program product implemented on one or more computer-usable storage media (including but not limited to disk storage, CD-ROM, optical storage, etc.) containing computer-usable program code.
This application is described with reference to flowcharts and/or block diagrams of the method, device (system), and computer program product according to the embodiments of this application. It should be understood that each flow and/or block in the flowcharts and/or block diagrams, and combinations of flows and/or blocks therein, can be implemented by computer program instructions. These computer program instructions can be provided to the processor of a general-purpose computer, a special-purpose computer, an embedded processor, or other programmable data processing equipment to produce a machine, so that the instructions executed by the processor of the computer or other programmable data processing equipment produce a device for implementing the functions specified in one or more flows of the flowchart and/or one or more blocks of the block diagram.
These computer program instructions can also be stored in a computer-readable memory capable of directing a computer or other programmable data processing equipment to work in a specific manner, so that the instructions stored in the computer-readable memory produce an article of manufacture including an instruction device that implements the functions specified in one or more flows of the flowchart and/or one or more blocks of the block diagram.
These computer program instructions can also be loaded onto a computer or other programmable data processing equipment, so that a series of operation steps are executed on the computer or other programmable equipment to produce computer-implemented processing; the instructions executed on the computer or other programmable equipment thus provide steps for implementing the functions specified in one or more flows of the flowchart and/or one or more blocks of the block diagram.
Although preferred embodiments of this application have been described, those skilled in the art can make additional changes and modifications to these embodiments once they learn the basic inventive concept. Therefore, the appended claims are intended to be interpreted as including the preferred embodiments and all changes and modifications falling within the scope of this application.
Obviously, those skilled in the art can make various changes and variations to the embodiments of this application without departing from the spirit and scope of the embodiments of this application. Thus, if these modifications and variations of the embodiments of this application fall within the scope of the claims of this application and their equivalent technologies, this application is also intended to include them.

Claims (14)

  1. A target retrieval method, characterized by comprising:
    obtaining an object to be retrieved;
    detecting, from the object to be retrieved, an image area corresponding to a target; if image areas corresponding to multiple targets are detected from the object to be retrieved, displaying the image areas corresponding to the multiple targets to a user through a terminal, and receiving a selection instruction input by the user and returned by the terminal, wherein the selection instruction includes at least the image area corresponding to the selected target;
    obtaining a target feature vector of the target according to the image area corresponding to the target in the selection instruction;
    performing a retrieval in a retrieval database according to the target feature vector of the target to obtain objects whose target-feature-vector similarity is greater than a threshold, wherein the retrieval database includes at least a plurality of associated objects and the target feature vectors of the targets included in the objects.
  2. The method according to claim 1, characterized in that obtaining the target feature vector of the target according to the image area corresponding to the target in the selection instruction specifically includes:
    if the type of the object to be retrieved is an image, performing, based on a trained feature extraction model and with the image area corresponding to the target as an input parameter, feature extraction on the image area corresponding to the target to obtain an image feature vector of the target, and using the image feature vector as the target feature vector of the target.
  3. The method according to claim 1, characterized in that detecting the image area corresponding to the target from the object to be retrieved specifically includes:
    if the type of the object to be retrieved is a video, determining position information of the image area corresponding to the target in a video frame of the object to be retrieved;
    and tracking the target according to the position information, and determining the position information and the corresponding image area of the target in each tracked video frame.
  4. The method according to claim 3, characterized in that obtaining the target feature vector of the target according to the image area corresponding to the target in the selection instruction specifically includes:
    performing, based on a trained feature extraction model, feature extraction on the image areas of the target in the corresponding video frames to obtain the image feature vectors of the target in the corresponding video frames;
    determining an average image feature vector of the target according to the obtained image feature vectors of the target in the corresponding video frames, and using the average image feature vector as the target feature vector of the target.
  5. The method according to any one of claims 1-4, characterized in that performing a retrieval in the retrieval database according to the target feature vector of the target to obtain objects whose target-feature-vector similarity is greater than a threshold specifically includes:
    comparing the target feature vector of the target with the center target feature vector of each cluster in the retrieval database to determine the cluster with the highest similarity, wherein the retrieval database includes multiple clusters, each cluster includes the target feature vectors of multiple targets, and the multiple clusters are obtained by clustering the targets in the retrieval database based on a clustering algorithm;
    comparing the target feature vector of the target with the target feature vectors of the targets included in the cluster with the highest similarity to obtain targets whose target-feature-vector similarity is greater than the threshold;
    obtaining, according to the association relationship between targets and objects, the objects whose target-feature-vector similarity is greater than the threshold.
  6. The method according to claim 1, characterized in that the retrieval database is obtained as follows:
    obtaining an object sample set, wherein the object sample set includes multiple object samples, and the type of the object samples is image or video;
    detecting the image area corresponding to each target from each object sample, and obtaining the target feature vector of each target according to the image area corresponding to each target;
    storing the target feature vector of each target in association with the corresponding object sample, and updating the retrieval database accordingly.
  7. A target retrieval device, characterized by comprising:
    an obtaining module, configured to obtain an object to be retrieved;
    a first processing module, configured to detect, from the object to be retrieved, an image area corresponding to a target; if image areas corresponding to multiple targets are detected from the object to be retrieved, display the image areas corresponding to the multiple targets to a user through a terminal, and receive a selection instruction input by the user and returned by the terminal, wherein the selection instruction includes at least the image area corresponding to the selected target;
    a second processing module, configured to obtain a target feature vector of the target according to the image area corresponding to the target in the selection instruction;
    a retrieval module, configured to perform a retrieval in a retrieval database according to the target feature vector of the target to obtain objects whose target-feature-vector similarity is greater than a threshold, wherein the retrieval database includes at least a plurality of associated objects and the target feature vectors of the targets included in the objects.
  8. The device according to claim 7, characterized in that, when obtaining the target feature vector of the target according to the image area corresponding to the target in the selection instruction, the second processing module is specifically configured to:
    if the type of the object to be retrieved is an image, perform, based on a trained feature extraction model and with the image area corresponding to the target as an input parameter, feature extraction on the image area corresponding to the target to obtain an image feature vector of the target, and use the image feature vector as the target feature vector of the target.
  9. The device according to claim 7, characterized in that, when detecting the image area corresponding to the target from the object to be retrieved, the second processing module is specifically configured to:
    if the type of the object to be retrieved is a video, determine position information of the image area corresponding to the target in a video frame of the object to be retrieved;
    and track the target according to the position information, and determine the position information and the corresponding image area of the target in each tracked video frame.
  10. The device according to claim 9, characterized in that, when obtaining the target feature vector of the target according to the image area corresponding to the target in the selection instruction, the second processing module is specifically configured to:
    based on a trained feature extraction model, perform feature extraction on the image areas of the target in the corresponding video frames to obtain the image feature vectors of the target in the corresponding video frames;
    determine an average image feature vector of the target according to the obtained image feature vectors of the target in the corresponding video frames, and use the average image feature vector as the target feature vector of the target.
  11. The device according to any one of claims 7-10, characterized in that, when performing a retrieval in the retrieval database according to the target feature vector of the target to obtain objects whose target-feature-vector similarity is greater than a threshold, the retrieval module is specifically configured to:
    compare the target feature vector of the target with the center target feature vector of each cluster in the retrieval database to determine the cluster with the highest similarity, wherein the retrieval database includes multiple clusters, each cluster includes the target feature vectors of multiple targets, and the multiple clusters are obtained by clustering the targets in the retrieval database based on a clustering algorithm;
    compare the target feature vector of the target with the target feature vectors of the targets included in the cluster with the highest similarity to obtain targets whose target-feature-vector similarity is greater than the threshold;
    obtain, according to the association relationship between targets and objects, the objects whose target-feature-vector similarity is greater than the threshold.
  12. The device according to claim 7, characterized in that, for the manner of obtaining the retrieval database, the device further includes an establishing module, configured to:
    obtain an object sample set, wherein the object sample set includes multiple object samples, and the type of the object samples is image or video;
    detect the image area corresponding to each target from each object sample, and obtain the target feature vector of each target according to the image area corresponding to each target;
    store the target feature vector of each target in association with the corresponding object sample, and update the retrieval database accordingly.
  13. An electronic device, including a memory, a processor, and a computer program stored in the memory and executable on the processor, characterized in that the processor implements the steps of the method according to any one of claims 1-6 when executing the program.
  14. A computer-readable storage medium on which a computer program is stored, characterized in that the computer program, when executed by a processor, implements the steps of the method according to any one of claims 1-6.
PCT/CN2020/112221 2020-05-29 2020-08-28 Target retrieval method and device WO2021237967A1 (zh)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN202010472146.1A CN111581423B (zh) 2020-05-29 2020-05-29 Target retrieval method and device
CN202010472146.1 2020-05-29

Publications (1)

Publication Number Publication Date
WO2021237967A1 true WO2021237967A1 (zh) 2021-12-02

Family

ID=72111215

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2020/112221 WO2021237967A1 (zh) 2020-05-29 2020-08-28 Target retrieval method and device

Country Status (2)

Country Link
CN (1) CN111581423B (zh)
WO (1) WO2021237967A1 (zh)

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114529751A (zh) * 2021-12-28 2022-05-24 国网四川省电力公司眉山供电公司 Automatic screening method for intelligent-recognition sample data in power scenarios
CN116401392A (zh) * 2022-12-30 2023-07-07 以萨技术股份有限公司 Image retrieval method, electronic device and storage medium
CN117194698A (zh) * 2023-11-07 2023-12-08 清华大学 Task processing system and method based on an OAR semantic knowledge base

Families Citing this family (3)

Publication number Priority date Publication date Assignee Title
CN111581423B (zh) 2020-05-29 2021-10-26 上海依图网络科技有限公司 Target retrieval method and device
CN113704534A (zh) * 2021-04-13 2021-11-26 腾讯科技(深圳)有限公司 Image processing method, apparatus and computer device
CN113239217B (zh) * 2021-06-04 2024-02-06 图灵深视(南京)科技有限公司 Image index library construction method and system, and image retrieval method and system

Citations (6)

Publication number Priority date Publication date Assignee Title
CN110069648A (zh) * 2017-09-25 2019-07-30 杭州海康威视数字技术股份有限公司 Image retrieval method and device
CN110188719A (zh) * 2019-06-04 2019-08-30 北京字节跳动网络技术有限公司 Target tracking method and device
CN110209866A (zh) * 2019-05-30 2019-09-06 苏州浪潮智能科技有限公司 Image retrieval method, apparatus, device and computer-readable storage medium
CN110245714A (zh) * 2019-06-20 2019-09-17 厦门美图之家科技有限公司 Image recognition method and apparatus, and electronic device
WO2020051704A1 (en) * 2018-09-12 2020-03-19 Avigilon Corporation System and method for improving speed of similarity based searches
CN111581423A (zh) 2020-05-29 2020-08-25 上海依图网络科技有限公司 Target retrieval method and device

Family Cites Families (3)

Publication number Priority date Publication date Assignee Title
CN109033308A (zh) * 2018-07-16 2018-12-18 安徽江淮汽车集团股份有限公司 Image retrieval method and device
CN110297935A (zh) * 2019-06-28 2019-10-01 京东数字科技控股有限公司 Image retrieval method and apparatus, medium and electronic device
CN111143597B (zh) * 2019-12-13 2023-06-20 浙江大华技术股份有限公司 Image retrieval method, terminal and storage device

Patent Citations (6)

Publication number Priority date Publication date Assignee Title
CN110069648A (zh) * 2017-09-25 2019-07-30 杭州海康威视数字技术股份有限公司 Image retrieval method and device
WO2020051704A1 (en) * 2018-09-12 2020-03-19 Avigilon Corporation System and method for improving speed of similarity based searches
CN110209866A (zh) * 2019-05-30 2019-09-06 苏州浪潮智能科技有限公司 Image retrieval method, apparatus, device and computer-readable storage medium
CN110188719A (zh) * 2019-06-04 2019-08-30 北京字节跳动网络技术有限公司 Target tracking method and device
CN110245714A (zh) * 2019-06-20 2019-09-17 厦门美图之家科技有限公司 Image recognition method and apparatus, and electronic device
CN111581423A (zh) 2020-05-29 2020-08-25 上海依图网络科技有限公司 Target retrieval method and device

Cited By (5)

Publication number Priority date Publication date Assignee Title
CN114529751A (zh) * 2021-12-28 2022-05-24 国网四川省电力公司眉山供电公司 Automatic screening method for intelligent-recognition sample data in power scenarios
CN116401392A (zh) * 2022-12-30 2023-07-07 以萨技术股份有限公司 Image retrieval method, electronic device and storage medium
CN116401392B (zh) * 2022-12-30 2023-10-27 以萨技术股份有限公司 Image retrieval method, electronic device and storage medium
CN117194698A (zh) * 2023-11-07 2023-12-08 清华大学 Task processing system and method based on an OAR semantic knowledge base
CN117194698B (zh) * 2023-11-07 2024-02-06 清华大学 Task processing system and method based on an OAR semantic knowledge base

Also Published As

Publication number Publication date
CN111581423A (zh) 2020-08-25
CN111581423B (zh) 2021-10-26

Similar Documents

Publication Publication Date Title
WO2021237967A1 (zh) Target retrieval method and device
US10872424B2 (en) Object tracking using object attributes
US10032072B1 (en) Text recognition and localization with deep learning
US9922271B2 (en) Object detection and classification
US9760792B2 (en) Object detection and classification
US10831814B2 (en) System and method for linking multimedia data elements to web pages
Feng et al. Attention-driven salient edge (s) and region (s) extraction with application to CBIR
US11036790B1 (en) Identifying visual portions of visual media files responsive to visual portions of media files submitted as search queries
US9886762B2 (en) Method for retrieving image and electronic device thereof
JP2000123184A (ja) 動画内のイベントを検出する方法
CN114241548A (zh) 一种基于改进YOLOv5的小目标检测算法
US20170352162A1 (en) Region-of-interest extraction device and region-of-interest extraction method
Rashmi et al. Video shot boundary detection using block based cumulative approach
CN113766330A (zh) 基于视频生成推荐信息的方法和装置
TW201426353A (zh) 互動式關聯物件檢索方法與系統
US20230060211A1 (en) System and Method for Tracking Moving Objects by Video Data
CN111539257B (zh) 人员重识别方法、装置和存储介质
CN115115825B (zh) 图像中的对象检测方法、装置、计算机设备和存储介质
US20170357853A1 (en) Large Scale Video Search Using Queries that Define Relationships Between Objects
US20220300774A1 (en) Methods, apparatuses, devices and storage media for detecting correlated objects involved in image
Obeso et al. Comparative study of visual saliency maps in the problem of classification of architectural images with Deep CNNs
Krishna et al. Hybrid method for moving object exploration in video surveillance
KR20160012901A (ko) 이미지를 검색하는 방법 및 그 전자 장치
CN111382628B (zh) 同行判定方法及装置
Wang et al. Smoke recognition network based on dynamic characteristics

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 20937684

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 20937684

Country of ref document: EP

Kind code of ref document: A1

32PN Ep: public notification in the ep bulletin as address of the adressee cannot be established

Free format text: NOTING OF LOSS OF RIGHTS PURSUANT TO RULE 112(1) EPC (EPO FORM 1205A DATED 27.06.2023)
