WO2023000764A1 - Target retrieval method, apparatus, device and storage medium - Google Patents

Target retrieval method, apparatus, device and storage medium

Info

Publication number
WO2023000764A1
WO2023000764A1 PCT/CN2022/091495 CN2022091495W WO2023000764A1 WO 2023000764 A1 WO2023000764 A1 WO 2023000764A1 CN 2022091495 W CN2022091495 W CN 2022091495W WO 2023000764 A1 WO2023000764 A1 WO 2023000764A1
Authority
WO
WIPO (PCT)
Prior art keywords
target
retrieved
detection
model
detection frame
Prior art date
Application number
PCT/CN2022/091495
Other languages
English (en)
French (fr)
Inventor
邱熙
Original Assignee
北京迈格威科技有限公司
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 北京迈格威科技有限公司 filed Critical 北京迈格威科技有限公司
Publication of WO2023000764A1

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/21 Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/214 Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/50 Information retrieval; Database structures therefor; File system structures therefor of still image data
    • G06F16/53 Querying
    • G06F16/535 Filtering based on additional data, e.g. user or group profiles
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/22 Matching criteria, e.g. proximity measures

Definitions

  • The present application relates to the technical field of image processing, and in particular to a target retrieval method, apparatus, device, and storage medium.
  • The core task of image target retrieval is to find the objects of interest in an image; it is one of the important research topics in computer vision. In recent years, driven by advances in deep learning and convolutional neural networks, target retrieval technology has made great progress.
  • Given an image, the goal of image target retrieval is to retrieve from an image library those pictures that contain the target, with the pictures containing the target ranked as close to the front of the retrieval results as possible after sorting by a similarity measure.
  • Retrieval of similar objects is generally called object retrieval in the English literature; duplicate search or detection can also be classified as same-object retrieval, and same-object retrieval methods can be applied directly to duplicate search or detection.
  • Same-object retrieval is of great value both in research and in the commercial image search industry, for example searching for clothes and shoes in shopping applications, face retrieval, and so on.
  • Current image target retrieval schemes train corresponding deep learning models on large amounts of data. These models learn the feature representations of the covered categories, such as face features and human body features; at test time, features are extracted from different pictures and a similarity measure over the features is used to judge how similar two pictures are. This works well and is widely deployed for tasks such as faces and human bodies, and retrieval models for other targets such as cars, clothes, and shoes also have many applications. However, such schemes depend heavily on training data; many visually salient targets and newly appearing objects lack enough data to support training, and even when enough data can be collected the training process takes from several days to several months, so few-shot target retrieval is slow at best and impossible at worst.
  • In view of the above problems, the embodiments of the present application are proposed to provide a target retrieval method, apparatus, device, and storage medium that overcome the above problems or at least partially solve them.
  • According to a first aspect of the embodiments of the present application, a target retrieval method is provided, including: performing target detection on the image to be retrieved through the detection model corresponding to the target to be retrieved, to obtain at least one detection box whose confidence is greater than or equal to a confidence threshold, where the confidence threshold is lower than the threshold used by a conventional detection network, and the detection model corresponding to the target to be retrieved is a model obtained by updating the parameters of a pre-trained detection model based on template images of the target to be retrieved; and filtering the at least one detection box through a post-processing model to obtain the detection box corresponding to the target to be retrieved.
  • According to a second aspect of the embodiments of the present application, a target retrieval apparatus is provided, including: a target detection module configured to perform target detection on the image to be retrieved through the detection model corresponding to the target to be retrieved and obtain at least one detection box whose confidence is greater than or equal to a confidence threshold, where the confidence threshold is lower than the threshold used by a conventional detection network and the detection model corresponding to the target to be retrieved is a model obtained by updating the parameters of a pre-trained detection model based on template images of the target to be retrieved; and a detection box filtering module configured to filter the at least one detection box through a post-processing model to obtain the detection box corresponding to the target to be retrieved.
  • According to a third aspect of the embodiments of the present application, a computing processing device is provided, including: a memory having computer readable code stored therein; and one or more processors, where when the computer readable code is executed by the one or more processors, the computing processing device executes the target retrieval method described in the first aspect.
  • According to a fourth aspect of the embodiments of the present application, a computer program is provided, including computer readable code which, when run on a computing processing device, causes the computing processing device to execute the target retrieval method described in the first aspect.
  • According to a fifth aspect of the embodiments of the present application, a computer-readable storage medium is provided, in which the computer program described in the fourth aspect is stored.
  • The target retrieval method, apparatus, device, and storage medium provided by the embodiments of the present application perform target detection on the image to be retrieved through the detection model corresponding to the target to be retrieved, obtain at least one detection box whose confidence is greater than or equal to the confidence threshold, and filter the at least one detection box through a post-processing model to obtain the detection box corresponding to the target to be retrieved. Because the detection model corresponding to the target to be retrieved is obtained by updating the parameters of a pre-trained detection model based on template images of the target to be retrieved, a small number of template images suffice to update the pre-trained detection model; moreover, the low confidence threshold ensures that enough detection boxes are recalled, and after filtering through the post-processing model the detection box corresponding to the target to be retrieved is obtained, realizing retrieval of few-shot targets.
  • Fig. 1 is a flow chart of the steps of a target retrieval method provided by an embodiment of the present application;
  • Fig. 2 is a schematic diagram of a target retrieval system composed of a detection model and a post-processing model in an embodiment of the present application;
  • Fig. 3 is a structural block diagram of a target retrieval apparatus provided by an embodiment of the present application;
  • Fig. 4 schematically shows a block diagram of a computing processing device for performing the method according to the present application; and
  • Fig. 5 schematically shows a storage unit for holding or carrying program code for implementing the method according to the present application.
  • Fig. 1 is a flow chart of the steps of a target retrieval method provided by an embodiment of the present application. As shown in Fig. 1, the method may include the following steps.
  • Step 101: perform target detection on the image to be retrieved through the detection model corresponding to the target to be retrieved, to obtain at least one detection box whose confidence is greater than or equal to a confidence threshold, where the confidence threshold is lower than the threshold used by a conventional detection network, and the detection model corresponding to the target to be retrieved is a model obtained by updating the parameters of a pre-trained detection model based on template images of the target to be retrieved.
  • The type of the target to be retrieved is related to the data set used when pre-training the detection model: the data set used for pre-training includes the target to be retrieved and may also include other targets. That is, if the pre-trained detection model is a general-purpose target detection model, the target to be retrieved can be any person or object with distinctive features, such as Zhang San, a school bag, a hat, or a red hat; if the detection model is a detection model for a specific type of target, the target to be retrieved can only be a target of that type. For example, if the data set used for pre-training is of type A, retrieval will be more accurate when the target to be retrieved is close to type A. If type A is vehicles and the target to be retrieved is a human body, then after the parameters of the pre-trained detection model are updated based on template images containing human bodies, the results of retrieving human bodies may be inaccurate. If type A is vehicles (covering brands a, b, c, d, e, and so on) and the target to be retrieved is a vehicle of brand w (a brand not included in the pre-training data set), then after the parameters of the pre-trained detection model are updated based on template images containing brand-w vehicles, retrieval of brand-w vehicles will be more accurate, because a brand-w vehicle is still a vehicle, the same type as the pre-training data set.
  • A template image of the target to be retrieved is an image that includes the target to be retrieved, and the number of template images used when updating the parameters of the pre-trained detection model may be 5 to 10.
  • The template images play two roles: on the one hand, they let the detection model "know" the target to be retrieved so that the model can be optimized for that target; on the other hand, they let the post-processing model extract and save the features of the target to be retrieved. Only then can the detection model corresponding to the target to be retrieved detect the target in the image to be retrieved, and the post-processing model identify it.
  • The template images of the target to be retrieved are often provided by the user: the algorithm vendor directly provides the pre-trained detection model to the user, and the user updates its parameters with the template images of the target to be retrieved to obtain the detection model corresponding to the target. The algorithm vendor therefore never touches sensitive data, which improves the protection of customer data.
  • When the detection model corresponding to the target to be retrieved performs target detection on the image to be retrieved, multiple detection boxes are obtained. Each detection box is associated with its position in the image to be retrieved and a confidence (the probability the model assigns to the object in the box being the target to be retrieved); a detection box output by the model indicates that the model considers the local image framed by that box to possibly be the target to be retrieved.
  • By updating the parameters of the pre-trained detection model based on the template images of the target to be retrieved, the detection model corresponding to the target to be retrieved is obtained. When this model performs target detection, the confidence threshold is set lower than the threshold used by a conventional detection network, so that the model recalls more detection boxes and a high recall rate is guaranteed. Because the confidence threshold is low, some of the recalled detection boxes contain the target to be retrieved and some may not; the recalled detection boxes can therefore be filtered through the post-processing model to obtain a more accurate detection box corresponding to the target to be retrieved.
  • The confidence threshold is generally 10% to 20% lower than the threshold used by a conventional detection network. For example, the confidence threshold can be set to a value between 0.4 and 0.6, while the threshold used by a conventional detection network is generally 0.5 to 0.8.
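  • As a rough numerical illustration of this recall-oriented thresholding, the following sketch keeps every detection box whose confidence clears a cutoff set 15% below an assumed conventional value of 0.7; the concrete numbers and the box/score representation are illustrative assumptions, not values fixed by this application:

```python
def recall_filter(boxes, scores, conventional_threshold=0.7, relative_drop=0.15):
    """Keep every detection whose confidence clears the lowered threshold."""
    # A threshold 10%-20% below the conventional one deliberately over-recalls
    # candidate boxes; the post-processing model filters them afterwards.
    lowered = conventional_threshold * (1.0 - relative_drop)  # 0.7 -> 0.595
    return [(box, score) for box, score in zip(boxes, scores) if score >= lowered]
```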
  • Step 102: filter the at least one detection box through a post-processing model to obtain the detection box corresponding to the target to be retrieved.
  • Among the detection boxes obtained by running the detection model corresponding to the target to be retrieved on the image to be retrieved, there may be false positives (the local image framed by a detection box is not the target to be retrieved); that is, some detection boxes contain the target to be retrieved while others may not. The at least one detection box therefore needs to be filtered through the post-processing model, so that the boxes without the target to be retrieved are filtered out and the boxes containing it are kept, yielding the detection box corresponding to the target to be retrieved.
  • If the image to be retrieved does not actually contain the target to be retrieved, detection through the detection model corresponding to the target may still yield at least one detection box, or may yield none; but after filtering through the post-processing model, in theory no detection box should remain.
  • In the target retrieval method provided by this embodiment, target detection is performed on the image to be retrieved through the detection model corresponding to the target to be retrieved to obtain at least one detection box whose confidence is greater than or equal to the confidence threshold, and the at least one detection box is filtered through the post-processing model to obtain the detection box corresponding to the target to be retrieved. Because the detection model corresponding to the target to be retrieved is obtained by updating the parameters of a pre-trained detection model based on template images of the target, a small number of template images can be used for the update; the low confidence threshold ensures that enough detection boxes are recalled, and filtering through the post-processing model then yields the detection box corresponding to the target, realizing retrieval of few-shot targets.
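  • To make the two-stage flow of steps 101 and 102 concrete, a minimal sketch follows. The callables `detection_model` and `post_model`, the cosine-similarity matching, and both threshold values are hypothetical stand-ins for the components described above, not an implementation prescribed by this application:

```python
import numpy as np

def crop(image, box):
    """Cut the local image framed by a detection box out of an H x W x C array."""
    x1, y1, x2, y2 = map(int, box)
    return image[y1:y2, x1:x2]

def retrieve_target(image, detection_model, post_model, template_vec,
                    conf_threshold=0.5, sim_threshold=0.8):
    # Step 101: the updated detection model returns (box, confidence) pairs;
    # the low conf_threshold deliberately over-recalls candidate boxes.
    candidates = [(box, conf) for box, conf in detection_model(image)
                  if conf >= conf_threshold]
    # Step 102: extract a feature vector per box and keep only the boxes whose
    # features match the saved template feature vector.
    kept = []
    for box, conf in candidates:
        vec = post_model(crop(image, box))  # feature vector to be retrieved
        sim = float(np.dot(vec, template_vec) /
                    (np.linalg.norm(vec) * np.linalg.norm(template_vec) + 1e-12))
        if sim >= sim_threshold:
            kept.append((box, conf, sim))
    return kept
```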
  • In an embodiment of the present application, filtering the at least one detection box through the post-processing model to obtain the detection box corresponding to the target to be retrieved includes: performing feature extraction on the at least one detection box through the post-processing model to obtain the feature vector corresponding to each detection box as a feature vector to be retrieved; and matching the feature vector to be retrieved corresponding to each detection box against the template feature vector and filtering the at least one detection box according to the matching result, to obtain the detection box corresponding to the target to be retrieved, where the template feature vector is obtained by performing feature extraction, through the post-processing model, on the region where the target is located in the template image of the target to be retrieved.
  • The post-processing model is a general model for extracting image features: its network parameters do not need to be updated for the template images of the target to be retrieved, and it can extract features for any target. Updating network parameters here means updating the parameters of a pre-trained model for the specific target to be retrieved, so that the model matches the target better and the processing effect improves.
  • The template images of the target to be retrieved are used on the one hand to update the detection model and on the other hand to generate the template feature vector, which the post-processing model obtains by performing feature extraction on the template images. After the detection model corresponding to the target to be retrieved produces at least one detection box, the post-processing model performs feature extraction either on the local image in the image to be retrieved corresponding to each detection box, or on the feature map corresponding to each detection box, yielding the feature vector corresponding to each detection box, that is, the feature vector to be retrieved for each detection box. The feature map corresponding to a detection box is the local feature map framed by the detection box within the feature map of the image to be retrieved, and the feature map of the image to be retrieved can be extracted by the detection model during target detection.
  • The feature vector to be retrieved corresponding to each detection box is matched against the template feature vector, and the similarity between the two can be used as the matching result, so that the at least one detection box is filtered according to the matching results; for example, detection boxes whose similarity is greater than or equal to a similarity threshold are kept as detection boxes of the target to be retrieved. The specific target corresponding to a final detection box can be determined from the target corresponding to the matching template feature vector. For example, suppose the detection model corresponding to the targets to be retrieved can detect 5 targets, A, B, C, D, and E, an image to be retrieved contains one of them, and the detection model outputs 2 detection boxes after detection. The local images framed by these 2 boxes are each passed through the post-processing model to extract feature vectors to be retrieved and matched against the template feature vectors; if the similarity between one of the boxes and target B is greater than the similarity threshold, the target in the local image framed by that box is determined to be target B. Matching a feature vector to be retrieved against the template feature vector can be done by computing the distance between the two vectors and deriving their similarity from the distance, or by computing the similarity directly.
  • The operations of matching the feature vectors to be retrieved corresponding to the at least one detection box against the template feature vector and/or filtering the at least one detection box according to the matching results may be performed directly by the post-processing model, or may instead be performed outside it on the feature vectors to be retrieved that the post-processing model outputs.
  • Filtering the at least one detection box against the template feature vector yields a relatively accurate detection box corresponding to the target to be retrieved, thereby improving the accuracy of target retrieval.
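  • The matching just described, with cosine similarity as one concrete choice of similarity measure (the text above allows either a distance or a similarity to be computed), could be sketched as follows; the names and the 0.8 threshold are illustrative assumptions:

```python
import numpy as np

def classify_boxes(box_vectors, template_vectors, sim_threshold=0.8):
    """Assign each detection box to the template target it matches best, or to none.

    box_vectors: dict box_id -> feature vector to be retrieved
    template_vectors: dict target name (e.g. "A".."E") -> template feature vector
    """
    results = {}
    for box_id, vec in box_vectors.items():
        v = vec / (np.linalg.norm(vec) + 1e-12)
        best_name, best_sim = None, -1.0
        for name, tvec in template_vectors.items():
            t = tvec / (np.linalg.norm(tvec) + 1e-12)
            sim = float(np.dot(v, t))  # cosine similarity as the matching result
            if sim > best_sim:
                best_name, best_sim = name, sim
        # A box is kept only if its best similarity clears the threshold;
        # otherwise it matches none of the template targets and is filtered out.
        results[box_id] = best_name if best_sim >= sim_threshold else None
    return results
```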
  • In an embodiment of the present application, performing feature extraction on the at least one detection box through the post-processing model to obtain the feature vector corresponding to each detection box as a feature vector to be retrieved includes: performing feature extraction on the at least one detection box through the first backbone network in the post-processing model to obtain the feature map corresponding to each detection box; performing global feature extraction on the feature map corresponding to each detection box through the first branch of the metric learning module in the post-processing model to obtain the global feature vector corresponding to the detection box; performing local feature extraction on the feature map corresponding to each detection box through the second branch of the metric learning module to obtain the local feature vector corresponding to the detection box, where the first branch and the second branch are Siamese networks with different parameters; and determining the feature vector to be retrieved corresponding to each detection box from its global feature vector and local feature vector.
  • The post-processing model includes a first backbone network and a metric learning module. The first backbone network is a feature extractor: its input can be the local image framed by a detection box in the image to be retrieved, that is, an RGB image, or the feature map corresponding to the detection box, and its output is the corresponding deep feature; it is usually configured as a common convolutional neural network for extracting high-dimensional features. The metric learning module is mainly used to learn a more discriminative feature, so that given the template images of the target to be retrieved, the detection boxes can be filtered by extracting features; that is, the detection boxes are classified based on the similarity between the extracted features and the template feature vectors, so as to obtain the detection box of the target to be retrieved.
  • Classification of the detection boxes is based on the template feature vectors. For example, suppose there are 5 template feature vectors and target detection on the image to be retrieved produces 4 detection boxes. For each detection box, the similarities to the 5 template feature vectors are computed, giving 5 similarities, and the category of the detection box is determined from them: the box belongs either to one of the 5 target categories corresponding to the template feature vectors or to none of them. If a similarity is greater than the similarity threshold, the category of the detection box is determined to be the target category of the template feature vector corresponding to that similarity.
  • After the at least one detection box in the image to be retrieved is obtained through the detection model corresponding to the target to be retrieved, each detection box is input into the post-processing model. The first backbone network in the post-processing model first performs feature extraction on the input detection box to obtain its corresponding feature map, which is then fed into the metric learning module. The metric learning module includes a first branch and a second branch, Siamese networks with different parameters that process the feature map separately: the first branch extracts the global features of the feature map to obtain the global feature vector corresponding to the detection box, and the second branch extracts the local features of the feature map to obtain the local feature vector corresponding to the detection box; combining the global and local feature vectors of each detection box yields its feature vector to be retrieved.
  • Through the first backbone network and the metric learning module, feature vectors sufficient to distinguish different retrieval targets can be extracted, improving the accuracy of target retrieval; and because the first and second branches of the metric learning module extract features from different perspectives, better accuracy than a generic network can be achieved, improving the accuracy of detection box filtering.
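  • A minimal sketch of such a two-branch module is given below, assuming PyTorch, a global branch that pools the whole feature map, and a local branch that pools horizontal parts of it; the layer shapes, the part-based notion of "local", and the concatenation are illustrative assumptions rather than the patent's exact design:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class TwoBranchMetricModule(nn.Module):
    """Two branches with identical architecture but separate parameters
    ("Siamese networks with different parameters") over the feature map
    produced by the first backbone network."""
    def __init__(self, in_channels=256, embed_dim=128, num_parts=4):
        super().__init__()
        self.global_branch = nn.Conv2d(in_channels, embed_dim, kernel_size=1)
        self.local_branch = nn.Conv2d(in_channels, embed_dim, kernel_size=1)
        self.num_parts = num_parts

    def forward(self, feat_map):  # feat_map: (B, C, H, W)
        # First branch: one global feature vector from the whole feature map.
        g = F.adaptive_avg_pool2d(self.global_branch(feat_map), 1).flatten(1)
        # Second branch: local feature vectors from horizontal parts of the
        # feature map, concatenated into one local descriptor.
        l = F.adaptive_avg_pool2d(self.local_branch(feat_map),
                                  (self.num_parts, 1)).flatten(1)
        # The feature vector to be retrieved combines both views.
        return F.normalize(torch.cat([g, l], dim=1), dim=1)
```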
  • The post-processing model is also pre-trained. When training it, classification learning and sample pair learning can be combined; that is, during training the target loss function includes a classification loss and a sample pair loss (such as a triplet loss), which further improves the effect when processing images to be retrieved. Sample pair learning refers to forming sample pairs from target samples of the same type and from target samples of different types.
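  • A sketch of such a combined objective, assuming PyTorch and an equal weighting of the two terms (the weighting and the margin are illustrative assumptions), might look like this:

```python
import torch.nn as nn
import torch.nn.functional as F

# Sample-pair loss: anchor and positive come from target samples of the same
# type, negative from a target sample of a different type.
triplet = nn.TripletMarginLoss(margin=0.3)

def post_model_loss(logits, labels, anchor, positive, negative):
    """Classification learning plus sample pair learning in one target loss."""
    cls_loss = F.cross_entropy(logits, labels)       # classification loss
    pair_loss = triplet(anchor, positive, negative)  # sample pair (triplet) loss
    return cls_loss + pair_loss
```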
  • After the post-processing model is pre-trained, it can be used to extract features from the template images of the target to be retrieved and to save the template feature vectors; this step completes quickly, in roughly 3 seconds. Of course, after pre-training the parameters of the post-processing model could also be updated for the target to be retrieved, but practice shows that the processing effect without updating approaches the accuracy with updating, and without updating only the features of the template images need to be extracted and saved, which is faster than updating.
  • On the basis of the above technical solutions, before target detection is performed on the image to be retrieved through the detection model corresponding to the target to be retrieved, the method further includes: acquiring template images of the target to be retrieved and/or annotation information corresponding to the template images; and updating, according to the template images of the target to be retrieved and/or the corresponding annotation information, the parameters of the online update network of the pre-trained detection model to obtain the detection model corresponding to the target to be retrieved. The pre-trained detection model includes a second backbone network and at least one online update network.
  • The second backbone network is a feature extractor whose input is an RGB image and whose output is the corresponding deep feature; it is usually configured as a common convolutional neural network for extracting high-dimensional features. Each of the at least one online update network is directly or indirectly connected to the second backbone network; if there are multiple online update networks, each of them is connected to the second backbone network.
  • A template image of the target to be retrieved may be a partial image containing only the region where the target is located, or a panoramic image containing both that region and other regions.
  • Each online update network included in the detection model can be updated online with template images of different targets to be retrieved, so that different online update networks can be used to detect different targets. At the same time, each online update network can detect a certain number (for example, 3 to 5) of different targets to be retrieved.
  • Before the detection model corresponding to the target to be retrieved is used to perform target detection on the image to be retrieved, the parameters of the pre-trained detection model must be updated to obtain the detection model corresponding to the target, so that the model can accurately detect the target in the image to be retrieved. When updating the parameters of the pre-trained detection model, first acquire the template images of the target to be retrieved and/or the annotation information corresponding to the template images; input the template images into the pre-trained detection model and process them through it to obtain at least one detection box in each template image; and then, based on the obtained detection boxes and the template images and/or their annotation information, update the parameters of the online update network in the pre-trained detection model to obtain the detection model corresponding to the target to be retrieved.
  • When the template image of the target to be retrieved is a partial image containing only the region where the target is located, there is no need to acquire annotation information for it. When the template image is a panoramic image containing both the target's region and other regions, the annotation information corresponding to the template image must be acquired together with it; the annotation information indicates the position of the target to be retrieved in the template image.
  • By using the template images of the target to be retrieved to update the parameters of the online update network in the pre-trained detection model, for each target to be retrieved a small number (usually 5 to 10) of template images suffice to update the parameters of the pre-trained detection model. The updated model is better adapted to the target to be retrieved, and because only the parameters of the online update network need updating, the parameter update of the detection model can be completed quickly; the update process takes roughly 1 minute.
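  • In code, restricting the update to the online update network amounts to freezing the backbone and optimizing only the last layers. The sketch below assumes PyTorch and hypothetical attribute names `model.backbone` and `model.online_update`; the optimizer and learning rate are placeholders:

```python
import torch

def prepare_online_update(model, lr=1e-3):
    """Freeze the second backbone network; train only the online update network."""
    for p in model.backbone.parameters():
        p.requires_grad = False  # backbone parameters stay fixed
    trainable = list(model.online_update.parameters())
    for p in trainable:
        p.requires_grad = True   # only the last P layers (P <= 5) are updated
    return torch.optim.SGD(trainable, lr=lr)
```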
  • On the basis of the above technical solutions, updating the parameters of the online update network of the pre-trained detection model according to the template images of the target to be retrieved and/or the corresponding annotation information to obtain the detection model corresponding to the target includes:
  • a predicted detection box determination step: determining, through the second backbone network and the online update network in the pre-trained detection model, the predicted detection box corresponding to the template image;
  • a network update step: determining a loss value according to the predicted detection box and the annotation information corresponding to the template image, and updating the network parameters of the online update network according to the loss value to obtain an updated pre-trained detection model;
  • repeating the predicted detection box determination step and the network update step until the loss value is below a loss threshold or the number of iterations reaches an iteration threshold; and
  • using the updated pre-trained detection model as the detection model corresponding to the target to be retrieved.
  • The online update network is the last P layers of the detection model corresponding to the target to be retrieved, with P less than or equal to 5; for example, P may be 3.
  • The predicted detection box determination step determines the predicted detection boxes in the template image: the template image is input into the pre-trained detection model, and the second backbone network and the online update network in the pre-trained detection model process it in sequence; that is, the second backbone network first extracts the high-dimensional features of the template image, and the online update network then processes these features to obtain the predicted detection box corresponding to the template image. The network update step determines a loss value based on the determined predicted detection box and the annotation information corresponding to the template image, and performs backpropagation based on the loss value to update the network parameters of the online update network; the network parameters of the second backbone network do not need to be updated. This yields the updated pre-trained detection model. The predicted detection box determination step and the network update step are executed iteratively until an end condition is reached (for example, the loss value converges, the loss value is below the loss threshold, or the number of iterations reaches the iteration threshold); training then ends, and the updated pre-trained detection model serves as the detection model corresponding to the target to be retrieved.
  • The loss value may include a localization loss and a classification loss. When the template image is a panoramic image containing both the target's region and other regions, the localization loss is calculated from the difference between the position of the predicted detection box and the annotation information of the template image. When the template image is a partial image containing only the region where the target is located, the template image has no annotation information, so the partial image can be transformed: operations such as padding around the partial image produce a large image containing the template image, and the position of the template image within the large image serves as the annotation information, so that the localization loss can be calculated from the position of the predicted detection box and the annotation information thus obtained.
  • After the above parameter update process is performed for the at least one online update network in the detection model, the detection model corresponding to the target to be retrieved is obtained.
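  • The padding trick for unannotated partial template images could be sketched as follows; the canvas size, zero padding, and random placement are illustrative choices (a real pipeline might also rescale the patch or fill the background differently):

```python
import numpy as np

def pad_partial_template(patch, canvas_hw=(512, 512), rng=np.random):
    """Embed a partial template image (target region only) into a larger blank
    canvas and return the synthesized box annotation. Assumes the patch fits
    inside the canvas."""
    h, w = patch.shape[:2]
    H, W = canvas_hw
    canvas = np.zeros((H, W, patch.shape[2]), dtype=patch.dtype)
    y = rng.randint(0, H - h + 1)
    x = rng.randint(0, W - w + 1)
    canvas[y:y + h, x:x + w] = patch
    annotation = (x, y, x + w, y + h)  # position of the target in the large image
    return canvas, annotation
```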
  • On the basis of the above technical solutions, the at least one online update network may be a plurality of parallel online update networks, and there may be multiple targets to be retrieved; at most N targets to be retrieved are grouped together, yielding M groups of targets to be retrieved, with each group corresponding to one online update network.
  • Updating the parameters of the online update networks of the pre-trained detection model according to the template images of the targets to be retrieved and/or the corresponding annotation information to obtain the detection model corresponding to the targets then includes: updating, according to the template images of the i-th group of targets to be retrieved, the parameters of the online update network corresponding to the i-th group in the pre-trained detection model, for i = 1 to M, to obtain the detection model corresponding to the targets to be retrieved.
  • In the detection model, the at least one online update network is parallel; that is, the M online update networks are parallel, and different online update networks are used to detect different targets to be retrieved. One online update network can detect at most N different targets to be retrieved, where N may for example be 5. Running at least one online update network in parallel allows more targets to be detected at the same time. It also supports incremental learning: for example, if in a first parameter update 3 targets correspond to one online update network, and a second update must detect 2 more targets without losing the first 3, an online update network can be added and its parameters updated based on the template images of the 2 additional targets, so that the newly added online update network can detect those 2 targets.
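  • The grouping itself is simple bookkeeping; a sketch follows, with the target IDs and N = 5 as illustrative values. Incremental learning then corresponds to appending a new group, and thus a new online update network, without touching the existing ones:

```python
def group_targets(target_ids, n_per_group=5):
    """Split the targets to be retrieved into M groups of at most N targets,
    one online update network per group."""
    return [target_ids[i:i + n_per_group]
            for i in range(0, len(target_ids), n_per_group)]

groups = group_targets(["t1", "t2", "t3", "t4", "t5", "t6", "t7"])
# -> [['t1', 't2', 't3', 't4', 't5'], ['t6', 't7']]: M = 2 groups,
#    each served by its own online update network
```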
  • In an embodiment of the present application, before target detection is performed on the image to be retrieved through the detection model corresponding to the target to be retrieved, the method further includes: performing feature extraction, through the post-processing model, on the region where the target is located in the template image of the target to be retrieved to obtain the template feature vector.
  • Before target retrieval with the detection model and the post-processing model, the post-processing model can be used to extract features from the template images of the target to be retrieved, and the resulting template feature vectors can be saved; when target retrieval is subsequently performed on images to be retrieved, the saved template feature vectors can be fetched directly, improving the efficiency of target retrieval.
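  • Caching the template feature vectors could be as simple as the following sketch; the `.npy` file format and the directory layout are illustrative assumptions:

```python
import numpy as np

def save_template_vector(target_name, vector, path_prefix="templates/"):
    """Save a template feature vector once, right after it is extracted."""
    np.save(f"{path_prefix}{target_name}.npy", np.asarray(vector))

def load_template_vector(target_name, path_prefix="templates/"):
    """Fetch the saved template feature vector directly at retrieval time."""
    return np.load(f"{path_prefix}{target_name}.npy")
```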
  • In an embodiment of the present application, the pre-trained detection model is a general-purpose target detection model. A general-purpose target detection model is pre-trained on massive data (such as the Objects365, COCO, and Open Images data sets) and can detect arbitrary targets regardless of their type; it can be a state-of-the-art (SOTA) model with good performance.
  • Fig. 2 is a schematic diagram of the target retrieval system composed of the detection model and the post-processing model in an embodiment of the present application. As shown in Fig. 2, the detection model includes the second backbone network, a region proposal network (RPN), and an RCNN (Region with CNN features) head, the last P layers of which form the online update network; the post-processing model includes the first backbone network and the metric learning module, and the metric learning module includes the first branch and the second branch.
  • The detection model performs target detection on the image to be retrieved and obtains at least one detection box whose confidence is greater than or equal to the confidence threshold. Each detection box is input into the post-processing model: the first backbone network in the post-processing model extracts features from the detection box, and the extracted feature map is fed into the metric learning module, where the first branch performs global feature extraction on the feature map to obtain the global feature vector and the second branch performs local feature extraction to obtain the local feature vector. The feature vector to be retrieved corresponding to each detection box is determined from its global and local feature vectors and matched against the template feature vector; that is, the similarity between each detection box's feature vector to be retrieved and the template feature vector is used as the matching result, and the at least one detection box is filtered according to the matching results to obtain the detection box corresponding to the target to be retrieved.
  • When the detection model performs target detection on the image to be retrieved, the second backbone network in the detection model extracts features from the image to be retrieved to obtain its feature map, which is fed into the RPN and the RCNN head. The RPN generates candidate detection boxes from the feature map; the candidate detection boxes are input to the RCNN head, which processes the feature map and the candidate detection boxes to obtain the at least one detection box corresponding to the image to be retrieved. The online update network in the RCNN head is updated online based on the template images of the target to be retrieved, so that the detection model can detect the target more accurately.
  • The detection model may also include gradient decoupling layers (Gradient Decouple Layer, GDL), located between the second backbone network and the RPN and between the second backbone network and the RCNN head, which adjust the learning rates of different layers while the parameters of the online update network are being updated. This improves the efficiency of the parameter update and makes the updated detection model better suited to few-shot target detection.
  • It should be noted that, for simplicity of description, the method embodiments are expressed as a series of action combinations, but those skilled in the art should know that the embodiments of the present application are not limited by the described order of actions, because according to the embodiments of the present application some steps may be performed in other orders or simultaneously. Those skilled in the art should also know that the embodiments described in the specification are preferred embodiments, and the actions involved are not necessarily required by the embodiments of the present application.
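  • One common way to realize a gradient decoupling layer is an identity mapping in the forward pass whose backward pass scales the gradient, so the layers on either side effectively learn at different rates. The following PyTorch sketch is an assumption about the mechanism, not the patent's exact formulation:

```python
import torch

class GradientDecoupleLayer(torch.autograd.Function):
    """Identity forward; gradient scaled by `scale` on the way back."""
    @staticmethod
    def forward(ctx, x, scale):
        ctx.scale = scale
        return x

    @staticmethod
    def backward(ctx, grad_output):
        # Scale the gradient flowing from the RPN / RCNN head into the
        # second backbone network; no gradient is needed for `scale` itself.
        return grad_output * ctx.scale, None

def gdl(x, scale=0.1):
    # Inserted between the second backbone network and the RPN / RCNN head.
    return GradientDecoupleLayer.apply(x, scale)
```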
  • Fig. 3 is a structural block diagram of a target retrieval apparatus provided by an embodiment of the present application. As shown in Fig. 3, the target retrieval apparatus may include:
  • a target detection module 301, configured to perform target detection on the image to be retrieved through the detection model corresponding to the target to be retrieved and obtain at least one detection box whose confidence is greater than or equal to a confidence threshold, where the confidence threshold is lower than the threshold used by a conventional detection network and the detection model corresponding to the target to be retrieved is a model obtained by updating the parameters of a pre-trained detection model based on template images of the target to be retrieved; and
  • a detection box filtering module 302, configured to filter the at least one detection box through a post-processing model to obtain the detection box corresponding to the target to be retrieved.
  • Optionally, the detection box filtering module includes:
  • a feature extraction unit, configured to perform feature extraction on the at least one detection box through the post-processing model to obtain the feature vector corresponding to each detection box as a feature vector to be retrieved; and
  • a detection box filtering unit, configured to match the feature vectors to be retrieved corresponding to the at least one detection box against the template feature vector and filter the at least one detection box according to the matching result, to obtain the detection box corresponding to the target to be retrieved, where the template feature vector is obtained by performing feature extraction, through the post-processing model, on the region where the target is located in the template image of the target to be retrieved.
  • Optionally, the feature extraction unit is specifically configured to: perform feature extraction on the at least one detection box through the first backbone network in the post-processing model to obtain the feature map corresponding to each detection box; perform global feature extraction on the feature map corresponding to each detection box through the first branch of the metric learning module in the post-processing model to obtain the global feature vector corresponding to the detection box; perform local feature extraction on the feature map corresponding to each detection box through the second branch of the metric learning module in the post-processing model to obtain the local feature vector corresponding to the detection box, where the first branch and the second branch are Siamese networks with different parameters; and determine the feature vector to be retrieved corresponding to each detection box from its global feature vector and local feature vector.
  • Optionally, the apparatus further includes:
  • a template image acquisition module, configured to acquire template images of the target to be retrieved and/or annotation information corresponding to the template images; and
  • a parameter update module, configured to update, according to the template images of the target to be retrieved and/or the annotation information corresponding to the template images, the parameters of the online update network of the pre-trained detection model, so as to obtain the detection model corresponding to the target to be retrieved; the pre-trained detection model includes a second backbone network and at least one online update network.
  • Optionally, the parameter update module is specifically configured to: in a predicted detection box determination step, determine, through the second backbone network and the online update network in the pre-trained detection model, the predicted detection box corresponding to the template image; in a network update step, determine a loss value according to the predicted detection box and the annotation information corresponding to the template image, and update the network parameters of the online update network according to the loss value to obtain an updated pre-trained detection model; repeat the two steps until the end condition is reached; and use the updated pre-trained detection model as the detection model corresponding to the target to be retrieved.
  • Optionally, the online update network is the last P layers of the detection model corresponding to the target to be retrieved, with P less than or equal to 5.
  • Optionally, the apparatus further includes:
  • a template feature extraction module, configured to perform feature extraction, through the post-processing model, on the region where the target is located in the template image of the target to be retrieved to obtain the template feature vector.
  • Optionally, the at least one online update network is a plurality of parallel online update networks and there are multiple targets to be retrieved, with at most N targets to be retrieved grouped together to obtain M groups of targets to be retrieved, each group corresponding to one online update network; the parameter update module is then specifically configured to update, according to the template images of the i-th group of targets to be retrieved, the parameters of the online update network corresponding to the i-th group in the pre-trained detection model, for i = 1 to M.
  • Optionally, the pre-trained detection model is a general-purpose target detection model.
  • The target retrieval apparatus provided by this embodiment performs target detection on the image to be retrieved through the detection model corresponding to the target to be retrieved, obtains at least one detection box whose confidence is greater than or equal to the confidence threshold, and filters the at least one detection box through a post-processing model to obtain the detection box corresponding to the target to be retrieved. Because the detection model is obtained by updating the parameters of a pre-trained detection model based on template images of the target to be retrieved, a small number of template images can be used to update the pre-trained detection model; the low confidence threshold ensures that enough detection boxes are recalled, and filtering through the post-processing model then yields the detection box corresponding to the target, realizing retrieval of few-shot targets.
  • Since the apparatus embodiment is basically similar to the method embodiment, its description is relatively brief; for relevant details, refer to the description of the method embodiment.
  • The apparatus embodiments described above are only illustrative. The units described as separate components may or may not be physically separated, and the components shown as units may or may not be physical units; that is, they may be located in one place or distributed across multiple network elements. Some or all of the modules can be selected according to actual needs to achieve the purpose of the solution of this embodiment, which those skilled in the art can understand and implement without creative effort.
  • The various component embodiments of the present application may be realized in hardware, in software modules running on one or more processors, or in a combination thereof. Those skilled in the art should understand that a microprocessor or a digital signal processor (DSP) may be used in practice to implement some or all of the functions of some or all of the components of the computing processing device according to the embodiments of the present application.
  • The present application can also be implemented as an apparatus or device program (for example, a computer program and a computer program product) for performing part or all of the methods described herein. Such a program implementing the present application may be stored on a computer-readable storage medium, or may take the form of one or more signals; such a signal may be downloaded from an Internet site, provided on a carrier signal, or provided in any other form.
  • FIG. 4 shows a computing processing device that may implement a method according to the present application.
  • the computing processing device conventionally includes a processor 410 and a computer program product or computer readable medium in the form of memory 420 .
  • Memory 420 may be electronic memory such as flash memory, EEPROM (Electrically Erasable Programmable Read Only Memory), EPROM, hard disk, or ROM.
  • the memory 420 has a storage space 430 for program code 431 for performing any method step in the method described above.
  • the storage space 430 for program codes may include respective program codes 431 for respectively implementing various steps in the above methods. These program codes can be read from or written into one or more computer program products.
  • These computer program products comprise program code carriers such as hard disks, compact disks (CDs), memory cards or floppy disks.
  • Such a computer program product is typically a portable or fixed storage unit as described with reference to FIG. 5 .
  • the storage unit may have storage segments, storage spaces, etc. arranged similarly to the memory 420 in the computing processing device of FIG. 4 .
  • The program code can, for example, be compressed in a suitable form.
  • The storage unit includes computer readable code 431', that is, code readable by a processor such as the processor 410; when executed by a computing processing device, the code causes the computing processing device to perform each step of the methods described above.
  • Those skilled in the art should understand that the embodiments of the present application may be provided as methods, devices, or computer program products. Therefore, the embodiments of the present application may take the form of an entirely hardware embodiment, an entirely software embodiment, or an embodiment combining software and hardware aspects. Furthermore, the embodiments of the present application may take the form of a computer program product implemented on one or more computer-usable storage media (including but not limited to disk storage, CD-ROM, and optical storage) containing computer-usable program code.
  • The embodiments of the present application are described with reference to flowcharts and/or block diagrams of the methods, terminal devices (systems), and computer program products according to the embodiments of the present application. It should be understood that each procedure and/or block in the flowcharts and/or block diagrams, and combinations of procedures and/or blocks in the flowcharts and/or block diagrams, can be realized by computer program instructions. These computer program instructions may be provided to a general-purpose computer, a special-purpose computer, an embedded processor, or the processor of other programmable data processing terminal equipment to produce a machine, such that the instructions executed by the computer or the processor of the other programmable data processing terminal equipment produce means for realizing the functions specified in one or more procedures of the flowcharts and/or one or more blocks of the block diagrams.
  • These computer program instructions may also be stored in a computer-readable memory capable of directing a computer or other programmable data processing terminal equipment to operate in a specific manner, such that the instructions stored in the computer-readable memory produce an article of manufacture comprising instruction means that implement the functions specified in one or more procedures of the flowcharts and/or one or more blocks of the block diagrams.

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Artificial Intelligence (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Evolutionary Biology (AREA)
  • Evolutionary Computation (AREA)
  • Databases & Information Systems (AREA)
  • Image Analysis (AREA)

Abstract

The present application provides a target retrieval method, apparatus, electronic device, and storage medium. The method includes: performing target detection on an image to be retrieved through a detection model corresponding to a target to be retrieved, to obtain at least one detection box whose confidence is greater than or equal to a confidence threshold, where the confidence threshold is lower than the threshold used by a conventional detection network and the detection model corresponding to the target to be retrieved is a model whose parameters have been updated based on template images of the target to be retrieved; and filtering the at least one detection box through a post-processing model to obtain the detection box corresponding to the target to be retrieved. In the present application, because the detection model is obtained by updating parameters based on template images of the target to be retrieved, a small number of template images can be used to update the pre-trained detection model; moreover, the low confidence threshold ensures that enough detection boxes are recalled, and after filtering through the post-processing model the detection box corresponding to the target to be retrieved is obtained, realizing retrieval of few-shot targets.

Description

Target retrieval method, apparatus, device, and storage medium

This application claims priority to the Chinese patent application filed with the China National Intellectual Property Administration on July 23, 2021, with application number 202110837127.9 and the invention title "Target retrieval method, apparatus, electronic device, and storage medium", the entire contents of which are incorporated herein by reference.

Technical Field

The present application relates to the technical field of image processing, and in particular to a target retrieval method, apparatus, device, and storage medium.

Background Art

The core task of image target retrieval is to find the objects of interest in an image; it is one of the important research topics in computer vision. In recent years, driven by advances in deep learning and convolutional neural networks, target retrieval technology has made great progress. Given an image, the goal of image target retrieval is to retrieve from an image library those pictures that contain the target, with the pictures containing the target ranked as close to the front of the retrieval results as possible after sorting by a similarity measure. Retrieval of similar objects is generally called object retrieval in the English literature; duplicate search or detection can also be classified as same-object retrieval, and same-object retrieval methods can be applied directly to duplicate search or detection. Same-object retrieval is of great value both in research and in the commercial image search industry, for example searching for clothes and shoes in shopping applications, face retrieval, and so on.

Current image target retrieval schemes train corresponding deep learning models on large amounts of data. These models learn the feature representations of the covered categories, such as face features and human body features; at test time, features are extracted from different pictures and a similarity measure over the features is used to judge how similar two pictures are. This works well and is widely deployed for tasks such as faces and human bodies, and retrieval models for other targets such as cars, clothes, and shoes also have many applications.

Existing image target retrieval technology depends heavily on training data: face and human body tasks at the million-sample level can reach production-grade accuracy, and hat and shoe tasks at the level of tens of thousands or thousands of samples can also be effective in some special scenarios. In real life, however, there are countless visually salient targets such as clothes and hats, many targets lack training data, and new objects keep appearing, such as a brand's fashionable clothes or new drinks; for these targets it is impossible to collect enough data to support training. Even when enough training data is finally collected, the training process is long, from several days to several months. Thus, target retrieval for few-shot targets either consumes a long training time or, when enough training data cannot be collected, cannot be accomplished at all.
Summary of the Invention

In view of the above problems, the embodiments of the present application are proposed to provide a target retrieval method, apparatus, device, and storage medium that overcome the above problems or at least partially solve them.

According to a first aspect of the embodiments of the present application, a target retrieval method is provided, including:

performing target detection on the image to be retrieved through the detection model corresponding to the target to be retrieved, to obtain at least one detection box whose confidence is greater than or equal to a confidence threshold, where the confidence threshold is lower than the threshold used by a conventional detection network, and the detection model corresponding to the target to be retrieved is a model obtained by updating the parameters of a pre-trained detection model based on template images of the target to be retrieved; and

filtering the at least one detection box through a post-processing model to obtain the detection box corresponding to the target to be retrieved.

According to a second aspect of the embodiments of the present application, a target retrieval apparatus is provided, including:

a target detection module, configured to perform target detection on the image to be retrieved through the detection model corresponding to the target to be retrieved and obtain at least one detection box whose confidence is greater than or equal to a confidence threshold, where the confidence threshold is lower than the threshold used by a conventional detection network and the detection model corresponding to the target to be retrieved is a model obtained by updating the parameters of a pre-trained detection model based on template images of the target to be retrieved; and

a detection box filtering module, configured to filter the at least one detection box through a post-processing model to obtain the detection box corresponding to the target to be retrieved.

According to a third aspect of the embodiments of the present application, a computing processing device is provided, including:

a memory having computer readable code stored therein; and

one or more processors, where when the computer readable code is executed by the one or more processors, the computing processing device executes the target retrieval method described in the first aspect.

According to a fourth aspect of the embodiments of the present application, a computer program is provided, including computer readable code which, when run on a computing processing device, causes the computing processing device to execute the target retrieval method described in the first aspect.

According to a fifth aspect of the embodiments of the present application, a computer-readable storage medium is provided, in which the computer program described in the fourth aspect is stored.

The target retrieval method, apparatus, device, and storage medium provided by the embodiments of the present application perform target detection on the image to be retrieved through the detection model corresponding to the target to be retrieved, obtain at least one detection box whose confidence is greater than or equal to the confidence threshold, and filter the at least one detection box through a post-processing model to obtain the detection box corresponding to the target to be retrieved. Because the detection model corresponding to the target to be retrieved is obtained by updating the parameters of a pre-trained detection model based on template images of the target to be retrieved, a small number of template images can be used to update the pre-trained detection model; moreover, the low confidence threshold ensures that enough detection boxes are recalled, and after filtering through the post-processing model the detection box corresponding to the target to be retrieved is obtained, realizing retrieval of few-shot targets.
上述说明仅是本申请技术方案的概述,为了能够更清楚了解本申请的技术手段,而可依照说明书的内容予以实施,并且为了让本申请的上述和其它目的、特征和优点能够更明显易懂,以下特举本申请的具体实施方式。
附图说明
通过阅读下文优选实施方式的详细描述,各种其他的优点和益处对于本领域普通技术人员将变得清楚明了。附图仅用于示出优选实施方式的目的,而并不认为是对本申请的限制。
图1是本申请实施例提供的一种目标检索方法的步骤流程图;
图2是本申请实施例中的检测模型和后处理模型组成的目标检索系统的示意图;
图3是本申请实施例提供的一种目标检索装置的结构框图;
图4示意性地示出了用于执行根据本申请的方法的计算处理设备的框图;以及
图5示意性地示出了用于保持或者携带实现根据本申请的方法的程序代码的存储单元。
具体实施例
下面将参照附图更详细地描述本申请的示例性实施例。虽然附图中显示了本申请的示例性实施例,然而应当理解,可以以各种形式实现本申请而不应被这里阐述的实施例所限制。相反,提供这些实施例是为了能够更透彻地理解本申请,并且能够将本申请的范围完整的传达给本领域的技术人员。
图1是本申请实施例提供的一种目标检索方法的步骤流程图,如图1所 示,该方法可以包括:
步骤101,通过待检索目标对应的检测模型对待检索图像进行目标检测,得到置信度大于或等于置信度阈值的至少一个检测框,其中,所述置信度阈值低于常规的检测网络用的阈值,所述待检索目标对应的检测模型是基于待检索目标的模板图像对预训练的检测模型进行参数更新后的模型。
待检索目标的类型是与预训练检测模型时使用的数据集有关的,预训练的检测模型在进行预训练时使用的数据集是包括待检索目标的,同时还可以包括其他目标。也就是说,如果预训练的检测模型是通用目标检测模型,待检索目标可以是具有显著特征的人或物体,比如张三、书包、帽子、红色帽子等等,如果检测模型是特定类型目标的检测模型,待检索目标只能是某个类型的目标。例如,如果预训练检测模型时使用的数据集是A类型的数据集,待检索目标与A类型较为接近时会更准确,比如A类型是车辆,待检索目标是人体,基于包含人体的模板图像对预训练的检测模型进行参数更新后,对人体进行检索时,检索结果可能不太准确;如果A类型是车辆(包括a、b、c、d、e等品牌),待检索目标为车辆,品牌为w(预训练时用的数据集中不包括的品牌),基于包含品牌w车辆的模板图像对预训练的检测模型进行参数更新后,对品牌w车辆进行检索时,由于品牌w车辆为车辆的类型,与预训练数据集的类型相同,这时检索结果会更加准确。
待检索目标的模板图像是包括待检索目标的图像,对预训练的检测模型进行参数更新时使用的待检索目标的模板图像数量可以为5-10张。
模板图像的作用:一方面让检测模型“认识”待检索目标,针对待检索目标进行优化,一方面让后处理模型把待检索目标的特征提取出来并保存起来。后面才能够用待检索目标对应的检测模型从待检索图像中检测出待检索目标并用后处理模型识别出目标。
待检索目标的模板图像常常由用户提供,算法厂商直接将预训练的检测模型提供给用户,用户使用待检索目标的模板图像对预训练的检测模型进行参数更新,得到待检索目标对应的检测模型,从而算法厂商不接触敏感数据,提高了对客户数据的保护。
待检索目标对应的检测模型对待检索图像进行目标检测,会得到多个检测框,每个检测框都对应有其在待检索图像中的位置和置信度(表明它认为框内物体是待检索目标的概率),待检索目标对应的检测模型给出的检测框表明它认为检测框框起来的局部图可能是待检索目标。
通过基于待检索目标的模板图像对预训练的检测模型进行参数更新,得到待检索目标对应的检测模型,在使用待检索目标对应的检测模型进行目标检测时,将置信度阈值设置的低于常规的检测网络用的阈值,使得待检索目标对应的检测模型可以召回较多的检测框,保证较高的召回率,由于置信度阈值较低,所以召回的至少一个检测框中有的包含待检索目标,有的可能不包含待检索目标,可以通过后处理模型对召回的至少一个检测框进行过滤,以获得较为准确的对应于待检索目标的检测框。
所述置信度阈值一般要比常规的检测网络用的阈值低10%~20%,所述置信度阈值例如可以设置为0.4~0.6之间的值,常规的检测网络用的阈值一般为0.5~0.8。
步骤102,通过后处理模型对所述至少一个检测框进行过滤,得到对应于所述待检索目标的检测框。
由于通过待检索目标对应的检测模型对待检索图像进行目标检测得到的检测框中可能有检测框存在误报的情况(检测框框起来的局部图并不是待检索目标),即有的检测框是存在待检索目标的,而有的检测框可能不存在待检索目标,所以需要通过后处理模型来对至少一个检测框进行过滤,以将不存在待检索目标的检测框过滤掉,保留存在待检索目标的检测框,即得到对应于待检索目标的检测框。
如果待检索图像中实质上不包括待检索目标,通过待检索目标对应的检测模型进行检测后,可能会得到至少一个检测框,也可能不会得到检测框,但是通过后处理模型进行过滤后,理论上应该没有检测框保留下来。
本实施例提供的目标检索方法,通过待检索目标对应的检测模型对待检索图像进行目标检测,得到置信度大于或等于置信度阈值的至少一个检测框,通过后处理模型对至少一个检测框进行过滤,得到对应于待检索目标的检测框,由于待检索目标对应的检测模型是基于待检索目标的模板图像对预训练的检测模型进行参数更新得到的模型,可以使用少量的模板图 像对预训练的检测模型进行更新,而且置信度阈值较低,可以保证召回足够的检测框,并通过后处理模型进行过滤后,得到对应于待检索目标的检测框,实现了对小样本目标的检索。
在本申请的一个实施例中,通过后处理模型对至少一个检测框进行过滤,得到对应于待检索目标的检测框,包括:
通过后处理模型分别对所述至少一个检测框进行特征提取,得到每个检测框对应的特征向量,作为待检索特征向量;
分别将所述至少一个检测框对应的待检索特征向量与模板特征向量进行匹配,根据匹配结果对所述至少一个检测框进行过滤,得到对应于所述待检索目标的检测框,所述模板特征向量是通过所述后处理模型对所述待检索目标的模板图像中待检索目标所在区域进行特征提取得到的。
其中,所述后处理模型是通用的提取图像特征的模型,不需要依据待检索目标的模板图像进行网络参数更新,对于任何的待检索目标均可以提取特征。网络参数更新是指针对待检索目标对预训练的模型进行网络参数的更新,使其与待检索目标更加匹配,提高处理效果。
待检索目标的模板图像一方面用于更新检测模型,一方面用于生成模板特征向量。通过后处理模型对待检索目标的模板图像进行特征提取,得到模板特征向量。在通过待检索目标对应的检测模型得到至少一个检测框后,通过后处理模型分别对每个检测框对应的待检索图像中的局部图进行特征提取,或者通过后处理模型分别对每个检测框对应的特征图进行特征提取,得到每个检测框对应的特征向量,即得到每个检测框对应的待检索特征向量。其中,检测框对应的特征图是待检索图像的特征图中检测框框起来的局部特征图,待检索图像的特征图可以是检测模型对待检索图像进行目标检测的过程中提取得到的。
分别将每个检测框对应的待检索特征向量与模板特征向量进行匹配,可以将待检索特征向量与模板特征向量的相似度作为匹配结果,从而根据匹配结果对至少一个检测框进行过滤,如可以筛选相似度大于或等于相似度阈值的检测框,作为待检索目标的检测框,通过基于模板特征向量对应的目标可以确定最终的检测框对应的具体待检索目标,例如,待检索目标对应的检测模型可以检测5个目标:A、B、C、D、E,一个待检索图像中 包括其中一个目标,经过待检索目标对应的检测模型进行检测后输出2个检测框,将这2个检测框框起来的局部图分别通过后处理模型提取待检索特征向量,并与模板特征向量进行匹配后,其中一个检测框与目标B的相似度大于相似度阈值,则可以确定该检测框框起来的局部图中的待检索目标为目标B。其中,待检索特征向量与模板特征向量匹配的过程,可以是计算待检索特征向量与模板特征向量的距离,并基于距离确定待检索特征向量与模板特征向量的相似度;或者,也可以直接计算待检索特征向量与模板特征向量的相似度。
上述分别将至少一个检测框对应的待检索特征向量与模板特征向量进行匹配和/或根据匹配结果对至少一个检测框进行过滤的操作可以由后处理模型直接执行,也可以不由后处理模型执行,而是获取后处理模型输出的待检索特征向量以实现对检测框的过滤。
根据模板特征向量对至少一个检测框过滤,可以得到较为准确的对应于待检索目标的检测框,提高目标检索的准确性。
在本申请的一个实施例中,通过后处理模型分别对所述至少一个检测框进行特征提取,得到每个检测框对应的特征向量,作为待检索特征向量,包括:
通过后处理模型中的第一主干网络分别对所述至少一个检测框进行特征提取,得到每个检测框对应的特征图;
通过后处理模型中的度量学习模块的第一分支对每个检测框对应的特征图进行全局特征提取,得到检测框对应的全局特征向量;通过后处理模型中的度量学习模块的第二分支对每个检测框对应的特征图进行局部特征提取,得到检测框对应的局部特征向量;所述第一分支和第二分支为参数不同的孪生网络;
根据每个检测框对应的全局特征向量和局部特征向量,确定每个检测框对应的待检索特征向量。
其中,所述后处理模型包括第一主干网络和度量学习模块,第一主干网络为特征提取器(Feature Extractor),其输入可以为待检索图像中检测框框起来的局部图,即RGB图像,也可以是检测框对应的特征图,输出为对应的深度特征,通常被配置为常见的卷积神经网络,用于提取高维特征,度量 学习(metric learning)模块主要用于学习一个更有判别性的特征,从而可以实现在给定待检索目标的模板图像时,能够通过提取特征实现对检测框的过滤,即基于提取到的特征与模板特征向量的相似度实现对检测框的分类,以得到待检索目标的检测框。对检测框进行分类是基于模板特征向量进行分类的,例如,有5个模板特征向量,通过对待检索图像进行目标检测,得到4个检测框,对于每个检测框,会得到分别与5个模板特征向量对应的相似度,即得到5个相似度,根据这5个相似度可以确定检测框的类别,即可以确定检测框属于5个模板特征向量所对应的目标类别之一,还是5个目标类别均不属于,如果一个相似度大于相似度阈值,则确定该检测框的类别为该相似度所对应的模板特征向量所属的目标类别。
在通过待检索目标对应的检测模型得到待检索图像中的至少一个检测框后,将每个检测框分别输入后处理模型,后处理模型中的第一主干网络首先对输入的检测框进行特征提取,得到检测框对应的特征图,第一主干网络输出的特征图输入度量学习模块,度量学习模块包括第一分支和第二分支,第一分支和第二分支为参数不同的孪生网络,分别对特征图进行处理,第一分支提取所述特征图的全局特征,得到检测框对应的全局特征向量,第二分支提取所述特征图的局部特征,得到检测框对应的局部特征向量,综合检测框对应的全局特征向量和局部特征向量,得到每个检测框对应的待检索特征向量。通过第一主干网络和度量学习模块,可以提取到足够区分不同检索目标的特征向量,提高目标检索的准确性,而且通过度量学习模块中的第一分支和第二分支能够提取不同视角的特征,可以实现比一般网络更好的精度,提高检测框过滤的准确性。
后处理模型也是预训练的。在对后处理模型进行训练时,可以结合分类学习和样本对学习两种模式进行学习,即在训练的过程中,目标损失函数包括分类损失和样本对损失(如Triplet损失),这样可以进一步提高对待检索图像进行处理时的效果。其中,样本对学习是指将相同类型的待检索目标样本以及不同类型的待检索目标样本分别组成样本对。
后处理模型预训练完成后,可以使用后处理模型对待检索目标的模板图像进行特征提取并保存模板特征向量,这个过程可以快速完成,大概需要3秒的时间即可完成。当然,后处理模板预训练完成后,也可以基于待检 索目标对后处理模型进行参数更新,但是经过实践表明,后处理模型不更新的处理效果逼近更新的精度,而且不更新时只需要提取模板图像的特征进行保存即可,速度比更新会更快。
On the basis of the above technical solution, before performing target detection on the image to be retrieved through the detection model corresponding to the target to be retrieved, the method further includes:
acquiring a template image of the target to be retrieved and/or annotation information corresponding to the template image;
updating the parameters of the online update network of a pre-trained detection model according to the template image of the target to be retrieved and/or the annotation information corresponding to the template image, to obtain the detection model corresponding to the target to be retrieved; the pre-trained detection model includes a second backbone network and at least one online update network.
The second backbone network is a feature extractor whose input is an RGB image and whose output is the corresponding deep features; it is typically configured as a common convolutional neural network for extracting high-dimensional features.
Each of the at least one online update network is connected to the second backbone network directly or indirectly; if there are several online update networks, each of them is connected to the second backbone network.
The template image of the target to be retrieved may be a local image containing only the region of the target to be retrieved, or a full image containing both that region and other regions.
The online update networks included in the detection model can be updated online with template images of different targets to be retrieved, so that different online update networks detect different targets. Each online update network can detect a certain number (for example, 3 to 5) of different targets to be retrieved.
Before target detection is performed on the image to be retrieved through the detection model corresponding to the target to be retrieved, the pre-trained detection model must be parameter-updated so that the resulting detection model can accurately detect the target in the image to be retrieved. To update the pre-trained detection model, first acquire the template image of the target to be retrieved and/or the annotation information corresponding to the template image, feed the template image into the pre-trained detection model to obtain at least one detection box in the template image, and then update the parameters of the online update network of the pre-trained detection model based on the obtained detection box(es) together with the template image and/or its annotation information, obtaining the detection model corresponding to the target to be retrieved.
When the template image of the target to be retrieved is a local image containing only the region of the target, no annotation information needs to be acquired. When the template image is a full image containing both the region of the target and other regions, annotation information corresponding to the template image must be acquired along with it; the annotation information indicates the position of the target to be retrieved in the template image.
By updating the online update network of the pre-trained detection model with template images of the target to be retrieved, each target can be accommodated with only a small number (typically 5 to 10) of its template images, making the updated model better adapted to the target; and because only the online update network is updated, the parameter update of the detection model completes quickly, in roughly under one minute.
On the basis of the above technical solution, updating the parameters of the online update network of the pre-trained detection model according to the template image of the target to be retrieved and/or the annotation information corresponding to the template image, to obtain the detection model corresponding to the target to be retrieved, includes:
a predicted-box determination step: determining the predicted detection box corresponding to the template image through the second backbone network and the online update network of the pre-trained detection model;
a network update step: determining a loss value according to the predicted detection box and the annotation information corresponding to the template image, and updating the network parameters of the online update network according to the loss value to obtain an updated pre-trained detection model;
repeating the predicted-box determination step and the network update step until the loss value is less than a loss threshold or the number of repetitions reaches a repetition threshold;
taking the updated pre-trained detection model as the detection model corresponding to the target to be retrieved.
The online update network consists of the last P layers of the detection model corresponding to the target to be retrieved, with P less than or equal to 5; for example, P may be 3.
The predicted-box determination step determines the predicted detection box in the template image: the template image is fed into the pre-trained detection model, whose second backbone network and online update network process it in turn; the second backbone network first extracts high-dimensional features from the template image, and the online update network then processes these features to obtain the predicted detection box corresponding to the template image. The network update step determines a loss value based on the predicted detection box and the annotation information of the template image, then backpropagates according to the loss value to update the network parameters of the online update network (the parameters of the second backbone network need not be updated), yielding an updated pre-trained detection model. The predicted-box determination step and the network update step are iterated until an end condition is reached (for example, the loss value converges, falls below the loss threshold, or the repetition count reaches its threshold), at which point training ends and the updated pre-trained detection model is taken as the detection model corresponding to the target to be retrieved. The loss value may include a localization loss and a classification loss. When the template image is a full image containing both the region of the target and other regions, the localization loss is computed from the difference between the position of the predicted detection box and the annotation information in the template image. When the template image is a local image containing only the region of the target, it has no annotation information; the local image can be transformed, for example by padding around it, to obtain a larger image that contains the template image, and the position of the template image within the larger image serves as the annotation information, from which the localization loss is computed against the position of the predicted detection box.
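As an illustrative, non-limiting sketch of the padding operation just described (the function name and canvas size are hypothetical), the annotation for an unannotated local template image could be synthesized like this:

import numpy as np

def synthesize_annotation(local_crop, canvas_size=(640, 640)):
    """Paste a local template crop onto a larger canvas at a random position
    and use the paste location as the ground-truth box for the localization
    loss. Assumes the crop fits inside the canvas."""
    H, W = canvas_size
    h, w = local_crop.shape[:2]
    canvas = np.zeros((H, W, 3), dtype=local_crop.dtype)  # padded background
    y = np.random.randint(0, H - h + 1)
    x = np.random.randint(0, W - w + 1)
    canvas[y:y + h, x:x + w] = local_crop
    box = (x, y, x + w, y + h)                            # synthesized annotation
    return canvas, box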
After each of the at least one online update network in the detection model undergoes the parameter update described above, the detection model corresponding to the target to be retrieved is obtained.
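Putting the predicted-box determination step and the network update step together, a minimal sketch of the update loop might look like the following (assuming, purely for illustration, that the model returns a loss dictionary and that only the online-update parameters are registered with the optimizer):

import torch

def update_online_network(model, template_loader, optimizer,
                          loss_threshold=0.05, max_iters=200):
    """Sketch of the two-step update loop described above (names illustrative).

    `model(images, annotations)` is assumed to run the backbone and the
    online-update layers and return localization/classification losses; only
    the online-update parameters should be in `optimizer`, so the frozen
    backbone receives no updates.
    """
    loss = None
    for _ in range(max_iters):                       # repetition threshold
        for images, annotations in template_loader:
            losses = model(images, annotations)      # predicted-box determination step
            loss = losses["loc"] + losses["cls"]     # localization + classification loss
            optimizer.zero_grad()
            loss.backward()                          # network update step: gradients
            optimizer.step()                         # reach only the last P layers
        if loss is not None and loss.item() < loss_threshold:
            break                                    # loss threshold reached
    return model

# e.g. register only the online-update (last P) layers with the optimizer:
# optimizer = torch.optim.SGD((p for n, p in model.named_parameters()
#                              if "online_update" in n), lr=1e-3)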
On the basis of the above technical solution, the at least one online update network comprises multiple parallel online update networks, there are multiple targets to be retrieved, and the targets are divided into groups of at most N targets each, giving M target groups, with each target group corresponding to one online update network;
updating the parameters of the online update network of the pre-trained detection model according to the template images of the targets to be retrieved and/or the annotation information corresponding to the template images, to obtain the detection model corresponding to the targets to be retrieved, includes:
updating, according to the template images of the i-th target group, the parameters of the online update network of the pre-trained detection model corresponding to the i-th target group, to obtain the detection model corresponding to the targets to be retrieved, for i = 1 to M.
In the detection model, the at least one online update network runs in parallel, i.e., the M online update networks are parallel. Different online update networks detect different targets, and one online update network can detect at most N different targets to be retrieved, e.g., N may be 5. When the online update networks are parameter-updated, each is updated with its own target group: the template images of the i-th target group are used to update the online update network corresponding to the i-th target group, for i = 1 to M, yielding the detection model corresponding to the targets to be retrieved. Running at least one online update network in parallel allows more targets to be detected simultaneously; it also supports incremental learning. For example, if the first parameter update assigned three targets to one online update network, and a second update must detect two additional targets without losing the first three, a new online update network can be added and parameter-updated with the template images of the two additional targets, so that the newly added network detects those two targets.
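As a small illustrative sketch (the helper name and group size are hypothetical), grouping the targets into M groups of at most N, so that each group gets its own parallel online-update head, could look like this:

def group_targets(target_ids, max_per_group=5):
    """Split the retrieval targets into groups of at most N (here 5);
    each resulting group is assigned one parallel online-update network."""
    return [target_ids[i:i + max_per_group]
            for i in range(0, len(target_ids), max_per_group)]

# e.g. 8 targets with N=5 -> 2 groups: the first 5 targets, then the remaining 3;
# adding targets later just appends a new group (and a new head) without
# touching already-trained heads, matching the incremental case above.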
In an embodiment of the present application, before performing target detection on the image to be retrieved through the detection model corresponding to the target to be retrieved, the method further includes: performing feature extraction, through the post-processing model, on the region of the target to be retrieved in its template image to obtain the template feature vector.
Before the detection model and the post-processing model are used for target retrieval, the post-processing model can first extract features from the template images of the targets to be retrieved, producing and saving the template feature vectors of the targets; later, when target retrieval is performed on images to be retrieved, the saved template feature vectors can be fetched directly, improving the efficiency of target retrieval.
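A minimal sketch of this precompute-and-cache step, assuming a callable post-processing model; averaging the vectors of several template crops per target is an illustrative choice, not something prescribed by the text:

import torch

@torch.no_grad()
def build_template_bank(post_model, template_crops_by_target):
    """Extract and cache one template feature vector per retrieval target
    so that later queries reuse them instead of re-running extraction."""
    post_model.eval()
    bank = {}
    for target_id, crops in template_crops_by_target.items():  # crops: (K, 3, H, W)
        feats = post_model(crops)                              # (K, d) feature vectors
        bank[target_id] = feats.mean(dim=0)                    # average the K templates
    return bank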
In an embodiment of the present application, the pre-trained detection model is a general-purpose object detection model. A general-purpose object detection model is pre-trained on massive data (such as obj365, coco, and openimage data) and can detect arbitrary targets, regardless of their type; it may be a SOTA model with good performance.
FIG. 2 is a schematic diagram of the target retrieval system composed of the detection model and the post-processing model in an embodiment of the present application. As shown in FIG. 2, the detection model includes the second backbone network, a Region Proposal Network (RPN), and an RCNN (Region with CNN feature), where the last P layers of the RCNN form the online update network. The post-processing model includes the first backbone network and the metric learning module, which comprises the first branch and the second branch. The detection model performs target detection on the image to be retrieved to obtain at least one detection box with confidence greater than or equal to the confidence threshold; each detection box is fed to the post-processing model, whose first backbone network performs feature extraction on the box and passes the extracted feature map to the metric learning module; the first branch of the metric learning module performs global feature extraction on the feature map to obtain a global feature vector, and the second branch performs local feature extraction to obtain a local feature vector; the feature vector to be retrieved for each detection box is determined from its global and local feature vectors and matched against the template feature vectors, i.e., the similarity between each box's feature vector to be retrieved and the template feature vectors is computed and used as the matching result, and the at least one detection box is filtered according to the matching results to obtain the detection box corresponding to the target to be retrieved.
When the detection model performs target detection on the image to be retrieved, its second backbone network performs feature extraction on the image to obtain the feature map of the image, which is fed to both the RPN and the RCNN; the RPN generates candidate detection boxes for the feature map, the candidate boxes are fed to the RCNN, and the RCNN processes the feature map and the candidate boxes to obtain the at least one detection box for the image to be retrieved. The online update network within the RCNN is updated online based on template images of the target to be retrieved, enabling the detection model to detect the target more accurately.
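Purely as a sketch of the forward flow just described (the attribute names `backbone`, `rpn`, `rcnn`, and `conf_threshold` are hypothetical), detection with the deliberately low confidence threshold might read:

import torch

@torch.no_grad()
def detect(model, image):
    """Sketch: backbone -> RPN -> RCNN, then keep high-recall boxes."""
    fmap = model.backbone(image)                 # feature map of the image to be retrieved
    proposals = model.rpn(fmap)                  # candidate detection boxes
    boxes, scores = model.rcnn(fmap, proposals)  # refined boxes with confidences
    keep = scores >= model.conf_threshold        # low threshold -> recall enough boxes
    return boxes[keep], scores[keep]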
The detection model may further include Gradient Decouple Layers (GDL), placed between the second backbone network and the RPN and between the second backbone network and the RCNN, to adjust the learning rates of different layers when the parameters of the online update network are updated, improving the efficiency of the parameter update and making the updated detection model better suited to few-shot target detection.
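One common way to realize such a gradient decouple layer is an identity mapping in the forward pass whose backward pass rescales the gradient; the sketch below is an assumption about the mechanism, not the patented implementation:

import torch

class GradientDecouple(torch.autograd.Function):
    """Identity forward; gradient multiplied by `scale` on the backward pass,
    effectively giving the layers behind it a different learning rate."""
    @staticmethod
    def forward(ctx, x, scale):
        ctx.scale = scale
        return x

    @staticmethod
    def backward(ctx, grad_out):
        return grad_out * ctx.scale, None  # no gradient for the scale argument

# usage sketch: insert between backbone output and RPN/RCNN inputs
# rpn_in = GradientDecouple.apply(backbone_out, 0.1)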
It should be noted that, for simplicity of description, the method embodiments are expressed as series of action combinations; however, those skilled in the art should appreciate that the embodiments of the present application are not limited by the described order of actions, since according to the embodiments some steps may be performed in other orders or simultaneously. Those skilled in the art should also appreciate that the embodiments described in this specification are preferred embodiments, and the actions involved are not necessarily required by the embodiments of the present application.
FIG. 3 is a structural block diagram of a target retrieval apparatus provided by an embodiment of the present application. As shown in FIG. 3, the target retrieval apparatus may include:
a target detection module 301, configured to perform target detection on an image to be retrieved through a detection model corresponding to a target to be retrieved, to obtain at least one detection box with confidence greater than or equal to a confidence threshold, where the confidence threshold is lower than that used by conventional detection networks, and the detection model corresponding to the target to be retrieved is a pre-trained detection model whose parameters have been updated based on template images of the target to be retrieved;
a detection box filtering module 302, configured to filter the at least one detection box through a post-processing model to obtain the detection box corresponding to the target to be retrieved.
Optionally, the detection box filtering module includes:
a feature extraction unit, configured to perform feature extraction on each of the at least one detection box through the post-processing model to obtain the feature vector corresponding to each detection box as a feature vector to be retrieved;
a detection box filtering unit, configured to match the feature vectors to be retrieved corresponding to the at least one detection box against template feature vectors and to filter the at least one detection box according to the matching results to obtain the detection box corresponding to the target to be retrieved, where a template feature vector is obtained by performing feature extraction, through the post-processing model, on the region of the target to be retrieved in a template image of the target.
Optionally, the feature extraction unit is specifically configured to:
perform feature extraction on each of the at least one detection box through the first backbone network of the post-processing model to obtain the feature map corresponding to each detection box;
perform global feature extraction on each detection box's feature map through the first branch of the metric learning module of the post-processing model to obtain the global feature vector of the box, and perform local feature extraction on each detection box's feature map through the second branch of the metric learning module to obtain the local feature vector of the box, the first and second branches being siamese networks with different parameters;
determine the feature vector to be retrieved for each detection box from its global and local feature vectors.
Optionally, the apparatus further includes:
a template image acquisition module, configured to acquire a template image of the target to be retrieved and/or annotation information corresponding to the template image;
a parameter update module, configured to update the parameters of the online update network of a pre-trained detection model according to the template image of the target to be retrieved and/or the annotation information corresponding to the template image, to obtain the detection model corresponding to the target to be retrieved; the pre-trained detection model includes a second backbone network and at least one online update network.
Optionally, the parameter update module is specifically configured to perform:
a predicted-box determination step: determining the predicted detection box corresponding to the template image through the second backbone network and the online update network of the pre-trained detection model;
a network update step: determining a loss value according to the predicted detection box and the annotation information corresponding to the template image, and updating the network parameters of the online update network according to the loss value to obtain an updated pre-trained detection model;
repeating the predicted-box determination step and the network update step until the loss value is less than a loss threshold or the number of repetitions reaches a repetition threshold;
taking the updated pre-trained detection model as the detection model corresponding to the target to be retrieved.
Optionally, the online update network consists of the last P layers of the detection model corresponding to the target to be retrieved, with P less than or equal to 5.
Optionally, the apparatus further includes:
a template feature extraction module, configured to perform feature extraction, through the post-processing model, on the region of the target to be retrieved in its template image to obtain the template feature vector.
Optionally, the at least one online update network comprises multiple parallel online update networks, there are multiple targets to be retrieved, and the targets are divided into groups of at most N targets each, giving M target groups, with each target group corresponding to one online update network;
the parameter update module is specifically configured to:
update, according to the template images of the i-th target group, the parameters of the online update network of the pre-trained detection model corresponding to the i-th target group, to obtain the detection model corresponding to the targets to be retrieved, for i = 1 to M.
Optionally, the pre-trained detection model is a general-purpose object detection model.
The target retrieval apparatus provided by this embodiment performs target detection on the image to be retrieved through the detection model corresponding to the target to be retrieved to obtain at least one detection box with confidence greater than or equal to the confidence threshold, and filters the at least one detection box through the post-processing model to obtain the detection box corresponding to the target to be retrieved. Since the detection model is obtained by updating the parameters of a pre-trained detection model based on template images of the target to be retrieved, only a small number of template images are needed for the update; and since the confidence threshold is low, enough detection boxes are recalled, which the post-processing model then filters to obtain the detection box corresponding to the target to be retrieved, thereby achieving retrieval of few-shot targets.
As the apparatus embodiments are substantially similar to the method embodiments, their description is relatively brief; for relevant details, refer to the description of the method embodiments.
The apparatus embodiments described above are merely illustrative. Units described as separate components may or may not be physically separate, and components shown as units may or may not be physical units, i.e., they may be located in one place or distributed over multiple network units. Some or all of the modules may be selected according to actual needs to achieve the purpose of the solutions of the embodiments, which those of ordinary skill in the art can understand and implement without creative effort.
The component embodiments of the present application may be implemented in hardware, in software modules running on one or more processors, or in a combination thereof. Those skilled in the art should understand that a microprocessor or a digital signal processor (DSP) may be used in practice to implement some or all of the functions of some or all of the components of the computing processing device according to the embodiments of the present application. The present application may also be implemented as a device or apparatus program (e.g., a computer program and a computer program product) for performing part or all of the methods described herein. Such a program implementing the present application may be stored on a computer-readable storage medium or may take the form of one or more signals, which may be downloaded from an Internet website, provided on a carrier signal, or provided in any other form.
For example, FIG. 4 shows a computing processing device that can implement the method according to the present application. The device conventionally includes a processor 410 and a computer program product or computer-readable medium in the form of a memory 420. The memory 420 may be an electronic memory such as flash memory, EEPROM (electrically erasable programmable read-only memory), EPROM, a hard disk, or ROM. The memory 420 has a storage space 430 for program code 431 for performing any of the method steps described above. For example, the storage space 430 may include individual program codes 431 for implementing the various steps of the above methods. These program codes can be read from or written into one or more computer program products, which include program code carriers such as hard disks, compact discs (CDs), memory cards, or floppy disks. Such computer program products are typically portable or fixed storage units as described with reference to FIG. 5. The storage unit may have storage segments, storage spaces, etc. arranged similarly to the memory 420 in the computing processing device of FIG. 4. The program code may, for example, be compressed in an appropriate form. Typically, the storage unit includes computer-readable code 431', i.e., code readable by a processor such as 410, which, when run by a computing processing device, causes the device to perform the steps of the methods described above.
The embodiments in this specification are described in a progressive manner; each embodiment focuses on its differences from the other embodiments, and the same or similar parts of the embodiments may be referred to each other.
Those skilled in the art should understand that the embodiments of the present application may be provided as a method, an apparatus, or a computer program product. Therefore, the embodiments of the present application may take the form of an entirely hardware embodiment, an entirely software embodiment, or an embodiment combining software and hardware aspects. Moreover, the embodiments may take the form of a computer program product implemented on one or more computer-usable storage media (including but not limited to disk storage, CD-ROM, optical storage, etc.) containing computer-usable program code.
The embodiments of the present application are described with reference to flowcharts and/or block diagrams of methods, terminal devices (systems), and computer program products according to the embodiments of the present application. It should be understood that each flow and/or block in the flowcharts and/or block diagrams, and combinations of flows and/or blocks therein, can be implemented by computer program instructions. These computer program instructions may be provided to a processor of a general-purpose computer, a special-purpose computer, an embedded processor, or another programmable data processing terminal device to produce a machine, such that the instructions executed by the processor produce means for implementing the functions specified in one or more flows of the flowcharts and/or one or more blocks of the block diagrams.
These computer program instructions may also be stored in a computer-readable memory capable of directing a computer or another programmable data processing terminal device to work in a particular manner, such that the instructions stored in the computer-readable memory produce an article of manufacture including instruction means that implement the functions specified in one or more flows of the flowcharts and/or one or more blocks of the block diagrams.
These computer program instructions may also be loaded onto a computer or another programmable data processing terminal device, causing a series of operation steps to be performed on the computer or other programmable terminal device to produce computer-implemented processing, such that the instructions executed on the computer or other programmable terminal device provide steps for implementing the functions specified in one or more flows of the flowcharts and/or one or more blocks of the block diagrams.
Although preferred embodiments of the present application have been described, those skilled in the art, once aware of the basic inventive concept, may make additional changes and modifications to these embodiments. Therefore, the appended claims are intended to be construed as covering the preferred embodiments and all changes and modifications falling within the scope of the embodiments of the present application.
Finally, it should also be noted that, herein, relational terms such as first and second are used solely to distinguish one entity or operation from another and do not necessarily require or imply any such actual relationship or order between these entities or operations. Moreover, the terms "comprise", "include", or any variants thereof are intended to cover non-exclusive inclusion, so that a process, method, article, or terminal device that includes a series of elements includes not only those elements but also other elements not explicitly listed, or elements inherent to such a process, method, article, or terminal device. Without further limitation, an element defined by the phrase "comprising a ..." does not exclude the presence of additional identical elements in the process, method, article, or terminal device that includes the element.
The target retrieval method, apparatus, device, and storage medium provided by the present application have been described in detail above. Specific examples have been used herein to illustrate the principles and implementations of the present application; the above description of the embodiments is intended only to help understand the method of the present application and its core idea. Meanwhile, those of ordinary skill in the art may make changes in the specific implementation and application scope in accordance with the idea of the present application. In summary, the contents of this specification should not be construed as limiting the present application.

Claims (13)

  1. A target retrieval method, characterized by comprising:
    performing target detection on an image to be retrieved through a detection model corresponding to a target to be retrieved, to obtain at least one detection box with confidence greater than or equal to a confidence threshold, wherein the confidence threshold is lower than that used by conventional detection networks, and the detection model corresponding to the target to be retrieved is a pre-trained detection model whose parameters have been updated based on a template image of the target to be retrieved; and
    filtering the at least one detection box through a post-processing model to obtain the detection box corresponding to the target to be retrieved.
  2. The method according to claim 1, characterized in that filtering the at least one detection box through the post-processing model to obtain the detection box corresponding to the target to be retrieved comprises:
    performing feature extraction on each of the at least one detection box through the post-processing model to obtain a feature vector corresponding to each detection box as a feature vector to be retrieved; and
    matching the feature vectors to be retrieved corresponding to the at least one detection box against template feature vectors, and filtering the at least one detection box according to the matching results to obtain the detection box corresponding to the target to be retrieved, wherein a template feature vector is obtained by performing feature extraction, through the post-processing model, on the region of the target to be retrieved in the template image of the target.
  3. The method according to claim 2, characterized in that performing feature extraction on each of the at least one detection box through the post-processing model to obtain the feature vector corresponding to each detection box as a feature vector to be retrieved comprises:
    performing feature extraction on each of the at least one detection box through a first backbone network of the post-processing model to obtain a feature map corresponding to each detection box;
    performing global feature extraction on each detection box's feature map through a first branch of a metric learning module of the post-processing model to obtain a global feature vector of the box, and performing local feature extraction on each detection box's feature map through a second branch of the metric learning module to obtain a local feature vector of the box, the first and second branches being siamese networks with different parameters; and
    determining the feature vector to be retrieved for each detection box from its global and local feature vectors.
  4. The method according to any one of claims 1 to 3, characterized in that, before performing target detection on the image to be retrieved through the detection model corresponding to the target to be retrieved, the method further comprises:
    acquiring a template image of the target to be retrieved and/or annotation information corresponding to the template image; and
    updating the parameters of an online update network of a pre-trained detection model according to the template image of the target to be retrieved and/or the annotation information corresponding to the template image, to obtain the detection model corresponding to the target to be retrieved, the pre-trained detection model comprising a second backbone network and at least one online update network.
  5. The method according to claim 4, characterized in that updating the parameters of the online update network of the pre-trained detection model according to the template image of the target to be retrieved and/or the annotation information corresponding to the template image, to obtain the detection model corresponding to the target to be retrieved, comprises:
    a predicted-box determination step: determining a predicted detection box corresponding to the template image through the second backbone network and the online update network of the pre-trained detection model;
    a network update step: determining a loss value according to the predicted detection box and the annotation information corresponding to the template image, and updating the network parameters of the online update network according to the loss value to obtain an updated pre-trained detection model;
    repeating the predicted-box determination step and the network update step until the loss value is less than a loss threshold or the number of repetitions reaches a repetition threshold; and
    taking the updated pre-trained detection model as the detection model corresponding to the target to be retrieved.
  6. The method according to claim 4 or 5, wherein the online update network consists of the last P layers of the detection model corresponding to the target to be retrieved, P being less than or equal to 5.
  7. The method according to any one of claims 4 to 6, wherein, before performing target detection on the image to be retrieved through the detection model corresponding to the target to be retrieved, the method further comprises:
    performing feature extraction, through the post-processing model, on the region of the target to be retrieved in the template image of the target to obtain a template feature vector.
  8. The method according to any one of claims 4 to 7, characterized in that the at least one online update network comprises multiple parallel online update networks, there are multiple targets to be retrieved, and the targets are divided into groups of at most N targets each, giving M target groups, each target group corresponding to one online update network;
    updating the parameters of the online update network of the pre-trained detection model according to the template images of the targets to be retrieved and/or the annotation information corresponding to the template images, to obtain the detection model corresponding to the targets to be retrieved, comprises:
    updating, according to the template images of the i-th target group, the parameters of the online update network of the pre-trained detection model corresponding to the i-th target group, to obtain the detection model corresponding to the targets to be retrieved, for i = 1 to M.
  9. The method according to any one of claims 4 to 7, characterized in that the pre-trained detection model is a general-purpose object detection model.
  10. A target retrieval apparatus, characterized by comprising:
    a target detection module, configured to perform target detection on an image to be retrieved through a detection model corresponding to a target to be retrieved, to obtain at least one detection box with confidence greater than or equal to a confidence threshold, wherein the confidence threshold is lower than that used by conventional detection networks, and the detection model corresponding to the target to be retrieved is a pre-trained detection model whose parameters have been updated based on a template image of the target to be retrieved; and
    a detection box filtering module, configured to filter the at least one detection box through a post-processing model to obtain the detection box corresponding to the target to be retrieved.
  11. A computing processing device, characterized by comprising:
    a memory having computer-readable code stored therein; and
    one or more processors, wherein when the computer-readable code is executed by the one or more processors, the computing processing device performs the target retrieval method according to any one of claims 1 to 9.
  12. A computer program comprising computer-readable code which, when run on a computing processing device, causes the computing processing device to perform the target retrieval method according to any one of claims 1 to 9.
  13. A computer-readable storage medium storing the computer program according to claim 12.
PCT/CN2022/091495 2021-07-23 2022-05-07 Target retrieval method and apparatus, device, and storage medium WO2023000764A1 (zh)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN202110837127.9A CN113743455A (zh) 2021-07-23 2021-07-23 Target retrieval method and apparatus, electronic device, and storage medium
CN202110837127.9 2021-07-23

Publications (1)

Publication Number Publication Date
WO2023000764A1 true WO2023000764A1 (zh) 2023-01-26

Family

ID=78729131

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2022/091495 WO2023000764A1 (zh) 2021-07-23 2022-05-07 Target retrieval method and apparatus, device, and storage medium

Country Status (2)

Country Link
CN (1) CN113743455A (zh)
WO (1) WO2023000764A1 (zh)


Families Citing this family (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113743455A (zh) * 2021-07-23 2021-12-03 北京迈格威科技有限公司 目标检索方法、装置、电子设备及存储介质
CN114842350A (zh) * 2022-03-26 2022-08-02 西北工业大学 遥感图像中的目标检测方法、装置
CN115205555B (zh) * 2022-07-12 2023-05-26 北京百度网讯科技有限公司 确定相似图像的方法、训练方法、信息确定方法及设备


Family Cites Families (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN106960214B (zh) * 2017-02-17 2020-11-20 Image-based object recognition method
CN110009628A (zh) * 2019-04-12 2019-07-12 Automatic detection method for multi-form targets in continuous two-dimensional images
CN110298391B (zh) * 2019-06-12 2023-05-02 Iterative incremental dialogue intent category recognition method based on few-shot learning
CN110796679B (зh) * 2019-10-30 2023-04-07 Target tracking method for aerial images
CN111652887B (zh) * 2020-05-13 2023-04-07 Image segmentation model training method and apparatus, computer device, and storage medium
CN112132856B (zh) * 2020-09-30 2024-05-24 Siamese network tracking method based on adaptive template updating
CN112001373B (zh) * 2020-10-28 2021-01-22 Article recognition method and apparatus, and storage medium
CN112906685B (zh) * 2021-03-04 2024-03-26 Target detection method and apparatus, electronic device, and storage medium

Patent Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20200175062A1 (en) * 2017-07-28 2020-06-04 Hangzhou Hikvision Digital Technology Co., Ltd. Image retrieval method and apparatus, and electronic device
CN112417970A (zh) * 2020-10-22 2021-02-26 北京迈格威科技有限公司 目标对象识别方法、装置和电子系统
CN113052165A (zh) * 2021-01-28 2021-06-29 北京迈格威科技有限公司 目标检测方法、装置、电子设备及存储介质
CN112861720A (zh) * 2021-02-08 2021-05-28 西北工业大学 基于原型卷积神经网络的遥感图像小样本目标检测方法
CN113743455A (zh) * 2021-07-23 2021-12-03 北京迈格威科技有限公司 目标检索方法、装置、电子设备及存储介质

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
KANG BINGYI; LIU ZHUANG; WANG XIN; YU FISHER; FENG JIASHI; DARRELL TREVOR: "Few-Shot Object Detection via Feature Reweighting", 2019 IEEE/CVF INTERNATIONAL CONFERENCE ON COMPUTER VISION (ICCV), IEEE, 27 October 2019 (2019-10-27), pages 8419 - 8428, XP033724049, DOI: 10.1109/ICCV.2019.00851 *
SHAOQING REN, HE KAIMING, GIRSHICK ROSS, SUN JIAN: "Faster R-CNN: Towards Real-Time Object Detection with Region Proposal Networks", ADVANCES IN NEURAL INFORMATION PROCESSING SYSTEMS (NIPS 2015), 7 December 2015 (2015-12-07) - 12 December 2015 (2015-12-12), XP055488147 *

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2024164675A1 (zh) * 2023-02-07 2024-08-15 Image matting method and apparatus, electronic device, and storage medium

Also Published As

Publication number Publication date
CN113743455A (zh) 2021-12-03

Similar Documents

Publication Publication Date Title
WO2023000764A1 (zh) Target retrieval method and apparatus, device, and storage medium
CN104035917B (zh) Knowledge graph management method and system based on semantic space mapping
CN106294344B (zh) Video retrieval method and apparatus
CN104573130B (zh) Entity resolution method and apparatus based on crowd computing
CN111950728B (zh) Image feature extraction model construction method, image retrieval method, and storage medium
CN107292349A (zh) Zero-shot classification method and apparatus based on semantic enhancement with encyclopedia knowledge
CN106649276B (zh) Method and apparatus for recognizing core product words in titles
CN112183600B (zh) Target tracking method based on dynamic memory bank template updating
An et al. Hypergraph propagation and community selection for objects retrieval
CN107229731A (zh) 用于分类数据的方法和装置
CN112818162A (zh) 图像检索方法、装置、存储介质和电子设备
CN115115825B (zh) 图像中的对象检测方法、装置、计算机设备和存储介质
CN114972737B (zh) 基于原型对比学习的遥感图像目标检测系统及方法
Ibrahimi et al. Learning with label noise for image retrieval by selecting interactions
CN115424053A (zh) 小样本图像识别方法、装置、设备及存储介质
CN114140663A (zh) 一种基于多尺度注意力学习网络的害虫识别方法及系统
CN113869264A (zh) 一种商品识别方法、识别系统、存储介质和服务器
CN112861881A (zh) 一种基于改进MobileNet模型的蜂窝肺识别方法
CN117152669A (zh) 一种跨模态时域视频定位方法及系统
CN116958724A (zh) 一种产品分类模型的训练方法和相关装置
CN109144999B (zh) 一种数据定位方法、装置及存储介质、程序产品
CN111984812B (zh) 一种特征提取模型生成方法、图像检索方法、装置及设备
CN113139379B (zh) 信息识别方法和系统
CN111008294A (zh) 交通图像处理、图像检索方法及装置
Peng et al. The knowing camera 2: recognizing and annotating places-of-interest in smartphone photos

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 22844931

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

32PN Ep: public notification in the ep bulletin as address of the adressee cannot be established

Free format text: NOTING OF LOSS OF RIGHTS PURSUANT TO RULE 112(1) EPC (EPO FORM 1205 DATED 27/05/2024)