WO2021237973A1 - Image positioning model acquisition method and apparatus, terminal, and storage medium (图像定位模型获取方法及装置、终端和存储介质) - Google Patents


Info

Publication number
WO2021237973A1
Authority
WO
WIPO (PCT)
Prior art keywords
image
positioning
loss function
model
sample
Prior art date
Application number
PCT/CN2020/113099
Other languages
English (en)
French (fr)
Inventor
葛艺潇 (GE Yixiao)
朱烽 (ZHU Feng)
王海波 (WANG Haibo)
赵瑞 (ZHAO Rui)
李鸿升 (LI Hongsheng)
Original Assignee
深圳市商汤科技有限公司 (Shenzhen SenseTime Technology Co., Ltd.)
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 深圳市商汤科技有限公司 (Shenzhen SenseTime Technology Co., Ltd.)
Publication of WO2021237973A1 publication Critical patent/WO2021237973A1/zh

Classifications

    • G — PHYSICS
    • G06 — COMPUTING; CALCULATING OR COUNTING
    • G06F — ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 — Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/50 — Information retrieval; Database structures therefor; File system structures therefor of still image data
    • G06F16/58 — Retrieval characterised by using metadata, e.g. metadata not derived from the content or metadata generated manually
    • G06F16/583 — Retrieval characterised by using metadata automatically derived from the content
    • G06F16/587 — Retrieval characterised by using metadata, e.g. using geographical or spatial information, e.g. location
    • G06F18/00 — Pattern recognition
    • G06F18/20 — Analysing
    • G06F18/22 — Matching criteria, e.g. proximity measures

Definitions

  • This application relates to the field of data processing technology, and in particular to an image positioning model acquisition method and apparatus, a terminal, and a storage medium.
  • Image positioning technology aims to match a target image to the most similar (nearest) reference image in a large-scale database, and to use the GPS (Global Positioning System) tag of that reference image as the geographic location of the target image.
  • Image positioning is currently implemented mainly through three approaches: image retrieval, 3D (three-dimensional) structure matching, and classification by geographic location.
  • The embodiments of this application propose an image positioning model acquisition method and apparatus, a terminal, and a storage medium.
  • An embodiment of this application provides an image positioning model acquisition method, which includes: determining the similarity between a target image and K first sample positioning images according to a first image positioning model to obtain a first similarity vector, where K is an integer greater than 1; determining a first target loss function according to the first similarity vector; and adjusting an initial model according to the first target loss function to obtain a second image positioning model, where the initial model is the model obtained after the first image positioning model is initialized.
  • In this way, the first image positioning model is used to determine the similarity between the target image and the K first sample positioning images to obtain the first similarity vector, the first target loss function is determined according to the first similarity vector, and the initial model is adjusted according to the first target loss function to obtain the second image positioning model. The first target loss function can thus be determined from the first image positioning model, the target image, and the K first sample positioning images, and similarity-supervised learning is performed on the initial model to obtain the second image positioning model, which improves the accuracy of the second image positioning model when positioning images.
  • Optionally, determining the similarity between the target image and the K first sample positioning images to obtain the first similarity vector includes: splitting each first sample positioning image in the K first sample positioning images to obtain N sub-first-sample positioning images corresponding to each first sample positioning image; determining, according to the first image positioning model, the feature values corresponding to the N sub-first-sample positioning images of each first sample positioning image to obtain the feature vector corresponding to each first sample positioning image; determining the feature value of the target image according to the first image positioning model; and determining the first similarity vector according to the feature vector corresponding to each first sample positioning image and the feature value of the target image.
  • In this way, N sub-first-sample positioning images are obtained by splitting each of the K first sample positioning images, and the first similarity vector is determined from their feature values together with the feature value of the target image. The first similarity vector can therefore be determined in a fine-grained manner, which improves how accurately the first similarity vector reflects the samples and thereby improves the accuracy of the resulting second image positioning model.
  • determining the first target loss function according to the first similarity vector includes: determining the first sub-loss function according to the first similarity vector; according to the difficult negative sample image corresponding to the target image, Determine the second sub-loss function; determine the first target loss function according to the first sub-loss function and the second sub-loss function.
  • In this way, the first target loss function can be determined from the first sub-loss function, which is derived from the first similarity vector, and the second sub-loss function, which is derived from the difficult (hard) negative sample image corresponding to the target image. Determining the first target loss function from an accurate first similarity vector and from the hard-negative sub-loss improves the accuracy of the first target loss function.
  • Optionally, determining the first sub-loss function according to the first similarity vector includes: obtaining the similarity between the target image and the K first sample positioning images according to the initial model to obtain a second similarity vector; and determining the first sub-loss function according to the first similarity vector and the second similarity vector.
  • In this way, the first sub-loss function is determined from the second similarity vector produced by the initial model and from the first similarity vector, so that the second similarity vector is supervised by the similarity vector determined by the first image positioning model. This improves the accuracy of the first sub-loss function, and because the first similarity vector supervises the second similarity vector, it also improves the accuracy of the second image positioning model when positioning images.
  • Optionally, determining the first target loss function according to the first sub-loss function and the second sub-loss function includes: performing an operation on the first sub-loss function and the second sub-loss function according to their corresponding loss weighting factors to obtain the first target loss function.
  • Optionally, the method further includes: receiving an image to be marked; obtaining K second sample positioning images corresponding to the image to be marked; splitting each second sample positioning image in the K second sample positioning images to obtain N sub-second-sample positioning images corresponding to each second sample positioning image; and determining, through the second image positioning model, the similarity labels between the image to be marked and the N sub-second-sample positioning images corresponding to each second sample positioning image.
  • In this way, the second image positioning model is used to determine the similarity labels corresponding to the N sub-second-sample positioning images of each second sample positioning image. Because a trained image positioning model determines the similarity labels, the accuracy of the obtained similarity labels is improved.
  • Optionally, the first image positioning model includes a basic image positioning model, which is a model obtained by training with the target image and the image with the highest similarity among the K first sample positioning images as a sample pair.
  • the method further includes: determining a second target loss function according to the second image positioning model, the target image, and the K first sample positioning images; adjusting the initial model according to the second target loss function, Obtain a third image positioning model; use the third image positioning model to replace the first image positioning model.
  • An embodiment of the present application provides an image positioning method, which includes: receiving an image to be detected; and positioning the image to be detected according to a second image positioning model as described above to obtain positioning information corresponding to the image to be detected.
  • An embodiment of the present application provides an image positioning model acquisition device, which includes: a first determining unit configured to determine the similarity between a target image and K first sample positioning images according to a first image positioning model to obtain a first similarity vector, where K is an integer greater than 1; a second determining unit configured to determine a first target loss function according to the first similarity vector; and an adjustment unit configured to adjust an initial model according to the first target loss function to obtain a second image positioning model, where the initial model is the model obtained after the first image positioning model is initialized.
  • An embodiment of the present application provides an image positioning device, which includes: a receiving unit configured to receive an image to be detected; and a positioning unit configured to position the image to be detected according to a second image positioning model as described above to obtain positioning information corresponding to the image to be detected.
  • An embodiment of the present application provides a terminal, which includes a processor, an input device, an output device, and a memory that are connected to one another. The memory is configured to store a computer program that includes program instructions, and the processor is configured to call the program instructions to execute the steps of the image positioning model acquisition method or the image positioning method in the embodiments of the present application.
  • An embodiment of the present application provides a computer-readable storage medium that stores a computer program configured for electronic data exchange, where the computer program causes a computer to execute part or all of the steps described in the image positioning model acquisition method or the image positioning method in the embodiments of the present application.
  • An embodiment of the present application provides a computer program product, which includes a non-transitory computer-readable storage medium storing a computer program, where the computer program is operable to cause a computer to execute part or all of the steps described in the image positioning model acquisition method or the image positioning method in the embodiments of the present application.
  • the computer program product may be a software installation package.
  • FIG. 1a is a schematic diagram of an application scenario of an image positioning model provided by an embodiment of this application.
  • FIG. 1b is a schematic flowchart of an image positioning model acquisition method provided by an embodiment of this application.
  • FIG. 2a is a schematic diagram of a sample positioning image provided by an embodiment of this application.
  • FIG. 2b is a schematic diagram of splitting a first sample positioning image provided by an embodiment of this application.
  • FIG. 2c is a schematic diagram of splitting another first sample positioning image provided by an embodiment of this application.
  • FIG. 2d is a schematic diagram of a sub-first-sample positioning image provided by an embodiment of this application.
  • FIG. 3 is a schematic flowchart of another image positioning model acquisition method provided by an embodiment of this application.
  • FIG. 4 is a schematic flowchart of another image positioning model acquisition method provided by an embodiment of this application.
  • FIG. 5 is a schematic structural diagram of a terminal provided by an embodiment of this application.
  • FIG. 6 is a schematic structural diagram of an image positioning model acquisition device provided by an embodiment of this application.
  • FIG. 7 is a schematic structural diagram of an image positioning device provided by an embodiment of this application.
  • The electronic devices described in the embodiments of this application may include smart phones (such as Android phones, iOS phones, and Windows Phone phones), tablet computers, handheld computers, driving recorders, traffic command platforms, servers, laptops, mobile Internet devices (MIDs), and wearable devices (such as smart watches and Bluetooth headsets).
  • The above examples are illustrative rather than exhaustive; the electronic devices include but are not limited to those listed. An electronic device can also be a server, a video matrix, or an Internet of Things device; this is not limited here.
  • the terminal and the electronic device may be the same device.
  • the image positioning model 101a can be applied to the electronic device 102a.
  • When the user needs to determine a location, for example, to inform others of his or her current location, the user can use the electronic device 102a to collect an image near the current location; for example, if the user is next to the xx building, the image near the current location may be an image of the area near the xx building, and the image to be detected 103a is thereby obtained.
  • the electronic device uses the image positioning model 101a to perform positioning analysis and calculation on the image to be detected 103a to obtain the positioning information 104a corresponding to the image to be detected.
  • The positioning information is the location information of the area reflected by the image to be detected (for example, the xx building). The location information may be the location information of a landmark building in the image to be detected 103a; the landmark building may be a building selected by the user or a building determined by the image positioning model 101a, and it may of course also be other landmark location information, which is only an example here. In this way, the current location of the user can be determined through the image positioning model 101a, which brings greater convenience to the user.
  • The image positioning model in the related art does not achieve high positioning accuracy, because a single sample pair is usually used to train the initial model. The image positioning model therefore needs to be optimized and retrained to improve its image positioning accuracy.
  • The following embodiments mainly describe how the initial model is adjusted to improve the image positioning accuracy of the adjusted image positioning model.
  • the image positioning model acquisition method is applied to an electronic device, and the method includes steps 101b to 103b, as follows:
  • the electronic device determines the similarity between the target image and the K first sample positioning images according to the first image positioning model to obtain a first similarity vector, where K is an integer greater than 1.
  • the K first sample positioning images may be sample images determined according to GPS (Global Positioning System) positioning information of the target image, for example, may be images within a preset range at a location indicated by the GPS positioning information of the target image. For example, it may be a map image within 10 meters of the indicated position.
  • the target image can be collected by mobile terminals such as mobile phones, computers, etc.
  • The target image can be used to form sample pairs for adjusting the initial model; that is, the target image and the K first sample positioning images form the sample pairs used to adjust the initial model.
  • the preset range can be set through empirical values or historical data.
  • The similarity labels between the K first sample positioning images and the target image can take values between 0 and 1, inclusive. As shown in FIG. 2a, which shows a possible target image and first sample positioning images, the similarity labels between the first sample positioning images and the target image include 0.45, 0.35, and so on.
  • The elements in the first similarity vector may include the similarity between the target image and each first sample positioning image, as well as the similarity between the target image and the sub-images obtained after each first sample positioning image is split. Splitting a first sample positioning image yields multiple sub-first-sample positioning images; the image can be split into multiple sub-first-sample positioning images of equal area, or into multiple sub-first-sample positioning images of different areas.
  • The electronic device can be used only to adjust the initial model, or both to adjust the initial model and to use the resulting image positioning model for image positioning.
  • the electronic device determines a first target loss function according to the first similarity vector.
  • the corresponding loss function may be determined according to the first similarity vector, and the first target loss function may be determined at least through the corresponding loss function.
  • the electronic device adjusts the initial model according to the first target loss function to obtain a second image positioning model, where the initial model is a model obtained after the first image positioning model is initialized.
  • the initial model is trained by the sample set including the target image and K first sample positioning images and the first target loss function to obtain the second image positioning model.
  • the initial model is a model obtained after the initialization of the first image positioning model, which can be understood as initializing the model parameters in the first image positioning model to obtain the initial model.
  • the second image positioning model is a model obtained by training the initial model through a sample set including the target image and K first sample positioning images.
  • In this way, the first image positioning model is used to determine the similarity between the target image and the K first sample positioning images to obtain the first similarity vector, the first target loss function is determined according to the first similarity vector, and the initial model is adjusted according to the first target loss function to obtain the second image positioning model. The first target loss function can thus be determined from the first image positioning model, the target image, and the K first sample positioning images, and similarity-supervised learning is performed on the initial model to obtain the second image positioning model, which improves the accuracy of the second image positioning model when positioning images.
  • A possible method of determining the similarity between the target image and the K first sample positioning images according to the first image positioning model to obtain the first similarity vector includes steps A1 to A4, as follows: A1, splitting each first sample positioning image in the K first sample positioning images to obtain N sub-first-sample positioning images corresponding to each first sample positioning image; A2, determining, according to the first image positioning model, the feature values corresponding to the N sub-first-sample positioning images of each first sample positioning image to obtain the feature vector corresponding to each first sample positioning image; A3, determining the feature value of the target image according to the first image positioning model; A4, determining the first similarity vector according to the feature vector corresponding to each first sample positioning image and the feature value of the target image.
  • the image can be split into multiple sub-first sample positioning images with the same area, or into multiple sub-first sample positioning images with different areas.
  • One possible way of splitting is: split the first sample positioning image into two sub-first-sample positioning images of equal area, and also split it into four sub-first-sample positioning images of equal area. As shown in FIG. 2b, the first sample positioning image can be split into upper and lower sub-first-sample positioning images, or into left and right sub-first-sample positioning images; as shown in FIG. 2c, the first sample positioning image can be split into four sub-first-sample positioning images of equal area.
  • The N sub-first-sample positioning images may include sub-images obtained through several different splitting methods; for example, they may be all the sub-first-sample positioning images obtained through the splitting methods shown in FIG. 2b and FIG. 2c.
  • In that case, N is 8 (two halves from each of the two halving schemes plus four quarters). N can also be any other value; this is only an example and is not limiting.
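The splitting scheme described above can be sketched as follows (a minimal illustration assuming images are stored as NumPy arrays; the function name is illustrative and not from the patent). Each sample positioning image is cut into its top/bottom halves, its left/right halves, and its four equal-area quarters, giving N = 8 sub-images:

```python
import numpy as np

def split_sample_image(img):
    """Split one first-sample positioning image into N = 8 sub-images:
    the top/bottom halves, the left/right halves, and the four quarters,
    matching the two splitting schemes of FIG. 2b plus that of FIG. 2c."""
    h2, w2 = img.shape[0] // 2, img.shape[1] // 2
    top, bottom = img[:h2], img[h2:]               # horizontal halving
    left, right = img[:, :w2], img[:, w2:]         # vertical halving
    quarters = [img[:h2, :w2], img[:h2, w2:],      # four equal-area quarters
                img[h2:, :w2], img[h2:, w2:]]
    return [top, bottom, left, right] + quarters

sub_images = split_sample_image(np.zeros((480, 640, 3)))
assert len(sub_images) == 8  # N = 8, as in the text
```

Each sub-image would then be fed through the same feature extractor as the full image to produce the per-sub-image feature values.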
  • The feature vector corresponding to each first sample positioning image can be expressed as f_{pi} = (f_{pi}^1, f_{pi}^2, ..., f_{pi}^N), where f_{pi}^j is the feature value of the j-th sub-first-sample positioning image of the i-th first sample positioning image.
  • Optionally, the first similarity vector can be obtained through a temperature-scaled softmax normalization, as shown in the following formula (1):
  • S = softmax([<f_q, f_{p1}>/τ, <f_q, f_{p1}^1>/τ, ..., <f_q, f_{p1}^N>/τ, ..., <f_q, f_{pK}>/τ, ..., <f_q, f_{pK}^N>/τ])    (1)
  • where softmax is the normalization operation, τ is a hyperparameter (the temperature coefficient), f_q is the feature value of the target image, f_{p1} is the feature value of the first sample positioning image p1, f_{p1}^1 is the feature value of the first sub-first-sample positioning image of p1, f_{pK} is the feature value of the first sample positioning image pK, and f_{pK}^8 is the feature value of the eighth sub-first-sample positioning image of pK.
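A sketch of the softmax computation in formula (1), assuming L2-normalized feature vectors and inner-product similarity (the function and variable names are illustrative, not from the patent):

```python
import numpy as np

def first_similarity_vector(f_q, gallery_feats, tau=0.1):
    """Temperature-scaled softmax over the similarities between the target
    image feature f_q and the features of the K first-sample positioning
    images and their N sub-images, flattened into one gallery.
    f_q: (d,) feature of the target image.
    gallery_feats: (K*(N+1), d) stacked features f_p1, f_p1^1..f_p1^N, ...
    tau: the temperature hyperparameter from formula (1)."""
    sims = gallery_feats @ f_q / tau   # inner-product similarities, scaled by 1/tau
    sims -= sims.max()                 # subtract max for numerical stability
    e = np.exp(sims)
    return e / e.sum()                 # softmax normalization

rng = np.random.default_rng(0)
f_q = rng.normal(size=8); f_q /= np.linalg.norm(f_q)
gallery = rng.normal(size=(18, 8))                       # e.g. K=2, N=8
gallery /= np.linalg.norm(gallery, axis=1, keepdims=True)
S = first_similarity_vector(f_q, gallery, tau=0.07)
assert np.isclose(S.sum(), 1.0) and (S > 0).all()
```

A smaller τ sharpens the distribution toward the most similar gallery entries, which is why τ is treated as a tunable temperature coefficient.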
  • In this way, N sub-first-sample positioning images are obtained by splitting each of the K first sample positioning images, and the first similarity vector is determined from their feature values together with the feature value of the target image. The first similarity vector can therefore be determined in a fine-grained manner, which improves how accurately the first similarity vector reflects the samples and thereby improves the accuracy of the resulting second image positioning model.
  • A possible method for determining the first target loss function according to the first similarity vector includes steps B1 to B3, as follows: B1, determining the first sub-loss function according to the first similarity vector; B2, determining the second sub-loss function according to the difficult negative sample image corresponding to the target image; B3, determining the first target loss function according to the first sub-loss function and the second sub-loss function.
  • the first sub-loss function may be determined according to the similarity vector between the target image and the first sample positioning image determined by the first image positioning model, that is, the first similarity vector.
  • The difficult negative sample image corresponding to the target image can be understood as a negative sample image whose similarity to the target image is lower than a preset threshold.
  • the preset threshold can be set by empirical values or historical data.
  • Optionally, the second sub-loss function can be determined by the method shown in the following formula (2), written here as a triplet-style ranking loss consistent with the symbol descriptions:
  • L_hard(τ) = Σ_{i=1..K} max(0, m + ||f_q − f⁺||² − ||f_q − f_i⁻||²)    (2)
  • where L_hard(τ) is the second sub-loss function, f⁺ is the feature value of the positive sample image with the highest similarity label, f⁻ is the feature value of the negative sample image with the lowest similarity label, m is a margin hyperparameter, and K is the number of first sample positioning images.
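The hard-negative sub-loss can be sketched as a triplet-style function. This is only an illustration consistent with the symbol descriptions above: the margin value, the squared-Euclidean distance, and all names are assumptions, since formula (2) itself is not reproduced in the extracted text.

```python
import numpy as np

def hard_triplet_loss(f_q, f_pos, f_negs, margin=0.1):
    """Triplet-style sketch of the second sub-loss: pull the target feature
    f_q toward the positive with the highest similarity label (f_pos) and
    push it away from the lowest-label negatives (f_negs, one row per
    negative), averaged over the negatives."""
    d_pos = np.sum((f_q - f_pos) ** 2)            # squared distance to best positive
    d_negs = np.sum((f_q - f_negs) ** 2, axis=1)  # squared distances to hard negatives
    return float(np.mean(np.maximum(0.0, margin + d_pos - d_negs)))
```

The loss is zero once every negative is farther from the query than the positive by at least the margin, so only violating triplets contribute gradient.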
  • the first sub-loss function and the second sub-loss function may be weighted to obtain the first target loss function.
  • In this way, the first target loss function can be determined from the first sub-loss function, which is derived from the first similarity vector, and the second sub-loss function, which is derived from the difficult negative sample image corresponding to the target image. Determining the first target loss function from an accurate first similarity vector and from the hard-negative sub-loss improves the accuracy of the first target loss function.
  • A possible method for determining the first sub-loss function according to the first similarity vector includes steps C1 to C2, as follows: C1, obtaining the similarity between the target image and the K first sample positioning images according to the initial model to obtain the second similarity vector; C2, determining the first sub-loss function according to the first similarity vector and the second similarity vector.
  • The method for obtaining the second similarity vector may refer to the method for obtaining the first similarity vector in the foregoing embodiment; in this case, the initial model is used for the calculation to obtain the second similarity vector.
  • A cross-entropy operation may be applied to the first similarity vector and the second similarity vector to obtain the first sub-loss function.
  • The first sub-loss function can be obtained in the manner shown in the following formula (3):
  • L_soft(τ) = l_ce(S, S′)    (3)
  • where L_soft(τ) is the first sub-loss function, l_ce(·) is the cross-entropy operation, S is the first similarity vector (determined by the first image positioning model of generation g−1), S′ is the second similarity vector (determined by the initial model of generation g), and g is a positive integer greater than or equal to 2.
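Formula (3) can be sketched as a cross-entropy between the two probability vectors (the function and variable names are illustrative, not from the patent):

```python
import numpy as np

def soft_cross_entropy(S_teacher, S_student, eps=1e-12):
    """Cross-entropy l_ce between the first similarity vector S_teacher
    (soft labels from the first image positioning model) and the second
    similarity vector S_student (from the initial model being trained).
    Both inputs are probability vectors, e.g. outputs of the softmax in (1);
    eps guards against log(0)."""
    return float(-np.sum(S_teacher * np.log(S_student + eps)))

# When the two vectors agree, the loss equals the teacher's entropy.
val = soft_cross_entropy(np.array([0.5, 0.5]), np.array([0.5, 0.5]))
```

Minimizing this loss drives the student's similarity distribution toward the teacher's, which is the similarity-supervision described in the text.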
  • In this way, the first sub-loss function is determined from the second similarity vector produced by the initial model and from the first similarity vector, so that the second similarity vector is supervised by the similarity vector determined by the first image positioning model. This improves the accuracy of the first sub-loss function, and because the first similarity vector supervises the second similarity vector, it also improves the accuracy of the second image positioning model when positioning images.
  • A possible method for determining the first target loss function based on the first sub-loss function and the second sub-loss function is: performing an operation on the first sub-loss function and the second sub-loss function according to their corresponding loss weighting factors to obtain the first target loss function. A possible assignment of the loss weighting factors is: the loss weighting factor of the first sub-loss function is λ, and the loss weighting factor of the second sub-loss function is 1.
  • The first target loss function is then obtained by the method shown in the following formula (5):
  • L(τ) = L_hard(τ) + λ · L_soft(τ)    (5)
  • where L(τ) is the first target loss function, L_hard(τ) is the second sub-loss function, L_soft(τ) is the first sub-loss function, and λ is the loss weighting factor.
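The weighted combination in formula (5) is then a one-liner; the default value of λ below is only an illustrative training hyperparameter, not a value specified by the patent:

```python
def first_target_loss(l_hard, l_soft, lam=0.5):
    """Formula (5): the second sub-loss (hard loss, weight 1) plus lambda
    times the first sub-loss (soft loss). lam = 0.5 is an illustrative
    default, not a value from the patent."""
    return l_hard + lam * l_soft
```

During training, λ trades off the hard triplet signal against the soft similarity-supervision signal.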
  • Optionally, the image to be marked may also be marked to obtain the similarity labels between the image to be marked and the corresponding sample positioning images, which may include steps D1 to D4: D1, receiving the image to be marked; D2, obtaining K second sample positioning images corresponding to the image to be marked; D3, splitting each second sample positioning image in the K second sample positioning images to obtain N sub-second-sample positioning images corresponding to each second sample positioning image; D4, using the second image positioning model to determine the similarity labels between the image to be marked and the N sub-second-sample positioning images corresponding to each second sample positioning image.
  • the method for obtaining the second sample positioning image can refer to the method for obtaining the first sample positioning image in the foregoing embodiment, which will not be repeated here.
  • Step D3 can refer to the method shown in the foregoing step A1, which will not be repeated here.
  • The second image positioning model may be used to calculate the similarity labels between the image to be marked and the N sub-second-sample positioning images corresponding to each second sample positioning image.
  • The distance between the feature vector of the image to be marked and the feature vectors of the N sub-second-sample positioning images can be used to determine the similarity, and the similarity is then taken as the corresponding similarity label.
  • In this way, the second image positioning model is used to determine the similarity labels corresponding to the N sub-second-sample positioning images of each second sample positioning image. Because a trained image positioning model determines the similarity labels, the accuracy of the obtained similarity labels is improved.
  • Optionally, the first image positioning model includes a basic image positioning model, which is a model obtained by training with the target image and the image with the highest similarity among the K first sample positioning images as a sample pair.
  • Optionally, the method also includes a way of obtaining the first image positioning model, including steps E1 to E3, as follows: E1, determining a second target loss function according to the second image positioning model, the target image, and the K first sample positioning images; E2, adjusting the initial model according to the second target loss function to obtain a third image positioning model; E3, using the third image positioning model to replace the first image positioning model.
  • the method for implementing the above step E1 may refer to the method for determining the first target loss function in the foregoing embodiment, and the method for implementing E2 may refer to the method for determining the second image positioning model in the foregoing embodiment.
  • the second image positioning model may be used to locate the image to be detected to obtain positioning information corresponding to the image to be detected, which may include steps F1 to F2, as follows: F1, receiving the image to be detected; F2, positioning the image to be detected according to the second image positioning model of any of the foregoing embodiments to obtain positioning information corresponding to the image to be detected.
  • the second image positioning model is used to locate the image to be detected, so that the accuracy of obtaining positioning information can be improved.
  • the image positioning model is adjusted multiple times according to the loss function to obtain the final image positioning model; the detailed method is as follows:
  • the initial model is adjusted three times.
  • in the first adjustment, the K first sample images have already been split (not shown in the figure).
  • the similarity bars shown in the figure can be understood as similarities or similarity labels: the higher the similarity, the larger the similarity label; the lower the similarity, the smaller the similarity label.
  • the similarity labels of the sub-first-sample positioning images calculated by the model after three adjustments are more accurate than those calculated by the model after the first adjustment.
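One way to picture why later generations can yield sharper labels is a toy soft-label computation in which the softmax temperature is lowered between generations; the similarity values, temperature values, and annealing schedule below are illustrative assumptions, not values given in this application:

```python
import numpy as np

def soft_labels(similarities, tau):
    """Softmax with temperature tau: smaller tau -> sharper label distribution."""
    z = np.asarray(similarities, dtype=float) / tau
    z -= z.max()                      # numerical stability
    e = np.exp(z)
    return e / e.sum()

# raw query-to-region similarities (toy values)
sims = [0.9, 0.7, 0.2]

gen1 = soft_labels(sims, tau=0.10)    # assumed earlier-generation temperature
gen3 = soft_labels(sims, tau=0.05)    # assumed later-generation temperature
```

With the lower temperature, the best-matching region receives a larger share of the label mass while weak matches are suppressed, mirroring the sharper labels after several adjustments.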
  • FIG. 3 is a schematic flowchart of another method for acquiring an image positioning model according to an embodiment of the present application.
  • the image positioning model acquisition method includes steps 301 to 306, as follows:
  • the K first sample positioning images may be sample images determined according to the GPS positioning information of the target image; for example, they may be images within a preset range of the position indicated by the GPS positioning information of the target image, such as map images within 10 meters of that position.
  • the preset range can be set through empirical values or historical data.
  • the initial model is trained by the sample set including the target image and K first sample positioning images and the first target loss function to obtain the second image positioning model.
  • the initial model is a model obtained after the initialization of the first image positioning model, which can be understood as initializing the model parameters in the first image positioning model to obtain the initial model.
  • the first image positioning model is a model obtained by training the initial model through a sample set including the target image and K first sample positioning images.
  • N sub-first-sample positioning images are obtained by splitting each of the K first sample positioning images.
  • the feature values of these K*N sub-first-sample positioning images and the feature value of the target image are used to determine the first similarity vector, so that the first similarity vector can be determined at a fine granularity. This improves how accurately the first similarity vector reflects the samples, thereby improving the accuracy of determining the second image positioning model.
  • FIG. 4 is a schematic flowchart of another method for acquiring an image positioning model according to an embodiment of the present application.
  • the method for acquiring an image positioning model includes steps 401 to 405, as follows:
  • 401. Determine, according to the first image positioning model, the similarity between the target image and the K first sample positioning images to obtain a first similarity vector, where K is an integer greater than 1; 402. Determine a first sub-loss function according to the first similarity vector; 403. Determine a second sub-loss function according to the difficult negative sample image corresponding to the target image; 404. Determine a first target loss function according to the first sub-loss function and the second sub-loss function; 405. Adjust the initial model according to the first target loss function to obtain a second image positioning model, where the initial model is a model obtained after the first image positioning model is initialized.
  • the initial model is trained by the sample set including the target image and K first sample positioning images and the first target loss function to obtain the second image positioning model.
  • the initial model is a model obtained after the initialization of the first image positioning model, which can be understood as initializing the model parameters in the first image positioning model to obtain the initial model.
  • the first image positioning model is a model obtained by training the initial model through a sample set including the target image and K first sample positioning images.
  • the first target loss function can be determined according to the first sub-loss function determined by the first similarity vector and the second sub-loss function determined by the difficult negative sample image corresponding to the target image.
  • determining the first target loss function from an accurate first similarity vector and from the second sub-loss function determined by the difficult negative sample image improves the accuracy of determining the first target loss function.
  • FIG. 5 is a schematic structural diagram of a terminal provided by an embodiment of the application. As shown in the figure, the terminal includes a processor, an input device, an output device, and a memory, which are connected to each other. The memory is configured to store a computer program including program instructions, and the processor is configured to call the program instructions to execute the step instructions of the above methods.
  • Image retrieval methods in the related art are more effective for large-scale image positioning.
  • the basis and key of image retrieval lie in how to learn more discriminative image features through a neural network.
  • the datasets used for image positioning in the related art provide only noisy GPS annotations, yet images with similar GPS tags do not necessarily cover similar scenes and may face different directions. The training process can therefore be regarded as weakly supervised. Learning a neural network requires using harder positive samples, which the related algorithms ignore.
  • the related datasets provide only noisy GPS tags and cannot effectively identify correct positive sample pairs; the related algorithms cannot effectively use harder positive samples to train the network, which leaves the network insufficiently robust; the related algorithms supervise at the image level, which misleads the training of non-overlapping regions in positive sample pairs; training with only image-level labels provides insufficient region-level supervision; and additional time-consuming algorithms of limited accuracy are required for image verification and for selecting positive samples for training.
  • the embodiment of this application proposes a self-supervised image-similarity algorithm. See Figure 2d.
  • the self-supervised image-region similarity labels proposed in this application come from the predictions of the previous-generation network. The network is trained iteratively, and the predictions of the previous-generation network are used to supervise the training of the next-generation network, so that the capability of the network and the accuracy of the self-supervised labels are optimized in step.
  • the region-level labels are formed by splitting the picture into four 1/2-area and four 1/4-area images.
  • the self-enhanced labels can be used effectively for supervised learning of image similarity, and the accuracy of the labels and the capability of the network are enhanced in step, so that harder positive samples are fully used for network training and robustness is enhanced;
  • image-level labels are refined into region-level labels, the similarity between images and regions is learned in a self-supervised manner, and the interference of noisy labels with network learning is reduced; advanced recognition accuracy is achieved in retrieval-based image positioning; image-region similarity can be self-supervised effectively, thereby enhancing the robustness of the network;
  • the neural network trained by this algorithm can be used to extract the features of a target image, retrieve it among street-view images, and determine the shooting location of the image;
  • the embodiment of the application improves the robustness of the neural network under self-supervision.
  • the terminal includes hardware structures and/or software modules corresponding to each function.
  • this application can be implemented in the form of hardware or a combination of hardware and computer software. Whether a function is executed by hardware or by computer-software-driven hardware depends on the specific application and design constraints of the technical solution. Those skilled in the art may use different methods to implement the described functions for each specific application, but such implementations should not be considered beyond the scope of this application.
  • the embodiment of the present application may divide the terminal into functional units according to the foregoing method examples.
  • each functional unit may be divided corresponding to each function, or two or more functions may be integrated into one processing unit.
  • the above-mentioned integrated unit can be implemented in the form of hardware or software functional unit. It should be noted that the division of units in the embodiments of the present application is illustrative, and is only a logical function division, and there may be other division methods in actual implementation.
  • FIG. 6 is a schematic structural diagram of an image positioning model acquisition device provided by an embodiment of the application.
  • the device includes: a first determining unit 601, configured to determine the similarity between the target image and the K first sample positioning images according to the first image positioning model to obtain a first similarity vector, where K is an integer greater than 1; a second determining unit 602, configured to determine a first target loss function according to the first similarity vector; and an adjustment unit 603, configured to adjust the initial model according to the first target loss function to obtain a second image positioning model, where the initial model is the model obtained after the first image positioning model is initialized.
  • the first determining unit 601 is configured to: split each of the K first sample positioning images to obtain N sub-first-sample positioning images corresponding to each first sample positioning image; determine, according to the first image positioning model, the feature values of the N sub-first-sample positioning images corresponding to each first sample positioning image, to obtain a feature vector corresponding to each first sample positioning image; determine the feature value of the target image according to the first image positioning model; and determine the first similarity vector according to the feature vector corresponding to each first sample positioning image and the feature value of the target image.
  • the second determining unit 602 is configured to: determine the first sub-loss function according to the first similarity vector; determine the second sub-loss function according to the difficult negative sample image corresponding to the target image; According to the first sub-loss function and the second sub-loss function, the first target loss function is determined.
  • the second determining unit 602 is configured to: obtain the similarity between the target image and the K first sample positioning images according to the initial model, to obtain a second similarity vector; and determine the first sub-loss function according to the first similarity vector and the second similarity vector.
  • the second determining unit 602 is configured to: operate on the first sub-loss function and the second sub-loss function according to the loss weighting factors corresponding to them, to obtain the first target loss function.
  • the device is further configured to: receive the image to be marked; obtain K second sample positioning images corresponding to the image to be marked; split each of the K second sample positioning images to obtain N sub-second-sample positioning images corresponding to each second sample positioning image; and determine, by the second image positioning model, the similarity labels between the image to be marked and the N sub-second-sample positioning images corresponding to each second sample positioning image.
  • the first image positioning model includes a basic image positioning model
  • the basic image positioning model is a model obtained by training with the target image and the most similar of the K first sample positioning images as a sample pair.
  • the device is further configured to: determine a second target loss function according to the second image positioning model, the target image, and K first sample positioning images; perform the initial model according to the second target loss function Adjust to obtain a third image positioning model; use the third image positioning model to replace the first image positioning model.
  • FIG. 7 is a schematic structural diagram of an image positioning device provided in an embodiment of the application.
  • the device includes: a receiving unit 701, configured to receive an image to be detected; and a positioning unit 702, configured to position the image to be detected according to the second image positioning model of any one of the above embodiments, to obtain positioning information corresponding to the image to be detected.
  • An embodiment of the present application also provides a computer storage medium, wherein the computer storage medium stores a computer program configured for electronic data exchange, and the computer program enables a computer to execute part or all of the steps of any image positioning model acquisition method or image positioning method described in the above method embodiments.
  • the embodiments of the present application also provide a computer program product.
  • the computer program product includes a non-transitory computer-readable storage medium storing a computer program.
  • the computer program enables a computer to execute part or all of the steps of any image positioning model acquisition method or image positioning method recorded in the above method embodiments.
  • the disclosed device may be implemented in other ways.
  • the device embodiments described above are merely illustrative.
  • the division of the units is only a logical function division; in actual implementation there may be other divisions, for example, multiple units or components may be combined or integrated into another system, or some features may be ignored or not implemented.
  • the displayed or discussed mutual coupling or direct coupling or communication connection may be indirect coupling or communication connection through some interfaces, devices or units, and may be in electrical or other forms.
  • the units described as separate components may or may not be physically separated, and the components displayed as units may or may not be physical units, that is, they may be located in one place, or they may be distributed on multiple network units. Some or all of the units may be selected according to actual needs to achieve the objectives of the solutions of the embodiments.
  • each functional unit in each embodiment may be integrated into one processing unit, or each unit may exist alone physically, or two or more units may be integrated into one unit.
  • the above-mentioned integrated unit can be realized in the form of hardware or software program module.
  • if the integrated unit is implemented in the form of a software program module and sold or used as an independent product, it can be stored in a computer-readable memory.
  • the technical solution of the present application essentially or the part that contributes to the existing technology or all or part of the technical solution can be embodied in the form of a software product, and the computer software product is stored in a memory, A number of instructions are included to enable a computer device (which may be a personal computer, a server, or a network device, etc.) to execute all or part of the steps of the methods described in the various embodiments of the present application.
  • the aforementioned memory includes: U disk, read-only memory (read-only memory, ROM), random access memory (random access memory, RAM), mobile hard disk, magnetic disk, or optical disk and other media that can store program codes.
  • the program can be stored in a computer-readable memory, and the memory can include: a flash disk, read-only memory, random access memory, magnetic disk, or optical disk, etc.
  • the similarity between the target image and the K first sample positioning images is determined by the first image positioning model to obtain the first similarity vector, and the first target loss function is determined according to that similarity vector.
  • the initial model is adjusted by the first target loss function to obtain the second image positioning model, so that the first target loss function can be determined according to the first image positioning model, the target image, and the K first sample positioning images.
  • similarity-supervised learning is performed on the initial model to obtain the second image positioning model, so that the accuracy of the second image positioning model in image positioning can be improved.

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Physics & Mathematics (AREA)
  • Library & Information Science (AREA)
  • General Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • Databases & Information Systems (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Evolutionary Computation (AREA)
  • Evolutionary Biology (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Artificial Intelligence (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Image Analysis (AREA)
  • User Interface Of Digital Computer (AREA)
  • Image Processing (AREA)

Abstract

A method for acquiring an image positioning model and a related apparatus, the method comprising: determining, according to a first image positioning model, the similarity between a target image and K first sample positioning images to obtain a first similarity vector, K being an integer greater than 1; determining a first target loss function according to the first similarity vector; and adjusting an initial model according to the first target loss function to obtain a second image positioning model, the initial model being a model obtained after the first image positioning model is initialized.

Description

Image positioning model acquisition method and apparatus, terminal, and storage medium

Cross-reference to related applications

This application is based on, and claims priority to, Chinese patent application No. 202010478436.7 filed on May 29, 2020, the entire contents of which are incorporated herein by reference.

Technical field

This application relates to the field of data processing technology, and in particular to an image positioning model acquisition method, apparatus, terminal, and storage medium.

Background

Image positioning technology aims to match, in a large-scale database, the reference image most similar to a target image and to take the GPS (Global Positioning System) location annotated on that reference image as the geographic location of the target image. Image positioning is currently implemented mainly by three approaches: image retrieval, 3D (three-dimensional) structure matching, and classification by geographic location.

At present, to avoid being misled by incorrect positive samples (image pairs whose GPS tags are close but whose scenes do not overlap) during model training, only the top-ranked best-matching sample is selected as the positive sample for training; that is, only the easiest sample is used for matching during training, where the best-matching sample may be the sample closest in the feature space. However, a network that is robust to different viewpoints, lighting, and other conditions cannot be obtained by learning only from the best-matching samples, so the trained network model has low accuracy when performing image positioning.

Summary

Embodiments of this application provide an image positioning model acquisition method and apparatus, a terminal, and a storage medium.

An embodiment of this application provides an image positioning model acquisition method, including: determining, according to a first image positioning model, the similarity between a target image and K first sample positioning images to obtain a first similarity vector, where K is an integer greater than 1; determining a first target loss function according to the first similarity vector; and adjusting an initial model according to the first target loss function to obtain a second image positioning model, where the initial model is a model obtained after the first image positioning model is initialized.

In this example, the similarity between the target image and the K first sample positioning images is determined by the first image positioning model to obtain a first similarity vector; a first target loss function is determined from this similarity vector; and the initial model is adjusted according to the first target loss function to obtain a second image positioning model. The first target loss function can thus be determined from the first image positioning model, the target image, and the K first sample positioning images, and similarity-supervised learning is performed on the initial model to obtain the second image positioning model, which improves the accuracy of the second image positioning model when performing image positioning.
In a possible implementation, determining, according to the first image positioning model, the similarity between the target image and the K first sample positioning images to obtain the first similarity vector includes: splitting each of the K first sample positioning images to obtain N sub-first-sample positioning images corresponding to each first sample positioning image; determining, according to the first image positioning model, the feature values of the N sub-first-sample positioning images corresponding to each first sample positioning image, to obtain a feature vector corresponding to each first sample positioning image; determining the feature value of the target image according to the first image positioning model; and determining the first similarity vector according to the feature vector corresponding to each first sample positioning image and the feature value of the target image.

In this example, each of the K first sample positioning images is split into N sub-first-sample positioning images, and the first similarity vector is determined from the feature values of these K*N sub-first-sample positioning images and the feature value of the target image. The first similarity vector can therefore be determined at a fine granularity, which improves how accurately it reflects the samples and in turn improves the accuracy of determining the second image positioning model.

In a possible implementation, determining the first target loss function according to the first similarity vector includes: determining a first sub-loss function according to the first similarity vector; determining a second sub-loss function according to a difficult negative sample image corresponding to the target image; and determining the first target loss function according to the first sub-loss function and the second sub-loss function.

In this example, the first target loss function can be determined from the first sub-loss function determined by the first similarity vector and the second sub-loss function determined by the difficult negative sample image corresponding to the target image, that is, from an accurate first similarity vector and from the hard-negative-based second sub-loss function, which improves the accuracy of determining the first target loss function.

In a possible implementation, determining the first sub-loss function according to the first similarity vector includes: obtaining the similarity between the target image and the K first sample positioning images according to the initial model, to obtain a second similarity vector; and determining the first sub-loss function according to the first similarity vector and the second similarity vector.

In this example, the first sub-loss function can be determined from the second similarity vector determined by the initial model and the first similarity vector, so that the similarity vector determined by the first image positioning model supervises the second similarity vector. This improves the accuracy of determining the first sub-loss function and, because the first similarity vector supervises the second similarity vector, also improves the accuracy of the second image positioning model when performing image positioning.

In a possible implementation, determining the first target loss function according to the first sub-loss function and the second sub-loss function includes: operating on the first sub-loss function and the second sub-loss function according to the loss weighting factors corresponding to them, to obtain the first target loss function.

In a possible implementation, the method further includes: receiving an image to be marked; obtaining K second sample positioning images corresponding to the image to be marked; splitting each of the K second sample positioning images to obtain N sub-second-sample positioning images corresponding to each second sample positioning image; and determining, by the second image positioning model, the similarity labels between the image to be marked and the N sub-second-sample positioning images corresponding to each second sample positioning image.

In this example, the second image positioning model determines the similarity labels between the image to be marked and the N sub-second-sample positioning images of each second sample positioning image. Compared with related solutions, in which similarity labels are determined by an image positioning model trained on a single (best) sample pair, this improves the accuracy of the obtained similarity labels.

In a possible implementation, the first image positioning model includes a basic image positioning model, which is a model obtained by training with the target image and the most similar of the K first sample positioning images as a sample pair.

In a possible implementation, the method further includes: determining a second target loss function according to the second image positioning model, the target image, and the K first sample positioning images; adjusting the initial model according to the second target loss function to obtain a third image positioning model; and replacing the first image positioning model with the third image positioning model.

An embodiment of this application provides an image positioning method, including: receiving an image to be detected; and positioning the image to be detected according to the second image positioning model of any of the above, to obtain positioning information corresponding to the image to be detected.

An embodiment of this application provides an image positioning model acquisition apparatus, including: a first determining unit configured to determine, according to a first image positioning model, the similarity between a target image and K first sample positioning images to obtain a first similarity vector, K being an integer greater than 1; a second determining unit configured to determine a first target loss function according to the first similarity vector; and an adjustment unit configured to adjust an initial model according to the first target loss function to obtain a second image positioning model, the initial model being a model obtained after the first image positioning model is initialized.

An embodiment of this application provides an image positioning apparatus, including: a receiving unit configured to receive an image to be detected; and a positioning unit configured to position the image to be detected according to the second image positioning model of any of the above, to obtain positioning information corresponding to the image to be detected.

An embodiment of this application provides a terminal, including a processor, an input device, an output device, and a memory, which are connected to each other, wherein the memory is configured to store a computer program including program instructions, and the processor is configured to call the program instructions to execute the step instructions of the image positioning model acquisition method or the image positioning method of the embodiments of this application.

An embodiment of this application provides a computer-readable storage medium storing a computer program configured for electronic data exchange, wherein the computer program causes a computer to execute part or all of the steps described in the image positioning model acquisition method or the image positioning method of the embodiments of this application.

An embodiment of this application provides a computer program product, wherein the computer program product includes a non-transitory computer-readable storage medium storing a computer program operable to cause a computer to execute part or all of the steps described in the image positioning model acquisition method or the image positioning method of the embodiments of this application. The computer program product may be a software installation package.

It should be understood that the above general description and the following detailed description are merely exemplary and explanatory, and do not limit the embodiments of this application.

Other features and aspects of this application will become clear from the following detailed description of exemplary embodiments with reference to the accompanying drawings.
Brief description of the drawings

To explain the technical solutions of the embodiments of this application more clearly, the drawings needed in the embodiments are briefly introduced below. The drawings here are incorporated into and constitute a part of this specification; they show embodiments consistent with this application and, together with the specification, serve to explain its technical solutions. It should be understood that the following drawings show only some embodiments of this application and should not be regarded as limiting its scope; for those of ordinary skill in the art, other related drawings can be obtained from these drawings without creative effort.

Fig. 1a is a schematic diagram of an application scenario of an image positioning model provided by an embodiment of this application;

Fig. 1b is a schematic flowchart of an image positioning model acquisition method provided by an embodiment of this application;

Fig. 2a is a schematic diagram of sample positioning images provided by an embodiment of this application;

Fig. 2b is a schematic diagram of splitting a first sample positioning image provided by an embodiment of this application;

Fig. 2c is another schematic diagram of splitting a first sample positioning image provided by an embodiment of this application;

Fig. 2d is a schematic diagram of sub-first-sample positioning images provided by an embodiment of this application;

Fig. 3 is a schematic flowchart of another image positioning model acquisition method provided by an embodiment of this application;

Fig. 4 is a schematic flowchart of yet another image positioning model acquisition method provided by an embodiment of this application;

Fig. 5 is a schematic structural diagram of a terminal provided by an embodiment of this application;

Fig. 6 is a schematic structural diagram of an image positioning model acquisition apparatus provided by an embodiment of this application;

Fig. 7 is a schematic structural diagram of an image positioning apparatus provided by an embodiment of this application.
Detailed description

The technical solutions in the embodiments of this application will be described clearly and completely below with reference to the accompanying drawings. Obviously, the described embodiments are only some, not all, of the embodiments of this application. All other embodiments obtained by those of ordinary skill in the art based on the embodiments of this application without creative effort fall within the scope of protection of this application.

The terms "first", "second", and the like in the specification, claims, and drawings of this application are used to distinguish different objects, not to describe a particular order. In addition, the terms "include" and "have" and any variations thereof are intended to cover non-exclusive inclusion. For example, a process, method, system, product, or device comprising a series of steps or units is not limited to the listed steps or units, but optionally further includes unlisted steps or units, or optionally further includes other steps or units inherent to such a process, method, product, or device.

Reference to an "embodiment" in this application means that a particular feature, structure, or characteristic described in connection with the embodiment may be included in at least one embodiment of this application. The appearance of this phrase in various places in the specification does not necessarily refer to the same embodiment, nor to an independent or alternative embodiment mutually exclusive with other embodiments. Those skilled in the art understand, explicitly and implicitly, that the embodiments described in this application may be combined with other embodiments.

The electronic devices described in the embodiments of this application may include smartphones (such as Android phones, iOS phones, and Windows Phone phones), tablets, palmtop computers, driving recorders, traffic command platforms, servers, notebook computers, mobile Internet devices (MID, Mobile Internet Devices), or wearable devices (such as smart watches and Bluetooth headsets). The above are merely examples, not an exhaustive list, and include but are not limited to the above electronic devices; the electronic device may also be a server or a video matrix, which is not limited here, and may also be an Internet-of-Things device. In the embodiments of this application, the terminal and the electronic device may be the same device.

To better understand the image positioning model acquisition method provided by the embodiments of this application, the application scenario of the image positioning model determined by this method is first briefly introduced. As shown in Fig. 1a, an image positioning model 101a may be applied in an electronic device 102a. When a user needs to determine a location, for example to tell other people where the user currently is, the user may use the electronic device 102a to capture an image near the current location; for example, if the user is next to building xx, the image near the current location may be an image of the area near building xx, yielding an image to be detected 103a. The electronic device performs positioning analysis and computation on the image to be detected 103a through the image positioning model 101a to obtain positioning information 104a corresponding to the image to be detected, which is the position information of the area reflected by the image (building xx). For example, the position information may be that of a landmark building in the image to be detected 103a; the landmark building may be chosen by the user or determined by the image positioning model 101a, and the position information may of course also be that of another landmark, this being only an example. The user's current location can thus be determined by the image positioning model 101a, which brings great convenience to the user. Because the image positioning model of the related art has low accuracy when positioning images, as it usually trains the initial model with a single sample pair, the image positioning model needs to be optimized through training to improve its accuracy in positioning images. The following embodiments mainly describe adjusting the initial model to improve the image positioning accuracy of the resulting image positioning model.
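As a minimal sketch of this retrieval-based scenario (the two-dimensional features, the GPS tuples, and the function name are toy assumptions; a real gallery would hold street-view images with model-extracted features), positioning returns the GPS tag of the gallery image whose feature best matches the query:

```python
import numpy as np

def localize(query_feat, gallery_feats, gallery_gps):
    """Return the GPS tag of the gallery image whose feature is closest
    to the query feature (cosine similarity on normalised features)."""
    q = query_feat / np.linalg.norm(query_feat)
    g = gallery_feats / np.linalg.norm(gallery_feats, axis=1, keepdims=True)
    best = int(np.argmax(g @ q))
    return gallery_gps[best]

# toy gallery: two reference images with unit-norm features and GPS tags
gallery_feats = np.array([[1.0, 0.0],
                          [0.0, 1.0]])
gallery_gps = [(39.9069, 116.3912), (31.2304, 121.4737)]

loc = localize(np.array([0.9, 0.1]), gallery_feats, gallery_gps)
```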
As shown in Fig. 1b, the image positioning model acquisition method is applied to an electronic device and includes steps 101b to 103b, as follows:

101b. The electronic device determines, according to a first image positioning model, the similarity between a target image and K first sample positioning images to obtain a first similarity vector, where K is an integer greater than 1.

The K first sample positioning images may be sample images determined according to the GPS (Global Positioning System) positioning information of the target image; for example, they may be images within a preset range of the position indicated by the GPS positioning information of the target image, such as map images within 10 meters of that position. The target image may be captured by a mobile terminal such as a mobile phone or a computer, and may be used to determine sample pairs for adjusting the initial model; that is, the target image and the K first sample positioning images are the sample pairs used to adjust the initial model. The preset range may be set through empirical values or historical data.
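One possible way to collect such candidate images, sketched here under the assumption that the gallery stores plain (latitude, longitude) tags and that the 10-meter radius mentioned above is used, is a great-circle distance filter:

```python
import math

def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance between two GPS fixes, in metres."""
    r = 6371000.0                       # mean Earth radius
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dp = math.radians(lat2 - lat1)
    dl = math.radians(lon2 - lon1)
    a = math.sin(dp / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dl / 2) ** 2
    return 2 * r * math.asin(math.sqrt(a))

def candidate_images(target_gps, gallery, radius_m=10.0):
    """Keep gallery entries whose GPS tag lies within radius_m of the target."""
    lat, lon = target_gps
    return [name for name, (glat, glon) in gallery.items()
            if haversine_m(lat, lon, glat, glon) <= radius_m]

# toy gallery with GPS tags: two images a few metres away, one ~1 km away
gallery = {
    "img_a.jpg": (39.90690, 116.39123),
    "img_b.jpg": (39.90695, 116.39120),
    "img_c.jpg": (39.91500, 116.40000),
}
picked = candidate_images((39.90692, 116.39121), gallery)
```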
The similarity labels between the K first sample positioning images and the target image may be values between 0 and 1, and may include 0 or 1. Fig. 2a shows one possible target image and first sample positioning images, where the similarity labels between the first sample positioning images and the target image include 0.45, 0.35, and so on.

The elements of the first similarity vector may include the similarities between the target image and the first sample positioning images, and the similarities between the target image and the sub-images obtained by splitting the first sample positioning images. Splitting a first sample positioning image yields multiple sub-first-sample positioning images; when splitting, the image may be split into multiple sub-first-sample positioning images of equal area, or into multiple sub-first-sample positioning images of different areas.

The electronic device may be used to adjust the initial model, or both to adjust the initial model and to perform image positioning with the image positioning model.

102b. The electronic device determines a first target loss function according to the first similarity vector.

A corresponding loss function may be determined from the first similarity vector, and the first target loss function is determined at least from that loss function.

103b. The electronic device adjusts an initial model according to the first target loss function to obtain a second image positioning model, the initial model being a model obtained after the first image positioning model is initialized.

The initial model is trained with a sample set including the target image and the K first sample positioning images, together with the first target loss function, to obtain the second image positioning model. That the initial model is obtained after the first image positioning model is initialized can be understood as initializing the model parameters of the first image positioning model to obtain the initial model. The second image positioning model is a model obtained by training the initial model with the sample set including the target image and the K first sample positioning images.

In this example, the similarity between the target image and the K first sample positioning images is determined by the first image positioning model to obtain a first similarity vector; a first target loss function is determined from this similarity vector; and the initial model is adjusted according to the first target loss function to obtain a second image positioning model. Similarity-supervised learning is thus performed on the initial model with a first target loss function determined from the first image positioning model, the target image, and the K first sample positioning images, which improves the accuracy of the second image positioning model in image positioning.

In a possible embodiment, a possible method for determining, according to the first image positioning model, the similarity between the target image and the K first sample positioning images to obtain the first similarity vector includes steps A1 to A4, as follows: A1. Split each of the K first sample positioning images to obtain N sub-first-sample positioning images corresponding to each first sample positioning image. A2. Determine, according to the first image positioning model, the feature values of the N sub-first-sample positioning images corresponding to each first sample positioning image, to obtain a feature vector corresponding to each first sample positioning image. A3. Determine the feature value of the target image according to the first image positioning model. A4. Determine the first similarity vector according to the feature vector corresponding to each first sample positioning image and the feature value of the target image.

When splitting a first sample positioning image, the image may be split into multiple sub-images of equal area or into multiple sub-images of different areas. One possible splitting scheme is to split the first sample positioning image into two sub-first-sample positioning images of equal area, and to split it into four sub-first-sample positioning images of equal area. As shown in Fig. 2b, the first sample positioning image may be split into upper and lower halves, or into left and right halves; as shown in Fig. 2c, it may be split into four quarters of equal area.

The N sub-first-sample images may include sub-first-sample images obtained by several different splitting schemes, for example all sub-first-sample positioning images obtained by the splitting schemes of Fig. 2b and Fig. 2c, in which case N = 8; N may of course be any other value, this being only an example and not a limitation.
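The splitting of Figs. 2b and 2c (four half-area regions plus four quarter-area regions, so N = 8) can be sketched as follows; the array-slicing approach and the rounding behavior for odd sizes are illustrative assumptions:

```python
import numpy as np

def split_regions(img):
    """Split an H x W image into the 8 sub-regions used as region-level
    samples: top/bottom/left/right halves plus the four quarters."""
    h, w = img.shape[:2]
    halves = [img[: h // 2], img[h // 2 :],          # top, bottom
              img[:, : w // 2], img[:, w // 2 :]]    # left, right
    quarters = [img[: h // 2, : w // 2], img[: h // 2, w // 2 :],
                img[h // 2 :, : w // 2], img[h // 2 :, w // 2 :]]
    return halves + quarters

# toy 4 x 6 "image" of pixel indices
img = np.arange(4 * 6).reshape(4, 6)
parts = split_regions(img)
```

The four quarters together tile the image exactly, while the halves overlap each other; each part can then be fed to the feature extractor like a full image.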
When determining the feature vectors and the feature value of the target image, they may be computed by the first image positioning model. The feature vector corresponding to each first sample positioning image may be expressed as:

$F_{p_i} = \left[ f_{p_i},\ f_{p_i}^{(1)},\ \ldots,\ f_{p_i}^{(8)} \right]$

where $f_{p_i}^{(1)}$ is the feature value of the first sub-first-sample positioning image of the i-th first sample positioning image.

The first similarity vector may be obtained by a softmax-normalized computation, for example by the method shown in the following formula (1):

$\hat{s}_{\omega} = \operatorname{softmax}\left( \frac{f_q^{\top} f_{p_1}}{\tau_{\omega}},\ \frac{f_q^{\top} f_{p_1}^{(1)}}{\tau_{\omega}},\ \ldots,\ \frac{f_q^{\top} f_{p_K}^{(8)}}{\tau_{\omega}} \right) \qquad (1)$

where $\hat{s}_{\omega}$ is the first similarity vector, softmax is the normalization operation, $\tau_{\omega}$ is a hyperparameter (temperature coefficient), $f_q$ is the feature value of the target image, $f_{p_1}$ is the feature value of the first sample positioning image p1, $f_{p_1}^{(1)}$ is the feature value of the first sub-first-sample positioning image of the first sample image p1, $f_{p_K}$ is the feature value of the first sample positioning image pK, and $f_{p_K}^{(8)}$ is the feature value of the eighth sub-first-sample positioning image of the first sample image pK.
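Formula (1) can be sketched numerically as follows; the feature dimension, the number of candidate images, and the temperature value are illustrative assumptions:

```python
import numpy as np

def region_similarity_vector(f_q, image_feats, region_feats, tau=0.07):
    """Softmax-normalised similarities between the query feature f_q and
    every candidate image feature plus every sub-region feature."""
    feats = np.concatenate([image_feats, region_feats], axis=0)
    logits = feats @ f_q / tau
    logits -= logits.max()            # numerical stability
    e = np.exp(logits)
    return e / e.sum()

rng = np.random.default_rng(0)
f_q = rng.normal(size=8)
f_q /= np.linalg.norm(f_q)
image_feats = rng.normal(size=(3, 8))                       # K = 3 images
image_feats /= np.linalg.norm(image_feats, axis=1, keepdims=True)
region_feats = rng.normal(size=(24, 8))                     # 8 sub-regions each
region_feats /= np.linalg.norm(region_feats, axis=1, keepdims=True)

s = region_similarity_vector(f_q, image_feats, region_feats)
```

The resulting vector has K*(1+8) entries that sum to 1, one per candidate image or sub-region, matching the structure described for the first similarity vector.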
In this example, each of the K first sample positioning images is split into N sub-first-sample positioning images, and the first similarity vector is determined from the feature values of these K*N sub-first-sample positioning images and the feature value of the target image, so that the first similarity vector can be determined at a fine granularity. This improves how accurately the first similarity vector reflects the samples, and thereby improves the accuracy of determining the second image positioning model.

In a possible embodiment, a possible method for determining the first target loss function according to the first similarity vector includes steps B1 to B3, as follows: B1. Determine a first sub-loss function according to the first similarity vector. B2. Determine a second sub-loss function according to the difficult negative sample image corresponding to the target image. B3. Determine the first target loss function according to the first sub-loss function and the second sub-loss function.

The first sub-loss function may be determined according to the similarity vector between the target image and the first sample positioning images determined by the first image positioning model, that is, the first similarity vector. The difficult negative sample image corresponding to the target image can be understood as a negative sample image of the target image whose similarity is below a preset threshold, where the preset threshold may be set through empirical values or historical data. When determining the second sub-loss function, it may be determined by the method shown in the following formula (2):

$L_{hard}(\theta_{\omega}) = \max\left( 0,\ m + \left\| f_q - f_{p^{*}} \right\|_2^2 - \left\| f_q - f_{n^{*}} \right\|_2^2 \right) \qquad (2)$

where $L_{hard}(\theta_{\omega})$ is the second sub-loss function, $m$ is the margin, $f_{p^{*}}$ is the feature value of the positive sample image with the highest similarity label among the K first sample positioning images, $f_{n^{*}}$ is the feature value of the negative sample image with the lowest similarity label, and K is the number of first sample positioning images.
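Formula (2) can be sketched as a margin ranking loss; the margin value and the toy two-dimensional features are illustrative assumptions:

```python
import numpy as np

def hard_triplet_loss(f_q, f_pos, f_neg, margin=0.1):
    """max(0, margin + ||f_q - f_pos||^2 - ||f_q - f_neg||^2):
    zero when the positive is already closer than the negative by the
    margin, positive otherwise."""
    d_pos = np.sum((f_q - f_pos) ** 2)
    d_neg = np.sum((f_q - f_neg) ** 2)
    return max(0.0, margin + d_pos - d_neg)

f_q = np.array([1.0, 0.0])
f_pos = np.array([0.9, 0.1])     # close to the query
f_neg = np.array([-1.0, 0.0])    # far from the query

loss_easy = hard_triplet_loss(f_q, f_pos, f_neg)
loss_hard = hard_triplet_loss(f_q, f_neg, f_pos)  # roles swapped
```

When the positive is near and the negative is far the loss vanishes; with the roles swapped the loss is large, which is what drives the model to pull positives closer than negatives.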
A weighted operation may be performed on the first sub-loss function and the second sub-loss function to obtain the first target loss function.

In this example, the first target loss function can be determined from the first sub-loss function determined by the first similarity vector and the second sub-loss function determined by the difficult negative sample image corresponding to the target image, that is, from an accurate first similarity vector and from the hard-negative-based second sub-loss function, which improves the accuracy of determining the first target loss function.

In a possible embodiment, a possible method for determining the first sub-loss function according to the first similarity vector includes steps C1 to C2, as follows: C1. Obtain the similarity between the target image and the K first sample positioning images according to the initial model, to obtain a second similarity vector. C2. Determine the first sub-loss function according to the first similarity vector and the second similarity vector.

For obtaining the second similarity vector, refer to the method of obtaining the first similarity vector in the foregoing embodiment, with the computation performed by the initial model to obtain the second similarity vector.

The first sub-loss function may be obtained by a cross-entropy operation on the first similarity vector and the second similarity vector. For example, the first sub-loss function may be obtained as shown in the following formula (3):

$L_{soft}(\theta_{\omega}) = \ell_{ce}\left( s_{\theta_{\omega}},\ \hat{s}_{\omega-1} \right) \qquad (3)$

where $L_{soft}(\theta_{\omega})$ is the first sub-loss function, $s_{\theta_{\omega}}$ is the second similarity vector, $\hat{s}_{\omega-1}$ is the first similarity vector, $\ell_{ce}(\cdot)$ is the cross-entropy operation, and $\omega$ is a positive integer greater than or equal to 2. When the above formula is used to express multiple adjustments, $\omega$ can be understood as the number of adjustments.

$\ell_{ce}(\cdot)$ may be expressed by the following formula (4):

$\ell_{ce}(y, \hat{y}) = -\sum_{i} \hat{y}_{i} \log y_{i} \qquad (4)$

where $y$ and $\hat{y}$ are the elements on which the cross-entropy operation is performed.
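Formulas (3) and (4) can be sketched as follows; the three-element similarity vectors are toy values chosen for illustration:

```python
import numpy as np

def cross_entropy(pred, target):
    """l_ce of formula (4): -sum_i target_i * log(pred_i)."""
    pred = np.clip(pred, 1e-12, 1.0)   # guard against log(0)
    return -np.sum(target * np.log(pred))

teacher = np.array([0.7, 0.2, 0.1])   # first similarity vector (generation w-1)
student = np.array([0.6, 0.3, 0.1])   # second similarity vector (generation w)
matched = np.array([0.7, 0.2, 0.1])   # a student that matches the teacher

l_student = cross_entropy(student, teacher)
l_matched = cross_entropy(matched, teacher)
```

The loss is minimized when the second similarity vector reproduces the first one, which is exactly how the previous-generation predictions supervise the current model.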
In this example, the first sub-loss function can be determined from the second similarity vector determined by the initial model and the first similarity vector, so that the similarity vector determined by the first image positioning model supervises the second similarity vector. This improves the accuracy of determining the first sub-loss function and, because the first similarity vector supervises the second similarity vector, also improves the accuracy of the second image positioning model when performing image positioning.

In a possible embodiment, a possible method for determining the first target loss function according to the first sub-loss function and the second sub-loss function may be:

operating on the first sub-loss function and the second sub-loss function according to the loss weighting factors corresponding to them, to obtain the first target loss function.

The loss weighting factors correspond to the first sub-loss function and the second sub-loss function. One possible correspondence is: the loss weighting factor of the first sub-loss function is $\lambda$, and the loss weighting factor of the second sub-loss function is 1.

The first target loss function is then obtained by the method shown in the following formula (5):

$L(\theta_{\omega}) = L_{hard}(\theta_{\omega}) + \lambda\, L_{soft}(\theta_{\omega}) \qquad (5)$

where $L(\theta_{\omega})$ is the first target loss function, $L_{hard}(\theta_{\omega})$ is the second sub-loss function, $L_{soft}(\theta_{\omega})$ is the first sub-loss function, and $\lambda$ is the weighting factor.
In a possible embodiment, an image to be marked may also be marked to obtain similarity labels between the image to be marked and the corresponding sample positioning images, which may include steps D1 to D4: D1. Receive the image to be marked. D2. Obtain K second sample positioning images corresponding to the image to be marked. D3. Split each of the K second sample positioning images to obtain N sub-second-sample positioning images corresponding to each second sample positioning image. D4. Determine, by the second image positioning model, the similarity labels between the image to be marked and the N sub-second-sample positioning images corresponding to each second sample positioning image.

For obtaining the second sample positioning images, refer to the method of obtaining the first sample positioning images in the foregoing embodiments, which is not repeated here. For step D3, refer to the method shown in step A1 above, which is not repeated here.

When obtaining the similarity labels, the computation may be performed by the second image positioning model to obtain the similarity labels between the image to be marked and the N sub-second-sample positioning images corresponding to each second sample positioning image. In the computation, the similarity may be determined from the distance between the feature vector of the image to be marked and the feature vectors of the N sub-sample positioning images, and that similarity is determined as the corresponding similarity label.

In this example, the second image positioning model determines the similarity labels between the image to be marked and the N sub-second-sample positioning images of each second sample positioning image. Compared with related solutions, in which similarity labels are determined by an image positioning model trained on a single (best) sample pair, this improves the accuracy of the obtained similarity labels.

In a possible embodiment, the first image positioning model includes a basic image positioning model, which is a model obtained by training with the target image and the most similar of the K first sample positioning images as a sample pair.

In a possible embodiment, a method for obtaining the first image positioning model is also included, comprising steps E1 to E3, as follows: E1. Determine a second target loss function according to the second image positioning model, the target image, and the K first sample positioning images. E2. Adjust the initial model according to the second target loss function to obtain a third image positioning model. E3. Replace the first image positioning model with the third image positioning model.

For the implementation of step E1, refer to the determination of the first target loss function in the foregoing embodiments; for the implementation of E2, refer to the method of determining the second image positioning model in the foregoing embodiments.

In a possible embodiment, the second image positioning model may be used to position an image to be detected to obtain positioning information corresponding to it, which may include steps F1 to F2, as follows: F1. Receive the image to be detected. F2. Position the image to be detected according to the second image positioning model of any of the foregoing embodiments to obtain positioning information corresponding to the image to be detected.

In this example, the image to be detected is positioned by the second image positioning model, which improves the accuracy of the obtained positioning information.

In a possible implementation, the image positioning model is adjusted multiple times according to the loss function to obtain the final image positioning model; the detailed method is as follows:

The initial model is trained with the target image and the most similar of the K first sample positioning images as a sample pair to obtain the basic image positioning model. The basic image positioning model is used to determine the similarity between the target image and the K first sample positioning images to obtain the first similarity vector, and the first sub-loss function is determined from the first similarity vector. The second sub-loss function is determined from the initial model, the target image, and the difficult negative samples corresponding to the target image. A weighted operation is performed on the first sub-loss function and the second sub-loss function to obtain the first target loss function, and the initial model is adjusted by the first target loss function to obtain the second image positioning model. The second target loss function is then determined from the second image positioning model, the target image, and the K first sample positioning images, and the initial model is adjusted and trained according to the second target loss function to obtain the third image positioning model. The above steps are repeated in this way to obtain the final image positioning model. As shown in Fig. 2d, the initial model is adjusted three times; in the first adjustment the K first sample images have already been split (not shown in the figure). The similarity bars shown in the figure can be understood as similarities or as similarity labels: the higher the similarity, the larger the similarity label, and the lower the similarity, the smaller the similarity label. In Fig. 2d, the similarity labels of the sub-first-sample positioning images computed by the model after three adjustments are more accurate than those computed by the model after the first adjustment.
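The generation-by-generation procedure above can be sketched with a toy stand-in in which the "model" is simply a logit vector fitted by gradient descent to the soft labels predicted by the previous generation; the similarity values, temperature, step count, and learning rate are illustrative assumptions:

```python
import numpy as np

def softmax(z):
    z = z - z.max()
    e = np.exp(z)
    return e / e.sum()

def train_to_match(target, steps=300, lr=1.0):
    """Re-initialise a toy 'model' (a logit vector) and fit it to the soft
    labels left behind by the previous generation; (softmax(logits) - target)
    is the cross-entropy gradient with respect to the logits."""
    logits = np.zeros_like(target)
    for _ in range(steps):
        logits -= lr * (softmax(logits) - target)
    return logits

raw_sims = np.array([0.9, 0.7, 0.2])     # fixed query/gallery similarities
labels = softmax(raw_sims / 0.1)         # generation-1 predictions

for gen in range(2, 4):                  # generations 2 and 3
    logits = train_to_match(labels)      # re-initialised model, supervised
    labels = softmax(logits)             # its predictions supervise the next
```

Each generation starts from a fresh initialization but inherits the previous generation's predictions as supervision, which is the iterative scheme the procedure describes.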
Refer to Fig. 3, a schematic flowchart of another image positioning model acquisition method provided by an embodiment of this application. As shown in Fig. 3, the image positioning model acquisition method includes steps 301 to 306, as follows:

301. Split each of the K first sample positioning images to obtain N sub-first-sample positioning images corresponding to each first sample positioning image, where K is an integer greater than 1.

The K first sample positioning images may be sample images determined according to the GPS positioning information of the target image; for example, they may be images within a preset range of the position indicated by the GPS positioning information of the target image, such as map images within 10 meters of that position. The preset range may be set through empirical values or historical data.

302. Determine, according to the first image positioning model, the feature values of the N sub-first-sample positioning images corresponding to each first sample positioning image, to obtain a feature vector corresponding to each first sample positioning image.

The feature vector includes multiple elements.

303. Determine the feature value of the target image according to the first image positioning model.

304. Determine the first similarity vector according to the feature vector corresponding to each first sample positioning image and the feature value of the target image.

305. Determine the first target loss function according to the first similarity vector.

306. Adjust the initial model according to the first target loss function to obtain a second image positioning model, the initial model being a model obtained after the first image positioning model is initialized.

The initial model is trained with a sample set including the target image and the K first sample positioning images, together with the first target loss function, to obtain the second image positioning model. The initial model is a model obtained after the first image positioning model is initialized, which can be understood as initializing the model parameters of the first image positioning model to obtain the initial model. The first image positioning model is a model obtained by training the initial model with a sample set including the target image and the K first sample positioning images.

In this example, each of the K first sample positioning images is split into N sub-first-sample positioning images, and the first similarity vector is determined from the feature values of these K*N sub-first-sample positioning images and the feature value of the target image, so that the first similarity vector can be determined at a fine granularity. This improves how accurately the first similarity vector reflects the samples and in turn improves the accuracy of determining the second image positioning model.

Refer to Fig. 4, a schematic flowchart of yet another image positioning model acquisition method provided by an embodiment of this application. As shown in Fig. 4, the image positioning model acquisition method includes steps 401 to 405, as follows:

401. Determine, according to the first image positioning model, the similarity between the target image and the K first sample positioning images to obtain a first similarity vector, where K is an integer greater than 1. 402. Determine a first sub-loss function according to the first similarity vector. 403. Determine a second sub-loss function according to the difficult negative sample image corresponding to the target image. 404. Determine a first target loss function according to the first sub-loss function and the second sub-loss function. 405. Adjust the initial model according to the first target loss function to obtain a second image positioning model, the initial model being a model obtained after the first image positioning model is initialized.

The initial model is trained with a sample set including the target image and the K first sample positioning images, together with the first target loss function, to obtain the second image positioning model. The initial model is a model obtained after the first image positioning model is initialized, which can be understood as initializing the model parameters of the first image positioning model. The first image positioning model is a model obtained by training the initial model with a sample set including the target image and the K first sample positioning images.

In this example, the first target loss function can be determined from the first sub-loss function determined by the first similarity vector and the second sub-loss function determined by the difficult negative sample image corresponding to the target image, that is, from an accurate first similarity vector and from the hard-negative-based second sub-loss function, which improves the accuracy of determining the first target loss function.

Consistent with the above embodiments, refer to Fig. 5, a schematic structural diagram of a terminal provided by an embodiment of this application. As shown in the figure, the terminal includes a processor, an input device, an output device, and a memory, which are connected to each other. The memory is configured to store a computer program including program instructions, and the processor is configured to call the program instructions; the program includes instructions for executing the steps of the above image positioning model acquisition method and image positioning method.
相关技术中图像检索的方法在大规模图像定位中更为有效,图像检索的基础和关键 在于如何通过神经网络学习更有分辨力的图像特征,相关技术中用于图像定位的数据集只提供带有噪声的GPS标注,然而带有相似GPS的图像不一定涵盖相似的场景,可能面向不同方向,所以训练过程可以看作弱监督的训练,神经网络的学习需要采用较难的正样本,而相关算法中忽略了这一点。
此外,即使是正确的正样本对,它们也大概率存在没有画面重叠的区域。在基于图像级别的标注进行学习的情况下,会要求两张图片的所有区域都趋于相似,这对没有重叠的部分来说是一种误导。所以,我们需要将图像级别的标注细化为区域级别的标注,而相关的算法忽略了这一点。
在以图像检索为基础的图像定位技术中,相关的数据集仅能够提供带有噪声的GPS标签,无法有效识别正确的正样本对;相关的算法无法有效地利用较难的正样本训练网络,导致网络的鲁棒性不足;相关的算法针对图像级别进行监督,误导了正样本对中无重叠区域的训练;仅利用图像级标签进行训练,对图像区域级的监督不足;需要通过额外耗时且精度有限的算法进行图像验证,挑选正样本用于训练。
本申请实施例提出一种自监督图像相似性的算法,参见图2d,本申请提出的自监督图像-区域的相似性标签来自于上一代网络的预测,网络进行迭代训练,上一代的网络预测用于监督下一代的网络训练,从而网络的能力与自监测标签的精确度可以同步优化。其中区域级的标签通过将图片拆分成四张1/2区域和四张1/4区域的图像组成。
With the embodiments of the present application, self-enhanced labels can be used effectively for supervised learning of image similarity, with label accuracy and network capability strengthening together, so that hard positive samples are fully exploited during training and robustness is enhanced. Image-level labels are refined into region-level labels, and image-region similarity is learned in a self-supervised manner, reducing the interference of noisy labels with network learning. The method achieves state-of-the-art recognition on retrieval-based image positioning and self-supervises image-region similarity effectively, enhancing network robustness. A neural network trained with this algorithm can extract features of a target image and retrieve them against street-view images to determine where the image was taken, and the robustness of a neural network can be improved under self-supervision.
The above mainly introduces the solutions of the embodiments of the present application from the perspective of the method-side execution process. It can be understood that, to implement the above functions, the terminal includes corresponding hardware structures and/or software modules for performing each function. Those skilled in the art should readily appreciate that, in combination with the units and algorithm steps of the examples described in the embodiments provided herein, the present application can be implemented in hardware or in a combination of hardware and computer software. Whether a function is performed by hardware or by computer software driving hardware depends on the specific application and design constraints of the technical solution. Skilled persons may use different methods to implement the described functions for each specific application, but such implementations should not be considered beyond the scope of the present application.
In the embodiments of the present application, the terminal may be divided into functional units according to the above method examples; for example, each functional unit may correspond to one function, or two or more functions may be integrated into one processing unit. The integrated unit may be implemented in the form of hardware or in the form of a software functional unit. It should be noted that the division of units in the embodiments of the present application is illustrative and is merely a division by logical function; other divisions are possible in actual implementation.
Consistent with the above, referring to FIG. 6, FIG. 6 is a schematic structural diagram of an image positioning model acquisition apparatus provided by an embodiment of the present application. As shown in FIG. 6, the apparatus includes: a first determining unit 601 configured to determine, according to a first image positioning model, similarities between a target image and K first sample positioning images to obtain a first similarity vector, K being an integer greater than 1; a second determining unit 602 configured to determine a first target loss function according to the first similarity vector; and an adjusting unit 603 configured to adjust an initial model according to the first target loss function to obtain a second image positioning model, the initial model being a model obtained by initializing the first image positioning model.
In a possible implementation, the first determining unit 601 is configured to: split each of the K first sample positioning images to obtain N sub-images corresponding to each first sample positioning image; determine, according to the first image positioning model, feature values of the N sub-images corresponding to each first sample positioning image to obtain a feature vector corresponding to each first sample positioning image; determine a feature value of the target image according to the first image positioning model; and determine the first similarity vector according to the feature vector corresponding to each first sample positioning image and the feature value of the target image.
In a possible implementation, the second determining unit 602 is configured to: determine a first sub-loss function according to the first similarity vector; determine a second sub-loss function according to a hard negative sample image corresponding to the target image; and determine the first target loss function according to the first sub-loss function and the second sub-loss function.
In a possible implementation, in determining the first sub-loss function according to the first similarity vector, the second determining unit 602 is configured to: acquire, according to the initial model, similarities between the target image and the K first sample positioning images to obtain a second similarity vector; and determine the first sub-loss function according to the first similarity vector and the second similarity vector.
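One way to realize the first sub-loss from the two similarity vectors is a softened cross-entropy between the previous-generation (teacher) similarity distribution, i.e. the first similarity vector, and the distribution produced by the model being trained, i.e. the second similarity vector. The cross-entropy form and the temperature value are assumptions; the embodiment does not fix the loss form.

```python
import numpy as np

def softmax(x, t=1.0):
    # Numerically stable softmax with temperature t.
    z = np.asarray(x, dtype=float) / t
    z -= z.max()
    e = np.exp(z)
    return e / e.sum()

def soft_similarity_loss(teacher_sims, student_sims, t=0.07):
    # First sub-loss (assumed form): cross-entropy between the
    # teacher distribution p (first similarity vector) and the
    # student distribution q (second similarity vector).
    p = softmax(teacher_sims, t)
    q = softmax(student_sims, t)
    return float(-(p * np.log(q + 1e-12)).sum())
```

When the student matches the teacher exactly, the loss reduces to the entropy of the teacher distribution, its minimum for that target.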
In a possible implementation, in determining the first target loss function according to the first sub-loss function and the second sub-loss function, the second determining unit 602 is configured to: operate on the first sub-loss function and the second sub-loss function according to loss weighting factors corresponding to them, to obtain the first target loss function.
In a possible implementation, the apparatus is further configured to: receive an image to be labeled; acquire K second sample positioning images corresponding to the image to be labeled; split each of the K second sample positioning images to obtain N sub-images corresponding to each second sample positioning image; and determine, through the second image positioning model, similarity labels between the image to be labeled and the N sub-images corresponding to each second sample positioning image.
In a possible implementation, the first image positioning model includes a base image positioning model, the base image positioning model being a model trained using the target image and the image with the highest similarity among the K first sample positioning images as a sample pair.
In a possible implementation, the apparatus is further configured to: determine a second target loss function according to the second image positioning model, the target image, and the K first sample positioning images; adjust the initial model according to the second target loss function to obtain a third image positioning model; and replace the first image positioning model with the third image positioning model.
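The generation-replacement scheme above can be sketched as an outer loop in which each newly trained model becomes the teacher for the next generation. Here `train_generation` is a hypothetical callback standing in for one full training pass of a re-initialized model under the current teacher's predicted labels; its signature is an assumption for illustration only.

```python
def iterative_self_training(initial_params, train_generation, num_generations=3):
    # Generation swap: the base model is trained without a teacher,
    # then each subsequent generation is trained from re-initialized
    # parameters under labels predicted by the previous generation
    # and replaces it as the teacher (third model replaces first, etc.).
    teacher = train_generation(initial_params, teacher=None)  # base model
    for _ in range(num_generations):
        student = train_generation(initial_params, teacher=teacher)
        teacher = student
    return teacher
```

The loop makes explicit that label quality and model quality co-evolve: each pass is supervised by a strictly more recent model.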
Referring to FIG. 7, FIG. 7 is a schematic structural diagram of an image positioning apparatus provided by an embodiment of the present application. As shown in FIG. 7, the apparatus includes: a receiving unit 701 configured to receive an image to be detected; and a positioning unit 702 configured to position the image to be detected according to the second image positioning model of any of the above embodiments, to obtain positioning information corresponding to the image to be detected.
An embodiment of the present application further provides a computer storage medium, where the computer storage medium stores a computer program configured for electronic data exchange, and the computer program causes a computer to perform some or all of the steps of any image positioning model acquisition method or image positioning method described in the above method embodiments.
An embodiment of the present application further provides a computer program product, the computer program product comprising a non-transitory computer-readable storage medium storing a computer program, the computer program causing a computer to perform some or all of the steps of any image positioning model acquisition method or image positioning method described in the above method embodiments.
It should be noted that, for brevity, the foregoing method embodiments are each described as a series of action combinations; however, those skilled in the art should know that the present application is not limited by the described order of actions, because according to the present application some steps may be performed in other orders or simultaneously. Those skilled in the art should also know that the embodiments described in the specification are all preferred embodiments, and the actions and modules involved are not necessarily required by the present application.
In the above embodiments, the description of each embodiment has its own emphasis; for parts not detailed in one embodiment, reference may be made to the related descriptions of other embodiments.
In the several embodiments provided in the present application, it should be understood that the disclosed apparatus may be implemented in other ways. For example, the apparatus embodiments described above are merely illustrative; the division of the units is only a division by logical function, and other divisions are possible in actual implementation: multiple units or components may be combined or integrated into another system, or some features may be ignored or not performed. Furthermore, the mutual couplings, direct couplings, or communication connections shown or discussed may be indirect couplings or communication connections through some interfaces, apparatuses, or units, and may be electrical or take other forms.
The units described as separate components may or may not be physically separate, and the components shown as units may or may not be physical units; that is, they may be located in one place or distributed over multiple network units. Some or all of the units may be selected according to actual needs to achieve the purpose of the solutions of the embodiments.
In addition, the functional units in the embodiments of the present application may be integrated into one processing unit, or each unit may physically exist alone, or two or more units may be integrated into one unit. The integrated unit may be implemented in the form of hardware or in the form of a software program module.
If the integrated unit is implemented in the form of a software program module and sold or used as an independent product, it may be stored in a computer-readable memory. Based on such an understanding, the technical solution of the present application, in essence, or the part contributing to the prior art, or all or part of the technical solution, may be embodied in the form of a software product. The computer software product is stored in a memory and includes several instructions for causing a computer device (which may be a personal computer, a server, a network device, or the like) to perform all or some of the steps of the methods described in the embodiments of the present application. The aforementioned memory includes various media that can store program code, such as a USB flash drive, a read-only memory (ROM), a random access memory (RAM), a removable hard disk, a magnetic disk, or an optical disc.
Those of ordinary skill in the art can understand that all or some of the steps in the various methods of the above embodiments can be completed by a program instructing relevant hardware; the program may be stored in a computer-readable memory, and the memory may include a flash drive, a read-only memory, a random access memory, a magnetic disk, an optical disc, or the like.
The embodiments of the present application have been introduced in detail above, and specific examples have been used herein to explain the principles and implementations of the present application. The descriptions of the above embodiments are only intended to help understand the method and core idea of the present application. Meanwhile, those of ordinary skill in the art, based on the idea of the present application, may make changes to the specific implementations and application scope. In summary, the content of this specification should not be construed as limiting the present application.
Industrial Applicability
In the embodiments, similarities between a target image and K first sample positioning images are determined through a first image positioning model to obtain a first similarity vector; a first target loss function is determined according to that similarity vector; and an initial model is adjusted according to the first target loss function to obtain a second image positioning model. The initial model thus undergoes similarity-supervised learning under a first target loss function determined from the first image positioning model, the target image, and the K first sample positioning images, which improves the accuracy of the second image positioning model when performing image positioning.

Claims (21)

  1. An image positioning model acquisition method, the method comprising:
    determining, according to a first image positioning model, similarities between a target image and K first sample positioning images to obtain a first similarity vector, K being an integer greater than 1;
    determining a first target loss function according to the first similarity vector; and
    adjusting an initial model according to the first target loss function to obtain a second image positioning model, the initial model being a model obtained by initializing the first image positioning model.
  2. The method according to claim 1, wherein the determining, according to a first image positioning model, similarities between a target image and K first sample positioning images to obtain a first similarity vector comprises:
    splitting each of the K first sample positioning images to obtain N sub-images corresponding to each first sample positioning image;
    determining, according to the first image positioning model, feature values of the N sub-images corresponding to each first sample positioning image to obtain a feature vector corresponding to each first sample positioning image;
    determining a feature value of the target image according to the first image positioning model; and
    determining the first similarity vector according to the feature vector corresponding to each first sample positioning image and the feature value of the target image.
  3. The method according to claim 1 or 2, wherein the determining a first target loss function according to the first similarity vector comprises:
    determining a first sub-loss function according to the first similarity vector;
    determining a second sub-loss function according to a hard negative sample image corresponding to the target image; and
    determining the first target loss function according to the first sub-loss function and the second sub-loss function.
  4. The method according to claim 3, wherein the determining a first sub-loss function according to the first similarity vector comprises:
    acquiring, according to the initial model, similarities between the target image and the K first sample positioning images to obtain a second similarity vector; and
    determining the first sub-loss function according to the first similarity vector and the second similarity vector.
  5. The method according to claim 3 or 4, wherein the determining the first target loss function according to the first sub-loss function and the second sub-loss function comprises:
    operating on the first sub-loss function and the second sub-loss function according to loss weighting factors corresponding to the first sub-loss function and the second sub-loss function, to obtain the first target loss function.
  6. The method according to any one of claims 1 to 5, wherein the method further comprises:
    receiving an image to be labeled;
    acquiring K second sample positioning images corresponding to the image to be labeled;
    splitting each of the K second sample positioning images to obtain N sub-images corresponding to each second sample positioning image; and
    determining, through the second image positioning model, similarity labels between the image to be labeled and the N sub-images corresponding to each second sample positioning image.
  7. The method according to any one of claims 1 to 6, wherein the first image positioning model comprises a base image positioning model, the base image positioning model being a model trained using the target image and the image with the highest similarity among the K first sample positioning images as a sample pair.
  8. The method according to any one of claims 1 to 7, wherein the method further comprises:
    determining a second target loss function according to the second image positioning model, the target image, and the K first sample positioning images;
    adjusting the initial model according to the second target loss function to obtain a third image positioning model; and
    replacing the first image positioning model with the third image positioning model.
  9. An image positioning method, the method comprising:
    receiving an image to be detected; and
    positioning the image to be detected according to the second image positioning model as set forth in any one of claims 1 to 8, to obtain positioning information corresponding to the image to be detected.
  10. An image positioning model acquisition apparatus, the apparatus comprising: a first determining unit configured to determine, according to a first image positioning model, similarities between a target image and K first sample positioning images to obtain a first similarity vector, K being an integer greater than 1; a second determining unit configured to determine a first target loss function according to the first similarity vector; and an adjusting unit configured to adjust an initial model according to the first target loss function to obtain a second image positioning model, the initial model being a model obtained by initializing the first image positioning model.
  11. The apparatus according to claim 10, wherein the first determining unit is configured to: split each of the K first sample positioning images to obtain N sub-images corresponding to each first sample positioning image; determine, according to the first image positioning model, feature values of the N sub-images corresponding to each first sample positioning image to obtain a feature vector corresponding to each first sample positioning image; determine a feature value of the target image according to the first image positioning model; and determine the first similarity vector according to the feature vector corresponding to each first sample positioning image and the feature value of the target image.
  12. The apparatus according to claim 10 or 11, wherein the second determining unit is configured to: determine a first sub-loss function according to the first similarity vector; determine a second sub-loss function according to a hard negative sample image corresponding to the target image; and determine the first target loss function according to the first sub-loss function and the second sub-loss function.
  13. The apparatus according to claim 12, wherein the second determining unit is configured to: acquire, according to the initial model, similarities between the target image and the K first sample positioning images to obtain a second similarity vector; and determine the first sub-loss function according to the first similarity vector and the second similarity vector.
  14. The apparatus according to claim 12 or 13, wherein the second determining unit is configured to: operate on the first sub-loss function and the second sub-loss function according to loss weighting factors corresponding to the first sub-loss function and the second sub-loss function, to obtain the first target loss function.
  15. The apparatus according to any one of claims 10 to 14, wherein the apparatus is further configured to: receive an image to be labeled; acquire K second sample positioning images corresponding to the image to be labeled; split each of the K second sample positioning images to obtain N sub-images corresponding to each second sample positioning image; and determine, through the second image positioning model, similarity labels between the image to be labeled and the N sub-images corresponding to each second sample positioning image.
  16. The apparatus according to any one of claims 10 to 15, wherein the first image positioning model comprises a base image positioning model, the base image positioning model being a model trained using the target image and the image with the highest similarity among the K first sample positioning images as a sample pair.
  17. The apparatus according to any one of claims 10 to 16, wherein the apparatus is further configured to: determine a second target loss function according to the second image positioning model, the target image, and the K first sample positioning images; adjust the initial model according to the second target loss function to obtain a third image positioning model; and replace the first image positioning model with the third image positioning model.
  18. An image positioning apparatus, the apparatus comprising: a receiving unit configured to receive an image to be detected; and a positioning unit configured to position the image to be detected according to the second image positioning model in the method according to any one of claims 1 to 8, to obtain positioning information corresponding to the image to be detected.
  19. A terminal, comprising a processor, an input device, an output device, and a memory that are connected to one another, wherein the memory is configured to store a computer program comprising program instructions, and the processor is configured to invoke the program instructions to perform the method according to any one of claims 1 to 9.
  20. A computer-readable storage medium storing a computer program, the computer program comprising program instructions which, when executed by a processor, cause the processor to perform the method according to any one of claims 1 to 9.
  21. A computer program product, comprising computer-readable code which, when run in an electronic device, causes a processor in the electronic device to perform the method according to any one of claims 1 to 9.
PCT/CN2020/113099 2020-05-29 2020-09-02 Image positioning model acquisition method and apparatus, terminal and storage medium WO2021237973A1 (zh)

Applications Claiming Priority (2)

CN202010478436.7, priority date 2020-05-29
CN202010478436.7A (CN111522988B), filed 2020-05-29 — Image positioning model acquisition method and related apparatus

Publications (1)

WO2021237973A1, published 2021-12-02

Family Applications (1)

PCT/CN2020/113099 (WO2021237973A1), priority date 2020-05-29, filed 2020-09-02 — Image positioning model acquisition method and apparatus, terminal and storage medium


Also Published As

Publication number Publication date
CN111522988A (zh) 2020-08-11
CN111522988B (zh) 2022-07-15
TW202145075A (zh) 2021-12-01
TWI780563B (zh) 2022-10-11


Legal Events

121 — Ep: the EPO has been informed by WIPO that EP was designated in this application (ref document: 20937387, country: EP, kind code: A1)
NENP — Non-entry into the national phase (ref country code: DE)
32PN — Ep: public notification in the EP bulletin as address of the addressee cannot be established (free format text: NOTING OF LOSS OF RIGHTS PURSUANT TO RULE 112(1) EPC (EPO FORM 1205A DATED 15.03.2023))
122 — Ep: PCT application non-entry in European phase (ref document: 20937387, country: EP, kind code: A1)