CN116977437A - Image geographic position positioning method and device and electronic equipment - Google Patents

Image geographic position positioning method and device and electronic equipment Download PDF

Info

Publication number
CN116977437A
CN116977437A CN202311226356.2A
Authority
CN
China
Prior art keywords
target image
image
network model
capsule
geographic position
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202311226356.2A
Other languages
Chinese (zh)
Inventor
管冬冬
潘乐飞
马峰
杨晓云
雷刚
杨奇松
李亚雄
李少朋
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Rocket Force University of Engineering of PLA
Original Assignee
Rocket Force University of Engineering of PLA
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Rocket Force University of Engineering of PLA filed Critical Rocket Force University of Engineering of PLA
Priority to CN202311226356.2A priority Critical patent/CN116977437A/en
Publication of CN116977437A publication Critical patent/CN116977437A/en
Pending legal-status Critical Current

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00Image analysis
    • G06T7/70Determining position or orientation of objects or cameras
    • G06T7/73Determining position or orientation of objects or cameras using feature-based methods
    • G06T7/74Determining position or orientation of objects or cameras using feature-based methods involving reference images or patches
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/50Information retrieval; Database structures therefor; File system structures therefor of still image data
    • G06F16/58Retrieval characterised by using metadata, e.g. metadata not derived from the content or metadata generated manually
    • G06F16/587Retrieval characterised by using metadata, e.g. metadata not derived from the content or metadata generated manually using geographical or spatial information, e.g. location
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • G06N3/0464Convolutional networks [CNN, ConvNet]
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00Arrangements for image or video recognition or understanding
    • G06V10/40Extraction of image or video features
    • G06V10/44Local feature extraction by analysis of parts of the pattern, e.g. by detecting edges, contours, loops, corners, strokes or intersections; Connectivity analysis, e.g. of connected components
    • G06V10/443Local feature extraction by analysis of parts of the pattern, e.g. by detecting edges, contours, loops, corners, strokes or intersections; Connectivity analysis, e.g. of connected components by matching or filtering
    • G06V10/449Biologically inspired filters, e.g. difference of Gaussians [DoG] or Gabor filters
    • G06V10/451Biologically inspired filters, e.g. difference of Gaussians [DoG] or Gabor filters with interaction between the filter responses, e.g. cortical complex cells
    • G06V10/454Integrating the filters into a hierarchical structure, e.g. convolutional neural networks [CNN]
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00Arrangements for image or video recognition or understanding
    • G06V10/70Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/82Arrangements for image or video recognition or understanding using pattern recognition or machine learning using neural networks
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V20/00Scenes; Scene-specific elements
    • G06V20/10Terrestrial scenes
    • G06V20/17Terrestrial scenes taken from planes or by drones
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V20/00Scenes; Scene-specific elements
    • G06V20/10Terrestrial scenes
    • G06V20/176Urban or other man-made structures
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V20/00Scenes; Scene-specific elements
    • G06V20/35Categorising the entire scene, e.g. birthday party or wedding scene
    • G06V20/38Outdoor scenes
    • G06V20/39Urban scenes

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Multimedia (AREA)
  • Evolutionary Computation (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • General Health & Medical Sciences (AREA)
  • Health & Medical Sciences (AREA)
  • Artificial Intelligence (AREA)
  • Data Mining & Analysis (AREA)
  • Molecular Biology (AREA)
  • General Engineering & Computer Science (AREA)
  • Databases & Information Systems (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Software Systems (AREA)
  • Biomedical Technology (AREA)
  • Computing Systems (AREA)
  • Library & Information Science (AREA)
  • Remote Sensing (AREA)
  • Medical Informatics (AREA)
  • Biodiversity & Conservation Biology (AREA)
  • Biophysics (AREA)
  • Computational Linguistics (AREA)
  • Mathematical Physics (AREA)
  • Image Analysis (AREA)

Abstract

The disclosure provides an image geographic position positioning method, an image geographic position positioning device and electronic equipment, wherein the method comprises the following steps: acquiring reference images of a plurality of geographic positions to obtain a training data set, wherein the reference images comprise ground street view images and aerial images, and the reference images have geographic position information and shooting angle information; constructing a capsule network model, and training the capsule network model by utilizing a training data set to obtain a trained capsule network model; inputting the target image into a trained capsule network model, extracting depth characteristics of the target image, and obtaining feature vectors of the target image; predicting the classification category of the feature vector of the target image in the training data set to obtain the geographic position information and shooting angle information of the target image. The method and the device extract the feature vectors by using the trained capsule network model, obtain the geographic position information through the classification result of the feature vectors in the training data set, and solve the problem that the geographic position information of the image is difficult to determine in the related technology.

Description

Image geographic position positioning method and device and electronic equipment
Technical Field
The disclosure relates to the technical field of image positioning, in particular to an image geographic position positioning method, an image geographic position positioning device and electronic equipment.
Background
Currently, when locating the geographic position of a target image, the position information of the target image is determined mainly by using a large number of ground-view images carrying geographic position information as references. However, ground-view reference images are mainly concentrated in places where people gather, such as cities and tourist attractions; for remote suburbs, rural areas and the like, the coverage of local ground reference images is too sparse and corresponding reference data are lacking, so the geographic position information of such images is difficult to determine.
Aiming at the problem that the geographic position information of the image is difficult to determine in the related technology, no effective technical solution is proposed at present.
Disclosure of Invention
The main objective of the present disclosure is to provide a method and an apparatus for positioning geographic position of an image, and an electronic device, so as to solve the problem that it is difficult to determine geographic position information of an image in the related art.
To achieve the above object, a first aspect of the present disclosure provides an image geographic position locating method, including:
acquiring reference images of a plurality of geographic positions to obtain a training data set, wherein the reference images comprise ground street view images and aerial images, and the reference images have geographic position information and shooting angle information;
constructing a capsule network model, and training the capsule network model by utilizing a training data set to obtain a trained capsule network model;
inputting the target image into a trained capsule network model, extracting depth characteristics of the target image, and obtaining feature vectors of the target image; and
predicting the classification category of the feature vector of the target image in the training data set to obtain the geographic position information and shooting angle information of the target image.
Optionally, acquiring reference images of a plurality of geographic locations to obtain a training dataset includes:
shooting ground panoramic images at different geographic positions by using a handheld device, and recording shooting angles of the handheld device, wherein each ground panoramic image is provided with geographic position information and shooting angle information;
cutting the ground panoramic image to obtain a plurality of ground street view images, wherein each ground street view image has geographic position information and shooting angle information;
acquiring aerial images corresponding to each ground street view image according to a preset scale, wherein each ground street view image has geographic position information;
and taking the ground street view image and the aerial image as reference images, wherein all the reference images form a training data set.
Optionally, constructing a capsule network model, and training the capsule network model using the training data set to obtain a trained capsule network model, including:
constructing a capsule network model according to the structure of a capsule network, wherein the structure of the capsule network comprises a convolution layer, a primary capsule layer and a classification capsule layer based on dynamic routing;
inputting the training data set into a capsule network model, and extracting depth features of a reference image in the training data set through the capsule network model to obtain feature vectors of the reference image;
constructing loss functions respectively for the different capsule layers of the capsule network model, which extract image features at different semantic levels, wherein the different capsule layers comprise a primary capsule layer and a classification capsule layer;
the feature vector of the reference image is sent into a loss function, and the loss function is used for measuring the error between the predicted value of the capsule network model to the sample class and the real label value;
performing back propagation based on errors, optimizing a capsule network model by adopting a random gradient descent algorithm, and completing one iteration based on a training data set;
and when the number of iterations by using the training data set reaches the preset number of iterations, obtaining a trained capsule network model and a corresponding network model weight.
Further, the loss $L_k$ of a classification capsule $v_k$ in the classification capsule layer belonging to category $k$ is predicted according to the following loss function:

$$L_k = T_k\,\max(0,\, m^{+} - \lVert v_k \rVert)^2 + \lambda\,(1 - T_k)\,\max(0,\, \lVert v_k \rVert - m^{-})^2$$

where $T_k$ is a label parameter marking whether the current sample belongs to the $k$-th class ($T_k = 1$ if it does, $T_k = 0$ otherwise), $m^{+}$ is the first constant parameter, $m^{-}$ is the second constant parameter, and $\lambda$ is a loss parameter used to reduce the loss weight of the capsule vectors of non-corresponding classes.
Optionally, inputting the target image into the trained capsule network model, extracting depth features of the target image, and obtaining feature vectors of the target image, including:
inputting the target image into a trained capsule network model, wherein the trained capsule network model corresponds to a network model weight;
and extracting depth features of the target image by using the network model weight to obtain feature vectors of the target image, wherein the feature vectors of the target image are capsule vectors.
Optionally, predicting classification categories of feature vectors of the target image in the training data set to obtain geographic position information and shooting angle information of the target image includes:
comparing the feature vector of the target image with the feature vector of the reference image in the training data set, and predicting the classification category of the geographic position of the target image in the training data set to obtain the geographic position information of the target image;
screening a geographic position mark image of the geographic position of the target image from the reference image of the training data set;
based on the feature vector and the geographic position information of the target image, comparing the target image with the geographic position mark image, and determining the relative position relation between the shooting position of the target image and the landmark position of the geographic position mark image;
and determining the angle of shooting the target image according to the relative position relationship, and obtaining shooting angle information of the target image.
Optionally, after obtaining the geographic position information and the shooting angle information of the target image, the method further includes:
the geographic position information and shooting angle information of the target image are sent to a client;
determining longitude and latitude coordinates corresponding to the geographic position information, and writing the longitude and latitude coordinates into a landmark file;
and calling an application programming interface through a handle command provided by the client, reading the landmark file, and displaying the read longitude and latitude coordinates.
A second aspect of the present disclosure provides an image geographic position locating device, comprising:
the acquisition unit is used for acquiring reference images of a plurality of geographic positions to obtain a training data set, wherein the reference images comprise ground street view images and aerial images, and the reference images have geographic position information and shooting angle information;
the training unit is used for constructing a capsule network model, and training the capsule network model by utilizing the training data set to obtain a trained capsule network model;
the extraction unit is used for inputting the target image into the trained capsule network model, extracting the depth characteristics of the target image and obtaining the characteristic vector of the target image; and
and the prediction unit is used for predicting the classification category of the feature vector of the target image in the training data set to obtain the geographic position information and shooting angle information of the target image.
A third aspect of the present disclosure provides a computer-readable storage medium storing computer instructions for causing a computer to perform the image geolocation method provided in any one of the first aspects.
A fourth aspect of the present disclosure provides an electronic device, comprising: at least one processor; and a memory communicatively coupled to the at least one processor; wherein the memory stores a computer program executable by the at least one processor to cause the at least one processor to perform the image geolocation method provided in any one of the first aspects.
In the image geographic position positioning method provided by the embodiment of the disclosure, reference images of a plurality of geographic positions are collected to obtain a training data set, wherein the reference images comprise ground street view images and aerial images, and the reference images comprise geographic position information and shooting angle information; the defect that the coverage of ground reference images in areas such as remote suburban areas or rural areas is too small is overcome by combining the ground street view images with geographic position information and aerial images;
constructing a capsule network model, and training the capsule network model by utilizing a training data set to obtain a trained capsule network model; inputting the target image into a trained capsule network model, extracting depth characteristics of the target image, and obtaining feature vectors of the target image; the trained capsule network model is utilized to extract the characteristics, so that the efficiency of extracting the characteristics is enhanced;
predicting the classification category of the feature vector of the target image in the training data set to obtain the geographic position information and shooting angle information of the target image. The geographic position positioning and shooting angle of the target image can be obtained through the classification result of the target image feature vector in the training data set, and the problem that the geographic position information of the image is difficult to determine in the related technology is solved.
Drawings
In order to more clearly illustrate the embodiments of the present disclosure or the prior art, the drawings that are required in the detailed description or the prior art will be briefly described, it will be apparent that the drawings in the following description are only some embodiments of the present disclosure and that other drawings may be obtained from these drawings without inventive effort to those of ordinary skill in the art.
Fig. 1 is a flowchart of an image geographic location positioning method according to an embodiment of the present disclosure;
fig. 2 is a basic structural diagram of a capsule network provided in an embodiment of the present disclosure;
FIG. 3 is a computational schematic of a primary capsule layer provided by an embodiment of the present disclosure;
FIG. 4 is a schematic diagram of a dynamic routing mechanism provided by an embodiment of the present disclosure;
FIG. 5 is a block diagram of an image geolocation device provided by an embodiment of the present disclosure;
fig. 6 is a block diagram of an electronic device provided in an embodiment of the present disclosure.
Detailed Description
In order that those skilled in the art will better understand the present disclosure, a technical solution in the embodiments of the present disclosure will be clearly and completely described below with reference to the accompanying drawings in which it is apparent that the described embodiments are only some embodiments of the present disclosure, not all embodiments. All other embodiments, which can be made by one of ordinary skill in the art without inventive effort, based on the embodiments in this disclosure, shall fall within the scope of the present disclosure.
It should be noted that the terms "first," "second," and the like in the description and claims of the present disclosure and in the foregoing figures are used for distinguishing between similar objects and not necessarily for describing a particular sequential or chronological order. It is to be understood that the data so used may be interchanged where appropriate in order to describe the embodiments of the disclosure herein. Furthermore, the terms "comprises," "comprising," and "having," and any variations thereof, are intended to cover a non-exclusive inclusion, such that a process, method, system, article, or apparatus that comprises a list of steps or elements is not necessarily limited to those steps or elements expressly listed but may include other steps or elements not expressly listed or inherent to such process, method, article, or apparatus.
It should be noted that, without conflict, the embodiments of the present disclosure and features of the embodiments may be combined with each other. The present disclosure will be described in detail below with reference to the accompanying drawings in conjunction with embodiments.
With the development of electronic devices, many handheld electronic devices (such as digital cameras, smart phones, unmanned aerial vehicles and aerial cameras) integrate a GPS function, and these products can acquire the geographic position information of the shooting location while taking a picture. At the same time, the rapid popularization of the internet and the rapid increase in the number of internet users mean that large amounts of user-shared multimedia data carrying geographic position information appear on the internet, and the number of geo-tagged images online is growing rapidly. In real-world applications, one of the most important attributes of an image is its geographic location: if the geographic location of a photo can be determined automatically, hundreds of additional attributes can be inferred, including any attribute that exists on a map, such as population density, average temperature, crime rate, altitude and distance; in addition, judging the geographic location of an image is a common photo-forensics task.
The image geographic position positioning is to acquire corresponding longitude and latitude coordinates of the geographic positions of images or videos shot by satellites, unmanned aerial vehicles and handheld equipment through a correlation algorithm, and the positioning method can be applied to various actual scenes. For example, useful geographic and geological features and their distribution in the region of interest can be automatically obtained and indexed through a large number of images, thereby improving decision making capability and assisting in accurate positioning in the unmanned area. Therefore, the geographic position positioning of the image has wide application prospect and is also attracting more and more researchers' interest.
Currently, when locating the geographic position of a target image, the position information of the target image is determined mainly by using a large number of ground-view images carrying geographic position information as references. However, ground-view reference images are mainly concentrated in places where people gather, such as cities and tourist attractions; for remote suburbs, rural areas and the like, the coverage of local ground reference images is too sparse and corresponding reference data are lacking, so the geographic position information of such images is difficult to determine.
With the massive use of unmanned aerial vehicles, aerial photography devices and satellites, aerial photography visual angle images with geographic position information are more and more, and a new thought is brought to solving the problem that ground reference images are too small in coverage.
The embodiment of the disclosure provides an image geographic position positioning method, as shown in fig. 1, which includes the following steps S101 to S104:
step S101: acquiring reference images of a plurality of geographic positions to obtain a training data set, wherein the reference images comprise ground street view images and aerial images, and the reference images have geographic position information and shooting angle information; classifying and storing a plurality of reference images at different geographic positions to obtain a training data set with geographic position information and shooting angle information;
by utilizing the aerial image with the geographic position information and the ground street view image, cross-view image matching can be performed, the defect that the coverage of ground reference images in areas such as remote suburbs or villages is too small is overcome, the reference images in the training data set are enriched, and the geographic position information of the target image can be accurately positioned later.
In an alternative embodiment of the present disclosure, acquiring reference images of a plurality of geographic locations in step S101, to obtain a training dataset includes:
shooting ground panoramic images at different geographic positions with a handheld device, and recording the shooting angle of the handheld device, wherein each ground panoramic image carries geographic position information and shooting angle information; the handheld device includes a digital camera, a smart phone, an unmanned aerial vehicle, an aerial camera and the like, and the shooting angle may be, for example, shooting a scene from east to west or from southeast to northwest, shooting a scene looking down or up at a certain angle, or a shooting angle set in map software; for example, for map software such as Google Maps, Amap (Gaode Maps) or Baidu Maps, the first viewing angle set in Google Maps may be selected as the shooting angle;
cutting the ground panoramic image to obtain a plurality of ground street view images, wherein each ground street view image has geographic position information and shooting angle information; cutting each ground panoramic image randomly to obtain a plurality of ground street view images;
acquiring aerial images corresponding to each ground street view image according to a preset scale, wherein each ground street view image has geographic position information; inquiring map software according to a preset scale to obtain an aerial image for the ground street view image obtained after cutting, so as to obtain a ground street view image and an aerial image with comprehensive scene coverage; wherein, preset scale can be 1:3000, the resolution ratio of the aerial image can be 1.07m, the aerial image comprises satellite aerial images, unmanned aerial vehicle aerial images, aerial device aerial images and the like, for example, in an area with a fixed geographic position, satellite aerial images of different scenes and simulated handheld device images of three-dimensional street views are respectively intercepted on Google Earth digital Earth, and the shooting angles of the handheld device images are recorded;
and taking the ground street view images and the aerial images as reference images, all of which together form the training data set. The training data set contains training data for a plurality of geographic positions; each training datum in the training data set is labeled with the true label value of the category to which the sample belongs (a minimal organization sketch is given below).
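Purely as an illustration of how such a training data set could be organized (not part of the claimed embodiments), the following Python sketch collects the cropped ground street-view images of each geographic cell together with the aerial tiles that cover it; the directory layout, file-name conventions and the GeoRecord structure are assumptions introduced only for this example.

```python
from dataclasses import dataclass
from pathlib import Path
from typing import List

@dataclass
class GeoRecord:
    image_path: Path      # cropped ground street-view image or aerial tile
    view: str             # "ground" or "aerial"
    class_id: int         # index of the geographic cell (classification category)
    lat: float            # latitude of the reference location
    lon: float            # longitude of the reference location
    heading_deg: float    # recorded shooting angle of the handheld device

def build_training_set(root: Path) -> List[GeoRecord]:
    """Collect every street-view crop and aerial tile of each geographic cell."""
    records: List[GeoRecord] = []
    for class_id, cell_dir in enumerate(sorted(root.iterdir())):
        if not cell_dir.is_dir():
            continue
        lat, lon = (float(x) for x in cell_dir.name.split("_"))   # e.g. "34.27_108.95"
        for img in sorted(cell_dir.glob("ground_*.jpg")):
            heading = float(img.stem.split("_")[-1])              # e.g. "ground_090.jpg"
            records.append(GeoRecord(img, "ground", class_id, lat, lon, heading))
        for img in sorted(cell_dir.glob("aerial_*.jpg")):
            records.append(GeoRecord(img, "aerial", class_id, lat, lon, 0.0))
    return records
```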
Step S102: constructing a capsule network model, and training the capsule network model by utilizing a training data set to obtain a trained capsule network model; the capsule network model is used as a deep neural network model, has a simple structure and high calculation efficiency, and can improve the classification accuracy of sample types; and training the capsule network model through the training data set to obtain a trained capsule network model.
In an alternative embodiment of the present disclosure, step S102 includes:
constructing a capsule network model according to the structure of a capsule network, wherein the structure of the capsule network comprises a convolution layer, a primary capsule layer and a classification capsule layer based on dynamic routing;
inputting the training data set into a capsule network model, and extracting depth features of a reference image in the training data set through the capsule network model to obtain feature vectors of the reference image; extracting depth features of the reference image, and simultaneously obtaining image feature vectors of all reference image data;
constructing loss functions respectively for the different capsule layers of the capsule network model, which extract image features at different semantic levels, wherein the different capsule layers comprise the primary capsule layer and the classification capsule layer; the different capsule layers of the capsule network model enhance the efficiency of feature extraction while keeping the structure simple and the computation efficient, which effectively benefits the whole capsule network model, giving it stronger generalization and improving classification accuracy;
the feature vector of the reference image is sent into a loss function, and the loss function is used for measuring the error between the predicted value of the capsule network model to the sample class and the real label value; the feature vector of the reference image is sent into a loss function, the extraction degree of the current network feature is judged, and the error between the predicted value and the true label value of the trained capsule network model to the sample class is measured;
performing back propagation based on errors, optimizing a capsule network model by adopting a random gradient descent algorithm, and completing one iteration based on a training data set; the back propagation guides the optimization direction of the capsule network model;
and when the number of iterations by using the training data set reaches the preset number of iterations, obtaining a trained capsule network model and a corresponding network model weight.
The training data set may be used to perform multiple iterations, the condition for terminating training is generally a preset iteration number, the preset iteration number may be 30, 50, etc., in the embodiment of the present disclosure, the preset iteration number is 50, when the number of iterations performed by using the training data set reaches 50 times, a trained capsule network model is obtained, and after the capsule network model is trained, a trained network model weight may be obtained.
In the disclosed embodiments, a capsule network (CapsNet for short) is made up of groups of neurons, and the activity of the neurons within an activated capsule represents various attributes of a particular entity present in the image. These attributes may include many different types of instantiation parameters, such as pose (position, size, orientation), deformation, velocity, reflectance, hue, texture, etc. A special attribute of a capsule is whether an instance of a certain class exists in the image: the capsule network uses the total length of the instantiation-parameter vector to represent the existence of an entity and forces the direction of the vector to represent the entity's attributes, applying a non-linearity that keeps the direction of the vector unchanged while compressing its scale, so that the output length of a capsule vector does not exceed 1.
In a conventional convolutional neural network (Convolutional Neural Network, abbreviated as CNN), one activated neuron can represent only one entity, which greatly limits the ability of a CNN to represent object properties; a conventional artificial neuron outputs a scalar, whereas a capsule outputs a vector. CapsNet overcomes this disadvantage of CNNs by encapsulating multiple neurons into a single neuron capsule (NC), the length of the NC representing the probability that the entity is present and its direction representing the instantiation parameters. In the deep feature extraction stage, CapsNet still uses a single convolutional layer as the feature extractor.
Embodiments of the present disclosure devise a way to construct the primary capsule layers that can shorten the distance of back propagation, and the back propagation of losses can reach the capsules of each layer directly, rather than back propagation layer by layer.
The basic structure of the capsule network is shown in fig. 2. It comprises two ordinary convolution layers and a classification capsule layer based on dynamic routing. The first two layers are the same as ordinary convolution layers, except that the output feature map of the second layer is used to construct capsule vectors (indicated by the black arrows in fig. 2); each vector is 8-dimensional and the vectors are divided into 32 groups. The input and output of the third-layer dynamic routing algorithm are each individual capsule vectors, and each capsule corresponds to a unique weight matrix W_ij; the conversion between capsules is achieved by the dynamic routing algorithm, where i denotes the serial number of a capsule in layer l and j denotes the serial number of a capsule in layer l+1. The last layer contains 10 capsule vectors in total, each of dimension 16, and each corresponds to one category. These capsules play two roles: one is to characterize the current input sample, and the other is that the modulus (length) of the vector represents the probability that the input sample belongs to the corresponding class.
The primary capsule layer is an ordinary convolutional structure except for how the feature maps are processed. The calculation principle of the primary capsule layer is shown in fig. 3; the left side of fig. 3 is the feature map output by the previous layer, with 256 channels in total. Each time a feature map is generated, a group of 256-channel convolution kernels is needed. In a traditional convolutional network, the last convolution layer yields a number of feature-map matrices, which are stretched row by row and concatenated into a single vector. A capsule network instead groups the feature maps by a fixed number, for example every 8 feature maps form one group; as shown in fig. 3, the 256 feature maps are divided into 32 groups in total, so 8-dimensional capsule vectors can be extracted along the channel direction, and the convolution kernels are correspondingly separated by group, i.e. W_1, ..., W_32.
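The grouping described above can be illustrated with a small PyTorch sketch; the kernel size, stride and channel counts mirror the description, while everything else is an assumption. The squash non-linearity referred to in the final comment is defined in the routing sketch that follows.

```python
import torch
import torch.nn as nn

class PrimaryCapsules(nn.Module):
    """Sketch of the primary capsule layer: an ordinary convolution whose 256
    output channels are regrouped into 32 groups of 8-dimensional capsule
    vectors along the channel direction."""
    def __init__(self, in_channels=256, out_channels=256, capsule_dim=8,
                 kernel_size=9, stride=2):
        super().__init__()
        self.capsule_dim = capsule_dim
        self.conv = nn.Conv2d(in_channels, out_channels, kernel_size, stride)

    def forward(self, x):
        out = self.conv(x)                                   # (batch, 256, H', W')
        b, c, h, w = out.shape
        # group every 8 channels into one capsule: 256 channels -> 32 groups
        out = out.view(b, c // self.capsule_dim, self.capsule_dim, h, w)
        # one 8-D capsule vector per group and spatial position
        out = out.permute(0, 1, 3, 4, 2).reshape(b, -1, self.capsule_dim)
        return out   # the squash non-linearity is applied in the routing sketch below
```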
The third layer of the capsule network is the dynamic-routing capsule layer, which converts the capsule vectors of the previous layer into the capsule vectors of the next layer. A schematic diagram of the dynamic routing mechanism is shown in fig. 4: the primary capsule layer contains m capsules, and the prediction vectors feeding the dynamic-routing capsule layer are arranged as m columns of n each, one column per primary capsule. Let $u_i$ be a capsule vector of layer $l$ (a primary capsule) and $v_j$ a capsule vector of layer $l+1$ (a classification capsule). Each prediction vector $\hat{u}_{j|i}$ is obtained from $u_i$ by a linear transformation, and the total input $s_j$ of capsule $j$ is the weighted sum of all prediction vectors, namely:

$$\hat{u}_{j|i} = W_{ij}\,u_i, \qquad s_j = \sum_i c_{ij}\,\hat{u}_{j|i}$$

where $W_{ij}$ is a learnable weight matrix and $c_{ij}$ is the coupling coefficient, i.e. the weighting coefficient mapping each $\hat{u}_{j|i}$ to $s_j$; it determines the weight with which capsule $i$ of layer $l$ is transferred to the corresponding capsule $j$ of layer $l+1$. Combining these formulas with the dynamic routing algorithm, the capsule vector $v_j$ of the last layer can be derived.

After dynamic routing, the capsule vector $v_j$ of the last layer is obtained from the weighting coefficients and the prediction vectors $\hat{u}_{j|i}$ of the previous layer; this capsule vector $v_j$ is the feature vector. Because the modulus of each capsule vector must express a probability, a short vector should be squashed to a length close to 0 and a long vector to a length close to 1, so the non-linear mapping of the vector is chosen to be the squash function defined by the following equation:

$$v_j = \operatorname{squash}(s_j) = \frac{\lVert s_j \rVert^{2}}{1 + \lVert s_j \rVert^{2}}\,\frac{s_j}{\lVert s_j \rVert}$$

This yields the capsule vector $v_j$ of the next layer (layer $l+1$). The coupling coefficient $c_{ij}$ is calculated as follows:

$$c_{ij} = \frac{\exp(b_{ij})}{\sum_{k}\exp(b_{ik})}, \qquad b_{ij} \leftarrow b_{ij} + \hat{u}_{j|i}\cdot v_j$$

where $i$ denotes the serial number of a capsule in layer $l$ and $j$ the serial number of a capsule in layer $l+1$; $b_{ij}$ is the initial coupling logit, whose value is determined iteratively by dynamic routing and represents the similarity between the prediction vector $\hat{u}_{j|i}$ of layer $l$ and the capsule $v_j$ of layer $l+1$, so the increment $\hat{u}_{j|i}\cdot v_j$ of the coupling coefficient is used to update $b_{ij}$. Initially all $b_{ij}$ are set to 0, so every prediction vector is passed equally to the next layer, and the initial capsule vectors of the next layer are obtained.
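A compact PyTorch sketch of the squash non-linearity and the dynamic-routing layer described by the formulas above follows; the number of capsules, their dimensions and the number of routing iterations are illustrative assumptions.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

def squash(s, dim=-1, eps=1e-8):
    """Non-linear squashing: keeps the direction, compresses the length into [0, 1)."""
    sq_norm = (s ** 2).sum(dim=dim, keepdim=True)
    return (sq_norm / (1.0 + sq_norm)) * s / torch.sqrt(sq_norm + eps)

class RoutingCapsules(nn.Module):
    """Dynamic-routing (classification) capsule layer: num_in capsules of
    dimension dim_in are mapped to num_out capsules of dimension dim_out via
    learned matrices W_ij and `iters` routing iterations."""
    def __init__(self, num_in, dim_in=8, num_out=10, dim_out=16, iters=3):
        super().__init__()
        self.iters = iters
        # one weight matrix W_ij per (input capsule i, output capsule j) pair
        self.W = nn.Parameter(0.01 * torch.randn(1, num_in, num_out, dim_out, dim_in))

    def forward(self, u):
        u = squash(u)                                    # squash the primary capsules first
        u = u[:, :, None, :, None]                       # (batch, m, 1, dim_in, 1)
        u_hat = (self.W @ u).squeeze(-1)                 # prediction vectors u_hat_{j|i}
        b_ij = torch.zeros(u_hat.shape[:3], device=u.device)     # initial logits b_ij = 0
        for _ in range(self.iters):
            c_ij = F.softmax(b_ij, dim=2)                # coupling coefficients c_ij
            s_j = (c_ij.unsqueeze(-1) * u_hat).sum(dim=1)        # weighted sum over i
            v_j = squash(s_j)                            # output capsule vectors v_j
            b_ij = b_ij + (u_hat * v_j.unsqueeze(1)).sum(dim=-1) # agreement update
        return v_j                                       # (batch, num_out, dim_out)
```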
At the last layer of the capsule network, each capsule is called a digit capsule and participates directly in classification: each capsule corresponds to one sample class label, the modulus of its vector represents the probability that the sample belongs to that class, and the classification loss function is the margin loss. For a classification capsule $v_k$ in the classification capsule layer, the loss $L_k$ of the sample belonging to category $k$ is predicted according to the following loss function:

$$L_k = T_k\,\max(0,\, m^{+} - \lVert v_k \rVert)^2 + \lambda\,(1 - T_k)\,\max(0,\, \lVert v_k \rVert - m^{-})^2$$

where $T_k$ is a label parameter marking whether the current sample belongs to the $k$-th class ($T_k = 1$ if it does, $T_k = 0$ otherwise); $m^{+}$ is the first constant parameter and $m^{-}$ is the second constant parameter, whose values may be 0.9 and 0.1 respectively and which, used as margin values, improve the generalization performance of the network; $\lambda$ is a loss parameter whose value may be 0.5, used to reduce the loss weight of the capsule vectors of non-corresponding classes and to prevent the modulus of those capsules from being reduced excessively when training starts.
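For illustration, the margin loss above can be written as the following function; the default parameter values are the ones stated in the text.

```python
import torch

def margin_loss(v, target_onehot, m_pos=0.9, m_neg=0.1, lam=0.5):
    """Margin loss L_k as in the formula above.
    v: (batch, num_classes, dim) classification capsule vectors.
    target_onehot: (batch, num_classes) one-hot label T_k."""
    lengths = v.norm(dim=-1)                                        # capsule lengths ||v_k||
    pos = target_onehot * torch.clamp(m_pos - lengths, min=0.0) ** 2
    neg = lam * (1.0 - target_onehot) * torch.clamp(lengths - m_neg, min=0.0) ** 2
    return (pos + neg).sum(dim=-1).mean()                           # sum over classes, mean over batch
```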
The capsule network model provided by the embodiment of the disclosure can be trained on a GeForce RTX 2080Ti GPU; stochastic gradient descent is adopted for optimization during training, with the initial learning rate set to $10^{-3}$, a batch size of 128, weight decay, and 50 training epochs.
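As a purely illustrative sketch, the layers shown above could be assembled and trained as follows. The optimizer, learning rate, batch size and epoch count follow the text; CapsuleGeoNet, NUM_GEO_CELLS, train_dataset, the momentum and the weight-decay value are assumptions introduced here.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F
from torch.utils.data import DataLoader

class CapsuleGeoNet(nn.Module):
    """Assembly of the layers sketched above: an ordinary convolution layer,
    PrimaryCapsules and RoutingCapsules (sizes assume 64x64 RGB inputs)."""
    def __init__(self, num_classes):
        super().__init__()
        self.conv1 = nn.Sequential(nn.Conv2d(3, 256, kernel_size=9), nn.ReLU())
        self.primary = PrimaryCapsules(256, 256, capsule_dim=8, kernel_size=9, stride=2)
        # 32 capsule groups over a 24x24 grid for 64x64 inputs
        self.routing = RoutingCapsules(num_in=32 * 24 * 24, dim_in=8,
                                       num_out=num_classes, dim_out=16)

    def forward(self, x):
        return self.routing(self.primary(self.conv1(x)))

NUM_GEO_CELLS = 1000                        # assumed number of geographic classes
model = CapsuleGeoNet(NUM_GEO_CELLS).cuda()
optimizer = torch.optim.SGD(model.parameters(), lr=1e-3,           # initial learning rate 1e-3
                            momentum=0.9, weight_decay=1e-4)        # momentum/decay values assumed
loader = DataLoader(train_dataset, batch_size=128, shuffle=True)    # train_dataset assumed

for epoch in range(50):                     # preset number of passes over the training data set
    for images, labels in loader:
        images, labels = images.cuda(), labels.cuda()
        v = model(images)                                    # classification capsule vectors
        loss = margin_loss(v, F.one_hot(labels, NUM_GEO_CELLS).float())
        optimizer.zero_grad()
        loss.backward()                                      # back-propagate the error
        optimizer.step()                                     # stochastic gradient descent step

torch.save(model.state_dict(), "capsnet_geo_weights.pt")     # trained network model weights
```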
Step S103: inputting the target image into a trained capsule network model, extracting depth characteristics of the target image, and obtaining feature vectors of the target image; the feature extraction can be performed by inputting the target image into a trained capsule network model, so as to obtain the feature vector of the target image, namely the capsule vector of the target image.
In an alternative embodiment of the present disclosure, step S103 includes:
inputting the target image into a trained capsule network model, wherein the trained capsule network model corresponds to a network model weight; after the capsule network model is trained, the trained network model weight can be obtained;
and extracting the depth features of the target image by using the network model weights to obtain the feature vector of the target image, wherein the feature vector of the target image is a capsule vector (see the extraction sketch below).
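A minimal sketch of this extraction step, reusing the CapsuleGeoNet model and the saved weights from the training sketch above; preprocess and target_image are assumed helpers/inputs.

```python
import torch

model = CapsuleGeoNet(NUM_GEO_CELLS)                         # same assumed model as above
model.load_state_dict(torch.load("capsnet_geo_weights.pt", map_location="cpu"))
model.eval()

with torch.no_grad():
    x = preprocess(target_image).unsqueeze(0)                # assumed helper -> (1, 3, 64, 64)
    feature_vectors = model(x)                               # (1, num_classes, 16) capsule vectors
```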
Step S104: predicting the classification category of the feature vector of the target image in the training data set to obtain the geographic position information and shooting angle information of the target image. Predicting classification categories of capsule vectors of the target image in the training data set, classifying the target image according to the capsule vectors, and matching geographic position information and shooting angle information of the target image according to classification results to obtain geographic position information and shooting angle of the handheld device corresponding to the target image.
In an alternative embodiment of the present disclosure, step S104 includes:
comparing the feature vector of the target image with the feature vector of the reference image in the training data set, and predicting the classification category of the geographic position of the target image in the training data set to obtain the geographic position information of the target image;
screening a geographic position mark image of the geographic position of the target image from the reference image of the training data set;
based on the feature vector and the geographic position information of the target image, comparing the target image with the geographic position mark image, and determining the relative position relation between the shooting position of the target image and the landmark position of the geographic position mark image;
and determining the angle of shooting the target image according to the relative position relationship, and obtaining shooting angle information of the target image.
Using feature transformation and feature matching, the classification category of the feature vector of the target image within the training data set is predicted from the depth features obtained by the capsule network model, yielding the geographic position information corresponding to the target image and the shooting angle of the handheld device.
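As one possible illustration of this prediction step, the class whose capsule vector has the largest modulus can be read off directly; cell_metadata, an assumed mapping from class index to the stored latitude, longitude and landmark reference images of that geographic cell, is introduced only for the sketch.

```python
import torch

with torch.no_grad():
    lengths = feature_vectors.norm(dim=-1)        # (1, num_classes) capsule moduli
    class_id = int(lengths.argmax(dim=-1))        # predicted classification category
info = cell_metadata[class_id]                    # assumed: {"lat": ..., "lon": ..., "landmarks": [...]}
print(f"predicted cell {class_id}: lat={info['lat']}, lon={info['lon']}")
# Shooting-angle estimation would then compare the target image against the
# landmark reference images of this cell to recover the relative orientation.
```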
The embodiment of the disclosure is based on feature transformation and introduces a capsule network model. Exploiting the capsule network's ability to decompose an object into different components representing different attribute features, it extracts the target attribute features of the target image in a targeted way, such as the depth features of the target image and the shooting angle of the target image, so that the capsule network model is more robust to the various angle changes caused by different imaging conditions.
In an optional embodiment of the present disclosure, after obtaining the geographic location information and the shooting angle information of the target image in step S104, the method further includes:
the geographic position information and shooting angle information of the target image are sent to a client;
determining longitude and latitude coordinates corresponding to the geographic position information, and writing the longitude and latitude coordinates into a landmark file;
and calling an application programming interface through a handle command provided by the client, reading the landmark file, and displaying the read longitude and latitude coordinates.
For example, the Google earth digital earth can be used, after the longitude and latitude coordinates corresponding to the geographic position information are determined, the longitude and latitude coordinates are written into the KML file of Google, and the Google earth interface is directly called to read the KML file through the handle command provided by the client, so that the longitude and latitude coordinates of the target image can be displayed on the digital earth.
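For illustration, writing the predicted coordinates into a KML landmark file could look like the following sketch; the placemark name and file path are assumptions, and KML stores coordinates in longitude, latitude order.

```python
def write_kml(lat: float, lon: float, path: str = "predicted_location.kml") -> None:
    """Write a single placemark; KML stores coordinates as lon,lat,alt."""
    kml = f"""<?xml version="1.0" encoding="UTF-8"?>
<kml xmlns="http://www.opengis.net/kml/2.2">
  <Placemark>
    <name>Predicted image location</name>
    <Point><coordinates>{lon},{lat},0</coordinates></Point>
  </Placemark>
</kml>"""
    with open(path, "w", encoding="utf-8") as f:
        f.write(kml)

write_kml(info["lat"], info["lon"])   # coordinates predicted in the previous sketch
```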
From the above description, it can be seen that the present disclosure achieves the following technical effects:
the method and the device make up for the defect that the coverage of ground reference images in areas such as remote suburban areas or rural areas is too small by matching the ground street view images with geographic position information with the aerial image;
the geographic position positioning and shooting angle of the target image can be obtained through the classification result of the target image feature vector in the training data set, and the problem that the geographic position information of the image is difficult to determine in the related technology is solved;
by exploiting the different semantic levels at which the different capsule layers of the capsule network model extract image features, loss functions are constructed for each layer respectively; this enhances feature-extraction efficiency, keeps the structure simple and the computation efficient, and these operations effectively benefit the whole capsule network model, giving it stronger generalization and higher classification accuracy.
It should be noted that the steps illustrated in the flowcharts of the figures may be performed in a computer system such as a set of computer executable instructions, and that although a logical order is illustrated in the flowcharts, in some cases the steps illustrated or described may be performed in an order other than that illustrated herein.
The embodiment of the present disclosure further provides an image geographic position positioning device for implementing the image geographic position positioning method, as shown in fig. 5, the image geographic position positioning device 50 includes:
the acquisition unit 51 is configured to acquire reference images of a plurality of geographic locations, so as to obtain a training data set, where the reference images include a ground street view image and an aerial image, and the reference images include geographic location information and shooting angle information;
a training unit 52, configured to construct a capsule network model, and train the capsule network model using the training data set to obtain a trained capsule network model;
an extracting unit 53, configured to input the target image into the trained capsule network model, extract depth features of the target image, and obtain feature vectors of the target image; and
the prediction unit 54 is configured to predict classification categories of feature vectors of the target image in the training data set, so as to obtain geographic location information and shooting angle information of the target image.
The specific manner in which the units of the above embodiments of the apparatus perform their operations has been described in detail in relation to the embodiments of the method and is not described in detail here.
The disclosed embodiment also provides an electronic device, as shown in fig. 6, which includes one or more processors 61 and a memory 62, and in fig. 6, one processor 61 is taken as an example.
The controller may further include: an input device 63 and an output device 64.
The processor 61, the memory 62, the input means 63 and the output means 64 may be connected by a bus or otherwise, in fig. 6 by way of example.
The processor 61 may be a central processing unit (Central Processing Unit, abbreviated as CPU); the processor 61 may also be another general-purpose processor, a digital signal processor (Digital Signal Processor, abbreviated as DSP), an application-specific integrated circuit (Application Specific Integrated Circuit, abbreviated as ASIC), a field-programmable gate array (Field-Programmable Gate Array, abbreviated as FPGA) or another programmable logic device, a discrete gate or transistor logic device, a discrete hardware component, or a combination of the foregoing types of chips; the general-purpose processor may be a microprocessor or any conventional processor.
The memory 62 is used as a non-transitory computer-readable storage medium for storing non-transitory software programs, non-transitory computer-executable programs, and modules, such as the program instructions/modules corresponding to the image geographic position locating method in the embodiments of the present disclosure. The processor 61 executes the various functional applications and data processing of the server, i.e. implements the image geographic position locating method of the above-described method embodiments, by running the non-transitory software programs, instructions, and modules stored in the memory 62.
Memory 62 may include a storage program area that may store an operating system, at least one application program required for functionality, and a storage data area; the storage data area may store data created according to the use of a processing device operated by the server, or the like. In addition, the memory 62 may include high-speed random access memory, and may also include non-transitory memory, such as at least one magnetic disk storage device, flash memory device, or other non-transitory solid state storage device. In some embodiments, memory 62 may optionally include memory located remotely from processor 61, which may be connected to a network connection device via a network. Examples of such networks include, but are not limited to, the internet, intranets, local area networks, mobile communication networks, and combinations thereof.
The input device 63 may receive input numeric or character information and generate key signal inputs related to user settings and function control of the processing device of the server. The output device 64 may include a display device such as a display screen.
One or more modules are stored in the memory 62 that, when executed by the one or more processors 61, perform the method as shown in fig. 1.
It will be appreciated by those skilled in the art that all or part of the flow of the above-described embodiment methods may be implemented by a computer program instructing the relevant hardware, and the program may be stored in a computer-readable storage medium; when executed, the program may include the flow of the embodiments of the above-described image geographic position locating method. The storage medium may be a magnetic disk, an optical disc, a Read-Only Memory (ROM), a Random Access Memory (RAM), a Flash Memory (FM), a hard disk (HDD), or a Solid State Drive (SSD); the storage medium may also comprise a combination of memories of the kinds described above.
Although embodiments of the present disclosure have been described with reference to the accompanying drawings, various modifications and variations may be made by those skilled in the art without departing from the spirit and scope of the disclosure, and such modifications and variations fall within the scope as defined by the appended claims.

Claims (10)

1. A method for locating a geographic location of an image, comprising:
acquiring reference images of a plurality of geographic positions to obtain a training data set, wherein the reference images comprise ground street view images and aerial images, and the reference images have geographic position information and shooting angle information;
constructing a capsule network model, and training the capsule network model by utilizing the training data set to obtain a trained capsule network model;
inputting a target image into the trained capsule network model, extracting depth characteristics of the target image, and obtaining a characteristic vector of the target image; and
and predicting the classification category of the feature vector of the target image in the training data set to obtain the geographic position information and shooting angle information of the target image.
2. The method of claim 1, wherein acquiring reference images of a plurality of geographic locations to obtain a training dataset comprises:
shooting ground panoramic images at different geographic positions by using a handheld device, and recording shooting angles of the handheld device, wherein each ground panoramic image is provided with geographic position information and shooting angle information;
cutting the ground panoramic image to obtain a plurality of ground street view images, wherein each ground street view image is provided with geographic position information and shooting angle information;
acquiring aerial images corresponding to each ground street view image according to a preset scale, wherein each ground street view image has geographic position information;
and taking the ground street view image and the aerial image as reference images, wherein all the reference images form the training data set.
3. The method of claim 1, wherein constructing the capsule network model and training the capsule network model using the training dataset to obtain a trained capsule network model comprises:
constructing the capsule network model according to the structure of a capsule network, wherein the structure of the capsule network comprises a convolution layer, a primary capsule layer and a classification capsule layer based on dynamic routing;
inputting the training data set into the capsule network model, and extracting depth features of a reference image in the training data set through the capsule network model to obtain feature vectors of the reference image;
respectively constructing a loss function by utilizing different semantic layers of the capsule network model for extracting image features, wherein the different capsule layers comprise the primary capsule layer and the classified capsule layer;
the feature vector of the reference image is sent into the loss function, and the loss function is used for measuring errors between predicted values and real label values of the capsule network model on sample types;
performing back propagation based on the error, optimizing the capsule network model by adopting a random gradient descent algorithm, and completing one iteration based on the training data set;
and when the number of iterations by using the training data set reaches the preset number of iterations, obtaining a trained capsule network model and a corresponding network model weight.
4. A method according to claim 3, characterized in that the loss $L_k$ of a classification capsule $v_k$ in the classification capsule layer belonging to category $k$ is predicted according to the following loss function:

$$L_k = T_k\,\max(0,\, m^{+} - \lVert v_k \rVert)^2 + \lambda\,(1 - T_k)\,\max(0,\, \lVert v_k \rVert - m^{-})^2$$

wherein $T_k$ is a label parameter marking whether the current sample belongs to the $k$-th class ($T_k = 1$ if it does, $T_k = 0$ otherwise), $m^{+}$ is the first constant parameter, $m^{-}$ is the second constant parameter, and $\lambda$ is a loss parameter used to reduce the loss weight of the capsule vectors of non-corresponding classes.
5. The method of claim 1, wherein inputting the target image into the trained capsule network model, extracting depth features of the target image, and obtaining feature vectors of the target image, comprises:
inputting a target image into the trained capsule network model, wherein the trained capsule network model corresponds to a network model weight;
and extracting the depth characteristics of the target image by using the network model weight to obtain the characteristic vector of the target image, wherein the characteristic vector of the target image is a capsule vector.
6. The method according to claim 1, wherein predicting the classification category of the feature vector of the target image in the training dataset, to obtain the geographic location information and the shooting angle information of the target image, comprises:
comparing the feature vector of the target image with the feature vector of a reference image in the training data set, and predicting the classification category of the geographic position of the target image in the training data set to obtain geographic position information of the target image;
screening out a geographic position mark image of the geographic position of the target image from the reference image of the training data set;
based on the feature vector and the geographic position information of the target image, comparing the target image with the geographic position mark image, and determining the relative position relation between the shooting position of the target image and the landmark position of the geographic position mark image;
and determining the angle for shooting the target image according to the relative position relation, and obtaining shooting angle information of the target image.
7. The method according to claim 1, wherein after obtaining the geographical position information and the shooting angle information of the target image, the method further comprises:
the geographic position information and shooting angle information of the target image are sent to a client;
determining longitude and latitude coordinates corresponding to the geographic position information, and writing the longitude and latitude coordinates into a landmark file;
and calling an application programming interface through a handle command provided by the client, reading the landmark file, and displaying the read longitude and latitude coordinates.
8. An image geographic location positioning device, comprising:
the system comprises an acquisition unit, a storage unit and a display unit, wherein the acquisition unit is used for acquiring reference images of a plurality of geographic positions to obtain a training data set, the reference images comprise ground street view images and aerial images, and the reference images are provided with geographic position information and shooting angle information;
the training unit is used for constructing a capsule network model, and training the capsule network model by utilizing the training data set to obtain a trained capsule network model;
the extraction unit is used for inputting a target image into the trained capsule network model, extracting depth characteristics of the target image and obtaining a characteristic vector of the target image; and
and the prediction unit is used for predicting the classification category of the feature vector of the target image in the training data set to obtain the geographic position information and shooting angle information of the target image.
9. A computer readable storage medium having stored thereon computer instructions for causing a computer to perform the image geolocation method of any one of claims 1-7.
10. An electronic device, the electronic device comprising: at least one processor; and a memory communicatively coupled to the at least one processor; wherein the memory stores a computer program executable by the at least one processor to cause the at least one processor to perform the image geolocation method of any one of claims 1-7.
CN202311226356.2A 2023-09-22 2023-09-22 Image geographic position positioning method and device and electronic equipment Pending CN116977437A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202311226356.2A CN116977437A (en) 2023-09-22 2023-09-22 Image geographic position positioning method and device and electronic equipment

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202311226356.2A CN116977437A (en) 2023-09-22 2023-09-22 Image geographic position positioning method and device and electronic equipment

Publications (1)

Publication Number Publication Date
CN116977437A true CN116977437A (en) 2023-10-31

Family

ID=88483463

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202311226356.2A Pending CN116977437A (en) 2023-09-22 2023-09-22 Image geographic position positioning method and device and electronic equipment

Country Status (1)

Country Link
CN (1) CN116977437A (en)


Patent Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103067856A (en) * 2011-10-24 2013-04-24 康佳集团股份有限公司 Geographic position locating method and system based on image recognition
CN114241464A (en) * 2021-11-30 2022-03-25 武汉大学 Cross-view image real-time matching geographic positioning method and system based on deep learning

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
B. SUN等: ""GEOCAPSNET: Ground to Aerial View Image Geo-Localization using Capsule Network"", 《2019 IEEE INTERNATIONAL CONFERENCE ON MULTIMEDIA AND EXPO (ICME)》, pages 742 - 747 *
C. XIANG等: ""MS-CapsNet: A Novel Multi-Scale Capsule Network"", 《IEEE SIGNAL PROCESSING LETTERS》, vol. 25, no. 12, pages 1850 - 1854, XP011698976, DOI: 10.1109/LSP.2018.2873892 *

Similar Documents

Publication Publication Date Title
US10592780B2 (en) Neural network training system
CN108596101B (en) Remote sensing image multi-target detection method based on convolutional neural network
CN110245709B (en) 3D point cloud data semantic segmentation method based on deep learning and self-attention
CN105740894B (en) Semantic annotation method for hyperspectral remote sensing image
Workman et al. A unified model for near and remote sensing
CN108596108B (en) Aerial remote sensing image change detection method based on triple semantic relation learning
Lenjani et al. Automated building image extraction from 360 panoramas for postdisaster evaluation
CN106529538A (en) Method and device for positioning aircraft
CN111640116B (en) Aerial photography graph building segmentation method and device based on deep convolutional residual error network
US10755146B2 (en) Network architecture for generating a labeled overhead image
CN113610905B (en) Deep learning remote sensing image registration method based on sub-image matching and application
CN115457396B (en) Surface target ground object detection method based on remote sensing image
CN114758337B (en) Semantic instance reconstruction method, device, equipment and medium
Zhaosheng et al. Rapid detection of wheat ears in orthophotos from unmanned aerial vehicles in fields based on YOLOX
Xu et al. Building height calculation for an urban area based on street view images and deep learning
Hu et al. Soil moisture retrieval using convolutional neural networks: Application to passive microwave remote sensing
CN112084989A (en) Unmanned aerial vehicle and CNN-based large-range pine wood nematode withered vertical wood intelligent detection method
CN116012709B (en) High-resolution remote sensing image building extraction method and system
CN116758419A (en) Multi-scale target detection method, device and equipment for remote sensing image
CN114663714B (en) Image classification and ground feature classification method and device
CN114550016B (en) Unmanned aerial vehicle positioning method and system based on context information perception
CN116977437A (en) Image geographic position positioning method and device and electronic equipment
CN115240168A (en) Perception result obtaining method and device, computer equipment and storage medium
CN112528803B (en) Road feature extraction method, device, equipment and storage medium
CN114238541A (en) Sensitive target information acquisition method and device and computer equipment

Legal Events

Date Code Title Description
PB01 Publication
PB01 Publication
SE01 Entry into force of request for substantive examination
SE01 Entry into force of request for substantive examination