US20170228871A1 - System and method for labelling aerial images - Google Patents


Info

Publication number
US20170228871A1
Authority
US
United States
Prior art keywords
pixels
pixel
neural network
labeled
object class
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Abandoned
Application number
US15/497,378
Inventor
Volodymyr Mnih
Geoffrey E. Hinton
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Google LLC
Original Assignee
Google LLC
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Google LLC filed Critical Google LLC
Priority to US15/497,378
Assigned to GOOGLE INC. reassignment GOOGLE INC. ASSIGNMENT OF ASSIGNORS INTEREST (SEE DOCUMENT FOR DETAILS). Assignors: HINTON, Geoffrey E., MNIH, VOLODYMYR
Publication of US20170228871A1
Assigned to GOOGLE LLC reassignment GOOGLE LLC CHANGE OF NAME (SEE DOCUMENT FOR DETAILS). Assignors: GOOGLE INC.

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00Image analysis
    • G06T7/10Segmentation; Edge detection
    • G06T7/11Region-based segmentation
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/24Classification techniques
    • G06F18/241Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
    • G06F18/2411Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches based on the proximity to a decision surface, e.g. support vector machines
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/24Classification techniques
    • G06F18/241Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
    • G06F18/2415Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches based on parametric or probabilistic models, e.g. based on likelihood ratio or false acceptance rate versus a false rejection rate
    • G06K9/0063
    • G06K9/6269
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/08Learning methods
    • G06N3/084Backpropagation, e.g. using gradient descent
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T11/002D [Two Dimensional] image generation
    • G06T11/60Editing figures and text; Combining figures or text
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00Image analysis
    • G06T7/10Segmentation; Edge detection
    • G06T7/143Segmentation; Edge detection involving probabilistic approaches, e.g. Markov random field [MRF] modelling
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00Arrangements for image or video recognition or understanding
    • G06V10/70Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/764Arrangements for image or video recognition or understanding using pattern recognition or machine learning using classification, e.g. of video objects
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V20/00Scenes; Scene-specific elements
    • G06V20/10Terrestrial scenes
    • G06V20/13Satellite images
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V20/00Scenes; Scene-specific elements
    • G06V20/10Terrestrial scenes
    • G06V20/182Network patterns, e.g. roads or rivers
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00Indexing scheme for image analysis or image enhancement
    • G06T2207/10Image acquisition modality
    • G06T2207/10032Satellite or aerial image; Remote sensing
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00Indexing scheme for image analysis or image enhancement
    • G06T2207/20Special algorithmic details
    • G06T2207/20084Artificial neural networks [ANN]
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00Indexing scheme for image analysis or image enhancement
    • G06T2207/30Subject of image; Context of image processing
    • G06T2207/30181Earth observation

Definitions

  • the goal is to learn to predict patches of map M from patches of S. Following a probabilistic approach, the distribution may be modelled as p( n(M̃_{i,j}, w_m) | n(S_{i,j}, w_s) ), where n(I_{i,j}, w) is the w × w patch of image I centered at location (i,j).
  • w_m is set to be smaller than w_s because some context is required to predict the value of a map pixel. While w_m can be set to 1 to predict one pixel at a time, it is generally more efficient to predict a small patch of labels from the same context.
  • Vectors s and m̃ denote the aerial image patch n(S_{i,j}, w_s) and the map patch n(M̃_{i,j}, w_m), respectively.
  • each conditional p(m̃_i | s) may be modelled by a Bernoulli distribution whose mean value is determined by the i-th output unit of the neural network. This may be referred to as the "noise free" model.
  • a deep neural network may be used to model the map distribution.
  • the input to the neural network is a w_s by w_s patch of an aerial image encoded in the RGB color space, while the output is a w_m by w_m map patch.
  • the input layer to the neural network contains one input unit for each pixel of each color channel of the aerial image patch, for a total of 3·w_s² input units.
  • the input layer is followed by three hidden layers, although it will be appreciated that any number of hidden layers may be included in the neural network.
  • Each unit in a hidden layer computes a linear combination of some or all units in the previous layer. This linear combination is known as the input to the unit.
  • Each hidden unit computes an output by applying an activation function to its input. All hidden layers of the deep neural network make use of the rectified linear activation function, for which the output is defined as max(0, input). It has been found that rectified linear units typically perform better than logistic units on various image classification tasks and it has further been found that this advantage typically exists on image labeling tasks.
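The computation of one such layer can be sketched in NumPy; the function and variable names here are illustrative, not taken from the patent:

```python
import numpy as np

def hidden_layer(x, W, b):
    """One hidden layer: each unit forms a linear combination of the
    previous layer's outputs (the unit's "input"), then applies the
    rectified linear activation max(0, input)."""
    unit_inputs = W @ x + b                  # linear combinations
    return np.maximum(0.0, unit_inputs)      # rectified linear outputs
```

Stacking three such layers, followed by logistic output units, gives the overall shape of the network described here.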
  • the first two hidden layers in the neural network are locally connected layers, in which each hidden unit is connected to only a subset of the input units.
  • the input units of a locally connected layer make up a w_in by w_in image, possibly consisting of multiple channels.
  • the channels can include but are not limited to RGB color channels, an infrared channel, and channels corresponding to other types of electromagnetic radiation.
  • the input image may be covered by evenly spaced filter sites by moving a w_f by w_f pixel window over the image with a stride of w_str pixels vertically and horizontally, for a total of ((w_in − w_f)/w_str + 1)² filter sites. Each site then consists of a w_f by w_f window of pixels which acts as the input to f hidden units in the next layer.
  • each input patch may be preprocessed by subtracting the mean value of the pixels in that patch from all pixels and then dividing by the standard deviation found over all pixels in the dataset. This type of preprocessing may achieve some amount of contrast normalization between different patches.
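A minimal sketch of this preprocessing step (the array layout is an assumption; the patent does not fix it):

```python
import numpy as np

def preprocess(patches, dataset_std):
    """Contrast-normalize input patches: subtract each patch's own mean
    pixel value from all of its pixels, then divide by the standard
    deviation computed over all pixels in the dataset.

    patches     -- array of shape (n_patches, height, width, channels)
    dataset_std -- scalar standard deviation over the whole dataset
    """
    patch_means = patches.mean(axis=(1, 2, 3), keepdims=True)
    return (patches - patch_means) / dataset_std
```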
  • a different set of f filters of size w_f by w_f, consisting of the same number of channels as the input image, is applied at each filter site.
  • a single locally connected layer results in f·((w_in − w_f)/w_str + 1)² hidden units.
  • the hidden units of one locally connected layer can then act as the input to the next locally connected layer by viewing the hidden units as a square image with f channels and width (w_in − w_f)/w_str + 1.
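The filter-site arithmetic above can be sketched directly (a shape calculation only, not the patent's implementation):

```python
def locally_connected_shape(w_in, w_f, w_str, f):
    """Filter sites and hidden units of one locally connected layer:
    a w_f-by-w_f window moves over a w_in-by-w_in input with stride
    w_str, and f distinct filters are applied at each site."""
    assert (w_in - w_f) % w_str == 0, "window and stride must tile the input"
    sites_per_side = (w_in - w_f) // w_str + 1
    # The output can be viewed as a square image with f channels and
    # width sites_per_side, ready to feed the next layer.
    return sites_per_side, f * sites_per_side ** 2
```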
  • While weight-sharing in convolutional architectures is advantageous on smaller datasets because it helps reduce overfitting by restricting the number of parameters, such a restriction is not required in the present invention: the abundance of labels, combined with random rotations, makes it possible to avoid overfitting by training on millions of labeled aerial image patches.
  • the present locally connected architecture may be computationally and statistically more efficient than a fully connected architecture.
  • the third hidden layer may be fully connected, with each unit connected to every unit in the preceding hidden layer.
  • the output layer consists of w_p² logistic units, for which the output is 1/(1 + exp(−input)).
  • w_p = w_m, and each output unit models the probability that the corresponding pixel in the w_m by w_m output map patch belongs to the class of interest.
  • the values of the parameters such as the number of filters f, their width w f , and stride w str may vary from problem to problem.
  • the parameters of the neural network may be learned by minimizing the negative log likelihood of the training data.
  • the negative log likelihood takes the form of a cross entropy between the patch m̃ derived from the given map and the predicted patch m̂, as follows:

    −∑_{i=1}^{w_m²} ( m̃_i ln m̂_i + (1 − m̃_i) ln(1 − m̂_i) ).   (3)
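The cross entropy of equation (3) can be sketched as follows (the clipping guard on the logarithms is an implementation detail added here, not part of the text):

```python
import numpy as np

def cross_entropy_nll(m_obs, m_pred, eps=1e-12):
    """Negative log likelihood of equation (3): cross entropy between
    the observed map patch m_obs (0/1 labels) and the predicted patch
    m_pred (per-pixel Bernoulli means), both flattened to w_m**2 values."""
    m_pred = np.clip(m_pred, eps, 1.0 - eps)  # guard the logs at 0 and 1
    return -np.sum(m_obs * np.log(m_pred) + (1.0 - m_obs) * np.log(1.0 - m_pred))
```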
  • the foregoing objective function may be optimized using mini-batched stochastic gradient descent with momentum or any other suitable optimization method.
  • the output of the foregoing neural network is image map data.
  • the image map data and the existing map data from memory may be input to the comparator to determine a degree of similarity between them. Dissimilarities are likely, due in part to noise in the existing map data.
  • Omission noise occurs when some map pixels are labeled as not belonging to the object class of interest when they, in fact, do.
  • a classifier When trained on data containing a substantial number of such pixels a classifier will be penalized for correctly predicting the value of 1 for pixels affected by omission noise. This will cause a classifier to be less confident and potentially increase the false negative rate.
  • the omission noise compensator applies a robust loss function that explicitly models asymmetric omission noise, reducing its effect on the final classifier so that the generated map data remains close to the existing map data.
  • the noise-free model of the data from (2) assumes that the observed labels m̃ are generated directly from the aerial image s.
  • to model label noise, it is instead assumed that a true, uncorrupted, and unobserved map patch m is first generated from the aerial image patch s according to some distribution p(m | s). The corrupted, observed map m̃ is then generated from the uncorrupted m according to a noise distribution p(m̃ | m).
  • the omission model assumes that, conditioned on m, all components of m̃ are independent and that each m̃_i is independent of all m_j for j ≠ i.
  • the observed map distribution that corresponds to this model can then be obtained by marginalizing out m, which, under the independence assumptions above, leads to:

    p(m̃ | s) = ∏_i ∑_{m_i} p(m̃_i | m_i) p(m_i | s).

  • the noise distribution p(m̃_i | m_i) may be assumed to be the same for all pixels i, and thus may be determined by two parameters: θ₀ = p(m̃_i = 1 | m_i = 0) and θ₁ = p(m̃_i = 0 | m_i = 1).
  • the relationship θ₀ < θ₁ may be set because the probability that the observed label m̃_i is 1 given that the true label m_i is 0 should be very close to 0, while the probability that the observed m̃_i is 0 given that the true label m_i is 1 should still be small but not as close to 0 as θ₀.
  • This model may be referred to as the asymmetric Bernoulli noise model, or the ABN model for short.
  • other noise distributions p(m̃_i | m_i) that compensate for omission noise can also be used.
  • in the noise-free case, the map distribution in (2) was modelled directly by a deep neural network.
  • under the ABN model, the neural network may instead be used to model the true map distribution p(m | s).
  • for the noise-free model, the derivative of the log probability with respect to the input to the i-th output unit of the neural network takes the form m̃_i − m̂_i.
  • under the noise-free model, the learning procedure therefore drives the prediction m̂_i toward the observed label m̃_i.
  • under the ABN model, the learning procedure instead drives m̂_i toward the posterior probability that the unobserved true label m_i is 1.
  • FIG. 3 demonstrates how the derivatives for the noise-free and the ABN models differ as a function of the prediction m̂_i.
  • the derivative of the log probability with respect to the input to the i-th output unit is shown for varying predictions m̂_i.
  • the noise-free model penalizes incorrect predictions more than the ABN model does; in particular, the ABN model penalizes incorrect but confident predictions less.
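This contrast can be checked numerically. In the sketch below the noise parameters θ₀ and θ₁ are illustrative values, not taken from the patent, and the ABN marginal p(m̃_i = 1 | s) = (1 − θ₁)·m̂_i + θ₀·(1 − m̂_i) follows from marginalizing out the unobserved true label:

```python
THETA0 = 0.001   # p(observed 1 | true 0): assumed, very close to 0
THETA1 = 0.05    # p(observed 0 | true 1): assumed, small but larger

def grad_noise_free(m_obs, m_pred):
    """d log p / d(input of output unit i) for the noise-free model."""
    return m_obs - m_pred

def grad_abn(m_obs, m_pred):
    """Same derivative for the ABN model, computed through the marginal
    p(obs = 1 | s) = (1 - theta1) * m_pred + theta0 * (1 - m_pred)."""
    p_obs1 = (1.0 - THETA1) * m_pred + THETA0 * (1.0 - m_pred)
    p_obs = p_obs1 if m_obs == 1 else 1.0 - p_obs1
    sign = 1.0 if m_obs == 1 else -1.0
    return sign * (1.0 - THETA0 - THETA1) * m_pred * (1.0 - m_pred) / p_obs
```

For a confident prediction m̂_i = 0.99 that contradicts an observed label of 0 (the omission-noise case), the noise-free gradient is −0.99, while the ABN gradient is only about −0.16: the robust model penalizes the confident, likely-correct prediction far less.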
  • Registration noise occurs when an aerial image and the corresponding map are not perfectly aligned. As shown in FIG. 1 , the error in alignment between the map and the aerial image can vary over the dataset and cannot be corrected by a global translation.
  • the registration noise compensator reduces registration errors using local translations of the labels.
  • the registration noise compensator extends the robust loss function to also handle local registration errors.
  • a generative model of the observed map patches is used.
  • the generative model works by first generating an uncorrupted and perfectly registered map from the aerial image, then selecting a random subpatch of the true map, and finally generating the observed map by corrupting the selected subpatch with asymmetric noise. More formally, the generative process is as follows:
  • a translation variable t is sampled from some distribution p(t) over T+1 possible values 0, . . . , T.
  • Crop(m, t) selects a w_m by w_m subpatch from the w_m′ by w_m′ true map patch m according to the translation variable t, as shown in FIG. 4.
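Crop(m, t) can be sketched as follows; the mapping from t to a pixel offset is an assumption here, since the text does not fix how the translations are enumerated:

```python
import numpy as np

def crop(m, t, w_m, offsets):
    """Crop(m, t): select the w_m-by-w_m subpatch of the larger
    w_m'-by-w_m' patch m whose top-left corner is offsets[t].
    `offsets` enumerates the T + 1 possible translations."""
    dy, dx = offsets[t]
    return m[dy:dy + w_m, dx:dx + w_m]
```

For example, with w_m′ = w_m + 2, offsets = [(1, 1), (0, 0), (0, 2), (2, 0), (2, 2)] would give the centered subpatch plus four diagonal shifts.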
  • the observed map m̃ is then generated by corrupting the selected subpatch with the asymmetric noise model: each m̃_i is sampled from p_ABN(m̃_i | Crop(m, t)_i), using the two parameters θ₀ and θ₁ needed to define p_ABN(m̃_i | m_i).
  • This generative model may be referred to as the translational asymmetric binary noise model, or the TABN model for short.
  • the parameters of the noise distribution p(m̃ | m, t) may be set using a validation set, while the parameters of p(m | s) are learned from the training data.
  • the required EM updates can be performed efficiently.
  • M-step: since p(m | s) is modelled by the neural network, the M-step updates the network parameters to increase the expected log likelihood.
  • the required derivative of the expected log likelihood with respect to x_i is p(m_i = 1 | m̃, s) − m̂_i, where m̂_i is the value of the i-th output unit of the neural network and x_i is the input to the i-th output unit.
  • the updates for all weights of the neural network can be computed from the above equation using backpropagation.
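A sketch of the resulting update signal at the output units (the posterior would come from the E-step; names are illustrative):

```python
import numpy as np

def m_step_delta(posterior, m_pred):
    """M-step error signal at the logistic output units: with Bernoulli
    outputs, the derivative of the expected log likelihood with respect
    to the input x_i of output unit i is the E-step posterior
    p(m_i = 1 | observed map, s) minus the prediction.  Backpropagating
    this vector yields the updates for all weights of the network."""
    return posterior - m_pred
```

This has the same form as the noise-free gradient m̃_i − m̂_i, with the posterior over the true label replacing the observed label.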
  • E-step: the role of the E-step is to compute the posterior marginals p(m_i | m̃, s).
  • C_t is defined to be the set of indices of pixels of m that are cropped for translation t. Since this set has w_m² entries, it may also be used to index into m.
  • p(m_i | m̃, s) = [ ∑_t ∑_{m_{−i}} p(t) p(m̃ | m, t) p(m | s) ] / p(m̃ | s),   (10)
    where m_{−i} ranges over the joint values of all pixels of m other than m_i.
  • in practice, the w_m′ by w_m′ patch of predictions m̂ may be constructed out of 16 non-overlapping w_p by w_p patches predicted by the neural net.
  • p(m_i | m̃, s) is then determined as described above, separated into 16 non-overlapping w_p by w_p subpatches, and the derivatives from all the subpatches are backpropagated through the neural network.
  • the neural network predicts 16 by 16 patches of map from 64 by 64 patches of aerial image.
  • the first hidden layer may have filter width 12 with stride 4 and 64 filters at each site.
  • the second hidden layer may have filter width 4 with stride 2 and 256 filters at each site.
  • the third hidden layer may have 4096 hidden units.
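These example hyperparameters can be checked against the filter-site formula; the layer sizes computed below are derived from the stated values, not quoted from the text:

```python
def sites(w_in, w_f, w_str):
    """Filter sites per side of a locally connected layer:
    ((w_in - w_f) / w_str) + 1."""
    assert (w_in - w_f) % w_str == 0
    return (w_in - w_f) // w_str + 1

# 64x64 RGB aerial patch in, 16x16 map patch out, per the stated example:
w1 = sites(64, 12, 4)       # first locally connected layer: 14 sites per side
units1 = 64 * w1 ** 2       # 64 filters per site -> 12544 hidden units
w2 = sites(w1, 4, 2)        # second layer sees a 14-wide, 64-channel image
units2 = 256 * w2 ** 2      # 256 filters per site -> 9216 hidden units
units3 = 4096               # fully connected third hidden layer
outputs = 16 * 16           # one logistic output per pixel of the map patch
```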

Abstract

A system and method for labelling aerial images. A neural network generates predicted map data. The parameters of the neural network are trained by optimizing an objective function which compensates for noise in the map images. The function compensates both omission noise and registration noise.

Description

    CLAIM OF PRIORITY
  • This application claims priority under 35 USC §119(e) to U.S. patent application Ser. No. 13/924,320, filed on Jun. 21, 2013, which claims the benefit of priority to U.S. Provisional Patent Application Ser. No. 61/663,297, filed on Jun. 22, 2012, the entire contents of which are hereby incorporated by reference.
  • TECHNICAL FIELD
  • The following relates generally to the labelling of aerial images with map data.
  • BACKGROUND
  • Information extracted from photographs of the earth's surface that were taken by airborne sensors has found applications in a wide range of areas including urban planning, crop and forest management, disaster relief, and climate modeling. Relying on human experts for extracting information from aerial imagery is both slow and costly, so automatic aerial image interpretation has received much attention in the remote sensing community. So far, there are only a few semi-automated systems, and they operate in limited domains.
  • In machine learning applications, aerial image interpretation is usually formulated as a pixel labelling task. The goal is to produce either a complete semantic segmentation of an aerial image into classes such as building, road, tree, grass, and water or a binary classification of the image for a single object class. In both scenarios, the availability of accurately labelled data for training tends to be the limiting factor. Hand-labelled data tends to be reasonably accurate, but the cost of hand-labelling and the lack of publicly available hand-labelled datasets strongly restrict the size of the training and test sets for aerial image labelling tasks.
  • At present, maps of many major cities not only provide the locations of most roads and parks, but also the locations of buildings. Thus, one alternative to using hand-labelled data is to use maps from projects such as OpenStreetMap™ for constructing the labels. For object types covered by these maps, it is now possible to construct datasets that are much larger than the ones that have been hand-labelled. While the use of these larger datasets has improved the performance of machine learning methods on some aerial image recognition tasks, datasets constructed from maps suffer from two types of label noise: omission noise and registration noise. FIG. 1 shows an example of omission noise and registration noise in a mapping application.
  • Omission noise occurs when an object that appears in an aerial image does not appear in the map. This is the case for many buildings (even in major cities) due to incompleteness of the maps. It is also true for small roads and alleys, which tend to be omitted from maps, often with no clear criterion for when they should be omitted.
  • Registration noise occurs when the location of an object in a map is inaccurate. Such errors are quite common because not requiring pixel level accuracy makes maps cheaper to produce for human experts without significantly reducing their usefulness for most purposes.
  • The presence of these kinds of errors in the training labels significantly reduces the accuracy of classifiers trained on this data.
  • It is an object of the present invention to mitigate or obviate at least one of the above disadvantages.
  • SUMMARY OF THE INVENTION
  • In one aspect, a system for labelling aerial images is provided, the system comprising a neural network for generating predicted map data wherein the parameters of the neural network are trained by optimizing an objective function which compensates for noise in the map images.
  • BRIEF DESCRIPTION OF THE DRAWINGS
  • Embodiments will now be described by way of example only with reference to the appended drawings wherein:
  • FIG. 1 is an example of omission noise and registration noise in a mapping application;
  • FIG. 2 is a system in accordance with the present invention;
  • FIG. 3 is an exemplary graphical representation of a model of the present invention; and
  • FIG. 4 is a demonstration of a function of the present invention.
  • DETAILED DESCRIPTION OF THE DRAWINGS
  • Embodiments will now be described with reference to the figures. It will be appreciated that for simplicity and clarity of illustration, where considered appropriate, reference numerals may be repeated among the figures to indicate corresponding or analogous elements. In addition, numerous specific details are set forth in order to provide a thorough understanding of the embodiments described herein. However, it will be understood by those of ordinary skill in the art that the embodiments described herein may be practiced without these specific details. In other instances, well-known methods, procedures and components have not been described in detail so as not to obscure the embodiments described herein. Also, the description is not to be considered as limiting the scope of the embodiments described herein.
  • It will also be appreciated that any module, component, server, computer, terminal or device exemplified herein that executes instructions may include or otherwise have access to computer readable media such as storage media, computer storage media, or data storage devices (removable and/or non-removable) such as, for example, magnetic disks, optical disks, or tape. Computer storage media may include volatile and non-volatile, removable and non-removable media implemented in any method or technology for storage of information, such as computer readable instructions, data structures, program modules, or other data. Examples of computer storage media include RAM, ROM, EEPROM, flash memory or other memory technology, CD-ROM, digital versatile disks (DVD) or other optical storage, magnetic cassettes, magnetic tape, magnetic disk storage or other magnetic storage devices, or any other medium which can be used to store the desired information and which can be accessed by an application, module, or both. Any such computer storage media may be part of the device or accessible or connectable thereto. Any application or module herein described may be implemented using computer readable/executable instructions that may be stored or otherwise held by such computer readable media.
  • When training a system to label images, the amount of labeled training data tends to be a limiting factor.
  • The present invention provides a system and method for labelling aerial images. The invention provides a training unit for reducing and compensating omission noise and registration noise in map images. A neural network generates predicted map data. The parameters of the neural network are trained by optimizing an objective function which compensates for noise in the map images. The function compensates for both omission noise and registration noise.
  • Referring now to FIG. 2, the training unit comprises a memory 202, a neural network 204, an omission noise compensator 206, a registration noise compensator 208 and a comparator 210. The parameters of the neural network are selected by reducing a negative log likelihood function, or another objective function, generated by the omission noise compensator and registration noise compensator. The neural network can be used to generate map data during and after training.
  • The memory stores existing map data 212 and corresponding aerial images 214 of a training set. The neural network is trained by adjusting a loss function, which models or compensates the omission noise and registration noise in the map image, to produce output map data that closely corresponds to the existing map data. The degree of such correspondence is determined by inputting the output map data and existing map data to a comparator.
  • By adjusting the neural network based on the degree of correspondence and to implement the omission noise compensator and registration noise compensator, the training process for the neural network learns to label aerial images from existing maps that may provide abundant labels, notwithstanding that such existing maps may be incomplete, poorly registered or both.
  • In one example, the neural network may comprise an input layer, output layer and three hidden layers, although it will be appreciated that any number of hidden layers may be included in the neural network. The neural network is trained using a robust loss function for reducing the effect of omission errors on a resulting classifier and for compensating omission noise and registration noise in the training data. Initializing neural networks using unsupervised learning methods is believed to improve performance on a variety of vision tasks and, in the present disclosure, unsupervised pretraining is used to initialize the deep neural network.
  • An exemplary aerial image labeling task comprises binary labels: pixels belonging to an object class of interest are labelled with 1's and all other pixels with 0's. Roads and buildings are two examples of such object classes; however, it will be appreciated that the present invention is operable for many other classes.
  • In a high-resolution aerial image, a single pixel can represent a square patch of land that is anywhere between several meters and tens of centimeters wide. At the same time one is typically interested in detecting roads in a large area such as an entire town or city. Hence, one is generally faced with the problem of making predictions for millions if not billions of map pixels based on an equally large number of satellite image pixels.
  • For these reasons, the probability that {tilde over (M)}i,j=1 has typically been modeled as a function of some relatively small subset of S that contains location (i,j) instead of the entire image S, where S is an aerial/satellite image and {tilde over (M)} is the corresponding map image of equal size produced from the given map: {tilde over (M)}i,j=1 whenever the pixel at location (i,j) contains the object of interest and {tilde over (M)}i,j=0 otherwise.
  • The goal is to learn to predict patches of map M from patches of S. Following a probabilistic approach, the distribution may be modelled as:

  • P(n({tilde over (M)}i,j,wm)|n(Si,j,ws)),   (1)
  • where n(Ii,j,w) is the w×w patch of image I centered at location (i,j). Typically wm is set to be smaller than ws because some context is required to predict the value of a map pixel. While wm can be set to 1 to predict one pixel at a time, it is generally more efficient to predict a small patch of labels from the same context.
  • The following notation will be used in the present specification. Vectors s and {tilde over (m)} denote the aerial image patch n(Si,j,ws) and the map patch n({tilde over (M)}i,j, wm), respectively. Given the work of the present inventors in Mnih, Volodymyr and Hinton, Geoffrey, “Learning to Detect Roads in High-Resolution Aerial Images”, Proceedings of the 11th European Conference on Computer Vision (ECCV), September 2010, incorporated by reference herein, conditional independence of the map pixels may be presumed and the map distribution may be modelled as:
  • p(\tilde{m} \mid s) = \prod_{i=1}^{w_m^2} p(\tilde{m}_i \mid s),   (2)
  • using a neural network. Each p({tilde over (m)}i|s) may be modelled by a Bernoulli distribution whose mean value is determined by the i th output unit of the neural network. This may be referred to as the “noise free” model.
  • A deep neural network may be used to model the map distribution. The input to the neural network is a ws by ws patch of an aerial image encoded in the RGB color space, while the output is a wm by wm map patch. The input layer to the neural network contains one input unit for each pixel of each color channel of the aerial image patch, for a total of 3ws^2 input units.
  • The input layer is followed by three hidden layers, although it will be appreciated that any number of hidden layers may be included in the neural network. Each unit in a hidden layer computes a linear combination of some or all units in the previous layer. This linear combination is known as the input to the unit. Each hidden unit computes an output by applying an activation function to its input. All hidden layers of the deep neural network make use of the rectified linear activation function, for which the output is defined as max(0, input). It has been found that rectified linear units typically perform better than logistic units on various image classification tasks and it has further been found that this advantage typically exists on image labeling tasks.
  • The first two hidden layers in the neural network are locally connected layers, in which each hidden unit is connected to only a subset of the input units.
  • To precisely define the connectivity pattern, assume that the input units of a locally connected layer make up a win×win image, possibly consisting of multiple channels. The channels can include but are not limited to RGB color channels, an infrared channel, and channels corresponding to other types of electromagnetic radiation. The input image may be covered by evenly spaced filter sites by moving a wf×wf pixel window over the image by a stride of wstr pixels vertically and horizontally, for a total of ((win−wf)/wstr+1)^2 filter sites. Each site then consists of a wf×wf window of pixels which acts as the input to f hidden units in the next layer.
  • Further, each input patch may be preprocessed by subtracting the mean value of the pixels in that patch from all pixels and then dividing by the standard deviation found over all pixels in the dataset. This type of preprocessing may achieve some amount of contrast normalization between different patches.
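By way of a non-limiting illustration, this per-patch contrast normalization may be sketched as follows (the function name and the use of NumPy are illustrative assumptions, not part of the specification):

```python
import numpy as np

def preprocess_patch(patch, dataset_std):
    """Contrast-normalize one aerial image patch: subtract the patch's
    own mean value from all pixels, then divide by a standard deviation
    computed once over all pixels in the dataset."""
    return (patch - patch.mean()) / dataset_std
```

Because the divisor is a single dataset-wide standard deviation, only the mean brightness of each patch is removed individually; the scale is shared across the whole training set.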
  • A different set of f filters of size wf×wf and consisting of the same number of channels as the input image is applied at each filter site. Hence, a single locally connected layer results in f·((win−wf)/wstr+1)^2 hidden units. The hidden units of one locally connected layer can then act as the input to the next locally connected layer by viewing the hidden units as a square image with f channels and width (win−wf)/wstr+1.
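To make the counting concrete, the number of filter sites and hidden units of a locally connected layer may be computed as in the following non-limiting sketch (the helper name is illustrative; it assumes the window tiles the input evenly):

```python
def locally_connected_shape(w_in, w_f, w_str, f):
    """Filter sites and hidden units for one locally connected layer:
    a w_f-by-w_f window moved with stride w_str over a w_in-by-w_in
    input gives ((w_in - w_f)/w_str + 1)**2 sites, each feeding f
    hidden units. Returns (sites per side, total sites, hidden units)."""
    assert (w_in - w_f) % w_str == 0, "window must tile the input evenly"
    sites_per_side = (w_in - w_f) // w_str + 1
    return sites_per_side, sites_per_side ** 2, f * sites_per_side ** 2
```

For instance, a 64-pixel-wide input with 12-pixel filters, stride 4 and f=64 yields 14 sites per side, 196 sites, and 12544 hidden units.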
  • Unlike a convolutional or tiled net, there is no weight-sharing of any kind. While weight-sharing in convolutional architectures is advantageous on smaller datasets because it helps reduce overfitting by restricting the number of parameters, such a restriction is not required in the present invention because the abundance of labels, combined with random rotations, enables the avoidance of overfitting by training on millions of labeled aerial image patches. Like convolutional architectures, the present locally connected architecture may be computationally and statistically more efficient than a fully connected architecture.
  • The third hidden layer may be fully connected, with each unit connected to every unit in the preceding hidden layer. The output layer consists of wp^2 logistic units for which the output is 1/(1+exp(−input)). Typically, wp=wm and each output unit models the probability that the corresponding pixel in the wm by wm output map patch belongs to the class of interest.
  • The values of the parameters such as the number of filters f, their width wf, and stride wstr may vary from problem to problem.
  • As previously mentioned, the parameters of the neural network may be learned by minimizing the negative log likelihood of the training data. For the noise-free model given in (2) the negative log likelihood takes the form of a cross entropy between the patch {tilde over (m)} derived from the given map and the predicted patch {circumflex over (m)}, as follows:
  • -\sum_{i=1}^{w_m^2} \left( \tilde{m}_i \ln \hat{m}_i + (1 - \tilde{m}_i) \ln(1 - \hat{m}_i) \right).   (3)
  • The foregoing objective function may be optimized using mini-batched stochastic gradient descent with momentum or any other suitable optimization method.
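By way of a non-limiting illustration, the cross entropy of (3) may be evaluated as follows (the clipping constant is an illustrative numerical safeguard, not part of the specification):

```python
import numpy as np

def noise_free_nll(m_tilde, m_hat, eps=1e-12):
    """Negative log likelihood (cross entropy) of equation (3) for one
    patch: m_tilde holds the observed 0/1 labels, m_hat the predicted
    probabilities. Predictions are clipped away from 0 and 1 so the
    logarithms stay finite."""
    m_hat = np.clip(m_hat, eps, 1.0 - eps)
    return float(-np.sum(m_tilde * np.log(m_hat)
                         + (1.0 - m_tilde) * np.log(1.0 - m_hat)))
```

The gradient of this objective with respect to the input of each output unit is the familiar prediction-minus-label difference, which is what mini-batched stochastic gradient descent with momentum would backpropagate.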
  • The output of the foregoing neural network is image map data. The image map data and the existing map data from memory may be input to the comparator to determine a degree of similarity between them. Dissimilarities are likely to exist, due in part to noise in the existing map data.
  • Omission noise, as shown in FIG. 1, occurs when some map pixels are labeled as not belonging to the object class of interest when they, in fact, do. When trained on data containing a substantial number of such pixels, a classifier will be penalized for correctly predicting the value of 1 for pixels affected by omission noise. This will cause the classifier to be less confident and potentially increase the false negative rate.
  • The omission noise compensator applies a robust loss function that explicitly models asymmetric omission noise in order to reduce its effect on the final classifier, so that the generated map data more closely corresponds to the existing map data.
  • The noise-free model of the data from (2) assumes that the observed labels {tilde over (m)} are generated directly from the aerial image s. In order to model label noise, it is assumed that a true, uncorrupted, and unobserved map patch m is first generated from the aerial image patch s according to some distribution p(m|s). The corrupted, observed map {tilde over (m)} is then generated from the uncorrupted m according to a noise distribution p({tilde over (m)}|m).
  • For simplicity, the omission model assumes that conditioned on m, all components of {tilde over (m)} are independent and that each {tilde over (m)}i is independent of all mj for j≠i. The observed map distribution that corresponds to this model can then be obtained by marginalizing out m, leading to:
  • p(\tilde{m} \mid s) = \sum_{m} p(\tilde{m} \mid m)\, p(m \mid s)   (4)
  •   = \prod_{i=1}^{w_m^2} \sum_{m_i} p(\tilde{m}_i \mid m_i)\, p(m_i \mid s).   (5)
  • The noise distribution p({tilde over (m)}i|mi) may be assumed to be the same for all pixels i, and thus may be determined by two parameters:
  • θ0 = p({tilde over (m)}i=1|mi=0) and θ1 = p({tilde over (m)}i=0|mi=1).
  • For modeling omission noise, the relationship θ0<<θ1 may be set because the probability that the observed label {tilde over (m)}i is 1 given that the true label mi is 0 should be very close to 0, while the probability that the observed {tilde over (m)}i is 0 given that the true label mi is 1 should still be small but not as close to 0 as θ0. This model may be referred to as the asymmetric Bernoulli noise model, or the ABN model for short. Other ways of parameterizing p({tilde over (m)}i|mi) that compensate for omission noise can also be used.
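A non-limiting sketch of the observed-map likelihood (5) under this parameterization follows (the default values θ0=0.001 and θ1=0.05 are the example values used in connection with FIG. 3; the function name is illustrative):

```python
import numpy as np

def abn_observed_likelihood(m_tilde, m_hat, theta0=0.001, theta1=0.05):
    """Observed-map likelihood of equation (5) under the ABN model.
    m_hat is the network's estimate of p(m_i = 1 | s); for each pixel
    the latent true label m_i is marginalized over {0, 1}."""
    like1 = np.where(m_tilde == 1, 1.0 - theta1, theta1)  # p(m~_i | m_i = 1)
    like0 = np.where(m_tilde == 1, theta0, 1.0 - theta0)  # p(m~_i | m_i = 0)
    return float(np.prod(like1 * m_hat + like0 * (1.0 - m_hat)))
```

Setting both θ parameters to zero recovers the noise-free likelihood, since each observed label is then forced to equal the true label.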
  • In the noise-free scenario, the map distribution in (2) was modelled directly by a deep neural network. In the noisy setting, the neural network may be used to model the true map distribution p(m|s). Learning can still be done efficiently by minimizing the negative log probability of the training data under the ABN model given in (5). Since the ABN model factorizes over the pixels i and there is only a single Bernoulli latent variable mi for each pixel i, the derivative of the negative log probability can be found directly. The resulting updates can also be seen as an application of the EM algorithm with an online partial M-step.
  • In the noise-free scenario, the derivative of the negative log probability with respect to the input to the i th output unit of the neural network takes the form {tilde over (m)}i−{circumflex over (m)}i, so the learning procedure drives the prediction {circumflex over (m)}i towards the observed label {tilde over (m)}i. Under the ABN model, this derivative takes the form p(mi=1|{tilde over (m)}i,s)−{circumflex over (m)}i, so the learning procedure drives {circumflex over (m)}i towards the posterior probability that the unobserved true label mi is 1. This has the effect that the neural network is penalized less for making a confident but incorrect prediction. FIG. 3 demonstrates how the derivatives for the noise-free and the ABN models differ as a function of the prediction {circumflex over (m)}i: the derivative of the log probability with respect to the input to the i th output unit is shown for varying predictions {circumflex over (m)}i, with the observed value {tilde over (m)}i set to 0 and the ABN parameters set to θ0=0.001 and θ1=0.05. The noise-free model penalizes incorrect predictions more than the ABN model, which penalizes incorrect but confident predictions less.
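The posterior targets that drive the ABN updates may be sketched as follows (a non-limiting per-pixel illustration; the factorization over pixels makes this per-pixel computation sufficient, and the function name is an assumption):

```python
import numpy as np

def abn_posterior(m_tilde, m_hat, theta0=0.001, theta1=0.05):
    """Posterior p(m_i = 1 | m_tilde_i, s) under the ABN model, with
    the prior p(m_i = 1 | s) given by the network output m_hat. The
    derivative of the log probability with respect to the input of the
    i-th output unit is then this posterior minus m_hat."""
    like1 = np.where(m_tilde == 1, 1.0 - theta1, theta1)  # p(m~_i | m_i = 1)
    like0 = np.where(m_tilde == 1, theta0, 1.0 - theta0)  # p(m~_i | m_i = 0)
    return like1 * m_hat / (like1 * m_hat + like0 * (1.0 - m_hat))
```

With the observed label set to 0 and the FIG. 3 parameters, a prediction of 0.5 yields a posterior of roughly 0.048, so the gradient pulls the prediction towards a small but nonzero value rather than all the way to 0.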
  • Registration noise occurs when an aerial image and the corresponding map are not perfectly aligned. As shown in FIG. 1, the error in alignment between the map and the aerial image can vary over the dataset and cannot be corrected by a global translation. The registration noise compensator reduces registration errors using local translations of the labels.
  • The registration noise compensator extends the robust loss function to also handle local registration errors. As with the ABN model, a generative model of the observed map patches is used. On a high level, the generative model works by first generating an uncorrupted and perfectly registered map from the aerial image, then selecting a random subpatch of the true map, and finally generating the observed map by corrupting the selected subpatch with asymmetric noise. More formally, the generative process is as follows:
  • 1) An uncorrupted and perfectly registered true map patch m of size wm′×wm′ is generated from s according to p(m|s). The relationship wm′=wm+2tmax is set, where tmax is the maximum possible registration error/translation between the map and aerial image measured in pixels.
  • 2) A translation variable t is sampled from some distribution p(t) over T+1 possible values 0, . . . , T. For example, T=8, where t=0 corresponds to no translation while 1, . . . , T index the 8 possible translations by ±tmax pixels in the vertical direction, the horizontal direction, or both. This is shown in FIG. 4, wherein for each dark gray patch representing m, the lighter gray subpatch highlights the area cropped by Crop(m,t) for the stated translation parameter t.
  • 3) An observed map is sampled from the translational noise distribution:
  • p(\tilde{m} \mid m, t) = p(\tilde{m} \mid \mathrm{Crop}(m, t))   (6)
  •   = \prod_{i=1}^{w_m^2} p_{ABN}(\tilde{m}_i \mid \mathrm{Crop}(m, t)_i),   (7)
  • where Crop(m,t) selects a wm by wm subpatch from the wm′ by wm′ patch m according to the translation variable t as shown in FIG. 4, and pABN({tilde over (m)}i|mi) is the pixelwise asymmetric binary noise model defined in the previous section.
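By way of a non-limiting illustration, the T+1=9 crops and the Crop(m,t) operation may be sketched as follows (the indexing order of the shifted translations is an assumption; only t=0 is fixed by the text as the centered, untranslated crop):

```python
import numpy as np

def crop_offsets(t_max):
    """Top-left corners of the 9 possible w_m-by-w_m crops of a
    (w_m + 2*t_max)-square patch: t = 0 is the centered crop, and
    t = 1..8 shift by +/- t_max pixels vertically, horizontally,
    or both."""
    offsets = [(t_max, t_max)]  # t = 0: no translation
    for dy in (-t_max, 0, t_max):
        for dx in (-t_max, 0, t_max):
            if (dy, dx) != (0, 0):
                offsets.append((t_max + dy, t_max + dx))
    return offsets

def crop(m, t, t_max, w_m):
    """Crop(m, t): the w_m-by-w_m subpatch selected by translation t."""
    r, c = crop_offsets(t_max)[t]
    return m[r:r + w_m, c:c + w_m]
```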
  • For simplicity it may be assumed that p(t=i)=(1−p(t=0))/T for all i≠0, so that p(t) may be parameterized using only a single parameter, θt=p(t=0). Hence, a total of four parameters may be used: tmax, θt, and the two parameters needed to define pABN({tilde over (m)}i|mi). This generative model may be referred to as the translational asymmetric binary noise model, or the TABN model for short.
  • The observed map distribution under the TABN model is given by:
  • p(\tilde{m} \mid s) = \sum_{t=0}^{T} p(t) \sum_{m} p(\tilde{m} \mid m, t)\, p(m \mid s).   (8)
  • The parameters p(t) and p({tilde over (m)}|m,t) may be set using a validation set and the parameters of p(m|s) may be learned by minimizing the negative log likelihood in (8) using the EM-algorithm. The required EM updates can be performed efficiently.
  • M-step: Since p(m|s) is modelled by a neural network, a full M-step cannot be performed and instead an approximate partial M-step is performed by a single gradient descent update of the neural network parameters on a mini-batch of training cases. The required derivative of the expected log likelihood is:
  • \frac{\partial}{\partial x_i} \sum_{t} \sum_{m} p(m, t \mid \tilde{m}, s) \ln p(m \mid s) = p(m_i = 1 \mid \tilde{m}, s) - \hat{m}_i,
  • where {circumflex over (m)}i is the value of the i th output unit of the neural network and xi is the input to the i th output unit. The updates for all weights of the neural network can be computed from the above equation using backpropagation.
  • E-step: The role of the E-step is to compute p(mi|{tilde over (m)},s) for use in the M-step, and this computation can be done in time T·wm^2 by exploiting the structure of the noise model.
  • Ct is defined to be the set of indices of pixels of m that are cropped for transformation t. Since this set will have wm^2 entries, it may also be used to index into m. By defining:
  • P_t = \prod_{i \in C_t} \left( \sum_{m_i} p(\tilde{m}_i \mid m_i)\, p(m_i \mid s) \right),   (9)
  • the observed map distribution can be rewritten as p({tilde over (m)}|s)=Σtp(t)·Pt. Now using the identity
  • p(m_i \mid \tilde{m}, s) = \left[ \sum_{t} \sum_{m_{-i}} p(t)\, p(\tilde{m} \mid m, t)\, p(m \mid s) \right] / p(\tilde{m} \mid s),   (10)
  • where m_{-i} denotes all entries of m other than mi, p(mi|{tilde over (m)},s) can be expressed as
  • \left[ \sum_{t} p(t) \cdot P_t \cdot \frac{p(\tilde{m}_i \mid m_i)\, p(m_i \mid s)}{\sum_{m_i} p(\tilde{m}_i \mid m_i)\, p(m_i \mid s)} \right] / \left[ \sum_{t} p(t) \cdot P_t \right].   (11)
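A non-limiting sketch of the resulting E-step follows; it loops over the T+1 translations and applies (9) and (11) directly. The handling of pixels outside the cropped region Ct (which keep their prior under that translation's term) and the indexing order of the translations are assumptions the text leaves implicit:

```python
import numpy as np

def tabn_e_step(m_tilde, prior, t_max, p_t, theta0=0.001, theta1=0.05):
    """Posterior p(m_i = 1 | m_tilde, s) of equation (11) for every
    pixel of the (w_m + 2*t_max)-square true patch. `prior` holds the
    network's p(m_i = 1 | s), `m_tilde` the observed w_m-square patch,
    and `p_t` the translation prior over the T + 1 crops."""
    w_m = m_tilde.shape[0]
    assert prior.shape[0] == w_m + 2 * t_max
    # Top-left corners of the crops: t = 0 is centered, then the shifts.
    corners = [(t_max, t_max)] + [
        (t_max + dy, t_max + dx)
        for dy in (-t_max, 0, t_max) for dx in (-t_max, 0, t_max)
        if (dy, dx) != (0, 0)
    ]
    assert len(p_t) == len(corners)
    like1 = np.where(m_tilde == 1, 1.0 - theta1, theta1)  # p(m~_i | m_i = 1)
    like0 = np.where(m_tilde == 1, theta0, 1.0 - theta0)  # p(m~_i | m_i = 0)
    num = np.zeros_like(prior)  # numerator of (11), accumulated over t
    den = 0.0                   # denominator: sum over t of p(t) * P_t
    for t, (r, c) in enumerate(corners):
        pr = prior[r:r + w_m, c:c + w_m]
        marg = like1 * pr + like0 * (1.0 - pr)  # per-pixel factor of (9)
        P_t = float(np.prod(marg))
        contrib = np.array(prior)  # pixels outside C_t keep their prior
        contrib[r:r + w_m, c:c + w_m] = like1 * pr / marg
        num += p_t[t] * P_t * contrib
        den += p_t[t] * P_t
    return num / den
```

With tmax=0 and a single translation, the computation reduces to the per-pixel ABN posterior, which is a useful sanity check.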
  • The width of patches to which the noise model is applied (wm′) can be different from the width of patches predicted by the neural network (wp). This enables the decoupling of the size of patch for which registration error is assumed to be constant from the size of predicted patch. For example, wm′=4wp. In this case, the wm′×wm′ patch {tilde over (m)} may be constructed out of 16 non-overlapping wp×wp patches predicted by the neural net. The wm′×wm′ patch of posterior marginals p(mi|{tilde over (m)},s) is then determined as described above, separated into 16 non-overlapping wp×wp subpatches, and the derivatives from all the subpatches backpropagated through the neural network.
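The separation into non-overlapping subpatches may be sketched as follows (a non-limiting NumPy reshaping helper; row-major ordering of the subpatches is an assumption):

```python
import numpy as np

def split_into_subpatches(patch, w_p):
    """Split a (k*w_p)-square patch into its k**2 non-overlapping
    w_p-by-w_p subpatches (k = 4 gives the 16 subpatches of the
    example), returned in row-major order."""
    k = patch.shape[0] // w_p
    return (patch.reshape(k, w_p, k, w_p)
                 .transpose(0, 2, 1, 3)
                 .reshape(k * k, w_p, w_p))
```

Each of the resulting subpatches lines up with one wp×wp prediction of the network, so the per-subpatch derivatives can be backpropagated independently.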
  • In an exemplary embodiment the neural network predicts 16 by 16 patches of map from 64 by 64 patches of aerial image. The first hidden layer may have filter width 12 with stride 4 and 64 filters at each site. The second hidden layer may have filter width 4 with stride 2 and 256 filters at each site. The third hidden layer may have 4096 hidden units.
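Working through the unit counts of this exemplary embodiment (a non-limiting arithmetic check applying the filter-site formula given above; the function name is illustrative):

```python
def exemplary_layer_sizes():
    """Unit counts for the exemplary network: 64x64 RGB input, two
    locally connected layers (filter width 12, stride 4, 64 filters;
    then filter width 4, stride 2, 256 filters), a fully connected
    layer of 4096 units, and a 16x16 output patch."""
    side1 = (64 - 12) // 4 + 1    # 14 filter sites per side
    side2 = (side1 - 4) // 2 + 1  # 6 filter sites per side
    return {
        "input": 3 * 64 * 64,        # one unit per pixel per channel
        "hidden1": 64 * side1 ** 2,
        "hidden2": 256 * side2 ** 2,
        "hidden3": 4096,
        "output": 16 * 16,
    }
```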
  • Although the above has been described with reference to certain specific example embodiments, various modifications thereof will be apparent to those skilled in the art without departing from the scope of the claims appended hereto.

Claims (21)

1-20. (canceled)
21. A method performed by one or more computers, the method comprising:
training a neural network on training data, wherein the training data includes a plurality of labeled aerial images, wherein the neural network is configured to generate a respective score for each of a plurality of pixels of an input aerial image, wherein the respective scores for each of the plurality of pixels represent a probability that the pixel belongs to an object class, and
wherein training the neural network on the training data comprises:
optimizing an objective function having a derivative that includes a posterior probability term that represents a likelihood that labels for pixels from the labeled aerial images differ from true labels for the pixels, wherein, for each labeled aerial image, the posterior probability term accounts for (i) a first predetermined likelihood representative of a label for the pixels indicating that the pixels belong to the object class given that a true label for the pixels indicates that the pixels do not belong to the object class, and (ii) a second predetermined likelihood representative of a label for the pixels indicating that the pixels do not belong to the object class given that the true label for the pixels indicates that the pixels belong to the object class, a value of the first predetermined likelihood relative to a value of the second predetermined likelihood being determined according to an asymmetric noise distribution model,
wherein the posterior probability term compensates for omission errors in the training data, an omission error occurring when a pixel in a labeled aerial image has been incorrectly labeled as not belonging to the object class.
22. The method of claim 21, wherein the objective function is a negative log likelihood function.
23. The method of claim 21, wherein the derivative takes the form of, for a pixel i in a particular labeled aerial image, p(mi=1|{tilde over (m)}i,s)−{circumflex over (m)}i, wherein p(mi=1|{tilde over (m)}i,s) is a posterior probability that a true label mi for the pixel i is 1 given a label for the pixel i in the particular labeled aerial image {tilde over (m)}i and a portion s of the aerial image containing the pixel i, and wherein {circumflex over (m)}i is a probability predicted for the pixel i by the neural network.
24. The method of claim 21, wherein:
the object class is roads; and
the respective scores for each of the plurality of pixels represent a probability that the pixel is a part of an image of a road.
25. The method of claim 22, wherein:
the value of the second predetermined likelihood is greater than the value of the first predetermined likelihood and less than 0.1; and
the value of the first predetermined likelihood is greater than zero.
26. The method of claim 25, wherein:
the asymmetric noise distribution model is an asymmetric Bernoulli noise distribution model; and
optimizing the objective function comprises minimizing the objective function using the asymmetric Bernoulli noise distribution model.
27. The method of claim 22, further comprising:
selecting one or more parameters of the neural network; and
generating map data during and after the training of the neural network using the selected one or more parameters.
28. A system comprising one or more computers and one or more storage devices storing instructions that, when executed by the one or more computers, cause the one or more computers to perform operations comprising:
training a neural network on training data, wherein the training data includes a plurality of labeled aerial images, wherein the neural network is configured to generate a respective score for each of a plurality of pixels of an input aerial image, wherein the respective scores for each of the plurality of pixels represent a probability that the pixel belongs to an object class, and
wherein training the neural network on the training data comprises:
optimizing an objective function having a derivative that includes a posterior probability term that represents a likelihood that labels for pixels from the labeled aerial images differ from true labels for the pixels, wherein, for each labeled aerial image, the posterior probability term accounts for (i) a first predetermined likelihood representative of a label for the pixels indicating that the pixels belong to the object class given that a true label for the pixels indicates that the pixels do not belong to the object class, and (ii) a second predetermined likelihood representative of a label for the pixels indicating that the pixels do not belong to the object class given that the true label for the pixels indicates that the pixels belong to the object class, a value of the first predetermined likelihood relative to a value of the second predetermined likelihood being determined according to an asymmetric noise distribution model,
wherein the posterior probability term compensates for omission errors in the training data, an omission error occurring when a pixel in a labeled aerial image has been incorrectly labeled as not belonging to the object class.
29. The system of claim 28, wherein the objective function is a negative log likelihood function.
30. The system of claim 28, wherein the derivative takes the form of, for a pixel i in a particular labeled aerial image, p(mi=1|{tilde over (m)}i,s)−{circumflex over (m)}i, wherein p(mi=1|{tilde over (m)}i,s) is a posterior probability that a true label mi for the pixel i is 1 given a label for the pixel i in the particular labeled aerial image {tilde over (m)}i and a portion s of the aerial image containing the pixel i, and wherein {circumflex over (m)}i is a probability predicted for the pixel i by the neural network.
31. The system of claim 28, wherein:
the object class is roads; and
the respective scores for each of the plurality of pixels represent a probability that the pixel is a part of an image of a road.
32. The system of claim 28, wherein:
the value of the second predetermined likelihood is greater than the value of the first predetermined likelihood and less than 0.1; and
the value of the first predetermined likelihood is greater than zero.
33. The system of claim 28, wherein:
the asymmetric noise distribution model is an asymmetric Bernoulli noise distribution model; and
optimizing the objective function comprises minimizing the objective function using the asymmetric Bernoulli noise distribution model.
34. The system of claim 28, wherein the operations further comprise:
selecting one or more parameters of the neural network; and
generating map data during and after the training of the neural network using the selected one or more parameters.
35. One or more non-transitory computer-readable storage media comprising instructions, which, when executed by one or more computers, cause the one or more computers to perform actions comprising:
training a neural network on training data, wherein the training data includes a plurality of labeled aerial images, wherein the neural network is configured to generate a respective score for each of a plurality of pixels of an input aerial image, wherein the respective scores for each of the plurality of pixels represent a probability that the pixel belongs to an object class, and
wherein training the neural network on the training data comprises:
optimizing an objective function having a derivative that includes a posterior probability term that represents a likelihood that labels for pixels from the labeled aerial images differ from true labels for the pixels, wherein, for each labeled aerial image, the posterior probability term accounts for (i) a first predetermined likelihood representative of a label for the pixels indicating that the pixels belong to the object class given that a true label for the pixels indicates that the pixels do not belong to the object class, and (ii) a second predetermined likelihood representative of a label for the pixels indicating that the pixels do not belong to the object class given that the true label for the pixels indicates that the pixels belong to the object class, a value of the first predetermined likelihood relative to a value of the second predetermined likelihood being determined according to an asymmetric noise distribution model,
wherein the posterior probability term compensates for omission errors in the training data, an omission error occurring when a pixel in a labeled aerial image has been incorrectly labeled as not belonging to the object class.
36. The one or more non-transitory computer-readable storage media of claim 35, wherein the objective function is a negative log likelihood function.
37. The one or more non-transitory computer-readable storage media of claim 35, wherein the derivative takes the form of, for a pixel i in a particular labeled aerial image, p(mi=1|{tilde over (m)}i,s)−{circumflex over (m)}i, wherein p(mi=1|{tilde over (m)}i,s) is a posterior probability that a true label mi for the pixel i is 1 given a label for the pixel i in the particular labeled aerial image {tilde over (m)}i and a portion s of the aerial image containing the pixel i, and wherein {circumflex over (m)}i is a probability predicted for the pixel i by the neural network.
38. The one or more non-transitory computer-readable storage media of claim 35, wherein:
the object class is roads; and
the respective scores for each of the plurality of pixels represent a probability that the pixel is a part of an image of a road.
39. The one or more non-transitory computer-readable storage media of claim 35, wherein the operations further comprise:
selecting one or more parameters of the neural network; and
generating map data during and after the training of the neural network using the selected one or more parameters.
40. The one or more non-transitory computer-readable storage media of claim 35, wherein:
the asymmetric noise distribution model is an asymmetric Bernoulli noise distribution model; and
optimizing the objective function comprises minimizing the objective function using the asymmetric Bernoulli noise distribution model.
US15/497,378 (US20170228871A1): priority date 2012-06-22, filed 2017-04-26, "System and method for labelling aerial images", Abandoned

Priority Applications (1)

  • US15/497,378: priority date 2012-06-22, filed 2017-04-26, "System and method for labelling aerial images"

Applications Claiming Priority (3)

  • US201261663297P: filed 2012-06-22
  • US13/924,320 (US9704068B2): priority date 2012-06-22, filed 2013-06-21, "System and method for labelling aerial images"
  • US15/497,378 (US20170228871A1): priority date 2012-06-22, filed 2017-04-26, "System and method for labelling aerial images"

Related Parent Applications (1)

  • US13/924,320 (continuation; US9704068B2): priority date 2012-06-22, filed 2013-06-21, "System and method for labelling aerial images"

Publications (1)

  • US20170228871A1, published 2017-08-10

Family ID: 49774516

Family Applications (2)

  • US13/924,320 (US9704068B2): filed 2013-06-21, status Active, adjusted expiration 2034-02-14
  • US15/497,378 (US20170228871A1): filed 2017-04-26, status Abandoned

Cited By (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110503049A (en) * 2019-08-26 2019-11-26 重庆邮电大学 Based on the satellite video number of vehicles estimation method for generating confrontation network
CN111368903A (en) * 2020-02-28 2020-07-03 深圳前海微众银行股份有限公司 Model performance optimization method, device, equipment and storage medium
RU2747044C1 (en) * 2020-06-15 2021-04-23 Российская Федерация, от имени которой выступает ФОНД ПЕРСПЕКТИВНЫХ ИССЛЕДОВАНИЙ Hardware-software complex designed for training and (or) re-training of processing algorithms for aerial photographs of the territory for detection, localization and classification up to type of aviation and ground equipment
RU2747214C1 (en) * 2020-06-10 2021-04-29 Российская Федерация, от имени которой выступает ФОНД ПЕРСПЕКТИВНЫХ ИССЛЕДОВАНИЙ Hardware-software complex designed for training and (or) re-training of processing algorithms for aerial photographs in visible and far infrared band for detection, localization and classification of buildings outside of localities
US11669724B2 (en) * 2018-05-17 2023-06-06 Raytheon Company Machine learning using informed pseudolabels

Families Citing this family (29)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US9401148B2 (en) 2013-11-04 2016-07-26 Google Inc. Speaker verification using neural networks
US9620145B2 (en) 2013-11-01 2017-04-11 Google Inc. Context-dependent state tying using a neural network
WO2015154216A1 (en) 2014-04-08 2015-10-15 Microsoft Technology Licensing, Llc Deep learning using alternating direction method of multipliers
US9978013B2 (en) * 2014-07-16 2018-05-22 Deep Learning Analytics, LLC Systems and methods for recognizing objects in radar imagery
US9558268B2 (en) 2014-08-20 2017-01-31 Mitsubishi Electric Research Laboratories, Inc. Method for semantically labeling an image of a scene using recursive context propagation
US10223816B2 (en) 2015-02-13 2019-03-05 Here Global B.V. Method and apparatus for generating map geometry based on a received image and probe data
US10339440B2 (en) 2015-02-19 2019-07-02 Digital Reasoning Systems, Inc. Systems and methods for neural language modeling
WO2016197303A1 (en) 2015-06-08 2016-12-15 Microsoft Technology Licensing, Llc. Image semantic segmentation
US9786270B2 (en) 2015-07-09 2017-10-10 Google Inc. Generating acoustic models
US10229672B1 (en) 2015-12-31 2019-03-12 Google Llc Training acoustic models using connectionist temporal classification
US9858340B1 (en) 2016-04-11 2018-01-02 Digital Reasoning Systems, Inc. Systems and methods for queryable graph representations of videos
US9940551B1 (en) * 2016-06-17 2018-04-10 Google Llc Image generation using neural networks
US20180018973A1 (en) 2016-07-15 2018-01-18 Google Inc. Speaker verification
CN109564633B (en) * 2016-08-08 2023-07-18 诺基亚技术有限公司 Artificial neural network
US10592776B2 (en) * 2017-02-08 2020-03-17 Adobe Inc. Generating multimodal image edits for a digital image
CN106980896B (en) * 2017-03-16 2019-11-26 武汉理工大学 The crucial convolutional layer hyper parameter of Classification in Remote Sensing Image convolutional neural networks determines method
US10706840B2 (en) * 2017-08-18 2020-07-07 Google Llc Encoder-decoder models for sequence to sequence mapping
US10572775B2 (en) * 2017-12-05 2020-02-25 X Development Llc Learning and applying empirical knowledge of environments by robots
CN111465936B (en) 2017-12-06 2023-11-24 北京嘀嘀无限科技发展有限公司 System and method for determining new road on map
US11030457B2 (en) * 2018-12-17 2021-06-08 Here Global B.V. Lane feature detection in aerial images based on road geometry
CN110110780B (en) * 2019-04-30 2023-04-07 南开大学 Image classification method based on antagonistic neural network and massive noise data
CN110175574A (en) * 2019-05-28 2019-08-27 中国人民解放军战略支援部队信息工程大学 Road network extraction method and device
CN110211138B (en) * 2019-06-08 2022-12-02 西安电子科技大学 Remote sensing image segmentation method based on confidence points
CN110598564B (en) * 2019-08-16 2022-02-11 浙江工业大学 OpenStreetMap-based high-spatial-resolution remote sensing image transfer learning classification method
US11250580B2 (en) * 2019-09-24 2022-02-15 Dentsply Sirona Inc. Method, system and computer readable storage media for registering intraoral measurements
CN111079847B (en) * 2019-12-20 2023-05-02 郑州大学 Remote sensing image automatic labeling method based on deep learning
WO2022025788A1 (en) * 2020-07-31 2022-02-03 Harman International Industries, Incorporated Method and apparatus for predicting virtual road sign locations
CN112966579B (en) * 2021-02-24 2021-11-30 湖南三湘绿谷生态科技有限公司 Large-area camellia oleifera forest rapid yield estimation method based on unmanned aerial vehicle remote sensing
US11908185B2 (en) * 2022-06-30 2024-02-20 Metrostudy, Inc. Roads and grading detection using satellite or aerial imagery

Family Cites Families (7)

Publication number Priority date Publication date Assignee Title
US6721660B2 (en) * 2002-08-19 2004-04-13 Korea Advanced Institute Of Science And Technology Road extraction from images using template matching
US8675995B2 (en) * 2004-07-09 2014-03-18 Terrago Technologies, Inc. Precisely locating features on geospatial imagery
US7660441B2 (en) * 2004-07-09 2010-02-09 University of Southern California System and method for fusing geospatial data
US7359555B2 (en) * 2004-10-08 2008-04-15 Mitsubishi Electric Research Laboratories, Inc. Detecting roads in aerial images using feature-based classifiers
US8111923B2 (en) * 2008-08-14 2012-02-07 Xerox Corporation System and method for object class localization and semantic class based image segmentation
US8319793B2 (en) * 2009-04-17 2012-11-27 Definiens Ag Analyzing pixel data by imprinting objects of a computer-implemented network structure into other objects
US8565536B2 (en) * 2010-04-01 2013-10-22 Microsoft Corporation Material recognition from an image

Cited By (5)

Publication number Priority date Publication date Assignee Title
US11669724B2 (en) * 2018-05-17 2023-06-06 Raytheon Company Machine learning using informed pseudolabels
CN110503049A (en) * 2019-08-26 2019-11-26 重庆邮电大学 Satellite video vehicle count estimation method based on generative adversarial networks
CN111368903A (en) * 2020-02-28 2020-07-03 深圳前海微众银行股份有限公司 Model performance optimization method, apparatus, device, and storage medium
RU2747214C1 (en) * 2020-06-10 2021-04-29 Российская Федерация, от имени которой выступает ФОНД ПЕРСПЕКТИВНЫХ ИССЛЕДОВАНИЙ Hardware-software complex designed for training and (or) re-training of processing algorithms for aerial photographs in visible and far infrared band for detection, localization and classification of buildings outside of localities
RU2747044C1 (en) * 2020-06-15 2021-04-23 Российская Федерация, от имени которой выступает ФОНД ПЕРСПЕКТИВНЫХ ИССЛЕДОВАНИЙ Hardware-software complex designed for training and (or) re-training of processing algorithms for aerial photographs of the territory for detection, localization and classification up to type of aviation and ground equipment

Also Published As

Publication number Publication date
US20130343641A1 (en) 2013-12-26
US9704068B2 (en) 2017-07-11

Similar Documents

Publication Publication Date Title
US20170228871A1 (en) System and method for labelling aerial images
US10839543B2 (en) Systems and methods for depth estimation using convolutional spatial propagation networks
Mnih et al. Learning to label aerial images from noisy data
US10643320B2 (en) Adversarial learning of photorealistic post-processing of simulation with privileged information
Mnih Machine learning for aerial image labeling
US8355576B2 (en) Method and system for crowd segmentation
CN110516095B (en) Semantic migration-based weak supervision deep hash social image retrieval method and system
US11797725B2 (en) Intelligent imagery
CN108960184B (en) Pedestrian re-identification method based on heterogeneous component deep neural network
CN110569696A (en) Neural network system, method and apparatus for vehicle component identification
US10579907B1 (en) Method for automatically evaluating labeling reliability of training images for use in deep learning network to analyze images, and reliability-evaluating device using the same
WO2017004803A1 (en) An apparatus and a method for semantic image labeling
CN113298815A (en) Semi-supervised remote sensing image semantic segmentation method and device and computer equipment
CN113344206A (en) Knowledge distillation method, device and equipment integrating channel and relation feature learning
WO2022218396A1 (en) Image processing method and apparatus, and computer readable storage medium
Li et al. Unsupervised domain adaptation with self-attention for post-disaster building damage detection
KR20210072689A (en) Method for creating obstruction detection model using deep learning image recognition and apparatus thereof
Bacca et al. Long-term mapping and localization using feature stability histograms
Hao et al. Unsupervised change detection using a novel fuzzy c-means clustering simultaneously incorporating local and global information
CN115577768A (en) Semi-supervised model training method and device
Sharjeel et al. Real time drone detection by moving camera using COROLA and CNN algorithm
Ragab Leveraging mayfly optimization with deep learning for secure remote sensing scene image classification
Yang et al. Robust visual tracking using adaptive local appearance model for smart transportation
Chawla Possibilistic c-means-spatial contextual information based sub-pixel classification approach for multi-spectral data
CN114627397A (en) Behavior recognition model construction method and behavior recognition method

Legal Events

Date Code Title Description
AS Assignment

Owner name: GOOGLE INC., CALIFORNIA

Free format text: ASSIGNMENT OF ASSIGNORS INTEREST;ASSIGNORS:MNIH, VOLODYMYR;HINTON, GEOFFREY E.;SIGNING DATES FROM 20130620 TO 20130627;REEL/FRAME:042150/0483

AS Assignment

Owner name: GOOGLE LLC, CALIFORNIA

Free format text: CHANGE OF NAME;ASSIGNOR:GOOGLE INC.;REEL/FRAME:044129/0001

Effective date: 20170929

STCB Information on status: application discontinuation

Free format text: ABANDONED -- FAILURE TO RESPOND TO AN OFFICE ACTION