WO2020068744A1 - Advanced cloud detection using neural networks and optimization techniques - Google Patents

Advanced cloud detection using neural networks and optimization techniques

Info

Publication number
WO2020068744A1
WO2020068744A1 (PCT/US2019/052593)
Authority
WO
WIPO (PCT)
Prior art keywords
image
cloud
imagery
pixel
ground
Prior art date
Application number
PCT/US2019/052593
Other languages
French (fr)
Inventor
Michael Aschenbeck
Original Assignee
Digitalglobe, Inc.
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Priority claimed from US16/140,052 external-priority patent/US10685253B2/en
Application filed by Digitalglobe, Inc. filed Critical Digitalglobe, Inc.
Priority to EP19864631.7A priority Critical patent/EP3857443A4/en
Publication of WO2020068744A1 publication Critical patent/WO2020068744A1/en

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/08Learning methods
    • G06N3/084Backpropagation, e.g. using gradient descent
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/23Clustering techniques
    • G06F18/232Non-hierarchical techniques
    • G06F18/2323Non-hierarchical techniques based on graph theory, e.g. minimum spanning trees [MST] or graph cuts
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • G06N3/048Activation functions
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00Arrangements for image or video recognition or understanding
    • G06V10/40Extraction of image or video features
    • G06V10/44Local feature extraction by analysis of parts of the pattern, e.g. by detecting edges, contours, loops, corners, strokes or intersections; Connectivity analysis, e.g. of connected components
    • G06V10/443Local feature extraction by analysis of parts of the pattern, e.g. by detecting edges, contours, loops, corners, strokes or intersections; Connectivity analysis, e.g. of connected components by matching or filtering
    • G06V10/449Biologically inspired filters, e.g. difference of Gaussians [DoG] or Gabor filters
    • G06V10/451Biologically inspired filters, e.g. difference of Gaussians [DoG] or Gabor filters with interaction between the filter responses, e.g. cortical complex cells
    • G06V10/454Integrating the filters into a hierarchical structure, e.g. convolutional neural networks [CNN]
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00Arrangements for image or video recognition or understanding
    • G06V10/40Extraction of image or video features
    • G06V10/50Extraction of image or video features by performing operations within image blocks; by using histograms, e.g. histogram of oriented gradients [HoG]; by summing image-intensity values; Projection analysis
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00Arrangements for image or video recognition or understanding
    • G06V10/40Extraction of image or video features
    • G06V10/513Sparse representations
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00Arrangements for image or video recognition or understanding
    • G06V10/40Extraction of image or video features
    • G06V10/54Extraction of image or video features relating to texture
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00Arrangements for image or video recognition or understanding
    • G06V10/70Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/762Arrangements for image or video recognition or understanding using pattern recognition or machine learning using clustering, e.g. of similar faces in social networks
    • G06V10/7635Arrangements for image or video recognition or understanding using pattern recognition or machine learning using clustering, e.g. of similar faces in social networks based on graphs, e.g. graph cuts or spectral clustering
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00Arrangements for image or video recognition or understanding
    • G06V10/70Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/77Processing image or video features in feature spaces; using data integration or data reduction, e.g. principal component analysis [PCA] or independent component analysis [ICA] or self-organising maps [SOM]; Blind source separation
    • G06V10/772Determining representative reference patterns, e.g. averaging or distorting patterns; Generating dictionaries
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00Arrangements for image or video recognition or understanding
    • G06V10/70Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/82Arrangements for image or video recognition or understanding using pattern recognition or machine learning using neural networks
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V20/00Scenes; Scene-specific elements
    • G06V20/10Terrestrial scenes
    • G06V20/13Satellite images
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N5/00Computing arrangements using knowledge-based models
    • G06N5/01Dynamic search techniques; Heuristics; Dynamic trees; Branch-and-bound

Definitions

  • an algorithm has been developed to detect and delimit clouds in remotely-sensed Pan and MSI imagery; in other words, to determine, on a pixel-by-pixel basis, whether each pixel contains an image portion of a cloud or an image portion of ground imagery, and, further, to determine the precise boundaries between cloud regions and non-cloud regions.
  • the algorithm can operate on a panchromatic (Pan) image created when the imaging sensor is sensitive to a wide range of wavelengths of light, typically spanning a large part of the visible part of the spectrum (and potentially some portions of the electromagnetic spectrum outside of the visible part).
  • the algorithm can also or alternatively operate on any single-band image of a multispectral (MSI) image.
  • MSI image is one for which image data has been captured at specific wavelength bands across the electromagnetic spectrum.
  • DigitalGlobe’s WV-3 satellite has eight MSI bands: coastal (approximately 400-452 nm), blue (approximately 448-510 nm), green (approximately 518-586 nm), yellow (approximately 590-630 nm), red (approximately 632-692 nm), red edge (approximately 706-746 nm), near infrared 1 (NIR1) (approximately 772-890 nm), and near infrared 2 (NIR2) (approximately 866-954 nm). The algorithm could also be applied to hyperspectral or other types of imagery.
  • the term “input image” will refer to a generic grayscale image. Any multi-band image, as discussed above, may have one band extracted and used as a single-band grayscale image.
  • the algorithm may utilize Sparse Coding for Dictionary Learning and/or one or more Neural Networks in a first phase.
  • the algorithm may also include Max-flow/Min-cut Segmentation in a second phase. At least the adaptation of these ideas to cloud detection and the manner in which they are adapted is believed to be novel.
  • a first dictionary of “cloud words” is created. This “cloud dictionary” includes a number of picture elements of clouds, and these picture elements are called the cloud words.
  • a second dictionary of “ground words” is created. This “ground dictionary” includes a number of picture elements of the surface of the Earth (and the things built, formed, or growing thereon), and these picture elements are called the ground words.
  • a training step is required in advance.
  • a patch refers to a normalized k x k square of pixels (and their values) in the input image.
  • a known cloud patch is a patch that has manually been deemed to be within a cloud.
  • Each of these k x k cloud patches is reshaped as a column vector in k²-dimensional space (using any linear indexing scheme for the k x k array of pixels).
  • a cloud training matrix is constructed with these reshaped cloud patches. It is from this matrix that a dictionary is created using sparse coding.
  • a sparse coding dictionary is a collection (dictionary) of (generally contrived) patches (reshaped as column vectors) such that any patch in the training set can be well-approximated by a sparse linear combination of patches in the dictionary.
  • the number of patches in the training set and in the sparse coding dictionary is typically larger than k².
  • Various algorithms are known in the literature for constructing a sparse coding dictionary from a training set.
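  • For illustration only (this sketch is not part of the patent disclosure), the dictionary-training step could look roughly as follows in Python, using scikit-learn's MiniBatchDictionaryLearning as one such algorithm; the 8 x 8 patch size, 64-word dictionary, and sparsity of 3 are assumptions taken from the examples below:

```python
import numpy as np
from sklearn.decomposition import MiniBatchDictionaryLearning

K = 8  # patch edge length; each patch is reshaped to a k^2 = 64-dim vector

def normalize_patch(patch):
    """Subtract the mean (so average intensity plays no role) and rescale."""
    v = patch.astype(np.float64).reshape(-1)
    v -= v.mean()
    n = np.linalg.norm(v)
    return v / n if n > 0 else v

def train_dictionary(known_patches, n_words=64, sparsity=3):
    """Learn a sparse-coding dictionary from manually labeled k x k patches."""
    X = np.stack([normalize_patch(p) for p in known_patches])  # training matrix
    learner = MiniBatchDictionaryLearning(
        n_components=n_words,               # number of "words"
        transform_algorithm="omp",          # orthogonal matching pursuit
        transform_n_nonzero_coefs=sparsity, # small sparsity constraint
    )
    learner.fit(X)
    return learner.components_              # shape: (n_words, K * K)

# Hypothetical inputs: lists of manually labeled k x k pixel arrays.
# cloud_dictionary = train_dictionary(known_cloud_patches)
# ground_dictionary = train_dictionary(known_ground_patches)
```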
  • Figure 1 shows examples of such a cloud dictionary 100 and a ground dictionary 102.
  • the cloud dictionary 100 includes 64 different cloud words 104, which in this illustration are shown arranged in an 8 x 8 array. Close inspection shows that each cloud word 104 is composed of an 8 x 8 array of pixels.
  • the ground dictionary 102 includes 64 different ground words 106, which in this illustration are shown arranged in an 8 x 8 array. Close inspection shows that each ground word 106 is composed of an 8 x 8 array of pixels.
  • either or both of the dictionaries may contain more or fewer words, such as 128 words.
  • Also shown in Figure 1 is an example patch 110 from an input image to be classified as cloud or non-cloud. Further detail is shown in Figure 8, in which it can be seen that three particular cloud words, words 112, 114, and 116 (particularly word 112), are similar to the patch 110 and could be used in a linear combination (perhaps in combination with other words) to fairly accurately represent the patch 110. The linear combination is illustrated in the drawing. On the other hand, and referring back to Figure 1, it can be seen that none of the ground words 106 would fairly accurately represent the patch 110, either alone or in a sparse linear combination.
  • a processor determines the best representation of the image patch using solely words from the cloud dictionary 100 and also determines the best representation of the image patch using solely words from the ground dictionary 102.
  • For this representation, it may be important to use a small sparsity constraint which matches or is close to the one used for training. For example, if the dictionaries were trained to yield good representations using only 3 words, then only 3 words should be used in this step.
  • the two representations may each be compared to the original patch, and a determination is made as to which representation more accurately represents the image patch. The outcome of that determination is an initial classification of the image patch as cloud or non-cloud.
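  • A minimal sketch of this compare-the-reconstructions step (reusing normalize_patch from the sketch above; scikit-learn's sparse_encode with an OMP solver is one reasonable choice, not the patent's prescribed method):

```python
import numpy as np
from sklearn.decomposition import sparse_encode

def patch_is_cloud(patch, cloud_dict, ground_dict, sparsity=3):
    """Return True if the cloud dictionary represents the patch better."""
    v = normalize_patch(patch)[np.newaxis, :]        # 1 x k^2 row vector
    errors = []
    for dictionary in (cloud_dict, ground_dict):
        code = sparse_encode(v, dictionary, algorithm="omp",
                             n_nonzero_coefs=sparsity)
        reconstruction = code @ dictionary           # sparse linear combination
        errors.append(np.linalg.norm(v - reconstruction))
    return errors[0] < errors[1]                     # smaller error wins
```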
  • the initial classification of image patches is used to initially assign each pixel a net score as follows.
  • every pixel is assigned a net cloud score as follows.
  • the net cloud score of every pixel in the input image is set to 0.
  • for each k x k patch P in the input image (processed as a k²-dimensional vector): (a) approximate it as a sparse linear combination of patches in the cloud dictionary; (b) approximate it as a sparse linear combination of patches in the ground dictionary; and (c) add +1 to the net cloud score of every pixel in P if the cloud dictionary approximation to P is more accurate than the ground dictionary approximation to P.
  • one or more Neural Networks may be used to initially classify patches as “cloud” or “ground.”
  • a 2 x 2 subwindow provides image intensity values of 3, 6, 10, and 20 for the 4 pixels in the window. These values of 3, 6, 10, and 20 are used for the values of the 4 nodes of an input layer.
  • Each of the nodes of the input layer is connected to each of the 5 nodes of an L1 layer by a linear combination. It should be noted that the number of nodes of the L1 layer could be some other number and could be less than, equal to, or greater than the number of nodes in the input layer.
  • Each node in L1 consists of a linear combination of the inputs, followed by an activation.
  • Each linear combination can be represented by a length-4 weight vector. These 5 weight vectors can be concatenated into a weight matrix, W1.
  • the layer is easily computed with a vector-matrix multiplication of the 4-dimensional input vector with W1, followed by the chosen activation function. There are many common choices for activations, such as sigmoid, hyperbolic tangent, and rectified linear unit (ReLU).
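  • As a sketch of this single layer in numpy (the weights here are random placeholders, not values from the disclosure):

```python
import numpy as np

rng = np.random.default_rng(0)
x = np.array([3.0, 6.0, 10.0, 20.0])  # intensities from the 2 x 2 subwindow
W1 = rng.standard_normal((4, 5))      # five length-4 weight vectors, stacked
b1 = np.zeros(5)                      # optional bias terms

relu = lambda z: np.maximum(z, 0.0)   # one common activation choice
L1_out = relu(x @ W1 + b1)            # outputs of the five L1 nodes
```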
  • Each of the outputs of the nodes of the L1 layer is an input to the 5 nodes of an L2 layer. It should be noted that the number of nodes of the L2 layer could be some other number and could be less than, equal to, or greater than the number of nodes in the input layer and/or the number of nodes in the L1 layer.
  • Each node of L2 is constructed in the same manner as L1 - that is, a linear combination of its inputs followed by an activation.
  • Each of the nodes of the L2 layer is connected to each of the 5 nodes of an Ln layer.
  • the number of nodes of the Ln layer could be some other number and could be less than, equal to, or greater than the number of nodes in the input layer and/or the number of nodes in the L1 layer and/or the number of nodes in the L2 layer (or any intervening layers). It should be noted that there could be any number of layers in the neural network.
  • Each of the nodes of the Ln layer is connected to a single output node.
  • This output node consists of a linear combination of Ln’s outputs, followed by a sigmoid activation function.
  • the sigmoid function outputs a score between 0 and 1 that represents the probability the patch is a cloud. If this number is above 0.5, the patch is classified as a cloud. If this number is below 0.5, the patch is classified as ground.
  • the weight parameters in the model must be chosen. The best values of these weights are learned by minimizing a loss function on labeled training data. This loss function could be constructed in any way, the simplest being binary cross-entropy. The back-propagation algorithm may be used to minimize this and other loss functions.
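  • A minimal training sketch under these assumptions (a small fully connected Keras model with binary cross-entropy; the topology and hyperparameters are illustrative, not those of the disclosure):

```python
import tensorflow as tf

def build_patch_classifier(patch_pixels, hidden=5, n_layers=3):
    """Input layer -> L1 ... Ln -> single sigmoid output (cloud probability)."""
    model = tf.keras.Sequential()
    model.add(tf.keras.Input(shape=(patch_pixels,)))
    for _ in range(n_layers):
        model.add(tf.keras.layers.Dense(hidden, activation="relu"))
    model.add(tf.keras.layers.Dense(1, activation="sigmoid"))
    model.compile(optimizer="adam", loss="binary_crossentropy")
    return model

# Hypothetical usage with 2 x 2 subwindows flattened to 4 values:
# model = build_patch_classifier(patch_pixels=4)
# model.fit(train_patches, train_labels, epochs=10)  # back-propagation happens here
# is_cloud = model.predict(patches) > 0.5            # labels: 1 = cloud
```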
  • this classification of image patches is used to initially assign each pixel a net cloud score as described above and as follows.
  • a simplified example of adding a vote to each pixel is described below and illustrated in Figure 2 with a 4 x 4 array of pixels.
  • the 4 x 4 array is back-filled with a score of 0 in each position (pixel).
  • In B1 it is illustrated that a 2 x 2 portion of the array is looked at, and the initial classification was determined to be a cloud. If it had been determined that this image patch was a non-cloud, then a value of -1 would have been added to the corresponding pixels in A2 to get the score shown in B2.
  • Since the initial classification was “cloud,” a value of +1 is added to the corresponding pixels in A2 to get the score shown in B2. So, B2 shows the result after +1 was added to each of the four upper-left pixels.
  • the 2 x 2 window is slid over so that it is in the position shown in C1.
  • the initial classification of that patch was non-cloud, so a value of -1 is added to the corresponding pixels and the resulting score is shown in C2.
  • the 2 x 2 window is slid over so that it is in the position shown in D1.
  • the initial classification of that patch was cloud, so a value of +1 is added to the corresponding pixels and the resulting score is shown in D2.
  • the 2 x 2 window is moved down a row and slid into the position shown in E1.
  • that patch had an initial classification of cloud and a value of +1 is added to the corresponding pixels in E2.
  • the 2 x 2 window is slid over so that it is in the position shown in F1.
  • the initial classification of that patch was cloud, so a value of +1 is added to the corresponding pixels and the resulting score is shown in F2.
  • the 2 x 2 window is slid over so that it is in the position shown in G1.
  • the initial classification of that patch was non-cloud, so a value of -1 is added to the corresponding pixels and the resulting score is shown in G2.
  • the 2 x 2 window is moved down a row and slid into the position shown in H1.
  • that patch had an initial classification of non-cloud and a value of -1 is added to the corresponding pixels in H2.
  • the 2 x 2 window is slid over so that it is in the position shown in I1.
  • the initial classification of that patch was non-cloud, so a value of -1 is added to the corresponding pixels and the resulting score is shown in I2.
  • the 2 x 2 window is slid over so that it is in the position shown in J1.
  • the initial classification of that patch was non-cloud, so a value of -1 is added to the corresponding pixels and the resulting score is shown in J2.
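  • The voting pass just illustrated might be sketched as follows (the one-pixel stride and the classify_window callback are assumptions; any patch classifier from the sketches above could be plugged in):

```python
import numpy as np

def net_cloud_scores(image, classify_window, n=2):
    """classify_window(patch) -> True for cloud, False for ground."""
    h, w = image.shape
    scores = np.zeros((h, w), dtype=np.int32)          # back-filled with 0
    for i in range(h - n + 1):                         # slide the n x n window
        for j in range(w - n + 1):
            vote = 1 if classify_window(image[i:i + n, j:j + n]) else -1
            scores[i:i + n, j:j + n] += vote           # overlapping windows stack
    return scores
```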
  • a flow-graph is built for min-cut/max-flow segmentation. This is done by first creating a grid-graph whose vertices are the pixels of the input image and whose bidirectional edges are the 4-way adjacencies between pixels in the image. With reference back to the simplified example, the information in the 4 x 4 array of Figure 2J2 is then transferred to the grid-graph representation shown in Figure 3, where the cloud score from Figure 2J2 is represented as V for each pixel. Note that each pixel has a score (V) and an intensity value (I, from the image), and each pixel shows a line connecting it to each adjacent pixel in the same row or column.
  • a capacity w is assigned to each adjacency edge; one possible assignment is a Gaussian of the intensity difference between the adjacent pixels, w(x, y) = e^(-(I(x) - I(y))² / (2σ²)) for a tunable width parameter σ, using the following notation:
  • p(i,j) refers to the pixel in row i and in column j
  • w(x, y) refers to the capacity between pixel x and pixel y
  • V(x) refers to the cloud or ground capacity of pixel x (if a positive value, then it is a cloud, and if a negative value, then it is the ground)
  • I(x) refers to the image intensity value of pixel x
  • a new source vertex and new sink vertex are contrived.
  • the grid-graph is represented in Figure 4, where a connection is shown from each pixel to each of a source/cloud and to a sink/ground.
  • a directional edge from the source vertex to vertex v (pixel p(i,j)) in the grid-graph is created if v’s net cloud score (V) is positive (v is more like cloud than ground). A positive capacity is put on this edge. It may be a constant value or proportional to the absolute value of the cloud score of v.
  • the source has a directional edge to every vertex v of the grid-graph whose weight is the cloud score (not-netted) of the vertex v.
  • a directional edge is created from vertex v in the grid-graph to the sink vertex if v’s net cloud score is negative (v is more like ground than cloud).
  • a positive capacity is put on this edge. It may be a constant value or proportional to the absolute value of the cloud score of v.
  • Figure 9 shows the grid-graph after the directional edge capacities have been placed thereon. Note that unlabeled directional edges have a capacity of zero, and when a non-zero capacity is assigned to a directional edge between a given vertex and the source, then the directional edge between that vertex and the sink is then removed.
  • a capacity value is assigned to each edge (u, v) in the grid-graph, where that value depends on the grayscale similarity between pixel u and pixel v in the input image.
  • One possible implementation is to use a Gaussian of the difference. Since it may be helpful to segment the cloud and non-cloud portions of the image along edges or transitions in the image, it can be seen that the capacity between adjacent pixels will be smaller when the intensity differences are larger. So, seeking to segment between pixels with large intensity differences may be desirable.
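  • A hedged reading of this Gaussian-of-the-difference suggestion as code (sigma is a tunable assumption, not a value from the disclosure):

```python
import numpy as np

def edge_capacity(intensity_u, intensity_v, sigma=10.0):
    """Large when adjacent pixels look alike; small across strong image edges."""
    d = float(intensity_u) - float(intensity_v)
    return np.exp(-(d * d) / (2.0 * sigma * sigma))
```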
  • auxiliary information known about cloud or ground pixels is incorporated into the graph as follows. If a pixel p(i,j) is known to be a cloud, a directional edge is created from the source to that pixel’s vertex with a very large capacity. If p(i,j) is known to be ground, a directional edge is created from p(i,j) to the sink with a very large capacity. This auxiliary information may be available from manual annotations or other means.
  • the min-cut partition of the graph (typical algorithms do this by finding the max-flow) between source and sink is solved for.
  • This is a partition of the graph into two sets of vertices, one that contains the sink, the other that contains the source, and such that the total capacity of edges that span the two sets is as small as possible (meaning that the intensity differences are larger).
  • There are many methods to solve for this partition. Some of the most common methods are the Ford-Fulkerson algorithm, the Push Relabel Max-Flow algorithm, Dinic’s algorithm, and the Boykov-Kolmogorov Max-Flow algorithm.
  • the above partition induces the division of the input image into cloud regions (including pixels grouped with the source) and ground regions (including pixels grouped with the sink).
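  • Putting the graph construction and the cut together, a sketch using networkx (reusing edge_capacity and the per-pixel scores from the sketches above; a dedicated max-flow library such as a Boykov-Kolmogorov implementation would be much faster in practice):

```python
import networkx as nx
import numpy as np

def segment_clouds(image, scores):
    """scores: per-pixel net cloud scores. Returns a boolean cloud mask."""
    h, w = image.shape
    g = nx.DiGraph()
    for i in range(h):
        for j in range(w):
            v = (i, j)
            # terminal edges: positive score -> source (cloud),
            # negative score -> sink (ground); capacity ~ |score|
            if scores[i, j] >= 0:
                g.add_edge("source", v, capacity=float(scores[i, j]))
            else:
                g.add_edge(v, "sink", capacity=float(-scores[i, j]))
            # 4-way adjacency edges weighted by grayscale similarity
            for ni, nj in ((i + 1, j), (i, j + 1)):
                if ni < h and nj < w:
                    c = edge_capacity(image[i, j], image[ni, nj])
                    g.add_edge(v, (ni, nj), capacity=c)
                    g.add_edge((ni, nj), v, capacity=c)
    _, (source_side, _) = nx.minimum_cut(g, "source", "sink")
    mask = np.zeros((h, w), dtype=bool)
    for node in source_side:
        if node != "source":
            mask[node] = True   # True -> pixel grouped with the source (cloud)
    return mask
```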
  • Cloud textures display repeatable patterns that are usually different than other features’ textures present in imagery. These textures are difficult to describe heuristically, but can be learned with the process described above.
  • the cloud dictionary is specialized to represent these textures with few dictionary elements, while the ground dictionary will not represent these textures well with few dictionary elements. Patches are normalized; thus, the average intensity of each patch plays no role in the weighting. This ensures that, while mainly determined by the cloud score, classification follows natural transitions in the image.
  • a thresholded cloud score itself contains many false positives and false negatives. The application of the Min-cut/Max-flow segmenter is very important for this reason, as it smooths out the incorrect scores in an intuitive way.
  • the overall algorithm is shown in Figure 11 for the case of using the dictionary method and in Figure 12 for the case of using the neural network method.
  • the image may be a single band of a multi-band image.
  • an n x n overlapping subwindow is slid throughout the image so that patches or portions of the image can be classified.
  • the patches/portions may be normalized.
  • in the dictionary case (Figure 11), the image portion seen in each subwindow is reconstructed in both a cloud dictionary and a ground dictionary, and a classification is chosen based upon which has the better reconstruction.
  • in the Neural Network case (Figure 12), the image portion seen in each subwindow is classified as cloud or ground with the use of a Neural Network. Next, a vote is added to each pixel. Next, the Min-Cut/Max-Flow segmentation is performed.
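  • Hypothetical glue code tying the sketches above together, following the Figure 12 flow (the model, window size, and helper names are assumptions carried over from the earlier sketches):

```python
def detect_clouds(image, model, n=2):
    """Neural Network vote per subwindow, then min-cut/max-flow cleanup."""
    classify = lambda patch: model.predict(
        patch.reshape(1, -1), verbose=0)[0, 0] > 0.5  # cloud if p > 0.5
    scores = net_cloud_scores(image, classify, n=n)   # per-pixel votes
    return segment_clouds(image, scores)              # boolean cloud mask
```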
  • the techniques disclosed herein may be implemented on any suitable hardware or any suitable combination of software and hardware. For example, they may be implemented in an operating system kernel, in a separate user process, in a library package bound into network applications, on a specially constructed machine, on an application-specific integrated circuit (ASIC), or on a network interface card.
  • Software/hardware hybrid implementations of at least some of the embodiments disclosed herein may be implemented on a programmable network-resident machine (which should be understood to include intermittently connected network-aware machines) selectively activated or reconfigured by a computer program stored in memory.
  • Such network devices may have multiple network interfaces that may be configured or designed to utilize different types of network communication protocols.
  • a general architecture for some of these machines may be disclosed herein in order to illustrate one or more exemplary means by which a given unit of functionality may be implemented.
  • At least some of the features or functionalities of the various embodiments disclosed herein may be implemented on one or more general-purpose computers associated with one or more networks, such as for example an end-user computer system, a client computer, a network server or other server system, a mobile computing device (e.g., tablet computing device, mobile phone, smartphone, laptop, and the like), a consumer electronic device, a music player, or any other suitable electronic device, router, switch, or the like, or any combination thereof.
  • at least some of the features or functionalities of the various embodiments disclosed herein may be implemented in one or more virtualized computing environments (e.g., network computing clouds, virtual machines hosted on one or more physical computing machines, or the like).
  • Computing device 230 may be, for example, any one of the computing machines listed in the previous paragraph, or indeed any other electronic device capable of executing software- or hardware-based instructions according to one or more programs stored in memory.
  • Computing device 230 may be adapted to communicate with a plurality of other computing devices, such as clients or servers, over communications networks such as a wide area network a metropolitan area network, a local area network, a wireless network, the Internet, or any other network, using known protocols for such communication, whether wireless or wired.
  • computing device 230 includes one or more central processing units (CPU) 234, one or more interfaces 240, and one or more busses 238 (such as a peripheral component interconnect (PCI) bus).
  • CPU 234 may be responsible for implementing specific functions associated with the functions of a specifically configured computing device or machine.
  • a computing device 230 may be configured or designed to function as a server system utilizing CPU 234, local memory 232 and/or remote memory 242, and interface(s) 240.
  • CPU 234 may be caused to perform one or more of the different types of functions and/or operations under the control of software modules or components, which for example, may include an operating system and any appropriate applications software, drivers, and the like.
  • CPU 234 may include one or more processors 236 such as, for example, a processor from one of the Intel, ARM, Qualcomm, and AMD families of microprocessors.
  • processors 236 may include specially designed hardware such as application-specific integrated circuits (ASICs), electrically erasable programmable read-only memories (EEPROMs), field-programmable gate arrays (FPGAs), and so forth, for controlling operations of computing device 230.
  • a local memory 232 (such as non-volatile random access memory (RAM) and/or read-only memory (ROM)) may also form part of CPU 234. However, there are many different ways in which memory may be coupled to system 230.
  • Memory 232 may be used for a variety of purposes such as, for example, caching and/or storing data, programming instructions, and the like.
  • processor is not limited merely to those integrated circuits referred to in the art as a processor, a mobile processor, or a microprocessor, but broadly refers to a microcontroller, a microcomputer, a programmable logic controller, an application-specific integrated circuit, and any other programmable circuit.
  • interfaces 240 are provided as network interface cards (NICs).
  • NICs control the sending and receiving of data packets over a computer network; other types of interfaces 240 may for example support other peripherals used with computing device 230.
  • the interfaces that may be provided are Ethernet interfaces, frame relay interfaces, cable interfaces, DSL interfaces, token ring interfaces, graphics interfaces, and the like.
  • interfaces may be provided such as, for example, universal serial bus (USB), Serial, Ethernet, Firewire.TM., PCI, parallel, radio frequency (RF), Bluetooth near-field communications (e.g., using near-field magnetics), 802.11 (WiFi), frame relay, TCP/IP, ISDN, fast Ethernet interfaces, Gigabit Ethernet interfaces, asynchronous transfer mode (ATM) interfaces, high-speed serial interface (HSSI) interfaces, Point of Sale (POS) interfaces, fiber data distributed interfaces (FDDIs), and the like.
  • While Figure 5 illustrates one specific architecture for a computing device 230 for implementing one or more of the embodiments described herein, it is by no means the only device architecture on which at least a portion of the features and techniques described herein may be implemented.
  • architectures having one or any number of processors 236 may be used, and such processors 236 may be present in a single device or distributed among any number of devices.
  • in some embodiments, a single processor handles communications as well as routing computations, while in other embodiments a separate dedicated communications processor may be provided.
  • different types of features or functionalities may be implemented in a system that includes a client device (such as a tablet device or smartphone running client software) and server systems (such as a server system described in more detail below).
  • the system may employ one or more memories or memory modules (such as, for example, remote memory block 242 and local memory 232) configured to store data, program instructions for the general-purpose network operations, or other information relating to the functionality of the embodiments described herein (or any combinations of the above).
  • Program instructions may control execution of or comprise an operating system and/or one or more applications, for example.
  • Memory 242 or memories 232, 242 may also be configured to store data structures, configuration data, encryption data, historical system operations information, or any other specific or generic non-program information described herein.
  • At least some network device embodiments may include non-transitory machine-readable storage media, which, for example, may be configured or designed to store program instructions, state information, and the like for performing various operations described herein.
  • non-transitory machine-readable storage media include, but are not limited to, magnetic media such as hard disks, floppy disks, and magnetic tape; optical media such as CD-ROM disks; magneto-optical media such as optical disks, and hardware devices that are specially configured to store and perform program instructions, such as read-only memory devices (ROM), flash memory, solid state drives, memristor memory, random access memory (RAM), and the like.
  • program instructions include object code, such as may be produced by a compiler; machine code, such as may be produced by an assembler or a linker; byte code, such as may be generated by, for example, a Java compiler and may be executed using a Java virtual machine or equivalent; and files containing higher-level code that may be executed by the computer using an interpreter (for example, scripts written in Python, Perl, Ruby, Groovy, or any other scripting language).
  • systems may be implemented on a standalone computing system.
  • Computing device 250 includes processors 252 that may run software that carries out one or more functions or applications of embodiments, such as for example a client application 258.
  • Processors 252 may carry out computing instructions under control of an operating system 254 such as, for example, a version of Microsoft's Windows operating system, Apple's Mac OS/X or iOS operating systems, some variety of the Linux operating system, Google's Android operating system, or the like.
  • one or more shared services 256 may be operable in system 250, and may be useful for providing common services to client applications 258.
  • Services 256 may for example be Windows services, user-space common services in a Linux environment, or any other type of common service architecture used with operating system 254.
  • Input devices 266 may be of any type suitable for receiving user input, including for example a keyboard, touchscreen, microphone (for example, for voice input), mouse, touchpad, trackball, or any combination thereof.
  • Output devices 264 may be of any type suitable for providing output to one or more users, whether remote or local to system 250, and may include for example one or more screens for visual output, speakers, printers, or any combination thereof.
  • Memory 260 may be random-access memory having any structure and architecture known in the art, for use by processors 252, for example to run software.
  • Storage devices 262 may be any magnetic, optical, mechanical, memristor, or electrical storage device for storage of data in digital form. Examples of storage devices 262 include flash memory, magnetic hard drive, CD-ROM, and/or the like.
  • systems may be implemented on a distributed computing network, such as one having any number of clients and/or servers.
  • In Figure 7, there is shown a block diagram depicting an exemplary architecture for implementing at least a portion of a system according to an embodiment on a distributed computing network.
  • any number of clients 330 may be provided.
  • Each client 330 may run software for implementing client-side portions of the embodiments and clients may comprise a system 250 such as that illustrated in Figure 6.
  • any number of servers 320 may be provided for handling requests received from one or more clients 330.
  • Clients 330 and servers 320 may communicate with one another via one or more electronic networks 310, which may be in various embodiments any of the Internet, a wide area network, a mobile telephony network, a wireless network (such as WiFi, WiMAX, and so forth), or a local area network (or indeed any network topology known in the art; no one network topology is preferred over any other).
  • Networks 310 may be implemented using any known network protocols, including for example wired and/or wireless protocols.
  • servers 320 may call external services 370 when needed to obtain additional information, or to refer to additional data concerning a particular call. Communications with external services 370 may take place, for example, via one or more networks 310.
  • external services 370 may comprise web-enabled services or functionality related to or installed on the hardware device itself.
  • client applications 258 may obtain information stored in a server system 320 in the cloud or on an external service 370 deployed on one or more of a particular enterprise's or user's premises.
  • clients 330 or servers 320 may make use of one or more specialized services or appliances that may be deployed locally or remotely across one or more networks 310.
  • one or more databases 340 may be used or referred to by one or more embodiments. It should be understood by one having ordinary skill in the art that databases 340 may be arranged in a wide variety of architectures and using a wide variety of data access and manipulation means.
  • one or more databases 340 may comprise a relational database system using a structured query language (SQL), while others may comprise an alternative data storage technology such as those referred to in the art as "NoSQL” (for example, Hadoop Cassandra, Google BigTable, and so forth).
  • variant database architectures such as column-oriented databases, in-memory databases, clustered databases, distributed databases, or even flat file data repositories may be used. It will be appreciated by one having ordinary skill in the art that any combination of known or future database technologies may be used as appropriate, unless a specific database technology or a specific arrangement of components is specified for a particular embodiment herein. Moreover, it should be appreciated that the term "database” as used herein may refer to a physical database machine, a cluster of machines acting as a single database system, or a logical database within an overall database management system.
  • some embodiments may make use of one or more security systems 360 and configuration systems 350.
  • Security and configuration management are common information technology (IT) and web functions, and some amount of each are generally associated with any IT or web systems. It should be understood by one having ordinary skill in the art that any configuration or security subsystems known in the art now or in the future may be used in conjunction with embodiments without limitation, unless a specific security 360 or configuration system 350 or approach is specifically required by the description of any specific embodiment.
  • functionality for implementing systems or methods may be distributed among any number of client and/or server components.
  • various software modules may be implemented for performing various functions, and such modules can be variously implemented to run on server and/or client components.

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Multimedia (AREA)
  • Evolutionary Computation (AREA)
  • Artificial Intelligence (AREA)
  • Health & Medical Sciences (AREA)
  • General Health & Medical Sciences (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Software Systems (AREA)
  • Computing Systems (AREA)
  • Medical Informatics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Data Mining & Analysis (AREA)
  • Databases & Information Systems (AREA)
  • Biomedical Technology (AREA)
  • Molecular Biology (AREA)
  • General Engineering & Computer Science (AREA)
  • Mathematical Physics (AREA)
  • Biophysics (AREA)
  • Computational Linguistics (AREA)
  • Spectroscopy & Molecular Physics (AREA)
  • Astronomy & Astrophysics (AREA)
  • Remote Sensing (AREA)
  • Biodiversity & Conservation Biology (AREA)
  • Discrete Mathematics (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Evolutionary Biology (AREA)
  • Image Analysis (AREA)

Abstract

Techniques for automatically determining, on a pixel by pixel basis, whether imagery includes ground images or is obscured by cloud cover. The techniques include training a Neural Network, making an initial determination of cloud or ground by using the Neural Network, and performing a max-flow, min-cut operation on the image to determine whether each pixel is a cloud or ground imagery.

Description

ADVANCED CLOUD DETECTION USING NEURAL NETWORKS AND OPTIMIZATION
TECHNIQUES
CROSS-REFERENCE TO RELATED APPLICATIONS
[0001] This international patent application claims priority to co-pending U.S. Patent Application No. 16/140,052, entitled "ADVANCED CLOUD DETECTION USING NEURAL NETWORKS AND OPTIMIZATION TECHNIQUES," filed on 24 September 2018 (24/09/2018), which is a
Continuation-in-Part of U.S. Patent Application No. 15/362,254, filed November 28, 2016 (now U.S. Patent No. 10,083,354). The entire disclosure of each patent application set forth in this Cross Reference to Related Applications section is hereby incorporated herein by reference.
BACKGROUND
[0002] The use of geospatial imagery (e.g., satellite imagery) has continued to increase in recent years. As such, high quality geospatial imagery has become increasingly valuable. For example, a variety of different entities (e.g., government entities, corporations, individuals, or others) may utilize satellite imagery. As may be appreciated, the use of such satellite imagery may vary widely such that satellite images may be used for a variety of differing purposes.
[0003] At any given time, a significant portion of the surface of the Earth is obstructed from imaging by a satellite due to the presence of clouds. While some techniques have been used in the past to determine where and when clouds are obstructing all or portions of the Earth’s surface in a geospatial image, improved techniques are desired.
[0004] It is against this background that the techniques described herein have been developed.
SUMMARY
[0005] Disclosed herein is a computer-implemented process for determining whether given imagery in an overhead image is cloud imagery or ground imagery. The process includes, for multiple portions of an image, making an initial determination about whether each of the multiple portions primarily contains cloud imagery or primarily contains ground imagery by, with a processor, utilizing a neural network to classify each of the multiple portions of the image as one of cloud or ground. The process also includes, with a processor, performing an optimization technique on the multiple portions of the overhead image using the initial determination to determine which portions of the overhead image include cloud imagery or ground imagery.
[0006] The optimization technique may include identifying adjacent pixels and calculating a capacity between the identified adjacent pixels. The optimization technique may further include creating a score for each pixel to represent the likelihood that the pixel does or does not contain a cloud and creating a grid-graph of the scores of the pixels with adjacency information associated with each set of adjacent pixels. The optimization technique may further include connecting the pixels of the grid-graph to both a source and a sink using the pixel score as the capacity, wherein one of the source and the sink represents cloud and one represents ground, and performing a min-cut/max-flow segmentation on the image. The optimization technique may include applying a window, having a height and width that are less than a height and width of the image and that are the same as that of the multiple portions of the image, to various portions of the image, the various portions partially overlapping adjacent portions, in order to determine if each portion most likely contains cloud imagery or ground imagery, and incrementing or decrementing a score for each pixel in the portion based on whether the determination was of cloud imagery or ground imagery, respectively.
[0007] The overhead image may be a satellite-based image. The process may further include adding to metadata associated with each pixel an indication of whether each such pixel includes cloud imagery. The process may further include using the indication of cloud imagery in the metadata to select pixels for an orthomosaic image free of clouds.
[0008] Also disclosed is a computer-implemented process for determining whether given imagery in an overhead image is cloud imagery or ground imagery. The process includes, for multiple portions of an image, making an initial determination about whether each of the multiple portions primarily contains cloud imagery or primarily contains ground imagery by, with a processor, utilizing a neural network to classify each of the multiple portions of the image as one of cloud or ground; applying a window, having a height and width that are less than a height and width of the image and that are the same as that of the multiple portions of the image, to various portions of the image, the various portions partially overlapping adjacent portions, in order to determine if each portion most likely contains cloud imagery or ground imagery, and incrementing or decrementing a score for each pixel in the portion based on whether the determination was of cloud imagery or ground imagery, respectively; creating a weight for each pixel to represent the likelihood that the pixel does or does not contain a cloud; identifying adjacent pixels and calculating a capacity between the identified adjacent pixels; creating a grid-graph of the scores of the pixels with adjacency information associated with each set of adjacent pixels; connecting the pixels of the grid-graph to both a source and a sink using the pixel score as the capacity, wherein one of the source and the sink represents cloud and one represents ground; and performing a min-cut/max-flow segmentation on the image to define portions of the overhead image which are believed to include cloud imagery and portions of the overhead image which are believed to include ground imagery.
[0009] The overhead image may be a satellite-based image. The process may further include adding to metadata associated with each pixel an indication of whether each such pixel includes cloud imagery. The process may further include using the indication of cloud imagery in the metadata to select pixels for an orthomosaic image free of clouds.
[0010] Also disclosed is a computer-implemented process for determining whether given imagery in an overhead image is cloud imagery or ground imagery. The process includes receiving an image having a plurality of pixels; sliding an n x n overlapping subwindow throughout the image so that image portions of the image can be classified; for the image portion seen in each subwindow, classifying the image portion as cloud or ground with the use of a Neural Network; adding a vote to each pixel based on the classification of each image portion containing the pixel; and, with a processor, performing an optimization technique on the classifications of the pixels in the image using the initial determination to determine which pixels of the image include cloud imagery or ground imagery.
[0011] The optimization technique may include identifying adjacent pixels and calculating a capacity between the identified adjacent pixels. The optimization technique may further include creating a score for each pixel to represent the likelihood that the pixel does or does not contain a cloud and creating a grid-graph of the scores of the pixels with adjacency information associated with each set of adjacent pixels. The optimization technique may further include connecting the pixels of the grid-graph to both a source and a sink using the pixel score as the capacity, wherein one of the source and the sink represents cloud and one represents ground, and performing a min- cut/max-flow segmentation on the image.
[0012] The adding a vote operation may include applying a window, having a height and width that are less than a height and width of the image and that are the same as that of the subwindow, to various portions of the image, the various portions partially overlapping adjacent portions, in order to determine if each portion most likely contains cloud imagery or ground imagery, and incrementing or decrementing a score for each pixel in the portion based on whether the determination was of cloud imagery or ground imagery, respectively.
[0013] The image may be a satellite-based image. The process may further include adding to metadata associated with each pixel an indication of whether each such pixel includes cloud imagery. The process may further include using the indication of cloud imagery in the metadata to select pixels for an orthomosaic image free of clouds.
[0014] Any combination of any portions of the above techniques are considered to be a part of the inventions herein.
BRIEF DESCRIPTION OF THE DRAWINGS
[0015] Figure 1 is an illustration of a cloud dictionary and a ground dictionary.
[0016] Figures 2A1 through 2J2 show how a cloud score for each pixel can be determined.
[0017] Figure 3 shows a grid-graph of the cloud scores for a 4 x 4 array of pixels in an image.
[0018] Figure 4 shows the grid-graph connected to a source and a sink as part of a Min-Cut, Max-Flow technique.
[0019] Figure 5 is a block diagram illustrating an exemplary hardware architecture of a computing device used in an embodiment of the disclosure herein.
[0020] Figure 6 is a block diagram illustrating an exemplary logical architecture for a client device, according to an embodiment of the disclosure herein.
[0021] Figure 7 is a block diagram illustrating an exemplary architectural arrangement of clients, servers, and external services, according to an embodiment of the disclosure herein.
[0022] Figure 8 is an illustration of a cloud dictionary, a patch from an input image, and a linear combination of words from the cloud dictionary.
[0023] Figure 9 shows the grid-graph connected to the source and the sink of Figure 4, after the capacities have been assigned between the pixels/vertices and the applicable one or both of the source and sink.
[0024] Figure 10 shows a Neural Network such as is used in the techniques described herein.
[0025] Figure 11 shows a process flow for a dictionary-based technique described herein.
[0026] Figure 12 shows a process flow for a Neural Network-based technique described herein.
DETAILED DESCRIPTION
[0027] While the disclosure is susceptible to various modifications and alternative forms, specific embodiments thereof have been shown by way of example in the drawings and are herein described in detail. It should be understood, however, that it is not intended to limit the disclosure to the particular form disclosed, but rather, the disclosure is to cover all modifications, equivalents, and alternatives falling within the scope as defined by the claims.
[0028] The present disclosure generally relates to functionality that may be utilized in the processing of geospatial images. For example, in an embodiment, the geospatial source images may be satellite images acquired using low earth orbit satellites such as QuickBird, WorldView-1, WorldView-2, WorldView-3, WorldView-4, IKONOS, or GeoEye-1, which are currently operated or proposed for operation by DigitalGlobe, Inc. of Longmont, CO. However, other geospatial imagery may also be used to generate an orthomosaic as described herein, such as, for example, other geospatial imagery obtained from satellites other than those previously listed, high-altitude aerial photographs, or other appropriate remotely sensed imagery. The images to be selected may comprise raw image data or pre-processed geospatial images (e.g., that have undergone orthorectification, pan-sharpening, or other processes known in the art that are commonly applied to geospatial imagery).
[0029] According to the present disclosure, an algorithm has been developed to detect and delimit clouds in remotely sensed Pan and MSI imagery. In other words, the algorithm determines, on a pixel-by-pixel basis, whether each pixel contains an image portion of a cloud or an image portion of ground imagery, and further determines the precise boundaries between cloud regions and non-cloud regions.
[0030] The algorithm can operate on a panchromatic (Pan) image created when the imaging sensor is sensitive to a wide range of wavelengths of light, typically spanning a large part of the visible part of the spectrum (and potentially some portions of the electromagnetic spectrum outside of the visible part). The algorithm can also or alternatively operate on any single-band image of a multispectral (MSI) image. An MSI image is one for which image data has been captured at specific wavelength bands across the electromagnetic spectrum. By way of non-limiting example, DigitalGlobe's WV-3 satellite has eight MSI bands: coastal (approximately 400-452 nm), blue (approximately 448-510 nm), green (approximately 518-586 nm), yellow (approximately 590-630 nm), red (approximately 632-692 nm), red edge (approximately 706-746 nm), near infrared 1 (NIR1) (approximately 772-890 nm), and near infrared 2 (NIR2) (approximately 866-954 nm). The algorithm could also be applied to hyperspectral or other types of imagery.
[0031] In what follows, the term "input image" will refer to a generic grayscale image. Any multi-band image, as discussed above, may have one band extracted and used as a single-band grayscale image. The algorithm may utilize Sparse Coding for Dictionary Learning and/or one or more Neural Networks in a first phase. The algorithm may also include Max-flow/Min-cut Segmentation in a second phase. At least the adaptation of these ideas to cloud detection and the manner in which they are adapted is believed to be novel.
[0032] Generally, in one embodiment, a first dictionary of "cloud words" is created. This "cloud dictionary" includes a number of picture elements of clouds, and these picture elements are called the cloud words. A second dictionary of "ground words" is created. This "ground dictionary" includes a number of picture elements of the surface of the Earth (and the things built, formed, or growing thereon), and these picture elements are called the ground words.
[0033] To create these dictionaries, a training step is required in advance. First, a cloud training set is compiled from only known cloud patches. A patch refers to a normalized k x k square of pixels (and their values) in the input image. A known cloud patch is a patch that has manually been deemed to be within a cloud. Each of these k x k cloud patches is reshaped as a column vector in k²-dimensional space (using any linear indexing scheme for the k x k array of pixels). A cloud training matrix is constructed with these reshaped cloud patches. It is from this matrix that a dictionary is created using sparse coding.
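By way of a hedged illustration only (not taken from the disclosure), the following Python/NumPy sketch builds such a training matrix; the patch size k, the normalization scheme, and all function and variable names are assumptions:

```python
import numpy as np

def extract_patch_matrix(image, coords, k=8):
    """Build a k^2 x n training matrix from known patch locations.

    image  : 2-D grayscale array
    coords : list of (row, col) upper-left corners of known patches
    """
    columns = []
    for r, c in coords:
        patch = image[r:r + k, c:c + k].astype(float)
        patch -= patch.mean()                 # normalize: remove average intensity
        norm = np.linalg.norm(patch)
        if norm > 0:
            patch /= norm
        columns.append(patch.reshape(k * k))  # linear indexing of the k x k array
    return np.stack(columns, axis=1)          # shape (k^2, n_patches)
```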
[0034] Next, and similarly, the ground dictionary is trained on only ground patches.
[0035] The dictionaries are found using sparse coding. A sparse coding dictionary is a collection (dictionary) of (generally contrived) patches (reshaped as column vectors) such that any patch in the training set can be well-approximated by a sparse linear combination of patches in the dictionary. The number of patches in the training set and in the sparse coding dictionary is typically larger than k². Various algorithms are known in the literature for constructing a sparse coding dictionary from a training set.
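One plausible off-the-shelf realization of this sparse coding step is scikit-learn's MiniBatchDictionaryLearning; the dictionary size, sparsity level, and variable names below are illustrative assumptions, not specifics from the disclosure:

```python
from sklearn.decomposition import MiniBatchDictionaryLearning

# Training patches as rows, shape (n_patches, k*k); see the earlier sketch.
X_cloud = extract_patch_matrix(image, cloud_coords).T

learner = MiniBatchDictionaryLearning(
    n_components=64,               # number of "words" in the dictionary
    transform_algorithm='omp',     # encode patches via Orthogonal Matching Pursuit
    transform_n_nonzero_coefs=3,   # small sparsity constraint when encoding
    random_state=0,
)
cloud_dictionary = learner.fit(X_cloud).components_   # shape (64, k*k)
# The ground dictionary would be trained the same way on known ground patches.
```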
[0036] Figure 1 shows examples of such a cloud dictionary 100 and a ground dictionary 102. As can be seen, the cloud dictionary 100 includes 64 different cloud words 104, which in this illustration are shown arranged in an 8 x 8 array. Close inspection shows that each cloud word 104 is composed of an 8 x 8 array of pixels. Similarly, the ground dictionary 102 includes 64 different ground words 106, which in this illustration are shown arranged in an 8 x 8 array. Close inspection shows that each ground word 106 is composed of an 8 x 8 array of pixels. As can be appreciated, either or both of the dictionaries may contain more or fewer words, such as 128 words.
[0037] Also shown in Figure 1 is an example patch 110 from an input image to be classified as cloud or non-cloud. Further detail is shown in Figure 8, in which it can be seen that three particular cloud words, words 112, 114, and 116 (particularly word 112), are similar to the patch 110 and could be used in a linear combination (perhaps in combination with other words) to fairly accurately represent the patch 110. The linear combination is illustrated in the drawing. On the other hand, and referring back to Figure 1, it can be seen that none of the ground words 106 would fairly accurately represent the patch 110, either alone or in a sparse linear combination.
[0038] In order to automatically make a determination as to whether an image patch is most likely to contain a cloud or to be cloud-free, a processor determines the best representation of the image patch using solely words from the cloud dictionary 100 and also determines the best representation of the image patch using solely words from the ground dictionary 102. When computing this representation, it may be important to use a small sparsity constraint which matches or is close to the one used for training. For example, if the dictionaries were trained to yield good representations using only 3 words, then only 3 words should be used in this step. The two representations can each be compared to the original patch, and a determination is made as to which representation more accurately represents the image patch. The outcome of that determination is an initial classification of the image patch as cloud or non-cloud.
[0039] Next, the initial classification of image patches is used to initially assign each pixel a net cloud score as follows. First, the net cloud score of every pixel in the input image is set to 0. Next, for every (possibly overlapping) k x k patch P in the input image (processed as a k²-dimensional vector): (a) approximate it as a sparse linear combination of patches in the cloud dictionary; (b) approximate it as a sparse linear combination of patches in the ground dictionary; and (c) add +1 to the net cloud score of every pixel in P if the cloud dictionary approximation to P is more accurate than the ground dictionary approximation to P; otherwise, add -1 to the net cloud score of every pixel in P. The respective approximations are computed using "Orthogonal Matching Pursuit" (defined in the literature) to find the linear combination of dictionary elements that best represents the patch. The accuracy could be measured with L1, L2, L-infinity, and so forth. For example, for an input patch y and an approximation ŷ, the accuracy can be measured with the squared L2 norm: ||y - ŷ||₂² = Σᵢ (yᵢ - ŷᵢ)². Similarly, one can use any norm on a k²-dimensional vector space.
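A minimal sketch of this comparison step, assuming scikit-learn's OrthogonalMatchingPursuit and the hypothetical cloud_dictionary and ground_dictionary arrays from the earlier sketches (dictionary atoms as rows):

```python
import numpy as np
from sklearn.linear_model import OrthogonalMatchingPursuit

def reconstruction_error(dictionary, y, n_words=3):
    """Approximate patch y with a sparse combination of dictionary rows
    and return the squared L2 error of that approximation."""
    omp = OrthogonalMatchingPursuit(n_nonzero_coefs=n_words,
                                    fit_intercept=False)  # patches are mean-normalized
    omp.fit(dictionary.T, y)           # atoms become columns of the design matrix
    y_hat = omp.predict(dictionary.T)
    return float(np.sum((y - y_hat) ** 2))

def classify_patch(patch_vec, cloud_dictionary, ground_dictionary):
    """Return +1 (cloud) or -1 (ground) for one normalized k^2-vector."""
    e_cloud = reconstruction_error(cloud_dictionary, patch_vec)
    e_ground = reconstruction_error(ground_dictionary, patch_vec)
    return 1 if e_cloud < e_ground else -1
```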
[0040] In another embodiment, instead of Sparse Coding for Dictionary Learning, one or more Neural Networks may be used to initially classify patches as "cloud" or "ground." With reference to Figure 10, suppose a 2 x 2 subwindow provides image intensity values of 3, 6, 10, and 20 for the 4 pixels in the window. These values of 3, 6, 10, and 20 are used as the values of the 4 nodes of an input layer. Each of the nodes of the input layer is connected to each of the 5 nodes of an L1 layer by a linear combination. It should be noted that the number of nodes of the L1 layer could be some other number and could be less than, equal to, or greater than the number of nodes in the input layer. Each node in L1 consists of a linear combination of the inputs, followed by an activation. Each linear combination can be represented by a length-4 weight vector, and these 5 weight vectors can be concatenated into a weight matrix, W1. The layer is easily computed with a vector-matrix multiplication of the 4-dimensional input vector with W1, followed by the chosen activation function. There are many common choices for activations, such as sigmoid, hyperbolic tangent, and rectified linear unit (ReLU). Each of the outputs of the nodes of the L1 layer is an input to the 5 nodes of an L2 layer. It should be noted that the number of nodes of the L2 layer could be some other number and could be less than, equal to, or greater than the number of nodes in the input layer and/or the number of nodes in the L1 layer. Each node of L2 is constructed in the same manner as L1 - that is, a linear combination of its inputs followed by an activation. Each of the nodes of the L2 layer is connected to each of the 5 nodes of an Ln layer. It should be noted that the number of nodes of the Ln layer could be some other number and could be less than, equal to, or greater than the number of nodes in the input layer and/or the number of nodes in the L1 layer and/or the number of nodes in the L2 layer (or any intervening layers). It should be noted that there could be any number of layers in the neural network. Each of the nodes of the Ln layer is connected to a single output node. This output node consists of a linear combination of Ln's outputs, followed by a sigmoid activation function. The sigmoid function outputs a score between 0 and 1 that represents the probability that the patch is a cloud. If this number is above 0.5, the patch is classified as cloud; if this number is below 0.5, the patch is classified as ground.
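A toy forward pass matching this description might look like the following NumPy sketch; the layer widths, random weights, and ReLU hidden activations are illustrative assumptions:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def forward(x, weights, biases):
    """Forward pass: ReLU hidden layers, sigmoid output node.

    x       : input vector, e.g. the 4 pixel intensities of a 2 x 2 subwindow
    weights : list of weight matrices [W1, ..., Wn, w_out]
    biases  : list of bias vectors of matching shapes
    """
    a = x
    for W, b in zip(weights[:-1], biases[:-1]):
        a = np.maximum(0.0, W @ a + b)                  # linear combination + ReLU
    score = sigmoid(weights[-1] @ a + biases[-1])       # probability patch is cloud
    return float(score[0])

# Example: a 4 -> 5 -> 5 -> 1 network on the subwindow from Figure 10
rng = np.random.default_rng(0)
weights = [rng.normal(size=(5, 4)), rng.normal(size=(5, 5)), rng.normal(size=(1, 5))]
biases = [np.zeros(5), np.zeros(5), np.zeros(1)]
p_cloud = forward(np.array([3.0, 6.0, 10.0, 20.0]), weights, biases)
label = "cloud" if p_cloud > 0.5 else "ground"
```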
[0041] Before classification takes place, all of the weight parameters in the model must be chosen. The best values of these weights are learned by minimizing a loss function on labeled training data. This loss function could be constructed in any way, the simplest being binary cross entropy. The back-propagation algorithm may be used to minimize this and other loss functions.
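As a point of reference, a hedged NumPy sketch of the binary cross-entropy loss mentioned here, for labels t in {0, 1} and predicted cloud probabilities p (function name and epsilon guard are assumptions):

```python
import numpy as np

def binary_cross_entropy(p, t, eps=1e-12):
    """Mean binary cross-entropy: -[t*log(p) + (1-t)*log(1-p)].

    p : predicted cloud probabilities in (0, 1)
    t : ground-truth labels (1 = cloud, 0 = ground)
    """
    p = np.clip(p, eps, 1.0 - eps)   # guard against log(0)
    return float(np.mean(-(t * np.log(p) + (1.0 - t) * np.log(1.0 - p))))
```

In practice, the back-propagation algorithm would compute gradients of this loss with respect to the weight matrices, and gradient descent would drive the minimization.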
[0042] Next, this classification of image patches is used to initially assign each pixel a net cloud score as described above and as follows. A simplified example of adding a vote to each pixel is described below and illustrated in Figure 2 with a 4 x 4 array of pixels. As shown in Figure 2A2, the 4 x 4 array is back-filled with a score of 0 in each position (pixel). Then in B1, it is illustrated that a 2 x 2 portion of the array is looked at and the initial classification was determined to be a cloud. If it had been determined that this image patch was a non-cloud, then a value of -1 would have been added to the corresponding pixels in A2 to get the score shown in B2. But since in this example the initial classification was "cloud," a value of +1 is added to the corresponding pixels in A2 to get the score shown in B2. So, B2 shows the result after +1 was added to each of the four upper-left-most pixels.
[0043] Next, the 2 x 2 window is slid over so that it is in the position shown in C1. In this example, the initial classification of that patch was non-cloud, so a value of -1 is added to the corresponding pixels and the resulting score is shown in C2. After that, the 2 x 2 window is slid over so that it is in the position shown in D1. In this example, the initial classification of that patch was cloud, so a value of +1 is added to the corresponding pixels and the resulting score is shown in D2.
[0044] Following this, the 2 x 2 window is moved down a row and slid into the position shown in E1. In this example, that patch had an initial classification of cloud and a value of +1 is added to the corresponding pixels in E2. After that, the 2 x 2 window is slid over so that it is in the position shown in F1. In this example, the initial classification of that patch was cloud, so a value of +1 is added to the corresponding pixels and the resulting score is shown in F2. Next, the 2 x 2 window is slid over so that it is in the position shown in G1. In this example, the initial classification of that patch was non-cloud, so a value of -1 is added to the corresponding pixels and the resulting score is shown in G2.
[0045] Following this, the 2 x 2 window is moved down a row and slid into the position shown in H1. In this example, that patch had an initial classification of non-cloud and a value of -1 is added to the corresponding pixels in H2. After that, the 2 x 2 window is slid over so that it is in the position shown in I1. In this example, the initial classification of that patch was non-cloud, so a value of -1 is added to the corresponding pixels and the resulting score is shown in I2. Next, the 2 x 2 window is slid over so that it is in the position shown in J1. In this example, the initial classification of that patch was non-cloud, so a value of -1 is added to the corresponding pixels and the resulting score is shown in J2.
[0046] With this portion of the processing done, it can be seen that certain pixels in the 4 x 4 array of Figure 2J2 show values of +2 while other pixels show values of -2 and other pixels have intermediate values. Of course, the algorithm is in the middle of determining that certain pixels are very likely to be clouds and certain other pixels are very likely to be non-cloud, whereas for other pixels, the algorithm may not be as certain about the contents of that pixel.
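Scaled up from the 4 x 4 example, the voting pass might be sketched as follows; classify_window stands in for either the dictionary comparison or the neural network classifier from the earlier sketches, and the window size and stride are assumptions:

```python
import numpy as np

def vote_scores(image, classify_window, n=2, stride=1):
    """Accumulate per-pixel net cloud scores from overlapping n x n windows."""
    h, w = image.shape
    scores = np.zeros((h, w), dtype=int)     # back-fill every pixel with 0
    for r in range(0, h - n + 1, stride):
        for c in range(0, w - n + 1, stride):
            vote = classify_window(image[r:r + n, c:c + n])  # +1 cloud, -1 ground
            scores[r:r + n, c:c + n] += vote
    return scores
```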
[0047] In the second phase, a flow-graph is built for min-cut/max-flow segmentation. This is done by first creating a grid-graph whose vertices are the pixels of the input image and whose bi-directional edges are the 4-way adjacencies between pixels in the image. With reference back to the simplified example, the information in the 4 x 4 array of Figure 2J2 is then transferred to the grid-graph representation shown in Figure 3, where the cloud score from Figure 2J2 is represented as V for each pixel. Note that each pixel has a score (V) and an intensity value (I, from the image) and each pixel shows a line connecting it to each adjacent pixel in the same row or column. Each of those lines can be seen to represent a capacity (w), which is a measure of the similarity between the intensities of the two adjacent pixels. For example, for adjacent pixel intensities x and y, the capacity could be calculated as w = e^(-|x-y|), or alternatively: w = e^(-(x-y)²/σ²)
[0048] The following notation can be used:
p_ij refers to the pixel in row i and in column j
w(x, y) refers to the capacity between pixel x and pixel y
V(x) refers to the cloud or ground capacity of pixel x (if a positive value, then it is a cloud, and if a negative value, then it is the ground)
I(x) refers to the image intensity value of pixel x
[0049] Second, a new source vertex and a new sink vertex are contrived. The grid-graph is represented in Figure 4, where a connection is shown from each pixel to each of a source/cloud and a sink/ground.
[0050] Third, a directional edge from the source vertex to vertex v (pixel p_ij) in the grid-graph is created if v's net cloud score (V) is positive (v is more like cloud than ground). A positive capacity is put on this edge. It may be a constant value or proportional to the absolute value of the cloud score of v. Alternatively, the source has a directional edge to every vertex v of the grid-graph whose weight is the cloud score (not netted) of the vertex v.
[0051] Fourth, a directional edge is created from vertex v in the grid-graph to the sink vertex if v’s net cloud score is negative (v is more like ground than cloud). A positive capacity is put on this edge. It may be a constant value or proportional to the absolute value of the cloud score of v. Alternatively, there is a directional edge from every vertex v of the grid-graph to the sink whose weight is the ground score of the vertex v. Figure 9 shows the grid-graph after the directional edge capacities have been placed thereon. Note that unlabeled directional edges have a capacity of zero, and when a non-zero capacity is assigned to a directional edge between a given vertex and the source, then the directional edge between that vertex and the sink is then removed.
[0052] Fifth, a capacity value is assigned to each edge (u, v) in the grid-graph, where that value depends on the grayscale similarity between pixel u and pixel v in the input image. The larger the discrepancy, the smaller the capacity; the smaller the discrepancy, the larger the capacity. One possible implementation is to use a Gaussian of the difference. Since it may be helpful to segment the cloud and non-cloud portions of the image along edges or transitions in the image, it can be seen that the capacity between adjacent pixels will be smaller when the intensity differences are larger. So, seeking to segment between pixels with large intensity differences may be desirable.
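A sketch of such a Gaussian capacity over the 4-way grid, in NumPy (the value of σ is a tuning assumption):

```python
import numpy as np

def edge_capacities(intensity, sigma=10.0):
    """Gaussian of the intensity difference for 4-way adjacent pixels.

    Returns capacities for right-neighbor and down-neighbor edges;
    larger intensity discrepancies yield smaller capacities.
    """
    I = intensity.astype(float)
    cap_right = np.exp(-((I[:, :-1] - I[:, 1:]) ** 2) / sigma**2)
    cap_down = np.exp(-((I[:-1, :] - I[1:, :]) ** 2) / sigma**2)
    return cap_right, cap_down
```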
[0053] Sixth, when available, auxiliary information known about cloud or ground pixels is incorporated into the graph as follows. If a pixel p_ij is known to be a cloud, a directional edge is created from p_ij to the source with a very large capacity. If p_ij is known to be ground, a directional edge is created from p_ij to the sink with a very large capacity. This auxiliary information may be available from manual annotations or other means.
[0054] Seventh, the min-cut partition of the graph (typical algorithms do this by finding the max-flow) between source and sink is solved for. This is a partition of the graph into two sets of vertices, one that contains the sink, the other that contains the source, and such that the total capacity of edges that span the two sets is as small as possible (meaning that the intensity differences are larger). There are many methods to solve for this partition. Some of the most common methods are the Ford-Fulkerson algorithm, the Push Relabel Max-Flow algorithm, Dinic’s algorithm, and the Boykov-Kolmogorov Max-Flow algorithm.
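As a hedged illustration, the PyMaxflow library (an implementation of the Boykov-Kolmogorov algorithm) could realize the graph construction and partition steps roughly as below; the wiring details and the use of |score| for terminal capacities follow one of the options described above and are otherwise assumptions:

```python
import numpy as np
import maxflow  # PyMaxflow: pip install PyMaxflow

def segment_clouds(scores, cap_right, cap_down):
    """Min-cut/max-flow over the pixel grid-graph.

    scores    : per-pixel net cloud scores (positive = more cloud-like)
    cap_right : capacities between horizontally adjacent pixels
    cap_down  : capacities between vertically adjacent pixels
    Returns a boolean mask, True where a pixel is grouped with the cloud source.
    """
    g = maxflow.Graph[float]()
    nodes = g.add_grid_nodes(scores.shape)

    # Neighbor edges: capacity from intensity similarity
    h, w = scores.shape
    for r in range(h):
        for c in range(w):
            if c + 1 < w:
                g.add_edge(nodes[r, c], nodes[r, c + 1],
                           cap_right[r, c], cap_right[r, c])
            if r + 1 < h:
                g.add_edge(nodes[r, c], nodes[r + 1, c],
                           cap_down[r, c], cap_down[r, c])

    # Terminal edges: positive scores connect to the cloud source,
    # negative scores to the ground sink, weighted by |score|.
    g.add_grid_tedges(nodes, np.maximum(scores, 0), np.maximum(-scores, 0))

    g.maxflow()
    return ~g.get_grid_segments(nodes)  # True = source (cloud) side of the cut
```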
[0055] Eighth, the above partition induces the division of the input image into cloud regions (including pixels grouped with the source) and ground regions (including pixels grouped with the sink).
[0056] Ninth, automatic clean-up operations are applied on the cloud regions (to get rid of small regions or thin stringy regions).
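Such clean-up might, for example, use morphological post-processing; the sketch below assumes scikit-image and an arbitrary minimum region size:

```python
from skimage import morphology

def clean_cloud_mask(cloud_mask, min_size=64):
    """Drop small speckle regions and thin stringy artifacts from a boolean mask."""
    cleaned = morphology.remove_small_objects(cloud_mask, min_size=min_size)
    cleaned = morphology.binary_opening(cleaned, morphology.disk(2))
    return cleaned
```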
[0057] Several further comments can be made about this algorithm. Cloud textures display repeatable patterns that are usually different from the textures of other features present in imagery. These textures are difficult to describe heuristically, but can be learned with the process described above. The cloud dictionary is specialized to represent these textures with few dictionary elements, while the ground dictionary will not represent these textures well with few dictionary elements. Patches are normalized, so the average intensity of each patch plays no role in the weighting. A thresholded cloud score by itself contains many false positives and false negatives. The application of the Min-cut/Max-flow segmenter is very important for this reason, as it smooths out the incorrect scores in an intuitive way and ensures that, while mainly determined by the cloud score, the classification follows natural transitions in the image.
[0058] As can be seen, the overall algorithm is shown in Figure 11 for the case of the dictionary method and in Figure 12 for the case of the neural network method. In each, an input image is first received. The image may be a single band of a multi-band image. Next, an n x n overlapping subwindow is slid throughout the image so that patches or portions of the image can be classified. The patches/portions may be normalized. Next, in the dictionary case (Figure 11), for the image portion seen in each subwindow, the image portion is reconstructed in both a cloud dictionary and a ground dictionary to choose a classification based upon which has a better reconstruction. Alternatively, in the Neural Network case (Figure 12), for the image portion seen in each subwindow, the image portion is classified as cloud or ground with the use of a Neural Network. Next, a vote is added to each pixel. Finally, the Min-Cut, Max-Flow segmentation is performed.
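Tying the phases together, the overall flow of Figures 11 and 12 could be sketched as follows, reusing the hypothetical helpers from the earlier snippets; all names, including the per-window classifier classify_window_nn, are assumptions rather than the disclosure's API:

```python
def detect_clouds(image):
    """Two-phase cloud detection: windowed classification votes, then min-cut."""
    # Phase 1: slide an n x n subwindow and accumulate per-pixel votes
    scores = vote_scores(image, classify_window=classify_window_nn, n=8)

    # Phase 2: min-cut/max-flow segmentation over the grid-graph
    cap_right, cap_down = edge_capacities(image)
    cloud_mask = segment_clouds(scores, cap_right, cap_down)

    # Clean-up: remove small or stringy cloud regions
    return clean_cloud_mask(cloud_mask)
```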
[0059] At this point, methods and techniques for performing such computer-implemented methods will be discussed. Generally, the techniques disclosed herein may be implemented on any suitable hardware or any suitable combination of software and hardware. For example, they may be implemented in an operating system kernel, in a separate user process, in a library package bound into network applications, on a specially constructed machine, on an application-specific integrated circuit (ASIC), or on a network interface card.
[0060] Software/hardware hybrid implementations of at least some of the embodiments disclosed herein may be implemented on a programmable network-resident machine (which should be understood to include intermittently connected network-aware machines) selectively activated or reconfigured by a computer program stored in memory. Such network devices may have multiple network interfaces that may be configured or designed to utilize different types of network communication protocols. A general architecture for some of these machines may be disclosed herein in order to illustrate one or more exemplary means by which a given unit of functionality may be implemented. According to specific embodiments, at least some of the features or functionalities of the various embodiments disclosed herein may be implemented on one or more general-purpose computers associated with one or more networks, such as for example an end-user computer system, a client computer, a network server or other server system, a mobile computing device (e.g., tablet computing device, mobile phone, smartphone, laptop, and the like), a consumer electronic device, a music player, or any other suitable electronic device, router, switch, or the like, or any combination thereof. In at least some embodiments, at least some of the features or functionalities of the various embodiments disclosed herein may be implemented in one or more virtualized computing environments (e.g., network computing clouds, virtual machines hosted on one or more physical computing machines, or the like).
[0061] Referring now to Figure 5, there is shown a block diagram depicting an exemplary computing device 230 suitable for implementing at least a portion of the features or functionalities disclosed herein. Computing device 230 may be, for example, any one of the computing machines listed in the previous paragraph, or indeed any other electronic device capable of executing software- or hardware-based instructions according to one or more programs stored in memory. Computing device 230 may be adapted to communicate with a plurality of other computing devices, such as clients or servers, over communications networks such as a wide area network, a metropolitan area network, a local area network, a wireless network, the Internet, or any other network, using known protocols for such communication, whether wireless or wired.
[0062] In one embodiment, computing device 230 includes one or more central processing units (CPU) 234, one or more interfaces 240, and one or more busses 238 (such as a peripheral component interconnect (PCI) bus). When acting under the control of appropriate software or firmware, CPU 234 may be responsible for implementing specific functions associated with the functions of a specifically configured computing device or machine. For example, in at least one embodiment, a computing device 230 may be configured or designed to function as a server system utilizing CPU 234, local memory 232 and/or remote memory 242, and interface(s) 240.
[0063] In at least one embodiment, CPU 234 may be caused to perform one or more of the different types of functions and/or operations under the control of software modules or components, which for example, may include an operating system and any appropriate applications software, drivers, and the like. CPU 234 may include one or more processors 236 such as, for example, a processor from one of the Intel, ARM, Qualcomm, and AMD families of microprocessors. In some embodiments, processors 236 may include specially designed hardware such as application-specific integrated circuits (ASICs), electrically erasable programmable read-only memories (EEPROMs), field-programmable gate arrays (FPGAs), and so forth, for controlling operations of computing device 230. In a specific embodiment, a local memory 232 (such as non-volatile random access memory (RAM) and/or read-only memory (ROM), including for example one or more levels of cached memory) may also form part of CPU 234. However, there are many different ways in which memory may be coupled to system 230. Memory 232 may be used for a variety of purposes such as, for example, caching and/or storing data, programming instructions, and the like.
[0064] As used herein, the term "processor" is not limited merely to those integrated circuits referred to in the art as a processor, a mobile processor, or a microprocessor, but broadly refers to a microcontroller, a microcomputer, a programmable logic controller, an application-specific integrated circuit, and any other programmable circuit.
[0065] In one embodiment, interfaces 240 are provided as network interface cards (NICs). Generally, NICs control the sending and receiving of data packets over a computer network; other types of interfaces 240 may for example support other peripherals used with computing device 230. Among the interfaces that may be provided are Ethernet interfaces, frame relay interfaces, cable interfaces, DSL interfaces, token ring interfaces, graphics interfaces, and the like. In addition, various types of interfaces may be provided such as, for example, universal serial bus (USB), Serial, Ethernet, FireWire, PCI, parallel, radio frequency (RF), Bluetooth, near-field communications (e.g., using near-field magnetics), 802.11 (WiFi), frame relay, TCP/IP, ISDN, fast Ethernet interfaces, Gigabit Ethernet interfaces, asynchronous transfer mode (ATM) interfaces, high-speed serial interface (HSSI) interfaces, Point of Sale (POS) interfaces, fiber data distributed interfaces (FDDIs), and the like. Generally, such interfaces 240 may include ports appropriate for communication with appropriate media. In some cases, they may also include an independent processor and, in some instances, volatile and/or non-volatile memory (e.g., RAM).
[0066] Although the system shown in Figure 5 illustrates one specific architecture for a computing device 230 for implementing one or more of the embodiments described herein, it is by no means the only device architecture on which at least a portion of the features and techniques described herein may be implemented. For example, architectures having one or any number of processors 236 may be used, and such processors 236 may be present in a single device or distributed among any number of devices. In one embodiment, a single processor 103 handles communications as well as routing computations, while in other embodiments a separate dedicated communications processor may be provided. In various embodiments, different types of features or functionalities may be implemented in a system that includes a client device (such as a tablet device or smartphone running client software) and server systems (such as a server system described in more detail below).
[0067] Regardless of network device configuration, the system may employ one or more memories or memory modules (such as, for example, remote memory block 242 and local memory 232) configured to store data, program instructions for the general-purpose network operations, or other information relating to the functionality of the embodiments described herein (or any combinations of the above). Program instructions may control execution of or comprise an operating system and/or one or more applications, for example. Memory 242 or memories 232, 242 may also be configured to store data structures, configuration data, encryption data, historical system operations information, or any other specific or generic non-program information described herein.
[0068] Because such information and program instructions may be employed to implement one or more systems or methods described herein, at least some network device embodiments may include non-transitory machine-readable storage media, which, for example, may be configured or designed to store program instructions, state information, and the like for performing various operations described herein. Examples of such non-transitory machine-readable storage media include, but are not limited to, magnetic media such as hard disks, floppy disks, and magnetic tape; optical media such as CD-ROM disks; magneto-optical media such as optical disks, and hardware devices that are specially configured to store and perform program instructions, such as read-only memory devices (ROM), flash memory, solid state drives, memristor memory, random access memory (RAM), and the like. Examples of program instructions include both object code, such as may be produced by a compiler, machine code, such as may be produced by an assembler or a linker, byte code, such as may be generated by for example a Java compiler and may be executed using a Java virtual machine or equivalent, or files containing higher level code that may be executed by the computer using an interpreter (for example, scripts written in Python, Perl, Ruby, Groovy, or any other scripting language).
[0069] In some embodiments, systems may be implemented on a standalone computing system. Referring now to Figure 6, there is shown a block diagram depicting a typical exemplary architecture of one or more embodiments or components thereof on a standalone computing system. Computing device 250 includes processors 252 that may run software that carries out one or more functions or applications of embodiments, such as for example a client application 258. Processors 252 may carry out computing instructions under control of an operating system 254 such as, for example, a version of Microsoft's Windows operating system, Apple's Mac OS/X or iOS operating systems, some variety of the Linux operating system, Google's Android operating system, or the like. In many cases, one or more shared services 256 may be operable in system 250, and may be useful for providing common services to client applications 258. Services 256 may for example be Windows services, user-space common services in a Linux environment, or any other type of common service architecture used with operating system 254. Input devices 266 may be of any type suitable for receiving user input, including for example a keyboard, touchscreen, microphone (for example, for voice input), mouse, touchpad, trackball, or any combination thereof. Output devices 264 may be of any type suitable for providing output to one or more users, whether remote or local to system 250, and may include for example one or more screens for visual output, speakers, printers, or any combination thereof. Memory 260 may be random-access memory having any structure and architecture known in the art, for use by processors 252, for example to run software. Storage devices 262 may be any magnetic, optical, mechanical, memristor, or electrical storage device for storage of data in digital form. Examples of storage devices 262 include flash memory, magnetic hard drive, CD-ROM, and/or the like.
[0070] In some embodiments, systems may be implemented on a distributed computing network, such as one having any number of clients and/or servers. Referring now to Figure 7, there is shown a block diagram depicting an exemplary architecture for implementing at least a portion of a system according to an embodiment on a distributed computing network. According to the embodiment, any number of clients 330 may be provided. Each client 330 may run software for implementing client-side portions of the embodiments and clients may comprise a system 250 such as that illustrated in Figure 6. In addition, any number of servers 320 may be provided for handling requests received from one or more clients 330. Clients 330 and servers 320 may communicate with one another via one or more electronic networks 310, which may be in various embodiments any of the Internet, a wide area network, a mobile telephony network, a wireless network (such as WiFi, WiMAX, and so forth), or a local area network (or indeed any network topology known in the art; no one network topology is preferred over any other). Networks 310 may be implemented using any known network protocols, including for example wired and/or wireless protocols.
[0071] In addition, in some embodiments, servers 320 may call external services 370 when needed to obtain additional information, or to refer to additional data concerning a particular call. Communications with external services 370 may take place, for example, via one or more networks 310. In various embodiments, external services 370 may comprise web-enabled services or functionality related to or installed on the hardware device itself. For example, in an embodiment where client applications 258 are implemented on a smartphone or other electronic device, client applications 258 may obtain information stored in a server system 320 in the cloud or on an external service 370 deployed on one or more of a particular enterprise's or user's premises.
[0072] In some embodiments, clients 330 or servers 320 (or both) may make use of one or more specialized services or appliances that may be deployed locally or remotely across one or more networks 310. For example, one or more databases 340 may be used or referred to by one or more embodiments. It should be understood by one having ordinary skill in the art that databases 340 may be arranged in a wide variety of architectures and using a wide variety of data access and manipulation means. For example, in various embodiments one or more databases 340 may comprise a relational database system using a structured query language (SQL), while others may comprise an alternative data storage technology such as those referred to in the art as "NoSQL" (for example, Hadoop Cassandra, Google BigTable, and so forth). In some embodiments, variant database architectures such as column-oriented databases, in-memory databases, clustered databases, distributed databases, or even flat file data repositories may be used. It will be appreciated by one having ordinary skill in the art that any combination of known or future database technologies may be used as appropriate, unless a specific database technology or a specific arrangement of components is specified for a particular embodiment herein. Moreover, it should be appreciated that the term "database" as used herein may refer to a physical database machine, a cluster of machines acting as a single database system, or a logical database within an overall database management system. Unless a specific meaning is specified for a given use of the term "database", it should be construed to mean any of these senses of the word, all of which are understood as a plain meaning of the term "database" by those having ordinary skill in the art.
[0073] Similarly, most embodiments may make use of one or more security systems 360 and configuration systems 350. Security and configuration management are common information technology (IT) and web functions, and some amount of each are generally associated with any IT or web systems. It should be understood by one having ordinary skill in the art that any configuration or security subsystems known in the art now or in the future may be used in conjunction with embodiments without limitation, unless a specific security 360 or configuration system 350 or approach is specifically required by the description of any specific embodiment.
[0074] In various embodiments, functionality for implementing systems or methods may be distributed among any number of client and/or server components. For example, various software modules may be implemented for performing various functions, and such modules can be variously implemented to run on server and/or client components.
[0075] While the foregoing has illustrated and described several embodiments in detail in the drawings and foregoing description, such illustration and description is to be considered as exemplary and not restrictive in character. For example, certain embodiments described hereinabove may be combinable with other described embodiments and/or arranged in other ways (e.g., process elements may be performed in other sequences). Accordingly, it should be understood that only the preferred embodiment and variants thereof have been shown and described and that all changes and modifications that come within the spirit of the disclosure are desired to be protected.

Claims

What is claimed is:
1. A computer-implemented process for determining whether given imagery in an overhead image is cloud imagery or ground imagery, comprising:
for multiple portions of an image, making an initial determination about whether each of the multiple portions primarily contains cloud imagery or primarily contains ground imagery by, with a processor, utilizing a neural network to classify each of the multiple portions of the image as one of cloud or ground; and
with a processor, performing an optimization technique on the multiple portions of the overhead image using the initial determination to determine which portions of the overhead image include cloud imagery or ground imagery.
2. A computer-implemented process as defined in claim 1, wherein the optimization technique includes identifying adjacent pixels and calculating a capacity between the identified adjacent pixels.
3. A computer-implemented process as defined in claim 2, wherein the optimization technique further includes creating a score for each pixel to represent the likelihood that the pixel does or does not contain a cloud and creating a grid-graph of the scores of the pixels with adjacency information associated with each set of adjacent pixels.
4. A computer-implemented process as defined in claim 3, wherein the optimization technique further includes connecting the pixels of the grid-graph to both a source and a sink using the pixel score as the capacity, wherein one of the source and the sink represents cloud and one represents ground, and performing a min-cut/max-flow segmentation on the image.
5. A computer-implemented process as defined in claim 1, wherein the optimization technique includes applying a window, having a height and width that are less than a height and width of the image and that are the same as that of the multiple portions of the image, to various portions of the image, the various portions partially overlapping adjacent portions, in order to determine if each portion most likely contains cloud imagery or ground imagery, and incrementing or decrementing a score for each pixel in the portion based on whether the determination was of cloud imagery or ground imagery, respectively.
6. A computer-implemented process as defined in claim 1, wherein the overhead image is a satellite-based image.
7. A computer-implemented process as defined in claim 1, further including adding to metadata associated with each pixel an indication of whether each such pixel includes cloud imagery.
8. A computer-implemented process as defined in claim 7, further including using the indication of cloud imagery in the metadata to select pixels for an orthomosaic image free of clouds.
9. A computer-implemented process for determining whether given imagery in an overhead image is cloud imagery or ground imagery, comprising:
for multiple portions of an image, making an initial determination about whether each of the multiple portions primarily contains cloud imagery or primarily contains ground imagery by, with a processor, utilizing a neural network to classify each of the multiple portions of the image as one of cloud or ground;
applying a window, having a height and width that are less than a height and width of the image and that are the same as that of the multiple portions of the image, to various portions of the image, the various portions partially overlapping adjacent portions, in order to determine if each portion most likely contains cloud imagery or ground imagery, and incrementing or decrementing a score for each pixel in the portion based on whether the determination was of cloud imagery or ground imagery, respectively;
creating a weight for each pixel to represent the likelihood that the pixel does or does not contain a cloud;
identifying adjacent pixels and calculating a capacity between the identified adjacent pixels;
creating a grid-graph of the scores of the pixels with adjacency information associated with each set of adjacent pixels;
connecting the pixels of the grid-graph to both a source and a sink using the pixel score as the capacity, wherein one of the source and the sink represents cloud and one represents ground; and
performing a min-cut/max-flow segmentation on the image to define portions of the overhead image which are believed to include cloud imagery and portions of the overhead image which are believed to include ground imagery.
10. A computer-implemented process as defined in claim 9, wherein the overhead image is a satellite-based image.
11. A computer-implemented process as defined in claim 9, further including adding to metadata associated with each pixel an indication of whether each such pixel includes cloud imagery.
12. A computer-implemented process as defined in claim 9, further including using the indication of cloud imagery in the metadata to select pixels for an orthomosaic image free of clouds.
13. A computer-implemented process for determining whether given imagery in an overhead image is cloud imagery or ground imagery, comprising:
receiving an image having a plurality of pixels;
sliding an n x n overlapping subwindow throughout the image so that image portions of the image can be classified;
for the image portion seen in each subwindow, classifying the image portion as cloud or ground with the use of a Neural Network;
adding a vote to each pixel based on the classification of each image portion containing the pixel; and
with a processor, performing an optimization technique on the classifications of the pixels in the image using the initial determination to determine which pixels of the image include cloud imagery or ground imagery.
14. A computer-implemented process as defined in claim 13, wherein the optimization technique includes identifying adjacent pixels and calculating a capacity between the identified adjacent pixels.
15. A computer-implemented process as defined in claim 14, wherein the optimization technique further includes creating a score for each pixel to represent the likelihood that the pixel does or does not contain a cloud and creating a grid-graph of the scores of the pixels with adjacency information associated with each set of adjacent pixels.
16. A computer-implemented process as defined in claim 15, wherein the optimization technique further includes connecting the pixels of the grid-graph to both a source and a sink using the pixel score as the capacity, wherein one of the source and the sink represents cloud and one represents ground, and performing a min-cut/max-flow segmentation on the image.
17. A computer-implemented process as defined in claim 13, wherein the adding a vote operation includes applying a window, having a height and width that are less than a height and width of the image and that are the same as that of the subwindow, to various portions of the image, the various portions partially overlapping adjacent portions, in order to determine if each portion most likely contains cloud imagery or ground imagery, and incrementing or decrementing a score for each pixel in the portion based on whether the determination was of cloud imagery or ground imagery, respectively.
18. A computer-implemented process as defined in claim 13, wherein the image is a satellite-based image.
19. A computer-implemented process as defined in claim 13, further including adding to metadata associated with each pixel an indication of whether each such pixel includes cloud imagery.
20. A computer-implemented process as defined in claim 19, further including using the indication of cloud imagery in the metadata to select pixels for an orthomosaic image free of clouds.
PCT/US2019/052593 2018-09-24 2019-09-24 Advanced cloud detection using neural networks and optimization techniques WO2020068744A1 (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
EP19864631.7A EP3857443A4 (en) 2018-09-24 2019-09-24 Advanced cloud detection using neural networks and optimization techniques

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
US16/140,052 US10685253B2 (en) 2016-11-28 2018-09-24 Advanced cloud detection using neural networks and optimization techniques
US16/140,052 2018-09-24

Publications (1)

Publication Number Publication Date
WO2020068744A1 true WO2020068744A1 (en) 2020-04-02

Family

ID=69953551

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/US2019/052593 WO2020068744A1 (en) 2018-09-24 2019-09-24 Advanced cloud detection using neural networks and optimization techniques

Country Status (2)

Country Link
EP (1) EP3857443A4 (en)
WO (1) WO2020068744A1 (en)

Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US10083354B2 (en) * 2016-11-28 2018-09-25 Digitalglobe, Inc. Advanced cloud detection using machine learning and optimization techniques

Patent Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20050114027A1 (en) * 2003-11-24 2005-05-26 The Boeing Company Cloud shadow detection: VNIR-SWIR
US20170161584A1 (en) * 2015-12-07 2017-06-08 The Climate Corporation Cloud detection on remote sensing imagery

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
See also references of EP3857443A4 *
TIAN ET AL.: "Temporal Updating Scheme for Probabilistic Neural Network with Application to Satellite Cloud Classification", IEEE TRANSACTIONS ON NEURAL NETWORKS, vol. 11, no. 4, July 2000 (2000-07-01), pages 903 - 920, XP011039507 *

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
KR20230082344A (en) * 2021-12-01 2023-06-08 한국항공우주연구원 Machine learning-based optical satellite image automatic cloudiness analysis system and automatic cloudiness analysis method
KR102656333B1 (en) * 2021-12-01 2024-04-11 한국항공우주연구원 Machine learning-based optical satellite image automatic cloudiness analysis system and automatic cloudiness analysis method
CN118172291A (en) * 2024-05-14 2024-06-11 浙江国遥地理信息技术有限公司 Image cloud removing method and device for remote sensing image and electronic equipment

Also Published As

Publication number Publication date
EP3857443A4 (en) 2022-08-03
EP3857443A1 (en) 2021-08-04

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 19864631

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

ENP Entry into the national phase

Ref document number: 2019864631

Country of ref document: EP

Effective date: 20210426