CN111612102B - Satellite image data clustering method, device and equipment based on local feature selection - Google Patents
- Publication number
- CN111612102B (application CN202010504460.3A)
- Authority
- CN
- China
- Prior art keywords
- image data
- satellite image
- distribution
- parameter
- probability
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Active
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/24—Classification techniques
- G06F18/241—Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
- G06F18/2415—Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches based on parametric or probabilistic models, e.g. based on likelihood ratio or false acceptance rate versus a false rejection rate
- G06F18/24155—Bayesian classification
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/23—Clustering techniques
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V20/00—Scenes; Scene-specific elements
- G06V20/10—Terrestrial scenes
- G06V20/13—Satellite images
Landscapes
- Engineering & Computer Science (AREA)
- Physics & Mathematics (AREA)
- Data Mining & Analysis (AREA)
- Theoretical Computer Science (AREA)
- General Physics & Mathematics (AREA)
- Bioinformatics & Cheminformatics (AREA)
- Life Sciences & Earth Sciences (AREA)
- Evolutionary Biology (AREA)
- Evolutionary Computation (AREA)
- Bioinformatics & Computational Biology (AREA)
- General Engineering & Computer Science (AREA)
- Artificial Intelligence (AREA)
- Computer Vision & Pattern Recognition (AREA)
- Astronomy & Astrophysics (AREA)
- Remote Sensing (AREA)
- Multimedia (AREA)
- Probability & Statistics with Applications (AREA)
- Image Analysis (AREA)
- Image Processing (AREA)
Abstract
The invention discloses a satellite image data clustering method, device and equipment based on local feature selection. The method comprises the following steps: S101, acquiring a satellite image data set to be processed; S102, modeling the satellite image data with a nonparametric von Mises (VM) mixture model based on local feature selection; S103, estimating the model parameters of the nonparametric VM mixture model through a variational Bayesian inference algorithm and calculating the feature importance; S104, judging, according to the estimated model parameters, whether the nonparametric VM mixture model has converged; if not, returning to step S103; if so, executing step S105; S105, screening the satellite image data according to the feature importance to retain the important satellite image data; and S106, determining the category of each piece of satellite image data according to the posterior probability of the indicator factor, and clustering the satellite image data by category. The embodiment can obtain a better clustering result when processing unbalanced data.
Description
Technical Field
The invention relates to the field of data processing, in particular to a satellite image data clustering method, a satellite image data clustering device and satellite image data clustering equipment based on local feature selection.
Background
Land satellites are commonly used to investigate underground mines, marine resources, and groundwater resources, study the growth and topography of natural plants, investigate and forecast various serious natural disasters (such as earthquakes) and environmental pollution, and take images of various targets to draw thematic maps (such as geological maps, topographical maps, and hydrological maps). With the advent of an era featuring integrated remote sensing methods, it would be of great importance to interpret a scene by integrating multiple types and resolutions of spatial data (including multispectral and radar data, displayed terrain, land use maps, etc.).
In the prior art, W. Fan et al. proposed a clustering method that uses a von Mises (VM) mixture model based on feature selection and the Dirichlet Process (DP). The method estimates the model parameters by variational Bayesian inference and has been applied to the cluster analysis of text and plant image data. It has the following disadvantage:
cluster analysis of unbalanced data is not effective, because DP mixture models typically cannot identify classes that contain only a small number of data samples.
Disclosure of Invention
In view of the above, the present invention provides a satellite image data clustering method, device and equipment based on local feature selection, which can obtain a better clustering result when processing unbalanced data.
The embodiment of the invention provides a satellite image data clustering method based on local feature selection, which comprises the following steps:
S101, acquiring a satellite image data set to be processed, $X=\{\vec{x}_n\}_{n=1}^{N}$; the satellite image data set comprises N pieces of satellite image data, and each piece of satellite image data is normalized by the L2 norm into a D-dimensional feature vector $\vec{x}_n=(x_{n1},\dots,x_{nD})$ with $\|\vec{x}_n\|=1$, where $\|\cdot\|$ denotes the L2 norm;
S102, modeling the satellite image data with a nonparametric von Mises (VM) mixture model based on local feature selection;
S103, estimating the model parameters of the nonparametric VM mixture model through a variational Bayesian inference algorithm and calculating the feature importance;
S104, judging, according to the estimated model parameters, whether the nonparametric VM mixture model has converged; if not, returning to step S103; if so, executing step S105;
S105, screening the satellite image data according to the feature importance to retain the important satellite image data;
and S106, determining the category of each piece of satellite image data according to the posterior probability of the indicator factor, and clustering the satellite image data by category.
Preferably, the modeling the satellite image data by using the non-parametric VM hybrid model selected based on the local features specifically includes:
for satellite image data obeying the VM probability distribution $p_{vm}$, the probability density of the D-dimensional data $\vec{x}_n=(x_{n1},\dots,x_{nD})$ is expressed as:
$$p_{vm}(\vec{x}_n\mid\vec{\mu},\vec{\lambda})=\prod_{d=1}^{D}\frac{1}{2\pi I_0(\lambda_d)}\exp\{\lambda_d\,\vec{\mu}_d^{T}\vec{y}_{nd}\}$$
wherein $\vec{y}_{nd}=(y_{nd1},y_{nd2})$, $y_{nd1}=x_{nd}$ and $y_{nd2}=\sqrt{1-x_{nd}^{2}}$, which ensures that the vector $\vec{y}_{nd}$ satisfies the L2 norm normalization; $\vec{\mu}_d$ is the location parameter; $\lambda_d$ is the scale parameter and satisfies the condition $\lambda_d\ge 0$; and $I_0(\lambda)$ is the modified Bessel function of the first kind of order 0;
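for each piece of D-dimensional satellite image data $\vec{x}_n$ obeying the nonparametric VM mixture model, the probability density function is:
$$p(\vec{x}_n\mid\vec{\pi},\vec{\mu},\vec{\lambda})=\sum_{k=1}^{\infty}\pi_k\prod_{d=1}^{D}p_{vm}(x_{nd}\mid\vec{\mu}_{kd},\lambda_{kd})$$
the nonparametric VM mixture model is composed of infinitely many mixture components, and each mixture component corresponds to the product of D VM probability distributions, $\prod_{d=1}^{D}p_{vm}(x_{nd}\mid\vec{\mu}_{kd},\lambda_{kd})$, wherein each feature corresponds to one VM probability distribution; $(\vec{\mu}_{kd},\lambda_{kd})$ are the VM distribution parameters of the d-th feature in the k-th mixture component, and $\pi_k>0$ is the corresponding mixing coefficient and satisfies the condition $\sum_{k=1}^{\infty}\pi_k=1$;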
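for each piece of satellite image data $\vec{x}_n$, specifying a binary hidden variable $\vec{Z}_n=(Z_{n1},Z_{n2},\dots)$ as an indicator factor: $Z_{nk}=1$ indicates that the satellite image data $\vec{x}_n$ belongs to the k-th category; otherwise $Z_{nk}=0$; the probability distribution of the hidden variable $\vec{Z}=(\vec{Z}_1,\dots,\vec{Z}_N)$ is:
$$p(\vec{Z}\mid\vec{\pi})=\prod_{n=1}^{N}\prod_{k=1}^{\infty}\pi_k^{Z_{nk}}$$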
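fusing the local feature selection technique with the nonparametric VM mixture model to obtain the probability distribution obeyed by the feature $x_{nd}$ of each piece of satellite image data:
$$p(x_{nd}\mid\phi_{nkd})=p_{vm}(x_{nd}\mid\vec{\mu}_{kd},\lambda_{kd})^{\phi_{nkd}}\;p_{vm}(x_{nd}\mid\vec{\mu}'_{kd},\lambda'_{kd})^{1-\phi_{nkd}}$$
wherein $\phi_{nkd}$ is a binary parameter: when $\phi_{nkd}=1$, the feature $x_{nd}$ is a relevant feature and obeys the VM probability distribution $p_{vm}(x_{nd}\mid\vec{\mu}_{kd},\lambda_{kd})$; when $\phi_{nkd}=0$, the feature $x_{nd}$ is an irrelevant feature and obeys the VM probability distribution $p_{vm}(x_{nd}\mid\vec{\mu}'_{kd},\lambda'_{kd})$; the parameter $\phi_{nkd}$ obeys the Bernoulli distribution:
$$p(\phi_{nkd}\mid\epsilon_{kd})=\epsilon_{kd}^{\phi_{nkd}}(1-\epsilon_{kd})^{1-\phi_{nkd}}$$
wherein the parameter $\epsilon_{kd}$ represents the feature importance of the d-th feature in the k-th component;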
adopting a VM-Gamma distribution as the joint prior distribution of the parameters of the VM distribution to which the relevant features belong:
adopting a VM-Gamma distribution as the joint prior distribution of the parameters of the VM distribution to which the irrelevant features belong:
acquiring the full probability expression of the nonparametric VM mixture model based on local feature selection:
Preferably, the nonparametric VM mixture model is constructed from a Pitman-Yor process model based on the stick-breaking representation, in which the mixing coefficient $\pi_k$ is expressed as:
$$\pi_k=\pi'_k\prod_{j=1}^{k-1}(1-\pi'_j),\qquad \pi'_k\sim p_b(\pi'_k\mid 1-a,\;b+ka)$$
wherein $p_b(\cdot)$ is the Beta distribution, $a$ is the discount parameter of the Pitman-Yor process model and satisfies the condition $0\le a<1$, and $b$ is the concentration parameter and satisfies the condition $b>-a$.
Preferably, the estimating the model parameters of the non-parametric VM mixed model and calculating the feature importance by using a variational bayes inference algorithm specifically includes:
initializing the model parameters: the truncation level is initialized to $K=15$; the hyperparameters are initialized to $u_{kd}=0.1$, $u'_{kd}=0.1$, $v_{kd}=0.01$, $v'_{kd}=0.01$, $\beta_{kd}>0$, $\beta'_{kd}>0$, $a_k=0.5$ and $b_k=0.5$; $r_{nk}$ is initialized using the K-Means algorithm; and the following is initialized:
updating the variational posteriors, the expected values and the feature importance using the current model parameters;
obtaining the variational lower bound generated by the current iteration;
and comparing the variational lower bound generated by the current iteration with the variational lower bound generated by the previous iteration to judge whether the nonparametric VM mixture model has converged.
Preferably, the updating of the variation posterior, the expected value and the feature importance by using the current model parameters specifically includes:
defining the variational lower bound as:
$L(q)=\langle\ln p(\Theta\mid X)\rangle-\langle\ln q(\Theta)\rangle$
wherein $\langle\cdot\rangle$ denotes the expected value; $\Theta$ is the set of all random variables and hidden variables; $q(\Theta)$ is an approximation of the true posterior distribution $p(\Theta\mid X)$, namely the variational posterior; the expression of the variational posterior $q(\Theta)$ is as follows:
truncating the mixture components from an infinite-dimensional space to a K-dimensional space using a truncation technique:
$$\pi'_K=1,\qquad \pi_k=0\ \text{for}\ k>K,\qquad \sum_{k=1}^{K}\pi_k=1$$
wherein K is the truncation level, namely the number of categories; K may be initialized to an arbitrary value and reaches an optimal value at convergence;
all variational posteriors are optimized by maximizing the variational lower bound $L(q)$:
the hyperparameters in the above formulas are calculated by the following formulas:
the expected values in the above formulas are calculated by the following formulas:
$\langle Z_{nk}\rangle=r_{nk}$ (21)
$\langle\phi_{nkd}\rangle=f_{nkd}$ (24)
$\langle 1-\phi_{nkd}\rangle=1-f_{nkd}$ (25)
$\langle\ln\pi'_k\rangle=\Psi(g_k)-\Psi(g_k+h_k)$ (28)
$\langle\ln(1-\pi'_k)\rangle=\Psi(h_k)-\Psi(g_k+h_k)$ (29)
wherein $\Psi(\cdot)$ is the digamma function;
calculating the feature importance
Preferably, comparing the variational lower bound generated by the current iteration with the variational lower bound generated by the previous iteration to judge whether the nonparametric VM mixture model has converged specifically includes:
judging whether the difference between the variational lower bound generated by the current iteration and the variational lower bound generated by the previous iteration is smaller than a preset threshold, the preset threshold being 0.0001;
if so, determining that the nonparametric VM mixture model has converged;
and if not, determining that the nonparametric VM mixture model has not converged.
Preferably, the screening of the satellite image data according to the feature importance to retain important satellite image data specifically includes:
screening the satellite image data according to the feature importance: a feature whose importance is lower than a threshold is regarded as an irrelevant feature and is eliminated; a feature whose importance is greater than or equal to the threshold is regarded as a relevant feature and is retained.
Preferably, the category to which each satellite image data belongs is determined according to the posterior probability of the indicator, so that the satellite image data is clustered according to the category to which the satellite image data belongs, specifically:
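obtaining the posterior probability $r_{nk}$ of the indicator factor, wherein $r_{nk}$ represents the probability that the n-th piece of satellite image data $\vec{x}_n$ belongs to the k-th category, and selecting the category with the highest probability as the category of $\vec{x}_n$.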
The embodiment of the invention also provides a satellite image data clustering device based on local feature selection, which comprises:
a data set acquisition unit for acquiring a satellite image data set to be processed, $X=\{\vec{x}_n\}_{n=1}^{N}$; the satellite image data set comprises N pieces of satellite image data, and each piece of satellite image data is normalized by the L2 norm into a D-dimensional feature vector $\vec{x}_n=(x_{n1},\dots,x_{nD})$ with $\|\vec{x}_n\|=1$, where $\|\cdot\|$ denotes the L2 norm;
the modeling unit is used for modeling the satellite image data with a nonparametric von Mises (VM) mixture model based on local feature selection;
the parameter estimation unit is used for estimating model parameters of the nonparametric VM mixed model through a variational Bayesian inference algorithm and calculating feature importance;
a convergence judging unit for judging whether the nonparametric VM mixed model converges according to the estimated model parameters; if not, the parameter estimation unit is informed, and if yes, the screening unit is informed;
the screening unit is used for screening the satellite image data according to the importance of the characteristics so as to reserve the important satellite image data;
and the clustering unit is used for judging the category of each satellite image data according to the posterior probability of the indicator factor so as to cluster the satellite image data according to the category.
The embodiment of the invention also provides satellite image data clustering equipment based on local feature selection, which comprises a memory and a processor, wherein a satellite image data set to be clustered and a computer program are stored in the memory, and the computer program can be executed by the processor so as to realize the satellite image data clustering method based on local feature selection.
In this embodiment, a nonparametric mixture model based on the von Mises (VM) probability distribution is constructed within a nonparametric model framework based on the Pitman-Yor process, and a localized feature selection method is fused with the nonparametric VM mixture model in the same model framework (PYP-VM for short) in order to perform cluster analysis on land satellite data. Each piece of satellite image data is normalized by the L2 norm and then modeled with a VM mixture model based on local feature selection. So that the number of data categories can be adjusted flexibly according to the size of the data, this embodiment uses the Pitman-Yor process as the nonparametric framework for the VM mixture model, and the parameters of the proposed nonparametric VM mixture model based on local feature selection are estimated by a variational Bayesian inference algorithm. Compared with the prior art, the method provides a discount parameter that can be used to control the generation of new categories, so it has an advantage over methods based on the DP mixture model when processing unbalanced data and can obtain a better clustering result.
Drawings
In order to more clearly illustrate the technical solution of the present invention, the drawings needed to be used in the embodiments will be briefly described below, and it is obvious that the drawings in the following description are only some embodiments of the present invention, and it is obvious for those skilled in the art that other drawings can be obtained according to the drawings without creative efforts.
Fig. 1 is a schematic flow chart of a satellite image data clustering method based on local feature selection according to a first embodiment of the present invention.
Fig. 2 is another schematic flow chart of a satellite image data clustering method based on local feature selection according to a first embodiment of the present invention.
Fig. 3 is a schematic diagram of program modules of a satellite image data clustering method and device based on local feature selection according to a second embodiment of the present invention.
Detailed Description
The technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are only a part of the embodiments of the present invention, and not all of the embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
Referring to fig. 1, an embodiment of the present invention provides a satellite image data clustering method based on local feature selection, which can be performed by a satellite image data clustering device (hereinafter referred to as a clustering device) based on local feature selection, including:
S101, acquiring a satellite image data set to be processed, $X=\{\vec{x}_n\}_{n=1}^{N}$. The satellite image data set comprises N pieces of satellite image data, and each piece of satellite image data is normalized by the L2 norm into a D-dimensional feature vector $\vec{x}_n=(x_{n1},\dots,x_{nD})$ with $\|\vec{x}_n\|=1$, where $\|\cdot\|$ denotes the L2 norm.
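The normalization in step S101 can be illustrated with a minimal sketch. The function name `l2_normalize` and the handling of an all-zero vector are assumptions made for illustration; the source only states that each data vector is normalized by its L2 norm.

```python
import math


def l2_normalize(vector):
    """Scale a feature vector so that its L2 (Euclidean) norm equals 1."""
    norm = math.sqrt(sum(v * v for v in vector))
    if norm == 0.0:
        # Assumption: a zero vector has no direction, so return it unchanged
        # instead of dividing by zero.
        return list(vector)
    return [v / norm for v in vector]


# Example with a 2-dimensional vector whose norm is 5.
sample = [3.0, 4.0]
normalized = l2_normalize(sample)
```

After normalization every component lies in [-1, 1], which is what allows each feature to be mapped onto a point of the unit circle in the modeling step.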
In this embodiment, the clustering device may be a computer device with a data processing function, such as a laptop, a desktop, or a server, and the computer device may implement the satellite image data clustering method based on local feature selection by executing a predetermined program.
S102, modeling the satellite image data with a nonparametric von Mises (VM) mixture model based on local feature selection.
In this embodiment, step S102 specifically includes:
S1021, for satellite image data obeying the VM probability distribution $p_{vm}$, the probability density of the D-dimensional data $\vec{x}_n=(x_{n1},\dots,x_{nD})$ is expressed as:
$$p_{vm}(\vec{x}_n\mid\vec{\mu},\vec{\lambda})=\prod_{d=1}^{D}\frac{1}{2\pi I_0(\lambda_d)}\exp\{\lambda_d\,\vec{\mu}_d^{T}\vec{y}_{nd}\}$$
wherein $\vec{y}_{nd}=(y_{nd1},y_{nd2})$, $y_{nd1}=x_{nd}$ and $y_{nd2}=\sqrt{1-x_{nd}^{2}}$, which ensures that the vector $\vec{y}_{nd}$ satisfies the L2 norm normalization; $\vec{\mu}_d$ is the location parameter; $\lambda_d$ is the scale parameter and satisfies the condition $\lambda_d\ge 0$; and $I_0(\lambda)$ is the modified Bessel function of the first kind of order 0;
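As a hedged illustration of step S1021, the sketch below evaluates a per-feature von Mises density. It assumes the standard two-dimensional form exp(λ μᵀy) / (2π I₀(λ)) with y = (x, sqrt(1 − x²)), which matches the definitions of the location parameter, scale parameter and Bessel function given above; the function names and the power-series evaluation of I₀ are illustrative choices, not taken from the source.

```python
import math


def bessel_i0(x, terms=60):
    """Modified Bessel function of the first kind, order 0, via its power series."""
    total, term = 0.0, 1.0
    for m in range(terms):
        if m > 0:
            # term_m = term_{m-1} * (x/2)^2 / m^2
            term *= (x / 2.0) ** 2 / (m * m)
        total += term
    return total


def vm_feature_density(x, mu, lam):
    """Assumed VM density of one feature x in [-1, 1].

    x is lifted to the unit vector y = (x, sqrt(1 - x^2)); mu is a 2-D unit
    vector (location parameter) and lam >= 0 is the scale parameter.
    """
    y = (x, math.sqrt(max(0.0, 1.0 - x * x)))
    dot = mu[0] * y[0] + mu[1] * y[1]
    return math.exp(lam * dot) / (2.0 * math.pi * bessel_i0(lam))


def vm_density(x_vec, mus, lams):
    """Product of D independent per-feature VM densities."""
    density = 1.0
    for x, mu, lam in zip(x_vec, mus, lams):
        density *= vm_feature_density(x, mu, lam)
    return density
```

With `lam = 0` the density reduces to the uniform value 1/(2π), and larger `lam` concentrates the mass around the direction `mu`.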
S1022, for each piece of D-dimensional satellite image data $\vec{x}_n$ obeying the nonparametric VM mixture model, the probability density function is:
$$p(\vec{x}_n\mid\vec{\pi},\vec{\mu},\vec{\lambda})=\sum_{k=1}^{\infty}\pi_k\prod_{d=1}^{D}p_{vm}(x_{nd}\mid\vec{\mu}_{kd},\lambda_{kd})$$
wherein the nonparametric VM mixture model is composed of infinitely many mixture components, each of which corresponds to the product of D VM probability distributions, $\prod_{d=1}^{D}p_{vm}(x_{nd}\mid\vec{\mu}_{kd},\lambda_{kd})$, wherein each feature corresponds to one VM probability distribution; $(\vec{\mu}_{kd},\lambda_{kd})$ are the VM distribution parameters of the d-th feature in the k-th mixture component, and $\pi_k>0$ is the corresponding mixing coefficient and satisfies the condition $\sum_{k=1}^{\infty}\pi_k=1$. The nonparametric VM mixture model is constructed from a Pitman-Yor process model based on the stick-breaking representation, in which the mixing coefficient $\pi_k$ is expressed as:
$$\pi_k=\pi'_k\prod_{j=1}^{k-1}(1-\pi'_j),\qquad \pi'_k\sim p_b(\pi'_k\mid 1-a,\;b+ka)$$
wherein $p_b(\cdot)$ is the Beta distribution, $a$ is the discount parameter of the Pitman-Yor process model and satisfies the condition $0\le a<1$, and $b$ is the concentration parameter and satisfies the condition $b>-a$.
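The stick-breaking construction can be sketched as follows. This is a minimal illustration that assumes the standard Pitman-Yor form, in which the k-th stick fraction is drawn from Beta(1 − a, b + k·a) and the last fraction of a truncated model is set to 1 so that the weights sum to one. The function name, the seeded generator and the example values (a = 0.5, b = 0.5, K = 15, borrowed from the initialization described in this document) are illustrative.

```python
import random


def pitman_yor_stick_breaking(a, b, truncation, rng):
    """Draw truncated mixing coefficients from a Pitman-Yor stick-breaking prior."""
    if not (0.0 <= a < 1.0) or b <= -a:
        raise ValueError("require 0 <= a < 1 and b > -a")
    weights = []
    remaining = 1.0  # length of the stick that has not been broken off yet
    for k in range(1, truncation + 1):
        if k == truncation:
            fraction = 1.0  # truncation: the last component takes what is left
        else:
            fraction = rng.betavariate(1.0 - a, b + k * a)
        weights.append(fraction * remaining)
        remaining *= 1.0 - fraction
    return weights


weights = pitman_yor_stick_breaking(a=0.5, b=0.5, truncation=15, rng=random.Random(0))
```

Setting `a = 0` recovers the Dirichlet process stick-breaking prior, so the discount parameter is the only difference between the two constructions in this sketch.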
S1023, for each piece of satellite image data $\vec{x}_n$, specifying a binary hidden variable $\vec{Z}_n=(Z_{n1},Z_{n2},\dots)$ as an indicator factor: $Z_{nk}=1$ indicates that the satellite image data $\vec{x}_n$ belongs to the k-th category; otherwise $Z_{nk}=0$; the probability distribution of the hidden variable $\vec{Z}=(\vec{Z}_1,\dots,\vec{Z}_N)$ is:
$$p(\vec{Z}\mid\vec{\pi})=\prod_{n=1}^{N}\prod_{k=1}^{\infty}\pi_k^{Z_{nk}}$$
S1024, fusing the local feature selection technique with the nonparametric VM mixture model to obtain the probability distribution obeyed by the feature $x_{nd}$ of each piece of satellite image data:
$$p(x_{nd}\mid\phi_{nkd})=p_{vm}(x_{nd}\mid\vec{\mu}_{kd},\lambda_{kd})^{\phi_{nkd}}\;p_{vm}(x_{nd}\mid\vec{\mu}'_{kd},\lambda'_{kd})^{1-\phi_{nkd}}$$
In order to process high-dimensional data more effectively, this embodiment fuses a local feature selection technique with the proposed nonparametric VM mixture model in the same model framework, so that irrelevant features can be removed automatically during cluster analysis to improve clustering performance. Here, $\phi_{nkd}$ is a binary parameter: when $\phi_{nkd}=1$, the feature $x_{nd}$ is a relevant feature and obeys the VM probability distribution $p_{vm}(x_{nd}\mid\vec{\mu}_{kd},\lambda_{kd})$; when $\phi_{nkd}=0$, the feature $x_{nd}$ is an irrelevant feature and obeys the VM probability distribution $p_{vm}(x_{nd}\mid\vec{\mu}'_{kd},\lambda'_{kd})$. The parameter $\phi_{nkd}$ obeys the Bernoulli distribution:
$$p(\phi_{nkd}\mid\epsilon_{kd})=\epsilon_{kd}^{\phi_{nkd}}(1-\epsilon_{kd})^{1-\phi_{nkd}}$$
wherein the parameter $\epsilon_{kd}$ represents the feature importance of the d-th feature in the k-th component;
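To make the role of the binary relevance indicator concrete, the sketch below applies plain Bayes' rule to a single feature: given the density of the feature under the "relevant" distribution, its density under the "irrelevant" distribution, and a Bernoulli prior with parameter epsilon, it returns the posterior probability that the feature is relevant. This is only an illustration with fixed, known densities; it is not the variational update used in the method, and the function names are hypothetical.

```python
def feature_likelihood(p_relevant, p_irrelevant, phi):
    """Density of a feature given its binary relevance indicator phi (0 or 1)."""
    return p_relevant if phi == 1 else p_irrelevant


def relevance_posterior(p_relevant, p_irrelevant, epsilon):
    """Posterior probability that phi = 1 under a Bernoulli(epsilon) prior."""
    numerator = epsilon * p_relevant
    denominator = numerator + (1.0 - epsilon) * p_irrelevant
    if denominator == 0.0:
        # Assumption: fall back to the prior when both densities are zero.
        return epsilon
    return numerator / denominator


# A feature that is four times more likely under the relevant distribution.
posterior = relevance_posterior(p_relevant=0.8, p_irrelevant=0.2, epsilon=0.5)
```

A larger epsilon shifts the posterior toward "relevant", which is why epsilon can be read as a feature importance.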
S1025, adopting a VM-Gamma distribution as the joint prior distribution of the parameters of the VM distribution to which the relevant features belong:
S1026, adopting a VM-Gamma distribution as the joint prior distribution of the parameters of the VM distribution to which the irrelevant features belong:
S1027, acquiring the full probability expression of the nonparametric VM mixture model based on local feature selection:
S103, estimating the model parameters of the nonparametric VM mixture model through a variational Bayesian inference algorithm and calculating the feature importance.
S104, judging, according to the estimated model parameters, whether the nonparametric VM mixture model has converged; if not, returning to step S103; if so, executing step S105.
Specifically, the method comprises the following steps:
First, the model parameters are initialized: the truncation level is initialized to $K=15$; the hyperparameters are initialized to $u_{kd}=0.1$, $u'_{kd}=0.1$, $v_{kd}=0.01$, $v'_{kd}=0.01$, $\beta_{kd}>0$, $\beta'_{kd}>0$, $a_k=0.5$ and $b_k=0.5$; $r_{nk}$ is initialized using the K-Means algorithm; and the following is initialized:
The variational posteriors, the expected values and the feature importance are then updated with the current model parameters.
The variational lower bound is defined as:
$L(q)=\langle\ln p(\Theta\mid X)\rangle-\langle\ln q(\Theta)\rangle$
Here, $\langle\cdot\rangle$ denotes the expected value; $\Theta$ is the set of all random variables and hidden variables; $q(\Theta)$ is an approximation of the true posterior distribution $p(\Theta\mid X)$, namely the variational posterior. The expression of the variational posterior $q(\Theta)$ is as follows:
The mixture components are truncated from an infinite-dimensional space to a K-dimensional space using a truncation technique:
$$\pi'_K=1,\qquad \pi_k=0\ \text{for}\ k>K,\qquad \sum_{k=1}^{K}\pi_k=1$$
wherein K is the truncation level, namely the number of categories; K may be initialized to an arbitrary value and reaches an optimal value at convergence.
All variational posteriors are optimized by maximizing the variational lower bound $L(q)$:
The hyperparameters in the above formulas are calculated by the following formulas:
The expected values in the above formulas are calculated by the following formulas:
$\langle Z_{nk}\rangle=r_{nk}$ (21)
$\langle\phi_{nkd}\rangle=f_{nkd}$ (24)
$\langle 1-\phi_{nkd}\rangle=1-f_{nkd}$ (25)
$\langle\ln\pi'_k\rangle=\Psi(g_k)-\Psi(g_k+h_k)$ (28)
$\langle\ln(1-\pi'_k)\rangle=\Psi(h_k)-\Psi(g_k+h_k)$ (29)
where $\Psi(\cdot)$ is the digamma function.
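Expectations (28) and (29) are the usual log-expectations of a Beta-distributed variable with parameters g_k and h_k, and can be evaluated numerically as sketched below. The digamma routine here is a hand-written approximation (upward recurrence followed by an asymptotic series) used only to keep the example free of external libraries; it and the function names are assumptions for illustration, not part of the source.

```python
import math


def digamma(x):
    """Approximate the digamma function for x > 0."""
    if x <= 0.0:
        raise ValueError("this sketch only supports x > 0")
    result = 0.0
    # Recurrence psi(x) = psi(x + 1) - 1/x, applied until x is large enough
    # for the asymptotic expansion to be accurate.
    while x < 6.0:
        result -= 1.0 / x
        x += 1.0
    inv = 1.0 / x
    inv2 = inv * inv
    # psi(x) ~ ln(x) - 1/(2x) - 1/(12x^2) + 1/(120x^4) - 1/(252x^6)
    result += math.log(x) - 0.5 * inv
    result -= inv2 * (1.0 / 12.0 - inv2 * (1.0 / 120.0 - inv2 / 252.0))
    return result


def expected_log_stick(g, h):
    """Return (<ln pi'>, <ln(1 - pi')>) for pi' ~ Beta(g, h), as in (28) and (29)."""
    common = digamma(g + h)
    return digamma(g) - common, digamma(h) - common


log_pi, log_one_minus_pi = expected_log_stick(1.0, 1.0)
```

For g = h = 1 (a uniform distribution on the unit interval) both expectations equal -1, which is a convenient sanity check.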
The feature importance is calculated
Then, the variational lower bound generated by the current iteration is obtained.
Finally, the variational lower bound generated by the current iteration is compared with the variational lower bound generated by the previous iteration to judge whether the nonparametric VM mixture model has converged.
Specifically, it is determined whether the difference between the variational lower bound generated by the current iteration and the variational lower bound generated by the previous iteration is smaller than a preset threshold; if so, the nonparametric VM mixture model is judged to have converged and the truncation level has reached its optimal value; if not, the nonparametric VM mixture model is judged not to have converged and another iteration is needed.
In a preferred embodiment of the present invention, the preset threshold may be 0.0001, but it should be noted that other values may be used, and the smaller the preset threshold is, the higher the iteration precision is, and the present invention is not limited specifically.
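The convergence test of steps S103 and S104 amounts to a simple loop, sketched below. The callable `update_step` stands in for one round of variational updates that returns the new lower bound; it, the iteration cap `max_iterations` and the toy bound sequence are hypothetical and serve only to exercise the loop.

```python
def run_until_converged(update_step, threshold=1e-4, max_iterations=1000):
    """Repeat update_step until successive lower bounds differ by less than threshold.

    Returns (number of iterations performed, last lower bound).
    """
    previous_bound = None
    for iteration in range(1, max_iterations + 1):
        current_bound = update_step()
        if previous_bound is not None and abs(current_bound - previous_bound) < threshold:
            return iteration, current_bound
        previous_bound = current_bound
    # Assumption: give up after max_iterations even if the threshold was not met.
    return max_iterations, previous_bound


def make_toy_update():
    """Toy stand-in for a variational update: the bound at step t is -1 / 2**t."""
    state = {"t": 0}

    def step():
        state["t"] += 1
        return -1.0 / (2 ** state["t"])

    return step


iterations, final_bound = run_until_converged(make_toy_update())
```

In the toy sequence the gap between successive bounds halves at every step, so the loop stops at the first step where the gap falls below 0.0001.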
And S105, screening the satellite image data according to the importance of the features to reserve the important satellite image data.
The satellite image data can be screened according to the feature importance: for example, a feature whose importance is lower than a threshold (such as 0.5) is regarded as an irrelevant feature to be eliminated, and a feature whose importance is greater than or equal to the threshold (such as 0.5) is regarded as a relevant feature to be retained.
It should be noted that the threshold may be selected according to actual needs, and is not limited to 0.5.
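The thresholding in step S105 can be sketched as follows. For simplicity the example uses a single importance vector (one value per feature, as for one mixture component); the function names and the toy data are illustrative assumptions.

```python
def select_relevant_features(importances, threshold=0.5):
    """Return the indices of features whose importance is at least the threshold."""
    return [d for d, eps in enumerate(importances) if eps >= threshold]


def keep_features(data, kept_indices):
    """Drop every column of data that is not listed in kept_indices."""
    return [[row[d] for d in kept_indices] for row in data]


importances = [0.9, 0.2, 0.5]
kept = select_relevant_features(importances)
filtered = keep_features([[0.1, 0.2, 0.3], [0.4, 0.5, 0.6]], kept)
```

Because the comparison is "greater than or equal to", a feature whose importance is exactly 0.5 is retained, matching the rule stated above.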
And S106, judging the category of each satellite image data according to the posterior probability of the indicator factor, and clustering the satellite image data according to the category.
In this embodiment, after the model converges, the posterior probability $r_{nk}$ of the indicator factor is obtained from the converged model parameters. The posterior probability $r_{nk}$ represents the probability that the n-th piece of satellite image data $\vec{x}_n$ belongs to the k-th category; the category with the highest probability $r_{nk}$ is selected as the category of $\vec{x}_n$, and the satellite image data in the data set are then clustered according to the categories to which they belong.
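The assignment rule of step S106 is an argmax over each row of posterior probabilities, as in the sketch below. The function names and the small responsibility matrix are illustrative only.

```python
def assign_clusters(responsibilities):
    """For each data item, return the index k with the largest posterior r_nk."""
    return [max(range(len(row)), key=row.__getitem__) for row in responsibilities]


def group_by_cluster(assignments):
    """Map each cluster index to the list of data indices assigned to it."""
    groups = {}
    for n, k in enumerate(assignments):
        groups.setdefault(k, []).append(n)
    return groups


r = [
    [0.7, 0.2, 0.1],
    [0.1, 0.1, 0.8],
    [0.6, 0.3, 0.1],
]
assignments = assign_clusters(r)
groups = group_by_cluster(assignments)
```

Components that never win the argmax receive no data, so the number of non-empty groups can be smaller than the truncation level.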
In order to facilitate understanding of the present invention, the following description will be given of an application of the present embodiment as a practical example.
In this example, clustering is verified on a public data set (the Statlog data set). A Landsat MSS image consists of four digital images of the same scene in different spectral bands. Two of these are in the visible region (corresponding approximately to the green and red regions of the visible spectrum) and two are in the (near) infrared. The spatial resolution of a pixel is approximately 80 m × 80 m, and each image contains 2340 × 3380 pixels. The Statlog data set is a sub-region of the original Landsat MSS imagery (from NASA) consisting of 82 × 100 pixels. The data set contains a total of 6435 pieces of data, each corresponding to a 3 × 3 square neighborhood of pixels contained entirely within the 82 × 100 sub-region. Each row contains the pixel values in the four spectral bands for each of the 9 pixels in the 3 × 3 neighborhood. The experimental objective is to perform cluster analysis on the data set to identify which of the following 6 surface types or land uses each piece of data belongs to: red soil, cotton crops, grey soil, moist grey soil, soil with plant stubble, and very moist grey soil.
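Each data row therefore carries 9 × 4 = 36 values. The sketch below splits such a row into per-pixel groups of four band values. The ordering (all four bands of the first pixel, then all four bands of the second pixel, and so on) is an assumption made for illustration, and the placeholder row of integers is not real data.

```python
def split_statlog_row(row, pixels=9, bands=4):
    """Split a flat row of pixel values into one list of band values per pixel.

    Assumes pixel-major ordering: the bands of pixel 0 come first, then pixel 1, etc.
    """
    if len(row) != pixels * bands:
        raise ValueError("expected %d values, got %d" % (pixels * bands, len(row)))
    return [row[i * bands:(i + 1) * bands] for i in range(pixels)]


# Placeholder row: the integers 0..35 stand in for the 36 spectral values.
row = list(range(36))
per_pixel = split_statlog_row(row)
center_pixel = per_pixel[4]  # middle of the 3 x 3 neighborhood under this ordering
```

Under this ordering the fifth group corresponds to the central pixel of the neighborhood.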
In this embodiment, Windows 10 is used as the experimental platform, MATLAB is used as the programming language, and the parameters are set as described in the above embodiment. Although each piece of data in the Statlog data set carries a class label, this embodiment does not use these labels during clustering, because cluster analysis is an unsupervised learning method. After clustering is complete, however, the class labels can be used to calculate the clustering accuracy as an evaluation index of clustering performance.
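One common way to score a clustering against held-out class labels is sketched below: each cluster is mapped to the class label that occurs most often among its members, and accuracy is the fraction of items whose label matches the label of their cluster. The source does not state which mapping was used in the experiments, so this majority-vote convention and the function name are assumptions.

```python
from collections import Counter, defaultdict


def clustering_accuracy(cluster_ids, true_labels):
    """Accuracy after mapping each cluster to its most frequent true label."""
    if len(cluster_ids) != len(true_labels) or not true_labels:
        raise ValueError("inputs must be non-empty and of equal length")
    members = defaultdict(list)
    for cluster, label in zip(cluster_ids, true_labels):
        members[cluster].append(label)
    # For every cluster, count how many members carry the majority label.
    correct = sum(Counter(labels).most_common(1)[0][1] for labels in members.values())
    return correct / len(true_labels)


accuracy = clustering_accuracy([0, 0, 1, 1, 1], ["a", "a", "b", "b", "a"])
```

The labels are consulted only after clustering has finished, so the clustering itself remains unsupervised.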
In addition, this embodiment is compared experimentally with the clustering method of the prior art and with the classic K-means clustering algorithm. The prior-art mixture model based on the DP and the VM distribution is referred to as DP-VM for short. Each method was run 10 times and the average accuracy was taken as the comparison index. The experimental results are shown in Table 1. Compared with the prior art and the K-means clustering method, the satellite image data clustering method provided by this embodiment obtains a better clustering result, namely a higher accuracy.
TABLE 1
In summary, in the satellite image data clustering method based on local feature selection provided by this embodiment, a nonparametric mixture model based on the von Mises (VM) probability distribution is constructed within a nonparametric model framework based on the Pitman-Yor process, and a localized feature selection method is fused with the nonparametric VM mixture model in the same model framework (PYP-VM for short) in order to perform cluster analysis on land satellite data. Each piece of satellite image data is normalized by the L2 norm before being modeled with a VM mixture model based on local feature selection. So that the number of data categories can be adjusted flexibly and automatically according to the size of the data, this embodiment uses the Pitman-Yor process as the nonparametric framework for the VM mixture model, and the parameters of the proposed nonparametric VM mixture model based on local feature selection are estimated by a variational Bayesian inference algorithm. Compared with the prior art, the method provides a discount parameter that can be used to control the generation of new categories, so it has an advantage over methods based on the DP mixture model when processing unbalanced data and can obtain a better clustering result.
The second embodiment of the present invention further provides a satellite image data clustering device based on local feature selection, including:
a data set acquisition unit 210 for acquiring a satellite image data set to be processed, $X=\{\vec{x}_n\}_{n=1}^{N}$; the satellite image data set comprises N pieces of satellite image data, and each piece of satellite image data is normalized by the L2 norm into a D-dimensional feature vector $\vec{x}_n=(x_{n1},\dots,x_{nD})$ with $\|\vec{x}_n\|=1$, where $\|\cdot\|$ denotes the L2 norm;
a modeling unit 220 for modeling the satellite image data using a non-parametric VM hybrid model selected based on the local features;
a parameter estimation unit 230 that estimates model parameters of the nonparametric VM hybrid model through a variational bayesian inference algorithm and calculates feature importance;
a convergence judging unit 240 that judges whether the nonparametric VM mixed model converges according to the estimated model parameter; if not, the parameter estimation unit 230 is notified, and if yes, the screening unit 250 is notified;
a screening unit 250, configured to screen the satellite image data according to the importance of the features to retain important satellite image data;
and the clustering unit 260 is used for judging the category of each satellite image data according to the posterior probability of the indicator factor, so as to cluster the satellite image data according to the category.
The third embodiment of the present invention further provides a text data clustering device based on a nonparametric VMF mixed model, which includes a memory and a processor, wherein the memory stores a satellite image data set to be clustered and a computer program, and the computer program can be executed by the processor, so as to implement the above satellite image data clustering method based on local feature selection.
In the embodiments provided in the present invention, it should be understood that the disclosed apparatus and method can be implemented in other ways. The apparatus and method embodiments described above are illustrative only, as the flowcharts and block diagrams in the figures illustrate the architecture, functionality, and operation of possible implementations of apparatus, methods and computer program products according to various embodiments of the present invention. In this regard, each block in the flowchart or block diagrams may represent a module, segment, or portion of code, which comprises one or more executable instructions for implementing the specified logical function(s). It should also be noted that, in some alternative implementations, the functions noted in the block may occur out of the order noted in the figures. For example, two blocks shown in succession may, in fact, be executed substantially concurrently, or the blocks may sometimes be executed in the reverse order, depending upon the functionality involved. It will also be noted that each block of the block diagrams and/or flowchart illustration, and combinations of blocks in the block diagrams and/or flowchart illustration, can be implemented by special purpose hardware-based systems which perform the specified functions or acts, or combinations of special purpose hardware and computer instructions.
In addition, the functional modules in the embodiments of the present invention may be integrated together to form an independent part, or each module may exist separately, or two or more modules may be integrated to form an independent part.
The functions, if implemented in the form of software functional modules and sold or used as a stand-alone product, may be stored in a computer-readable storage medium. Based on such understanding, the technical solution of the present invention, or the part of it that contributes over the prior art, may be embodied in the form of a software product, which is stored in a storage medium and includes several instructions for causing a computer device (which may be a personal computer, an electronic device, or a network device) to execute all or part of the steps of the method according to the embodiments of the present invention. The aforementioned storage medium includes a USB flash drive, a removable hard disk, a Read-Only Memory (ROM), a Random Access Memory (RAM), a magnetic disk, an optical disk, and various other media capable of storing program code. It should be noted that, in this document, the terms "comprises," "comprising," or any other variation thereof are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, method, article, or apparatus. Without further limitation, an element defined by the phrase "comprising a ..." does not exclude the presence of additional like elements in a process, method, article, or apparatus that comprises the element.
The above description is only a preferred embodiment of the present invention and is not intended to limit the present invention, and various modifications and changes may be made by those skilled in the art. Any modification, equivalent replacement, or improvement made within the spirit and principle of the present invention should be included in the protection scope of the present invention.
Claims (9)
1. A satellite image data clustering method based on local feature selection is characterized by comprising the following steps:
S101, acquiring a satellite image data set to be processed, wherein the satellite image data set comprises N pieces of satellite image data, and each piece of satellite image data is normalized by its L2 norm into a D-dimensional feature vector, where $\|\cdot\|$ denotes the L2 norm;
S102, modeling the satellite image data using a nonparametric von Mises (VM) mixture model with local feature selection;
S103, estimating the model parameters of the nonparametric VM mixture model through a variational Bayesian inference algorithm and calculating feature importance;
S104, judging whether the nonparametric VM mixture model has converged according to the estimated model parameters; if not, returning to step S103, and if so, executing step S105;
S105, screening the satellite image data according to the feature importance so as to retain the important satellite image data;
S106, determining the category of each piece of satellite image data according to the posterior probability of the indicator factor, and clustering the satellite image data by category;
wherein the modeling of the satellite image data using the nonparametric VM mixture model with local feature selection specifically includes:
for satellite image data following the VM probability distribution $p_{vm}$, the probability density of the D-dimensional data is expressed as:
where $y_{nd1} = x_{nd}$, and $y_{nd2}$ is defined so that the corresponding vector satisfies L2-norm normalization; the distribution has a location parameter and a scale parameter $\lambda_d$ satisfying $\lambda_d \ge 0$, and $I_0(\lambda)$ is the modified Bessel function of the first kind of order 0;
for each piece of D-dimensional satellite image data following the nonparametric VM mixture model, the probability density function is expressed as:
the nonparametric VM mixture model is composed of an infinite number of mixture components, each of which corresponds to the product of D VM probability distributions, with each feature corresponding to one VM probability distribution; each component carries the VM distribution parameters of the d-th feature in the k-th mixture component, and $\pi_k > 0$ is the corresponding mixing coefficient, which satisfies the condition
for each piece of satellite image data, a binary hidden variable is specified as an indicator factor: when $Z_{nk} = 1$, the piece of satellite image data belongs to the k-th category; otherwise, $Z_{nk} = 0$; the probability distribution of the hidden variable is
fusing the local feature selection technique with the nonparametric VM mixture model to obtain the probability distribution followed by each feature $x_{nd}$ of the satellite image data:
where the parameter $\phi_{nkd}$ is a binary parameter: when $\phi_{nkd} = 1$, the feature $x_{nd}$ is a relevant feature and follows one VM probability distribution; when $\phi_{nkd} = 0$, the feature $x_{nd}$ is an irrelevant feature and follows a separate VM probability distribution; the binary parameter follows a Bernoulli distribution:
where the parameter $\varepsilon_{kd}$ represents the feature importance of the d-th feature in the k-th component;
adopting a VM-Gamma distribution as the joint prior distribution of the parameters of the VM distribution to which the relevant features belong:
adopting a VM-Gamma distribution as the joint prior distribution of the parameters of the VM distribution to which the irrelevant features belong:
obtaining the full probability expression of the nonparametric VM mixture model with local feature selection:
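The per-feature density described in claim 1 can be illustrated with a short sketch. The expressions themselves are not reproduced in this text, so the code rests on stated assumptions rather than on the patent's exact formulas: it uses the standard von Mises density $\exp(\lambda\,\mu^{\top}y) / (2\pi I_0(\lambda))$ for a unit 2-vector $y$, and it assumes $y_{nd2} = \sqrt{1 - x_{nd}^2}$, which is one way to make $(y_{nd1}, y_{nd2})$ unit-norm. All function names are hypothetical.

```python
import math

def bessel_i0(x, terms=60):
    """Modified Bessel function of the first kind, order 0, by power series.

    The fixed number of terms is adequate for moderate x only.
    """
    total, term = 1.0, 1.0
    q = (x * x) / 4.0
    for k in range(1, terms):
        term *= q / (k * k)  # term_k = (x^2 / 4)^k / (k!)^2
        total += term
    return total

def vm_log_density(x, mu, lam):
    """Log of the standard von Mises density for one feature x in [-1, 1].

    mu is a unit 2-vector (location), lam >= 0 is the scale parameter.
    ASSUMPTION: the feature is embedded as y = (x, sqrt(1 - x^2)).
    """
    y = (x, math.sqrt(max(0.0, 1.0 - x * x)))
    dot = mu[0] * y[0] + mu[1] * y[1]
    return lam * dot - math.log(2.0 * math.pi * bessel_i0(lam))

def feature_log_density(x, phi, relevant, irrelevant):
    """Select the relevant (phi = 1) or irrelevant (phi = 0) VM distribution.

    relevant and irrelevant are (mu, lam) pairs.
    """
    mu, lam = relevant if phi == 1 else irrelevant
    return vm_log_density(x, mu, lam)

# Illustrative values only: a concentrated relevant component and a flat
# irrelevant one (lam = 0 gives the uniform density 1 / (2 * pi)).
log_p_relevant = feature_log_density(0.9, 1, ((1.0, 0.0), 5.0), ((1.0, 0.0), 0.0))
log_p_irrelevant = feature_log_density(0.9, 0, ((1.0, 0.0), 5.0), ((1.0, 0.0), 0.0))
```

With these toy parameters, a feature value near the relevant component's location scores higher under the relevant distribution than under the flat irrelevant one, which is the contrast the binary parameter $\phi_{nkd}$ is meant to capture.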
2. The satellite image data clustering method based on local feature selection according to claim 1, wherein the nonparametric VM mixture model is constructed using a Pitman-Yor process model based on the stick-breaking representation; in the Pitman-Yor process model based on the stick-breaking representation, the mixing coefficient $\pi_k$ is expressed as follows:
where $p_b(\cdot)$ is the Beta distribution, $a$ is the discount parameter of the Pitman-Yor process model and satisfies $0 \le a \le 1$, and $b$ is the concentration parameter and satisfies $b > -a$.
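The stick-breaking construction named in claim 2 can be sketched as follows. The claim's own expression is not reproduced in this text, so the sketch uses the standard Pitman-Yor stick-breaking form, $\pi_k = \pi'_k \prod_{j<k} (1 - \pi'_j)$ with $\pi'_k \sim \mathrm{Beta}(1 - a,\ b + ka)$, as an assumption. The function name is hypothetical, the code requires $a < 1$ because the Beta distribution needs a positive first parameter, and setting the last stick fraction to 1 is a common truncation convention rather than something stated in the claim.

```python
import random

def pitman_yor_stick_breaking(a, b, truncation, rng=None):
    """Draw truncated mixing coefficients from a Pitman-Yor process.

    a: discount parameter, 0 <= a < 1 (strict upper bound needed by Beta).
    b: concentration parameter, b > -a.
    truncation: number of sticks K kept after truncation.
    """
    if not 0.0 <= a < 1.0:
        raise ValueError("discount a must satisfy 0 <= a < 1")
    if b <= -a:
        raise ValueError("concentration b must satisfy b > -a")
    rng = rng or random.Random()
    weights = []
    remaining = 1.0  # length of the stick not yet broken off
    for k in range(1, truncation + 1):
        if k == truncation:
            fraction = 1.0  # ASSUMPTION: last stick takes all remaining mass
        else:
            fraction = rng.betavariate(1.0 - a, b + k * a)
        weights.append(fraction * remaining)
        remaining *= 1.0 - fraction
    return weights

# a = 0.5, b = 0.5 and K = 15 mirror the initial values listed in claim 3.
weights = pitman_yor_stick_breaking(a=0.5, b=0.5, truncation=15,
                                    rng=random.Random(0))
```

A larger discount `a` leaves more mass for later sticks, which is the mechanism by which the discount parameter encourages new categories.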
3. The satellite image data clustering method based on local feature selection according to claim 2, wherein the estimating of the model parameters of the non-parametric VM mixture model and the calculating of feature importance by a variational bayesian inference algorithm specifically comprises:
initializing the model parameters, comprising: initializing the truncation level $K = 15$; initializing the hyperparameters $u_{kd} = 0.1$, $u'_{kd} = 0.1$, $v_{kd} = 0.01$, $v'_{kd} = 0.01$, $\beta_{kd} > 0$, $\beta'_{kd} > 0$, $a_k = 0.5$, $b_k = 0.5$; initializing $r_{nk}$ using the K-Means algorithm; and initializing:
updating the variational posteriors, the expected values and the feature importance using the current model parameters;
obtaining the variational lower bound generated by the current iteration;
and comparing the variational lower bound generated by the current iteration with the variational lower bound generated by the previous iteration to judge whether the nonparametric VM mixture model has converged.
4. The satellite image data clustering method based on local feature selection according to claim 3, wherein updating the variational posteriors, the expected values and the feature importance using the current model parameters specifically comprises:
defining the variational lower bound as:
$L(q) = \langle \ln p(\Theta \mid X) \rangle - \langle \ln q(\Theta) \rangle$
where $\langle \cdot \rangle$ denotes the expected value, and $\Theta$ is the set of all random variables and hidden variables; $q(\Theta)$ is an approximation of the true posterior distribution $p(\Theta \mid X)$, namely the variational posterior; the variational posterior $q(\Theta)$ is expressed as follows:
truncating the mixture components from an infinite-dimensional space to a K-dimensional space using a truncation technique:
where K is the truncation level, namely the number of categories; K is initialized to an arbitrary value and reaches its optimal value at convergence;
optimizing all variational posteriors by maximizing the variational lower bound $L(q)$:
the hyperparameters in the above formulas are calculated by the following formulas:
the expected values in the above formulas are calculated by the following formulas:
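$\langle Z_{nk} \rangle = r_{nk}$ (21)
$\langle \phi_{nkd} \rangle = f_{nkd}$ (24)
$\langle 1 - \phi_{nkd} \rangle = 1 - f_{nkd}$ (25)
$\langle \ln \pi'_k \rangle = \Psi(g_k) - \Psi(g_k + h_k)$ (28)
$\langle \ln(1 - \pi'_k) \rangle = \Psi(h_k) - \Psi(g_k + h_k)$ (29)
where $\Psi(\cdot)$ is the digamma function;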
calculating the feature importance:
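Expectations (28) and (29) in claim 4 can be evaluated numerically with a small sketch. Their digamma form matches the usual result for a Beta-distributed stick fraction, so the code assumes that $g_k$ and $h_k$ are the two parameters of a Beta variational posterior over $\pi'_k$. The function names are hypothetical, and the digamma routine is a generic recurrence-plus-asymptotic-series approximation, not something taken from the patent.

```python
import math

def digamma(x):
    """Approximate the digamma function Psi(x) for x > 0.

    Uses Psi(x) = Psi(x + 1) - 1/x to shift x up to at least 6, then the
    asymptotic series ln x - 1/(2x) - 1/(12x^2) + 1/(120x^4) - 1/(252x^6).
    """
    if x <= 0.0:
        raise ValueError("this sketch only supports x > 0")
    result = 0.0
    while x < 6.0:
        result -= 1.0 / x
        x += 1.0
    inv = 1.0 / x
    inv2 = inv * inv
    series = inv2 * (1.0 / 12.0 - inv2 * (1.0 / 120.0 - inv2 / 252.0))
    return result + math.log(x) - 0.5 * inv - series

def expected_log_stick(g, h):
    """Equation (28): <ln pi'_k> = Psi(g_k) - Psi(g_k + h_k)."""
    return digamma(g) - digamma(g + h)

def expected_log_one_minus_stick(g, h):
    """Equation (29): <ln(1 - pi'_k)> = Psi(h_k) - Psi(g_k + h_k)."""
    return digamma(h) - digamma(g + h)

# With g = h = 1 the stick fraction is uniform on (0, 1), so both
# expectations equal the integral of ln(t) over (0, 1), which is -1.
e_log = expected_log_stick(1.0, 1.0)
e_log_one_minus = expected_log_one_minus_stick(1.0, 1.0)
```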
5. The satellite image data clustering method based on local feature selection according to claim 3, wherein comparing the variational lower bound generated by the current iteration with the variational lower bound generated by the previous iteration to judge whether the nonparametric VM mixture model has converged specifically comprises:
determining whether the difference between the variational lower bound generated by the current iteration and the variational lower bound generated by the previous iteration is smaller than a preset threshold, the preset threshold being 0.0001;
if so, judging that the nonparametric VM mixture model has converged;
if not, judging that the nonparametric VM mixture model has not converged.
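The convergence test of claim 5 can be sketched as a small loop. The threshold 0.0001 comes from the claim; everything else is an assumption made for illustration, including the function names, the use of the absolute difference, the iteration cap `max_iters`, and the `update_step` callback, which stands in for one round of variational updates and returns the resulting lower bound.

```python
def has_converged(current_bound, previous_bound, threshold=1e-4):
    """Return True when two successive lower bounds differ by less than threshold."""
    return abs(current_bound - previous_bound) < threshold

def run_until_converged(update_step, max_iters=1000, threshold=1e-4):
    """Call update_step() repeatedly until the lower bound stops changing.

    update_step is a hypothetical callback that performs one round of
    variational updates and returns the new variational lower bound.
    Returns (number of iterations run, last lower bound).
    """
    previous = float("-inf")
    for iteration in range(1, max_iters + 1):
        current = update_step()
        if has_converged(current, previous, threshold):
            return iteration, current
        previous = current
    return max_iters, previous

# Toy stand-in for the real updates: a fixed sequence of lower bounds.
bounds = iter([-10.0, -5.0, -4.0, -3.99995])
result = run_until_converged(lambda: next(bounds))
```

In this toy run the fourth bound differs from the third by about 0.00005, so the loop stops at the fourth iteration.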
6. The satellite image data clustering method based on local feature selection according to claim 4, wherein screening the satellite image data according to the feature importance so as to retain the important satellite image data specifically comprises:
screening the satellite image data by evaluating the feature importance: a feature whose importance is lower than a threshold is regarded as an irrelevant feature and is eliminated; a feature whose importance is greater than or equal to the threshold is regarded as a relevant feature and is retained.
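The screening rule of claim 6 amounts to a threshold comparison per feature, sketched below. The claim does not state the threshold value, so the default of 0.5 is purely an illustrative assumption, as are the function name and the layout of `importance` as a K-by-D nested list whose entry `[k][d]` holds the feature importance $\varepsilon_{kd}$.

```python
def select_relevant_features(importance, threshold=0.5):
    """Return, for each component k, the indices of the features to retain.

    importance[k][d] is the importance of feature d in component k.
    A feature is retained when its importance is >= threshold and
    eliminated otherwise. The default threshold is hypothetical.
    """
    return {
        k: [d for d, eps in enumerate(row) if eps >= threshold]
        for k, row in enumerate(importance)
    }

# Toy importances for K = 2 components and D = 3 features.
kept = select_relevant_features([[0.9, 0.2, 0.5],
                                 [0.1, 0.8, 0.4]])
```

Because the importance is indexed by both component and feature, the retained set can differ from one cluster to another, which is what makes the selection local.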
7. The satellite image data clustering method based on local feature selection according to claim 1, wherein determining the category to which each piece of satellite image data belongs according to the posterior probability of the indicator factor, so as to cluster the satellite image data by category, specifically comprises:
obtaining the posterior probability $r_{nk}$ of the indicator factor, where $r_{nk}$ represents the probability that the n-th piece of satellite image data belongs to the k-th category;
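One plausible way to turn the posterior probabilities $r_{nk}$ into cluster labels is sketched below. The remainder of claim 7 is not reproduced in this text, so assigning each item to the category with the largest $r_{nk}$ is an assumption, and the function name is hypothetical.

```python
def assign_clusters(responsibilities):
    """Assign each item to the category with the largest posterior probability.

    responsibilities[n][k] holds r_nk for item n and category k.
    ASSUMPTION: hard assignment by arg max; ties go to the lowest index.
    """
    return [
        max(range(len(row)), key=row.__getitem__)
        for row in responsibilities
    ]

# Toy posterior probabilities for N = 3 items and K = 3 categories.
labels = assign_clusters([[0.1, 0.7, 0.2],
                          [0.6, 0.3, 0.1],
                          [0.2, 0.2, 0.6]])
```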
8. A satellite image data clustering device based on local feature selection is characterized by comprising:
a data set acquisition unit, configured to acquire a satellite image data set to be processed, wherein the satellite image data set comprises N pieces of satellite image data, and each piece of satellite image data is normalized by its L2 norm into a D-dimensional feature vector, where $\|\cdot\|$ denotes the L2 norm;
a modeling unit, configured to model the satellite image data using a nonparametric von Mises (VM) mixture model with local feature selection, and specifically configured as follows: for satellite image data following the VM probability distribution $p_{vm}$, the probability density of the D-dimensional data is expressed as:
where $y_{nd1} = x_{nd}$, and $y_{nd2}$ is defined so that the corresponding vector satisfies L2-norm normalization; the distribution has a location parameter and a scale parameter $\lambda_d$ satisfying $\lambda_d \ge 0$, and $I_0(\lambda)$ is the modified Bessel function of the first kind of order 0;
for each piece of D-dimensional satellite image data following the nonparametric VM mixture model, the probability density function is expressed as:
where the nonparametric VM mixture model is composed of an infinite number of mixture components, each of which corresponds to the product of D VM probability distributions, with each feature corresponding to one VM probability distribution; each component carries the VM distribution parameters of the d-th feature in the k-th mixture component, and $\pi_k > 0$ is the corresponding mixing coefficient, which satisfies the condition
for each piece of satellite image data, a binary hidden variable is specified as an indicator factor: when $Z_{nk} = 1$, the piece of satellite image data belongs to the k-th category; otherwise, $Z_{nk} = 0$; the probability distribution of the hidden variable is
fusing the local feature selection technique with the nonparametric VM mixture model to obtain the probability distribution followed by each feature $x_{nd}$ of the satellite image data:
where the parameter $\phi_{nkd}$ is a binary parameter: when $\phi_{nkd} = 1$, the feature $x_{nd}$ is a relevant feature and follows one VM probability distribution; when $\phi_{nkd} = 0$, the feature $x_{nd}$ is an irrelevant feature and follows a separate VM probability distribution; the binary parameter follows a Bernoulli distribution:
where the parameter $\varepsilon_{kd}$ represents the feature importance of the d-th feature in the k-th component;
adopting a VM-Gamma distribution as the joint prior distribution of the parameters of the VM distribution to which the relevant features belong:
adopting a VM-Gamma distribution as the joint prior distribution of the parameters of the VM distribution to which the irrelevant features belong:
obtaining the full probability expression of the nonparametric VM mixture model with local feature selection:
the parameter estimation unit estimates model parameters of the nonparametric VM mixed model through a variational Bayes inference algorithm and calculates feature importance;
a convergence judging unit for judging whether the nonparametric VM mixed model converges according to the estimated model parameters; if not, the parameter estimation unit is informed, and if yes, the screening unit is informed;
the screening unit is used for screening the satellite image data according to the importance of the characteristics so as to reserve the important satellite image data;
and the clustering unit is used for judging the category of each satellite image data according to the posterior probability of the indicator factor so as to cluster the satellite image data according to the category.
9. A text data clustering device based on a non-parametric VMF hybrid model, comprising a memory and a processor, wherein the memory stores a satellite image data set to be clustered and a computer program, the computer program is executable by the processor to implement the satellite image data clustering method based on local feature selection according to any one of claims 1 to 7.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202010504460.3A CN111612102B (en) | 2020-06-05 | 2020-06-05 | Satellite image data clustering method, device and equipment based on local feature selection |
Publications (2)
Publication Number | Publication Date |
---|---|
CN111612102A (en) | 2020-09-01 |
CN111612102B (en) | 2023-02-07 |
Family
ID=72204119
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202010504460.3A (Active; published as CN111612102B) | Satellite image data clustering method, device and equipment based on local feature selection | 2020-06-05 | 2020-06-05 |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN111612102B (en) |
Citations (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN102360435A (en) * | 2011-10-26 | 2012-02-22 | 西安电子科技大学 | Undesirable image detecting method based on connotative theme analysis |
WO2012128207A1 (en) * | 2011-03-18 | 2012-09-27 | 日本電気株式会社 | Multivariate data mixture model estimation device, mixture model estimation method, and mixture model estimation program |
CN103226595A (en) * | 2013-04-17 | 2013-07-31 | 南京邮电大学 | Clustering method for high dimensional data based on Bayes mixed common factor analyzer |
Non-Patent Citations (1)
Title |
---|
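Variational Bayesian learning of the beta mixture model and its applications; Lai Yuping et al.; Acta Electronica Sinica; 2018-07-15 (No. 07); full text * |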
Also Published As
Publication number | Publication date |
---|---|
CN111612102A (en) | 2020-09-01 |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||
GR01 | Patent grant |