CN111612102B

CN111612102B - Satellite image data clustering method, device and equipment based on local feature selection

Info

Publication number: CN111612102B
Application number: CN202010504460.3A
Authority: CN
Inventors: 范文涛; 侯文娟
Original assignee: Huaqiao University
Current assignee: Huaqiao University
Priority date: 2020-06-05
Filing date: 2020-06-05
Publication date: 2023-02-07
Anticipated expiration: 2040-06-05
Also published as: CN111612102A

Abstract

The invention discloses a satellite image data clustering method, device and equipment based on local feature selection. The method comprises the following steps: S101, acquiring a satellite image data set to be processed; S102, modeling the satellite image data using a nonparametric von Mises (VM) mixture model based on local feature selection; S103, estimating the model parameters of the nonparametric VM mixture model by a variational Bayesian inference algorithm and calculating the feature importance; S104, judging whether the nonparametric VM mixture model has converged according to the estimated model parameters; if not, returning to step S103; if so, executing step S105; S105, screening the satellite image data according to the feature importance so as to retain the important satellite image data; and S106, judging the category to which each piece of satellite image data belongs according to the posterior probability of the indicator factor, and clustering the satellite image data according to that category. The embodiment can obtain a better clustering result when processing unbalanced data.

Description

Satellite image data clustering method, device and equipment based on local feature selection

Technical Field

The invention relates to the field of data processing, in particular to a satellite image data clustering method, a satellite image data clustering device and satellite image data clustering equipment based on local feature selection.

Background

Land satellites are commonly used to investigate underground mines, marine resources, and groundwater resources, study the growth and topography of natural plants, investigate and forecast various serious natural disasters (such as earthquakes) and environmental pollution, and take images of various targets to draw thematic maps (such as geological maps, topographical maps, and hydrological maps). With the advent of an era featuring integrated remote sensing methods, it would be of great importance to interpret a scene by integrating multiple types and resolutions of spatial data (including multispectral and radar data, displayed terrain, land use maps, etc.).

In the prior art, W. Fan et al. proposed a clustering method using a VM mixture model based on feature selection and the Dirichlet process (DP). The method estimates the model parameters by variational Bayesian inference and has been applied to the cluster analysis of text and plant image data. It has the following disadvantage:

Its cluster analysis of unbalanced data is ineffective, because DP mixture models typically cannot identify classes that contain only a small number of data samples.

Disclosure of Invention

In view of the above, the present invention provides a satellite image data clustering method, device and equipment based on local feature selection, which can obtain a better clustering result when processing unbalanced data.

The embodiment of the invention provides a satellite image data clustering method based on local feature selection, which comprises the following steps:

S101, acquiring a satellite image data set to be processed.

The satellite image data set comprises N pieces of satellite image data, and each piece of satellite image data is normalized by the L2 norm into a D-dimensional feature vector:

$$\vec{x}_n \leftarrow \frac{\vec{x}_n}{\lVert \vec{x}_n \rVert}$$

where $\lVert \cdot \rVert$ denotes the L2 norm;

S102, modeling the satellite image data using a nonparametric von Mises (VM) mixture model based on local feature selection;

S103, estimating the model parameters of the nonparametric VM mixture model by a variational Bayesian inference algorithm and calculating the feature importance;

S104, judging whether the nonparametric VM mixture model has converged according to the estimated model parameters; if not, returning to step S103; if so, executing step S105;

S105, screening the satellite image data according to the feature importance so as to retain the important satellite image data;

S106, judging the category to which each piece of satellite image data belongs according to the posterior probability of the indicator factor, and clustering the satellite image data according to that category.

Preferably, modeling the satellite image data using the nonparametric VM mixture model based on local feature selection specifically comprises:

for satellite image data $\vec{x}_n$ obeying the VM probability distribution $p_{vm}$, the probability density of its D-dimensional data is expressed as:

$$p_{vm}(\vec{x}_n \mid \vec{\mu}, \vec{\lambda}) = \prod_{d=1}^{D} \frac{1}{2\pi I_0(\lambda_d)} \exp\left\{\lambda_d\, \vec{\mu}_d^{T} \vec{y}_{nd}\right\}$$

where $\vec{y}_{nd} = (y_{nd1}, y_{nd2})$ with $y_{nd1} = x_{nd}$ and $y_{nd2} = \sqrt{1 - x_{nd}^2}$, chosen so that the vector $\vec{y}_{nd}$ satisfies L2-norm normalization; $\vec{\mu}_d$ is the location parameter; $\lambda_d$ is the scale parameter and satisfies $\lambda_d \ge 0$; and $I_0(\lambda)$ is the modified Bessel function of the first kind of order 0;

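for each piece of D-dimensional satellite image data $\vec{x}_n$ obeying the nonparametric VM mixture model, the probability density function is expressed as:

$$p(\vec{x}_n \mid \vec{\pi}, \vec{\mu}, \vec{\lambda}) = \sum_{k=1}^{\infty} \pi_k\, p_{vm}(\vec{x}_n \mid \vec{\mu}_k, \vec{\lambda}_k)$$

the nonparametric VM mixture model consists of infinitely many mixture components, and each mixture component corresponds to the product of D VM probability distributions:

$$p_{vm}(\vec{x}_n \mid \vec{\mu}_k, \vec{\lambda}_k) = \prod_{d=1}^{D} p_{vm}(x_{nd} \mid \vec{\mu}_{kd}, \lambda_{kd})$$

where each feature corresponds to one VM probability distribution; $(\vec{\mu}_{kd}, \lambda_{kd})$ are the VM distribution parameters of the d-th feature in the k-th mixture component; and $\pi_k > 0$ is the corresponding mixing coefficient and satisfies $\sum_{k=1}^{\infty} \pi_k = 1$;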

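for each piece of satellite image data $\vec{x}_n$, a binary hidden variable $\vec{Z}_n = (Z_{n1}, Z_{n2}, \dots)$ is specified as an indicator factor: $Z_{nk} = 1$ indicates that the satellite image data $\vec{x}_n$ belongs to the k-th category; otherwise $Z_{nk} = 0$; the probability distribution of the hidden variable $Z = (\vec{Z}_1, \dots, \vec{Z}_N)$ is

$$p(Z \mid \vec{\pi}) = \prod_{n=1}^{N} \prod_{k=1}^{\infty} \pi_k^{Z_{nk}}$$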

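fusing the local feature selection technique with the nonparametric VM mixture model, so that each feature $x_{nd}$ of the satellite image data obeys the probability distribution:

$$p(x_{nd} \mid \phi_{nkd}) = p_{vm}(x_{nd} \mid \vec{\mu}_{kd}, \lambda_{kd})^{\phi_{nkd}}\; p_{vm}(x_{nd} \mid \vec{\mu}'_{kd}, \lambda'_{kd})^{1 - \phi_{nkd}}$$

where $\phi_{nkd}$ is a binary parameter: when $\phi_{nkd} = 1$, the feature $x_{nd}$ is a relevant feature and obeys the VM probability distribution $p_{vm}(x_{nd} \mid \vec{\mu}_{kd}, \lambda_{kd})$; when $\phi_{nkd} = 0$, the feature $x_{nd}$ is an irrelevant feature and obeys the VM probability distribution $p_{vm}(x_{nd} \mid \vec{\mu}'_{kd}, \lambda'_{kd})$;

the parameter $\phi_{nkd}$ obeys a Bernoulli distribution:

$$p(\phi_{nkd} \mid \epsilon_{kd}) = \epsilon_{kd}^{\phi_{nkd}} (1 - \epsilon_{kd})^{1 - \phi_{nkd}}$$

where the parameter $\epsilon_{kd}$ represents the feature importance of the d-th feature in the k-th component;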

adopting a VM-Gamma distribution as the joint prior distribution of the parameters $(\vec{\mu}_{kd}, \lambda_{kd})$ of the VM distribution to which relevant features belong, where $p_g(\cdot)$ denotes the Gamma distribution;

adopting a VM-Gamma distribution as the joint prior distribution of the parameters $(\vec{\mu}'_{kd}, \lambda'_{kd})$ of the VM distribution to which irrelevant features belong;

obtaining the full probability expression of the nonparametric VM mixture model based on local feature selection.
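As an illustration of the per-feature VM density used in the modeling step, the following is a minimal Python sketch. It assumes the density form $\frac{1}{2\pi I_0(\lambda)}\exp\{\lambda\,\vec{\mu}^{T}\vec{y}\}$ with $\vec{y} = (x, \sqrt{1 - x^2})$; the function names `bessel_i0` and `vm_log_density` are hypothetical and do not come from the patent, and the power series used for $I_0$ is only suitable for moderate values of the scale parameter.

```python
import math


def bessel_i0(lam, terms=60):
    """Modified Bessel function of the first kind, order 0, via its power series.

    I0(lam) = sum_{m>=0} (lam^2 / 4)^m / (m!)^2. Adequate for moderate lam only.
    """
    total, term = 0.0, 1.0
    quarter_sq = (lam * lam) / 4.0
    for m in range(terms):
        total += term
        term *= quarter_sq / ((m + 1) ** 2)
    return total


def vm_log_density(x, mu, lam):
    """Log VM density of a single feature value x in [-1, 1].

    mu  : unit 2-vector (location parameter), an assumption of this sketch.
    lam : scale parameter, lam >= 0.
    """
    if lam < 0.0:
        raise ValueError("scale parameter must be non-negative")
    # Build the unit vector y = (x, sqrt(1 - x^2)) from the feature value.
    y = (x, math.sqrt(max(0.0, 1.0 - x * x)))
    dot = mu[0] * y[0] + mu[1] * y[1]
    return lam * dot - math.log(2.0 * math.pi) - math.log(bessel_i0(lam))


def vm_log_density_vector(xs, mus, lams):
    """Log density of a D-dimensional sample as the sum over its D features."""
    return sum(vm_log_density(x, mu, lam) for x, mu, lam in zip(xs, mus, lams))
```

With `lam = 0` the density reduces to the uniform value `1 / (2 * pi)` regardless of the location parameter, which is a convenient sanity check.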

Preferably, the nonparametric VM mixture model is constructed by a Pitman-Yor process model based on the stick-breaking representation; in this representation, the mixing coefficient $\pi_k$ is expressed as:

$$\pi_k = \pi'_k \prod_{j=1}^{k-1} (1 - \pi'_j)$$

where $\pi'_k$ obeys a Beta distribution of the form

$$p(\pi'_k) = p_b(\pi'_k \mid 1 - a,\; b + k a)$$

where $p_b(\cdot)$ is the Beta distribution, $a$ is the discount parameter of the Pitman-Yor process model and satisfies $0 \le a < 1$, and $b$ is the concentration parameter and satisfies $b > -a$.
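The stick-breaking construction of the mixing coefficients can be sketched as follows. This is an illustrative Python example, not the patent's implementation: it assumes stick proportions drawn from Beta(1 - a, b + k·a) and a finite truncation in which the last stick proportion is set to 1 so that the weights sum to one. The function names and the use of `random.Random` are choices made for this sketch.

```python
import random


def stick_breaking_weights(sticks):
    """Turn stick proportions pi'_k into mixing coefficients pi_k.

    pi_k = pi'_k * prod_{j<k} (1 - pi'_j)
    """
    weights, remaining = [], 1.0
    for s in sticks:
        weights.append(s * remaining)
        remaining *= 1.0 - s
    return weights


def sample_pyp_weights(num_components, a, b, rng):
    """Sample truncated Pitman-Yor mixing coefficients.

    a : discount parameter, 0 <= a < 1
    b : concentration parameter, b > -a
    """
    if not (0.0 <= a < 1.0) or not (b > -a):
        raise ValueError("require 0 <= a < 1 and b > -a")
    sticks = [rng.betavariate(1.0 - a, b + k * a) for k in range(1, num_components)]
    sticks.append(1.0)  # truncation: the last stick takes all remaining mass
    return stick_breaking_weights(sticks)


weights = sample_pyp_weights(15, a=0.5, b=0.5, rng=random.Random(0))
```

A larger discount parameter tends to spread mass over more, smaller components, which is the property the text relies on for unbalanced data.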

Preferably, estimating the model parameters of the nonparametric VM mixture model by a variational Bayesian inference algorithm and calculating the feature importance specifically comprises:

initializing the model parameters, including: initializing the truncation level K = 15; initializing the hyperparameters $u_{kd} = 0.1$, $u'_{kd} = 0.1$, $v_{kd} = 0.01$, $v'_{kd} = 0.01$, $\beta_{kd} > 0$, $\beta'_{kd} > 0$, $a_k = 0.5$, $b_k = 0.5$; and initializing $r_{nk}$ using the K-Means algorithm;

updating the variational posteriors, the expected values and the feature importance using the current model parameters;

obtaining the updated parameter values from the updated expected values;

obtaining the variational lower bound produced by the current iteration;

comparing the variational lower bound produced by the current iteration with that produced by the previous iteration to judge whether the nonparametric VM mixture model has converged.

Preferably, updating the variational posteriors, the expected values and the feature importance using the current model parameters specifically comprises:

defining the variational lower bound as:

$$L(q) = \langle \ln p(X, \Theta) \rangle - \langle \ln q(\Theta) \rangle$$

where $\langle \cdot \rangle$ denotes the expected value and $\Theta$ is the set of all random variables and hidden variables; $q(\Theta)$ is an approximation of the true posterior distribution $p(\Theta \mid X)$, namely the variational posterior;

truncating the mixture components from an infinite-dimensional space to a K-dimensional space using a truncation technique:

$$\pi'_K = 1, \qquad \pi_k = 0 \ \text{when}\ k > K$$

where K is the truncation level, namely the number of categories; K is initialized to an arbitrary value and reaches its optimal value at convergence;

optimizing all variational posteriors by maximizing the variational lower bound $L(q)$, which yields the update formulas for their hyperparameters;

the expected values in the above are calculated by the following formulas:

$$\langle Z_{nk} \rangle = r_{nk} \tag{21}$$

$$\langle \phi_{nkd} \rangle = f_{nkd} \tag{24}$$

$$\langle 1 - \phi_{nkd} \rangle = 1 - f_{nkd} \tag{25}$$

$$\langle \ln \pi'_k \rangle = \Psi(g_k) - \Psi(g_k + h_k) \tag{28}$$

$$\langle \ln (1 - \pi'_k) \rangle = \Psi(h_k) - \Psi(g_k + h_k) \tag{29}$$

where $\Psi(\cdot)$ is the digamma function;

calculating the feature importance.
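Expectations (28) and (29) can be evaluated with a digamma routine. The sketch below is a standard-library Python illustration under the assumption that each stick proportion has a Beta variational posterior with parameters $g_k$ and $h_k$; the function names are hypothetical. The last helper combines (28) and (29) through the stick-breaking product, using linearity of expectation, to obtain $\langle \ln \pi_k \rangle$; that combination is a consequence of the construction rather than a formula quoted from the text.

```python
import math


def digamma(x):
    """Digamma function for x > 0: upward recurrence, then asymptotic series."""
    if x <= 0.0:
        raise ValueError("x must be positive")
    result = 0.0
    while x < 6.0:
        result -= 1.0 / x  # psi(x) = psi(x + 1) - 1 / x
        x += 1.0
    inv = 1.0 / x
    inv2 = inv * inv
    series = inv2 * (1.0 / 12.0 - inv2 * (1.0 / 120.0 - inv2 / 252.0))
    return result + math.log(x) - 0.5 * inv - series


def expected_log_sticks(g, h):
    """Return (<ln pi'_k>, <ln(1 - pi'_k)>) for each k, per (28) and (29)."""
    e_log_stick = [digamma(gk) - digamma(gk + hk) for gk, hk in zip(g, h)]
    e_log_rest = [digamma(hk) - digamma(gk + hk) for gk, hk in zip(g, h)]
    return e_log_stick, e_log_rest


def expected_log_weights(g, h):
    """<ln pi_k> = <ln pi'_k> + sum_{j<k} <ln(1 - pi'_j)>."""
    e_log_stick, e_log_rest = expected_log_sticks(g, h)
    out, running = [], 0.0
    for stick, rest in zip(e_log_stick, e_log_rest):
        out.append(stick + running)
        running += rest
    return out
```

For a uniform posterior on a stick proportion (`g = h = 1`), both expectations equal -1, which gives a quick check of the routine.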

Preferably, comparing the variational lower bound produced by the current iteration with that produced by the previous iteration to judge whether the nonparametric VM mixture model has converged specifically comprises:

judging whether the difference between the variational lower bound produced by the current iteration and that produced by the previous iteration is smaller than a preset threshold, the preset threshold being 0.0001;

if so, determining that the nonparametric VM mixture model has converged;

if not, determining that the nonparametric VM mixture model has not converged.
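The convergence test amounts to a simple loop around the update step. The following Python sketch is illustrative only: `update_step` is a hypothetical callable standing in for one round of variational updates that returns the current variational lower bound, and the iteration cap `max_iters` is an added safeguard that the text does not mention.

```python
def has_converged(current_bound, previous_bound, threshold=1e-4):
    """True when the change in the variational lower bound is below the threshold."""
    return abs(current_bound - previous_bound) < threshold


def run_until_converged(update_step, max_iters=1000, threshold=1e-4):
    """Repeat update_step() until the lower bound stabilizes.

    Returns (number of iterations performed, last lower bound).
    """
    previous = float("-inf")
    for iteration in range(1, max_iters + 1):
        current = update_step()
        if has_converged(current, previous, threshold):
            return iteration, current
        previous = current
    return max_iters, previous


# Toy usage: a pre-recorded sequence of lower-bound values stands in for the model.
bounds = iter([-10.0, -5.0, -4.5, -4.49999])
iterations, final_bound = run_until_converged(lambda: next(bounds))
```

A smaller threshold demands more iterations before the loop stops, trading run time for a tighter fit.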

Preferably, screening the satellite image data according to the feature importance so as to retain the important satellite image data specifically comprises:

screening the satellite image data by evaluating the feature importance: a feature whose importance is lower than a threshold is regarded as an irrelevant feature and is eliminated; a feature whose importance is greater than or equal to the threshold is regarded as a relevant feature and is retained.
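A minimal Python sketch of the thresholding rule follows. It assumes the feature importance is stored as a K x D nested list indexed by component and feature, and it uses 0.5 as a default threshold purely for illustration; the function names are hypothetical.

```python
def relevant_feature_mask(importance, threshold=0.5):
    """Mark each (component, feature) pair as relevant (True) or irrelevant (False).

    importance[k][d] is the importance of feature d in component k.
    """
    return [[eps >= threshold for eps in row] for row in importance]


def retained_features(importance, threshold=0.5):
    """Per component, list the indices of the features that are kept."""
    mask = relevant_feature_mask(importance, threshold)
    return [[d for d, keep in enumerate(row) if keep] for row in mask]


importance = [[0.9, 0.2, 0.5], [0.1, 0.8, 0.49]]
kept = retained_features(importance)
```

Because the selection is local, a feature may be retained for one component and eliminated for another, as the two rows of the toy input show.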

Preferably, judging the category to which each piece of satellite image data belongs according to the posterior probability of the indicator factor, and clustering the satellite image data according to that category, specifically comprises:

obtaining the posterior probability $r_{nk}$ of the indicator factor, where $r_{nk}$ represents the probability that the n-th piece of satellite image data $\vec{x}_n$ belongs to the k-th category;

selecting the category with the highest probability as the category of the satellite image data $\vec{x}_n$.
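The assignment rule is an argmax over each row of the responsibility matrix. A short Python sketch, assuming `responsibilities[n][k]` holds $r_{nk}$ (the function names are illustrative):

```python
def assign_clusters(responsibilities):
    """Return, for each sample n, the index k with the largest r_nk."""
    return [max(range(len(row)), key=row.__getitem__) for row in responsibilities]


def group_by_cluster(responsibilities):
    """Map each cluster index to the list of sample indices assigned to it."""
    groups = {}
    for n, k in enumerate(assign_clusters(responsibilities)):
        groups.setdefault(k, []).append(n)
    return groups


r = [[0.1, 0.7, 0.2], [0.6, 0.3, 0.1], [0.2, 0.5, 0.3]]
labels = assign_clusters(r)
```

Ties are resolved in favor of the lowest category index, which is a property of Python's `max` rather than something the text specifies.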

The embodiment of the invention also provides a satellite image data clustering device based on local feature selection, which comprises:

a data set acquisition unit for acquiring a satellite image data set to be processed, wherein each piece of satellite image data is normalized by the L2 norm, $\lVert \cdot \rVert$ denoting the L2 norm;

a modeling unit for modeling the satellite image data using a nonparametric von Mises (VM) mixture model based on local feature selection;

a parameter estimation unit for estimating the model parameters of the nonparametric VM mixture model by a variational Bayesian inference algorithm and calculating the feature importance;

a convergence judging unit for judging whether the nonparametric VM mixture model has converged according to the estimated model parameters; if not, the parameter estimation unit is notified; if so, the screening unit is notified;

a screening unit for screening the satellite image data according to the feature importance so as to retain the important satellite image data;

and a clustering unit for judging the category to which each piece of satellite image data belongs according to the posterior probability of the indicator factor, so as to cluster the satellite image data according to that category.

The embodiment of the invention also provides satellite image data clustering equipment based on local feature selection, which comprises a memory and a processor, wherein a satellite image data set to be clustered and a computer program are stored in the memory, and the computer program can be executed by the processor so as to implement the above satellite image data clustering method based on local feature selection.

In this embodiment, a nonparametric mixture model based on the von Mises (VM) probability distribution is constructed using a nonparametric model framework based on the Pitman-Yor process, and the localized feature selection method and the nonparametric VM mixture model are fused in the same model framework (PYP-VM for short) in order to perform cluster analysis on land satellite (Landsat) data. Each piece of satellite image data is normalized by the L2 norm and then modeled using a VM mixture model based on local feature selection. So that the number of data categories can be adjusted flexibly according to the size of the data, the embodiment uses the Pitman-Yor process as the nonparametric framework for the VM mixture model, and the parameters of the proposed nonparametric VM mixture model based on local feature selection are estimated by a variational Bayesian inference algorithm. Compared with the prior art, the model has a discount parameter that controls the generation of new categories, so it has an advantage over methods based on the DP mixture model when processing unbalanced data and can obtain a better clustering result.

Drawings

In order to illustrate the technical solution of the present invention more clearly, the drawings used in the embodiments are briefly described below. The drawings in the following description show only some embodiments of the present invention; those skilled in the art can obtain other drawings from them without creative effort.

Fig. 1 is a schematic flow chart of a satellite image data clustering method based on local feature selection according to a first embodiment of the present invention.

Fig. 2 is another schematic flow chart of a satellite image data clustering method based on local feature selection according to a first embodiment of the present invention.

Fig. 3 is a schematic diagram of program modules of a satellite image data clustering method and device based on local feature selection according to a second embodiment of the present invention.

Detailed Description

The technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are only a part of the embodiments of the present invention, and not all of the embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.

Referring to fig. 1, an embodiment of the present invention provides a satellite image data clustering method based on local feature selection, which can be performed by a satellite image data clustering device (hereinafter referred to as a clustering device) based on local feature selection, including:

S101, acquiring a satellite image data set to be processed.

The satellite image data set comprises N pieces of satellite image data, and each piece of satellite image data is normalized by the L2 norm into a D-dimensional feature vector:

$$\vec{x}_n \leftarrow \frac{\vec{x}_n}{\lVert \vec{x}_n \rVert}$$

where $\lVert \cdot \rVert$ denotes the L2 norm.
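The normalization in step S101 can be sketched in a few lines of Python. This is an illustrative example; the function names are hypothetical, and rejecting all-zero vectors is a choice made here because the text does not say how they should be handled.

```python
import math


def l2_normalize(vector):
    """Scale a feature vector to unit L2 norm."""
    norm = math.sqrt(sum(v * v for v in vector))
    if norm == 0.0:
        raise ValueError("cannot normalize an all-zero vector")
    return [v / norm for v in vector]


def normalize_dataset(rows):
    """Apply L2 normalization to every sample in the data set."""
    return [l2_normalize(row) for row in rows]


normalized = normalize_dataset([[3.0, 4.0], [1.0, 0.0]])
```

After normalization every component lies in [-1, 1], which is what allows each feature value to be paired with a second coordinate to form a unit vector in the VM model.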

In this embodiment, the clustering device may be a computer device with a data processing function, such as a laptop, a desktop, or a server, and the computer device may implement the satellite image data clustering method based on local feature selection by executing a predetermined program.

S102, modeling the satellite image data using a nonparametric von Mises (VM) mixture model based on local feature selection.

In this embodiment, step S102 specifically includes:

S1021, for satellite image data $\vec{x}_n$ obeying the VM probability distribution $p_{vm}$, the probability density of its D-dimensional data is expressed as:

$$p_{vm}(\vec{x}_n \mid \vec{\mu}, \vec{\lambda}) = \prod_{d=1}^{D} \frac{1}{2\pi I_0(\lambda_d)} \exp\left\{\lambda_d\, \vec{\mu}_d^{T} \vec{y}_{nd}\right\}$$

where $\vec{y}_{nd} = (y_{nd1}, y_{nd2})$ with $y_{nd1} = x_{nd}$ and $y_{nd2} = \sqrt{1 - x_{nd}^2}$, chosen so that the vector $\vec{y}_{nd}$ satisfies L2-norm normalization; $\vec{\mu}_d$ is the location parameter; $\lambda_d \ge 0$ is the scale parameter; and $I_0(\lambda)$ is the modified Bessel function of the first kind of order 0.

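S1022, for each piece of D-dimensional satellite image data $\vec{x}_n$ obeying the nonparametric VM mixture model, the probability density function is expressed as:

$$p(\vec{x}_n \mid \vec{\pi}, \vec{\mu}, \vec{\lambda}) = \sum_{k=1}^{\infty} \pi_k\, p_{vm}(\vec{x}_n \mid \vec{\mu}_k, \vec{\lambda}_k)$$

The nonparametric VM mixture model consists of infinitely many mixture components, each of which corresponds to the product of D VM probability distributions:

$$p_{vm}(\vec{x}_n \mid \vec{\mu}_k, \vec{\lambda}_k) = \prod_{d=1}^{D} p_{vm}(x_{nd} \mid \vec{\mu}_{kd}, \lambda_{kd})$$

where each feature corresponds to one VM probability distribution.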

The nonparametric VM mixture model is constructed by a Pitman-Yor process model based on the stick-breaking representation; in this representation, the mixing coefficient $\pi_k$ is expressed as:

$$\pi_k = \pi'_k \prod_{j=1}^{k-1} (1 - \pi'_j)$$

where $\pi'_k$ obeys a Beta distribution of the form

$$p(\pi'_k) = p_b(\pi'_k \mid 1 - a,\; b + k a)$$

where $p_b(\cdot)$ is the Beta distribution, $a$ is the discount parameter of the Pitman-Yor process model and satisfies $0 \le a < 1$, and $b$ is the concentration parameter and satisfies $b > -a$.

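S1023, for each piece of satellite image data $\vec{x}_n$, a binary hidden variable $\vec{Z}_n = (Z_{n1}, Z_{n2}, \dots)$ is specified as an indicator factor: $Z_{nk} = 1$ indicates that the satellite image data $\vec{x}_n$ belongs to the k-th category; otherwise $Z_{nk} = 0$. The probability distribution of the hidden variable $Z = (\vec{Z}_1, \dots, \vec{Z}_N)$ is

$$p(Z \mid \vec{\pi}) = \prod_{n=1}^{N} \prod_{k=1}^{\infty} \pi_k^{Z_{nk}}$$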

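S1024, fusing the local feature selection technique with the nonparametric VM mixture model, so that each feature $x_{nd}$ of the satellite image data obeys the probability distribution:

$$p(x_{nd} \mid \phi_{nkd}) = p_{vm}(x_{nd} \mid \vec{\mu}_{kd}, \lambda_{kd})^{\phi_{nkd}}\; p_{vm}(x_{nd} \mid \vec{\mu}'_{kd}, \lambda'_{kd})^{1 - \phi_{nkd}}$$

In order to process high-dimensional data more effectively, the embodiment fuses the local feature selection technique and the proposed nonparametric VM mixture model in the same model framework, so that irrelevant features can be removed automatically during cluster analysis to improve clustering performance. Here, $\phi_{nkd}$ is a binary parameter: when $\phi_{nkd} = 1$, the feature $x_{nd}$ is a relevant feature and obeys the VM probability distribution $p_{vm}(x_{nd} \mid \vec{\mu}_{kd}, \lambda_{kd})$; when $\phi_{nkd} = 0$, the feature $x_{nd}$ is an irrelevant feature and obeys the VM probability distribution $p_{vm}(x_{nd} \mid \vec{\mu}'_{kd}, \lambda'_{kd})$.

The parameter $\phi_{nkd}$ obeys a Bernoulli distribution:

$$p(\phi_{nkd} \mid \epsilon_{kd}) = \epsilon_{kd}^{\phi_{nkd}} (1 - \epsilon_{kd})^{1 - \phi_{nkd}}$$

where the parameter $\epsilon_{kd}$ represents the feature importance of the d-th feature in the k-th component.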

S1025, adopting a VM-Gamma distribution as the joint prior distribution of the parameters $(\vec{\mu}_{kd}, \lambda_{kd})$ of the VM distribution to which relevant features belong, where $p_g(\cdot)$ denotes the Gamma distribution.

S1026, adopting a VM-Gamma distribution as the joint prior distribution of the parameters $(\vec{\mu}'_{kd}, \lambda'_{kd})$ of the VM distribution to which irrelevant features belong.

S1027, obtaining the full probability expression of the nonparametric VM mixture model based on local feature selection.
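To make the role of the binary relevance parameter concrete, the sketch below computes, for a single feature value, the marginal density obtained by summing over the two states of the relevance indicator and the exact posterior probability that the feature is relevant. This is a hedged Python illustration of Bayes' rule for one Bernoulli variable; it is not the variational update for $f_{nkd}$, whose formula is not reproduced in the text. The inputs `p_rel` and `p_irr` are the relevant and irrelevant VM densities at the feature value, and all names are hypothetical.

```python
def marginal_feature_density(p_rel, p_irr, eps):
    """Density of a feature after summing out the binary relevance indicator.

    eps is the prior probability (feature importance) that the feature is relevant.
    """
    return eps * p_rel + (1.0 - eps) * p_irr


def relevance_posterior(p_rel, p_irr, eps):
    """Posterior probability that the feature is relevant, by Bayes' rule."""
    evidence = marginal_feature_density(p_rel, p_irr, eps)
    if evidence <= 0.0:
        return eps  # no information in the data: fall back to the prior
    return eps * p_rel / evidence


# Toy usage: the relevant density explains the value three times better.
posterior = relevance_posterior(p_rel=0.3, p_irr=0.1, eps=0.5)
```

When both densities agree, the data carry no information about relevance and the posterior equals the prior importance.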

S103, estimating the model parameters of the nonparametric VM mixture model by a variational Bayesian inference algorithm and calculating the feature importance.

S104, judging whether the nonparametric VM mixture model has converged according to the estimated model parameters; if not, returning to step S103; if so, executing step S105.

Specifically:

First, the model parameters are initialized, including: initializing the truncation level K = 15; and initializing the hyperparameters $u_{kd} = 0.1$, $u'_{kd} = 0.1$, $v_{kd} = 0.01$, $v'_{kd} = 0.01$, $\beta_{kd} > 0$, $\beta'_{kd} > 0$.

The variational posteriors, the expected values and the feature importance are then updated using the current model parameters.

The variational lower bound is defined as:

$$L(q) = \langle \ln p(X, \Theta) \rangle - \langle \ln q(\Theta) \rangle$$

where $\langle \cdot \rangle$ denotes the expected value. The mixture components are truncated at level K:

$$\pi'_K = 1, \qquad \pi_k = 0 \ \text{when}\ k > K$$

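The hyperparameters of the variational posteriors are obtained from their update formulas, and the expected values in the above are calculated by the following formulas:

$$\langle Z_{nk} \rangle = r_{nk} \tag{21}$$

$$\langle \phi_{nkd} \rangle = f_{nkd} \tag{24}$$

$$\langle 1 - \phi_{nkd} \rangle = 1 - f_{nkd} \tag{25}$$

$$\langle \ln \pi'_k \rangle = \Psi(g_k) - \Psi(g_k + h_k) \tag{28}$$

$$\langle \ln (1 - \pi'_k) \rangle = \Psi(h_k) - \Psi(g_k + h_k) \tag{29}$$

where $\Psi(\cdot)$ is the digamma function.

The feature importance is then calculated.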

Next, the updated parameter values are obtained from the updated expected values.

Then, the variational lower bound produced by the current iteration is obtained.

Finally, the variational lower bound produced by the current iteration is compared with that produced by the previous iteration to judge whether the nonparametric VM mixture model has converged.

Specifically, it is determined whether the difference between the variational lower bound produced by the current iteration and that produced by the previous iteration is smaller than a preset threshold. If so, the nonparametric VM mixture model is judged to have converged, and the truncation level has reached its optimal value; if not, the model is judged not to have converged, and another iteration is needed.

In a preferred embodiment of the present invention, the preset threshold may be 0.0001, but it should be noted that other values may be used, and the smaller the preset threshold is, the higher the iteration precision is, and the present invention is not limited specifically.

S105, screening the satellite image data according to the feature importance so as to retain the important satellite image data.

The satellite image data can be screened by evaluating the feature importance. For example, a feature whose importance is lower than a threshold (such as 0.5) is regarded as an irrelevant feature to be eliminated; a feature whose importance is greater than or equal to the threshold (such as 0.5) is regarded as a relevant feature to be retained.

It should be noted that the threshold may be selected according to actual needs, and is not limited to 0.5.

In this embodiment, after the model converges, the posterior probability $r_{nk}$ of the indicator factor is obtained from the converged model parameters. The posterior probability $r_{nk}$ represents the probability that the n-th piece of satellite image data $\vec{x}_n$ belongs to the k-th category. The category with the highest probability according to $r_{nk}$ is selected as the category of the satellite image data $\vec{x}_n$, and the different pieces of satellite image data in the data set can then be clustered according to their categories.

To facilitate understanding of the present invention, an application of the present embodiment is described below as a practical example.

In this example, cluster verification is performed on a public data set (the Statlog data set). A Landsat MSS image consists of four digital images of the same scene in different spectral bands. Two of these are in the visible region (corresponding approximately to the green and red regions of the visible spectrum) and two are in the (near) infrared. The spatial resolution of a pixel is approximately 80 m x 80 m. Each image contains 2340 x 3380 pixels. The Statlog data set is a sub-region of the original Landsat MSS imagery (from NASA) consisting of 82 x 100 pixels. The data set contains 6435 records in total, each corresponding to a 3 x 3 square neighborhood of pixels contained entirely within the 82 x 100 sub-region. Each row contains the pixel values in the four spectral bands for each of the 9 pixels in the 3 x 3 neighborhood. The experimental objective is to perform cluster analysis on the data set to identify which of the following 6 surface types or land uses each record belongs to: red soil, cotton crop, grey soil, damp grey soil, soil with vegetation stubble, very damp grey soil.

In this embodiment, Windows 10 is used as the experimental platform and Matlab as the programming language, with the parameter settings described in the above embodiments. Although each record in the Statlog data set carries a class label, the present embodiment does not use these labels during clustering, because cluster analysis is an unsupervised learning method. After clustering is completed, however, the clustering accuracy can be calculated against the class labels and used as the evaluation index of clustering performance.
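The text does not state how cluster indices are matched to class labels when the accuracy is computed. One common choice, shown here only as an assumption, is to map each cluster to its majority class and count the matching samples. The Python sketch below uses hypothetical names and toy labels rather than the Statlog data.

```python
from collections import Counter


def clustering_accuracy(true_labels, cluster_ids):
    """Accuracy under a majority-vote mapping from clusters to classes.

    Each cluster is mapped to the class that occurs most often inside it;
    the accuracy is the fraction of samples whose class matches that mapping.
    """
    if len(true_labels) != len(cluster_ids) or not true_labels:
        raise ValueError("inputs must be non-empty and of equal length")
    per_cluster = {}
    for label, cluster in zip(true_labels, cluster_ids):
        per_cluster.setdefault(cluster, Counter())[label] += 1
    correct = sum(counts.most_common(1)[0][1] for counts in per_cluster.values())
    return correct / len(true_labels)


truth = ["red soil", "red soil", "grey soil", "grey soil", "grey soil"]
clusters = [0, 0, 1, 1, 0]
accuracy = clustering_accuracy(truth, clusters)
```

A majority-vote mapping can assign several clusters to the same class, so it is more lenient than a one-to-one matching; results from the two conventions are not directly comparable.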

In addition, the embodiment is compared experimentally with the clustering method of the prior art and with the classic K-means clustering algorithm. The prior-art mixture model based on the DP and the von Mises (VM) distribution is referred to as DP-VM for short. Each method was run 10 times, and the average accuracy was taken as the comparison index. The results of the experiment are shown in Table 1. Compared with the prior art and the K-means clustering method, the satellite image data clustering method provided by the embodiment obtains a better clustering result, namely a higher accuracy.

TABLE 1

In summary, in the satellite image data clustering method based on local feature selection provided by this embodiment, a nonparametric mixture model based on the von Mises (VM) probability distribution is constructed using a nonparametric model framework based on the Pitman-Yor process, and the localized feature selection method and the nonparametric VM mixture model are fused in the same model framework (PYP-VM for short) in order to perform cluster analysis on land satellite (Landsat) data. Each piece of satellite image data is normalized by the L2 norm before being modeled by a VM mixture model based on local feature selection. So that the number of data categories can be adjusted flexibly and automatically according to the size of the data, the embodiment uses the Pitman-Yor process as the nonparametric framework for the VM mixture model, and the parameters of the proposed nonparametric VM mixture model based on local feature selection are estimated by a variational Bayesian inference algorithm. Compared with the prior art, the model has a discount parameter that controls the generation of new categories, so it has an advantage over methods based on the DP mixture model when processing unbalanced data and can obtain a better clustering result.

The second embodiment of the present invention further provides a satellite image data clustering device based on local feature selection, including:

a data set acquisition unit 210 for acquiring a satellite image data set to be processed, wherein each piece of satellite image data is normalized by the L2 norm, $\lVert \cdot \rVert$ denoting the L2 norm;

a modeling unit 220 for modeling the satellite image data using a nonparametric VM mixture model based on local feature selection;

a parameter estimation unit 230 for estimating the model parameters of the nonparametric VM mixture model by a variational Bayesian inference algorithm and calculating the feature importance;

a convergence judging unit 240 for judging whether the nonparametric VM mixture model has converged according to the estimated model parameters; if not, the parameter estimation unit 230 is notified; if so, the screening unit 250 is notified;

a screening unit 250 for screening the satellite image data according to the feature importance so as to retain the important satellite image data;

and a clustering unit 260 for judging the category to which each piece of satellite image data belongs according to the posterior probability of the indicator factor, so as to cluster the satellite image data according to that category.

The third embodiment of the present invention further provides satellite image data clustering equipment based on local feature selection, which includes a memory and a processor, wherein the memory stores a satellite image data set to be clustered and a computer program, and the computer program can be executed by the processor so as to implement the above satellite image data clustering method based on local feature selection.

In the embodiments provided in the present invention, it should be understood that the disclosed apparatus and method can be implemented in other ways. The apparatus and method embodiments described above are illustrative only, as the flowcharts and block diagrams in the figures illustrate the architecture, functionality, and operation of possible implementations of apparatus, methods and computer program products according to various embodiments of the present invention. In this regard, each block in the flowchart or block diagrams may represent a module, segment, or portion of code, which comprises one or more executable instructions for implementing the specified logical function(s). It should also be noted that, in some alternative implementations, the functions noted in the block may occur out of the order noted in the figures. For example, two blocks shown in succession may, in fact, be executed substantially concurrently, or the blocks may sometimes be executed in the reverse order, depending upon the functionality involved. It will also be noted that each block of the block diagrams and/or flowchart illustration, and combinations of blocks in the block diagrams and/or flowchart illustration, can be implemented by special purpose hardware-based systems which perform the specified functions or acts, or combinations of special purpose hardware and computer instructions.

In addition, the functional modules in the embodiments of the present invention may be integrated together to form an independent part, or each module may exist separately, or two or more modules may be integrated to form an independent part.

The functions, if implemented in the form of software functional modules and sold or used as a stand-alone product, may be stored in a computer readable storage medium. Based on such understanding, the technical solution of the present invention, or the part thereof which substantially contributes to the prior art, may be embodied in the form of a software product, which is stored in a storage medium and includes several instructions for causing a computer device (which may be a personal computer, an electronic device, or a network device) to execute all or part of the steps of the method according to the embodiments of the present invention. The aforementioned storage medium includes: a U-disk, a removable hard disk, a Read-Only Memory (ROM), a Random Access Memory (RAM), a magnetic disk, an optical disk, and other media capable of storing program code. It should be noted that, in this document, the terms "comprises," "comprising," or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, method, article, or apparatus. Without further limitation, an element defined by the phrase "comprising a ..." does not exclude the presence of additional like elements in a process, method, article, or apparatus that comprises the element.

The above description is only a preferred embodiment of the present invention and is not intended to limit the present invention, and various modifications and changes may be made by those skilled in the art. Any modification, equivalent replacement, or improvement made within the spirit and principle of the present invention should be included in the protection scope of the present invention.

Claims

1. A satellite image data clustering method based on local feature selection is characterized by comprising the following steps:

S101, acquiring a satellite image data set to be processed

wherein ‖·‖ denotes the L2 norm;

S106, determining the category of each satellite image data item according to the posterior probability of the indicator factor, and clustering the satellite image data according to the category; wherein modeling the satellite image data using the nonparametric von Mises (VM) mixture model based on local feature selection specifically comprises:

for features of the satellite image data that obey the VM probability distribution p_vm, the D-dimensional data

is expressed as:

wherein,

in the formula, y_nd1 = x_nd and y_nd2 are set so as to ensure that the vector

satisfies L2-norm normalization,

is the location parameter,

and the probability density function is obtained as:

wherein each feature corresponds to a VM probability distribution;
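By way of illustration only, the per-feature VM density can be sketched as follows. The formulas themselves did not survive in this text, so the sketch assumes the standard von Mises density on the unit circle, exp(λ μᵀy) / (2π I0(λ)), and assumes the mapping y_nd2 = sqrt(1 - x_nd²) so that the vector (y_nd1, y_nd2) has unit L2 norm. The function names and the concentration symbol λ (`lam`) are hypothetical and do not come from the claims.

```python
import math

def bessel_i0(x, terms=60):
    """Modified Bessel function I0 via its power series (adequate for moderate x)."""
    total, term = 0.0, 1.0
    q = (x * x) / 4.0
    for k in range(terms):
        total += term                 # term equals q**k / (k!)**2
        term *= q / ((k + 1) ** 2)
    return total

def to_unit_vector(x):
    """Assumed mapping of a scalar feature x in [-1, 1] to a 2-D unit vector."""
    return (x, math.sqrt(max(0.0, 1.0 - x * x)))

def vm_log_pdf(y, mu, lam):
    """Log of the assumed VM density for unit vectors y and mu, concentration lam >= 0."""
    dot = y[0] * mu[0] + y[1] * mu[1]
    return lam * dot - math.log(2.0 * math.pi) - math.log(bessel_i0(lam))

def row_log_likelihood(x_row, mus, lams):
    """Sum of per-feature log densities: one VM distribution per feature."""
    return sum(
        vm_log_pdf(to_unit_vector(x), mu, lam)
        for x, mu, lam in zip(x_row, mus, lams)
    )
```

Working in log space keeps the per-feature terms additive, which matches the statement that each feature has its own VM distribution.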

for each satellite image data item

a binary hidden variable

is specified as an indicator factor: when Z_nk = 1, the satellite image data item

belongs to the k-th category; otherwise, Z_nk = 0; the hidden variable

has the probability distribution

wherein the parameter φ_nkd is a binary parameter: when φ_nkd = 1, the feature x_nd is a relevant feature and obeys the VM probability distribution

when φ_nkd = 0, the feature x_nd is an irrelevant feature and obeys the VM probability distribution

the parameter

obeys the Bernoulli distribution:

wherein the parameter ε_kd represents the feature importance of the d-th feature in the k-th component;

the joint prior distribution being:

wherein

p_g(·) is a Gamma distribution;

the joint prior distribution being:

2. The satellite image data clustering method based on local feature selection according to claim 1, wherein the nonparametric VM mixture model is constructed using a Pitman-Yor process model based on the stick-breaking representation; in the Pitman-Yor process model based on the stick-breaking representation, the mixing coefficient π_k is expressed as follows:

obeys a Beta distribution, expressed as follows:

wherein p_b(·) is the Beta distribution, a is the discount parameter of the Pitman-Yor process model and satisfies 0 ≤ a ≤ 1, and b is the concentration parameter and satisfies b > -a.
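By way of illustration only, a truncated stick-breaking draw of the mixing coefficients might look like the sketch below. The expressions for π_k and for the Beta distribution are not reproduced in this text, so the sketch assumes the standard Pitman-Yor construction, π_k = π′_k ∏_{j<k}(1 - π′_j) with π′_k ~ Beta(1 - a, b + k·a), and uses the truncation π′_K = 1 stated in claim 4. The function name `pitman_yor_weights` is hypothetical.

```python
import random

def pitman_yor_weights(K, a, b, rng):
    """Draw K truncated stick-breaking weights under the assumed Pitman-Yor form.

    Requires 0 <= a < 1 and b > -a so that both Beta shape parameters are positive.
    """
    weights = []
    remaining = 1.0                    # length of the stick not yet assigned
    for k in range(1, K + 1):
        if k == K:
            frac = 1.0                 # truncation: the last break takes the rest
        else:
            frac = rng.betavariate(1.0 - a, b + k * a)
        weights.append(frac * remaining)
        remaining *= 1.0 - frac
    return weights

# Example draw with a fixed seed for reproducibility.
example = pitman_yor_weights(K=5, a=0.3, b=1.0, rng=random.Random(0))
```

Because the final break is fixed at 1, the K weights sum to one, and components beyond K receive zero weight.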

3. The satellite image data clustering method based on local feature selection according to claim 2, wherein estimating the model parameters of the nonparametric VM mixture model and calculating the feature importance by a variational Bayesian inference algorithm specifically comprises:

a_k = 0.5, b_k = 0.5; initializing r_nk using the K-Means algorithm; initializing

obtaining updated values from the updated expected values

obtaining the variational lower bound generated by the current iteration;

4. The satellite image data clustering method based on local feature selection according to claim 3, wherein updating the variational posterior, the expected values and the feature importance using the current model parameters specifically comprises:

defining the variational lower bound as:

L(q) = <ln p(Θ|X)> - <ln q(Θ)>

wherein <·> denotes the expected value,

π′_K = 1,

and π_k = 0 when k > K,

wherein K is the truncation level, namely the number of categories; K is initialized to an arbitrary value and reaches its optimal value at convergence;

the hyperparameters in the formula are calculated by the following formulas:

the expected values in the above are calculated by the following formulas:

<Z_nk> = r_nk (21)

<φ_nkd> = f_nkd (24)

<1 - φ_nkd> = 1 - f_nkd (25)

<ln π′_k> = Ψ(g_k) - Ψ(g_k + h_k) (28)

<ln(1 - π′_k)> = Ψ(h_k) - Ψ(g_k + h_k) (29)

wherein Ψ(·) is the digamma function;
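By way of illustration only, expectations (28) and (29) can be evaluated as in the sketch below. The digamma routine is a generic numerical approximation (recurrence followed by an asymptotic series), not something specified in the claims. The helper `expected_log_weights` additionally assumes the stick-breaking form π_k = π′_k ∏_{j<k}(1 - π′_j), and all function names are hypothetical.

```python
import math

def digamma(x):
    """Digamma for x > 0: shift x up to >= 6, then use the asymptotic series."""
    result = 0.0
    while x < 6.0:
        result -= 1.0 / x              # psi(x) = psi(x + 1) - 1/x
        x += 1.0
    inv = 1.0 / x
    inv2 = inv * inv
    series = inv2 * (1.0 / 12.0 - inv2 * (1.0 / 120.0 - inv2 / 252.0))
    return result + math.log(x) - 0.5 * inv - series

def expected_log_sticks(g, h):
    """Return (<ln pi'_k>, <ln(1 - pi'_k)>) per equations (28) and (29)."""
    e_log_v = [digamma(gk) - digamma(gk + hk) for gk, hk in zip(g, h)]
    e_log_1mv = [digamma(hk) - digamma(gk + hk) for gk, hk in zip(g, h)]
    return e_log_v, e_log_1mv

def expected_log_weights(g, h):
    """<ln pi_k> under the assumed stick-breaking product form."""
    e_log_v, e_log_1mv = expected_log_sticks(g, h)
    out, running = [], 0.0
    for lv, l1 in zip(e_log_v, e_log_1mv):
        out.append(lv + running)       # add the sum of <ln(1 - pi'_j)> for j < k
        running += l1
    return out
```

For example, with g_k = h_k = 1 both expectations equal -1, since Ψ(2) = Ψ(1) + 1.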

calculating the feature importance:

5. The satellite image data clustering method based on local feature selection according to claim 3,

wherein comparing the variational lower bound generated by the current iteration with the variational lower bound generated by the previous iteration to determine whether the nonparametric VM mixture model has converged specifically comprises:

if yes, determining that the nonparametric VM mixture model has converged;

if not, determining that the nonparametric VM mixture model has not converged.
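By way of illustration only, the iterate-and-compare loop can be sketched as follows. The exact convergence criterion is not reproduced in this text, so the sketch assumes that the model is treated as converged when the absolute change in the variational lower bound between consecutive iterations falls below a small tolerance. The names `has_converged`, `run_until_converged`, `step` and `tol` are hypothetical.

```python
def has_converged(current_bound, previous_bound, tol=1e-6):
    """Assumed criterion: the lower bound changed by less than tol."""
    return abs(current_bound - previous_bound) < tol

def run_until_converged(step, max_iter=1000, tol=1e-6):
    """Call step() repeatedly; step performs one update and returns the lower bound.

    Returns (number of iterations performed, last lower bound).
    """
    previous = float("-inf")
    for iteration in range(1, max_iter + 1):
        current = step()
        if has_converged(current, previous, tol):
            return iteration, current
        previous = current
    return max_iter, previous

# Usage with a stand-in update that replays a fixed sequence of bounds.
bounds = iter([-10.0, -5.0, -4.0, -4.0])
iterations, final_bound = run_until_converged(lambda: next(bounds))
```

The `max_iter` cap is a practical safeguard added for the sketch so that the loop always terminates.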

6. The satellite image data clustering method based on local feature selection according to claim 4,

wherein screening the satellite image data according to the feature importance to retain the important satellite image data specifically comprises:

7. The satellite image data clustering method based on local feature selection according to claim 1, wherein determining the category of each satellite image data item according to the posterior probability of the indicator factor, so as to cluster the satellite image data according to the category, specifically comprises:

obtaining, for each satellite image data item, the posterior probability of belonging to the k-th category;

selecting the category with the maximum probability as the category of that satellite image data item.
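By way of illustration only, the assignment step can be sketched as below, assuming the posterior probabilities are available as a matrix `r` whose entry `r[n][k]` plays the role of r_nk = <Z_nk> from equation (21). The function names and the list-of-lists layout are hypothetical.

```python
def assign_categories(r):
    """For each data item, pick the index k with the largest posterior r[n][k]."""
    return [max(range(len(row)), key=row.__getitem__) for row in r]

def group_by_category(labels):
    """Collect data item indices per assigned category."""
    clusters = {}
    for n, k in enumerate(labels):
        clusters.setdefault(k, []).append(n)
    return clusters

# Usage: three data items, two candidate categories.
r = [[0.1, 0.9], [0.7, 0.3], [0.2, 0.8]]
labels = assign_categories(r)
clusters = group_by_category(labels)
```

When two categories tie, this sketch keeps the lower index, which is an arbitrary choice.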

8. A satellite image data clustering device based on local feature selection is characterized by comprising:

wherein ‖·‖ denotes the L2 norm;

a modeling unit, configured to model the satellite image data using a nonparametric von Mises (VM) mixture model based on local feature selection, and specifically configured such that: for features of the satellite image data that obey the VM probability distribution p_vm, the D-dimensional data

is expressed as:

wherein,

in the formula, y_nd1 = x_nd and y_nd2 are set so as to ensure that the vector

satisfies L2-norm normalization,

is the location parameter,

and the probability density function is obtained as:

wherein each feature corresponds to a VM probability distribution;

for each satellite image data item

a binary hidden variable

is specified as an indicator factor: when Z_nk = 1, the satellite image data item

belongs to the k-th category; otherwise, Z_nk = 0; the hidden variable

has the probability distribution

wherein the parameter φ_nkd is a binary parameter: when φ_nkd = 1, the feature x_nd is a relevant feature and obeys the VM probability distribution

the parameter

obeys the Bernoulli distribution:

the joint prior distribution being:

wherein

p_g(·) is a Gamma distribution;

a VM-Gamma distribution is adopted for the parameters of the VM distribution to which the irrelevant features belong

as their joint prior distribution:

a parameter estimation unit, configured to estimate the model parameters of the nonparametric VM mixture model through a variational Bayesian inference algorithm and to calculate the feature importance;

9. A text data clustering device based on a non-parametric VMF hybrid model, comprising a memory and a processor, wherein the memory stores a satellite image data set to be clustered and a computer program, the computer program is executable by the processor to implement the satellite image data clustering method based on local feature selection according to any one of claims 1 to 7.