CN111369002A - Gibbs parameter sampling method applied to a random point pattern finite mixture model - Google Patents

Gibbs parameter sampling method applied to a random point pattern finite mixture model Download PDF

Info

Publication number
CN111369002A
CN111369002A
Authority
CN
China
Prior art keywords
distribution
model
parameter
random point
data
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202010105441.3A
Other languages
Chinese (zh)
Inventor
刘伟峰
王志
黄梓龙
丁禹心
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Hangzhou Dianzi University
Original Assignee
Hangzhou Dianzi University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Hangzhou Dianzi University filed Critical Hangzhou Dianzi University
Priority to CN202010105441.3A
Publication of CN111369002A
Legal status: Pending

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N7/00 Computing arrangements based on specific mathematical models
    • G06N7/01 Probabilistic graphical models, e.g. probabilistic networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F17/00 Digital computing or data processing equipment or methods, specially adapted for specific functions
    • G06F17/10 Complex mathematical operations
    • G06F17/18 Complex mathematical operations for evaluating statistical data, e.g. average values, frequency distributions, probability functions, regression analysis

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • Theoretical Computer Science (AREA)
  • Mathematical Analysis (AREA)
  • Computational Mathematics (AREA)
  • Mathematical Physics (AREA)
  • Mathematical Optimization (AREA)
  • Pure & Applied Mathematics (AREA)
  • Software Systems (AREA)
  • Algebra (AREA)
  • Probability & Statistics with Applications (AREA)
  • General Engineering & Computer Science (AREA)
  • Computing Systems (AREA)
  • Evolutionary Computation (AREA)
  • Artificial Intelligence (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Evolutionary Biology (AREA)
  • Operations Research (AREA)
  • Databases & Information Systems (AREA)
  • Complex Calculations (AREA)

Abstract

The invention relates to a Gibbs parameter sampling method applied to a random point pattern finite mixture model. First, a random point pattern finite mixture model and a random point pattern likelihood function are constructed; then the parameter prior distributions of the random point pattern finite mixture model are constructed, and the posterior distributions of the model parameters are obtained from the priors; finally, the number of mixture components and the model parameter values are estimated by a sampling algorithm combining the Gibbs sampling algorithm with the Bayesian information criterion. Whereas the traditional FMM describes only the feature randomness of the data, the random point pattern distribution function also describes the cardinality randomness of the data. Sampling the data with the Gibbs sampling algorithm on the basis of the RPP-FMM to obtain the model parameters avoids the situation in which the parameter estimate always falls into a local extreme point and the global extreme point cannot be obtained. The method effectively improves the modeling precision and the parameter estimation precision.

Description

Gibbs parameter sampling method applied to a random point pattern finite mixture model
Technical Field
The invention belongs to the technical field of pattern recognition, and particularly relates to a Gibbs parameter sampling method applied to a random point pattern finite mixture model.
Background
Finite mixture modeling (FMM) is a statistical modeling tool that provides an efficient mathematical method for modeling complex densities with simple densities. The finite mixture model has two core problems: selection of the mixture component density and parameter estimation of the mixture model. By virtue of its simple form and convenient computation, the Gaussian mixture model has become the most commonly applied finite mixture model. However, most actual data exhibit nonlinear, non-Gaussian characteristics, and, limited by the fitting capability of the Gaussian distribution, the Gaussian mixture model cannot describe such complex data completely, accurately, and effectively. According to whether the number of mixture components and the distribution parameters are known, problems concerning finite mixture models can be classified into supervised learning, unsupervised learning, and nonparametric model problems. Current mainstream learning algorithms fall into deterministic and non-deterministic algorithms: deterministic learning algorithms are maximum likelihood estimation algorithms represented by the expectation maximization (EM) algorithm, while non-deterministic learning algorithms are mainly Bayesian learning algorithms represented by Markov chain methods. Research on FMM mainly covers two aspects: estimating the number of mixture components and the corresponding model parameters. Since actually obtained data are mostly non-Gaussian, a Gaussian mixture model is generally adopted for approximation, and the parameters of the finite mixture model are obtained by a parameter learning algorithm.
It is worth mentioning that the conventional FMM assumes that the data points are independent, so the data likelihood function is obtained by multiplying the likelihoods of all data points. Such a model cannot characterize the random nature of the data cardinality (the number of data points) and, in some cases, may even produce contradictory estimation results.
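As a minimal illustration of this limitation, the sketch below (in Python, with an assumed univariate Gaussian mixture and an illustrative Poisson cardinality term; all parameter values are hypothetical) contrasts a conventional FMM log-likelihood, a product over the points only, with a point-pattern log-likelihood that also scores the number of points:

```python
import numpy as np
from scipy.stats import norm, poisson

def fmm_loglik(points, weights, means, stds):
    """Conventional FMM log-likelihood: a product over the points only,
    so the number of points carries no statistical information."""
    dens = sum(w * norm.pdf(points, m, s)
               for w, m, s in zip(weights, means, stds))
    return np.sum(np.log(dens))

def rpp_loglik(points, weights, means, stds, rate):
    """Point-pattern log-likelihood sketch: the same feature term plus an
    illustrative Poisson term on the cardinality |X| (assumed form)."""
    return poisson.logpmf(len(points), rate) + fmm_loglik(points, weights, means, stds)

X = np.array([0.1, -0.4, 2.3, 1.9])  # hypothetical observed point pattern
print(fmm_loglik(X, [0.5, 0.5], [0.0, 2.0], [1.0, 1.0]))
print(rpp_loglik(X, [0.5, 0.5], [0.0, 2.0], [1.0, 1.0], rate=4.0))
```

Two patterns with identical points but different sizes receive different scores only under the second form.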
Disclosure of Invention
The invention aims to provide a Gibbs parameter sampling method applied to a random point pattern finite mixture model.
In order to characterize the randomness of the data cardinality (the number of data points), the method introduces a random point pattern finite mixture model (RPP-FMM). Whereas a conventional FMM describes only the feature randomness of the data, the random point pattern distribution function also describes the cardinality of the data. The EM algorithm and the Gibbs sampling algorithm have previously been proposed to solve related problems, but the EM algorithm is easily affected by the initial value. Moreover, since the EM algorithm is deterministic, for a given initial value the parameter estimate may always fall into a local extreme point, and the global extreme point cannot be obtained. The Gibbs sampling algorithm is a random sampling algorithm and is comparatively insensitive to the initial value.
The invention provides a Gibbs parameter sampling algorithm based on the random point pattern finite mixture model (RPP-FMM). Its basic idea is to obtain the model parameters by constructing a Markov chain over the random point pattern, thereby further improving the modeling precision and the parameter estimation precision.
The method specifically comprises the following steps:
Step (1), constructing a random point pattern finite mixture model;
The point-pattern mixture model with K random sources is represented as:
f(X_n|Θ) = π_1 f(X_n|θ_1) + π_2 f(X_n|θ_2) + … + π_K f(X_n|θ_K),
where X_n denotes the nth random point pattern observation, n = 1, 2, …, N, N is the number of random point pattern observations, and X_n ∈ F(R), F(R) denoting the space of finite subsets of the real space R.
The parameter set of the point-pattern mixture model is Θ = {π_1, π_2, …, π_K, θ_1, θ_2, …, θ_K} ∈ (R^+ × Θ)^K, where R^+ denotes the positive real space; {θ_1, θ_2, …, θ_K} are the parameter variables of the random point pattern distribution functions, and {π_1, π_2, …, π_K} are the mixing weights, π_k being the mixing weight of the kth mixture component and satisfying π_k ≥ 0 and Σ_{k=1}^K π_k = 1.
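A minimal sketch of evaluating this mixture density follows; it assumes that each source density f(X|θ_k) combines a Poisson cardinality term with i.i.d. Gaussian features, using the parameter set θ_k = {ρ_k, μ_k, Σ_k} defined in step (2) below. This is one concrete choice for illustration, not necessarily the patent's exact component form.

```python
import numpy as np
from scipy.stats import multivariate_normal, poisson

def component_density(X, rho, mu, cov):
    """f(X | theta_k) for one random source, sketched as a Poisson
    cardinality term times i.i.d. Gaussian features (assumed form)."""
    card = poisson.pmf(len(X), rho)
    feats = np.prod([multivariate_normal.pdf(x, mu, cov) for x in X])
    return card * feats

def mixture_density(X, pis, thetas):
    """f(X_n | Theta) = sum_k pi_k f(X_n | theta_k)."""
    return sum(pi * component_density(X, *th) for pi, th in zip(pis, thetas))

rng = np.random.default_rng(0)
Xn = rng.normal(0.0, 1.0, size=(4, 2))  # a hypothetical pattern of 4 points in R^2
thetas = [(4.0, np.zeros(2), np.eye(2)), (6.0, np.ones(2), np.eye(2))]
print(mixture_density(Xn, [0.5, 0.5], thetas))
```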
Step (2), constructing a random point pattern likelihood function;
For the nth independent random point pattern observation X_n, the likelihood of the N observations factorizes as
p(X_{1:N}|Θ) = Π_{n=1}^N f(X_n|Θ),
where the factors f(X_n|Θ) are the likelihood functions of the mutually independent X_n, and f(x_n|θ_k) denotes the distribution function of a single point x_n in X_n. The missing variable e_n = {e_{n,1}, e_{n,2}, …, e_{n,K}}, whose kth dimension e_{n,k}, k = 1, 2, …, K, indicates the point pattern category of the single point x_n; e_n and X_n form the complete data (X_n, e_n). X_{1:N} denotes the set of N random point pattern observations and e_{1:N} the set of N missing variables.
The parameter posterior distribution of the point-pattern mixture model is represented as:
p(Θ|X) = p(X|Θ) p(Θ) / p(X),
where p(Θ) is the parameter prior distribution, p(Θ|X) is the parameter posterior distribution, the normalizing constant is p(X) = ∫ p(X|Θ) p(Θ) dΘ, and p(X|Θ) denotes the likelihood function of the random point pattern.
For the ith point x_{n,i} ∈ X_n of the nth random point pattern observation, a K-dimensional indicator variable is defined, each dimension indicating one component of the mixture distribution; exactly one dimension of the K-dimensional missing variable equals 1 and the others equal 0, denoted e_n = [e_{n,1}, …, e_{n,K}]^T with e_{n,k} ∈ {0,1} and Σ_{k=1}^K e_{n,k} = 1, where e_{n,k} = 1 indicates that the data x_n is generated from the component f(x_n|θ_k). A Gaussian mixture distribution is taken as the feature distribution of the point pattern, and the corresponding point pattern parameter set is θ_k = {ρ_k, μ_k, Σ_k}, where ρ_k is the cardinality parameter, μ_k the mean, and Σ_k the covariance matrix.
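Drawing such a one-hot indicator can be sketched as below, assuming responsibilities proportional to π_k f(x_n|θ_k); the function name and the toy numbers are illustrative only.

```python
import numpy as np

def sample_indicator(point_lik, weights, rng):
    """Draw a one-hot indicator e_n = [e_{n,1}, ..., e_{n,K}]^T, in which
    exactly one entry is 1. point_lik[k] stands for f(x_n | theta_k);
    responsibilities are taken proportional to pi_k * f(x_n | theta_k)."""
    r = np.asarray(weights) * np.asarray(point_lik)
    r = r / r.sum()
    return rng.multinomial(1, r)

rng = np.random.default_rng(0)
print(sample_indicator([0.2, 0.7, 0.1], [1 / 3, 1 / 3, 1 / 3], rng))  # e.g. [0 1 0]
```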
Step (3), establishing the parameter prior distributions of the random point pattern finite mixture model;
In the Gaussian mixture case, the prior parameters are Θ = {π_{1:K}, ρ_{1:K}, μ_{1:K}, Σ_{1:K}}. The prior distribution p(Θ) is decomposed according to the Bayesian formula:
p(Θ) = p(π_{1:K}) p(ρ_{1:K}|π_{1:K}) p(Σ_{1:K}|ρ_{1:K}, π_{1:K}) p(μ_{1:K}|Σ_{1:K}, ρ_{1:K}, π_{1:K}).
A Dirichlet distribution is adopted as the conjugate prior of the categorical distribution.
If the proportion of each mixture component is unknown, an equivalent (symmetric) Dirichlet distribution is adopted:
p(π_1, π_2, …, π_K) = Dir(α, α, …, α).
The cardinality l_k of the kth random point pattern has a prior distribution that follows a Poisson distribution.
The prior of the inverse covariance matrix follows a Wishart distribution,
p(Σ_k^{-1}) = W(V, β),
where W(V, β) denotes the Wishart distribution with parameters V and β, V being a positive definite matrix and β the degrees of freedom.
For the mean, a Gaussian distribution is used as the conjugate prior of the RPP-FMM mean, p(μ_k) = N(μ_k; μ_k^(0), Σ_k), where μ_k^(0) is the known mean of each random point pattern.
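These priors can all be sampled with standard library routines, as in the sketch below; the hyperparameter values (unit Dirichlet concentrations, Poisson rate 5.0, identity Wishart scale with d + 1 degrees of freedom, zero prior means) are placeholders, not values from the patent.

```python
import numpy as np
from scipy.stats import wishart

rng = np.random.default_rng(1)
K, d = 3, 2  # hypothetical: 3 components in 2 dimensions

# equivalent (symmetric) Dirichlet prior on the mixing weights
pis = rng.dirichlet(np.ones(K))
# Poisson prior on each cardinality l_k (rate 5.0 is a placeholder)
cards = rng.poisson(lam=5.0, size=K)
# Wishart prior W(V, beta) on each inverse covariance, V = I, beta = d + 1
precs = wishart.rvs(df=d + 1, scale=np.eye(d), size=K, random_state=42)
# Gaussian conjugate prior on each mean, centered at an assumed mu_k^(0) = 0
mus = [rng.multivariate_normal(np.zeros(d), np.linalg.inv(P)) for P in precs]
print(pis, cards, mus, sep="\n")
```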
Step (4), obtaining the posterior distributions of the model parameters from their prior distributions;
The posterior mixing weights follow a Dirichlet distribution, the posterior mean follows a Gaussian distribution, and the posterior variance follows a Wishart distribution; the parameter posterior p(Θ|X) is sampled conditionally as follows.
The mixing weights {π_1, π_2, …, π_K} satisfy a Dirichlet distribution:
p(π_1, π_2, …, π_K | ·) = Dir(α_1 + l_1, α_2 + l_2, …, α_K + l_K),
with constants α_k > 0, where l_k, k = 1, 2, …, K, is the number of observations belonging to the kth mixture component.
The missing variables {e_{n,1}, …, e_{n,K}} are estimated according to the Bayes formula from the mixing weights and the component densities, where f(x_n|θ_k) denotes the distribution function of a single point x_n in the observation X_n, ρ_k denotes the cardinality parameter of the kth random point pattern, and l_k = Σ_{n=1}^N e_{n,k} is the sum of the kth-dimension indicator values over the observations.
Cardinality distribution: the posterior of each cardinality is obtained from its Poisson prior and the observed component counts.
Covariance: the inverse of the covariance Σ_k follows a Wishart distribution, p(Σ_k^{-1} | ·) = W(V_k, β_k), where α_0 and β_0 are positive constants and the two regulating parameters satisfy M_0 > 0, N_0 > 0.
The mean μ_k satisfies a Gaussian distribution with parameters ξ_k, Σ_k and is sampled from
p(μ_k) = N(μ_k; ξ_k, Σ_k),
where ξ_k is the mean and Σ_k the covariance of the Gaussian distribution.
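Of these conditionals, the Dirichlet weight update is fully specified above; a direct sketch:

```python
import numpy as np

def sample_weights(alpha, counts, rng):
    """Conditional draw of the mixing weights:
    (pi_1, ..., pi_K) ~ Dir(alpha_1 + l_1, ..., alpha_K + l_K),
    where counts[k] = l_k is the number of points assigned to component k."""
    return rng.dirichlet(np.asarray(alpha, dtype=float) + np.asarray(counts, dtype=float))

rng = np.random.default_rng(2)
print(sample_weights(alpha=[1.0, 1.0, 1.0], counts=[12, 5, 3], rng=rng))
```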
Step (5), estimating the number of mixture components and the model parameter values by a sampling algorithm combining the Gibbs sampling algorithm with the Bayesian information criterion;
The Bayesian information criterion is defined as:
BIC(m_k, Θ_k, X_k) = -2 log L(Θ_k, m_k | X_k) + M_k ln n_k,
where M_k = 3 m_k + 2 is the number of independent parameters and log L(Θ_k, m_k | X_k) denotes the log-likelihood function of the parameter set Θ_k and the number of components m_k.
Among the candidate model class, the model minimizing the Bayesian information criterion is the preferred model, and its parameter estimate is derived as:
(m_k*, Θ_k*) = arg min_{m_k, Θ_k} BIC(m_k, Θ_k, X_k);
the number of components m_k and the parameter set Θ_k are obtained at the minimum of BIC(m_k, Θ_k).
With the parameter set Θ_k and the number of components m_k, the model parameter values are estimated according to the Gibbs sampling algorithm, specifically:
a. Initialization: set Θ^(0) = {θ_1^(0), θ_2^(0), …, θ_K^(0)}, with θ_k = {ρ_k, μ_k, Σ_k}, each θ_i sampled from the conditional density p(θ_i | θ_{-i});
b. Sample θ_1^(t+1) from p(θ_1 | θ_2^(t), …, θ_K^(t)), and so on, sampling θ_K^(t+1) from p(θ_K | θ_1^(t+1), …, θ_{K-1}^(t+1));
c. This realizes the jump from Θ^(t) to Θ^(t+1);
d. Repeat a-c to obtain a Markov chain.
The resulting Markov chain reflects the probability characteristics of the posterior distribution; the stable points of the chain are typically extreme points of the distribution and serve as the final model parameter estimates.
The beneficial effects of the invention include: in order to characterize the random nature of the data cardinality (the number of data points), the invention introduces a random point pattern finite mixture model (RPP-FMM); whereas the traditional FMM describes only the feature randomness of the data, the random point pattern distribution function also describes the cardinality randomness of the data. On the basis of the RPP-FMM, the data are sampled with the Gibbs sampling algorithm to obtain the model parameters; since the Gibbs sampling algorithm is a random sampling algorithm and comparatively insensitive to the initial value, it effectively avoids the situation in which, for a given initial value, the EM algorithm's parameter estimate may always fall into a local extreme point and the global extreme point cannot be obtained. The basic idea of the proposed Gibbs parameter sampling method applied to the random point pattern finite mixture model (RPP-FMM) is to obtain the model parameters by constructing a Markov chain over the random point pattern, thereby further improving the modeling precision and the parameter estimation precision.
Detailed Description
A Gibbs parameter sampling method applied to a random point pattern finite mixture model comprises the following specific steps:
Step (1), constructing a random point pattern finite mixture model according to the characteristics of the random point pattern:
The point-pattern mixture model with K random sources is represented as:
f(X_n|Θ) = π_1 f(X_n|θ_1) + π_2 f(X_n|θ_2) + … + π_K f(X_n|θ_K),
where X_n denotes the nth random point pattern observation, n = 1, 2, …, N, N is the number of random point pattern observations, and X_n ∈ F(R), F(R) denoting the space of finite subsets of the real space R.
The parameter set of the point-pattern mixture model is Θ = {π_1, π_2, …, π_K, θ_1, θ_2, …, θ_K} ∈ (R^+ × Θ)^K, where R^+ denotes the positive real space; {θ_1, θ_2, …, θ_K} are the parameter variables of the random point pattern distribution functions, and {π_1, π_2, …, π_K} are the mixing weights, π_k being the mixing weight of the kth mixture component and satisfying π_k ≥ 0 and Σ_{k=1}^K π_k = 1.
Step (2), for the nth independent random point pattern observation X_n, the likelihood of the N observations factorizes as
p(X_{1:N}|Θ) = Π_{n=1}^N f(X_n|Θ),
where the factors f(X_n|Θ) are the likelihood functions of the mutually independent X_n, and f(x_n|θ_k) denotes the distribution function of a single point x_n in X_n. The missing variable e_n = {e_{n,1}, e_{n,2}, …, e_{n,K}}, whose kth dimension e_{n,k}, k = 1, 2, …, K, indicates the point pattern category of the single point x_n; e_n and X_n form the complete data (X_n, e_n). X_{1:N} denotes the set of N random point pattern observations and e_{1:N} the set of N missing variables.
The parameter posterior distribution of the point-pattern mixture model is represented as:
p(Θ|X) = p(X|Θ) p(Θ) / p(X),
where p(Θ) is the parameter prior distribution, p(Θ|X) is the parameter posterior distribution, the normalizing constant is p(X) = ∫ p(X|Θ) p(Θ) dΘ, and p(X|Θ) denotes the likelihood function of the random point pattern.
For the ith point x_{n,i} ∈ X_n of the nth random point pattern observation, it is not known from which mixture component the observation was generated; therefore a K-dimensional indicator variable is defined, each dimension indicating one component of the mixture distribution. Since an observation can only be generated from one component, exactly one dimension of the K-dimensional missing variable equals 1 and the others equal 0, denoted e_n = [e_{n,1}, …, e_{n,K}]^T with e_{n,k} ∈ {0,1} and Σ_{k=1}^K e_{n,k} = 1, where e_{n,k} = 1 indicates that the data x_n is generated from the component f(x_n|θ_k). Because the Gaussian mixture distribution has good fitting performance, it is taken as the feature distribution of the point pattern, and the corresponding point pattern parameter set is θ_k = {ρ_k, μ_k, Σ_k}, where ρ_k is the cardinality parameter, μ_k the mean, and Σ_k the covariance matrix.
Step (3), establishing the parameter prior distributions of the random point pattern finite mixture model;
In the Gaussian mixture case, the prior parameters are Θ = {π_{1:K}, ρ_{1:K}, μ_{1:K}, Σ_{1:K}}. The prior distribution p(Θ) is decomposed according to the Bayesian formula:
p(Θ) = p(π_{1:K}) p(ρ_{1:K}|π_{1:K}) p(Σ_{1:K}|ρ_{1:K}, π_{1:K}) p(μ_{1:K}|Σ_{1:K}, ρ_{1:K}, π_{1:K}).
Because the mixing weights reflect the proportion of observations from each component, the component assignments follow a categorical distribution, and a Dirichlet distribution is adopted as its conjugate prior. If the proportion of each component is unknown, the simplest prior is an equivalent (symmetric) Dirichlet distribution:
p(π_1, π_2, …, π_K) = Dir(α, α, …, α).
The cardinality l_k of the kth random point pattern has a prior distribution that follows a Poisson distribution.
The prior of the inverse covariance matrix follows a Wishart distribution,
p(Σ_k^{-1}) = W(V, β),
where W(V, β) denotes the Wishart distribution with parameters V and β, V being a positive definite matrix and β the degrees of freedom.
For the mean, a Gaussian distribution is used as the conjugate prior of the RPP-FMM mean, p(μ_k) = N(μ_k; μ_k^(0), Σ_k), where μ_k^(0) is the known mean of each random point pattern.
Step (4), obtaining the posterior distributions of the model parameters from their prior distributions;
The posterior mixing weights follow a Dirichlet distribution, the posterior mean follows a Gaussian distribution, and the posterior variance follows a Wishart distribution; the parameter posterior p(Θ|X) is sampled conditionally as follows.
The mixing weights {π_1, π_2, …, π_K} satisfy a Dirichlet distribution:
p(π_1, π_2, …, π_K | ·) = Dir(α_1 + l_1, α_2 + l_2, …, α_K + l_K),
with constants α_k > 0, where l_k, k = 1, 2, …, K, is the number of observations belonging to the kth mixture component.
The missing variables {e_{n,1}, …, e_{n,K}} are estimated according to the Bayes formula from the mixing weights and the component densities, where f(x_n|θ_k) denotes the distribution function of a single point x_n in the observation X_n, ρ_k denotes the cardinality parameter of the kth random point pattern, and l_k = Σ_{n=1}^N e_{n,k} is the sum of the kth-dimension indicator values over the observations.
Cardinality distribution: the posterior of each cardinality is obtained from its Poisson prior and the observed component counts.
Covariance: the inverse of the covariance Σ_k follows a Wishart distribution, p(Σ_k^{-1} | ·) = W(V_k, β_k), where α_0 and β_0 are positive constants and the two regulating parameters satisfy M_0 > 0, N_0 > 0.
The mean μ_k satisfies a Gaussian distribution with parameters ξ_k, Σ_k and is sampled from
p(μ_k) = N(μ_k; ξ_k, Σ_k),
where ξ_k is the mean and Σ_k the covariance of the Gaussian distribution.
Step (5), estimating the number of mixture components in the mixed distribution by a model estimation algorithm combined with the Bayesian information criterion (BIC);
In the parameter estimation problem of mixture models, how to estimate the number of components (the model order) is one of the important topics of inferential statistics. The reversible-jump MCMC method samples the model order and the parameters simultaneously, determining the order by a maximum a posteriori criterion; it is a non-deterministic method. The Gibbs parameter sampling algorithm based on the random point pattern finite mixture model is a sampling algorithm combining Gibbs sampling with BIC. The posterior distribution p(Θ|X) is approximated by a Markov chain Monte Carlo method using Gibbs sampling. Gibbs sampling generally requires the conditional probability of one attribute of a sample given all the other attributes, and then derives the values of the other attributes along this conditional chain; thus, given the sample data and the prior distribution of each parameter, Gibbs sampling draws samples from the parameter posterior distribution. The resulting Markov chain reflects the probability characteristics of the posterior distribution, and the stable points of the chain are typically extreme points of the distribution, serving as the final estimates.
The Gibbs sampling algorithm is specifically as follows:
a. Initialization: set Θ^(0) = {θ_1^(0), θ_2^(0), …, θ_K^(0)}, with θ_k = {ρ_k, μ_k, Σ_k}, each θ_i sampled from the conditional density p(θ_i | θ_{-i});
b. Sample θ_1^(t+1) from p(θ_1 | θ_2^(t), …, θ_K^(t)), and so on, sampling θ_K^(t+1) from p(θ_K | θ_1^(t+1), …, θ_{K-1}^(t+1));
c. This realizes the jump from Θ^(t) to Θ^(t+1);
d. Repeat a-c to obtain a Markov chain.
On the basis of Gibbs sampling, the degree of fit between the RPP-FMM and the true data distribution is evaluated with the Bayesian information criterion (BIC), so that a simple model can express more information.
The resulting Markov chain reflects the probability characteristics of the posterior distribution; the stable points of the chain are typically extreme points of the distribution and serve as the final model parameter estimates.
The Bayesian information criterion is defined as:
BIC(m_k, Θ_k, X_k) = -2 log L(Θ_k, m_k | X_k) + M_k ln n_k,
where M_k = 3 m_k + 2 is the number of independent parameters and log L(Θ_k, m_k | X_k) denotes the log-likelihood function of the parameter set Θ_k and the number of components m_k.
Among the candidate model class, the model minimizing the Bayesian information criterion is the preferred model, and its parameter estimate is derived as:
(m_k*, Θ_k*) = arg min_{m_k, Θ_k} BIC(m_k, Θ_k, X_k);
the parameters m_k and Θ_k are obtained at the minimum of BIC(m_k, Θ_k).
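Order selection by minimizing this criterion can be sketched as follows; the crude_fit stand-in replaces the Gibbs run of step (5) and exists only to keep the example self-contained, and the data are synthetic.

```python
import numpy as np
from scipy.stats import norm

def mixture_loglik(x, pi, mu):
    """Log-likelihood of a 1-D Gaussian mixture with unit variances."""
    dens = (pi * norm.pdf(x[:, None], mu, 1.0)).sum(axis=1)
    return np.log(dens).sum()

def crude_fit(x, K, rng):
    """Stand-in for the Gibbs run: uniform weights, quantile-spaced means."""
    qs = np.quantile(x, (np.arange(K) + 0.5) / K)
    return np.full(K, 1.0 / K), qs

def select_k(x, candidates, fit, rng):
    """Score each candidate order m_k with BIC = -2 log L + (3 m_k + 2) ln n
    and keep the minimiser, as in step (5)."""
    n = len(x)
    scores = {}
    for K in candidates:
        pi, mu = fit(x, K, rng)
        scores[K] = -2.0 * mixture_loglik(x, pi, mu) + (3 * K + 2) * np.log(n)
    return min(scores, key=scores.get), scores

rng = np.random.default_rng(5)
data = np.concatenate([rng.normal(-2, 1, 80), rng.normal(2, 1, 80)])
best, scores = select_k(data, [1, 2, 3, 4], crude_fit, rng)
print(best, scores)
```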

Claims (2)

1. A Gibbs parameter sampling method applied to a random point pattern finite mixture model, characterized by comprising the following steps:
step (1), constructing a random point pattern finite mixture model;
the point-pattern mixture model with K random sources is represented as:
f(X_n|Θ) = π_1 f(X_n|θ_1) + π_2 f(X_n|θ_2) + … + π_K f(X_n|θ_K),
where X_n denotes the nth random point pattern observation, n = 1, 2, …, N, N is the number of random point pattern observations, and X_n ∈ F(R), F(R) denoting the space of finite subsets of the real space R;
the parameter set of the point-pattern mixture model is Θ = {π_1, π_2, …, π_K, θ_1, θ_2, …, θ_K} ∈ (R^+ × Θ)^K, where R^+ denotes the positive real space; {θ_1, θ_2, …, θ_K} are the parameter variables of the random point pattern distribution functions, and {π_1, π_2, …, π_K} are the mixing weights, π_k being the mixing weight of the kth mixture component and satisfying π_k ≥ 0 and Σ_{k=1}^K π_k = 1;
step (2), constructing a random point pattern likelihood function;
for the nth independent random point pattern observation X_n, the likelihood of the N observations factorizes as
p(X_{1:N}|Θ) = Π_{n=1}^N f(X_n|Θ),
where the factors f(X_n|Θ) are the likelihood functions of the mutually independent X_n, and f(x_n|θ_k) denotes the distribution function of a single point x_n in X_n; the missing variable e_n = {e_{n,1}, e_{n,2}, …, e_{n,K}}, whose kth dimension e_{n,k}, k = 1, 2, …, K, indicates the point pattern category of the single point x_n; e_n and X_n form the complete data (X_n, e_n); X_{1:N} denotes the set of N random point pattern observations and e_{1:N} the set of N missing variables;
the parameter posterior distribution of the point-pattern mixture model is represented as:
p(Θ|X) = p(X|Θ) p(Θ) / p(X),
where p(Θ) is the parameter prior distribution, p(Θ|X) is the parameter posterior distribution, the normalizing constant is p(X) = ∫ p(X|Θ) p(Θ) dΘ, and p(X|Θ) denotes the likelihood function of the random point pattern;
for the ith point x_{n,i} ∈ X_n of the nth random point pattern observation, a K-dimensional indicator variable is defined, each dimension indicating one component of the mixture distribution; exactly one dimension of the K-dimensional missing variable equals 1 and the others equal 0, denoted e_n = [e_{n,1}, …, e_{n,K}]^T with e_{n,k} ∈ {0,1} and Σ_{k=1}^K e_{n,k} = 1, where e_{n,k} = 1 indicates that the data x_n is generated from the component f(x_n|θ_k); a Gaussian mixture distribution is taken as the feature distribution of the point pattern, and the corresponding point pattern parameter set is θ_k = {ρ_k, μ_k, Σ_k}, where ρ_k is the cardinality parameter, μ_k the mean, and Σ_k the covariance matrix;
step (3), establishing the parameter prior distributions of the random point pattern finite mixture model;
in the Gaussian mixture case, the prior parameters are Θ = {π_{1:K}, ρ_{1:K}, μ_{1:K}, Σ_{1:K}}; the prior distribution p(Θ) is decomposed according to the Bayesian formula:
p(Θ) = p(π_{1:K}) p(ρ_{1:K}|π_{1:K}) p(Σ_{1:K}|ρ_{1:K}, π_{1:K}) p(μ_{1:K}|Σ_{1:K}, ρ_{1:K}, π_{1:K});
a Dirichlet distribution is adopted as the conjugate prior of the categorical distribution;
if the proportion of each mixture component is unknown, an equivalent (symmetric) Dirichlet distribution is adopted:
p(π_1, π_2, …, π_K) = Dir(α, α, …, α);
the cardinality l_k of the kth random point pattern has a prior distribution that follows a Poisson distribution;
the prior of the inverse covariance matrix follows a Wishart distribution,
p(Σ_k^{-1}) = W(V, β),
where W(V, β) denotes the Wishart distribution with parameters V and β, V being a positive definite matrix and β the degrees of freedom;
for the mean, a Gaussian distribution is used as the conjugate prior of the RPP-FMM mean, p(μ_k) = N(μ_k; μ_k^(0), Σ_k), where μ_k^(0) is the known mean of each random point pattern;
step (4), obtaining the posterior distributions of the model parameters from their prior distributions;
the posterior mixing weights follow a Dirichlet distribution, the posterior mean follows a Gaussian distribution, and the posterior variance follows a Wishart distribution; the parameter posterior p(Θ|X) is sampled conditionally as follows:
the mixing weights {π_1, π_2, …, π_K} satisfy a Dirichlet distribution:
p(π_1, π_2, …, π_K | ·) = Dir(α_1 + l_1, α_2 + l_2, …, α_K + l_K),
with constants α_k > 0, where l_k, k = 1, 2, …, K, is the number of observations belonging to the kth mixture component;
the missing variables {e_{n,1}, …, e_{n,K}} are estimated according to the Bayes formula from the mixing weights and the component densities, where f(x_n|θ_k) denotes the distribution function of a single point x_n in the observation X_n, ρ_k denotes the cardinality parameter of the kth random point pattern, and l_k = Σ_{n=1}^N e_{n,k} is the sum of the kth-dimension indicator values over the observations;
cardinality distribution: the posterior of each cardinality is obtained from its Poisson prior and the observed component counts;
covariance: the inverse of the covariance Σ_k follows a Wishart distribution, p(Σ_k^{-1} | ·) = W(V_k, β_k), where α_0 and β_0 are positive constants and the two regulating parameters satisfy M_0 > 0, N_0 > 0;
the mean μ_k satisfies a Gaussian distribution with parameters ξ_k, Σ_k and is sampled from
p(μ_k) = N(μ_k; ξ_k, Σ_k),
where ξ_k is the mean and Σ_k the covariance of the Gaussian distribution;
step (5), estimating the number of mixture components and the model parameter values by a sampling algorithm combining the Gibbs sampling algorithm with the Bayesian information criterion;
the Bayesian information criterion is defined as:
BIC(m_k, Θ_k, X_k) = -2 log L(Θ_k, m_k | X_k) + M_k ln n_k,
where M_k = 3 m_k + 2 is the number of independent parameters and log L(Θ_k, m_k | X_k) denotes the log-likelihood function of the parameter set Θ_k and the number of components m_k;
among the candidate model class, the model minimizing the Bayesian information criterion is the preferred model, and its parameter estimate is derived as:
(m_k*, Θ_k*) = arg min_{m_k, Θ_k} BIC(m_k, Θ_k, X_k);
the number of components m_k and the parameter set Θ_k are obtained at the minimum of BIC(m_k, Θ_k);
with the parameter set Θ_k and the number of components m_k, the model parameter values are estimated according to the Gibbs sampling algorithm.
2. The Gibbs parameter sampling method applied to the random point pattern finite mixture model according to claim 1, characterized in that the Gibbs sampling algorithm in step (5) estimates the model parameter values as follows:
a. initialization: set Θ^(0) = {θ_1^(0), θ_2^(0), …, θ_K^(0)}, each θ_i sampled from the conditional density p(θ_i | θ_{-i});
b. sample θ_1^(t+1) from p(θ_1 | θ_2^(t), …, θ_K^(t)), and so on, sampling θ_K^(t+1) from p(θ_K | θ_1^(t+1), …, θ_{K-1}^(t+1));
c. this realizes the jump from Θ^(t) to Θ^(t+1);
d. repeat a-c to obtain a Markov chain;
the stable points of the Markov chain are extreme points of the distribution and serve as the final model parameter estimates.
CN202010105441.3A 2020-02-20 2020-02-20 Gibbs parameter sampling method applied to a random point pattern finite mixture model Pending CN111369002A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010105441.3A CN111369002A (en) 2020-02-20 2020-02-20 Gibbs parameter sampling method applied to a random point pattern finite mixture model

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202010105441.3A CN111369002A (en) 2020-02-20 2020-02-20 Gibbs parameter sampling method applied to a random point pattern finite mixture model

Publications (1)

Publication Number Publication Date
CN111369002A true CN111369002A (en) 2020-07-03

Family

ID=71206202

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010105441.3A Pending CN111369002A (en) Gibbs parameter sampling method applied to a random point pattern finite mixture model

Country Status (1)

Country Link
CN (1) CN111369002A (en)

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111814342A (en) * 2020-07-16 2020-10-23 中国人民解放军空军工程大学 Complex equipment reliability hybrid model and construction method thereof
CN111814342B (en) * 2020-07-16 2022-10-11 中国人民解放军空军工程大学 Complex equipment reliability hybrid model and construction method thereof
CN115508624A (en) * 2022-11-23 2022-12-23 中国人民解放军国防科技大学 Electromagnetic spectrum map construction method, device and equipment based on residual Kriging method

Similar Documents

Publication Publication Date Title
Møller Shot noise Cox processes
Baek et al. Mixtures of factor analyzers with common factor loadings: Applications to the clustering and visualization of high-dimensional data
Bühlmann et al. Analyzing bagging
Gaffney et al. Curve clustering with random effects regression mixtures
Feelders Credit scoring and reject inference with mixture models
CN111369002A (en) Gibbs parameter sampling method applied to random point mode finite hybrid model
Scott et al. Nonparametric Bayesian testing for monotonicity
Trapp et al. Learning deep mixtures of gaussian process experts using sum-product networks
Wu et al. A bayesian method for guessing the extreme values in a data set?
Jank Stochastic variants of EM: Monte Carlo, quasi-Monte Carlo and more
Li et al. Bayesian classification of multiclass functional data
Lu et al. Likelihood based confidence intervals for the tail index
Bernardo et al. Non-centered parameterisations for hierarchical models and data augmentation
Legried et al. Rates of convergence in the two-island and isolation-with-migration models
Bordes et al. EM and stochastic EM algorithms for reliability mixture models under random censoring
Terejanu Tutorial on Monte Carlo Techniques
Trianasari et al. Bivariate beta mixture model with correlations
Quintana et al. Nonparametric bayesian assessment of the order of dependence for binary sequences
Biau et al. Density estimation by the penalized combinatorial method
Wang et al. Gibbs Parameter Sampling Algorithm Based on Finite Mixture Model of Random Point Pattern
Kaygusuz et al. Bootstrap in Gaussian Mixture models and Performance assesement
Eideh Parametric prediction of finite population total under Informative sampling and nonignorable nonresponse
Liu et al. SpAM: Sparse additive models
CN108268469B (en) Text classification algorithm based on mixed polynomial distribution

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication (application publication date: 20200703)