CN107844461A - Gaussian process regression calculation method based on the generalized N-body problem - Google Patents

Gaussian process regression calculation method based on the generalized N-body problem

Info

Publication number
CN107844461A
CN107844461A
Authority
CN
China
Prior art keywords
kernel function
data set
nodes
tree
matrix
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201710966946.7A
Other languages
Chinese (zh)
Inventor
何克晶
李智博
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
South China University of Technology SCUT
Original Assignee
South China University of Technology SCUT
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by South China University of Technology SCUT filed Critical South China University of Technology SCUT
Priority to CN201710966946.7A priority Critical patent/CN107844461A/en
Publication of CN107844461A publication Critical patent/CN107844461A/en
Pending legal-status Critical Current

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 17/00 Digital computing or data processing equipment or methods, specially adapted for specific functions
    • G06F 17/10 Complex mathematical operations
    • G06F 17/15 Correlation function computation including computation of convolution operations
    • G06F 17/16 Matrix or vector computation, e.g. matrix-matrix or matrix-vector multiplication, matrix factorization
    • G06F 17/18 Complex mathematical operations for evaluating statistical data, e.g. average values, frequency distributions, probability functions, regression analysis

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • Mathematical Physics (AREA)
  • Mathematical Analysis (AREA)
  • Theoretical Computer Science (AREA)
  • Mathematical Optimization (AREA)
  • Pure & Applied Mathematics (AREA)
  • Computational Mathematics (AREA)
  • Algebra (AREA)
  • Databases & Information Systems (AREA)
  • Software Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • Computing Systems (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Evolutionary Biology (AREA)
  • Operations Research (AREA)
  • Probability & Statistics with Applications (AREA)
  • Complex Calculations (AREA)

Abstract

The invention discloses a Gaussian process regression calculation method based on the generalized N-body problem, comprising: a data set partitioning method based on dual kd-trees, a dual kd-tree traversal pruning method, a kernel function matrix solving method based on high-order divide-and-conquer, and the Cholesky decomposition algorithm. The main steps are: spatially partition the input data set with kd-trees, then obtain and store simplified Euclidean distances with the dual kd-tree traversal pruning method; from the simplified Euclidean distances and their indices, use a squared exponential kernel function to obtain the kernel function matrix K* between the training data set and the test data set, and the kernel function matrix K of the training data set itself; quickly obtain the inverse matrix K^-1 of the kernel function matrix K with the Cholesky decomposition algorithm; finally, from the inverse matrix K^-1, the kernel function matrix K*, and the input objective function values, the prediction result vector can be obtained. The method improves the prediction efficiency of Gaussian process regression and its ability to process big data, promoting the wide application of Gaussian process regression in big data analysis.

Description

Gaussian process regression calculation method based on generalized N-body problem
Technical Field
The invention relates to big-data regression technology, in particular to a Gaussian process regression calculation method based on a general solution method of the generalized N-body problem, intended to improve the efficiency of big data processing and analysis.
Background
In the big data era, the collection, access, management, analysis and utilization of massive data have become a global research and application hotspot, and big data analysis is an important component of this research and application. Gaussian process regression is a machine learning algorithm developed from Bayesian theory and statistical learning theory, suitable for high-dimensional nonlinear regression problems. Compared with other classical big data analysis algorithms such as SVMs and neural networks, Gaussian process regression has the advantages of flexible nonparametric inference and probabilistically meaningful output. However, the general calculation method of Gaussian process regression suffers from a large computational load, the high-dimensional data trap, and similar drawbacks, and cannot adapt well to the massive high-dimensional data of the big data era. The generalized N-body problem is the problem of computing relations such as distances, kernels, and similarities between point pairs; it encompasses Gaussian process regression, N-point correlation functions, all-nearest-neighbors, nonparametric Bayesian classification, kernel density estimation, and related problems, and is widely applied in big data analysis. The high-order divide-and-conquer algorithm based on a space-partitioning data structure is a general calculation method for the generalized N-body problem. On this basis, the general solution method of the generalized N-body problem is improved and applied to the calculation of Gaussian process regression, overcoming the limitations of low computational efficiency and weak big data processing capability of the general Gaussian process regression algorithm, and promoting the wide application of Gaussian process regression in big data analysis.
Disclosure of Invention
In order to solve the problems, the invention provides a Gaussian process regression calculation method based on the generalized N-body problem, which can accelerate the model fitting and prediction efficiency and improve the big data adaptability of Gaussian process regression. In order to achieve the above purpose, the invention adopts the following technical scheme:
the invention discloses a Gaussian process regression calculation method based on a generalized N-body problem, which is characterized in that the Gaussian process regression calculation is cooperatively realized by utilizing a data set partition method based on a double-kd tree, a traversal pruning method based on the double-kd tree, a kernel function matrix solution based on high-order partition and a Cholesky decomposition algorithm;
the formula for the Gaussian process regression prediction is:

f* = K*^T (K + σ_n^2 I)^(-1) y

where K*^T is the transpose of the kernel function matrix K* between the training data set and the test data set, K is the kernel function matrix of the training data set itself, σ_n^2 is the noise variance, I represents the identity matrix, y is the input training objective function value vector, and f* is the predicted target value vector;
the method specifically comprises the following steps:
performing spatial division on the input data set with the dual kd-tree data set partitioning method, then obtaining and storing the simplified Euclidean distances with the dual kd-tree traversal pruning method; from the simplified Euclidean distances and related indices, using the squared exponential kernel function to obtain the kernel function matrix K* between the training data set and the test data set, and the kernel function matrix K of the training data set itself; quickly obtaining the inverse matrix K^-1 of the kernel function matrix K with the Cholesky decomposition algorithm; finally, obtaining the prediction result vector from the inverse matrix K^-1, the kernel function matrix K*, and the input objective function values;
the data set partitioning method based on the double-kd tree is used for spatially partitioning data, and in the calculation of Gaussian process regression, for a training data set R and a testing data set Q, the kd tree T of each R and Q is respectively constructed according to the same rule R 、T Q (ii) a Traversing, calculating and pruning the two kd trees simultaneously to finally obtain a kernel function matrix;
the traversal pruning method of the double-kd tree is used for traversal, calculation and pruning of the double-kd tree; respectively calculating the distance between the corresponding nodes of the two kd-trees from the respective root nodes of the two kd-trees; setting a threshold value epsilon, when the distance between the nodes obtained by calculation is larger than the value epsilon, considering the distance between the two nodes to be infinite according to the calculation property of the kernel function, pruning the two nodes, and not recursing the two nodes and all child nodes; for the nodes with the distance value smaller than the epsilon, continuing the recursive computation until all the nodes are recurred and each leaf node is computed at least once;
the kernel function matrix solving method based on the high-order division is used for solving a kernel function matrix; according to the calculation result of the double kd-tree, the value of a kernel function matrix is obtained; the kernel function is a function defining the similarity or distance between data points, so that a corresponding kernel function value is calculated according to the distance x-x' between two data points;
the Cholesky decomposition algorithm is used for inverting the kernel function matrix; the fast inversion of the kernel function matrix by using Cholesky decomposition is adopted, and for the matrix inversion process, the Cholesky decomposition algorithm is used for accelerating the inversion efficiency and accelerating the prediction efficiency of Gaussian process regression.
As a preferred technical scheme, a general solution method of the generalized N-body problem, namely the high-order divide-and-conquer algorithm based on a space-partitioning data structure, is used to respectively solve the kernel function matrix K of the training data set itself and the kernel function matrix K* between the training data set and the test data set.
As a preferred technical scheme, in the data set partitioning method based on the dual kd-tree, the same partitioning rule is applied to spatially partition the training data set and the test data set: the whole data set serves as the root node of its kd-tree; at each step the points are sorted by their value in one dimension, the partitioned object is equally divided into two parts which become the left and right child nodes of the kd-tree, and the child nodes are then partitioned in the same way; this partitioning process recurses until the leaf nodes can no longer be divided.
As a preferred technical scheme, in the traversal pruning method of the dual kd-tree, the simplified Euclidean distance between two nodes is calculated during traversal, and whether the two nodes should be pruned is judged from the distance value; the simplified Euclidean distance is defined as the Euclidean distance without the square-root operation:

dist(X, X*) = ||X - X*||^2
As a preferred technical scheme, in the dual kd-tree traversal pruning method with the simplified Euclidean distance: if the calculated simplified Euclidean distance between two nodes is smaller than the set threshold ε, the recursive search continues downward until both nodes are leaf nodes, at which point the simplified Euclidean distances between all pairs of data points in the two leaf nodes are calculated pair by pair and stored; if the calculated simplified Euclidean distance between two nodes is larger than or equal to the set threshold ε, the two nodes are pruned and none of their child nodes are recursed into. As a preferred technical scheme, during the traversal pruning of the dual kd-tree, all nodes are traversed with a depth-first strategy starting from the root node.
As a preferred technical scheme, in the kernel function matrix solving method based on high-order divide-and-conquer, the squared exponential kernel function

k(x, x') = exp(-(x - x')^2 / (2 l^2))

is selected as the kernel function of the Gaussian process regression, where l represents a bandwidth parameter.
As a preferred technical scheme, in the squared exponential kernel function, the value of (x - x')^2 comes from the aforementioned simplified Euclidean distance; when calculating the kernel function matrix, the corresponding simplified Euclidean distance value is read and the kernel function matrix is obtained through the exponential operation.
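As a sketch of this step (the function name and the bandwidth default are illustrative assumptions, not the patent's code), the kernel value can be evaluated directly from a stored squared distance, with no square root ever taken:

```python
import math

def se_kernel_from_sqdist(sq_dist, l=1.0):
    # Squared-exponential kernel k(x, x') = exp(-(x - x')^2 / (2 l^2)),
    # evaluated from a precomputed *squared* Euclidean distance
    # (the "simplified" distance), so the square root is never needed.
    return math.exp(-sq_dist / (2.0 * l * l))
```

At zero distance the kernel is exactly 1 and it decays toward 0 as the squared distance grows, which is why pruned (effectively infinite-distance) pairs can later be filled with a constant.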
As a preferred technical scheme, in the Cholesky decomposition algorithm, fast inversion of the kernel function matrix via Cholesky decomposition is applied: for the matrix inversion part of the method, the inverse of the matrix is obtained with the Cholesky decomposition algorithm, accelerating the operation efficiency of the whole process; under the same conditions, inversion via Cholesky decomposition is about 2 times faster than inversion via LU decomposition.
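A minimal NumPy sketch of this preferred scheme (an illustration under the assumption that NumPy is available; note that in practice one solves linear systems through the Cholesky factor rather than forming the explicit inverse):

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.standard_normal((5, 5))
K = A @ A.T + 5.0 * np.eye(5)      # a symmetric positive definite "kernel" matrix
y = rng.standard_normal(5)

L = np.linalg.cholesky(K)          # K = L L^T, with L lower triangular
# Solving K x = y via two triangular solves is equivalent to x = K^{-1} y,
# but cheaper and numerically more stable than explicitly inverting K.
x = np.linalg.solve(L.T, np.linalg.solve(L, y))
```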
As a preferred technical scheme, for the kernel function matrix K of the training data set itself and the kernel function matrix K* between the training data set and the test data set, the processes of dual kd-tree construction, traversal pruning, and calculation of simplified Euclidean distances and kernel function values are carried out twice, once for each matrix.
Compared with the prior art, the invention has the following advantages and effects:
the invention adopts the technical scheme of combining the high-order divide-and-conquer algorithm based on the space division data structure with the Gaussian process regression calculation method, and the high-order divide-and-conquer algorithm based on the space division data structure is a general calculation method for the generalized N-body problem. Based on the method, the general solution method of the generalized N-body problem is improved and applied to the calculation of the Gaussian process regression, so that the limitations of low calculation efficiency and low big data processing capability of the general algorithm of the Gaussian process regression are overcome, and the wide application of the Gaussian process regression in big data analysis is promoted.
Drawings
FIG. 1 is a diagram of an initial situation in which data is partitioned using a kd-Tree.
FIG. 2 is an exemplary diagram of 10 two-dimensional data points partitioned using a kd-Tree.
Fig. 3 is a diagram of a binary tree structure partitioned in fig. 2.
Fig. 4 is a flow chart of the dual tree algorithm therein.
FIG. 5 is an operation diagram of a two-tree algorithm recursive to two nodes.
FIG. 6 is an operation diagram of the dual-tree algorithm when recursion occurs to the child nodes of the node.
FIG. 7 is a flow chart of a Gaussian process regression calculation method based on the generalized N-body problem.
Detailed Description
The present invention will be described in further detail with reference to examples and drawings, but the embodiments of the present invention are not limited thereto.
Examples
Gaussian process regression is a nonparametric regression method with the advantages of flexible nonparametric inference and probabilistically meaningful output. However, its general calculation method suffers from a large computational load, the high-dimensional data trap, and similar drawbacks, and is therefore limited when analyzing big data. The generalized N-body problem is the problem of computing relations such as distances, kernels, and similarities between point pairs, and the high-order divide-and-conquer algorithm based on a space-partitioning data structure is a general solution method for it. Gaussian process regression belongs to the generalized N-body problems, so its kernel function matrix calculation can be optimized with this general solution method, improving the prediction efficiency of Gaussian process regression; fig. 7 shows the flowchart of the Gaussian process regression calculation method based on the generalized N-body problem. The kd-tree is a space-partitioning data structure that is easy to implement and efficient. The invention uses the kd-tree as the space-partitioning data structure, partitions the training data set and the test data set, and traverses and prunes the two trees to obtain the kernel function matrix. A predicted value vector is then calculated from the computed kernel function matrix and the input training set targets.
The kd-tree, i.e. the k-dimensional tree, is a binary tree whose nodes store k-dimensional data. As shown in fig. 1, the kd-tree bisects a whole set of data points into two parts according to a distance measure on the data points. A kd-tree built on a data set of dimension k is a partition of the k-dimensional space spanned by the data set; any node in the tree corresponds to a k-dimensional hyper-rectangular region. FIG. 2 is an example of partitioning 10 two-dimensional data points with a kd-tree, and FIG. 3 is the binary tree hierarchy of the partition in FIG. 2. Binary search trees are commonly used for interval searching over one-dimensional data; since the kd-tree is structurally similar to the binary search tree, their search rules are closely related. In the invention, a depth-first strategy is applied to traverse each node of the kd-tree.
For the training data set and the test data set, respective kd-trees are constructed according to the same rule. In the construction process, a whole data set serves as the root of its kd-tree; at each step the component of the current node's data points along one dimension is used as the comparison key, the points are equally divided into two parts forming the left and right child nodes, and the newly created child nodes are partitioned recursively until the leaf nodes can no longer be divided. If T(n) is the time to construct the tree on n data points, the construction satisfies the recurrence

T(n) = 2 T(n/2) + O(n)

Solving this recurrence gives O(n log n), i.e. the time complexity of constructing a kd-tree over data points occupying O(n) storage space is O(n log n).
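The construction rule above can be sketched in Python (a hypothetical illustration, not the patent's implementation; the `KDNode` class and the `leaf_size` parameter are assumptions):

```python
# Minimal sketch of the kd-tree construction rule described above:
# at each level, sort by one coordinate, split at the median, and
# recurse until a node can no longer be divided.
from dataclasses import dataclass
from typing import List, Optional, Tuple

@dataclass
class KDNode:
    points: List[Tuple[float, ...]]        # all points under this node
    left: Optional["KDNode"] = None
    right: Optional["KDNode"] = None

def build_kd_tree(points, depth=0, leaf_size=1):
    if len(points) <= leaf_size:
        return KDNode(list(points))        # leaf: no further split
    axis = depth % len(points[0])          # cycle through the k dimensions
    pts = sorted(points, key=lambda p: p[axis])
    mid = len(pts) // 2                    # equal split into two halves
    return KDNode(pts,
                  build_kd_tree(pts[:mid], depth + 1, leaf_size),
                  build_kd_tree(pts[mid:], depth + 1, leaf_size))
```

Since each level sorts and splits the points it receives, the construction cost follows the T(n) = 2T(n/2) + O(n log n)-style recurrence discussed above (a plain O(n) median selection would match the stated bound exactly).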
After the kd-trees are constructed, the dual-tree pruning and traversal can be carried out. Fig. 4 is a flowchart of the dual-tree algorithm. Taking the calculation of the kernel function matrix between the training data set and the test data set as an example, let the kd-tree built on the training data set be T_R and the kd-tree built on the test data set be T_Q. As shown in fig. 5, starting from the root nodes of the two trees, the simplified Euclidean distance between two nodes is calculated:
dist(X, X*) = ||X - X*||^2
This reduces, to some degree, the influence of the curse of dimensionality on computing the distance between two data points, and at the same time omits the square-root operation, saving a considerable amount of computation without losing accuracy in distance comparisons. Starting from the root nodes, a depth-first search strategy recurses to lower nodes, as shown in fig. 6: if the simplified Euclidean distance between two nodes is less than ε, recursion continues to the lower nodes; if the simplified Euclidean distance between the two nodes is greater than or equal to ε, the pair of nodes is pruned. Here ε is a user-set parameter that controls the degree of pruning. Let N_q be a node in the kd-tree of the test data set and N_r a node in the kd-tree of the training data set. During the traversal pruning of the two trees, the following four cases arise:
(1) The distance between the two nodes is larger than the set ε: the two nodes are pruned, and the nodes below them are not recursed into;
(2) Both nodes are leaf nodes: the simplified Euclidean distances between all point pairs in the two nodes are calculated and stored, and the algorithm does not recurse further;
(3) N_q is a leaf node and N_r is an intermediate node: recurse on the N_r node;
(4) N_q is an intermediate node and N_r is a leaf node: recurse on the N_q node.
In cases (3) and (4), the pseudocode of the dual-tree traversal algorithm is as follows (N_rc1 and N_rc2 denote the children of N_r, and N_qc1 and N_qc2 the children of N_q):

{case 3: N_q is a leaf node and N_r is an intermediate node}
if Distance(N_q, N_rc1) < Distance(N_q, N_rc2)
    DualTreeTraversal(N_q, N_rc1)
    DualTreeTraversal(N_q, N_rc2)
else
    DualTreeTraversal(N_q, N_rc2)
    DualTreeTraversal(N_q, N_rc1)

{case 4: N_q is an intermediate node and N_r is a leaf node}
if Distance(N_qc1, N_r) < Distance(N_qc2, N_r)
    DualTreeTraversal(N_qc1, N_r)
    DualTreeTraversal(N_qc2, N_r)
else
    DualTreeTraversal(N_qc2, N_r)
    DualTreeTraversal(N_qc1, N_r)
The Distance() function calculates the simplified Euclidean distance between two nodes, and the DualTreeTraversal() function performs the recursive traversal. The simplified Euclidean distance values obtained from the dual-tree traversal and pruning, together with their indices, are stored in a heap heapK and serve as input parameters for the subsequent kernel function matrix calculation.
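The whole traversal, including the ε pruning rule and the leaf-leaf base case, can be sketched as runnable Python. This is an illustrative reconstruction, not the patent's code: the `Node`/`build` helpers are hypothetical, and the node-to-node distance is taken here to be the minimum squared distance between bounding boxes, one plausible reading of the "distance between nodes".

```python
import itertools

class Node:
    def __init__(self, points, left=None, right=None):
        self.points, self.left, self.right = points, left, right

def build(points, depth=0, leaf_size=2):
    # Same median-split construction as the kd-tree described earlier.
    if len(points) <= leaf_size:
        return Node(points)
    axis = depth % len(points[0])
    pts = sorted(points, key=lambda p: p[axis])
    mid = len(pts) // 2
    return Node(pts, build(pts[:mid], depth + 1, leaf_size),
                     build(pts[mid:], depth + 1, leaf_size))

def sq_dist(a, b):
    # "Simplified" Euclidean distance: squared norm, no square root.
    return sum((x - y) ** 2 for x, y in zip(a, b))

def node_sq_dist(na, nb):
    # Minimum squared distance between the bounding boxes of two nodes
    # (an assumption: the patent does not pin down the node distance).
    d = 0.0
    for dim in range(len(na.points[0])):
        lo1 = min(p[dim] for p in na.points); hi1 = max(p[dim] for p in na.points)
        lo2 = min(p[dim] for p in nb.points); hi2 = max(p[dim] for p in nb.points)
        gap = max(lo2 - hi1, lo1 - hi2, 0.0)
        d += gap * gap
    return d

def dual_tree_traversal(nq, nr, eps, heap_k):
    if node_sq_dist(nq, nr) >= eps:
        return                                  # case (1): prune, "infinitely far"
    if nq.left is None and nr.left is None:     # case (2): leaf-leaf
        for q, r in itertools.product(nq.points, nr.points):
            heap_k[(tuple(q), tuple(r))] = sq_dist(q, r)
    elif nq.left is None:                       # case (3): recurse on N_r
        dual_tree_traversal(nq, nr.left, eps, heap_k)
        dual_tree_traversal(nq, nr.right, eps, heap_k)
    elif nr.left is None:                       # case (4): recurse on N_q
        dual_tree_traversal(nq.left, nr, eps, heap_k)
        dual_tree_traversal(nq.right, nr, eps, heap_k)
    else:                                       # both internal: recurse on both
        for a in (nq.left, nq.right):
            for b in (nr.left, nr.right):
                dual_tree_traversal(a, b, eps, heap_k)
```

With ε set to infinity no pair is ever pruned and heap_k holds all |Q|·|R| squared distances; a smaller ε drops far-apart node pairs, whose kernel values are later filled with the "infinite distance" constant θ.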
When all nodes have been visited and each leaf node has been computed at least once, the kernel function matrix is calculated from the obtained heapK. The covariance value corresponding to each index can be obtained from the definition of the squared exponential kernel function and the simplified Euclidean distance. The following is the algorithmic pseudocode to compute the kernel function matrix:

covK ← θ
for m in range(X)
    for i in range(k)
        covK[i][index] ← kernel[m][i]
In the above algorithm, the kernel function matrix covK is first initialized to θ. The value of θ equals the value of the covariance function when the distance between two data points is infinite. For pruned nodes, the distance between the relevant data point pairs can be considered infinite, i.e. the covariance function value between those data points is θ. Next, for each simplified Euclidean distance value in heapK, the corresponding kernel function value is computed and written to the corresponding position of covK according to its index. kernel stands for the operation of the kernel function:
kernel ← exp(-0.5 × dist_from_heapK / l^2)
where l is a bandwidth parameter and the default value is 1.0.
According to the method above, two kernel function matrices are calculated: the kernel function matrix K* between the training data set and the test data set, and the kernel function matrix K of the training data set itself. Since the kd-tree of the training data set is used in both calculations, in actual practice each kd-tree needs to be constructed only once.
Consider the case of no noise, i.e. σ_n^2 = 0, in which the prediction formula reduces to f* = K*^T K^-1 y.
For the kernel function matrix K of the training data set itself, its inverse matrix needs to be solved. The invention applies the Cholesky algorithm to solve the inverse matrix, accelerating the computation. The Cholesky decomposition factors a positive definite matrix into the product of a lower triangular matrix and its conjugate transpose, and its computational efficiency is about 2 times that of LU decomposition.
From the kernel function matrix K* between the training data set and the test data set and the kernel function matrix K of the training data set itself, the prediction result vector is obtained. The following is the algorithmic pseudocode of the model fitting and prediction sections.
L ← cholesky(K)
α ← L^T \ (L \ y)
f* ← K*^T α

In the pseudocode, L is the lower triangular matrix obtained by Cholesky decomposition of the matrix K, α is the product of the inverse of matrix K and the target y, and f* = K*^T α is the final prediction result. If the variance of the prediction is also required, it can be calculated from the following algorithm pseudocode:
υ ← L \ K*
V[f*] ← K** - υ^T υ

where K** denotes the kernel function matrix of the test data set with itself.
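Putting the fitting and prediction pseudocode together, here is a NumPy sketch of the standard Cholesky-based GP prediction (following Rasmussen and Williams' Algorithm 2.1; the function name and the noise handling are illustrative assumptions, not the patent's code):

```python
import numpy as np

def gp_predict(K, K_star, K_starstar, y, sigma_n=0.0):
    # K:          n x n kernel matrix of the training set itself
    # K_star:     n x m kernel matrix between training and test sets
    # K_starstar: m x m kernel matrix of the test set itself
    n = K.shape[0]
    L = np.linalg.cholesky(K + sigma_n**2 * np.eye(n))   # K + sigma^2 I = L L^T
    alpha = np.linalg.solve(L.T, np.linalg.solve(L, y))  # alpha = (K + sigma^2 I)^{-1} y
    mean = K_star.T @ alpha                              # predictive mean f*
    v = np.linalg.solve(L, K_star)                       # v = L \ K*
    var = np.diag(K_starstar) - np.sum(v * v, axis=0)    # predictive variance V[f*]
    return mean, var
```

At the training inputs themselves, with negligible noise, the predictive mean should reproduce the training targets and the predictive variance should be near zero, which makes a convenient sanity check.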
FIG. 7 is a flowchart of a Gaussian process regression calculation method based on generalized N-body problem, i.e. a flowchart of the whole calculation process of the present invention.
The above embodiments are preferred embodiments of the present invention, but the present invention is not limited to the above embodiments, and any other changes, modifications, substitutions, combinations, and simplifications which do not depart from the spirit and principle of the present invention should be construed as equivalents thereof, and all such modifications are intended to be included in the scope of the present invention.

Claims (10)

1. A Gaussian process regression calculation method based on a generalized N-body problem is characterized in that the method utilizes a data set partition method based on a double-kd tree, a traversal pruning method of the double-kd tree, a kernel function matrix solution based on high-order partition and a Cholesky decomposition algorithm to cooperatively realize the Gaussian process regression calculation;
the formula for the Gaussian process regression prediction is:

f* = K*^T (K + σ_n^2 I)^(-1) y

where K*^T is the transpose of the kernel function matrix K* between the training data set and the test data set, K is the kernel function matrix of the training data set itself, σ_n^2 is the noise variance, I represents the identity matrix, y is the input training objective function value vector, and f* is the predicted target value vector;
the method specifically comprises the following steps:
performing spatial division on the input data set with the dual kd-tree data set partitioning method, then obtaining and storing the simplified Euclidean distances with the dual kd-tree traversal pruning method; from the simplified Euclidean distances and related indices, using the squared exponential kernel function to obtain the kernel function matrix K* between the training data set and the test data set, and the kernel function matrix K of the training data set itself; quickly obtaining the inverse matrix K^-1 of the kernel function matrix K with the Cholesky decomposition algorithm; finally, obtaining the prediction result vector from the inverse matrix K^-1, the kernel function matrix K*, and the input objective function values;
the data set partitioning method based on the dual kd-tree is used for spatially partitioning the data; in the calculation of Gaussian process regression, for a training data set R and a test data set Q, the kd-trees T_R and T_Q of R and Q are respectively constructed according to the same rule; by traversing, computing and pruning the two kd-trees simultaneously, the kernel function matrix is finally obtained;
the traversal pruning method of the dual kd-tree is used for traversing, computing, and pruning the two kd-trees; starting from the respective root nodes, the distance between corresponding nodes of the two kd-trees is calculated; a threshold ε is set, and when the calculated distance between two nodes is larger than ε, the distance between them is treated as infinite by the computational properties of the kernel function, the two nodes are pruned, and neither these two nodes nor any of their child nodes are recursed into; for node pairs whose distance is smaller than ε, the recursive computation continues until all nodes have been visited and each leaf node has been computed at least once;
the kernel function matrix solving method based on high-order divide-and-conquer is used for solving the kernel function matrix; the values of the kernel function matrix are obtained from the calculation results of the dual kd-trees; the kernel function is a function defining the similarity or distance between data points, so the corresponding kernel function value is calculated from the distance between two data points x and x';
the Cholesky decomposition algorithm is used for inverting the kernel function matrix; Cholesky decomposition is applied to invert the kernel function matrix quickly, accelerating both the matrix inversion and the overall prediction efficiency of Gaussian process regression.
2. The Gaussian process regression calculation method based on the generalized N-body problem according to claim 1, characterized in that a general solution method of the generalized N-body problem, i.e. the high-order divide-and-conquer algorithm based on a space-partitioning data structure, is applied to respectively solve the kernel function matrix K of the training data set itself and the kernel function matrix K* between the training data set and the test data set.
3. The method for gaussian process regression calculation based on generalized N-body problem as claimed in claim 1, wherein the method is based on dual kd-tree dataset partitioning, the training dataset and the testing dataset are respectively spatially partitioned by applying the same partitioning rule, that is, the whole dataset is used as the root node of each kd-tree, each time the data sets are sorted according to the value of a certain dimension of each data point, the partitioned object is equally divided into two parts which are used as the left and right child nodes of the kd-tree, and then the child nodes are respectively partitioned, and the above partitioning process is recursed until the leaf nodes can not be further partitioned.
4. The Gaussian process regression calculation method based on the generalized N-body problem according to claim 1, characterized in that in the traversal pruning method of the dual kd-tree, the simplified Euclidean distance between two nodes is calculated during traversal, and whether the two nodes should be pruned is judged from the distance value; the simplified Euclidean distance is defined as the Euclidean distance without the square-root operation:

dist(X, X*) = ||X - X*||^2
5. the Gaussian process regression calculation method based on the generalized N-body problem is characterized in that a double-kd tree traversal pruning method of simplified Euclidean distance is used, if the simplified Euclidean distance between two nodes obtained through calculation is smaller than a set threshold epsilon, downward recursive search is continued until the two nodes are leaf nodes, and at the moment, the simplified Euclidean distance of all data points between the two leaf nodes is calculated pair by pair and stored; and if the simplified Euclidean distance between the two nodes obtained by calculation is larger than or equal to a set threshold value epsilon, pruning the two nodes, and not recursively calculating all the following child nodes.
6. The method of claim 4, wherein in the traversal pruning process of the dual kd-tree, each node of each tree is visited starting from the root node using a depth-first strategy.
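The pruning rule of claims 4 to 6 can be sketched as follows. This is an illustrative Python sketch, not the patent's implementation: it assumes dict-based nodes that carry their points, and uses an axis-aligned bounding-box lower bound as the "simplified Euclidean distance between two nodes":

```python
import numpy as np

def min_sq_dist(a, b):
    """Lower bound on the squared Euclidean distance between any point
    of node a and any point of node b, via their bounding boxes."""
    lo_a, hi_a = a["points"].min(0), a["points"].max(0)
    lo_b, hi_b = b["points"].min(0), b["points"].max(0)
    gap = np.maximum(0.0, np.maximum(lo_a - hi_b, lo_b - hi_a))
    return float(np.sum(gap ** 2))

def dual_traverse(q, r, eps, out):
    """Depth-first dual-tree traversal: prune a node pair whose minimum
    squared distance is >= eps; at a leaf pair, compute and store all
    pairwise squared distances (no square root is ever taken)."""
    if min_sq_dist(q, r) >= eps:
        return                            # prune: skip all descendants
    if q["leaf"] and r["leaf"]:
        diff = q["points"][:, None, :] - r["points"][None, :, :]
        out.append((diff ** 2).sum(-1))   # simplified Euclidean distances
        return
    qs = [q] if q["leaf"] else [q["left"], q["right"]]
    rs = [r] if r["leaf"] else [r["left"], r["right"]]
    for qc in qs:                         # recurse into child pairs
        for rc in rs:
            dual_traverse(qc, rc, eps, out)
```

With a large ε the traversal degenerates to the full pairwise computation; a tight ε discards far-apart subtrees in one comparison, which is where the speed-up over the naive O(N²) distance computation comes from.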
7. The method of Gaussian process regression calculation based on the generalized N-body problem as claimed in claim 1, wherein in the kernel function matrix solving method based on higher-order divide and conquer, the squared exponential kernel function

k(x, x') = exp(−(x − x')² / (2l²))

is selected as the kernel function of the Gaussian process regression, where l denotes the bandwidth parameter.
8. The method of claim 7, wherein the (x − x')² values are obtained from the foregoing simplified Euclidean distances; when computing the kernel function matrix, the corresponding simplified Euclidean distance value is read, and the kernel function matrix is obtained through an exponential operation.
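A minimal sketch of claims 7 and 8: the squared exponential kernel evaluated directly from precomputed simplified (squared) Euclidean distances, so no distance is ever recomputed. The function name and default bandwidth are assumptions, not from the patent:

```python
import numpy as np

def se_kernel_from_sqdist(sq_dist, l=1.0):
    """Squared exponential kernel k(x, x') = exp(-(x - x')^2 / (2 l^2)),
    applied elementwise to a scalar or array of precomputed squared
    Euclidean distances; l is the bandwidth parameter."""
    return np.exp(-np.asarray(sq_dist) / (2.0 * l ** 2))
```

Because the stored distances already omit the square root, the kernel needs only one division and one exponential per entry.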
9. The method for Gaussian process regression calculation based on the generalized N-body problem as claimed in claim 1, wherein fast inversion of the kernel function matrix via Cholesky decomposition is applied: for the matrix inversion part, the inverse of the matrix is obtained using the Cholesky decomposition algorithm, which accelerates the whole procedure; under the same conditions, inversion by Cholesky decomposition is about twice as fast as inversion by LU decomposition.
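The Cholesky step of claim 9 can be sketched as two triangular solves in place of an explicit inverse. This is a hypothetical helper (the function name and the small noise jitter added for numerical stability are assumptions, not from the patent):

```python
import numpy as np

def gp_predict_mean(K, K_star, y, noise=1e-6):
    """Predictive mean of GP regression, K_star (K + noise I)^{-1} y,
    computed via a Cholesky factorization K = L L^T instead of an
    explicit matrix inverse."""
    n = K.shape[0]
    L = np.linalg.cholesky(K + noise * np.eye(n))
    # two triangular solves replace the matrix inversion
    alpha = np.linalg.solve(L.T, np.linalg.solve(L, y))
    return K_star @ alpha
```

Factoring once and reusing `L` for every solve is also what makes repeated predictions cheap: the O(n³) cost is paid once, and each subsequent right-hand side costs only O(n²).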
10. The method for Gaussian process regression calculation based on the generalized N-body problem according to any one of claims 1 to 9, wherein for the kernel function matrix K of the training dataset itself and the kernel function matrix K* between the training dataset and the test dataset, the processes of dual kd-tree construction, traversal pruning, and computation of the simplified Euclidean distances and kernel function values are carried out twice, once for each of the two matrices.
CN201710966946.7A 2017-10-17 2017-10-17 A kind of Gaussian process based on broad sense N body problems returns computational methods Pending CN107844461A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201710966946.7A CN107844461A (en) 2017-10-17 2017-10-17 A kind of Gaussian process based on broad sense N body problems returns computational methods

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201710966946.7A CN107844461A (en) 2017-10-17 2017-10-17 A kind of Gaussian process based on broad sense N body problems returns computational methods

Publications (1)

Publication Number Publication Date
CN107844461A true CN107844461A (en) 2018-03-27

Family

ID=61662266

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201710966946.7A Pending CN107844461A (en) 2017-10-17 2017-10-17 A kind of Gaussian process based on broad sense N body problems returns computational methods

Country Status (1)

Country Link
CN (1) CN107844461A (en)

Cited By (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112308122A (en) * 2020-10-20 2021-02-02 中国刑事警察学院 High-dimensional vector space sample fast searching method and device based on double trees
CN112308122B (en) * 2020-10-20 2024-03-01 中国刑事警察学院 High-dimensional vector space sample rapid searching method and device based on double trees
CN112711013A (en) * 2020-12-14 2021-04-27 中国船舶重工集团公司第七一五研究所 Rapid self-adaptive beam forming algorithm based on block matrix
CN112711013B (en) * 2020-12-14 2022-10-21 中国船舶重工集团公司第七一五研究所 Rapid self-adaptive beam forming method based on block matrix


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication

Application publication date: 20180327