CN110334754A - A method for fast classification of stellar spectra - Google Patents


Info

Publication number
CN110334754A
Authority
CN
China
Prior art keywords
distance
cluster
calculated
density
stellar spectrum
Prior art date
Legal status
Pending
Application number
CN201910562679.6A
Other languages
Chinese (zh)
Inventor
栗雅婷
蔡江辉
杨海峰
张继福
赵旭俊
Current Assignee
Taiyuan University of Science and Technology
Original Assignee
Taiyuan University of Science and Technology
Priority date
2019-06-26
Filing date
2019-06-26
Publication date
2019-10-15
Application filed by Taiyuan University of Science and Technology
Priority to CN201910562679.6A
Publication of CN110334754A
Legal status: Pending (current)


Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/23 Clustering techniques
    • G06F18/24 Classification techniques
    • G06F18/241 Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
    • G06F18/2413 Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches based on distances to training or reference patterns
    • G06F18/24133 Distances to prototypes
    • G06F18/24137 Distances to cluster centroïds

Landscapes

  • Engineering & Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Theoretical Computer Science (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Artificial Intelligence (AREA)
  • Evolutionary Biology (AREA)
  • Evolutionary Computation (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The invention provides a method for fast classification of stellar spectra. It first finds a representative spectrum of each class to serve as a cluster center, and then clusters the remaining spectra according to their distances to these representative spectra. In general, cluster centers are data points that have a relatively high density within a small radius and lie far from one another; the invention therefore determines the initial cluster centers using MNN (M nearest neighbors), density and distance. Starting from the characteristics of the spectral data themselves, it computes the density and distance features of every stellar spectrum and constructs a probability function that reflects how likely each spectrum is to become a cluster center; the stellar spectra with the highest probability are selected as initial cluster centers, which improves clustering accuracy.

Description

A method for fast classification of stellar spectra
Technical field
The invention relates to a method for fast classification of stellar spectra, specifically the classification of the stellar spectral data captured by LAMOST, and belongs to the field of data mining technology.
Background art
LAMOST is a large optical telescope independently designed and built in China, and its construction posed considerable technical challenges. As the telescope with the highest spectrum acquisition rate, LAMOST is expected to break through the "bottleneck" of spectroscopic observation in astronomical research and to become the most powerful spectroscopic survey telescope. Its most prominent features are a large aperture (4 m), a wide field of view (5 degrees), and a very large-scale spectroscopic observation system made up of 4000 optical fibers. LAMOST has made a major contribution to the spectroscopic observation of tens of millions of galaxies, quasars and Milky Way objects, including large numbers of stars. With the continued execution of large survey projects and the emergence of new observation techniques, large data sets have been obtained; the LAMOST pilot survey alone released more than 480,000 spectra, covering stars, galaxies and some objects of unknown type. The stellar spectra taken by LAMOST include several types such as A, F, G, K and M, and classifying these spectra by hand alone would require a great deal of time and effort.
Data mining is the process of discovering interesting patterns and knowledge in massive data. Clustering is a typical unsupervised algorithm and plays an important role in data mining. The purpose of clustering is to group a set of data objects into multiple groups, or clusters, so that objects within a cluster are highly similar to one another and highly dissimilar to objects in other clusters. Traditional clustering algorithms can be roughly divided into partitioning methods, hierarchical methods, density-based methods, grid-based methods and model-based methods. Most clustering algorithms face challenges such as the difficulty of selecting cluster centers, the need to fix the number of clusters K manually, and low clustering accuracy.
Summary of the invention
To solve the problem of stellar spectral classification, the invention discloses a method for fast classification of stellar spectra. Starting from the characteristics of the spectral data themselves, the invention computes the density and distance features of every stellar spectrum and constructs a probability function that reflects how likely each spectrum is to become a cluster center; the stellar spectra with the highest probability are selected as initial cluster centers, which improves clustering accuracy.
The method for fast classification of stellar spectra provided by the invention comprises the following steps:
S1: Collect the stellar spectra captured by LAMOST and normalize them; each spectrum is treated as one object;
S2: Compute the distance d between any two objects;
S3: Taking each object as the center, find its M nearest neighbors and define the distance from the center to the M-th (farthest) of these neighbors as r; each repeated pairwise distance is computed only once;
S4: Compute the mean of all r, denoted R;
S5: Compute the density ρ of each object within its R-neighborhood;
S6: Divide the density ρ of each object by its corresponding M-nearest-neighbor distance r; the result is denoted Pro;
S7: Sort Pro in descending order and output the objects with the top K values as the K initial cluster centers, i.e., the K representative spectra;
S8: For each remaining object, compute its distance to every cluster center and, following the nearest-distance principle, assign it to the cluster of the corresponding initial center; a smaller distance means greater similarity, so each object is assigned to its most similar cluster.
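As an illustration only, step S8 can be sketched in Python with the standard library. This assumes the K centers are already known and that every spectrum is a list of floats of the same length; the function name `assign_to_nearest` is hypothetical. For brevity the sketch labels every object, including the centers themselves, which (when the centers are distinct) simply receive their own index.

```python
import math


def assign_to_nearest(data, centers):
    """Step S8: label each object with the index of its nearest center (Euclidean distance)."""
    labels = []
    for x in data:
        dists = [math.dist(x, c) for c in centers]
        # A smaller distance means a more similar spectrum, so pick the minimum.
        labels.append(dists.index(min(dists)))
    return labels


if __name__ == "__main__":
    spectra = [[0.0, 0.0], [9.0, 9.0], [1.0, 0.0]]
    centers = [[0.0, 0.0], [10.0, 10.0]]
    print(assign_to_nearest(spectra, centers))  # [0, 1, 0]
```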
As a further refinement, in step S1 the stellar spectra captured by LAMOST are collected and normalized, specifically: each spectrum is treated as one object. Assume object $x_i$ contains P-dimensional data, i.e., $x_i = (x_{i1}, x_{i2}, \ldots, x_{iP})$. Compute the mean [formula not reproduced]; the coordinates after normalization are [formula not reproduced]. Because the raw data contain multiple features and more than 3000 dimensions, only one of these features is extracted, to simplify computation.
As a further refinement, in step S2 the distance d between any two objects is computed, specifically: let $x_i = (x_{i1}, \ldots, x_{iP})$ and $x_j = (x_{j1}, \ldots, x_{jP})$ be any two spectra in the data set; the Euclidean distance between $x_i$ and $x_j$ is computed as
$$d_P(x_i, x_j) = \sqrt{\sum_{a_m \in P} \big(f(x_i, a_m) - f(x_j, a_m)\big)^2}$$
where $d_P(x_i, x_j)$ is the distance between $x_i$ and $x_j$ with respect to the attribute set P, $a_m \in P$, P is the set of attributes of an object, and $f(x_i, a_m)$ is the value of object $x_i$ on attribute $a_m$.
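For illustration only, step S2 can be sketched in Python with nothing but the standard library. This assumes each spectrum has already been normalized into a list of P floats, all of the same length; the names `euclidean` and `distance_matrix` are hypothetical. Filling both halves of the matrix from one computation mirrors the rule that each repeated distance is computed only once.

```python
import math


def euclidean(xi, xj):
    """d_P(x_i, x_j): square root of the summed squared differences over all attributes."""
    # zip assumes both spectra have the same length P.
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(xi, xj)))


def distance_matrix(spectra):
    """Symmetric n x n distance matrix; each unordered pair is computed only once."""
    n = len(spectra)
    d = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            d[i][j] = d[j][i] = euclidean(spectra[i], spectra[j])
    return d


if __name__ == "__main__":
    D = distance_matrix([[0.0, 0.0], [3.0, 4.0], [6.0, 8.0]])
    print(D[0][1], D[0][2], D[1][2])  # 5.0 10.0 5.0
```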
As a further refinement, in step S4 the mean of all r is computed as
$$R = \frac{1}{n}\sum_{i=1}^{n} r_i$$
where n is the number of elements in the data set.
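Steps S3 and S4 can be sketched as follows, taking the symmetric distance matrix from S2 as a list of lists. Excluding the object itself when sorting its row is an assumption based on Definition 1 (the text does not state whether the zero self-distance counts); the function names `m_nearest_radius` and `mean_radius` are hypothetical.

```python
def m_nearest_radius(dist, m):
    """r_i for every object: distance to its M-th nearest neighbour (step S3)."""
    radii = []
    for i, row in enumerate(dist):
        # Sort the distances to all *other* objects in ascending order;
        # the object itself (distance 0) is excluded by assumption.
        others = sorted(d for j, d in enumerate(row) if j != i)
        radii.append(others[m - 1])
    return radii


def mean_radius(radii):
    """R: arithmetic mean of all r_i (step S4)."""
    return sum(radii) / len(radii)


if __name__ == "__main__":
    D = [[0.0, 1.0, 4.0], [1.0, 0.0, 2.0], [4.0, 2.0, 0.0]]
    r = m_nearest_radius(D, 2)
    print(r, mean_radius(r))  # [4.0, 2.0, 4.0] 3.3333333333333335
```

M must be at most n - 1, otherwise the row has no M-th neighbor and indexing fails.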
As a further refinement, in step S5 the density ρ of each object within its R-neighborhood is computed, specifically:
Taking point $x_i$ as the center, if the Euclidean distance between $x_i$ and $x_j$ satisfies $d_P(x_i, x_j) \le R$, then $x_j$ is an element of the cluster formed with $x_i$ as its center and the density of $x_i$ is increased by 1; otherwise it is increased by 0. The density $\rho_i$ of point $x_i$ is therefore
$$\rho_i = \sum_{j \ne i} \chi\big(d_P(x_i, x_j)\big), \qquad \chi(d) = \begin{cases} 1, & d \le R \\ 0, & d > R \end{cases}$$
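A direct transcription of the density rule, as a hedged sketch: for each object it counts the other objects within distance R, so the object itself is not counted (an assumption consistent with Definition 2, which speaks of neighbors). `density` and `big_r` are hypothetical names.

```python
def density(dist, big_r):
    """rho_i: number of other objects x_j with d_P(x_i, x_j) <= R (step S5)."""
    n = len(dist)
    return [
        sum(1 for j in range(n) if j != i and dist[i][j] <= big_r)
        for i in range(n)
    ]


if __name__ == "__main__":
    D = [[0.0, 1.0, 4.0], [1.0, 0.0, 2.0], [4.0, 2.0, 0.0]]
    print(density(D, 2.0))  # [1, 2, 1]
```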
As a further refinement, Pro in step S6 is computed as
$$\mathrm{Pro}_i = \frac{\rho_i}{r_i}$$
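Steps S6 and S7 then reduce to a division and a sort. In this sketch (hypothetical names, standard library only) `pro_scores` assumes every $r_i$ is positive, which fails if two spectra are identical; how the method handles that case is not stated.

```python
def pro_scores(rho, radii):
    """Pro_i = rho_i / r_i (step S6). Assumes every r_i > 0, i.e. no duplicate spectra."""
    return [p / r for p, r in zip(rho, radii)]


def top_k_centers(pro, k):
    """Indices of the K objects with the largest Pro, largest first (step S7).

    Python's sort is stable, so ties keep their original index order.
    """
    return sorted(range(len(pro)), key=lambda i: pro[i], reverse=True)[:k]


if __name__ == "__main__":
    pro = pro_scores([1, 2, 1], [4.0, 2.0, 4.0])
    print(pro, top_k_centers(pro, 2))  # [0.25, 1.0, 0.25] [1, 0]
```

A dense object with a small $r_i$ receives a large score, which matches the stated intuition that cluster centers have high density within a small radius. Like step S7 as written, the sketch does not additionally require the K selected objects to lie far from one another.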
The beneficial effects of the present invention are:
Compared with the prior art, the method of the invention mainly consists of seven steps: collecting the data and normalizing it; computing the distance between any two data points; finding the M-nearest-neighbor distance r of each data point; solving for the neighborhood radius R; computing the density of each data point; evaluating the probability that each point becomes an initial cluster center; and selecting the final initial centers. The invention first finds a representative spectrum of each class to serve as a cluster center, and then clusters the other spectra according to their distances to these representative spectra. In general, cluster centers are data points that have a relatively high density within a small radius and lie far from one another; the method therefore determines the initial cluster centers using MNN (M nearest neighbors), density and distance. Starting from the characteristics of the spectral data themselves, the method computes the density and distance features of every stellar spectrum and constructs a probability function that reflects how likely each spectrum is to become a cluster center; selecting the stellar spectra with the highest probability as initial cluster centers improves clustering accuracy.
Brief description of the drawings
Fig. 1 is a flow chart of the method of the invention;
Fig. 2 is a schematic diagram of the M-nearest-neighbor distance r;
Fig. 3 is a schematic diagram of the density of a data point;
Fig. 4 shows the K representative stellar spectra of different types found using the method of the invention.
Detailed description of the embodiments
The invention is described in further detail below with reference to specific embodiments, but the scope of protection of the invention is not limited to these embodiments; all changes or equivalent substitutions that do not depart from the inventive concept fall within the scope of protection of the invention. The definitions used in the invention are as follows:
Definition 1 (neighborhood radius). Around each data point, find the M nearest neighbors of that point (the M objects closest to it); the largest of the distances from these M nearest neighbors to the point is taken as r. As shown in Fig. 2, the dashed line indicates the distance r found with point $x_i$ as the cluster center and M = 6.
Definition 2 (density). Taking each data point as a cluster center, count all points lying within the neighborhood radius R of the point and regard this count as the density of the point; the more neighbors a data point has, the greater its density.
As shown in Fig. 3, the points inside the closed dashed circle are all the neighbors of point $x_i$ within radius R, and the number of neighbors of $x_i$ is the density of that point.
As shown in Figs. 1 to 4, the method of the invention mainly consists of seven steps: collecting the data and normalizing it; computing the distance between any two data points; finding the M-nearest-neighbor distance r of each data point; solving for the neighborhood radius R; computing the density of each data point; evaluating the probability that each point becomes an initial cluster center; and selecting the final initial centers. The details are as follows:
S1: Collect the stellar spectra captured by LAMOST and normalize the collected spectra, treating each spectrum as one object. Specifically, assume object $x_i$ contains P-dimensional data, i.e., $x_i = (x_{i1}, x_{i2}, \ldots, x_{iP})$. Compute the mean [formula not reproduced]; the coordinates after normalization are [formula not reproduced]. Because the raw data contain multiple features and more than 3000 dimensions, only one of these features is extracted, to simplify computation.
S2: Compute the distance d between any two objects, specifically: let $x_i = (x_{i1}, \ldots, x_{iP})$ and $x_j = (x_{j1}, \ldots, x_{jP})$ be any two spectra in the data set; the Euclidean distance between $x_i$ and $x_j$ is computed as
$$d_P(x_i, x_j) = \sqrt{\sum_{a_m \in P} \big(f(x_i, a_m) - f(x_j, a_m)\big)^2}$$
where $d_P(x_i, x_j)$ is the distance between $x_i$ and $x_j$ with respect to the attribute set P, $a_m \in P$, P is the set of attributes of an object, and $f(x_i, a_m)$ is the value of object $x_i$ on attribute $a_m$. The distances are stored in a symmetric matrix.
S3: Taking each object as the center, find its M nearest neighbors and define the distance from the center to the M-th nearest neighbor as r, computing each repeated distance only once. In practice, each row of the distance matrix is sorted in ascending order, and the value corresponding to the M-th nearest point is taken as the neighborhood radius r of that object, so that every object has the same number of neighbors (M) within its own radius. For example, with point x3 as the center and M = 2, the r found is 0.223.
S4: Compute the mean of all r, denoted R, specifically:
$$R = \frac{1}{n}\sum_{i=1}^{n} r_i$$
where n is the number of elements in the data set.
S5: Compute the density ρ of each object within its R-neighborhood (see Fig. 3), specifically:
Taking point $x_i$ as the center, if the Euclidean distance between $x_i$ and $x_j$ satisfies $d_P(x_i, x_j) \le R$, then $x_j$ is an element of the cluster formed with $x_i$ as its center and the density of $x_i$ is increased by 1; otherwise it is increased by 0. The density $\rho_i$ of point $x_i$ is therefore
$$\rho_i = \sum_{j \ne i} \chi\big(d_P(x_i, x_j)\big), \qquad \chi(d) = \begin{cases} 1, & d \le R \\ 0, & d > R \end{cases}$$
S6: Divide the density ρ of each object by its corresponding M-nearest-neighbor distance r and denote the result Pro; Pro measures how likely the data point is to become an initial center:
$$\mathrm{Pro}_i = \frac{\rho_i}{r_i}$$
S7: Sort Pro in descending order and output the objects with the top K values as the K initial cluster centers, i.e., the K representative spectra. For the stellar spectral data set used here, the number of clusters is K = 5. Because different values of M may yield different results, the experiment is repeated several times with different M in order to find a small set of points most likely to become initial cluster centers; these points are then clustered with K-means, and the resulting K-means centers are the centers of the stellar spectra, as shown in Fig. 4.
S8: For each of the remaining n − K objects, compute its distance to every cluster center and, following the nearest-distance principle, assign it to the cluster of the corresponding initial center; a smaller distance means greater similarity, so each object is assigned to its most similar cluster.
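The repeated-M procedure described in step S7 (run the Pro ranking for several values of M, keep the most promising points, then cluster them with K-means) can be sketched in Python as below. Several details are assumptions not fixed by the text: candidates from different M values are pooled by taking the union of each run's top K, K-means is plain Lloyd iteration with a farthest-first initialization and a fixed number of iterations, and all function names are hypothetical. Excluding the object itself when computing $r_i$ and $\rho_i$ follows Definitions 1 and 2 as read here.

```python
import math


def pro_ranking(data, m):
    """Steps S2-S6 for one value of M: object indices sorted by Pro, largest first."""
    n = len(data)
    # Full matrix for brevity; step S2 only needs each pair once.
    d = [[math.dist(data[i], data[j]) for j in range(n)] for i in range(n)]
    # r_i: distance to the M-th nearest neighbour, excluding the object itself.
    r = [sorted(d[i][j] for j in range(n) if j != i)[m - 1] for i in range(n)]
    big_r = sum(r) / n  # R, the mean of all r_i
    rho = [sum(1 for j in range(n) if j != i and d[i][j] <= big_r) for i in range(n)]
    pro = [rho[i] / r[i] for i in range(n)]  # assumes no duplicate spectra (r_i > 0)
    return sorted(range(n), key=lambda i: pro[i], reverse=True)


def candidate_centers(data, k, m_values):
    """Assumed pooling rule: union of the top-K objects over several M values."""
    chosen = set()
    for m in m_values:
        chosen.update(pro_ranking(data, m)[:k])
    return [list(data[i]) for i in sorted(chosen)]


def kmeans(points, k, iters=20):
    """Lloyd's K-means with farthest-first initialisation (an assumed choice)."""
    centers = [list(points[0])]
    while len(centers) < k:
        # Next seed: the point farthest from all seeds chosen so far.
        far = max(points, key=lambda p: min(math.dist(p, c) for c in centers))
        centers.append(list(far))
    for _ in range(iters):
        groups = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda c: math.dist(p, centers[c]))
            groups[nearest].append(p)
        # Move each center to the mean of its group; keep it in place if the group is empty.
        centers = [
            [sum(col) / len(g) for col in zip(*g)] if g else centers[c]
            for c, g in enumerate(groups)
        ]
    return centers


if __name__ == "__main__":
    spectra = [[0, 0], [1, 0], [0, 1], [1, 1], [10, 10], [11, 10], [10, 11], [11, 11]]
    candidates = candidate_centers(spectra, k=2, m_values=[1, 2, 3])
    print(kmeans(candidates, k=2))
```

For the stellar-spectra example with K = 5 one would call `candidate_centers(spectra, k=5, m_values=...)` and then `kmeans` with k = 5; which values of M to try is left open by the text, and every M must be at most n - 1.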
After the above processing, the method of the invention can determine the initial cluster centers more accurately, thereby overcoming the difficulty of classifying spectral data and demonstrating the feasibility of the invention.

Claims (6)

1. A method for fast classification of stellar spectra, characterized by comprising the following steps:
S1: collecting the stellar spectra captured by LAMOST and normalizing the collected spectra, each spectrum being treated as one object;
S2: computing the distance d between any two objects;
S3: taking each object as the center, finding its M nearest neighbors and defining the distance from the center to the M-th nearest neighbor as r, each repeated distance being computed only once;
S4: computing the mean of all r, denoted R;
S5: computing the density ρ of each object within its R-neighborhood;
S6: dividing the density ρ of each object by its corresponding M-nearest-neighbor distance r, the result being denoted Pro;
S7: sorting Pro in descending order and outputting the objects with the top K values as the K initial cluster centers, i.e., the K representative spectra;
S8: for each remaining object, computing its distance to every cluster center and assigning it, according to the nearest-distance principle, to the cluster of the corresponding initial center, a smaller distance indicating greater similarity, so that each object is assigned to its most similar cluster.
2. The method for fast classification of stellar spectra according to claim 1, characterized in that in step S1 the stellar spectra captured by LAMOST are collected and normalized, specifically: each spectrum is treated as one object; assume object $x_i$ contains P-dimensional data, i.e., $x_i = (x_{i1}, x_{i2}, \ldots, x_{iP})$; compute the mean [formula not reproduced]; the coordinates after normalization are [formula not reproduced]; because the raw data contain multiple features and more than 3000 dimensions, only one of these features is extracted, to simplify computation.
3. The method for fast classification of stellar spectra according to claim 1, characterized in that in step S2 the distance d between any two objects is computed, specifically: let $x_i = (x_{i1}, \ldots, x_{iP})$ and $x_j = (x_{j1}, \ldots, x_{jP})$ be any two spectra in the data set; the Euclidean distance between $x_i$ and $x_j$ is computed as
$$d_P(x_i, x_j) = \sqrt{\sum_{a_m \in P} \big(f(x_i, a_m) - f(x_j, a_m)\big)^2}$$
where $d_P(x_i, x_j)$ is the distance between $x_i$ and $x_j$ with respect to the attribute set P, $a_m \in P$, P is the set of attributes of an object, and $f(x_i, a_m)$ is the value of object $x_i$ on attribute $a_m$.
4. The method for fast classification of stellar spectra according to claim 1, characterized in that in step S4 the mean of all r is computed as
$$R = \frac{1}{n}\sum_{i=1}^{n} r_i$$
where n is the number of elements in the data set.
5. The method for fast classification of stellar spectra according to claim 1, characterized in that in step S5 the density ρ of each object within its R-neighborhood is computed, specifically:
taking point $x_i$ as the center, if the Euclidean distance between $x_i$ and $x_j$ satisfies $d_P(x_i, x_j) \le R$, then $x_j$ is an element of the cluster formed with $x_i$ as its center and the density of $x_i$ is increased by 1; otherwise it is increased by 0; the density $\rho_i$ of point $x_i$ is
$$\rho_i = \sum_{j \ne i} \chi\big(d_P(x_i, x_j)\big), \qquad \chi(d) = \begin{cases} 1, & d \le R \\ 0, & d > R \end{cases}$$
6. The method for fast classification of stellar spectra according to claim 1, characterized in that Pro in step S6 is computed as
$$\mathrm{Pro}_i = \frac{\rho_i}{r_i}$$

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201910562679.6A CN110334754A (en) 2019-06-26 2019-06-26 A method for fast classification of stellar spectra

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201910562679.6A CN110334754A (en) 2019-06-26 2019-06-26 A method for fast classification of stellar spectra

Publications (1)

Publication Number Publication Date
CN110334754A (en) 2019-10-15

Family

ID=68142974

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201910562679.6A Pending CN110334754A (en) 2019-06-26 2019-06-26 A method for fast classification of stellar spectra

Country Status (1)

Country Link
CN (1) CN110334754A (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111797916A (en) * 2020-06-30 2020-10-20 东华大学 Classification method of stellar spectra

Citations (4)

Publication number Priority date Publication date Assignee Title
CN101071078A (en) * 2007-06-14 2007-11-14 太原科技大学 Astronomical spectral data correlation analysis and system based on constraint frequent mode
CN104765832A (en) * 2015-04-14 2015-07-08 山东大学(威海) Planetary nebular spectrum mining method and system based on mode recognition method
CN105548066A (en) * 2015-12-11 2016-05-04 贵州中烟工业有限责任公司 Method and system for distinguishing colloid types
CN108537290A (en) * 2018-04-25 2018-09-14 攀枝花学院 Stellar spectra classification method based on data distribution characteristics and fuzzy membership function

Non-Patent Citations (2)

Title
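Yu Hongdao (余弘道): "A fast method for locating cluster centers based on density-difference distance" (基于密度差距离的聚类中心快速定位方法), Fujian Computer (《福建电脑》) *
Xu Tingting et al. (许婷婷等): "Research on LAMOST spectral classification based on deep learning" (基于深度学习的LAMOST光谱分类研究), Acta Astronomica Sinica (《天文学报》) *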

Similar Documents

Publication Publication Date Title
van Haarlem et al. Velocity fields and alignments of clusters in gravitational instability scenarios
CN111260594B (en) Unsupervised multi-mode image fusion method
Yang et al. Efficient image retrieval via decoupling diffusion into online and offline processing
CN105930862A (en) Density peak clustering algorithm based on density adaptive distance
CN106022473B (en) A kind of gene regulatory network construction method merging population and genetic algorithm
CN103279556B (en) Iteration Text Clustering Method based on self adaptation sub-space learning
Orman et al. Towards realistic artificial benchmark for community detection algorithms evaluation
Peng et al. A SVM-kNN method for quasar-star classification
CN103745205A (en) Gait recognition method based on multi-linear mean component analysis
CN103888541A (en) Method and system for discovering cells fused with topology potential and spectral clustering
Longo et al. Foreword to the focus issue on machine intelligence in astronomy and astrophysics
CN106548041A (en) A kind of tumour key gene recognition methods based on prior information and parallel binary particle swarm optimization
CN111368936A (en) Feature selection method based on improved SVM-RFE
CN110334754A (en) A method for fast classification of stellar spectra
CN113128618A (en) Parallel spectrum clustering method based on KD tree and chaotic mayfly optimization algorithm
CN110263834A (en) A kind of detection method of new energy power quality exceptional value
Rematas et al. The pooled nbnn kernel: Beyond image-to-class and image-to-image
Iess et al. LSTM and CNN application for core-collapse supernova search in gravitational wave real data
Farr et al. A more efficient approach to parallel-tempered Markov-chain Monte Carlo for the highly structured posteriors of gravitational-wave signals
Yu et al. Hierarchical clustering in astronomy
Escalera et al. Topology in galaxy distributions: method for a multi-scale analysis. A use of the wavelet transform.
CN107941210A (en) A kind of method for recognising star map of combination nerual network technique and triangle algorithm
CN110097636B (en) Site selection planning method based on visual field analysis
KR102158049B1 (en) Data clustering apparatus and method based on range query using cf tree
Lach‐hab et al. Novel approach for clustering zeolite crystal structures

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication

Application publication date: 20191015
