CN110334754A - A method for fast classification of stellar spectral data - Google Patents
A method for fast classification of stellar spectral data
- Publication number
- CN110334754A (application CN201910562679.6A)
- Authority
- CN
- China
- Prior art keywords
- distance
- cluster
- calculated
- density
- stellar spectral data
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/23—Clustering techniques
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/24—Classification techniques
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/24—Classification techniques
- G06F18/241—Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
- G06F18/2413—Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches based on distances to training or reference patterns
- G06F18/24133—Distances to prototypes
- G06F18/24137—Distances to cluster centroïds
Landscapes
- Engineering & Computer Science (AREA)
- Data Mining & Analysis (AREA)
- Theoretical Computer Science (AREA)
- Computer Vision & Pattern Recognition (AREA)
- Bioinformatics & Cheminformatics (AREA)
- Bioinformatics & Computational Biology (AREA)
- Artificial Intelligence (AREA)
- Evolutionary Biology (AREA)
- Evolutionary Computation (AREA)
- Physics & Mathematics (AREA)
- General Engineering & Computer Science (AREA)
- General Physics & Mathematics (AREA)
- Life Sciences & Earth Sciences (AREA)
- Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
Abstract
The invention provides a method for fast classification of stellar spectral data. It first finds a typical spectrum for each class to serve as a cluster center, and then clusters the remaining spectra according to their distances to the typical spectra. In general, cluster centers are data points that have a high density within a small radius and lie far from one another. The invention determines the initial cluster centers using MNN (M nearest neighbors), density, and distance. Starting from the characteristics of the spectral data itself, it computes the density and distance features of each stellar spectrum and constructs a probability function that reflects how likely the spectrum is to be a cluster center. The stellar spectra with the highest probability are selected as the initial cluster centers, which improves clustering accuracy.
Description
Technical field
The present invention relates to a method for fast classification of stellar spectral data, used to classify the stellar spectral data captured by LAMOST, and belongs to the technical field of data mining.
Background art
LAMOST is a large optical telescope independently designed and developed in China, and it is technically very challenging. As the telescope with the highest spectral acquisition rate, LAMOST is expected to break through the bottleneck of spectroscopic observation in astronomical research and become the most powerful spectroscopic survey telescope. Its most prominent features are a large aperture (4 meters), a wide field of view (5 degrees), and a very large spectroscopic observation system made up of 4000 optical fibers. LAMOST has made a major contribution to the spectroscopic observation of tens of millions of galaxies, quasars, and objects within the Milky Way, including a large number of stars. In recent years, with the continued operation of large sky survey projects and the emergence of new observation techniques, large data sets have been obtained. The LAMOST pilot survey alone released more than 480,000 spectra, covering stars, galaxies, and some objects of unknown type. The stellar spectra captured by LAMOST include several types, such as A, F, G, K, and M. Classifying these spectra by manual work alone takes a great deal of time and effort.
Data mining is the process of discovering interesting patterns and knowledge from large amounts of data. Clustering is a typical unsupervised algorithm and occupies an important position in data mining. The purpose of clustering is to group a set of data objects into multiple groups or clusters, so that objects in the same cluster are highly similar to one another and highly dissimilar to objects in other clusters. Traditional clustering algorithms can be roughly divided into partitioning methods, hierarchical methods, density-based methods, grid-based methods, model-based methods, and so on. Most clustering algorithms face challenges such as the difficulty of selecting cluster centers, the manual determination of the number of clusters K, and low clustering accuracy.
Summary of the invention
To solve the problem of stellar spectral classification, the invention discloses a method for fast classification of stellar spectral data. Starting from the characteristics of the spectral data itself, the invention computes the density and distance features of each stellar spectrum and constructs a probability function that reflects how likely the spectrum is to be a cluster center. The stellar spectra with the highest probability are selected as the initial cluster centers, which improves clustering accuracy.
The method for fast classification of stellar spectral data provided by the invention comprises the following steps:
S1: collect the stellar spectral data captured by LAMOST and normalize it; each spectrum is treated as one object;
S2: calculate the distance d between every pair of objects;
S3: taking each object as a center, find its M nearest neighbors and define the distance from the M-th nearest neighbor to the center as r; repeated distances are counted only once;
S4: calculate the mean of all r, denoted R;
S5: calculate the density ρ of each data object within its R-neighborhood;
S6: divide the density ρ of each object by its M-nearest-neighbor distance r, and denote the result Pro;
S7: sort Pro in descending order and output the objects with the top K values as the K initial cluster centers, that is, the K typical spectra;
S8: for each remaining object, calculate its distance to each cluster center and assign it to the cluster of the nearest initial center; a smaller distance indicates greater similarity, so each object is assigned to its most similar cluster.
In a further improvement, step S1 collects the stellar spectral data captured by LAMOST and normalizes it as follows. Each spectrum is treated as one object. Suppose object $x_i$ contains P-dimensional data, i.e. $x_i = (x_{i1}, x_{i2}, \ldots, x_{iP})$. The mean is calculated [formula missing in the source], and the normalized coordinates are [formula missing in the source]. Because the raw data contains multiple features and has more than 3000 dimensions, only one of the features is extracted to simplify the calculation.
In a further improvement, step S2 calculates the distance d between every pair of objects as follows. Suppose $x_i$ and $x_j$ are any two spectra in the data set. Their Euclidean distance is calculated as:
$$d_P(x_i, x_j) = \sqrt{\sum_{a_m \in P} \left(f(x_i, a_m) - f(x_j, a_m)\right)^2}$$
Here $d_P(x_i, x_j)$ is the distance between $x_i$ and $x_j$ with respect to the attribute set P, $a_m \in P$ is an attribute of the object, and $f(x_i, a_m)$ is the value of object $x_i$ on attribute $a_m$.
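The pairwise distance computation of step S2 can be sketched as follows. This is a minimal illustration, assuming each spectrum is a plain list of attribute values of equal length. The function names `euclidean` and `distance_matrix` and the toy data are hypothetical and do not come from the patent.

```python
import math

def euclidean(x_i, x_j):
    """Euclidean distance between two objects over all attributes."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x_i, x_j)))

def distance_matrix(objects):
    """Symmetric n x n matrix of pairwise distances; each pair is computed once."""
    n = len(objects)
    d = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            d[i][j] = d[j][i] = euclidean(objects[i], objects[j])
    return d

# Hypothetical toy data: three 2-dimensional "spectra".
spectra = [[0.0, 0.0], [3.0, 4.0], [6.0, 8.0]]
D = distance_matrix(spectra)
```

Filling only the upper triangle and mirroring it halves the number of distance evaluations, which matters when each spectrum has thousands of dimensions.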
In a further improvement, step S4 calculates the mean of all r as:
$$R = \frac{1}{n}\sum_{i=1}^{n} r_i$$
where n is the number of elements in the data set and $r_i$ is the M-nearest-neighbor distance of object $x_i$.
In a further improvement, step S5 calculates the density ρ of each data object within its R-neighborhood as follows. Taking point $x_i$ as the center, if the Euclidean distance between $x_i$ and $x_j$ satisfies $d_P(x_i, x_j) \le R$, then $x_j$ belongs to the cluster formed with $x_i$ as its center, and the density of $x_i$ is increased by 1; otherwise it is increased by 0. The density $\rho_i$ of point $x_i$ is calculated as:
$$\rho_i = \sum_{j} \chi(x_i, x_j), \qquad \chi(x_i, x_j) = \begin{cases} 1, & d_P(x_i, x_j) \le R \\ 0, & \text{otherwise} \end{cases}$$
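The density count of step S5 can be sketched as below. The sketch assumes that a point is not counted as its own neighbor, which the source does not state explicitly. The function name `densities` and the distance matrix are hypothetical.

```python
def densities(d, radius):
    """For each object i, count the other objects j with d[i][j] <= radius.

    Assumption: the object itself (j == i) is excluded from the count.
    """
    n = len(d)
    return [sum(1 for j in range(n) if j != i and d[i][j] <= radius)
            for i in range(n)]

# Hypothetical symmetric distance matrix for four objects.
d = [
    [0.0, 1.0, 2.0, 9.0],
    [1.0, 0.0, 1.5, 8.0],
    [2.0, 1.5, 0.0, 7.0],
    [9.0, 8.0, 7.0, 0.0],
]
rho = densities(d, radius=2.0)
```

The first three objects lie close together and each has two neighbors within the radius, while the isolated fourth object has none.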
In a further improvement, Pro in step S6 is calculated as:
$$\mathrm{Pro}_i = \frac{\rho_i}{r_i}$$
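The Pro score and the descending sort that follows it in step S7 can be sketched as below. The helper names `pro_scores` and `top_k` and the input values are hypothetical. The sketch does not handle the case where an object's M-nearest-neighbor distance is zero, for example when the data contains duplicate spectra.

```python
def pro_scores(rho, r):
    """Pro_i = rho_i / r_i for every object (assumes every r_i > 0)."""
    return [rho_i / r_i for rho_i, r_i in zip(rho, r)]

def top_k(scores, k):
    """Indices of the k largest scores, in descending order of score."""
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    return order[:k]

rho = [4, 2, 5, 1]          # hypothetical densities
r = [0.5, 1.0, 2.0, 4.0]    # hypothetical M-nearest-neighbor distances
pro = pro_scores(rho, r)
centers = top_k(pro, k=2)
```

Object 2 has the highest density, but object 0 ranks first because its neighbors are much closer. This is the trade-off the ratio is meant to capture.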
The beneficial effects of the invention are as follows.
Compared with the prior art, the method mainly comprises seven steps: collecting the data and normalizing it, calculating the distance between every pair of data points, finding the M-nearest-neighbor distance r of each data point, solving for the neighborhood radius R, calculating the density of each data point, evaluating the probability that a point becomes an initial cluster center, and selecting the final initial centers. The invention first finds a typical spectrum for each class to serve as a cluster center, and then clusters the remaining spectra according to their distances to the typical spectra. In general, cluster centers are data points that have a high density within a small radius and lie far from one another. The method determines the initial cluster centers using MNN (M nearest neighbors), density, and distance. Starting from the characteristics of the spectral data itself, it computes the density and distance features of each stellar spectrum and constructs a probability function that reflects how likely the spectrum is to be a cluster center. The stellar spectra with the highest probability are selected as the initial cluster centers, which improves clustering accuracy.
Brief description of the drawings
Fig. 1 is a flow chart of the method of the invention;
Fig. 2 is a schematic diagram of the M-nearest-neighbor distance r;
Fig. 3 is a schematic diagram of the density of a data point;
Fig. 4 shows the K typical stellar spectra of different types found using the method of the invention.
Detailed description of the embodiments
The invention is described in further detail below with reference to specific embodiments, but the scope of protection of the invention is not limited to these embodiments. Any change or equivalent substitution that does not depart from the concept of the invention falls within its scope of protection. The definitions used in the invention are as follows:
Definition 1 (neighborhood radius). For each data point, find its M nearest neighbors (the M objects closest to the data point). The largest of the distances from these M nearest neighbors to the data point is defined as r. As shown in Fig. 2, the dotted line indicates the distance r found with point $x_i$ as the cluster center and M = 6.
Definition 2 (density). Taking each data point as a cluster center, count all points that lie within the neighborhood radius R of the data point and take this count as the density of the point. The more neighbors a data point has, the greater its density. As shown in Fig. 3, the points inside the closed dashed circle are all the neighbors of point $x_i$ within radius R, and the number of these neighbors is the density of the point.
As shown in Figs. 1 to 4, the method mainly comprises seven steps: collecting the data and normalizing it, calculating the distance between every pair of data points, finding the M-nearest-neighbor distance r of each data point, solving for the neighborhood radius R, calculating the density of each data point, evaluating the probability that a point becomes an initial cluster center, and selecting the final initial centers. The details are as follows:
S1: collect the stellar spectral data captured by LAMOST and normalize it. Each spectrum is treated as one object. Suppose object $x_i$ contains P-dimensional data, i.e. $x_i = (x_{i1}, x_{i2}, \ldots, x_{iP})$. The mean is calculated [formula missing in the source], and the normalized coordinates are [formula missing in the source]. Because the raw data contains multiple features and has more than 3000 dimensions, only one of the features is extracted to simplify the calculation.
S2: calculate the distance d between every pair of objects. Suppose $x_i$ and $x_j$ are any two spectra in the data set. Their Euclidean distance is calculated as:
$$d_P(x_i, x_j) = \sqrt{\sum_{a_m \in P} \left(f(x_i, a_m) - f(x_j, a_m)\right)^2}$$
Here $d_P(x_i, x_j)$ is the distance between $x_i$ and $x_j$ with respect to the attribute set P, $a_m \in P$ is an attribute of the object, and $f(x_i, a_m)$ is the value of object $x_i$ on attribute $a_m$. The distances are stored in a symmetric matrix [matrix missing in the source].
S3: taking each object as a center, find its M nearest neighbors and define the distance from the M-th nearest neighbor to the center as r; repeated distances are counted only once. That is, each row of the matrix is sorted in ascending order [sorted matrix missing in the source], and the value at the M-th position is taken as the neighborhood radius r of each object under the same density condition. For example, with point $x_3$ as the cluster center and M = 2, the r found is 0.223.
S4: calculate the mean of all r, denoted R:
$$R = \frac{1}{n}\sum_{i=1}^{n} r_i$$
where n is the number of elements in the data set and $r_i$ is the M-nearest-neighbor distance of object $x_i$.
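Steps S3 and S4 can be sketched as follows. The sketch assumes that the zero self-distance on the diagonal is skipped before the M-th value of the sorted row is taken, and it does not deduplicate equal distances. The source is not explicit on either point. The helper names and the toy matrix are hypothetical.

```python
def mth_neighbor_distance(d, m):
    """Sort each row of the distance matrix in ascending order and take the
    distance to the M-th nearest neighbor (self-distance skipped, an assumption)."""
    r = []
    for i, row in enumerate(d):
        others = sorted(v for j, v in enumerate(row) if j != i)
        r.append(others[m - 1])
    return r

def mean_radius(r):
    """Neighborhood radius R: the mean of all M-nearest-neighbor distances."""
    return sum(r) / len(r)

# Hypothetical symmetric distance matrix for four objects.
d = [
    [0.0, 1.0, 2.0, 9.0],
    [1.0, 0.0, 1.5, 8.0],
    [2.0, 1.5, 0.0, 7.0],
    [9.0, 8.0, 7.0, 0.0],
]
r = mth_neighbor_distance(d, m=2)
R = mean_radius(r)
```

The isolated fourth object has a large r, which pulls the shared radius R upward. This is one reason the choice of M influences the result.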
S5: calculate the density ρ of each data object within its R-neighborhood, which can be understood with reference to Fig. 3. Taking point $x_i$ as the center, if the Euclidean distance between $x_i$ and $x_j$ satisfies $d_P(x_i, x_j) \le R$, then $x_j$ belongs to the cluster formed with $x_i$ as its center, and the density of $x_i$ is increased by 1; otherwise it is increased by 0. The density $\rho_i$ of point $x_i$ is calculated as:
$$\rho_i = \sum_{j} \chi(x_i, x_j), \qquad \chi(x_i, x_j) = \begin{cases} 1, & d_P(x_i, x_j) \le R \\ 0, & \text{otherwise} \end{cases}$$
S6: divide the density ρ of each object by its M-nearest-neighbor distance r and denote the result Pro. Pro measures the probability that a data point becomes an initial center:
$$\mathrm{Pro}_i = \frac{\rho_i}{r_i}$$
S7: sort Pro in descending order and output the objects with the top K values as the K initial cluster centers, that is, the K typical spectra. Taking the stellar spectral data set as an example, the number of clusters is K = 5. Because different values of M may yield different results, the experiment is repeated with different M to find a small set of points that are most likely to be initial cluster centers. These points are then clustered with K-means, and the resulting K-means centers are the centers of the stellar spectral data, as shown in Fig. 4.
S8: for each of the remaining n-K objects, calculate its distance to each cluster center and assign it to the cluster of the nearest initial center; a smaller distance indicates greater similarity, so each object is assigned to its most similar cluster.
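The nearest-center assignment of step S8 can be sketched as follows. The function name `assign_to_nearest`, the centers, and the remaining objects are hypothetical values chosen for illustration.

```python
import math

def euclidean(a, b):
    return math.sqrt(sum((u - v) ** 2 for u, v in zip(a, b)))

def assign_to_nearest(objects, centers):
    """Return, for each object, the index of the closest cluster center."""
    return [min(range(len(centers)), key=lambda c: euclidean(x, centers[c]))
            for x in objects]

centers = [[0.0, 1.0], [10.0, 11.0]]                # hypothetical typical spectra
remaining = [[1.0, 1.0], [9.0, 12.0], [4.0, 5.0]]   # hypothetical remaining objects
labels = assign_to_nearest(remaining, centers)
```

Each object is compared against only K centers, so this final pass is linear in the number of spectra.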
After the above processing, the method of the invention can determine the initial cluster centers more accurately, which overcomes the difficulty of classifying spectral data and demonstrates the feasibility of the invention.
Claims (6)
1. A method for fast classification of stellar spectral data, characterized by comprising the following steps:
S1: collect the stellar spectral data captured by LAMOST and normalize it; each spectrum is treated as one object;
S2: calculate the distance d between every pair of objects;
S3: taking each object as a center, find its M nearest neighbors and define the distance from the M-th nearest neighbor to the center as r; repeated distances are counted only once;
S4: calculate the mean of all r, denoted R;
S5: calculate the density ρ of each data object within its R-neighborhood;
S6: divide the density ρ of each object by its M-nearest-neighbor distance r, and denote the result Pro;
S7: sort Pro in descending order and output the objects with the top K values as the K initial cluster centers, that is, the K typical spectra;
S8: for each remaining object, calculate its distance to each cluster center and assign it to the cluster of the nearest initial center; a smaller distance indicates greater similarity, so each object is assigned to its most similar cluster.
2. The method for fast classification of stellar spectral data according to claim 1, characterized in that step S1 collects the stellar spectral data captured by LAMOST and normalizes it as follows: each spectrum is treated as one object; suppose object $x_i$ contains P-dimensional data, i.e. $x_i = (x_{i1}, x_{i2}, \ldots, x_{iP})$; the mean is calculated [formula missing in the source], and the normalized coordinates are [formula missing in the source]; because the raw data contains multiple features and has more than 3000 dimensions, only one of the features is extracted to simplify the calculation.
3. The method for fast classification of stellar spectral data according to claim 1, characterized in that step S2 calculates the distance d between every pair of objects as follows: suppose $x_i$ and $x_j$ are any two spectra in the data set; their Euclidean distance is calculated as:
$$d_P(x_i, x_j) = \sqrt{\sum_{a_m \in P} \left(f(x_i, a_m) - f(x_j, a_m)\right)^2}$$
where $d_P(x_i, x_j)$ is the distance between $x_i$ and $x_j$ with respect to the attribute set P, $a_m \in P$ is an attribute of the object, and $f(x_i, a_m)$ is the value of object $x_i$ on attribute $a_m$.
4. The method for fast classification of stellar spectral data according to claim 1, characterized in that step S4 calculates the mean of all r as:
$$R = \frac{1}{n}\sum_{i=1}^{n} r_i$$
where n is the number of elements in the data set and $r_i$ is the M-nearest-neighbor distance of object $x_i$.
5. The method for fast classification of stellar spectral data according to claim 1, characterized in that step S5 calculates the density ρ of each data object within its R-neighborhood as follows: taking point $x_i$ as the center, if the Euclidean distance between $x_i$ and $x_j$ satisfies $d_P(x_i, x_j) \le R$, then $x_j$ belongs to the cluster formed with $x_i$ as its center, and the density of $x_i$ is increased by 1; otherwise it is increased by 0; the density $\rho_i$ of point $x_i$ is calculated as:
$$\rho_i = \sum_{j} \chi(x_i, x_j), \qquad \chi(x_i, x_j) = \begin{cases} 1, & d_P(x_i, x_j) \le R \\ 0, & \text{otherwise} \end{cases}$$
6. The method for fast classification of stellar spectral data according to claim 1, characterized in that Pro in step S6 is calculated as:
$$\mathrm{Pro}_i = \frac{\rho_i}{r_i}$$
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201910562679.6A CN110334754A (en) | 2019-06-26 | 2019-06-26 | A method for fast classification of stellar spectral data |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201910562679.6A CN110334754A (en) | 2019-06-26 | 2019-06-26 | A method for fast classification of stellar spectral data |
Publications (1)
Publication Number | Publication Date |
---|---|
CN110334754A (en) | 2019-10-15 |
Family
ID=68142974
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201910562679.6A Pending CN110334754A (en) | A method for fast classification of stellar spectral data | 2019-06-26 | 2019-06-26 |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN110334754A (en) |
Cited By (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN111797916A (en) * | 2020-06-30 | 2020-10-20 | 东华大学 | Classification method of stellar spectra |
Citations (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN101071078A (en) * | 2007-06-14 | 2007-11-14 | 太原科技大学 | Astronmical spectral data correlation analysis and system based on constraint frequent mode |
CN104765832A (en) * | 2015-04-14 | 2015-07-08 | 山东大学(威海) | Planetary nebular spectrum mining method and system based on mode recognition method |
CN105548066A (en) * | 2015-12-11 | 2016-05-04 | 贵州中烟工业有限责任公司 | Method and system for distinguishing colloid types |
CN108537290A (en) * | 2018-04-25 | 2018-09-14 | 攀枝花学院 | Stellar spectra classification method based on data distribution characteristics and fuzzy membership function |
Non-Patent Citations (2)
Title |
---|
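Yu Hongdao: "A fast method for locating cluster centers based on density-difference distance", Fujian Computer * |
Xu Tingting et al.: "Research on LAMOST spectral classification based on deep learning", Acta Astronomica Sinica * |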
Similar Documents
Publication | Publication Date | Title |
---|---|---|
van Haarlem et al. | Velocity fields and alignments of clusters in gravitational instability scenarios | |
CN111260594B (en) | Unsupervised multi-mode image fusion method | |
Yang et al. | Efficient image retrieval via decoupling diffusion into online and offline processing | |
CN105930862A (en) | Density peak clustering algorithm based on density adaptive distance | |
CN106022473B (en) | A kind of gene regulatory network construction method merging population and genetic algorithm | |
CN103279556B (en) | Iteration Text Clustering Method based on self adaptation sub-space learning | |
Orman et al. | Towards realistic artificial benchmark for community detection algorithms evaluation | |
Peng et al. | A SVM-kNN method for quasar-star classification | |
CN103745205A (en) | Gait recognition method based on multi-linear mean component analysis | |
CN103888541A (en) | Method and system for discovering cells fused with topology potential and spectral clustering | |
Longo et al. | Foreword to the focus issue on machine intelligence in astronomy and astrophysics | |
CN106548041A (en) | A kind of tumour key gene recognition methods based on prior information and parallel binary particle swarm optimization | |
CN111368936A (en) | Feature selection method based on improved SVM-RFE | |
CN110334754A (en) | A method for fast classification of stellar spectral data | |
CN113128618A (en) | Parallel spectrum clustering method based on KD tree and chaotic mayfly optimization algorithm | |
CN110263834A (en) | A kind of detection method of new energy power quality exceptional value | |
Rematas et al. | The pooled nbnn kernel: Beyond image-to-class and image-to-image | |
Iess et al. | LSTM and CNN application for core-collapse supernova search in gravitational wave real data | |
Farr et al. | A more efficient approach to parallel-tempered Markov-chain Monte Carlo for the highly structured posteriors of gravitational-wave signals | |
Yu et al. | Hierarchical clustering in astronomy | |
Escalera et al. | Topology in galaxy distributions: method for a multi-scale analysis. A use of the wavelet transform. | |
CN107941210A (en) | A kind of method for recognising star map of combination nerual network technique and triangle algorithm | |
CN110097636B (en) | Site selection planning method based on visual field analysis | |
KR102158049B1 (en) | Data clustering apparatus and method based on range query using cf tree | |
Lach‐hab et al. | Novel approach for clustering zeolite crystal structures |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
 | PB01 | Publication | |
 | SE01 | Entry into force of request for substantive examination | |
 | RJ01 | Rejection of invention patent application after publication | Application publication date: 20191015 |