CN113723456A - Unsupervised machine learning-based astronomical image automatic classification method and system - Google Patents
- Publication number
- CN113723456A (application CN202110853849.3A)
- Authority
- CN
- China
- Prior art keywords
- astronomical
- astronomical image
- probability
- layer
- cluster
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Granted
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/24—Classification techniques
- G06F18/241—Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/23—Clustering techniques
- G06F18/232—Non-hierarchical techniques
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/04—Architecture, e.g. interconnection topology
- G06N3/045—Combinations of networks
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/08—Learning methods
-
- Y—GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
- Y02—TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
- Y02T—CLIMATE CHANGE MITIGATION TECHNOLOGIES RELATED TO TRANSPORTATION
- Y02T10/00—Road transport of goods or passengers
- Y02T10/10—Internal combustion engine [ICE] based vehicles
- Y02T10/40—Engine management systems
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- Data Mining & Analysis (AREA)
- Physics & Mathematics (AREA)
- Life Sciences & Earth Sciences (AREA)
- Artificial Intelligence (AREA)
- General Physics & Mathematics (AREA)
- General Engineering & Computer Science (AREA)
- Evolutionary Computation (AREA)
- Computer Vision & Pattern Recognition (AREA)
- Computational Linguistics (AREA)
- Bioinformatics & Computational Biology (AREA)
- Bioinformatics & Cheminformatics (AREA)
- Health & Medical Sciences (AREA)
- Biomedical Technology (AREA)
- Biophysics (AREA)
- Evolutionary Biology (AREA)
- General Health & Medical Sciences (AREA)
- Molecular Biology (AREA)
- Computing Systems (AREA)
- Mathematical Physics (AREA)
- Software Systems (AREA)
- Image Analysis (AREA)
- Image Processing (AREA)
Abstract
The invention discloses an unsupervised machine learning-based method and system for automatically classifying astronomical images. The method comprises the following steps: preprocessing the astronomical image data to be classified; inputting the preprocessed astronomical image data into a trained convolutional autoencoder network model and extracting astronomical image features to obtain an astronomical image feature set; inputting the obtained feature set into an image feature clustering model and outputting the probability that each astronomical image belongs to each cluster; manually scoring the clustered astronomical images to obtain the probability that each cluster belongs to each class; and multiplying the probability that each astronomical image belongs to each cluster by the probability that each cluster belongs to each class to obtain the probability that each astronomical image belongs to each class, then screening the classes with a threshold to complete the classification. The method achieves relatively high classification accuracy on astronomical image data at relatively low cost, without data labels.
Description
Technical Field
The invention relates to an unsupervised machine learning-based astronomical image automatic classification method and system, and belongs to the technical field of intelligent processing of astronomical images.
Background
The formation and evolution of galaxies is an important scientific problem in astrophysics. Galaxy morphology is an important reference index for galaxy formation and evolution and is strongly correlated with many physical parameters, including mass, star formation history and mass distribution. The Galaxy Zoo 1.0 project collected simple morphologies for nearly 900,000 galaxies from the Sloan Digital Sky Survey (SDSS), and the morphological classification was completed by thousands of volunteers over several months. As observation equipment continues to be upgraded, with projects such as LSST (Large Synoptic Survey Telescope, a large-field survey project in the United States), Euclid (a European space survey project) and CSST (China Space Station Telescope), astronomy is entering an era of large-scale sky surveys in which data sets will grow exponentially. For example, the Galaxy Zoo 2.0 project collected 1.6 billion galaxies from SDSS to determine galaxy morphology and to further study galaxy formation and evolution. Faced with such a huge galaxy image data set, visual inspection by humans cannot solve the problem effectively, so astronomers have turned to automatic classification methods.
In recent years, methods such as machine learning and deep learning have been applied to galaxy morphology classification. In 2010, Gauci et al. proposed a galaxy morphology classification model that combines a decision tree learning algorithm with a random forest algorithm. In 2015, Ferrari et al. used linear discriminant analysis (LDA) to classify galaxy morphology. Such machine learning algorithms usually require complex feature engineering: exploratory data analysis is first performed on the data set, the data are then reduced in dimension before being passed to the learning algorithm, and the best features must be selected to obtain the best experimental results. To avoid this complex feature engineering process, astronomers began to apply deep learning to the galaxy morphology classification task.
The concept of deep learning was proposed by Hinton in 2006, and multi-layer artificial neural networks have since made positive contributions in many fields. Deep learning extracts and abstracts features from the input data through multiple nonlinear layers and then classifies the images. Although deep learning has achieved considerable results in galaxy classification, training a deep learning model depends strongly on the labels of the training set. In practice, labeling astronomical images is highly specialized work that consumes a large amount of expert time. In addition, manual labeling introduces human bias toward the astronomical images to some extent, and such bias is often difficult to detect.
Disclosure of Invention
Purpose of the invention: with the arrival of the era of astronomical big data, the number of astronomical images is growing exponentially, obtaining a large amount of labeled astronomical data in a short time has become unrealistic, and supervised machine learning techniques therefore have a number of shortcomings.
To implement the invention, several core problems must be solved: (1) most of the data used by existing astronomical image classification methods must be labeled manually. For example, the Galaxy Zoo data set is published on a website so that astronomy enthusiasts can complete the classification work together through crowdsourcing, which is time-consuming and can introduce human bias; (2) unsupervised classification methods are mostly limited to clustering models. Astronomical images are high-dimensional image data, often represented as three-dimensional arrays, and when faced with a large amount of such data a general clustering model may suffer from the curse of dimensionality or have difficulty processing such large-scale, high-dimensional astronomical image data.
The technical scheme is as follows: in order to achieve the purpose, the invention adopts the technical scheme that:
In one aspect, the invention provides an unsupervised machine learning-based astronomical image automatic classification method, which comprises the following steps:
preprocessing the astronomical image data to be classified;
inputting the preprocessed astronomical image data into a trained convolutional autoencoder network model, and extracting astronomical image features to obtain an astronomical image feature set;
inputting the obtained astronomical image feature set into an image feature clustering model, and outputting the probability that each astronomical image belongs to each cluster;
manually scoring the clustered astronomical images to obtain the probability that each cluster belongs to each class;
and multiplying the probability that each astronomical image belongs to each cluster by the probability that each cluster belongs to each class to obtain the probability that each astronomical image belongs to each class, and screening the classes with a threshold to complete the classification.
Further, the convolutional autoencoder network model comprises an encoder and a decoder. The encoder comprises, connected in sequence, an input layer, a first convolutional layer, a first attention module, a first downsampling layer, a second convolutional layer, a second attention module, a second downsampling layer, a third convolutional layer, a third attention module, a third downsampling layer, a fourth convolutional layer, a fourth attention module and a fourth downsampling layer. The decoder comprises, connected in sequence, a first hidden layer, a second hidden layer, a third hidden layer, a fourth hidden layer, a fifth hidden layer, a reshape layer, a fifth convolutional layer, a first upsampling layer, a sixth convolutional layer, a second upsampling layer, a seventh convolutional layer, a third upsampling layer, an eighth convolutional layer and an output layer. Each convolutional layer uses a ReLU activation function, except that the fourth convolutional layer and the eighth convolutional layer use a Sigmoid activation function.
Further, the first convolutional layer contains 128 convolution kernels of size 4 × 4, the second convolutional layer contains 64 convolution kernels of size 4 × 4, the third convolutional layer contains 32 convolution kernels of size 3 × 3, the fourth convolutional layer contains 16 convolution kernels of size 3 × 3, the fifth convolutional layer contains 32 convolution kernels of size 3 × 3, the sixth convolutional layer contains 32 convolution kernels of size 3 × 3, the seventh convolutional layer contains 64 convolution kernels of size 4 × 4, and the eighth convolutional layer contains 3 convolution kernels of size 4 × 4; the first, second, third and fourth downsampling layers and the first, second and third upsampling layers all have a sampling size of 2; the first hidden layer contains 128 neuron nodes, the second hidden layer contains 64 neuron nodes, the third hidden layer contains 32 neuron nodes, the fourth hidden layer contains 64 neuron nodes, and the fifth hidden layer contains 128 neuron nodes.
Further, the training method of the convolutional autoencoder network model comprises the following steps:
acquiring a set of astronomical image data;
preprocessing the acquired set of astronomical image data;
taking the preprocessed astronomical image data as the training data set and a cross-entropy loss function as the objective function, optimizing the parameters with the Adam optimization algorithm so as to minimize the objective function value, and training to obtain the convolutional autoencoder network model.
Further, preprocessing the acquired set of astronomical image data comprises: performing center cropping, random cropping and dimensionality reduction on each astronomical image in the set.
Further, preprocessing the acquired set of astronomical image data further comprises: when the number of astronomical image samples in the class to which an astronomical image belongs is insufficient, randomly flipping the image obtained after center cropping and dimensionality reduction to obtain a new astronomical image, and adding the new astronomical image to the training data set.
Further, the image feature clustering model is built using a Gaussian mixture model, and the number of mixture components (Components) is determined. Inputting the obtained astronomical image feature set into the image feature clustering model and outputting the probability that each astronomical image belongs to each cluster comprises the following steps:
S1, for each feature datum in the input astronomical image feature set, estimating the probability that it was generated by each Component according to the following formula:
$$\gamma(i,k) = \frac{\pi_k \, N(\mathrm{feature}_{p_i} \mid \mu_k, \Sigma_k)}{\sum_{j=1}^{K} \pi_j \, N(\mathrm{feature}_{p_i} \mid \mu_j, \Sigma_j)}$$
where $\gamma(i,k)$ represents the probability that the i-th feature datum was generated by the k-th Component; $\mathrm{feature}_{p_i}$ represents the i-th feature datum in the input astronomical image feature set; $\mu_k$ and $\Sigma_k$ are the parameters of the k-th Component; $\pi_k$ represents the mixing coefficient of the k-th Component; $K$ represents the number of Components; $\pi_j$ represents the mixing coefficient of the j-th Component; $N(\cdot)$ represents a Gaussian distribution; and $\mu_j$ and $\Sigma_j$ are the parameters of the j-th Component;
S2, based on the estimated $\gamma(i,k)$, obtaining the parameter values of the k-th Component that correspond to the maximum likelihood according to the following equations, where $n$ is the total number of feature data:
$$N_k = \sum_{i=1}^{n} \gamma(i,k), \qquad \mu_k = \frac{1}{N_k} \sum_{i=1}^{n} \gamma(i,k)\, \mathrm{feature}_{p_i},$$
$$\Sigma_k = \frac{1}{N_k} \sum_{i=1}^{n} \gamma(i,k)\, (\mathrm{feature}_{p_i} - \mu_k)(\mathrm{feature}_{p_i} - \mu_k)^{T}, \qquad \pi_k = \frac{N_k}{n}$$
S1 and S2 are iterated until the value of the likelihood function converges, and the resulting probability model gives the probability that each feature datum belongs to each cluster.
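The convergence test in the iteration above can be made concrete with the log-likelihood of the mixture. The following is a sketch in the notation of S1 and S2, with n denoting the number of feature data; the tolerance $\varepsilon$ and the iteration superscript $(t)$ are assumptions introduced here for illustration and are not specified in the source.

```latex
\ln L = \sum_{i=1}^{n} \ln\!\left( \sum_{k=1}^{K} \pi_k \, N\!\left(\mathrm{feature}_{p_i} \mid \mu_k, \Sigma_k\right) \right),
\qquad
\text{stop when } \left| \ln L^{(t)} - \ln L^{(t-1)} \right| < \varepsilon .
```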
Further, manually scoring the clustered astronomical images to obtain the probability that each cluster belongs to each class includes:
having experts classify the different clusters of the clustered astronomical images according to their experience, to obtain the probability that each cluster belongs to each class.
Further, multiplying the probability that each astronomical image belongs to each cluster by the probability that each cluster belongs to each class to obtain the probability that each astronomical image belongs to each class includes:
multiplying the probability matrix of each astronomical image belonging to each cluster by the probability matrix of each cluster belonging to each class to obtain the probability that each astronomical image belongs to each class.
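The matrix product described here can be written out as a short worked equation. This is a sketch only: the matrix names $\boldsymbol{\Gamma}$, $\mathbf{C}$ and $\mathbf{P}$ and the entry notation $c_{k,l}$ are assumptions introduced for illustration, and the numbers in the example are invented, not taken from the source.

```latex
P(\text{class } l \mid P_i) = \sum_{k=1}^{K} \gamma(i,k)\, c_{k,l}
\quad\Longleftrightarrow\quad
\mathbf{P} = \boldsymbol{\Gamma}\,\mathbf{C},
\qquad \boldsymbol{\Gamma} \in \mathbb{R}^{n \times K},\ \mathbf{C} \in \mathbb{R}^{K \times L}.

% Illustrative numbers only (K = 2 clusters, L = 2 classes):
\begin{pmatrix} 0.9 & 0.1 \end{pmatrix}
\begin{pmatrix} 0.75 & 0.25 \\ 0 & 1 \end{pmatrix}
=
\begin{pmatrix} 0.675 & 0.325 \end{pmatrix}
```

Because each row of both factors sums to 1, each row of the product also sums to 1, so the result can be read directly as per-class probabilities.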
In another aspect, the present invention provides an unsupervised machine learning-based astronomical image automatic classification system, including:
a preprocessing module configured to preprocess the astronomical image data to be classified;
a convolutional autoencoder network model configured to take the preprocessed astronomical image data as input and extract astronomical image features to obtain an astronomical image feature set;
an image feature clustering model configured to take the extracted astronomical image feature set as input and output the probability that each astronomical image belongs to each cluster;
a manual scoring module configured to manually score the clustered astronomical images to obtain the probability that each cluster belongs to each class;
and an image feature classification model configured to multiply the probability that each astronomical image belongs to each cluster by the probability that each cluster belongs to each class to obtain the probability that each astronomical image belongs to each class, and then to complete the classification by threshold screening.
Advantageous effects: compared with the prior art, the unsupervised machine learning-based astronomical image automatic classification method and system provided by the invention have the following advantages:
(1) at present, deep learning methods for astronomical image classification usually require large-scale labeled astronomical image data as a training set, and manually labeling a large number of astronomical images costs astronomers a great deal of time. The present method learns and extracts features directly from unlabeled astronomical images and then uses those features for classification, which greatly reduces the manual labeling cost and improves efficiency;
(2) a convolutional autoencoder based on a convolutional neural network is provided, which exploits the advantages of convolution on high-dimensional image data and effectively improves the accuracy of astronomical image classification.
Drawings
FIG. 1 is a flow chart of an unsupervised machine learning-based astronomical image automatic classification method according to an embodiment of the present invention;
FIG. 2 is a flow chart of astronomical image preprocessing in an embodiment of the present invention;
FIG. 3 is a structural diagram of a convolutional autoencoder network model in an embodiment of the present invention.
Detailed Description
The invention is further described with reference to specific examples. The following examples are only for illustrating the technical solutions of the present invention more clearly, and the protection scope of the present invention is not limited thereby.
As described above, with the advent of the astronomical big data era, astronomical pictures will grow exponentially, and it will become less practical to obtain a large amount of labeled astronomical data in a short time, and it is difficult to directly classify astronomical images accurately without labels.
To this end, in one embodiment, the invention provides an unsupervised machine learning-based astronomical image automatic classification method. As illustrated in FIG. 1, the method comprises:
Step 1, preprocessing the astronomical image data to be classified;
Because the embodiment of the present invention uses Galaxy Zoo data, which consist of high-dimensional color images, the data need some preprocessing.
The astronomical image data are preprocessed with image processing operations such as center cropping and flipping, which preliminarily reduce the dimensionality of the astronomical images and augment the data.
As shown in FIG. 2, the astronomical image preprocessing algorithm specifically includes:
Input: astronomical image data set $T = \{P_1, P_2, P_3, \ldots, P_n\}$, where $P_i$ denotes the i-th astronomical image sample
Output: preprocessed astronomical image data set
a1. Traverse the astronomical images with a loop variable i running from 1 to n, where n is the total number of astronomical images; initially set i = 1;
a2. Center-crop the current image sample $P_i$, replace $P_i$ with the result, and jump to a3;
a3. Further adjust the size of $P_i$: randomly crop $P_i$ and replace $P_i$ with the result, then resize $P_i$ to a uniform size, add the resized $P_i$ to the preprocessed data set, and jump to a4;
a4. If the number of astronomical image samples in the class to which $P_i$ belongs is insufficient, jump to a5; otherwise jump to a6;
a5. Randomly flip the resized $P_i$ from a3 to obtain a new $P_i$, add the new $P_i$ to the preprocessed data set, and jump to a4;
a6. Execute i = i + 1;
a7. If i ≤ n, jump to a2; otherwise the preprocessing of the astronomical images is finished.
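Steps a1 to a7 can be sketched in plain Python as follows. This is a minimal illustration under stated assumptions, not the patented implementation: images are row-major nested lists, the resize uses nearest-neighbour sampling, the function names, the sizes `center`, `crop`, `out` and the threshold `min_per_class` are all hypothetical, and "insufficient" is interpreted as a class having fewer than `min_per_class` samples.

```python
import random
from collections import Counter

def center_crop(img, size):
    """Return the central size x size window of a row-major image."""
    h, w = len(img), len(img[0])
    top, left = (h - size) // 2, (w - size) // 2
    return [row[left:left + size] for row in img[top:top + size]]

def random_crop(img, size, rng):
    """Return a size x size window at a random position."""
    h, w = len(img), len(img[0])
    top, left = rng.randint(0, h - size), rng.randint(0, w - size)
    return [row[left:left + size] for row in img[top:top + size]]

def resize_nearest(img, size):
    """Resize to size x size with nearest-neighbour sampling (an assumption)."""
    h, w = len(img), len(img[0])
    return [[img[r * h // size][c * w // size] for c in range(size)]
            for r in range(size)]

def random_flip(img, rng):
    """Flip horizontally or vertically, chosen at random."""
    if rng.random() < 0.5:
        return [row[::-1] for row in img]
    return img[::-1]

def preprocess(images, labels, center=8, crop=6, out=4, min_per_class=3, seed=0):
    rng = random.Random(seed)
    counts = Counter(labels)            # samples per class before augmentation
    data, tags = [], []
    for img, label in zip(images, labels):          # a1, a6, a7: loop over images
        img = center_crop(img, center)              # a2: center crop
        img = random_crop(img, crop, rng)           # a3: random crop ...
        img = resize_nearest(img, out)              # ... then uniform size
        data.append(img)
        tags.append(label)
        while counts[label] < min_per_class:        # a4: class still too small?
            data.append(random_flip(img, rng))      # a5: add a flipped copy
            tags.append(label)
            counts[label] += 1
    return data, tags
```

For example, four 10 × 10 images with labels `["spiral", "spiral", "spiral", "elliptical"]` yield six preprocessed 4 × 4 images, because the single "elliptical" sample is augmented with two flipped copies.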
Step 2, inputting the preprocessed astronomical image data into the trained convolutional autoencoder network model, and extracting astronomical image features to obtain an astronomical image feature set;
To further reduce the dimensionality of the astronomical image data and obtain effective astronomical image features, this embodiment builds a convolutional autoencoder network model and trains it on the astronomical image data; the output of one of the hidden layers of the trained model is then used to represent the features of the image input to the model.
Extracting astronomical image features with the convolutional autoencoder network model comprises the following steps:
b1. Based on a convolutional neural network, build the encoder part of the convolutional autoencoder, which takes a preprocessed astronomical image $P_i$ as input and extracts features to form a low-dimensional feature vector $\mathrm{feature}_p$;
b2. Based on a convolutional neural network, build the decoder part of the convolutional autoencoder, which upsamples the feature vector $\mathrm{feature}_p$, restores the low-dimensional feature vector $\mathrm{feature}_p$ to the dimension of $P_i$, and outputs $P'_i$;
b3. Build the convolutional autoencoder network model:
Input: three-channel astronomical image input $N = \{P_1, P_2, P_3, \ldots, P_n\}$ formed by concatenating the preprocessed astronomical images $P_i$
Output: astronomical image feature set
The method specifically comprises the following steps:
1.1) construct the encoder from an input layer, convolutional layers, downsampling layers, fully connected layers, a flatten layer and the like;
1.2) use the encoder to extract the feature $\mathrm{feature}_p$ of each sample $P_i$ in the input $N$;
1.3) construct the decoder from a reshape layer, convolutional layers, upsampling layers, fully connected layers and the like;
1.4) use the decoder to restore the low-dimensional feature vector $\mathrm{feature}_p$ extracted by the encoder to the dimension of $P_i$, and output $P'_i$;
As shown in FIG. 3, the specific structure of the convolutional autoencoder network model includes:
first part (input layer): the input data is a preprocessed astronomical image containing the picture features of the red, green and blue channels; the input dimension is 12288, and data of dimension 64 × 64 × 3 are output after reshape;
second part (first convolutional layer): a convolutional layer containing 128 convolution kernels; after processing by the ReLU activation function, data of dimension 64 × 64 × 128 are obtained;
third part (first attention module): the attention module comprises a channel attention mechanism that attends to channel features and a spatial attention mechanism that attends to spatial features; both its input and its output are data of dimension 64 × 64 × 128;
fourth part (first downsampling layer): a downsampling layer of size 2 × 2, giving data of dimension 32 × 32 × 128;
fifth part (second convolutional layer): a convolutional layer containing 64 convolution kernels; after processing by the ReLU activation function, data of dimension 32 × 32 × 64 are obtained;
sixth part (second attention module): the attention module comprises a channel attention mechanism that attends to channel features and a spatial attention mechanism that attends to spatial features; both its input and its output are data of dimension 32 × 32 × 64;
seventh part (second downsampling layer): a downsampling layer of size 2 × 2, giving data of dimension 16 × 16 × 64;
eighth part (third convolutional layer): a convolutional layer containing 32 convolution kernels; after processing by the ReLU activation function, data of dimension 16 × 16 × 32 are obtained;
ninth part (third attention module): the attention module comprises a channel attention mechanism that attends to channel features and a spatial attention mechanism that attends to spatial features; both its input and its output are data of dimension 16 × 16 × 32;
tenth part (third downsampling layer): a downsampling layer of size 2 × 2, giving data of dimension 8 × 8 × 32;
eleventh part (fourth convolutional layer): a convolutional layer containing 16 convolution kernels; after processing by the ReLU activation function, data of dimension 8 × 8 × 16 are obtained;
twelfth part (fourth attention module): the attention module comprises a channel attention mechanism that attends to channel features and a spatial attention mechanism that attends to spatial features; both its input and its output are data of dimension 8 × 8 × 16;
thirteenth part (fourth downsampling layer): a downsampling layer of size 2 × 2, giving data of dimension 4 × 4 × 16;
fourteenth part (first hidden layer): a hidden layer containing 128 neuron nodes; after processing by the ReLU activation function, data of dimension 128 are obtained;
fifteenth part (second hidden layer): a hidden layer containing 64 neuron nodes; after processing by the ReLU activation function, data of dimension 64 are obtained;
sixteenth part (third hidden layer): a hidden layer containing 32 neuron nodes; after processing by the ReLU activation function, data of dimension 32 are obtained;
seventeenth part (fourth hidden layer): a hidden layer containing 64 neuron nodes; after processing by the ReLU activation function, data of dimension 64 are obtained;
eighteenth part (fifth hidden layer): a hidden layer containing 128 neuron nodes; after processing by the ReLU activation function, data of dimension 128 are obtained;
nineteenth part: a reshape layer, giving data of dimension 4 × 4 × 8;
twentieth part (fifth convolutional layer): a convolutional layer containing 16 convolution kernels; after processing by the ReLU activation function, data of dimension 4 × 4 × 16 are obtained;
twenty-first part (first upsampling layer): an upsampling layer of size 2 × 2, giving data of dimension 8 × 8 × 16;
twenty-second part (sixth convolutional layer): a convolutional layer containing 32 convolution kernels; after processing by the ReLU activation function, data of dimension 16 × 16 × 32 are obtained;
twenty-third part (second upsampling layer): an upsampling layer of size 2 × 2, giving data of dimension 32 × 32 × 32;
twenty-fourth part (seventh convolutional layer): a convolutional layer containing 64 convolution kernels; after processing by the ReLU activation function, data of dimension 32 × 32 × 64 are obtained;
twenty-fifth part (third upsampling layer): an upsampling layer of size 2 × 2, giving data of dimension 64 × 64 × 64;
twenty-sixth part (eighth convolutional layer): a convolutional layer containing 3 convolution kernels; after processing by the Sigmoid activation function, data of dimension 64 × 64 × 3 are obtained;
twenty-seventh part: the output layer, which outputs data of dimension 64 × 64 × 3.
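The encoder half of the layer list above can be checked with a small shape trace. This is a sketch under assumptions that the source does not state: dimensions are read as height × width × channels, every convolution uses stride 1 with "same" padding so that it changes only the channel count, and the attention modules leave the shape unchanged. The function names are hypothetical.

```python
def conv_same(shape, filters):
    """Stride-1 'same' convolution: spatial size kept, channels = filters."""
    h, w, _ = shape
    return (h, w, filters)

def downsample(shape, size=2):
    """size x size downsampling: spatial dimensions divided by size."""
    h, w, c = shape
    return (h // size, w // size, c)

def trace_encoder(input_shape=(64, 64, 3), filters=(128, 64, 32, 16)):
    """Return (layer name, output shape) pairs for the encoder."""
    trace = [("input", input_shape)]
    shape = input_shape
    for n, f in enumerate(filters, start=1):
        shape = conv_same(shape, f)
        trace.append((f"conv{n}", shape))
        trace.append((f"attention{n}", shape))   # attention keeps the shape
        shape = downsample(shape)
        trace.append((f"down{n}", shape))
    return trace

trace = trace_encoder()
for name, shape in trace:
    print(name, shape)

# Number of values handed to the first hidden layer after flattening.
h, w, c = trace[-1][1]
flat_size = h * w * c
```

Under these assumptions the 12288 input values (64 × 64 × 3) are reduced to a 4 × 4 × 16 block, that is 256 values, before the hidden layers compress them further to 128, 64 and 32 nodes.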
1.5) Optimization method and loss function
After the model is constructed, it is trained. The batch size of the training samples is set to 128 and a cross-entropy loss function is selected. The convolutional layers use the ReLU activation function, and the last layer of the encoder and of the decoder uses the Sigmoid activation function; the activation functions provide the nonlinear transformation. An attention mechanism is added after each convolutional layer of the encoder to improve the feature extraction capability of the model. Parameters are optimized with the Adam optimization algorithm, with a learning rate of 0.001, a decay term of 1e-08 and a momentum of 0.9, and the number of iterations is set to 300 to obtain the optimal model.
b4. Iteratively train the network by deep learning to obtain a convolutional autoencoder network with trained weights.
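The loss and optimizer named in 1.5) can be illustrated with a minimal pure-Python sketch. Several points are assumptions rather than statements from the source: the cross entropy is taken as the per-pixel binary cross entropy between input pixels scaled to [0, 1] and the Sigmoid outputs, the stated momentum 0.9 is mapped to Adam's `beta1`, the stated 1e-08 term is mapped to Adam's `eps`, and `beta2=0.999` is the commonly used default. The function names are hypothetical.

```python
import math

def binary_cross_entropy(target, output, clip=1e-12):
    """Mean per-pixel cross entropy between targets and outputs in (0, 1)."""
    total = 0.0
    for t, o in zip(target, output):
        o = min(max(o, clip), 1.0 - clip)          # clip to avoid log(0)
        total -= t * math.log(o) + (1.0 - t) * math.log(1.0 - o)
    return total / len(target)

def adam_step(params, grads, state, lr=0.001, beta1=0.9, beta2=0.999, eps=1e-08):
    """Apply one Adam update; state holds step count t and moments m, v."""
    state["t"] += 1
    t = state["t"]
    updated = []
    for i, (p, g) in enumerate(zip(params, grads)):
        state["m"][i] = beta1 * state["m"][i] + (1 - beta1) * g       # 1st moment
        state["v"][i] = beta2 * state["v"][i] + (1 - beta2) * g * g   # 2nd moment
        m_hat = state["m"][i] / (1 - beta1 ** t)                      # bias correction
        v_hat = state["v"][i] / (1 - beta2 ** t)
        updated.append(p - lr * m_hat / (math.sqrt(v_hat) + eps))
    return updated

# Usage: one update of a single parameter with gradient 2.0.
state = {"t": 0, "m": [0.0], "v": [0.0]}
new_params = adam_step([1.0], [2.0], state)
loss = binary_cross_entropy([1.0, 0.0], [0.5, 0.5])
```

On the first step the bias-corrected update has magnitude close to the learning rate regardless of the gradient scale, which is one reason Adam is a common choice for training autoencoders.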
Step 3, inputting the astronomical image feature set obtained in step 2 into the image feature clustering model, and outputting the probability that each astronomical image belongs to each cluster;
Establishing the image feature clustering model:
Input: image feature sample set $\{\mathrm{feature}_{p_1}, \mathrm{feature}_{p_2}, \mathrm{feature}_{p_3}, \ldots, \mathrm{feature}_{p_n}\}$, where n is the total number of samples
Output: the probability that each astronomical image belongs to each cluster
1.1) build a clustering model based on a Gaussian mixture model and set the number of mixture components (Components) to K;
1.2) for each sample $\mathrm{feature}_{p_i}$ in the input sample set, estimate the probability that it was generated by each Component. The probability that $\mathrm{feature}_{p_i}$ was generated by the k-th Component is:
$$\gamma(i,k) = \frac{\pi_k \, N(\mathrm{feature}_{p_i} \mid \mu_k, \Sigma_k)}{\sum_{j=1}^{K} \pi_j \, N(\mathrm{feature}_{p_i} \mid \mu_j, \Sigma_j)}$$
where $\gamma(i,k)$ represents the probability that the i-th feature datum was generated by the k-th Component; $\mathrm{feature}_{p_i}$ represents the i-th feature datum in the input astronomical image feature set; $\mu_k$ and $\Sigma_k$ are the parameters of the k-th Component; $\pi_k$ represents the mixing coefficient of the k-th Component; $\pi_j$ represents the mixing coefficient of the j-th Component; $N(\cdot)$ represents a Gaussian distribution; and $\mu_j$ and $\Sigma_j$ are the parameters of the j-th Component.
Because an iterative method is used, $\mu_k$ and $\Sigma_k$ are treated as known when $\gamma(i,k)$ is calculated: they take either their initial values or the values obtained in the previous iteration;
1.3) estimate the parameters of each Component. Assume that the $\gamma(i,k)$ obtained in the previous step is the correct "probability that $\mathrm{feature}_{p_i}$ was generated by Component k". Considering all data samples, Component k can then be regarded as having generated the points $\gamma(1,k)\,\mathrm{feature}_{p_1}, \gamma(2,k)\,\mathrm{feature}_{p_2}, \ldots, \gamma(n,k)\,\mathrm{feature}_{p_n}$. Since each Component is a standard Gaussian distribution, the parameter values corresponding to the maximum likelihood can be found:
$$N_k = \sum_{i=1}^{n} \gamma(i,k), \qquad \mu_k = \frac{1}{N_k} \sum_{i=1}^{n} \gamma(i,k)\, \mathrm{feature}_{p_i},$$
$$\Sigma_k = \frac{1}{N_k} \sum_{i=1}^{n} \gamma(i,k)\, (\mathrm{feature}_{p_i} - \mu_k)(\mathrm{feature}_{p_i} - \mu_k)^{T}, \qquad \pi_k = \frac{N_k}{n}$$
1.4) iterate 1.2) and 1.3) until the value of the likelihood function converges; the resulting probability model gives the probability $\{p_1, p_2, p_3, \ldots, p_n\}$ that each input sample belongs to each cluster;
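The iteration in 1.2) to 1.4) is the expectation-maximization procedure for a Gaussian mixture, and it can be sketched in plain Python. To keep the example short it assumes one-dimensional features, so each Component has a scalar mean and variance instead of a mean vector and covariance matrix; the initial means, the tolerance `tol`, the variance floor `min_var` and all function names are hypothetical.

```python
import math

def gaussian(x, mu, var):
    """Density of a 1-D Gaussian with mean mu and variance var."""
    return math.exp(-(x - mu) ** 2 / (2.0 * var)) / math.sqrt(2.0 * math.pi * var)

def e_step(xs, pis, mus, variances):
    """1.2): gamma[i][k], the probability that sample i came from Component k."""
    gamma = []
    for x in xs:
        weighted = [p * gaussian(x, m, v) for p, m, v in zip(pis, mus, variances)]
        total = sum(weighted)
        gamma.append([w / total for w in weighted])
    return gamma

def m_step(xs, gamma, min_var=1e-6):
    """1.3): maximum-likelihood parameters given the responsibilities."""
    n, k = len(xs), len(gamma[0])
    pis, mus, variances = [], [], []
    for j in range(k):
        nk = sum(g[j] for g in gamma)
        mu = sum(g[j] * x for g, x in zip(gamma, xs)) / nk
        var = sum(g[j] * (x - mu) ** 2 for g, x in zip(gamma, xs)) / nk
        pis.append(nk / n)
        mus.append(mu)
        variances.append(max(var, min_var))   # floor keeps the density finite
    return pis, mus, variances

def log_likelihood(xs, pis, mus, variances):
    return sum(math.log(sum(p * gaussian(x, m, v)
                            for p, m, v in zip(pis, mus, variances))) for x in xs)

def fit_gmm(xs, init_mus, tol=1e-8, max_iter=200):
    """1.4): alternate the two steps until the log-likelihood stops changing."""
    k = len(init_mus)
    pis, mus, variances = [1.0 / k] * k, list(init_mus), [1.0] * k
    previous = -math.inf
    for _ in range(max_iter):
        gamma = e_step(xs, pis, mus, variances)
        pis, mus, variances = m_step(xs, gamma)
        current = log_likelihood(xs, pis, mus, variances)
        if abs(current - previous) < tol:
            break
        previous = current
    return e_step(xs, pis, mus, variances), (pis, mus, variances)

# Usage: two well-separated groups of 1-D features.
features = [0.0, 0.1, -0.1, 5.0, 5.1, 4.9]
gamma, (pis, mus, variances) = fit_gmm(features, init_mus=[1.0, 4.0])
```

Each row of `gamma` sums to 1 and plays the role of the per-image cluster probabilities passed on to step 5. For multi-dimensional features, a library implementation with full covariance matrices would normally replace this sketch.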
Step 4, manually scoring the clustered astronomical images to obtain the probability that each cluster belongs to each class;
establishing an artificial scoring model of the image clustering cluster:
for the clustered star images, the experts classify different clusters through their own experiences, and the result corresponds to the probability that each cluster belongs to each class, namely { c1,c2,c3,...,cLL represents the number of galaxy image classes;
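One way the expert scoring could be turned into cluster-to-class probabilities is sketched below. The source does not say how the scores are collected, so the scheme here is an assumption: several expert judgments are recorded per cluster as class names, and the fraction of judgments for each class is used as the probability. The function name and data layout are hypothetical.

```python
from collections import Counter

def cluster_class_probabilities(votes, classes):
    """Turn expert votes into a K x L matrix of cluster-to-class probabilities.

    votes[k] is the list of class names that experts assigned to cluster k.
    """
    matrix = []
    for cluster_votes in votes:
        if not cluster_votes:
            raise ValueError("every cluster needs at least one expert vote")
        counts = Counter(cluster_votes)
        total = len(cluster_votes)
        matrix.append([counts[c] / total for c in classes])
    return matrix

# Usage: two clusters scored against two galaxy classes.
classes = ["spiral", "elliptical"]
votes = [
    ["spiral", "spiral", "spiral", "elliptical"],   # cluster 0
    ["elliptical", "elliptical"],                   # cluster 1
]
cluster_class = cluster_class_probabilities(votes, classes)
```

Each row of `cluster_class` sums to 1, so it can be used directly as the second matrix in step 5.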
and 5, multiplying the probability that each astronomical image belongs to each cluster with the probability that each cluster belongs to each class to obtain the probability that each astronomical image belongs to each class, and screening the classes through a threshold value to finish the classification.
Establishing an image feature classification model:
probability p of each input sample belonging to each cluster obtained in step 31,p2,p3,...,pnAnd the probability of each cluster belonging to each class obtained in step 4, namely { c }1,c2,c3,...,cLMultiplying the two matrixes to obtain the probability that each sample belongs to each class, and then screening the classes through a threshold value to finish the classification work.
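Step 5 can be sketched as a matrix product followed by a threshold test. The sketch below assumes the image-to-cluster probabilities are stored as one row per image and the cluster-to-class probabilities as one row per cluster. The text does not say how the threshold screening is applied, so keeping every class whose probability reaches the threshold is an assumption, as are the function names and the example numbers.

```python
def matmul(a, b):
    """Multiply an (n x m) matrix by an (m x l) matrix given as nested lists."""
    return [
        [sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
        for i in range(len(a))
    ]


def classify(image_cluster_probs, cluster_class_probs, threshold):
    """Return (class probabilities per image, retained class indices per image)."""
    class_probs = matmul(image_cluster_probs, cluster_class_probs)
    # Assumed screening rule: keep each class whose probability >= threshold.
    labels = [[j for j, p in enumerate(row) if p >= threshold] for row in class_probs]
    return class_probs, labels


if __name__ == "__main__":
    # Hypothetical values: 2 images, 3 clusters, 2 classes.
    p = [[0.9, 0.1, 0.0], [0.2, 0.3, 0.5]]    # image -> cluster (clustering output)
    c = [[1.0, 0.0], [0.5, 0.5], [0.0, 1.0]]  # cluster -> class (expert scores)
    probs, labels = classify(p, c, threshold=0.6)
    print(labels)  # [[0], [1]]
```

When every row of both inputs sums to 1, every row of the product also sums to 1, so the result can be read directly as a per-image class distribution.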
In another embodiment, the invention provides an unsupervised machine learning-based astronomical image automatic classification system, comprising:
the preprocessing module is configured to preprocess astronomical image data to be classified;
the convolution self-coding network model is configured to take preprocessed astronomical image data as input, extract astronomical image features and obtain an astronomical image feature set;
the image feature clustering model is configured to take the extracted astronomical image feature set as input and output the probability that each astronomical image belongs to each clustering cluster;
the manual scoring module is configured to manually score the clustered astronomical images to obtain the probability that each cluster belongs to each class;
and the image feature classification model is configured to multiply the probability that each astronomical image belongs to each cluster with the probability that each cluster belongs to each class to obtain the probability that each astronomical image belongs to each class, and then to finish the classification through threshold value screening.
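The way the five modules of the system feed into one another can be sketched as a small container of callables. This is only an illustration of the data flow: the class name, field names, and the stub stages in the usage section are hypothetical, and a real system would plug in the trained convolutional self-coding network and the Gaussian mixture model in place of the stubs.

```python
from dataclasses import dataclass
from typing import Callable, List, Sequence

Matrix = List[List[float]]


@dataclass
class AstronomicalImageClassifier:
    preprocess: Callable[[Sequence], Sequence]       # preprocessing module
    extract_features: Callable[[Sequence], Matrix]   # self-coding network (encoder)
    cluster_probabilities: Callable[[Matrix], Matrix]  # image feature clustering model
    cluster_class_probabilities: Matrix              # output of the manual scoring module
    threshold: float                                 # assumed screening threshold

    def classify(self, images: Sequence) -> List[List[int]]:
        features = self.extract_features(self.preprocess(images))
        p = self.cluster_probabilities(features)
        c = self.cluster_class_probabilities
        # Image feature classification model: matrix product, then threshold.
        class_probs = [
            [sum(row[k] * c[k][j] for k in range(len(c))) for j in range(len(c[0]))]
            for row in p
        ]
        return [[j for j, q in enumerate(r) if q >= self.threshold] for r in class_probs]


if __name__ == "__main__":
    # Stub stages standing in for the trained models (illustrative only).
    clf = AstronomicalImageClassifier(
        preprocess=lambda imgs: imgs,
        extract_features=lambda imgs: [[float(sum(img))] for img in imgs],
        cluster_probabilities=lambda feats: [
            [1.0, 0.0] if f[0] < 5 else [0.0, 1.0] for f in feats
        ],
        cluster_class_probabilities=[[0.8, 0.2], [0.1, 0.9]],
        threshold=0.5,
    )
    print(clf.classify([[1, 2], [4, 5]]))  # [[0], [1]]
```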
Compared with the prior art, the unsupervised astronomical image classification method disclosed by the invention combines a neural network, a convolutional self-encoder, and a clustering model, so that the model learns to extract the features of astronomical images without any manually labeled data. This reduces labor cost, avoids the influence of human bias on the model's classification, and allows the model to reach higher classification accuracy at lower computational cost.
The present invention has been disclosed in terms of the preferred embodiment but is not limited to that embodiment; all technical solutions obtained by equivalent substitution or equivalent transformation fall within the scope of protection of the present invention.
Claims (10)
1. An unsupervised machine learning-based astronomical image automatic classification method is characterized by comprising the following steps:
preprocessing astronomical image data to be classified;
inputting preprocessed astronomical image data into a trained convolution self-coding network model, and extracting astronomical image features to obtain an astronomical image feature set;
inputting the acquired astronomical image feature set into an image feature clustering model, and outputting the probability that each astronomical image belongs to each clustering cluster;
manually scoring the clustered astronomical images to obtain the probability that each cluster belongs to each class;
and multiplying the probability that each astronomical image belongs to each cluster with the probability that each cluster belongs to each class to obtain the probability that each astronomical image belongs to each class, and screening the classes through a threshold value to finish the classification.
2. The unsupervised machine learning-based astronomical image automatic classification method according to claim 1, wherein the convolutional self-coding network model comprises an encoder and a decoder; the encoder comprises an input layer, a first convolutional layer, a first attention module, a first downsampling layer, a second convolutional layer, a second attention module, a second downsampling layer, a third convolutional layer, a third attention module, a third downsampling layer, a fourth convolutional layer, a fourth attention module and a fourth downsampling layer which are connected in sequence; the decoder comprises a first hidden layer, a second hidden layer, a third hidden layer, a fourth hidden layer, a fifth hidden layer, a reshape layer, a fifth convolutional layer, a first upsampling layer, a sixth convolutional layer, a second upsampling layer, a seventh convolutional layer, a third upsampling layer, an eighth convolutional layer and an output layer which are connected in sequence; and each convolutional layer uses a ReLU activation function, except the fourth convolutional layer and the eighth convolutional layer, which use a Sigmoid activation function.
3. The unsupervised machine learning-based astronomical image automatic classification method according to claim 2, wherein the first convolutional layer comprises 128 convolution kernels of size 4 x 4, the second convolutional layer comprises 64 convolution kernels of size 4 x 4, the third convolutional layer comprises 32 convolution kernels of size 3 x 3, the fourth convolutional layer comprises 16 convolution kernels of size 3 x 3, the fifth convolutional layer comprises 32 convolution kernels of size 3 x 3, the sixth convolutional layer comprises 32 convolution kernels of size 3 x 3, the seventh convolutional layer comprises 64 convolution kernels of size 4 x 4, and the eighth convolutional layer comprises 3 convolution kernels of size 4 x 4; the first, second, third and fourth downsampling layers and the first, second and third upsampling layers each have a size of 2; and the first hidden layer contains 128 neuron nodes, the second hidden layer contains 64 neuron nodes, the third hidden layer contains 32 neuron nodes, the fourth hidden layer contains 64 neuron nodes, and the fifth hidden layer contains 128 neuron nodes.
4. The unsupervised machine learning-based astronomical image automatic classification method according to claim 1, wherein the training method of the convolutional self-coding network model comprises the following steps:
acquiring a set of astronomical image data;
preprocessing a group of acquired astronomical image data;
taking the preprocessed astronomical image data as a training data set and a cross-entropy loss function as the objective function, optimizing the parameters with the Adam optimization algorithm so as to minimize the objective function value, and training to obtain the convolutional self-coding network model.
5. The unsupervised machine learning-based astronomical image automatic classification method according to claim 4, wherein the preprocessing of the acquired set of astronomical image data comprises: performing center-point cropping, random cropping and dimension reduction on each astronomical image in the set of astronomical image data.
6. The unsupervised machine learning-based astronomical image automatic classification method according to claim 5, wherein the preprocessing of the acquired set of astronomical image data further comprises: when, after center-point cropping and dimension reduction, the number of astronomical image samples of a given class is insufficient, randomly flipping the astronomical images obtained after center-point cropping and dimension reduction to obtain new astronomical images, and adding the new astronomical images to the training data set.
7. The unsupervised machine learning-based astronomical image automatic classification method according to claim 1, wherein the image feature clustering model is built using a Gaussian mixture model with a determined number of clustering Components, and wherein inputting the acquired astronomical image feature set into the image feature clustering model and outputting the probability that each astronomical image belongs to each cluster comprises the following steps:
S1, for each feature data in the feature set of the input astronomical image, estimating the probability that the feature data is generated by each Component according to the following formula:
where $\gamma(i,k)$ represents the probability that the $i$-th feature data is generated by the $k$-th Component; $featurep_i$ represents the $i$-th feature data in the feature set of the input astronomical image; $\mu_k$ and $\Sigma_k$ are the parameters of the $k$-th Component; $\pi_k$ represents the mixing coefficient of the $k$-th Component; $K$ represents the number of Components; $\pi_j$ represents the mixing coefficient of the $j$-th Component; $N(\cdot)$ represents a Gaussian distribution; and $\mu_j$ and $\Sigma_j$ are the parameters of the $j$-th Component;
S2, based on the estimated $\gamma(i,k)$, obtaining the parameter values of the $k$-th Component corresponding to the maximum likelihood according to the following equation:
repeating S1 and S2 until the value of the likelihood function converges, and using the probability model to obtain the probability that each feature data belongs to each cluster.
8. The unsupervised machine learning-based astronomical image automatic classification method according to claim 1, wherein the manual scoring of the astronomical images which have been clustered into clusters to obtain the probability that each cluster belongs to each class comprises:
having experts classify the different clusters of the clustered astronomical images according to experience, to obtain the probability that each cluster belongs to each class.
9. The unsupervised machine learning-based astronomical image automatic classification method according to claim 1, wherein the multiplying the probability that each astronomical image belongs to each cluster with the probability that each cluster belongs to each class to obtain the probability that each astronomical image belongs to each class comprises:
multiplying the probability matrix of each astronomical image belonging to each cluster with the probability matrix of each cluster belonging to each class to obtain the probability of each astronomical image belonging to each class.
10. An unsupervised machine learning-based astronomical image automated classification system, comprising:
the preprocessing module is configured to preprocess astronomical image data to be classified;
the convolution self-coding network model is configured to take preprocessed astronomical image data as input, extract astronomical image features and obtain an astronomical image feature set;
the image feature clustering model is configured to take the extracted astronomical image feature set as input and output the probability that each astronomical image belongs to each clustering cluster;
the manual scoring module is configured to manually score the clustered astronomical images to obtain the probability that each cluster belongs to each class;
and the image feature classification model is configured to multiply the probability that each astronomical image belongs to each cluster with the probability that each cluster belongs to each class to obtain the probability that each astronomical image belongs to each class, and then to finish the classification through threshold value screening.
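The center-point cropping, random cropping, and random flipping operations recited in claims 5 and 6 can be sketched as follows. The sketch assumes each image is a two-dimensional nested list of pixel values and omits the dimension-reduction step, which the text does not specify. The function names, the crop size, and the 0.5 flip probability are illustrative assumptions.

```python
import random


def center_crop(image, size):
    """Cut a size x size window around the image center."""
    h, w = len(image), len(image[0])
    top, left = (h - size) // 2, (w - size) // 2
    return [row[left:left + size] for row in image[top:top + size]]


def random_crop(image, size, rng):
    """Cut a size x size window at a uniformly random position."""
    h, w = len(image), len(image[0])
    top = rng.randint(0, h - size)    # randint is inclusive on both ends
    left = rng.randint(0, w - size)
    return [row[left:left + size] for row in image[top:top + size]]


def random_flip(image, rng):
    """Flip horizontally and/or vertically, each with probability 0.5."""
    if rng.random() < 0.5:
        image = [row[::-1] for row in image]  # horizontal flip
    if rng.random() < 0.5:
        image = image[::-1]                   # vertical flip
    return image


if __name__ == "__main__":
    rng = random.Random(0)  # fixed seed so the sketch is repeatable
    img = [[r * 4 + c for c in range(4)] for r in range(4)]
    print(center_crop(img, 2))  # [[5, 6], [9, 10]]
    augmented = random_flip(center_crop(img, 2), rng)
    print(augmented)
```

Flipping only rearranges pixels, so an augmented image contains the same values as its source, which is why it can be added to the training set for a class that has too few samples.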
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202110853849.3A CN113723456B (en) | 2021-07-28 | 2021-07-28 | Automatic astronomical image classification method and system based on unsupervised machine learning |
Publications (2)
Publication Number | Publication Date |
---|---|
CN113723456A (en) | 2021-11-30 |
CN113723456B (en) | 2023-10-17 |
Family
ID=78674118
Family Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202110853849.3A Active CN113723456B (en) | 2021-07-28 | 2021-07-28 | Automatic astronomical image classification method and system based on unsupervised machine learning |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN113723456B (en) |
Citations (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN109919204A (en) * | 2019-02-23 | 2019-06-21 | 华南理工大学 | A kind of deep learning clustering method towards noise image |
WO2020041503A1 (en) * | 2018-08-24 | 2020-02-27 | Arterys Inc. | Deep learning-based coregistration |
CN111582389A (en) * | 2020-05-11 | 2020-08-25 | 昆明能讯科技有限责任公司 | Automatic tower point cloud data classification method based on convolution self-coding network |
US20200272857A1 (en) * | 2019-02-22 | 2020-08-27 | Neuropace, Inc. | Systems and methods for labeling large datasets of physiologial records based on unsupervised machine learning |
CN111859978A (en) * | 2020-06-11 | 2020-10-30 | 南京邮电大学 | Emotion text generation method based on deep learning |
Also Published As
Publication number | Publication date |
---|---|
CN113723456B (en) | 2023-10-17 |
Similar Documents
Publication | Title | Publication Date |
---|---|---|
CN112308158B (en) | Multi-source field self-adaptive model and method based on partial feature alignment | |
CN109615582B (en) | Face image super-resolution reconstruction method for generating countermeasure network based on attribute description | |
CN110298266B (en) | Deep neural network target detection method based on multiscale receptive field feature fusion | |
CN111476713B (en) | Intelligent weather image identification method and system based on multi-depth convolution neural network fusion | |
CN106845401B (en) | Pest image identification method based on multi-space convolution neural network | |
CN108875674B (en) | Driver behavior identification method based on multi-column fusion convolutional neural network | |
CN107526785B (en) | Text classification method and device | |
CN111696101A (en) | Light-weight solanaceae disease identification method based on SE-Inception | |
CN109993100B (en) | Method for realizing facial expression recognition based on deep feature clustering | |
CN110321862B (en) | Pedestrian re-identification method based on compact ternary loss | |
CN106845528A (en) | A kind of image classification algorithms based on K means Yu deep learning | |
CN110991349B (en) | Lightweight vehicle attribute identification method based on metric learning | |
CN110287882A (en) | A kind of big chrysanthemum kind image-recognizing method based on deep learning | |
CN111639719A (en) | Footprint image retrieval method based on space-time motion and feature fusion | |
CN114038037B (en) | Expression label correction and identification method based on separable residual error attention network | |
CN106326925A (en) | Apple disease image identification method based on deep learning network | |
CN109508640A (en) | A kind of crowd's sentiment analysis method, apparatus and storage medium | |
CN113378962B (en) | Garment attribute identification method and system based on graph attention network | |
CN114170657A (en) | Facial emotion recognition method integrating attention mechanism and high-order feature representation | |
CN110688966A (en) | Semantic-guided pedestrian re-identification method | |
CN109934281B (en) | Unsupervised training method of two-class network | |
CN114511849B (en) | Grape thinning identification method based on graph attention network | |
CN115100509B (en) | Image identification method and system based on multi-branch block-level attention enhancement network | |
CN113723456A (en) | Unsupervised machine learning-based astronomical image automatic classification method and system | |
CN110363198A (en) | A kind of neural network weight matrix fractionation and combined method |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||