CN113723456A - Unsupervised machine learning-based astronomical image automatic classification method and system

Publication number
CN113723456A
Authority
CN
China
Prior art keywords
astronomical
astronomical image
probability
layer
cluster
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202110853849.3A
Other languages
Chinese (zh)
Other versions
CN113723456B (en)
Inventor
邹志强
韩杨
吴家皋
张芷瑞
洪舒欣
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Nanjing University of Posts and Telecommunications
Original Assignee
Nanjing University of Posts and Telecommunications
Application filed by Nanjing University of Posts and Telecommunications
Priority to CN202110853849.3A
Publication of CN113723456A
Application granted
Publication of CN113723456B
Legal status: Active (current)

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/24 Classification techniques
    • G06F18/241 Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/23 Clustering techniques
    • G06F18/232 Non-hierarchical techniques
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/045 Combinations of networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/08 Learning methods
    • Y GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02T CLIMATE CHANGE MITIGATION TECHNOLOGIES RELATED TO TRANSPORTATION
    • Y02T10/00 Road transport of goods or passengers
    • Y02T10/10 Internal combustion engine [ICE] based vehicles
    • Y02T10/40 Engine management systems

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Physics & Mathematics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Artificial Intelligence (AREA)
  • General Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • Evolutionary Computation (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Computational Linguistics (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Health & Medical Sciences (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • Evolutionary Biology (AREA)
  • General Health & Medical Sciences (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Image Analysis (AREA)
  • Image Processing (AREA)

Abstract

The invention discloses a method and system for automatic classification of astronomical images based on unsupervised machine learning. The method comprises the following steps: preprocessing the astronomical image data to be classified; inputting the preprocessed astronomical image data into a trained convolutional self-coding network (convolutional autoencoder) model, and extracting astronomical image features to obtain an astronomical image feature set; inputting the acquired feature set into an image feature clustering model, which outputs the probability that each astronomical image belongs to each cluster; manually scoring the clustered astronomical images to obtain the probability that each cluster belongs to each class; and multiplying the probability that each astronomical image belongs to each cluster by the probability that each cluster belongs to each class to obtain the probability that each astronomical image belongs to each class, then screening the classes with a threshold to complete the classification. The method achieves comparatively high classification accuracy on astronomical image data at comparatively low cost, without requiring data labels.

Description

Unsupervised machine learning-based astronomical image automatic classification method and system
Technical Field
The invention relates to an unsupervised machine learning-based astronomical image automatic classification method and system, and belongs to the technical field of intelligent processing of astronomical images.
Background
The formation and evolution of galaxies is an important scientific problem in astrophysics. Galaxy morphology is an important reference index for the formation and evolution of galaxies and is strongly correlated with many physical parameters, including mass, star formation history, and mass distribution. The Galaxy Zoo 1.0 project collected simple morphologies of nearly 900,000 galaxies from the Sloan Digital Sky Survey (SDSS), and thousands of volunteers completed the morphological classification over several months. As science and technology develop, observation equipment is continuously upgraded, with projects such as LSST (the Large Synoptic Survey Telescope, a wide-field sky survey project in the United States), Euclid (the European Euclid space survey project), and CSST (the China Space Station Telescope project). Astronomy is moving into an era of large-scale sky surveys, in which data sets in the astronomical field will grow exponentially. For example, the Galaxy Zoo 2.0 project collected 1.6 billion galaxies from SDSS in order to determine galaxy morphology and further study the formation and evolution of galaxies. Faced with such a huge galaxy image data set, visual inspection by humans can no longer solve the problem effectively, so astronomers have turned to automatic classification methods.
In recent years, machine learning and deep learning methods have been applied to galaxy morphology classification. In 2010, Gauci et al. proposed a galaxy morphology classification model that combines a decision tree learning algorithm with a random forest algorithm. In 2015, Ferrari et al. used linear discriminant analysis (LDA) to classify galaxy morphology. Machine learning algorithms usually require complex feature engineering: exploratory data analysis must first be performed on the data set, the data is then passed to the machine learning algorithm after dimensionality reduction, and the best features must be selected to obtain the best experimental results. To avoid this complex feature engineering process, astronomers began trying deep learning to solve the galaxy morphology classification task.
The concept of deep learning was proposed by Hinton in 2006, and deep learning has since made positive contributions in many fields by constructing multi-layer artificial neural networks. Deep learning extracts and abstracts features from the input data through multiple nonlinear layers and then classifies the images. Although deep learning has achieved considerable results in galaxy classification, training a deep learning model depends strongly on the data labels of the training set. In practice, labeling astronomical images is highly specialized work that consumes a large amount of expert time. In addition, manual labeling introduces human bias toward the astronomical images to some extent, and this bias is often difficult to detect.
Disclosure of Invention
Purpose of the invention: with the arrival of the era of astronomical big data, the number of astronomical images is growing exponentially, obtaining a large amount of labeled astronomical data in a short time has become unrealistic, and supervised machine learning techniques therefore have a number of shortcomings.
To implement the invention, several core problems must be solved: (1) most existing astronomical image classification methods rely on manually labeled data; for example, the Galaxy Zoo data set is published on a website and astronomy enthusiasts complete the classification work together by crowdsourcing, which is time-consuming and can introduce human bias; (2) unsupervised classification methods are mostly limited to clustering models. Astronomical images are high-dimensional image data, often presented as three-dimensional images, and when faced with a large amount of high-dimensional data a general clustering model may suffer from the curse of dimensionality or have difficulty processing such large-scale, high-dimensional astronomical image data.
The technical scheme is as follows: in order to achieve the purpose, the invention adopts the technical scheme that:
In one aspect, the invention provides an unsupervised machine learning-based astronomical image automatic classification method, which comprises the following steps:
preprocessing the astronomical image data to be classified;
inputting the preprocessed astronomical image data into a trained convolutional self-coding network model, and extracting astronomical image features to obtain an astronomical image feature set;
inputting the acquired astronomical image feature set into an image feature clustering model, and outputting the probability that each astronomical image belongs to each cluster;
manually scoring the clustered astronomical images to obtain the probability that each cluster belongs to each class;
and multiplying the probability that each astronomical image belongs to each cluster by the probability that each cluster belongs to each class to obtain the probability that each astronomical image belongs to each class, and screening the classes with a threshold to complete the classification.
Further, the convolutional self-coding network model comprises an encoder and a decoder, wherein the encoder comprises an input layer, a first convolutional layer, a first attention module, a first downsampling layer, a second convolutional layer, a second attention module, a second downsampling layer, a third convolutional layer, a third attention module, a third downsampling layer, a fourth convolutional layer, a fourth attention module and a fourth downsampling layer which are sequentially connected; the decoder comprises a first hidden layer, a second hidden layer, a third hidden layer, a fourth hidden layer, a fifth hidden layer, a reshape layer, a fifth convolutional layer, a first upsampling layer, a sixth convolutional layer, a second upsampling layer, a seventh convolutional layer, a third upsampling layer, an eighth convolutional layer and an output layer which are sequentially connected, wherein each convolutional layer uses a ReLU activation function, and the fourth convolutional layer and the eighth convolutional layer use a Sigmoid activation function.
Further, the first convolutional layer contains 128 convolution kernels of size 4 × 4, the second convolutional layer contains 64 convolution kernels of size 4 × 4, the third convolutional layer contains 32 convolution kernels of size 3 × 3, the fourth convolutional layer contains 16 convolution kernels of size 3 × 3, the fifth convolutional layer contains 32 convolution kernels of size 3 × 3, the sixth convolutional layer contains 32 convolution kernels of size 3 × 3, the seventh convolutional layer contains 64 convolution kernels of size 4 × 4, and the eighth convolutional layer contains 3 convolution kernels of size 4 × 4; the first, second, third, and fourth downsampling layers and the first, second, and third upsampling layers all have a size of 2; the first hidden layer contains 128 neuron nodes, the second hidden layer contains 64 neuron nodes, the third hidden layer contains 32 neuron nodes, the fourth hidden layer contains 64 neuron nodes, and the fifth hidden layer contains 128 neuron nodes.
Further, the training method of the convolutional self-coding network model comprises the following steps:
acquiring a set of astronomical image data;
preprocessing a group of acquired astronomical image data;
taking the preprocessed astronomical image data as the training data set and a cross-entropy loss function as the objective function, optimizing the parameters with the Adam optimization algorithm so as to minimize the objective function value, and training to obtain the convolutional self-coding network model.
Further, preprocessing the acquired set of astronomical image data comprises: performing center cropping, random cropping, and dimensionality reduction on each astronomical image in the set of astronomical image data.
Further, preprocessing the acquired set of astronomical image data further comprises: when the number of astronomical image samples in the category corresponding to an astronomical image is insufficient after center cropping and dimensionality reduction, randomly flipping the astronomical image obtained after center cropping and dimensionality reduction to obtain a new astronomical image, and adding the new astronomical image to the training data set.
Further, the image feature clustering model is built with a Gaussian mixture model having a determined number of Components (clusters); inputting the obtained astronomical image feature set into the image feature clustering model and outputting the probability that each astronomical image belongs to each cluster comprises the following steps:
S1, for each feature data in the feature set of the input astronomical images, estimating the probability that the feature data is generated by each Component according to the following formula:
$$\gamma(i,k) = \frac{\pi_k \, N(\mathrm{feature}_{p_i} \mid \mu_k, \Sigma_k)}{\sum_{j=1}^{K} \pi_j \, N(\mathrm{feature}_{p_i} \mid \mu_j, \Sigma_j)}$$
wherein $\gamma(i,k)$ represents the probability that the i-th feature data is generated by the k-th Component; $\mathrm{feature}_{p_i}$ represents the i-th feature data in the feature set of the input astronomical images; $\mu_k$ and $\Sigma_k$ are the parameters of the k-th Component; $\pi_k$ represents the mixing coefficient of the k-th Component; $K$ represents the number of Components; $\pi_j$ represents the mixing coefficient of the j-th Component, wherein
$$\sum_{j=1}^{K} \pi_j = 1$$
$N(\cdot)$ represents a Gaussian distribution; $\mu_j$ and $\Sigma_j$ are the parameters of the j-th Component;
S2, based on the estimated $\gamma(i,k)$, obtaining the parameter values of the k-th Component corresponding to the maximum likelihood according to the following equations:
$$\mu_k = \frac{1}{N_k} \sum_{i=1}^{n} \gamma(i,k) \, \mathrm{feature}_{p_i}$$
$$\Sigma_k = \frac{1}{N_k} \sum_{i=1}^{n} \gamma(i,k) \, (\mathrm{feature}_{p_i} - \mu_k)(\mathrm{feature}_{p_i} - \mu_k)^{T}$$
where $N_k = \sum_{i=1}^{n} \gamma(i,k)$ and $n$ is the number of feature data;
repeating S1 and S2 iteratively until the value of the likelihood function converges, and using the probability model to obtain the probability that each feature data belongs to each cluster.
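The text above stops the iteration when the likelihood function converges but does not write that function out. For a Gaussian mixture with the symbols defined in S1, the standard log-likelihood over n feature data is given below. The tolerance $\varepsilon$ and the iteration index $t$ are illustrative assumptions and do not appear in the source.

```latex
\ln L = \sum_{i=1}^{n} \ln \left( \sum_{k=1}^{K} \pi_k \, N(\mathrm{feature}_{p_i} \mid \mu_k, \Sigma_k) \right),
\qquad
\text{stop when } \left| \ln L^{(t)} - \ln L^{(t-1)} \right| < \varepsilon
```

Each pass of S1 followed by S2 does not decrease $\ln L$, which is why monitoring its change is a common stopping rule for this kind of iteration.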
Further, manually scoring the clustered astronomical images to obtain the probability that each cluster belongs to each class includes:
having experts classify the different clusters of the clustered astronomical images according to their experience, to obtain the probability that each cluster belongs to each class.
Further, multiplying the probability that each astronomical image belongs to each cluster by the probability that each cluster belongs to each class to obtain the probability that each astronomical image belongs to each class includes:
multiplying the probability matrix of each astronomical image belonging to each cluster by the probability matrix of each cluster belonging to each class to obtain the probability that each astronomical image belongs to each class.
In another aspect, the present invention provides an unsupervised machine learning-based astronomical image automatic classification system, including:
a preprocessing module configured to preprocess the astronomical image data to be classified;
a convolutional self-coding network model configured to take the preprocessed astronomical image data as input and extract astronomical image features to obtain an astronomical image feature set;
an image feature clustering model configured to take the extracted astronomical image feature set as input and output the probability that each astronomical image belongs to each cluster;
a manual scoring module configured to manually score the clustered astronomical images to obtain the probability that each cluster belongs to each class;
and an image feature classification model configured to multiply the probability that each astronomical image belongs to each cluster by the probability that each cluster belongs to each class to obtain the probability that each astronomical image belongs to each class, and then complete the classification through threshold screening.
Beneficial effects: compared with the prior art, the astronomical image automatic classification method and system based on unsupervised machine learning provided by the invention have the following advantages:
(1) current deep learning approaches to astronomical image classification usually need large-scale labeled astronomical image data as a training set, and manually labeling a large number of astronomical images costs astronomers a great deal of time; the method can instead learn and extract features directly from unlabeled astronomical images and then use those features for further classification, which greatly reduces the manual labeling cost and improves efficiency;
(2) a convolutional autoencoder based on a convolutional neural network is provided, which brings the advantages of convolution to bear on the high-dimensional nature of image data and effectively improves the accuracy of astronomical image classification.
Drawings
FIG. 1 is a flow chart of an unsupervised machine learning-based astronomical image automatic classification method according to an embodiment of the present invention;
FIG. 2 is a flow chart of astronomical image preprocessing in an embodiment of the present invention;
FIG. 3 is a structural diagram of a convolutional self-coding network model in an embodiment of the present invention.
Detailed Description
The invention is further described with reference to specific examples. The following examples are only for illustrating the technical solutions of the present invention more clearly, and the protection scope of the present invention is not limited thereby.
As described above, with the advent of the astronomical big data era, astronomical pictures will grow exponentially, and it will become less practical to obtain a large amount of labeled astronomical data in a short time, and it is difficult to directly classify astronomical images accurately without labels.
To this end, in one embodiment, the invention provides an unsupervised machine learning-based astronomical image automatic classification method. As illustrated in fig. 1, the method comprises:
Step 1, preprocessing astronomical image data;
Because the Galaxy Zoo data used in the embodiment of the present invention consists of high-dimensional color pictures, some preprocessing is needed.
The astronomical image data is preprocessed with image processing operations such as center cropping and flipping, which preliminarily reduce the dimensionality of the astronomical images and augment the data.
As shown in fig. 2, the astronomical image preprocessing algorithm specifically includes:
Input: astronomical image data set $T = \{P_1, P_2, P_3, \ldots, P_n\}$, where $P_i$ represents the i-th astronomical image sample
Output: preprocessed astronomical image data set
a1. Traverse the astronomical images: set a loop variable i running from 1 to n, where n is the total number of astronomical images, and initialize i to 1;
a2. For the current astronomical image sample $P_i$, center-crop it and replace $P_i$ with the result, then go to a3;
a3. Further adjust the size of $P_i$: randomly crop $P_i$ and replace $P_i$ with the result, then resize $P_i$ to a uniform size, add the resized $P_i$ to the preprocessed data set, and go to a4;
a4. If the number of astronomical image samples in the category corresponding to $P_i$ is insufficient, go to a5; otherwise go to a6;
a5. Randomly rotate the resized $P_i$ from a3 to obtain a new $P_i$, add the new $P_i$ to the preprocessed data set, and go to a4;
a6. Execute i = i + 1;
a7. If i ≤ n, go to a2; otherwise the preprocessing of the astronomical images is finished.
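The loop a1 to a7 can be sketched in plain Python as follows. This is a minimal illustration under stated assumptions, not the patented implementation: images are nested lists of pixel values, each image carries a category tag so that step a4 can count samples, resizing uses nearest-neighbour sampling, the augmentation is a random flip, and all function names, sizes, and the `min_per_class` threshold are hypothetical.

```python
import random
from collections import Counter

def center_crop(img, size):
    """Keep a size x size window around the image center (img is a list of rows)."""
    h, w = len(img), len(img[0])
    top, left = (h - size) // 2, (w - size) // 2
    return [row[left:left + size] for row in img[top:top + size]]

def random_crop(img, size, rng):
    """Keep a size x size window at a random position."""
    h, w = len(img), len(img[0])
    top, left = rng.randint(0, h - size), rng.randint(0, w - size)
    return [row[left:left + size] for row in img[top:top + size]]

def resize_nearest(img, size):
    """Resize to size x size by nearest-neighbour sampling (assumed resize method)."""
    h, w = len(img), len(img[0])
    return [[img[r * h // size][c * w // size] for c in range(size)]
            for r in range(size)]

def random_flip(img, rng):
    """Flip horizontally or vertically at random (assumed augmentation)."""
    if rng.random() < 0.5:
        return [row[::-1] for row in img]
    return img[::-1]

def preprocess(images, labels, center=8, crop=6, out=4, min_per_class=3, seed=0):
    rng = random.Random(seed)
    counts = Counter(labels)                      # samples available per category
    data, data_labels = [], []
    for img, label in zip(images, labels):        # a1, a6, a7: visit every image
        img = center_crop(img, center)            # a2
        img = random_crop(img, crop, rng)         # a3: random crop ...
        img = resize_nearest(img, out)            # ... then uniform size
        data.append(img)
        data_labels.append(label)
        while counts[label] < min_per_class:      # a4: category still too small
            data.append(random_flip(img, rng))    # a5: add an augmented copy
            data_labels.append(label)
            counts[label] += 1
    return data, data_labels
```

With four 10 × 10 inputs, three tagged "spiral" and one tagged "elliptical", and `min_per_class=3`, the sketch returns six 4 × 4 images: the four processed originals plus two flipped copies of the single elliptical sample.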
Step 2, inputting the preprocessed astronomical image data into a trained convolutional self-coding network model, and extracting astronomical image features to obtain an astronomical image feature set;
To further reduce the dimensionality of the astronomical image data and obtain effective astronomical image features, this embodiment builds a convolutional self-coding network model and trains it on the astronomical image data; the output of one of the hidden layers in the model can then be used to represent the features of the picture input to the model.
Extracting astronomical image features with the convolutional self-coding network model comprises the following steps:
b1. Based on a convolutional neural network, build the encoder part of the convolutional autoencoder, input the preprocessed astronomical image $P_i$, and extract features to form a low-dimensional feature vector $\mathrm{feature}_p$;
b2. Based on a convolutional neural network, build the decoder part of the convolutional autoencoder, upsample the feature vector $\mathrm{feature}_p$, restore the low-dimensional feature vector $\mathrm{feature}_p$ to the dimension of $P_i$, and output $P'_i$;
b3. Build the convolutional self-coding network model:
Input: three-channel astronomical image input $N = \{P_1, P_2, P_3, \ldots, P_n\}$ assembled from the preprocessed astronomical images $P_i$
Output: astronomical image feature set
Specifically:
1.1) construct the encoder from an input layer, convolutional layers, downsampling layers, fully connected layers, a flatten layer, and so on;
1.2) use the encoder to extract the feature $\mathrm{feature}_p$ of each sample $P_i$ in the input $N$;
1.3) construct the decoder from a reshape layer, convolutional layers, upsampling layers, fully connected layers, and so on;
1.4) use the decoder to restore the low-dimensional feature vector $\mathrm{feature}_p$ extracted by the encoder to the dimension of $P_i$, and output $P'_i$.
As shown in fig. 3, the specific structure of the convolutional self-coding network model includes:
first part (input layer): the input data is a preprocessed astronomical image containing picture features in three channels (red, green, and blue); the input dimension is 12288, and data with dimension 64 × 64 × 3 is output after reshape;
second part (first convolutional layer): a convolutional layer containing 128 convolution kernels; after ReLU activation, data with dimension 64 × 64 × 128 is obtained;
third part (first attention module): the attention module comprises a channel attention mechanism that attends to channel features and a spatial attention mechanism that attends to spatial features; its input and output both have dimension 64 × 64 × 128;
fourth part (first downsampling layer): a downsampling layer of size 2 × 2, giving data with dimension 32 × 32 × 128;
fifth part (second convolutional layer): a convolutional layer containing 64 convolution kernels; after ReLU activation, data with dimension 32 × 32 × 64 is obtained;
sixth part (second attention module): the attention module comprises a channel attention mechanism that attends to channel features and a spatial attention mechanism that attends to spatial features; its input and output both have dimension 32 × 32 × 64;
seventh part (second downsampling layer): a downsampling layer of size 2 × 2, giving data with dimension 16 × 16 × 64;
eighth part (third convolutional layer): a convolutional layer containing 32 convolution kernels; after ReLU activation, data with dimension 16 × 16 × 32 is obtained;
ninth part (third attention module): the attention module comprises a channel attention mechanism that attends to channel features and a spatial attention mechanism that attends to spatial features; its input and output both have dimension 16 × 16 × 32;
tenth part (third downsampling layer): a downsampling layer of size 2 × 2, giving data with dimension 8 × 8 × 32;
eleventh part (fourth convolutional layer): a convolutional layer containing 16 convolution kernels; after ReLU activation, data with dimension 8 × 8 × 16 is obtained;
twelfth part (fourth attention module): the attention module comprises a channel attention mechanism that attends to channel features and a spatial attention mechanism that attends to spatial features; its input and output both have dimension 8 × 8 × 16;
thirteenth part (fourth downsampling layer): a downsampling layer of size 2 × 2, giving data with dimension 4 × 4 × 16;
fourteenth part (first hidden layer): a hidden layer containing 128 neuron nodes; after ReLU activation, data with dimension 128 is obtained;
fifteenth part (second hidden layer): a hidden layer containing 64 neuron nodes; after ReLU activation, data with dimension 64 is obtained;
sixteenth part (third hidden layer): a hidden layer containing 32 neuron nodes; after ReLU activation, data with dimension 32 is obtained;
seventeenth part (fourth hidden layer): a hidden layer containing 64 neuron nodes; after ReLU activation, data with dimension 64 is obtained;
eighteenth part (fifth hidden layer): a hidden layer containing 128 neuron nodes; after ReLU activation, data with dimension 128 is obtained;
nineteenth part: a reshape layer, giving data with dimension 4 × 4 × 8;
twentieth part (fifth convolutional layer): a convolutional layer containing 16 convolution kernels; after ReLU activation, data with dimension 4 × 4 × 16 is obtained;
twenty-first part (first upsampling layer): an upsampling layer of size 2 × 2, giving data with dimension 8 × 8 × 16;
twenty-second part (sixth convolutional layer): a convolutional layer containing 32 convolution kernels; after ReLU activation, data with dimension 16 × 16 × 32 is obtained;
twenty-third part (second upsampling layer): an upsampling layer of size 2 × 2, giving data with dimension 32 × 32 × 32;
twenty-fourth part (seventh convolutional layer): a convolutional layer containing 64 convolution kernels; after ReLU activation, data with dimension 32 × 32 × 64 is obtained;
twenty-fifth part (third upsampling layer): an upsampling layer of size 2 × 2, giving data with dimension 64 × 64 × 64;
twenty-sixth part (eighth convolutional layer): a convolutional layer containing 3 convolution kernels; after Sigmoid activation, data with dimension 64 × 64 × 3 is obtained;
twenty-seventh part: the output layer, which outputs data with dimension 64 × 64 × 3.
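As a quick consistency check on the encoder half of this structure, the following sketch traces tensor shapes through the four convolution, attention, and downsampling stages. It assumes that the listed dimensions denote height × width × channels on a 64 × 64 × 3 input (12288 values), that convolutions and attention modules preserve height and width, and that the feature map is flattened before the first hidden layer; the function names are hypothetical.

```python
def conv_same(shape, filters):
    """A convolution assumed to keep height and width and set the channel count."""
    height, width, _ = shape
    return (height, width, filters)

def downsample(shape, factor=2):
    """A 2 x 2 downsampling layer halves height and width."""
    height, width, channels = shape
    return (height // factor, width // factor, channels)

def trace_encoder(input_shape=(64, 64, 3), filters=(128, 64, 32, 16)):
    """Return the shape after the input layer and after every conv and downsampling layer."""
    shapes = [input_shape]
    shape = input_shape
    for f in filters:
        shape = conv_same(shape, f)   # convolution; the attention module keeps this shape
        shapes.append(shape)
        shape = downsample(shape)
        shapes.append(shape)
    return shapes

shapes = trace_encoder()
height, width, channels = shapes[-1]
flat_size = height * width * channels   # values fed to the first hidden layer
hidden_sizes = [128, 64, 32]            # first to third hidden layers (bottleneck: 32)
```

Under these assumptions the last encoder feature map is 4 × 4 × 16, so 256 values are flattened and passed to the 128-node first hidden layer, and the 32-node third hidden layer is the narrowest point of the network.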
1.5) optimization method and loss function
After the model is constructed, it is trained. The batch size of the training samples is set to 128, and a cross-entropy loss function is selected. A ReLU activation function is used in the convolutional layers and a Sigmoid activation function is used in the last layer of the encoder and of the decoder; the activation functions provide the nonlinear transformation. An attention mechanism is added after each convolutional layer of the encoder to improve the model's ability to extract features. Parameters are optimized with the Adam optimization algorithm, with a learning rate of 0.001, a decay term of 1e-08, and a momentum of 0.9, and the number of iterations is set to 300 to obtain the optimal model.
b4. Train the network iteratively by deep learning to obtain the convolutional self-coding network with trained weights.
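To make the training objective concrete, the sketch below pairs a per-pixel cross-entropy loss with a single-parameter Adam update. It is an illustration only: the mapping of "momentum 0.9" to Adam's first-moment coefficient, the treatment of 1e-08 as Adam's numerical-stability constant, the second-moment coefficient 0.999, and the one-weight toy model are all assumptions that the source does not confirm.

```python
import math

def binary_cross_entropy(target, pred, clip=1e-7):
    """Mean cross-entropy between target pixels in [0, 1] and Sigmoid outputs."""
    total = 0.0
    for t, p in zip(target, pred):
        p = min(max(p, clip), 1.0 - clip)           # avoid log(0)
        total -= t * math.log(p) + (1.0 - t) * math.log(1.0 - p)
    return total / len(target)

def adam_step(param, grad, state, lr=0.001, beta1=0.9, beta2=0.999, eps=1e-8):
    """One Adam update for a scalar parameter; state holds t, m and v."""
    state["t"] += 1
    state["m"] = beta1 * state["m"] + (1 - beta1) * grad
    state["v"] = beta2 * state["v"] + (1 - beta2) * grad * grad
    m_hat = state["m"] / (1 - beta1 ** state["t"])   # bias-corrected first moment
    v_hat = state["v"] / (1 - beta2 ** state["t"])   # bias-corrected second moment
    return param - lr * m_hat / (math.sqrt(v_hat) + eps)

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def train(target=0.8, steps=300):
    """Toy model: one weight w whose Sigmoid output should reproduce one pixel."""
    w = 0.0
    state = {"t": 0, "m": 0.0, "v": 0.0}
    losses = []
    for _ in range(steps):
        pred = sigmoid(w)
        losses.append(binary_cross_entropy([target], [pred]))
        grad = pred - target        # derivative of the loss with respect to w
        w = adam_step(w, grad, state)
    return w, losses

w, losses = train()
```

In the full model the same loss would be averaged over all pixels of a batch of 128 reconstructed images, and the update would be applied to every network weight rather than to a single scalar.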
Step 3, inputting the astronomical image feature set obtained in the step 2 into an image feature clustering model, and outputting the probability that each astronomical image belongs to each clustering cluster;
establishing an image feature clustering model:
inputting: image feature sample set { featurep1,featurep2,featurep3,...,featurepnWhere n is the total number of samples
And (3) outputting: probability of each astronomical image belonging to each cluster
1.1) building a clustering model based on a Gaussian mixture model, and determining the number of clustered components as K;
1.2) feature of samples in the input sample setpiEstimate featurepiProbability generated by each Component, for each featurepiThe generation probability of the kth Component is:
Figure BDA0003183353100000141
wherein γ (i, k) represents the ithProbability that the feature data is generated by kth Component; featurepiThe ith feature data in the feature set of the input astronomical image is represented; mu.skSum ΣkParameter of kth Component; pikRepresents the mixing coefficient of the kth Component; pijRepresents the mixing coefficient of the jth Component, wherein
Figure BDA0003183353100000142
N() represents a Gaussian distribution; μ_j and Σ_j are the parameters of the jth Component.
Because an iterative approach is used, μ_k and Σ_k are assumed to be known when calculating γ(i, k), taking either their initial values or the values obtained from the previous iteration;
1.3) estimating the parameters of each Component: assume that the γ(i, k) obtained in the previous step is the correct probability that feature_pi was generated by the kth Component. Considering all data samples, the kth Component can be regarded as having generated the points γ(1, k) feature_p1, γ(2, k) feature_p2, ..., γ(n, k) feature_pn. Since each Component is a Gaussian distribution, the parameter values corresponding to the maximum likelihood can be found:
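$$\mu_k = \frac{\sum_{i=1}^{n} \gamma(i,k)\,\mathrm{feature}_{pi}}{\sum_{i=1}^{n} \gamma(i,k)}$$
$$\Sigma_k = \frac{\sum_{i=1}^{n} \gamma(i,k)\,(\mathrm{feature}_{pi} - \mu_k)(\mathrm{feature}_{pi} - \mu_k)^{T}}{\sum_{i=1}^{n} \gamma(i,k)}$$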
1.4) iterating steps 1.2) and 1.3) repeatedly until the value of the likelihood function converges; the probability model then gives the probabilities {p_1, p_2, p_3, ..., p_n} that each input sample belongs to each cluster;
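Steps 1.1) to 1.4) can be sketched in NumPy as follows. This is a minimal illustration under stated assumptions, not the patented implementation: the function names `gaussian_pdf` and `fit_gmm`, the optional `mu_init` argument, the covariance regularisation `reg`, the convergence tolerance `tol`, and the update of the mixing coefficients π_k as the mean responsibility (the standard EM step, not spelled out in the text above) are all assumptions introduced for the example.

```python
import numpy as np

def gaussian_pdf(X, mu, cov):
    """Density of N(mu, cov) evaluated at every row of X (shape (n, d))."""
    d = X.shape[1]
    diff = X - mu
    inv = np.linalg.inv(cov)
    norm = np.sqrt(((2.0 * np.pi) ** d) * np.linalg.det(cov))
    maha = np.einsum("ij,jk,ik->i", diff, inv, diff)  # squared Mahalanobis distance
    return np.exp(-0.5 * maha) / norm

def fit_gmm(X, K, n_iter=100, tol=1e-6, reg=1e-6, seed=0, mu_init=None):
    """EM for a Gaussian mixture; returns (gamma, mu, cov, pi).

    gamma[i, k] is the probability that sample i was generated by Component k.
    """
    n, d = X.shape
    rng = np.random.default_rng(seed)
    if mu_init is None:
        mu = X[rng.choice(n, size=K, replace=False)]  # random samples as initial means
    else:
        mu = np.array(mu_init, dtype=float)
    base = np.atleast_2d(np.cov(X, rowvar=False)) + reg * np.eye(d)
    cov = np.stack([base.copy() for _ in range(K)])
    pi = np.full(K, 1.0 / K)
    prev_ll = -np.inf
    for _ in range(n_iter):
        # Step 1.2 (E-step): responsibilities gamma(i, k)
        weighted = np.stack(
            [pi[k] * gaussian_pdf(X, mu[k], cov[k]) for k in range(K)], axis=1)
        total = np.maximum(weighted.sum(axis=1, keepdims=True), 1e-300)
        gamma = weighted / total
        # Step 1.3 (M-step): maximum-likelihood parameters given gamma
        Nk = gamma.sum(axis=0)
        mu = (gamma.T @ X) / Nk[:, None]
        for k in range(K):
            diff = X - mu[k]
            cov[k] = (gamma[:, k, None] * diff).T @ diff / Nk[k] + reg * np.eye(d)
        pi = Nk / n  # assumption: standard update of the mixing coefficients
        # Step 1.4: stop once the log-likelihood no longer changes
        ll = float(np.log(total).sum())
        if abs(ll - prev_ll) < tol:
            break
        prev_ll = ll
    return gamma, mu, cov, pi
```

Each row of the returned `gamma` sums to 1 and plays the role of the per-image cluster probabilities used in step 5. For high-dimensional features a log-domain implementation would be numerically safer than this direct form.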
Step 4, manually scoring the clustered astronomical images to obtain the probability that each cluster belongs to each class;
establishing a manual scoring model for the image clusters:
for the clustered astronomical images, experts classify the different clusters based on their own experience, and the result gives the probability that each cluster belongs to each class, namely {c_1, c_2, c_3, ..., c_L}, where L represents the number of galaxy image classes;
Step 5, multiplying the probability that each astronomical image belongs to each cluster with the probability that each cluster belongs to each class to obtain the probability that each astronomical image belongs to each class, and screening the classes through a threshold value to finish the classification.
Establishing the image feature classification model:
the probabilities {p_1, p_2, p_3, ..., p_n} that each input sample belongs to each cluster, obtained in step 3, and the probabilities {c_1, c_2, c_3, ..., c_L} that each cluster belongs to each class, obtained in step 4, are multiplied as two matrices to obtain the probability that each sample belongs to each class; the classes are then screened through a threshold value to finish the classification work.
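The matrix multiplication and threshold screening of step 5 can be sketched as below. The text does not specify the screening rule, so the rule used here is an assumption: an image receives its most probable class only if that probability reaches `threshold`, and is otherwise marked `-1` (unclassified). The function name `classify` and the example numbers are hypothetical.

```python
import numpy as np

def classify(image_cluster_prob, cluster_class_prob, threshold=0.5):
    """Combine per-image cluster probabilities with per-cluster class probabilities.

    image_cluster_prob: shape (n, K), row i = probabilities of image i per cluster.
    cluster_class_prob: shape (K, L), row k = expert probabilities of cluster k per class.
    Returns the (n, L) image-class probability matrix and one label per image.
    """
    P = np.asarray(image_cluster_prob, dtype=float)
    C = np.asarray(cluster_class_prob, dtype=float)
    image_class_prob = P @ C  # (n, K) x (K, L) -> (n, L)
    best = image_class_prob.argmax(axis=1)
    confident = image_class_prob.max(axis=1) >= threshold
    labels = np.where(confident, best, -1)  # -1 marks "below threshold" (assumed rule)
    return image_class_prob, labels

# Two images, two clusters, two classes (illustrative numbers only).
P = [[0.9, 0.1],
     [0.5, 0.5]]
C = [[1.0, 0.0],
     [0.2, 0.8]]
probs, labels = classify(P, C, threshold=0.7)
# probs is [[0.92, 0.08], [0.60, 0.40]]; labels is [0, -1]
```

If every row of both input matrices sums to 1, every row of the product also sums to 1, so the result can be read directly as a class probability distribution for each image.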
In another embodiment, the invention provides an unsupervised machine learning-based astronomical image automatic classification system, comprising:
the preprocessing module is configured to preprocess astronomical image data to be classified;
the convolution self-coding network model is configured to take preprocessed astronomical image data as input, extract astronomical image features and obtain an astronomical image feature set;
the image feature clustering model is configured to take the extracted astronomical image feature set as input and output the probability that each astronomical image belongs to each clustering cluster;
the manual scoring module is configured to manually score the clustered astronomical images to obtain the probability that each cluster belongs to each class;
and the image feature classification model is configured to multiply the probability that each astronomical image belongs to each cluster with the probability that each cluster belongs to each class to obtain the probability that each astronomical image belongs to each class, and then to finish the classification through threshold value screening.
Compared with the prior art, the unsupervised astronomical image classification method disclosed by the invention combines a neural network, a convolutional self-encoder, and a clustering model, so that the model learns to extract the features of astronomical images without any manually labeled data. This reduces labor cost, avoids the influence of human bias on the model's classification, and allows the model to obtain higher classification accuracy at lower computational cost.
The present invention has been disclosed in terms of the preferred embodiment, but is not intended to be limited to the embodiment, and all technical solutions obtained by substituting or converting equivalents thereof fall within the scope of the present invention.

Claims (10)

1. An unsupervised machine learning-based astronomical image automatic classification method is characterized by comprising the following steps:
preprocessing astronomical image data to be classified;
inputting preprocessed astronomical image data into a trained convolution self-coding network model, and extracting astronomical image features to obtain an astronomical image feature set;
inputting the acquired astronomical image feature set into an image feature clustering model, and outputting the probability that each astronomical image belongs to each clustering cluster;
manually scoring the clustered astronomical images to obtain the probability that each cluster belongs to each class;
and multiplying the probability that each astronomical image belongs to each cluster with the probability that each cluster belongs to each class to obtain the probability that each astronomical image belongs to each class, and screening the classes through a threshold value to finish the classification.
2. The unsupervised machine learning-based astronomical image automatic classification method is characterized in that the convolutional self-coding network model comprises an encoder and a decoder, wherein the encoder comprises an input layer, a first convolutional layer, a first attention module, a first downsampling layer, a second convolutional layer, a second attention module, a second downsampling layer, a third convolutional layer, a third attention module, a third downsampling layer, a fourth convolutional layer, a fourth attention module and a fourth downsampling layer which are connected in sequence; the decoder comprises a first hidden layer, a second hidden layer, a third hidden layer, a fourth hidden layer, a fifth hidden layer, a reshape layer, a fifth convolutional layer, a first upsampling layer, a sixth convolutional layer, a second upsampling layer, a seventh convolutional layer, a third upsampling layer, an eighth convolutional layer and an output layer which are sequentially connected, wherein each convolutional layer uses a ReLU activation function, and the fourth convolutional layer and the eighth convolutional layer use a Sigmoid activation function.
3. The unsupervised machine learning-based astronomical image automatic classification method according to claim 2, wherein the first convolutional layer comprises 128 4 x 4 convolution kernels, the second convolutional layer comprises 64 4 x 4 convolution kernels, the third convolutional layer comprises 32 3 x 3 convolution kernels, the fourth convolutional layer comprises 16 3 x 3 convolution kernels, the fifth convolutional layer comprises 32 3 x 3 convolution kernels, the sixth convolutional layer comprises 32 3 x 3 convolution kernels, the seventh convolutional layer comprises 64 4 x 4 convolution kernels, and the eighth convolutional layer comprises 3 4 x 4 convolution kernels; the first downsampling layer, the second downsampling layer, the third downsampling layer, the fourth downsampling layer, the first upsampling layer, the second upsampling layer, and the third upsampling layer each have a size of 2; the first hidden layer contains 128 neuron nodes, the second hidden layer contains 64 neuron nodes, the third hidden layer contains 32 neuron nodes, the fourth hidden layer contains 64 neuron nodes, and the fifth hidden layer contains 128 neuron nodes.
4. The unsupervised machine learning-based astronomical image automatic classification method according to claim 1, wherein the training method of the convolutional self-coding network model comprises the following steps:
acquiring a set of astronomical image data;
preprocessing a group of acquired astronomical image data;
and performing parameter optimization through the Adam optimization algorithm, taking the preprocessed astronomical image data as the training data set, taking a cross-entropy loss function as the objective function, and aiming to minimize the objective function value, so as to train and obtain the convolutional self-coding network model.
5. The unsupervised machine learning-based astronomical image automatic classification method according to claim 4, wherein the preprocessing of the acquired set of astronomical image data comprises: performing center cropping, random cropping, and dimension reduction on each astronomical image in the set of astronomical image data.
6. The unsupervised machine learning-based astronomical image automatic classification method according to claim 5, wherein said preprocessing of the acquired set of astronomical image data further comprises: when the number of astronomical image samples of a given category is insufficient after center cropping and dimension reduction, randomly flipping the astronomical images obtained after center cropping and dimension reduction to obtain new astronomical images, and adding the new astronomical images to the training data set.
7. The unsupervised machine learning-based astronomical image automatic classification method according to claim 1, wherein the image feature clustering model is built by adopting a Gaussian mixture model, the number of clustered components is determined, the obtained astronomical image feature set is input into the image feature clustering model, and the probability that each astronomical image belongs to each clustering cluster is output, and the method comprises the following steps:
S1, for each feature data in the feature set of the input astronomical image, estimating the probability that each feature data is generated by each Component according to the following formula:
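$$\gamma(i,k) = \frac{\pi_k \, N(\mathrm{feature}_{pi} \mid \mu_k, \Sigma_k)}{\sum_{j=1}^{K} \pi_j \, N(\mathrm{feature}_{pi} \mid \mu_j, \Sigma_j)}$$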
where γ(i, k) represents the probability that the ith feature data is generated by the kth Component; feature_pi represents the ith feature data in the feature set of the input astronomical image; μ_k and Σ_k are the parameters of the kth Component; π_k represents the mixing coefficient of the kth Component; K represents the number of Components; π_j represents the mixing coefficient of the jth Component, wherein
Figure FDA0003183353090000032
N() represents a Gaussian distribution; μ_j and Σ_j are the parameters of the jth Component;
S2, based on the estimated γ(i, k), obtaining the parameter values of the kth Component corresponding to the maximum likelihood according to the following equations:
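$$\mu_k = \frac{\sum_{i=1}^{n} \gamma(i,k)\,\mathrm{feature}_{pi}}{\sum_{i=1}^{n} \gamma(i,k)}$$
$$\Sigma_k = \frac{\sum_{i=1}^{n} \gamma(i,k)\,(\mathrm{feature}_{pi} - \mu_k)(\mathrm{feature}_{pi} - \mu_k)^{T}}{\sum_{i=1}^{n} \gamma(i,k)}$$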
steps S1 and S2 are iterated repeatedly until the value of the likelihood function converges, and the probability model is used to derive the probability that each feature data belongs to each cluster.
8. The unsupervised machine learning-based astronomical image automatic classification method according to claim 1, wherein the manual scoring of the astronomical images which have been clustered into clusters to obtain the probability that each cluster belongs to each class comprises:
classifying, by experts according to experience, the different clusters of the clustered astronomical images to obtain the probability that each cluster belongs to each class.
9. The unsupervised machine learning-based astronomical image automatic classification method according to claim 1, wherein the multiplying the probability that each astronomical image belongs to each cluster with the probability that each cluster belongs to each class to obtain the probability that each astronomical image belongs to each class comprises:
multiplying the probability matrix of each astronomical image belonging to each cluster with the probability matrix of each cluster belonging to each class to obtain the probability of each astronomical image belonging to each class.
10. An unsupervised machine learning-based astronomical image automated classification system, comprising:
the preprocessing module is configured to preprocess astronomical image data to be classified;
the convolution self-coding network model is configured to take preprocessed astronomical image data as input, extract astronomical image features and obtain an astronomical image feature set;
the image feature clustering model is configured to take the extracted astronomical image feature set as input and output the probability that each astronomical image belongs to each clustering cluster;
the manual scoring module is configured to manually score the clustered astronomical images to obtain the probability that each cluster belongs to each class;
and the image feature classification model is configured to multiply the probability that each astronomical image belongs to each cluster with the probability that each cluster belongs to each class to obtain the probability that each astronomical image belongs to each class, and then to finish the classification through threshold value screening.
CN202110853849.3A 2021-07-28 2021-07-28 Automatic astronomical image classification method and system based on unsupervised machine learning Active CN113723456B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110853849.3A CN113723456B (en) 2021-07-28 2021-07-28 Automatic astronomical image classification method and system based on unsupervised machine learning

Publications (2)

Publication Number Publication Date
CN113723456A (en) 2021-11-30
CN113723456B (en) 2023-10-17

Family

ID=78674118

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110853849.3A Active CN113723456B (en) 2021-07-28 2021-07-28 Automatic astronomical image classification method and system based on unsupervised machine learning

Country Status (1)

Country Link
CN (1) CN113723456B (en)

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109919204A (en) * 2019-02-23 2019-06-21 华南理工大学 A kind of deep learning clustering method towards noise image
WO2020041503A1 (en) * 2018-08-24 2020-02-27 Arterys Inc. Deep learning-based coregistration
CN111582389A (en) * 2020-05-11 2020-08-25 昆明能讯科技有限责任公司 Automatic tower point cloud data classification method based on convolution self-coding network
US20200272857A1 (en) * 2019-02-22 2020-08-27 Neuropace, Inc. Systems and methods for labeling large datasets of physiologial records based on unsupervised machine learning
CN111859978A (en) * 2020-06-11 2020-10-30 南京邮电大学 Emotion text generation method based on deep learning

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant