CN113723456B - Automatic astronomical image classification method and system based on unsupervised machine learning - Google Patents


Info

Publication number
CN113723456B
Authority
CN
China
Prior art keywords
astronomical
astronomical image
layer
probability
cluster
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202110853849.3A
Other languages
Chinese (zh)
Other versions
CN113723456A (en
Inventor
邹志强
韩杨
吴家皋
张芷瑞
洪舒欣
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Nanjing University of Posts and Telecommunications
Original Assignee
Nanjing University of Posts and Telecommunications
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Nanjing University of Posts and Telecommunications filed Critical Nanjing University of Posts and Telecommunications
Priority to CN202110853849.3A priority Critical patent/CN113723456B/en
Publication of CN113723456A publication Critical patent/CN113723456A/en
Application granted granted Critical
Publication of CN113723456B publication Critical patent/CN113723456B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/24Classification techniques
    • G06F18/241Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/23Clustering techniques
    • G06F18/232Non-hierarchical techniques
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • G06N3/045Combinations of networks
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/08Learning methods
    • YGENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02TCLIMATE CHANGE MITIGATION TECHNOLOGIES RELATED TO TRANSPORTATION
    • Y02T10/00Road transport of goods or passengers
    • Y02T10/10Internal combustion engine [ICE] based vehicles
    • Y02T10/40Engine management systems

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Physics & Mathematics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Artificial Intelligence (AREA)
  • General Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • Evolutionary Computation (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Computational Linguistics (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Health & Medical Sciences (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • Evolutionary Biology (AREA)
  • General Health & Medical Sciences (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Image Analysis (AREA)
  • Image Processing (AREA)

Abstract

The invention discloses a method and system for automatically classifying astronomical images based on unsupervised machine learning. The method comprises the following steps: preprocessing the astronomical image data to be classified; inputting the preprocessed astronomical image data into a trained convolutional self-coding network model (a convolutional autoencoder) and extracting astronomical image features to obtain an astronomical image feature set; inputting the feature set into an image feature clustering model, which outputs the probability that each astronomical image belongs to each cluster; manually scoring the clustered astronomical images to obtain the probability that each cluster belongs to each class; and multiplying the probability that each astronomical image belongs to each cluster by the probability that each cluster belongs to each class to obtain the probability that each astronomical image belongs to each class, with classification completed by threshold screening. The invention can achieve high classification accuracy on astronomical image data at low cost and without data labels.

Description

Automatic astronomical image classification method and system based on unsupervised machine learning
Technical Field
The invention relates to an automatic astronomical image classification method and system based on unsupervised machine learning, and belongs to the technical field of astronomical image intelligent processing.
Background
The formation and evolution of galaxies is an important scientific problem in astrophysics. Galaxy morphology is an important reference indicator of galaxy formation and evolution and correlates strongly with many physical parameters, including mass, star formation history, and mass distribution. The Galaxy Zoo 1.0 project collected simple morphological classifications for nearly 900,000 galaxies from the Sloan Digital Sky Survey (SDSS); the classification was completed by thousands of volunteers over several months. As science and technology advance, observation equipment is continuously upgraded, and projects such as LSST (Large Synoptic Survey Telescope, a U.S. wide-field sky survey project), Euclid (a European space-based sky survey project), and CSST (China Space Station Telescope, a Chinese space-telescope sky survey project) have been established. Astronomy is therefore entering an era of large-scale sky surveys, and data sets in the field will grow exponentially. For example, the Galaxy Zoo 2.0 project has collected 1.6 billion galaxies from SDSS in order to determine galaxy morphology and thereby study galaxy formation and evolution. Visual inspection cannot cope effectively with galaxy image data sets of this size, so astronomers have turned to automatic classification methods.
In recent years, machine learning and deep learning methods have been applied to galaxy morphology classification. In 2010, Gauci et al. proposed a galaxy morphology classification model combining a decision tree learning algorithm with a random forest algorithm. In 2015, Ferrari et al. used linear discriminant analysis (LDA) to classify galaxy morphology. Machine learning algorithms typically require complex feature engineering: exploratory data analysis is performed on the data set, the data are reduced in dimension before being passed to the learning algorithm, and the best features must be selected to obtain the best experimental results. To avoid this complex feature engineering, astronomers have begun to use deep learning for galaxy morphology classification.
In 2006, Hinton proposed the concept of deep learning, which builds multi-layer artificial neural networks and has contributed to many fields. Deep learning extracts and abstracts features from the input data through multiple nonlinear layers and then classifies the images. Although deep learning has achieved considerable success in astronomical classification, training such models depends heavily on the data labels of the training set. In practice, labeling astronomical images requires strong expertise and a great deal of expert time and cost. In addition, manual labeling introduces a degree of human bias into the astronomical image labels, and this bias is difficult to detect.
Disclosure of Invention
Object of the invention: with the advent of the astronomical big-data era, the number of astronomical images will grow exponentially and it will become impractical to obtain large amounts of labeled astronomical data in a short time, so supervised machine learning techniques will show many shortcomings.
To achieve the above object, several core problems must be solved: (1) most of the data used by existing astronomical image classification methods must be labeled manually. For example, the Galaxy Zoo data sets are published on a website and astronomy enthusiasts complete the classification work together by crowdsourcing, which is time-consuming and introduces human bias; (2) unsupervised classification methods are mostly limited to clustering models. Astronomical images are high-dimensional image data, often presented as three-dimensional images, and when faced with a large amount of high-dimensional data a general clustering model suffers from the curse of dimensionality or has difficulty processing large-scale, high-dimensional astronomical image data.
The technical scheme is as follows: in order to achieve the above purpose, the invention adopts the following technical scheme:
in one aspect, the invention provides an astronomical image automatic classification method based on unsupervised machine learning, comprising the following steps:
preprocessing astronomical image data to be classified;
inputting the preprocessed astronomical image data into a trained convolution self-coding network model, and extracting astronomical image characteristics to obtain an astronomical image characteristic set;
inputting the obtained astronomical image feature set into an image feature cluster model, and outputting the probability that each astronomical image belongs to each cluster;
manually scoring the astronomical images clustered to obtain the probability that each cluster belongs to each class;
multiplying the probability that each astronomical image belongs to each cluster by the probability that each cluster belongs to each class to obtain the probability that each astronomical image belongs to each class, and finishing classification through threshold screening.
Further, the convolutional self-coding network model comprises an encoder and a decoder, wherein the encoder comprises an input layer, a first convolution layer, a first attention module, a first downsampling layer, a second convolution layer, a second attention module, a second downsampling layer, a third convolution layer, a third attention module, a third downsampling layer, a fourth convolution layer, a fourth attention module and a fourth downsampling layer which are connected in sequence; the decoder comprises a first hidden layer, a second hidden layer, a third hidden layer, a fourth hidden layer, a fifth hidden layer, a reshape layer, a fifth convolution layer, a first upsampling layer, a sixth convolution layer, a second upsampling layer, a seventh convolution layer, a third upsampling layer, an eighth convolution layer and an output layer which are connected in sequence, wherein each convolution layer uses a ReLU activation function, and the fourth convolution layer and the eighth convolution layer use a Sigmoid activation function.
Further, the first convolution layer contains 128 convolution kernels of 4*4, the second convolution layer contains 64 convolution kernels of 4*4, the third convolution layer contains 32 convolution kernels of 3*3, the fourth convolution layer contains 16 convolution kernels of 3*3, the fifth convolution layer contains 32 convolution kernels of 3*3, the sixth convolution layer contains 32 convolution kernels of 3*3, the seventh convolution layer contains 64 convolution kernels of 4*4, the eighth convolution layer contains 3 convolution kernels of 4*4, the first downsampling layer, the second downsampling layer, the third downsampling layer, the fourth downsampling layer, the first upsampling layer, the second upsampling layer, and the third upsampling layer are all 2 x 2 in size, the first hidden layer contains 128 neuron nodes, the second hidden layer contains 64 neuron nodes, the third hidden layer contains 32 neuron nodes, the fourth hidden layer contains 64 neuron nodes, and the fifth hidden layer contains 128 neuron nodes.
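The effect of the kernel counts and the 2 x 2 downsampling on the data dimensions can be traced with a short sketch. It assumes a 64 x 64 x 3 input (the size used in the embodiment), stride-1 convolutions with "same" padding, and attention modules that leave the shape unchanged; the padding and stride are assumptions, since the text does not state them, and the function names are illustrative only.

```python
def conv_same(shape, n_kernels):
    """Stride-1 convolution with 'same' padding (assumed): only the channel count changes."""
    height, width, _ = shape
    return (height, width, n_kernels)


def downsample_2x2(shape):
    """A 2 x 2 downsampling layer halves the height and the width."""
    height, width, channels = shape
    return (height // 2, width // 2, channels)


def trace_encoder(input_shape=(64, 64, 3), kernels=(128, 64, 32, 16)):
    """Return the shape after each convolution + attention + downsampling stage."""
    shapes = [input_shape]
    shape = input_shape
    for n_kernels in kernels:
        shape = conv_same(shape, n_kernels)   # the attention module keeps this shape
        shape = downsample_2x2(shape)
        shapes.append(shape)
    return shapes


shapes = trace_encoder()
flattened = shapes[-1][0] * shapes[-1][1] * shapes[-1][2]
for stage, shape in enumerate(shapes):
    print(stage, shape)
print("flattened size:", flattened)
```

Under these assumptions the four encoder stages shrink the spatial size from 64 to 4 while the channel count follows the kernel counts listed above.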
Further, the training method of the convolution self-coding network model comprises the following steps:
acquiring a set of astronomical image data;
preprocessing the acquired set of astronomical image data;
taking the preprocessed astronomical image data as the training data set and a cross-entropy loss function as the objective function, optimizing the parameters with the Adam optimization algorithm so as to minimize the objective function value, and training to obtain the convolutional self-coding network model.
Further, the preprocessing of the acquired set of astronomical image data includes: performing center cropping, random cropping and dimension reduction on each astronomical image in the set of astronomical image data.
Further, the preprocessing of the acquired set of astronomical image data further includes: when, after center cropping and dimension reduction, the number of astronomical image samples in the category corresponding to an astronomical image is insufficient, randomly flipping the astronomical image obtained after center cropping and dimension reduction to obtain a new astronomical image, and adding the new astronomical image to the training data set.
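As an illustration of the objective, the sketch below computes a per-pixel binary cross-entropy between an input image and its reconstruction. It assumes pixel values scaled to [0, 1] and a Sigmoid output; the text says only "cross entropy loss function", so the binary form and the function name are assumptions.

```python
import numpy as np


def reconstruction_cross_entropy(x, x_hat, eps=1e-7):
    """Mean binary cross-entropy between image x and reconstruction x_hat, both in [0, 1]."""
    x_hat = np.clip(x_hat, eps, 1.0 - eps)   # avoid log(0)
    per_pixel = -(x * np.log(x_hat) + (1.0 - x) * np.log(1.0 - x_hat))
    return float(np.mean(per_pixel))


image = np.full((64, 64, 3), 0.5)
loss_same = reconstruction_cross_entropy(image, image)
loss_worse = reconstruction_cross_entropy(image, np.full((64, 64, 3), 0.9))
print(loss_same, loss_worse)
```

A reconstruction that matches the input gives the lowest value this loss can take for that input, which is what training drives toward.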
Further, the image feature clustering model is built with a Gaussian mixture model and the number of cluster Components is determined; inputting the obtained astronomical image feature set into the image feature clustering model and outputting the probability that each astronomical image belongs to each cluster comprises the following steps:
S1, for each feature data item in the input astronomical image feature set, estimating the probability that it is generated by each Component according to the following formula:
$$\gamma(i,k) = \frac{\pi_k \, N(\mathrm{feature}_{pi} \mid \mu_k, \Sigma_k)}{\sum_{j=1}^{K} \pi_j \, N(\mathrm{feature}_{pi} \mid \mu_j, \Sigma_j)}$$
where γ(i, k) represents the probability that the i-th feature data item is generated by the k-th Component; feature_pi represents the i-th feature data item in the input astronomical image feature set; μ_k and Σ_k are the parameters of the k-th Component; π_k represents the mixing coefficient of the k-th Component; K represents the number of Components; π_j represents the mixing coefficient of the j-th Component; N(·) represents a Gaussian distribution; μ_j and Σ_j are the parameters of the j-th Component;
S2, based on the estimated γ(i, k), calculating the parameter values of the k-th Component corresponding to the maximum likelihood according to the following formulas, where n is the number of feature data items:
$$N_k = \sum_{i=1}^{n} \gamma(i,k), \qquad \mu_k = \frac{1}{N_k} \sum_{i=1}^{n} \gamma(i,k) \, \mathrm{feature}_{pi}$$
$$\Sigma_k = \frac{1}{N_k} \sum_{i=1}^{n} \gamma(i,k) \, (\mathrm{feature}_{pi} - \mu_k)(\mathrm{feature}_{pi} - \mu_k)^{T}, \qquad \pi_k = \frac{N_k}{n}$$
S1 and S2 are iterated until the value of the likelihood function converges, and the probability that each feature data item belongs to each cluster is then obtained from the probability model.
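The convergence test in the last step is usually applied to the logarithm of the likelihood. Assuming the standard Gaussian mixture form, which the text implies but does not write out, the monitored quantity would be the following, with n the number of feature data items and the other symbols as defined in S1:

```latex
\ln L = \sum_{i=1}^{n} \ln \left( \sum_{k=1}^{K} \pi_k \, N(\mathrm{feature}_{pi} \mid \mu_k, \Sigma_k) \right)
```

Iteration would stop once the change in this value between two successive rounds falls below a chosen tolerance; the tolerance itself is not given in the text.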
Further, the manually scoring the astronomical images clustered to obtain the probability that each cluster belongs to each class includes:
an expert classifies the clustered astronomical images according to experience to obtain the probability that each cluster belongs to each class.
Further, the multiplying the probability that each astronomical image belongs to each cluster with the probability that each cluster belongs to each class to obtain the probability that each astronomical image belongs to each class includes:
multiplying the probability matrix of each astronomical image belonging to each cluster by the probability matrix of each cluster belonging to each class to obtain the probability of each astronomical image belonging to each class.
In another aspect, the present invention provides an astronomical image automatic classification system based on unsupervised machine learning, comprising:
the preprocessing module is configured to preprocess astronomical image data to be classified;
the convolution self-coding network model is configured to take the preprocessed astronomical image data as input, extract astronomical image characteristics and obtain an astronomical image characteristic set;
the image feature clustering model is configured to take the extracted astronomical image feature set as input and output the probability that each astronomical image belongs to each cluster;
the manual scoring module is configured to manually score the clustered astronomical images to obtain the probability that each cluster belongs to each class;
the image feature classification model is configured to multiply the probability that each astronomical image belongs to each cluster with the probability that each cluster belongs to each class to obtain the probability that each astronomical image belongs to each class, and then classification is completed through threshold screening.
The beneficial effects are that: compared with the prior art, the astronomical image automatic classification method and system based on the unsupervised machine learning provided by the invention have the following advantages:
(1) To solve the astronomical image classification problem, current deep learning methods usually need large-scale labeled astronomical image data as a training set, and manually labeling a large number of astronomical images costs astronomers a great deal of time. The present method learns and extracts features directly from unlabeled astronomical images and then uses these features for further classification, which greatly reduces the manual labeling cost and improves efficiency;
(2) Convolutional neural networks are well suited to astronomical image data, which are high-dimensional and feature-rich: the pooling layers reduce the dimensionality of the output layer by layer, and the up-sampling layers enlarge it layer by layer.
Drawings
FIG. 1 is a flow chart of an automatic astronomical image classification method based on unsupervised machine learning according to an embodiment of the present invention;
FIG. 2 is a flow chart of astronomical image preprocessing according to an embodiment of the present invention;
FIG. 3 is a block diagram of a convolutional self-encoding network model in an embodiment of the invention.
Detailed Description
The invention is further described below in connection with specific embodiments. The following examples are only for more clearly illustrating the technical aspects of the present invention, and are not intended to limit the scope of the present invention.
As previously mentioned, with the advent of the astronomical big data age, astronomical pictures will grow exponentially, it will become less practical to obtain large amounts of tagged astronomical data in a short time, and it is difficult to accurately classify astronomical images directly without tags.
To this end, in one embodiment, the present invention provides an astronomical image automatic classification method based on unsupervised machine learning. As shown in fig. 1, the method includes:
Step 1, preprocessing astronomical image data;
Because Galaxy Zoo data are used in this embodiment and the images are all high-dimensional color pictures, some preprocessing is needed.
Image processing algorithms are used to preprocess the astronomical image data, for example by center cropping and flipping, so that the dimension of the astronomical images is initially reduced and some data augmentation is performed.
As shown in fig. 2, the astronomical image preprocessing algorithm specifically includes:
Input: astronomical image data T = {P_1, P_2, P_3, …, P_n}, where P_i represents the i-th astronomical image sample
Output: the preprocessed astronomical image data set
a1. Traverse the astronomical images with a loop variable i running from 1 to n, where n is the total number of astronomical images; initially i = 1;
a2. Center-crop the current image sample P_i, replace P_i with the result, and go to a3;
a3. Randomly crop P_i and replace P_i with the result, then resize P_i to a uniform size, add the resized P_i to the preprocessed data set, and go to a4;
a4. If the number of astronomical image samples in the category corresponding to P_i is insufficient, go to a5; otherwise go to a6;
a5. Randomly rotate the resized P_i from a3 to obtain a new P_i, add the new P_i to the preprocessed data set, and go to a4;
a6. Set i = i + 1;
a7. If i ≤ n, go to a2; otherwise the preprocessing of the astronomical images is complete.
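The loop a1 to a7 can be sketched in NumPy as follows. The crop sizes, the 64 x 64 output size, nearest-neighbour resizing and rotation by multiples of 90 degrees are all assumptions made for illustration, and `extra_copies` is a hypothetical stand-in for the class-size check in a4, which the text does not specify.

```python
import numpy as np


def center_crop(img, size):
    """Keep the central size x size window (step a2)."""
    h, w = img.shape[:2]
    top, left = (h - size) // 2, (w - size) // 2
    return img[top:top + size, left:left + size]


def random_crop(img, size, rng):
    """Keep a randomly placed size x size window (step a3)."""
    h, w = img.shape[:2]
    top = int(rng.integers(0, h - size + 1))
    left = int(rng.integers(0, w - size + 1))
    return img[top:top + size, left:left + size]


def resize_nearest(img, size):
    """Nearest-neighbour resize to size x size (assumed resizing method)."""
    h, w = img.shape[:2]
    rows = np.arange(size) * h // size
    cols = np.arange(size) * w // size
    return img[rows][:, cols]


def preprocess(images, center=200, crop=160, out=64, extra_copies=0, seed=0):
    rng = np.random.default_rng(seed)
    dataset = []
    for img in images:                        # a1, a6, a7: visit every image once
        p = center_crop(img, center)          # a2
        p = random_crop(p, crop, rng)         # a3
        p = resize_nearest(p, out)
        dataset.append(p)
        for _ in range(extra_copies):         # a4, a5: augment under-represented classes
            dataset.append(np.rot90(p, k=int(rng.integers(1, 4))))
    return dataset


raw = [np.zeros((256, 256, 3)), np.ones((256, 256, 3))]
processed = preprocess(raw, extra_copies=2)
print(len(processed), processed[0].shape)
```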
Step 2, inputting the preprocessed astronomical image data into a trained convolution self-coding network model, extracting astronomical image characteristics, and obtaining an astronomical image characteristic set;
To further reduce the dimensionality of the astronomical image data and obtain effective astronomical image features, this embodiment builds a convolutional self-coding network model and trains it on the astronomical image data; the output of one of the hidden layers in the middle of the model can then be used to represent the features of the picture input into the model.
Extracting astronomical image features using a convolutional self-encoding network model, comprising:
b1. Based on a convolutional neural network, build the encoder part of the convolutional self-encoder; the preprocessed astronomical image P_i is input, and the extracted features form a low-dimensional feature vector feature_p;
b2. Based on a convolutional neural network, build the decoder part of the convolutional self-encoder; the feature vector feature_p is up-sampled so that the low-dimensional feature_p is restored to the dimension of P_i, and P'_i is output;
b3. Build the convolutional self-coding network model:
Input: the three-channel astronomical image input N = {P_1, P_2, P_3, ..., P_n} formed by concatenating the preprocessed astronomical images P_i
Output: the astronomical image feature set
The steps are as follows:
1.1) Build the encoder from an input layer, convolution layers, downsampling layers, fully connected layers, a flatten layer, and so on;
1.2) Use the encoder to extract the feature feature_p of each sample P_i in the input N;
1.3) Build the decoder from a reshape layer, convolution layers, up-sampling layers, fully connected layers, and so on;
1.4) Using the feature_p extracted by the encoder, restore the low-dimensional feature vector feature_p to the dimension of P_i and output P'_i.
As shown in fig. 3, the specific structure of the convolutional self-coding network model includes:
First part (input layer): the input data is a preprocessed astronomical image containing the picture features of the red, green and blue channels; the input dimension is 12288, and reshape gives output data of dimension 64 × 64 × 3;
Second part (first convolution layer): a convolution layer containing 128 convolution kernels; ReLU activation gives data of dimension 64 × 64 × 128;
Third part (first attention module): an attention module comprising a channel attention mechanism that focuses on channel features and a spatial attention mechanism that focuses on spatial features; the input and the output are both data of dimension 64 × 64 × 128;
Fourth part (first downsampling layer): a downsampling layer of size 2 × 2, giving data of dimension 32 × 32 × 128;
Fifth part (second convolution layer): a convolution layer containing 64 convolution kernels; ReLU activation gives data of dimension 32 × 32 × 64;
Sixth part (second attention module): an attention module comprising a channel attention mechanism that focuses on channel features and a spatial attention mechanism that focuses on spatial features; the input and the output are both data of dimension 32 × 32 × 64;
Seventh part (second downsampling layer): a downsampling layer of size 2 × 2, giving data of dimension 16 × 16 × 64;
Eighth part (third convolution layer): a convolution layer containing 32 convolution kernels; ReLU activation gives data of dimension 16 × 16 × 32;
Ninth part (third attention module): an attention module comprising a channel attention mechanism that focuses on channel features and a spatial attention mechanism that focuses on spatial features; the input and the output are both data of dimension 16 × 16 × 32;
Tenth part (third downsampling layer): a downsampling layer of size 2 × 2, giving data of dimension 8 × 8 × 32;
Eleventh part (fourth convolution layer): a convolution layer containing 16 convolution kernels; ReLU activation gives data of dimension 8 × 8 × 16;
Twelfth part (fourth attention module): an attention module comprising a channel attention mechanism that focuses on channel features and a spatial attention mechanism that focuses on spatial features; the input and the output are both data of dimension 8 × 8 × 16;
Thirteenth part (fourth downsampling layer): a downsampling layer of size 2 × 2, giving data of dimension 4 × 4 × 16;
Fourteenth part (first hidden layer): a hidden layer containing 128 neuron nodes; ReLU activation gives data of dimension 128;
Fifteenth part (second hidden layer): a hidden layer containing 64 neuron nodes; ReLU activation gives data of dimension 64;
Sixteenth part (third hidden layer): a hidden layer containing 32 neuron nodes; ReLU activation gives data of dimension 32;
Seventeenth part (fourth hidden layer): a hidden layer containing 64 neuron nodes; ReLU activation gives data of dimension 64;
Eighteenth part (fifth hidden layer): a hidden layer containing 128 neuron nodes; ReLU activation gives data of dimension 128;
Nineteenth part: a reshape layer, giving data of dimension 4 × 4 × 8;
Twentieth part (fifth convolution layer): a convolution layer containing 16 convolution kernels; ReLU activation gives data of dimension 4 × 4 × 16;
Twenty-first part (first upsampling layer): an up-sampling layer of size 2 × 2, giving data of dimension 8 × 8 × 16;
Twenty-second part (sixth convolution layer): a convolution layer containing 32 convolution kernels; ReLU activation gives data of dimension 16 × 16 × 32;
Twenty-third part (second upsampling layer): an up-sampling layer of size 2 × 2, giving data of dimension 32 × 32 × 32;
Twenty-fourth part (seventh convolution layer): a convolution layer containing 64 convolution kernels; ReLU activation gives data of dimension 32 × 32 × 64;
Twenty-fifth part (third upsampling layer): an up-sampling layer of size 2 × 2, giving data of dimension 64 × 64 × 64;
Twenty-sixth part (eighth convolution layer): a convolution layer containing 3 convolution kernels; Sigmoid activation gives data of dimension 64 × 64 × 3;
Twenty-seventh part: the output layer, which outputs data of dimension 64 × 64 × 3.
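The attention modules are described only as combining a channel attention mechanism with a spatial attention mechanism that leave the data dimension unchanged. A minimal NumPy sketch of one common way to build such a module is given below; the shared two-layer MLP, the average and max pooling, the scalar mixing weights `a` and `b`, and the random weights `w1` and `w2` are all assumptions for illustration, not details taken from the text.

```python
import numpy as np


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


def channel_attention(x, w1, w2):
    """Re-weight channels: pool over space, then pass through a small shared MLP."""
    avg = x.mean(axis=(0, 1))                          # shape (C,)
    mx = x.max(axis=(0, 1))                            # shape (C,)

    def mlp(v):
        return np.maximum(v @ w1, 0.0) @ w2            # hidden layer with ReLU

    gate = sigmoid(mlp(avg) + mlp(mx))                 # one weight in (0, 1) per channel
    return x * gate                                    # broadcast over height and width


def spatial_attention(x, a=1.0, b=1.0):
    """Re-weight positions: pool over channels, then mix the two maps (assumed 1 x 1 mix)."""
    avg = x.mean(axis=2, keepdims=True)                # shape (H, W, 1)
    mx = x.max(axis=2, keepdims=True)                  # shape (H, W, 1)
    gate = sigmoid(a * avg + b * mx)                   # one weight in (0, 1) per position
    return x * gate


rng = np.random.default_rng(0)
feature_map = rng.random((64, 64, 128))                # output of the first convolution layer
w1 = rng.normal(scale=0.1, size=(128, 16))             # hypothetical MLP weights
w2 = rng.normal(scale=0.1, size=(16, 128))
attended = spatial_attention(channel_attention(feature_map, w1, w2))
print(attended.shape)
```

Because both gates only rescale the input, the output keeps the input's dimension, which matches the statement that each attention module has the same input and output dimension.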
1.5) Optimization method and loss function
After the model is constructed, it is trained. The batch size of training samples is set to 128 and a cross-entropy loss function is selected. The convolution layers use the ReLU activation function, and the last layer of the encoder and of the decoder uses the Sigmoid activation function; the activation functions provide the nonlinear transformation. An attention mechanism is added after the encoder convolution layers to improve the model's ability to extract features. Parameters are optimized with the Adam optimization algorithm, with a learning rate of 0.001, a decay term of 1e-08 and a momentum of 0.9, and the number of iterations is set to 300 to obtain the optimal model.
b4. The convolutional self-encoder network with trained weights is obtained by iteratively training the network with deep learning.
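For reference, a single Adam update can be sketched as below, using the learning rate 0.001 given above. Mapping the "momentum 0.9" to the first-moment coefficient `beta1` and the value 1e-08 to the stabilising constant `eps` is an assumption, as is the second-moment coefficient `beta2 = 0.999`, which the text does not mention. The quadratic objective is a toy example, not the autoencoder loss.

```python
import numpy as np


def adam_step(param, grad, state, lr=0.001, beta1=0.9, beta2=0.999, eps=1e-8):
    """Apply one Adam update and return the new parameter value."""
    state["t"] += 1
    state["m"] = beta1 * state["m"] + (1 - beta1) * grad          # first-moment estimate
    state["v"] = beta2 * state["v"] + (1 - beta2) * grad ** 2     # second-moment estimate
    m_hat = state["m"] / (1 - beta1 ** state["t"])                # bias correction
    v_hat = state["v"] / (1 - beta2 ** state["t"])
    return param - lr * m_hat / (np.sqrt(v_hat) + eps)


# Toy objective f(w) = (w - 3)^2 with gradient 2 * (w - 3).
w = np.array(0.0)
state = {"t": 0, "m": 0.0, "v": 0.0}
for _ in range(300):
    w = adam_step(w, 2.0 * (w - 3.0), state)
print(float(w))
```

With these settings each step moves the parameter by at most about the learning rate, so 300 iterations move `w` only part of the way toward the minimum of the toy objective.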
Step 3, inputting the astronomical image feature set obtained in the step 2 into an image feature clustering model, and outputting the probability that each astronomical image belongs to each cluster;
establishing an image feature clustering model:
input: image feature sample set { feature ] p1 ,feature p2 ,feature p3 ,...,feature pn Where n is the total number of samples
And (3) outputting: probability of each astronomical image belonging to each cluster
1.1 Building a clustering model based on the Gaussian mixture model, and determining the number of components of the clusters as K;
1.2 For sample features in the input sample set pi Estimating feature pi Probabilities generated by each Component, for each feature pi The probability of generation of the k-th Component is:
where γ (i, k) represents the probability that the ith feature data is generated by the kth Component; feature of feature pi Representing the ith feature data in the input astronomical image feature set; mu (mu) k Sum sigma k Parameters for the kth Component; pi k A mixing coefficient representing the kth Component; pi j Representing the mixing coefficient of the j-th Component, whereN () represents a gaussian distribution; mu (mu) j Sum sigma j Is the parameter of the j-th Component.
By iterative method, μ is assumed when computing γ (i, k) k Sum sigma k All known values can be taken as initial values or values obtained in the last iteration;
1.3 Estimating the parameters of each Component, assuming that the gamma (i, k) obtained in the previous step is the correct "feature pi The probability "generated by Component k, considering all data samples, can be considered as Component generating γ (1, k) features p1 ,γ(2,k)feature p2 ,...,γ(n,k)feature pn These points. Since each Component is a standard gaussian distribution, the parameter value corresponding to the maximum likelihood can be obtained:
1.4 Repeat steps 1.2 and 1.3 until the value of the likelihood function converges; the probability model then gives the probability that each input sample belongs to each cluster, $\{p_1, p_2, p_3, \ldots, p_n\}$;
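Steps 1.1 to 1.4 describe expectation-maximization for a Gaussian mixture. The sketch below is a simplified illustration under stated assumptions: the features are scalars (real image features are vectors and would need covariance matrices), the initial means are supplied by the caller, and a small variance floor is added for safety. All function names are hypothetical, and the code is not numerically hardened.

```python
import math

def gaussian_pdf(x, mean, var):
    """Density of a 1-D Gaussian with the given mean and variance."""
    return math.exp(-((x - mean) ** 2) / (2.0 * var)) / math.sqrt(2.0 * math.pi * var)

def mixture_terms(x, weights, means, variances):
    """pi_k * N(x | mu_k, var_k) for every component k."""
    return [w * gaussian_pdf(x, mu, var) for w, mu, var in zip(weights, means, variances)]

def e_step(data, weights, means, variances):
    """Step 1.2: gamma[i][k], the probability that sample i came from component k."""
    gamma = []
    for x in data:
        terms = mixture_terms(x, weights, means, variances)
        total = sum(terms)
        gamma.append([t / total for t in terms])
    return gamma

def m_step(data, gamma):
    """Step 1.3: maximum-likelihood parameters given the responsibilities."""
    n = len(data)
    weights, means, variances = [], [], []
    for k in range(len(gamma[0])):
        n_k = sum(row[k] for row in gamma)
        mu = sum(row[k] * x for row, x in zip(gamma, data)) / n_k
        var = sum(row[k] * (x - mu) ** 2 for row, x in zip(gamma, data)) / n_k
        weights.append(n_k / n)
        means.append(mu)
        variances.append(max(var, 1e-6))  # assumed floor to avoid a zero variance
    return weights, means, variances

def fit_gmm(data, initial_means, tol=1e-6, max_iter=200):
    """Step 1.4: alternate both steps until the log-likelihood stops improving."""
    k = len(initial_means)
    weights, means, variances = [1.0 / k] * k, list(initial_means), [1.0] * k
    previous = -math.inf
    for _ in range(max_iter):
        gamma = e_step(data, weights, means, variances)
        weights, means, variances = m_step(data, gamma)
        current = sum(math.log(sum(mixture_terms(x, weights, means, variances))) for x in data)
        if current - previous < tol:
            break
        previous = current
    return e_step(data, weights, means, variances), means

# Toy data (assumed): two well-separated groups of scalar features.
samples = [0.9, 1.0, 1.1, 4.9, 5.0, 5.1]
cluster_probs, fitted_means = fit_gmm(samples, initial_means=[0.0, 6.0])
```

Each row of `cluster_probs` sums to 1 and plays the role of one $p_i$ in the text: the probabilities that sample i belongs to each of the K clusters.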
Step 4, manually scoring the clustered astronomical images to obtain the probability that each cluster belongs to each class;
Establishing the manual scoring model for the image clusters:
For the clustered astronomical images, an expert classifies each cluster from experience; the result is the probability that each cluster belongs to each class, $\{c_1, c_2, c_3, \ldots, c_L\}$, where L is the number of astronomical image classes;
Step 5, multiplying the probability that each astronomical image belongs to each cluster by the probability that each cluster belongs to each class to obtain the probability that each astronomical image belongs to each class, and completing classification through threshold screening.
Establishing the image feature classification model:
Take the probability that each input sample belongs to each cluster obtained in step 3, $\{p_1, p_2, p_3, \ldots, p_n\}$, and the probability that each cluster belongs to each class obtained in step 4, $\{c_1, c_2, c_3, \ldots, c_L\}$. Multiply the two matrices to obtain the probability that each sample belongs to each class, and complete the classification through threshold screening.
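The matrix product in step 5 can be sketched as follows. The sketch assumes the sample-to-cluster probabilities form an n×K matrix and the cluster-to-class scores form a K×L matrix. The text does not spell out the threshold rule, so the one used here (accept the most probable class only if its probability reaches a threshold, otherwise leave the sample unclassified) is an assumption, as are the function names and the numbers.

```python
def matmul(a, b):
    """Plain matrix product of two nested lists."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))]
            for i in range(len(a))]

def classify(cluster_probs, class_probs, threshold):
    """Return per-class probabilities and a label (or None) for each sample."""
    scores = matmul(cluster_probs, class_probs)
    labels = []
    for row in scores:
        best = max(range(len(row)), key=row.__getitem__)
        labels.append(best if row[best] >= threshold else None)
    return scores, labels

# Hypothetical values: 3 images, K = 2 clusters, L = 2 classes.
cluster_probs = [[0.9, 0.1],   # image 0 is almost surely in cluster 0
                 [0.2, 0.8],
                 [0.5, 0.5]]   # image 2 is ambiguous
class_probs = [[0.8, 0.2],     # expert score: cluster 0 is mostly class 0
               [0.1, 0.9]]     # expert score: cluster 1 is mostly class 1
scores, labels = classify(cluster_probs, class_probs, threshold=0.6)
```

With these numbers the first two images receive classes 0 and 1, and the ambiguous third image falls below the threshold and stays unlabeled.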
In another embodiment, the present invention provides an astronomical image automated classification system based on unsupervised machine learning, comprising:
the preprocessing module is configured to preprocess astronomical image data to be classified;
the convolution self-coding network model is configured to take the preprocessed astronomical image data as input, extract astronomical image characteristics and obtain an astronomical image characteristic set;
the image feature clustering model is configured to take the extracted astronomical image feature set as input and output the probability that each astronomical image belongs to each cluster;
the manual scoring module is configured to manually score the clustered astronomical images to obtain the probability that each cluster belongs to each class;
the image feature classification model is configured to multiply the probability that each astronomical image belongs to each cluster with the probability that each cluster belongs to each class to obtain the probability that each astronomical image belongs to each class, and then classification is completed through threshold screening.
Compared with the prior art, the unsupervised astronomical image classification method of the present invention combines a neural network, a convolutional self-encoder and a clustering model, so that the model learns by itself to extract astronomical image features without any manually labeled data. This reduces labor cost, avoids the influence of human bias on the model's classification, and allows the model to reach higher classification accuracy at lower computational cost.
The present invention has been disclosed above through preferred embodiments, but it is not limited to them; technical solutions obtained by equivalent substitution or equivalent transformation all fall within the protection scope of the present invention.

Claims (8)

1. An automatic astronomical image classification method based on unsupervised machine learning is characterized by comprising the following steps:
preprocessing astronomical image data to be classified;
inputting the preprocessed astronomical image data into a trained convolution self-coding network model, and extracting astronomical image characteristics to obtain an astronomical image characteristic set;
inputting the obtained astronomical image feature set into an image feature clustering model, and outputting the probability that each astronomical image belongs to each cluster;
manually scoring the astronomical images clustered to obtain the probability that each cluster belongs to each class;
multiplying the probability that each astronomical image belongs to each cluster by the probability that each cluster belongs to each class to obtain the probability that each astronomical image belongs to each class, and finishing classification through threshold screening;
the convolution self-coding network model comprises an encoder and a decoder, wherein the encoder comprises an input layer, a first convolution layer, a first attention module, a first downsampling layer, a second convolution layer, a second attention module, a second downsampling layer, a third convolution layer, a third attention module, a third downsampling layer, a fourth convolution layer, a fourth attention module and a fourth downsampling layer which are connected in sequence; the decoder comprises a first hidden layer, a second hidden layer, a third hidden layer, a fourth hidden layer, a fifth hidden layer, a reshape layer, a fifth convolution layer, a first upsampling layer, a sixth convolution layer, a second upsampling layer, a seventh convolution layer, a third upsampling layer, an eighth convolution layer and an output layer which are connected in sequence, wherein the convolution layers use a ReLU activation function, except the fourth convolution layer and the eighth convolution layer, which use a Sigmoid activation function;
the image feature clustering model is built by adopting a Gaussian mixture model, the number of components of clustering is determined, the obtained astronomical image feature set is input into the image feature clustering model, the probability that each astronomical image belongs to each cluster is output, and the method comprises the following steps:
S1, for each feature data in the input astronomical image feature set, estimating the probability that the feature data is generated by each Component according to the following formula:
$$\gamma(i,k) = \frac{\pi_k \, N(\mathrm{feature}_{pi} \mid \mu_k, \Sigma_k)}{\sum_{j=1}^{K} \pi_j \, N(\mathrm{feature}_{pi} \mid \mu_j, \Sigma_j)}$$
where $\gamma(i,k)$ represents the probability that the i-th feature data is generated by the k-th Component; $\mathrm{feature}_{pi}$ represents the i-th feature data in the input astronomical image feature set; $\mu_k$ and $\Sigma_k$ are the parameters of the k-th Component; $\pi_k$ represents the mixing coefficient of the k-th Component; K represents the number of Components; $\pi_j$ represents the mixing coefficient of the j-th Component; $N(\cdot)$ represents a Gaussian distribution; and $\mu_j$ and $\Sigma_j$ are the parameters of the j-th Component;
S2, based on the estimated $\gamma(i,k)$, calculating the parameter values of the k-th Component corresponding to the maximum likelihood according to the following formulas, where n is the number of feature data:
$$N_k = \sum_{i=1}^{n} \gamma(i,k), \qquad \pi_k = \frac{N_k}{n}$$
$$\mu_k = \frac{1}{N_k} \sum_{i=1}^{n} \gamma(i,k)\, \mathrm{feature}_{pi}$$
$$\Sigma_k = \frac{1}{N_k} \sum_{i=1}^{n} \gamma(i,k)\, (\mathrm{feature}_{pi} - \mu_k)(\mathrm{feature}_{pi} - \mu_k)^{T}$$
and repeating S1 and S2 until the value of the likelihood function converges, and obtaining the probability that each feature data belongs to each cluster by using the probability model.
2. The method of claim 1, wherein the first convolution layer comprises 128 convolution kernels of size 4×4, the second convolution layer comprises 64 convolution kernels of size 4×4, the third convolution layer comprises 32 convolution kernels of size 3×3, the fourth convolution layer comprises 16 convolution kernels of size 3×3, the fifth convolution layer comprises 32 convolution kernels of size 3×3, the sixth convolution layer comprises 32 convolution kernels of size 3×3, the seventh convolution layer comprises 64 convolution kernels of size 4×4, the eighth convolution layer comprises 3 convolution kernels of size 4×4, the first downsampling layer, the second downsampling layer, the third downsampling layer, the fourth downsampling layer, the first upsampling layer, the second upsampling layer and the third upsampling layer each have a size of 2×2, the first hidden layer comprises 128 neuron nodes, the second hidden layer comprises 64 neuron nodes, the third hidden layer comprises 32 neuron nodes, the fourth hidden layer comprises 64 neuron nodes, and the fifth hidden layer comprises 128 neuron nodes.
3. An automatic astronomical image classification method based on unsupervised machine learning according to claim 1, characterized in that the training method of the convolutional self-coding network model comprises:
acquiring a set of astronomical image data;
preprocessing the acquired set of astronomical image data;
taking the preprocessed astronomical image data as a training data set, taking a cross entropy loss function as the objective function, optimizing the parameters with the Adam optimization algorithm with the goal of minimizing the objective function value, and training to obtain the convolutional self-coding network model.
4. An automated astronomical image classification method based on unsupervised machine learning according to claim 3, characterized in that said preprocessing of the acquired set of astronomical image data comprises: performing center cropping, random cropping and dimension reduction on each astronomical image in the set of astronomical image data.
5. An automated method of classifying an astronomical image based on unsupervised machine learning according to claim 4, wherein the preprocessing of the acquired set of astronomical image data further comprises: when, after center cropping and dimension reduction, the number of astronomical image samples in the class to which an astronomical image belongs is insufficient, randomly flipping the astronomical image obtained after center cropping and dimension reduction to obtain a new astronomical image, and adding the new astronomical image to the training data set.
6. The automatic classification method of astronomical images based on unsupervised machine learning according to claim 1, wherein the manually scoring astronomical images clustered into clusters to obtain the probability that each cluster belongs to each class comprises:
and classifying the astronomical images clustered into clusters according to experience by an expert to obtain the probability that each cluster belongs to each class.
7. The automatic classification method of astronomical images based on the unsupervised machine learning according to claim 1, wherein the multiplying the probability that each astronomical image belongs to each cluster with the probability that each cluster belongs to each class to obtain the probability that each astronomical image belongs to each class comprises:
multiplying the probability matrix of each astronomical image belonging to each cluster by the probability matrix of each cluster belonging to each class to obtain the probability of each astronomical image belonging to each class.
8. An automatic astronomical image classification system based on unsupervised machine learning, comprising:
the preprocessing module is configured to preprocess astronomical image data to be classified;
the convolution self-coding network model is configured to take the preprocessed astronomical image data as input, extract astronomical image characteristics and obtain an astronomical image characteristic set;
the image feature clustering model is configured to take the extracted astronomical image feature set as input and output the probability that each astronomical image belongs to each cluster;
the manual scoring module is configured to manually score the clustered astronomical images to obtain the probability that each cluster belongs to each class;
the image feature classification model is configured to multiply the probability that each astronomical image belongs to each cluster with the probability that each cluster belongs to each class to obtain the probability that each astronomical image belongs to each class, and then classification is completed through threshold screening;
the system is used for implementing the automatic astronomical image classification method based on the unsupervised machine learning in any one of claims 1-7.
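As a worked illustration of the encoder recited in claims 1 and 2, the sketch below traces feature-map shapes and convolution parameter counts through the four convolution and downsampling stages. It assumes a 64×64 input with 3 channels (the claims do not state the input size; 3 channels is inferred from the eighth convolution layer having 3 kernels), convolutions that preserve spatial size, and 2×2 downsampling that halves each side. The attention modules are left out because their internals are not given in this section, and all names are hypothetical.

```python
# (number of kernels, kernel size) for convolution layers 1 to 4, from claim 2
ENCODER_CONVS = [(128, 4), (64, 4), (32, 3), (16, 3)]

def trace_encoder(height, width, channels):
    """Return (kernels, kernel size, parameter count, output shape) per stage."""
    stages = []
    for kernels, size in ENCODER_CONVS:
        # weights: size * size * input channels per kernel, plus one bias per kernel
        params = size * size * channels * kernels + kernels
        # assumed: the convolution keeps height and width, 2x2 downsampling halves them
        height, width = height // 2, width // 2
        channels = kernels
        stages.append((kernels, size, params, (height, width, channels)))
    return stages

stages = trace_encoder(64, 64, 3)
final_height, final_width, final_channels = stages[-1][3]
flattened_size = final_height * final_width * final_channels
total_conv_params = sum(stage[2] for stage in stages)
```

Under these assumptions the encoder output is a 4×4×16 feature map, which is 256 values before the decoder's hidden layers.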
CN202110853849.3A 2021-07-28 2021-07-28 Automatic astronomical image classification method and system based on unsupervised machine learning Active CN113723456B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110853849.3A CN113723456B (en) 2021-07-28 2021-07-28 Automatic astronomical image classification method and system based on unsupervised machine learning

Publications (2)

Publication Number Publication Date
CN113723456A CN113723456A (en) 2021-11-30
CN113723456B true CN113723456B (en) 2023-10-17

Family

ID=78674118

Country Status (1)

Country Link
CN (1) CN113723456B (en)

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant