Disclosure of Invention
In view of the above, an object of the present invention is to provide a method and an apparatus for machine learning cell classification based on hyperspectral imaging, in which a hyperspectral image of a cell containing chemical information and two-dimensional image information is obtained by hyperspectral imaging, and cell classification is performed based on the hyperspectral image of the cell, so as to realize automatic high-speed and high-precision classification of living single cells.
In order to achieve the purpose, the invention provides the following technical scheme:
in a first aspect, a hyperspectral imaging-based machine learning cell classification method includes the following steps:
performing hyperspectral imaging on cells to obtain a three-dimensional cell hyperspectral image with the size of S multiplied by M multiplied by N, wherein S represents different wave numbers, each wave number corresponds to a two-dimensional image with the size of M multiplied by N, the intensity of each pixel in the two-dimensional image represents the signal intensity at the corresponding wave number, and the combination of the signal intensities of each pixel at the different wave numbers reflects the chemical characteristics of the substance corresponding to that pixel;
preprocessing a cell hyperspectral image and segmenting cells to obtain a cell image block;
and classifying the cell image blocks by using a machine learning model to obtain a cell classification result.
Preferably, the hyperspectral imaging of the cell is performed using visible light spectroscopy, near infrared spectroscopy, raman spectroscopy, stimulated raman spectroscopy, coherent anti-stokes raman spectroscopy, coherent stokes raman spectroscopy, transient absorption spectroscopy, stimulated emission spectroscopy, infrared spectroscopy or fourier transform infrared spectroscopy.
Preferably, when hyperspectral imaging is performed on the cells, the following method is adopted:
obtaining a two-dimensional image through a single-point photoelectric detector and laser scanning or sample scanning; or,
obtaining a two-dimensional image through a one-dimensional array detector and laser scanning or sample scanning; or,
obtaining a two-dimensional image through exposure of a two-dimensional array detector and an imaging system;
and forming a three-dimensional cell hyperspectral image by the two-dimensional images under different wave numbers.
Preferably, the preprocessing of the cell hyperspectral image comprises region of interest extraction and noise filtering, binarization processing and image morphology processing.
Preferably, the cell segmentation is performed on the preprocessed image by using a watershed algorithm to obtain a cell image block.
Preferably, the machine learning model employs a support vector machine.
Preferably, before the cell image block classification is performed by using the machine learning model, the machine learning model needs to be subjected to parameter optimization.
In a second aspect, a hyperspectral imaging-based machine learning cell classification device includes:
the image acquisition module is used for performing hyperspectral imaging on the cells to obtain a three-dimensional cell hyperspectral image with the size of S multiplied by M multiplied by N, wherein S represents different wave numbers, each wave number corresponds to a two-dimensional image with the size of M multiplied by N, the intensity of each pixel in the two-dimensional image represents the signal intensity under the corresponding wave number, and the signal intensity combination of each pixel under the different wave numbers can reflect the chemical characteristics of substances corresponding to the pixel;
the image preprocessing module is used for preprocessing the hyperspectral image of the cell and segmenting the cell to obtain a cell image block;
and the cell classification module is used for classifying the cell image blocks by utilizing a machine learning model to obtain a cell classification result.
In a third aspect, a hyperspectral imaging-based machine-learned cell classification apparatus includes a computer memory, a computer processor, and a computer program stored in the computer memory and executable on the computer processor, wherein a parameter-optimized machine-learned model is stored in the computer memory, and the computer processor executes the computer program to implement the following steps:
preprocessing and cell segmenting the collected hyperspectral image of the cell with the size of S multiplied by M multiplied by N to obtain a cell image block;
and classifying the cell image blocks by using a machine learning model to obtain a cell classification result.
Compared with the prior art, the invention has the beneficial effects that at least:
the hyperspectral imaging-based machine learning cell classification method and device provided by the embodiment of the invention perform hyperspectral imaging on cells, so that the obtained cell hyperspectral image simultaneously contains the two-dimensional morphological information and the chemical composition of the cells; cell classification is then performed by combining the chemical composition and the two-dimensional morphology of the cells, thereby improving the speed and accuracy of cell classification.
Detailed Description
In order to make the objects, technical solutions and advantages of the present invention more apparent, the present invention is described in further detail below with reference to the accompanying drawings and examples. It should be understood that the detailed description and specific examples are intended for purposes of illustration only and are not intended to limit the scope of the invention.
In order to solve the problem that automatic, high-speed, high-precision classification of living single cells is difficult to realize with the existing cell sequencing technology, the embodiment of the invention provides a hyperspectral imaging-based machine learning cell classification method and a hyperspectral imaging-based machine learning cell classification device. Unlike traditional methods that classify cells by the morphological features of a single-frame cell image, the method combines the chemical composition provided by hyperspectral imaging with the spatial distribution of that composition, and realizes fast, high-accuracy single-cell classification by identifying both the spatial information and the hyperspectral information in the three-dimensional image, thereby improving the accuracy and efficiency of cell classification.
Fig. 1 is a flowchart of a hyperspectral imaging-based machine learning cell classification method according to an embodiment of the invention. As shown in fig. 1, the hyperspectral imaging-based machine learning cell classification method provided by the embodiment includes the following steps:
step 1, performing hyperspectral imaging on cells to obtain a three-dimensional hyperspectral cell image.
In the embodiment, the prepared cell sample is subjected to hyperspectral imaging to obtain a cell hyperspectral image containing high-dimensional spectral information. As a novel imaging technology with chemical bond selectivity, hyperspectral imaging offers a new approach to capturing cell phenotypes. Hyperspectral imaging uses molecular vibrational spectroscopy to add a further dimension, the spectral dimension, which carries chemical information on top of the traditional two-dimensional image information, thereby forming hyperspectral image data.
Visible light spectrum, near infrared spectrum, Raman spectrum, stimulated Raman spectrum, coherent anti-Stokes Raman spectrum, coherent Stokes Raman spectrum, transient absorption spectrum, stimulated emission spectrum, infrared spectrum and Fourier transform infrared spectrum can be adopted in hyperspectral imaging.
When hyperspectral imaging is performed on cells, the following method can be adopted: (a) obtaining a two-dimensional image through a single-point photoelectric detector and laser scanning or sample scanning; (b) obtaining a two-dimensional image through a one-dimensional array detector and laser scanning or sample scanning; (c) exposing through a two-dimensional array detector and an imaging system to obtain a two-dimensional image; and forming a three-dimensional cell hyperspectral image by the two-dimensional images under different wave numbers.
Option (a) is to be understood as follows: detection is performed by a single-point photodetector (e.g., a photodiode) that has only one pixel. To achieve two-dimensional detection, either the light source is scanned, i.e., the illumination beam moves in the XY plane so that it covers the entire sample, or the sample is scanned, i.e., the sample to be observed moves in the XY plane.
In an embodiment, a stimulated Raman scattering hyperspectral imaging device is used to obtain a hyperspectral image of the cell to be measured. The device comprises a pump laser of frequency $\omega_p$ and a Stokes laser of frequency $\omega_s$. When the frequency difference between the two lasers matches the vibration frequency of a Raman vibrational band, i.e., $\omega_p - \omega_s = \nu_{vib}$, the intensity of the pump laser decreases (stimulated Raman loss) while the intensity of the Stokes laser increases (stimulated Raman gain). The signal is detected either as the modulation transferred to one of two narrow-band lasers while the other, modulated laser is tuned across the spectral range, or by using a broadband laser that covers the frequency range of interest and detecting the Raman gain (or loss) of that laser. By scanning the two narrow-band lasers across the frequencies of interest, or by recording the full spectrum point by point with the broadband laser method, the three-dimensional cell hyperspectral image is obtained.
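As a small worked illustration of the matching condition above, the frequency difference can be expressed as a Raman shift in wave numbers, 1/λp - 1/λs. The sketch below converts a pump/Stokes wavelength pair to a shift in cm^-1 and computes the pump wavelengths needed to step through several wave numbers with a fixed Stokes beam. The function names, the wavelengths and the target shifts are hypothetical values chosen for illustration; they are not taken from the embodiment.

```python
def raman_shift_cm1(pump_nm, stokes_nm):
    """Raman shift (cm^-1) addressed by a pump/Stokes wavelength pair given in nm."""
    return 1e7 / pump_nm - 1e7 / stokes_nm


def pump_for_shift_nm(shift_cm1, stokes_nm):
    """Pump wavelength (nm) that reaches a given shift with a fixed Stokes beam."""
    return 1e7 / (shift_cm1 + 1e7 / stokes_nm)


# Hypothetical example: tune the pump to sample S = 5 wave numbers.
stokes_nm = 1040.0
target_shifts = [2800.0, 2850.0, 2900.0, 2950.0, 3000.0]
pump_sweep = [pump_for_shift_nm(s, stokes_nm) for s in target_shifts]
```

Each entry of `pump_sweep` corresponds to one two-dimensional image in the stack; acquiring an image at every entry would give the S slices of the three-dimensional image.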
This yields a three-dimensional cell hyperspectral image with the size of S multiplied by M multiplied by N, wherein S represents different wave numbers, the two-dimensional image corresponding to each wave number has the size of M multiplied by N, the intensity of each pixel in the two-dimensional image represents the signal intensity at the corresponding wave number, and the combination of the signal intensities of each pixel at the different wave numbers reflects the chemical characteristics of the substance corresponding to that pixel; these chemical characteristics are key factors for cell classification.
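The S x M x N layout can be pictured as an array indexed by wave number, row and column. The following minimal sketch uses a synthetic random array as a stand-in for measured data; the dimensions and variable names are assumptions made for illustration only.

```python
import numpy as np

# Hypothetical dimensions: S wave numbers, each with an M x N image.
S, M, N = 4, 3, 5

# Synthetic stand-in for a measured image, indexed [wave number, row, column].
rng = np.random.default_rng(0)
cube = rng.random((S, M, N))

# The two-dimensional image recorded at the k-th wave number.
k = 2
image_at_k = cube[k]                 # shape (M, N)

# The spectrum of one pixel: its signal intensity at every wave number.
row, col = 1, 3
pixel_spectrum = cube[:, row, col]   # shape (S,)

# One spectrum per pixel, a convenient layout for per-pixel analysis.
spectra = cube.reshape(S, M * N).T   # shape (M * N, S)
```

The per-pixel spectrum is the "combination of signal intensities" that carries the chemical information, while each `cube[k]` slice carries the spatial information.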
And 2, preprocessing the cell hyperspectral image and segmenting the cell to obtain a cell image block.
The three-dimensional cell hyperspectral image contains the chemical information of the cells, but it may also contain components that hinder the extraction of classification-relevant information, such as system noise. How to handle the added chemical dimension so that it serves the single-cell classification task is therefore a major challenge for current technologies. To address this, the cell hyperspectral image is preprocessed; the preprocessing comprises at least one of region of interest (ROI) extraction, noise filtering, binarization processing, image morphology processing and the like, and the order of the preprocessing operations can be adjusted.
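Wherein, a mean filter can be used to filter noise and improve the image quality. Specifically, the image $f$ is filtered with a filter $w$ of size m x n:
$$g(x, y) = \sum_{s=-a}^{a} \sum_{t=-b}^{b} w(s, t)\, f(x + s, y + t)$$
wherein x = 0, 1, 2, ..., M-1, y = 0, 1, 2, ..., N-1, a = (m-1)/2, and b = (n-1)/2; for the mean filter, every weight is w(s, t) = 1/(mn).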
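A minimal sketch of mean filtering with an m x n window is given below, assuming odd m and n. The border handling (edge replication) and the function names are assumptions of this sketch; the text does not specify them.

```python
import numpy as np


def mean_filter(f, m=3, n=3):
    """Average each pixel of the 2-D image f over an m x n neighbourhood.

    m and n are assumed odd. The border is handled by edge replication,
    which is an assumption of this sketch.
    """
    a, b = (m - 1) // 2, (n - 1) // 2
    padded = np.pad(f.astype(float), ((a, a), (b, b)), mode="edge")
    M, N = f.shape
    g = np.zeros((M, N), dtype=float)
    # Sum the shifted copies f(x + s, y + t) for s in [-a, a], t in [-b, b].
    for s in range(-a, a + 1):
        for t in range(-b, b + 1):
            g += padded[a + s:a + s + M, b + t:b + t + N]
    return g / (m * n)  # uniform weights 1 / (m * n)


def filter_cube(cube, m=3, n=3):
    """Apply the mean filter independently to every wave number slice."""
    return np.stack([mean_filter(img, m, n) for img in cube])
```

Filtering slice by slice leaves the spectral axis untouched, so the per-pixel spectra are smoothed only spatially.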
Binarization converts the grayscale image into a binary image. In one embodiment, because different regions of the sample have different grayscale intensities, an adaptive threshold method rather than a fixed threshold is used in order to retain sufficient information; Otsu's binarization method is preferred.
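Otsu's method selects the threshold that maximizes the between-class variance of the grayscale histogram. The sketch below is one possible implementation for a single two-dimensional image; the number of histogram bins and the function names are assumptions.

```python
import numpy as np


def otsu_threshold(image, bins=256):
    """Return the threshold that maximises the between-class variance."""
    hist, edges = np.histogram(image.ravel(), bins=bins)
    centers = (edges[:-1] + edges[1:]) / 2.0
    p = hist.astype(float) / hist.sum()

    w0 = np.cumsum(p)                # probability of the lower class
    w1 = 1.0 - w0                    # probability of the upper class
    mu_cum = np.cumsum(p * centers)  # cumulative first moment
    mu_total = mu_cum[-1]

    # Between-class variance per candidate split; empty classes count as 0.
    with np.errstate(divide="ignore", invalid="ignore"):
        sigma_b = (mu_total * w0 - mu_cum) ** 2 / (w0 * w1)
    sigma_b = np.nan_to_num(sigma_b, nan=0.0, posinf=0.0, neginf=0.0)

    # Use the upper edge of the best bin so the whole bin stays in the lower class.
    return edges[1:][np.argmax(sigma_b)]


def binarize(image):
    """Foreground (True) where the intensity reaches Otsu's threshold."""
    return image >= otsu_threshold(image)
```

Because the threshold is derived from each image's own histogram, it adapts to the overall intensity of that image; a locally adaptive variant would apply the same idea per region.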
Morphological image processing involves binary erosion, dilation, opening, closing, and reconstruction. Erosion "shrinks" and "thins" objects in the image, while dilation "grows" and "thickens" them. Morphological opening is erosion followed by dilation, while morphological closing is dilation followed by erosion. In other words, opening removes objects that cannot contain the structuring element, breaks thin connections, and shrinks protrusions. Conversely, closing joins nearby objects, fills small holes and gaps, and establishes connections in the image. Morphological opening and closing are applied repeatedly until a satisfactory image is obtained.
A single acquired hyperspectral image contains a plurality of cells, possibly of different types, which must first be identified individually. Therefore, after the cell hyperspectral image is preprocessed, the foreground and the background of the image need to be distinguished, namely, whether each pixel in the image belongs to the foreground (cell) or the background is determined, cell segmentation is realized, cell image blocks are obtained, and each cell image block is used as a sample. In the embodiment, a watershed algorithm is used for cell segmentation, and the specific process is as follows:
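The four basic operations can be sketched for a boolean mask with a 3 x 3 square structuring element. The element size, the treatment of pixels outside the image as background, and the function names are assumptions of this sketch.

```python
import numpy as np


def _shifted_stack(mask):
    """Stack the nine 3x3-neighbourhood shifts of a boolean mask.

    Pixels outside the image are treated as background (an assumption).
    """
    p = np.pad(mask, 1, mode="constant", constant_values=False)
    M, N = mask.shape
    return np.stack([p[i:i + M, j:j + N] for i in range(3) for j in range(3)])


def erode(mask):
    """A pixel survives only if its whole 3x3 neighbourhood is foreground."""
    return _shifted_stack(mask).all(axis=0)


def dilate(mask):
    """A pixel is set if any pixel in its 3x3 neighbourhood is foreground."""
    return _shifted_stack(mask).any(axis=0)


def opening(mask):
    """Erosion followed by dilation: removes objects smaller than the element."""
    return dilate(erode(mask))


def closing(mask):
    """Dilation followed by erosion: fills small holes and gaps."""
    return erode(dilate(mask))
```

Applying `opening` to a binarized slice removes isolated noise pixels, and `closing` then fills pinholes inside cells, which is the cleanup role described above.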
(a) the local minima in the image are found, and each minimum is assigned a unique label;
(b) the priority queues are scanned in order from small to large values of the variable h, and an element is selected from the first non-empty queue; if all queues of the priority queue are empty, the algorithm terminates;
(c) the selected element is removed from the queue, and its label is passed to all of its unlabeled neighbors;
(d) all neighbors labeled in the previous step are placed in the priority queue, and the process returns to step (b).
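The flooding procedure can be sketched with a single heap standing in for the per-level priority queues. In this sketch the labeled seed pixels are supplied by the caller instead of being detected as local minima, and 4-connectivity is used; both choices, and the function name, are assumptions rather than details of the embodiment.

```python
import heapq

import numpy as np


def watershed(image, markers):
    """Flood a 2-D image from labelled seed pixels (markers > 0).

    Returns an integer label image in which every pixel carries the label
    of the seed whose flood reached it first.
    """
    labels = markers.copy()
    rows, cols = image.shape
    heap = []
    counter = 0  # tie-breaker: equal heights are processed first-in, first-out
    for r, c in zip(*np.nonzero(markers)):
        heapq.heappush(heap, (float(image[r, c]), counter, int(r), int(c)))
        counter += 1

    while heap:  # the algorithm terminates when the queue is empty
        _, _, r, c = heapq.heappop(heap)  # lowest pending pixel
        for dr, dc in ((-1, 0), (1, 0), (0, -1), (0, 1)):
            nr, nc = r + dr, c + dc
            if 0 <= nr < rows and 0 <= nc < cols and labels[nr, nc] == 0:
                labels[nr, nc] = labels[r, c]  # pass the label on
                heapq.heappush(heap, (float(image[nr, nc]), counter, nr, nc))
                counter += 1
    return labels
```

For touching cells, `image` would typically be a gradient or inverted intensity map so that cell interiors form basins; each connected label region then gives one cell image block.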
And 3, classifying the cell image blocks by using a machine learning model to obtain a cell classification result.
After the preprocessing and segmentation are completed, cell classification is performed using the signals in each cell. In the traditional approach to single-cell type analysis, the components in an image are identified mainly by phasor analysis or clustering, and the cell type is then judged manually from the spatial distribution of the different substances. Cell classification with a machine learning method comprises training the machine learning model and applying it for prediction.
In this embodiment, the machine learning model adopts a Support Vector Machine (SVM), and the support vector machine after parameter optimization is used to classify the cells of the cell image block.
When the parameters of the support vector machine are optimized, a labeled data set must be given as the training set: $(x_1, y_1), \ldots, (x_n, y_n)$, with $x_i \in \mathbb{R}^d$ and $y_i \in \{-1, +1\}$, where $x_i$ is a feature vector and $y_i$ is the known label of training sample $i$. The aim of training the support vector machine is to find an optimal hyperplane $w x^T + b = 0$, where $w$ is the weight vector, $x$ is the input feature vector, and $b$ is the offset. $w$ and $b$ satisfy the following inequalities: $w x_i^T + b \ge +1$ if $y_i = +1$, and $w x_i^T + b \le -1$ if $y_i = -1$. The training process finds the most suitable $w$ and $b$ so that the hyperplane separates the two classes while maximizing the margin $1/\|w\|^2$.
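As a brief worked step, the two bounding hyperplanes on which the constraints hold with equality lie a distance $2/\|w\|$ apart, so maximizing the margin is equivalent to the usual quadratic program. The following is the standard textbook formulation, stated here for reference rather than quoted from the embodiment:

```latex
\begin{aligned}
\text{margin} &= \frac{2}{\lVert w \rVert}, \\
\max_{w,\,b}\ \frac{1}{\lVert w \rVert^{2}}
\;&\Longleftrightarrow\;
\min_{w,\,b}\ \tfrac{1}{2}\lVert w \rVert^{2}
\quad \text{subject to} \quad
y_i \,\bigl(w x_i^{T} + b\bigr) \ge 1,\qquad i = 1, \dots, n .
\end{aligned}
```

The single constraint $y_i (w x_i^T + b) \ge 1$ combines the two inequalities for $y_i = +1$ and $y_i = -1$.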
In a non-linear problem, the kernel function can be used to add extra dimensions to the raw data, making it a linear problem in the resulting high-dimensional space. It is defined as $K(x, y) = \langle f(x), f(y) \rangle$, where $K$ is the kernel function, $x$ and $y$ are n-dimensional inputs, $f$ maps the inputs from the n-dimensional space to an m-dimensional space, and $\langle \cdot, \cdot \rangle$ denotes the dot product. By means of the kernel function, the scalar product between two data points in the high-dimensional space can be calculated without explicitly calculating the mapping from the input space to the high-dimensional space. In one embodiment, the RBF is used as the kernel function, defined as follows: $K_{RBF}(x, y) = \exp(-\gamma \|x - y\|^2)$.
During training, pre-labeled training samples are input into the algorithm for learning. The training samples may come from previous experiments or from part of the data obtained earlier in a single experiment; for example, in the hyperspectral image obtained in step 1, some but not all of the cells are labeled in advance. Labeling means specifying, given the segmentation of cells from background, the region of each cell and its cell type.
In the embodiment, a support vector machine is selected because it does not require many labeled samples. A typical neural network needs a large number of training samples to obtain reasonable results, whereas the support vector machine is far less demanding in this respect.
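The RBF kernel can be evaluated for whole sets of feature vectors at once. The sketch below computes the kernel matrix between two sets of vectors; the function name and the test values of gamma are assumptions for illustration.

```python
import numpy as np


def rbf_kernel(X, Y, gamma):
    """Kernel matrix K[i, j] = exp(-gamma * ||X[i] - Y[j]||^2)."""
    X = np.asarray(X, dtype=float)
    Y = np.asarray(Y, dtype=float)
    sq_dist = (
        np.sum(X ** 2, axis=1)[:, None]
        + np.sum(Y ** 2, axis=1)[None, :]
        - 2.0 * X @ Y.T
    )
    # Clip tiny negative values caused by floating-point round-off.
    return np.exp(-gamma * np.maximum(sq_dist, 0.0))
```

Note that only pairwise distances in the input space are needed; the mapping to the high-dimensional space is never formed explicitly.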
During prediction, cell image blocks obtained by segmentation are input into a machine learning model with optimized parameters, and cell classification results are obtained through calculation.
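The prediction step can be sketched as follows, under two loudly stated assumptions: each cell image block is summarized by its mean spectrum (the text does not specify the feature vector), and the support vectors, coefficients and offset shown are hypothetical stand-ins for a trained, parameter-optimized model. The decision rule is the standard dual form of an RBF support vector machine.

```python
import numpy as np


def cell_features(cube, labels):
    """Mean spectrum of every segmented cell (an assumed feature choice).

    cube   : array of shape (S, M, N), the hyperspectral image
    labels : int array of shape (M, N); 0 is background, k > 0 marks cell k
    Returns (cell_ids, features) with features of shape (n_cells, S).
    """
    cell_ids = np.unique(labels)
    cell_ids = cell_ids[cell_ids > 0]
    features = np.stack([cube[:, labels == k].mean(axis=1) for k in cell_ids])
    return cell_ids, features


def svm_predict(features, support_vectors, dual_coef, intercept, gamma):
    """Sign of f(x) = sum_i dual_coef[i] * K_RBF(sv_i, x) + intercept."""
    diff = features[:, None, :] - support_vectors[None, :, :]
    K = np.exp(-gamma * np.sum(diff ** 2, axis=2))
    return np.where(K @ dual_coef + intercept >= 0.0, 1, -1)


# Toy demonstration with two well-separated synthetic "cells".
cube = np.zeros((2, 4, 4))
labels = np.zeros((4, 4), dtype=int)
labels[:2, :2] = 1
labels[2:, 2:] = 2
cube[0][labels == 1] = 1.0  # cell 1 responds at the first wave number
cube[1][labels == 2] = 1.0  # cell 2 responds at the second wave number

cell_ids, feats = cell_features(cube, labels)
pred = svm_predict(
    feats,
    support_vectors=np.array([[1.0, 0.0], [0.0, 1.0]]),  # hypothetical
    dual_coef=np.array([1.0, -1.0]),  # hypothetical alpha_i * y_i values
    intercept=0.0,
    gamma=1.0,
)
```

In practice the model parameters would come from training on labeled cells, with gamma and the regularization strength chosen during parameter optimization.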
The hyperspectral imaging-based machine learning cell classification method provided by the embodiment performs hyperspectral imaging on cells, so that the obtained cell hyperspectral image simultaneously contains the two-dimensional morphological information and the chemical composition of the cells; cell classification is then performed by combining the chemical composition and the two-dimensional morphology of the cells, which improves the speed and accuracy of cell classification.
An embodiment also provides a machine learning cell classification device based on hyperspectral imaging, including:
the image acquisition module is used for performing hyperspectral imaging on the cells to obtain a three-dimensional cell hyperspectral image with the size of S multiplied by M multiplied by N, wherein S represents different wave numbers, each wave number corresponds to a two-dimensional image with the size of M multiplied by N, the intensity of each pixel in the two-dimensional image represents the signal intensity under the corresponding wave number, and the signal intensity combination of each pixel under the different wave numbers can reflect the chemical characteristics of substances corresponding to the pixel;
the image preprocessing module is used for preprocessing the hyperspectral image of the cell and segmenting the cell to obtain a cell image block;
and the cell classification module is used for classifying the cell image blocks by utilizing a machine learning model to obtain a cell classification result.
It should be noted that, when the hyperspectral imaging-based machine learning cell classification device provided in the above embodiment performs cell classification, the above division of each functional module is taken as an example, and the above function distribution may be completed by different functional modules according to needs, that is, the internal structure of the terminal or the server is divided into different functional modules to complete all or part of the above described functions. In addition, the hyperspectral imaging-based machine learning cell classification device and the hyperspectral imaging-based machine learning cell classification method provided by the embodiment belong to the same concept, and specific implementation processes thereof are detailed in the hyperspectral imaging-based machine learning cell classification method embodiment, and are not repeated here.
Embodiments also provide a hyperspectral imaging based machine-learned cell classification apparatus comprising a computer memory, a computer processor, and a computer program stored in the computer memory and executable on the computer processor, wherein the computer memory has a parameter-optimized machine learning model stored therein, and the computer processor executes the computer program to perform the steps of:
preprocessing and cell segmenting the collected hyperspectral image of the cell with the size of S multiplied by M multiplied by N to obtain a cell image block;
and classifying the cell image blocks by using a machine learning model to obtain a cell classification result.
In practical applications, the computer memory may be local volatile memory, such as RAM, or non-volatile memory, such as ROM, FLASH, a floppy disk or a mechanical hard disk, or may be remote cloud storage. The computer processor can be a Central Processing Unit (CPU), a microprocessor unit (MPU), a Digital Signal Processor (DSP) or a Field Programmable Gate Array (FPGA); that is, the steps of the hyperspectral imaging-based machine learning cell classification method can be implemented by any of these processors.
The above-mentioned embodiments are intended to illustrate the technical solutions and advantages of the present invention, and it should be understood that the above-mentioned embodiments are only the most preferred embodiments of the present invention, and are not intended to limit the present invention, and any modifications, additions, equivalents, etc. made within the scope of the principles of the present invention should be included in the scope of the present invention.