CN106295251A - Phenotypic data analysis and processing method based on a single-cell phenotype database
- Publication number: CN106295251A
- Application number: CN201510270838.7A
- Authority: CN (China)
- Prior art keywords: cell, data, image, phenotype, phenotypic
- Legal status: Pending (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Abstract
The invention relates to a method for analyzing and processing phenotypic data based on a single-cell phenotype database. The invention consists of two main parts: a single-cell phenotype database and phenotypic data analysis and processing methods. (1) A cell image analysis and processing method based on the single-cell phenotype database. The method compares the image data of an unknown cell against the cell image information in the phenotype database and finds the best-matching cell, thereby obtaining detailed information about the unknown cell. (2) A cell Raman data analysis and processing method based on the single-cell phenotype database. The method compares the Raman data of an unknown cell against the cell Raman information in the phenotype database and finds the best-matching cell, thereby obtaining detailed information about the unknown cell.
Description
Technical field
The invention relates to the fields of single-cell research and applied cell science, and specifically to a method for analyzing and processing phenotypic data based on a single-cell phenotype database.
Background art
The single cell is the basic unit of life; every organism on Earth either consists of a single cell or develops from a single cell by differentiation. In-depth, systematic research on single cells can reveal the nature of life processes in full, and the specificity and differentiation of individual cells are important for studying disease mechanisms and for diagnosing and preventing disease. "Single-cell research" (the analysis of individual cells with specific functions) can resolve the operating mechanisms of living systems at their deepest level, and can therefore bring breakthroughs in the life sciences and in their applications in energy, the environment, health, agriculture, the oceans, and other fields. The US National Institutes of Health (NIH) launched the "Single Cell Analysis Program" in September 2012 and announced a total of 90 million US dollars in funding for 26 projects, mainly for developing new tools and technologies for single-cell work (http://commonfund.nih.gov/singlecell/fundedresearch.aspx.). The December 21, 2012 issue of Science named single-cell research one of the six scientific areas most worth watching in 2013.
A cell's phenotype is its outward manifestation, that is, the information reflecting the cell's growth state that can be obtained by whole-cell observation. For a single cell, examples of phenotype are its specific physical appearance or composition, such as shape, size, color features, texture features, and class. Important methods include identifying single-cell morphology under the microscope and acquiring cell Raman spectra with instruments such as a Raman spectrometer. Studying single cells means analyzing information such as cell shape, size, and color and discriminating cell types, both of which require a phenotype database covering different cells and different growth cycles, together with a corresponding phenotypic data analysis and processing system. At present there is little research in China on such systems, so establishing a phenotypic data analysis and processing method based on a single-cell phenotype database has significant practical value for single-cell research.
Summary of the invention
In view of the above deficiencies in the prior art, the technical problem to be solved by the invention is to provide a phenotypic data analysis and processing method based on a single-cell phenotype database that, using a new generation of cell sorting equipment, obtains phenotypic information on individual cells or cell populations (microbial, plant, animal, and human cells are all applicable), thereby laying the foundation for the omics analysis, engineering, and use of these cells.
To achieve this, the invention adopts the following technical solution: a phenotypic data analysis and processing method based on a single-cell phenotype database, comprising the following steps:
Cell image analysis and processing: compare the cell image information in the phenotype database with the image data of the unknown cell, and extract the phenotypic features of the unknown cell;
Data preprocessing: process the extracted phenotypic features into data suitable for the Euclidean distance algorithm, the KNN algorithm, and the support vector machine algorithm;
Classification analysis: classify against the feature data in the single-cell phenotype database to find the best-matching cell.
The data preprocessing comprises the following steps:
apply a grayscale transformation to the image;
sharpen the image to increase grayscale contrast and thereby enhance the edge information in the image;
apply smoothing filtering to the image to remove noise;
locate where the grayscale rate of change is greatest to obtain the closed contour of the cell image, then extract the features within the contour.
The smoothing filtering of the image uses digital Fourier filtering, specifically:
the data are first transformed with a fast Fourier transform, a Gaussian window function is applied in frequency space, and the result is transformed back with an inverse fast Fourier transform, giving band-pass-filtered spectral data.
The classification analysis based on the feature data in the single-cell phenotype database comprises two stages, training and judgment.
First, typical samples of known cell phenotypes are obtained, and feature extraction and data preprocessing are performed to produce feature samples; the SVM model is trained on these samples, the support vectors among the training samples are found, and the SVM model parameters are determined. Then the information on the unknown cell is input into the parameterized SVM model, which makes the judgment.
The invention has the following advantages and beneficial effects: single-cell samples of different types are collected, a single-cell phenotype database system is built, and phenotypic data analysis and processing are used to discriminate the type and phenotypic features of unknown cells. This overcomes the bottleneck of being unable to identify unknown cell types. With a new generation of cell sorting equipment, cell types can be discriminated in situ and in real time, which makes the method easy to bring to market. Application of the invention will accelerate single-cell analysis research.
Description of the drawings
Figure 1: cell sample feature extraction. The cell image undergoes grayscale transformation, boundary detection, depth-first search, and related processing to obtain the position of each cell; features are then extracted and saved for each cell according to its position, which facilitates subsequent operations such as classification and recognition.
Figure 2: recognition process of the support vector machine. The SVM model is trained on the phenotypic features of known cells to determine the model parameters; the feature information of the unknown cell is then analyzed using these parameters to determine the detailed information of the unknown cell.
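The depth-first search named in the Figure 1 description is not specified further in the text. As a minimal sketch, assuming the boundary detection step has already produced a binary mask (1 for cell pixels, 0 for background) and that pixels are joined by 4-connectivity, each cell's position could be recovered as a bounding box. The function name `find_cells` and the bounding-box output are illustrative assumptions, not part of the patent.

```python
from typing import List, Tuple

Box = Tuple[int, int, int, int]  # (top, left, bottom, right), inclusive


def find_cells(mask: List[List[int]]) -> List[Box]:
    """Return one bounding box per 4-connected foreground region.

    `mask` is a hypothetical binary image: nonzero = cell pixel.
    """
    rows = len(mask)
    cols = len(mask[0]) if rows else 0
    seen = [[False] * cols for _ in range(rows)]
    boxes: List[Box] = []

    for r in range(rows):
        for c in range(cols):
            if not mask[r][c] or seen[r][c]:
                continue
            # Iterative depth-first search from this unvisited cell pixel.
            stack = [(r, c)]
            seen[r][c] = True
            top, left, bottom, right = r, c, r, c
            while stack:
                y, x = stack.pop()
                top, bottom = min(top, y), max(bottom, y)
                left, right = min(left, x), max(right, x)
                for dy, dx in ((1, 0), (-1, 0), (0, 1), (0, -1)):
                    ny, nx = y + dy, x + dx
                    if (0 <= ny < rows and 0 <= nx < cols
                            and mask[ny][nx] and not seen[ny][nx]):
                        seen[ny][nx] = True
                        stack.append((ny, nx))
            boxes.append((top, left, bottom, right))
    return boxes
```

An explicit stack is used instead of recursion so that large cell regions do not hit Python's recursion limit. Each returned box could then be cropped out of the image for per-cell feature extraction.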
Detailed description
The invention is described in further detail below with reference to the drawings and embodiments.
The invention provides a phenotypic data analysis and processing method based on a single-cell phenotype database: single-cell samples of different types are collected, a single-cell phenotype database system is built, and phenotypic data analysis and processing are used to discriminate the type and phenotypic features of unknown cells. It mainly comprises the following two aspects (microalgae samples are used as the example below):
1. A cell image analysis and processing method based on the single-cell phenotype database. The method compares the image data of an unknown cell against the cell image information in the phenotype database and finds the best-matching cell, thereby obtaining detailed information about the unknown cell. The specific procedure is as follows:
(1) Training sample feature extraction and sample set construction
Extracting cell phenotypic features is the key step in cell image analysis and processing and the basis for constructing the sample set. Once raw cell phenotype data are obtained, they must be mapped to points or vectors in the sample space. These data contain the intrinsic physiological appearance information of each observed part of the cell phenotype, such as changes in visual, tactile, and other aspects and their intensity; the combination of appearances and their intensities is the key basis for determining the type of an unknown cell.
In general, the raw data contain redundant information and must be suitably processed and transformed before cell phenotypic features can be extracted effectively. The process of turning the extracted sample data of cell phenotypic features into data suitable for support vector machine processing is called data preprocessing. First, a grayscale transformation is applied to the image. The image is then sharpened to increase grayscale contrast, which enhances the edge information in the image and helps contour extraction. Next, the image is smoothed to filter out noise (such as electronic noise, photon noise, speckle noise, and quantization noise), which raises the signal-to-noise ratio of the image and makes the contour easier to find. Finally, the locations where the grayscale rate of change is greatest are found, which gives the closed contour of the cell image, and the features within the contour are extracted (Figure 1). The corresponding data can then be stored in the single-cell phenotype database according to the feature values.
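The four image steps described above (grayscale transformation, sharpening, smoothing, and locating the largest grayscale change) can be sketched with NumPy as follows. The patent does not give kernels or thresholds, so everything specific here is an assumption chosen for illustration: the BT.601 luma weights, the 3x3 sharpening and mean kernels, the use of `np.gradient` for the rate of change, the relative threshold `rel_threshold`, and all function names.

```python
import numpy as np

# Assumed kernels: a common Laplacian-based sharpening mask and a box blur.
SHARPEN = np.array([[0, -1, 0], [-1, 5, -1], [0, -1, 0]], dtype=float)
MEAN = np.full((3, 3), 1.0 / 9.0)


def to_gray(rgb: np.ndarray) -> np.ndarray:
    """Grayscale transformation using ITU-R BT.601 luma weights (an assumption)."""
    return rgb[..., :3].astype(float) @ np.array([0.299, 0.587, 0.114])


def filter3x3(img: np.ndarray, kernel: np.ndarray) -> np.ndarray:
    """Apply a 3x3 kernel by cross-correlation with edge-replicated borders."""
    h, w = img.shape
    padded = np.pad(img, 1, mode="edge")
    out = np.zeros((h, w), dtype=float)
    for dy in range(3):
        for dx in range(3):
            out += kernel[dy, dx] * padded[dy:dy + h, dx:dx + w]
    return out


def edge_mask(rgb: np.ndarray, rel_threshold: float = 0.5) -> np.ndarray:
    """Mark pixels whose grayscale gradient is close to the image maximum."""
    gray = to_gray(rgb)                   # step 1: grayscale transformation
    sharp = filter3x3(gray, SHARPEN)      # step 2: sharpening
    smooth = filter3x3(sharp, MEAN)       # step 3: smoothing
    gy, gx = np.gradient(smooth)          # step 4: grayscale rate of change
    magnitude = np.hypot(gx, gy)
    peak = magnitude.max()
    if peak == 0:                         # flat image: no contour at all
        return np.zeros(gray.shape, dtype=bool)
    return magnitude >= rel_threshold * peak
```

The resulting boolean mask only marks candidate contour pixels; closing the contour and measuring features such as area or perimeter inside it would be separate steps that the text leaves unspecified.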
(2) Recognition process of the support vector machine
Classification analysis is then performed on the feature values in the database. The classification algorithms currently developed and applied are the Euclidean distance algorithm, the KNN algorithm, and the support vector machine (SVM) algorithm. Taking the SVM algorithm as an example, the procedure has two stages, training and judgment. First, typical samples of known cell phenotypes are obtained, and feature extraction and data preprocessing are performed to produce feature samples; the SVM model is trained on these samples, the support vectors among the training samples are found, and the SVM model parameters are determined. Then the information on the unknown cell is input into the parameterized SVM model, which makes the judgment. The flow chart is shown in Figure 2.
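The training and judgment stages can be illustrated with a deliberately simplified stand-in. The patent relies on an unnamed SVM development kit; the sketch below instead trains a two-class linear SVM by batch sub-gradient descent on the regularized hinge loss, which is a substituted technique chosen so the example stays self-contained. The function names, the hyperparameters `lam`, `epochs`, and `lr`, and the assumption that features are already standardized are all hypothetical.

```python
import numpy as np


def train_linear_svm(X, y, lam=0.01, epochs=200, lr=0.1):
    """Training stage: fit (w, b) on labelled feature samples.

    X: (n_samples, n_features) feature matrix; y: labels in {-1, +1}.
    Minimizes lam/2 * |w|^2 + mean(max(0, 1 - y * (X @ w + b))).
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    n = len(y)
    w = np.zeros(X.shape[1])
    b = 0.0
    for _ in range(epochs):
        margins = y * (X @ w + b)
        active = margins < 1  # samples on or inside the margin drive the update
        grad_w = lam * w - (y[active, None] * X[active]).sum(axis=0) / n
        grad_b = -y[active].sum() / n
        w -= lr * grad_w
        b -= lr * grad_b
    return w, b


def predict(w, b, X):
    """Judgment stage: classify unknown samples with the trained parameters."""
    scores = np.asarray(X, dtype=float) @ w + b
    return np.where(scores >= 0, 1, -1)
```

In practice an established library would normally be used instead, for example scikit-learn's `sklearn.svm.SVC`, which also supports nonlinear kernels and more than two classes and exposes the support vectors found during training.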
2. A cell Raman data analysis and processing method based on the single-cell phenotype database. The method compares the Raman data of an unknown cell against the cell Raman information in the phenotype database and finds the best-matching cell, thereby obtaining detailed information about the unknown cell. The specific procedure is as follows:
2.1 Spectral processing module
Considering the structure and working principle of the Raman system, the main factors affecting the spectral signal are as follows:
(1) Interference signals from the optical system and the detected object
The optical system's interference with the spectral signal consists mainly of stray light, aberrations, and spurious signals caused by unwanted secondary spectral orders. For traditional large spectrometers, these interference signals are mainly eliminated through the system structure and the associated optical components. The laser Raman spectrometer, however, is severely constrained in structure, optical components, and system integration, so the traditional approach is not feasible; the only option is to study the basic characteristics of these interference signals and then look for a solution.
(2) Noise signals from the circuitry and the power supply
Drift and fluctuation in the signal acquisition and processing circuitry, together with power supply noise, are also major sources of interference. These signals have a particularly large effect when the signal is weak, and can sometimes swamp the useful signal completely, seriously degrading the system's detection performance. To handle them, the performance of the circuitry and power supply is first improved as far as possible, and signal processing techniques are then considered.
The digital Fourier filtering preprocessing method can effectively remove high-frequency noise as well as low-frequency noise caused by instrument background noise, baseline drift, and similar sources, increasing the spectral signal-to-noise ratio. Digital Fourier filtering first applies a fast Fourier transform (FFT) to the data, applies a Gaussian window function in frequency space, and then applies an inverse fast Fourier transform (IFFT) to obtain band-pass-filtered spectral data. The mean and standard deviation of the Gaussian function determine the center frequency and bandwidth of the band-pass filter, respectively; the filter parameters are usually determined by numerical optimization to obtain the best filtering result.
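A minimal NumPy sketch of this FFT, Gaussian window, inverse FFT sequence is given below. It assumes a real-valued, evenly sampled spectrum and reads "applying" the window as multiplying the spectrum's Fourier coefficients by it; the function name and the parameter names `center` and `width` (the Gaussian mean and standard deviation, in cycles per sample) are illustrative. The numerical optimization of these parameters mentioned in the text is not shown.

```python
import numpy as np


def fourier_bandpass(signal, center: float, width: float) -> np.ndarray:
    """Band-pass filter a 1-D spectrum with a Gaussian window in frequency space.

    center: Gaussian mean = pass-band center frequency (cycles per sample, 0 to 0.5).
    width:  Gaussian standard deviation = pass-band width (cycles per sample).
    """
    signal = np.asarray(signal, dtype=float)
    freqs = np.fft.rfftfreq(signal.size)             # non-negative frequencies
    window = np.exp(-0.5 * ((freqs - center) / width) ** 2)
    spectrum = np.fft.rfft(signal)                   # forward FFT
    return np.fft.irfft(spectrum * window, n=signal.size)  # inverse FFT
```

Because the window is close to zero both near zero frequency and at high frequencies, a single multiplication suppresses baseline drift and high-frequency noise at once, which matches the band-pass behavior described above.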
2.2 Spectral analysis module
The spectral analysis module analyzes Raman spectra with three algorithms: Euclidean distance, neural network, and support vector machine.
2.2.1 Euclidean distance
Euclidean distance, also called the Euclidean metric, is a commonly used definition of distance: the true distance between two points in m-dimensional space. In two-dimensional space, the Euclidean distance is the length of the straight line segment between two points.
n-dimensional Euclidean space is a set of points in which each point X can be written as (x[1], x[2], …, x[n]), where each x[i] (i = 1, 2, …, n) is a real number called the i-th coordinate of X. The distance d(A, B) between two points A = (a[1], a[2], …, a[n]) and B = (b[1], b[2], …, b[n]) is defined as: d(A, B) = sqrt[∑((a[i] − b[i])^2)] (i = 1, 2, …, n).
The Euclidean distance algorithm is applied between the Raman spectrum of the cell under test and the Raman spectra already in the database to find the closest set of Raman data and thereby obtain information such as its cell type. This provides a reference for the type of the cell under test.
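As a small illustration of this matching step, the sketch below computes d(A, B) as defined above and returns the database entry closest to a query spectrum. It assumes all spectra are sampled at the same wavenumbers and have equal length; the dictionary layout, the function names, and the labels in the usage example are hypothetical.

```python
import math
from typing import Dict, Sequence, Tuple


def euclidean_distance(a: Sequence[float], b: Sequence[float]) -> float:
    """d(A, B) = sqrt(sum over i of (a[i] - b[i])^2)."""
    if len(a) != len(b):
        raise ValueError("spectra must have the same number of points")
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def best_match(query: Sequence[float],
               database: Dict[str, Sequence[float]]) -> Tuple[str, float]:
    """Return the label and distance of the database spectrum nearest to `query`."""
    label = min(database, key=lambda name: euclidean_distance(query, database[name]))
    return label, euclidean_distance(query, database[label])
```

For example, with a hypothetical database `{"species_a": [1.0, 0.0, 0.0], "species_b": [0.0, 1.0, 0.0]}`, the query `[0.9, 0.1, 0.0]` is matched to `"species_a"`. Spectra would normally be preprocessed and normalized consistently before comparison, since raw intensity differences otherwise dominate the distance.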
2.2.2 Neural network
A neural network is an algorithmic mathematical model that imitates the behavioral characteristics of animal neural networks and performs distributed, parallel information processing. Such a network relies on the complexity of the system and processes information by adjusting the interconnections among a large number of internal nodes.
The phenotypic data analysis and processing system was built on the development kit provided for the neural network; by calling the corresponding interface functions, it implements training on training samples, classification and display of Raman data, saving of results, and other functions.
2.2.3 Support vector machine
The support vector machine method is built on the VC-dimension theory of statistical learning theory and the principle of structural risk minimization. Given limited sample information, it seeks the best compromise between model complexity (that is, the learning accuracy on the specific training samples) and learning ability (that is, the ability to identify arbitrary samples without error), in order to obtain the best generalization ability.
The phenotypic data analysis and processing system was built on a support vector machine development kit; by calling the corresponding interface functions, it implements training on training samples, classification and display of Raman data, saving of results, and other functions.
For Figure 1, the basic configuration for the phenotypic data analysis and processing method based on the single-cell phenotype database is the Windows XP operating system with a MySQL database pre-installed.
For Figure 2, the basic hardware configuration for support vector machine recognition is a supercomputer that includes GPGPU (general-purpose parallel processor) hardware, with a CPU of at least two cores and a clock speed of at least 2 GHz, at least 2 GB of memory, and at least 50 GB of hard disk space. The CPU, GPGPU, and storage are connected by high-speed interconnects.
Claims (4)
Priority Applications (1)
| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| CN201510270838.7A CN106295251A (en) | 2015-05-25 | 2015-05-25 | Phenotypic data analysis and processing method based on unicellular Phenotype data base |
Publications (1)
| Publication Number | Publication Date |
|---|---|
| CN106295251A (en) | 2017-01-04 |
Family
ID=57634415
Family Applications (1)
| Application Number | Title | Priority Date | Filing Date |
|---|---|---|---|
| CN201510270838.7A Pending CN106295251A (en) | 2015-05-25 | 2015-05-25 | Phenotypic data analysis and processing method based on unicellular Phenotype data base |
Country Status (1)
| Country | Link |
|---|---|
| CN (1) | CN106295251A (en) |
Cited By (12)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| CN109891508A (en) * | 2019-01-29 | 2019-06-14 | 北京大学 | Single cell type detection method, device, equipment and storage medium |
| CN110520876A (en) * | 2017-03-29 | 2019-11-29 | 新克赛特株式会社 | Learning result output device and learning result output program |
| US11358984B2 (en) | 2018-08-27 | 2022-06-14 | Regeneran Pharmaceuticals, Inc. | Use of Raman spectroscopy in downstream purification |
| CN114965420A (en) * | 2022-04-28 | 2022-08-30 | 浙江工业大学 | Rapid detection method for single-cell lipid metabolism phenotype |
| CN116798523A (en) * | 2023-06-01 | 2023-09-22 | 南京金域医学检验所有限公司 | Pattern recognition and judgment system for anti-neutrophil cytoplasmic antibody |
| CN117288661A (en) * | 2023-09-25 | 2023-12-26 | 青岛瑞斯凯尔生物科技有限公司 | Method, medium and system for outputting cell mass removal signal by flow cytometer |
| CN118942085A (en) * | 2024-10-10 | 2024-11-12 | 江苏爱影医疗科技有限公司 | Classification method, system and computer-readable storage medium for tumor tissue slice medical images |
| US12230023B2 (en) | 2015-10-28 | 2025-02-18 | The University Of Tokyo | Analysis device |
| US12235202B2 (en) | 2019-12-27 | 2025-02-25 | Thinkcyte K.K. | Flow cytometer performance evaluation method and standard particle suspension |
| US12259311B2 (en) | 2018-06-13 | 2025-03-25 | Thinkcyte K.K. | Methods and systems for cytometry |
| US12298221B2 (en) | 2020-04-01 | 2025-05-13 | Thinkcyte K.K. | Observation device |
| US12339217B2 (en) | 2020-04-01 | 2025-06-24 | Thinkcyte K.K. | Flow cytometer |
Citations (4)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| JP2006006224A (en) * | 2004-06-25 | 2006-01-12 | Hitachi Ltd | Cell tissue culture management method and system |
| US7747547B1 (en) * | 2007-10-31 | 2010-06-29 | Pathwork Diagnostics, Inc. | Systems and methods for diagnosing a biological specimen using probabilities |
| CN103473751A (en) * | 2013-08-14 | 2013-12-25 | 西安理工大学 | CMOS sensor cell image super-resolution reconstruction method based on multiple objects |
| CN104077307A (en) * | 2013-03-29 | 2014-10-01 | 中国科学院青岛生物能源与过程研究所 | Single-cell phenotype database system and search engine |
-
2015
- 2015-05-25 CN CN201510270838.7A patent/CN106295251A/en active Pending
Patent Citations (4)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| JP2006006224A (en) * | 2004-06-25 | 2006-01-12 | Hitachi Ltd | Cell tissue culture management method and system |
| US7747547B1 (en) * | 2007-10-31 | 2010-06-29 | Pathwork Diagnostics, Inc. | Systems and methods for diagnosing a biological specimen using probabilities |
| CN104077307A (en) * | 2013-03-29 | 2014-10-01 | 中国科学院青岛生物能源与过程研究所 | Single-cell phenotype database system and search engine |
| CN103473751A (en) * | 2013-08-14 | 2013-12-25 | 西安理工大学 | CMOS sensor cell image super-resolution reconstruction method based on multiple objects |
Non-Patent Citations (6)
| Title |
|---|
| PETRA RÖSCH 等: "Chemotaxonomic Identification of Single Bacteria by Micro-Raman Spectroscopy: Application to Clean-Room-Relevant Biological Contaminations", 《APPLIED AND ENVIRONMENT MICROBIOLOGY》 * |
| 张问银 等: "基于支持向量机的CD4细胞图像识别方法", 《计算机工程与科学》 * |
| 李庆波 等: "应用数字傅里叶滤波方法提高近红外光谱多元校正模型稳健性的研究", 《光谱学与光谱分析》 * |
| 秦颖博 等: "基于支持向量机的尿液细胞图像识别分类研究", 《计算机工程与设计》 * |
| 邹江 等: "红外图像综合处理算法研究", 《电子测试》 * |
| 陈婷: "细胞图像处理及识别技术在生物材料表征领域的研究", 《中国优秀博硕士学位论文全文数据库 (硕士) 信息科技辑》 * |
Cited By (17)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| US12230023B2 (en) | 2015-10-28 | 2025-02-18 | The University Of Tokyo | Analysis device |
| CN110520876B (en) * | 2017-03-29 | 2024-05-14 | 新克赛特株式会社 | Learning result output device and learning result output program |
| CN110520876A (en) * | 2017-03-29 | 2019-11-29 | 新克赛特株式会社 | Learning result output device and learning result output program |
| US12259311B2 (en) | 2018-06-13 | 2025-03-25 | Thinkcyte K.K. | Methods and systems for cytometry |
| US11358984B2 (en) | 2018-08-27 | 2022-06-14 | Regeneran Pharmaceuticals, Inc. | Use of Raman spectroscopy in downstream purification |
| US12398176B2 (en) | 2018-08-27 | 2025-08-26 | Regeneron Pharmaceuticals, Inc. | Use of Raman spectroscopy in downstream purification |
| CN109891508B (en) * | 2019-01-29 | 2023-05-23 | 北京大学 | Single cell type detection method, device, apparatus and storage medium |
| CN109891508A (en) * | 2019-01-29 | 2019-06-14 | 北京大学 | Single cell type detection method, device, equipment and storage medium |
| US12235202B2 (en) | 2019-12-27 | 2025-02-25 | Thinkcyte K.K. | Flow cytometer performance evaluation method and standard particle suspension |
| US12298221B2 (en) | 2020-04-01 | 2025-05-13 | Thinkcyte K.K. | Observation device |
| US12339217B2 (en) | 2020-04-01 | 2025-06-24 | Thinkcyte K.K. | Flow cytometer |
| CN114965420A (en) * | 2022-04-28 | 2022-08-30 | 浙江工业大学 | Rapid detection method for single-cell lipid metabolism phenotype |
| CN116798523B (en) * | 2023-06-01 | 2024-07-30 | 南京金域医学检验所有限公司 | Pattern recognition and judgment system for anti-neutrophil cytoplasmic antibody |
| CN116798523A (en) * | 2023-06-01 | 2023-09-22 | 南京金域医学检验所有限公司 | Pattern recognition and judgment system for anti-neutrophil cytoplasmic antibody |
| CN117288661A (en) * | 2023-09-25 | 2023-12-26 | 青岛瑞斯凯尔生物科技有限公司 | Method, medium and system for outputting cell mass removal signal by flow cytometer |
| CN117288661B (en) * | 2023-09-25 | 2025-04-25 | 青岛瑞斯凯尔生物科技股份有限公司 | A method, medium and system for outputting cell removal signal by flow cytometer |
| CN118942085A (en) * | 2024-10-10 | 2024-11-12 | 江苏爱影医疗科技有限公司 | Classification method, system and computer-readable storage medium for tumor tissue slice medical images |
Similar Documents
| Publication | Publication Date | Title |
|---|---|---|
| CN106295251A (en) | | Phenotypic data analysis and processing method based on unicellular Phenotype data base |
| CN114564982B (en) | | Automatic identification method of radar signal modulation type |
| US10671833B2 (en) | | Analyzing digital holographic microscopy data for hematology applications |
| Dirvanauskas et al. | | Embryo development stage prediction algorithm for automated time lapse incubators |
| CN114358279B (en) | | Image recognition network model pruning method, apparatus, equipment and storage medium |
| CN107330412B (en) | | A face age estimation method based on deep sparse representation |
| CN115841110B (en) | | A method and system for acquiring scientific knowledge discovery |
| CN117219176B (en) | | A Raman Spectroscopy-Based Bacterial Classification Method and System Based on Contrastive Learning |
| CN109858386A (en) | | A microalgae cell recognition method based on fluorescence microscope images |
| Wang et al. | | IMAL: an improved meta-learning approach for few-shot classification of plant diseases |
| CN107045624A (en) | | A Method of EEG Signal Preprocessing and Classification Based on Maximum Weighted Clique |
| CN114841214B (en) | | Pulse data classification method and device based on semi-supervised discrimination projection |
| CN116612335A (en) | | Few-sample fine-granularity image classification method based on contrast learning |
| CN118378070B (en) | | An optimization method for epilepsy signal processing |
| CN104077307B (en) | | Unicellular phenotype Database Systems and search engine |
| Sun et al. | | Hyperedge representations with hypergraph wavelets: Applications to spatial transcriptomics |
| CN119112183B (en) | | EEG sentiment analysis method based on deep neural networks |
| CN117132809B (en) | | Semi-supervised medical image classification method based on class prototype matching soft pseudo labels |
| CN116503854B (en) | | A white blood cell recognition method based on deep learning image enhancement |
| CN113066544B (en) | | FVEP characteristic point detection method based on CAA-Net and LightGBM |
| CN118312864A (en) | | A radiation source identification and prediction method based on unsupervised domain adaptation |
| Zheng et al. | | Princut-Auto: An Unsupervised 3D Cell Detection Tool for Embryonic Data |
| CN120541553B (en) | | A brain network analysis method based on multi-scale nucleus attention mechanism |
| CN121072811B (en) | | A method, system, electronic system, and storage device for screening characteristic genes of lung diseases across species based on multivariate machine learning models |
| US20260065483A1 (en) | | Methods of enhancing multidimensional time series analysis |
Legal Events
| Date | Code | Title | Description |
|---|---|---|---|
| | C06 | Publication | |
| | PB01 | Publication | |
| | C10 | Entry into substantive examination | |
| | SE01 | Entry into force of request for substantive examination | |
| | RJ01 | Rejection of invention patent application after publication | |
| | RJ01 | Rejection of invention patent application after publication | Application publication date: 20170104 |