CN106295251A - Phenotypic data analysis and processing method based on a single-cell phenotype database - Google Patents

Info

Publication number
CN106295251A
Authority
CN
China
Prior art keywords
data
cell
image
phenotypic
unicellular
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201510270838.7A
Other languages
Chinese (zh)
Inventor
任立辉
滕琳
王晓君
苏晓泉
徐健
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Qingdao Institute of Bioenergy and Bioprocess Technology of CAS
Original Assignee
Qingdao Institute of Bioenergy and Bioprocess Technology of CAS
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Qingdao Institute of Bioenergy and Bioprocess Technology of CAS
Priority to CN201510270838.7A
Publication of CN106295251A
Legal status: Pending

Links

Abstract

The present invention relates to a phenotypic data analysis and processing method based on a single-cell phenotype database. The invention consists of two main parts: the single-cell phenotype database and the phenotypic data analysis and processing methods. (1) A cell image analysis and processing method based on the single-cell phenotype database: the method analyzes the cell image information stored in the phenotype database, compares it with the image data of an unknown cell, and finds the best-matching cell, thereby obtaining detailed information about the unknown cell. (2) A cell Raman data analysis and processing method based on the single-cell phenotype database: the method analyzes the cell Raman information stored in the phenotype database, compares it with the Raman data of an unknown cell, and finds the best-matching cell, thereby obtaining detailed information about the unknown cell.

Description

Phenotypic data analysis and processing method based on a single-cell phenotype database
Technical field
The present invention relates to single-cell research and cell science applications, and in particular to a phenotypic data analysis and processing method based on a single-cell phenotype database.
Background
The single cell is the basic unit of life: every organism on Earth either consists of a single cell or develops from single cells by differentiation. In-depth, systematic study of single cells can reveal the nature of life processes comprehensively, and the specificity and differentiation of individual cells are of great significance for studying disease mechanisms and for diagnosing and preventing disease. "Single-cell research" (the analysis of individual cells with specific functions) can resolve the operating mechanisms of living systems at a deeper level, and can therefore bring breakthroughs in the life sciences and in broad applications in energy, environment, health, agriculture, and the ocean. The U.S. National Institutes of Health (NIH) launched the "Single Cell Analysis Program" in September 2012 and announced 26 projects with funding totaling 90 million US dollars, mainly for developing new tools and new technologies in the single-cell field (http://commonfund.nih.gov/singlecell/fundedresearch.aspx). On December 21, 2012, the journal Science named single-cell research one of the six scientific areas most worth watching in 2013.
A cell's phenotype is its observable form of expression, that is, the information reflecting the cell's growth state that can be obtained with the available observation methods. For a single cell, the phenotype denotes its specific physical appearance or composition; cell shape, size, color features, texture features, and class are all examples. Important methods include identifying single-cell morphology under the microscope and acquiring cell Raman spectral signals with instruments such as Raman spectrometers. Studying single cells means analyzing information related to cell shape, size, color, and so on, and discriminating cell types. Both tasks require a phenotype database covering different cells and different growth stages, together with a corresponding phenotypic data analysis and processing system. At present there is very little research on such systems, so establishing a phenotypic data analysis and processing method based on a single-cell phenotype database has important practical value for studying individual cells.
Summary of the invention
In view of the above deficiencies in the prior art, the technical problem to be solved by the present invention is to provide a phenotypic data analysis and processing method based on a single-cell phenotype database. Using a new generation of cell sorting equipment, the method obtains phenotypic information of single cells or cell populations (microbial, plant, animal, and human cells are all applicable), thereby laying a foundation for the omics analysis, engineering, and utilization of these cells.
To achieve the above object, the present invention adopts the following technical solution: a phenotypic data analysis and processing method based on a single-cell phenotype database, comprising the following steps:
Cell image analysis and processing stage: analyze and compare the cell image information in the phenotype database with the image data of the unknown cell, and extract the phenotypic features of the unknown cell;
Data preprocessing: process the extracted phenotypic features into data suitable for the Euclidean distance algorithm, the KNN algorithm, and the support vector machine algorithm;
Classification analysis: classify based on the feature data in the single-cell phenotype database and find the best-matching cell.
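The final matching step can be illustrated with a small k-nearest-neighbor (KNN) sketch, KNN being one of the three algorithms named above. This is only an illustration under stated assumptions: the record layout (a feature vector paired with a label), the feature choice, the species names, and the function name `knn_classify` are hypothetical and do not come from the patent.

```python
import math
from collections import Counter

def knn_classify(query, database, k=3):
    """Return the majority label among the k database records nearest to query.

    database is a list of (feature_vector, label) pairs; this layout is an
    assumption made for the sketch, not the patent's database schema.
    """
    # Rank records by Euclidean distance to the query and keep the k closest.
    nearest = sorted(database, key=lambda rec: math.dist(query, rec[0]))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]

# Hypothetical reference records: (area, mean gray level), scaled to [0, 1].
reference = [
    ((0.20, 0.30), "species_A"),
    ((0.25, 0.35), "species_A"),
    ((0.22, 0.28), "species_A"),
    ((0.80, 0.70), "species_B"),
    ((0.75, 0.72), "species_B"),
]
best_match = knn_classify((0.24, 0.31), reference)
```

Because the vote depends on raw distances, features with very different ranges would normally be rescaled first, which is one reason a preprocessing step precedes classification.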
The data preprocessing comprises the following steps:
Apply a grayscale transformation to the image;
Sharpen the image to increase gray-level contrast and thereby enhance the edge information in the image;
Apply smoothing filtering to the image to remove noise;
Locate the places in the image where the gray level changes most rapidly to obtain the closed contour of the cell image, and then extract the features within the contour.
The smoothing filtering of the image uses a digital Fourier filter, specifically:
First apply a fast Fourier transform to the data, apply a Gaussian window function in the frequency domain, and then apply an inverse fast Fourier transform to obtain the band-pass-filtered spectral data.
The classification analysis based on the feature data in the single-cell phenotype database comprises two stages, training and judgment;
First, typical samples of cells with known phenotypes are obtained and subjected to feature extraction and data preprocessing; the resulting feature samples are used to train an SVM model, the support vectors in the training samples are found, and the SVM model parameters are determined. Then the information of the unknown cell is input into the parameterized SVM model, which judges it.
The present invention has the following advantages and beneficial effects: single-cell samples of different species are collected, a single-cell phenotype database system is built, and phenotypic data analysis and processing is used to discriminate the type and phenotypic features of unknown cells, overcoming the bottleneck of being unable to identify the type of an unknown cell. With the help of a new generation of cell sorting equipment, cell types can be identified in situ and in real time, which makes the method easy to popularize. Application of the invention will accelerate research on single-cell analysis.
Brief description of the drawings
Fig. 1: cell sample feature extraction. The cell image is processed by grayscale transformation, boundary detection, depth-first search, and so on to obtain the position of each cell; features are then extracted from each cell according to its position and the data are saved, which facilitates subsequent operations such as classification and recognition.
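The depth-first search step mentioned for Fig. 1 can be sketched as a connected-region search over a binary mask. This is a minimal sketch under assumptions: the patent does not say how the search is organized, so the 4-connectivity, the explicit stack, the chosen features (pixel area and bounding box), and the name `locate_cells` are all hypothetical.

```python
def locate_cells(mask):
    """Find 4-connected foreground regions in a binary mask with an iterative DFS.

    Returns one dict per region holding its pixel area and bounding box
    (row_min, col_min, row_max, col_max), in raster-scan order of discovery.
    """
    rows, cols = len(mask), len(mask[0])
    seen = [[False] * cols for _ in range(rows)]
    cells = []
    for r0 in range(rows):
        for c0 in range(cols):
            if not mask[r0][c0] or seen[r0][c0]:
                continue
            # Start a new region and explore it with an explicit stack.
            stack = [(r0, c0)]
            seen[r0][c0] = True
            area, rmin, cmin, rmax, cmax = 0, r0, c0, r0, c0
            while stack:
                r, c = stack.pop()
                area += 1
                rmin, rmax = min(rmin, r), max(rmax, r)
                cmin, cmax = min(cmin, c), max(cmax, c)
                for nr, nc in ((r + 1, c), (r - 1, c), (r, c + 1), (r, c - 1)):
                    if (0 <= nr < rows and 0 <= nc < cols
                            and mask[nr][nc] and not seen[nr][nc]):
                        seen[nr][nc] = True
                        stack.append((nr, nc))
            cells.append({"area": area, "bbox": (rmin, cmin, rmax, cmax)})
    return cells

# Toy mask with two separate "cells".
mask = [
    [1, 1, 0, 0, 0],
    [1, 1, 0, 0, 1],
    [0, 0, 0, 1, 1],
    [0, 0, 0, 0, 0],
]
cells = locate_cells(mask)
```

An explicit stack is used instead of recursion so that large cells do not exhaust Python's recursion limit.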
Fig. 2: the support vector machine recognition process. The SVM model is trained on the phenotypic features of known cells to determine the SVM model parameters; the feature information of the unknown cell is then analyzed with those parameters to determine the details of the unknown cell.
Detailed description of the embodiments
The present invention is described in further detail below with reference to the accompanying drawings and embodiments.
The present invention provides a phenotypic data analysis and processing method based on a single-cell phenotype database: single-cell samples of different species are collected, a single-cell phenotype database system is built, and phenotypic data analysis and processing is used to discriminate the type and phenotypic features of unknown cells. It mainly comprises the following two aspects (taking microalgae samples as an example):
1. Cell image analysis and processing method based on the single-cell phenotype database. The method analyzes the cell image information stored in the phenotype database, compares it with the image data of an unknown cell, and finds the best-matching cell, thereby obtaining detailed information about the unknown cell. The specific procedure is as follows:
(1) Training sample feature extraction and sample set construction
Extracting cell phenotypic features is the key step in cell image analysis and processing, and is also the basis for constructing the sample set. Once the raw cell phenotype data are obtained, each raw datum is mapped to a point or vector in the sample space. These data contain the physiological appearance information of each observed site of the cell phenotype, such as changes in visual and tactile aspects and their intensity; the appearance and its intensity together form the key basis for determining the type of an unknown cell.
In general, the raw data contain redundancy and must be suitably transformed so that cell phenotypic features can be extracted effectively. Processing the sample data of the extracted phenotypic features into data suitable for a support vector machine is called data preprocessing. First, a grayscale transformation is applied to the image. Next, the image is sharpened to increase gray-level contrast and thereby enhance the edge information, which helps contour extraction. The image is then smoothed to remove noise (such as electronic noise, photon noise, speckle noise, and quantization noise), which improves the signal-to-noise ratio and makes the contour search easier. Finally, the places where the gray level changes most rapidly are located to obtain the closed contour of the cell image, and the features within the contour are extracted (Fig. 1). The corresponding data can then be stored in the single-cell phenotype database according to the feature values.
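The four preprocessing steps described above can be sketched in plain Python as follows. This is a minimal illustration, not the patent's implementation: the luma weights, the 3x3 sharpening and mean kernels, the central-difference gradient, and the `fraction` threshold are all assumptions made for the sketch, and every function name is hypothetical.

```python
import math

SHARPEN = [[0, -1, 0], [-1, 5, -1], [0, -1, 0]]  # assumed sharpening kernel
SMOOTH = [[1 / 9] * 3 for _ in range(3)]         # assumed 3x3 mean filter

def to_gray(rgb):
    """Grayscale transform using common luma weights (an assumption)."""
    return [[0.299 * r + 0.587 * g + 0.114 * b for r, g, b in row] for row in rgb]

def pixel(img, y, x):
    """Read a pixel, replicating the border for out-of-range coordinates."""
    h, w = len(img), len(img[0])
    return img[min(max(y, 0), h - 1)][min(max(x, 0), w - 1)]

def filter3(img, kernel):
    """Apply a 3x3 kernel to every pixel of the image."""
    return [[sum(kernel[j][i] * pixel(img, y + j - 1, x + i - 1)
                 for j in range(3) for i in range(3))
             for x in range(len(img[0]))]
            for y in range(len(img))]

def edge_mask(img, fraction=0.5):
    """Mark pixels whose gray-level gradient reaches fraction * the peak gradient."""
    grad = [[math.hypot(pixel(img, y, x + 1) - pixel(img, y, x - 1),
                        pixel(img, y + 1, x) - pixel(img, y - 1, x))
             for x in range(len(img[0]))]
            for y in range(len(img))]
    peak = max(max(row) for row in grad)
    return [[1 if peak > 0 and g >= fraction * peak else 0 for g in row]
            for row in grad]

def preprocess(rgb):
    """Grayscale -> sharpen -> smooth -> locate the fastest gray-level changes."""
    gray = to_gray(rgb)
    sharp = filter3(gray, SHARPEN)
    smooth = filter3(sharp, SMOOTH)
    return edge_mask(smooth)

# Synthetic 4x6 image: dark left half, bright right half.
image = [[(0, 0, 0)] * 3 + [(255, 255, 255)] * 3 for _ in range(4)]
mask = preprocess(image)
```

On the synthetic image the mask marks the two columns on either side of the dark-to-bright boundary. A real pipeline would go on to trace these edge pixels into closed contours, which this sketch leaves out.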
(2) Support vector machine recognition process
Classification analysis is then performed on the feature values in the database. The classification algorithms developed and applied so far include the Euclidean distance algorithm, the KNN algorithm, and the support vector machine (SVM) algorithm. Taking the SVM algorithm as an example, the implementation proceeds in two stages, training and judgment. First, typical samples of cells with known phenotypes are obtained and subjected to feature extraction and data preprocessing; the resulting feature samples are used to train the SVM model, the support vectors in the training samples are found, and the SVM model parameters are determined. Then the information of the unknown cell is input into the parameterized SVM model, which judges it. The detailed flowchart is shown in Fig. 2.
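The two stages, training and judgment, can be illustrated with a hand-written linear SVM. This is a sketch under assumptions and not the patent's implementation: the source does not say which SVM toolkit, kernel, or solver is used, so the stochastic subgradient training on the regularized hinge loss, the values of `lam`, `lr`, and `epochs`, the two-class setup, and the feature values are all hypothetical. A production system would more likely rely on an established SVM library.

```python
def train_linear_svm(samples, labels, lam=0.01, lr=0.1, epochs=200):
    """Training stage: fit w and b by stochastic subgradient descent on the
    regularised hinge loss. labels must be +1 or -1.
    """
    w, b = [0.0] * len(samples[0]), 0.0
    for _ in range(epochs):
        for x, y in zip(samples, labels):
            score = sum(wi * xi for wi, xi in zip(w, x)) + b
            if y * score < 1:
                # Sample lies inside the margin: the hinge term is active.
                w = [wi - lr * (lam * wi - y * xi) for wi, xi in zip(w, x)]
                b += lr * y
            else:
                # Sample is safely classified: only the regulariser acts.
                w = [wi - lr * lam * wi for wi in w]
    return w, b

def predict(model, x):
    """Judgment stage: return +1 or -1 for an unknown feature vector."""
    w, b = model
    return 1 if sum(wi * xi for wi, xi in zip(w, x)) + b >= 0 else -1

# Hypothetical standardised features of cells with known phenotype.
known_features = [(-1.0, -0.8), (-0.9, -1.1), (-1.2, -1.0),
                  (1.0, 0.9), (0.8, 1.1), (1.1, 1.0)]
known_labels = [-1, -1, -1, 1, 1, 1]   # -1 and +1 stand for two cell types

model = train_linear_svm(known_features, known_labels)
unknown_label = predict(model, (0.9, 1.0))
```

The training samples that end up on or inside the margin play the role of the support vectors mentioned in the text. Distinguishing more than two cell types would need a multi-class scheme such as one-versus-rest, which is not shown here.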
2. Cell Raman data analysis and processing method based on the single-cell phenotype database. The method analyzes the cell Raman information stored in the phenotype database, compares it with the Raman data of an unknown cell, and finds the best-matching cell, thereby obtaining detailed information about the unknown cell. The specific procedure is as follows:
2.1 Spectral processing module
Considering the structure and operating principle of the Raman system, the main factors affecting the spectral signal are the following:
(1) Interference signals from the optical system and the detected object
The optical system's interference with the spectral signal consists mainly of stray light, aberrations, and spurious signals caused by unwanted secondary spectral orders. For traditional large spectrometers, optical-system interference is mainly eliminated through the system structure and the associated optical elements. A laser Raman spectrometer, however, is tightly constrained in spectrometer structure, optical elements, and system integration, so the traditional approach is not feasible. The only option is to study the basic characteristics of these interference signals and then seek a way to resolve them.
(2) Noise signals from the circuitry and power supply
Drift and fluctuation in the signal acquisition and processing circuit, together with power supply noise, are also major sources of interference. These signals have a large effect especially when the useful signal is weak, and may sometimes swamp it completely, seriously degrading the detection performance of the system. To handle this part of the signal, the performance of the circuitry and power supply is first improved as far as possible, and signal processing techniques are then considered.
The digital Fourier filtering preprocessing method can effectively remove high-frequency noise as well as low-frequency noise caused by instrument background or baseline drift, increasing the spectral signal-to-noise ratio. Digital Fourier filtering first applies a fast Fourier transform (FFT) to the data, applies a Gaussian window function in the frequency domain, and then applies an inverse fast Fourier transform (IFFT) to obtain the band-pass-filtered spectral data. The mean and standard deviation of the Gaussian function determine the center frequency and the bandwidth of the band-pass filter, respectively. The filter parameters are usually determined by numerical optimization to obtain the best filtering effect.
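The FFT, Gaussian window, and IFFT sequence can be sketched with the standard library alone. The sketch assumes that "applying" the window means multiplying each frequency bin by the Gaussian gain, that frequency is measured in bin indices, and that the input length is a power of two; the function names and the synthetic test signal are hypothetical, and the center and width are picked by hand rather than by the numerical optimization the text mentions.

```python
import cmath
import math

def fft(x, inverse=False):
    """Recursive radix-2 FFT (unnormalised); len(x) must be a power of two."""
    n = len(x)
    if n == 1:
        return list(x)
    even = fft(x[0::2], inverse)
    odd = fft(x[1::2], inverse)
    sign = 1 if inverse else -1
    out = [0j] * n
    for k in range(n // 2):
        twiddle = cmath.exp(sign * 2j * math.pi * k / n) * odd[k]
        out[k] = even[k] + twiddle
        out[k + n // 2] = even[k] - twiddle
    return out

def gaussian_fourier_filter(signal, center, width):
    """FFT -> multiply by a Gaussian window over frequency -> inverse FFT.

    center and width are the Gaussian mean and standard deviation, in
    frequency bins; they set the pass band's center frequency and bandwidth.
    """
    n = len(signal)
    spectrum = fft(signal)
    filtered = []
    for k, value in enumerate(spectrum):
        freq = min(k, n - k)  # bins k and n - k carry the same |frequency|
        gain = math.exp(-0.5 * ((freq - center) / width) ** 2)
        filtered.append(value * gain)
    # The inverse transform is unnormalised, so divide by n.
    return [v.real / n for v in fft(filtered, inverse=True)]

# Synthetic "spectrum": a slow component plus a baseline offset and fast noise.
n = 64
slow = [math.sin(2 * math.pi * 2 * i / n) for i in range(n)]
noisy = [0.5 + s + 0.3 * math.sin(2 * math.pi * 20 * i / n)
         for i, s in enumerate(slow)]
clean = gaussian_fourier_filter(noisy, center=2, width=0.5)
```

Using the same gain for bins `k` and `n - k` keeps the filtered spectrum conjugate-symmetric, so the output stays real. In this example both the constant baseline (bin 0) and the fast component (bin 20) are strongly attenuated while the component at bin 2 passes unchanged.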
2.2 Spectral analysis module
The spectral analysis module uses three algorithms, Euclidean distance, neural network, and support vector machine, to analyze the Raman spectra.
2.2.1 Euclidean distance
Euclidean distance, also called the Euclidean metric, is a commonly used distance definition: the true distance between two points in n-dimensional space. In two-dimensional space, the Euclidean distance is the length of the straight line segment between the two points.
N-dimensional Euclidean space is a point set in which each point X can be written as (x[1], x[2], ..., x[n]), where each x[i] (i = 1, 2, ..., n) is a real number called the i-th coordinate of X. The distance d(A, B) between two points A = (a[1], a[2], ..., a[n]) and B = (b[1], b[2], ..., b[n]) is defined by the following formula: d(A, B) = sqrt[∑ (a[i] - b[i])^2], with the sum taken over i = 1, 2, ..., n.
Using the Euclidean distance algorithm, the Raman spectrum of the cell under test is compared with the existing Raman spectra in the database to find the closest set of Raman data, from which information such as the cell type is obtained. This provides a reference for the type of the cell under test.
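The distance formula and the nearest-spectrum lookup can be written directly in Python. In this sketch the library layout (a dict from a label to a spectrum sampled at the same wavenumbers as the query), the labels, and the intensity values are hypothetical.

```python
import math

def euclidean_distance(a, b):
    """d(A, B) = sqrt(sum((a[i] - b[i])^2)) for two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("spectra must have the same number of points")
    return math.sqrt(sum((ai - bi) ** 2 for ai, bi in zip(a, b)))

def closest_spectrum(query, library):
    """Return the label of the library spectrum nearest to the query."""
    return min(library, key=lambda label: euclidean_distance(query, library[label]))

# Hypothetical reference spectra (intensity at four shared wavenumbers).
library = {
    "cell_type_A": [0.10, 0.80, 0.30, 0.05],
    "cell_type_B": [0.60, 0.20, 0.70, 0.40],
}
match = closest_spectrum([0.12, 0.75, 0.28, 0.07], library)
```

The comparison is only meaningful when the spectra share the same sampling grid and intensity scale, so a real system would presumably resample and normalize spectra before matching.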
2.2.2 Neural network
A neural network is an algorithmic mathematical model that imitates the behavioral characteristics of animal neural networks and performs distributed, parallel information processing. Depending on the complexity of the system, such a network processes information by adjusting the interconnections among a large number of internal nodes.
The phenotypic data analysis and processing system is built by secondary development on a neural network toolkit; functions such as training on the training samples, classifying and displaying Raman data, and saving results are implemented by calling the corresponding interface functions.
2.2.3 Support vector machine
The support vector machine method is built on the VC dimension theory of statistical learning theory and the structural risk minimization principle. Based on limited sample information, it seeks the best compromise between model complexity (that is, the accuracy of learning on the given training samples) and learning ability (that is, the ability to identify arbitrary samples without error), in order to obtain the best generalization ability.
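For reference, this trade-off is usually expressed by the standard soft-margin formulation shown below. It is the textbook form and is not stated in the source; the symbols (training pairs $(x_i, y_i)$ with $y_i \in \{-1, +1\}$, weight vector $w$, bias $b$, slack variables $\xi_i$, and penalty constant $C$) are conventional notation introduced here for illustration.

```latex
\min_{w,\, b,\, \xi} \quad \frac{1}{2}\lVert w \rVert^{2} + C \sum_{i=1}^{N} \xi_i
\qquad \text{subject to} \qquad
y_i \left( w^{\top} x_i + b \right) \ge 1 - \xi_i, \quad \xi_i \ge 0, \quad i = 1, \dots, N
```

The first term limits model complexity by favoring a wide margin, the second term penalizes training errors, and $C$ sets the balance between the two.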
The phenotypic data analysis and processing system is built by secondary development on a support vector machine toolkit; functions such as training on the training samples, classifying and displaying Raman data, and saving results are implemented by calling the corresponding interface functions.
In Fig. 1, the basic configuration for the phenotypic data analysis and processing method based on the single-cell phenotype database is: the Windows XP operating system with a MySQL database preinstalled.
In Fig. 2, the basic hardware configuration for support vector machine recognition is: a supercomputer running GPGPU (general-purpose parallel processor) hardware, with a CPU of at least two cores and a clock speed of at least 2 GHz, at least 2 GB of memory, and at least 50 GB of hard disk space, and with high-speed interconnection among the CPU, GPGPU, and storage.

Claims (4)

1. A phenotypic data analysis and processing method based on a single-cell phenotype database, characterized by comprising the following steps:
Cell image analysis and processing stage: analyze and compare the cell image information in the phenotype database with the image data of the unknown cell, and extract the phenotypic features of the unknown cell;
Data preprocessing: process the extracted phenotypic features into data suitable for the Euclidean distance algorithm, the KNN algorithm, and the support vector machine algorithm;
Classification analysis: classify based on the feature data in the single-cell phenotype database and find the best-matching cell.
2. The phenotypic data analysis and processing method based on a single-cell phenotype database according to claim 1, characterized in that the data preprocessing comprises the following steps:
Apply a grayscale transformation to the image;
Sharpen the image to increase gray-level contrast and thereby enhance the edge information in the image;
Apply smoothing filtering to the image to remove noise;
Locate the places in the image where the gray level changes most rapidly to obtain the closed contour of the cell image, and then extract the features within the contour.
3. The phenotypic data analysis and processing method based on a single-cell phenotype database according to claim 2, characterized in that the smoothing filtering of the image uses a digital Fourier filter, specifically:
First apply a fast Fourier transform to the data, apply a Gaussian window function in the frequency domain, and then apply an inverse fast Fourier transform to obtain the band-pass-filtered spectral data.
4. The phenotypic data analysis and processing method based on a single-cell phenotype database according to claim 1, characterized in that the classification analysis based on the feature data in the single-cell phenotype database comprises two stages, training and judgment;
First, typical samples of cells with known phenotypes are obtained and subjected to feature extraction and data preprocessing; the resulting feature samples are used to train an SVM model, the support vectors in the training samples are found, and the SVM model parameters are determined. Then the information of the unknown cell is input into the parameterized SVM model, which judges it.
CN201510270838.7A 2015-05-25 2015-05-25 Phenotypic data analysis and processing method based on unicellular Phenotype data base Pending CN106295251A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201510270838.7A CN106295251A (en) 2015-05-25 2015-05-25 Phenotypic data analysis and processing method based on unicellular Phenotype data base

Publications (1)

Publication Number Publication Date
CN106295251A true CN106295251A (en) 2017-01-04

Family

ID=57634415

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201510270838.7A Pending CN106295251A (en) 2015-05-25 2015-05-25 Phenotypic data analysis and processing method based on unicellular Phenotype data base

Country Status (1)

Country Link
CN (1) CN106295251A (en)

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP2006006224A (en) * 2004-06-25 2006-01-12 Hitachi Ltd Method for managing cell and tissue culture and system therefor
US7747547B1 (en) * 2007-10-31 2010-06-29 Pathwork Diagnostics, Inc. Systems and methods for diagnosing a biological specimen using probabilities
CN103473751A (en) * 2013-08-14 2013-12-25 西安理工大学 CMOS sensor cell image super-resolution reconstruction method based on multiple objects
CN104077307A (en) * 2013-03-29 2014-10-01 中国科学院青岛生物能源与过程研究所 Single-cell phenotype database system and search engine

Non-Patent Citations (6)

* Cited by examiner, † Cited by third party
Title
PETRA RÖSCH 等: "Chemotaxonomic Identification of Single Bacteria by Micro-Raman Spectroscopy: Application to Clean-Room-Relevant Biological Contaminations", 《APPLIED AND ENVIRONMENT MICROBIOLOGY》 *
张问银 等: "基于支持向量机的CD4细胞图像识别方法", 《计算机工程与科学》 *
李庆波 等: "应用数字傅里叶滤波方法提高近红外光谱多元校正模型稳健性的研究", 《光谱学与光谱分析》 *
秦颖博 等: "基于支持向量机的尿液细胞图像识别分类研究", 《计算机工程与设计》 *
邹江 等: "红外图像综合处理算法研究", 《电子测试》 *
陈婷: "细胞图像处理及识别技术在生物材料表征领域的研究", 《中国优秀博硕士学位论文全文数据库 (硕士) 信息科技辑》 *

Cited By (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110520876A (en) * 2017-03-29 2019-11-29 新克赛特株式会社 Learning outcome output device and learning outcome output program
US11358984B2 (en) 2018-08-27 2022-06-14 Regeneran Pharmaceuticals, Inc. Use of Raman spectroscopy in downstream purification
CN109891508A (en) * 2019-01-29 2019-06-14 北京大学 Single cell type detection method, device, equipment and storage medium
CN109891508B (en) * 2019-01-29 2023-05-23 北京大学 Single cell type detection method, device, apparatus and storage medium

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication

Application publication date: 20170104
