CN106295251A - Phenotypic data analysis and processing method based on a single-cell phenotype database
- Publication number
- CN106295251A (application CN201510270838.7A)
- Authority
- CN
- China
- Prior art keywords
- data
- cell
- image
- phenotypic
- unicellular
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Abstract
The present invention relates to a phenotypic data analysis and processing method based on a single-cell phenotype database. The invention consists of two main parts: the single-cell phenotype database and the phenotypic data analysis and processing method. (1) A cell image analysis and processing method based on the single-cell phenotype database: the method analyzes and compares the cell image information in the phenotype database with the image data of an unknown cell, finds the best-matching cell, and thereby obtains detailed information about the unknown cell. (2) A cell Raman data analysis and processing method based on the single-cell phenotype database: the method analyzes and compares the cell Raman information in the phenotype database with the Raman data of an unknown cell, finds the best-matching cell, and thereby obtains detailed information about the unknown cell.
Description
Technical field
The present invention relates to single-cell research and applications in cell science, and
specifically to a phenotypic data analysis and processing method based on a single-cell
phenotype database.
Background technology
The individual cell is the basic unit of life: every organism on Earth either consists of
a single cell or develops from a single cell by differentiation. Systematic, in-depth study
of single cells can reveal the nature of life processes comprehensively, and the specificity
and differentiation of individual cells are of great significance for studying disease
mechanisms and for diagnosing and preventing disease. "Single-cell research" (the analysis
of individual cells with specific functions) can resolve the operating mechanisms of living
systems at a deep level, and can therefore bring breakthroughs in the life sciences and in
their wide application to energy, the environment, health, agriculture and the ocean. The
U.S. National Institutes of Health (NIH) launched the "Single Cell Analysis Program" in
September 2012 and announced 26 projects with total funding of 90 million US dollars, mainly
for developing new tools and new technologies in the single-cell field
(http://commonfund.nih.gov/singlecell/fundedresearch.aspx). On December 21, 2012, Science
magazine named single-cell research one of the six scientific areas most worth watching in
2013.
The phenotype of a cell is its observable form of expression, that is, the information
reflecting the cell's growth state that can be obtained by overall observation methods. For
a single cell, it denotes the cell's specific physical appearance or composition; cell
shape, size, color features, texture features and category are all examples of phenotype.
Important methods include identifying single-cell morphology under a microscope and
acquiring cell Raman spectral signals with equipment such as a Raman spectrometer. Studying
single cells means analyzing information related to cell shape, size, color and so on, and
discriminating cell types. Both tasks require a phenotype database covering different cells
at different growth stages, together with a corresponding phenotypic data analysis and
processing system. At present there is very little research on such systems, so establishing
a phenotypic data analysis and processing method based on a single-cell phenotype database
has important practical value for the study of individual cells.
Summary of the invention
In view of the above deficiencies in the prior art, the technical problem to be solved by
the present invention is to provide a phenotypic data analysis and processing method based
on a single-cell phenotype database. Using a new generation of cell sorting equipment, the
method obtains phenotypic information on single cells or cell populations (microbial,
plant, animal and human cells are all applicable), thereby laying a foundation for the
omics analysis, engineering and utilization of these cells.
To achieve the above object, the present invention adopts the following technical solution:
a phenotypic data analysis and processing method based on a single-cell phenotype database,
comprising the following steps:
Cell image analysis and processing stage: analyze and compare the cell image information in
the phenotype database with the image data of the unknown cell, and extract the phenotypic
features of the unknown cell;
Data preprocessing: process the extracted phenotypic features into data suitable for the
Euclidean distance algorithm, the KNN algorithm and the support vector machine algorithm;
Perform classification analysis based on the feature data in the single-cell phenotype
database and find the best-matching cell.
The data preprocessing comprises the following steps:
Convert the image to grayscale;
Sharpen the image to increase gray-level contrast and thereby enhance the edge information
in the image;
Apply smoothing filtering to the image to suppress noise;
Locate the points in the image where the gray level changes most rapidly, obtain the closed
contour of the cell image, and then extract the features inside the contour.
The smoothing filtering of the image uses a digital Fourier filter, specifically:
First apply a fast Fourier transform to the data, apply a Gaussian window function in
frequency space, and then apply an inverse fast Fourier transform to obtain the
band-pass-filtered spectral data.
The classification analysis based on the feature data in the single-cell phenotype database
comprises two stages, training and judgment;
First, obtain typical samples of cells whose phenotype is already known, perform feature
extraction and data preprocessing, and use the resulting feature samples to train the SVM
model, finding the support vectors in the training samples and determining the SVM model
parameters; then, input the information of the unknown cell into the parameterized SVM
model, which judges it.
The present invention has the following advantages and beneficial effects: single-cell
samples of different species are collected, a single-cell phenotype database system is
built, and phenotypic data analysis and processing are used to identify the species and
phenotypic features of unknown cells. This overcomes the bottleneck that unknown cell types
could not be identified; with a new generation of cell sorting equipment, cell types can be
identified in situ and in real time, which makes the method easy to bring to market.
Application of the invention will accelerate single-cell analysis research.
Brief description of the drawings
Fig. 1, cell sample feature extraction. The cell image undergoes grayscale conversion,
boundary detection, depth-first search and other processing to obtain the position of each
cell; features are then extracted from each cell according to its position and the data are
saved, which facilitates subsequent operations such as classification and recognition.
Fig. 2, the recognition process of the support vector machine. The SVM model is trained on
the phenotypic features of known cells to determine the SVM model parameters; the feature
information of an unknown cell is then analyzed using these parameters to determine the
details of the unknown cell.
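The depth-first search mentioned for Fig. 1 can be illustrated with a minimal sketch. The
patent gives no implementation, so everything here is an assumption: the image is taken to
be already reduced to a binary mask (1 = cell pixel), regions are 4-connected, and the
function name find_cells and the returned fields are hypothetical.

```python
def find_cells(mask):
    """Locate 4-connected foreground regions with an iterative depth-first search.

    mask: list of rows, each a list of 0/1 values (1 = cell pixel, an assumed encoding).
    Returns one dict per region with its pixel count and bounding box
    (min_row, min_col, max_row, max_col), in row-major order of discovery.
    """
    rows, cols = len(mask), len(mask[0])
    seen = [[False] * cols for _ in range(rows)]
    cells = []
    for r in range(rows):
        for c in range(cols):
            if mask[r][c] != 1 or seen[r][c]:
                continue
            # Start a new region and explore it with an explicit stack (DFS).
            stack = [(r, c)]
            seen[r][c] = True
            pixels = []
            while stack:
                y, x = stack.pop()
                pixels.append((y, x))
                for ny, nx in ((y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1)):
                    inside = 0 <= ny < rows and 0 <= nx < cols
                    if inside and mask[ny][nx] == 1 and not seen[ny][nx]:
                        seen[ny][nx] = True
                        stack.append((ny, nx))
            ys = [p[0] for p in pixels]
            xs = [p[1] for p in pixels]
            cells.append({"area": len(pixels),
                          "bbox": (min(ys), min(xs), max(ys), max(xs))})
    return cells


# Toy mask with two separate "cells".
mask = [
    [1, 1, 0, 0, 0],
    [1, 1, 0, 0, 1],
    [0, 0, 0, 1, 1],
    [0, 0, 0, 0, 0],
]
cells = find_cells(mask)
```

Each bounding box marks where one cell sits, so per-cell features (area, shape, color,
texture) could then be computed from that sub-image and stored.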
Detailed description of the invention
The present invention is described in further detail below with reference to the drawings
and embodiments.
The present invention provides a phenotypic data analysis and processing method based on a
single-cell phenotype database. Single-cell samples of different species are collected, a
single-cell phenotype database system is built, and phenotypic data analysis and processing
are used to identify the species and phenotypic features of unknown cells. The method mainly
covers the following two aspects (illustrated below with microalgae samples):
1. Cell image analysis and processing method based on the single-cell phenotype database.
The method analyzes and compares the cell image information in the phenotype database with
the image data of an unknown cell, finds the best-matching cell, and thereby obtains
detailed information about the unknown cell. The specific procedure is as follows:
(1) Training sample feature extraction and sample set construction
Extracting cell phenotypic features is the key step in cell image analysis and processing
and the basis for constructing the sample set. Once the raw cell phenotype data have been
obtained, they are mapped to points or vectors in the sample space. These data contain the
intrinsic physiological presentation information of each observed part of the cell
phenotype, such as changes in visual, tactile and intensity aspects; the presentations and
their combinations with intensity are the key basis for determining the type of an unknown
cell.
In general the raw data contain redundancy and must be transformed by suitable processing
so that the cell phenotypic features can be extracted effectively. The process of turning
the sample data of the extracted features into data suitable for a support vector machine
is called data preprocessing. First, the image is converted to grayscale. Next, the image
is sharpened in order to increase gray-level contrast, which enhances edge information and
helps contour extraction. The image is then smoothed to filter out noise (such as
electronic noise, photon noise, speckle noise and quantization noise), which improves the
signal-to-noise ratio and makes the contour search easier. Finally, the points where the
gray level changes most rapidly are located, giving the closed contour of the cell image,
and the features inside the contour are extracted (Fig. 1). The corresponding data can then
be stored in the single-cell phenotype database according to the feature values.
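The four preprocessing steps can be sketched as follows. This is only an illustration under
stated assumptions: the luminance weights, the 3x3 sharpening kernel, the 3x3 mean filter
used as a stand-in for the smoothing step, and the central-difference gradient are common
textbook choices, not values given in the patent, and all function names are hypothetical.

```python
def to_gray(rgb):
    """Convert rows of (R, G, B) tuples to rows of luminance values (assumed weights)."""
    return [[0.299 * r + 0.587 * g + 0.114 * b for (r, g, b) in row] for row in rgb]


def convolve3x3(img, kernel):
    """Apply a 3x3 kernel, repeating the nearest edge pixel at the image border."""
    h, w = len(img), len(img[0])
    out = [[0.0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            acc = 0.0
            for dy in (-1, 0, 1):
                for dx in (-1, 0, 1):
                    yy = min(max(y + dy, 0), h - 1)
                    xx = min(max(x + dx, 0), w - 1)
                    acc += kernel[dy + 1][dx + 1] * img[yy][xx]
            out[y][x] = acc
    return out


SHARPEN = [[0, -1, 0], [-1, 5, -1], [0, -1, 0]]   # assumed sharpening kernel
SMOOTH = [[1 / 9] * 3 for _ in range(3)]          # assumed mean filter


def gradient_magnitude(img):
    """Gray-level rate of change from central differences; large values mark edges."""
    h, w = len(img), len(img[0])
    grad = [[0.0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            gx = img[y][min(x + 1, w - 1)] - img[y][max(x - 1, 0)]
            gy = img[min(y + 1, h - 1)][x] - img[max(y - 1, 0)][x]
            grad[y][x] = (gx * gx + gy * gy) ** 0.5
    return grad


def preprocess(rgb):
    """Grayscale -> sharpen -> smooth -> gradient map, in the order described above."""
    gray = to_gray(rgb)
    sharp = convolve3x3(gray, SHARPEN)
    smooth = convolve3x3(sharp, SMOOTH)
    return gradient_magnitude(smooth)


# Toy 6x6 image: dark left half, bright right half (a vertical edge between columns 2 and 3).
rgb = [[(0, 0, 0)] * 3 + [(255, 255, 255)] * 3 for _ in range(6)]
edges = preprocess(rgb)
```

In the toy image the gradient is largest next to the dark/bright boundary; tracing such
maxima around a cell would give its closed contour.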
(2) Recognition process of the support vector machine
Classification analysis is then performed on the feature values in the database. The
classification algorithms currently implemented include the Euclidean distance algorithm,
the KNN algorithm and the support vector machine (SVM) algorithm. Taking the SVM algorithm
as an example, it is carried out in two stages, training and judgment. First, typical
samples of cells whose phenotype is already known are obtained, feature extraction and data
preprocessing are performed, and the resulting feature samples are used to train the SVM
model, finding the support vectors in the training samples and determining the SVM model
parameters. Then the information of the unknown cell is input into the parameterized SVM
model, which judges it. The detailed flowchart is shown in Fig. 2.
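The two stages, training and judgment, can be illustrated with a small linear SVM trained
by subgradient descent on the regularized hinge loss. This is a stand-in written for
illustration, not the method of the patent, which relies on an unnamed SVM toolkit; the
function names, the learning rate, the regularization constant and the two-feature toy data
are all assumptions.

```python
def train_linear_svm(samples, labels, lr=0.01, lam=0.01, epochs=200):
    """Training stage: fit weights w and bias b on labeled feature vectors.

    labels must be +1 or -1. Samples whose margin is below 1 drive the updates;
    they play the role that support vectors play in a full SVM solver.
    """
    w = [0.0] * len(samples[0])
    b = 0.0
    for _ in range(epochs):
        for x, y in zip(samples, labels):
            margin = y * (sum(wi * xi for wi, xi in zip(w, x)) + b)
            # Shrink the weights (regularization term of the objective).
            w = [wi * (1.0 - lr * lam) for wi in w]
            if margin < 1.0:
                # Hinge-loss subgradient step for a margin violation.
                w = [wi + lr * y * xi for wi, xi in zip(w, x)]
                b += lr * y
    return w, b


def predict(model, x):
    """Judgment stage: classify an unknown feature vector with the trained model."""
    w, b = model
    score = sum(wi * xi for wi, xi in zip(w, x)) + b
    return 1 if score >= 0 else -1


# Hypothetical, already preprocessed two-feature samples for two known cell types.
samples = [(2.0, 2.0), (-2.0, -2.0), (3.0, 3.0), (-3.0, -3.0)]
labels = [1, -1, 1, -1]
model = train_linear_svm(samples, labels)
unknown_label = predict(model, (2.5, 2.0))
```

A production system would more likely call an established SVM library with a kernel and
tuned parameters; the sketch only shows how a trained model is reused to judge new cells.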
2. Cell Raman data analysis and processing method based on the single-cell phenotype
database. The method analyzes and compares the cell Raman information in the phenotype
database with the Raman data of an unknown cell, finds the best-matching cell, and thereby
obtains detailed information about the unknown cell. The specific procedure is as follows:
2.1 Spectral processing module
Considering the structure and working principle of a Raman system, the main factors
affecting the spectral signal are as follows:
(1) Interference signals from the optical system and the detected object
The interference that the optical system introduces into the spectral signal consists
mainly of stray light, aberrations, and spurious signals caused by unwanted secondary
spectral orders. In traditional large spectrometers, interference from the optical system
is mainly eliminated through the system structure and the associated optical elements. A
laser Raman spectrometer, however, is tightly constrained in spectrometer structure,
optical elements and system integration, so the traditional approach is not feasible. The
only option is to study the basic characteristics of these interference signals and then
look for new ways to remove them.
(2) Noise signals from the circuitry and power supply
Drift and fluctuation in the signal acquisition and processing circuit and power-supply
noise are also major sources of interference. Such interference has a particularly large
effect when the signal is weak; it can sometimes swamp the useful signal completely and
seriously degrade the detection performance of the system. To deal with it, the performance
of the circuitry and power supply should first be improved as far as possible, and signal
processing techniques considered afterwards.
The digital Fourier filtering preprocessing method can effectively remove high-frequency
noise as well as low-frequency noise caused by instrument background, baseline drift and
similar factors, increasing the spectral signal-to-noise ratio. Digital Fourier filtering
first applies a fast Fourier transform (FFT) to the data, applies a Gaussian window function
in frequency space, and then applies an inverse fast Fourier transform (IFFT) to obtain the
band-pass-filtered spectral data. The mean and standard deviation of the Gaussian function
determine the center frequency and the bandwidth of the band-pass filter, respectively. The
filter parameters are usually determined by numerical optimization to obtain the best
filtering effect.
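A minimal sketch of this filter follows. To stay self-contained it uses a plain discrete
Fourier transform, which computes the same transform as the FFT, only more slowly. The
assumption that the Gaussian window multiplies the spectrum, the function names, the
frequency units (cycles per record length) and the toy signal are illustrative, not taken
from the patent.

```python
import cmath
import math


def dft(values, inverse=False):
    """Plain O(n^2) discrete Fourier transform (same result as an FFT)."""
    n = len(values)
    sign = 1 if inverse else -1
    out = []
    for k in range(n):
        acc = sum(values[t] * cmath.exp(sign * 2j * math.pi * k * t / n)
                  for t in range(n))
        out.append(acc / n if inverse else acc)
    return out


def gaussian_bandpass(signal, center, width):
    """Forward transform, multiply by a Gaussian window, inverse transform.

    center: mean of the Gaussian, i.e. the pass-band center frequency.
    width:  standard deviation of the Gaussian, i.e. the bandwidth.
    """
    n = len(signal)
    spectrum = dft(signal)
    filtered = []
    for k, value in enumerate(spectrum):
        freq = min(k, n - k)  # fold negative frequencies so the output stays real
        gain = math.exp(-0.5 * ((freq - center) / width) ** 2)
        filtered.append(value * gain)
    return [v.real for v in dft(filtered, inverse=True)]


# Toy "spectrum": constant baseline + wanted component (4 cycles) + high-frequency noise.
n = 64
raw = [5.0
       + math.sin(2 * math.pi * 4 * t / n)
       + 0.5 * math.sin(2 * math.pi * 20 * t / n)
       for t in range(n)]
filtered = gaussian_bandpass(raw, center=4, width=1.0)
```

With the window centered on the wanted component, the baseline and the high-frequency term
are strongly attenuated while the wanted component passes almost unchanged.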
2.2 Spectral analysis module
The spectral analysis module analyzes Raman spectra with three algorithms: Euclidean
distance, neural network and support vector machine.
2.2.1 Euclidean distance
Euclidean distance, also called the Euclidean metric, is a commonly used definition of
distance: the true distance between two points in n-dimensional space. In two-dimensional
space, the Euclidean distance is the length of the straight line segment between the two
points.
An n-dimensional Euclidean space is a set of points, each of which can be written as
X = (x[1], x[2], ..., x[n]), where x[i] (i = 1, 2, ..., n) is a real number called the i-th
coordinate of X. The distance d(A, B) between two points A = (a[1], a[2], ..., a[n]) and
B = (b[1], b[2], ..., b[n]) is defined as
d(A, B) = sqrt[∑ (a[i] - b[i])^2], with the sum taken over i = 1, 2, ..., n.
The Euclidean distance algorithm is applied to the Raman spectrum of the cell under test
and the existing Raman spectra in the database to find the closest set of Raman data and
thereby obtain information such as its cell type. This gives a reference for the type of
the cell under test.
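The nearest-spectrum lookup can be sketched in a few lines. The database layout (a mapping
from a cell label to an intensity vector sampled on a common wavenumber grid), the labels
and the numbers are hypothetical.

```python
import math


def euclidean(a, b):
    """d(A, B) = sqrt(sum_i (a[i] - b[i])^2) for two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("spectra must be sampled on the same grid")
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def best_match(query, database):
    """Return (distance, label) of the database spectrum closest to the query."""
    return min((euclidean(query, spectrum), label)
               for label, spectrum in database.items())


# Hypothetical three-point "spectra" for two known cell types.
database = {
    "strain_A": [1.0, 2.0, 1.0],
    "strain_B": [3.0, 0.0, 2.0],
}
distance, label = best_match([1.0, 2.1, 0.9], database)
```

Real spectra usually need baseline correction and intensity normalization before distances
are compared; otherwise overall brightness can dominate the match.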
2.2.2 Neural network
A neural network is an algorithmic mathematical model that imitates the behavior of animal
neural networks and performs distributed, parallel information processing. Depending on the
complexity of the system, the network processes information by adjusting the
interconnections among a large number of internal nodes.
The phenotypic data analysis and processing system was developed on top of a neural network
toolkit; by calling the corresponding interface functions it implements training on
training samples, classification and display of Raman data, and saving of results.
2.2.3 Support vector machine
The support vector machine method is built on the VC dimension theory and the structural
risk minimization principle of statistical learning theory. Given limited sample
information, it seeks the best compromise between model complexity (that is, the accuracy
with which specific training samples are learned) and learning ability (that is, the ability
to identify arbitrary samples without error), in order to obtain the best generalization
ability.
The phenotypic data analysis and processing system was developed on top of a support vector
machine toolkit; by calling the corresponding interface functions it implements training on
training samples, classification and display of Raman data, and saving of results.
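The compromise between model complexity and training accuracy is commonly written as the
soft-margin optimization problem below. The patent does not state which formulation its
toolkit uses, so this is the standard textbook form; the symbols (m training samples x_i
with labels y_i in {+1, -1}, weights w, bias b, slack variables ξ_i, penalty constant C) are
assumptions introduced here.

```latex
\min_{w,\,b,\,\xi}\ \frac{1}{2}\lVert w\rVert^{2} + C\sum_{i=1}^{m}\xi_i
\quad\text{subject to}\quad
y_i\,(w^{\top}x_i + b) \ge 1 - \xi_i,\qquad \xi_i \ge 0,\qquad i = 1,\dots,m
```

The first term penalizes complex decision boundaries and the second penalizes training
errors, so C sets the balance between the two; training samples with
y_i (w^T x_i + b) <= 1 at the optimum are the support vectors.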
In Fig. 1, the basic configuration for the phenotypic data analysis and processing method
based on the single-cell phenotype database is: the Windows XP operating system with a
MySQL database preinstalled.
In Fig. 2, the basic hardware configuration for support vector machine recognition is: a
supercomputer containing GPGPU (general-purpose parallel processor) hardware, a CPU with at
least two cores and a clock speed of at least 2 GHz, at least 2 GB of memory and at least
50 GB of hard disk space, with high-speed interconnects between the CPU, GPGPU and storage.
Claims (4)
1. A phenotypic data analysis and processing method based on a single-cell phenotype
database, characterized by comprising the following steps:
Cell image analysis and processing stage: analyze and compare the cell image information in
the phenotype database with the image data of the unknown cell, and extract the phenotypic
features of the unknown cell;
Data preprocessing: process the extracted phenotypic features into data suitable for the
Euclidean distance algorithm, the KNN algorithm and the support vector machine algorithm;
Perform classification analysis based on the feature data in the single-cell phenotype
database and find the best-matching cell.
2. The phenotypic data analysis and processing method based on a single-cell phenotype
database according to claim 1, characterized in that the data preprocessing comprises the
following steps:
Convert the image to grayscale;
Sharpen the image to increase gray-level contrast and thereby enhance the edge information
in the image;
Apply smoothing filtering to the image to suppress noise;
Locate the points in the image where the gray level changes most rapidly, obtain the closed
contour of the cell image, and then extract the features inside the contour.
3. The phenotypic data analysis and processing method based on a single-cell phenotype
database according to claim 2, characterized in that the smoothing filtering of the image
uses a digital Fourier filter, specifically:
First apply a fast Fourier transform to the data, apply a Gaussian window function in
frequency space, and then apply an inverse fast Fourier transform to obtain the
band-pass-filtered spectral data.
4. The phenotypic data analysis and processing method based on a single-cell phenotype
database according to claim 1, characterized in that the classification analysis based on
the feature data in the single-cell phenotype database comprises two stages, training and
judgment;
First, obtain typical samples of cells whose phenotype is already known, perform feature
extraction and data preprocessing, and use the resulting feature samples to train the SVM
model, finding the support vectors in the training samples and determining the SVM model
parameters; then, input the information of the unknown cell into the parameterized SVM
model, which judges it.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201510270838.7A CN106295251A (en) | 2015-05-25 | 2015-05-25 | Phenotypic data analysis and processing method based on unicellular Phenotype data base |
Publications (1)
Publication Number | Publication Date |
---|---|
CN106295251A true CN106295251A (en) | 2017-01-04 |
Family
ID=57634415
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201510270838.7A Pending CN106295251A (en) | 2015-05-25 | 2015-05-25 | Phenotypic data analysis and processing method based on unicellular Phenotype data base |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN106295251A (en) |
Citations (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
JP2006006224A (en) * | 2004-06-25 | 2006-01-12 | Hitachi Ltd | Method for managing cell and tissue culture and system therefor |
US7747547B1 (en) * | 2007-10-31 | 2010-06-29 | Pathwork Diagnostics, Inc. | Systems and methods for diagnosing a biological specimen using probabilities |
CN103473751A (en) * | 2013-08-14 | 2013-12-25 | 西安理工大学 | CMOS sensor cell image super-resolution reconstruction method based on multiple objects |
CN104077307A (en) * | 2013-03-29 | 2014-10-01 | 中国科学院青岛生物能源与过程研究所 | Single-cell phenotype database system and search engine |
Non-Patent Citations (6)
Title |
---|
PETRA RÖSCH 等: "Chemotaxonomic Identification of Single Bacteria by Micro-Raman Spectroscopy: Application to Clean-Room-Relevant Biological Contaminations", 《APPLIED AND ENVIRONMENT MICROBIOLOGY》 * |
张问银 等: "基于支持向量机的CD4细胞图像识别方法", 《计算机工程与科学》 * |
李庆波 等: "应用数字傅里叶滤波方法提高近红外光谱多元校正模型稳健性的研究", 《光谱学与光谱分析》 * |
秦颖博 等: "基于支持向量机的尿液细胞图像识别分类研究", 《计算机工程与设计》 * |
邹江 等: "红外图像综合处理算法研究", 《电子测试》 * |
陈婷: "细胞图像处理及识别技术在生物材料表征领域的研究", 《中国优秀博硕士学位论文全文数据库 (硕士) 信息科技辑》 * |
Cited By (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN110520876A (en) * | 2017-03-29 | 2019-11-29 | 新克赛特株式会社 | Learning outcome output device and learning outcome output program |
US11358984B2 (en) | 2018-08-27 | 2022-06-14 | Regeneran Pharmaceuticals, Inc. | Use of Raman spectroscopy in downstream purification |
CN109891508A (en) * | 2019-01-29 | 2019-06-14 | 北京大学 | Single cell type detection method, device, equipment and storage medium |
CN109891508B (en) * | 2019-01-29 | 2023-05-23 | 北京大学 | Single cell type detection method, device, apparatus and storage medium |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
C06 | Publication | ||
PB01 | Publication | ||
C10 | Entry into substantive examination | ||
SE01 | Entry into force of request for substantive examination | ||
RJ01 | Rejection of invention patent application after publication | Application publication date: 20170104 |