CN109522378A - Display method and display device for hereditary birthplace probability distribution - Google Patents
Display method and display device for hereditary birthplace probability distribution
- Publication number
- CN109522378A (application number CN201811178756.XA)
- Authority
- CN
- China
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Abstract
An embodiment of the invention discloses a display method for hereditary birthplace probability distribution and a corresponding display device. The display method includes: obtaining the calculated ancestry components of a sample to be tested; determining, from the calculated result, the probability of each of the different hereditary birthplaces; and displaying those probabilities in visual form. Because the method presents the hereditary birthplace prediction result or probability distribution of the test sample in an intuitive, visual way, the user can better understand the final displayed result, and the user experience is improved.
Description
Technical field
The present invention relates to the technical field of bioinformatics, and in particular to a display method and a display device for hereditary birthplace probability distribution.
Background
SNP is the abbreviation of single nucleotide polymorphism. It refers to a variation of a single nucleotide in the genome, that is, a substitution among the four bases A, T, C and G, so that several different bases can occur at the same position in the genome.
SNP genotyping means determining the base pair at an SNP. Apart from the case where nothing is detected, there are 4*4=16 possible results in total. Differences in genotype may lead to differences in the phenotype of the sample. SNPs are widespread in populations and highly polymorphic, which makes them good genetic markers. Especially since high-throughput SNP detection methods appeared, they have been widely used in bioinformatics analysis.
Historically, the ancestors of different regions were limited by the transportation conditions of their time and mainly gathered within a small local area. The population of such an area therefore differs clearly in ancestry components from the populations of other, more distant areas. If the ancestry components of a test user are highly similar to those of the population of an area, that area is called the predicted hereditary birthplace of the test user.
In the course of realizing the present invention, the inventors found the following problem in the related art: as the technology has matured, the hereditary birthplace of a test user can be calculated in a variety of ways. The predicted probabilities of these hereditary birthplaces are usually expressed as a vector, and such a vector cannot present the hereditary birthplace prediction intuitively to the user.
Summary of the invention
In view of the above technical problem, embodiments of the present invention provide a display method and a display device for hereditary birthplace probability distribution, to solve the problem that hereditary birthplace calculation results in the prior art are not displayed intuitively enough and hardly meet user requirements.
A first aspect of the embodiments of the present invention provides a display method for hereditary birthplace probability distribution. The display method includes:
obtaining the calculated ancestry components of a sample to be tested; determining, from the calculated result, the probability of each of the different hereditary birthplaces; and displaying the probabilities of the different hereditary birthplaces in visual form.
Optionally, displaying the probabilities of the different hereditary birthplaces in visual form includes: generating a corresponding map according to the geographical locations of the hereditary birthplaces; determining a corresponding color depth according to the probability of each hereditary birthplace; and displaying each hereditary birthplace on the map with its corresponding color depth.
Optionally, the map includes all hereditary birthplaces, and each hereditary birthplace forms a corresponding display block on the map.
Optionally, determining the corresponding color depth according to the probabilities of the different hereditary birthplaces specifically includes: determining the minimum value and the maximum value of the color depth, which together form the variation range of the color depth; mapping the probability of each hereditary birthplace into the variation range by a linear mapping function; and determining, according to the mapping result, the color depth corresponding to each hereditary birthplace.
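The linear mapping described above can be sketched as follows. This is only an illustration: the function name, the default depth range of 0.1 to 1.0 and the example place names are assumptions, not values given in the patent.

```python
def linear_color_depth(prob, depth_min=0.1, depth_max=1.0):
    """Map a probability in [0, 1] linearly onto [depth_min, depth_max].

    depth_min and depth_max are hypothetical bounds of the color depth range.
    """
    if not 0.0 <= prob <= 1.0:
        raise ValueError("probability must lie in [0, 1]")
    return depth_min + prob * (depth_max - depth_min)


# Hypothetical probabilities for three hereditary birthplaces.
probs = {"place_A": 0.6, "place_B": 0.3, "place_C": 0.1}
depths = {place: linear_color_depth(p) for place, p in probs.items()}
```

With this mapping a larger probability always yields a deeper color, which matches the optional feature stated later in the summary.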
Optionally, determining the corresponding color depth according to the probability of the hereditary birthplace specifically includes: determining the minimum value and the maximum value of the color depth, which together form the variation range of the color depth; dividing the variation range into several intervals, each interval corresponding to a range of probabilities; determining the interval corresponding to a hereditary birthplace according to its probability; and using the median color depth of that interval as the color depth of the hereditary birthplace.
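A minimal sketch of the interval-based variant is given below. It assumes equal-width intervals and a one-to-one correspondence between equal-width probability ranges and depth intervals; the patent does not specify how the intervals are chosen, and the names and defaults are hypothetical.

```python
def binned_color_depth(prob, depth_min=0.0, depth_max=1.0, n_bins=5):
    """Return the midpoint depth of the interval that `prob` falls into.

    Assumes n_bins equal-width intervals (an illustrative choice).
    """
    if not 0.0 <= prob <= 1.0:
        raise ValueError("probability must lie in [0, 1]")
    width = (depth_max - depth_min) / n_bins
    # prob == 1.0 belongs to the last interval.
    index = min(int(prob * n_bins), n_bins - 1)
    return depth_min + (index + 0.5) * width
```

All probabilities inside one interval share a single color depth, so the map shows a small number of clearly distinguishable shades.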
Optionally, the method further includes: receiving a color selection instruction from the user; and determining the display color on the map according to the color selection instruction of the user.
Optionally, the greater the probability, the deeper the corresponding color depth.
Optionally, the method further includes: displaying an indicator mark on the hereditary birthplace with the highest probability.
Optionally, the probability of the hereditary birthplace includes: the probability distribution of the hereditary birthplace of the sample to be tested, or the ancestry similarity of the sample to be tested.
A second aspect of the embodiments of the present invention provides a display device. The display device includes a display unit and a controller; the controller is configured to execute the display method described above and to control the display unit to display a map with different color depths.
The method provided by the embodiments of the present invention presents the hereditary birthplace prediction result or probability distribution of the test sample in an intuitive, visual way, so that the user can better understand the final displayed result. The user experience is improved, which favors the popularization and application of the technology and gives it good application prospects.
Brief description of the drawings
Fig. 1 is a schematic diagram of one embodiment of the hereditary birthplace calculation method of an embodiment of the present invention;
Fig. 2 is a schematic diagram of one embodiment of the hereditary birthplace visualization display of an embodiment of the present invention;
Fig. 3 is a schematic diagram of one embodiment of the similarity calculation of an embodiment of the present invention;
Fig. 4 is a schematic diagram of one embodiment of the similarity calculation of another embodiment of the present invention;
Fig. 5 is a schematic diagram of one embodiment of calculating the Mahalanobis distance between the sample to be tested and each hereditary birthplace according to an embodiment of the present invention;
Fig. 6 is a schematic diagram of one embodiment of the display method for hereditary birthplace probability distribution provided by an embodiment of the present invention.
Detailed description
The technical solutions in the embodiments of the present invention are described clearly and completely below with reference to the drawings in the embodiments. The described embodiments are obviously only some, not all, of the embodiments of the present invention. All other embodiments obtained by those skilled in the art from the embodiments of the present invention without creative effort fall within the protection scope of the present invention.
It should be noted that when an element is described as being "fixed to" another element, it may be directly on the other element, or one or more intervening elements may be present. When an element is described as being "connected to" another element, it may be directly connected to the other element, or one or more intervening elements may be present. Terms such as "vertical", "horizontal", "left", "right", "upper", "lower", "inner", "outer" and "bottom" used in this specification indicate orientations or positional relationships based on those shown in the drawings. They are used only to facilitate and simplify the description of the present invention, and do not indicate or imply that the device or element referred to must have a particular orientation or be constructed and operated in a particular orientation; they should therefore not be understood as limiting the present invention. In addition, terms such as "first" and "second" are used for descriptive purposes only and should not be understood as indicating or implying relative importance.
Unless otherwise defined, all technical and scientific terms used in this specification have the same meaning as commonly understood by those skilled in the technical field of the present invention. The terms used in the description are only for the purpose of describing specific embodiments and are not intended to limit the present invention. The term "and/or" used in this specification includes any and all combinations of one or more of the associated listed items. In addition, the technical features involved in the different embodiments of the present invention described below can be combined with one another as long as they do not conflict.
With the development of high-throughput SNP microarray technology and next-generation sequencing (NGS), SNP genotyping results for human genes can be obtained quickly, accurately and at low cost.
A microarray, also called an oligonucleotide array, is one kind of biochip. Its principle is to integrate gene probes of known sequence on a solid surface; after a large number of labeled nucleic acid sequences from the cells or tissue under test hybridize with the probe array, the hybridized probes at the corresponding positions are detected, enabling rapid detection of gene information. Mature commercial microarray technology can currently genotype up to millions of SNP sites accurately in a single run.
The core of next-generation sequencing is sequencing by synthesis: the DNA sequence is determined by capturing the labels of newly synthesized ends. NGS as used today has the advantages of low cost, high throughput, high speed and ease of operation, and is widely used in large-scale genome research. Genotyping genome-wide SNPs with NGS achieves very high accuracy while detecting SNPs across the whole genome.
These basic gene technologies provide large numbers of accurate SNP genotyping results that serve as base data for bioinformatics analysis. The embodiments of the present invention provide a gene positioning method based on these SNP genotyping data and the corresponding native-place information; it can predict the hereditary birthplace of a sample and provide prediction results for multiple candidate hereditary birthplaces.
Fig. 1 shows the method for calculating the hereditary birthplace prediction result provided by an embodiment of the present invention. As shown in Fig. 1, the method may include the following steps:
110. Obtain the SNP genotyping result of the sample to be tested.
The SNP genotyping result can be obtained by genetic testing using one or more of the techniques described above. For brevity, the SNP genotyping result of the sample to be tested is referred to below as the "sample SNP genotyping result".
120. Calculate the similarity between the SNP genotyping result of the sample to be tested and the reference SNP genotyping results of several groups. The reference SNP genotyping results of these groups form a group set.
A reference SNP genotyping result is a reference sample that has undergone data preprocessing and has ancestry components obtained by calculation or statistics.
The similarity refers to the statistical closeness of the two, and may specifically be a likelihood or a degree of similarity.
The groups in the group set can be chosen according to the practical application or usage requirements. In some embodiments, the set may include about 42 groups, such as Northern Han, Southern Han, Gaoshan and Tibetan.
130. Determine the ancestry components of the sample to be tested according to the similarity. The ancestry components include the component ratio of each group in the group set.
Ancestry analysis describes the ancestors of an individual or a group from the perspective of genetics. Different groups have different evolutionary histories, and SNP polymorphisms are strongly group-specific, so they can be used to reflect the genetic characteristics of a group.
Historically, the ancestors of different regions were limited by the transportation conditions of their time and mainly gathered within a small area. The population of such an area therefore differs clearly in ancestry components from the populations of other, more distant areas.
If the ancestry components of the sample to be tested are highly similar to those of the population of an area, the user corresponding to the sample was very probably born in that area. The area can therefore be called the predicted hereditary birthplace of the sample to be tested.
Based on the above reasoning about ancestry and predicted hereditary birthplace, the calculated result finally obtained in step 130, the ancestry components, is a kind of prediction probability for the hereditary birthplace. The ancestry components are a list or sequence of numbers giving the probability that each place is the hereditary birthplace.
The likelihood (i.e. the numerical value) of each place ranges from 0 to 1. The values of all places (i.e. the ratios of the ancestry components) add up to 1. The magnitude of a value represents the likelihood or the degree of similarity. When the value for a place is 1, the sample and that place are completely consistent.
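The constraints stated above (each value between 0 and 1, all values summing to 1) can be checked before the result is displayed. The sketch below is illustrative only; the function name, the tolerance and the example place names are assumptions.

```python
def most_likely_birthplace(components, tol=1e-6):
    """Validate an ancestry-component mapping and return the top place.

    `components` maps a place name to its probability (hypothetical layout).
    """
    values = list(components.values())
    if any(v < 0.0 or v > 1.0 for v in values):
        raise ValueError("each value must lie in [0, 1]")
    if abs(sum(values) - 1.0) > tol:
        raise ValueError("values must sum to 1")
    # The place with the largest probability is the predicted birthplace.
    return max(components, key=components.get)
```

The returned place is also the one on which an indicator mark would be shown in the optional display feature.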
Depending on the application scenario for the hereditary birthplace, step 120 can be implemented with two different types of methods that output different types of results. Both the input sample SNP genotyping result and the finally output ancestry components can be in JSON format, exchanged through HTTP API calls.
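As a rough illustration of such a JSON exchange, the sketch below serializes a request and parses a response. The field names "sample_id", "snps" and "ancestry_components" are hypothetical; the patent does not define the JSON schema or the API endpoint, so no HTTP call is shown.

```python
import json


def build_request(sample_id, genotypes):
    """Serialize a sample SNP genotyping result (hypothetical schema)."""
    return json.dumps({"sample_id": sample_id, "snps": genotypes}, sort_keys=True)


def parse_response(body):
    """Extract place -> probability from a JSON response (hypothetical schema)."""
    data = json.loads(body)
    return {str(k): float(v) for k, v in data["ancestry_components"].items()}
```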
In some embodiments, when the hereditary birthplace of the sample to be tested is represented as a probability distribution over places, a random forest machine learning method can be used to calculate the similarity between the sample SNP genotyping result and the reference data, and to output the probability distribution of the hereditary birthplace of the sample over each place (corresponding to the places included in the group set).
Fig. 3 is a flowchart of the method provided by an embodiment of the present invention for calculating the similarity between the sample SNP genotyping result and the reference data. As shown in Fig. 3, the method specifically includes:
310. Determine the ancestry components of the reference SNP genotyping result of each group and the corresponding ancestral home information; the ancestry components and the corresponding ancestral home information form one labeled data item.
The ancestral home data can be collected, for example, through online user questionnaires. Once the ancestral home data of the user corresponding to a reference sample has been collected, the ancestry components of that sample are given a label, forming one labeled data item.
320. After the number of labeled data items reaches a preset sample size, divide the labeled data into a training set and a test set.
The sample size is the minimum number of labeled data items (training data) needed for machine learning. It can generally be determined by the machine learning model used in practice. In general, more labeled data makes machine learning more effective and the model's predictions more accurate.
After enough labeled data has been obtained, the labeled data can be divided in a certain proportion for training and testing. Specifically, the labeled data can be divided into a training set and a test set at a ratio of 0.8:0.2.
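A minimal sketch of the 0.8:0.2 split follows. Shuffling before the split and the fixed seed are assumptions added for reproducibility; the patent only specifies the ratio.

```python
import random


def split_labeled_data(labeled, train_ratio=0.8, seed=0):
    """Shuffle labeled items and split them into (training set, test set)."""
    items = list(labeled)
    random.Random(seed).shuffle(items)  # seeded shuffle, illustrative choice
    cut = int(round(len(items) * train_ratio))
    return items[:cut], items[cut:]
```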
330. Train random forest models on the training set.
"Random forest" refers to a classifier that uses multiple trees to train on and predict samples. It belongs to ensemble learning in machine learning, has good predictive performance, and is well suited to data mining on SNP genotyping.
340. Verify on the test set the accuracy of the random forest models formed with different hyperparameter combinations.
Hyperparameters are parameters set before the model starts training, rather than parameters learned during training. Usually the hyperparameters need to be optimized, selecting an optimal set for the model to improve learning performance and effect.
350. Select the random forest model with the highest accuracy as the final model.
Steps 340 and 350 are a verification and selection process: the test set data is used to optimize the hyperparameters of the random forest model so as to obtain the best prediction effect.
360. Use the final model to calculate the probability distribution of the sample to be tested over the ancestral homes. The specific regions of the ancestral homes are determined by the labeled data and the needs of the practical application.
Step 360 finally outputs the predicted probability that the sample to be tested belongs to each ancestral home. The predicted probabilities over all ancestral homes sum to 1.
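Steps 330 to 360 can be sketched as below. The patent does not name a library; scikit-learn's RandomForestClassifier is assumed here, and the synthetic data, the label names and the small hyperparameter grid are purely illustrative.

```python
from itertools import product

import numpy as np
from sklearn.ensemble import RandomForestClassifier


def select_random_forest(X_train, y_train, X_test, y_test, grid):
    """Train one forest per hyperparameter combination; keep the most accurate."""
    best_model, best_acc = None, -1.0
    keys = sorted(grid)
    for values in product(*(grid[k] for k in keys)):
        model = RandomForestClassifier(random_state=0, **dict(zip(keys, values)))
        model.fit(X_train, y_train)
        acc = model.score(X_test, y_test)  # accuracy on the test set
        if acc > best_acc:
            best_model, best_acc = model, acc
    return best_model, best_acc


# Synthetic ancestry components (rows sum to 1) for three hypothetical homes.
rng = np.random.default_rng(0)
centers = np.array([[0.8, 0.1, 0.1], [0.1, 0.8, 0.1], [0.1, 0.1, 0.8]])
X = np.vstack([rng.dirichlet(c * 50, size=40) for c in centers])
y = np.repeat(["home_A", "home_B", "home_C"], 40)
order = rng.permutation(len(X))
cut = int(0.8 * len(X))
train, test = order[:cut], order[cut:]

grid = {"n_estimators": [10, 50], "max_depth": [2, None]}
model, acc = select_random_forest(X[train], y[train], X[test], y[test], grid)
proba = model.predict_proba(X[test][:1])[0]  # one probability per ancestral home
```

The vector returned by `predict_proba` sums to 1 over the classes the model was trained on, which corresponds to the property stated for step 360.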
In other embodiments, when the similarity between the sample to be tested and each reference sample needs to be calculated in an absolute sense, the Mahalanobis distance can be used, and the ancestry similarity between the sample to be tested and each reference sample is returned.
"Mahalanobis distance" is a distance measure proposed by the Indian statistician P. C. Mahalanobis. It is an effective way to calculate the similarity of two unknown sample sets. It takes the relationships between the various features into account and is scale-invariant (independent of the measurement scale). For a multivariate vector x with mean μ and covariance matrix Σ, the Mahalanobis distance is calculated by the following formula:
$$D_M(x) = \sqrt{(x-\mu)^{T}\,\Sigma^{-1}\,(x-\mu)}$$
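The Mahalanobis distance of a single vector can be computed with NumPy as sketched below. The function name is an assumption, and the covariance matrix is assumed to be invertible; solving a linear system is used instead of forming the inverse explicitly.

```python
import numpy as np


def mahalanobis(x, mean, cov):
    """Mahalanobis distance of x from `mean` under covariance `cov`."""
    diff = np.asarray(x, dtype=float) - np.asarray(mean, dtype=float)
    # Solve cov * v = diff, so that diff @ v equals diff^T cov^{-1} diff.
    return float(np.sqrt(diff @ np.linalg.solve(cov, diff)))
```

With an identity covariance matrix the result reduces to the ordinary Euclidean distance.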
Fig. 4 is a flowchart of the method, provided by another embodiment of the present invention, for calculating the similarity between the sample SNP genotyping result and the reference SNP genotyping results (i.e. the similarity of the ancestry components). As shown in Fig. 4, the method may include the following steps:
410. Calculate the ancestry average of each hereditary birthplace from the reference SNP genotyping results of the groups.
Suppose the group set is [(x1,y1,z1,...),(x2,y2,z2,...),...,(xn,yn,zn,...)], where {x, y, z, ...} are the ancestry components and n is the index of the reference sample in the group set.
Correspondingly, (x1,x2,x3,...,xn) is the sequence of ancestry component x; the sequences of ancestry components y, z and so on are obtained in the same way. For the set of reference samples from a given hereditary birthplace, the ancestry average of that hereditary birthplace can be calculated.
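Step 410 amounts to a per-birthplace mean of the ancestry component vectors. The sketch below assumes the reference samples arrive as (birthplace, vector) pairs; this layout and the function name are illustrative.

```python
from collections import defaultdict

import numpy as np


def birthplace_means(samples):
    """Average the ancestry vectors of the reference samples of each birthplace.

    `samples` is an iterable of (birthplace, ancestry_vector) pairs.
    """
    grouped = defaultdict(list)
    for place, vector in samples:
        grouped[place].append(vector)
    return {place: np.mean(vectors, axis=0) for place, vectors in grouped.items()}
```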
420. Calculate the Mahalanobis distance between the ancestry result computed from the SNP genotyping result of the sample to be tested and the ancestry average of each birthplace.
In some embodiments, the distances can simply be computed one by one: the Mahalanobis distance between the ancestry result of the sample's SNP genotyping result and the ancestry average of each birthplace is calculated in turn, yielding the Mahalanobis distance between the sample and every birthplace.
In other embodiments, a matrix can be constructed so that the Mahalanobis distances between the sample's SNP genotyping result and the ancestry averages of all birthplaces are output at once. Fig. 5 is a flowchart of the method provided by an embodiment of the present invention for simultaneously calculating the Mahalanobis distances between the sample to be tested and each hereditary birthplace.
As shown in Fig. 5, the method may include the following steps:
421. Calculate the vector difference between the ancestry component vector of the sample to be tested and the ancestry average of each hereditary birthplace.
422. Place the vector differences row by row into one matrix to construct the input matrix.
423. Using the input matrix, calculate the Mahalanobis distances between the sample to be tested and each hereditary birthplace simultaneously.
430. Convert the Mahalanobis distances into similarities with a preset mapping function.
In the above embodiments, the Mahalanobis distance finally obtained ranges from zero to positive infinity. The calculated distances therefore need to be converted accordingly, so that the sum of the probabilities over the hereditary birthplaces remains 1.
The conversion can be implemented by a preset mapping function, so that the mapped variable stays between 0 and 1. That is, when the Mahalanobis distance between the two is 0, the corresponding similarity is 1; the larger the distance, the closer the corresponding similarity is to 0.
Preferably, a nonlinear mapping function is used to convert the Mahalanobis distance into a similarity. Converting with a linear function would require the maximum Mahalanobis distance, which is relatively complicated to compute; a nonlinear mapping function does not need this maximum and therefore reduces the computational complexity.
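The patent does not state which nonlinear mapping function is used. As one possible choice, the sketch below uses exp(-d), which maps a distance of 0 to a similarity of 1 and large distances towards 0 without needing the maximum distance. The optional normalization so that the values sum to 1 is likewise an assumption.

```python
import math


def distance_to_similarity(distance):
    """Map a Mahalanobis distance in [0, inf) to a similarity in (0, 1]."""
    if distance < 0:
        raise ValueError("distance must be non-negative")
    return math.exp(-distance)  # illustrative choice of nonlinear mapping


def normalize(similarities):
    """Rescale place -> similarity values so that they sum to 1."""
    total = sum(similarities.values())
    return {place: value / total for place, value in similarities.items()}
```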
The concrete operation of the methods disclosed in the above embodiments is described in detail below with a specific example. The method provided by the embodiments of the present invention can be implemented on Python 2.7 and runs stably on Debian. Thanks to Python's cross-platform nature, it can also run on other Linux distributions such as CentOS, as well as on Windows and macOS. In production, the method can also be deployed to cloud computing products (such as function compute) for better performance.
First, the ancestry components of a new sample are calculated with the ADMIXTURE tool. ADMIXTURE is an open-source software tool, developed at UCLA, that estimates ancestry components from SNP genotyping data sets. Its input is a PLINK file with the suffix .ped, accompanied by a PLINK-format support file of the same name with the suffix .map.
Then, running "admixture filename.ped" on the command line makes the tool generate a result file automatically. Each row of the result file has the form (x1,x2,...,xn), where each element is the proportion of one ancestry component and the proportions of all ancestry components sum to 1.
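Reading such a result file can be sketched as follows. The sketch assumes whitespace-separated numeric rows, one sample per row; the function name and the tolerance are illustrative.

```python
def parse_ancestry_rows(text, tol=1e-4):
    """Parse rows of ancestry proportions and check that each row sums to 1."""
    rows = []
    for line_no, line in enumerate(text.splitlines(), start=1):
        if not line.strip():
            continue  # skip blank lines
        row = [float(value) for value in line.split()]
        if abs(sum(row) - 1.0) > tol:
            raise ValueError("row %d does not sum to 1" % line_no)
        rows.append(row)
    return rows
```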
In this embodiment, the ancestry components may be, in order, Dai, Gaoshan, Northern Han, Southern Han and Japanese. In practical applications, more ancestry components can be added, for example about 42.
On the one hand, when the probability distribution of the hereditary birthplace needs to be predicted, a supervised learning technique from machine learning can be used (in this embodiment, random forest, which has good robustness). Random forest learning relies on sufficient labeled data, that is, the ancestry components of each sample together with the corresponding ancestral home data. In practice, the machine learning model can only be trained once the number of labeled data items, i.e. the sample size, reaches a certain amount.
The random forest model has one or more hyperparameters that need to be optimized and tuned. To optimize the random forest model, the labeled data can therefore be divided in a certain proportion (e.g. 0.8:0.2) into a training set and a test set, used for training the model and optimizing the hyperparameters respectively.
Different combinations of the random forest hyperparameters are set, and a model is trained on the training set for each combination. After the accuracy of each model has been verified on the test set, the model with the highest accuracy is chosen as the final model, serialized and saved as a file.
When running online, the optimal model output by the above optimization steps is loaded, and this model, which has the best prediction effect, calculates the probability distribution of the sample to be tested over the ancestral homes, giving the prediction of the ancestral home to which the sample belongs.
On the other hand, when the degree of similarity between a sample and each comparison sample needs to be calculated in an absolute sense, the Mahalanobis distance method is used: the similarity between the two is measured by the Mahalanobis distance, which is then converted into a similarity with a value range of 0-1.
Assume that the whole sample set is expressed as [(x1,y1,z1...),(x2,y2,z2...),...,(xn,yn,zn...)], where the letters {x, y, z...} denote ancestry components, the number n is the sample index, and (x1,x2,x3,...,xn) is the sequence of ancestry component x.
Calculating the covariance between every pair of ancestry component sequences yields an m × m covariance matrix, where m is the number of ancestry components (about 42 can be chosen in practical applications).
For the sample set of a given hereditary birthplace, the ancestry component mean u of that hereditary birthplace is calculated. The input sample to be tested, with ancestry component vector v, is then used to calculate the Mahalanobis distance between the sample and that hereditary birthplace, i.e.
$$ d = \sqrt{(v - u)^{T} \Sigma^{-1} (v - u)} $$
where \Sigma is the m × m covariance matrix of the ancestry components and d is the Mahalanobis distance between the input sample to be tested and the ancestry component mean of the hereditary birthplace. In practice, the Mahalanobis distance between the sample to be tested and each hereditary birthplace can be calculated one by one, or an input matrix can be constructed so that all distances are calculated simultaneously.
Specifically, the input matrix is constructed as follows: the vector difference between the ancestry component vector of the sample to be tested and the ancestry component mean of each hereditary birthplace is calculated, and the vector differences are placed row by row into one matrix to form the input matrix.
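After the input matrix is obtained, the Mahalanobis distances between the sample and every hereditary birthplace are calculated simultaneously by the following formula:
$$ D = \sqrt{\operatorname{diag}\left(A \Sigma^{-1} A^{T}\right)} $$
where A is the input matrix, \Sigma is the m × m covariance matrix of the ancestry components, the square root is taken element by element, and D is the result vector, each element d_i of which is the Mahalanobis distance to the corresponding hereditary birthplace. Constructing the input matrix in this way makes the calculation expression more concise and also reduces the amount of calculation required.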
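The two calculation routes described above (one hereditary birthplace at a time, or all at once through an input matrix) could be sketched with NumPy as follows. The sketch assumes the standard Mahalanobis definition, d = sqrt((v - u)^T Σ^-1 (v - u)), with Σ the covariance matrix of the ancestry components; the function names, the random data and the use of `np.einsum` are illustrative choices and are not taken from the patent.

```python
import numpy as np


def mahalanobis_single(sample, mean, cov_inv):
    """Distance between one sample and the mean of one hereditary birthplace."""
    diff = sample - mean
    return float(np.sqrt(diff @ cov_inv @ diff))


def mahalanobis_batch(sample, means, cov_inv):
    """Distances between one sample and every birthplace mean at once."""
    A = sample - means  # input matrix: one difference vector per row
    # Row-wise quadratic form; equals diag(A @ cov_inv @ A.T)
    # without forming the full k x k product.
    return np.sqrt(np.einsum("ij,jk,ik->i", A, cov_inv, A))


# Random stand-in data: n = 200 samples, m = 4 ancestry components.
rng = np.random.default_rng(0)
all_samples = rng.normal(size=(200, 4))
cov = np.cov(all_samples, rowvar=False)  # m x m covariance matrix
cov_inv = np.linalg.inv(cov)
means = np.stack([
    all_samples[:100].mean(axis=0),  # mean u of birthplace 0
    all_samples[100:].mean(axis=0),  # mean u of birthplace 1
])
sample = all_samples[0]
D = mahalanobis_batch(sample, means, cov_inv)
```

When the ancestry components of each sample sum to one, the covariance matrix is singular, so `np.linalg.pinv` may be needed in place of `np.linalg.inv`.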
In the present embodiment, the finally calculated Mahalanobis distance ranges from zero to positive infinity. To keep the similarity within 0-1, a nonlinear mapping function can be used to convert the Mahalanobis distance into a similarity S with a value range of 0-1, where S denotes the ancestry similarity.
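The text does not spell out the mapping function. As an illustration only, any monotonically decreasing function that sends a distance of 0 to a similarity of 1 and tends to 0 as the distance grows meets the stated requirement. The sketch below assumes S = exp(-d), which is a hypothetical choice and not necessarily the formula used in the patent.

```python
import math


def similarity_from_distance(d):
    """Map a Mahalanobis distance in [0, inf) to a similarity in (0, 1].

    ASSUMPTION: exp(-d) is an illustrative choice, not the source's formula.
    """
    if d < 0:
        raise ValueError("distance must be non-negative")
    return math.exp(-d)


similarities = [similarity_from_distance(d) for d in (0.0, 0.5, 2.0, 10.0)]
```

Smaller distances give similarities closer to 1, so the ordering of hereditary birthplaces by closeness is preserved.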
Finally, the two calculation methods yield two types of result. The first is the probability that the sample to be tested is assigned to each birthplace class; the second is the ancestry similarity between the sample to be tested and each comparison sample. Both the ancestry probability distribution calculated by the random forest model and the ancestry similarity obtained by converting the Mahalanobis distance are vectors. Each element of the vector represents, in order, the probability or the similarity for the corresponding hereditary birthplace.
The hereditary birthplace results finally calculated by the above method embodiment are displayed using the display method provided by the embodiment of the present invention, so that the results can be presented intuitively to the user and perceived by the user. As shown in Fig. 6, the method may include the following steps:
610. Obtain the calculation result of the ancestry components of the sample to be tested.
620. Determine the probabilities of the different hereditary birthplaces according to the calculation result.
As disclosed in the embodiment above, the probabilities of the different hereditary birthplaces may be represented either by the probability distribution of the hereditary birthplace of the sample to be tested or by the ancestry similarity of the sample to be tested.
630. Display the probabilities of the different hereditary birthplaces in visual form.
The specific visualization form can be chosen according to the actual situation, for example color depth, color or clarity, or a histogram-like form in which the height by which different regions or positions protrude outward indicates the probability of the hereditary birthplace of the sample to be tested.
Fig. 2 shows a display style of the visualization provided by an embodiment of the present invention. In the embodiment shown in Fig. 2, different color depths are used to indicate the probability of the sample to be tested for each hereditary birthplace: the greater the probability of a hereditary birthplace, the deeper the corresponding color.
In the present embodiment, step 630 may specifically include the following steps: first, a corresponding map is generated according to the geographical locations of the hereditary birthplaces. Then, the corresponding color depth is determined according to the probability of each hereditary birthplace. Finally, each hereditary birthplace is displayed on the map with its corresponding color depth.
Specifically, the map includes all hereditary birthplaces. The display range of the map is determined by the range covered by the hereditary birthplaces. Each hereditary birthplace forms a corresponding display block on the map. For example, as shown in Fig. 2, the map may be a map of mainland China, with each display block being a provincial-level administrative unit.
Combined with the geographical locations of the hereditary birthplaces, the final similarity results can be displayed in visual form through the different display blocks on the map, so that the user can better understand the results and obtain more information.
In some embodiments, the color depth may be determined as follows: determine the minimum and maximum values of the color depth, which form the variation range of the color depth; map the probability of each hereditary birthplace into the variation range by a linear mapping function; and determine the color depth corresponding to each hereditary birthplace according to the mapping result.
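The linear mapping described above could be sketched as follows. The default depth range, the clamping of out-of-range values, the region names and the probability values are all assumptions made for illustration.

```python
def linear_color_depth(prob, depth_min=0.1, depth_max=1.0, p_min=0.0, p_max=1.0):
    """Linearly map a probability in [p_min, p_max] to [depth_min, depth_max]."""
    if p_max <= p_min:
        raise ValueError("p_max must be greater than p_min")
    t = (prob - p_min) / (p_max - p_min)
    t = min(max(t, 0.0), 1.0)  # clamp values outside the expected range
    return depth_min + t * (depth_max - depth_min)


# Made-up probabilities for three hypothetical display blocks.
probs = {"region_a": 0.5, "region_b": 0.25, "region_c": 0.0}
depths = {name: linear_color_depth(p) for name, p in probs.items()}
```

A larger probability always yields a deeper color, and a probability of zero still receives the minimum depth so that the display block remains visible.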
In addition to using a linear mapping function, in other embodiments the color depth may be determined as follows: determine the minimum and maximum values of the color depth, which form the variation range of the color depth; divide the variation range into several intervals, each interval corresponding to a probability range; determine the interval corresponding to each hereditary birthplace according to the probability of that hereditary birthplace; and take the median color depth of that interval as the corresponding color depth.
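The interval-based alternative could be sketched as follows. The sketch assumes equal-width probability intervals and treats the median of an interval as its midpoint; the number of intervals and the function name are hypothetical.

```python
def binned_color_depth(prob, depth_min=0.0, depth_max=1.0, n_bins=5):
    """Pick the interval a probability falls into and return its middle depth."""
    if not 0.0 <= prob <= 1.0:
        raise ValueError("probability must lie in [0, 1]")
    # Interval i covers probabilities [i / n_bins, (i + 1) / n_bins);
    # a probability of exactly 1.0 is placed in the last interval.
    index = min(int(prob * n_bins), n_bins - 1)
    width = (depth_max - depth_min) / n_bins
    return depth_min + (index + 0.5) * width


depth_low = binned_color_depth(0.19)   # falls in the first interval
depth_next = binned_color_depth(0.21)  # falls in the second interval
```

Compared with the linear mapping, this produces a small number of clearly distinguishable color levels.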
Preferably, to meet users' personalization needs, a color selection instruction may also be received from the user during display, and the display color on the map is determined according to the user's color selection instruction.
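One way to combine a user-selected color with a color depth is to blend from white toward the chosen color. The sketch below assumes the color arrives as a six-digit hexadecimal string and the depth as a value in [0, 1]; both conventions and the function name `shade_hex` are hypothetical.

```python
def shade_hex(base_hex, depth):
    """Blend white toward a user-selected color; depth 0 gives white, 1 the color."""
    if not 0.0 <= depth <= 1.0:
        raise ValueError("depth must lie in [0, 1]")
    base_hex = base_hex.lstrip("#")
    if len(base_hex) != 6:
        raise ValueError("expected a color such as '#1f77b4'")
    channels = [int(base_hex[i:i + 2], 16) for i in (0, 2, 4)]
    # Move each RGB channel from 255 (white) toward the base color.
    blended = [round(255 + (c - 255) * depth) for c in channels]
    return "#{:02x}{:02x}{:02x}".format(*blended)


light_blue = shade_hex("#0000ff", 0.2)
```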
In other embodiments, the color-depth display alone may not allow the user to quickly identify the regions with the highest probability and locate their target. Therefore, the hereditary birthplaces with the highest probability can additionally be marked on the map with corresponding cue marks. That is, cue marks are displayed on the hereditary birthplaces with the highest probability (for example, as shown in Fig. 2), prompting the user with the several most likely hereditary birthplaces.
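Selecting which hereditary birthplaces receive a cue mark could be sketched as a simple top-k ranking. The number of marks k, the region names and the probability values are assumptions for illustration.

```python
def top_birthplaces(probs, k=3):
    """Return the k birthplaces with the highest probability, most likely first."""
    ranked = sorted(probs.items(), key=lambda item: item[1], reverse=True)
    return [name for name, _ in ranked[:k]]


# Made-up probabilities for four hypothetical display blocks.
probs = {"region_a": 0.05, "region_b": 0.40, "region_c": 0.30, "region_d": 0.25}
marked = top_birthplaces(probs, k=2)
```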
An embodiment of the present invention further provides a display device. After receiving the corresponding hereditary birthplace prediction result, the display device can execute, through a controller, the method provided by the above method embodiment and control a display unit to show the corresponding map, providing a visual display of the prediction result.
In summary, the display method and display device provided by the embodiments of the present invention use color depth to show the hereditary birthplace prediction result or probability distribution of the test sample on a map in an intuitive, visual manner, giving the user a better understanding of the final displayed result and a better user experience.
It can be understood that those of ordinary skill in the art may make equivalent substitutions or changes according to the technical solution and the inventive concept of the present invention, and all such changes or substitutions shall fall within the protection scope of the appended claims of the present invention.
Claims (10)
1. A display method for a hereditary birthplace probability distribution, characterized by comprising:
obtaining a calculation result of the ancestry components of a sample to be tested;
determining the probabilities of different hereditary birthplaces according to the calculation result;
displaying the probabilities of the different hereditary birthplaces in visual form.
2. The display method according to claim 1, characterized in that displaying the probabilities of the different hereditary birthplaces in visual form comprises:
generating a corresponding map according to the geographical locations of the hereditary birthplaces;
determining a corresponding color depth according to the probability of each hereditary birthplace;
displaying each hereditary birthplace on the map with its corresponding color depth.
3. The display method according to claim 2, characterized in that the map includes all of the hereditary birthplaces, and each hereditary birthplace forms a corresponding display block on the map.
4. The display method according to claim 2, characterized in that determining a corresponding color depth according to the probabilities of the different hereditary birthplaces specifically comprises:
determining the minimum and maximum values of the color depth, which form the variation range of the color depth;
mapping the probability of each hereditary birthplace into the variation range by a linear mapping function;
determining the color depth corresponding to each hereditary birthplace according to the mapping result.
5. The display method according to claim 2, characterized in that determining a corresponding color depth according to the probability of each hereditary birthplace specifically comprises:
determining the minimum and maximum values of the color depth, which form the variation range of the color depth;
dividing the variation range into several intervals, each interval corresponding to a probability range;
determining the interval corresponding to each hereditary birthplace according to the probability of that hereditary birthplace;
taking the median color depth of the interval corresponding to the hereditary birthplace as the corresponding color depth.
6. The display method according to claim 2, characterized in that the method further comprises:
receiving a color selection instruction from a user;
determining the display color on the map according to the user's color selection instruction.
7. The display method according to claim 2, characterized in that the greater the probability, the deeper the corresponding color depth.
8. The display method according to any one of claims 1-7, characterized in that the method further comprises: displaying a cue mark on the hereditary birthplace with the highest probability.
9. The display method according to any one of claims 1-7, characterized in that the probability of the hereditary birthplace comprises: the probability distribution of the hereditary birthplace of the sample to be tested, or the ancestry similarity of the sample to be tested.
10. A display device, characterized in that the display device comprises a display unit and a controller; the controller is configured to execute the display method according to any one of claims 1 to 9 and to control the display unit to show a map with different color depths.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201811178756.XA CN109522378A (en) | 2018-10-10 | 2018-10-10 | The display methods and display equipment of hereditary birthplace probability distribution |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201811178756.XA CN109522378A (en) | 2018-10-10 | 2018-10-10 | The display methods and display equipment of hereditary birthplace probability distribution |
Publications (1)
Publication Number | Publication Date |
---|---|
CN109522378A true CN109522378A (en) | 2019-03-26 |
Family
ID=65772290
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201811178756.XA Pending CN109522378A (en) | 2018-10-10 | 2018-10-10 | The display methods and display equipment of hereditary birthplace probability distribution |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN109522378A (en) |
Citations (6)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20100145981A1 (en) * | 2008-12-05 | 2010-06-10 | 23Andme, Inc. | Gamete donor selection based on genetic calculations |
CN105051208A (en) * | 2013-03-28 | 2015-11-11 | 深圳华大基因股份有限公司 | Method, system, and computer readable medium for determining base information of predetermined area in fetal genome |
US20170329899A1 (en) * | 2014-10-29 | 2017-11-16 | 23Andme, Inc. | Display of estimated parental contribution to ancestry |
US20170329866A1 (en) * | 2012-06-06 | 2017-11-16 | 23Andme, Inc. | Determining family connections of individuals in a database |
US20170330358A1 (en) * | 2008-03-19 | 2017-11-16 | 23Andme, Inc. | Ancestry painting |
CN108268753A (en) * | 2018-01-25 | 2018-07-10 | 清华大学 | A kind of microorganism group recognition methods and device, equipment |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
RJ01 | Rejection of invention patent application after publication | Application publication date: 20190326 |