CN113380321A - Method and device for evaluating quality of genetic map and computer-readable storage medium containing same - Google Patents



Publication number
CN113380321A
Authority
CN
China
Prior art keywords
genetic
map
genetic map
recombination
analysis
Legal status
Withdrawn
Application number
CN202110667699.7A
Other languages
Chinese (zh)
Inventor
马玉昆
李伟华
张晓伟
孙琼琳
李峰峰
王帅
Current Assignee
Beijing Fruit Shell Biotechnology Co ltd
Original Assignee
Beijing Fruit Shell Biotechnology Co ltd
Application filed by Beijing Fruit Shell Biotechnology Co ltd
Priority to CN202110667699.7A
Publication of CN113380321A
Withdrawn (current legal status)

Classifications

    • GPHYSICS
    • G16INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
    • G16BBIOINFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR GENETIC OR PROTEIN-RELATED DATA PROCESSING IN COMPUTATIONAL MOLECULAR BIOLOGY
    • G16B20/00ICT specially adapted for functional genomics or proteomics, e.g. genotype-phenotype associations

Landscapes

  • Bioinformatics & Cheminformatics (AREA)
  • Health & Medical Sciences (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Physics & Mathematics (AREA)
  • Engineering & Computer Science (AREA)
  • Genetics & Genomics (AREA)
  • Biotechnology (AREA)
  • Biophysics (AREA)
  • Chemical & Material Sciences (AREA)
  • Molecular Biology (AREA)
  • Proteomics, Peptides & Aminoacids (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Analytical Chemistry (AREA)
  • Evolutionary Biology (AREA)
  • General Health & Medical Sciences (AREA)
  • Medical Informatics (AREA)
  • Spectroscopy & Molecular Physics (AREA)
  • Theoretical Computer Science (AREA)
  • Measuring Or Testing Involving Enzymes Or Micro-Organisms (AREA)

Abstract

The invention discloses a method and a device for evaluating the quality of a genetic map, and a computer-readable storage medium containing the same. The method comprises the following steps: computing marker index statistics for the genetic map, performing collinearity analysis between the genetic map and the genome physical map, performing recombination heat map analysis, performing recombination hotspot analysis, and performing segregation distortion region analysis. By processing the upstream bioinformatics results of genetic map construction, the method re-evaluates the genetic map indices and provides a way to identify recombination hotspot regions and segregation distortion regions, thereby addressing the incompleteness of existing genetic map quality evaluation systems.

Description

Method and device for evaluating quality of genetic map and computer-readable storage medium containing same
Technical Field
The present invention relates to the field of bioinformatics, and more particularly to a method and a device for evaluating the quality of genetic maps and a computer-readable storage medium containing the same.
Background
A genetic map, also called a genetic linkage map, records the relative genetic positions of genetic markers on a genome; the theoretical basis for constructing it is recombination (crossing over) during meiosis. From early markers such as microsatellites in the first-generation sequencing era to today's polymorphic molecular markers detected by high-throughput second-generation sequencing, genetic map markers have become more accurate and more numerous and cover the genome more completely, so they can capture the recombination information in most crosses. Genetic maps are widely used for quantitative trait locus mapping, genome-assisted assembly, and evolutionary studies, and they play a crucial role especially in animal and plant genome research.
Factors such as the characteristics of the species under study, the way crosses are handled, and the standardization and strategy of the crossing experiment directly affect the recombination information carried by the polymorphisms between the parents and in the progeny population, and this information in turn directly determines the quality of the constructed genetic map. Genetic maps built from different populations therefore differ greatly, which makes genetic map quality assessment very important.
At present, conventional genetic map quality evaluation consists mainly of genetic map index statistics, collinearity analysis, and recombination heat map analysis, but it has the following problems:
genetic markers with similar genetic backgrounds in a genetic map often share identical genotypes. These redundant duplicate markers should be removed before the indices are evaluated and before the recombination heat map analysis, so that the number of recombination events on each linkage group is reflected more faithfully.
Collinearity analysis reflects the consistency between the genetic map and the physical map; its results include regions that are highly conserved between the two maps and chromosomal recombination regions. Because the genetic distance of a marker is relative, one must determine before the collinearity analysis whether the order of the genetic positions is opposite to that of the physical positions and, if so, invert it, so that the genetic map and the physical map run in the same direction across highly conserved regions.
A recombination hotspot on a genetic map is a segment with a high frequency of genetic recombination, which appears as a large genetic-distance difference over a short physical interval. Recombination hotspots are associated with the density of inter-parental polymorphic markers, gene density, and GC content in the region. They are found mainly in the telomeric regions of chromosomes and are rarer in and around the centromere; they are markedly more frequent in exons than in introns and more frequent in the regions upstream and downstream of transposon start and end sites, which indicates that studying recombination hotspots helps mine genetic information in regulatory regions associated with elite traits. Studies have also shown that methylation level affects chromosomal recombination frequency, so hotspot analysis combined with epigenomics can help verify the biological process of crossing over. In genomics research, recombination hotspots also help identify genome rearrangement regions and centromeric regions.
Segregation distortion is a common phenomenon in genetic populations: the genotype proportions observed in the population deviate significantly from the frequencies expected under Mendel's laws. It is especially common in highly heterozygous plants and in aquatic species. The species, the population type, the way the population is constructed, and the type of polymorphic markers developed largely determine the source, degree, and effect of distortion at a marker. Because distorted markers do not follow the law of segregation, they cannot be analyzed systematically with existing genetic theory. However, segregation distortion has been shown to be related to changes in gene frequency under selection pressure in plants, and to play an important role in the evolution of sex and recombination in sexually reproducing animals and in reproductive isolation. Segregation distortion can therefore be regarded as a powerful evolutionary driving force, which makes the study of distorted regions in genetic maps particularly important. Existing methods can only judge whether a single marker is distorted; they do not clearly define a segregation distortion region.
Disclosure of Invention
The invention aims to provide a method and a device for evaluating the quality of a genetic map that remedy the incompleteness of existing genetic map quality evaluation systems. A complete quality evaluation is achieved through marker index statistics for the genetic map, collinearity analysis between the genetic map and the genome physical map, recombination heat map analysis, recombination hotspot analysis, and segregation distortion region analysis.
The invention claims a method for evaluating the quality of a genetic map, which comprises the following steps:
1) computing marker index statistics for the genetic map from its bioinformatics results, to obtain marker statistics;
2) performing collinearity analysis between the genetic map and the genome physical map based on the statistics from step 1);
3) performing recombination heat map analysis of the genetic map based on the statistics from step 1);
4) performing recombination hotspot analysis of the genetic map based on the statistics from step 1);
5) performing segregation distortion region analysis of the genetic map based on the statistics from step 1);
the overall quality evaluation is completed through the recombination heat map, recombination hotspot, and segregation distortion region analyses.
In step 1), the marker index statistics are computed as follows:
traverse the genetic distances of the markers on each linkage group of the genetic map. If the first marker of a linkage group is not at 0, subtract its genetic distance from every marker on that linkage group; because every marker is shifted by the same amount, the genetic distance between adjacent markers is unchanged (arithmetic-difference processing). If the first marker is already at 0, the linkage group is normal and needs no processing;
after this shift, traverse the genetic distances of the markers on each linkage group again and deduplicate the markers: adjacent markers with identical genetic distances are regarded as duplicates, so only one marker is kept per genetic distance;
for each linkage group, count the number of markers after deduplication, the total genetic distance, the average genetic distance, and the maximum genetic-distance difference between adjacent markers, i.e., the gap value.
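A minimal Python sketch of step 1), assuming the markers of one linkage group arrive as a list of genetic positions (cM) already sorted along the map. The function names and the `LinkageGroupStats` container are hypothetical, and `average_length` is taken here as the mean interval between adjacent retained markers, which is only one possible reading of "average genetic distance".

```python
from dataclasses import dataclass


@dataclass
class LinkageGroupStats:
    marker_num: int        # markers left after deduplication
    length: float          # total genetic distance of the linkage group (cM)
    average_length: float # assumed: mean interval between adjacent retained markers (cM)
    max_gap: float         # largest interval between adjacent retained markers (cM)


def normalize_and_dedup(positions_cm):
    """Shift a linkage group to start at 0 cM, then keep one marker per genetic position.

    Returns (retained_positions, counts); counts[i] is how many original markers
    collapsed onto retained_positions[i].
    """
    if not positions_cm:
        return [], []
    offset = positions_cm[0]
    # Subtracting the same offset from every marker preserves adjacent distances.
    shifted = [p - offset for p in positions_cm]
    retained, counts = [], []
    for p in shifted:
        if retained and p == retained[-1]:
            counts[-1] += 1  # adjacent marker at the same genetic position: duplicate
        else:
            retained.append(p)
            counts.append(1)
    return retained, counts


def linkage_group_stats(positions_cm):
    """Compute the per-linkage-group indices from the deduplicated positions."""
    retained, _ = normalize_and_dedup(positions_cm)
    gaps = [b - a for a, b in zip(retained, retained[1:])]
    return LinkageGroupStats(
        marker_num=len(retained),
        length=retained[-1] - retained[0] if retained else 0.0,
        average_length=sum(gaps) / len(gaps) if gaps else 0.0,
        max_gap=max(gaps) if gaps else 0.0,
    )
```

For example, the positions [2.5, 2.5, 4.0, 7.5, 7.5, 10.5] are shifted to start at 0 cM and collapse to four retained markers with a total length of 8 cM and a maximum gap of 3.5 cM.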
In step 2), the collinearity analysis between the genetic map and the genome physical map is performed as follows:
using the data produced by the marker index statistics, collect the genetic distances and physical positions of the markers on each linkage group. If the order of the genetic distances is opposite to that of the physical positions for more than 90% of the markers, invert the markers on that linkage group while keeping the genetic distance between adjacent markers unchanged;
visualize the collinearity of the processed genetic map with an R plotting tool.
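The orientation check and inversion can be sketched as follows. This assumes the "more than 90%" criterion is evaluated over pairs of markers that are adjacent in physical order, and that inversion mirrors each position as `max - position`; both are illustrative choices, and the helper names are hypothetical. Mirroring preserves every adjacent interval, which is the invariant the text requires; the plotting itself is left to R as described.

```python
def needs_inversion(genetic_cm, physical_bp, threshold=0.9):
    """True if genetic order runs opposite to physical order for > threshold of adjacent pairs."""
    # Visit markers in physical order and compare consecutive genetic positions.
    order = sorted(range(len(physical_bp)), key=lambda i: physical_bp[i])
    pairs = list(zip(order, order[1:]))
    if not pairs:
        return False
    reversed_pairs = sum(1 for a, b in pairs if genetic_cm[b] < genetic_cm[a])
    return reversed_pairs / len(pairs) > threshold


def invert(genetic_cm):
    """Mirror genetic positions: adjacent intervals are kept, direction is flipped,
    and a group that started at 0 cM still spans 0 to its original maximum."""
    if not genetic_cm:
        return []
    top = max(genetic_cm)
    return [top - g for g in genetic_cm]
```

A linkage group whose genetic positions fall as physical positions rise is flagged, mirrored, and then no longer flagged.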
In step 3), the recombination heat map analysis is performed as follows: a marker recombination matrix is constructed from the genetic-distance gaps between adjacent markers obtained in the marker index statistics, and the recombination heat map is visualized with an R plotting tool.
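The text does not say how the recombination matrix is derived from the gaps. One hedged possibility, sketched below, accumulates the adjacent gaps into map positions and converts each pairwise distance into a recombination fraction with the inverse Haldane mapping function, r = 0.5(1 - e^(-2d)) with d in Morgans. Choosing Haldane rather than, for example, Kosambi is an assumption, and the function name is hypothetical; the resulting matrix could then be written out for the R heat map.

```python
import math


def recombination_matrix(gaps_cm):
    """Pairwise recombination fractions for markers separated by the given adjacent gaps (cM).

    Assumption: distances are additive along the map and are converted with the
    inverse Haldane mapping function r = 0.5 * (1 - exp(-2d)), d in Morgans.
    """
    positions = [0.0]
    for g in gaps_cm:
        positions.append(positions[-1] + g)
    n = len(positions)
    matrix = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            d_morgan = abs(positions[i] - positions[j]) / 100.0
            matrix[i][j] = 0.5 * (1.0 - math.exp(-2.0 * d_morgan))
    return matrix
```

Values near 0 correspond to tightly linked marker pairs and values approach 0.5 for distant pairs, so the matrix maps naturally onto a heat map color scale.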
In step 4), the recombination hotspot analysis is performed as follows: the gap ratio (average gap value) is calculated from the genetic-distance gaps between adjacent markers obtained in the marker index statistics and classified by range; a sliding window is then used to find continuous regions according to the gap ratio < 1 criterion, which are the recombination hotspot regions, and the hotspots are displayed with the Perl SVG module.
The gap ratio is classified as follows: the genetic-position interval divided by the physical-position interval between adjacent markers (cM/Mb), abbreviated as the gap ratio, is assigned to one of four ranges: gap ratio < 1, 1 ≤ gap ratio < 2, 2 ≤ gap ratio < 5, and gap ratio ≥ 5.
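The gap ratio and its four ranges can be sketched as below. Converting base pairs to megabases, treating a zero physical interval as an infinite ratio, and placing boundary values in the upper range are assumptions, since the text does not specify them; the function names are hypothetical.

```python
def gap_ratio(genetic_gap_cm, physical_gap_bp):
    """Genetic interval / physical interval between adjacent markers, in cM/Mb."""
    physical_mb = abs(physical_gap_bp) / 1_000_000
    if physical_mb == 0:
        return float("inf")  # assumption: co-located markers get an unbounded ratio
    return genetic_gap_cm / physical_mb


def gap_ratio_class(ratio):
    """Bin a gap ratio into the four ranges; boundaries go to the upper range (assumed)."""
    if ratio < 1:
        return "<1"
    if ratio < 2:
        return "1-2"
    if ratio < 5:
        return "2-5"
    return ">=5"
```

For instance, a 1.5 cM interval over 500 kb gives a gap ratio of 3.0 cM/Mb, which falls in the 2 to 5 range.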
In step 5), the segregation distortion region analysis is performed as follows: using the data produced by the marker index statistics, a chi-square test is applied to the genotype frequencies of each marker in the genetic map to infer whether the marker shows segregation distortion, and the direction of distortion is determined from the genotypes;
a segregation distortion region is determined as follows: when adjacent markers all show segregation distortion in the same direction, i.e., all toward the female parent or all toward the male parent, the physical interval corresponding to that genetic-distance region is defined as a segregation distortion region.
The invention also provides a genetic map quality evaluation system comprising a genetic map marker index statistics module, a collinearity analysis module for the genetic map and the genome physical map, a recombination heat map analysis module, a recombination hotspot analysis module, and a segregation distortion region analysis module, wherein:
the marker index statistics module collects the bioinformatics results of the genetic map, performs statistical processing, and outputs the marker statistics;
the collinearity analysis module receives the marker statistics from the marker index statistics module and further processes them to produce a visualization of the collinearity analysis;
the recombination heat map analysis module receives the marker statistics from the marker index statistics module and further processes them to produce a visualization of the recombination heat map;
the recombination hotspot analysis module receives the marker statistics from the marker index statistics module and further processes them to produce a visualization of the recombination hotspots;
and the segregation distortion region analysis module receives the marker statistics from the marker index statistics module and further processes them to obtain the segregation distortion regions of the genetic map.
The invention also provides a computer-readable storage medium storing a computer program that, when executed by a processor, implements the genetic map quality evaluation method.
Compared with the prior art, the method and device for evaluating genetic map quality have the following advantages:
First, before the genetic map index statistics, the recombination heat map analysis, and the collinearity analysis of the genetic and physical maps, the genetic distances of the markers on each linkage group of the original map are preprocessed: they are shifted in a way that keeps the genetic distance between adjacent markers unchanged, and adjacent markers with identical genetic distances are merged. This reflects more faithfully the consistency of the recombination information between the genetic map and the physical map.
Second, recombination hotspot regions (RHRs) of the genetic map are important targets in downstream data mining, and the method and device systematically provide a way to identify them from gap ratio statistics within a sliding window.
Finally, segregation distortion is a powerful evolutionary driving force, and the method and device systematically propose a way to determine segregation distortion regions. The method takes into account the direction of distortion at each marker and eliminates the influence of abnormal markers on the identification of distortion regions. It is proposed here for the first time in genetic map quality evaluation; it makes it easier for researchers to locate genes showing segregation distortion and to discover more such loci, which is very important for the genetic research and improvement of animals and plants.
Drawings
FIG. 1 shows the molecular marker information of each linkage group of the genetic map in the example of the present invention.
FIG. 2 shows, calculated from the data in FIG. 1, whether each marker is a duplicate, its gap value, its gap ratio range, its segregation distortion P value, and its direction of distortion.
FIG. 3 shows the genetic map index statistics based on FIGS. 1 and 2.
FIG. 4 shows a visualization of the genetic map based on FIG. 2.
FIG. 5 shows the collinearity between the genetic map and the physical map based on FIG. 2.
FIG. 6 shows the recombination heat map of the genetic map based on FIG. 2.
Detailed Description
The principles and features of the invention are described below with reference to the drawings; the examples serve only to illustrate the invention and are not intended to limit its scope.
The invention provides a method and a device for evaluating the quality of a genetic map that address the incompleteness of existing genetic map quality evaluation systems.
The embodiments are described in detail below with reference to the drawings, so that the way the invention applies its technical means to solve the technical problem and achieve the technical effect can be fully understood and implemented.
The general idea of the embodiments of the application is as follows:
the genetic map quality evaluation method comprises marker index statistics for the genetic map, collinearity analysis between the genetic map and the genome physical map, recombination heat map analysis, recombination hotspot analysis, and segregation distortion region analysis.
From the results of the upstream bioinformatics analysis of the genetic map, collect the genetic position of each marker on each linkage group.
Because shifting all markers by the same amount leaves the genetic distance between adjacent markers unchanged, if the first marker of a linkage group is not at 0, subtract its genetic position from every marker on the linkage group (arithmetic-difference processing).
Collect the genetic distances of the markers on each linkage group again and deduplicate them: adjacent markers with identical genetic distances are treated as duplicates, so only one marker is kept per genetic position. For each retained marker, record in a data storage object the number of markers it represented before deduplication, for use in the subsequent index statistics. FIG. 2 contains seven statistics: gap, pos gap, gap ratio, P, female, male, and het. The gap (genetic interval) and pos gap (physical interval) values are used to calculate the gap ratio, and the gap ratio, P, female, male, and het values provide the basis for the calculations in the following steps.
After duplicate markers are removed, store the number of non-redundant markers on each linkage group, the total genetic length of each linkage group, and the gap values between adjacent markers in a data storage object for the subsequent index statistics and recombination hotspot analysis.
Collect the genetic distances and physical positions of the original markers on each linkage group. If the order of the genetic distances is opposite to that of the physical positions for more than 90% of the markers, store the reversed genetic-position information in a data storage object; if the orders agree, or are not clearly opposite, no processing is needed.
Because genetic distance is relative, the markers on such a linkage group are inverted into reverse order while the genetic distance between adjacent markers is kept unchanged.
After inversion, store the genetic distances and physical positions of all markers in a data storage object and call an R plotting tool to visualize the collinearity.
Based on the gap between adjacent markers, calculate the genetic distance per unit physical distance between them, i.e., genetic-position interval / physical-position interval, denoted the gap ratio (cM/Mb). Then slide a window along each linkage group with 3 markers per window and a step of 1 marker. For each window, examine the gap ratios of its 3 markers: if at most 1 marker has a gap ratio below 1 cM/Mb and the other two markers are adjacent, i.e., consecutive on the chromosome, the region is considered one recombination hotspot region (RHR).
After all windows have been scanned, collect all RHR intervals and merge consecutive intervals into a single RHR. Store the results in a data storage object for the subsequent visualization of recombination hotspot regions.
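A sketch of the RHR scan under one reading of the window rule. It assumes that each marker carries the gap ratio of the interval ending at it (an assumption about how the per-marker values in FIG. 2 are defined), that a 3-marker window is an RHR when at most one of its markers is below 1 cM/Mb and the remaining two are adjacent (so a single low marker must sit at an edge of the window), and that overlapping RHR windows are merged. The function name and the index-based output are hypothetical.

```python
WINDOW = 3  # markers per window, as stated in the text; the step is 1 marker


def find_rhrs(ratios, threshold=1.0):
    """Return merged recombination hotspot regions as (first, last) marker indices.

    ratios[i] is the gap ratio (cM/Mb) assumed to be recorded for marker i.
    """
    hits = []
    for start in range(len(ratios) - WINDOW + 1):
        low = [k for k in range(WINDOW) if ratios[start + k] < threshold]
        # Hotspot if no marker is low, or exactly one low marker sits at a window
        # edge so that the other two markers are adjacent.
        if not low or (len(low) == 1 and low[0] in (0, WINDOW - 1)):
            hits.append((start, start + WINDOW - 1))
    merged = []
    for first, last in hits:
        if merged and first <= merged[-1][1]:  # windows sharing a marker are continuous
            merged[-1] = (merged[-1][0], last)
        else:
            merged.append((first, last))
    return merged
```

The returned marker indices can be mapped back to the markers' physical positions to obtain the RHR intervals for visualization.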
Collect the genotype information of the original markers on each linkage group, extract the progeny genotypes according to the progeny sample list, and count the number and proportion of each genotype at each marker. Progeny genotypes fall into four classes: identical to the female parent (labeled "female"), identical to the male parent (labeled "male"), heterozygous (labeled "het"), and missing (labeled "miss"); missing genotypes are not used in the segregation distortion analysis. The heterozygous class applies only to population types in which at least one parent is homozygous, such as F2, RIL, DH, and BC.
Using the counts of female-type, male-type, and heterozygous (het) genotypes, calculate a chi-square value with the chi-square test formula and convert it into a segregation distortion significance P value using the chi-square distribution.
If the P value of a marker is below the set threshold of 0.001, the marker is considered a segregation distortion locus, and its direction of distortion is then determined.
If, in the genotype counts of a distorted marker, the number of female-type genotypes exceeds the number of male-type genotypes, the marker is considered distorted toward the female parent, and this direction is saved to the data storage object.
If the number of male-type genotypes exceeds the number of female-type genotypes, the marker is considered distorted toward the male parent, and this direction is saved to the data storage object.
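A hedged pure-Python sketch of the per-marker test and direction call. The expected ratios (1:2:1 female:het:male for F2, 1:1 for RIL and DH with the heterozygous class dropped) are standard Mendelian expectations but are assumptions here, since the text does not list them; other population types such as BC would need their own entries. The chi-square upper-tail probability is computed in closed form for 1 or 2 degrees of freedom, which is all these ratios require. All names are hypothetical.

```python
import math

# Assumed Mendelian expectations as (female, het, male) ratios; a 0 drops that class.
EXPECTED_RATIOS = {"F2": (1, 2, 1), "RIL": (1, 0, 1), "DH": (1, 0, 1)}


def chi2_upper_tail(x, df):
    """P(X > x) for a chi-square variable with df = 1 or df = 2 (closed forms)."""
    if df == 1:
        return math.erfc(math.sqrt(x / 2.0))
    if df == 2:
        return math.exp(-x / 2.0)
    raise ValueError("this sketch only supports 1 or 2 degrees of freedom")


def check_distortion(female, het, male, population="F2", alpha=0.001):
    """Return (P value, direction) for one marker; 'miss' genotypes must already be excluded.

    direction is 'female', 'male', or None when the marker is not distorted
    (or when the female and male counts are tied).
    """
    ratios = EXPECTED_RATIOS[population]
    cells = [(o, r) for o, r in zip((female, het, male), ratios) if r > 0]
    total = sum(o for o, _ in cells)
    if total == 0:
        return 1.0, None
    ratio_sum = sum(r for _, r in cells)
    chi2 = 0.0
    for observed, r in cells:
        expected = total * r / ratio_sum
        chi2 += (observed - expected) ** 2 / expected
    p = chi2_upper_tail(chi2, len(cells) - 1)
    if p >= alpha:
        return p, None
    if female > male:
        return p, "female"
    if male > female:
        return p, "male"
    return p, None
```

For an F2 marker with 100 female-type, 80 heterozygous, and 20 male-type progeny, the chi-square value is 72 on 2 degrees of freedom, well past the 0.001 threshold, and the marker is called as distorted toward the female parent.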
After the distortion statistics are complete, examine the distorted markers on each linkage group: if a region contains 3 or more adjacent markers that are all distorted and share the same direction (all toward the female parent or all toward the male parent), the region is considered a segregation distortion region of the genetic map.
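Region calling reduces to finding runs of adjacent markers with the same distortion direction. A minimal sketch, assuming each marker's direction has already been computed ("female", "male", or None for undistorted) and listed in map order; the 3-marker minimum comes from the text, and the function name is hypothetical.

```python
def distortion_regions(directions, min_markers=3):
    """Return (first, last, direction) for runs of >= min_markers adjacent markers
    distorted in the same direction. directions[i] is 'female', 'male', or None."""
    regions = []
    start = 0
    for i in range(1, len(directions) + 1):
        # A run ends at the end of the list or where the direction changes.
        if i == len(directions) or directions[i] != directions[start]:
            if directions[start] is not None and i - start >= min_markers:
                regions.append((start, i - 1, directions[start]))
            start = i
    return regions
```

Each returned (first, last) pair can be translated into the physical interval spanned by those markers, which the method defines as the segregation distortion region.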
In this method, the genetic distances of the markers on each linkage group in the original map are preprocessed before the map indices are computed, so that the recombination information in the genetic map, and its consistency with the physical map, are reflected more faithfully. The method also provides ways to identify recombination hotspot regions and segregation distortion regions, remedying the shortcomings of current genetic map quality evaluation systems, which lack such identification methods.
To implement the method, the invention also provides the genetic map quality evaluation system described above, in which the marker index statistics module passes its marker statistics to the collinearity analysis module, the recombination heat map analysis module, and the recombination hotspot analysis module (each producing a visualization) and to the segregation distortion region analysis module (which outputs the segregation distortion regions of the genetic map).
For clarity, details of the methods in the examples are illustrated in the accompanying drawings, which form part of the examples and are not exhaustive. All other embodiments that a person skilled in the art can obtain from these embodiments without creative effort fall within the protection scope of the invention.
This embodiment applies the genetic map quality evaluation method to a genetic map constructed from the sequencing data of 200 rice F2 individuals.
FIG. 1 shows the original molecular marker information for each linkage group of the genetic map in the example. The first 5 columns in FIG. 1 give, respectively, the polymorphic marker, the linkage group containing the marker, the marker's genetic distance, the chromosome/scaffold containing the marker, and the marker's physical position. Columns 6 and 7 give the parental genotypes, female parent first and male parent second by default. Column 8 onward gives the progeny genotypes.
The information in FIG. 1 comes from the upstream bioinformatics analysis of the genetic map. Typically, a genetic population is constructed; inter-parental polymorphic markers such as single nucleotide polymorphism (SNP) markers are developed by high-throughput sequencing or chip technology; high-quality markers are obtained after filtering on minimum coverage depth, minimum quality value, completeness, and segregation; and the genetic map is built with existing software such as JoinMap. Once the genetic distances of the markers are available, they can be combined with the markers' physical positions and genotypes in a programming language to produce the format of FIG. 1.
For the information in FIG. 1, the genetic map quality evaluation system evaluates the map as follows:
according to the principle that the genetic distance between adjacent physical markers is not changed, if the genetic position of the first genetic marker of the linkage group is not 0, the genetic distance of the first genetic marker is subtracted from all genetic markers of the linkage group, and the genetic distance of each linkage group is ensured to be from 0 centimorgan.
Following the principle that adjacent genetic markers with identical genetic distances are regarded as duplicates, the markers on the map are deduplicated so that only one marker is retained at each genetic distance; the total number of markers before deduplication is also recorded in the data storage object for the recombination analysis. (The number of genetic markers after deduplication corresponds to the second row, marker_num, in FIG. 3; the total genetic distance to the third row, length; the average genetic distance to the fourth row, average_length; and the maximum genetic distance difference between adjacent genetic markers, i.e., the gap value, to the fifth row, max_gap.)
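The deduplication and the four FIG. 3 statistics can be sketched as below, again assuming the hypothetical `(marker_id, genetic_position_cM, physical_position_bp)` tuples, already sorted and zero-offset. The dictionary keys reuse the FIG. 3 row names; reading "average genetic distance" as the mean interval between adjacent unique markers (length divided by the number of intervals) is an assumption.

```python
from typing import Dict, List, Tuple

# Hypothetical record layout: (marker_id, genetic_position_cM, physical_position_bp)
Marker = Tuple[str, float, int]


def deduplicate(markers: List[Marker]) -> List[Marker]:
    """Keep only the first marker at each genetic position.

    Assumes the markers are sorted by genetic position, so markers that share
    a position are adjacent.
    """
    kept: List[Marker] = []
    for marker in markers:
        if kept and marker[1] == kept[-1][1]:
            continue  # same genetic distance as the previous marker: redundant
        kept.append(marker)
    return kept


def linkage_group_stats(markers: List[Marker]) -> Dict[str, float]:
    """marker_num, length, average_length and max_gap for one linkage group."""
    unique = deduplicate(markers)
    positions = [pos for _, pos, _ in unique]
    gaps = [b - a for a, b in zip(positions, positions[1:])]  # adjacent intervals
    length = positions[-1] - positions[0] if positions else 0.0
    return {
        "marker_num": len(unique),
        "length": length,
        "average_length": length / len(gaps) if gaps else 0.0,  # assumed definition
        "max_gap": max(gaps) if gaps else 0.0,
    }
```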
The total number of non-redundant markers on each linkage group, the total genetic length of each linkage group, and the gap values between adjacent genetic markers are recorded in the data storage object, as shown in column 6 of FIG. 2. It is then determined whether the order of the genetic distances is opposite to the order of the physical positions. If it is, the genetic markers on the linkage group are reordered in reverse (inverted), following the principle that the genetic distance between adjacent markers remains unchanged. After the inversion, an R plotting tool is called for collinearity visualization, as shown in FIG. 5, where the two sides of FIG. 5 are the genetic map and the physical map, respectively. An R plotting tool is also called for recombination heat map visualization, as shown in FIG. 6: the closer a color in FIG. 6 is to red, the lower the recombination rate between the genetic markers and the tighter the linkage.
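The orientation check, the inversion and a pairwise recombination matrix for the heat map might look like the sketch below. Several parts are assumptions: the hypothetical tuple layout; measuring "opposite order" as the fraction of adjacent marker pairs whose physical order disagrees with their genetic order (using the 90 percent threshold from claim 3); and deriving recombination fractions from genetic distance with the inverse Haldane map function, whereas a real pipeline may estimate them directly from genotypes. The R plotting itself is not reproduced.

```python
import math
from typing import List, Tuple

# Hypothetical record layout: (marker_id, genetic_position_cM, physical_position_bp)
Marker = Tuple[str, float, int]


def fraction_inverted(markers: List[Marker]) -> float:
    """Fraction of adjacent pairs whose physical order opposes genetic order."""
    ordered = sorted(markers, key=lambda m: m[1])
    pairs = list(zip(ordered, ordered[1:]))
    if not pairs:
        return 0.0
    return sum(1 for a, b in pairs if b[2] < a[2]) / len(pairs)


def reverse_group(markers: List[Marker]) -> List[Marker]:
    """Flip a group: new position = max position - old position.

    Adjacent genetic distances are preserved; only the orientation changes.
    """
    total = max(pos for _, pos, _ in markers)
    flipped = [(mid, round(total - pos, 6), phys) for mid, pos, phys in markers]
    return sorted(flipped, key=lambda m: m[1])


def orient(markers: List[Marker], threshold: float = 0.9) -> List[Marker]:
    """Invert the group if more than `threshold` of adjacent pairs are opposite."""
    if fraction_inverted(markers) > threshold:
        return reverse_group(markers)
    return sorted(markers, key=lambda m: m[1])


def recombination_matrix(markers: List[Marker]) -> List[List[float]]:
    """Pairwise recombination fractions, r = (1 - exp(-2d)) / 2 with d in Morgans.

    ASSUMPTION: inverse Haldane map function applied to genetic distances.
    """
    pos = [p for _, p, _ in markers]
    return [[(1 - math.exp(-2 * abs(a - b) / 100)) / 2 for b in pos] for a in pos]
```

Small values in the matrix correspond to tightly linked marker pairs, which FIG. 6 renders toward red.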
The genetic distance density between adjacent genetic markers is calculated from the gap values as genetic position interval / physical position interval (cM/Mb), which is the gap ratio of the present invention. The genetic markers in each linkage group are then scanned with a sliding window of 3 markers and a step size of 1 marker. The gap ratios of the 3 markers in each window are examined: if at most 1 marker has a gap ratio below 1 cM/Mb and the other two markers are adjacent, i.e., consecutive on the chromosome, the region is considered a recombination hotspot region (RHR). After the window scan, all RHR intervals are collected, and consecutive RHR intervals are merged into one combined RHR, i.e., column 10 of FIG. 2.
After the gap statistics are complete, the genetic map is visualized with Perl SVG, as shown in FIG. 4, where differently colored regions represent different gap ratio ranges.
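The window rule above admits more than one reading; the sketch below implements one of them. Assumptions: each marker's gap ratio is taken relative to the preceding marker (matching the per-marker gap columns of FIG. 2); a marker with no defined ratio (the first marker, or zero physical interval) counts as below the cutoff; "the other two are adjacent" is read as "the low marker is not the middle one of the window"; and overlapping or touching RHR windows are merged. Function names are hypothetical.

```python
from typing import List, Optional, Tuple

# Hypothetical record layout: (marker_id, genetic_position_cM, physical_position_bp)
Marker = Tuple[str, float, int]


def gap_ratios(markers: List[Marker]) -> List[Optional[float]]:
    """Gap ratio (cM/Mb) of each marker relative to the preceding marker.

    The first marker, and any marker at the same physical position as its
    predecessor, has no defined ratio and gets None.
    """
    ratios: List[Optional[float]] = [None]
    for prev, cur in zip(markers, markers[1:]):
        phys_mb = abs(cur[2] - prev[2]) / 1e6
        ratios.append((cur[1] - prev[1]) / phys_mb if phys_mb > 0 else None)
    return ratios


def window_is_rhr(window: List[Optional[float]], cutoff: float = 1.0) -> bool:
    """At most one marker below `cutoff` (None counts as below), and that
    marker is not the middle one, so the other two markers stay adjacent."""
    low = [i for i, r in enumerate(window) if r is None or r < cutoff]
    return not low or (len(low) == 1 and low[0] != 1)


def recombination_hotspots(markers: List[Marker]) -> List[Tuple[int, int]]:
    """Slide a 3-marker window with step 1 and merge contiguous or overlapping
    RHR windows into combined RHRs, as inclusive marker-index ranges."""
    ratios = gap_ratios(markers)
    merged: List[List[int]] = []
    for start in range(len(markers) - 2):
        end = start + 2
        if not window_is_rhr(ratios[start:end + 1]):
            continue
        if merged and start <= merged[-1][1] + 1:
            merged[-1][1] = end  # continues the previous RHR
        else:
            merged.append([start, end])
    return [(s, e) for s, e in merged]
```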
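The coloring by gap ratio range can be illustrated with the class boundaries listed in claim 6 (< 1, 1 to 2, 2 to 5, and 5 or more cM/Mb). The sketch below is a Python stand-in, not the Perl SVG module used in the invention; the half-open bin edges, the hex colors, the pixel scale, and the function names are all hypothetical, and it writes plain SVG markup with string formatting only.

```python
from typing import List, Optional

# Classes follow the ranges listed in claim 6; the hex colours are hypothetical.
GAP_RATIO_CLASSES = [
    (1.0, "<1", "#2c7bb6"),
    (2.0, "1-2", "#abd9e9"),
    (5.0, "2-5", "#fdae61"),
    (float("inf"), ">=5", "#d7191c"),
]


def classify_gap_ratio(ratio: Optional[float]) -> str:
    """Map a gap ratio (cM/Mb) to its class label; half-open bins are assumed."""
    if ratio is None:
        return "NA"
    for upper, label, _ in GAP_RATIO_CLASSES:
        if ratio < upper:
            return label
    return ">=5"  # only reached for ratio == inf (or NaN)


def linkage_group_svg(positions_cm: List[float], ratios: List[Optional[float]],
                      width: int = 40, scale: float = 5.0) -> str:
    """Draw one zero-offset linkage group as a bar of segments coloured by
    gap-ratio class (1 cM = `scale` px); ratios[i] is the interval ending at marker i."""
    colours = {label: colour for _, label, colour in GAP_RATIO_CLASSES}
    rects = []
    for i in range(1, len(positions_cm)):
        y0, y1 = positions_cm[i - 1], positions_cm[i]
        fill = colours.get(classify_gap_ratio(ratios[i]), "#cccccc")  # grey for NA
        rects.append(f'<rect x="0" y="{y0 * scale:.1f}" width="{width}" '
                     f'height="{(y1 - y0) * scale:.1f}" fill="{fill}"/>')
    height = (positions_cm[-1] if positions_cm else 0.0) * scale
    return (f'<svg xmlns="http://www.w3.org/2000/svg" width="{width}" '
            f'height="{height:.1f}">' + "".join(rects) + "</svg>")
```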
The genotype information of the original genetic markers in each linkage group is tallied: progeny genotypes identical to the female parent (recorded as female), identical to the male parent (recorded as male), and heterozygous (recorded as het). A chi-squared value is calculated by the chi-squared test and converted into a segregation distortion significance level (P value) using the chi-squared distribution, i.e., columns 11, 12, 13 and 14 of FIG. 2.
If the segregation distortion P value of a genetic marker is below the set threshold of 0.001, the marker is considered a distorted locus. If the number of progeny matching the female parent genotype (female) is greater than the number matching the male parent genotype (male), the marker is considered skewed toward the female parent; otherwise, it is considered skewed toward the male parent or toward the heterozygote. If 3 or more adjacent genetic markers in a region are all distorted and the direction of distortion is the same, i.e., all skewed toward the female parent or all toward the male parent, the region is considered a segregation distortion region (SDR) of the genetic map.
The results obtained by the above method are compiled together with the information in FIG. 1, as shown in FIG. 2, which contains the genetic map statistics. The first 5 columns are the basic information; columns 6 to 15 are, respectively: the genetic position gap to the adjacent genetic marker; the physical position gap to the adjacent genetic marker; the gap ratio described in the present invention; RHR (Y: recombination hotspot region, N: not a recombination hotspot region); RHRs (Y: merged recombination hotspot region, N: not a recombination hotspot region); P, the segregation distortion significance level; Female, the number of genotypes matching the female parent; Male, the number of genotypes matching the male parent; het, the number of heterozygous genotypes; and SDR, the segregation distortion region described in the present invention (Y: segregation distortion region, N: not a segregation distortion region).
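The text does not state the expected genotype ratio. For codominant markers in an F2 population the Mendelian expectation is 1 : 2 : 1 (female-type : het : male-type), so the sketch below assumes that ratio, which gives 2 degrees of freedom. For 2 degrees of freedom the chi-squared survival function is exactly exp(-x / 2), so no statistics library is needed. The function name is hypothetical.

```python
import math
from typing import Tuple


def segregation_test(female: int, het: int, male: int) -> Tuple[float, float]:
    """Chi-squared goodness-of-fit test of one marker's F2 genotype counts.

    ASSUMPTION: codominant marker in an F2 population, expected ratio
    female-type : het : male-type = 1 : 2 : 1 (2 degrees of freedom).
    For 2 df the chi-squared survival function is exactly exp(-x / 2).
    """
    n = female + het + male
    if n == 0:
        raise ValueError("marker has no called genotypes")
    expected = (n / 4, n / 2, n / 4)
    observed = (female, het, male)
    chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    p_value = math.exp(-chi2 / 2)
    return chi2, p_value
```

With the 200-plant population of this example, counts of 100 / 80 / 20 give a chi-squared value of 72 and a P value far below the 0.001 threshold. Dominant markers or other population types would need different expected ratios and degrees of freedom.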
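The direction call and the SDR rule (3 or more adjacent distorted markers skewed toward the same parent) can be sketched as follows. Labelling a distorted marker with equal female and male counts as "het" is an interpretation of "biased to the male parent or heterozygous"; runs of "het" markers are not reported as SDRs, since the rule requires skew toward one parent. Function names are hypothetical.

```python
from typing import List, Tuple


def distortion_direction(female: int, male: int, p_value: float,
                         alpha: float = 0.001) -> str:
    """'' if the marker is not distorted; otherwise 'female', 'male' or 'het'."""
    if p_value >= alpha:
        return ""
    if female > male:
        return "female"
    if male > female:
        return "male"
    return "het"  # equal parental counts: distortion driven by heterozygotes


def segregation_distortion_regions(directions: List[str],
                                   min_run: int = 3) -> List[Tuple[int, int]]:
    """Runs of >= min_run adjacent markers all skewed toward the same parent,
    returned as inclusive (start_index, end_index) pairs."""
    regions: List[Tuple[int, int]] = []
    start = 0
    for i in range(1, len(directions) + 1):
        # close the current run at the end of the list or when the label changes
        if i == len(directions) or directions[i] != directions[start]:
            if directions[start] in ("female", "male") and i - start >= min_run:
                regions.append((start, i - 1))
            start = i
    return regions
```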
The above description is only for the purpose of illustrating the preferred embodiments of the present invention and is not to be construed as limiting the invention, and any modifications, equivalents, improvements and the like that fall within the spirit and principle of the present invention are intended to be included therein.

Claims (10)

1. A method for assessing the quality of a genetic map, comprising the steps of:
1) performing marker index statistics on the genetic map based on the bioinformatic data of the genetic map, to obtain statistical results for the map markers;
2) performing collinearity analysis between the genetic map and the genome physical map based on the statistical results of step 1);
3) performing recombination heat map analysis of the genetic map based on the statistical results of step 1);
4) performing recombination hotspot analysis of the genetic map based on the statistical results of step 1);
5) performing segregation distortion region analysis of the genetic map based on the statistical results of step 1);
the overall quality evaluation being completed through the genetic map recombination heat map analysis, the genetic map recombination hotspot analysis and the genetic map segregation distortion region analysis.
2. The method for assessing the quality of a genetic map according to claim 1, characterized in that the marker index statistics on the genetic map in step 1) are computed as follows:
traversing the genetic distances of the genetic markers on each linkage group of the genetic map and, if the genetic distance of the first genetic marker of a linkage group is not 0, subtracting that genetic distance from all genetic markers of the linkage group according to the principle that the genetic distance between adjacent markers remains unchanged, i.e., performing offset processing;
after the offset processing, traversing the genetic distances of the genetic markers on each linkage group again and deduplicating the markers on the map according to the principle that adjacent genetic markers with the same genetic distance are regarded as duplicates, i.e., retaining only one genetic marker at each genetic distance;
counting, for each linkage group, the number of genetic markers after deduplication, the total genetic distance, the average genetic distance, and the maximum genetic distance difference between adjacent genetic markers, i.e., the gap value.
3. The method for assessing the quality of a genetic map according to claim 1, characterized in that the collinearity analysis between the genetic map and the genome physical map in step 2) is performed as follows:
based on the data obtained from the marker index statistics on the genetic map, collecting the genetic distance and physical position of the genetic markers in each linkage group and, if more than 90 percent of the genetic distances are in the opposite numerical order to the physical positions, reversing the genetic markers on the linkage group according to the principle that the genetic distance between adjacent markers remains unchanged;
visualizing the collinearity of the processed genetic map with an R plotting tool.
4. The method for assessing the quality of a genetic map according to claim 1, characterized in that the genetic map recombination heat map analysis in step 3) is: constructing a genetic marker recombination matrix from the genetic distance gap information between adjacent genetic markers obtained in the marker index statistics on the genetic map according to claim 2, and visualizing the recombination heat map with an R plotting tool.
5. The method for assessing the quality of a genetic map according to claim 1, characterized in that the genetic map recombination hotspot analysis in step 4) is: calculating an average gap value from the genetic distance gap information between adjacent genetic markers in the marker index statistics on the genetic map according to claim 2, and classifying by average gap value range; using a sliding window to identify continuous regions with an average gap value less than 1, i.e., recombination hotspot regions, and displaying the recombination hotspots with the Perl SVG module.
6. The method for assessing the quality of a genetic map according to claim 5, characterized in that the classification by average gap value range is performed by taking the genetic position interval / physical position interval between adjacent markers as the average gap value (cM/Mb), abbreviated as gap ratio, and classifying it according to the following criteria: gap ratio < 1, 1 ≤ gap ratio < 2, 2 ≤ gap ratio < 5, and gap ratio ≥ 5.
7. The method for assessing the quality of a genetic map according to claim 1, characterized in that the segregation distortion region analysis of the genetic map in step 5) is: based on the data obtained from the marker index statistics on the genetic map, performing a chi-squared test on the genotype frequencies of the genetic markers in the genetic map, inferring whether each genetic marker shows segregation distortion, and determining the direction of distortion from the genotypes.
8. The method for assessing the quality of a genetic map according to claim 7, characterized in that the segregation distortion regions are determined as follows: when adjacent genetic markers all show segregation distortion in the same direction, i.e., all skewed toward the female parent or all skewed toward the male parent, the physical position interval corresponding to that genetic distance region is defined as a segregation distortion region.
9. A quality evaluation system for a genetic map, characterized in that the system comprises a genetic map marker index statistics module, a genetic map and genome physical map collinearity analysis module, a genetic map recombination heat map analysis module, a genetic map recombination hotspot analysis module, and a genetic map segregation distortion region analysis module;
the genetic map marker index statistics module collects the bioinformatic data of the genetic map and performs statistical processing to obtain and output statistical results for the map markers;
the genetic map and genome physical map collinearity analysis module receives and further processes the statistical results transmitted by the genetic map marker index statistics module to obtain a visualization of the collinearity analysis;
the genetic map recombination heat map analysis module receives and further processes the statistical results transmitted by the genetic map marker index statistics module to obtain a visualization of the genetic map recombination heat map analysis;
the genetic map recombination hotspot analysis module receives and further processes the statistical results transmitted by the genetic map marker index statistics module to obtain a visualization of the genetic map recombination hotspot analysis;
and the genetic map segregation distortion region analysis module receives and further processes the statistical results transmitted by the genetic map marker index statistics module to obtain the segregation distortion regions of the genetic map.
10. A computer-readable storage medium, characterized in that a computer program is stored thereon, which is executed by a processor to implement the method for quality assessment of a genetic map according to any one of claims 1 to 8.
CN202110667699.7A 2021-06-16 2021-06-16 Method and device for evaluating quality of genetic map and computer-readable storage medium containing same Withdrawn CN113380321A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110667699.7A CN113380321A (en) 2021-06-16 2021-06-16 Method and device for evaluating quality of genetic map and computer-readable storage medium containing same

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202110667699.7A CN113380321A (en) 2021-06-16 2021-06-16 Method and device for evaluating quality of genetic map and computer-readable storage medium containing same

Publications (1)

Publication Number Publication Date
CN113380321A true CN113380321A (en) 2021-09-10

Family

ID=77572788

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110667699.7A Withdrawn CN113380321A (en) 2021-06-16 2021-06-16 Method and device for evaluating quality of genetic map and computer-readable storage medium containing same

Country Status (1)

Country Link
CN (1) CN113380321A (en)

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102952855A (en) * 2011-08-26 2013-03-06 深圳华大基因科技有限公司 Genetic map construction method and device, haplotype analytical method and device
CN103525917A (en) * 2013-09-24 2014-01-22 北京百迈客生物科技有限公司 Construction and evaluation of parting High Map on basis of high throughput
CN112182247A (en) * 2020-10-15 2021-01-05 华中农业大学 Genetic population map construction method and system, storage medium and electronic equipment

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
MARILYN A.L. WEST et al.: "High-density haplotyping with microarray-based expression and single feature polymorphism markers in Arabidopsis", GENOME RESEARCH, vol. 15, pages 787 - 795 *
XUEHUI HUANG et al.: "Genome-wide association study of flowering time and grain yield traits in a worldwide collection of rice germplasm", NATURE GENETICS, vol. 44, no. 1, pages 32 - 39 *

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
WW01 Invention patent application withdrawn after publication

Application publication date: 20210910