CN114023379A - Method and device for determining genotype - Google Patents
- Publication number
- CN114023379A (application CN202111658718.6A)
- Authority
- CN
- China
- Prior art keywords
- peak
- determining
- allele
- genotype
- locus
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Granted
Classifications
-
- G—PHYSICS
- G16—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
- G16B—BIOINFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR GENETIC OR PROTEIN-RELATED DATA PROCESSING IN COMPUTATIONAL MOLECULAR BIOLOGY
- G16B20/00—ICT specially adapted for functional genomics or proteomics, e.g. genotype-phenotype associations
- G16B20/20—Allele or variant detection, e.g. single nucleotide polymorphism [SNP] detection
-
- G—PHYSICS
- G16—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
- G16B—BIOINFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR GENETIC OR PROTEIN-RELATED DATA PROCESSING IN COMPUTATIONAL MOLECULAR BIOLOGY
- G16B20/00—ICT specially adapted for functional genomics or proteomics, e.g. genotype-phenotype associations
- G16B20/30—Detection of binding sites or motifs
-
- G—PHYSICS
- G16—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
- G16B—BIOINFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR GENETIC OR PROTEIN-RELATED DATA PROCESSING IN COMPUTATIONAL MOLECULAR BIOLOGY
- G16B30/00—ICT specially adapted for sequence analysis involving nucleotides or amino acids
Abstract
The application discloses a method and a device for determining genotype, wherein the method comprises the following steps: firstly, dividing mass spectrum peaks in a nucleic acid mass spectrogram into a plurality of site units according to site molecular weights; each site unit comprises a primer mass spectrum peak and at least one sample mass spectrum peak; second, for any sample mass spectrum peak in the site unit: fitting the sample mass spectrum peak according to a bell-shaped curve to obtain fitting function data, and acquiring a characteristic value of the sample mass spectrum peak from the fitting function data, wherein the characteristic value is peak height, peak area or signal-to-noise ratio; determining validity parameters of sample mass spectral peaks based on the fitting function data; if the validity parameter meets the validity threshold, determining the sample mass spectrum peak as an allele peak; and finally, determining the genotype of the locus unit based on the locus molecular weight and the characteristic value corresponding to the allele peak in the locus unit. Therefore, the genotype of the site unit can be automatically determined based on the nucleic acid mass spectrogram and the site molecular weight, and the accuracy of genotype analysis is improved.
Description
Technical Field
The invention belongs to the technical field of nucleic acid mass spectrometry, and particularly relates to a method and a device for determining a genotype.
Background
Nucleic acid mass spectrometry performs accurate detection after multi-site single-base extension following polymerase chain reaction (PCR) amplification, which makes it an essential detection means in biological analysis. However, the accurate analysis and expression of mass spectrometry data, i.e., genotyping, is often overlooked. Because the accuracy of the genotype analysis result plays an important role in clinical analysis, genotype analysis is an essential step in processing the data of mass spectrometry experiments.
In modern clinical applications, most physiological or disease traits are caused by the regulation of a series of genes on a nucleic acid sequence, and genotype expression plays an extremely important role in these traits as well as in drug selection, dosage guidance and other aspects; therefore, accurate genotype analysis is particularly important. Existing common genotype analysis techniques include restriction fragment length polymorphism, terminal restriction fragment length polymorphism, amplified fragment length polymorphism, multiplex ligation-dependent probe amplification and the like; however, these techniques can only make approximate predictions and cannot obtain accurate genotype analysis results. For this reason, it is necessary to provide a highly reliable and highly accurate method for genotyping.
Disclosure of Invention
In view of this, embodiments of the present invention provide a method and an apparatus for determining a genotype, which can accurately analyze mass spectrometry experimental data, and improve reliability and accuracy of genotype analysis.
To achieve the above object, according to a first aspect of embodiments of the present invention, there is provided a method for determining a genotype, the method comprising: dividing mass spectrum peaks in the nucleic acid mass spectrogram into a plurality of site units according to site molecular weights, each of the site units comprising a primer mass spectrum peak and at least one sample mass spectrum peak; for any sample mass spectrum peak in the site unit: fitting the sample mass spectrum peak according to a bell-shaped curve to obtain fitting function data, and obtaining a characteristic value of the sample mass spectrum peak from the fitting function data, wherein the characteristic value is peak height, peak area or signal-to-noise ratio; determining a validity parameter of the sample mass spectrum peak based on the fitting function data; if the validity parameter meets a validity threshold, determining the sample mass spectrum peak as an allele peak; and determining the genotype of the site unit based on the site molecular weight and the characteristic value corresponding to the allele peak in the site unit.
Optionally, the determining the genotype of the locus unit based on the locus molecular weight and the characteristic value corresponding to the allele peak in the locus unit includes: if the characteristic value of only one allele peak of the locus unit meets the characteristic value threshold, determining the genotype of the locus unit based on the locus molecular weight corresponding to that allele peak; and if the characteristic values of two or more allele peaks in the locus unit all meet the characteristic value threshold, determining the genotype of the locus unit based on the locus molecular weight and the characteristic value corresponding to each allele peak.
Optionally, if the characteristic values of two allele peaks in the locus unit both meet the characteristic value threshold, the determining the genotype of the locus unit based on the locus molecular weight and the characteristic value corresponding to each allele peak includes: taking the allele peak with the largest locus molecular weight as the main allele and the allele peak with the smallest locus molecular weight as the secondary allele; determining the product intensity of the locus unit based on the characteristic values of the two allele peaks; acquiring the characteristic value of the primer mass spectrum peak, and determining the elongation of the locus unit based on the product intensity and the characteristic value of the primer mass spectrum peak; if the product intensity and the elongation both meet a preset condition, determining a penalty value of the locus unit based on the characteristic value of the secondary allele and the characteristic value of the main allele; and judging whether the penalty value meets a preset penalty value, and if so, recording the genotype of the locus unit as secondary allele-main allele.
Optionally, if the characteristic values of more than two allele peaks in the locus unit all meet the characteristic value threshold, the determining the genotype of the locus unit based on the locus molecular weight and the characteristic value corresponding to each allele peak includes: determining the product intensity of the locus unit based on the characteristic values of the more than two allele peaks; acquiring the characteristic value of the primer mass spectrum peak, and determining the elongation of the locus unit based on the product intensity and the characteristic value of the primer mass spectrum peak; if the product intensity and the elongation both meet a preset condition, then for any allele peak in the locus unit, determining a frequency of the allele peak based on the characteristic value of the allele peak and the product intensity; arranging the frequencies of all allele peaks in the locus unit in descending order, and taking the allele peak with the highest frequency as the initial gene peak of the genotype; then judging, in that order, whether the frequency of each allele peak meets a preset frequency threshold, and arranging the allele peaks that meet the preset frequency threshold in sequence after the initial gene peak to obtain the genotype sequence; and determining the genotype of the locus unit based on the locus molecular weights corresponding to the allele peaks in the genotype sequence.
Optionally, the determining the validity parameter of the sample mass spectrum peak based on the fitting function data includes: determining a signal-to-noise ratio parameter, a resolution parameter, a peak offset parameter, a peak width parameter, and a peak shape parameter of the sample mass spectrum peak based on the fitting function data of the sample mass spectrum peak; and determining the validity parameter of the sample mass spectrum peak based on the signal-to-noise ratio parameter, the resolution parameter, the peak offset parameter, the peak width parameter and the peak shape parameter.
Optionally, the method further includes: determining a genotype reliability parameter for the locus unit; judging whether the genotype reliability parameters meet a preset reliability threshold value or not; and determining the genotype reliability result of the site unit based on the judgment result.
Optionally, the determining the genotype reliability parameter of the site unit includes: selecting the minimum validity parameter in the site unit as the quality parameter of the site unit; determining an influence parameter of the site unit; determining a yield parameter of the site unit based on the characteristic values of all allele peaks in the site unit and the characteristic value of the primer mass spectrum peak; and determining the genotype reliability parameter of the site unit based on the quality parameter, the influence parameter and the yield parameter.
Optionally, the determining the influence parameter of the site unit includes: if only one allele peak exists in the site unit, determining that the influence parameter of the site unit is a constant; and if two or more allele peaks exist in the site unit, taking the allele peak with the largest site molecular weight as the main allele and the allele peak with the smallest site molecular weight as the secondary allele, and determining the influence parameter of the site unit based on the ratio between the secondary allele and the main allele.
Optionally, the preset reliability threshold includes a first preset reliability threshold, a second preset reliability threshold, and a third preset reliability threshold; the determining the genotype reliability result of the locus unit based on the judgment result includes: if the genotype reliability parameter is smaller than the first preset reliability threshold, determining that the genotype reliability result of the locus unit is the low probability type; if the genotype reliability parameter is larger than the first preset reliability threshold and smaller than the second preset reliability threshold, determining that the genotype reliability result of the locus unit is the possible type; if the genotype reliability parameter is larger than the second preset reliability threshold and smaller than the third preset reliability threshold, determining that the genotype reliability result of the locus unit is the positive type; and if the genotype reliability parameter is larger than the third preset reliability threshold, determining that the genotype reliability result of the locus unit is the determined type.
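As an illustration only, the four-tier reliability decision described above can be sketched in Python. The function name `classify_reliability` and the example thresholds 0.25, 0.5 and 0.75 are hypothetical; the application does not state threshold values here. The text uses strict inequalities and leaves the case of exact equality open, so this sketch assumes that a value equal to a threshold falls into the higher tier.

```python
def classify_reliability(score, t1, t2, t3):
    """Map a genotype reliability parameter to one of four result types.

    t1 < t2 < t3 are the first, second and third preset reliability
    thresholds. Assumption (not stated in the text): a score exactly
    equal to a threshold is placed in the higher tier.
    """
    if not (t1 < t2 < t3):
        raise ValueError("thresholds must be strictly increasing")
    if score < t1:
        return "low probability"
    if score < t2:
        return "possible"
    if score < t3:
        return "positive"
    return "determined"


# Hypothetical thresholds, chosen only for the example.
for value in (0.10, 0.40, 0.60, 0.90):
    print(value, classify_reliability(value, 0.25, 0.5, 0.75))
```

Keeping the thresholds as arguments rather than constants reflects that they are preset values chosen by the user of the method.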
To achieve the above object, according to a second aspect of the embodiments of the present application, there is also provided an apparatus for determining a genotype, the apparatus comprising: a dividing module, configured to divide mass spectrum peaks in the nucleic acid mass spectrogram into a plurality of site units according to site molecular weights, each of the site units comprising a primer mass spectrum peak and at least one sample mass spectrum peak; a first determination module, configured to, for any sample mass spectrum peak in the site unit: fit the sample mass spectrum peak according to a bell-shaped curve to obtain fitting function data, and obtain a characteristic value of the sample mass spectrum peak from the fitting function data, wherein the characteristic value is peak height, peak area or signal-to-noise ratio; determine a validity parameter of the sample mass spectrum peak based on the fitting function data; and, if the validity parameter meets a validity threshold, determine the sample mass spectrum peak as an allele peak; and a second determination module, configured to determine the genotype of the site unit based on the site molecular weight and the characteristic value corresponding to the allele peak in the site unit.
To achieve the above object, according to a third aspect of embodiments of the present application, there is further provided a computer readable medium having a computer program stored thereon, the computer program, when executed by a processor, implementing the method for determining a genotype as described in the first aspect.
To achieve the above object, according to a fourth aspect of embodiments of the present application, there is also provided an electronic device, including: a processor; and a memory for storing processor-executable instructions; the processor being configured to read the executable instructions from the memory and execute them to carry out the method for determining a genotype as described in the first aspect.
Compared with the prior art, the method and the device for determining the genotype provided by the embodiments of the application proceed as follows: first, mass spectrum peaks in a nucleic acid mass spectrogram are divided into a plurality of site units according to site molecular weights, each of the site units comprising a primer mass spectrum peak and at least one sample mass spectrum peak; then, for any sample mass spectrum peak in the site unit: the sample mass spectrum peak is fitted according to a bell-shaped curve to obtain fitting function data, and a characteristic value of the sample mass spectrum peak is obtained from the fitting function data, wherein the characteristic value is peak height, peak area or signal-to-noise ratio; a validity parameter of the sample mass spectrum peak is determined based on the fitting function data; if the validity parameter meets a validity threshold, the sample mass spectrum peak is determined as an allele peak; and finally, the genotype of the site unit is determined based on the site molecular weight and the characteristic value corresponding to the allele peak in the site unit. Therefore, the genotype of the site unit can be determined based on the nucleic acid mass spectrogram and the site molecular weight, which improves the accuracy of genotype analysis and solves the problem in the prior art that the genotype of a nucleic acid sequence cannot be accurately predicted by techniques such as restriction fragment length polymorphism, terminal restriction fragment length polymorphism, amplified fragment length polymorphism and multiplex ligation-dependent probe amplification.
It is to be understood that the teachings of this application need not achieve all of the above-described benefits, but rather that specific embodiments may achieve specific technical results, and that other embodiments of this application may achieve benefits not mentioned above.
Drawings
The drawings are included to provide a further understanding of the application and are not to be construed as limiting the application. Wherein like or corresponding reference numerals designate like or corresponding parts throughout the several views.
FIG. 1 is a schematic flow chart of a method for determining genotype in one embodiment of the present application;
FIG. 2 is a schematic representation of site unit mass spectra peaks in an embodiment of the present application;
FIG. 3 is a schematic flow chart of a method of genotyping in another embodiment of the present application;
FIG. 4 is a schematic block diagram of an apparatus for genotyping according to an embodiment of the present application.
Detailed Description
The following description of the exemplary embodiments of the present application, taken in conjunction with the accompanying drawings, includes various details of the embodiments of the application for the understanding of the same, which are to be considered exemplary only. Accordingly, those of ordinary skill in the art will recognize that various changes and modifications of the embodiments described herein can be made without departing from the scope and spirit of the present application. Also, descriptions of well-known functions and constructions are omitted in the following description for clarity and conciseness.
FIG. 1 is a schematic flow chart of a method for determining a genotype according to an embodiment of the present application; FIG. 2 is a schematic diagram of site unit mass spectrum peaks in an embodiment of the present application.
A method for determining genotype, the method comprising at least the following procedures: S101, dividing mass spectrum peaks in a nucleic acid mass spectrogram into a plurality of site units according to site molecular weights, each site unit comprising a primer mass spectrum peak and at least one sample mass spectrum peak; S102, for any sample mass spectrum peak in the site unit: fitting the sample mass spectrum peak according to a bell-shaped curve to obtain fitting function data, and acquiring a characteristic value of the sample mass spectrum peak from the fitting function data, wherein the characteristic value is peak height, peak area or signal-to-noise ratio; determining a validity parameter of the sample mass spectrum peak based on the fitting function data; if the validity parameter meets the validity threshold, determining the sample mass spectrum peak as an allele peak; S103, determining the genotype of the site unit based on the site molecular weight and the characteristic value corresponding to the allele peak in the site unit.
In S101, according to the arrangement of the sites in the nucleic acid mass spectrum detection reagent, mass spectrum peaks in a nucleic acid mass spectrum output by a nucleic acid mass spectrometer are sequentially divided into a plurality of site units according to site molecular weights, each site unit has the same primer mass spectrum peak, but the number of sample mass spectrum peaks in each site unit is not necessarily the same, each site unit at least comprises one sample mass spectrum peak, and the number of the sample mass spectrum peaks is not more than five. For example: the number of the sample mass spectrum peaks in the first site unit is one, the number of the sample mass spectrum peaks in the second site unit is three, and the number of the sample mass spectrum peaks in the third site unit is five.
As shown in FIG. 2, the primer mass spectrum peak UP of the site unit serves as the reference peak, and the four sample mass spectrum peaks are PC, PA, PG and PU. Based on the positions of the four sample mass spectrum peaks in the mass spectrogram, the corresponding site molecular weights are determined as follows: the site molecular weight of PC is the UP site molecular weight + 247.2 Da; the site molecular weight of PA is the UP site molecular weight + 271.2 Da; the site molecular weight of PG is the UP site molecular weight + 287.2 Da; the site molecular weight of PU is the UP site molecular weight + 327.2 Da or 262.2 Da.
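The relationship between the primer peak UP and the sample peaks can be sketched as a small lookup, shown here only as an illustration. The offsets for PC, PA and PG are the ones listed for FIG. 2; PU is left out because two candidate values are given for it. The function names, the primer weight of 5000.0 Da and the matching tolerance of 2.0 Da are hypothetical values chosen for the example.

```python
# Mass offsets (Da) relative to the primer peak UP, as listed for FIG. 2.
# PU is omitted because the text gives two candidate values for it.
EXTENSION_OFFSETS_DA = {"C": 247.2, "A": 271.2, "G": 287.2}


def expected_site_weights(primer_weight_da):
    """Return the expected site molecular weight of each sample peak."""
    return {
        base: round(primer_weight_da + offset, 1)
        for base, offset in EXTENSION_OFFSETS_DA.items()
    }


def assign_base(observed_da, primer_weight_da, tolerance_da=2.0):
    """Return the base whose expected weight is closest to the observed
    peak position, or None if no expected weight lies within tolerance_da.
    The tolerance is an assumption, not a value from the application."""
    best_base = None
    best_error = tolerance_da
    for base, expected in expected_site_weights(primer_weight_da).items():
        error = abs(observed_da - expected)
        if error <= best_error:
            best_base, best_error = base, error
    return best_base


# Hypothetical primer peak at 5000.0 Da.
print(expected_site_weights(5000.0))
print(assign_base(5271.9, 5000.0))
```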
In S102, the function expression corresponding to the bell-shaped curve is expressed as follows:
formula (1);
formula (2);
formula (3);
V = A/SNR formula (4).
The quantities appearing in formulas (1) to (4) are: the peak height above the baseline at the center of the fitted peak; the fitted line width; the fitted peak center; the expected molecular weight of the sample mass spectrum peak; the peak offset; the peak area A between the fitted peak and the baseline; the signal-to-noise ratio SNR; the area variance V; and y_i together with a second coordinate, which are the peak height and the molecular weight corresponding to any point on the sample mass spectrogram. Fitting with these formulas yields the fitting function data, which includes, but is not limited to, the fitted quantities listed above, among them A.
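The bell-shaped fit of S102 can be sketched as follows, for illustration only. The fitting function itself is not reproduced in this text, so the sketch assumes a Gaussian, y = H * exp(-(x - x0)^2 / (2 * w^2)), and fits it by least squares on ln(y), which is a substitute technique and not necessarily the one used in the application. All function names are hypothetical. The peak offset is taken as fitted center minus expected molecular weight, which is likewise an assumption; only the relation V = A/SNR comes from formula (4).

```python
import math


def det3(m):
    """Determinant of a 3x3 matrix given as nested lists."""
    return (m[0][0] * (m[1][1] * m[2][2] - m[1][2] * m[2][1])
            - m[0][1] * (m[1][0] * m[2][2] - m[1][2] * m[2][0])
            + m[0][2] * (m[1][0] * m[2][1] - m[1][1] * m[2][0]))


def solve3(m, rhs):
    """Solve a 3x3 linear system by Cramer's rule."""
    d = det3(m)
    if d == 0:
        raise ValueError("singular system")
    solution = []
    for col in range(3):
        replaced = [row[:] for row in m]
        for r in range(3):
            replaced[r][col] = rhs[r]
        solution.append(det3(replaced) / d)
    return solution


def fit_gaussian(xs, ys, expected_mass):
    """Fit an assumed Gaussian peak by a quadratic fit to ln(y)."""
    pts = [(x, y) for x, y in zip(xs, ys) if y > 0]
    if len(pts) < 3:
        raise ValueError("need at least three points with positive intensity")
    shift = sum(x for x, _ in pts) / len(pts)  # centre x for stability
    u = [x - shift for x, _ in pts]
    v = [math.log(y) for _, y in pts]
    # Normal equations for v ~ a + b*u + c*u^2.
    s = [sum(ui ** k for ui in u) for k in range(5)]
    t = [sum(vi * ui ** k for ui, vi in zip(u, v)) for k in range(3)]
    a, b, c = solve3([[s[0], s[1], s[2]],
                      [s[1], s[2], s[3]],
                      [s[2], s[3], s[4]]], t)
    if c >= 0:
        raise ValueError("data are not peak-shaped")
    width = math.sqrt(-1.0 / (2.0 * c))
    center = shift - b / (2.0 * c)
    height = math.exp(a - b * b / (4.0 * c))
    area = height * width * math.sqrt(2.0 * math.pi)
    return {"height": height, "width": width, "center": center,
            "offset": center - expected_mass, "area": area}


def area_variance(area, snr):
    """Formula (4): V = A / SNR."""
    return area / snr


# Synthetic, noise-free peak used only to exercise the sketch.
xs = [5269.2 + 0.25 * i for i in range(17)]
ys = [100.0 * math.exp(-(x - 5271.2) ** 2 / (2 * 0.8 ** 2)) for x in xs]
print(fit_gaussian(xs, ys, expected_mass=5271.0))
```

A fit on ln(y) is simple and needs no external library, but it gives low-intensity points in the tails more influence than a direct nonlinear fit would, so on noisy spectra a weighted or nonlinear least-squares fit may be preferable.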
Then, a normalization model can be trained by using a deep learning method, and the normalization model is used for carrying out prediction processing on fitting function data to obtain the validity parameters of the sample mass spectrum peak; or, carrying out normalization calculation on the fitting function data by using the existing algorithm to obtain the validity parameters of the sample mass spectrum peak.
Finally, whether the validity parameter corresponding to the sample mass spectrum peak is greater than a validity threshold value or not is judged, and if the validity parameter is greater than the validity threshold value, the sample mass spectrum peak is determined to be an allele peak; if the validity parameter is not greater than the validity threshold, determining that the sample mass spectrum peak is not an allele peak. Here, the validity threshold is preset and obtained based on practical experience.
It should be noted that the choice of yield detection factor depends on the purpose of the nucleic acid detection. When the method is used for detecting genotype, peak height or signal-to-noise ratio is the first choice of yield detection factor and peak area is the second choice; when the method is used for detecting gene mutation, peak area is the first choice and peak height or signal-to-noise ratio is the second choice. When choosing between peak height and signal-to-noise ratio, the following rule is usually applied: peak height is selected as the yield detection factor when the sample concentration is high, and signal-to-noise ratio is selected when the sample concentration is low. Therefore, according to the purpose of the nucleic acid detection, different yield detection factors need to be selected from the fitting function data as characteristic values.
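The selection rule for the characteristic value can be written down compactly, as an illustrative sketch. The function name `feature_preference`, the purpose labels `"genotype"` and `"mutation"`, and the boolean `high_concentration` flag are hypothetical; the text does not say how "high" and "low" sample concentration are distinguished, so that decision is left to the caller.

```python
def feature_preference(purpose, high_concentration):
    """Return the yield detection factors in order of preference.

    purpose: "genotype" (genotype detection) or "mutation"
             (gene mutation detection); both labels are hypothetical.
    high_concentration: True selects peak height, False selects the
             signal-to-noise ratio, following the rule in the text.
    """
    height_or_snr = "peak_height" if high_concentration else "snr"
    if purpose == "genotype":
        return [height_or_snr, "peak_area"]
    if purpose == "mutation":
        return ["peak_area", height_or_snr]
    raise ValueError("purpose must be 'genotype' or 'mutation'")


print(feature_preference("genotype", high_concentration=True))
print(feature_preference("mutation", high_concentration=False))
```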
In S103, if there is no allele peak in the site unit, the genotype of the site unit is determined to be inconclusive (no result). If the site unit has an allele peak, the genotype of the site unit is determined based on the site molecular weight and the characteristic value corresponding to the allele peak in the site unit. Illustratively, if the characteristic value of only one allele peak of the site unit meets the characteristic value threshold, the genotype of the site unit is determined based on the site molecular weight corresponding to that allele peak; if the characteristic values of two or more allele peaks in the site unit all meet the characteristic value threshold, the genotype of the site unit is determined based on the site molecular weight and the characteristic value corresponding to each allele peak; and if none of the characteristic values of the allele peaks of the site unit meets the characteristic value threshold, the genotype of the site unit is determined to be no result.
In the method for determining the genotype provided by this embodiment, for the sample mass spectrum peaks of any site unit in a nucleic acid mass spectrogram: the sample mass spectrum peak is fitted to obtain fitting function data; the validity parameter of the sample mass spectrum peak is determined based on the fitting function data, and whether the sample mass spectrum peak is an allele peak is then judged based on the validity parameter; finally, the genotype of the site unit is determined based on the site molecular weight and the characteristic value corresponding to the allele peak in the site unit. Therefore, the genotype of the site unit can be determined based on the nucleic acid mass spectrogram and the site molecular weight, which improves the accuracy of the genotype analysis result and solves the problem in the prior art that the genotype of a nucleic acid sequence cannot be accurately predicted by techniques such as restriction fragment length polymorphism, terminal restriction fragment length polymorphism, amplified fragment length polymorphism and multiplex ligation-dependent probe amplification.
FIG. 3 is a schematic flow chart of a method for determining genotype in another embodiment of the present application. This embodiment is further optimized on the basis of the previous embodiment. The method at least comprises the following operation flows: S301, dividing mass spectrum peaks in a nucleic acid mass spectrogram into a plurality of site units according to site molecular weights, each site unit comprising a primer mass spectrum peak and at least one sample mass spectrum peak; S302, for any sample mass spectrum peak in the site unit: fitting the sample mass spectrum peak according to a bell-shaped curve to obtain fitting function data, and acquiring a characteristic value of the sample mass spectrum peak from the fitting function data, wherein the characteristic value is peak height, peak area or signal-to-noise ratio; determining a validity parameter of the sample mass spectrum peak based on the fitting function data; if the validity parameter meets the validity threshold, determining the sample mass spectrum peak as an allele peak; S303, determining the genotype of the site unit based on the site molecular weight and the characteristic value corresponding to the allele peak in the site unit; S304, determining the genotype reliability parameter of the site unit; S305, judging whether the genotype reliability parameter meets a preset reliability threshold; and S306, determining the genotype reliability result of the site unit based on the judgment result.
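The branching in S103 can be sketched as follows, purely as an illustration. The `Peak` record, the function `call_site` and the threshold values in the example are hypothetical. The sketch assumes that a validity parameter must be strictly greater than its threshold (as stated for the validity check) and that a characteristic value "meets" its threshold when it is greater than or equal to it. For two or more passing peaks it only returns the candidate alleles ordered by site molecular weight; the further checks on product intensity, elongation, penalty value or frequency are not modelled.

```python
from dataclasses import dataclass


@dataclass
class Peak:
    base: str              # base implied by the site molecular weight
    site_weight_da: float  # site molecular weight of the peak
    validity: float        # validity parameter from the fit
    feature: float         # characteristic value (height, area or SNR)


def call_site(peaks, validity_threshold, feature_threshold):
    """Return (status, bases) for one site unit."""
    # Step 1: keep only sample peaks that qualify as allele peaks.
    alleles = [p for p in peaks if p.validity > validity_threshold]
    if not alleles:
        return ("no result", [])
    # Step 2: keep allele peaks whose characteristic value meets the
    # threshold (">=" is an assumption about what "meets" means).
    passing = [p for p in alleles if p.feature >= feature_threshold]
    if not passing:
        return ("no result", [])
    passing.sort(key=lambda p: p.site_weight_da)
    if len(passing) == 1:
        return ("single allele", [passing[0].base])
    # Two or more peaks: further rules apply and are not modelled here.
    return ("multiple alleles", [p.base for p in passing])


# Hypothetical site unit with one strong and one weak peak.
site = [Peak("A", 5271.2, 0.93, 40.0), Peak("G", 5287.2, 0.91, 3.0)]
print(call_site(site, validity_threshold=0.8, feature_threshold=10.0))
```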
The specific implementation process of S301 is similar to the specific implementation process of S101 in the embodiment shown in fig. 1, and is not described here again.
In S302, the fitting function data is normalized by using the existing algorithm, so as to obtain the validity parameter of the sample mass spectrum peak. Illustratively, determining a signal-to-noise ratio parameter, a resolution parameter, a peak offset parameter, a peak width parameter, and a peak shape parameter of the sample mass spectral peak based on fitting function data of the sample mass spectral peak; and determining the validity parameter of the sample mass spectrum peak based on the signal-to-noise ratio parameter, the resolution parameter, the peak offset parameter, the peak width parameter and the peak shape parameter.
The signal-to-noise ratio parameter of the sample mass spectrum peak expresses the effect of baseline variation (chemical noise) on the peak height. Its calculation formula uses the validity threshold of the sample mass spectrum peak, preferably 0.8, and the signal-to-noise threshold for identifying sample mass spectrum peaks, preferably 1.5.
The resolution parameter of the sample mass spectrum peak expresses the positional relationship between the target peak and the surrounding peaks. Its calculation formula uses the mass and the peak shape weight of the i-th sample mass spectrum peak, and the mass and the peak shape weight of the sample mass spectrum peaks adjacent to the i-th sample mass spectrum peak. The weight of a strong sample mass spectrum peak or primer mass spectrum peak is 1, and the weight of a weak sample mass spectrum peak or primer mass spectrum peak is 0.05. The formula further uses a special parameter, preferably 0.7, and a parameter calculated from the molecular weight at the j-th sample mass spectrum peak.
The offset parameter of the sample mass spectrum peak expresses how close the fitted peak is to the mass signal of the sample mass spectrum peak. Its calculation formula uses the validity threshold of the sample mass spectrum peak, preferably 0.8, and a special parameter for calculating the offset parameter, preferably 0.7.
In the calculation of the peak width parameter, D is a special parameter, preferably -0.001 or -0.0005.
In the calculation of the peak shape parameter, E is a special parameter, preferably 0.2 or 0.1; the formula also uses the square root of the fitted area difference, that is, of the sum of squared differences between the fitted intensity and the measured intensity.
The formula for calculating the validity parameter of the sample mass spectrum peak is as follows:
In S303, determining the genotype of a site unit based on a single allele peak comprises: if the characteristic value of only one allele peak in the site unit meets the characteristic value threshold, determining the genotype of the site unit based on the site molecular weight corresponding to that allele peak. For example, if the peak height of only one allele peak in a site unit is greater than the peak height threshold, the genotype of the site unit is determined according to the site molecular weight corresponding to that allele peak. The peak height threshold may be 1.0.
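The single-peak case can be sketched as follows. This is only an illustration under stated assumptions: the lookup table `allele_by_mass` (expected site molecular weight to allele label), the mass `tolerance` and the function name `call_single_allele` are hypothetical and do not appear in the text; only the peak height threshold of 1.0 comes from the paragraph above.

```python
def call_single_allele(allele_peaks, allele_by_mass,
                       height_threshold=1.0, tolerance=1.0):
    """Return the allele label when exactly one allele peak passes the threshold.

    allele_peaks: list of (mass, peak_height) tuples already accepted as
        allele peaks in S302.
    allele_by_mass: hypothetical table {expected site molecular weight: label}.
    tolerance: hypothetical maximum mass deviation for a match.
    """
    passing = [(mass, h) for mass, h in allele_peaks if h > height_threshold]
    if len(passing) != 1:
        return None  # not the single-peak case
    mass = passing[0][0]
    # Pick the expected site molecular weight closest to the observed mass.
    expected, label = min(allele_by_mass.items(),
                          key=lambda item: abs(item[0] - mass))
    if abs(expected - mass) > tolerance:
        return None  # no expected allele close enough to the observed peak
    return label
```

For instance, with expected masses 5201.4 for "C" and 5217.4 for "T", a site unit whose only peak above the threshold lies at mass 5201.2 would be called "C".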
Determining the genotype of a site unit based on two allele peaks comprises: if the characteristic values of two allele peaks in the site unit both meet the characteristic value threshold, taking the allele peak with the largest site molecular weight as the major allele and the allele peak with the smallest site molecular weight as the minor allele; determining the product intensity of the site unit based on the characteristic values of the two allele peaks; acquiring the characteristic value of the primer mass spectrum peak, and determining the elongation rate of the site unit based on the product intensity and the characteristic value of the primer mass spectrum peak; if the product intensity and the elongation rate both meet preset conditions, determining a penalty value of the site unit based on the characteristic value of the minor allele and the characteristic value of the major allele; and judging whether the penalty value meets a preset penalty value, and if so, recording the genotype of the site unit as minor allele-major allele.
For example: S11, if the peak heights of two allele peaks in the site unit are both greater than the peak height threshold, taking the allele peak with the largest site molecular weight as the major allele and the allele peak with the smallest site molecular weight as the minor allele.
S12, summing the peak height of the major allele and the peak height of the minor allele to obtain the product intensity M of the site unit.
S13, acquiring the peak height of the primer mass spectrum peak, and summing the product intensity and the peak height of the primer mass spectrum peak to obtain a summation result; taking the ratio of the product intensity to the summation result to obtain the elongation rate of the site unit.
S14, judging whether the product intensity of the site unit is greater than a preset intensity and the elongation rate is greater than a preset elongation rate; if the judgment result indicates that the product intensity is greater than the preset intensity and the elongation rate is greater than the preset elongation rate, determining that the genotype of the site unit has a result, and executing step S15.
If the judgment result indicates that either the product intensity or the elongation rate does not meet the preset condition, determining that the genotype of the site unit is no result.
S15, taking the ratio of the peak height of the minor allele to the sum of the peak heights of the minor allele and the major allele to obtain a parameter X; calculating a penalty value Y from the parameter X according to a penalty value calculation formula, in which the parameter sets are A = (1.0, 0.5, 0) and B = (0.1, 0.4, 0.1), and i is the array subscript.
S16, judging whether the penalty value is less than a preset penalty value; if the judgment result indicates that the penalty value is less than the preset penalty value, recording the genotype of the site unit as minor allele-major allele; if the judgment result indicates that the penalty value is not less than the preset penalty value, determining that the genotype of the site unit is no result.
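The two-peak procedure can be sketched as follows. The penalty formula itself is not reproduced in this text, so the sketch takes it as a caller-supplied function `penalty_fn`; that argument, the function name `call_two_alleles` and all threshold values used with it are assumptions for illustration only.

```python
def call_two_alleles(major_height, minor_height, primer_height,
                     min_intensity, min_elongation, penalty_fn, max_penalty):
    """Return intermediate values and the call, or None for "no result".

    penalty_fn: stand-in for the penalty value formula, which is not
        available here; it maps the parameter X to the penalty value Y.
    Peak heights are assumed positive (both exceed the peak height threshold).
    """
    # Product intensity M = sum of the two allele peak heights.
    product_intensity = major_height + minor_height
    # Elongation rate = M / (M + primer peak height).
    elongation = product_intensity / (product_intensity + primer_height)
    # Both quantities must exceed their preset values.
    if not (product_intensity > min_intensity and elongation > min_elongation):
        return None
    # Parameter X = minor / (minor + major), then penalty value Y.
    x = minor_height / (minor_height + major_height)
    penalty = penalty_fn(x)
    # Accept only if the penalty is below the preset value.
    if not penalty < max_penalty:
        return None
    return {
        "product_intensity": product_intensity,
        "elongation": elongation,
        "x": x,
        "penalty": penalty,
        "genotype": "minor allele-major allele",
    }
```

With peak heights 6.0 (major), 4.0 (minor) and 2.5 (primer), the product intensity is 10.0, the elongation rate is 0.8 and X is 0.4; whether a genotype is then recorded depends entirely on the penalty function and the preset values supplied by the caller.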
Determining the genotype of a site unit based on more than two allele peaks comprises: if the characteristic values of more than two allele peaks in the site unit all meet the characteristic value threshold, determining the product intensity of the site unit based on the characteristic values of these allele peaks; acquiring the characteristic value of the primer mass spectrum peak, and determining the elongation rate of the site unit based on the product intensity and the characteristic value of the primer mass spectrum peak; if the product intensity and the elongation rate both meet preset conditions, for any allele peak in the site unit, determining the frequency of the allele peak based on the characteristic value of the allele peak and the product intensity; arranging the frequencies of all allele peaks in the site unit in descending order, taking the allele peak with the highest frequency in the ordering as the initial gene peak of the genotype, then judging in order whether the frequency of each allele peak meets a preset frequency threshold, and arranging the allele peaks that meet the preset frequency threshold in order after the initial gene peak to obtain the genotype ordering; and determining the genotype of the site unit based on the site molecular weights corresponding to the allele peaks in the genotype ordering.
For example: S11, if the peak heights of more than two allele peaks in the site unit are all greater than the peak height threshold, summing the peak heights of these allele peaks to obtain the product intensity M of the site unit.
S12, acquiring the peak height of the primer mass spectrum peak, and summing the product intensity and the peak height of the primer mass spectrum peak to obtain a summation result; taking the ratio of the product intensity to the summation result to obtain the elongation rate of the site unit.
S13, judging whether the product intensity of the site unit is greater than a preset intensity and the elongation rate is greater than a preset elongation rate; if the judgment result indicates that the product intensity is greater than the preset intensity and the elongation rate is greater than the preset elongation rate, determining that the genotype of the site unit has a result, and executing step S14.
If the judgment result indicates that either the product intensity or the elongation rate does not meet the preset condition, determining that the genotype of the site unit is no result.
S14, for any allele peak in the site unit: taking the ratio of the peak height of the allele peak to the product intensity to obtain the frequency of the allele peak; arranging the frequencies of all allele peaks in the site unit in descending order, and taking the allele peak with the highest frequency in the ordering as the initial gene peak of the genotype.
S15, for any allele peak in the ordering: judging in order whether the frequency of the allele peak is greater than the preset frequency threshold; if so, appending the allele peak after the initial gene peak in order; if not, discarding the allele peak. The remaining allele peaks are judged in the same way in order, thereby obtaining the genotype ordering.
S16, determining the genotype of the site unit based on the site molecular weights corresponding to the allele peaks in the genotype ordering.
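The frequency-ranking steps for more than two allele peaks can be sketched as follows. The function name `call_multi_alleles`, the use of allele labels as dictionary keys and the threshold values are hypothetical. The sketch also assumes that the initial gene peak is kept unconditionally and that only the subsequent peaks are compared with the frequency threshold, which is one reading of the steps above.

```python
def call_multi_alleles(allele_heights, primer_height,
                       min_intensity, min_elongation, min_frequency):
    """Return the genotype ordering as a list of labels, or None for "no result".

    allele_heights: {allele label: peak height} for the allele peaks of one
        site unit (labels stand in for site molecular weights).
    """
    # Product intensity M = sum of all allele peak heights.
    product_intensity = sum(allele_heights.values())
    # Elongation rate = M / (M + primer peak height).
    elongation = product_intensity / (product_intensity + primer_height)
    if not (product_intensity > min_intensity and elongation > min_elongation):
        return None
    # Frequency of each allele peak = peak height / M.
    freqs = {label: h / product_intensity
             for label, h in allele_heights.items()}
    # Descending order; the first entry is the initial gene peak.
    ranked = sorted(freqs, key=freqs.get, reverse=True)
    ordering = [ranked[0]]
    for label in ranked[1:]:
        # Keep peaks above the frequency threshold, discard the rest.
        if freqs[label] > min_frequency:
            ordering.append(label)
    return ordering
```

With peak heights A = 6.0, G = 3.0, T = 1.0 and a primer peak height of 2.5, the frequencies are 0.6, 0.3 and 0.1, so a frequency threshold of 0.2 keeps A and G and discards T.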
Therefore, the genotype of the site unit can be determined according to the number of the allele peaks in the site unit, and the accuracy of the genotype analysis of the site unit is improved.
In S304 to S306, the minimum validity parameter in the site unit is selected as the quality parameter of the site unit; an influence parameter of the site unit is determined; a yield parameter of the site unit is determined based on the characteristic values of all allele peaks in the site unit and the characteristic value of the primer mass spectrum peak; and the genotype reliability parameter of the site unit is determined based on the quality parameter, the influence parameter and the yield parameter.
If only one allele peak exists in the site unit, the influence parameter of the site unit is a constant; for example, the influence parameter is 1.
If two or more allele peaks exist in the site unit, the allele peak with the largest site molecular weight among all the allele peaks is taken as the major allele, the allele peak with the smallest site molecular weight is taken as the minor allele, and the influence parameter of the site unit is determined based on the ratio between the minor allele and the major allele.
The calculation formula of the influence parameter uses an adjustable parameter H, preferably 0.2 or 0.5; a ratio calculated between the peak height of the major allele and the peak height of the minor allele; and the validity threshold of the sample mass spectrum peak, preferably 0.8. If the condition on the influence parameter threshold (preferably 0.8) is met, the influence parameter is recalculated; the recalculated value represents a reverse tilt.
The recalculation formula uses an adjustable parameter, preferably 0.1 or 0.25; a parameter that is preferably 0.8; and the peak heights of the allele peak and of the primer mass spectrum peak, respectively.
The preset reliability threshold comprises a first preset reliability threshold, a second preset reliability threshold and a third preset reliability threshold. If the genotype reliability parameter is less than the first preset reliability threshold, the genotype reliability result of the site unit is determined to be a low probability type; if the genotype reliability parameter is greater than the first preset reliability threshold and less than the second preset reliability threshold, the genotype reliability result of the site unit is determined to be a possible genotype; if the genotype reliability parameter is greater than the second preset reliability threshold and less than the third preset reliability threshold, the genotype reliability result of the site unit is determined to be a positive type; and if the genotype reliability parameter is greater than the third preset reliability threshold, the genotype reliability result of the site unit is determined to be a determined type. Similarly, all site units in the nucleic acid mass spectrogram are traversed to obtain the genotype reliability result of each site unit. Finally, the genotypes and the genotype reliability results corresponding to the nucleic acid mass spectrogram are output as a genotype analysis report.
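The four-level classification can be sketched as a simple threshold ladder. The function name `classify_reliability` and any numeric threshold values are hypothetical, since the text gives no values. The text also does not say how a parameter exactly equal to a threshold is handled; the sketch assumes it falls into the higher class.

```python
def classify_reliability(reliability, first, second, third):
    """Map a genotype reliability parameter to one of four result types.

    first, second, third: the three preset reliability thresholds, which
        must be strictly increasing.
    Assumption: a value equal to a threshold is assigned to the higher class.
    """
    if not first < second < third:
        raise ValueError("thresholds must be strictly increasing")
    if reliability < first:
        return "low probability type"
    if reliability < second:
        return "possible genotype"
    if reliability < third:
        return "positive type"
    return "determined type"
```

Applying this function to every site unit of a nucleic acid mass spectrogram gives the per-site reliability results that are written to the genotype analysis report.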
In this embodiment, the reliability of the genotype is analyzed to obtain the genotype reliability result corresponding to the nucleic acid mass spectrogram, which improves the reliability of the genotype analysis.
FIG. 4 is a schematic block diagram of an apparatus for determining a genotype according to an embodiment of the present application. The apparatus 400 for determining a genotype comprises: a dividing module 401, configured to divide the mass spectrum peaks in a nucleic acid mass spectrogram into a plurality of site units according to site molecular weights, each site unit comprising a primer mass spectrum peak and at least one sample mass spectrum peak; a first determining module 402, configured to, for any sample mass spectrum peak in the site unit: fit the sample mass spectrum peak to a bell-shaped curve to obtain fitting function data, and acquire a characteristic value of the sample mass spectrum peak from the fitting function data, wherein the characteristic value is a peak height, a peak area or a signal-to-noise ratio; determine a validity parameter of the sample mass spectrum peak based on the fitting function data; and, if the validity parameter meets a validity threshold, determine the sample mass spectrum peak to be an allele peak; and a second determining module 403, configured to determine the genotype of the site unit based on the site molecular weight and the characteristic value corresponding to the allele peak in the site unit.
In an alternative embodiment, the second determining module 403 includes: the first determining subunit is used for determining the genotype of the locus unit based on the locus molecular weight corresponding to the allele peak if the characteristic value of only one allele peak of the locus unit meets the characteristic value threshold; and the second determining subunit is used for determining the genotype of the locus unit based on the locus molecular weight and the characteristic value corresponding to each allele peak if the characteristic values of two or more allele peaks in the locus unit all meet the characteristic value threshold.
In an alternative embodiment, the second determining subunit includes: the first subunit is used for taking the allele peak with the maximum molecular weight as a main allele and taking the allele peak with the minimum molecular weight as a secondary allele if the characteristic values of two allele peaks in the locus unit both meet a characteristic value threshold; a second subunit for determining the product intensity of the site unit based on the eigenvalues of the two allele peaks; the third subunit is used for acquiring the characteristic value of the primer mass spectrum peak and determining the elongation rate of the site unit based on the product intensity and the characteristic value of the primer mass spectrum peak; the fourth subunit is used for determining the penalty value of the locus unit based on the characteristic value of the minor allele and the characteristic value of the major allele if the product strength and the elongation rate both meet preset conditions; and judging whether the penalty value meets a preset penalty value or not, and if so, recording the genotype of the locus unit as a secondary allele-main allele.
In an optional embodiment, the second determining subunit further includes: a fifth subunit, configured to determine, if the characteristic values of more than two allele peaks in the site unit all meet the characteristic value threshold, the product intensity of the site unit based on the characteristic values of these allele peaks; a sixth subunit, configured to acquire the characteristic value of the primer mass spectrum peak and determine the elongation rate of the site unit based on the product intensity and the characteristic value of the primer mass spectrum peak; a seventh subunit, configured to, if the product intensity and the elongation rate both meet the preset conditions, determine, for any allele peak in the site unit, the frequency of the allele peak based on the characteristic value of the allele peak and the product intensity; an eighth subunit, configured to arrange the frequencies of all allele peaks in the site unit in descending order, take the allele peak with the highest frequency in the ordering as the initial gene peak of the genotype, then judge in order whether the frequency of each allele peak meets a preset frequency threshold, and arrange the allele peaks that meet the preset frequency threshold in order after the initial gene peak to obtain the genotype ordering; and a ninth subunit, configured to determine the genotype of the site unit based on the site molecular weights corresponding to the allele peaks in the genotype ordering.
In an alternative embodiment, the first determining module comprises: the first determining subunit is used for determining a signal-to-noise ratio parameter, a resolution parameter, a peak offset parameter, a peak width parameter and a peak shape parameter of a sample mass spectrum peak based on fitting function data of the sample mass spectrum peak; and the second determining subunit is used for determining the validity parameter of the sample mass spectrum peak based on the signal-to-noise ratio parameter, the resolution parameter, the peak offset parameter, the peak width parameter and the peak shape parameter.
In an alternative embodiment, the apparatus for determining a genotype further comprises: the third determining module is used for determining the genotype reliability parameters of the locus unit; the judging module is used for judging whether the genotype reliability parameters meet a preset reliability threshold value; and the fourth determining module is used for determining the genotype reliability result of the site unit based on the judgment result.
In an alternative embodiment, the third determining module includes: a selecting unit, configured to select the minimum validity parameter in the site unit as the quality parameter of the site unit; a first determining unit, configured to determine an influence parameter of the site unit; a second determining unit, configured to determine a yield parameter of the site unit based on the characteristic values of all allele peaks in the site unit and the characteristic value of the primer mass spectrum peak; and a third determining unit, configured to determine the genotype reliability parameter of the site unit based on the quality parameter, the influence parameter and the yield parameter.
In an alternative embodiment, the first determination unit includes: the first subunit is used for determining that the influence parameter of the locus unit is a constant if only one allele peak exists in the locus unit; and a second subunit, configured to, if there are two or more allele peaks in the site unit, determine an influence parameter of the site unit based on a ratio between the minor allele and the major allele by using, as a major allele, an allele peak having a largest site molecular weight among all allele peaks and by using, as a minor allele, an allele peak having a smallest site molecular weight.
In an optional embodiment, the preset reliability threshold includes a first preset reliability threshold, a second preset reliability threshold, and a third preset reliability threshold; the fourth determining module includes: the first determining subunit is used for determining that the genotype reliability result of the locus unit is a low probability type if the genotype reliability parameter is smaller than a first preset reliability threshold; the second determining subunit is used for determining that the genotype reliability result of the locus unit is a possible genotype if the genotype reliability parameter is greater than the first preset reliability threshold and less than a second preset reliability threshold; the third determining subunit is used for determining that the genotype reliability result of the locus unit is positive if the genotype reliability parameter is greater than the second preset reliability threshold and less than a third preset reliability threshold; and the fourth determining subunit is used for determining that the genotype reliability result of the locus unit is the determined type if the genotype reliability parameter is greater than the third preset reliability threshold.
The device can execute the method for determining the genotype, and has the corresponding functional modules and beneficial effects for executing the method for determining the genotype. The details of the techniques not described in detail in this example can be found in the methods for determining the genotype provided in the examples of the present application.
It should be noted that, in this document, the terms "comprises," "comprising," or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, method, article, or apparatus. Without further limitation, an element defined by the phrase "comprising an … …" does not exclude the presence of other like elements in a process, method, article, or apparatus that comprises the element.
In the several embodiments provided in the present application, it should be understood that the disclosed apparatus and method may be implemented in other ways. The above-described device embodiments are merely illustrative, for example, the division of a unit is only one logical function division, and there may be other division ways in actual implementation, such as: multiple units or components may be combined, or may be integrated into another device, or some features may be omitted, or not implemented. In addition, the coupling, direct coupling or communication connection between the components shown or discussed may be through some interfaces, and the indirect coupling or communication connection between the devices or units may be electrical, mechanical or other forms.
The units described as separate parts may or may not be physically separate, and parts displayed as units may or may not be physical units; can be located in one place or distributed on a plurality of network units; some or all of the units can be selected according to actual needs to achieve the purpose of the solution of the embodiment.
In addition, all functional units in the embodiments of the present application may be integrated into one processing unit, or each unit may be separately regarded as one unit, or two or more units may be integrated into one unit; the integrated unit can be realized in a form of hardware, or in a form of hardware plus a software functional unit.
Those of ordinary skill in the art will understand that: all or part of the steps for realizing the method embodiments can be completed by hardware related to program instructions, the program can be stored in a computer readable storage medium, and the program executes the steps comprising the method embodiments when executed; and the aforementioned storage medium includes: various media capable of storing program codes, such as a removable storage medium, a Read Only Memory (ROM), a magnetic disk, and an optical disk.
Alternatively, the integrated units described above in the present application may be stored in a computer-readable storage medium if they are implemented in the form of software functional modules and sold or used as independent products. Based on such understanding, the technical solutions of the embodiments of the present application may be essentially implemented or portions thereof contributing to the related art may be embodied in the form of a software product stored in a storage medium, and including several instructions for enabling a computer device (which may be a personal computer, a server, or a network device) to execute all or part of the methods of the embodiments of the present application. And the aforementioned storage medium includes: a removable storage medium, a ROM, a magnetic disk, an optical disk, or the like, which can store the program code.
The above describes only specific embodiments of the present application, but the protection scope of the present application is not limited thereto. Any change or substitution that a person skilled in the art could readily conceive within the technical scope disclosed in the present application shall fall within the protection scope of the present application. Therefore, the protection scope of the present application shall be subject to the protection scope of the claims.
Claims (10)
1. A method of determining genotype, comprising:
dividing mass spectrum peaks in the nucleic acid mass spectrogram into a plurality of site units according to site molecular weights; each of the site units comprises a primer mass spectrum peak and at least one sample mass spectrum peak;
for any sample mass spectrum peak in the site unit: fitting the sample mass spectrum peak according to a bell-shaped curve to obtain fitting function data, and acquiring a characteristic value of the sample mass spectrum peak from the fitting function data, wherein the characteristic value comprises at least one of peak height, peak area and signal-to-noise ratio; determining a validity parameter for the sample mass spectrum peak based on the fitting function data; if the validity parameter meets a validity threshold, determining the sample mass spectrum peak as an allele peak;
and determining the genotype of the site unit based on the site molecular weight and the characteristic value corresponding to the allele peak in the site unit.
2. The method according to claim 1, wherein determining the genotype of the site unit based on the site molecular weight and the characteristic value corresponding to the allele peak in the site unit comprises:
if the characteristic value of only one allele peak of the locus unit meets the threshold value of the characteristic value, determining the genotype of the locus unit based on the molecular weight of the locus corresponding to the allele peak;
and if the characteristic values of two or more allele peaks in the locus unit all meet the characteristic value threshold, determining the genotype of the locus unit based on the locus molecular weight and the characteristic value corresponding to each allele peak.
3. The method of claim 2, wherein if the characteristic values of two allele peaks in the site unit both satisfy the characteristic value threshold, determining the genotype of the site unit based on the site molecular weight and the characteristic value corresponding to each of the allele peaks comprises:
if the characteristic values of two allele peaks in the locus unit both meet the characteristic value threshold, taking the allele peak with the largest locus molecular weight as a main allele and taking the allele peak with the smallest locus molecular weight as a secondary allele;
determining product intensities for the locus units based on eigenvalues of both of the allele peaks;
acquiring a characteristic value of the primer mass spectrum peak, and determining the elongation of the site unit based on the product intensity and the characteristic value of the primer mass spectrum peak;
determining a penalty value of the locus unit based on the characteristic value of the minor allele and the characteristic value of the major allele if the product intensity and the elongation both satisfy a preset condition; and judging whether the penalty value meets a preset penalty value or not, and if so, recording the genotype of the locus unit as a secondary allele-main allele.
4. The method according to claim 2, wherein if the characteristic values of two or more allele peaks satisfy the characteristic value threshold, determining the genotype of the site unit based on the site molecular weight and the characteristic value corresponding to each of the allele peaks comprises:
determining the product intensity of the locus unit based on the characteristic values of more than two allele peaks if the characteristic values of more than two allele peaks in the locus unit all meet a characteristic value threshold;
acquiring a characteristic value of the primer mass spectrum peak, and determining the elongation of the site unit based on the product intensity and the characteristic value of the primer mass spectrum peak;
if both the product intensity and the elongation meet a predetermined condition, then for any one of the allele peaks in the locus unit: determining a frequency of the allelic peak based on the eigenvalue of the allelic peak and the product intensity;
arranging the frequencies of all allele peaks in the locus unit in descending order, taking the allele peak with the highest frequency in the ordering as an initial gene peak of the genotype, then judging in order whether the frequency of each allele peak meets a preset frequency threshold, and arranging the allele peaks meeting the preset frequency threshold in order after the initial gene peak based on the judgment result to obtain the genotype ordering;
and determining the genotype of the site unit based on the site molecular weight corresponding to the allele peak in the genotype sequencing.
5. The method of claim 1, wherein said determining a validity parameter for said sample mass spectrum peak based on said fitting function data comprises:
determining a signal-to-noise ratio parameter, a resolution parameter, a peak offset parameter, a peak width parameter, and a peak shape parameter of the sample mass spectrum peak based on the fitting function data of the sample mass spectrum peak;
and determining the validity parameter of the sample mass spectrum peak based on the signal-to-noise ratio parameter, the resolution parameter, the peak offset parameter, the peak width parameter and the peak shape parameter.
6. The method of claim 1, further comprising:
determining a genotype reliability parameter for the locus unit;
judging whether the genotype reliability parameters meet a preset reliability threshold value or not;
and determining the genotype reliability result of the site unit based on the judgment result.
7. The method of claim 6, wherein said determining a genotype reliability parameter for said site unit comprises:
selecting the minimum validity parameter in the site unit as a quality parameter of the site unit;
determining an influential parameter for the site unit;
determining a yield parameter of the locus unit based on the characteristic values of all allele peaks in the locus unit and the characteristic values of the primer mass spectrum peaks;
determining the genotype reliability parameter for the locus unit based on the quality parameter, the influence parameter, and the yield parameter.
8. The method of claim 7, wherein determining an influence parameter for the locus unit comprises:
if only one allele peak exists in the locus unit, determining that the influence parameter of the locus unit is a constant;
and if two or more allele peaks exist in the locus unit, taking the allele peak with the largest locus molecular weight as a main allele and taking the allele peak with the smallest locus molecular weight as a secondary allele, and determining the influence parameter of the locus unit based on the ratio between the secondary allele and the main allele.
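A minimal sketch of the influence parameter described in this claim follows. The claim does not say which quantity the ratio is taken over, nor the value of the constant, so the sketch assumes the ratio is over a peak characteristic value (for example peak height) and uses a hypothetical default constant of 1.0; the function name and tuple layout are likewise illustrative.

```python
from typing import List, Tuple


def influence_parameter(
    allele_peaks: List[Tuple[float, float]],
    single_peak_constant: float = 1.0,  # hypothetical value; the claim only says "a constant"
) -> float:
    """allele_peaks: (locus molecular weight, characteristic value) pairs (assumed layout)."""
    if not allele_peaks:
        raise ValueError("a locus unit needs at least one allele peak")
    if len(allele_peaks) == 1:
        # Only one allele peak: the influence parameter is a constant.
        return single_peak_constant
    # As stated in the claim: the main allele has the largest locus molecular
    # weight, the secondary allele the smallest.
    main = max(allele_peaks, key=lambda p: p[0])
    secondary = min(allele_peaks, key=lambda p: p[0])
    if main[1] == 0:
        raise ValueError("main allele characteristic value must be non-zero")
    # Assumed reading: ratio of secondary to main characteristic value.
    return secondary[1] / main[1]


print(influence_parameter([(5216.0, 40.0), (5232.0, 80.0)]))
```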
9. The method of claim 6, wherein the preset reliability threshold comprises a first preset reliability threshold, a second preset reliability threshold, and a third preset reliability threshold; the determining the genotype reliability result of the locus unit based on the judgment result comprises the following steps:
if the genotype reliability parameter is smaller than the first preset reliability threshold, determining that the genotype reliability result of the locus unit is a low-probability type;
if the genotype reliability parameter is larger than the first preset reliability threshold and smaller than the second preset reliability threshold, determining that the genotype reliability result of the locus unit is a possible type;
if the genotype reliability parameter is larger than the second preset reliability threshold and smaller than the third preset reliability threshold, determining that the genotype reliability result of the locus unit is a positive type;
and if the genotype reliability parameter is larger than the third preset reliability threshold, determining that the genotype reliability result of the locus unit is a definite type.
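The four-way classification in this claim amounts to a chain of threshold comparisons, sketched below. The label strings and the function name are hypothetical. The claim does not say what happens when the parameter equals a threshold exactly, so this sketch assumes such a value falls into the higher category.

```python
def classify_reliability(reliability: float, t1: float, t2: float, t3: float) -> str:
    """Map a genotype reliability parameter to one of four result types.

    t1 < t2 < t3 are the first, second and third preset reliability thresholds.
    Values equal to a threshold go to the higher category (an assumption).
    """
    if not (t1 < t2 < t3):
        raise ValueError("thresholds must satisfy t1 < t2 < t3")
    if reliability < t1:
        return "low_probability"
    if reliability < t2:
        return "possible"
    if reliability < t3:
        return "positive"
    return "definite"


for value in (0.10, 0.45, 0.70, 0.95):
    print(value, classify_reliability(value, 0.3, 0.6, 0.9))
```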
10. An apparatus for determining a genotype, comprising:
a dividing module configured to divide mass spectrum peaks in a nucleic acid mass spectrogram into a plurality of locus units according to locus molecular weights, each of the locus units comprising a primer mass spectrum peak and at least one sample mass spectrum peak;
a first determination module configured to, for any sample mass spectrum peak in the locus unit: fit the sample mass spectrum peak according to a bell-shaped curve to obtain fitting function data, and acquire a characteristic value of the sample mass spectrum peak from the fitting function data, wherein the characteristic value comprises at least one of peak height, peak area and signal-to-noise ratio; determine a validity parameter for the sample mass spectrum peak based on the fitting function data; and, if the validity parameter meets a validity threshold, determine the sample mass spectrum peak to be an allele peak;
and a second determination module configured to determine the genotype of the locus unit based on the locus molecular weight and the characteristic value corresponding to the allele peak in the locus unit.
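The fitting step performed by the first determination module can be sketched as follows. The claim only requires "a bell-shaped curve"; this sketch assumes a Gaussian and fits it with an unweighted log-parabola least-squares fit (Caruana's method), which is a swapped-in technique chosen because it needs only the standard library, not the method disclosed in the patent. The function name and return layout are hypothetical, and the validity-parameter calculation is not shown because the claims do not give its formula.

```python
import math
from typing import Sequence, Tuple


def fit_gaussian(mz: Sequence[float], intensity: Sequence[float]) -> Tuple[float, float, float, float]:
    """Fit y = H * exp(-(x - mu)^2 / (2 * sigma^2)); return (H, mu, sigma, area)."""
    pts = [(x, y) for x, y in zip(mz, intensity) if y > 0]  # log needs y > 0
    if len(pts) < 3:
        raise ValueError("need at least three positive points")
    x0 = sum(x for x, _ in pts) / len(pts)  # centre x for numerical stability
    s = [0.0] * 5  # sums of u^k, k = 0..4
    t = [0.0] * 3  # sums of u^k * ln(y), k = 0..2
    for x, y in pts:
        u, ly = x - x0, math.log(y)
        for k in range(5):
            s[k] += u ** k
        for k in range(3):
            t[k] += u ** k * ly
    # Normal equations for ln(y) = a + b*u + c*u^2, as an augmented 3x4 matrix.
    m = [[s[i + j] for j in range(3)] + [t[i]] for i in range(3)]
    for col in range(3):  # Gauss-Jordan elimination with partial pivoting
        pivot = max(range(col, 3), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        if abs(m[col][col]) < 1e-12:
            raise ValueError("degenerate data")
        for r in range(3):
            if r != col:
                f = m[r][col] / m[col][col]
                m[r] = [vr - f * vc for vr, vc in zip(m[r], m[col])]
    a, b, c = (m[i][3] / m[i][i] for i in range(3))
    if c >= 0:
        raise ValueError("data is not bell-shaped")
    sigma = math.sqrt(-1.0 / (2.0 * c))
    mu = x0 - b / (2.0 * c)
    height = math.exp(a - b * b / (4.0 * c))
    area = height * sigma * math.sqrt(2.0 * math.pi)  # closed-form Gaussian area
    return height, mu, sigma, area


# Synthetic, noise-free peak used only to exercise the function.
xs = [5210.0 + 0.5 * i for i in range(25)]
ys = [100.0 * math.exp(-((x - 5216.0) ** 2) / (2 * 1.5 ** 2)) for x in xs]
print(fit_gaussian(xs, ys))
```

Because the fit is done on log-intensities without weights, noisy low-intensity tails can dominate it; on real spectra a weighted or nonlinear least-squares fit would likely be preferable.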
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202111658718.6A CN114023379B (en) | 2021-12-31 | 2021-12-31 | Method and device for determining genotype |
Publications (2)
Publication Number | Publication Date |
---|---|
CN114023379A (en) | 2022-02-08 |
CN114023379B (en) | 2022-05-13 |
Family
ID=80069452
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202111658718.6A Active CN114023379B (en) | 2021-12-31 | 2021-12-31 | Method and device for determining genotype |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN114023379B (en) |
Citations (7)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
WO2001027857A2 (en) * | 1999-10-13 | 2001-04-19 | Sequenom, Inc. | Methods for generating databases and databases for identifying polymorphic genetic markers |
CN101984445A (en) * | 2010-03-04 | 2011-03-09 | 深圳华大基因科技有限公司 | Method and system for implementing typing based on polymerase chain reaction sequencing |
CN103589797A (en) * | 2013-11-12 | 2014-02-19 | 中国农业科学院蔬菜花卉研究所 | SNP (single nucleotide polymorphism) genotyping method and application thereof |
CN106755408A (en) * | 2016-12-22 | 2017-05-31 | 北京林业大学 | A kind of plant allele imbalance detection of expression method |
CN111041079A (en) * | 2019-12-31 | 2020-04-21 | 博淼生物科技(北京)有限公司 | Flight mass spectrum genotyping detection method |
CN111325121A (en) * | 2020-02-10 | 2020-06-23 | 浙江迪谱诊断技术有限公司 | Nucleic acid mass spectrum numerical value processing method |
CN112143816A (en) * | 2019-06-26 | 2020-12-29 | 司法鉴定科学研究院 | 29-plex Y-STR typing system for family search and paternal biological geographic ancestry inference |
Non-Patent Citations (2)
Title |
---|
Paul Oeth et al.: "Qualitative and Quantitative Genotyping Using Single Base Primer Extension Coupled with Matrix-Assisted Laser Desorption/Ionization Time-of-Flight Mass Spectrometry", SpringerLink * |
Guo Tianli et al.: "Design and Implementation of the GAMarker Genotyping Expert System", Forensic Science and Technology * |
Cited By (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN115684606A (en) * | 2022-10-21 | 2023-02-03 | 南方医科大学珠江医院 | M protein detection method |
CN115684606B (en) * | 2022-10-21 | 2023-11-28 | 南方医科大学珠江医院 | M protein detection method |
Also Published As
Publication number | Publication date |
---|---|
CN114023379B (en) | 2022-05-13 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
Kudaravalli et al. | Gene expression levels are a target of recent natural selection in the human genome | |
KR101542529B1 (en) | Examination methods of the bio-marker of allele | |
US20080154512A1 (en) | Systems and methods for baselining and real-time pcr data analysis | |
KR101460520B1 (en) | Detecting method for disease markers of NGS data | |
KR101936934B1 (en) | Methods for detecting nucleic acid sequence variations and a device for detecting nucleic acid sequence variations using the same | |
CN108913776B (en) | Screening method and kit for DNA molecular markers related to radiotherapy and chemotherapy injury | |
KR101936933B1 (en) | Methods for detecting nucleic acid sequence variations and a device for detecting nucleic acid sequence variations using the same | |
JP2005531853A (en) | System and method for SNP genotype clustering | |
CN114023379B (en) | Method and device for determining genotype | |
US7640113B2 (en) | Methods and apparatus for complex genetics classification based on correspondence analysis and linear/quadratic analysis | |
EP2660310A1 (en) | Comprehensive glaucoma determination method utilizing glaucoma diagnosis chip and deformed proteomics cluster analysis | |
Barnett et al. | Genomic machine learning meta-regression: insights on associations of study features with reported model performance | |
EP1635276A2 (en) | Display method and display apparatus of gene information | |
CN109584955A (en) | A method of mankind's rdaiation response biomarker is identified based on various plants genome | |
US20100203546A1 (en) | Allele Determining Device, Allele Determining Method And Computer Program | |
Campos-Martin et al. | Reliable genotyping of recombinant genomes using a robust hidden Markov model | |
KR20150039484A (en) | Method and apparatus for diagnosing cancer using genetic information | |
Won et al. | EnsemPro: an ensemble approach to predicting transcription start sites in human genomic DNA sequences | |
CN113862371A (en) | Prediction device for alcohol-related hepatocellular carcinoma disease progression and prognosis risk and training method of prediction model thereof | |
Zhang et al. | An information gain-based method for evaluating the classification power of features towards identifying enhancers | |
US20150347674A1 (en) | System and method for analyzing biological sample | |
WO2008070328A2 (en) | Systems and methods for baselining and real-time pcr data analysis | |
KR20220085139A (en) | Method of gene selection for predicting medical information of patients and uses thereof | |
Márquez et al. | Dimensionality and the statistical power of multivariate genome-wide association studies | |
JP4414823B2 (en) | Gene information display method and display device |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||