CN114023379A - Method and device for determining genotype - Google Patents

Publication number
CN114023379A
Authority
CN
China
Prior art keywords
peak
determining
allele
genotype
locus
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202111658718.6A
Other languages
Chinese (zh)
Other versions
CN114023379B (en)
Inventor
相双红
树建伟
毕少逸
李璇
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Zhejiang Dipu Diagnosis Technology Co ltd
Original Assignee
Zhejiang Dipu Diagnosis Technology Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Zhejiang Dipu Diagnosis Technology Co ltd
Priority to CN202111658718.6A
Publication of CN114023379A
Application granted
Publication of CN114023379B
Legal status: Active
Anticipated expiration

Classifications

    • GPHYSICS
    • G16INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
    • G16BBIOINFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR GENETIC OR PROTEIN-RELATED DATA PROCESSING IN COMPUTATIONAL MOLECULAR BIOLOGY
    • G16B20/00ICT specially adapted for functional genomics or proteomics, e.g. genotype-phenotype associations
    • G16B20/20Allele or variant detection, e.g. single nucleotide polymorphism [SNP] detection
    • GPHYSICS
    • G16INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
    • G16BBIOINFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR GENETIC OR PROTEIN-RELATED DATA PROCESSING IN COMPUTATIONAL MOLECULAR BIOLOGY
    • G16B20/00ICT specially adapted for functional genomics or proteomics, e.g. genotype-phenotype associations
    • G16B20/30Detection of binding sites or motifs
    • GPHYSICS
    • G16INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
    • G16BBIOINFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR GENETIC OR PROTEIN-RELATED DATA PROCESSING IN COMPUTATIONAL MOLECULAR BIOLOGY
    • G16B30/00ICT specially adapted for sequence analysis involving nucleotides or amino acids

Landscapes

  • Life Sciences & Earth Sciences (AREA)
  • Physics & Mathematics (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Health & Medical Sciences (AREA)
  • Engineering & Computer Science (AREA)
  • Biotechnology (AREA)
  • Medical Informatics (AREA)
  • Biophysics (AREA)
  • Theoretical Computer Science (AREA)
  • Spectroscopy & Molecular Physics (AREA)
  • Proteomics, Peptides & Aminoacids (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Chemical & Material Sciences (AREA)
  • Evolutionary Biology (AREA)
  • General Health & Medical Sciences (AREA)
  • Analytical Chemistry (AREA)
  • Molecular Biology (AREA)
  • Genetics & Genomics (AREA)
  • Measuring Or Testing Involving Enzymes Or Micro-Organisms (AREA)

Abstract

The application discloses a method and a device for determining genotype, wherein the method comprises the following steps: firstly, dividing mass spectrum peaks in a nucleic acid mass spectrogram into a plurality of site units according to site molecular weights; each site unit comprises a primer mass spectrum peak and at least one sample mass spectrum peak; second, for any sample mass spectrum peak in the site unit: fitting the sample mass spectrum peak according to a bell-shaped curve to obtain fitting function data, and acquiring a characteristic value of the sample mass spectrum peak from the fitting function data, wherein the characteristic value is peak height, peak area or signal-to-noise ratio; determining validity parameters of sample mass spectral peaks based on the fitting function data; if the validity parameter meets the validity threshold, determining the sample mass spectrum peak as an allele peak; and finally, determining the genotype of the locus unit based on the locus molecular weight and the characteristic value corresponding to the allele peak in the locus unit. Therefore, the genotype of the site unit can be automatically determined based on the nucleic acid mass spectrogram and the site molecular weight, and the accuracy of genotype analysis is improved.

Description

Method and device for determining genotype
Technical Field
The invention belongs to the technical field of nucleic acid mass spectrometry, and particularly relates to a method and a device for determining a genotype.
Background
Nucleic acid mass spectrometry achieves accurate detection by performing multi-site single-base extension after Polymerase Chain Reaction (abbreviated as PCR) amplification, and is therefore an essential detection means in biological analysis. However, the accurate analysis and expression of the mass spectrometry data, i.e., genotyping, is often overlooked. Because the accuracy of the genotype analysis result plays an important role in clinical analysis, genotype analysis is an essential step in processing the data of mass spectrometry experiments.
In modern clinical application, genotype expression plays an extremely important role in most physiological or disease traits, and in guidance on drug type and dosage, all of which are regulated by a series of genes on the nucleic acid sequence; accurate genotype analysis is therefore particularly important. Common existing genotype analysis techniques include restriction fragment length polymorphism, terminal restriction fragment length polymorphism, amplified fragment length polymorphism, multiplex ligation-dependent probe amplification and the like; however, these techniques can only make approximate predictions and cannot obtain accurate genotype analysis results. It is therefore necessary to provide a highly reliable and highly accurate method for genotyping.
Disclosure of Invention
In view of this, embodiments of the present invention provide a method and an apparatus for determining a genotype, which can accurately analyze mass spectrometry experimental data, and improve reliability and accuracy of genotype analysis.
To achieve the above object, according to a first aspect of embodiments of the present invention, there is provided a method for determining a genotype, the method comprising: dividing mass spectrum peaks in a nucleic acid mass spectrogram into a plurality of site units according to site molecular weights, each of the site units comprising a primer mass spectrum peak and at least one sample mass spectrum peak; for any sample mass spectrum peak in the site unit: fitting the sample mass spectrum peak to a bell-shaped curve to obtain fitting function data, and obtaining a characteristic value of the sample mass spectrum peak from the fitting function data, wherein the characteristic value is peak height, peak area or signal-to-noise ratio; determining a validity parameter of the sample mass spectrum peak based on the fitting function data; if the validity parameter meets a validity threshold, determining the sample mass spectrum peak to be an allele peak; and determining the genotype of the site unit based on the site molecular weight and the characteristic value corresponding to the allele peak in the site unit.
Optionally, determining the genotype of the site unit based on the site molecular weight and the characteristic value corresponding to the allele peak in the site unit includes: if the characteristic value of only one allele peak of the site unit meets the characteristic value threshold, determining the genotype of the site unit based on the site molecular weight corresponding to that allele peak; and if the characteristic values of two or more allele peaks in the site unit all meet the characteristic value threshold, determining the genotype of the site unit based on the site molecular weight and the characteristic value corresponding to each allele peak.
Optionally, if the characteristic values of two allele peaks in the site unit both meet the characteristic value threshold, determining the genotype of the site unit based on the site molecular weight and the characteristic value corresponding to each of the allele peaks includes: taking the allele peak with the largest site molecular weight as the major allele and the allele peak with the smallest site molecular weight as the minor allele; determining the product intensity of the site unit based on the characteristic values of the two allele peaks; acquiring the characteristic value of the primer mass spectrum peak, and determining the elongation of the site unit based on the product intensity and the characteristic value of the primer mass spectrum peak; if the product intensity and the elongation both satisfy a preset condition, determining a penalty value of the site unit based on the characteristic value of the minor allele and the characteristic value of the major allele; and judging whether the penalty value meets a preset penalty value, and if so, recording the genotype of the site unit as minor allele-major allele.
Optionally, if the characteristic values of more than two allele peaks in the site unit all meet the characteristic value threshold, determining the genotype of the site unit based on the site molecular weight and the characteristic value corresponding to each allele peak includes: determining the product intensity of the site unit based on the characteristic values of the more than two allele peaks; acquiring the characteristic value of the primer mass spectrum peak, and determining the elongation of the site unit based on the product intensity and the characteristic value of the primer mass spectrum peak; if both the product intensity and the elongation meet a preset condition, then for any one of the allele peaks in the site unit: determining the frequency of the allele peak based on the characteristic value of the allele peak and the product intensity; sorting the frequencies of all allele peaks in the site unit in descending order, taking the allele peak with the highest frequency as the initial gene peak of the genotype, then judging in that order whether the frequency of each remaining allele peak meets a preset frequency threshold, and arranging the allele peaks that meet the preset frequency threshold in sequence after the initial gene peak to obtain the genotype ordering; and determining the genotype of the site unit based on the site molecular weights corresponding to the allele peaks in the genotype ordering.
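The frequency-ranking step for more than two allele peaks can be illustrated with a minimal Python sketch. The text does not give the formulas, so the sketch assumes that the product intensity is the sum of the characteristic values and that the frequency of a peak is its characteristic value divided by the product intensity; the function name `genotype_order` and the default threshold of 0.2 are hypothetical, and the preset conditions on product intensity and elongation are left out.

```python
def genotype_order(allele_features, frequency_threshold=0.2):
    """Order allele labels by frequency and keep those meeting the threshold.

    allele_features maps an allele label to its characteristic value
    (for example peak height). The definitions of product intensity and
    frequency below are assumptions, not taken from the source.
    """
    product_intensity = sum(allele_features.values())  # assumed definition
    if product_intensity <= 0:
        return []
    frequencies = {a: v / product_intensity for a, v in allele_features.items()}
    ranked = sorted(frequencies, key=frequencies.get, reverse=True)
    # The most frequent allele always starts the genotype; later alleles are
    # appended only if their frequency meets the threshold.
    tail = [a for a in ranked[1:] if frequencies[a] >= frequency_threshold]
    return [ranked[0]] + tail


order = genotype_order({"C": 50.0, "A": 40.0, "G": 5.0})
```

With these illustrative values, G falls below the threshold, so only C and A remain in the ordering.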
Optionally, the determining the validity parameter of the sample mass spectrum peak based on the fitting function data includes: determining a signal-to-noise ratio parameter, a resolution parameter, a peak offset parameter, a peak width parameter, and a peak shape parameter of the sample mass spectrum peak based on the fitting function data of the sample mass spectrum peak; and determining the validity parameter of the sample mass spectrum peak based on the signal-to-noise ratio parameter, the resolution parameter, the peak offset parameter, the peak width parameter and the peak shape parameter.
Optionally, the method further includes: determining a genotype reliability parameter for the locus unit; judging whether the genotype reliability parameters meet a preset reliability threshold value or not; and determining the genotype reliability result of the site unit based on the judgment result.
Optionally, determining the genotype reliability parameter of the site unit includes: selecting the smallest validity parameter in the site unit as the quality parameter of the site unit; determining an influence parameter of the site unit; determining a yield parameter of the site unit based on the characteristic values of all allele peaks in the site unit and the characteristic value of the primer mass spectrum peak; and determining the genotype reliability parameter of the site unit based on the quality parameter, the influence parameter, and the yield parameter.
Optionally, determining the influence parameter of the site unit includes: if only one allele peak exists in the site unit, determining that the influence parameter of the site unit is a constant; and if two or more allele peaks exist in the site unit, taking the allele peak with the largest site molecular weight as the major allele and the allele peak with the smallest site molecular weight as the minor allele, and determining the influence parameter of the site unit based on the ratio between the minor allele and the major allele.
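A minimal Python sketch of this rule follows. It assumes that the ratio is taken between the characteristic values of the two peaks and that the single-allele constant is 1.0; neither detail is stated in the text, and the function name `influence_parameter` is hypothetical.

```python
def influence_parameter(alleles, single_allele_constant=1.0):
    """Return the influence parameter of a site unit.

    alleles is a list of (site_molecular_weight, characteristic_value)
    tuples, one per allele peak. The constant and the use of
    characteristic values in the ratio are assumptions.
    """
    if not alleles:
        raise ValueError("a site unit needs at least one allele peak")
    if len(alleles) == 1:
        return single_allele_constant
    major = max(alleles, key=lambda a: a[0])  # largest site molecular weight
    minor = min(alleles, key=lambda a: a[0])  # smallest site molecular weight
    return minor[1] / major[1]


ratio = influence_parameter([(5247.2, 4.0), (5287.2, 8.0)])
```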
Optionally, the preset reliability threshold includes a first preset reliability threshold, a second preset reliability threshold, and a third preset reliability threshold; determining the genotype reliability result of the site unit based on the judgment result comprises: if the genotype reliability parameter is smaller than the first preset reliability threshold, determining that the genotype reliability result of the site unit is a low-probability type; if the genotype reliability parameter is greater than the first preset reliability threshold and smaller than the second preset reliability threshold, determining that the genotype reliability result of the site unit is a possible type; if the genotype reliability parameter is greater than the second preset reliability threshold and smaller than the third preset reliability threshold, determining that the genotype reliability result of the site unit is a positive type; and if the genotype reliability parameter is greater than the third preset reliability threshold, determining that the genotype reliability result of the site unit is a definite type.
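The four-level classification can be sketched in Python as below. The text uses strict inequalities and does not say how a value exactly equal to a threshold is handled; the sketch assigns such a value to the higher class, which is an assumption. The function name, the label strings and the example thresholds are hypothetical.

```python
def classify_reliability(value, first, second, third):
    """Map a genotype reliability parameter to one of four result types.

    first < second < third are the three preset reliability thresholds.
    Values equal to a threshold go to the higher class (an assumption).
    """
    if not first < second < third:
        raise ValueError("thresholds must be strictly increasing")
    if value < first:
        return "low_probability"
    if value < second:
        return "possible"
    if value < third:
        return "positive"
    return "definite"


result = classify_reliability(0.6, first=0.25, second=0.5, third=0.75)
```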
To achieve the above object, according to a second aspect of the embodiments of the present application, there is also provided an apparatus for determining a genotype, the apparatus comprising: a dividing module for dividing mass spectrum peaks in a nucleic acid mass spectrogram into a plurality of site units according to site molecular weights, each of the site units comprising a primer mass spectrum peak and at least one sample mass spectrum peak; a first determination module for, for any sample mass spectrum peak in the site unit: fitting the sample mass spectrum peak to a bell-shaped curve to obtain fitting function data, and obtaining a characteristic value of the sample mass spectrum peak from the fitting function data, wherein the characteristic value is peak height, peak area or signal-to-noise ratio; determining a validity parameter of the sample mass spectrum peak based on the fitting function data; and, if the validity parameter meets a validity threshold, determining the sample mass spectrum peak to be an allele peak; and a second determination module for determining the genotype of the site unit based on the site molecular weight and the characteristic value corresponding to the allele peak in the site unit.
To achieve the above object, according to a third aspect of embodiments of the present application, there is further provided a computer readable medium having a computer program stored thereon, the computer program, when executed by a processor, implementing the method for determining a genotype as described in the first aspect.
To achieve the above object, according to a fourth aspect of embodiments of the present application, there is also provided an electronic device, including a processor; a memory for storing processor-executable instructions; a processor for reading executable instructions from the memory and executing the instructions to carry out the method of determining a genotype as described in the first aspect.
Compared with the prior art, the method and the device for determining the genotype provided by the embodiments of the application operate as follows: first, mass spectrum peaks in a nucleic acid mass spectrogram are divided into a plurality of site units according to site molecular weights, each of the site units comprising a primer mass spectrum peak and at least one sample mass spectrum peak; then, for any sample mass spectrum peak in the site unit: the sample mass spectrum peak is fitted to a bell-shaped curve to obtain fitting function data, and a characteristic value of the sample mass spectrum peak is obtained from the fitting function data, wherein the characteristic value is peak height, peak area or signal-to-noise ratio; a validity parameter of the sample mass spectrum peak is determined based on the fitting function data; if the validity parameter meets a validity threshold, the sample mass spectrum peak is determined to be an allele peak; and finally, the genotype of the site unit is determined based on the site molecular weight and the characteristic value corresponding to the allele peak in the site unit. The genotype of the site unit can therefore be determined from the nucleic acid mass spectrogram and the site molecular weight, which improves the accuracy of genotype analysis and solves the prior-art problem that the genotype of a nucleic acid sequence cannot be accurately predicted with techniques such as restriction fragment length polymorphism, terminal restriction fragment length polymorphism, amplified fragment length polymorphism, and multiplex ligation-dependent probe amplification.
It is to be understood that the teachings of this application need not achieve all of the above-described benefits, but rather that specific embodiments may achieve specific technical results, and that other embodiments of this application may achieve benefits not mentioned above.
Drawings
The drawings are included to provide a further understanding of the application and are not to be construed as limiting the application. Wherein like or corresponding reference numerals designate like or corresponding parts throughout the several views.
FIG. 1 is a schematic flow chart of a method for determining genotype in one embodiment of the present application;
FIG. 2 is a schematic representation of site unit mass spectra peaks in an embodiment of the present application;
FIG. 3 is a schematic flow chart of a method of genotyping in another embodiment of the present application;
FIG. 4 is a schematic block diagram of an apparatus for genotyping according to an embodiment of the present application.
Detailed Description
The following description of the exemplary embodiments of the present application, taken in conjunction with the accompanying drawings, includes various details of the embodiments of the application for the understanding of the same, which are to be considered exemplary only. Accordingly, those of ordinary skill in the art will recognize that various changes and modifications of the embodiments described herein can be made without departing from the scope and spirit of the present application. Also, descriptions of well-known functions and constructions are omitted in the following description for clarity and conciseness.
As shown in fig. 1, a schematic flow chart of a method for determining a genotype according to an embodiment of the present application; FIG. 2 is a schematic diagram of site unit mass spectra peaks in an embodiment of the present application.
A method for determining a genotype comprises at least the following operations: S101, dividing mass spectrum peaks in a nucleic acid mass spectrogram into a plurality of site units according to site molecular weights, each site unit comprising a primer mass spectrum peak and at least one sample mass spectrum peak; S102, for any sample mass spectrum peak in the site unit: fitting the sample mass spectrum peak to a bell-shaped curve to obtain fitting function data, and acquiring a characteristic value of the sample mass spectrum peak from the fitting function data, wherein the characteristic value is peak height, peak area or signal-to-noise ratio; determining a validity parameter of the sample mass spectrum peak based on the fitting function data; and, if the validity parameter meets the validity threshold, determining the sample mass spectrum peak to be an allele peak; S103, determining the genotype of the site unit based on the site molecular weight and the characteristic value corresponding to the allele peak in the site unit.
In S101, according to the arrangement of the sites in the nucleic acid mass spectrum detection reagent, mass spectrum peaks in a nucleic acid mass spectrum output by a nucleic acid mass spectrometer are sequentially divided into a plurality of site units according to site molecular weights, each site unit has the same primer mass spectrum peak, but the number of sample mass spectrum peaks in each site unit is not necessarily the same, each site unit at least comprises one sample mass spectrum peak, and the number of the sample mass spectrum peaks is not more than five. For example: the number of the sample mass spectrum peaks in the first site unit is one, the number of the sample mass spectrum peaks in the second site unit is three, and the number of the sample mass spectrum peaks in the third site unit is five.
As shown in FIG. 2, the primer mass spectrum peak UP in the site unit serves as the reference peak, and the four sample mass spectrum peaks are PC, PA, PG and PU. Based on the positions of the four sample mass spectrum peaks in the mass spectrogram, the corresponding site molecular weights are determined as follows: the site molecular weight of PC is the UP site molecular weight + 247.2 Da; the site molecular weight of PA is the UP site molecular weight + 271.2 Da; the site molecular weight of PG is the UP site molecular weight + 287.2 Da; the site molecular weight of PU is the UP site molecular weight + 327.2 Da or 262.2 Da.
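As an illustration of how these offsets could be applied, the following Python sketch computes the expected sample-peak masses from a primer (UP) mass and labels observed peaks that fall close to them. The function names, the 1.0 Da matching tolerance and the example primer mass of 5000.0 Da are hypothetical; only the PC, PA and PG offsets are used, because the text gives two alternative values for PU.

```python
# Offsets (Da) of sample peaks relative to the primer peak UP, from the text.
# PU is omitted because the text lists two alternative values for it.
OFFSETS_DA = {"PC": 247.2, "PA": 271.2, "PG": 287.2}


def expected_masses(up_mass):
    """Return the expected site molecular weight for each sample peak label."""
    return {label: up_mass + offset for label, offset in OFFSETS_DA.items()}


def label_peaks(up_mass, observed_masses, tolerance_da=1.0):
    """Assign each observed mass to the closest expected peak within tolerance.

    tolerance_da is a hypothetical matching window, not a value from the source.
    Observed masses that match no expected peak are left out of the result.
    """
    expected = expected_masses(up_mass)
    labels = {}
    for mass in observed_masses:
        label, target = min(expected.items(), key=lambda kv: abs(kv[1] - mass))
        if abs(target - mass) <= tolerance_da:
            labels[mass] = label
    return labels


site_unit = label_peaks(5000.0, [5247.3, 5287.0, 5400.0])
```

In this example the peak at 5400.0 Da is not labelled because it lies far from every expected mass.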
In S102, the functional expression corresponding to the bell curve is expressed as follows:
[Formula (1): bell-shaped fitting function; equation image not available]
m_0 - m_e = Δm    formula (2);
SNR = h / N(m_0)    formula (3);
V = A / SNR    formula (4).
Wherein h is the fitted peak height above the baseline at the peak center, w is the fitted line width, m_0 is the fitted peak center, m_e is the expected molecular weight of the sample mass spectrum peak, Δm is the peak shift, A is the peak area between the fitted peak and the baseline, SNR is the signal-to-noise ratio, V is the area variance, and y_i and m_i are the peak height and the molecular weight corresponding to any point on the sample mass spectrogram. Fitting yields the fitting function data, which include but are not limited to [symbol image not available], Δm, A, SNR, V, and the square root of the fitted-area difference.
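Formulas (2) to (4) can be illustrated with a short Python sketch. It assumes that formula (2) is the fitted peak center minus the expected molecular weight and that formula (3) divides the fitted peak height by the noise level at the fitted center; the class `PeakFit`, its field names and the example numbers are hypothetical, and the bell-curve fit of formula (1) is taken as already done.

```python
from dataclasses import dataclass


@dataclass
class PeakFit:
    height: float         # fitted peak height above baseline at the peak center
    center: float         # fitted peak center (molecular weight, Da)
    expected_mass: float  # expected molecular weight of the sample peak (Da)
    area: float           # area A between the fitted peak and the baseline
    noise: float          # noise level N at the fitted peak center


def peak_shift(fit):
    """Formula (2), as read here: fitted center minus expected molecular weight."""
    return fit.center - fit.expected_mass


def signal_to_noise(fit):
    """Formula (3), as read here: fitted height over noise at the peak center."""
    return fit.height / fit.noise


def area_variance(fit):
    """Formula (4): V = A / SNR."""
    return fit.area / signal_to_noise(fit)


fit = PeakFit(height=12.0, center=5247.5, expected_mass=5247.2, area=30.0, noise=2.0)
```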
Then, a normalization model can be trained by using a deep learning method, and the normalization model is used for carrying out prediction processing on fitting function data to obtain the validity parameters of the sample mass spectrum peak; or, carrying out normalization calculation on the fitting function data by using the existing algorithm to obtain the validity parameters of the sample mass spectrum peak.
Finally, whether the validity parameter corresponding to the sample mass spectrum peak is greater than a validity threshold value or not is judged, and if the validity parameter is greater than the validity threshold value, the sample mass spectrum peak is determined to be an allele peak; if the validity parameter is not greater than the validity threshold, determining that the sample mass spectrum peak is not an allele peak. Here, the validity threshold is preset and obtained based on practical experience.
It should be noted that the choice depends on the purpose of the nucleic acid detection. For genotyping, peak height or signal-to-noise ratio is preferred as the yield detection factor, with peak area as the second choice; for gene mutation detection, peak area is preferred as the yield detection factor, with peak height or signal-to-noise ratio as the second choice. When choosing between peak height and signal-to-noise ratio, the following rule usually applies: peak height is selected as the yield detection factor when the sample concentration is high, and signal-to-noise ratio is selected when the sample concentration is low. Therefore, depending on the purpose of the nucleic acid detection, a different yield detection factor is selected from the fitting function data as the characteristic value.
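The selection rule can be written as a small Python function. This is only a sketch of the preferred (first-choice) factor; the purpose labels "genotyping" and "mutation", the returned strings and the function name are hypothetical, and how a concentration is judged to be high is left to the caller.

```python
def select_yield_factor(purpose, high_concentration):
    """Return the preferred characteristic value for a detection purpose.

    purpose: "genotyping" or "mutation" (hypothetical labels).
    high_concentration: True when the sample concentration is considered high.
    """
    if purpose == "mutation":
        # Peak area is preferred for gene mutation detection.
        return "peak_area"
    if purpose == "genotyping":
        # Peak height at high concentration, signal-to-noise ratio at low.
        return "peak_height" if high_concentration else "snr"
    raise ValueError(f"unknown purpose: {purpose!r}")


factor = select_yield_factor("genotyping", high_concentration=False)
```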
In S103, if there is no allele peak in the site unit, the genotype of the site unit is determined to be inconclusive. If the site unit has an allele peak, the genotype of the site unit is determined based on the site molecular weight and the characteristic value corresponding to the allele peak in the site unit. Illustratively, if the characteristic value of only one allele peak of the site unit meets the characteristic value threshold, the genotype of the site unit is determined based on the site molecular weight corresponding to that allele peak; if the characteristic values of two or more allele peaks in the site unit all meet the characteristic value threshold, the genotype of the site unit is determined based on the site molecular weight and the characteristic value corresponding to each allele peak; and if none of the characteristic values of the allele peaks of the site unit meets the characteristic value threshold, the genotype of the site unit is determined to be no result.
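The branching in S103 can be sketched as follows. The sketch assumes that "meets the threshold" means greater than or equal to it; the function name and the status strings are hypothetical, and the handling of the multi-allele case is only flagged, not carried out.

```python
def call_site_unit(allele_peaks, feature_threshold):
    """Decide which genotyping branch applies to a site unit.

    allele_peaks maps an allele label to its characteristic value.
    Returns a (status, passing_alleles) tuple.
    """
    if not allele_peaks:
        # No allele peak at all in the site unit.
        return ("inconclusive", [])
    passing = [a for a, v in allele_peaks.items() if v >= feature_threshold]
    if not passing:
        # Allele peaks exist, but none meets the characteristic value threshold.
        return ("no_result", [])
    if len(passing) == 1:
        return ("single_allele", passing)
    return ("multi_allele", passing)


status, alleles = call_site_unit({"C": 10.0, "G": 2.0}, feature_threshold=5.0)
```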
In the method for determining the genotype provided by this embodiment, for the sample mass spectrum peaks of any site unit in a nucleic acid mass spectrogram: the sample mass spectrum peak is fitted to obtain fitting function data; the validity parameter of the sample mass spectrum peak is determined based on the fitting function data, and whether the sample mass spectrum peak is an allele peak is then judged based on the validity parameter; finally, the genotype of the site unit is determined based on the site molecular weight and the characteristic value corresponding to the allele peak in the site unit. The genotype of the site unit can therefore be determined from the nucleic acid mass spectrogram and the site molecular weight, which improves the accuracy of the genotype analysis result and solves the prior-art problem that the genotype of a nucleic acid sequence cannot be accurately predicted with techniques such as restriction fragment length polymorphism, terminal restriction fragment length polymorphism, amplified fragment length polymorphism, and multiplex ligation-dependent probe amplification.
FIG. 3 is a schematic flow chart of a method for determining a genotype in another embodiment of the present application. This embodiment is further optimized on the basis of the previous embodiment. The method comprises at least the following operations: S301, dividing mass spectrum peaks in a nucleic acid mass spectrogram into a plurality of site units according to site molecular weights, each site unit comprising a primer mass spectrum peak and at least one sample mass spectrum peak; S302, for any sample mass spectrum peak in the site unit: fitting the sample mass spectrum peak to a bell-shaped curve to obtain fitting function data, and acquiring a characteristic value of the sample mass spectrum peak from the fitting function data, wherein the characteristic value is peak height, peak area or signal-to-noise ratio; determining a validity parameter of the sample mass spectrum peak based on the fitting function data; and, if the validity parameter meets the validity threshold, determining the sample mass spectrum peak to be an allele peak; S303, determining the genotype of the site unit based on the site molecular weight and the characteristic value corresponding to the allele peak in the site unit; S304, determining the genotype reliability parameter of the site unit; S305, judging whether the genotype reliability parameter meets a preset reliability threshold; and S306, determining the genotype reliability result of the site unit based on the judgment result.
The specific implementation process of S301 is similar to the specific implementation process of S101 in the embodiment shown in fig. 1, and is not described here again.
In S302, the fitting function data is normalized by using the existing algorithm, so as to obtain the validity parameter of the sample mass spectrum peak. Illustratively, determining a signal-to-noise ratio parameter, a resolution parameter, a peak offset parameter, a peak width parameter, and a peak shape parameter of the sample mass spectral peak based on fitting function data of the sample mass spectral peak; and determining the validity parameter of the sample mass spectrum peak based on the signal-to-noise ratio parameter, the resolution parameter, the peak offset parameter, the peak width parameter and the peak shape parameter.
The signal-to-noise ratio parameter of the sample mass spectrum peak expresses the effect of baseline variation (chemical noise) on peak height. It is calculated as follows:
[Formula (5): equation image not available]
wherein the threshold representing the validity of a sample mass spectrum peak is preferably 0.8, and the signal-to-noise threshold for identifying sample mass spectrum peaks is preferably 1.5.
The resolution parameter of the sample mass spectrum peak represents the positional relationship between the target peak and the surrounding peaks. It is calculated as follows:
[Formula (6): equation image not available]
wherein the formula uses the mass and the peak shape weighting value of the i-th sample mass spectrum peak, and the mass and the peak shape weighting value of the sample mass spectrum peak adjacent to the i-th sample mass spectrum peak. The weighting value of a strong sample mass spectrum peak or primer mass spectrum peak is 1, and the weighting value of a weak sample mass spectrum peak or primer mass spectrum peak is 0.05. The formula also uses a special parameter, preferably 0.7, and a parameter calculated from the molecular weight at the j-th sample mass spectrum peak.
The offset parameter of the sample mass spectrum peak represents how closely the fitted peak of the sample mass spectrum peak matches the mass signal of the sample mass spectrum peak. It is calculated according to formula (7):
[formula (7): equation image not reproduced in the source text]
wherein the validity threshold of the sample mass spectrum peak is preferably 0.8, and the specific parameter used in calculating the offset parameter is preferably 0.7.
The peak width parameter of the sample mass spectrum peak is calculated according to formula (8):
[formula (8): equation image not reproduced in the source text]
where D is a specific parameter used in calculating the peak width parameter, preferably -0.001 or -0.0005.
The peak shape parameter of the sample mass spectrum peak is calculated according to formula (9):
[formula (9): equation image not reproduced in the source text]
wherein E is a specific parameter used in calculating the peak shape parameter, preferably 0.2 or 0.1, and the remaining term is the square root of the fitted-area difference, i.e., of the sum of squared differences between the fitted intensity and the measured intensity.
The validity parameter of the sample mass spectrum peak is calculated according to formula (10):
validity parameter = signal-to-noise ratio parameter × resolution parameter × peak offset parameter × peak width parameter × peak shape parameter
formula (10).
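The product in formula (10) and the validity check of S302 can be sketched as follows. This is a minimal illustration: the five inputs are assumed to have been computed already by formulas (5) to (9), the function names are hypothetical, and reading "meets the validity threshold" as "greater than or equal to" is an assumption.

```python
from math import prod

# Preferred validity threshold stated in the text.
VALIDITY_THRESHOLD = 0.8


def validity_parameter(snr, resolution, offset, width, shape):
    """Formula (10): product of the five per-peak parameters."""
    return prod((snr, resolution, offset, width, shape))


def is_allele_peak(validity, threshold=VALIDITY_THRESHOLD):
    # Assumption: a peak "meets" the threshold when validity >= threshold.
    return validity >= threshold
```

Because the parameters are multiplied, a single poor factor (for example a badly offset fit) is enough to pull the whole validity parameter below the threshold.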
In S303, a method for determining the genotype of a site unit based on a single allele peak comprises: if the characteristic value of only one allele peak of the site unit meets the characteristic value threshold, determining the genotype of the site unit based on the site molecular weight corresponding to that allele peak. For example, if the peak height of only one allele peak of a site unit is greater than the peak height threshold, the genotype of the site unit is determined according to the site molecular weight corresponding to that allele peak. The peak height threshold may be 1.0.
A method for determining the genotype of a site unit based on two allele peaks comprises: if the characteristic values of two allele peaks in the site unit both meet the characteristic value threshold, taking the allele peak with the largest site molecular weight as the major allele and the allele peak with the smallest site molecular weight as the minor allele; determining the product intensity of the site unit based on the characteristic values of the two allele peaks; acquiring the characteristic value of the primer mass spectrum peak, and determining the elongation rate of the site unit based on the product intensity and the characteristic value of the primer mass spectrum peak; if the product intensity and the elongation rate both meet preset conditions, determining a penalty value of the site unit based on the characteristic value of the minor allele and the characteristic value of the major allele; and judging whether the penalty value meets a preset penalty value, and if so, recording the genotype of the site unit as minor allele-major allele.
For example: S11, if the peak heights of two allele peaks in the site unit are both greater than the peak height threshold, taking the allele peak with the largest site molecular weight as the major allele and the allele peak with the smallest site molecular weight as the minor allele.
S12, summing the peak height of the major allele and the peak height of the minor allele to obtain the product intensity M of the site unit.
S13, acquiring the peak height of the primer mass spectrum peak, and summing the product intensity and the peak height of the primer mass spectrum peak to obtain a summation result; the ratio of the product intensity to the summation result is the elongation rate of the site unit.
S14, judging whether the product intensity of the site unit is greater than the preset intensity and the elongation rate is greater than the preset elongation rate; if both are, determining that the genotype of the site unit has a result, and executing step S15.
If the judgment result indicates that either the product intensity or the elongation rate does not meet its preset condition, determining that the genotype of the site unit is no result.
S15, taking the ratio of the peak height of the minor allele to the sum of the peak heights of the minor allele and the major allele to obtain a parameter X; calculating a penalty value Y from the parameter X according to the penalty value calculation formula, formula (11):
Ymin = [formula (11): equation image not reproduced in the source text]
wherein the parameter arrays are A = (1.0, 0.5, 0) and B = (0.1, 0.4, 0.1), and i is the array subscript.
S16, judging whether the penalty value is smaller than a preset penalty value; if the penalty value is smaller than the preset penalty value, recording the genotype of the site unit as minor allele-major allele; and if the penalty value is not smaller than the preset penalty value, determining that the genotype of the site unit is no result.
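The two-allele example above can be sketched as a single function. This is a minimal sketch under stated assumptions: peak height is used as the characteristic value, the penalty calculation of formula (11) is not reproduced in the text and is therefore passed in as a caller-supplied function (`penalty_fn`, hypothetical), and every function name, argument name and threshold value is illustrative.

```python
from typing import Callable, Optional


def call_two_allele_site(
    major_label: str,
    major_height: float,
    minor_label: str,
    minor_height: float,
    primer_height: float,
    min_intensity: float,
    min_elongation: float,
    penalty_fn: Callable[[float], float],
    max_penalty: float,
) -> Optional[str]:
    """Return "minor-major" when every check passes, otherwise None (no result).

    Both allele peak heights are assumed to have passed the peak height
    threshold already, so the sums below are strictly positive.
    """
    # S12: product intensity M is the sum of the two allele peak heights.
    intensity = major_height + minor_height
    # S13: elongation rate is M divided by (M + primer peak height).
    elongation = intensity / (intensity + primer_height)
    # S14: both values must exceed their preset minimums.
    if not (intensity > min_intensity and elongation > min_elongation):
        return None
    # Parameter X: share of the minor allele in the summed peak heights.
    x = minor_height / intensity
    # The penalty must be smaller than the preset penalty value.
    if penalty_fn(x) < max_penalty:
        return f"{minor_label}-{major_label}"
    return None


# Usage with a stand-in penalty; this lambda is NOT formula (11), only a placeholder.
genotype = call_two_allele_site(
    "T", 10.0, "C", 10.0, 5.0,
    min_intensity=5.0, min_elongation=0.5,
    penalty_fn=lambda x: abs(x - 0.5), max_penalty=0.2,
)
```

Passing the penalty as a function keeps the sketch independent of the missing formula; any implementation of formula (11) with the arrays A and B can be substituted.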
A method for determining the genotype of a site unit based on more than two allele peaks comprises: if the characteristic values of more than two allele peaks in the site unit all meet the characteristic value threshold, determining the product intensity of the site unit based on the characteristic values of those allele peaks; acquiring the characteristic value of the primer mass spectrum peak, and determining the elongation rate of the site unit based on the product intensity and the characteristic value of the primer mass spectrum peak; if the product intensity and the elongation rate both meet preset conditions, then for any allele peak in the site unit, determining the frequency of the allele peak based on the characteristic value of the allele peak and the product intensity; arranging the frequencies of all allele peaks in the site unit in descending order, taking the allele peak with the highest frequency in the ordering as the initial gene peak of the genotype, then sequentially judging, according to the ordering, whether the frequency of each allele peak meets a preset frequency threshold, and sequentially arranging the allele peaks that meet the preset frequency threshold after the initial gene peak to obtain the genotype ordering; and determining the genotype of the site unit based on the site molecular weights corresponding to the allele peaks in the genotype ordering.
For example: S11, if the peak heights of two or more allele peaks in the site unit are all greater than the peak height threshold, summing the peak heights of those allele peaks to obtain the product intensity M of the site unit.
S12, acquiring the peak height of the primer mass spectrum peak, and summing the product intensity and the peak height of the primer mass spectrum peak to obtain a summation result; the ratio of the product intensity to the summation result is the elongation rate of the site unit.
S13, judging whether the product intensity of the site unit is greater than the preset intensity and the elongation rate is greater than the preset elongation rate; if both are, determining that the genotype of the site unit has a result, and executing step S14.
If the judgment result indicates that either the product intensity or the elongation rate does not meet its preset condition, determining that the genotype of the site unit is no result.
S14, for any allele peak in the site unit: obtaining the frequency of the allele peak as the ratio of the peak height of the allele peak to the product intensity; arranging the frequencies of all allele peaks in the site unit in descending order, and taking the allele peak with the highest frequency in the ordering as the initial gene peak of the genotype.
S15, for each allele peak in the ordering, in order: judging whether the frequency of the allele peak is greater than a preset frequency threshold; if it is, appending the allele peak after the initial gene peak; if it is not, discarding the allele peak. The remaining allele peaks are judged in the same way, in order, to obtain the genotype ordering.
S16, determining the genotype of the site unit based on the site molecular weights corresponding to the allele peaks in the genotype ordering.
Therefore, the genotype of the site unit can be determined according to the number of the allele peaks in the site unit, and the accuracy of the genotype analysis of the site unit is improved.
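The frequency-ordering procedure for more than two allele peaks can be sketched as below. This is a minimal sketch: peak height is used as the characteristic value, the allele labels stand in for the site molecular weights, the highest-frequency peak is assumed to be kept without a threshold check (one reading of "initial gene peak"), and all names and threshold values are hypothetical.

```python
from typing import Dict, List, Optional


def call_multi_allele_site(
    peak_heights: Dict[str, float],
    primer_height: float,
    min_intensity: float,
    min_elongation: float,
    min_frequency: float,
) -> Optional[List[str]]:
    """Return allele labels in genotype order, or None for "no result".

    peak_heights maps an allele label to the height of its allele peak;
    all heights are assumed to have passed the peak height threshold.
    """
    # Product intensity M: sum of all allele peak heights.
    intensity = sum(peak_heights.values())
    # Elongation rate: M divided by (M + primer peak height).
    elongation = intensity / (intensity + primer_height)
    if not (intensity > min_intensity and elongation > min_elongation):
        return None
    # Frequency of each allele peak: its height divided by M.
    freqs = {allele: height / intensity for allele, height in peak_heights.items()}
    # Descending order of frequency; the first entry is the initial gene peak.
    ranked = sorted(freqs, key=freqs.get, reverse=True)
    genotype = [ranked[0]]
    for allele in ranked[1:]:
        # Peaks above the preset frequency threshold are appended; others are discarded.
        if freqs[allele] > min_frequency:
            genotype.append(allele)
    return genotype
```

For instance, heights of 50, 30, 20 and 2 give frequencies of roughly 0.49, 0.29, 0.20 and 0.02, so a frequency threshold of 0.15 keeps the first three peaks and discards the last.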
In S304 to S306, the minimum validity parameter in the site unit is selected as the quality parameter of the site unit; the influence parameter of the site unit is determined; the yield parameter of the site unit is determined based on the characteristic values of all allele peaks in the site unit and the characteristic value of the primer mass spectrum peak; and the genotype reliability parameter of the site unit is determined based on the quality parameter, the influence parameter and the yield parameter.
If only one allele peak exists in the site unit, the influence parameter of the site unit is determined to be a constant; for example, the influence parameter is 1.
If there are two or more allele peaks in the site unit, the allele peak with the largest site molecular weight among all the allele peaks is taken as the major allele, the allele peak with the smallest site molecular weight is taken as the minor allele, and the influence parameter of the site unit is determined based on the ratio between the minor allele and the major allele. The influence parameter of the site unit is calculated according to formula (12):
[formula (12): equation image not reproduced in the source text]
wherein H is an adjustable parameter, preferably 0.2 or 0.5; the ratio term is calculated as the ratio between the peak height of the major allele and the peak height of the minor allele; and the validity threshold of the sample mass spectrum peak is preferably 0.8. If the condition involving the influence parameter threshold (preferably 0.8) holds (the condition itself is an equation image not reproduced in the source text), the influence parameter is recalculated; the recalculated value, which represents a reverse tilt, is given by formula (13):
[formula (13): equation image not reproduced in the source text]
wherein the adjustable parameter in formula (13) is preferably 0.75 or 0.6.
The yield parameter of the site unit is calculated according to formula (14):
[formula (14): equation image not reproduced in the source text]
wherein the formula uses an adjustable parameter, preferably 0.1 or 0.25, and a further parameter, preferably 0.8; the remaining symbols represent the peak heights of the allele peaks and of the primer mass spectrum peak, respectively.
The genotype reliability parameter of the site unit is calculated according to formula (15):
genotype reliability parameter = quality parameter × influence parameter × yield parameter
formula (15).
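Formula (15), together with the rule that the quality parameter is the smallest validity parameter in the site unit, can be sketched as follows. The influence and yield parameters are taken as inputs because formulas (12) to (14) are not reproduced in the text; the function and argument names are hypothetical.

```python
from typing import Sequence


def genotype_reliability(
    peak_validities: Sequence[float],
    influence: float,
    yield_parameter: float,
) -> float:
    """Formula (15): quality parameter x influence parameter x yield parameter."""
    if not peak_validities:
        raise ValueError("a site unit needs at least one validity parameter")
    # Quality parameter: the minimum validity parameter in the site unit.
    quality = min(peak_validities)
    return quality * influence * yield_parameter
```

Taking the minimum means the weakest peak in the site unit bounds the reliability of the whole genotype call.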
The preset reliability threshold comprises a first preset reliability threshold, a second preset reliability threshold and a third preset reliability threshold. If the genotype reliability parameter is less than the first preset reliability threshold, the genotype reliability result of the site unit is determined to be the low probability type; if the genotype reliability parameter is greater than the first preset reliability threshold and less than the second preset reliability threshold, the genotype reliability result of the site unit is determined to be a possible genotype; if the genotype reliability parameter is greater than the second preset reliability threshold and less than the third preset reliability threshold, the genotype reliability result of the site unit is determined to be the positive type; and if the genotype reliability parameter is greater than the third preset reliability threshold, the genotype reliability result of the site unit is determined to be the determined type. Similarly, all the site units in the nucleic acid mass spectrogram are traversed to obtain the genotype reliability result corresponding to each site unit. Finally, the genotypes corresponding to the nucleic acid mass spectrogram and the genotype reliability results are output as a genotype analysis report.
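The four-tier classification can be sketched as a simple threshold ladder. The text gives no numeric values for the three thresholds, so the caller supplies them; the text also uses strict inequalities only, so assigning a value exactly equal to a threshold to the higher tier is an assumption of this sketch. The function name and the tier strings are illustrative.

```python
def reliability_tier(reliability: float, t1: float, t2: float, t3: float) -> str:
    """Map a genotype reliability parameter to one of the four result types."""
    if not (t1 <= t2 <= t3):
        raise ValueError("thresholds must be in ascending order")
    if reliability < t1:
        return "low probability"
    if reliability < t2:
        return "possible"
    if reliability < t3:
        return "positive"
    # Assumption: values equal to a threshold fall into the higher tier.
    return "determined"
```

Applied to every site unit in the spectrogram, the resulting tier can be reported next to each genotype in the analysis report.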
In this embodiment, a genotype reliability result is obtained for the genotypes corresponding to the nucleic acid mass spectrogram by analyzing genotype reliability, which improves the reliability of the genotype analysis.
FIG. 4 is a schematic block diagram of an apparatus for determining a genotype according to an embodiment of the present application. The apparatus 400 for determining a genotype comprises: a dividing module 401, configured to divide mass spectrum peaks in the nucleic acid mass spectrogram into a plurality of site units according to site molecular weights, each site unit comprising a primer mass spectrum peak and at least one sample mass spectrum peak; a first determining module 402, configured to, for any sample mass spectrum peak in the site unit: fit the sample mass spectrum peak according to a bell-shaped curve to obtain fitting function data, and acquire a characteristic value of the sample mass spectrum peak from the fitting function data, wherein the characteristic value is peak height, peak area or signal-to-noise ratio; determine a validity parameter of the sample mass spectrum peak based on the fitting function data; and, if the validity parameter meets the validity threshold, determine the sample mass spectrum peak to be an allele peak; and a second determining module 403, configured to determine the genotype of the site unit based on the site molecular weight and the characteristic value corresponding to the allele peak in the site unit.
In an alternative embodiment, the second determining module 403 includes: the first determining subunit is used for determining the genotype of the locus unit based on the locus molecular weight corresponding to the allele peak if the characteristic value of only one allele peak of the locus unit meets the characteristic value threshold; and the second determining subunit is used for determining the genotype of the locus unit based on the locus molecular weight and the characteristic value corresponding to each allele peak if the characteristic values of two or more allele peaks in the locus unit all meet the characteristic value threshold.
In an alternative embodiment, the second determining subunit includes: a first subunit, configured to, if the characteristic values of two allele peaks in the site unit both meet the characteristic value threshold, take the allele peak with the largest site molecular weight as the major allele and the allele peak with the smallest site molecular weight as the minor allele; a second subunit, configured to determine the product intensity of the site unit based on the characteristic values of the two allele peaks; a third subunit, configured to acquire the characteristic value of the primer mass spectrum peak and determine the elongation rate of the site unit based on the product intensity and the characteristic value of the primer mass spectrum peak; and a fourth subunit, configured to, if the product intensity and the elongation rate both meet preset conditions, determine the penalty value of the site unit based on the characteristic value of the minor allele and the characteristic value of the major allele, judge whether the penalty value meets a preset penalty value, and, if so, record the genotype of the site unit as minor allele-major allele.
In an optional embodiment, the second determining subunit further includes: a fifth subunit, configured to, if the characteristic values of more than two allele peaks in the site unit all meet the characteristic value threshold, determine the product intensity of the site unit based on the characteristic values of those allele peaks; a sixth subunit, configured to acquire the characteristic value of the primer mass spectrum peak and determine the elongation rate of the site unit based on the product intensity and the characteristic value of the primer mass spectrum peak; a seventh subunit, configured to, if the product intensity and the elongation rate both meet the preset conditions, determine, for any allele peak in the site unit, the frequency of the allele peak based on the characteristic value of the allele peak and the product intensity; an eighth subunit, configured to arrange the frequencies of all allele peaks in the site unit in descending order, take the allele peak with the highest frequency in the ordering as the initial gene peak of the genotype, then sequentially judge, according to the ordering, whether the frequency of each allele peak meets a preset frequency threshold, and sequentially arrange the allele peaks that meet the preset frequency threshold after the initial gene peak to obtain the genotype ordering; and a ninth subunit, configured to determine the genotype of the site unit based on the site molecular weights corresponding to the allele peaks in the genotype ordering.
In an alternative embodiment, the first determining module comprises: the first determining subunit is used for determining a signal-to-noise ratio parameter, a resolution parameter, a peak offset parameter, a peak width parameter and a peak shape parameter of a sample mass spectrum peak based on fitting function data of the sample mass spectrum peak; and the second determining subunit is used for determining the validity parameter of the sample mass spectrum peak based on the signal-to-noise ratio parameter, the resolution parameter, the peak offset parameter, the peak width parameter and the peak shape parameter.
In an alternative embodiment, the apparatus for determining a genotype further comprises: the third determining module is used for determining the genotype reliability parameters of the locus unit; the judging module is used for judging whether the genotype reliability parameters meet a preset reliability threshold value; and the fourth determining module is used for determining the genotype reliability result of the site unit based on the judgment result.
In an alternative embodiment, the third determining module includes: the selecting unit is used for selecting the minimum effectiveness parameter from the site unit as the quality parameter of the site unit; a first determination unit for determining an influential parameter of the site unit; the second determination unit is used for determining the yield parameters of the locus unit based on the characteristic values of all allele peaks in the locus unit and the characteristic values of the primer mass spectrum peaks; and the third determining unit is used for determining the genotype reliability parameter of the locus unit based on the quality parameter, the influential parameter and the yield parameter.
In an alternative embodiment, the first determination unit includes: the first subunit is used for determining that the influence parameter of the locus unit is a constant if only one allele peak exists in the locus unit; and a second subunit, configured to, if there are two or more allele peaks in the site unit, determine an influence parameter of the site unit based on a ratio between the minor allele and the major allele by using, as a major allele, an allele peak having a largest site molecular weight among all allele peaks and by using, as a minor allele, an allele peak having a smallest site molecular weight.
In an optional embodiment, the preset reliability threshold includes a first preset reliability threshold, a second preset reliability threshold, and a third preset reliability threshold; the fourth determining module includes: the first determining subunit is used for determining that the genotype reliability result of the locus unit is a low probability type if the genotype reliability parameter is smaller than a first preset reliability threshold; the second determining subunit is used for determining that the genotype reliability result of the locus unit is a possible genotype if the genotype reliability parameter is greater than the first preset reliability threshold and less than a second preset reliability threshold; the third determining subunit is used for determining that the genotype reliability result of the locus unit is positive if the genotype reliability parameter is greater than the second preset reliability threshold and less than a third preset reliability threshold; and the fourth determining subunit is used for determining that the genotype reliability result of the locus unit is the determined type if the genotype reliability parameter is greater than the third preset reliability threshold.
The device can execute the method for determining the genotype, and has the corresponding functional modules and beneficial effects for executing the method for determining the genotype. The details of the techniques not described in detail in this example can be found in the methods for determining the genotype provided in the examples of the present application.
It should be noted that, in this document, the terms "comprises," "comprising," or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, method, article, or apparatus. Without further limitation, an element defined by the phrase "comprising an … …" does not exclude the presence of other like elements in a process, method, article, or apparatus that comprises the element.
In the several embodiments provided in the present application, it should be understood that the disclosed apparatus and method may be implemented in other ways. The above-described device embodiments are merely illustrative, for example, the division of a unit is only one logical function division, and there may be other division ways in actual implementation, such as: multiple units or components may be combined, or may be integrated into another device, or some features may be omitted, or not implemented. In addition, the coupling, direct coupling or communication connection between the components shown or discussed may be through some interfaces, and the indirect coupling or communication connection between the devices or units may be electrical, mechanical or other forms.
The units described as separate parts may or may not be physically separate, and parts displayed as units may or may not be physical units; can be located in one place or distributed on a plurality of network units; some or all of the units can be selected according to actual needs to achieve the purpose of the solution of the embodiment.
In addition, all functional units in the embodiments of the present application may be integrated into one processing unit, or each unit may be separately regarded as one unit, or two or more units may be integrated into one unit; the integrated unit can be realized in a form of hardware, or in a form of hardware plus a software functional unit.
Those of ordinary skill in the art will understand that: all or part of the steps for realizing the method embodiments can be completed by hardware related to program instructions, the program can be stored in a computer readable storage medium, and the program executes the steps comprising the method embodiments when executed; and the aforementioned storage medium includes: various media capable of storing program codes, such as a removable storage medium, a Read Only Memory (ROM), a magnetic disk, and an optical disk.
Alternatively, the integrated units described above in the present application may be stored in a computer-readable storage medium if they are implemented in the form of software functional modules and sold or used as independent products. Based on such understanding, the technical solutions of the embodiments of the present application may be essentially implemented or portions thereof contributing to the related art may be embodied in the form of a software product stored in a storage medium, and including several instructions for enabling a computer device (which may be a personal computer, a server, or a network device) to execute all or part of the methods of the embodiments of the present application. And the aforementioned storage medium includes: a removable storage medium, a ROM, a magnetic disk, an optical disk, or the like, which can store the program code.
The above description covers only specific embodiments of the present application, but the protection scope of the present application is not limited thereto. Any change or substitution that a person skilled in the art could readily conceive within the technical scope disclosed in the present application shall fall within the protection scope of the present application. Therefore, the protection scope of the present application shall be subject to the protection scope of the claims.

Claims (10)

1. A method of determining genotype, comprising:
dividing mass spectrum peaks in the nucleic acid mass spectrogram into a plurality of site units according to site molecular weights; each of the site units comprises a primer mass spectrum peak and at least one sample mass spectrum peak;
for any sample mass spectrum peak in the site unit: fitting the sample mass spectrum peak according to a bell-shaped curve to obtain fitting function data, and acquiring a characteristic value of the sample mass spectrum peak from the fitting function data, wherein the characteristic value comprises at least one of peak height, peak area and signal-to-noise ratio; determining a validity parameter of the sample mass spectrum peak based on the fitting function data; if the validity parameter meets a validity threshold, determining the sample mass spectrum peak as an allele peak;
and determining the genotype of the site unit based on the site molecular weight and the characteristic value corresponding to the allele peak in the site unit.
2. The method according to claim 1, wherein determining the genotype of the site unit based on the site molecular weight and the characteristic value corresponding to the allele peak in the site unit comprises:
if the characteristic value of only one allele peak of the locus unit meets the threshold value of the characteristic value, determining the genotype of the locus unit based on the molecular weight of the locus corresponding to the allele peak;
and if the characteristic values of two or more allele peaks in the locus unit all meet the characteristic value threshold, determining the genotype of the locus unit based on the locus molecular weight and the characteristic value corresponding to each allele peak.
3. The method of claim 2, wherein if the feature values of two allelic peaks in the site unit satisfy the feature value threshold, determining the genotype of the site unit based on the site molecular weight and the feature value corresponding to each of the allelic peaks comprises:
if the characteristic values of two allele peaks in the locus unit both meet the characteristic value threshold, taking the allele peak with the largest locus molecular weight as a major allele and taking the allele peak with the smallest locus molecular weight as a minor allele;
determining the product intensity of the locus unit based on the characteristic values of the two allele peaks;
acquiring a characteristic value of the primer mass spectrum peak, and determining the elongation of the site unit based on the product intensity and the characteristic value of the primer mass spectrum peak;
determining a penalty value of the locus unit based on the characteristic value of the minor allele and the characteristic value of the major allele if the product intensity and the elongation both satisfy a preset condition; and judging whether the penalty value meets a preset penalty value, and if so, recording the genotype of the locus unit as minor allele-major allele.
4. The method according to claim 2, wherein if the characteristic values of two or more allele peaks satisfy the characteristic value threshold, determining the genotype of the site unit based on the site molecular weight and the characteristic value corresponding to each of the allele peaks comprises:
determining the product intensity of the site unit based on the characteristic values of more than two allele peaks if the characteristic values of more than two allele peaks in the site unit all meet the characteristic value threshold;
acquiring a characteristic value of the primer mass spectrum peak, and determining the elongation of the site unit based on the product intensity and the characteristic value of the primer mass spectrum peak;
if both the product intensity and the elongation meet a preset condition, then for any one of the allele peaks in the site unit: determining a frequency of the allele peak based on the characteristic value of the allele peak and the product intensity;
arranging the frequencies of all allele peaks in the site unit in descending order, taking the allele peak with the highest frequency as the initial gene peak of the genotype, then judging in that order whether the frequency of each remaining allele peak meets a preset frequency threshold, and placing the allele peaks that meet the preset frequency threshold after the initial gene peak, in that order, to obtain the genotype sequence;
and determining the genotype of the site unit based on the site molecular weights corresponding to the allele peaks in the genotype sequence.
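The ordering procedure of claim 4 can be sketched as below, purely as an illustration. The claim does not define the formulas, so the sketch assumes that product intensity is the sum of the allele characteristic values, that elongation is product divided by product plus primer, and that the frequency of a peak is its characteristic value divided by the product intensity. The function name, the `mass_to_allele` lookup, and all default thresholds are hypothetical.

```python
from typing import Dict, Optional


def call_multi_peak_genotype(
    peaks: Dict[float, float],          # site molecular weight -> characteristic value
    mass_to_allele: Dict[float, str],   # hypothetical lookup: site molecular weight -> allele label
    primer_value: float,
    min_intensity: float = 10.0,        # hypothetical preset condition
    min_elongation: float = 0.5,        # hypothetical preset condition
    min_frequency: float = 0.2,         # hypothetical preset frequency threshold
) -> Optional[str]:
    """Order allele peaks by frequency and return the genotype string, or None."""
    if not peaks:
        return None

    # Assumption: product intensity is the sum of all allele characteristic values.
    product_intensity = sum(peaks.values())
    total_signal = product_intensity + primer_value
    if total_signal <= 0 or product_intensity < min_intensity:
        return None
    # Assumption: elongation is the product share of the total signal.
    if product_intensity / total_signal < min_elongation:
        return None

    # Assumption: frequency of a peak is its share of the product intensity.
    frequencies = {mass: value / product_intensity for mass, value in peaks.items()}

    # Descending order; the highest-frequency peak is always the initial gene peak.
    ranked = sorted(frequencies, key=frequencies.get, reverse=True)
    ordering = [ranked[0]]
    for mass in ranked[1:]:
        if frequencies[mass] >= min_frequency:
            ordering.append(mass)

    # Translate the ordered site molecular weights into allele labels.
    return "".join(mass_to_allele[mass] for mass in ordering)
```

For example, characteristic values of 50, 40 and 10 yield frequencies of 0.5, 0.4 and 0.1, so under a 0.2 threshold only the first two peaks enter the genotype sequence.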
5. The method of claim 1, wherein said determining a validity parameter for said sample mass spectrum peak based on said fitting function data comprises:
determining a signal-to-noise ratio parameter, a resolution parameter, a peak offset parameter, a peak width parameter, and a peak shape parameter of the sample mass spectrum peak based on the fitting function data of the sample mass spectrum peak;
and determining the validity parameter of the sample mass spectrum peak based on the signal-to-noise ratio parameter, the resolution parameter, the peak offset parameter, the peak width parameter and the peak shape parameter.
6. The method of claim 1, further comprising:
determining a genotype reliability parameter for the site unit;
judging whether the genotype reliability parameter meets a preset reliability threshold;
and determining the genotype reliability result of the site unit based on the judgment result.
7. The method of claim 6, wherein said determining a genotype reliability parameter for said site unit comprises:
selecting the minimum validity parameter in the site unit as a quality parameter of the site unit;
determining an influence parameter for the site unit;
determining a yield parameter of the site unit based on the characteristic values of all allele peaks in the site unit and the characteristic value of the primer mass spectrum peak;
determining a genotype reliability parameter for the site unit based on the quality parameter, the influence parameter, and the yield parameter.
8. The method of claim 7, wherein determining an influence parameter of the site unit comprises:
if only one allele peak exists in the site unit, determining that the influence parameter of the site unit is a constant;
and if two or more allele peaks exist in the site unit, taking the allele peak with the largest site molecular weight as the major allele and the allele peak with the smallest site molecular weight as the minor allele, and determining the influence parameter of the site unit based on the ratio between the minor allele and the major allele.
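As a rough illustration of claims 7 and 8, the three inputs to the genotype reliability parameter might be computed as follows. The claims do not state how the parameters are calculated or combined, so the sketch makes several labeled assumptions: the ratio in claim 8 is taken over characteristic values, the yield parameter is the product share of the total signal, and the three parameters are combined by multiplication. All function names and the single-peak constant of 1.0 are hypothetical.

```python
from typing import Dict, Sequence


def influence_parameter(peaks: Dict[float, float], single_peak_constant: float = 1.0) -> float:
    """peaks maps site molecular weight -> characteristic value of an allele peak."""
    if len(peaks) == 1:
        # Claim 8: a constant when only one allele peak exists (value assumed here).
        return single_peak_constant
    major_value = peaks[max(peaks)]  # peak with the largest site molecular weight
    minor_value = peaks[min(peaks)]  # peak with the smallest site molecular weight
    # Assumption: the ratio is taken between characteristic values.
    return minor_value / major_value if major_value > 0 else 0.0


def yield_parameter(peaks: Dict[float, float], primer_value: float) -> float:
    # Assumption: yield is the allele signal divided by allele plus primer signal.
    product = sum(peaks.values())
    total = product + primer_value
    return product / total if total > 0 else 0.0


def reliability_parameter(
    validity_scores: Sequence[float],
    peaks: Dict[float, float],
    primer_value: float,
) -> float:
    # Claim 7: the quality parameter is the minimum validity parameter in the site unit.
    quality = min(validity_scores)
    # Assumption: the three parameters are combined as a simple product.
    return quality * influence_parameter(peaks) * yield_parameter(peaks, primer_value)
```

A multiplicative combination is only one possible choice; a weighted sum or a lookup table would satisfy the wording of claim 7 equally well.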
9. The method of claim 6, wherein the preset reliability threshold comprises a first preset reliability threshold, a second preset reliability threshold, and a third preset reliability threshold; and determining the genotype reliability result of the site unit based on the judgment result comprises:
if the genotype reliability parameter is smaller than the first preset reliability threshold, determining that the genotype reliability result of the site unit is a low probability type;
if the genotype reliability parameter is larger than the first preset reliability threshold and smaller than the second preset reliability threshold, determining that the genotype reliability result of the site unit is a possible genotype;
if the genotype reliability parameter is larger than the second preset reliability threshold and smaller than the third preset reliability threshold, determining that the genotype reliability result of the site unit is positive;
and if the genotype reliability parameter is larger than the third preset reliability threshold, determining that the genotype reliability result of the site unit is a determination type.
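The four-tier grading of claim 9 amounts to a threshold ladder, which can be sketched as below. The threshold values are hypothetical placeholders. Note also that the claim uses strict inequalities and so does not say how a parameter exactly equal to a threshold is graded; the sketch assigns such boundary values to the higher tier, which is an assumption.

```python
def classify_reliability(
    reliability: float,
    first_threshold: float = 0.3,   # hypothetical value
    second_threshold: float = 0.6,  # hypothetical value
    third_threshold: float = 0.9,   # hypothetical value
) -> str:
    """Map a genotype reliability parameter to one of the four result types."""
    if not first_threshold < second_threshold < third_threshold:
        raise ValueError("thresholds must be strictly increasing")

    if reliability < first_threshold:
        return "low probability type"
    # Assumption: a value equal to a threshold falls into the higher tier.
    if reliability < second_threshold:
        return "possible genotype"
    if reliability < third_threshold:
        return "positive"
    return "determination type"
```

For instance, `classify_reliability(0.72)` falls between the second and third placeholder thresholds and is graded as positive.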
10. An apparatus for determining a genotype, comprising:
the dividing module is used for dividing mass spectrum peaks in the nucleic acid mass spectrogram into a plurality of site units according to site molecular weights; each of the site units comprises a primer mass spectrum peak and at least one sample mass spectrum peak;
a first determination module, used for, for any sample mass spectrum peak in the site unit: fitting the sample mass spectrum peak according to a bell-shaped curve to obtain fitting function data, and acquiring a characteristic value of the sample mass spectrum peak from the fitting function data, wherein the characteristic value comprises at least one of peak height, peak area and signal-to-noise ratio; determining a validity parameter for the sample mass spectrum peak based on the fitting function data; and if the validity parameter meets a validity threshold, determining the sample mass spectrum peak as an allele peak;
and a second determination module, used for determining the genotype of the site unit based on the site molecular weight and the characteristic value corresponding to the allele peak in the site unit.
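To make the bell-shaped curve step of the first determination module more concrete, the following sketch estimates Gaussian parameters for one peak and derives the three characteristic values named in the claim. It is not the fitting procedure of the patent, which is not described in this passage: a method-of-moments estimate is substituted for a least-squares fit, the intensities are assumed to be baseline-subtracted and non-negative, and the noise level is assumed to be supplied by the caller. The function name and the returned dictionary keys are hypothetical.

```python
import math
from typing import Dict, Sequence


def estimate_bell_curve(
    masses: Sequence[float],
    intensities: Sequence[float],
    noise_sd: float,
) -> Dict[str, float]:
    """Estimate Gaussian parameters and characteristic values for a single peak."""
    if len(masses) != len(intensities) or len(masses) < 3:
        raise ValueError("need at least three matching (mass, intensity) points")
    total = sum(intensities)
    if total <= 0 or noise_sd <= 0:
        raise ValueError("intensities and noise level must be positive")

    # Method of moments: intensity-weighted mean and variance of the mass axis.
    center = sum(m * y for m, y in zip(masses, intensities)) / total
    variance = sum(y * (m - center) ** 2 for m, y in zip(masses, intensities)) / total
    sigma = math.sqrt(variance)
    if sigma <= 0:
        raise ValueError("peak has zero width")

    # Peak area by the trapezoidal rule over the sampled points.
    area = sum(
        0.5 * (intensities[i] + intensities[i + 1]) * (masses[i + 1] - masses[i])
        for i in range(len(masses) - 1)
    )
    # Height of the Gaussian that has this area and width.
    height = area / (sigma * math.sqrt(2.0 * math.pi))

    return {
        "center": center,
        "sigma": sigma,
        "peak_height": height,
        "peak_area": area,
        "snr": height / noise_sd,  # assumption: SNR is height over noise standard deviation
    }
```

The returned values could then feed a validity check and the allele-peak decision; how the validity parameter itself is computed is left open here because the claims do not specify it.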
CN202111658718.6A 2021-12-31 2021-12-31 Method and device for determining genotype Active CN114023379B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202111658718.6A CN114023379B (en) 2021-12-31 2021-12-31 Method and device for determining genotype

Publications (2)

Publication Number Publication Date
CN114023379A (en) 2022-02-08
CN114023379B (en) 2022-05-13

Family

ID=80069452

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant