CN108710782A - Genotype conversion method, device and electronic equipment - Google Patents

Genotype conversion method, device and electronic equipment

Info

Publication number
CN108710782A
Authority
CN
China
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN201810471186.7A
Other languages
Chinese (zh)
Other versions
CN108710782B (en
Inventor
Inventor name not published
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Shuo Medical Data Technology (beijing) Co Ltd
Original Assignee
Shuo Medical Data Technology (beijing) Co Ltd
Application filed by Shuo Medical Data Technology (beijing) Co Ltd
Priority to CN201810471186.7A
Publication of CN108710782A
Application granted
Publication of CN108710782B
Legal status: Active

Classifications

    • G: PHYSICS
    • G16: INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
    • G16B: BIOINFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR GENETIC OR PROTEIN-RELATED DATA PROCESSING IN COMPUTATIONAL MOLECULAR BIOLOGY
    • G16B20/00: ICT specially adapted for functional genomics or proteomics, e.g. genotype-phenotype associations

Abstract

The present invention provides a genotype conversion method, device and electronic equipment. The method includes: establishing a genotype library according to the pharmacogenetics and pharmacogenomics database pharmGKB; obtaining the VCF file and BAM file corresponding to the gene to be detected, both of which carry gene site information; extracting wild-type gene site information from the BAM file and mutant gene site information from the VCF file; and, according to the wild-type and mutant gene site information and based on the genotype library, converting the gene site information into a genotype using a genotype converter. Because the wild-type and mutant gene sites are matched against a genotype library built in advance from pharmGKB, the genotype is located accurately and quickly, samples can be processed in batches, the efficiency of genotype classification and identification is improved, and cost is greatly reduced.

Description

Genotype conversion method, device and electronic equipment
Technical field
The present invention relates to the technical field of pharmacogenomics, and in particular to a genotype conversion method, device and electronic equipment.
Background art
Pharmacogenomics (PGx) is the discipline that studies differences in drug response, across various diseases, that are caused by genetic variation. Its main subject is the phenomenon and mechanism by which genomic or genetic variation affects the absorption, metabolism, efficacy and adverse reactions of drugs in the human body. In July 2016, the Clinical Pharmacogenetics Implementation Consortium (CPIC) published a paper in the journal Genetics in Medicine that divides pharmacogenomics-related genes into three categories: drug-metabolizing enzymes (the CYP family, UGT1A1, DPYD and TPMT), drug transporters (such as SLCO1B1), and high-risk genotypes (such as HLA-B).
Pharmacogenomics has become an important tool for guiding individualized clinical medication, assessing the risk of severe adverse drug reactions, and guiding the development and evaluation of new drugs; some marketed new drugs are restricted to eligible patients with specific genotypes. The US FDA has approved adding pharmacogenomic information to the labels of more than 140 drugs, involving 42 pharmacogenomic biomarkers. In addition, some industry guidelines include the detection of certain biomarkers not approved by the FDA, and of their characteristics (such as MGMT gene methylation), in disease treatment guidelines. Molecular detection of drug-response-related genes and their expression products is the prerequisite for individualized drug therapy.
Genetic testing is performed on a patient with first- or second-generation sequencing, and conventional bioinformatic analysis yields the patient's gene site information, for example rs17886522 A/A. However, drug metabolism and transport are not necessarily tied to a single gene site; more often they are associated with multiple gene sites. The patient's gene mutation information therefore has to be identified and converted into a genotype as defined in pharmGKB (the pharmacogenetics and pharmacogenomics knowledgebase) before the patient's drug metabolism status can be understood and precision medication achieved. In the prior art, the patient's detected gene sites are classified and identified mainly by hand to determine the genotype, and the data are then interpreted; this is time-consuming and labor-intensive, and its accuracy is unstable.
Summary of the invention
In view of this, the purpose of the present invention is to provide a genotype conversion method, device and electronic equipment that locate the genotype accurately and quickly, can process samples in batches, improve the efficiency of genotype classification and identification, and greatly reduce cost.
In a first aspect, an embodiment of the present invention provides a genotype conversion method, including:
establishing a genotype library according to the pharmacogenetics and pharmacogenomics database pharmGKB;
obtaining the VCF file and BAM file corresponding to the gene to be detected, the VCF file and the BAM file both carrying gene site information;
extracting wild-type gene site information from the BAM file, and extracting mutant gene site information from the VCF file;
according to the wild-type gene site information and the mutant gene site information, and based on the genotype library, converting the gene site information into a genotype using a genotype converter.
With reference to the first aspect, an embodiment of the present invention provides a first possible implementation of the first aspect, wherein, before the mutant gene site information is extracted from the VCF file, the method further includes: standardizing the data format in the VCF file.
With reference to the first possible implementation of the first aspect, an embodiment of the present invention provides a second possible implementation of the first aspect, wherein, after the data format in the VCF file is standardized, the method further includes:
performing gene annotation on the data in the standardized VCF file, and performing left-right correction according to the positive or negative strand on which each gene carried in the VCF file is located.
With reference to the first aspect, an embodiment of the present invention provides a third possible implementation of the first aspect, wherein the genotype library includes a single-point genotype database and a multi-point genotype database; the single-point genotype database stores single-point genotypes, each determined by a single gene site, and the multi-point genotype database stores multi-point genotypes, each determined by multiple gene sites.
With reference to the first aspect, an embodiment of the present invention provides a fourth possible implementation of the first aspect, wherein the parameters corresponding to each genotype in the genotype library include gene name, genotype and site information;
the process by which the genotype converter converts the gene site information into a genotype includes:
integrating the wild-type gene site information and the mutant gene site information to obtain integrated gene site information;
performing gene name matching between the integrated gene site information and the genotypes in the genotype library, and, if the gene name of the integrated gene site information matches the gene name of a genotype in the genotype library, performing genotype and site information matching;
converting the integrated gene site information into the corresponding genotype according to the result of the genotype and site information matching.
With reference to the fourth possible implementation of the first aspect, an embodiment of the present invention provides a fifth possible implementation of the first aspect, wherein,
before the integrated gene site information is matched against the gene names in the genotype library, the method further includes:
when a mutant gene site of the integrated gene site information carries CNV information, performing haplotype analysis on the integrated gene site information;
when the variation frequency of the mutant gene site is greater than a preset threshold, determining that the integrated gene site information is a haplotype, and executing the step of performing gene name matching between the integrated gene site information and the genotypes in the genotype library.
In a second aspect, an embodiment of the present invention further provides a genotype conversion device, including:
a genotype library establishing module, for establishing a genotype library according to the pharmacogenetics and pharmacogenomics database pharmGKB;
a file acquisition module, for obtaining the VCF file and BAM file corresponding to the gene to be detected, the VCF file and the BAM file both carrying gene site information;
a site extraction module, for extracting wild-type gene site information from the BAM file and extracting mutant gene site information from the VCF file;
a genotype conversion module, for converting the gene site information into a genotype using a genotype converter, according to the wild-type gene site information and the mutant gene site information and based on the genotype library.
With reference to the second aspect, an embodiment of the present invention provides a first possible implementation of the second aspect, wherein the parameters corresponding to each genotype in the genotype library include gene name, genotype and site information;
the process by which the genotype converter converts the gene site information into a genotype includes:
integrating the wild-type gene site information and the mutant gene site information to obtain integrated gene site information;
performing gene name matching between the integrated gene site information and the genotypes in the genotype library, and, if the gene name of the integrated gene site information matches the gene name of a genotype in the genotype library, performing genotype and site information matching;
converting the integrated gene site information into the corresponding genotype according to the result of the genotype and site information matching.
In a third aspect, an embodiment of the present invention further provides an electronic device including a memory and a processor, the memory storing a computer program that can run on the processor, and the processor implementing the method of the first aspect or any of its possible implementations when executing the computer program.
In a fourth aspect, an embodiment of the present invention further provides a computer-readable medium carrying non-volatile program code executable by a processor, the program code causing the processor to execute the method of the first aspect or any of its possible implementations.
The embodiments of the present invention bring the following advantageous effects:
The genotype conversion method provided by the embodiments of the present invention extracts wild-type gene site information from the BAM file and mutant gene site information from the VCF file, and uses both, together with the genotype library built in advance from pharmGKB, to locate the genotype accurately. The method is fast and accurate, can process samples in batches, improves the efficiency of genotype classification and identification, and greatly reduces cost.
Other features and advantages of the present invention will be set forth in the description that follows, and in part will become apparent from the description or be understood by practicing the invention. The objectives and other advantages of the invention are realized and attained by the structure particularly pointed out in the description, the claims and the drawings.
To make the above objectives, features and advantages of the present invention clearer and easier to understand, preferred embodiments are described in detail below with reference to the accompanying drawings.
Description of the drawings
To explain the specific embodiments of the present invention or the technical solutions in the prior art more clearly, the drawings needed to describe them are briefly introduced below. The drawings described below show some embodiments of the present invention; a person of ordinary skill in the art can obtain other drawings from them without creative effort.
Fig. 1 is a flow diagram of a genotype conversion method provided by an embodiment of the present invention;
Fig. 2 is another flow diagram of the genotype conversion method provided by an embodiment of the present invention;
Fig. 3 is a structural schematic diagram of the genotype conversion device provided by an embodiment of the present invention;
Fig. 4 is a structural schematic diagram of the electronic device provided by an embodiment of the present invention.
Detailed description
To make the objectives, technical solutions and advantages of the embodiments of the present invention clearer, the technical solutions of the present invention are described clearly and completely below with reference to the drawings. The described embodiments are some, not all, of the embodiments of the present invention. All other embodiments obtained by a person of ordinary skill in the art from these embodiments without creative effort fall within the protection scope of the present invention.
At present, a patient's genotype is identified mainly by hand: the patient's detected gene sites are classified and identified manually to determine the genotype, and the data are then interpreted. This is time-consuming and labor-intensive, and its accuracy is unstable. For this reason, the embodiments of the present invention provide a genotype conversion method, device and electronic equipment.
As shown in Fig. 1, the genotype conversion method provided by the present invention extracts wild-type gene site information from the BAM file and mutant gene site information from the VCF file, and uses both, together with the genotype library built in advance from pharmGKB, to locate the genotype accurately. The method is fast and accurate, can process samples in batches, improves the efficiency of genotype classification and identification, and greatly reduces cost.
Specifically, Fig. 2 shows a flow diagram of another genotype conversion method provided by an embodiment of the present invention. As shown in Fig. 2, the method includes:
Step S101: establish a genotype library according to the pharmacogenetics and pharmacogenomics database pharmGKB.
Specifically, pharmacogenomics-related genotype data are downloaded locally from the pharmacogenetics and pharmacogenomics database, and the genotype library is built from them according to certain rules. In a possible embodiment, the genotype library includes a single-point genotype database and a multi-point genotype database. The single-point genotype database stores single-point genotypes, each determined by a single gene site; the multi-point genotype database stores multi-point genotypes, each determined by multiple gene sites. In one embodiment, the single-point genotype database is the data text file translator_single-point.v1.0.txt, and the multi-point genotype database is the data text file translator_diplotype.v2.0.txt.
In a possible embodiment, the information stored for each genotype in the genotype database (that is, the parameters corresponding to the genotype) comprises three fields: gene name (Gene_Name), genotype (Target_Nomenclature) and site information (Variant_Allele). For example: UGT1A1; *1/*28; Chr=2, Start=234668892, End=234668894, Genetype=ATA/A, rsID=rs8175347, Type=mutant.
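The three-field record above can be read into a small structure. The following is a minimal sketch only: the on-disk layout of the translator_*.txt files is not given in the source, so the semicolon-separated fields and comma-separated key=value pairs are assumptions taken from the example record, and `LibraryRecord` and `parse_library_record` are hypothetical names.

```python
from dataclasses import dataclass
from typing import Dict


@dataclass
class LibraryRecord:
    gene_name: str                  # Gene_Name, e.g. "UGT1A1"
    target_nomenclature: str        # Target_Nomenclature, e.g. "*1/*28"
    variant_allele: Dict[str, str]  # Variant_Allele as key/value pairs


def parse_library_record(line: str) -> LibraryRecord:
    """Parse one record, ASSUMING 'gene;genotype;key=value, key=value, ...'."""
    text = line.strip().rstrip(".")
    gene, nomenclature, allele = (part.strip() for part in text.split(";", 2))
    fields: Dict[str, str] = {}
    for pair in allele.split(","):
        key, _, value = pair.partition("=")
        fields[key.strip()] = value.strip()
    return LibraryRecord(gene, nomenclature, fields)


# The example record from the text ("Genetype" is the key spelling used there).
record = parse_library_record(
    "UGT1A1;*1/*28;Chr=2, Start=234668892, End=234668894, "
    "Genetype=ATA/A, rsID=rs8175347, Type=mutant"
)
```

Keeping Variant_Allele as a dictionary leaves the matching steps free to compare by rsID, by coordinates, or by both.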
Step S102: obtain the VCF file and BAM file corresponding to the gene to be detected; the VCF file and the BAM file both carry gene site information.
In a possible embodiment, the patient's genes are tested with gene sequencing equipment, and the VCF (Variant Call Format) file and BAM file are obtained after the detection information is processed. VCF is a text format for storing gene sequence variation information; it represents single nucleotide variants, insertions/deletions and the like. A BAM file is the binary form of a SAM (Sequence Alignment/Map format) file and is mainly used to represent the result of mapping sequencing reads to the genome.
Further, the data format in the VCF file is standardized; for example, the variant ACGT->ACT is reduced to its simplest form, CG->C. Gene annotation is then performed on the gene sites in the standardized VCF file, and the sites are corrected according to the positive or negative strand on which the gene carried in the VCF file is located; a mutation site lying in a region where genes overlap is split into two annotations, one per strand, and both are corrected. In a possible embodiment, SnpEff is used for the gene annotation.
Step S103: extract wild-type gene site information from the BAM file, and extract mutant gene site information from the VCF file.
In a possible embodiment, the bioinformatics tool Samtools is used to extract the wild-type gene site information from the BAM file, and the mutant gene site information is extracted from the VCF file.
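One way to read this step is sketched below. The source does not state the rule for calling a site wild-type, so this sketch assumes that a library site is wild-type when it is covered by reads in the BAM file but has no call in the VCF file. The per-site depths are taken as already extracted (for instance from a per-position depth table such as the one `samtools depth` prints); `classify_sites`, the `min_depth` value of 10 and the status labels other than "mutant" are hypothetical.

```python
from typing import Dict, Iterable, Set, Tuple

Site = Tuple[str, int]  # (chromosome, 1-based position)


def classify_sites(
    library_sites: Iterable[Site],
    depth: Dict[Site, int],
    variant_sites: Set[Site],
    min_depth: int = 10,  # hypothetical coverage cut-off
) -> Dict[Site, str]:
    """Label every library site as mutant, wild_type or not_covered."""
    calls: Dict[Site, str] = {}
    for site in library_sites:
        if site in variant_sites:
            calls[site] = "mutant"        # reported in the VCF file
        elif depth.get(site, 0) >= min_depth:
            calls[site] = "wild_type"     # covered in the BAM file, no variant call
        else:
            calls[site] = "not_covered"   # cannot be called either way
    return calls


# Toy input: three library sites on chromosome 2 (the coordinates are made up).
calls = classify_sites(
    library_sites=[("2", 1000), ("2", 2000), ("2", 3000)],
    depth={("2", 1000): 35, ("2", 2000): 42, ("2", 3000): 2},
    variant_sites={("2", 1000)},
)
```

Separating "not covered" from "wild-type" matters because an uncovered site is exactly the kind of site a later step would ask to have tested again.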
Step S104: according to the wild-type gene site information and the mutant gene site information, and based on the genotype library, convert the gene site information into a genotype using the genotype converter.
The genotype converter may take the form of a script file; in this embodiment it is the converter script PGx_Translator_v2.0.pl.
In a possible embodiment, the process by which the genotype converter in step S104 converts the gene site information into a genotype includes the following steps:
(a) Integrate the wild-type gene sites and the mutant gene sites to obtain integrated gene site information.
(b) When a mutant gene site in the integrated gene site information carries CNV information, perform haplotype analysis on the integrated gene site information.
(c) When the variation frequency of the mutant gene site is greater than a preset threshold, determine that the integrated gene site information is a haplotype and execute step (d).
During conversion, if a mutant gene site is found to carry CNV (copy number variation) information, haplotype analysis must be performed on the integrated gene site information corresponding to that site. In a possible embodiment, if the variation frequency of the mutant gene site is greater than 50%, the CNV information is determined to lie on the same chromosome. If no mutant gene site is found to carry CNV information, step (d) is executed directly.
(d) Perform gene name matching between the integrated gene site information and the genotypes in the genotype library; if the gene name of the integrated gene site information matches the gene name of a genotype in the genotype library, perform genotype and site information matching.
That is, if the gene name of the integrated gene site information is identical to the Gene_Name parameter of a genotype in the genotype library, the genotype and site information of the integrated gene site information are matched against the Target_Nomenclature and Variant_Allele that correspond to that Gene_Name.
(e) Convert the integrated gene site information into the corresponding genotype according to the result of the genotype and site information matching.
In a possible embodiment, if the genotype and site information of the integrated gene site information are identical to the Target_Nomenclature and Variant_Allele corresponding to the Gene_Name parameter, the gene site information in the integrated gene site information is determined to match the Target_Nomenclature parameter, and the genotype conversion result includes the matched genotype. In a possible embodiment, the match can be labeled exactly_matched and output in a form consistent with the genotype notation in pharmGKB, for example CYP2C19*1. If the genotype and site information of the integrated gene site information differ from the Target_Nomenclature and Variant_Allele corresponding to the Gene_Name parameter, information on the sites that need supplementary testing is output, indicating that no Target_Nomenclature of that Gene_Name was matched, and the genotype does not appear in the final genotype conversion result.
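Steps (b) to (e) can be sketched as a gate followed by a two-stage comparison. This is an illustrative sketch, not the PGx_Translator script: sites are keyed by rsID as an assumption, the library entry reuses the UGT1A1 example record given earlier, and the function names and the `need_supplementary_sites` label are hypothetical. The source does not say what happens when a CNV-carrying site falls at or below the threshold, so the gate simply returns False in that case.

```python
from typing import Dict, List, Optional, Tuple

Sites = Dict[str, str]           # rsID -> genotype string such as "ATA/A"
Record = Tuple[str, str, Sites]  # (Gene_Name, Target_Nomenclature, Variant_Allele)


def passes_haplotype_check(has_cnv: bool, variant_frequency: float,
                           threshold: float = 0.5) -> bool:
    """Steps (b)-(c): sites without CNV information go straight to matching;
    CNV-carrying sites proceed only above the threshold (50% in the text)."""
    if not has_cnv:
        return True
    return variant_frequency > threshold


def match_by_name_then_sites(gene: str, observed: Sites,
                             library: List[Record]) -> Tuple[Optional[str], str]:
    """Steps (d)-(e): filter by Gene_Name, then require identical site information."""
    for name, nomenclature, required in library:
        if name != gene:
            continue  # gene name mismatch: this record cannot apply
        if observed == required:
            return gene + nomenclature, "exactly_matched"
    return None, "need_supplementary_sites"


library = [("UGT1A1", "*1/*28", {"rs8175347": "ATA/A"})]
hit = match_by_name_then_sites("UGT1A1", {"rs8175347": "ATA/A"}, library)
miss = match_by_name_then_sites("UGT1A1", {"rs8175347": "ATA/ATA"}, library)
```

Filtering on the gene name first keeps the more expensive site comparison limited to the handful of records that could apply.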
Specifically, the integrated gene site information may include one or more detected sites. During matching, single-point matching can first be performed on the gene site information against the single-point genotype database, followed by multi-point matching against the multi-point genotype database; or multi-point matching can be performed first, followed by single-point matching. The matching order is not limited here.
In a possible embodiment, multi-point matching is first performed on the gene site information against the multi-point genotype database, and single-point matching is then performed against the single-point genotype database. Steps (d) and (e) above can therefore specifically include the following steps (1) to (6):
(1) Perform gene name matching between the integrated gene site information and the genotypes in the multi-point genotype database; if the gene name of the integrated gene site information matches the gene name of a genotype in the multi-point genotype database, perform genotype and site information matching.
(2) If all site information of the integrated gene site information exactly matches the site information of a genotype corresponding to the gene name, label the result as an exact match and convert the integrated gene site information into the matched genotype.
(3) If all site information of the integrated gene site information exactly matches part of the site information of a genotype corresponding to the gene name, convert the integrated gene site information into the matched genotype and, at the same time, output the site information that needs supplementary testing.
In this case, the integrated gene site information is converted into the genotype whose site information is partially matched, and the user is prompted with the sites that need supplementary testing.
(4) If the site information of the integrated gene site information cannot be matched, fully or partially, with the site information of any genotype corresponding to the gene name, perform gene name matching between the integrated gene site information and the genotypes in the single-point genotype database.
(5) If the gene name of the integrated gene site information matches the gene name of a genotype in the single-point genotype database, perform site information matching.
(6) If the integrated site information exactly matches the site information of a genotype in the single-point genotype database, convert the integrated gene site information into the matched genotype.
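Steps (1) to (6) form a cascade that can be sketched as follows. This is a hedged illustration rather than the converter's actual logic: sites are keyed by rsID, the first partial match wins when several exist, a single-point record counts as matched when its site appears among the observed sites with the same genotype, and the gene name, rsIDs, and the `partially_matched` and `unmatched` labels are all hypothetical.

```python
from typing import Dict, List, NamedTuple, Optional, Tuple

Sites = Dict[str, str]           # rsID -> genotype string
Record = Tuple[str, str, Sites]  # (Gene_Name, Target_Nomenclature, Variant_Allele)


class Conversion(NamedTuple):
    genotype: Optional[str]
    status: str
    sites_to_supplement: List[str]


def convert(gene: str, observed: Sites,
            multi_db: List[Record], single_db: List[Record]) -> Conversion:
    partial: Optional[Conversion] = None
    # Steps (1)-(3): multi-point database first.
    for name, nomenclature, required in multi_db:
        if name != gene:
            continue
        if observed == required:  # step (2): every site matches
            return Conversion(gene + nomenclature, "exactly_matched", [])
        covers_part = bool(observed) and all(
            required.get(rs) == gt for rs, gt in observed.items())
        if covers_part and partial is None:  # step (3): only part of the sites seen
            missing = sorted(set(required) - set(observed))
            partial = Conversion(gene + nomenclature, "partially_matched", missing)
    if partial is not None:
        return partial
    # Steps (4)-(6): fall back to the single-point database.
    for name, nomenclature, required in single_db:
        if name == gene and all(observed.get(rs) == gt for rs, gt in required.items()):
            return Conversion(gene + nomenclature, "exactly_matched", [])
    return Conversion(None, "unmatched", [])


multi_db = [("GENE1", "*2/*3", {"rsA": "A/G", "rsB": "C/T"})]
single_db = [("GENE1", "*1/*2", {"rsA": "A/G"})]
full = convert("GENE1", {"rsA": "A/G", "rsB": "C/T"}, multi_db, single_db)
part = convert("GENE1", {"rsA": "A/G"}, multi_db, single_db)
fallback = convert("GENE1", {"rsA": "A/G", "rsC": "T/T"}, multi_db, single_db)
```

Holding back the partial match until the whole multi-point database has been scanned ensures that an exact multi-point match later in the file is never masked by an earlier partial one.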
Further, to provide an effective basis for medication guidance, in a possible embodiment the output of the genotype converter also includes the metabolic type corresponding to the genotype.
In a possible embodiment, the output of the genotype converter includes the parameters: gene name Gene_Name, genotype Genotype, detected sites Variants_detected, sites still to be detected Variants_need_detected, metabolic type Phenotype, and zygosity Zygosity. For example, an output result is: CYP2C19*1/*2 NA|NA exactly_matched 4 Heterozygote.
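The six output parameters can be modeled as one record per gene. The sketch below is an assumption-laden illustration: the source does not specify the delimiter or how the fields of its example line map onto the parameters, so tab separation and the "NA" placeholders are hypothetical, as are the names `ConverterOutput` and `zygosity_of` and the "Homozygote" label. Only the rule that a diplotype with two different alleles is heterozygous is standard genetics.

```python
from dataclasses import astuple, dataclass


def zygosity_of(diplotype: str) -> str:
    """Return 'Heterozygote' when the two alleles of 'a/b' differ."""
    first, _, second = diplotype.partition("/")
    return "Homozygote" if first == second else "Heterozygote"


@dataclass
class ConverterOutput:
    Gene_Name: str
    Genotype: str
    Variants_detected: str
    Variants_need_detected: str
    Phenotype: str
    Zygosity: str

    def to_line(self) -> str:
        # Tab-separated layout is an assumption, not taken from the source.
        return "\t".join(astuple(self))


row = ConverterOutput(
    Gene_Name="CYP2C19",
    Genotype="*1/*2",
    Variants_detected="NA",       # placeholder value
    Variants_need_detected="NA",  # placeholder value
    Phenotype="NA",               # metabolic type left unfilled in this sketch
    Zygosity=zygosity_of("*1/*2"),
)
line = row.to_line()
```

Deriving zygosity from the diplotype string, instead of storing it separately, keeps the two fields from drifting apart.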
The genotype conversion method provided by the embodiments of the present invention extracts wild-type gene site information from the BAM file and mutant gene site information from the VCF file, and uses both, together with the genotype library built in advance from pharmGKB, to locate the genotype accurately. The method is fast and accurate, can process samples in batches, improves the efficiency of genotype classification and identification, and greatly reduces cost. The genotype converter also outputs the metabolic type corresponding to the genotype, which provides effective evidence for medication guidance.
Corresponding to the genotype conversion method above, this embodiment provides a genotype conversion device, which includes:
a genotype library establishing module 11, for establishing a genotype library according to the pharmacogenetics and pharmacogenomics database pharmGKB;
a file acquisition module 12, for obtaining the VCF file and BAM file corresponding to the gene to be detected, the VCF file and the BAM file both carrying gene site information;
a site extraction module 13, for extracting wild-type gene site information from the BAM file and extracting mutant gene site information from the VCF file;
a genotype conversion module 14, for converting the gene site information into a genotype using a genotype converter, according to the wild-type gene site information and the mutant gene site information and based on the genotype library.
Further, the parameters corresponding to each genotype in the genotype library include gene name, genotype and site information; the process by which the genotype converter converts the gene site information into a genotype includes:
integrating the wild-type gene site information and the mutant gene site information to obtain integrated gene site information;
performing gene name matching between the integrated gene site information and the genotypes in the genotype library, and, if the gene name of the integrated gene site information matches the gene name of a genotype in the genotype library, performing genotype and site information matching;
converting the integrated gene site information into the corresponding genotype according to the result of the genotype and site information matching.
The genotype conversion device provided by the embodiments of the present invention extracts wild-type gene site information from the BAM file and mutant gene site information from the VCF file, and uses both, together with the genotype library built in advance from pharmGKB, to locate the genotype accurately. It is fast and accurate, can process samples in batches, improves the efficiency of genotype classification and identification, and greatly reduces cost.
Referring to Fig. 4, an embodiment of the present invention also provides an electronic device 100, including a processor 40, a memory 41, a bus 42 and a communication interface 43; the processor 40, the communication interface 43 and the memory 41 are connected by the bus 42. The processor 40 is used to execute executable modules stored in the memory 41, such as computer programs.
The memory 41 may include high-speed random access memory (RAM) and may also include non-volatile memory, for example at least one magnetic disk storage. The communication connection between this system network element and at least one other network element is implemented through at least one communication interface 43 (wired or wireless), and can use the Internet, a wide area network, a local network, a metropolitan area network and the like.
The bus 42 can be an ISA bus, a PCI bus, an EISA bus or the like. The bus can be divided into an address bus, a data bus, a control bus and so on. For ease of illustration, only one double-headed arrow is shown in Fig. 4, but this does not mean that there is only one bus or one type of bus.
The memory 41 is used to store a program, and the processor 40 executes the program after receiving an execution instruction. The method executed by the device defined by the flow disclosed in any of the foregoing embodiments of the present invention can be applied to the processor 40 or implemented by the processor 40.
The processor 40 may be an integrated circuit chip with signal processing capability. During implementation, each step of the above method can be completed by an integrated logic circuit of hardware in the processor 40 or by instructions in the form of software. The processor 40 can be a general-purpose processor, including a central processing unit (CPU), a network processor (NP) and the like; it can also be a digital signal processor (DSP), an application-specific integrated circuit (ASIC), a field-programmable gate array (FPGA) or other programmable logic device, a discrete gate or transistor logic device, or a discrete hardware component. It can implement or execute the methods, steps and logic diagrams disclosed in the embodiments of the present invention. The general-purpose processor can be a microprocessor or any conventional processor. The steps of the method disclosed in the embodiments of the present invention can be executed directly by a hardware decoding processor, or by a combination of hardware and software modules in a decoding processor. The software module can be located in a storage medium that is mature in this field, such as random access memory, flash memory, read-only memory, programmable read-only memory, electrically erasable programmable memory or a register. The storage medium is located in the memory 41; the processor 40 reads the information in the memory 41 and completes the steps of the above method in combination with its hardware.
The genotype conversion device and the electronic equipment provided by the embodiments of the present invention have the same technical features as the genotype conversion method provided by the above embodiments, so they can solve the same technical problems and achieve the same technical effects.
The computer program product for performing the genotype conversion method provided by the embodiments of the present invention includes a computer-readable storage medium storing non-volatile program code executable by a processor. The instructions included in the program code can be used to execute the method described in the foregoing method embodiments; for the specific implementation, refer to the method embodiments, which are not repeated here.
It will be clear to those skilled in the art that, for convenience and brevity of description, the specific working processes of the device and the electronic equipment described above may refer to the corresponding processes in the foregoing method embodiments, which are not repeated here.
The flowcharts and block diagrams in the drawings show the possible architecture, functions, and operations of methods and computer program products according to multiple embodiments of the present invention. In this regard, each block in a flowchart or block diagram may represent a module, a program segment, or a part of code, which contains one or more executable instructions for implementing the specified logical function. It should also be noted that, in some alternative implementations, the functions marked in the blocks may occur in an order different from that marked in the drawings. For example, two consecutive blocks may in fact be executed substantially in parallel, or sometimes in the reverse order, depending on the functions involved. It should also be noted that each block in the block diagrams and/or flowcharts, and combinations of blocks in the block diagrams and/or flowcharts, may be implemented by a dedicated hardware-based system that performs the specified functions or actions, or by a combination of dedicated hardware and computer instructions.
Term " first ", " second ", " third " are used for description purposes only, and are not understood to indicate or imply relatively important Property.In addition, unless specifically stated otherwise, the opposite step of the component and step that otherwise illustrate in these embodiments, digital table It is not limit the scope of the invention up to formula and numerical value.
In the several embodiments provided in this application, it should be understood that the disclosed systems, devices, and methods may be implemented in other ways. The device embodiments described above are merely illustrative. For example, the division of the units is only a division by logical function, and other divisions are possible in actual implementation; as another example, multiple units or components may be combined or integrated into another system, or some features may be ignored or not executed. In addition, the mutual coupling, direct coupling, or communication connection shown or discussed may be an indirect coupling or communication connection through some communication interfaces, devices, or units, and may be electrical, mechanical, or in other forms.
The units described as separate components may or may not be physically separate, and the components displayed as units may or may not be physical units; that is, they may be located in one place or distributed over multiple network units. Some or all of the units may be selected according to actual needs to achieve the purpose of the solution of this embodiment.
In addition, the functional units in each embodiment of the present invention may be integrated into one processing unit, each unit may exist physically alone, or two or more units may be integrated into one unit.
If the functions are implemented in the form of software functional units and sold or used as an independent product, they may be stored in a non-volatile computer-readable storage medium executable by a processor. Based on this understanding, the technical solution of the present invention in essence, or the part that contributes to the prior art, or a part of the technical solution, may be embodied in the form of a software product. The computer software product is stored in a storage medium and includes several instructions for causing a computer device (which may be a personal computer, a server, a network device, or the like) to execute all or part of the steps of the methods described in the embodiments of the present invention. The aforementioned storage medium includes various media that can store program code, such as a USB flash drive, a removable hard disk, read-only memory (ROM), random access memory (RAM), a magnetic disk, or an optical disc.
Finally, it should be noted that the embodiments described above are only specific implementations of the present invention, used to illustrate rather than limit its technical solutions, and the protection scope of the present invention is not limited to them. Although the present invention has been described in detail with reference to the foregoing embodiments, those of ordinary skill in the art should understand that anyone skilled in the art may, within the technical scope disclosed by the present invention, still modify the technical solutions recorded in the foregoing embodiments, readily conceive of changes to them, or make equivalent replacements of some of their technical features. Such modifications, changes, or replacements do not cause the essence of the corresponding technical solutions to depart from the spirit and scope of the technical solutions of the embodiments of the present invention, and shall all be covered by the protection scope of the present invention. Therefore, the protection scope of the present invention shall be subject to the protection scope of the claims.

Claims (10)

1. A genotype conversion method, characterized by comprising:
establishing a genotype library according to the pharmacogenetics and pharmacogenomics database PharmGKB;
obtaining a VCF file and a BAM file corresponding to a gene to be detected, the VCF file and the BAM file carrying gene site information;
extracting wild-type gene site information from the BAM file, and extracting mutant gene site information from the VCF file;
converting the gene site information into a genotype by using a genotype converter, based on the genotype library, according to the wild-type gene site information and the mutant gene site information.
2. The method according to claim 1, characterized in that, before extracting the mutant gene site information from the VCF file, the method further comprises: standardizing the data format in the VCF file.
3. The method according to claim 2, characterized in that, after standardizing the data format in the VCF file, the method further comprises:
performing gene annotation on the data in the format-standardized VCF file, and performing left/right correction according to the positive or negative strand on which the gene carried in the VCF file is located.
4. The method according to claim 1, characterized in that the genotype library comprises a single-site genotype database and a multi-site genotype database, wherein the single-site genotype database stores single-site genotypes determined by a single gene site, and the multi-site genotype database stores multi-site genotypes determined by multiple gene sites.
5. The method according to claim 1, characterized in that the parameters corresponding to a genotype in the genotype library include a gene name, a genotype, and site information;
the process by which the genotype converter converts the gene site information into a genotype comprises:
integrating the wild-type gene site information and the mutant gene site information to obtain integrated gene site information;
performing gene name matching between the integrated gene site information and the genotypes in the genotype library, and, if the gene name of the integrated gene site information matches the gene name of a genotype in the genotype library, performing genotype and site information matching;
converting the integrated gene site information into the corresponding genotype according to the result of the genotype and site information matching.
6. The method according to claim 5, characterized in that, before performing gene name matching between the integrated gene site information and the genotypes in the genotype library, the method further comprises:
when a mutant gene site of the integrated gene site information carries CNV information, performing haplotype analysis on the integrated gene site information;
when the variation frequency of the mutant gene site is greater than a preset threshold, determining that the integrated gene site information is a haplotype, and executing the step of performing gene name matching between the integrated gene site information and the genotypes in the genotype library.
7. A genotype conversion device, characterized by comprising:
a genotype library establishing module, configured to establish a genotype library according to the pharmacogenetics and pharmacogenomics database PharmGKB;
a file acquisition module, configured to obtain a VCF file and a BAM file corresponding to a gene to be detected, the VCF file and the BAM file carrying gene site information;
a site extraction module, configured to extract wild-type gene site information from the BAM file and to extract mutant gene site information from the VCF file;
a genotype conversion module, configured to convert the gene site information into a genotype by using a genotype converter, based on the genotype library, according to the wild-type gene site information and the mutant gene site information.
8. The device according to claim 7, characterized in that the parameters corresponding to a genotype in the genotype library include a gene name, a genotype, and site information;
the process by which the genotype converter converts the gene site information into a genotype comprises:
integrating the wild-type gene site information and the mutant gene site information to obtain integrated gene site information;
performing gene name matching between the integrated gene site information and the genotypes in the genotype library, and, if the gene name of the integrated gene site information matches the gene name of a genotype in the genotype library, performing genotype and site information matching;
converting the integrated gene site information into the corresponding genotype according to the result of the genotype and site information matching.
9. An electronic device, comprising a memory and a processor, the memory storing a computer program executable on the processor, characterized in that the processor implements the method according to any one of claims 1 to 6 when executing the computer program.
10. A computer-readable medium having non-volatile program code executable by a processor, characterized in that the program code causes the processor to execute the method according to any one of claims 1 to 6.
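
For illustration only, the conversion flow recited in claims 1, 4 and 5 (extract mutant sites from the VCF data, integrate them with wild-type sites, match on gene name first, then on genotype and site information) can be sketched in Python as follows. This is a minimal sketch under stated assumptions, not the implementation of the invention: the `LibraryEntry` layout, the `GENE=` INFO tag, the gene names `GENE1`/`GENE2`, the coordinates and the genotype labels are all hypothetical, the wild-type sites are assumed to have been extracted from the BAM file beforehand, and the CNV handling of claim 6 is not covered.

```python
from dataclasses import dataclass


@dataclass
class LibraryEntry:
    """One genotype-library record (hypothetical layout): gene name,
    genotype label, and the site information that defines the genotype."""
    gene: str
    genotype: str
    sites: dict  # {(chrom, pos): (allele1, allele2)}, alleles sorted


def parse_vcf_sites(vcf_text):
    """Extract mutant sites as {gene: {(chrom, pos): (a1, a2)}}.

    Assumes a single sample column, GT as the first FORMAT key, and a
    hypothetical INFO tag GENE=<name> written by an annotation step.
    """
    sites = {}
    for line in vcf_text.splitlines():
        if not line or line.startswith("#"):
            continue  # skip blank lines, meta lines and the header
        fields = line.split("\t")
        chrom, pos, ref, alt = fields[0], fields[1], fields[3], fields[4]
        info, sample = fields[7], fields[9]
        tags = dict(kv.split("=", 1) for kv in info.split(";") if "=" in kv)
        gene = tags.get("GENE")
        gt = sample.split(":")[0].replace("|", "/").split("/")
        if gene is None or "." in gt:
            continue  # unannotated record or missing call
        alleles = [ref] + alt.split(",")
        call = tuple(sorted(alleles[int(i)] for i in gt))
        sites.setdefault(gene, {})[(chrom, int(pos))] = call
    return sites


def integrate(wild_sites, mutant_sites):
    """Merge wild-type and mutant sites; a mutant call overrides wild type."""
    merged = {}
    for source in (wild_sites, mutant_sites):
        for gene, calls in source.items():
            merged.setdefault(gene, {}).update(calls)
    return merged


def convert(integrated, library):
    """Return {gene: genotype}: match on gene name, then on site calls."""
    result = {}
    for gene, calls in integrated.items():
        candidates = [e for e in library if e.gene == gene]
        # Design choice of this sketch: try multi-site entries first.
        candidates.sort(key=lambda e: len(e.sites), reverse=True)
        for entry in candidates:
            if all(calls.get(s) == c for s, c in entry.sites.items()):
                result[gene] = entry.genotype
                break
    return result


# Hypothetical example data (not taken from PharmGKB).
LIBRARY = [
    LibraryEntry("GENE1", "*1/*1", {("1", 100): ("G", "G")}),   # single-site
    LibraryEntry("GENE1", "*1/*2", {("1", 100): ("A", "G")}),   # single-site
    LibraryEntry("GENE2", "*1/*3",                              # multi-site
                 {("2", 200): ("C", "T"), ("2", 250): ("A", "A")}),
]
VCF_TEXT = "\n".join([
    "##fileformat=VCFv4.2",
    "\t".join(["#CHROM", "POS", "ID", "REF", "ALT", "QUAL", "FILTER",
               "INFO", "FORMAT", "SAMPLE"]),
    "\t".join(["1", "100", ".", "G", "A", "50", "PASS", "GENE=GENE1", "GT", "0/1"]),
    "\t".join(["2", "200", ".", "C", "T", "50", "PASS", "GENE=GENE2", "GT", "0/1"]),
])
# Wild-type calls assumed to have been read from the BAM file already.
WILD_SITES = {"GENE2": {("2", 250): ("A", "A")}}

print(convert(integrate(WILD_SITES, parse_vcf_sites(VCF_TEXT)), LIBRARY))
```

In this sketch a gene with no matching library entry is simply left out of the result; a practical converter would more likely report such genes explicitly.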
CN201810471186.7A 2018-05-16 2018-05-16 Genotype conversion method, genotype conversion device and electronic equipment Active CN108710782B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201810471186.7A CN108710782B (en) 2018-05-16 2018-05-16 Genotype conversion method, genotype conversion device and electronic equipment

Publications (2)

Publication Number Publication Date
CN108710782A true CN108710782A (en) 2018-10-26
CN108710782B CN108710782B (en) 2021-03-16

Family

ID=63868197

Legal Events

Code Title
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant