CN108710782A - Genotype conversion method, device and electronic equipment
- Publication number
- CN108710782A CN201810471186.7A
- Authority
- CN
- China
- Prior art keywords
- genotype
- gene
- site information
- information
- library
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Granted
Classifications
- G—PHYSICS
- G16—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
- G16B—BIOINFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR GENETIC OR PROTEIN-RELATED DATA PROCESSING IN COMPUTATIONAL MOLECULAR BIOLOGY
- G16B20/00—ICT specially adapted for functional genomics or proteomics, e.g. genotype-phenotype associations
Abstract
The present invention provides a genotype conversion method, device and electronic equipment. The method includes: establishing a genotype library according to the pharmacogenetics and pharmacogenomics database pharmGKB; obtaining the VCF file and the BAM file corresponding to the gene to be tested, both of which carry gene site information; extracting wild-type gene site information from the BAM file and mutant gene site information from the VCF file; and, based on the genotype library and according to the wild-type gene site information and the mutant gene site information, converting the gene site information into a genotype with a genotype converter. Using the wild-type and mutant gene sites together with the genotype library established in advance from pharmGKB, the genotype is located accurately. The method is fast and accurate, supports batch processing, effectively improves the efficiency of genotype classification and identification, and greatly reduces cost.
Description
Technical field
The present invention relates to the technical field of pharmacogenomics, and in particular to a genotype conversion method, device and electronic equipment.
Background art
Pharmacogenomics (PGx) is the discipline that studies how genetic variation leads to different drug responses in different diseases. It mainly investigates how genomic or genetic variation affects the absorption, metabolism, efficacy and adverse reactions of drugs in the human body, and the mechanisms involved. In July 2016, the Clinical Pharmacogenetics Implementation Consortium (CPIC) published an article in Genetics in Medicine that divides pharmacogenomics-related genes into three categories: drug-metabolizing enzymes (the CYP family, UGT1A1, DPYD and TPMT), drug transporters (such as SLCO1B1), and high-risk genotypes (such as HLA-B).
Pharmacogenomics has become an important tool for guiding individualized clinical medication, assessing the risk of severe adverse drug reactions, and guiding and evaluating new drug development; some marketed new drugs are restricted to eligible patients with specific genotypes. The US FDA has approved the addition of pharmacogenomic information to the labels of more than 140 drugs, involving 42 pharmacogenomic biomarkers. In addition, some industry guidelines also include the detection of certain biomarkers not approved by the FDA, and their characteristics (such as MGMT gene methylation), in disease treatment guidelines. Molecular detection of drug-response-related genes and their expression products is the prerequisite for individualized drug therapy.
A patient undergoes genetic testing by first-generation or second-generation sequencing, and conventional bioinformatic analysis yields the patient's gene site information, for example rs17886522 A/A. However, drug metabolism and transport do not necessarily depend on a single gene site and are more often associated with multiple gene sites. The patient's point-mutation information therefore has to be identified and converted into a genotype in pharmGKB (the Pharmacogenetics and Pharmacogenomics Knowledgebase) before the patient's drug metabolism status can be understood and precision medication achieved. The prior art mainly classifies and identifies the patient's detected gene sites manually to determine the genotype and then interprets the data, which is time-consuming, costly in labor, and unstable in accuracy.
Summary of the invention
In view of this, the purpose of the present invention is to provide a genotype conversion method, device and electronic equipment that locate the genotype accurately, are fast and accurate, support batch processing, effectively improve the efficiency of genotype classification and identification, and greatly reduce cost.
In a first aspect, an embodiment of the present invention provides a genotype conversion method, including:
establishing a genotype library according to the pharmacogenetics and pharmacogenomics database pharmGKB;
obtaining the VCF file and the BAM file corresponding to the gene to be tested, where both the VCF file and the BAM file carry gene site information;
extracting wild-type gene site information from the BAM file and mutant gene site information from the VCF file;
converting the gene site information into a genotype with a genotype converter, based on the genotype library and according to the wild-type gene site information and the mutant gene site information.
With reference to the first aspect, an embodiment of the present invention provides a first possible implementation of the first aspect, in which, before the mutant gene site information is extracted from the VCF file, the method further includes: standardizing the data format in the VCF file.
With reference to the first possible implementation of the first aspect, an embodiment of the present invention provides a second possible implementation of the first aspect, in which, after the data format in the VCF file is standardized, the method further includes: performing gene annotation on the data in the standardized VCF file, and performing left-right correction on the positive or negative strand on which the gene carried in the VCF file is located.
With reference to the first aspect, an embodiment of the present invention provides a third possible implementation of the first aspect, in which the genotype library includes a single-point genotype database and a multi-point genotype database; the single-point genotype database stores single-point genotypes determined by a single gene site, and the multi-point genotype database stores multi-point genotypes determined by multiple gene sites.
With reference to the first aspect, an embodiment of the present invention provides a fourth possible implementation of the first aspect, in which the parameters corresponding to a genotype in the genotype library include the gene name, the genotype, and the site information;
the process by which the genotype converter converts the gene site information into a genotype includes:
integrating the wild-type gene site information and the mutant gene site information to obtain integrated gene site information;
matching the integrated gene site information against the genotypes in the genotype library by gene name and, if the gene name of the integrated gene site information matches the gene name of a genotype in the genotype library, matching the genotype and the site information;
converting the integrated gene site information into the corresponding genotype according to the result of matching the genotype and the site information.
With reference to the fourth possible implementation of the first aspect, an embodiment of the present invention provides a fifth possible implementation of the first aspect, in which, before the integrated gene site information is matched against the gene names in the genotype library, the method further includes:
performing haplotype analysis on the integrated gene site information when a mutant gene site in the integrated gene site information carries CNV information;
when the variation frequency of the mutant gene site is greater than a preset threshold, determining that the integrated gene site information is a haplotype, and executing the step of matching the integrated gene site information against the genotypes in the genotype library by gene name.
In a second aspect, an embodiment of the present invention also provides a genotype conversion device, including:
a genotype library establishing module, configured to establish a genotype library according to the pharmacogenetics and pharmacogenomics database pharmGKB;
a file acquisition module, configured to obtain the VCF file and the BAM file corresponding to the gene to be tested, where both the VCF file and the BAM file carry gene site information;
a site extraction module, configured to extract wild-type gene site information from the BAM file and mutant gene site information from the VCF file;
a genotype conversion module, configured to convert the gene site information into a genotype with a genotype converter, based on the genotype library and according to the wild-type gene site information and the mutant gene site information.
With reference to the second aspect, an embodiment of the present invention provides a first possible implementation of the second aspect, in which the parameters corresponding to a genotype in the genotype library include the gene name, the genotype, and the site information;
the process by which the genotype converter converts the gene site information into a genotype includes:
integrating the wild-type gene site information and the mutant gene site information to obtain integrated gene site information;
matching the integrated gene site information against the genotypes in the genotype library by gene name and, if the gene name of the integrated gene site information matches the gene name of a genotype in the genotype library, matching the genotype and the site information;
converting the integrated gene site information into the corresponding genotype according to the result of matching the genotype and the site information.
In a third aspect, an embodiment of the present invention also provides electronic equipment, including a memory and a processor. The memory stores a computer program that can run on the processor, and the processor implements the method of the first aspect or any of its possible implementations when executing the computer program.
In a fourth aspect, an embodiment of the present invention also provides a computer-readable medium carrying non-volatile program code executable by a processor, where the program code causes the processor to execute the method of the first aspect or any of its possible implementations.
The embodiments of the present invention bring the following beneficial effects:
The genotype conversion method provided by the embodiments of the present invention extracts wild-type gene site information from the BAM file and mutant gene site information from the VCF file and, using both together with the genotype library established in advance from pharmGKB, locates the genotype accurately. The method is fast and accurate, supports batch processing, effectively improves the efficiency of genotype classification and identification, and greatly reduces cost.
Other features and advantages of the present invention will be set forth in the following description and will in part become apparent from the description or be understood by practicing the invention. The objectives and other advantages of the present invention are realized and obtained by the structures specifically pointed out in the description, the claims and the accompanying drawings.
To make the above objectives, features and advantages of the present invention clearer and easier to understand, preferred embodiments are described in detail below with reference to the accompanying drawings.
Description of the drawings
To illustrate the specific embodiments of the present invention or the technical solutions in the prior art more clearly, the drawings needed for describing the specific embodiments or the prior art are briefly introduced below. The drawings described below show some embodiments of the present invention; those of ordinary skill in the art can obtain other drawings from them without creative effort.
Fig. 1 is a flow diagram of a genotype conversion method provided by an embodiment of the present invention;
Fig. 2 is another flow diagram of the genotype conversion method provided by an embodiment of the present invention;
Fig. 3 is a structural schematic diagram of the genotype conversion device provided by an embodiment of the present invention;
Fig. 4 is a structural schematic diagram of the electronic equipment provided by an embodiment of the present invention.
Detailed description of the embodiments
To make the objectives, technical solutions and advantages of the embodiments of the present invention clearer, the technical solutions of the present invention are described clearly and completely below with reference to the drawings. The described embodiments are some, not all, of the embodiments of the present invention. All other embodiments obtained by those of ordinary skill in the art from the embodiments of the present invention without creative effort fall within the protection scope of the present invention.
At present, a patient's genotype is identified mainly by manually classifying and identifying the patient's detected gene sites to determine the genotype and then interpreting the data, which is time-consuming, costly in labor, and unstable in accuracy. On this basis, the embodiments of the present invention provide a genotype conversion method, device and electronic equipment.
As shown in Fig. 1, the genotype conversion method provided by the present invention extracts wild-type gene site information from the BAM file and mutant gene site information from the VCF file and, using both together with the genotype library established in advance from pharmGKB, locates the genotype accurately. The method is fast and accurate, supports batch processing, effectively improves the efficiency of genotype classification and identification, and greatly reduces cost.
Specifically, Fig. 2 shows a flow diagram of another genotype conversion method provided by an embodiment of the present invention. As shown in Fig. 2, the genotype conversion method includes:
Step S101: establish a genotype library according to the pharmacogenetics and pharmacogenomics database pharmGKB.
Specifically, pharmacogenomics-related genotype data are downloaded locally from the pharmacogenetics and pharmacogenomics database, and the genotype library is established according to certain rules. In a possible embodiment, the genotype library includes a single-point genotype database and a multi-point genotype database; the single-point genotype database stores single-point genotypes determined by a single gene site, and the multi-point genotype database stores multi-point genotypes determined by multiple gene sites. In one embodiment, the single-point genotype database is the data text translator_single-point.v1.0.txt, and the multi-point genotype database is the data text translator_diplotype.v2.0.txt.
In a possible embodiment, the information stored for each genotype in the genotype database (that is, the parameters corresponding to the genotype) comprises three fields: the gene name (Gene_Name), the genotype (Target_Nomenclature) and the site information (Variant_Allele). For example: UGT1A1;*1/*28;Chr=2, Start=234668892, End=234668894, Genetype=ATA/A, rsID=rs8175347, Type=mutant.
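The three-field record above can be read into a small structure. The following is a minimal Python sketch, assuming that the fields are separated by semicolons and that the site information is a comma-separated list of key=value pairs, as in the example record; the actual layout of the library files is not specified here, and the names LibraryRecord and parse_library_record are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class LibraryRecord:
    gene_name: str            # Gene_Name field
    target_nomenclature: str  # Target_Nomenclature field (the genotype)
    variant_allele: dict      # Variant_Allele field as key/value pairs


def parse_library_record(line: str) -> LibraryRecord:
    """Parse 'Gene;Genotype;key=value, key=value, ...' (assumed layout)."""
    gene, genotype, site = (part.strip() for part in line.split(";", 2))
    fields = {}
    for item in site.split(","):
        key, _, value = item.strip().partition("=")
        fields[key] = value
    return LibraryRecord(gene, genotype, fields)


record = parse_library_record(
    "UGT1A1;*1/*28;Chr=2, Start=234668892, End=234668894, "
    "Genetype=ATA/A, rsID=rs8175347, Type=mutant"
)
print(record.gene_name, record.target_nomenclature, record.variant_allele["rsID"])
```

Keeping the site information as a dictionary leaves the key names exactly as they appear in the record, so later matching steps can compare them without further conversion.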
Step S102: obtain the VCF file and the BAM file corresponding to the gene to be tested, where both the VCF file and the BAM file carry gene site information.
In a possible embodiment, the patient's genes are tested with gene sequencing equipment, and the VCF (Variant Call Format) file and the BAM file are obtained after the test data are processed. VCF is a text format for storing gene sequence variation information and represents single-nucleotide variants, insertions/deletions and the like. A BAM file is the binary form of a SAM (Sequence Alignment/Map format) file and is mainly used to represent the result of mapping sequencing reads to the genome.
Further, the data format in the VCF file is standardized; for example, the genotype ACGT->ACT is reduced to its simplest form, CG->C. Gene annotation is then performed on the gene sites in the standardized VCF file, and the positive or negative strand on which the gene carried in the VCF file is located is corrected; a mutation site located in an overlapping region is split into two annotations according to the positive and negative strands, and both are corrected. In a possible embodiment, SnpEff is used for gene annotation.
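The reduction of ACGT->ACT to CG->C can be illustrated by trimming the bases that the reference and alternate alleles share. The following is a minimal Python sketch of that trimming only, assuming a 1-based start position; it does not perform reference-based left alignment, strand correction or annotation, and the function name trim_alleles is hypothetical.

```python
from typing import Tuple


def trim_alleles(pos: int, ref: str, alt: str) -> Tuple[int, str, str]:
    """Trim shared bases from a REF/ALT pair, keeping at least one base each."""
    # Drop identical trailing bases.
    while len(ref) > 1 and len(alt) > 1 and ref[-1] == alt[-1]:
        ref, alt = ref[:-1], alt[:-1]
    # Drop identical leading bases; each one shifts the start position right.
    while len(ref) > 1 and len(alt) > 1 and ref[0] == alt[0]:
        ref, alt = ref[1:], alt[1:]
        pos += 1
    return pos, ref, alt


print(trim_alleles(100, "ACGT", "ACT"))  # (101, 'CG', 'C')
```

Trimming the suffix before the prefix keeps the one shared anchor base on the left of a deletion, which is how the example arrives at CG->C rather than G->(empty).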
Step S103: extract wild-type gene site information from the BAM file and mutant gene site information from the VCF file.
In a possible embodiment, the bioinformatics analysis tool Samtools is used to extract the wild-type gene site information from the BAM file, and the mutant gene site information is extracted from the VCF file.
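The VCF half of this step can be sketched as follows. This is a minimal Python illustration, assuming a single-sample, tab-separated VCF whose FORMAT column contains a GT entry; the records shown are toy data, the function name mutant_sites_from_vcf is hypothetical, and the extraction of wild-type sites from the BAM file with Samtools is not shown.

```python
def mutant_sites_from_vcf(lines):
    """Yield one dict per record whose genotype carries a non-reference allele."""
    for line in lines:
        if line.startswith("#") or not line.strip():
            continue  # skip header and blank lines
        cols = line.rstrip("\n").split("\t")
        chrom, pos, rsid, ref, alt = cols[:5]
        alleles = [ref] + alt.split(",")
        # FORMAT is the 9th column; the first sample is the 10th.
        fmt = cols[8].split(":")
        sample = cols[9].split(":")
        gt = sample[fmt.index("GT")].replace("|", "/")
        indexes = [int(i) for i in gt.split("/") if i != "."]
        if not any(indexes):
            continue  # homozygous reference or missing call
        yield {
            "chrom": chrom,
            "pos": int(pos),
            "rsid": rsid,
            "genotype": "/".join(alleles[i] for i in indexes),
        }


# Toy records for illustration only.
toy_vcf = [
    "##fileformat=VCFv4.2",
    "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\tFORMAT\tSAMPLE",
    "1\t1000\trs_toy_1\tG\tA\t50\tPASS\t.\tGT:DP\t0/1:40",
    "1\t2000\trs_toy_2\tC\tT\t50\tPASS\t.\tGT:DP\t0/0:35",
]
sites = list(mutant_sites_from_vcf(toy_vcf))
print(sites)
```

Records with a 0/0 call are skipped here because, in the method described, wild-type sites come from the BAM file rather than from the VCF file.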
Step S104: convert the gene site information into a genotype with a genotype converter, based on the genotype library and according to the wild-type gene site information and the mutant gene site information.
The genotype converter may take the form of a script file; in this embodiment, the genotype converter is the converter script file PGx_Translator_v2.0.pl.
In a possible embodiment, the process by which the genotype converter converts the gene site information into a genotype in step S104 includes the following steps:
(a) The wild-type gene sites and the mutant gene sites are integrated to obtain integrated gene site information.
(b) When a mutant gene site in the integrated gene site information carries CNV information, haplotype analysis is performed on the integrated gene site information.
(c) When the variation frequency of the mutant gene site is greater than a preset threshold, the integrated gene site information is determined to be a haplotype, and step (d) is executed.
During conversion, if a mutant gene site is found to carry CNV (copy number variation) information, haplotype analysis has to be performed on the integrated gene site information corresponding to that mutant gene site. In a possible embodiment, if the variation frequency corresponding to the mutant gene site is greater than 50%, the CNV information is determined to be on the same chromosome. If no mutant gene site is found to carry CNV information, step (d) is executed directly.
(d) The integrated gene site information is matched against the genotypes in the genotype library by gene name; if the gene name of the integrated gene site information matches the gene name of a genotype in the genotype library, the genotype and the site information are matched.
That is, if the gene name of the integrated gene site information is identical to the parameter Gene_Name of a genotype in the genotype library, the genotype and site information of the integrated gene site information are matched against the Target_Nomenclature and Variant_Allele corresponding to that Gene_Name.
(e) The integrated gene site information is converted into the corresponding genotype according to the result of matching the genotype and the site information.
In a possible embodiment, if the genotype and site information of the integrated gene site information are identical to the Target_Nomenclature and Variant_Allele corresponding to the parameter Gene_Name, the gene site information in the integrated gene site information is determined to match the parameter Target_Nomenclature, and the genotype conversion result includes the matched genotype. In a possible embodiment, the matched genotype may be labeled exactly_matched and output in the same notation as the genotype in pharmGKB, such as CYP2C19*1. If the genotype and site information of the integrated gene site information differ from the Target_Nomenclature and Variant_Allele corresponding to the parameter Gene_Name, information on the sites that need supplementary detection is output, indicating that no Target_Nomenclature of that Gene_Name was matched, and that genotype does not appear in the final genotype conversion result.
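Steps (a) to (e) can be summarized in a short Python sketch. It is only an illustration under stated assumptions: site information is modeled as a mapping from a site identifier to a called genotype, a mutant call is assumed to override a wild-type call for the same site, and the library entries, the gene name GENE_A, the site identifiers, the function names and the label need_supplementary_sites are all hypothetical; only the field names Gene_Name, Target_Nomenclature and Variant_Allele and the label exactly_matched come from the description.

```python
def integrate(wild_type: dict, mutant: dict) -> dict:
    """Step (a): merge both site tables; a mutant call overrides a wild-type call."""
    merged = dict(wild_type)
    merged.update(mutant)
    return merged


def is_haplotype(variant_frequency: float, threshold: float = 0.5) -> bool:
    """Steps (b)-(c): a CNV-carrying site passes only above the preset threshold."""
    return variant_frequency > threshold


def convert(gene: str, sites: dict, library: list):
    """Steps (d)-(e): match by gene name first, then by site information."""
    for entry in library:
        if entry["Gene_Name"] != gene:
            continue
        required = entry["Variant_Allele"]
        if all(sites.get(site) == call for site, call in required.items()):
            return entry["Target_Nomenclature"], "exactly_matched"
    return None, "need_supplementary_sites"


# Toy library and calls for illustration only.
library = [{
    "Gene_Name": "GENE_A",
    "Target_Nomenclature": "*1/*2",
    "Variant_Allele": {"rs_toy_1": "G/A"},
}]
sites = integrate({"rs_toy_1": "G/G", "rs_toy_2": "C/C"}, {"rs_toy_1": "G/A"})
print(convert("GENE_A", sites, library))
```

The gene-name check comes first so that site identifiers are only compared within the same gene, mirroring the order of steps (d) and (e).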
Specifically, the integrated gene site information may include one or more detected sites. During matching, single-point gene matching may first be performed on the gene site information according to the single-point genotype database, followed by multi-point gene matching according to the multi-point genotype database; alternatively, multi-point gene matching may be performed first according to the multi-point genotype database, followed by single-point gene matching according to the single-point genotype database. The matching order is not limited here.
In a possible embodiment, multi-point gene matching is first performed on the gene site information according to the multi-point genotype database, and single-point gene matching is then performed according to the single-point genotype database. Steps (d) and (e) may therefore specifically include the following steps (1) to (6):
(1) The integrated gene site information is matched against the genotypes in the multi-point genotype database by gene name; if the gene name of the integrated gene site information matches the gene name of a genotype in the multi-point genotype database, the genotype and the site information are matched.
(2) If all the site information of the integrated gene site information exactly matches the site information of a genotype corresponding to the gene name, the result is labeled as an exact match and the integrated gene site information is converted into the matched genotype.
(3) If all the site information of the integrated gene site information exactly matches part of the site information of a genotype corresponding to the gene name, the integrated gene site information is converted into the matched genotype, and the site information that needs supplementary detection is output at the same time.
In this case, the integrated gene site information is converted into the genotype whose site information is partly matched, and the user is prompted with the site information that needs supplementary detection.
(4) If the site information of the integrated gene site information cannot be matched, fully or partly, to the site information of any genotype corresponding to the gene name, the integrated gene site information is matched against the genotypes in the single-point genotype database by gene name.
(5) If the gene name of the integrated gene site information matches the gene name of a genotype in the single-point genotype database, the site information is matched.
(6) If the integrated site information exactly matches the site information of a genotype in the single-point genotype database, the integrated gene site information is converted into the matched genotype.
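Steps (1) to (6) can be sketched as a two-tier lookup. The following minimal Python illustration assumes the same site-to-call mapping as the library fields Gene_Name, Target_Nomenclature and Variant_Allele; the rule that the first partial match wins, the labels partially_matched and unmatched, the function names and all data are hypothetical and are not taken from the converter script.

```python
def match_level(sites: dict, required: dict):
    """Classify one library genotype as 'full', 'partial' or 'none'."""
    detected = {site: call for site, call in required.items() if site in sites}
    if not detected or any(sites[s] != call for s, call in detected.items()):
        return "none", []  # no overlap, or a detected site disagrees
    missing = sorted(set(required) - set(detected))
    return ("partial", missing) if missing else ("full", [])


def translate(gene: str, sites: dict, multi_lib: list, single_lib: list):
    """Try the multi-point library first, then fall back to the single-point one."""
    partial = None
    for entry in multi_lib:                      # steps (1)-(3)
        if entry["Gene_Name"] != gene:
            continue
        level, missing = match_level(sites, entry["Variant_Allele"])
        if level == "full":
            return entry["Target_Nomenclature"], "exactly_matched", []
        if level == "partial" and partial is None:
            partial = (entry["Target_Nomenclature"], "partially_matched", missing)
    if partial:
        return partial
    for entry in single_lib:                     # steps (4)-(6)
        if entry["Gene_Name"] != gene:
            continue
        if match_level(sites, entry["Variant_Allele"])[0] == "full":
            return entry["Target_Nomenclature"], "exactly_matched", []
    return None, "unmatched", []


# Toy libraries for illustration only.
multi_lib = [{"Gene_Name": "GENE_A", "Target_Nomenclature": "*2/*3",
              "Variant_Allele": {"rs_a": "G/A", "rs_b": "C/T"}}]
single_lib = [{"Gene_Name": "GENE_A", "Target_Nomenclature": "*1/*2",
               "Variant_Allele": {"rs_a": "G/A"}}]
print(translate("GENE_A", {"rs_a": "G/A"}, multi_lib, single_lib))
```

In the partial case the third element of the result lists the sites that still need supplementary detection, which corresponds to the prompt described in step (3).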
Further, to provide an effective basis for medication guidance, in a possible embodiment the output of the genotype converter also includes the metabolic type corresponding to the genotype.
In a possible embodiment, the output of the genotype converter includes the following parameters: gene name Gene_Name, genotype Genotype, detected sites Variants_detected, sites still to be detected Variants_need_detected, metabolic type Phenotype, and zygosity Zygosity. For example, an output result is: CYP2C19*1/*2 NA|NA exactly_matched 4 Heterozygote.
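A result row with the six parameters named above could be assembled as in the following minimal Python sketch. It assumes tab-separated output with NA for absent values and derives the zygosity by comparing the two alleles of a diplotype; the separator, the label Homozygote, the function names and the sample values are assumptions for illustration, and the sketch does not reproduce the exact column layout of the example output.

```python
FIELDS = [
    "Gene_Name", "Genotype", "Variants_detected",
    "Variants_need_detected", "Phenotype", "Zygosity",
]


def zygosity(diplotype: str) -> str:
    """Compare the two alleles of a diplotype such as '*1/*2'."""
    first, second = diplotype.split("/")
    return "Homozygote" if first == second else "Heterozygote"


def format_result(row: dict) -> str:
    """Join the six output fields with tabs, writing NA for absent values."""
    return "\t".join(str(row.get(name, "NA")) for name in FIELDS)


# Toy values for illustration only.
row = {"Gene_Name": "GENE_A", "Genotype": "*1/*2", "Variants_detected": "rs_toy_1"}
row["Zygosity"] = zygosity(row["Genotype"])
print(format_result(row))
```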
The genotype conversion method provided by the embodiments of the present invention extracts wild-type gene site information from the BAM file and mutant gene site information from the VCF file and, using both together with the genotype library established in advance from pharmGKB, locates the genotype accurately. The method is fast and accurate, supports batch processing, effectively improves the efficiency of genotype classification and identification, and greatly reduces cost. The genotype converter also outputs the metabolic type corresponding to the genotype, which provides effective evidence for medication guidance.
Corresponding to the genotype conversion method above, this embodiment provides a genotype conversion device, which includes:
a genotype library establishing module 11, configured to establish a genotype library according to the pharmacogenetics and pharmacogenomics database pharmGKB;
a file acquisition module 12, configured to obtain the VCF file and the BAM file corresponding to the gene to be tested, where both the VCF file and the BAM file carry gene site information;
a site extraction module 13, configured to extract wild-type gene site information from the BAM file and mutant gene site information from the VCF file;
a genotype conversion module 14, configured to convert the gene site information into a genotype with a genotype converter, based on the genotype library and according to the wild-type gene site information and the mutant gene site information.
Further, the parameters corresponding to a genotype in the genotype library include the gene name, the genotype, and the site information. The process by which the genotype converter converts the gene site information into a genotype includes:
integrating the wild-type gene site information and the mutant gene site information to obtain integrated gene site information;
matching the integrated gene site information against the genotypes in the genotype library by gene name and, if the gene name of the integrated gene site information matches the gene name of a genotype in the genotype library, matching the genotype and the site information;
converting the integrated gene site information into the corresponding genotype according to the result of matching the genotype and the site information.
The genotype conversion method provided by the embodiments of the present invention extracts wild-type gene site information from the BAM file and mutant gene site information from the VCF file and, using both together with the genotype library established in advance from pharmGKB, locates the genotype accurately. The method is fast and accurate, supports batch processing, effectively improves the efficiency of genotype classification and identification, and greatly reduces cost.
Referring to Fig. 4, an embodiment of the present invention also provides electronic equipment 100, including a processor 40, a memory 41, a bus 42 and a communication interface 43. The processor 40, the communication interface 43 and the memory 41 are connected by the bus 42. The processor 40 is used to execute executable modules stored in the memory 41, such as a computer program.
The memory 41 may include high-speed random access memory (RAM) and may also include non-volatile memory, for example at least one magnetic disk storage. The communication connection between the network element of this system and at least one other network element is realized through at least one communication interface 43 (which may be wired or wireless), and may use the Internet, a wide area network, a local area network, a metropolitan area network and the like.
The bus 42 may be an ISA bus, a PCI bus, an EISA bus or the like. The bus may be divided into an address bus, a data bus, a control bus and so on. For ease of illustration, only one double-headed arrow is shown in Fig. 4, but this does not mean that there is only one bus or one type of bus.
The memory 41 is used to store a program, and the processor 40 executes the program after receiving an execution instruction. The method performed by the device defined by the process disclosed in any of the foregoing embodiments of the present invention may be applied to the processor 40 or implemented by the processor 40.
The processor 40 may be an integrated circuit chip with signal processing capability. During implementation, each step of the above method may be completed by an integrated logic circuit of hardware in the processor 40 or by instructions in the form of software. The processor 40 may be a general-purpose processor, including a central processing unit (CPU), a network processor (NP) and the like; it may also be a digital signal processor (DSP), an application-specific integrated circuit (ASIC), a field-programmable gate array (FPGA) or another programmable logic device, a discrete gate or transistor logic device, or a discrete hardware component. It can implement or execute the methods, steps and logic block diagrams disclosed in the embodiments of the present invention. A general-purpose processor may be a microprocessor, or the processor may be any conventional processor. The steps of the method disclosed in the embodiments of the present invention may be executed and completed directly by a hardware decoding processor, or by a combination of hardware and software modules in a decoding processor. The software module may be located in a storage medium mature in the field, such as random access memory, flash memory, read-only memory, programmable read-only memory, electrically erasable programmable memory, or a register. The storage medium is located in the memory 41; the processor 40 reads the information in the memory 41 and completes the steps of the above method in combination with its hardware.
The genotype conversion device and the electronic equipment provided by the embodiments of the present invention have the same technical features as the genotype conversion method provided by the above embodiments, so they can solve the same technical problem and achieve the same technical effect.
The computer program product for performing the genotype conversion method provided by the embodiments of the present invention includes a computer-readable storage medium storing non-volatile program code executable by a processor. The instructions included in the program code can be used to execute the method described in the foregoing method embodiments; for the specific implementation, refer to the method embodiments, which are not repeated here.
Those skilled in the art will clearly understand that, for convenience and brevity of description, the specific working processes of the device and the electronic equipment described above may refer to the corresponding processes in the foregoing method embodiments, and are not repeated here.
The flowcharts and block diagrams in the drawings show the possible architecture, functions and operations of methods and computer program products according to multiple embodiments of the present invention. In this regard, each block in a flowchart or block diagram may represent a module, a program segment or a part of code, which contains one or more executable instructions for implementing the specified logical function. It should also be noted that, in some alternative implementations, the functions marked in the blocks may occur in an order different from that marked in the drawings. For example, two consecutive blocks may in fact be executed substantially in parallel, and they may sometimes be executed in the reverse order, depending on the functions involved. It should also be noted that each block in the block diagrams and/or flowcharts, and combinations of blocks in the block diagrams and/or flowcharts, can be implemented by a dedicated hardware-based system that performs the specified functions or actions, or by a combination of dedicated hardware and computer instructions.
The terms "first", "second" and "third" are used for descriptive purposes only and should not be understood as indicating or implying relative importance. In addition, unless specifically stated otherwise, the relative steps of the components and steps, the numerical expressions and the numerical values set forth in these embodiments do not limit the scope of the present invention.
In the several embodiments provided in this application, it should be understood that the disclosed systems, devices and methods may be implemented in other ways. The device embodiments described above are merely illustrative. For example, the division of the units is only a division by logical function, and other divisions are possible in actual implementation; for another example, multiple units or components may be combined or integrated into another system, or some features may be ignored or not executed. Furthermore, the mutual coupling, direct coupling or communication connection shown or discussed may be indirect coupling or communication connection through some communication interfaces, devices or units, and may be electrical, mechanical or in other forms.
The units described as separate components may or may not be physically separate, and the components displayed as units may or may not be physical units; that is, they may be located in one place or distributed over multiple network units. Some or all of the units may be selected according to actual needs to achieve the purpose of the solution of this embodiment.
In addition, the functional units in the embodiments of the present invention may be integrated into one processing unit, each unit may exist alone physically, or two or more units may be integrated into one unit.
If the functions are implemented in the form of software functional units and sold or used as an independent product, they may be stored in a non-volatile computer-readable storage medium executable by a processor. Based on this understanding, the technical solution of the present invention in essence, or the part that contributes to the prior art, or a part of the technical solution, may be embodied in the form of a software product. The computer software product is stored in a storage medium and includes several instructions that cause a computer device (which may be a personal computer, a server, a network device or the like) to execute all or part of the steps of the methods described in the embodiments of the present invention. The aforementioned storage medium includes various media that can store program code, such as a USB flash drive, a removable hard disk, a read-only memory (ROM), a random access memory (RAM), a magnetic disk or an optical disc.
Finally, it should be noted that the embodiments described above are only specific implementations of the present invention, intended to illustrate its technical solutions rather than to limit them, and the scope of protection of the present invention is not limited to them. Although the present invention has been described in detail with reference to the foregoing embodiments, those of ordinary skill in the art should understand that anyone skilled in the art can, within the technical scope disclosed by the present invention, still modify the technical solutions recorded in the foregoing embodiments, readily conceive of variations, or make equivalent replacements of some of the technical features. Such modifications, variations or replacements do not cause the essence of the corresponding technical solutions to depart from the spirit and scope of the technical solutions of the embodiments of the present invention, and should all be covered by the scope of protection of the present invention. Therefore, the scope of protection of the present invention shall be subject to the scope of protection of the claims.
Claims (10)
1. A genotype conversion method, characterized by comprising:
establishing a genotype library according to the pharmacogenetics and pharmacogenomics database pharmGKB;
obtaining a VCF file and a BAM file corresponding to a gene to be detected, the VCF file and the BAM file carrying gene site information;
extracting wild-type gene site information from the BAM file, and extracting mutant gene site information from the VCF file;
converting, according to the wild-type gene site information and the mutant gene site information and based on the genotype library, the gene site information into a genotype using a genotype converter.
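As a rough illustration of the VCF half of the extraction step, the following Python sketch collects variant sites from the text lines of a single-sample VCF. The function name `extract_mutant_sites` and the returned dictionary keys are hypothetical and not taken from the patent. It relies only on the standard tab-separated VCF column order (CHROM, POS, ID, REF, ALT, QUAL, FILTER, INFO, FORMAT, sample). Reading wild-type sites from the BAM file would need a BAM reader and is not shown.

```python
from typing import Dict, Iterable, List


def extract_mutant_sites(vcf_lines: Iterable[str]) -> List[Dict[str, object]]:
    """Collect variant sites from the data lines of a single-sample VCF."""
    sites = []
    for line in vcf_lines:
        line = line.rstrip("\n")
        if not line or line.startswith("#"):
            # Skip meta-information lines (##...) and the header line (#CHROM...).
            continue
        fields = line.split("\t")
        chrom, pos, _id, ref, alt = fields[:5]
        sample_gt = None
        if len(fields) >= 10:
            # Column 9 (FORMAT) names the per-sample keys; GT is the genotype call.
            keys = fields[8].split(":")
            values = fields[9].split(":")
            sample_gt = dict(zip(keys, values)).get("GT")
        sites.append({
            "chrom": chrom,
            "pos": int(pos),
            "ref": ref,
            "alt": alt.split(","),  # ALT may list several alleles
            "gt": sample_gt,
        })
    return sites


# Made-up example input, for illustration only.
example_vcf = [
    "##fileformat=VCFv4.2",
    "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\tFORMAT\tSAMPLE",
    "chr1\t100\t.\tA\tG\t50\tPASS\t.\tGT\t0/1",
]
mutant_sites = extract_mutant_sites(example_vcf)
```

The example record is invented; real input would come from the VCF file of the gene to be detected.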
2. The method according to claim 1, characterized in that, before extracting the mutant gene site information from the VCF file, the method further comprises: standardizing the data format in the VCF file.
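The claim does not say which standardization rules are applied. As one possible sketch, the Python below assumes three common conventions: a `chr` prefix on chromosome names, upper-case alleles, and one record per alternate allele. The function name `standardize_record` and these rules are assumptions for illustration, not the patent's stated procedure.

```python
from typing import List, Tuple

Record = Tuple[str, int, str, str]  # (chrom, pos, ref, alt)


def standardize_record(chrom: str, pos: int, ref: str, alt_field: str) -> List[Record]:
    """Return one upper-case, 'chr'-prefixed record per alternate allele."""
    if not chrom.startswith("chr"):
        chrom = "chr" + chrom
    # A comma-separated ALT field is split so each allele gets its own record.
    return [(chrom, pos, ref.upper(), alt.upper()) for alt in alt_field.split(",")]


standardized = standardize_record("1", 100, "a", "g,T")
```

Splitting multi-allelic records this way keeps each later lookup to a single reference and alternate allele pair.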
3. The method according to claim 2, characterized in that, after standardizing the data format in the VCF file, the method further comprises:
performing gene annotation on the data in the format-standardized VCF file, and performing left-right correction on the positive or negative strand on which the gene carried in the VCF file is located.
4. The method according to claim 1, characterized in that the genotype library comprises a single-point genotype database and a multi-point genotype database, wherein the single-point genotype database stores single-point genotypes determined by a single gene site, and the multi-point genotype database stores multi-point genotypes determined by multiple gene sites.
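The split between the two databases can be sketched as a small data structure. In the Python below, `GenotypeEntry`, `split_library` and the placeholder gene, genotype and site labels are all hypothetical; the only rule taken from the claim is that an entry defined by one site goes to the single-point database and an entry defined by several sites goes to the multi-point database.

```python
from dataclasses import dataclass
from typing import Dict, Iterable, List, Tuple


@dataclass(frozen=True)
class GenotypeEntry:
    gene: str               # gene name
    genotype: str           # genotype label
    sites: Tuple[str, ...]  # site information that defines the genotype


def split_library(
    entries: Iterable[GenotypeEntry],
) -> Tuple[Dict[str, List[GenotypeEntry]], Dict[str, List[GenotypeEntry]]]:
    """Split entries into single-point and multi-point databases keyed by gene name."""
    single: Dict[str, List[GenotypeEntry]] = {}
    multi: Dict[str, List[GenotypeEntry]] = {}
    for entry in entries:
        target = single if len(entry.sites) == 1 else multi
        target.setdefault(entry.gene, []).append(entry)
    return single, multi


# Placeholder entries, not real PharmGKB data.
entries = [
    GenotypeEntry("GENE_A", "type_1", ("site_1:A>G",)),
    GenotypeEntry("GENE_A", "type_2", ("site_1:A>G", "site_2:C>T")),
]
single_db, multi_db = split_library(entries)
```

Keying both databases by gene name keeps the later gene name matching step to a dictionary lookup.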
5. The method according to claim 1, characterized in that the parameters corresponding to a genotype in the genotype library include gene name, genotype and site information;
the process by which the genotype converter converts the gene site information into a genotype comprises:
integrating the wild-type gene site information and the mutant gene site information to obtain integrated gene site information;
performing gene name matching between the integrated gene site information and the genotypes in the genotype library, and, if the gene name of the integrated gene site information matches the gene name of a genotype in the genotype library, performing genotype and site information matching;
converting the integrated gene site information into the corresponding genotype according to the result of the genotype and site information matching.
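The three steps of the converter (integrate, match by gene name, then match by site information) might look like the following Python sketch. The library layout, the function names `integrate` and `convert`, and the use of exact set equality as the site matching rule are assumptions made for illustration; the claim itself does not fix a matching criterion.

```python
from typing import Dict, FrozenSet, Iterable, List, Optional, Tuple

# Hypothetical library layout: gene name -> list of (genotype label, defining sites).
Library = Dict[str, List[Tuple[str, FrozenSet[str]]]]


def integrate(
    wild_type_sites: Dict[str, Iterable[str]],
    mutant_sites: Dict[str, Iterable[str]],
) -> Dict[str, FrozenSet[str]]:
    """Merge wild-type and mutant site information per gene."""
    merged: Dict[str, set] = {}
    for source in (wild_type_sites, mutant_sites):
        for gene, sites in source.items():
            merged.setdefault(gene, set()).update(sites)
    return {gene: frozenset(sites) for gene, sites in merged.items()}


def convert(
    integrated: Dict[str, FrozenSet[str]], library: Library
) -> Dict[str, Optional[str]]:
    """Match by gene name first; compare site information only on a name match."""
    result: Dict[str, Optional[str]] = {}
    for gene, observed in integrated.items():
        candidates = library.get(gene)
        if candidates is None:
            result[gene] = None  # gene name not in the library: no site comparison
            continue
        # Assumed rule: a genotype matches when its defining sites equal the observed set.
        result[gene] = next(
            (label for label, sites in candidates if sites == observed), None
        )
    return result


# Placeholder data for illustration only.
library: Library = {
    "GENE_A": [
        ("type_1", frozenset({"site_1:G", "site_2:C"})),
        ("type_2", frozenset({"site_1:G", "site_2:T"})),
    ]
}
wild = {"GENE_A": {"site_2:C"}, "GENE_B": {"site_9:A"}}
mutant = {"GENE_A": {"site_1:G"}}
genotypes = convert(integrate(wild, mutant), library)
```

In this made-up example, `GENE_A` resolves to `type_1` and `GENE_B`, which is absent from the library, resolves to `None`.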
6. The method according to claim 5, characterized in that, before performing gene name matching between the integrated gene site information and the genotypes in the genotype library, the method further comprises:
when a mutant gene site of the integrated gene site information carries CNV information, performing haplotype analysis on the integrated gene site information;
when the variation frequency of the mutant gene site is greater than a preset threshold, determining that the integrated gene site information is a haplotype, and executing the step of performing gene name matching between the integrated gene site information and the genotypes in the genotype library.
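The pre-check in this claim can be sketched as a simple predicate. In the Python below, the `MutantSite` fields, the function `is_haplotype` and the threshold value are hypothetical; the claim only states that sites carrying CNV information undergo haplotype analysis and that a variation frequency above a preset threshold marks the integrated information as a haplotype. How sites without CNV information are handled is not stated in the claim, so this sketch simply returns `False` for them.

```python
from dataclasses import dataclass


@dataclass
class MutantSite:
    gene: str
    has_cnv: bool             # whether the site carries CNV information
    variant_frequency: float  # variation frequency at the site, 0.0 to 1.0


def is_haplotype(site: MutantSite, threshold: float) -> bool:
    """Return True when a CNV-carrying site exceeds the variation frequency threshold."""
    return site.has_cnv and site.variant_frequency > threshold


# The threshold value is a placeholder; the patent only says it is preset.
HYPOTHETICAL_THRESHOLD = 0.9
site = MutantSite("GENE_A", has_cnv=True, variant_frequency=0.95)
proceed_to_name_matching = is_haplotype(site, HYPOTHETICAL_THRESHOLD)
```

When the predicate holds, the gene name matching step of claim 5 would run next.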
7. A genotype conversion device, characterized by comprising:
a genotype library establishing module, configured to establish a genotype library according to the pharmacogenetics and pharmacogenomics database pharmGKB;
a file acquisition module, configured to obtain a VCF file and a BAM file corresponding to a gene to be detected, the VCF file and the BAM file carrying gene site information;
a site extraction module, configured to extract wild-type gene site information from the BAM file and to extract mutant gene site information from the VCF file;
a genotype conversion module, configured to convert, according to the wild-type gene site information and the mutant gene site information and based on the genotype library, the gene site information into a genotype using a genotype converter.
8. The device according to claim 7, characterized in that the parameters corresponding to a genotype in the genotype library include gene name, genotype and site information;
the process by which the genotype converter converts the gene site information into a genotype comprises:
integrating the wild-type gene site information and the mutant gene site information to obtain integrated gene site information;
performing gene name matching between the integrated gene site information and the genotypes in the genotype library, and, if the gene name of the integrated gene site information matches the gene name of a genotype in the genotype library, performing genotype and site information matching;
converting the integrated gene site information into the corresponding genotype according to the result of the genotype and site information matching.
9. An electronic device, comprising a memory and a processor, the memory storing a computer program that can run on the processor, characterized in that the processor, when executing the computer program, implements the method according to any one of claims 1 to 6.
10. A computer-readable medium having non-volatile program code executable by a processor, characterized in that the program code causes the processor to execute the method according to any one of claims 1 to 6.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201810471186.7A CN108710782B (en) | 2018-05-16 | 2018-05-16 | Genotype conversion method, genotype conversion device and electronic equipment |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201810471186.7A CN108710782B (en) | 2018-05-16 | 2018-05-16 | Genotype conversion method, genotype conversion device and electronic equipment |
Publications (2)
Publication Number | Publication Date |
---|---|
CN108710782A (en) | 2018-10-26 |
CN108710782B (en) | 2021-03-16 |
Family
ID=63868197
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201810471186.7A Active CN108710782B (en) | 2018-05-16 | 2018-05-16 | Genotype conversion method, genotype conversion device and electronic equipment |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN108710782B (en) |
Cited By (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN109637581A (en) * | 2018-12-10 | 2019-04-16 | 江苏医联生物科技有限公司 | Whole process mass analysis method is sequenced in a kind of bis- generation of DNA |
CN111798926A (en) * | 2020-06-30 | 2020-10-20 | 广州金域医学检验中心有限公司 | Pathogenic gene locus database and establishment method thereof |
CN115295116A (en) * | 2022-08-04 | 2022-11-04 | 上海康黎医学检验所有限公司 | Medication comment method and system and electronic equipment |
CN116246715A (en) * | 2023-04-27 | 2023-06-09 | 倍科为(天津)生物技术有限公司 | Multi-sample gene mutation data storage method, device, equipment and medium |
Citations (11)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20070178501A1 (en) * | 2005-12-06 | 2007-08-02 | Matthew Rabinowitz | System and method for integrating and validating genotypic, phenotypic and medical information into a database according to a standardized ontology |
CN101358245A (en) * | 2008-09-19 | 2009-02-04 | 上海市动物疫病预防控制中心 | Detection method of halothane genotype in swine |
CN101548010A (en) * | 2006-12-01 | 2009-09-30 | 佳能株式会社 | Method of determining the haplotype of multiple allelic genes |
CN102367490A (en) * | 2008-12-12 | 2012-03-07 | 深圳华大基因科技有限公司 | Method for detecting viruses |
US20140222349A1 (en) * | 2013-01-16 | 2014-08-07 | Assurerx Health, Inc. | System and Methods for Pharmacogenomic Classification |
US20150261913A1 (en) * | 2014-03-11 | 2015-09-17 | The Board of Trustees of the Leland Stanford, Junior, University | Method and System for Identifying Clinical Phenotypes in Whole Genome DNA Sequence Data |
CN105586389A (en) * | 2014-10-21 | 2016-05-18 | 天津华大基因科技有限公司 | Kit and application thereof in detection on hereditary bone disease genes |
CN106156538A (en) * | 2016-06-29 | 2016-11-23 | 天津诺禾医学检验所有限公司 | The annotation method of a kind of full-length genome variation data and annotation system |
CN106202936A (en) * | 2016-07-13 | 2016-12-07 | 为朔医学数据科技(北京)有限公司 | A kind of disease risks Forecasting Methodology and system |
CN107292129A (en) * | 2017-05-26 | 2017-10-24 | 中国科学院上海药物研究所 | Susceptible genotype detection method |
CN107557458A (en) * | 2017-10-11 | 2018-01-09 | 华东医药(杭州)基因科技有限公司 | A kind of method and device of effective detection genotype |
Patent Citations (11)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20070178501A1 (en) * | 2005-12-06 | 2007-08-02 | Matthew Rabinowitz | System and method for integrating and validating genotypic, phenotypic and medical information into a database according to a standardized ontology |
CN101548010A (en) * | 2006-12-01 | 2009-09-30 | 佳能株式会社 | Method of determining the haplotype of multiple allelic genes |
CN101358245A (en) * | 2008-09-19 | 2009-02-04 | 上海市动物疫病预防控制中心 | Detection method of halothane genotype in swine |
CN102367490A (en) * | 2008-12-12 | 2012-03-07 | 深圳华大基因科技有限公司 | Method for detecting viruses |
US20140222349A1 (en) * | 2013-01-16 | 2014-08-07 | Assurerx Health, Inc. | System and Methods for Pharmacogenomic Classification |
US20150261913A1 (en) * | 2014-03-11 | 2015-09-17 | The Board of Trustees of the Leland Stanford, Junior, University | Method and System for Identifying Clinical Phenotypes in Whole Genome DNA Sequence Data |
CN105586389A (en) * | 2014-10-21 | 2016-05-18 | 天津华大基因科技有限公司 | Kit and application thereof in detection on hereditary bone disease genes |
CN106156538A (en) * | 2016-06-29 | 2016-11-23 | 天津诺禾医学检验所有限公司 | The annotation method of a kind of full-length genome variation data and annotation system |
CN106202936A (en) * | 2016-07-13 | 2016-12-07 | 为朔医学数据科技(北京)有限公司 | A kind of disease risks Forecasting Methodology and system |
CN107292129A (en) * | 2017-05-26 | 2017-10-24 | 中国科学院上海药物研究所 | Susceptible genotype detection method |
CN107557458A (en) * | 2017-10-11 | 2018-01-09 | 华东医药(杭州)基因科技有限公司 | A kind of method and device of effective detection genotype |
Non-Patent Citations (2)
Title |
---|
ZHU, JING 等: "Serotonin Transporter Gene Polymorphisms and Selective Serotonin Reuptake Inhibitor Tolerability: Review of Pharmacogenetic Evidence", 《PHARMACOTHERAPY》 * |
简正伟 等: "一种检测多种NPM1突变体的ARMS-PCR方法的建立", 《中国实验血液学杂志》 * |
Cited By (7)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN109637581A (en) * | 2018-12-10 | 2019-04-16 | 江苏医联生物科技有限公司 | Whole process mass analysis method is sequenced in a kind of bis- generation of DNA |
CN111798926A (en) * | 2020-06-30 | 2020-10-20 | 广州金域医学检验中心有限公司 | Pathogenic gene locus database and establishment method thereof |
CN111798926B (en) * | 2020-06-30 | 2023-09-29 | 广州金域医学检验中心有限公司 | Pathogenic gene locus database and establishment method thereof |
CN115295116A (en) * | 2022-08-04 | 2022-11-04 | 上海康黎医学检验所有限公司 | Medication comment method and system and electronic equipment |
CN115295116B (en) * | 2022-08-04 | 2023-09-19 | 上海康黎医学检验所有限公司 | Medicine comment method, system and electronic equipment |
CN116246715A (en) * | 2023-04-27 | 2023-06-09 | 倍科为(天津)生物技术有限公司 | Multi-sample gene mutation data storage method, device, equipment and medium |
CN116246715B (en) * | 2023-04-27 | 2024-04-16 | 倍科为(天津)生物技术有限公司 | Multi-sample gene mutation data storage method, device, equipment and medium |
Also Published As
Publication number | Publication date |
---|---|
CN108710782B (en) | 2021-03-16 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN108710782A (en) | Genotype conversion method, device and electronic equipment | |
Garber et al. | Identifying novel constrained elements by exploiting biased substitution patterns | |
JP6314091B2 (en) | DNA sequence data analysis | |
EP2718862B1 (en) | Method for assembly of nucleic acid sequence data | |
Coombe et al. | Assembly of the complete Sitka spruce chloroplast genome using 10X Genomics’ GemCode sequencing data | |
JP6762932B2 (en) | Methods, systems, and processes for de novo assembly of sequencing leads | |
WO2016141294A1 (en) | Systems and methods for genomic pattern analysis | |
JP2017527257A (en) | Determination of chromosome presentation | |
US20170329899A1 (en) | Display of estimated parental contribution to ancestry | |
Brozynska et al. | Direct chloroplast sequencing: comparison of sequencing platforms and analysis tools for whole chloroplast barcoding | |
CN111566227A (en) | Analysis of structural variants | |
Kremer et al. | Approaches for in silico finishing of microbial genome sequences | |
Llinares-López et al. | Genome-wide genetic heterogeneity discovery with categorical covariates | |
CN110621785A (en) | Method and device for typing diploid genome haploid based on third generation capture sequencing | |
Wang et al. | Tool evaluation for the detection of variably sized indels from next generation whole genome and targeted sequencing data | |
CN107967411B (en) | Method and device for detecting off-target site and terminal equipment | |
KR20220076444A (en) | Method and apparatus for classifying variation candidates within whole genome sequence | |
CN115083521A (en) | Method and system for identifying tumor cell group in single cell transcriptome sequencing data | |
CN114121153A (en) | Gene mutation site detection method, device, electronic equipment and storage medium | |
US20140229114A1 (en) | Genomic/proteomic sequence representation, visualization, comparison and reporting using bioinformatics character set and mapped bioinformatics font | |
US20150142328A1 (en) | Calculation method for interchromosomal translocation position | |
Molinari et al. | Transcriptome analysis using RNA-Seq from experiments with and without biological replicates: a review | |
Videm et al. | ChiRA: an integrated framework for chimeric read analysis from RNA-RNA interactome and RNA structurome data | |
Kerzendorfer et al. | A thesaurus of genetic variation for interrogation of repetitive genomic regions | |
Song et al. | AnchorWave: sensitive alignment of genomes with high diversity, structural polymorphism and whole-genome duplication variation |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||