Summary of the invention
An object of the present invention is to solve at least one of the above-mentioned technical deficiencies.
To this end, one object of the invention is to provide a method for sequencing-based typing of PCR products. The method offers high typing efficiency, accurate and reliable typing, a short typing time and low labor cost.
Another object of the present invention is to provide a system for sequencing-based typing of PCR products.
To achieve these objects, embodiments of the invention provide a method for sequencing-based typing of PCR products, comprising the following steps: obtaining a base sequence to be typed, together with file information and primer file information associated with the base sequence to be typed, wherein the file information comprises the locus name of the base sequence to be typed; identifying, according to the file information and the allele type information of reference sequences, the alignment position relationship between the base sequence to be typed and the reference sequences, so as to obtain a first candidate type combination to be analyzed; performing further typing identification on the base sequence to be typed according to the primer file information and the primer information of the reference sequences, so as to obtain a second candidate type combination to be analyzed; and obtaining the final type of the base sequence to be typed according to the first candidate type combination to be analyzed and the second candidate type combination to be analyzed.
According to the method for the PCR product sequencing and typing of the embodiment of the present invention, according to reference sequences with and relevant information and treat somatotype base sequence with and relevant information can Direct Recognition treat that somatotype base sequence joins position relationship to the connection of reference sequences, thereby, reduced and read the step that base sequence is compared the corresponding site of each reference sequences one by one, reduced comparison time.Especially when carrying out candidate's type (as a lot of reference sequences, a lot of types) examination retrieval on a large scale, greatly shortened comparison time.In addition, according to primer information, again identify somatotype, like this, guarantee correctness and the reliability of somatotype.In addition, the method can automatically be treated somatotype base sequence and carry out somatotype, and the artificial importing of minimizing GSSP information etc. are loaded down with trivial details, effectively saved the time and lowered cost of labor.
In addition, the method for PCR according to the above embodiment of the present invention product sequencing and typing can also have following additional technical characterictic:
In some examples, identifying the alignment position relationship between the base sequence to be typed and the reference sequences according to the file information and the allele type information of the reference sequences, so as to obtain the first candidate type combination to be analyzed, further comprises: matching the locus name of the base sequence to be typed, which is contained in the file information, against the allele type information of the reference sequences; obtaining the alignment position relationship between the base sequence to be typed and the reference sequences according to the matching result; and obtaining the first candidate type combination to be analyzed according to the alignment position relationship.
Further, the allele type information of the reference sequences comprises the HLA class I and class II locus gene information from IMGT, and the locus name of the base sequence to be typed that is contained in the file information is the locus gene information of the corresponding locus.
In some examples, performing further typing identification on the base sequence to be typed according to the primer file information and the primer information of the reference sequences, so as to obtain the second candidate type combination to be analyzed, further comprises: matching the primer file information against the primer information of the reference sequences; and, if they match, adding the corresponding reference sequence to the second candidate type combination to be analyzed.
Further, the primer information of the reference sequences comprises GSSP primer information and backup primer information for HLA class I and class II.
In some examples, obtaining the final type of the base sequence to be typed according to the first candidate type combination to be analyzed and the second candidate type combination to be analyzed further comprises: obtaining the intersection of the first candidate type combination to be analyzed and the second candidate type combination to be analyzed; and determining the final type of the base sequence to be typed according to the intersection.
Embodiments of a second aspect of the invention provide a system for sequencing-based typing of PCR products, comprising: a storage module, configured to store a base sequence to be typed, file information and primer file information associated with the base sequence to be typed, and reference sequences together with the allele type information and primer information of the reference sequences, wherein the file information comprises the locus name of the base sequence to be typed; an analysis module, configured to identify, according to the file information and the allele type information of the reference sequences, the alignment position relationship between the base sequence to be typed and the reference sequences so as to obtain a first candidate type combination to be analyzed, to perform further typing identification on the base sequence to be typed according to the primer file information and the primer information of the reference sequences so as to obtain a second candidate type combination to be analyzed, and to obtain the final type of the base sequence to be typed according to the first candidate type combination to be analyzed and the second candidate type combination to be analyzed; and a display module, configured to display the information stored in the storage module and the analysis result of the analysis module.
According to the system for sequencing-based typing of PCR products of the embodiments of the invention, the alignment position relationship between the base sequence to be typed and the reference sequences can be identified directly from the reference sequences and their related information together with the base sequence to be typed and its related information. This removes the step of comparing the read base sequence against the corresponding locus of each reference sequence one by one, and thus reduces the comparison time. The comparison time is shortened most when candidate types are screened on a large scale (for example, with many reference sequences and many types). In addition, the typing is identified again according to the primer information, which helps ensure that the typing is correct and reliable. Furthermore, the system can type the base sequence to be typed automatically, which reduces tedious manual work such as importing GSSP information, saves time and lowers labor cost.
In addition, the system for sequencing-based typing of PCR products according to the above embodiments of the invention may also have the following additional technical features:
In some examples, the analysis module is configured to: match the locus name of the base sequence to be typed, which is contained in the file information, against the allele type information of the reference sequences; obtain the alignment position relationship between the base sequence to be typed and the reference sequences according to the matching result; and obtain the first candidate type combination to be analyzed according to the alignment position relationship.
Further, the allele type information of the reference sequences comprises the HLA class I and class II locus gene information from IMGT, and the locus name of the base sequence to be typed that is contained in the file information is the locus gene information of the corresponding locus.
In some examples, the analysis module is further configured to: match the primer file information against the primer information of the reference sequences; and, if they match, add the corresponding reference sequence to the second candidate type combination to be analyzed.
Further, the primer information of the reference sequences comprises GSSP primer information and backup primer information for HLA class I and class II.
In some examples, the analysis module is further configured to: obtain the intersection of the first candidate type combination to be analyzed and the second candidate type combination to be analyzed; and determine the final type of the base sequence to be typed according to the intersection.
Additional aspects and advantages of the invention will be set forth in part in the following description, and in part will become apparent from that description or be learned through practice of the invention.
Detailed description of the embodiments
Embodiments of the invention are described in detail below. Examples of the embodiments are shown in the accompanying drawings, in which the same or similar reference numerals denote, throughout, the same or similar elements or elements having the same or similar functions. The embodiments described below with reference to the drawings are exemplary; they serve only to explain the invention and shall not be construed as limiting it.
In the description of the invention, it should be understood that orientations or positional relationships indicated by terms such as "longitudinal", "lateral", "up", "down", "front", "back", "left", "right", "vertical", "horizontal", "top", "bottom", "inner" and "outer" are based on the orientations or positional relationships shown in the drawings. They are used only to facilitate and simplify the description of the invention, and do not indicate or imply that the device or element referred to must have a specific orientation or be constructed and operated in a specific orientation. They therefore shall not be construed as limiting the invention.
In the description of the invention, it should be noted that, unless otherwise specified and defined, the terms "mounted", "coupled" and "connected" are to be interpreted broadly. For example, a connection may be mechanical or electrical, may be an internal communication between two elements, and may be direct or indirect through an intermediary. A person of ordinary skill in the art can understand the specific meaning of these terms according to the circumstances.
The method and system for sequencing-based typing of PCR products according to the embodiments of the invention are described below with reference to the accompanying drawings.
Fig. 1 is a flowchart of a method for sequencing-based typing of PCR products according to an embodiment of the invention. As shown in Fig. 1, the method for sequencing-based typing of PCR products according to an embodiment of the invention comprises the following steps:
Step S101: obtain a base sequence to be typed, together with file information and primer file information associated with the base sequence to be typed, wherein the file information comprises the locus name of the base sequence to be typed.
Specifically, the base sequence to be typed and the file information and primer file information associated with it may be stored in a database in advance.
For example, the base sequence to be typed and the file information, primer file information and the like associated with it are stored in a dynamic database, which is mainly used to store the intermediate result data of the samples being analyzed (the base sequences to be typed). The dynamic database stores, for example, three database tables: a file information table, a primer file information table and a sample information table. The file information table stores the useful information of each sequence file (for example, the file information associated with the base sequence to be typed); the primer file information table stores the useful information of each primer file (for example, the primer file information associated with the base sequence to be typed); and the sample information table stores the useful information of each sample (for example, the base sequence to be typed).
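The three tables of the dynamic database can be sketched as follows. This is only an illustrative layout using Python's built-in sqlite3 module: the description names the three tables but not their columns, so every table name, column name and sample value below is an assumption.

```python
import sqlite3

# Hypothetical schema: the description names the three tables only,
# so all table and column names here are assumptions for illustration.
DYNAMIC_SCHEMA = """
CREATE TABLE file_info (
    file_id   INTEGER PRIMARY KEY,
    sample_id TEXT NOT NULL,
    locus     TEXT NOT NULL      -- locus name carried by the sequence file
);
CREATE TABLE primer_file_info (
    primer_file_id INTEGER PRIMARY KEY,
    sample_id      TEXT NOT NULL,
    primer_name    TEXT NOT NULL -- primer named in the primer file
);
CREATE TABLE sample_info (
    sample_id TEXT PRIMARY KEY,
    sequence  TEXT NOT NULL      -- base sequence to be typed
);
"""


def create_dynamic_db(path=":memory:"):
    """Create the (assumed) dynamic database and return the connection."""
    conn = sqlite3.connect(path)
    conn.executescript(DYNAMIC_SCHEMA)
    return conn


conn = create_dynamic_db()
# Placeholder sample: the sequence is not real data.
conn.execute("INSERT INTO sample_info VALUES (?, ?)", ("S1", "ACGT"))
conn.execute(
    "INSERT INTO file_info (sample_id, locus) VALUES (?, ?)", ("S1", "HLA-A")
)
tables = sorted(
    row[0]
    for row in conn.execute("SELECT name FROM sqlite_master WHERE type='table'")
)
locus = conn.execute(
    "SELECT locus FROM file_info WHERE sample_id = ?", ("S1",)
).fetchone()[0]
```

Keeping the locus name in its own column is what lets the later steps look it up directly instead of re-reading the sequence file.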
Step S102: identify, according to the file information and the allele type information of the reference sequences, the alignment position relationship between the base sequence to be typed and the reference sequences, so as to obtain a first candidate type combination to be analyzed.
In one embodiment of the invention, this step may be implemented as follows:
Step S1021: match the locus name of the base sequence to be typed, which is contained in the file information, against the allele type information of the reference sequences.
Step S1022: obtain the alignment position relationship between the base sequence to be typed and the reference sequences according to the matching result.
Step S1023: obtain the first candidate type combination to be analyzed according to the alignment position relationship.
Here, the allele type information of the reference sequences comprises the HLA class I and class II locus gene information from IMGT, and the locus name of the base sequence to be typed that is contained in the file information is the locus gene information of the corresponding locus.
Specifically, the reference sequences and their associated information may be stored in a static database set up in advance. The static database also stores three database tables: a gene information table (storing the gene information of the reference sequences), a type information table (storing the types of the reference sequences) and a primer information table (storing the primer information of the reference sequences). More specifically, the gene information table stores the HLA class I and class II locus gene information from IMGT, the type information table stores all allele type information, and the primer information table stores all GSSP primer information and backup primer information for HLA class I and class II.
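As a rough illustration of how the static reference data might be organized, the sketch below models the three tables as plain Python records. The field names, the primer names and the placeholder sequences are assumptions made for the example; only the division into gene, type and primer information comes from the description.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class GeneRecord:
    """Row of the gene information table (assumed fields)."""
    locus: str
    hla_class: str  # "I" or "II"


@dataclass(frozen=True)
class TypeRecord:
    """Row of the type information table (assumed fields)."""
    allele: str
    locus: str
    sequence: str  # placeholder string, not real reference data


@dataclass(frozen=True)
class PrimerRecord:
    """Row of the primer information table (assumed fields)."""
    primer_name: str  # hypothetical primer name
    locus: str
    kind: str  # "GSSP" or "backup"


@dataclass
class StaticDatabase:
    genes: list = field(default_factory=list)
    types: list = field(default_factory=list)
    primers: list = field(default_factory=list)

    def alleles_for_locus(self, locus):
        """Return the allele names recorded for one locus, in stored order."""
        return [t.allele for t in self.types if t.locus == locus]


static_db = StaticDatabase(
    genes=[GeneRecord("HLA-A", "I"), GeneRecord("HLA-DRB1", "II")],
    types=[
        TypeRecord("A*01:01", "HLA-A", "ACGT"),
        TypeRecord("A*02:01", "HLA-A", "ACGA"),
        TypeRecord("DRB1*01:01", "HLA-DRB1", "TTGA"),
    ],
    primers=[PrimerRecord("GSSP-1", "HLA-A", "GSSP")],
)
hla_a_alleles = static_db.alleles_for_locus("HLA-A")
```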
In this way, the base sequence to be typed can be preliminarily typed using the reference sequences and their related information in the static database together with the base sequence to be typed and its related information in the dynamic database. For example, as described above, the alignment position relationship between the base sequence to be typed and the reference sequences can be identified directly from this information, yielding the first candidate type combination to be analyzed. This removes the step of comparing the read base sequence against the corresponding locus of each reference sequence one by one, and thus reduces the comparison time. The comparison time is shortened most when candidate types are screened on a large scale (for example, with many reference sequences and many types).
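Steps S1021 to S1023 can be sketched roughly as below. The sketch assumes that the allele type information is available as a mapping from allele name to locus and that matching means a case-insensitive comparison of locus names; neither detail is stated in the description, and the function name is hypothetical.

```python
# Assumed shape of the allele type information: allele name -> locus.
ALLELE_TYPES = {
    "A*01:01": "HLA-A",
    "A*02:01": "HLA-A",
    "B*07:02": "HLA-B",
}


def first_candidates(file_locus, allele_types):
    """Select the reference alleles whose locus matches the file's locus name.

    Only these alleles need to be compared with the sample, so references
    for other loci are never touched.
    """
    wanted = file_locus.strip().upper()
    return {
        allele
        for allele, locus in allele_types.items()
        if locus.upper() == wanted
    }


first = first_candidates("hla-a ", ALLELE_TYPES)
```

Filtering by locus before any base-level comparison is where the time saving described above would come from: the candidate pool shrinks from all reference sequences to those of a single locus.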
Step S103: perform further typing identification on the base sequence to be typed according to the primer file information and the primer information of the reference sequences, so as to obtain a second candidate type combination to be analyzed.
In one embodiment of the invention, this step may comprise the following steps:
Step S1031: match the primer file information against the primer information of the reference sequences.
Step S1032: if they match, add the corresponding reference sequence to the second candidate type combination to be analyzed.
Here, the primer information of the reference sequences comprises GSSP primer information and backup primer information for HLA class I and class II.
That is, when a base sequence sequenced with GSSP primers is identified, the GSSP primer information can be retrieved directly from the static database to interpret the base sequence to be typed, thereby obtaining the second candidate type combination to be analyzed. In other words, the second candidate type combination is obtained by comparing the primer information associated with the base sequence to be typed with the primer information of the reference sequences.
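A minimal sketch of steps S1031 and S1032 follows. It assumes that the primer file information reduces to a list of primer names and that the reference primer information maps each primer name to the reference alleles associated with it; the primer names and this mapping are hypothetical.

```python
# Assumed shape of the reference primer information:
# primer name -> reference alleles associated with that primer.
REFERENCE_PRIMERS = {
    "GSSP-1": {"A*01:01", "A*03:01"},
    "GSSP-2": {"A*02:01"},
}


def second_candidates(primer_file_names, reference_primers):
    """Collect reference alleles whose primer information matches the primer file."""
    candidates = set()
    for name in primer_file_names:
        matched = reference_primers.get(name)
        if matched is not None:  # match found: add the corresponding references
            candidates |= matched
    return candidates


second = second_candidates(["GSSP-1", "unknown-primer"], REFERENCE_PRIMERS)
```

Primer names that do not appear in the reference table are simply skipped here; how the actual software treats unmatched primers is not described in the text.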
Step S104: obtain the final type of the base sequence to be typed according to the first candidate type combination to be analyzed and the second candidate type combination to be analyzed.
Specifically, this step may be implemented as follows:
Step S1041: obtain the intersection of the first candidate type combination to be analyzed and the second candidate type combination to be analyzed.
Step S1042: determine the final type of the base sequence to be typed according to the intersection.
That is, the final type of the base sequence to be typed is confirmed from the intersection of the first candidate type combination to be analyzed and the second candidate type combination to be analyzed.
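Steps S1041 and S1042 amount to a set intersection, which can be sketched as follows. The function name and the example allele names are illustrative only, and the sketch treats both candidate type combinations as plain sets of allele names, which is an assumption.

```python
def final_types(first, second):
    """Return the types present in both candidate combinations, sorted by name."""
    return sorted(set(first) & set(second))


first = {"A*01:01", "A*02:01"}    # from locus-name matching (step S102)
second = {"A*01:01", "A*03:01"}   # from primer matching (step S103)
result = final_types(first, second)
```

With these inputs only A*01:01 survives, because it is the one type supported by both the locus match and the primer match. The description does not say what happens when the intersection is empty, so the sketch simply returns an empty list in that case.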
Fig. 2 is a schematic diagram of the memory, CPU time and typing (analysis) time consumed when the method of the embodiments of the invention is applied to typing with different numbers of reference sequences. As can be seen from Fig. 2, even typing 88 samples (base sequences to be typed) takes only a few minutes.
The method of the embodiments of the invention may be implemented in software, and the software may take the form of a software product. When an operator uses the software for typing, only some necessary data need to be imported. Specifically, Fig. 3 shows the steps that the operator has to perform when typing with the method of the embodiments of the invention, and Fig. 4 is a schematic diagram of an operation interface, in which 1 denotes base navigation, 2 denotes sequence display, 3 denotes peak map display, 4 denotes the sample list and 5 denotes the type list. With reference to Fig. 3 and Fig. 4, an operator typing with the method of the embodiments of the invention only needs to follow these steps:
Step S301: import the files.
Step S302: click a sample file in the file list module.
Step S303: check the mismatch sites.
Step S304: edit the bases.
Step S305: check the type list module.
Step S306: according to the sequence file information, edit the mismatched bases until a type or type combination with a mismatch count of 0 appears.
Step S307: determine whether GSSP primers are needed. If so, go to step S308; otherwise, go to step S311.
Step S308: save.
Step S309: run GSSP.
Step S310: import the saved file and the GSSP file, and return to step S306.
Step S311: save.
Step S312: mark.
Step S313: save.
Step S314: export the report.
According to the method for the PCR product sequencing and typing of the embodiment of the present invention, according to reference sequences with and relevant information and treat somatotype base sequence with and relevant information can Direct Recognition treat that somatotype base sequence joins position relationship to the connection of reference sequences, thereby, reduced and read the step that base sequence is compared the corresponding site of each reference sequences one by one, reduced comparison time.Especially when carrying out candidate's type (as a lot of reference sequences, a lot of types) examination retrieval on a large scale, greatly shortened comparison time.In addition, according to primer information, again identify somatotype, like this, guarantee correctness and the reliability of somatotype.In addition, the method can automatically be treated somatotype base sequence and carry out somatotype, and the artificial importing of minimizing GSSP information etc. are loaded down with trivial details, effectively saved the time and lowered cost of labor.
A further embodiment of the invention provides a system for sequencing-based typing of PCR products. As shown in Fig. 5, the system 500 for sequencing-based typing of PCR products according to an embodiment of the invention comprises a storage module 510, an analysis module 520 and a display module 530.
The storage module 510 is configured to store the base sequence to be typed, the file information and primer file information associated with the base sequence to be typed, and the reference sequences together with the allele type information and primer information of the reference sequences, wherein the file information comprises the locus name of the base sequence to be typed.
The storage module 510 may be divided into a static database and a dynamic database. For example, the base sequence to be typed and the file information, primer file information and the like associated with it are stored in the dynamic database, which is mainly used to store the intermediate result data of the samples being analyzed (the base sequences to be typed). The dynamic database stores, for example, three database tables: a file information table, a primer file information table and a sample information table. The file information table stores the useful information of each sequence file (for example, the file information associated with the base sequence to be typed); the primer file information table stores the useful information of each primer file (for example, the primer file information associated with the base sequence to be typed); and the sample information table stores the useful information of each sample (for example, the base sequence to be typed).
The reference sequences and their associated information may be stored in the static database. The static database also stores three database tables: a gene information table (storing the gene information of the reference sequences), a type information table (storing the types of the reference sequences) and a primer information table (storing the primer information of the reference sequences). More specifically, the gene information table stores the HLA class I and class II locus gene information from IMGT, the type information table stores all allele type information, and the primer information table stores all GSSP primer information and backup primer information for HLA class I and class II.
The analysis module 520 is configured to identify, according to the file information and the allele type information of the reference sequences, the alignment position relationship between the base sequence to be typed and the reference sequences so as to obtain the first candidate type combination to be analyzed, to perform further typing identification on the base sequence to be typed according to the primer file information and the primer information of the reference sequences so as to obtain the second candidate type combination to be analyzed, and to obtain the final type of the base sequence to be typed according to the first candidate type combination to be analyzed and the second candidate type combination to be analyzed.
Specifically, the analysis module 520 is configured to: match the locus name of the base sequence to be typed, which is contained in the file information, against the allele type information of the reference sequences; obtain the alignment position relationship between the base sequence to be typed and the reference sequences according to the matching result; and obtain the first candidate type combination to be analyzed according to the alignment position relationship.
In this way, the base sequence to be typed can be preliminarily typed using the reference sequences and their related information in the static database together with the base sequence to be typed and its related information in the dynamic database. For example, as described above, the alignment position relationship between the base sequence to be typed and the reference sequences can be identified directly from this information, yielding the first candidate type combination to be analyzed. This removes the step of comparing the read base sequence against the corresponding locus of each reference sequence one by one, and thus reduces the comparison time. The comparison time is shortened most when candidate types are screened on a large scale (for example, with many reference sequences and many types).
In one embodiment of the invention, the analysis module 520 is further configured to match the primer file information against the primer information of the reference sequences and, if they match, to add the corresponding reference sequence to the second candidate type combination to be analyzed. Here, the primer information of the reference sequences comprises GSSP primer information and backup primer information for HLA class I and class II.
That is, when a base sequence sequenced with GSSP primers is identified, the GSSP primer information can be retrieved directly from the static database to interpret the base sequence to be typed, thereby obtaining the second candidate type combination to be analyzed. In other words, the second candidate type combination is obtained by comparing the primer information associated with the base sequence to be typed with the primer information of the reference sequences.
Further, the analysis module 520 is also configured to obtain the intersection of the first candidate type combination to be analyzed and the second candidate type combination to be analyzed, and to determine the final type of the base sequence to be typed according to the intersection. Here, the allele type information of the reference sequences comprises the HLA class I and class II locus gene information from IMGT, and the locus name of the base sequence to be typed that is contained in the file information is the locus gene information of the corresponding locus. That is, the final type of the base sequence to be typed is confirmed from the intersection of the first candidate type combination to be analyzed and the second candidate type combination to be analyzed.
The display module 530 is configured to display the information stored in the storage module and the analysis result of the analysis module.
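The division of labor among the three modules can be illustrated with the following sketch. The class names mirror the modules described above, but every method name, data shape and example value is an assumption; the real system stores its data in database tables and renders a graphical interface rather than returning a string.

```python
class StorageModule:
    """Holds reference data (static) and per-sample data (dynamic)."""

    def __init__(self, allele_types, reference_primers):
        self.allele_types = allele_types            # allele name -> locus
        self.reference_primers = reference_primers  # primer name -> alleles
        self.samples = {}                           # sample id -> (locus, primer names)

    def add_sample(self, sample_id, locus, primer_names):
        self.samples[sample_id] = (locus, list(primer_names))


class AnalysisModule:
    """Derives both candidate combinations and intersects them."""

    def __init__(self, storage):
        self.storage = storage

    def type_sample(self, sample_id):
        locus, primer_names = self.storage.samples[sample_id]
        first = {
            allele
            for allele, ref_locus in self.storage.allele_types.items()
            if ref_locus == locus
        }
        second = set()
        for name in primer_names:
            second |= self.storage.reference_primers.get(name, set())
        return sorted(first & second)


class DisplayModule:
    """Formats an analysis result for presentation."""

    def render(self, sample_id, types):
        shown = ", ".join(types) if types else "no type found"
        return f"{sample_id}: {shown}"


storage = StorageModule(
    allele_types={"A*01:01": "HLA-A", "A*02:01": "HLA-A", "B*07:02": "HLA-B"},
    reference_primers={"GSSP-1": {"A*01:01", "A*03:01"}},  # hypothetical primer
)
storage.add_sample("S1", "HLA-A", ["GSSP-1"])
analysis = AnalysisModule(storage)
line = DisplayModule().render("S1", analysis.type_sample("S1"))
```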
In summary, the system of the embodiments of the invention comprises a storage module (i.e. a database module), an analysis module and a display module (i.e. an interface display module). All known HLA type sequences in the IMGT/HLA database can be stored in the static database of the storage module in a designed format, and the valid data of the samples to be analyzed (the base sequences to be analyzed), once imported into the software, can likewise be stored in the dynamic database of the storage module in a fixed format, so that the data can be retrieved conveniently during analysis.
The analysis module implements the typing function. For example, the analysis module implements the core function of typing the HLA genes of a sample and may comprise two submodules: a file information extraction module and a sample HLA typing module. The file information extraction module mainly extracts the useful information from the sequencing peak map files and stores it in the storage module. The sample HLA typing module merges the file information by sample and performs the HLA typing.
The display module (i.e. the interface display module) generates the operation interface, that is, a visual interface. For example, as shown in Fig. 4, the sample list displays the opened files in a file tree structure and provides the related right-click functions; the type list displays the typing results and provides the related right-click functions; base navigation provides quick navigation between different exon regions; sequence display shows the aligned sequences; and peak map display shows the peak maps.
According to the system for sequencing-based typing of PCR products of the embodiments of the invention, the alignment position relationship between the base sequence to be typed and the reference sequences can be identified directly from the reference sequences and their related information together with the base sequence to be typed and its related information. This removes the step of comparing the read base sequence against the corresponding locus of each reference sequence one by one, and thus reduces the comparison time. The comparison time is shortened most when candidate types are screened on a large scale (for example, with many reference sequences and many types). In addition, the typing is identified again according to the primer information, which helps ensure that the typing is correct and reliable. Furthermore, the system can type the base sequence to be typed automatically, which reduces tedious manual work such as importing GSSP information, saves time and lowers labor cost.
In the description of this specification, reference to the terms "an embodiment", "some embodiments", "an example", "a specific example" or "some examples" means that a specific feature, structure, material or characteristic described in connection with that embodiment or example is included in at least one embodiment or example of the invention. In this specification, such expressions do not necessarily refer to the same embodiment or example. Moreover, the specific features, structures, materials or characteristics described may be combined in a suitable manner in any one or more embodiments or examples.
Although embodiments of the invention have been shown and described, a person of ordinary skill in the art will understand that various changes, modifications, substitutions and variations can be made to these embodiments without departing from the principles and spirit of the invention, the scope of which is defined by the claims and their equivalents.