DE19950050C2 - Method for the functional assignment of unclassified DNA sequences - Google Patents
Method for the functional assignment of unclassified DNA sequences
- Publication number
- DE19950050C2
- Authority
- DE
- Germany
- Prior art keywords
- sequence
- sequences
- consensus
- gap
- unclassified
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Expired - Fee Related
Classifications
-
- G—PHYSICS
- G16—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
- G16B—BIOINFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR GENETIC OR PROTEIN-RELATED DATA PROCESSING IN COMPUTATIONAL MOLECULAR BIOLOGY
- G16B30/00—ICT specially adapted for sequence analysis involving nucleotides or amino acids
-
- G—PHYSICS
- G16—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
- G16B—BIOINFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR GENETIC OR PROTEIN-RELATED DATA PROCESSING IN COMPUTATIONAL MOLECULAR BIOLOGY
- G16B30/00—ICT specially adapted for sequence analysis involving nucleotides or amino acids
- G16B30/10—Sequence alignment; Homology search
-
- G—PHYSICS
- G16—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
- G16B—BIOINFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR GENETIC OR PROTEIN-RELATED DATA PROCESSING IN COMPUTATIONAL MOLECULAR BIOLOGY
- G16B40/00—ICT specially adapted for biostatistics; ICT specially adapted for bioinformatics-related machine learning or data mining, e.g. knowledge discovery or pattern finding
Description
The present invention relates to a method for the functional assignment of unclassified DNA sequences, in which an unclassified sequence can be assigned (aligned) to known reference sequences by means of simple steps.
Since PCR technology became commercially available, the functional assignment of the DNA information obtained in this way has been a fundamental problem in biotechnology.
In conventional methods, the unclassified sequence is therefore matched either on the basis of functional characteristics or by direct comparison with sequences of known properties.
The present application provides a method for the functional assignment of an unclassified DNA sequence, which comprises the following steps:
- a) comparing the unclassified sequence (A) with reference sequences (B1-Bn) under [...]
- b) creating gap patterns (C1-Cm) for the reference sequences (B1-Bn) and a consensus sequence (D) for the sequence (A),
- c) splitting the gap patterns (C1-Cm) into short consensus sequences (E1-Em) and gap information (F1-Fm),
- d) position-by-position comparison of the short consensus sequences (E1-Em), with iterating offset, with the unclassified sequence (A), determining the short consensus sequence with the highest agreement (Emax),
- e) inserting the gap information (Fmax) corresponding to the consensus sequence (Emax) into the sequence (A) to produce an aligned sequence (G).
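Steps (b) and (c) can be illustrated with a minimal sketch. It rests on one reading of the text: the reference sequences are already aligned with one another, a gap pattern is the set of gap columns of one reference, and a short consensus is the consensus with those columns removed. The function names, the `-` gap character and the majority-vote consensus are assumptions made for illustration and are not taken from the patent.

```python
from collections import Counter

GAP = "-"  # assumed gap character


def gap_pattern(aligned_ref):
    """Return the column indices at which an aligned reference has a gap."""
    return tuple(i for i, ch in enumerate(aligned_ref) if ch == GAP)


def consensus(aligned_refs):
    """Column-wise majority base over the references, ignoring gaps."""
    result = []
    for column in zip(*aligned_refs):
        bases = [ch for ch in column if ch != GAP]
        # A column that consists only of gaps stays a gap in the consensus.
        result.append(Counter(bases).most_common(1)[0][0] if bases else GAP)
    return "".join(result)


def shorten(cons, pattern):
    """Drop the columns named in `pattern`, giving one short consensus."""
    skip = set(pattern)
    return "".join(ch for i, ch in enumerate(cons) if i not in skip)


# Toy aligned references B1-B3 (hypothetical data).
refs = ["ACG-TAC", "AC-GTAC", "ACGGTAC"]
D = consensus(refs)                  # consensus sequence
C = [gap_pattern(r) for r in refs]   # gap patterns, one per reference
E = [shorten(D, c) for c in C]       # short consensus sequences
```

In this sketch the gap information F is simply the pattern itself, so each pair `(E[k], C[k])` carries what the later steps need.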
The reference sequences (B1-Bn) underlying the consensus sequence (D) should have a degree of agreement with one another that is greater than 60% and less than 80-90%. According to the present application, the information collected in step (b) (gap patterns (C1-Cm) and consensus sequences (E1-Em)) can be cached and used directly for later comparisons.
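Whether a reference set falls into the stated window could be checked as sketched below. The identity measure (matching bases over columns in which neither sequence has a gap) and the default upper bound of 90% are assumptions; the text only gives "greater than 60 and less than 80-90%" and does not define how the degree of agreement is computed.

```python
from itertools import combinations

GAP = "-"  # assumed gap character


def identity(a, b):
    """Percent identical positions over columns where neither has a gap."""
    pairs = [(x, y) for x, y in zip(a, b) if x != GAP and y != GAP]
    if not pairs:
        return 0.0
    return 100.0 * sum(x == y for x, y in pairs) / len(pairs)


def within_window(aligned_refs, low=60.0, high=90.0):
    """True if every pair of references has an identity strictly between
    `low` and `high` (the bounds are assumptions based on the text)."""
    return all(low < identity(a, b) < high
               for a, b in combinations(aligned_refs, 2))


# Hypothetical reference set in which every pair is 80% identical.
refs = ["ACGTACGTAC", "ACGTACGTTT", "ACGAACGTAT"]
ok = within_window(refs)
```

A set of identical sequences fails the check, which matches the upper bound in the text: references that are too similar yield no additional gap patterns.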
When the gap patterns are created, identical gap patterns are eliminated, so that for E1-Em: m ≤ n.
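Eliminating identical gap patterns amounts to an order-preserving deduplication. The sketch below assumes that patterns are stored as tuples of gap columns, which is an illustrative representation rather than one given in the text.

```python
def unique_patterns(patterns):
    """Drop repeated gap patterns while keeping first-seen order."""
    return list(dict.fromkeys(patterns))


# Four references (n = 4), two of which share the same gap pattern.
patterns = [(3,), (2,), (3,), ()]
unique = unique_patterns(patterns)   # m = 3 distinct patterns remain
```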
The critical step in the method according to the invention is the position-by-position comparison of the short consensus sequences (E1-Em), in which the highest possible degree of agreement is sought. For a meaningful classification, this degree of agreement should be greater than 60%, preferably greater than 80%.
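The comparison with iterating offset can be sketched as an exhaustive search over all short consensus sequences and all start positions. Scoring by the fraction of matching positions relative to the length of the new sequence is an assumption; the text does not fix a scoring formula, and the function names are hypothetical.

```python
def agreement(short_cons, seq, offset):
    """Percent of positions of `seq` that match `short_cons` when `seq`
    starts at position `offset` of the short consensus."""
    overlap = min(len(seq), len(short_cons) - offset)
    if overlap <= 0:
        return 0.0
    matches = sum(short_cons[offset + i] == seq[i] for i in range(overlap))
    # Dividing by len(seq) penalises parts of `seq` that overhang the end.
    return 100.0 * matches / len(seq)


def best_match(short_consensi, seq):
    """Try every short consensus at every offset.

    Returns (score, index of the best short consensus, offset)."""
    best = (0.0, None, None)
    for k, cons in enumerate(short_consensi):
        for offset in range(len(cons)):
            score = agreement(cons, seq, offset)
            if score > best[0]:
                best = (score, k, offset)
    return best


# Hypothetical short consensus sequences and new sequence.
E = ["ACGTAC", "ACGGTAC"]
score, k, offset = best_match(E, "GGTA")
usable = score > 60.0   # threshold taken from the text
```

The index `k` identifies Emax and therefore also the gap information Fmax that belongs to it.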
The method according to the invention can furthermore be carried out in several cycles: after the aligned sequence (G) has been found, a best reference sequence (Bmax) is determined, which is used to find a new set of reference sequences (H1-Hn) belonging to the family of the reference sequence with the highest degree of agreement (Bmax) from the first cycle.
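Choosing the reference set for a second cycle might look like the sketch below. The `families` mapping, the reference names and the simple match count are all hypothetical; the text does not say how family membership is stored or how Bmax is scored.

```python
def matches(a, b):
    """Number of columns in which two aligned sequences agree.

    Columns where both sequences carry a gap also count as agreement."""
    return sum(x == y for x, y in zip(a, b))


def next_reference_set(aligned_seq, aligned_refs, families):
    """Pick the closest reference (Bmax) and return it with its family."""
    best = max(aligned_refs,
               key=lambda name: matches(aligned_seq, aligned_refs[name]))
    for members in families.values():
        if best in members:
            return best, members
    return best, []


# Hypothetical data: aligned sequence G, first-cycle references, families.
G = "ACG-TAC"
refs = {"ref1": "ACG-TAC", "ref2": "AC-GTTC", "ref3": "TTGGTAC"}
families = {"famA": ["ref1", "ref4", "ref5"], "famB": ["ref2", "ref3"]}
best, new_refs = next_reference_set(G, refs, families)
```

Because G and the references share the same columns after alignment, no shifting is needed here; the comparison is a plain position-by-position pass.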
The method offers the advantage of generating multiple sequence alignments very quickly, so that they are rapidly available for further sequence processing such as sequence annotation or sequence comparisons. Sequence comparisons within correctly computed multiple sequence alignments are particularly fast, because the sequences in the alignment no longer have to be shifted against one another for the comparison but can be compared directly, position by position.
The invention is explained in more detail with reference to the following figures:
Fig. 1 Reading in the sequences. First, the sequences to be processed and the reference sequences are read in.
Fig. 2 The gap patterns are extracted from the reference sequences, and a consensus sequence is determined from the sequence information.
Fig. 3a A list of consensus sequences is then created, from each of which one particular gap pattern is removed.
Fig. 3b A specific example of how shortened consensus sequences ("shorted consensi") are generated from the consensus sequence.
Fig. 4a An optimal set of aligner parameters is determined by position-by-position comparison of each of the "shorted consensus" sequences with the new sequence.
Fig. 4b A specific example of the position-by-position comparison.
Fig. 5a The new sequence is aligned by inserting the gaps and shifting the offset in accordance with the parameter set.
Fig. 5b Experimental example of inserting the gaps and shifting the offset.
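The last step, inserting the gaps and shifting the offset, can be sketched as follows. The parameter set is assumed to consist of a gap pattern (gap columns of the full alignment) and an offset (start position within the short consensus); leading and trailing columns are padded with the gap character. These choices are illustrative and not prescribed by the text.

```python
GAP = "-"  # assumed gap character


def insert_gaps(seq, pattern, offset, width):
    """Place `seq` into an alignment of `width` columns.

    `pattern` holds the gap columns of the full alignment and `offset`
    is the start position of `seq` within the short consensus. Bases
    that do not fit into `width` columns are dropped silently."""
    gaps = set(pattern)
    out = []
    short_pos = 0            # position within the gap-free short consensus
    remaining = iter(seq)
    for col in range(width):
        if col in gaps:
            out.append(GAP)  # column belongs to the gap pattern
            continue
        if short_pos >= offset:
            out.append(next(remaining, GAP))  # pad once seq is used up
        else:
            out.append(GAP)  # leading padding before the offset
        short_pos += 1
    return "".join(out)


# Hypothetical parameters: gap at column 3, sequence starts at offset 1.
G = insert_gaps("CGTA", (3,), 1, 7)
```

With these inputs the result is `-CG-TA-`, which has the same number of columns as the reference alignment and can be compared with it column by column.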
Claims (4)
- a) comparing the unclassified sequence (A) with reference sequences (B1-Bi) under [...]
- b) creating gap patterns (C1-Cm) for the reference sequences (B1-Bn) and a consensus sequence (D) for the sequence (A),
- c) splitting the gap patterns (C1-Cm) into short consensus sequences (E1-Em) and gap information (F1-Fm),
- d) position-by-position comparison of the short consensus sequences (E1-Em), with iterating offset, with the sequence (A), determining the short consensus sequence with the highest agreement (Emax),
- e) inserting the gap information (Fmax) corresponding to the consensus sequence (Emax) into the sequence (A) to produce an aligned sequence (G).
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
DE19950050A DE19950050C2 (en) | 1999-10-16 | 1999-10-16 | Method for the functional assignment of unclassified DNA sequences |
Publications (2)
Publication Number | Publication Date |
---|---|
DE19950050A1 (en) | 2001-04-26 |
DE19950050C2 (en) | 2002-07-18 |
Family
ID=7925982
Family Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
DE19950050A Expired - Fee Related DE19950050C2 (en) | 1999-10-16 | 1999-10-16 | Method for the functional assignment of unclassified DNA sequences |
Country Status (1)
Country | Link |
---|---|
DE (1) | DE19950050C2 (en) |
Families Citing this family (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
AU2002332742A1 (en) * | 2001-08-29 | 2003-04-22 | Genome Therapeutics Corporation | Confirmation sequencing |
Non-Patent Citations (2)
Title |
---|
Nucleic Acids Research, Vol. 25, No. 17, pp. 3389-3402, 1997 * |
Proc. Natl. Acad. Sci. USA 85, pp. 2444-2448, 1988 * |
Also Published As
Publication number | Publication date |
---|---|
DE19950050A1 (en) | 2001-04-26 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
EP1003146B1 (en) | Method for identifying products by using microparticles | |
DE3923449A1 (en) | METHOD FOR DETERMINING EDGES IN IMAGES | |
DE19755831A1 (en) | Method for generating a radio frequency hopping sequence for a radio communication, radio device and radio communication system therefor | |
DE96066T1 (en) | METHOD AND SYSTEM FOR TREATING DATA SIGNALS PRESENTING AN UNKNOWN CHARACTER. | |
EP0795841A3 (en) | Method for creating an image transform matrix | |
DE19950050C2 (en) | Method for the functional assignment of unclassified DNA sequences | |
DE102012223587B4 (en) | Method for testing an application | |
DE1909657C3 (en) | Digital filter | |
EP0858051A3 (en) | Digital image segmentation method | |
EP2622540A1 (en) | Method for classifying patterns in image data records | |
EP1267566A3 (en) | Method for creating trapping contours for a page to be printed | |
EP1709587B1 (en) | Image processing system | |
DE602004012845T2 (en) | METHOD, COMPUTER PROGRAM PRODUCTS AND DEVICE FOR CHECKING THE IDENTITY | |
DE10319496B4 (en) | A method of providing context specific recipes in a semiconductor manufacturing facility by defining product categories | |
WO2011082798A1 (en) | Method and device for controlling character strings on a plurality of printed sheets | |
EP1261936B1 (en) | Method and device for personalising chip cards | |
EP0047512A2 (en) | Method and circuitry for character segmentation in a sequentially read series of characters | |
EP3367261A1 (en) | Method for classifying information and classification processor | |
DE102022203297A1 (en) | Method for processing and producing a pneumatic vehicle tire | |
DE3928270C2 (en) | ||
EP0992027A2 (en) | Chip card for executing non-modifiable system program routines and replacement program routines allocated thereto, and method for operating the chip card | |
DE102004055473A1 (en) | Image representation method for use in medical examination, involves executing different electronic processing of representation of image in subarea of image in comparison to remaining area, and shifting electronic processing across image | |
DE10037742C2 (en) | System for the detection and classification of objects | |
DE10324996A1 (en) | Chip card with at least one application | |
WO1992006450A1 (en) | Procedure for automatically diluting an encoded image stored using at least one chain-coded edge of chain-coded image dots |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
OP8 | Request for examination as to paragraph 44 patent law | ||
D2 | Grant after examination | ||
8339 | Ceased/non-payment of the annual fee |