DE19950050C2 - Method for the functional assignment of unclassified DNA sequences - Google Patents

Method for the functional assignment of unclassified DNA sequences

Info

Publication number
DE19950050C2
Authority
DE
Germany
Prior art keywords
sequence
sequences
consensus
gap
unclassified
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Expired - Fee Related
Application number
DE19950050A
Other languages
German (de)
Other versions
DE19950050A1 (en)
Inventor
Werner Mueller
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Individual
Original Assignee
Individual
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Individual
Priority to DE19950050A
Publication of DE19950050A1
Application granted
Publication of DE19950050C2
Anticipated expiration
Expired - Fee Related

Classifications

    • G PHYSICS
    • G16 INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
    • G16B BIOINFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR GENETIC OR PROTEIN-RELATED DATA PROCESSING IN COMPUTATIONAL MOLECULAR BIOLOGY
    • G16B30/00 ICT specially adapted for sequence analysis involving nucleotides or amino acids
    • G16B30/10 Sequence alignment; Homology search
    • G16B40/00 ICT specially adapted for biostatistics; ICT specially adapted for bioinformatics-related machine learning or data mining, e.g. knowledge discovery or pattern finding

Description

The present invention relates to a method for the functional assignment of unclassified DNA sequences, in which an unclassified sequence can be assigned (aligned) to known reference sequences by means of simple steps.

Since PCR technology became commercially available, the functional assignment of the DNA information made available in this way has been a fundamental problem in biotechnology.

Conventional methods therefore make the assignment either on the basis of functional characteristics or by directly comparing the unclassified sequence with sequences of known properties.

The present application provides a method for the functional assignment of an unclassified DNA sequence, which comprises the following steps:

  • a) comparing the unclassified sequence (A) with reference sequences (B1-Bn) by
  • b) creating gap patterns (C1-Cm) for the reference sequences (B1-Bn) and a consensus sequence (D) for the sequence (A),
  • c) splitting the gap patterns (C1-Cm) into short consensus sequences (E1-Em) and gap information (F1-Fm),
  • d) position-by-position comparison of the short consensus sequences (E1-Em), with an iterating offset, against the unclassified sequence (A), determining the short consensus sequence with the highest agreement (Emax),
  • e) inserting the gap information (Fmax) corresponding to the consensus sequence (Emax) into the sequence (A), thereby producing an aligned sequence (G).
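Steps (b), (c) and (e) can be illustrated with a minimal Python sketch. It rests on several assumptions that the text does not state: the reference sequences are already aligned to one another and use "-" for gaps, the consensus is taken by majority vote per column, and the gap pattern and offset chosen in step (d) are simply passed in. All function names are hypothetical.

```python
from collections import Counter

GAP = "-"  # assumed gap symbol; the source does not name one


def gap_pattern(aligned_ref):
    """Step (b): columns of one aligned reference that hold a gap."""
    return tuple(i for i, ch in enumerate(aligned_ref) if ch == GAP)


def consensus(aligned_refs):
    """Step (b): most frequent non-gap base per column (assumed majority vote)."""
    out = []
    for column in zip(*aligned_refs):
        bases = [ch for ch in column if ch != GAP]
        out.append(Counter(bases).most_common(1)[0][0] if bases else GAP)
    return "".join(out)


def shorten(cons, pattern):
    """Step (c): drop the gap columns of one pattern from the consensus."""
    skip = set(pattern)
    return "".join(ch for i, ch in enumerate(cons) if i not in skip)


def insert_gaps(seq, pattern, offset, width):
    """Step (e): spread seq over `width` columns, keeping the pattern's columns as gaps.

    `offset` is the position in the shortened consensus at which seq starts.
    """
    skip = set(pattern)
    out, k = [], 0  # k counts the non-gap columns passed so far
    for col in range(width):
        if col in skip:
            out.append(GAP)
            continue
        j = k - offset
        out.append(seq[j] if 0 <= j < len(seq) else GAP)
        k += 1
    return "".join(out)


refs = ["ACG-TTGA", "ACGCTT-A", "ACG-TTGA"]
patterns = sorted(set(gap_pattern(r) for r in refs))   # [(3,), (6,)]
cons = consensus(refs)                                  # "ACGCTTGA"
shorted = {p: shorten(cons, p) for p in patterns}
aligned = insert_gaps("CGTTG", patterns[0], offset=1, width=len(cons))
print(aligned)  # -CG-TTG-
```

In this toy input the new sequence "CGTTG" ends up in the same columns as the first reference, so it can be compared with the references column by column.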

The reference sequences (B1-Bn) underlying the consensus sequence (D) should have a degree of agreement with one another that is greater than 60% and less than 80-90%. According to the present application, the information collected in step (b) (gap patterns (C1-Cm) and consensus sequences (E1-Em)) can be cached and used directly for later comparisons.
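The requirement on the reference set can be checked with a short Python sketch. It assumes that the references are already aligned with "-" as gap symbol, that agreement is measured as percent identity over columns where both sequences carry a base, and that the upper bound is 90% (the text only gives a range of 80-90%). The function names are hypothetical.

```python
from itertools import combinations

GAP = "-"  # assumed gap symbol


def identity(a, b):
    """Percent identical columns among the columns where both sequences have a base."""
    pairs = [(x, y) for x, y in zip(a, b) if x != GAP and y != GAP]
    if not pairs:
        return 0.0
    return 100.0 * sum(x == y for x, y in pairs) / len(pairs)


def usable_reference_set(aligned_refs, low=60.0, high=90.0):
    """True if every pair of references agrees by more than `low` and less than `high` percent."""
    return all(low < identity(a, b) < high for a, b in combinations(aligned_refs, 2))


refs = ["ACGTACGTAC", "ACGTACGTTT", "ACGAACGTAT"]
print(usable_reference_set(refs))              # True: every pair agrees at 80%
print(usable_reference_set(["ACGT", "ACGT"]))  # False: identical references
```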

When the gap patterns are created, identical gap patterns are eliminated, so that for E1-Em the following applies: m ≤ n.
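A small Python sketch shows why m cannot exceed n. It assumes aligned references with "-" as gap symbol and represents a gap pattern as the tuple of gap columns; both choices are illustrative and not taken from the source.

```python
def unique_gap_patterns(aligned_refs, gap="-"):
    """Collect each distinct gap pattern once, in order of first appearance."""
    seen = {}
    for ref in aligned_refs:
        pattern = tuple(i for i, ch in enumerate(ref) if ch == gap)
        seen.setdefault(pattern, ref)  # later references with the same pattern are ignored
    return list(seen)


refs = ["AC-GT", "AG-GT", "ACG-T"]      # n = 3 references
patterns = unique_gap_patterns(refs)    # m = 2 distinct patterns
print(patterns)                         # [(2,), (3,)]
```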

The critical step in the method according to the invention is the position-by-position comparison of the short consensus sequences (E1-Em), in which the highest possible degree of agreement is to be achieved. For a meaningful classification, this degree of agreement should be greater than 60%, preferably greater than 80%.
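The comparison with an iterating offset might look like the following Python sketch. It assumes that agreement is the percentage of positions of the new sequence that match the short consensus, and that the offset is the consensus position at which the new sequence starts (negative offsets let it overhang on the left). The names `agreement`, `best_match` and `verdict` are hypothetical.

```python
def agreement(short_cons, seq, offset):
    """Percent of seq positions that match short_cons when seq[0] sits at short_cons[offset]."""
    if not seq:
        raise ValueError("seq must not be empty")
    hits = sum(
        1
        for i, base in enumerate(seq)
        if 0 <= offset + i < len(short_cons) and short_cons[offset + i] == base
    )
    return 100.0 * hits / len(seq)


def best_match(short_consensi, seq):
    """Return (agreement, consensus index, offset) for the best-scoring combination."""
    best = (-1.0, None, None)
    for idx, cons in enumerate(short_consensi):
        for offset in range(-len(seq) + 1, len(cons)):
            score = agreement(cons, seq, offset)
            if score > best[0]:  # strict comparison keeps the first best hit
                best = (score, idx, offset)
    return best


def verdict(score, required=60.0, preferred=80.0):
    """Apply the thresholds from the description (greater than 60%, preferably greater than 80%)."""
    if score > preferred:
        return "preferred"
    if score > required:
        return "acceptable"
    return "rejected"


short_consensi = ["ACGTTGA", "ACGCTTA"]
print(best_match(short_consensi, "CGTTG"))  # (100.0, 0, 1)
print(best_match(short_consensi, "CGATG"))  # (80.0, 0, 1)
```

The index returned by `best_match` identifies Emax, and with it the gap information Fmax that is inserted in step (e).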

The method according to the invention can also be carried out in several cycles: after the aligned sequence (G) has been found, a best reference sequence (Bmax) is determined, which is used to find a new set of reference sequences (H1-Hn) belonging to the family of the reference sequence with the highest degree of agreement (Bmax) in the first cycle.

The method offers the advantage of very fast generation of multiple sequence alignments, so that these are quickly available for further sequence processing such as sequence annotation or sequence comparisons. Sequence comparisons in correctly computed multiple sequence alignments are particularly fast, because the sequences in the alignment no longer have to be shifted against one another for the comparison but can be compared directly, position by position.
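The speed argument can be made concrete with a short Python sketch: once all sequences share the same columns, comparing two of them is a single pass without any offset loop. The sketch assumes "-" as gap symbol and counts only columns where both sequences carry the same base; the row names and functions are hypothetical.

```python
def column_matches(a, b, gap="-"):
    """Count columns in which two aligned sequences carry the same base."""
    if len(a) != len(b):
        raise ValueError("aligned sequences must have the same length")
    return sum(x == y and x != gap for x, y in zip(a, b))


def rank_rows(alignment, query):
    """Order the rows of an alignment by their number of matching columns with query."""
    return sorted(
        alignment,
        key=lambda name: column_matches(alignment[name], query),
        reverse=True,
    )


alignment = {"ref1": "ACG-TTGA", "ref2": "ACGCTT-A"}
query = "-CG-TTG-"  # a new sequence that has already been aligned
print(rank_rows(alignment, query))  # ['ref1', 'ref2']
```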

The invention is explained in more detail with reference to the following figures:

Fig. 1 Reading in the sequences. First, the sequences to be processed and the reference sequences are read in.

Fig. 2 The gap patterns are extracted from the reference sequences, and a consensus sequence is determined from the sequence information.

Fig. 3a A list of consensus sequences is then created, from each of which one particular gap pattern is removed.

Fig. 3b Shows a specific example of how shorted consensus sequences are generated from the consensus sequence.

Fig. 4a An optimal set of aligner parameters is determined by position-by-position comparison of each of the "shorted consensus" sequences with the new sequence.

Fig. 4b Shows a specific example of the position-by-position comparison.

Fig. 5a The new sequence is aligned by inserting the gaps and shifting the offset according to the parameter set.

Fig. 5b Experimental example of gap insertions and offset shifts.

Claims (4)

1. A method for the functional assignment of an unclassified DNA sequence, comprising:
  • a) comparing the unclassified sequence (A) with reference sequences (B1-Bi) by
  • b) creating gap patterns (C1-Cm) for the reference sequences (B1-Bn) and a consensus sequence (D) for the sequence (A),
  • c) splitting the gap patterns (C1-Cm) into short consensus sequences (E1-Em) and gap information (F1-Fm),
  • d) position-by-position comparison of the short consensus sequences (E1-Em), with an iterating offset, against the sequence (A), determining the short consensus sequence with the highest agreement (Emax),
  • e) inserting the gap information (Fmax) corresponding to the consensus sequence (Emax) into the sequence (A), thereby producing an aligned sequence (G).
2. The method according to claim 1, wherein a degree of agreement of ≥ 60%, preferably ≥ 80%, is required in the position-by-position comparison.
3. The method according to claim 1 or 2, wherein several cycles of steps (a) to (e) are carried out.
4. The method according to claim 1, wherein the gap patterns (C1-Cm) and consensus sequence (D) created in step (b) are used for later comparisons.
DE19950050A 1999-10-16 1999-10-16 Method for the functional assignment of unclassified DNA sequences Expired - Fee Related DE19950050C2 (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
DE19950050A DE19950050C2 (en) 1999-10-16 1999-10-16 Method for the functional assignment of unclassified DNA sequences

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
DE19950050A DE19950050C2 (en) 1999-10-16 1999-10-16 Method for the functional assignment of unclassified DNA sequences

Publications (2)

Publication Number Publication Date
DE19950050A1 (en) 2001-04-26
DE19950050C2 (en) 2002-07-18

Family

ID=7925982

Family Applications (1)

Application Number Title Priority Date Filing Date
DE19950050A Expired - Fee Related DE19950050C2 (en) 1999-10-16 1999-10-16 Method for the functional assignment of unclassified DNA sequences

Country Status (1)

Country Link
DE (1) DE19950050C2 (en)

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
AU2002332742A1 (en) * 2001-08-29 2003-04-22 Genome Therapeutics Corporation Confirmation sequencing

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
Nucleic Acids Research Vol. 25, No. 17, pp. 3389-3402, 1997 *
Proc. Natl. Acad. Sci. USA 85, pp. 2444-2448, 1988 *

Also Published As

Publication number Publication date
DE19950050A1 (en) 2001-04-26

Legal Events

Date Code Title Description
OP8 Request for examination as to paragraph 44 patent law
D2 Grant after examination
8339 Ceased/non-payment of the annual fee