AU2021101343A4 - A method for analysis and interpretation of crop bioinformatics repeats sequence pattern - Google Patents

A method for analysis and interpretation of crop bioinformatics repeats sequence pattern

Info

Publication number
AU2021101343A4
Authority
AU
Australia
Prior art keywords
sequence
crop
pattern
genome
sequencing
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Ceased
Application number
AU2021101343A
Inventor
Cash Kumar
Sakshi Singh
Vinay Kumar SINGH
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Singh Vinay Kumar Dr
Original Assignee
Singh Vinay Kumar Dr
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date

Classifications

    • G PHYSICS
    • G16 INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
    • G16B BIOINFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR GENETIC OR PROTEIN-RELATED DATA PROCESSING IN COMPUTATIONAL MOLECULAR BIOLOGY
    • G16B30/00 ICT specially adapted for sequence analysis involving nucleotides or amino acids
    • G16B30/10 Sequence alignment; Homology search
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q50/00 Systems or methods specially adapted for specific business sectors, e.g. utilities or tourism
    • G06Q50/02 Agriculture; Fishing; Mining
    • G PHYSICS
    • G16 INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
    • G16B BIOINFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR GENETIC OR PROTEIN-RELATED DATA PROCESSING IN COMPUTATIONAL MOLECULAR BIOLOGY
    • G16B30/00 ICT specially adapted for sequence analysis involving nucleotides or amino acids
    • G16B30/20 Sequence assembly

Landscapes

  • Life Sciences & Earth Sciences (AREA)
  • Physics & Mathematics (AREA)
  • Engineering & Computer Science (AREA)
  • Health & Medical Sciences (AREA)
  • Theoretical Computer Science (AREA)
  • General Health & Medical Sciences (AREA)
  • Spectroscopy & Molecular Physics (AREA)
  • Proteomics, Peptides & Aminoacids (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Biotechnology (AREA)
  • Evolutionary Biology (AREA)
  • Biophysics (AREA)
  • Medical Informatics (AREA)
  • Analytical Chemistry (AREA)
  • Chemical & Material Sciences (AREA)
  • Business, Economics & Management (AREA)
  • Animal Husbandry (AREA)
  • Marine Sciences & Fisheries (AREA)
  • Mining & Mineral Resources (AREA)
  • Agronomy & Crop Science (AREA)
  • Economics (AREA)
  • Human Resources & Organizations (AREA)
  • Marketing (AREA)
  • Primary Health Care (AREA)
  • Strategic Management (AREA)
  • Tourism & Hospitality (AREA)
  • General Business, Economics & Management (AREA)
  • General Physics & Mathematics (AREA)
  • Measuring Or Testing Involving Enzymes Or Micro-Organisms (AREA)

Abstract

The present disclosure relates to a method for analysis and interpretation of crop bioinformatics repeats sequence pattern. The method comprises performing whole genome sequencing, whole transcriptome sequencing, and whole exome sequencing and aligning and merging fragments from a longer deoxyribonucleic acid sequence in order to reconstruct the original sequence; identifying complete set of genes and identifying complete transcripts; identifying single nucleotide polymorphism (SNP) and simple sequence repeat (SSR) in whole crop proteome and genome for intronic and exonic regions; interpreting corresponding nucleotide genetic pattern and signature followed by determining crop gene codon composition; correlating pattern between proteome and genome of intronic and exonic regions using functional and structural motifs identification; and performing candidate gene based alignment and phylogenetic relationship and thereafter classifying sequence structure function.

Description

Figure 1: flow chart of method 100, steps 102 to 112 (sequencing and fragment assembly; identification of genes and transcripts; SNP and SSR identification; genetic pattern interpretation and codon composition; proteome-genome pattern correlation; candidate gene-based alignment, phylogenetic relationship and sequence structure function classification).
Figures 2A and 2B: workflow of the method.
A METHOD FOR ANALYSIS AND INTERPRETATION OF CROP BIOINFORMATICS REPEATS SEQUENCE PATTERN
FIELD OF THE INVENTION
The present disclosure relates to a method for analysis and interpretation of crop bioinformatics repeats sequence pattern.
BACKGROUND OF THE INVENTION
Procedures for precise gene targeting or genome editing are critical for functional characterization of plant genes and agricultural crop genetic improvement. In comparison to microbial and mammalian systems, where gene targeting is a well-established method, effective gene targeting in plants is extremely inefficient and difficult, owing to the low frequency of homologous recombination. As a result, new technologies for more effective and precise gene targeting and genome editing in plants are critical.
Sequence-specific nucleases have been developed in recent years to improve gene targeting and genome editing efficiency in animal and plant systems. The two most widely used sequence-specific chimeric proteins are zinc finger nucleases (ZFNs) and transcription activator-like effector nucleases (TALENs). Once the ZFN or TALEN constructs are introduced into and expressed in cells, the programmable DNA-binding domain binds to its corresponding sequence and directs the chimeric nuclease (e.g., the FokI nuclease) to cleave a specific DNA strand. Introducing a pair of ZFNs or TALENs can produce double-strand breaks (DSBs), which activate the DNA repair systems and significantly increase the frequency of nonhomologous end joining (NHEJ) and homologous recombination (HR).
A single zinc-finger motif generally recognizes 3 bp, while engineered zinc fingers with tandem repeats can recognize 9-36 bp. Screening for and identifying a suitable ZFN, however, is tedious and time-consuming. Despite these disadvantages, ZFNs have been used in plants to introduce minor mutations, gene deletions, or foreign DNA integration (gene replacement/knock-in) at particular genomic sites. TALEs, unlike zinc finger proteins, are derived from the plant pathogenic bacterium Xanthomonas and contain tandem repeats of 34 amino acids, with the repeat-variable di-residues (RVDs) at positions 12 and 13 determining DNA-binding specificity. As a result, TALENs with 16-24 tandem repeats can recognize 16-24 bp genomic sequences and produce DSBs at unique genomic sites using the chimeric nuclease. TALENs have already been used to modify the genomes of many species, including yeast, birds, and plants.
In one solution, a rapid targeting analysis in crops for determining donor insertion is disclosed. This disclosure provides methods for detecting and identifying plant items containing precisely targeted loci, and provides plants and plant cells comprising the targeted loci. This method can be used as a high-throughput method for screening the insertion of a donor DNA polynucleotide at a targeted locus. These methods can be immediately used to identify plant items produced by a targeting method that results from the use of a site-directed nuclease.
In another solution, the production of DHA and other LC-PUFAs in plants is disclosed. The invention provides recombinant host organisms genetically modified with a polyunsaturated fatty acid (PUFA) synthase system and one or more accessory proteins that allow for and/or improve the production of PUFAs in the host organism. The invention also relates to methods of making and using such organisms as well as products obtained from such organisms.
In another solution, novel CRISPR enzymes and systems are disclosed. The invention provides systems, methods, and compositions for targeting nucleic acids. In particular, the invention provides non-naturally occurring or engineered DNA- or RNA-targeting systems comprising a novel DNA- or RNA-targeting CRISPR effector protein and at least one targeting nucleic acid component such as a guide RNA.
In yet another solution, gene targeting and genetic modification of plants via RNA-guided genome editing is disclosed. The invention provides compositions and methods for specific gene targeting and precise editing of DNA sequences in plant genomes using the CRISPR (clustered regularly interspaced short palindromic repeats) associated nuclease. Non-transgenic, genetically modified crops can be produced using these compositions and methods.
However, due to significant differences between plants and animals, it is still unknown if the CRISPR-Cas system is functional in the plant system and if it can be exploited for specific gene targeting and genome editing in crop species. In order to overcome the aforementioned drawbacks, there exists a need to develop a method for analysis and interpretation of crop bioinformatics repeats sequence pattern.
SUMMARY OF THE INVENTION
The present disclosure seeks to provide a method for analysis and interpretation of crop bioinformatics repeats sequence pattern on crop genome application through Bioinformatics tools and technology.
In an embodiment, a method for analysis and interpretation of crop bioinformatics repeats sequence pattern is provided. The method comprises:
performing whole genome sequencing, whole transcriptome sequencing, and whole exome sequencing and aligning and merging fragments from a longer deoxyribonucleic acid sequence in order to reconstruct the original sequence;
identifying complete set of genes and identifying complete transcripts;
identifying single nucleotide polymorphism (SNP) and simple sequence repeat (SSR) in whole crop proteome and genome for intronic and exonic regions;
interpreting corresponding nucleotide genetic pattern and signature followed by determining crop gene codon composition;
correlating pattern between proteome and genome of intronic and exonic regions using functional and structural motifs identification; and
performing candidate gene-based alignment and phylogenetic relationship and thereafter classifying sequence structure function.
In an embodiment, a sequence or structural motif is often related to a basic biochemical feature. In an embodiment, the hidden portion of genomes consists of satellite DNAs, formerly known as junk DNA, wherein the practical value of satellite DNA repeats and their sequences is now increasingly recognized.
In an embodiment, the sequence motif is a sequence pattern of nucleotides or amino acids that is widespread and commonly thought to be related to the macromolecule's biological role.
In an embodiment, the pattern of codon usage differs greatly between species, and even between genes expressed at different levels in the same organism, wherein a variety of theories prevail regarding the variables that influence the codon usage pattern, and wherein attempts have been made to understand the distribution of codons in protein-coding genes and the improvements in the use of codons by different synon-genes.
In an embodiment, biological sequence motifs in nucleic acid and protein sequences are defined as short, typically fixed-length, sequence patterns that may reflect significant structural or functional features such as transcription factor binding sites, splice junctions, active sites, or interaction interfaces.
In an embodiment, alignments between sequences that descend from a common ancestor may represent a degree of evolutionary change, wherein phylogenies and sequence alignments are linked to each other.
In an embodiment, phylogenetic trees arise through successive speciation events (branching), and phylogenetic relationship refers to the relative times in the past at which species shared common ancestors.
In an embodiment, plant protein superfamily designation with sequence-structure-function similarity analysis is used to elucidate the target protein's complete characteristics. In an embodiment, the curated crop genome includes Next Generation Sequencing (NGS), genome-wide association study (GWAS), Transcriptome Shotgun Assembly (TSA), Whole Exome Sequencing (WES) and RNA-seq analysis.
An object of the present disclosure is to establish the role of short tandem repeats (STRs) and simple sequence repeats (SSRs) in improving crop quantity and quality.
Another object of the present disclosure is to develop a candidate gene-based approach for the development of designer crops.
Another object of the present disclosure is to support gene and genome editing for the development of crops with multiple resistances.
Yet another object of the present disclosure is to deliver an expeditious and cost-effective method for analysis and interpretation of crop bioinformatics repeats sequence pattern.
To further clarify advantages and features of the present disclosure, a more particular description of the invention will be rendered by reference to specific embodiments thereof, which are illustrated in the appended drawings. It is appreciated that these drawings depict only typical embodiments of the invention and are therefore not to be considered limiting of its scope. The invention will be described and explained with additional specificity and detail with the accompanying drawings.
BRIEF DESCRIPTION OF FIGURES
These and other features, aspects, and advantages of the present disclosure will become better understood when the following detailed description is read with reference to the accompanying drawings in which like characters represent like parts throughout the drawings, wherein:
Figure 1 illustrates a flow chart of a method for analysis and interpretation of crop bioinformatics repeats sequence pattern in accordance with an embodiment of the present disclosure;
Figures 2A and 2B illustrate a workflow of a method for analysis and interpretation of crop bioinformatics repeats sequence pattern in accordance with an embodiment of the present disclosure; and
Figures 3A and 3B illustrate an exemplary profile of a repeat and pattern correlation between proteome and genome (Complete Exonic (CDS) region) of wheat chromosome 1 in accordance with an embodiment of the present disclosure.
Further, skilled artisans will appreciate that elements in the drawings are illustrated for simplicity and may not necessarily have been drawn to scale. For example, the flow charts illustrate the method in terms of the most prominent steps involved to help to improve understanding of aspects of the present disclosure. Furthermore, in terms of the construction of the device, one or more components of the device may have been represented in the drawings by conventional symbols, and the drawings may show only those specific details that are pertinent to understanding the embodiments of the present disclosure so as not to obscure the drawings with details that will be readily apparent to those of ordinary skill in the art having benefit of the description herein.
DETAILED DESCRIPTION
For the purpose of promoting an understanding of the principles of the invention, reference will now be made to the embodiment illustrated in the drawings and specific language will be used to describe the same. It will nevertheless be understood that no limitation of the scope of the invention is thereby intended, such alterations and further modifications in the illustrated system, and such further applications of the principles of the invention as illustrated therein being contemplated as would normally occur to one skilled in the art to which the invention relates.
It will be understood by those skilled in the art that the foregoing general description and the following detailed description are exemplary and explanatory of the invention and are not intended to be restrictive thereof.
Reference throughout this specification to "an aspect", "another aspect" or similar language means that a particular feature, structure, or characteristic described in connection with the embodiment is included in at least one embodiment of the present disclosure. Thus, appearances of the phrase "in an embodiment", "in another embodiment" and similar language throughout this specification may, but do not necessarily, all refer to the same embodiment.
The terms "comprises", "comprising", or any other variations thereof, are intended to cover a non-exclusive inclusion, such that a process or method that comprises a list of steps does not include only those steps but may include other steps not expressly listed or inherent to such process or method. Similarly, one or more devices or sub-systems or elements or structures or components preceded by "comprises...a" does not, without more constraints, preclude the existence of other devices or other sub-systems or other elements or other structures or other components or additional devices or additional sub-systems or additional elements or additional structures or additional components.
Unless otherwise defined, all technical and scientific terms used herein have the same meaning as commonly understood by one of ordinary skill in the art to which this invention belongs. The system, methods, and examples provided herein are illustrative only and not intended to be limiting.
Embodiments of the present disclosure will be described below in detail with reference to the accompanying drawings.
Referring to Figure 1, a flow chart of a method for analysis and interpretation of crop bioinformatics repeats sequence pattern is illustrated in accordance with an embodiment of the present disclosure. At step 102, method 100 includes performing whole genome sequencing, whole transcriptome sequencing, and whole exome sequencing and aligning and merging fragments from a longer deoxyribonucleic acid sequence in order to reconstruct the original sequence.
The whole-genome sequencing (WGS) is a comprehensive method for analyzing entire genomes. Genomic information has been instrumental in identifying inherited disorders, characterizing the mutations that drive cancer progression, and tracking disease outbreaks. The whole-transcriptome analysis with total Ribonucleic acid (RNA) sequencing (RNA-Seq) detects coding plus multiple forms of noncoding RNA. The exome sequencing, also known as whole exome sequencing (WES), is a genomic technique for sequencing all of the protein-coding regions of genes in a genome (known as the exome).
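The merging of overlapping fragments in step 102 can be illustrated with a minimal sketch. The disclosure does not name an assembly algorithm, so the greedy suffix-prefix merge below is an assumption chosen only for illustration, and the function names `overlap` and `greedy_assemble` and the three short reads are hypothetical. Real assemblers also handle sequencing errors, reverse complements, and repeats, none of which this toy example attempts.

```python
def overlap(a: str, b: str, min_len: int = 3) -> int:
    """Length of the longest suffix of a that equals a prefix of b."""
    max_len = min(len(a), len(b))
    for k in range(max_len, min_len - 1, -1):
        if a.endswith(b[:k]):
            return k
    return 0


def greedy_assemble(fragments: list[str], min_len: int = 3) -> list[str]:
    """Repeatedly merge the pair of fragments with the largest overlap."""
    contigs = list(dict.fromkeys(fragments))  # drop exact duplicates, keep order
    while len(contigs) > 1:
        best_k, best_i, best_j = 0, -1, -1
        for i, a in enumerate(contigs):
            for j, b in enumerate(contigs):
                if i != j:
                    k = overlap(a, b, min_len)
                    if k > best_k:
                        best_k, best_i, best_j = k, i, j
        if best_k == 0:
            break  # no overlaps left: the remaining contigs stay separate
        merged = contigs[best_i] + contigs[best_j][best_k:]
        contigs = [c for idx, c in enumerate(contigs) if idx not in (best_i, best_j)]
        contigs.append(merged)
    return contigs


# Hypothetical reads covering one short stretch of DNA.
reads = ["ATGAAGACC", "GACCTTACT", "TACTCATCC"]
contigs = greedy_assemble(reads)
print(contigs)
```

Each pass merges the best-overlapping pair, so the three reads collapse into a single contig when their ends overlap by at least `min_len` bases.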
At step 104, method 100 includes identifying complete set of genes and identifying complete transcripts. At step 106, method 100 includes identifying single nucleotide polymorphism (SNP) and simple sequence repeat (SSR) in whole crop proteome and genome for intronic and exonic regions.
Single nucleotide polymorphisms, frequently called SNPs, are the most common type of genetic variation among people. Each SNP represents a difference in a single deoxyribonucleic acid (DNA) building block, called a nucleotide. Simple sequence repeats (SSRs), sometimes described as genetic 'stutters,' are DNA tracts in which a short base-pair motif is repeated several to many times in tandem (e.g. CAGCAGCAG). These sequences experience frequent mutations that alter the number of repeats. The proteome is the entire set of proteins that is, or can be, expressed by a genome, cell, tissue, or organism at a certain time. It is the set of expressed proteins in a given type of cell or organism, at a given time, under defined conditions. The genome is an organism's complete set of genetic instructions. Each genome contains all of the information needed to build that organism and allow it to grow and develop.
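As an illustration of the SSR definition above, perfect tandem repeats such as CAGCAGCAG can be located with a regular-expression backreference. This is a minimal sketch and not the tool used in the disclosure; the function name `find_ssrs`, its thresholds (motif length 1 to 6, at least 3 copies), and the test sequence are assumptions for the example.

```python
import re


def find_ssrs(seq: str, min_unit: int = 1, max_unit: int = 6, min_repeats: int = 3):
    """Return (start, unit, repeat_count) for each perfect tandem repeat.

    The unit is matched lazily, so the shortest repeating motif is reported
    (CAG rather than CAGCAG). Positions are 0-based.
    """
    pattern = re.compile(
        r"([ACGT]{%d,%d}?)\1{%d,}" % (min_unit, max_unit, min_repeats - 1)
    )
    hits = []
    for m in pattern.finditer(seq.upper()):
        unit = m.group(1)
        hits.append((m.start(), unit, len(m.group(0)) // len(unit)))
    return hits


# Hypothetical sequence containing four tandem copies of CAG.
ssrs = find_ssrs("TTACAGCAGCAGCAGTTC")
print(ssrs)  # [(3, 'CAG', 4)]
```

Only perfect repeats are detected; interrupted or compound repeats would need a more tolerant search.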
At step 108, method 100 includes interpreting corresponding nucleotide genetic pattern and signature followed by determining crop gene codon composition. At step 110, method 100 includes correlating pattern between proteome and genome of intronic and exonic regions using functional and structural motifs identification.
At step 112, method 100 includes performing candidate gene-based alignment and phylogenetic relationship and thereafter classifying sequence structure function. In an embodiment, a sequence or structural motif is often related to a basic biochemical feature. In an embodiment, the hidden portion of genomes consists of satellite DNAs, formerly known as junk DNA, wherein the practical value of satellite DNA repeats and their sequences is now increasingly recognized.
In an embodiment, the sequence motif is a sequence pattern of nucleotides or amino acids that is widespread and commonly thought to be related to the macromolecule's biological role.
In an embodiment, the pattern of codon usage differs greatly between species, and even between genes expressed at different levels in the same organism, wherein a variety of theories prevail regarding the variables that influence the codon usage pattern, and wherein attempts have been made to understand the distribution of codons in protein-coding genes and the improvements in the use of codons by different synon-genes.
In an embodiment, biological sequence motifs in nucleic acid and protein sequences are defined as short, typically fixed-length, sequence patterns that may reflect significant structural or functional features such as transcription factor binding sites, splice junctions, active sites, or interaction interfaces.
In an embodiment, alignments between sequences that descend from a common ancestor may represent a degree of evolutionary change, wherein phylogenies and sequence alignments are linked to each other.
In an embodiment, phylogenetic trees arise through successive speciation events (branching), and phylogenetic relationship refers to the relative times in the past at which species shared common ancestors.
In an embodiment, plant protein superfamily designation with sequence-structure-function similarity analysis is used to elucidate the target protein's complete characteristics. In an embodiment, the curated crop genome includes Next Generation Sequencing (NGS), genome-wide association study (GWAS), Transcriptome Shotgun Assembly (TSA), Whole Exome Sequencing (WES) and RNA-seq analysis.
Figures 2A and 2B illustrate a workflow of a method for analysis and interpretation of crop bioinformatics repeats sequence pattern in accordance with an embodiment of the present disclosure. In the correlation of patterns and repeats using functional and structural motif identification, a sequence or structural motif is often related to a basic biochemical feature.
In the identification of SNPs, SSRs, satellite DNA and amino acid repeats in the whole crop proteome and genome (intronic + exonic regions), satellite DNAs have been the hidden portion of genomes. Although initially known as junk DNA, satellite DNA repeats and their sequences are now increasingly recognized for their practical value.
In corresponding nucleotide genetic pattern and signature interpretation, the sequence motif is a sequence pattern of nucleotides or amino acids that is widespread and commonly thought to be related to the macromolecule's biological role.
In crop gene codon composition determination, the pattern of codon usage differs greatly between species, and even between genes expressed at different levels in the same organism. A variety of theories prevail regarding the variables that influence the codon usage pattern. Attempts have been made to understand the distribution of codons in protein-coding genes and the improvements in the use of codons by different synon-genes.
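Codon composition for a single coding sequence can be tallied as in the following minimal sketch. The function names `codon_counts` and `codon_frequencies` and the short coding sequence are hypothetical; the example assumes the input is already in frame and starts at the first codon.

```python
from collections import Counter


def codon_counts(cds: str) -> Counter:
    """Count in-frame codons, ignoring any trailing partial codon."""
    cds = cds.upper().replace(" ", "")
    usable = len(cds) - len(cds) % 3
    return Counter(cds[i:i + 3] for i in range(0, usable, 3))


def codon_frequencies(cds: str) -> dict:
    """Relative frequency of each codon observed in the coding sequence."""
    counts = codon_counts(cds)
    total = sum(counts.values())
    if total == 0:
        return {}
    return {codon: n / total for codon, n in counts.items()}


# Hypothetical CDS: CAA and CAG both encode glutamine, so their relative
# counts show which synonymous codon this sequence prefers.
cds = "ATGCAACAGCAACAACAGTGA"
counts = codon_counts(cds)
freqs = codon_frequencies(cds)
print(counts.most_common())
```

Comparing such tables between genes or species is one simple way to see the differences in codon usage described above.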
In amino acid composition, protein tandem repeats and functional signature correlation, biological sequence motifs in nucleic acid and protein sequences are defined as short, typically fixed-length, sequence patterns that may reflect significant structural or functional features such as transcription factor binding sites, splice junctions, active sites, or interaction interfaces.
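Short fixed-length motifs of this kind are commonly written in PROSITE pattern syntax, and PROSITE is among the resources listed in this disclosure. The sketch below converts a simple PROSITE-style pattern into a Python regular expression and scans a protein with it. The helper names `prosite_to_regex` and `scan_motif` and the ten-residue peptide are hypothetical, and only the basic syntax elements (`x`, `[...]`, `{...}`, `(n)`, `(n,m)`, `<`, `>`) are handled.

```python
import re


def prosite_to_regex(pattern: str) -> str:
    """Translate a simple PROSITE-style pattern into a Python regex string."""
    regex = pattern.rstrip(".").replace("-", "")
    regex = regex.replace("{", "[^").replace("}", "]")   # {P}    -> [^P]
    regex = regex.replace("(", "{").replace(")", "}")    # x(2,4) -> x{2,4}
    regex = regex.replace("x", ".")                      # x      -> any residue
    regex = regex.replace("<", "^").replace(">", "$")    # sequence anchors
    return regex


def scan_motif(protein: str, pattern: str):
    """Return (start, matched_text) for non-overlapping motif matches (0-based)."""
    compiled = re.compile(prosite_to_regex(pattern))
    return [(m.start(), m.group(0)) for m in compiled.finditer(protein.upper())]


# N-glycosylation site pattern applied to a hypothetical peptide.
motif_regex = prosite_to_regex("N-{P}-[ST]-{P}")
motif_hits = scan_motif("MKNGSAPNPT", "N-{P}-[ST]-{P}")
print(motif_regex, motif_hits)  # N[^P][ST][^P] [(2, 'NGSA')]
```

Because `finditer` reports non-overlapping matches, overlapping occurrences of a motif would need a lookahead-based search instead.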
In candidate gene-based alignment and phylogenetic relationship, alignments between sequences that descend from a common ancestor may represent a degree of evolutionary change. Phylogenies and sequence alignments are linked to each other. Phylogenetic trees arise through successive speciation events (branching). Phylogenetic relationship refers to the relative times in the past at which species shared common ancestors.
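One simple way to see the link between an alignment and a phylogeny is the pairwise p-distance, the fraction of compared alignment columns at which two sequences differ, which distance-based tree methods can take as input. The sketch below is an assumption made for illustration, not the procedure of the disclosure; the names `p_distance` and `distance_matrix`, the labels cropA to cropC, and the pre-aligned sequences are hypothetical.

```python
from itertools import combinations


def p_distance(a: str, b: str) -> float:
    """Fraction of differing sites between two aligned sequences.

    Columns where either sequence has a gap ('-') are skipped.
    """
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    compared = diffs = 0
    for x, y in zip(a.upper(), b.upper()):
        if x == "-" or y == "-":
            continue
        compared += 1
        diffs += x != y
    return diffs / compared if compared else 0.0


def distance_matrix(aligned: dict) -> dict:
    """Pairwise p-distances keyed by (name_a, name_b)."""
    return {
        (m, n): p_distance(aligned[m], aligned[n])
        for m, n in combinations(aligned, 2)
    }


# Hypothetical pre-aligned candidate gene fragments from three crops.
aligned = {
    "cropA": "ATGCAACAG-CAA",
    "cropB": "ATGCAGCAG-CAA",
    "cropC": "ATGCTGCAGTCAA",
}
dist = distance_matrix(aligned)
for pair, d in dist.items():
    print(pair, round(d, 3))
```

Smaller distances suggest a more recent common ancestor, although p-distance does not correct for multiple substitutions at the same site.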
In sequence-structure-function classification, plant protein superfamily designation with sequence-structure-function similarity analysis is used to elucidate the target protein's complete characteristics.
The genome, specifically the genetic sequence code, is the text and the language of God. The sequencing of genomes of model organisms from all kingdoms is now a major human achievement. This work focuses mainly on crop genome applications through bioinformatics tools and technology.
Steps
1. Curated crop genome
2. Crop genome assembly
3. Complete features generation
4. Complete crop transcriptome
5. Complete crop proteome
6. Repeats identification in whole crop proteome
7. Corresponding nucleotide genetic pattern interpretation
8. Crop gene codon composition determination
9. Pattern analysis of identified crop candidate intronic and exonic DNA regions and protein
10. Crop identified candidate gene phylogenetic relationship for checking pattern similarities and differences
11. Crop gene functional elucidation
12. Sequence-Structure-Function Relationship
Figures 3A and 3B illustrate an exemplary profile of a repeat and pattern correlation between proteome and genome (Complete Exonic (CDS) region) of wheat chromosome 1 in accordance with an embodiment of the present disclosure.
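The kind of proteome-genome correlation profiled in Figures 3A and 3B can be sketched by translating a coding sequence, locating runs of a single amino acid, and mapping each run back to its nucleotide coordinates and codons. This is a minimal sketch under stated assumptions: it uses the standard genetic code, the helper names `translate`, `residue_runs` and `map_runs_to_cds` are hypothetical, and the short CDS is a made-up glutamine-rich fragment rather than the wheat sequence itself.

```python
import re
from itertools import product

# Standard genetic code, codons enumerated in TCAG order.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}


def translate(cds: str) -> str:
    """Translate an in-frame CDS; unknown codons become 'X', stops become '*'."""
    cds = cds.upper()
    usable = len(cds) - len(cds) % 3
    return "".join(CODON_TABLE.get(cds[i:i + 3], "X") for i in range(0, usable, 3))


def residue_runs(protein: str, min_len: int = 3):
    """(start, residue, length) for each run of one repeated amino acid."""
    pattern = re.compile(r"([A-Z])\1{%d,}" % (min_len - 1))
    return [(m.start(), m.group(1), len(m.group(0))) for m in pattern.finditer(protein)]


def map_runs_to_cds(cds: str, min_len: int = 3):
    """Pair each amino-acid run with its 0-based CDS span and its codons."""
    cds = cds.upper()
    mapped = []
    for start, residue, length in residue_runs(translate(cds), min_len):
        nt_start, nt_end = 3 * start, 3 * (start + length)
        codons = [cds[i:i + 3] for i in range(nt_start, nt_end, 3)]
        mapped.append((residue, length, nt_start, nt_end, codons))
    return mapped


# Hypothetical glutamine-rich CDS fragment.
toy_cds = "ATGCAGCAAGAACAACAACAACAGTGA"
toy_protein = translate(toy_cds)
toy_runs = map_runs_to_cds(toy_cds)
print(toy_protein, toy_runs)
```

The mapped codons show whether a protein-level repeat is encoded by a pure nucleotide repeat or by a mixture of synonymous codons.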
Sequence analysis result:
>lcl|CM022211.1_cds_KAF6981636.1_111 [locus_tag=CFC21_000095] [protein=hypothetical protein] [protein_id=KAF6981636.1] [location=complement(5244415..5245434)] [gbkey=CDS]
ATGAAGACCTTACTCATCCTGACAATCATTGCGGTGGCACTAACTACCACCACCG CCAATATACAGGTCG ACCCTAGTGGCCAAGTACAATGGCCACAACAACAACAACCATTCCCCCAGCCCC AACAACCATTCTCCCA ACAACCACAACAAATTTTTCCCCAACCCCAACAAACATTCCCCCATCAACCACAA CAAGCATTTCCCCAA CCCCAACAAACATTCCCCCATCAACCACAACAACAATTTCCCCAGCCCCAGCAAC CACAACAACCATTTC CCCAGCAACCACAACAACAATTTCCCCAGCCCCAACAACCACAACAACCATTTC CCCAGCAACCACAACA ACAATTTCCCCAGCCCCAACAACCACAACAACCATTTCCCCAGCCCCAACAACCC CAACTACCATTTCCG CAACAACCACAACAACCATTCCCCCAGCCTCAACAACCCCAACAACCATTTCCCC AGTTACAGCAACCAC AACAACCTTTACCCCAGCCCCAACAACCGCAACAACCATTCCCCCAGCAACAAC AACCATTGATTCAGCC ATACCTACAACAACAGATGAACCCCTGCAAGAATTACCTCTTGCAGCAATGCAA CCCTGTGTCATTGGTG TCATCCCTCGTGTCAATGATCTTGCCACGAAGTGATTGCAAGGTGATGCGGCAAC AATGTTGCCAACAAC TAGCACAGATTCCTCAGCAGCTCCAGTGCGCAGCCATCCATGGCATCGTGCATTC CATCATCATGCAGCA AGAACAACAACAACAACAACAACAACAACAACAACAACAACAACAACAAGGCA TACAGATCATGCGGCCA CTATTTCAGCTCGTCCAGGGTCAGGGCATCATCCAACCTCAACAACCAGCTCAAT TGGAGGTGATCAGGT CATTGGTATTGGGAACTCTTCCAACCATGTGCAACGTGTTTGTTCCACCTGAGTG CTCCACCACCAAGGC ACCATTTGCCAGCATAGTCGCCGACATTGGTGGCCAATGA
>lcl|CM022211.1_prot_KAF6981636.1_111 [locus_tag=CFC21_000095] [protein=hypothetical protein] [protein_id=KAF6981636.1] [location=complement(5244415..5245434)] [gbkey=CDS]
MKTLLILTIIAVALTTTTANIQVDPSGQVQWPQQQQPFPQPQQPFSQQPQQIFPQPQQT FPHQPQQAFPQ PQQTFPHQPQQQFPQPQQPQQPFPQQPQQQFPQPQQPQQPFPQQPQQQFPQPQQPQQ PFPQPQQPQLPFP QQPQQPFPQPQQPQQPFPQLQQPQQPLPQPQQPQQPFPQQQQPLIQPYLQQQMNPCK NYLLQQCNPVSLV SSLVSMILPRSDCKVMRQQCCQQLAQIPQQLQCAAIHGIVHSIIMQQEQQQQQQQQQ QQQQQQGIQIMRP LFQLVQGQGIIQPQQPAQLEVIRSLVLGTLPTMCNVFVPPECSTTKAPFASIVA DIGGQ
The method establishes the role of STRs and SSRs in improving crop quantity and quality, provides a candidate gene-based approach for the development of designer crops, and is useful in gene and genome editing for the development of crops with multiple resistances.
The method disclosed in accordance with the present disclosure is useful for plant bioinformatics in crop improvement programs. The method requires bioinformatics databases, tools and a user interface. The method facilitates Simple Sequence Repeat (SSR) identification. The method provides Single Nucleotide Polymorphism (SNP) analysis. The disclosed method provides Marker Assisted Selection (MAS) through the design of molecular markers. The disclosed method facilitates the design of different types of PCR primers with specificity checking, as well as sequential, structural and functional classification.
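As an illustration of the SNP analysis mentioned above, single-base differences can be listed from two sequences that have already been aligned. This is a minimal sketch, not a production variant caller: the function name `call_snps`, the reference and sample strings, and the convention of reporting 1-based reference positions are assumptions, and base quality and read depth are ignored.

```python
def call_snps(reference: str, sample: str):
    """List (reference_position, ref_base, sample_base) for single-base differences.

    Both inputs must be aligned to the same length. Gap columns ('-') and
    ambiguous bases are skipped. Positions are 1-based on the ungapped reference.
    """
    if len(reference) != len(sample):
        raise ValueError("sequences must be aligned to the same length")
    snps = []
    ref_pos = 0
    for r, s in zip(reference.upper(), sample.upper()):
        if r != "-":
            ref_pos += 1
        if r in "ACGT" and s in "ACGT" and r != s:
            snps.append((ref_pos, r, s))
    return snps


# Hypothetical aligned reference and sample fragments.
snps_plain = call_snps("ATGAAGACCTTA", "ATGAAGGCCTTA")
snps_gapped = call_snps("ATG-AAGACC", "ATGCAAGTCC")
print(snps_plain, snps_gapped)  # [(7, 'A', 'G')] [(7, 'A', 'T')]
```

In the second call the inserted base in the sample is treated as an indel and not reported, so only the later substitution appears.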
List of some important bioinformatics databases for genomics research
URLs
1. https://www.plantcyc.org/
2. https://phytozome.jgi.doe.gov/pz/portal.html
3. http://bioinformatics.cau.edu.cn/PMRD/
4. https://www.ncbi.nlm.nih.gov/
5. https://www.ddbj.nig.ac.jp/index-e.html
6. https://www.embl.org/
7. https://plants.ensembl.org/index.html
8. http://www.plantgdb.org/
9. https://phytozome.jgi.doe.gov/pz/portal.html
10. https://www.uniprot.org/
11. https://www.rcsb.org/
12. https://www.ncbi.nlm.nih.gov/Structure/MMDB/mmdb.shtml
13. https://www.ncbi.nlm.nih.gov/geo/
14. http://www.cs.umd.edu/hcil/hce/
15. https://biit.cs.ut.ee/clustvis/
16. https://blast.ncbi.nlm.nih.gov/Blast.cgi
17. https://www.ebi.ac.uk/Tools/msa/clustalo/
18. http://tree.bio.ed.ac.uk/software/figtree/
19. http://www.mbio.ncsu.edu/BioEdit/bioedit.html
20. https://www.megasoftware.net/index.html
21. http://circos.ca/
22. https://prosite.expasy.org/
23. https://www.ncbi.nlm.nih.gov/Structure/cdd/cdd.shtml
24. https://www.ebi.ac.uk/interpro/search/sequence/
25. http://mordred.bioc.cam.ac.uk/~rapper/rampage.php
26. https://servicesn.mbi.ucla.edu/Verify3D/
27. https://bioinfo3d.cs.tau.ac.il/PatchDock/
28. http://hex.loria.fr/
The possible industrial applications are:
1. Marker-assisted selection through the development of molecular markers and the identification of SNPs and SSRs.
2. Development of designer crops (quantitative and qualitative crop improvement programs and the development of resistant crop varieties).
The drawings and the foregoing description give examples of embodiments. Those skilled in the art will appreciate that one or more of the described elements may well be combined into a single functional element. Alternatively, certain elements may be split into multiple functional elements. Elements from one embodiment may be added to another embodiment. For example, orders of processes described herein may be changed and are not limited to the manner described herein. Moreover, the actions of any flow diagram need not be implemented in the order shown; nor do all of the acts necessarily need to be performed. Also, those acts that are not dependent on other acts may be performed in parallel with the other acts. The scope of embodiments is by no means limited by these specific examples. Numerous variations, whether explicitly given in the specification or not, such as differences in structure, dimension, and use of material, are possible. The scope of embodiments is at least as broad as given by the following claims.
Benefits, other advantages, and solutions to problems have been described above with regard to specific embodiments. However, the benefits, advantages, solutions to problems, and any component(s) that may cause any benefit, advantage, or solution to occur or become more pronounced are not to be construed as a critical, required, or essential feature or component of any or all the claims.

Claims (10)

WE CLAIM
1. A method for analysis and interpretation of crop bioinformatics repeats sequence pattern, the method comprises:
performing whole genome sequencing, whole transcriptome sequencing, and whole exome sequencing and aligning and merging fragments from a longer deoxyribonucleic acid sequence in order to reconstruct the original sequence;
identifying complete set of genes and identifying complete transcripts;
identifying single nucleotide polymorphism (SNP) and simple sequence repeat (SSR) in whole crop proteome and genome for intronic and exonic regions;
interpreting corresponding nucleotide genetic pattern and signature followed by determining crop gene codon composition;
correlating pattern between proteome and genome of intronic and exonic regions using functional and structural motifs identification; and
performing candidate gene-based alignment and phylogenetic relationship and thereafter classifying sequence structure function.
2. The method as claimed in claim 1, wherein a sequence or structural motif is often related to a basic biochemical feature.
3. The method as claimed in claim 1, wherein the secret portion of genomes is satellite Deoxyribonucleic acids (DNAs) which is known as junk DNA, wherein the practical value of satellite DNA repeats and their sequences is currently increasingly recognized.
4. The method as claimed in claim 1, wherein the sequence motif is a sequence pattern of nucleotides or amino acids that is widespread and commonly thought to be related to the macromolecule's biological role.
5. The method as claimed in claim 1, wherein the pattern of codon use differs greatly between different species, and even between genes that are expressed in the same organism at different levels, wherein with respect to the variables influencing the codon use pattern, a variety of theories prevail, wherein attempts have been made to understand the distributions of codons in the protein-coding genes and the improvements in the use of codons by different synon-genes.
6. The method as claimed in claim 1, wherein in nucleic acid and protein sequences, biological sequence motifs are classified as brief, typically fixed length, sequence patterns that may reflect significant structural or functional features such as transcription binding sites, splice junctions, active sites, or interaction interfaces.
7. The method as claimed in claim 1, wherein alignments between sequences that are descendants of a common ancestor may represent a degree of evolutionary shift, wherein Phylogenies and sequence alignments are linked to each other.
8. The method as claimed in claim 1, wherein Phylogenetic trees occur through successive speciation events (branching) and Phylogenetic relationship refers to the relative times in the past that common ancestors shared between species.
9. The method as claimed in claim 1, wherein plant protein superfamily designation, together with its sequence-structure-function similarity analysis, is used to elucidate the target protein's complete characteristics.
10. The method as claimed in claim 1, wherein Curated crop genome includes NGS (Next Generation Sequencing), GWAS, Transcriptome Shotgun Assembly (TSA), Whole Exome Sequencing (WES) and Ribonucleic acid (RNA) seq Analysis.
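By way of illustration only, the codon composition step recited in claim 1 and the codon use pattern discussed in claim 5 can be sketched in Python as follows. The function name `codon_usage` and the example sequence are hypothetical and are not part of the claims; the sketch assumes an in-frame coding sequence and simply reports the relative frequency of each codon.

```python
from collections import Counter


def codon_usage(cds):
    """Return {codon: relative frequency} for an in-frame coding sequence.

    Hypothetical helper for illustration; trailing bases that do not
    complete a codon are ignored, and RNA input (U) is read as DNA (T).
    """
    cds = cds.upper().replace("U", "T")
    usable = len(cds) - len(cds) % 3  # drop an incomplete trailing codon
    codons = [cds[i:i + 3] for i in range(0, usable, 3)]
    counts = Counter(codons)
    total = sum(counts.values())
    return {codon: n / total for codon, n in counts.items()}


if __name__ == "__main__":
    # Illustrative sequence only: ATG CAG CAG CAA TAA
    for codon, freq in sorted(codon_usage("ATGCAGCAGCAATAA").items()):
        print(codon, round(freq, 2))
```

Comparing such frequency tables between genes or species is one simple way to examine differences in synonymous codon use, for example CAG versus CAA for glutamine in the illustrative sequence above.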
AU2021101343A 2021-03-16 2021-03-16 A method for analysis and interpretation of crop bioinformatics repeats sequence pattern Ceased AU2021101343A4 (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
AU2021101343A AU2021101343A4 (en) 2021-03-16 2021-03-16 A method for analysis and interpretation of crop bioinformatics repeats sequence pattern

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
AU2021101343A AU2021101343A4 (en) 2021-03-16 2021-03-16 A method for analysis and interpretation of crop bioinformatics repeats sequence pattern

Publications (1)

Publication Number Publication Date
AU2021101343A4 true AU2021101343A4 (en) 2021-05-13

Family

ID=75829125

Family Applications (1)

Application Number Title Priority Date Filing Date
AU2021101343A Ceased AU2021101343A4 (en) 2021-03-16 2021-03-16 A method for analysis and interpretation of crop bioinformatics repeats sequence pattern

Country Status (1)

Country Link
AU (1) AU2021101343A4 (en)

Legal Events

Date Code Title Description
FGI Letters patent sealed or granted (innovation patent)
MK22 Patent ceased section 143a(d), or expired - non payment of renewal fee or expiry