CN113517026A - Generation method and system of label sequence applied to biological product, intelligent terminal and computer readable storage medium - Google Patents

Info

Publication number
CN113517026A
Authority
CN
China
Prior art keywords
sequence
unique
subsequences
length
unique subsequence
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202110664094.2A
Other languages
Chinese (zh)
Other versions
CN113517026B (en)
Inventor
李智
许心意
刘超钧
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Suzhou Lasso Biochip Technology Co ltd
Original Assignee
Suzhou Lasso Biochip Technology Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Suzhou Lasso Biochip Technology Co ltd
Priority to CN202110664094.2A
Publication of CN113517026A
Application granted
Publication of CN113517026B

Classifications

    • G PHYSICS
    • G16 INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
    • G16B BIOINFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR GENETIC OR PROTEIN-RELATED DATA PROCESSING IN COMPUTATIONAL MOLECULAR BIOLOGY
    • G16B30/00 ICT specially adapted for sequence analysis involving nucleotides or amino acids
    • G16B30/20 Sequence assembly
    • G PHYSICS
    • G16 INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
    • G16B BIOINFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR GENETIC OR PROTEIN-RELATED DATA PROCESSING IN COMPUTATIONAL MOLECULAR BIOLOGY
    • G16B20/00 ICT specially adapted for functional genomics or proteomics, e.g. genotype-phenotype associations
    • G16B20/30 Detection of binding sites or motifs

Abstract

The invention provides a method and a system for generating a tag sequence applied to a biological product, together with an intelligent terminal and a computer-readable storage medium. The tag sequence consists of a plurality of unique subsequences spliced end to end in sequence, and both the unique subsequences and the tag sequence conform to a preset rule. With this technical scheme, large numbers of tag sequences that meet a required quantity and specific design rules can be generated more simply and rapidly and at lower cost; the resulting tag sequences have high specificity and low similarity to a target genome.

Description

Generation method and system of label sequence applied to biological product, intelligent terminal and computer readable storage medium
Technical Field
The invention relates to the technical field of biology, in particular to a generation method and application of a tag sequence of a biological product.
Background
Biological products such as biochips (including gene chips and protein chips) and DNA self-assembly structures used as drug carriers each contain specific tag sequences. For example, in the decoding step of gene chip preparation, a DNA tag sequence hybridizes specifically with its matching decoding sequence. This decodes the gene chip, that is, it establishes the correspondence between the probe types and the positions of the wells on the chip, and the decoded chip can then be used for gene detection. Tag sequences therefore play an important role in biological products such as biochips and DNA self-assembly drug carriers.
However, the tag sequences applied to biological products in the prior art suffer from poor specificity, complicated generation procedures and long generation times.
Disclosure of Invention
To overcome the technical defects of the prior art, in which tag sequences applied to biochips have poor specificity and are complicated and time-consuming to generate, a first aspect of the present invention provides a method for generating a tag sequence applied to a biological product. The tag sequence consists of a plurality of unique subsequences spliced end to end in sequence, and both the unique subsequences and the tag sequence conform to a preset rule. The method comprises:
step S1: setting the length of the unique subsequence to n and the length of the tag sequence to m, wherein n and m are the numbers of basic units forming the unique subsequence and the tag sequence respectively, m and n are positive integers, and m > n, and generating all unique subsequences of length n according to the preset rule to obtain a set of unique subsequences;
step S2: randomly extracting a first unique subsequence from the set of unique subsequences, then randomly extracting a second unique subsequence, splicing the second unique subsequence to the first unique subsequence to obtain a spliced sequence, and checking whether the spliced sequence conforms to the preset rule; and
step S3: if so, continuing to randomly extract further unique subsequences one at a time and splicing each onto the spliced sequence to obtain a new spliced sequence of increasing length, checking after each splice whether the new spliced sequence conforms to the preset rule and whether its length has reached m, and storing the new spliced sequence as the tag sequence when it conforms to the preset rule and has length m.
Optionally, m is an integer multiple of n. Optionally, m is not an integer multiple of n; for example, if m is 8 and n is 3, splicing stops when the length reaches 9, the last basic unit is cut off, and the remaining sequence of length 8 is checked against the preset rule and, if it conforms, stored as a tag sequence.
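As a minimal sketch of the case in which m is not an integer multiple of n, the following Python fragment splices unique subsequences and cuts the result to m basic units. The function names `pieces_needed` and `splice_to_length` are hypothetical and do not appear in the application; checking the truncated sequence against the preset rule is left to the caller.

```python
def pieces_needed(m: int, n: int) -> int:
    """Number of length-n subsequences needed to reach at least m units."""
    return (m + n - 1) // n  # integer ceiling of m / n


def splice_to_length(pieces, m: int) -> str:
    """Join the pieces end to end and cut off any units beyond length m."""
    spliced = "".join(pieces)
    if len(spliced) < m:
        raise ValueError("not enough basic units to reach length m")
    return spliced[:m]


if __name__ == "__main__":
    # m = 8, n = 3: three pieces give 9 units, and the last unit is cut off.
    print(pieces_needed(8, 3))
    print(splice_to_length(["ACG", "TTA", "GCA"], 8))
```

With m = 8 and n = 3, three subsequences are needed; splicing ACG, TTA and GCA gives a 9-unit sequence that is cut to ACGTTAGC before the rule check.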
Illustratively, 2 ≤ m/n ≤ 5. Illustratively, the length n of the unique subsequence is 7 bases and the length m of the DNA tag sequence is 21 bases. Preferably, the nucleotide sequences of the DNA tag sequences are as shown in SEQ ID NO.1 to SEQ ID NO.20:
SEQ ID NO.1:AGAGCAAGAACCCTAAGTTAT;
SEQ ID NO.2:ATTCTGTATTGCGAGAGGAAA;
SEQ ID NO.3:CCCTCCTACTATCACATTATT;
SEQ ID NO.4:AGGTCGTCTCATTACACATAA;
SEQ ID NO.5:CCTTCCGATTCAACTCTATTA;
SEQ ID NO.6:GCTTAGCCAAACACCAATAAT;
SEQ ID NO.7:CTTCACCAGTCATTCACAATA;
SEQ ID NO.8:GGTAAGGTTCTCTGTTGTTTT;
SEQ ID NO.9:ACGACCCTACTTCAATCTTAT;
SEQ ID NO.10:AGGGTGGAACTTATGACTTTA;
SEQ ID NO.11:GGAAACACTTGATGACAGTAA;
SEQ ID NO.12:GGAAATGCGAATGTGTTAGTA;
SEQ ID NO.13:GAATAAGCGACAATGGTGTAA;
SEQ ID NO.14:TTTGTGCTCTTGCCATTTGAA;
SEQ ID NO.15:GGACCAGTAATCCAACATTTT;
SEQ ID NO.16:GAAACCTGGACTTCATCATTT;
SEQ ID NO.17:TATTACGCCCATACACACTAA;
SEQ ID NO.18:GAGCAGGATACTTTGGTTTTA;
SEQ ID NO.19:TCCTTTGTCTGAAGAGAGTAA;
SEQ ID NO.20:AGGCGTGTCATACTACTTATT.
Further, step S2 also includes: if the spliced sequence does not conform to the preset rule, deleting the second unique subsequence from the spliced sequence, and then randomly extracting another unique subsequence from the set of unique subsequences to serve as a new second unique subsequence.
Further, the biological product is a gene chip, a protein chip or a DNA self-assembly drug carrier. The basic units forming the unique subsequence are deoxynucleotides with different bases in the gene chip and the DNA self-assembly drug carrier, and amino acids in the protein chip. Application 1: gene chip. Tag sequences and probes are designed in one-to-one correspondence, so the type of a probe can be identified by identifying its tag sequence. Application 2: protein chip. If a protein probe and a tag sequence are both coupled to a microsphere, the probes correspond to the tag sequences one to one, and the type of protein probe coupled to the microsphere can be determined by identifying the tag sequence. Application 3: DNA self-assembly. For example, some DNA self-assembled scaffolds need to bind polypeptide molecules, so tag sequences can be designed into the scaffold. Polypeptide molecules coupled to the complementary sequences of the tag sequences are then added, and these polypeptide molecules are assembled at specific positions on the scaffold by recognizing the tag sequences.
Further, the tag sequence of the gene chip is a DNA tag sequence comprising at least two unique subsequences, and the preset rule includes: no more than 8 consecutive identical bases; a GC content of 30% to 60%; a hairpin structure no more than 8 bases in length; a self-complementary segment of no more than 16 bases; and the DNA tag sequence being dissimilar to the target genome. "Dissimilar" means that a BLAST search of the target genome with the tag sequence finds no match with an E value below 0.05. Illustratively, the target genome is a human genome, a mouse genome or the like.
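Part of the preset rule above can be approximated in code. The Python sketch below checks the consecutive-base, GC-content and self-complementarity criteria. All function names are hypothetical; the self-complementarity measure (the longest substring whose reverse complement also occurs in the sequence) is an assumed stand-in for whatever definition the applicant used; and the hairpin and BLAST criteria are deliberately omitted because they require structure-prediction and alignment tools.

```python
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    return seq.translate(_COMPLEMENT)[::-1]


def max_homopolymer_run(seq: str) -> int:
    """Length of the longest run of consecutive identical bases."""
    best = run = 0
    prev = None
    for base in seq:
        run = run + 1 if base == prev else 1
        best = max(best, run)
        prev = base
    return best


def gc_fraction(seq: str) -> float:
    return (seq.count("G") + seq.count("C")) / len(seq)


def longest_self_complementary(seq: str) -> int:
    """Longest k such that some k-mer and its reverse complement both occur
    in seq (assumed proxy for the 'self-complementary segment' criterion)."""
    best = 0
    for k in range(1, len(seq) + 1):
        kmers = {seq[i:i + k] for i in range(len(seq) - k + 1)}
        if any(reverse_complement(x) in kmers for x in kmers):
            best = k
        else:
            break  # if no k-mer qualifies, no longer one can either
    return best


def passes_preset_rule(seq: str, max_run: int = 8, gc_min: float = 0.30,
                       gc_max: float = 0.60, max_self_comp: int = 16) -> bool:
    """Partial check only: hairpin and BLAST criteria are not covered."""
    if not seq:
        return False
    return (max_homopolymer_run(seq) <= max_run
            and gc_min <= gc_fraction(seq) <= gc_max
            and longest_self_complementary(seq) <= max_self_comp)
```

The default thresholds follow the looser limits stated above (8 bases, 30% to 60%, 16 bases); passing `max_run=3` and `max_self_comp=6` would approximate the stricter variant of the rule.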
Further, in the predetermined rule, the consecutive identical bases are not more than 3, the hairpin structure is not more than 3 bases long, and the self-complementary segment is not more than 6 bases.
Further, step S3 also includes: deleting from the set of unique subsequences every unique subsequence that is identical or complementary to any subsequence of length n in the tag sequence, so that these no longer take part in subsequent extraction and splicing. Here, "all subsequences of length n" refers to every subsequence of length n contained in the tag sequence; it is not limited to the unique subsequences from which the tag sequence was spliced, but also includes the other length-n subsequences, such as those spanning a splice junction.
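The deletion step can be sketched as follows in Python. The names `all_kmers` and `prune_pool` are hypothetical, and "complementary" is assumed here to mean the reverse complement, which the application does not state explicitly.

```python
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    return seq.translate(_COMPLEMENT)[::-1]


def all_kmers(seq: str, n: int) -> set:
    """Every length-n subsequence of seq, at every offset."""
    return {seq[i:i + n] for i in range(len(seq) - n + 1)}


def prune_pool(pool: set, tag: str, n: int) -> set:
    """Remove from pool every subsequence identical or (reverse-)complementary
    to a length-n subsequence of the saved tag. Modifies pool in place."""
    used = all_kmers(tag, n)
    used |= {reverse_complement(k) for k in used}
    pool -= used
    return pool


if __name__ == "__main__":
    pool = {"ACG", "CGT", "TTT", "GGA", "AAA"}
    # Tag ACGG contains ACG and CGG; their reverse complements are CGT and CCG.
    print(sorted(prune_pool(pool, "ACGG", 3)))
```

A 21-base tag has 15 subsequences of length 7, so up to 30 entries leave the pool after each saved tag, not only the three that were spliced.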
The method for generating a tag sequence applied to a biological product further comprises step S4: repeating steps S2 to S3 until the set of unique subsequences is exhausted or the number of generated tag sequences reaches the requirement, thereby obtaining a set of tag sequences. The set of tag sequences generated by this method can satisfy the following requirement: any length-n subsequence of any tag sequence, and its complementary sequence, appears only once in the set of tag sequences. As many tag sequences as possible are thus generated while their specificity is ensured to the greatest extent.
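Whether a finished set of tag sequences has the stated property can be checked independently of how it was generated. The Python sketch below is one way to do so; the function name `kmers_unique_across_tags` is hypothetical, and "complementary sequence" is again assumed to mean the reverse complement.

```python
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    return seq.translate(_COMPLEMENT)[::-1]


def kmers_unique_across_tags(tags, n: int) -> bool:
    """True if no length-n subsequence occurs twice in the set, and no
    length-n subsequence occurs together with its reverse complement."""
    seen = set()
    for tag in tags:
        for i in range(len(tag) - n + 1):
            kmer = tag[i:i + n]
            if kmer in seen or reverse_complement(kmer) in seen:
                return False
            seen.add(kmer)
    return True


if __name__ == "__main__":
    print(kmers_unique_across_tags(["ACGG", "TTTA"], 3))  # no shared 3-mers
    print(kmers_unique_across_tags(["ACGG", "CCGA"], 3))  # CCG complements CGG
```

A check of this kind is useful as a regression test on the generator, since it examines junction-spanning subsequences as well as the spliced blocks.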
A second aspect of the present invention provides a system for generating a tag sequence for application to a biological product, comprising:
the unique subsequence module, configured to set the length of the unique subsequence to n and the length of the tag sequence to m, wherein n and m are the numbers of basic units forming the unique subsequence and the tag sequence respectively, m and n are positive integers, and m > n, and to generate all unique subsequences of length n according to a preset rule to obtain the set of unique subsequences; and
the splicing checking module, configured to randomly extract a first unique subsequence from the set of unique subsequences, then randomly extract a second unique subsequence, splice the second unique subsequence to the first unique subsequence to obtain a spliced sequence, and check whether the spliced sequence conforms to the preset rule; if not, the splicing checking module is further configured to delete the second unique subsequence from the spliced sequence and then randomly extract another unique subsequence from the set of unique subsequences to serve as a new second unique subsequence, until the spliced sequence conforms to the preset rule; if so, the splicing checking module continues to randomly extract further unique subsequences one at a time and splice each onto the spliced sequence to obtain a new spliced sequence of increasing length, checks after each splice whether the new spliced sequence conforms to the preset rule and whether its length has reached m, and stores the new spliced sequence as the tag sequence when it conforms to the preset rule and has length m;
wherein the unique subsequence module and the splicing checking module are connected through a data stream.
Further, the splicing checking module is further configured to delete from the set of unique subsequences every unique subsequence that is identical or complementary to any subsequence of length n in the tag sequence, so that these no longer take part in subsequent extraction and splicing. Preferably, the splicing checking module is further configured to repeat the splicing and checking steps until the set of unique subsequences is exhausted or the number of generated tag sequences reaches the requirement, thereby obtaining the set of tag sequences. The set of tag sequences generated in this way can satisfy the following requirement: any length-n subsequence of any tag sequence, and its complementary sequence, appears only once in the set of tag sequences. As many tag sequences as possible are thus generated while their specificity is ensured to the greatest extent.
Illustratively, the DNA tag sequence comprises at least two of the unique subsequences, and the preset rule includes: no more than 8 consecutive identical bases; a GC content of 30% to 60%; a hairpin structure no more than 8 bases in length; a self-complementary segment of no more than 16 bases; and the DNA tag sequence being dissimilar to the target genome. "Dissimilar" means that a BLAST search of the target genome with the tag sequence finds no match with an E value below 0.05. Illustratively, the target genome is a human genome, a mouse genome or the like.
A third aspect of the present invention provides an intelligent terminal, including:
a memory for storing executable program code; and
a processor for reading the executable program code stored in the memory to perform the above-described method of generating a tag sequence for application to a biological product. The intelligent terminal includes but is not limited to a PC, a portable computer, a mobile terminal and other devices having display and processing functions.
A fourth aspect of the invention provides a computer-readable storage medium having stored thereon computer program instructions which, when executed by a processor, carry out the steps of the above-described method of generating a tag sequence for application to a biological product. The computer-readable storage medium includes, but is not limited to, various media capable of storing program code, such as a USB flash drive, a removable hard disk, a read-only memory (ROM), a random access memory (RAM), a magnetic disk, or an optical disk.
After the technical scheme is adopted, compared with the prior art, the method has the following beneficial effects:
1. The method for generating the tag sequence can be implemented directly in the intelligent terminal through the generation system, and generates tag sequences of a specified quantity that conform to specific design rules more simply and rapidly and at lower cost. Moreover, because the number of consecutive identical bases does not exceed the preset limit (3 to 8) and the synthesis success rate of the tag sequence is 100%, the tag sequence is easy to synthesize.
2. The set of tag sequences generated with the technical scheme of the present application can satisfy the following requirement: any length-n subsequence of any tag sequence, and its complementary sequence, appears only once in the set. As many tag sequences as possible are thus generated while their specificity is ensured to the greatest extent and their similarity to the target genome is kept low. The technical scheme of the present application is particularly suitable for rapidly generating a large number of tag sequences that satisfy specific conditions.
Drawings
Fig. 1 is a block diagram of a system for generating a tag sequence applied to a biological product according to an embodiment of the present application.
Detailed Description
The advantages of the invention are further illustrated in the following description of specific embodiments in conjunction with the accompanying drawings. It is to be understood by persons skilled in the art that the following detailed description is illustrative and not restrictive, and is not to be taken as limiting the scope of the invention.
In the description of the present invention, unless otherwise specified and limited, the terms "mounted" and "connected" are to be interpreted broadly: a connection may be, for example, mechanical or electrical, may be a communication between two elements, and may be direct or indirect via an intermediate medium. Those skilled in the art can understand the specific meanings of these terms according to the specific situation.
In the following description, suffixes such as "module" used to represent elements are used only for facilitating the explanation of the present invention and have no specific meaning in itself.
Unless defined otherwise, all technical and scientific terms used herein have the same meaning as commonly understood by one of ordinary skill in the art to which this application belongs. The terminology used herein is for the purpose of describing embodiments of the present application only and is not intended to be limiting of the application.
Examples
Illustratively, 20 DNA tag sequences conforming to the preset rule and comprising 3 unique subsequences are generated by using the generation system and the generation method of the tag sequences applied to the biological products, the length n of the unique subsequences is 7, the length m of the DNA tag sequences is 21, and the DNA tag sequences are applied to the gene chip.
The tag sequence consists of three unique subsequences which are sequentially spliced end to end, and both the unique subsequences and the tag sequence accord with the following preset principles: no more than 8 contiguous identical bases, a GC content of 30% to 60%, no more than 8 bases in hairpin structure length, no more than 16 bases in self-complementary segment, and the DNA tag sequence is dissimilar to the human genome.
The module structure of the generation method and generation system of the tag sequence applied to the biological product of the present application is shown in fig. 1.
Specifically, the tag sequence applied to the gene chip is generated on a computer by a DNA tag sequence generation system comprising a unique subsequence module and a splicing checking module, which are connected through a data stream. The method for generating the DNA tag sequence comprises the following steps:
Step 1: generating the set of unique subsequences. A set of unique subsequences, each of length n, is generated according to the preset rule. Illustratively, the unique subsequence module sets the length n of the unique subsequence to 7 bases and the length m of the tag sequence to 21 bases, and generates all unique subsequences of length 7 bases according to the preset rule to obtain the set of unique subsequences;
Step 2: randomly extracting unique subsequences. The splicing checking module randomly extracts a first unique subsequence from the set of unique subsequences and then randomly extracts a second unique subsequence;
Step 3: generating a spliced sequence. One end of the second 7-base unique subsequence is spliced to one end of the first 7-base unique subsequence, that is, the two are joined in series end to end, to obtain a spliced sequence (at this point 14 bases long);
Step 4: checking whether the spliced sequence conforms to the preset rule. If the spliced sequence obtained in step 3 does not conform, the second unique subsequence is deleted from the spliced sequence and steps 2 to 3 are repeated; if it conforms, the method proceeds to step 5;
Step 5: checking whether the spliced sequence has reached length m. If not, steps 2 to 4 are repeated; if so, the method proceeds to step 6;
In other words, each time a new spliced sequence is obtained, two things are checked: whether it conforms to the preset rule, and whether its length has reached m (for example, 21 bases). If it conforms to the preset rule and its length has reached m, a tag sequence meeting the requirements has been obtained; if it conforms to the preset rule but its length has not reached m, random extraction, splicing and checking continue.
Step 6: saving the tag sequence. The spliced sequence that conforms to the preset rule and has length m is stored as a tag sequence;
Step 7: deleting identical and complementary unique subsequences. Every unique subsequence that is identical or complementary to any 7-base subsequence of the tag sequence obtained in step 6 is deleted from the set of unique subsequences, so that it no longer takes part in subsequent extraction and splicing;
Step 8: generating the set of tag sequences. Steps 2 to 7 are repeated to keep obtaining new tag sequences that conform to the preset rule and have length m;
Step 9: terminating. The whole procedure terminates when the number of generated tag sequences meets the requirement or the available unique subsequences in the set have been exhausted.
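Steps 1 to 9 can be sketched end to end in Python as follows. This is an illustrative sketch under stated assumptions, not the applicant's implementation: `passes_rule` covers only the consecutive-base and GC-content criteria (hairpin, self-complementarity and BLAST checks are omitted), "complementary" is taken to mean the reverse complement, m is assumed to be a multiple of n, and the `max_restarts` and `max_draws` caps are added only to guarantee termination. All names are hypothetical.

```python
import itertools
import random

_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    return seq.translate(_COMPLEMENT)[::-1]


def passes_rule(seq, max_run=3, gc_min=0.30, gc_max=0.60):
    """Simplified stand-in for the preset rule (run length and GC only)."""
    run, prev = 0, None
    for base in seq:
        run = run + 1 if base == prev else 1
        if run > max_run:
            return False
        prev = base
    gc = (seq.count("G") + seq.count("C")) / len(seq)
    return gc_min <= gc <= gc_max


def generate_tags(n, m, wanted, seed=0, max_restarts=1000, max_draws=1000):
    if m <= n or m % n != 0:
        raise ValueError("this sketch assumes m > n and m a multiple of n")
    rng = random.Random(seed)
    # Step 1: all length-n subsequences that conform to the rule.
    pool = {"".join(p) for p in itertools.product("ACGT", repeat=n)}
    pool = {s for s in pool if passes_rule(s)}
    tags = []
    for _restart in range(max_restarts):
        # Step 9: stop when enough tags exist or the pool is exhausted.
        if len(tags) >= wanted or not pool:
            break
        candidates = sorted(pool)  # sorted so a given seed is reproducible
        spliced = rng.choice(candidates)  # Step 2: first subsequence
        for _draw in range(max_draws):
            if len(spliced) == m:  # Step 5: length reached
                break
            extended = spliced + rng.choice(candidates)  # Steps 2 and 3
            if passes_rule(extended):  # Step 4: keep only if it conforms
                spliced = extended
        if len(spliced) != m:
            continue  # could not be extended; start again
        tags.append(spliced)  # Step 6
        # Step 7: remove every length-n subsequence of the tag, and its
        # reverse complement, from the pool.
        kmers = {spliced[i:i + n] for i in range(m - n + 1)}
        pool -= kmers | {reverse_complement(k) for k in kmers}
    return tags  # Step 8 is the outer loop


if __name__ == "__main__":
    for tag in generate_tags(n=7, m=21, wanted=5, seed=42):
        print(tag)
```

Sorting the pool before each random draw makes the output depend only on the seed, which is convenient for testing; a production version would also need the omitted hairpin, self-complementarity and genome-similarity checks inside `passes_rule`.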
Illustratively, the nucleotide sequences of the generated 20 DNA tag sequences which conform to the preset principle and have the length of 21 are shown in SEQ ID NO. 1-SEQ ID NO. 20:
SEQ ID NO.1:AGAGCAAGAACCCTAAGTTAT;
SEQ ID NO.2:ATTCTGTATTGCGAGAGGAAA;
SEQ ID NO.3:CCCTCCTACTATCACATTATT;
SEQ ID NO.4:AGGTCGTCTCATTACACATAA;
SEQ ID NO.5:CCTTCCGATTCAACTCTATTA;
SEQ ID NO.6:GCTTAGCCAAACACCAATAAT;
SEQ ID NO.7:CTTCACCAGTCATTCACAATA;
SEQ ID NO.8:GGTAAGGTTCTCTGTTGTTTT;
SEQ ID NO.9:ACGACCCTACTTCAATCTTAT;
SEQ ID NO.10:AGGGTGGAACTTATGACTTTA;
SEQ ID NO.11:GGAAACACTTGATGACAGTAA;
SEQ ID NO.12:GGAAATGCGAATGTGTTAGTA;
SEQ ID NO.13:GAATAAGCGACAATGGTGTAA;
SEQ ID NO.14:TTTGTGCTCTTGCCATTTGAA;
SEQ ID NO.15:GGACCAGTAATCCAACATTTT;
SEQ ID NO.16:GAAACCTGGACTTCATCATTT;
SEQ ID NO.17:TATTACGCCCATACACACTAA;
SEQ ID NO.18:GAGCAGGATACTTTGGTTTTA;
SEQ ID NO.19:TCCTTTGTCTGAAGAGAGTAA;
SEQ ID NO.20:AGGCGTGTCATACTACTTATT.
Obtaining these 20 DNA tag sequences by carrying out steps 1 to 7 on a computer or server with 8 cores and 16 GB of memory took about 5 seconds in total.
Therefore, the method for generating the tag sequence can be implemented directly in the intelligent terminal through the generation system; tag sequences of a specified number that conform to specific design rules can be generated more simply and rapidly and at lower cost, with high specificity and low similarity to the target genome. Moreover, because the number of consecutive identical bases does not exceed the preset limit (3 to 8) and the synthesis success rate of the tag sequence is 100%, the tag sequences are easier to synthesize.
The technical solution of the present application is particularly suitable for rapidly generating a large number of tag sequences satisfying specific conditions. For reasons of space, the specific sequences of all synthesized tag sequences cannot be listed here; instead, the time cost and success rate of obtaining tag sequences that conform to the preset rule and meet different quantity requirements, using the generation system and generation method of the present application, are summarized by way of example in Table 1. The length of the unique subsequence, the length of the tag sequence and the number of tag sequences are set according to actual requirements and are not particularly limited by the present application.
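Table 1. Time cost and success rate for obtaining tag sequences that conform to the preset rule and meet different quantity requirements
[Table 1 appears as images in the original publication (Figure BDA0003116175740000081 and Figure BDA0003116175740000091); its contents were not recoverable as text.]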
It should be noted that the total time data for embodiments 1 to 10 in Table 1 were obtained on a computer with 8 cores (2.7 GHz) and 16 GB of memory; with a higher hardware configuration, the total time would be shorter still.
In addition, the technical scheme of the application can also be applied to the tag sequence of the protein chip. If protein probes are coupled on the microspheres and tag sequences are coupled on the microspheres, the probes correspond to the tag sequences one by one. The type of protein probe coupled to the microsphere can be known by identifying the tag sequence.
In addition, the technical scheme of the application can also be applied to DNA self-assembly. For example, some DNA self-assembled scaffolds need to bind polypeptide molecules, and some tag sequences can be designed on the scaffold. Polypeptide molecules coupled with complementary sequences of the tag sequences are then added, and these polypeptide molecules can be assembled to specific positions of the scaffold by means of the recognition tag sequences.
It will be appreciated by those skilled in the art that embodiments of the invention may be provided as a computer program product, a system, a smart terminal, or a computer-readable storage medium. Accordingly, the present invention may take the form of an entirely software embodiment, or an embodiment combining software and hardware aspects. Furthermore, the present invention may take the form of a computer program product embodied on one or more computer-readable storage media (including, but not limited to, disk storage, CD-ROM, optical storage, etc.) having computer-executable program code embodied therein, the computer software product being stored on one of the storage media and comprising instructions for causing a computer device (which may be a personal computer, a server, a network appliance, etc.) or a processor to perform all or part of the steps of the method described herein. The aforementioned storage medium includes various media capable of storing program code, such as a USB flash drive, a removable hard disk, a read-only memory (ROM), a random access memory (RAM), a magnetic disk, or an optical disk.
These computer program instructions may be provided to a processor of a general purpose computer, special purpose computer, embedded processor, or other programmable data processing apparatus to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing apparatus, create means for implementing all or part of the steps of the method for generating a tag sequence for application to a biological article of manufacture of the present application.
These computer program instructions may also be stored in a computer-readable memory that can direct a computer or other programmable data processing apparatus to function in a particular manner, such that the instructions stored in the computer-readable memory produce an article of manufacture including instruction means which implement the function of all or some of the steps of the method for generating a tag sequence for application to a biological article of manufacture of the present application.
These computer program instructions may also be loaded onto a computer or other programmable data processing apparatus to cause a series of operational steps to be performed on the computer or other programmable apparatus to produce a computer-implemented process such that the instructions that execute on the computer or other programmable apparatus provide functions for implementing all or part of the steps in the method for generating a tag sequence for application to a biological article of the present application.
The above-described embodiment of the generating system is merely illustrative, for example, the division of the modules is only one logical functional division, and other divisions may be realized in practice, for example, a plurality of units or components may be combined or integrated into another system, or some features may be omitted or not executed. In addition, the coupling or direct coupling or communication connection between the modules shown or discussed may be through some interfaces, and the indirect coupling or communication connection between the devices or units may be in an electrical, mechanical or other form.
The modules described as separate parts may or may not be physically separate, and parts displayed as modules may or may not be physical units, may be located in one place, or may be distributed on a plurality of network units. Some or all of the modules may be selected according to actual needs to achieve the purpose of the embodiment. In addition, functional units in the embodiments of the present application may be integrated into one processing unit, or each unit may exist alone physically, or two or more units may be integrated into one unit.
It should be noted that the embodiments of the present invention have been described in terms of preferred embodiments, and not by way of limitation, and that those skilled in the art can make modifications and variations of the embodiments described above without departing from the spirit of the invention.
SEQUENCE LISTING
<110> Suzhou Lasso Biochip Technology Co., Ltd.
<120> method, system, intelligent terminal and computer readable storage medium for generating tag sequence applied to biological product
<130> description
<160> 20
<170> PatentIn version 3.5
<210> 1
<211> 21
<212> DNA
<213> Artificial Sequence (Artificial Sequence)
<400> 1
agagcaagaa ccctaagtta t 21
<210> 2
<211> 21
<212> DNA
<213> Artificial Sequence (Artificial Sequence)
<400> 2
attctgtatt gcgagaggaa a 21
<210> 3
<211> 21
<212> DNA
<213> Artificial Sequence (Artificial Sequence)
<400> 3
ccctcctact atcacattat t 21
<210> 4
<211> 21
<212> DNA
<213> Artificial Sequence (Artificial Sequence)
<400> 4
aggtcgtctc attacacata a 21
<210> 5
<211> 21
<212> DNA
<213> Artificial Sequence (Artificial Sequence)
<400> 5
ccttccgatt caactctatt a 21
<210> 6
<211> 21
<212> DNA
<213> Artificial Sequence (Artificial Sequence)
<400> 6
gcttagccaa acaccaataa t 21
<210> 7
<211> 21
<212> DNA
<213> Artificial Sequence (Artificial Sequence)
<400> 7
cttcaccagt cattcacaat a 21
<210> 8
<211> 21
<212> DNA
<213> Artificial Sequence (Artificial Sequence)
<400> 8
ggtaaggttc tctgttgttt t 21
<210> 9
<211> 21
<212> DNA
<213> Artificial Sequence (Artificial Sequence)
<400> 9
acgaccctac ttcaatctta t 21
<210> 10
<211> 21
<212> DNA
<213> Artificial Sequence (Artificial Sequence)
<400> 10
agggtggaac ttatgacttt a 21
<210> 11
<211> 21
<212> DNA
<213> Artificial Sequence (Artificial Sequence)
<400> 11
ggaaacactt gatgacagta a 21
<210> 12
<211> 21
<212> DNA
<213> Artificial Sequence (Artificial Sequence)
<400> 12
ggaaatgcga atgtgttagt a 21
<210> 13
<211> 21
<212> DNA
<213> Artificial Sequence (Artificial Sequence)
<400> 13
gaataagcga caatggtgta a 21
<210> 14
<211> 21
<212> DNA
<213> Artificial Sequence (Artificial Sequence)
<400> 14
tttgtgctct tgccatttga a 21
<210> 15
<211> 21
<212> DNA
<213> Artificial Sequence (Artificial Sequence)
<400> 15
ggaccagtaa tccaacattt t 21
<210> 16
<211> 21
<212> DNA
<213> Artificial Sequence (Artificial Sequence)
<400> 16
gaaacctgga cttcatcatt t 21
<210> 17
<211> 21
<212> DNA
<213> Artificial Sequence (Artificial Sequence)
<400> 17
tattacgccc atacacacta a 21
<210> 18
<211> 21
<212> DNA
<213> Artificial Sequence (Artificial Sequence)
<400> 18
gagcaggata ctttggtttt a 21
<210> 19
<211> 21
<212> DNA
<213> Artificial Sequence (Artificial Sequence)
<400> 19
tcctttgtct gaagagagta a 21
<210> 20
<211> 21
<212> DNA
<213> Artificial Sequence (Artificial Sequence)
<400> 20
aggcgtgtca tactacttat t 21

Claims (10)

1. A method for generating a tag sequence applied to a biological product, characterized in that the tag sequence consists of a plurality of unique subsequences spliced end to end in sequence, and both the unique subsequences and the tag sequence conform to a preset principle, the generation method comprising the following steps:
step S1: setting the length of each unique subsequence to n and the length of the tag sequence to m, wherein n and m are the numbers of basic units forming the unique subsequence and the tag sequence respectively, m and n are positive integers, and m > n; and generating all unique subsequences of length n according to the preset principle to obtain a unique subsequence set;
step S2: randomly extracting a first unique subsequence from the unique subsequence set, then randomly extracting a second unique subsequence, splicing the second unique subsequence onto the first unique subsequence to obtain a spliced sequence, and checking whether the spliced sequence conforms to the preset principle; and
step S3: if yes, continuing to randomly extract subsequent unique subsequences one by one and splicing them on one by one to obtain a new spliced sequence of continuously increasing length, checking after each splice whether the new spliced sequence conforms to the preset principle and whether its length has reached m, and storing the new spliced sequence as the tag sequence when it conforms to the preset principle and has a length of m.
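Steps S1 to S3 can be illustrated with a minimal Python sketch. It is not the applicant's implementation: the names `generate_unique_subsequences`, `build_tag` and `toy_principle` are hypothetical, the preset principle is replaced by a toy rule (no run of more than 2 identical bases), subsequences are drawn with replacement, and m is assumed to be a multiple of n so that the spliced length lands exactly on m. Because claim 1 does not state what happens when a check fails, the sketch simply returns None in that case.

```python
import itertools
import random

BASES = "ACGT"  # assumed alphabet: the four DNA bases


def longest_run(seq):
    """Return the length of the longest run of identical characters."""
    longest = run = 0
    previous = None
    for ch in seq:
        run = run + 1 if ch == previous else 1
        previous = ch
        longest = max(longest, run)
    return longest


def toy_principle(seq):
    """Hypothetical stand-in for the preset principle: no run longer than 2."""
    return longest_run(seq) <= 2


def generate_unique_subsequences(n, principle):
    """Step S1: enumerate every length-n sequence that passes the principle."""
    candidates = ("".join(p) for p in itertools.product(BASES, repeat=n))
    return [s for s in candidates if principle(s)]


def build_tag(pool, n, m, principle, rng):
    """Steps S2-S3: splice random subsequences and check after every splice.

    Returns the tag sequence, or None as soon as a spliced sequence fails.
    """
    if m <= n or m % n != 0:
        raise ValueError("this sketch assumes m > n and m a multiple of n")
    spliced = rng.choice(pool)        # first unique subsequence
    while len(spliced) < m:
        spliced += rng.choice(pool)   # splice the next one onto the end
        if not principle(spliced):
            return None               # failure handling is left to the caller
    return spliced
```

For example, `build_tag(generate_unique_subsequences(3, toy_principle), 3, 9, toy_principle, random.Random(0))` either returns a 9-base tag that passes the toy rule or None.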
2. The method for generating a tag sequence applied to a biological product of claim 1, wherein step S2 further comprises: if not, deleting the second unique subsequence from the spliced sequence, and then randomly extracting another unique subsequence from the unique subsequence set as a new second unique subsequence, until the spliced sequence conforms to the preset principle.
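The retry behaviour of claim 2 might look like the following sketch. The function name `splice_with_retry` and the `max_attempts` bound are assumptions added for illustration; the claim itself repeats the extraction until the check passes and sets no limit.

```python
import random


def splice_with_retry(prefix, pool, principle, rng, max_attempts=1000):
    """Draw candidates until prefix + candidate passes the principle.

    Returns the spliced sequence, or None if max_attempts draws all fail
    (the bound is an assumption to keep the sketch from looping forever).
    """
    for _ in range(max_attempts):
        candidate = rng.choice(pool)
        spliced = prefix + candidate
        if principle(spliced):
            return spliced
        # Check failed: the candidate is discarded and another one is drawn.
    return None
```

Bounding the number of attempts is a design choice: with a strict principle and a small remaining set, no candidate may fit, and an unbounded loop would never terminate.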
3. The method of claim 1, wherein the biological product is a gene chip, a protein chip or a DNA self-assembled drug carrier; for the gene chip and the DNA self-assembled drug carrier, deoxynucleotides with different bases are the basic units constituting the unique subsequence, and for the protein chip, amino acids are the basic units constituting the unique subsequence.
4. The method of claim 3, wherein the tag sequence of the gene chip is a DNA tag sequence, the DNA tag sequence comprises at least two of the unique subsequences, and the preset principle comprises: no more than 8 consecutive identical bases, a GC content of 30-60%, a hairpin structure length of no more than 8 bases, a self-complementary segment of no more than 16 bases, and the DNA tag sequence being dissimilar to a target genome.
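Part of the preset principle of claim 4 can be expressed as a checker. The sketch below covers only three of the five conditions, and the way it measures a self-complementary segment (the longest window whose reverse complement also occurs in the sequence) is an assumption, since the claim does not define the measure. Hairpin detection and the comparison against a target genome are omitted because they need definitions and data that the claim does not give. All function names are hypothetical.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of an upper-case DNA string."""
    return seq.translate(COMPLEMENT)[::-1]


def longest_run(seq):
    """Length of the longest run of identical bases."""
    longest = run = 0
    previous = None
    for ch in seq:
        run = run + 1 if ch == previous else 1
        previous = ch
        longest = max(longest, run)
    return longest


def gc_content(seq):
    """Fraction of bases that are G or C (0.0 for an empty string)."""
    return (seq.count("G") + seq.count("C")) / len(seq) if seq else 0.0


def longest_self_complement(seq):
    """Largest k such that some length-k window has its reverse complement
    somewhere in seq. A match at length k implies one at length k - 1,
    so the scan can stop at the first k without a match."""
    best = 0
    for k in range(1, len(seq) + 1):
        windows = {seq[i:i + k] for i in range(len(seq) - k + 1)}
        if any(reverse_complement(w) in windows for w in windows):
            best = k
        else:
            break
    return best


def conforms(seq, max_run=8, gc_min=0.30, gc_max=0.60, max_self_comp=16):
    """Check run length, GC content and self-complementarity only."""
    seq = seq.upper()
    return (longest_run(seq) <= max_run
            and gc_min <= gc_content(seq) <= gc_max
            and longest_self_complement(seq) <= max_self_comp)
```

Applied to sequence 1 of the sequence listing, `conforms("agagcaagaaccctaagttat")` returns True: the longest run is 3 bases and the GC content is 8/21.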
5. The method for generating a tag sequence applied to a biological product of claim 1, wherein step S3 further comprises: deleting from the unique subsequence set every unique subsequence that is identical or complementary to any subsequence of length n in the tag sequence, so that these unique subsequences no longer take part in subsequent extraction and splicing.
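The deletion described in claim 5 amounts to sliding a window of length n over the finished tag and removing each window, together with its complement, from the set. In the sketch below, "complementary" is read as the reverse complement, which is an assumption; the names `prune_pool` and `reverse_complement` are illustrative.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of an upper-case DNA string."""
    return seq.translate(COMPLEMENT)[::-1]


def prune_pool(pool, tag, n):
    """Remove from pool (a set) every length-n window of tag and the
    reverse complement of each window. Modifies pool in place."""
    used = set()
    for i in range(len(tag) - n + 1):
        window = tag[i:i + n]
        used.add(window)
        used.add(reverse_complement(window))
    pool.difference_update(used)
```

Note that the window slides one base at a time, so subsequences spanning the junction between two spliced pieces are removed as well, not only the pieces themselves.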
6. The method of generating a tag sequence for application to a biological product of any one of claims 1 to 5, further comprising:
step S4: repeating steps S2-S3 until the unique subsequence set is exhausted or the number of generated tag sequences reaches the required number, thereby obtaining a tag sequence set.
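The outer loop of step S4 can be sketched as a small driver. The callables `build_one` and `prune` stand for steps S2-S3 and for the deletion of claim 5; their names, and the `max_failures` budget that stops the loop when no further tag can be built, are assumptions added for illustration.

```python
def generate_tag_set(pool, build_one, prune, wanted, max_failures=100):
    """Repeat tag construction until the pool is exhausted, enough tags
    exist, or too many attempts in a row have failed.

    pool      -- mutable collection of unique subsequences
    build_one -- callable(pool) returning a tag sequence or None
    prune     -- callable(pool, tag) removing used subsequences from pool
    """
    tags = []
    failures = 0
    while pool and len(tags) < wanted and failures < max_failures:
        tag = build_one(pool)
        if tag is None:
            failures += 1
            continue
        failures = 0
        tags.append(tag)
        prune(pool, tag)
    return tags
```

Passing the construction and pruning steps in as callables keeps the stopping conditions of step S4 separate from the details of the preset principle.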
7. A system for generating a tag sequence for application to a biological product, comprising:
a unique subsequence module, used for setting the length of each unique subsequence to n and the length of the tag sequence to m, wherein n and m are the numbers of basic units forming the unique subsequence and the tag sequence respectively, m and n are positive integers, and m > n, and for generating all unique subsequences of length n according to a preset principle to obtain a unique subsequence set; and
a splicing checking module, used for randomly extracting a first unique subsequence from the unique subsequence set, then randomly extracting a second unique subsequence, splicing the second unique subsequence onto the first unique subsequence to obtain a spliced sequence, and checking whether the spliced sequence conforms to the preset principle; if not, the splicing checking module is further used for deleting the second unique subsequence from the spliced sequence, and then randomly extracting another unique subsequence from the unique subsequence set as a new second unique subsequence, until the spliced sequence conforms to the preset principle; if yes, the splicing checking module is further used for continuing to randomly extract subsequent unique subsequences one by one and splicing them on one by one to obtain a new spliced sequence of continuously increasing length, checking after each splice whether the new spliced sequence conforms to the preset principle and whether its length has reached m, and storing the new spliced sequence as the tag sequence when it conforms to the preset principle and has a length of m;
wherein the unique subsequence module and the splicing checking module are connected through a data stream.
8. The system of claim 7, wherein the splicing checking module is further used for deleting from the unique subsequence set all subsequences of length n in the tag sequence and their complementary sequences, so that they no longer take part in subsequent extraction and splicing.
9. An intelligent terminal, comprising:
a memory for storing executable program code; and
a processor for reading executable program code stored in the memory to perform the method of generating a tag sequence for application to a biological product of any one of claims 1 to 6.
10. A computer-readable storage medium, having stored thereon computer program instructions, which, when executed by a processor, implement a method of generating a tag sequence for application to a biological product as claimed in any one of claims 1 to 6.

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110664094.2A CN113517026B (en) 2021-06-16 2021-06-16 Method and system for generating label sequence applied to biological product, intelligent terminal and computer readable storage medium

Publications (2)

Publication Number Publication Date
CN113517026A (en) 2021-10-19
CN113517026B (en) 2022-08-19

Family

ID=78065617

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110664094.2A Active CN113517026B (en) 2021-06-16 2021-06-16 Method and system for generating label sequence applied to biological product, intelligent terminal and computer readable storage medium

Country Status (1)

Country Link
CN (1) CN113517026B (en)

Citations (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20060281097A1 (en) * 2005-06-14 2006-12-14 Agency For Science, Technology And Research Method of processing and/or genome mapping of ditag sequences
CN102115789A (en) * 2010-12-15 2011-07-06 厦门大学 Nucleic acid label for second-generation high-flux sequencing and design method thereof
US20110281273A1 (en) * 2009-01-29 2011-11-17 Spiber Inc. Method of making dna tag
US20170247689A1 (en) * 2014-09-09 2017-08-31 Igenomx International Genomics Corporation Methods and compositions for rapid nucleic acid library preparation
US20180237950A1 (en) * 2015-02-25 2018-08-23 Jumpcode Genomics, Inc. Methods and compositions for in silico long read sequencing
CN110331187A (en) * 2019-08-12 2019-10-15 天津华大医学检验所有限公司 Combination tag, combination tag connector and its application
CN110468188A (en) * 2019-08-22 2019-11-19 广州微远基因科技有限公司 For the sequence label collection and its design method of the sequencing of two generations and application
WO2020072829A2 (en) * 2018-10-04 2020-04-09 Bluestar Genomics, Inc. Simultaneous, sequencing-based analysis of proteins, nucleosomes, and cell-free nucleic acids from a single biological sample
CN112251422A (en) * 2020-10-21 2021-01-22 华中农业大学 Transposase complex containing unique molecular tag sequence and application thereof

Also Published As

Publication number Publication date
CN113517026B (en) 2022-08-19

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant