CN113517026A

CN113517026A - Generation method and system of label sequence applied to biological product, intelligent terminal and computer readable storage medium

Info

Publication number: CN113517026A
Application number: CN202110664094.2A
Authority: CN
Inventors: 李智; 许心意; 刘超钧
Original assignee: Suzhou Lasso Biochip Technology Co ltd
Current assignee: Suzhou Lasso Biochip Technology Co ltd
Priority date: 2021-06-16
Filing date: 2021-06-16
Publication date: 2021-10-19
Anticipated expiration: 2041-06-16
Also published as: CN113517026B

Abstract

The invention provides a method and a system for generating a tag sequence applied to a biological product, together with an intelligent terminal and a computer readable storage medium. The tag sequence consists of a plurality of unique subsequences spliced end to end in order, and both the unique subsequences and the tag sequence conform to a specific preset principle. With this technical scheme, a required number of tag sequences that satisfy specific design principles can be generated in bulk more simply and rapidly and at lower cost; the resulting tag sequences have high specificity and low similarity to a target genome.

Description

Generation method and system of label sequence applied to biological product, intelligent terminal and computer readable storage medium

Technical Field

The invention relates to the technical field of biology, in particular to a generation method and application of a tag sequence of a biological product.

Background

Bio-related products such as biochips (including gene chips and protein chips) and DNA self-assembly carriers used as drug carriers each contain specific tag sequences. For example, in the decoding process during preparation of a gene chip, a DNA tag sequence hybridizes specifically with its matching decoding sequence to complete the decoding of the gene chip, that is, to obtain the correspondence between the types of probes and the positions of the holes on the gene chip; the decoded gene chip can then be used for subsequent gene detection. Tag sequences therefore play an important role in biological products such as biochips and DNA self-assembly drug carriers.

However, the tag sequences applied to biological products in the prior art have the defects of poor specificity, complicated generation operation and long time consumption.

Disclosure of Invention

To overcome the prior-art defects of poor specificity, complex generation and long time consumption in tag sequences applied to biochips, a first aspect of the present invention provides a method for generating a tag sequence applied to a biological product, wherein the tag sequence consists of a plurality of unique subsequences spliced end to end in order, and both the unique subsequences and the tag sequence conform to a preset principle, the method comprising:

step S1: setting the length of the unique subsequence to n and the length of the tag sequence to m, where n and m are the numbers of basic units forming the unique subsequence and the tag sequence respectively, m and n are positive integers, and m > n, and generating all unique subsequences of length n according to the preset principle to obtain the set of unique subsequences;

step S2: randomly extracting a first unique subsequence from the set of unique subsequences, then randomly extracting a second unique subsequence, splicing the second unique subsequence to the first unique subsequence to obtain a spliced sequence, and checking whether the spliced sequence conforms to the preset principle; and

step S3: if yes, continuing to randomly extract subsequent unique subsequences one by one and splicing them on one by one to obtain a new spliced sequence of continuously increasing length, checking each new spliced sequence for whether it conforms to the preset principle and whether its length reaches m, and storing the new spliced sequence as the tag sequence when it conforms to the preset principle and has length m.
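
Illustratively, steps S1 to S3 may be sketched in Python as follows. This is a minimal sketch and not the applicant's implementation: the names `build_pool`, `splice_one_tag` and `toy_rule`, and the `max_tries` retry cap, are assumptions introduced here; the stand-in rule checks only runs of identical units; and m is assumed to be an integer multiple of n.

```python
import itertools
import random


def max_run(seq):
    """Length of the longest run of one repeated symbol."""
    best = run = 1
    for prev, cur in zip(seq, seq[1:]):
        run = run + 1 if cur == prev else 1
        best = max(best, run)
    return best


def toy_rule(seq):
    # Stand-in for the preset principle (assumption): at most 3 identical units in a row.
    return max_run(seq) <= 3


def build_pool(n, alphabet="ACGT", rule=toy_rule):
    """Step S1: every length-n string over the alphabet that passes the rule."""
    candidates = ("".join(p) for p in itertools.product(alphabet, repeat=n))
    return [s for s in candidates if rule(s)]


def splice_one_tag(pool, m, rule=toy_rule, rng=random, max_tries=1000):
    """Steps S2-S3: grow a spliced sequence one unique subsequence at a time."""
    seq = rng.choice(pool)                 # first unique subsequence
    tries = 0
    while len(seq) < m:
        trial = seq + rng.choice(pool)     # splice the next one end to end
        tries += 1
        if rule(trial):
            seq = trial                    # keep it and continue growing
        elif tries >= max_tries:
            return None                    # hypothetical safeguard against endless retries
        # otherwise the rejected subsequence is dropped and another is drawn
    return seq


pool = build_pool(3)
print(splice_one_tag(pool, 9, rng=random.Random(0)))
```

The `rule` argument is the place where a fuller preset principle would be plugged in; a rejected draw is simply not appended, which corresponds to deleting it from the spliced sequence and drawing again.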

Optionally, m is an integer multiple of n. Optionally, m is not an integer multiple of n; for example, if m is 8 and n is 3, splicing stops when the length reaches 9, the last basic unit is cut off, and the remaining sequence of length 8 is checked against the preset principle and, if it conforms, stored as a tag sequence.
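
The arithmetic of the non-multiple case can be illustrated with a short Python sketch. The function names are assumptions introduced here, and the three 3-unit blocks are arbitrary placeholders rather than sequences from the application.

```python
def pieces_needed(m, n):
    """Number of length-n subsequences to splice before the length reaches at least m."""
    return (m + n - 1) // n


def trim_to_length(spliced, m):
    """Cut surplus basic units off the end so that exactly m remain."""
    return spliced[:m]


k = pieces_needed(8, 3)
spliced = "ACG" + "TCA" + "GTA"      # placeholder blocks, 9 units in total
tag = trim_to_length(spliced, 8)
print(k, len(spliced), tag)          # 3 9 ACGTCAGT
```

As the text notes, the trimmed sequence still has to be checked against the preset principle before it is stored.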

Illustratively, 2 ≤ m/n ≤ 5. Illustratively, the length n of the unique subsequence is 7 bases and the length m of the DNA tag sequence is 21 bases. Preferably, the nucleotide sequences of the DNA tag sequences are shown in SEQ ID NO.1 to SEQ ID NO.20:

SEQ ID NO.1: AGAGCAAGAACCCTAAGTTAT;

SEQ ID NO.2: ATTCTGTATTGCGAGAGGAAA;

SEQ ID NO.3: CCCTCCTACTATCACATTATT;

SEQ ID NO.4: AGGTCGTCTCATTACACATAA;

SEQ ID NO.5: CCTTCCGATTCAACTCTATTA;

SEQ ID NO.6: GCTTAGCCAAACACCAATAAT;

SEQ ID NO.7: CTTCACCAGTCATTCACAATA;

SEQ ID NO.8: GGTAAGGTTCTCTGTTGTTTT;

SEQ ID NO.9: ACGACCCTACTTCAATCTTAT;

SEQ ID NO.10: AGGGTGGAACTTATGACTTTA;

SEQ ID NO.11: GGAAACACTTGATGACAGTAA;

SEQ ID NO.12: GGAAATGCGAATGTGTTAGTA;

SEQ ID NO.13: GAATAAGCGACAATGGTGTAA;

SEQ ID NO.14: TTTGTGCTCTTGCCATTTGAA;

SEQ ID NO.15: GGACCAGTAATCCAACATTTT;

SEQ ID NO.16: GAAACCTGGACTTCATCATTT;

SEQ ID NO.17: TATTACGCCCATACACACTAA;

SEQ ID NO.18: GAGCAGGATACTTTGGTTTTA;

SEQ ID NO.19: TCCTTTGTCTGAAGAGAGTAA;

SEQ ID NO.20: AGGCGTGTCATACTACTTATT.
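
As an illustrative check of the example lengths, the following Python sketch splits three of the listed tag sequences into 7-base blocks and reports their GC content. The helper names are assumptions introduced here; the sequences are copied from SEQ ID NO.1 to SEQ ID NO.3.

```python
TAGS = {
    "SEQ ID NO.1": "AGAGCAAGAACCCTAAGTTAT",
    "SEQ ID NO.2": "ATTCTGTATTGCGAGAGGAAA",
    "SEQ ID NO.3": "CCCTCCTACTATCACATTATT",
}
N = 7  # length of each unique subsequence in the example


def split_into_subsequences(tag, n):
    """Cut a tag into consecutive blocks of n bases."""
    return [tag[i:i + n] for i in range(0, len(tag), n)]


def gc_fraction(seq):
    """Fraction of bases that are G or C."""
    return sum(base in "GC" for base in seq) / len(seq)


for name, tag in TAGS.items():
    print(name, len(tag), split_into_subsequences(tag, N), round(gc_fraction(tag), 3))
```

Each of the three tags is 21 bases long, splits into three 7-base blocks, and has a GC content inside the 30% to 60% window.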

Further, step S2 further includes: if not, deleting the second unique subsequence from the spliced sequence, and then randomly extracting another unique subsequence from the set of unique subsequences to serve as a new second unique subsequence.

Further, the biological product is a gene chip, a protein chip or a DNA self-assembly drug carrier; the basic units forming the unique subsequence in the gene chip and the DNA self-assembly drug carrier are deoxynucleotides with different bases, and the basic units forming the unique subsequence in the protein chip are amino acids. Application 1: gene chip. Tag sequences and probes are designed together, with each tag sequence corresponding to one probe, so the type of a probe can be identified by identifying its tag sequence. Application 2: protein chip. If a protein probe and a tag sequence are both coupled to a microsphere, the probes correspond one to one with the tag sequences, and the type of protein probe coupled to a microsphere can be determined by identifying the tag sequence. Application 3: DNA self-assembly. For example, some DNA self-assembled scaffolds need to bind polypeptide molecules; tag sequences can be designed into the scaffold, and polypeptide molecules coupled to sequences complementary to the tag sequences are then added, so that these polypeptide molecules are assembled at specific positions on the scaffold by recognizing the tag sequences.

Further, the tag sequence of the gene chip is a DNA tag sequence comprising at least two unique subsequences, and the preset principle includes: no more than 8 consecutive identical bases, a GC content of 30% to 60%, a hairpin structure no more than 8 bases in length, a self-complementary segment of no more than 16 bases, and the DNA tag sequence being dissimilar to the target genome. "Dissimilar" means that a BLAST search of the tag sequence against the target genome finds no match with an E value below 0.05. Illustratively, the target genome is the human genome, the mouse genome or the like.

Further, in the preset principle, there are no more than 3 consecutive identical bases, the hairpin structure is no more than 3 bases long, and the self-complementary segment is no more than 6 bases.
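
Illustratively, the sequence-level part of the preset principle may be sketched in Python as follows. This is a minimal sketch under stated assumptions: the text does not define how hairpin length or the self-complementary segment is measured, so here the hairpin is approximated by the longest segment whose reverse complement occurs downstream after a loop of at least `min_loop` bases, and the self-complementary segment by the longest segment equal to its own reverse complement. All function and parameter names are introduced here, and the BLAST dissimilarity check is left out.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    return seq.translate(COMPLEMENT)[::-1]


def longest_run(seq):
    """Longest run of consecutive identical bases."""
    best = run = 1
    for prev, cur in zip(seq, seq[1:]):
        run = run + 1 if cur == prev else 1
        best = max(best, run)
    return best


def gc_fraction(seq):
    return sum(base in "GC" for base in seq) / len(seq)


def longest_hairpin_stem(seq, min_loop=3):
    """Assumed proxy: longest segment whose reverse complement lies downstream of a loop."""
    best = 0
    for i in range(len(seq)):
        for j in range(i + 1, len(seq) + 1):
            if j - i > best and seq.find(reverse_complement(seq[i:j]), j + min_loop) != -1:
                best = j - i
    return best


def longest_self_complementary(seq):
    """Assumed proxy: longest segment equal to its own reverse complement (even lengths only)."""
    best = 0
    for i in range(len(seq)):
        for j in range(i + 2, len(seq) + 1, 2):
            if j - i > best and seq[i:j] == reverse_complement(seq[i:j]):
                best = j - i
    return best


def meets_dna_rule(seq, max_run=8, gc_range=(0.30, 0.60), max_hairpin=8, max_self_comp=16):
    return (longest_run(seq) <= max_run
            and gc_range[0] <= gc_fraction(seq) <= gc_range[1]
            and longest_hairpin_stem(seq) <= max_hairpin
            and longest_self_complementary(seq) <= max_self_comp)


print(meets_dna_rule("AGAGCAAGAACCCTAAGTTAT"))  # SEQ ID NO.1 under the looser limits
```

The stricter variant in the text corresponds to calling `meets_dna_rule(seq, max_run=3, max_hairpin=3, max_self_comp=6)` under the same assumed definitions.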

Further, step S3 further includes: deleting from the set of unique subsequences the unique subsequences that are identical or complementary to any of the subsequences of length n in the tag sequence, so that they no longer take part in subsequent extraction and splicing. The subsequences of length n referred to here are all subsequences of length n in the tag sequence: not only the unique subsequences that were spliced to form the tag sequence, but also the other subsequences of length n that the tag sequence contains.
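
Illustratively, this deletion may be sketched in Python as follows. The names `windows` and `prune_pool` are assumptions introduced here, and "complementary" is read as the reverse complement, which is an interpretation rather than something the text states. A tag of length m contains m - n + 1 subsequences of length n (15 when m is 21 and n is 7).

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    return seq.translate(COMPLEMENT)[::-1]


def windows(tag, n):
    """All length-n subsequences of the tag, at every offset, not only the spliced blocks."""
    return {tag[i:i + n] for i in range(len(tag) - n + 1)}


def prune_pool(pool, tag, n):
    """Return the pool without any window of the tag or its (reverse) complement."""
    used = windows(tag, n)
    used |= {reverse_complement(w) for w in used}
    return pool - used


pool = {"ACG", "AAC", "GGG", "TTG", "CAA", "TTT"}   # toy pool with n = 3
print(sorted(prune_pool(pool, "ACGTTG", 3)))
```

In this toy case the tag `ACGTTG` removes its own windows `ACG` and `TTG` as well as `AAC` and `CAA`, which are reverse complements of the windows `GTT` and `TTG`.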

The method for generating the tag sequence applied to the biological product further comprises step S4: repeating steps S2 to S3 until the set of unique subsequences is exhausted or the number of generated tag sequences reaches the required number, so as to obtain the set of tag sequences. The set of tag sequences generated by this method may satisfy the following requirement: any unique subsequence of length n of any tag sequence, and its complementary sequence, appears only once in the set of tag sequences, so that as many tag sequences as possible are generated while their specificity is ensured to the greatest extent.
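
Whether a given set of tag sequences satisfies this requirement can be checked after the fact. The following Python sketch is illustrative only: `find_conflicts` is a name introduced here, "complementary" is again read as the reverse complement, and the toy tags are not sequences from the application.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    return seq.translate(COMPLEMENT)[::-1]


def find_conflicts(tags, n):
    """List windows that repeat an earlier window or its reverse complement.

    Each conflict is (window, first_position, repeat_position), where a
    position is a (tag index, offset) pair.
    """
    seen = {}
    conflicts = []
    for t, tag in enumerate(tags):
        for i in range(len(tag) - n + 1):
            window = tag[i:i + n]
            earlier = seen.get(window)
            if earlier is None:
                earlier = seen.get(reverse_complement(window))
            if earlier is not None:
                conflicts.append((window, earlier, (t, i)))
            else:
                seen[window] = (t, i)
    return conflicts


print(find_conflicts(["AAACCC", "AAGGAG"], 3))   # no shared windows
print(find_conflicts(["AAACCC", "TGGGTA"], 3))   # GGG and GGT clash with CCC and ACC
```

An empty result means every length-n window and its reverse complement occurs only once across the set.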

A second aspect of the present invention provides a system for generating a tag sequence for application to a biological product, comprising:

the unique subsequence module, used for setting the length of the unique subsequence to n and the length of the tag sequence to m, where n and m are the numbers of basic units forming the unique subsequence and the tag sequence respectively, m and n are positive integers, and m > n, and for generating all unique subsequences of length n according to a preset principle to obtain the set of unique subsequences; and

the splicing checking module, used for randomly extracting a first unique subsequence from the set of unique subsequences, then randomly extracting a second unique subsequence, splicing the second unique subsequence to the first unique subsequence to obtain a spliced sequence, and checking whether the spliced sequence conforms to the preset principle; if not, the splicing checking module is further configured to delete the second unique subsequence from the spliced sequence, and then randomly extract another unique subsequence from the set of unique subsequences to serve as a new second unique subsequence, until the spliced sequence conforms to the preset principle; if yes, the splicing checking module continues to randomly extract subsequent unique subsequences one by one and splice them on one by one to obtain a new spliced sequence of continuously increasing length, checks each new spliced sequence for whether it conforms to the preset principle and whether its length reaches m, and stores the new spliced sequence as the tag sequence when it conforms to the preset principle and has length m;

wherein the unique subsequence module and the splicing checking module are connected through a data stream.

Further, the splicing checking module is further configured to delete from the set of unique subsequences the unique subsequences that are identical or complementary to any of the subsequences of length n in the tag sequence, so that they no longer take part in subsequent extraction and splicing. Preferably, the splicing checking module is further configured to repeat the splicing and checking steps until the set of unique subsequences is used up or the number of generated tag sequences reaches the required number, so as to obtain the set of tag sequences. The set of tag sequences generated in this way may satisfy the following requirement: any unique subsequence of length n of any tag sequence, and its complementary sequence, appears only once in the set of tag sequences, so that as many tag sequences as possible are generated while their specificity is ensured to the greatest extent.

Illustratively, the DNA tag sequence comprises at least two of the unique subsequences, and the preset principle includes: no more than 8 consecutive identical bases, a GC content of 30% to 60%, a hairpin structure no more than 8 bases in length, a self-complementary segment of no more than 16 bases, and the DNA tag sequence being dissimilar to the target genome. "Dissimilar" means that a BLAST search of the tag sequence against the target genome finds no match with an E value below 0.05. Illustratively, the target genome is the human genome, the mouse genome or the like.
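
Illustratively, the E value criterion may be applied to BLAST results as in the following Python sketch. The text does not say how BLAST is run; the sketch assumes the search has already been performed and its results saved in the BLAST+ tabular format (`-outfmt 6`), whose default columns put the query identifier first and the E value eleventh. The function name and the two sample result lines are made up for illustration.

```python
def similar_tags(blast_tabular_lines, evalue_cutoff=0.05):
    """Return the query identifiers that have at least one hit below the cutoff."""
    flagged = set()
    for line in blast_tabular_lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue                       # skip blank and comment lines
        fields = line.split("\t")
        query_id = fields[0]
        evalue = float(fields[10])         # column 11 in the default tabular layout
        if evalue < evalue_cutoff:
            flagged.add(query_id)
    return flagged


# Made-up sample rows: tag1 has a significant hit, tag2 does not.
sample = [
    "tag1\tchr1\t100.000\t21\t0\t0\t1\t21\t1000\t1020\t0.002\t42.1",
    "tag2\tchr5\t93.750\t16\t1\t0\t3\t18\t500\t515\t3.4\t24.3",
]
print(similar_tags(sample))
```

A tag whose identifier is returned would be treated as similar to the target genome and rejected; tags that are absent from the result count as dissimilar.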

A third aspect of the present invention provides an intelligent terminal, including:

a memory for storing executable program code; and

a processor for reading the executable program code stored in the memory to perform the above-described method of generating a tag sequence for application to a biological product. The intelligent terminal includes but is not limited to a PC, a portable computer, a mobile terminal and other devices having display and processing functions.

A fourth aspect of the invention provides a computer readable storage medium having stored thereon computer program instructions which, when executed by a processor, carry out the steps of the above-described method of generating a tag sequence for application to a biological product. The computer readable storage medium includes, but is not limited to, various media capable of storing program code, such as a USB flash disk, a removable hard disk, a read-only memory (ROM), a random access memory (RAM), a magnetic disk, or an optical disk.

After the technical scheme is adopted, compared with the prior art, the method has the following beneficial effects:

1. The generation method of the tag sequence can be implemented directly on the intelligent terminal through the generation system, and can generate a specific number of tag sequences conforming to specific design principles more simply, more rapidly and at lower cost; and because the number of consecutive identical bases is limited to no more than 3 to 8, the tag sequences are easy to synthesize, with a synthesis success rate of 100%.

2. The set of tag sequences generated by the technical scheme of the application can satisfy the following requirement: any unique subsequence of length n of any tag sequence, and its complementary sequence, appears only once in the set, so that as many tag sequences as possible are generated while their specificity is ensured to the greatest extent and their similarity to the target genome is low. The technical scheme of the application is particularly suitable for rapidly generating a large number of tag sequences meeting specific conditions.

Drawings

Fig. 1 is a block diagram of a system for generating a tag sequence applied to a biological product according to an embodiment of the present application.

Detailed Description

The advantages of the invention are further illustrated in the following description of specific embodiments in conjunction with the accompanying drawings. It is to be understood by persons skilled in the art that the following detailed description is illustrative and not restrictive, and is not to be taken as limiting the scope of the invention.

In the description of the present invention, unless otherwise specified and limited, it is to be noted that the terms "mounted," "coupled," and "connected" are to be interpreted broadly, and may refer, for example, to a mechanical connection or an electrical connection, a communication between two elements, a direct connection, or an indirect connection via an intermediate medium; the specific meanings of these terms may be understood by those skilled in the art according to the specific situation.

In the following description, suffixes such as "module" used to represent elements are used only for facilitating the explanation of the present invention and have no specific meaning in itself.

Unless defined otherwise, all technical and scientific terms used herein have the same meaning as commonly understood by one of ordinary skill in the art to which this application belongs. The terminology used herein is for the purpose of describing embodiments of the present application only and is not intended to be limiting of the application.

Examples

Illustratively, 20 DNA tag sequences conforming to the preset rule and comprising 3 unique subsequences are generated by using the generation system and the generation method of the tag sequences applied to the biological products, the length n of the unique subsequences is 7, the length m of the DNA tag sequences is 21, and the DNA tag sequences are applied to the gene chip.

The tag sequence consists of three unique subsequences which are sequentially spliced end to end, and both the unique subsequences and the tag sequence accord with the following preset principles: no more than 8 contiguous identical bases, a GC content of 30% to 60%, no more than 8 bases in hairpin structure length, no more than 16 bases in self-complementary segment, and the DNA tag sequence is dissimilar to the human genome.

The module structure of the generation method and generation system of the tag sequence applied to the biological product of the present application is shown in fig. 1.

Specifically, the tag sequence applied to the gene chip is generated on a computer by a DNA tag sequence generation system comprising a unique subsequence module and a splicing checking module, which are connected through a data stream. The method for generating the DNA tag sequence comprises the following steps:

Step 1: generating a set of unique subsequences: a set of unique subsequences, each of length n, is generated according to the preset principle. Illustratively, the unique subsequence module sets the length n of the unique subsequence to 7 bases and the length m of the tag sequence to 21 bases, and generates all unique subsequences of 7 bases according to the preset principle to obtain the set of unique subsequences;

Step 2: randomly extracting unique subsequences: the splicing checking module randomly extracts a first unique subsequence from the set of unique subsequences, and then randomly extracts a second unique subsequence;

Step 3: generating a spliced sequence: one end of the second unique subsequence of 7 bases is spliced to one end of the first unique subsequence of 7 bases, that is, the two are joined in series end to end, to obtain a spliced sequence (at this point the spliced sequence is 14 bases long);

Step 4: checking whether the spliced sequence conforms to the preset principle: the spliced sequence obtained in step 3 is checked against the preset principle; if it does not conform, the second unique subsequence is deleted from the spliced sequence and steps 2 to 3 are repeated; if it conforms, the method proceeds to step 5;

Step 5: checking whether the spliced sequence reaches length m: the length of the spliced sequence is further checked; if it has not reached m, steps 2 to 4 are repeated; if it has, the method proceeds to step 6;

In other words, each time a new spliced sequence is obtained, two things must be checked: whether the new spliced sequence conforms to the preset principle, and whether its length has reached m (for example, 21 bases). If the spliced sequence conforms to the preset principle and its length has reached m, a tag sequence meeting the requirements has been obtained; if it conforms to the preset principle but its length has not reached m, the steps of random extraction, splicing and checking continue.

Step 6: saving the tag sequence: the spliced sequence that conforms to the preset principle and has length m is stored as a tag sequence;

Step 7: deleting identical and complementary unique subsequences: the unique subsequences that are identical or complementary to any of the subsequences of 7 bases in the tag sequence obtained in step 6 are deleted from the set of unique subsequences, so that they no longer take part in subsequent extraction and splicing;

Step 8: generating a set of tag sequences: steps 2 to 7 are repeated to keep obtaining new tag sequences that conform to the preset principle and have length m;

Step 9: terminating: the whole flow terminates when the number of generated tag sequences meets the requirement or the available unique subsequences in the set of unique subsequences have been exhausted.
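
Illustratively, steps 1 to 9 may be sketched end to end in Python as follows, with n = 7 and m = 21. This is a minimal sketch, not the applicant's program: the preset principle is reduced to the run-length and GC-content checks (the hairpin, self-complementarity and BLAST checks are omitted), "complementary" is read as the reverse complement, and the names `passes_rule` and `generate_tags`, the fixed `seed` and the `max_tries` cap are assumptions introduced here.

```python
import itertools
import random

COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    return seq.translate(COMPLEMENT)[::-1]


def passes_rule(seq, max_run=8, gc_min=0.30, gc_max=0.60):
    """Simplified preset principle: run length and GC content only."""
    run = longest = 1
    for prev, cur in zip(seq, seq[1:]):
        run = run + 1 if cur == prev else 1
        longest = max(longest, run)
    gc = sum(base in "GC" for base in seq) / len(seq)
    return longest <= max_run and gc_min <= gc <= gc_max


def generate_tags(n=7, m=21, wanted=20, seed=0, max_tries=200):
    rng = random.Random(seed)
    # Step 1: all length-n subsequences that pass the rule.
    pool = {"".join(p) for p in itertools.product("ACGT", repeat=n)}
    pool = {s for s in pool if passes_rule(s)}
    tags = []
    while len(tags) < wanted and pool:           # step 9: stop when done or exhausted
        candidates = sorted(pool)                # sorted so a given seed is reproducible
        seq = rng.choice(candidates)             # step 2: first unique subsequence
        tries = 0
        while len(seq) < m and tries < max_tries:
            trial = seq + rng.choice(candidates)  # steps 2-3: draw and splice
            tries += 1
            if passes_rule(trial):               # step 4: keep only conforming splices
                seq = trial                      # step 5: loop re-checks the length
        if len(seq) != m:
            break                                # hypothetical safeguard, not in the text
        tags.append(seq)                         # step 6: save the tag sequence
        # Step 7: remove every length-n window of the tag and its reverse complement.
        used = {seq[i:i + n] for i in range(m - n + 1)}
        used |= {reverse_complement(w) for w in used}
        pool -= used
    return tags


for tag in generate_tags():
    print(tag)
```

Because only the spliced blocks are drawn from the pool, this sketch does not by itself verify the windows that span block junctions against earlier tags; a separate check over the finished set would be needed to confirm the uniqueness requirement.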

Illustratively, the nucleotide sequences of the generated 20 DNA tag sequences which conform to the preset principle and have the length of 21 are shown in SEQ ID NO. 1-SEQ ID NO. 20:

SEQ ID NO.1: AGAGCAAGAACCCTAAGTTAT;

SEQ ID NO.2: ATTCTGTATTGCGAGAGGAAA;

SEQ ID NO.3: CCCTCCTACTATCACATTATT;

SEQ ID NO.4: AGGTCGTCTCATTACACATAA;

SEQ ID NO.5: CCTTCCGATTCAACTCTATTA;

SEQ ID NO.6: GCTTAGCCAAACACCAATAAT;

SEQ ID NO.7: CTTCACCAGTCATTCACAATA;

SEQ ID NO.8: GGTAAGGTTCTCTGTTGTTTT;

SEQ ID NO.9: ACGACCCTACTTCAATCTTAT;

SEQ ID NO.10: AGGGTGGAACTTATGACTTTA;

SEQ ID NO.11: GGAAACACTTGATGACAGTAA;

SEQ ID NO.12: GGAAATGCGAATGTGTTAGTA;

SEQ ID NO.13: GAATAAGCGACAATGGTGTAA;

SEQ ID NO.14: TTTGTGCTCTTGCCATTTGAA;

SEQ ID NO.15: GGACCAGTAATCCAACATTTT;

SEQ ID NO.16: GAAACCTGGACTTCATCATTT;

SEQ ID NO.17: TATTACGCCCATACACACTAA;

SEQ ID NO.18: GAGCAGGATACTTTGGTTTTA;

SEQ ID NO.19: TCCTTTGTCTGAAGAGAGTAA;

SEQ ID NO.20: AGGCGTGTCATACTACTTATT.

Generating these 20 DNA tag sequences by running steps 1 to 7 on a computer or server with an 8-core processor and 16G of memory took about 5 seconds in total.

Therefore, the generation method of the tag sequence can be implemented directly on the intelligent terminal through the generation system, and a specific number of tag sequences conforming to specific design principles can be generated more simply, more rapidly and at lower cost, with high specificity and low similarity to the target genome; and because the number of consecutive identical bases is limited to no more than 3 to 8, the tag sequences are easier to synthesize, with a synthesis success rate of 100%.

The technical solution of the present application is particularly suitable for rapidly generating a large number of tag sequences that satisfy specific conditions. Because of space limitations, the specific sequences of all generated tag sequences cannot be listed here; instead, the time cost and success rate of generating tag sequences that conform to the preset principle of the present application and meet different quantity requirements, obtained with the generation system and generation method of the present application, are summarized by way of example in Table 1. The required length of the unique subsequence, the required length of the tag sequence and the required number of tag sequences are set according to actual needs and are not particularly limited by the present application.

TABLE 1 time cost and success rate for obtaining tag sequences that meet the pre-set rules and meet different number requirements

It should be noted that the total consumed time data of embodiments 1 to 10 in table 1 are data when running on a computer with 8 cores (2.7GHz) and 16G memory, and if the hardware configuration of the computer is higher, the total consumed time is further shortened.

In addition, the technical scheme of the application can also be applied to the tag sequence of the protein chip. If protein probes are coupled on the microspheres and tag sequences are coupled on the microspheres, the probes correspond to the tag sequences one by one. The type of protein probe coupled to the microsphere can be known by identifying the tag sequence.

In addition, the technical scheme of the application can also be applied to DNA self-assembly. For example, some DNA self-assembled scaffolds need to bind polypeptide molecules, and some tag sequences can be designed on the scaffold. Polypeptide molecules coupled with complementary sequences of the tag sequences are then added, and these polypeptide molecules can be assembled to specific positions of the scaffold by means of the recognition tag sequences.

It will be appreciated by those skilled in the art that embodiments of the invention may be provided as a computer program product, a system, a smart terminal, or a computer-readable storage medium. Accordingly, the present invention may take the form of an entirely software embodiment, or an embodiment combining software and hardware aspects. Furthermore, the present invention may take the form of a computer program product embodied on one or more computer-readable storage media (including, but not limited to, disk storage, CD-ROM, optical storage, etc.) having computer-executable program code embodied therein, the computer software product being stored on one of the storage media and comprising instructions for causing a computer device (which may be a personal computer, a server, a network appliance, etc.) or a processor to perform all or part of the steps of the method described herein. The aforementioned storage medium includes various media capable of storing program code, such as a USB flash disk, a removable hard disk, a read-only memory (ROM), a random access memory (RAM), a magnetic disk, or an optical disk.

These computer program instructions may be provided to a processor of a general purpose computer, special purpose computer, embedded processor, or other programmable data processing apparatus to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing apparatus, create means for implementing all or part of the steps of the method for generating a tag sequence for application to a biological article of manufacture of the present application.

These computer program instructions may also be stored in a computer-readable memory that can direct a computer or other programmable data processing apparatus to function in a particular manner, such that the instructions stored in the computer-readable memory produce an article of manufacture including instruction means which implement the function of all or some of the steps of the method for generating a tag sequence for application to a biological article of manufacture of the present application.

These computer program instructions may also be loaded onto a computer or other programmable data processing apparatus to cause a series of operational steps to be performed on the computer or other programmable apparatus to produce a computer-implemented process such that the instructions that execute on the computer or other programmable apparatus provide functions for implementing all or part of the steps in the method for generating a tag sequence for application to a biological article of the present application.

The above-described embodiment of the generating system is merely illustrative, for example, the division of the modules is only one logical functional division, and other divisions may be realized in practice, for example, a plurality of units or components may be combined or integrated into another system, or some features may be omitted or not executed. In addition, the coupling or direct coupling or communication connection between the modules shown or discussed may be through some interfaces, and the indirect coupling or communication connection between the devices or units may be in an electrical, mechanical or other form.

The modules described as separate parts may or may not be physically separate, and parts displayed as modules may or may not be physical units, may be located in one place, or may be distributed on a plurality of network units. Some or all of the modules may be selected according to actual needs to achieve the purpose of the embodiment. In addition, functional units in the embodiments of the present application may be integrated into one processing unit, or each unit may exist alone physically, or two or more units may be integrated into one unit.

It should be noted that the embodiments of the present invention have been described in terms of preferred embodiments, and not by way of limitation, and that those skilled in the art can make modifications and variations of the embodiments described above without departing from the spirit of the invention.

SEQUENCE LISTING

<110> Suzhou Lasso Biochip Technology Co., Ltd

<120> method, system, intelligent terminal and computer readable storage medium for generating tag sequence applied to biological product

<130> description

<160> 20

<170> PatentIn version 3.5

<210> 1

<211> 21

<212> DNA

<213> Artificial Sequence (Artificial Sequence)

<400> 1

agagcaagaa ccctaagtta t 21

<210> 2

<211> 21

<212> DNA

<213> Artificial Sequence (Artificial Sequence)

<400> 2

attctgtatt gcgagaggaa a 21

<210> 3

<211> 21

<212> DNA

<213> Artificial Sequence (Artificial Sequence)

<400> 3

ccctcctact atcacattat t 21

<210> 4

<211> 21

<212> DNA

<213> Artificial Sequence (Artificial Sequence)

<400> 4

aggtcgtctc attacacata a 21

<210> 5

<211> 21

<212> DNA

<213> Artificial Sequence (Artificial Sequence)

<400> 5

ccttccgatt caactctatt a 21

<210> 6

<211> 21

<212> DNA

<213> Artificial Sequence (Artificial Sequence)

<400> 6

gcttagccaa acaccaataa t 21

<210> 7

<211> 21

<212> DNA

<213> Artificial Sequence (Artificial Sequence)

<400> 7

cttcaccagt cattcacaat a 21

<210> 8

<211> 21

<212> DNA

<213> Artificial Sequence (Artificial Sequence)

<400> 8

ggtaaggttc tctgttgttt t 21

<210> 9

<211> 21

<212> DNA

<213> Artificial Sequence (Artificial Sequence)

<400> 9

acgaccctac ttcaatctta t 21

<210> 10

<211> 21

<212> DNA

<213> Artificial Sequence (Artificial Sequence)

<400> 10

agggtggaac ttatgacttt a 21

<210> 11

<211> 21

<212> DNA

<213> Artificial Sequence (Artificial Sequence)

<400> 11

ggaaacactt gatgacagta a 21

<210> 12

<211> 21

<212> DNA

<213> Artificial Sequence (Artificial Sequence)

<400> 12

ggaaatgcga atgtgttagt a 21

<210> 13

<211> 21

<212> DNA

<213> Artificial Sequence (Artificial Sequence)

<400> 13

gaataagcga caatggtgta a 21

<210> 14

<211> 21

<212> DNA

<213> Artificial Sequence (Artificial Sequence)

<400> 14

tttgtgctct tgccatttga a 21

<210> 15

<211> 21

<212> DNA

<213> Artificial Sequence (Artificial Sequence)

<400> 15

ggaccagtaa tccaacattt t 21

<210> 16

<211> 21

<212> DNA

<213> Artificial Sequence (Artificial Sequence)

<400> 16

gaaacctgga cttcatcatt t 21

<210> 17

<211> 21

<212> DNA

<213> Artificial Sequence (Artificial Sequence)

<400> 17

tattacgccc atacacacta a 21

<210> 18

<211> 21

<212> DNA

<213> Artificial Sequence (Artificial Sequence)

<400> 18

gagcaggata ctttggtttt a 21

<210> 19

<211> 21

<212> DNA

<213> Artificial Sequence (Artificial Sequence)

<400> 19

tcctttgtct gaagagagta a 21

<210> 20

<211> 21

<212> DNA

<213> Artificial Sequence (Artificial Sequence)

<400> 20

aggcgtgtca tactacttat t 21
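
For readers who want to work with the listing programmatically, the following Python sketch shows one way to recover the sequences from entries laid out as above. It is illustrative only: `parse_sequence_listing` is a name introduced here, and the parser assumes the residue lines follow the `<400>` line and that spaces and the trailing length count can be discarded.

```python
import re


def parse_sequence_listing(text):
    """Map each <210> identifier to its sequence in upper case."""
    sequences = {}
    seq_id = None
    expecting_residues = False
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue
        field = re.match(r"<(\d+)>\s*(.*)", line)
        if field:
            code, value = field.group(1), field.group(2)
            if code == "210":
                seq_id = int(value)              # sequence identifier
            expecting_residues = (code == "400")  # residues follow the <400> line
            continue
        if expecting_residues and seq_id is not None:
            residues = re.sub(r"[^a-z]", "", line.lower())  # drop spaces and digits
            sequences[seq_id] = sequences.get(seq_id, "") + residues.upper()
    return sequences


listing = """
<210> 1
<211> 21
<212> DNA
<213> Artificial Sequence (Artificial Sequence)
<400> 1
agagcaagaa ccctaagtta t 21

<210> 2
<211> 21
<212> DNA
<213> Artificial Sequence (Artificial Sequence)
<400> 2
attctgtatt gcgagaggaa a 21
"""
print(parse_sequence_listing(listing))
```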

Claims

1. A generation method of a tag sequence applied to a biological product is characterized in that the tag sequence consists of a plurality of unique subsequences spliced end to end in sequence, and the unique subsequences and the tag sequence both accord with a preset principle, and the generation method comprises the following steps:

step S1: setting the length of the unique subsequence to n and the length of the tag sequence to m, where n and m are the numbers of basic units forming the unique subsequence and the tag sequence respectively, m and n are positive integers, and m > n, and generating all unique subsequences of length n according to the preset principle to obtain the set of unique subsequences;

step S2: randomly extracting a first unique subsequence from the set of unique subsequences, then randomly extracting a second unique subsequence, splicing the second unique subsequence to the first unique subsequence to obtain a spliced sequence, and checking whether the spliced sequence meets the preset principle; and

step S3: if yes, continuing to randomly extract subsequent unique subsequences one by one and splicing them on one by one to obtain a new spliced sequence of continuously increasing length, checking each new spliced sequence for whether it conforms to the preset principle and whether its length reaches m, and storing the new spliced sequence as the tag sequence when it conforms to the preset principle and has length m.

2. The method of generating a tag sequence for application to a biological product of claim 1, wherein step S2 further comprises the step of: if the spliced sequence does not meet the preset principle, deleting the second unique subsequence from the spliced sequence, and then randomly extracting another unique subsequence from the set of unique subsequences to serve as a new second unique subsequence, until the spliced sequence meets the preset principle.

3. The method of claim 1, wherein the biological product is a gene chip, a protein chip or a DNA self-assembly drug carrier; for the gene chip and the DNA self-assembly drug carrier, deoxynucleotides with different bases are the basic units constituting the unique subsequence, and for the protein chip, amino acids are the basic units constituting the unique subsequence.

4. The method of claim 3, wherein the tag sequence of the gene chip is a DNA tag sequence, the DNA tag sequence comprises at least two of the unique subsequences, and the preset principle comprises: no more than 8 consecutive identical bases, a GC content of 30-60%, a hairpin structure length of no more than 8 bases, a self-complementary segment of no more than 16 bases, and the DNA tag sequence being dissimilar to the target genome.

5. The method of generating a tag sequence for application to a biological product of claim 1, wherein step S3 further comprises the step of: deleting from the set of unique subsequences the unique subsequences that are identical to or complementary to any of the n-length subsequences in the tag sequence, so that they are no longer involved in subsequent extraction and splicing.

6. The method of generating a tag sequence for application to a biological product of any one of claims 1 to 5, further comprising:

step S4: repeating steps S2-S3 until the set of unique subsequences is exhausted or the number of generated tag sequences reaches the required number, thereby obtaining a set of tag sequences.

7. A system for generating a tag sequence for application to a biological product, comprising:

the splicing checking module is used for randomly extracting a first unique subsequence from the set of unique subsequences, then randomly extracting a second unique subsequence, splicing the second unique subsequence to the first unique subsequence to obtain a spliced sequence, and checking whether the spliced sequence meets the preset principle; if not, the splicing checking module is further configured to delete the second unique subsequence from the spliced sequence, and then randomly extract another unique subsequence from the set of unique subsequences to serve as a new second unique subsequence, until the spliced sequence meets the preset principle; if yes, the splicing checking module continues to randomly extract subsequent unique subsequences one by one and splice them one by one to obtain a new spliced sequence of continuously increasing length, checks each new spliced sequence for whether it meets the preset principle and whether its length reaches m, and stores the new spliced sequence as a tag sequence when it meets the preset principle and has the length of m;

8. The system of claim 7, wherein the splicing checking module is further configured to delete all n-length subsequences in the tag sequence and their complements from the set of unique subsequences, so that they are no longer involved in subsequent extraction and splicing.

9. An intelligent terminal, comprising:

a memory for storing executable program code; and

a processor for reading executable program code stored in the memory to perform the method of generating a tag sequence for application to a biological product of any one of claims 1 to 6.

10. A computer-readable storage medium, having stored thereon computer program instructions, which, when executed by a processor, implement a method of generating a tag sequence for application to a biological product as claimed in any one of claims 1 to 6.
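The generation flow recited in the method claims (build the set of unique subsequences, splice and check, retire used subsequences, repeat) can be illustrated with a minimal Python sketch. It is not the applicant's implementation. All function names, the values n = 7 and m = 21, the retry limit and the random seed are hypothetical choices made for illustration. The check covers only two parts of the preset principle (consecutive identical bases and GC content); hairpin, self-complementarity and genome-similarity checks are omitted. "Complementary" is read here as the reverse complement, which is an assumption, and m is assumed to be a multiple of n.

```python
import itertools
import random

COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Reverse complement of an upper-case DNA string."""
    return seq.translate(COMPLEMENT)[::-1]


def meets_principle(seq, max_run=8, gc_min_pct=30, gc_max_pct=60):
    """Partial check only: longest run of identical bases and GC content.

    Hairpin, self-complementarity and genome similarity are NOT checked.
    """
    run = longest = 1
    for prev, cur in zip(seq, seq[1:]):
        run = run + 1 if cur == prev else 1
        longest = max(longest, run)
    gc = seq.count("G") + seq.count("C")
    # Integer arithmetic avoids floating-point edge cases at the bounds.
    gc_ok = gc_min_pct * len(seq) <= 100 * gc <= gc_max_pct * len(seq)
    return longest <= max_run and gc_ok


def build_subsequence_set(n):
    """Step S1: all length-n sequences that pass the (partial) check."""
    candidates = ("".join(p) for p in itertools.product("ACGT", repeat=n))
    return {s for s in candidates if meets_principle(s)}


def generate_tag(pool, n, m, rng, max_tries=1000):
    """Steps S2-S3: grow one tag by random extraction, splicing and checking."""
    assert m > n and m % n == 0, "sketch assumes m is a multiple of n"
    choices = sorted(pool)  # sorted so a seeded rng gives repeatable results
    tag = rng.choice(choices)
    for _ in range(max_tries):
        if len(tag) == m:
            return tag
        spliced = tag + rng.choice(choices)
        if meets_principle(spliced):
            tag = spliced  # keep the extension; otherwise draw again
    return tag if len(tag) == m else None


def retire_used(pool, tag, n):
    """Remove every length-n window of the tag, and its reverse complement."""
    for i in range(len(tag) - n + 1):
        window = tag[i:i + n]
        pool.discard(window)
        pool.discard(reverse_complement(window))


def generate_tags(n, m, wanted, seed=0):
    """Step S4: repeat until the pool is empty or enough tags exist."""
    rng = random.Random(seed)
    pool = build_subsequence_set(n)
    tags = []
    while pool and len(tags) < wanted:
        tag = generate_tag(pool, n, m, rng)
        if tag is None:
            break
        retire_used(pool, tag, n)
        tags.append(tag)
    return tags


if __name__ == "__main__":
    for t in generate_tags(n=7, m=21, wanted=5, seed=1):
        print(t)
```

Re-checking the whole spliced sequence after every extension, rather than only the newly added subsequence, catches violations that arise at the junction between two subsequences, such as a run of identical bases spanning the splice point.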