CN113380322A - Artificial nucleic acid sequence watermark encoding system, watermark character string and encoding and decoding method - Google Patents
- Publication number
- CN113380322A (application number CN202110716040.6A)
- Authority
- CN
- China
- Prior art keywords
- nucleic acid
- watermark
- acid sequence
- information
- encoding
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Granted
Classifications
-
- G—PHYSICS
- G16—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
- G16B—BIOINFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR GENETIC OR PROTEIN-RELATED DATA PROCESSING IN COMPUTATIONAL MOLECULAR BIOLOGY
- G16B20/00—ICT specially adapted for functional genomics or proteomics, e.g. genotype-phenotype associations
- G16B20/30—Detection of binding sites or motifs
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F21/00—Security arrangements for protecting computers, components thereof, programs or data against unauthorised activity
- G06F21/60—Protecting data
- G06F21/602—Providing cryptographic facilities or services
-
- G—PHYSICS
- G16—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
- G16B—BIOINFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR GENETIC OR PROTEIN-RELATED DATA PROCESSING IN COMPUTATIONAL MOLECULAR BIOLOGY
- G16B25/00—ICT specially adapted for hybridisation; ICT specially adapted for gene or protein expression
- G16B25/10—Gene or protein expression profiling; Expression-ratio estimation or normalisation
Landscapes
- Health & Medical Sciences (AREA)
- Engineering & Computer Science (AREA)
- Physics & Mathematics (AREA)
- Life Sciences & Earth Sciences (AREA)
- Theoretical Computer Science (AREA)
- Genetics & Genomics (AREA)
- General Health & Medical Sciences (AREA)
- Bioinformatics & Cheminformatics (AREA)
- Biotechnology (AREA)
- Medical Informatics (AREA)
- Molecular Biology (AREA)
- Bioinformatics & Computational Biology (AREA)
- Biophysics (AREA)
- Evolutionary Biology (AREA)
- Spectroscopy & Molecular Physics (AREA)
- Chemical & Material Sciences (AREA)
- Analytical Chemistry (AREA)
- Proteomics, Peptides & Aminoacids (AREA)
- Bioethics (AREA)
- Computer Hardware Design (AREA)
- Computer Security & Cryptography (AREA)
- Software Systems (AREA)
- General Engineering & Computer Science (AREA)
- General Physics & Mathematics (AREA)
- Editing Of Facsimile Originals (AREA)
Abstract
The invention provides an artificial nucleic acid sequence watermark encoding system, a watermark string, and encoding and decoding methods. The artificial nucleic acid sequence watermark encoding system comprises a variable coding segment and at least one watermark string connected to the variable coding segment; the variable coding segment includes at least one information unit; and each information unit comprises a nucleic acid feature part and a numbering part that is connected to, and corresponds to, the nucleic acid feature part. The coding system can serve as a watermark sequence that protects the intellectual property of gene products. Because the numbering part indexes an element library and the nucleic acid feature part identifies the element type, the corresponding sequences can be determined from a short code, which greatly increases the amount of information stored and improves sequencing efficiency.
Description
Technical Field
The invention belongs to the technical field of artificial nucleic acid sequences, and particularly relates to an artificial nucleic acid sequence watermark encoding system, a watermark character string and encoding and decoding methods.
Background
A DNA sequence (or gene sequence) is the primary structure of a real or hypothetical DNA molecule that carries genetic information, represented as a string of letters. Genetic engineering refers to deliberately designing organisms according to human needs: in vitro DNA recombination and transgenic techniques give organisms new genetic traits, creating new organism types and biological products better suited to those needs. Because it involves design and construction at the level of DNA molecules, genetic engineering is also known as DNA recombination technology.
Plasmid vectors are plasmids that have been artificially constructed from native plasmids and adapted to laboratory procedures. Compared with natural plasmids, a plasmid vector usually carries one or more selectable marker genes (such as antibiotic resistance genes) and a synthetic multiple cloning site containing several restriction enzyme recognition sites, and most non-essential sequences are removed to keep the molecule as small as possible, which facilitates genetic engineering operations.
The intellectual property of artificial nucleic acid sequences used in genetic engineering, such as the vector plasmids described above, should belong to the company or laboratory that designed them. However, artificial nucleic acid sequences are often assembled from publicly disclosed nucleic acid sequences and carry no information identifying their owner, which makes their intellectual property difficult to protect.
In addition, artificial nucleic acid sequences tend to be long; vector plasmids, for example, are typically 3-10 kbp. Identifying an artificial nucleic acid sequence usually requires sequencing it to determine its complete set of functional modules. With Sanger sequencing, the most common method, a single reaction reliably covers only 300-600 bp, so 5-20 Sanger reactions are needed to fully identify a vector plasmid sequence, which is costly.
Disclosure of Invention
The invention provides an artificial nucleic acid sequence watermark encoding system, a watermark string, and encoding and decoding methods, which aim to overcome the shortcomings of the prior art.
In order to solve the above problems, the present invention provides an artificial nucleic acid sequence watermark encoding system, comprising:
a variable coding segment and at least one watermark string connected to the variable coding segment, wherein the variable coding segment is arranged at a midstream position in the artificial nucleic acid sequence watermark encoding system, and the watermark string is arranged at an upstream position and/or a downstream position in the system;
the variable coding segment includes at least one information unit; and
each information unit comprises a nucleic acid feature part and a numbering part that is connected to, and corresponds to, the nucleic acid feature part.
Preferably, the watermark string is at least 3 characters long;
in the variable coding segment, a plurality of information units are arranged in 5'-to-3' order to indicate the type and corresponding number of each functional fragment on the artificial nucleic acid fragment.
Preferably, the nucleic acid feature part of each information unit indicates DNA assembly method and interface information, selection marker information, replication origin information, reporter gene information, insert backbone type information, insert promoter information, insert RBS information, insert CDS information, or insert terminator information.
Preferably, the system further comprises a usage scenario segment that marks the end of the artificial nucleic acid sequence watermark encoding system and indicates its usage scenario;
the usage scenario segment is connected to the variable coding segment;
the usage scenario segment is arranged at an upstream position or a downstream position in the system; and
the variable coding segment is arranged at a midstream position between the usage scenario segment and/or the watermark string.
Preferably, the usage scenario segment comprises a bacterial artificial chromosome marker, a yeast artificial chromosome marker, a nuclear genome marker, a mitochondrial genome marker, a chloroplast genome marker and/or an RNA fragment marker for identifying the usage scenario.
In addition, to solve the above problems, the present application further provides a watermark string for an artificial nucleic acid sequence watermark encoding system, wherein the watermark string is at least 15 bp/nt long and can serve as a universal PCR primer sequence for amplifying and sequencing the artificial nucleic acid fragment.
Preferably, the nucleic acid sequence of the watermark string, its reverse complement, the corresponding RNA sequence, the reverse complement of the RNA sequence, and/or a subsequence of any of these, has at least 95% similarity to the corresponding nucleic acid sequence, reverse complement of the nucleic acid sequence, RNA sequence or reverse complement of the RNA sequence listed below; wherein,
the nucleic acid sequence is:
the reverse complement of the nucleic acid sequence is:
the RNA sequence is:
the reverse complement of the RNA sequence is:
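The claim compares a candidate against four forms of the watermark sequence: the DNA sequence, its reverse complement, the RNA sequence and the RNA reverse complement. The sketch below derives those forms and applies a 95% threshold. It is illustrative only: the claimed sequences themselves are not reproduced in this text, "similarity" is assumed here to mean the fraction of matching positions between equal-length sequences (no gaps), subsequence matching is not handled, and all function names are hypothetical.

```python
# Illustrative sketch: the four sequence forms named in the claim and a
# simple identity check. "Similarity" is ASSUMED to mean the fraction of
# matching positions between equal-length sequences (no gaps); the source
# does not define it. All function names are hypothetical.

DNA_COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}
RNA_COMPLEMENT = {"A": "U", "U": "A", "G": "C", "C": "G"}


def reverse_complement(seq: str, rna: bool = False) -> str:
    """Reverse the sequence and complement each base (DNA or RNA alphabet)."""
    table = RNA_COMPLEMENT if rna else DNA_COMPLEMENT
    return "".join(table[base] for base in reversed(seq.upper()))


def to_rna(dna: str) -> str:
    """Write a DNA sequence in the RNA alphabet (T -> U)."""
    return dna.upper().replace("T", "U")


def identity(a: str, b: str) -> float:
    """Fraction of identical positions between two equal-length sequences."""
    if not a or len(a) != len(b):
        raise ValueError("sequences must be non-empty and of equal length")
    return sum(x == y for x, y in zip(a.upper(), b.upper())) / len(a)


def sequence_forms(dna: str) -> dict:
    """DNA sequence, its reverse complement, the RNA form and its reverse complement."""
    rna = to_rna(dna)
    return {
        "dna": dna.upper(),
        "dna_revcomp": reverse_complement(dna),
        "rna": rna,
        "rna_revcomp": reverse_complement(rna, rna=True),
    }


def matches_watermark(candidate: str, reference_dna: str, threshold: float = 0.95) -> bool:
    """True if the candidate reaches the threshold against any of the four forms."""
    return any(
        len(form) == len(candidate) and identity(candidate, form) >= threshold
        for form in sequence_forms(reference_dna).values()
    )
```

For example, `matches_watermark("GCAT", "ATGC")` is true because "GCAT" is the reverse complement of "ATGC".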
In addition, to solve the above problems, the present application further provides encoding and decoding methods for an artificial nucleic acid sequence watermark encoding system, comprising:
the encoding method comprises the following steps:
presetting a watermark string, a variable coding segment and a usage scenario segment;
encoding the watermark string, the variable coding segment and the usage scenario segment to obtain a target artificial nucleic acid sequence;
the decoding method comprises the following steps:
performing Sanger sequencing on the target artificial nucleic acid sequence to obtain a sequencing result;
extracting, from the sequencing result, the nucleic acid sequences corresponding to the watermark string, the variable coding segment and the usage scenario segment at the upstream, midstream and downstream positions; and
decoding the nucleic acid sequences corresponding to the watermark string, the variable coding segment and the usage scenario segment to obtain the decoded information.
Preferably, in the encoding method, the step of encoding the watermark string, the variable coding segment and the usage scenario segment to obtain the target artificial nucleic acid sequence comprises:
converting each character of the watermark string, the variable coding segment and the usage scenario segment into its eight-bit binary ASCII code to obtain a first information set;
converting the first information set into nucleotides by mapping each two-bit pair according to 00 = A, 01 = G, 10 = C, 11 = T or U, so that each character becomes a four-digit quaternary number (four nucleotides), thereby obtaining the target artificial nucleic acid sequence;
in the decoding method, the step of decoding the nucleic acid sequences corresponding to the watermark string, the variable coding segment and the usage scenario segment to obtain the decoded information comprises:
converting the nucleic acid sequences corresponding to the watermark string, the variable coding segment and the usage scenario segment into binary digits according to the rule A = 00, G = 01, C = 10, T or U = 11 to obtain a second information set; and
splitting the second information set into 8-bit groups and decoding each group as an ASCII character, thereby converting it into the character string that forms the decoded information for the target artificial nucleic acid sequence.
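The first scheme above can be sketched as a pair of functions: each ASCII character becomes 8 bits, and each 2-bit pair becomes one nucleotide using the mapping stated in the text (00 = A, 01 = G, 10 = C, 11 = T or U). This is a minimal illustration, not the applicant's implementation; the function names are hypothetical, and non-ASCII input is simply rejected.

```python
# Illustrative sketch of the first scheme: 8-bit ASCII -> 2 bits per base.
# Mapping taken from the text: 00=A, 01=G, 10=C, 11=T (or U for RNA).
# Function names are hypothetical.

BITS_TO_BASE = {"00": "A", "01": "G", "10": "C", "11": "T"}
BASE_TO_BITS = {base: bits for bits, base in BITS_TO_BASE.items()}
BASE_TO_BITS["U"] = "11"  # U is read exactly like T


def encode_text(text: str, rna: bool = False) -> str:
    """Encode ASCII text as nucleotides, four bases per character."""
    data = text.encode("ascii")  # raises UnicodeEncodeError for non-ASCII input
    bits = "".join(format(byte, "08b") for byte in data)
    seq = "".join(BITS_TO_BASE[bits[i:i + 2]] for i in range(0, len(bits), 2))
    return seq.replace("T", "U") if rna else seq


def decode_sequence(seq: str) -> str:
    """Decode nucleotides back into ASCII text, eight bits per character."""
    seq = seq.upper()
    if len(seq) % 4:
        raise ValueError("length must be a multiple of 4 (one character = 4 nt)")
    bits = "".join(BASE_TO_BITS[base] for base in seq)
    data = bytes(int(bits[i:i + 8], 2) for i in range(0, len(bits), 8))
    return data.decode("ascii")  # raises if a group decodes to a non-ASCII byte
```

For example, "A" (ASCII 65 = 01000001) encodes to "GAAG". The full example string "biosensen_A003S001O001R001I005P001B001C010T003_Vector" has 53 characters and therefore encodes to 212 nt, within the 100-300 bp range stated for the encoding system.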
Preferably, in the encoding method, the step of encoding the watermark string, the variable coding segment and the usage scenario segment to obtain the target artificial nucleic acid sequence comprises:
separating the non-numeric characters from the decimal numeric characters in the watermark string, the variable coding segment and the usage scenario segment;
converting the non-numeric characters into their eight-bit binary ASCII codes to obtain a third information set;
converting the third information set into nucleotides by mapping each two-bit pair according to 00 = A, 01 = G, 10 = C, 11 = T or U (four nucleotides per character) to obtain a fourth information set;
converting the decimal numbers into quaternary numbers with a fixed number of digits n, padding any result shorter than n digits with leading zeros, to obtain a fifth information set;
mapping each quaternary digit of the fifth information set to a nucleotide according to 0 = A, 1 = G, 2 = C, 3 = T or U, and combining the result with the fourth information set to obtain the target artificial nucleic acid sequence;
in the decoding method, the step of decoding the nucleic acid sequences corresponding to the watermark string, the variable coding segment and the usage scenario segment to obtain the decoded information comprises:
cutting the nucleic acid sequences corresponding to the watermark string, the variable coding segment and the usage scenario segment into subsequences of length 4 + n, each of which is a decoding unit, where n is the fixed number of quaternary digits used in encoding;
converting the first four nucleotides of each decoding unit into an eight-bit binary ASCII code according to A = 00, G = 01, C = 10, T or U = 11, and converting that code into a character to obtain a sixth information set;
converting the last n nucleotides of each decoding unit into a quaternary number according to A = 0, G = 1, C = 2, T or U = 3, and then converting the quaternary number into a decimal number to obtain a seventh information set; and
joining the sixth and seventh information sets of all decoding units to obtain the decoded information.
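The mixed scheme above can be sketched for a variable coding segment made of units such as "A003". This is a sketch under stated assumptions, not the applicant's implementation: each unit is assumed to be one ASCII letter followed by decimal digits; the letter's four nucleotides are assumed to be followed directly by the n nucleotides of its number (the layout implied by the 4 + n decoding units); n defaults to 5, which is enough for 0-999 because 4^5 = 1024; and decoded numbers are zero-padded to three digits as in the examples. Function names are hypothetical.

```python
import re

# Illustrative sketch of the mixed scheme. ASSUMPTIONS: one ASCII letter per
# unit, letter (4 nt) followed directly by its number (n nt), n = 5 by default,
# decoded numbers zero-padded to `width` digits. Names are hypothetical.

BITS_TO_BASE = {"00": "A", "01": "G", "10": "C", "11": "T"}
BASE_TO_BITS = {base: bits for bits, base in BITS_TO_BASE.items()}
BASE_TO_BITS["U"] = "11"
DIGIT_TO_BASE = "AGCT"  # quaternary digit d -> base: 0=A, 1=G, 2=C, 3=T
BASE_TO_DIGIT = {"A": 0, "G": 1, "C": 2, "T": 3, "U": 3}


def encode_units(segment: str, n: int = 5) -> str:
    """Encode units like 'A003' as 4 + n nucleotides each."""
    if not re.fullmatch(r"(?:[A-Za-z]\d+)+", segment):
        raise ValueError(f"not a sequence of letter+number units: {segment!r}")
    out = []
    for letter, number in re.findall(r"([A-Za-z])(\d+)", segment):
        bits = format(ord(letter), "08b")  # letter -> 8-bit ASCII
        out.append("".join(BITS_TO_BASE[bits[i:i + 2]] for i in range(0, 8, 2)))
        value = int(number)
        if value >= 4 ** n:
            raise ValueError(f"{number} does not fit in {n} quaternary digits")
        digits = []
        for _ in range(n):  # least-significant digit first; leading zeros become A
            value, d = divmod(value, 4)
            digits.append(DIGIT_TO_BASE[d])
        out.append("".join(reversed(digits)))
    return "".join(out)


def decode_units(seq: str, n: int = 5, width: int = 3) -> str:
    """Cut into (4 + n)-nt decoding units: first 4 nt -> letter, last n nt -> number."""
    seq = seq.upper()
    unit = 4 + n
    if len(seq) % unit:
        raise ValueError(f"length must be a multiple of {unit}")
    parts = []
    for i in range(0, len(seq), unit):
        head, tail = seq[i:i + 4], seq[i + 4:i + unit]
        letter = chr(int("".join(BASE_TO_BITS[b] for b in head), 2))
        value = 0
        for b in tail:  # quaternary -> decimal
            value = value * 4 + BASE_TO_DIGIT[b]
        parts.append(f"{letter}{value:0{width}d}")
    return "".join(parts)
```

With these assumptions, "A003" encodes to "GAAGAAAAT", and the 9-unit segment "A003S001O001R001I005P001B001C010T003" encodes to 81 nt instead of the 144 nt produced by the pure ASCII scheme.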
The invention provides an artificial nucleic acid sequence watermark encoding system, a watermark string, and encoding and decoding methods. The artificial nucleic acid sequence watermark encoding system comprises a variable coding segment and at least one watermark string connected to the variable coding segment, wherein the variable coding segment is arranged at a midstream position in the system and the watermark string is arranged at an upstream position and/or a downstream position; the variable coding segment includes at least one information unit; and each information unit comprises a nucleic acid feature part and a numbering part that is connected to, and corresponds to, the nucleic acid feature part. In the encoding and decoding methods, the relative positions of the variable coding segment, the watermark string and the usage scenario segment are set by design: the variable coding segment is fixed at the midstream position, and the watermark string and other segments are fixed upstream and downstream of it. Once converted into a nucleic acid sequence, the encoding system shares no significant homology with known natural genomes, so it can serve as a watermark sequence that identifies artificial nucleic acid sequences designed by a specific organization or laboratory and protects the intellectual property of gene products. In addition, the encoding system is 100-300 bp long and can therefore be covered by a single Sanger sequencing reaction: the component composition of the artificial nucleic acid fragment is obtained from one Sanger read, without sequencing the complete artificial nucleic acid sequence. The complete sequence of the artificial nucleic acid fragment can also be deduced from the preset mapping between the nucleic acid feature part (assembly method, etc.), the numbering part (element library number) and the sequence.
Detailed Description
The technical solution of the present invention is described in further detail below by way of specific embodiments, but the present invention is not limited thereto; any modification made by anyone within the scope of the claims of the present invention still falls within that scope.
The embodiment provides an artificial nucleic acid sequence watermark encoding system, which comprises:
a variable coding segment and at least one watermark string connected to the variable coding segment, wherein the variable coding segment is arranged at a midstream position in the artificial nucleic acid sequence watermark encoding system, and the watermark string is arranged at an upstream position and/or a downstream position in the system;
the variable coding segment includes at least one information unit; and
each information unit comprises a nucleic acid feature part and a numbering part that is connected to, and corresponds to, the nucleic acid feature part.
The artificial nucleic acid sequence watermark encoding system has an overall length of 100-300 bp. The artificial nucleic acid fragment that carries it may be, but is not limited to, a vector plasmid, a partial fragment of a plasmid, a BAC fragment, a YAC fragment, a nucleic acid fragment inserted into the genome of an organism, or an RNA fragment.
The watermark string and the variable coding segment of the encoding system should be in the same engineered organism as the artificial nucleic acid fragment, but need not be on the same nucleic acid molecule (for example, the same chromosome or the same vector plasmid).
The artificial nucleic acid sequence watermark encoding system is composed of several parts and may comprise a variable coding segment and at least one watermark string connected to it. The length of the variable coding segment is set by design and can be long or short, so multiple variable coding segments never need to be joined together. There may be one or more watermark strings, which carry information such as the identity of different organizations. Multiple short messages may be joined with the underscore character "_".
The artificial nucleic acid sequence watermark encoding system has an upstream, a midstream and a downstream position. The variable coding segment may be placed at the midstream and/or downstream position, and there may be one or more watermark strings, giving arrangements such as the following:
1. watermark string _ variable encoding section;
2. watermark string 1_ variable encoding section _ watermark string 2.
Fixing the variable coding segment at the midstream position, and placing the other segments relative to it, allows the boundaries of the encoding system to be determined quickly. This improves the accuracy, precision and efficiency of identification, and avoids the garbled output that would result during encoding and decoding if the position of the variable coding segment were not fixed.
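Because the watermark string sits at a fixed position next to the variable coding segment, its encoded form can act as an anchor for finding the boundary of the encoding system inside a sequencing read. A minimal sketch, assuming the pure ASCII scheme (00 = A, 01 = G, 10 = C, 11 = T) and exact matching; real Sanger reads contain errors that would call for approximate matching. Since a read may come from either strand, both the read and its reverse complement are searched. Function names are hypothetical.

```python
from typing import Optional, Tuple

# Illustrative sketch: find the encoded watermark in a read (either strand).
# ASSUMPTIONS: exact match, ASCII scheme 00=A, 01=G, 10=C, 11=T.

BITS_TO_BASE = {"00": "A", "01": "G", "10": "C", "11": "T"}
COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}


def encode_text(text: str) -> str:
    """Four bases per ASCII character."""
    bits = "".join(format(byte, "08b") for byte in text.encode("ascii"))
    return "".join(BITS_TO_BASE[bits[i:i + 2]] for i in range(0, len(bits), 2))


def reverse_complement(seq: str) -> str:
    """Reverse complement; ambiguous base calls such as N become N."""
    return "".join(COMPLEMENT.get(base, "N") for base in reversed(seq))


def locate_watermark(read: str, watermark: str) -> Optional[Tuple[str, int]]:
    """Return (strand, start index of the encoded watermark on that strand) or None."""
    anchor = encode_text(watermark)
    read = read.upper()
    for strand, seq in (("+", read), ("-", reverse_complement(read))):
        pos = seq.find(anchor)
        if pos != -1:
            return strand, pos
    return None
```

If the watermark string (including its "_" separator) is upstream, the variable coding segment then begins at `start + len(encode_text(watermark))` on the returned strand.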
As described above, the variable coding segment consists of one or more information units, each comprising a nucleic acid feature part and a numbering part. The nucleic acid feature part identifies the nucleic acid assembly method or the type of nucleic acid element; the numbering part identifies the number of a specific assembly method, or the number of the selected nucleic acid element in the element library.
Furthermore, the element library is a preset database that maps each numbering part (the index key) to the corresponding DNA sequence, and is used for encoding and decoding the artificial nucleic acid sequence watermark encoding system.
As described above, the variable coding segment consists of a plurality of information units, each comprising a nucleic acid feature part and a numbering part; the nucleic acid feature part may be represented by a one-letter code and the numbering part by several digits (e.g., A123B456C789). Information units can be added, removed or customized according to actual requirements, the amount of information needed and the conditions of use.
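A variable coding segment in this format can be split back into its information units with a regular expression. This sketch assumes each unit is exactly one letter followed by one or more digits, as in the examples; the numbering parts are kept as strings so that leading zeros survive. The function name is hypothetical.

```python
import re
from typing import List, Tuple

# Illustrative sketch: split a variable coding segment into information units.
# ASSUMPTION: each unit is exactly one letter followed by one or more digits.

UNIT = re.compile(r"([A-Za-z])(\d+)")


def parse_units(segment: str) -> List[Tuple[str, str]]:
    """Return (nucleic acid feature part, numbering part) pairs, in order."""
    if not re.fullmatch(r"(?:[A-Za-z]\d+)+", segment):
        raise ValueError(f"not a valid variable coding segment: {segment!r}")
    return UNIT.findall(segment)
```

For example, `parse_units("A123B456C789")` returns `[("A", "123"), ("B", "456"), ("C", "789")]`.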
The watermark string may be placed at the upstream or downstream position of the encoding system and can be designed by the organization that designed the sequence to identify itself; for example, the watermark string "biosensen" indicates that the sequence was designed and provided by the company "biosensen Ltd.".
For example, the artificial nucleic acid sequence watermark encoding system is:
biosensen_A003S001O001R001I005P001B001C010T003
The system consists of two parts: the watermark string "biosensen" at the upstream position and the variable coding segment "A003S001O001R001I005P001B001C010T003" at the midstream (or mid-downstream) position. The variable coding segment contains 9 information units: "A003", "S001", "O001", "R001", "I005", "P001", "B001", "C010" and "T003". In the first information unit, the nucleic acid feature part is "A" and the numbering part is "003". The element library can therefore be indexed by the numbering part of each information unit, with the element type given by the nucleic acid feature part, so a short code identifies the corresponding sequences; this greatly increases the amount of information stored and improves sequencing efficiency.
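Indexing an element library with the information units, and joining the resulting sequences, can be sketched as follows. The library here is entirely hypothetical and holds short placeholder sequences rather than real parts; the sketch also ignores any junction or scar sequences that the assembly method identified by the A unit would introduce. Function and variable names are hypothetical.

```python
import re
from typing import Dict, Tuple

# Illustrative sketch: resolve information units against an element library and
# concatenate the results. The library below is HYPOTHETICAL, with placeholder
# sequences; junction or scar sequences from the assembly method are ignored.

ELEMENT_LIBRARY: Dict[Tuple[str, str], str] = {
    ("P", "001"): "AAAAAA",     # placeholder "promoter 001"
    ("B", "001"): "CCCC",       # placeholder "RBS 001"
    ("C", "010"): "ATGGGGTAA",  # placeholder "CDS 010"
    ("T", "003"): "TTTTTT",     # placeholder "terminator 003"
}


def assemble(segment: str, library: Dict[Tuple[str, str], str]) -> str:
    """Look up every (feature, number) unit and join the sequences 5' to 3'."""
    units = re.findall(r"([A-Za-z])(\d+)", segment)
    missing = [f"{f}{n}" for f, n in units if (f, n) not in library]
    if missing:
        raise KeyError(f"not in element library: {', '.join(missing)}")
    return "".join(library[unit] for unit in units)
```

With this placeholder library, `assemble("P001B001C010T003", ELEMENT_LIBRARY)` returns the four placeholder sequences joined in order; an unknown unit such as "P002" raises `KeyError`.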
The application provides an artificial nucleic acid sequence watermark encoding system in which the relative positions of the variable coding segment and the watermark string are set by design: the variable coding segment is fixed at the midstream position, and the watermark string and other segments are fixed upstream and downstream of it. Once converted into a nucleic acid sequence, the encoding system shares no significant homology with known natural genomes, so it can serve as a watermark sequence that marks artificial nucleic acid sequences designed by a specific organization or laboratory and protects the intellectual property of gene products. In addition, the encoding system is 100-300 bp long and can therefore be covered by a single Sanger sequencing reaction: the component composition of the artificial nucleic acid fragment is obtained from one Sanger read, without sequencing the complete artificial nucleic acid sequence. The complete sequence of the artificial nucleic acid fragment can also be deduced from the preset mapping between the nucleic acid feature part (assembly method, etc.), the numbering part (element library number) and the sequence.
Further, the watermark string is at least 3 characters long;
in the variable coding segment, a plurality of information units are arranged in 5'-to-3' order to indicate the type and corresponding number of each functional fragment on the artificial nucleic acid fragment.
Because the watermark string is at least 3 characters long, a universal primer of 16 nt can be guaranteed after encoding into nucleic acid.
In the variable coding segment, the information units are arranged in 5'-to-3' order, for example:
Biosysen_A003S001O001R001I005P001B001C010T003;
A003 is the start information, identifying the nucleic acid assembly method and interface information, and 003 is the numbering part used to index the corresponding entry in the element library. Each subsequent information unit gives the type and number of a functional fragment on the artificial nucleic acid fragment in 5'-to-3' order (for a circular nucleic acid such as a plasmid, generally starting from the first functional element downstream of the insert). Any information units after Ixxx and before the downstream usage scenario string describe the internal structure of the insert: in this example, I005P001B001C010T003 denotes insert backbone type 005, whose internal structure consists of promoter No. 001, RBS No. 001, CDS No. 010 and terminator No. 003 connected in sequence.
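The ordering rule just described (units before the first I describe the vector; units after it describe the insert) can be sketched as below. The letter meanings are an assumption: they are inferred from this example and from the order in which the feature types are listed in the text, because the embodiment's own definition table is not reproduced here. The function name is hypothetical.

```python
import re
from typing import Dict

# Illustrative sketch: label units and separate the insert's internal structure.
# ASSUMPTION: letter meanings are inferred from the example and the order of the
# feature list in the text; the embodiment's own definition table is not shown.

FEATURES: Dict[str, str] = {
    "A": "assembly method/interface",
    "S": "selection marker",
    "O": "replication origin",
    "R": "reporter gene",
    "I": "insert backbone",
    "P": "promoter",
    "B": "RBS",
    "C": "CDS",
    "T": "terminator",
}


def split_segment(segment: str) -> dict:
    """Units before the first 'I' describe the vector; units after it, the insert."""
    units = re.findall(r"([A-Za-z])(\d+)", segment)
    labels = [f"{FEATURES.get(f, 'unknown feature ' + f)} {n}" for f, n in units]
    letters = [f for f, _ in units]
    if "I" not in letters:
        return {"vector": labels, "insert_backbone": None, "insert_parts": []}
    i = letters.index("I")
    return {
        "vector": labels[:i],
        "insert_backbone": units[i][1],
        "insert_parts": labels[i + 1:],
    }
```

Applied to the example segment, this yields insert backbone "005" with the parts promoter 001, RBS 001, CDS 010 and terminator 003, matching the reading given above.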
Further, the nucleic acid feature part of each information unit indicates DNA assembly method and interface information, selection marker information, replication origin information, reporter gene information, insert backbone type information, insert promoter information, insert RBS information, insert CDS information, or insert terminator information.
As described above, the information indicated by the nucleic acid feature part of each information unit includes, but is not limited to, the DNA assembly method and interface, selection marker, replication origin, reporter gene, insert backbone type, insert promoter, insert RBS, insert CDS and insert terminator.
The nucleic acid feature part represents the nucleic acid assembly method or the type of nucleic acid element; in this example it is a one-letter code, but it is not limited thereto. In this embodiment, the features may be defined and represented as follows:
The nucleic acid feature part of each information unit identifies its assembly method or nucleic acid element type, and the numbering part identifies the corresponding number in the element library, so the corresponding nucleic acid information can be indexed accurately and quickly. Through this encoding and decoding, a compact watermark encoding system made of a few information units stores a large amount of information about the nucleic acid sequence: sequencing the encoding system alone reveals the complete functional module composition of the artificial nucleic acid sequence, and even allows the complete nucleic acid sequence to be deduced. This greatly reduces sequencing cost, improves sequencing efficiency and simplifies sequencing work.
Further, the system comprises a usage scenario segment that marks the end of the artificial nucleic acid sequence watermark encoding system and indicates its usage scenario;
the usage scenario segment is connected to the variable encoding section;
the usage scenario segment is arranged at an upstream or a downstream position in the artificial nucleic acid sequence watermark encoding system;
the variable encoding section is arranged at a midstream position between the usage scenario segment and the watermark string.
Further, the usage scenario segment includes a bacterial artificial chromosome identifier, a yeast artificial chromosome identifier, a nuclear genome identifier, a mitochondrial genome identifier, a chloroplast genome identifier, and/or an RNA fragment identifier for identifying the usage scenario.
As described above, the usage scenario segment includes, but is not limited to, bacterial artificial chromosome, yeast artificial chromosome, nuclear genome, mitochondrial genome, chloroplast genome, and/or RNA fragment identifiers that identify the usage scenario. It marks the end of the encoding system and also identifies the usage scenario of the artificial nucleic acid sequence. It may be placed at an upstream or a downstream position, for example:
1. watermark string_variable encoding section_usage scenario segment;
2. usage scenario segment_variable encoding section_watermark string.
For example: Biosysen_A003S001O001R001I005P001B001C010T003_Vector;
here, Vector is a usage scenario segment at the downstream position, indicating that the artificial nucleic acid sequence is a vector plasmid. Likewise, identifiers such as "BAC", "YAC", "Genome", "Mito", "Chloro", and "RNA" may denote, without limitation, artificial nucleic acid sequences for bacterial artificial chromosomes, yeast artificial chromosomes, nuclear genomes, mitochondrial genomes, chloroplast genomes, RNA fragments, and so on. With the usage scenario segment, the scenario in which the sequence is used (for example, a vector plasmid) can be read out during sequencing, encoding, and decoding, which improves sequencing efficiency and allows the related information to be obtained accurately.
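The two layouts can be illustrated with a small Python sketch that splits a full watermark string into its three segments, whichever end the usage scenario segment sits at. It assumes that `_` is used only as the separator between segments (so the trailing `_` of the watermark string is treated as a separator) and that the scenario identifiers are the illustrative ones listed above; `split_watermark` is a hypothetical helper.

```python
from __future__ import annotations

# ASSUMPTION: illustrative set of scenario identifiers, taken from the text above.
SCENES = {"Vector", "BAC", "YAC", "Genome", "Mito", "Chloro", "RNA"}


def split_watermark(full: str) -> tuple[str, str, str]:
    """Return (watermark string, variable encoding section, usage scenario segment).

    Accepts either layout: watermark_variable_scene or scene_variable_watermark.
    Assumes '_' only separates the three segments and appears nowhere else.
    """
    parts = full.split("_")
    if len(parts) != 3:
        raise ValueError("expected exactly three '_'-separated segments")
    first, middle, last = parts
    if last in SCENES:    # layout 1: scenario segment downstream
        return first, middle, last
    if first in SCENES:   # layout 2: scenario segment upstream
        return last, middle, first
    raise ValueError("no known usage scenario segment at either end")
```

Because the scenario identifier is checked at both ends, the same function handles layout 1 and layout 2 and returns the segments in a fixed order.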
In addition, to solve the above problems, this embodiment further provides a watermark string for the artificial nucleic acid sequence watermark encoding system. The length of the watermark string is not less than 15 bp/nt, and the watermark string can serve as a universal PCR primer sequence for amplifying and sequencing the artificial nucleic acid fragment.
Further, the watermark string is the nucleic acid sequence, the reverse complement of the nucleic acid sequence, the RNA sequence, and/or the reverse complement of the RNA sequence given below, or a subsequence thereof, or a sequence having not less than 95% similarity to any of these; wherein,
the nucleic acid sequence is:
GAACGCCGGCTTGTATGTCGGTATGCGGGCTCGGTT;
the reverse complement of the nucleic acid sequence is:
AACCGAGCCCGCATACCGACATACAAGCCGGCGTTC;
the RNA sequence is:
GAACGCCGGCUUGUAUGUCGGUAUGCGGGCUCGGUU;
the reverse complement of the RNA sequence is:
AACCGAGCCCGCAUACCGACAUACAAGCCGGCGUUC.
The nucleic acid sequence is the sequence corresponding to the watermark string "Biosysen_". It has no high homology with the genome of any known natural organism (a blastn search against the NCBI nt database returns no hits), and can therefore be regarded as a specific sequence designed by Biosysen Ltd.
Accordingly, the following sequences are covered:
(1) the four sequences themselves;
(2) any subsequence of the four sequences that is at least 15 bp/nt long;
(3) any sequence having at least 95% similarity (blastn percent identity) to the four sequences or to any of their subsequences of at least 15 bp/nt;
all of the above cases can be regarded as watermark information of artificial nucleic acid sequences designed by Biosysen Ltd.; artificial nucleic acid sequences or genetically engineered organisms containing such watermark information are considered to embody the intellectual property of Biosysen Ltd.
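The four covered sequences are related by simple transformations, which the following Python sketch reproduces from the DNA sequence given above. It checks cases (1) and (2) only, by exact substring matching; case (3), the 95% blastn identity criterion, needs an alignment tool such as blastn and is not modelled. The function names are illustrative.

```python
from __future__ import annotations

# Watermark DNA sequence as listed in claim 7.
WATERMARK_DNA = "GAACGCCGGCTTGTATGTCGGTATGCGGGCTCGGTT"

DNA_COMPLEMENT = str.maketrans("ACGT", "TGCA")
RNA_COMPLEMENT = str.maketrans("ACGU", "UGCA")


def reverse_complement(seq: str, rna: bool = False) -> str:
    """Reverse complement of a DNA (default) or RNA sequence."""
    table = RNA_COMPLEMENT if rna else DNA_COMPLEMENT
    return seq.translate(table)[::-1]


def watermark_forms(dna: str = WATERMARK_DNA) -> dict[str, str]:
    """The four sequences: DNA, its reverse complement, RNA, RNA reverse complement."""
    rna = dna.replace("T", "U")
    return {
        "dna": dna,
        "dna_rc": reverse_complement(dna),
        "rna": rna,
        "rna_rc": reverse_complement(rna, rna=True),
    }


def is_covered_subsequence(query: str, min_len: int = 15) -> bool:
    """Cases (1) and (2): query is one of the four forms, or a contiguous piece
    of one, at least min_len long. Case (3) (>= 95% identity) is NOT checked."""
    query = query.upper()
    return len(query) >= min_len and any(query in s for s in watermark_forms().values())
```

The 21-nt sequencing primer used in the examples below, for instance, is covered under case (2) because it is the first 21 bases of the DNA form.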
By inserting the watermark string into the artificial nucleic acid sequence watermark encoding system, the specific unit information of the artificial nucleic acid sequence can be obtained quickly during encoding or decoding, revealing that the sequence was produced by Biosysen Ltd. For example:
Biosysen_A003S001O001R001I005P001B001C010T003_Vector;
In this system, the four corresponding sequences are obtained from "Biosysen_". Conversely, when any of the four sequences is found during sequencing, the corresponding watermark string "Biosysen_" is recovered, identifying the producer as Biosysen Ltd. Information about the specific producing unit is thus retained in the gene fragment and can be recovered during encoding or decoding. This strengthens intellectual property protection and provides an additional means of intellectual property protection, product anti-counterfeiting, and traceability.
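Because the watermark can sit on either strand of a sequencing read, identifying the producer amounts to finding the watermark or its reverse complement in the read. A minimal Python sketch follows. It assumes exact matching of at least one 15-nt window, which is the minimum length stated above. `find_watermark` is a hypothetical name, and reads with sequencing errors would call for an aligner instead.

```python
from __future__ import annotations

WATERMARK_DNA = "GAACGCCGGCTTGTATGTCGGTATGCGGGCTCGGTT"  # claim 7


def reverse_complement(seq: str) -> str:
    return seq.translate(str.maketrans("ACGT", "TGCA"))[::-1]


def find_watermark(read: str, k: int = 15) -> str | None:
    """Report which strand of the watermark a sequencing read carries.

    Returns "forward" or "reverse" if the read shares at least one exact k-mer
    (default 15 nt) with the watermark or its reverse complement, else None.
    """
    read = read.upper().replace("U", "T")  # accept RNA reads too
    for strand, ref in (("forward", WATERMARK_DNA),
                        ("reverse", reverse_complement(WATERMARK_DNA))):
        kmers = {ref[i:i + k] for i in range(len(ref) - k + 1)}
        if any(read[j:j + k] in kmers for j in range(len(read) - k + 1)):
            return strand
    return None
```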
In addition, to solve the above problem, this embodiment further provides an encoding and decoding method for the artificial nucleic acid sequence watermark encoding system, comprising:
the encoding method comprises the following steps:
presetting a watermark string, a variable encoding section, and a usage scenario segment;
encoding the watermark string, the variable encoding section, and the usage scenario segment to obtain a target artificial nucleic acid sequence;
the decoding method comprises the following steps:
performing Sanger sequencing on the target artificial nucleic acid sequence to obtain a sequencing result;
obtaining, from the sequencing result, the nucleic acid sequences corresponding to the watermark string, the variable encoding section, and the usage scenario segment at the upstream, midstream, and downstream positions;
and performing information interpretation and conversion on the nucleic acid sequences corresponding to the watermark string, the variable encoding section, and the usage scenario segment to obtain decoding information.
As described above, the watermark string, the variable encoding section, and the usage scenario segment must be encoded and decoded if they are to be loaded into the complete sequence. For encoding, the watermark string, variable encoding section, and usage scenario segment are preset and then encoded to obtain the corresponding target artificial nucleic acid sequence. To recover the information from a sequence obtained by sequencing, that sequence is treated as the target artificial nucleic acid sequence and decoded.
This embodiment provides an encoding and decoding method for the artificial nucleic acid sequence watermark encoding system. The variable encoding section, watermark string, and usage scenario segment are preset by design and then encoded to obtain a target artificial nucleic acid sequence. Conversely, a target artificial nucleic acid sequence obtained by sequencing is decoded to recover the corresponding decoding information. Once the watermark encoding system is converted into a nucleic acid sequence, that sequence has no high homology with known natural genomes. It can therefore serve as a watermark marking artificial nucleic acid sequences designed by a specific organization or laboratory and protecting the intellectual property of gene products. In addition, the watermark encoding system is 100 to 300 bp long and can be covered by a single Sanger sequencing reaction. The whole artificial nucleic acid sequence need not be sequenced, since one Sanger read reveals the component composition of the artificial nucleic acid fragment. Combined with the preset mapping between the nucleic acid feature part (assembly method, etc.), the numbering part (element library number), and the sequence, the complete sequence of the artificial nucleic acid fragment can also be deduced.
Example 1:
in the encoding method, "encoding the watermark string, the variable encoding section, and the usage scenario segment to obtain the target artificial nucleic acid sequence" comprises:
converting each character of the watermark string, the variable encoding section, and the usage scenario segment into an eight-bit binary ASCII byte code to obtain a first information set;
converting the first information set, two bits at a time, into bases according to the mapping 00 = A, 01 = G, 10 = C, 11 = T or U, to obtain the target artificial nucleic acid sequence;
in the decoding method, "performing information interpretation and conversion on the nucleic acid sequences corresponding to the watermark string, the variable encoding section, and the usage scenario segment to obtain decoding information" comprises:
converting the nucleic acid sequences corresponding to the watermark string, the variable encoding section, and the usage scenario segment into a binary digit string according to the rule A = 00, G = 01, C = 10, T or U = 11, to obtain a second information set;
and dividing the second information set into 8-bit groups, decoding each group as an ASCII code, and joining the results into a character string to obtain the decoding information corresponding to the target artificial nucleic acid sequence.
In the encoding process of this example (the ASCII transcoding method), each character of the artificial nucleic acid sequence watermark encoding system is converted into its ASCII byte code (an eight-bit binary number). Each pair of bits is then mapped to a base (00 = A, 01 = G, 10 = C, 11 = T or U), yielding a DNA sequence over the four-letter alphabet (A, T/U, C, G).
For example: Biosysen_A003S001O001R001I005P001B001C010T003_Vector;
using the method of this example, this artificial nucleic acid sequence watermark encoding system is converted into the following nucleic acid sequence:
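The ASCII transcoding step can be sketched in a few lines of Python. This illustrates the mapping described above and is not the authors' implementation. The function name and the `rna` flag, which writes U instead of T, are assumptions.

```python
# Two-bit code per base used by the ASCII transcoding scheme.
BITS_TO_BASE = {"00": "A", "01": "G", "10": "C", "11": "T"}


def ascii_encode(text: str, rna: bool = False) -> str:
    """Encode text as DNA (or RNA): 8 ASCII bits per character, 2 bits per base."""
    data = text.encode("ascii")  # raises UnicodeEncodeError for non-ASCII input
    bits = "".join(format(byte, "08b") for byte in data)  # first information set
    seq = "".join(BITS_TO_BASE[bits[i:i + 2]] for i in range(0, len(bits), 2))
    return seq.replace("T", "U") if rna else seq
```

Encoding `"Biosysen_"` this way returns `GAACGCCGGCTTGTATGTCGGTATGCGGGCTCGGTT`, which is the watermark sequence given in claim 7. Its first 21 bases are the sequencing primer used below. Each character costs 4 bases, so the 52-character example string becomes a 208-nt sequence.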
In the decoding process, the sequencing read is converted into a binary digit string using A = 00, G = 01, C = 10, T or U = 11. It is then decoded as ASCII in groups of 8 bits, which yields the variable encoding string.
Specifically, the target artificial nucleic acid sequence encoded by the ASCII transcoding method of Example 1 was sequenced:
the plasmid numbered Biosysen_A001S001O001R001I001_Vector was Sanger-sequenced with the primer 5'-GAACGCCGGCTTGTATGTCGG-3', and the sequencing result was:
decoding the sequence by the method of Example 1 gave:
Biosysen_A001S001O001R001I001_Vector. The encoded information was fully restored, and the test was successful.
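Here is the inverse direction as a hedged Python sketch. It assumes a clean read that covers exactly the encoded region, with no primer or flanking vector sequence, and it treats T and U alike. `ascii_decode` is an illustrative name.

```python
# Inverse mapping; T and U both decode to 11.
BASE_TO_BITS = {"A": "00", "G": "01", "C": "10", "T": "11", "U": "11"}


def ascii_decode(seq: str) -> str:
    """Decode bases back to text: 2 bits per base, 8 bits per ASCII character."""
    seq = seq.strip().upper()
    if len(seq) % 4:
        raise ValueError("length must be a multiple of 4 (4 bases per character)")
    try:
        bits = "".join(BASE_TO_BITS[b] for b in seq)  # second information set
    except KeyError as err:  # e.g. an ambiguous 'N' base call
        raise ValueError(f"unexpected base {err.args[0]!r}") from None
    data = bytes(int(bits[i:i + 8], 2) for i in range(0, len(bits), 8))
    return data.decode("ascii")  # bytes above 0x7F raise UnicodeDecodeError
```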
Example 2:
in the encoding method, "encoding the watermark string, the variable encoding section, and the usage scenario segment to obtain the target artificial nucleic acid sequence" comprises:
separately extracting the non-numeric characters and the decimal numeric characters from the watermark string, the variable encoding section, and the usage scenario segment;
converting the non-numeric characters into eight-bit binary ASCII byte codes to obtain a third information set;
converting the third information set, two bits at a time, into bases according to the mapping 00 = A, 01 = G, 10 = C, 11 = T or U, to obtain a fourth information set;
converting the decimal numeric characters into quaternary numbers with a fixed number of digits, left-padding with 0 any quaternary number shorter than that fixed number of digits, to obtain a fifth information set;
mapping the fifth information set to bases according to the rule 0 = A, 1 = G, 2 = C, 3 = T or U, and combining it with the fourth information set to obtain the target artificial nucleic acid sequence;
in the decoding method, "performing information interpretation and conversion on the nucleic acid sequences corresponding to the watermark string, the variable encoding section, and the usage scenario segment to obtain decoding information" comprises:
cutting the nucleic acid sequences corresponding to the watermark string, the variable encoding section, and the usage scenario segment into subsequences of length 4 + n, each subsequence serving as a decoding unit, where n is the fixed number of quaternary digits used when encoding numbers;
converting the first four bases of each decoding unit into an eight-bit binary ASCII code according to the rule A = 00, G = 01, C = 10, T or U = 11, and converting that code into a character, to obtain a sixth information set; and,
converting the last n bases of each decoding unit into a quaternary number according to the rule A = 0, G = 1, C = 2, T or U = 3, and then into a decimal number, to obtain a seventh information set;
and concatenating the sixth and seventh information sets of all decoding units to obtain the decoding information.
The encoding process uses a hybrid method. Non-numeric characters in the artificial nucleic acid sequence watermark encoding system are converted as in Example 1, while decimal numeric strings are converted directly to quaternary. The quaternary numbers have a fixed number of digits, and shorter ones are left-padded with 0; this example uses four quaternary digits, so the quaternary number 130 is padded to 0130. The digits are then mapped to bases by the rule 0 = A, 1 = G, 2 = C, 3 = T or U (e.g., 0130 → AGTA).
For example: Biosysen_A003S001O001R001I005P001B001C010T003_Vector;
the nucleic acid sequence obtained by converting this artificial nucleic acid sequence watermark encoding system with the method of Example 2 is:
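Here is a minimal Python sketch of the hybrid scheme, assuming n = 4 quaternary digits per number as in this example. One consequence is worth noting: leading decimal zeros are not stored, so `001` and `1` both encode as `AAAG`, and a decoder has to restore the numbering width itself. All names are illustrative.

```python
import re

BITS_TO_BASE = {"00": "A", "01": "G", "10": "C", "11": "T"}
DIGIT_TO_BASE = "AGCT"  # quaternary digit 0, 1, 2, 3 -> A, G, C, T


def char_to_bases(ch: str) -> str:
    """Non-numeric character -> 4 bases via its 8-bit ASCII code (as in Example 1)."""
    code = ord(ch)
    if code > 0x7F:
        raise ValueError(f"non-ASCII character {ch!r}")
    bits = format(code, "08b")
    return "".join(BITS_TO_BASE[bits[i:i + 2]] for i in range(0, 8, 2))


def number_to_bases(value: int, n: int = 4) -> str:
    """Decimal value -> n quaternary digits (left-padded with 0) -> n bases."""
    if not 0 <= value < 4 ** n:
        raise ValueError(f"{value} does not fit in {n} quaternary digits")
    digits = []
    for _ in range(n):
        value, digit = divmod(value, 4)
        digits.append(DIGIT_TO_BASE[digit])
    return "".join(reversed(digits))


def hybrid_encode(text: str, n: int = 4) -> str:
    """Encode each run of decimal digits numerically, every other character as ASCII."""
    tokens = re.findall(r"(\d+)|(\D)", text, flags=re.ASCII)
    return "".join(
        number_to_bases(int(num), n) if num else char_to_bases(ch)
        for num, ch in tokens
    )
```

For instance, decimal 28 is quaternary 130, which is padded to 0130 and gives `AGTA`, matching the text. The full example string becomes 136 nt here, compared with 208 nt under Example 1.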
In the decoding process, the sequencing read is first cut into subsequences of length 4 + n (n is the fixed number of quaternary digits used when encoding numbers; 4 in this example), each representing one information unit. In each unit, the first 4 bases are converted into an 8-bit binary ASCII code using A = 00, G = 01, C = 10, T or U = 11, and then into a character. The remaining n bases are converted into a quaternary number using A = 0, G = 1, C = 2, T or U = 3, and then into a decimal number. Concatenating the characters and decimal numbers recovered from all information units restores the variable encoding string.
Sequencing test for Example 2:
the plasmid numbered Biosysen_A003S001O001R001I001_Vector was Sanger-sequenced with the primer 5'-GAACGCCGGCTTGTATGTCGG-3', and the sequencing result was:
decoding by the method of Example 2 gave:
Biosysen_A003S001O001R001I001_Vector. The encoded information was fully restored, and the test was successful.
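Here is a matching decoder sketch in Python. It makes three assumptions: the stretch being decoded consists only of letter-plus-number information units, as in the variable encoding section; n = 4; and numbering parts are written back as three decimal digits (`width=3`, inferred from the examples). Text without digits, such as the watermark string, would instead be decoded 4 bases per character as in Example 1. `decode_units` is a hypothetical name.

```python
BASE_TO_BITS = {"A": "00", "G": "01", "C": "10", "T": "11", "U": "11"}
BASE_TO_DIGIT = {"A": 0, "G": 1, "C": 2, "T": 3, "U": 3}


def decode_units(seq: str, n: int = 4, width: int = 3) -> str:
    """Decode consecutive (4 + n)-base units: 4 bases -> 1 ASCII character,
    then n bases -> a quaternary number -> a zero-padded decimal string."""
    seq = seq.strip().upper()
    unit_len = 4 + n
    if len(seq) % unit_len:
        raise ValueError(f"length must be a multiple of {unit_len}")
    out = []
    for i in range(0, len(seq), unit_len):
        unit = seq[i:i + unit_len]
        bits = "".join(BASE_TO_BITS[b] for b in unit[:4])
        out.append(chr(int(bits, 2)))        # sixth information set
        value = 0
        for b in unit[4:]:
            value = value * 4 + BASE_TO_DIGIT[b]
        out.append(str(value).zfill(width))  # seventh information set
    return "".join(out)
```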
Example 3:
In addition, biological products are used widely in medicine and health, food safety, crop cultivation, livestock breeding, and other areas closely tied to human life, health, and safety. Ensuring the uniqueness and correspondence of a biological product (that each delivered product is the one specified) throughout logistics, storage, and clinical delivery is therefore essential for its biosafe application.
To address this, this example provides an encryption and decryption method for verifying a specific biological product. It combines the artificial nucleic acid sequence watermark encoding system, the watermark string technique, and an encryption algorithm, and specifically comprises the following steps:
after the step of presetting the watermark string, the variable encoding section, and the usage scenario segment, the method further comprises:
in the encoding method:
generating a public key and a private key (an RSA key pair, used together with the SHA-256 hash algorithm) for the preset watermark string, variable encoding section, and usage scenario segment;
performing an encryption calculation on the watermark string, the variable encoding section, and the usage scenario segment with the private key, to obtain the private-key-encrypted watermark string, variable encoding section, and usage scenario segment;
in the step of encoding the watermark string, the variable encoding section, and the usage scenario segment to obtain the target artificial nucleic acid sequence, the private-key-encrypted watermark string, variable encoding section, and usage scenario segment are used.
In a decoding method:
after the step of performing information interpretation and conversion on the nucleic acid sequences corresponding to the watermark string, the variable encoding section, and the usage scenario segment to obtain decoding information, the method further comprises:
decrypting the decoding information with the public key to obtain a first hash value, and computing the SHA-256 hash of the decoding information to obtain a second hash value;
comparing the first hash value with the second hash value to determine whether they are identical;
and, if the first hash value is identical to the second hash value, determining that the verification has passed.
The public key and the private key are generated together during product preparation. The public key can be derived from the private key, but the private key cannot be derived from the public key.
Specifically, the private key is used to perform an encryption calculation on the watermark string, the variable encoding section, and the usage scenario segment. Each item (category) is processed with the private key to obtain a corresponding 256-bit numeric value, which is then encoded to obtain the target artificial nucleic acid sequence.
The public key may be made public. For example, it may be handed to the transferee together with the product, or the transferee may obtain it from a public source.
The transferee applies the public key to the nucleic acid sequence information obtained after decoding and obtains a 256-bit hash value, namely the first hash value.
Meanwhile, the SHA-256 hash of the same decoded information is computed, giving a 256-bit hash value, namely the second hash value.
The two hash values are then compared. If the first hash value is identical to the second, the following conclusions can be drawn:
1. the decoding information obtained belongs to the received product carrying the target artificial nucleic acid sequence;
2. the decoding information originates from the transferor of the product;
3. the received decoding information is identical to the original record of the product shipped by the transferor and has not been tampered with.
SHA-256, one of the algorithms in the SHA-2 family, is a hash function (also called a hashing algorithm): a method for creating a small digital "fingerprint" from data of any kind. The hash function compresses a message or data into a fixed-size digest, which reduces the amount of data and fixes its format. It scrambles the data and produces a fingerprint called a hash value. The hash value is usually written as a short string of seemingly random letters and digits (for SHA-256, 64 hexadecimal characters).
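The fingerprint property can be seen with Python's standard `hashlib` module. This sketch hashes the UTF-8 bytes of a string. The text does not specify the exact bytes hashed in the example below, so the sketch makes no claim to reproduce the value quoted there. `sha256_fingerprint` is an illustrative name.

```python
import hashlib


def sha256_fingerprint(text: str) -> str:
    """SHA-256 digest of the UTF-8 bytes of text, as 64 uppercase hex characters."""
    return hashlib.sha256(text.encode("utf-8")).hexdigest().upper()
```

The digest always has the same length whatever the input size, and changing a single character (for example `I001` to `I002`) yields an unrelated value.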
For example, hospital X receives biological product A, which company B shipped by cold-chain transport according to hospital X's order. Sanger sequencing of biological product A gives the following result:
decoding by the method of Example 2 above gives the following result (i.e., the decoding information):
Biosysen_A003S001O001R001I001_Vector;
the public key corresponding to biological product A is applied to the decoding information "Biosysen_A003S001O001R001I001_Vector", giving the first hash value:
55CA1E14F0FC5D3C3FDE11E223B8ACA5A7944297866DCF923F1F5E5DD80B2075;
meanwhile, an SHA-256 tool is used to hash the decoding information "Biosysen_A003S001O001R001I001_Vector", giving the second hash value:
55CA1E14F0FC5D3C3FDE11E223B8ACA5A7944297866DCF923F1F5E5DD80B2075;
the first hash value is compared with the second hash value. Since the two are identical, it can be determined that:
1. the decoding information obtained belongs to the received biological product A carrying the target artificial nucleic acid sequence;
2. the decoding information was produced by company B, the producer of biological product A;
3. the received decoding information is identical to the original record of product A shipped by company B and has not been tampered with.
That is, product A received by hospital X was produced by company B, and it is the specified product that company B shipped to hospital X in accordance with hospital X's order.
In this example, the information handled by the encoding and decoding methods is encrypted and decrypted with the RSA asymmetric encryption algorithm. Throughout product circulation, from the production source upstream to the product transferee downstream, encrypted information is written into the sequence of the biological product. After receiving the product, the transferee sequences it, decodes the sequence, decrypts it with the public key, and compares the hash values to reach a unique conclusion about the product. This improves the safety, uniqueness, and correspondence of the product and fully protects the intellectual property of the design, R&D, and production parties. It also gives the product a high degree of confidentiality at the level of biological information.
This example substantially raises the bar for encryption and decryption. It combines, for the first time, the RSA asymmetric encryption method used in blockchain technology with the artificial nucleic acid sequence watermark encoding system and string encoding and decoding. Across research and development, production, clinical medicine, and industrial application, it guards against the product being altered or counterfeited and its information falsified during logistics, transfer, and acceptance. It thereby helps ensure the safe use of biological products in food and medical-health applications at every stage.
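The sign-then-verify flow described in this example corresponds to the usual RSA digital signature over a SHA-256 digest. The Python sketch below (Python 3.8+ for `pow(E, -1, m)`) shows only the arithmetic. It uses textbook RSA without padding and a toy key built from two publicly known Mersenne primes, so it offers no security at all. A real deployment would use a vetted cryptographic library with secret primes of 2048 bits or more and a padding scheme such as RSASSA-PSS. The text does not specify how the signature is laid out in the nucleic acid sequence, so that layout is not modelled here, and all names are illustrative.

```python
import hashlib

# TOY KEY: two publicly known Mersenne primes, so this key is NOT secret.
P = 2**127 - 1
Q = 2**521 - 1
N = P * Q                          # public modulus, larger than any 256-bit hash
E = 65537                          # public exponent
D = pow(E, -1, (P - 1) * (Q - 1))  # private exponent (modular inverse of E)


def digest_int(message: str) -> int:
    """SHA-256 of the message as an integer (always < 2**256 < N)."""
    return int.from_bytes(hashlib.sha256(message.encode("utf-8")).digest(), "big")


def sign(message: str) -> int:
    """Producer side: transform the SHA-256 hash with the private key."""
    return pow(digest_int(message), D, N)


def verify(message: str, signature: int) -> bool:
    """Transferee side: compare the two hash values."""
    first = pow(signature, E, N)   # first hash value, recovered with the public key
    second = digest_int(message)   # second hash value, recomputed with SHA-256
    return first == second
```

Any change to the decoded string changes its SHA-256 hash, so the recovered and recomputed values no longer match and verification fails.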
While the preferred embodiments and corresponding examples of the present invention have been described above, those skilled in the art may make various changes and modifications without departing from the inventive concept. Such changes and modifications include, but are not limited to, adjustments of proportions, procedures, and amounts, and they fall within the scope of the invention.
Claims (10)
1. An artificial nucleic acid sequence watermark encoding system, comprising:
a variable encoding section and at least one watermark string connected to the variable encoding section; wherein the variable encoding section is arranged at a midstream position in the artificial nucleic acid sequence watermark encoding system, and the watermark string is arranged at an upstream and/or a downstream position in the artificial nucleic acid sequence watermark encoding system;
the variable encoding section comprises at least one information unit;
the information unit comprises a nucleic acid feature part and a numbering part that is connected to, and corresponds to, the nucleic acid feature part.
2. The artificial nucleic acid sequence watermark encoding system of claim 1,
the length of the watermark character string is not less than 3;
in the variable encoding section, a plurality of the information units are arranged in 5' to 3' order to indicate the types and corresponding numbers of the functional fragments on the artificial nucleic acid fragment.
3. The artificial nucleic acid sequence watermark encoding system of claim 1,
the nucleic acid feature in each information unit is used to indicate one of the following: DNA assembly method and interface information, screening marker information, replication origin information, reporter gene information, insert backbone species information, insert promoter information, insert RBS information, insert CDS information, or insert terminator information.
4. The artificial nucleic acid sequence watermark encoding system of claim 1,
further comprising a usage scenario segment that marks the end of the artificial nucleic acid sequence watermark encoding system and indicates its usage scenario;
the usage scenario segment is connected to the variable encoding section;
the usage scenario segment is arranged at an upstream or a downstream position in the artificial nucleic acid sequence watermark encoding system;
the variable encoding section is arranged at a midstream position between the usage scenario segment and the watermark string.
5. The artificial nucleic acid sequence watermark encoding system of claim 4,
the usage scenario segment includes a bacterial artificial chromosome identifier, a yeast artificial chromosome identifier, a nuclear genome identifier, a mitochondrial genome identifier, a chloroplast genome identifier, and/or an RNA fragment identifier for identifying the usage scenario.
6. A watermark string in an artificial nucleic acid sequence watermark encoding system according to any one of claims 1 to 5,
the length of the watermark string in the artificial nucleic acid sequence watermark encoding system is not less than 15 bp/nt, and the watermark string can serve as a universal PCR primer sequence for amplifying and sequencing the artificial nucleic acid fragment.
7. The watermark string in an artificial nucleic acid sequence watermark encoding system of claim 6,
the watermark string is the nucleic acid sequence, the reverse complement of the nucleic acid sequence, the RNA sequence, and/or the reverse complement of the RNA sequence given below, or a subsequence thereof, or a sequence having not less than 95% similarity to any of these; wherein,
the nucleic acid sequence is:
GAACGCCGGCTTGTATGTCGGTATGCGGGCTCGGTT;
the reverse complement of the nucleic acid sequence is:
AACCGAGCCCGCATACCGACATACAAGCCGGCGTTC;
the RNA sequence is:
GAACGCCGGCUUGUAUGUCGGUAUGCGGGCUCGGUU;
the reverse complement of the RNA sequence is:
AACCGAGCCCGCAUACCGACAUACAAGCCGGCGUUC.
8. An encoding and decoding method for an artificial nucleic acid sequence watermark encoding system, comprising:
the encoding method comprises the following steps:
presetting a watermark string, a variable encoding section, and a usage scenario segment;
encoding the watermark string, the variable encoding section, and the usage scenario segment to obtain a target artificial nucleic acid sequence;
the decoding method comprises the following steps:
performing Sanger sequencing on the target artificial nucleic acid sequence to obtain a sequencing result;
obtaining, from the sequencing result, the nucleic acid sequences corresponding to the watermark string, the variable encoding section, and the usage scenario segment at the upstream, midstream, and downstream positions;
and performing information interpretation and conversion on the nucleic acid sequences corresponding to the watermark string, the variable encoding section, and the usage scenario segment to obtain decoding information.
9. The encoding and decoding method of an artificial nucleic acid sequence watermark encoding system of claim 8,
in the encoding method, "encoding the watermark string, the variable encoding section, and the usage scenario segment to obtain the target artificial nucleic acid sequence" comprises:
converting each character of the watermark string, the variable encoding section, and the usage scenario segment into an eight-bit binary ASCII byte code to obtain a first information set;
converting the first information set, two bits at a time, into bases according to the mapping 00 = A, 01 = G, 10 = C, 11 = T or U, to obtain the target artificial nucleic acid sequence;
in the decoding method, "performing information interpretation and conversion on the nucleic acid sequences corresponding to the watermark string, the variable encoding section, and the usage scenario segment to obtain decoding information" comprises:
converting the nucleic acid sequences corresponding to the watermark string, the variable encoding section, and the usage scenario segment into a binary digit string according to the rule A = 00, G = 01, C = 10, T or U = 11, to obtain a second information set;
and dividing the second information set into 8-bit groups, decoding each group as an ASCII code, and joining the results into a character string to obtain the decoding information corresponding to the target artificial nucleic acid sequence.
10. The encoding and decoding method of an artificial nucleic acid sequence watermark encoding system of claim 8,
in the encoding method, "encoding the watermark string, the variable encoding section, and the usage scenario section to obtain the target artificial nucleic acid sequence" includes:
separating the non-numeric characters from the decimal numeric characters in the watermark character string, the variable coding section and the usage scene section;
converting the non-numeric characters into eight-bit binary numbers according to their ASCII byte codes to obtain a third information set;
converting each eight-bit binary number in the third information set into a four-digit quaternary number, i.e. four bases, according to the mapping 00 = A, 01 = G, 10 = C, 11 = T or U, to obtain a fourth information set;
converting the decimal numeric characters into quaternary numbers with a fixed number of digits, where any quaternary number shorter than this fixed number of digits is left-padded with 0 until it reaches that length, to obtain a fifth information set;
mapping the quaternary digits of the fifth information set to bases according to the rule 0 = A, 1 = G, 2 = C, 3 = T or U, thereby converting the decimal numbers into bases, to obtain the target artificial nucleic acid sequence;
in the decoding method, "performing information interpretation conversion on the nucleic acid sequences corresponding to the watermark character string, the variable encoding section, and the usage scene section to obtain decoding information" includes:
cutting the nucleic acid sequences corresponding to the watermark character string, the variable coding section and the usage scene section into subsequences of length 4 + n, each subsequence serving as one decoding unit, where n is the fixed number of digits of the quaternary numbers produced by the quaternary conversion;
converting the first four bases of each decoding unit into an eight-bit binary ASCII code according to the rule A = 00, G = 01, C = 10, T or U = 11, and converting that code into a character to obtain a sixth information set; and
converting the last n bases of each decoding unit into a quaternary number according to the rule A = 0, G = 1, C = 2, T or U = 3, and then converting the quaternary number into a decimal number to obtain a seventh information set;
and concatenating, in order, the sixth and seventh information sets of all decoding units to obtain the decoding information.
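Claim 10 can be sketched in the same way. Each decoding unit holds exactly four bases for one character and n bases for one number. The sketch therefore assumes that the input alternates one non-numeric character with one decimal number (for example `A12`). That pairing rule, the helper names (`encode_mixed`, `decode_mixed` and the others), and the choice n = 4 in the example are all hypothetical, because the claim does not fix n.

```python
import re

# Hypothetical sketch of the claim 10 scheme. Assumption (inferred from the
# 4 + n decoding unit): the text is a series of pairs, each made of one
# non-numeric character followed by one decimal number.
BIT_PAIR_TO_BASE = {"00": "A", "01": "G", "10": "C", "11": "T"}
BASE_TO_BIT_PAIR = {"A": "00", "G": "01", "C": "10", "T": "11", "U": "11"}
DIGIT_TO_BASE = "AGCT"  # quaternary digits 0, 1, 2, 3 -> A, G, C, T
BASE_TO_DIGIT = {"A": 0, "G": 1, "C": 2, "T": 3, "U": 3}


def char_to_bases(ch: str) -> str:
    """Non-numeric character -> 8-bit ASCII code -> four bases."""
    if ord(ch) > 127:
        raise ValueError(f"not an ASCII character: {ch!r}")
    bits = format(ord(ch), "08b")
    return "".join(BIT_PAIR_TO_BASE[bits[i:i + 2]] for i in range(0, 8, 2))


def number_to_bases(value: int, n: int) -> str:
    """Decimal number -> n-digit quaternary number (left-padded with 0 = A) -> bases."""
    digits = []
    while value:
        value, remainder = divmod(value, 4)
        digits.append(DIGIT_TO_BASE[remainder])
    quaternary = "".join(reversed(digits)).rjust(n, "A")
    if len(quaternary) > n:
        raise ValueError(f"number needs more than {n} quaternary digits")
    return quaternary


def encode_mixed(text: str, n: int) -> str:
    pairs = re.findall(r"([^0-9])([0-9]+)", text)
    if "".join(c + d for c, d in pairs) != text:
        raise ValueError("text must alternate one non-digit with one decimal number")
    return "".join(char_to_bases(c) + number_to_bases(int(d), n) for c, d in pairs)


def decode_mixed(sequence: str, n: int) -> str:
    unit = 4 + n  # one decoding unit: 4 bases (character) + n bases (number)
    if len(sequence) % unit:
        raise ValueError(f"sequence length must be a multiple of {unit}")
    out = []
    for start in range(0, len(sequence), unit):
        block = sequence[start:start + unit].upper()
        bits = "".join(BASE_TO_BIT_PAIR[b] for b in block[:4])
        value = 0
        for b in block[4:]:  # read the last n bases as a base-4 number
            value = value * 4 + BASE_TO_DIGIT[b]
        out.append(chr(int(bits, 2)) + str(value))
    return "".join(out)


if __name__ == "__main__":
    print(encode_mixed("A12", 4))         # GAAGAATA
    print(decode_mixed("GAAGAATA", 4))    # A12
```

With n = 4, each number must lie between 0 and 255 (4^4 - 1). This sketch cannot recover leading zeros in a decimal number, so `A07` decodes as `A7`.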
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202110716040.6A (CN113380322B) | 2021-06-25 | 2021-06-25 | Artificial nucleic acid sequence watermark coding system, watermark character string and coding and decoding method |
Publications (2)
Publication Number | Publication Date |
---|---|
CN113380322A true CN113380322A (en) | 2021-09-10 |
CN113380322B CN113380322B (en) | 2023-10-24 |
Family ID: 77579354
Family Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202110716040.6A (Active, CN113380322B) | 2021-06-25 | 2021-06-25 | Artificial nucleic acid sequence watermark coding system, watermark character string and coding and decoding method |
Country Status (1)
Country | Publication |
---|---|
CN | CN113380322B |
Citations (23)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
KR20040094550A * | 2003-05-03 | 2004-11-10 | Korea Advanced Institute of Science and Technology (KAIST) | Digital Watermarking Method And Digital Watermark Certification Method Utilizing Gene Information |
CN101200763A * | 2007-11-13 | 2008-06-18 | Southeast University | Coding and decoding method for determined nucleic acid sequence |
WO2009013910A1 * | 2007-07-24 | 2009-01-29 | Keio University | Encoder apparatus, decoder apparatus, and information recording medium |
CN101400803A * | 2006-01-19 | 2009-04-01 | The Research Foundation of State University of New York | Methods and devices for detection and identification of encoded beads and biological molecules |
CN101932725A * | 2007-10-08 | 2010-12-29 | Synthetic Genomics, Inc. | Assembly of large nucleic acids |
CN102007407A * | 2007-11-21 | 2011-04-06 | 考斯摩斯德公司 | Genome identification system |
WO2011053868A1 * | 2009-10-30 | 2011-05-05 | Synthetic Genomics, Inc. | Encoding text into nucleic acid sequences |
US20110269119A1 * | 2009-10-30 | 2011-11-03 | Synthetic Genomics, Inc. | Encoding text into nucleic acid sequences |
WO2014025056A1 * | 2012-08-07 | 2014-02-13 | Hitachi Zosen Corporation | Method of encryption and decryption using nucleic acid |
KR20140096788A * | 2013-01-29 | 2014-08-06 | Tongmyong University Industry-Academic Cooperation Foundation | DNA watermarking scheme robust to mutation |
CN104850760A * | 2015-03-27 | 2015-08-19 | Suzhou Hongxun Biotechnology Co., Ltd. | Artificially synthesized DNA storage medium with coded information, method for storing and reading the information, and applications |
TW201541346A * | 2014-04-22 | 2015-11-01 | Univ Nat Taiwan Ocean | Method for protecting DNA sequence content by using watermark technology |
CN105119717A * | 2015-07-21 | 2015-12-02 | Zhengzhou University of Light Industry | DNA coding-based encryption system and encryption method |
US20170338943A1 * | 2014-10-29 | 2017-11-23 | Massachusetts Institute Of Technology | DNA encryption technologies |
US20180253536A1 * | 2017-03-01 | 2018-09-06 | Seven Bridges Genomics, Inc. | Watermarking for data security in bioinformatic sequence analysis |
US20180357742A1 * | 2015-11-24 | 2018-12-13 | B<>Com | A method for processing a sequence of digital images, a method of watermarking, devices and associated computer programs |
WO2019040871A1 * | 2017-08-24 | 2019-02-28 | Miller Julian | Device for information encoding and storage using artificially expanded alphabets of nucleic acids and other analogous polymers |
CN109460822A * | 2018-11-19 | 2019-03-12 | Tianjin University | DNA-based information storage method |
US20190089372A1 * | 2017-09-20 | 2019-03-21 | University Of Wyoming | Methods for data encoding in DNA and genetically modified organism authentication |
CN109830263A * | 2019-01-30 | 2019-05-31 | Southeast University | DNA storage method based on oligonucleotide sequence coded storage |
CN110060734A * | 2019-03-29 | 2019-07-26 | Tianjin University | Highly robust method for generating and reading DNA sequencing barcodes |
CN110352253A * | 2016-07-22 | 2019-10-18 | 核素示踪有限公司 | Method for amplifying nucleic acid sequences |
CN111095423A * | 2017-08-25 | 2020-05-01 | Shenzhen BGI Life Science Research Institute | Encoding/decoding method, apparatus and data processing apparatus |
Non-Patent Citations (2)
Title |
---|
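Xu Xijun, Xu Yuebin: "Implementation of a genetic algorithm in image watermark embedding technology", Journal of Qingdao University (Natural Science Edition), no. 02 * |
Wang Yujun: "Research on digital fingerprinting and its application to image copyright protection", China Master's Theses Full-text Database, no. 06 * |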
Legal Events
Code | Title |
---|---|
PB01 | Publication |
SE01 | Entry into force of request for substantive examination |
GR01 | Patent grant |