CN113380322B - Artificial nucleic acid sequence watermark coding system, watermark character string and coding and decoding method - Google Patents

Publication number
CN113380322B
Authority
CN
China
Prior art keywords
nucleic acid
acid sequence
information
watermark
encoding
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202110716040.6A
Other languages
Chinese (zh)
Other versions
CN113380322A (en)
Inventor
康康
陈钰
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beisheng Biotechnology Shenzhen Co ltd
Original Assignee
Beisheng Biotechnology Shenzhen Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beisheng Biotechnology Shenzhen Co ltd filed Critical Beisheng Biotechnology Shenzhen Co ltd
Priority to CN202110716040.6A priority Critical patent/CN113380322B/en
Publication of CN113380322A publication Critical patent/CN113380322A/en
Application granted granted Critical
Publication of CN113380322B publication Critical patent/CN113380322B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • G: PHYSICS
    • G16: INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
    • G16B: BIOINFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR GENETIC OR PROTEIN-RELATED DATA PROCESSING IN COMPUTATIONAL MOLECULAR BIOLOGY
    • G16B20/00: ICT specially adapted for functional genomics or proteomics, e.g. genotype-phenotype associations
    • G16B20/30: Detection of binding sites or motifs
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F21/00: Security arrangements for protecting computers, components thereof, programs or data against unauthorised activity
    • G06F21/60: Protecting data
    • G06F21/602: Providing cryptographic facilities or services
    • G: PHYSICS
    • G16: INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
    • G16B: BIOINFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR GENETIC OR PROTEIN-RELATED DATA PROCESSING IN COMPUTATIONAL MOLECULAR BIOLOGY
    • G16B25/00: ICT specially adapted for hybridisation; ICT specially adapted for gene or protein expression
    • G16B25/10: Gene or protein expression profiling; Expression-ratio estimation or normalisation

Landscapes

  • Health & Medical Sciences (AREA)
  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Theoretical Computer Science (AREA)
  • Genetics & Genomics (AREA)
  • General Health & Medical Sciences (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Biotechnology (AREA)
  • Medical Informatics (AREA)
  • Molecular Biology (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Biophysics (AREA)
  • Evolutionary Biology (AREA)
  • Spectroscopy & Molecular Physics (AREA)
  • Chemical & Material Sciences (AREA)
  • Analytical Chemistry (AREA)
  • Proteomics, Peptides & Aminoacids (AREA)
  • Bioethics (AREA)
  • Computer Hardware Design (AREA)
  • Computer Security & Cryptography (AREA)
  • Software Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Editing Of Facsimile Originals (AREA)

Abstract

The invention provides an artificial nucleic acid sequence watermark coding system, a watermark character string, and coding and decoding methods. The artificial nucleic acid sequence watermark encoding system comprises a variable encoding segment and at least one watermark string coupled to the variable encoding segment; the variable encoding section includes at least one information unit; each information unit comprises a nucleic acid feature and a numbering part that is connected with, and corresponds to, the nucleic acid feature. The coding system can be used as a watermark sequence to protect the intellectual property of gene products, and the corresponding sequence can be determined from the nucleic acid features of the coding system, which greatly improves storage capacity and sequencing efficiency.

Description

Artificial nucleic acid sequence watermark coding system, watermark character string and coding and decoding method
Technical Field
The invention belongs to the technical field of artificial nucleic acid sequences, and particularly relates to an artificial nucleic acid sequence watermark coding system, a watermark character string and coding and decoding methods.
Background
A DNA sequence or gene sequence is the primary structure of a DNA molecule carrying genetic information, either true or hypothetical, represented using a series of letters. Genetic engineering refers to deliberately designing organisms and endowing them with new genetic characteristics through in-vitro DNA recombination and transgenic technology, so as to create new biological types and biological products that better meet human needs. Because genetic engineering is designed and constructed at the DNA molecular level, it is also called DNA recombination technology.
Plasmid vectors are plasmids constructed artificially on the basis of natural plasmids to suit laboratory procedures. Compared with a natural plasmid, a plasmid vector typically carries one or more selectable marker genes (e.g., antibiotic resistance genes) and an artificially synthesized multiple cloning site containing multiple restriction enzyme recognition sites, and has most non-essential sequences removed to minimize its molecular weight and ease genetic engineering operations.
The intellectual property rights of artificial nucleic acid sequences for genetic engineering (such as the vector plasmids described above) are attributed to the company or laboratory that designed the sequences. However, the artificial nucleic acid sequence is often formed by combining the disclosed nucleic acid sequences, and lacks specific intellectual property attribution information, so that the intellectual property protection is difficult.
In addition, artificial nucleic acid sequences tend to be long, such as DNA sequences of 3k-10k bp in a vector plasmid. If the artificial nucleic acid sequence is to be identified, sequencing technology is often required to obtain the complete functional module. Taking the most commonly used Sanger sequencing as an example, a single Sanger sequencing reaction can only effectively cover 300-600bp sequences, and 5-20 Sanger reactions are required for complete identification of one vector plasmid sequence, so that the cost is high.
Disclosure of Invention
The invention provides an artificial nucleic acid sequence watermark coding system, a watermark character string and a coding and decoding method, which are used for solving the defects in the prior art.
To solve the above problems, the present invention provides an artificial nucleic acid sequence watermark encoding system, comprising:
a variable encoding segment, and at least one watermark string coupled to the variable encoding segment; wherein the variable coding section is arranged at a midstream position in the artificial nucleic acid sequence watermark coding system, and the watermark character string is arranged at an upstream position and/or a downstream position in the artificial nucleic acid sequence watermark coding system;
the variable encoding section includes at least one information unit;
the information unit comprises a nucleic acid feature and a numbering part connected with the nucleic acid feature and corresponding to the nucleic acid feature.
Preferably, the watermark string length is not less than 3;
in the variable encoding section, a plurality of the information units are arranged in 5 'to 3' order for representing the kinds and corresponding numbers of the functional fragments on the artificial nucleic acid fragment.
Preferably, the nucleic acid features in each of the information units are used to represent DNA assembly methods and interface information, screening marker information, replication origin information, reporter information, insert backbone type information, insert promoter information, insert RBS information, and insert CDS information or insert terminator information.
Preferably, the system further comprises a usage scene segment for marking the end of the artificial nucleic acid sequence watermark encoding system and indicating its usage scene;
the usage scene segment is connected with the variable coding segment;
the usage scene segment is arranged at an upstream position or a downstream position in the artificial nucleic acid sequence watermark encoding system;
the variable coding section is arranged at a midstream position between the usage scene segment and/or the watermark character string.
Preferably, the usage scene segment comprises a bacterial artificial chromosome signature, a yeast artificial chromosome signature, a nuclear genome signature, a mitochondrial genome signature, a chloroplast genome signature and/or an RNA fragment signature for identifying the usage scene.
In addition, in order to solve the above problems, the present application also provides a watermark character string in an artificial nucleic acid sequence watermark encoding system, the length of which is not less than 15bp/nt; and the watermark character string can be used as a universal PCR primer sequence to amplify and sequence the artificial nucleic acid fragment.
Preferably, the watermark character string is one of the following four sequences itself, a subsequence thereof, or a sequence having a similarity of not less than 95% to one of them: the nucleic acid sequence, the reverse complement of the nucleic acid sequence, the RNA sequence, and the reverse complement of the RNA sequence; wherein,
The nucleic acid sequence is as follows:
the reverse complement of the nucleic acid sequence is:
the RNA sequence is as follows:
the reverse complement of the RNA sequence is:
In addition, in order to solve the above problems, the present application also provides a coding and decoding method of an artificial nucleic acid sequence watermark coding system, including:
the coding method comprises the following steps:
presetting a watermark character string, a variable coding section and a use scene section;
encoding the watermark character string, the variable encoding section and the use scene section to obtain a target artificial nucleic acid sequence;
the decoding method comprises the following steps:
performing Sanger sequencing on the target artificial nucleic acid sequence to obtain a sequencing result;
obtaining watermark character strings, variable coding segments and nucleic acid sequences corresponding to the using scene segments at the upstream position, the midstream position and the downstream position from the sequencing result;
and carrying out information interpretation and conversion on the watermark character string, the variable coding section and the nucleic acid sequence corresponding to the use scene section to obtain decoding information.
Preferably, in the encoding method, "encoding the watermark character string, the variable encoding section, and the usage scene section, obtaining the target artificial nucleic acid sequence" includes:
converting each character of the watermark character string, the variable coding section and the usage scene segment into its eight-bit binary ASCII code to obtain a first information set;
converting the first information set, two bits at a time, into quaternary symbols according to the mapping 00=A, 01=G, 10=C, 11=T or U, so as to obtain the target artificial nucleic acid sequence;
in the decoding method, the step of performing information interpretation and conversion on the watermark character string, the variable coding section and the nucleic acid sequence corresponding to the usage scene section to obtain decoding information includes:
converting the nucleic acid sequences corresponding to the watermark character string, the variable coding section and the usage scene segment into binary digits according to the rule A=00, G=01, C=10, T or U=11, so as to obtain a second information set;
and dividing the second information set into groups of 8 bits, interpreting each group as an ASCII code, and converting the codes into a character string to obtain the decoding information corresponding to the target artificial nucleic acid sequence.
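The first encoding and decoding scheme described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the applicant's implementation: the function names `encode_ascii` and `decode_ascii` are hypothetical, and the sketch emits T rather than U when encoding while accepting either when decoding.

```python
# Two-bit groups map to bases as stated in the text: 00=A, 01=G, 10=C, 11=T (or U).
BITS_TO_BASE = {"00": "A", "01": "G", "10": "C", "11": "T"}
BASE_TO_BITS = {base: bits for bits, base in BITS_TO_BASE.items()}
BASE_TO_BITS["U"] = "11"  # RNA form of the same symbol


def encode_ascii(text: str) -> str:
    """Encode each character as 8 ASCII bits, then as 4 bases (hypothetical helper)."""
    bases = []
    for ch in text:
        code = ord(ch)
        if code > 127:
            raise ValueError(f"not an ASCII character: {ch!r}")
        bits = format(code, "08b")
        bases.extend(BITS_TO_BASE[bits[i:i + 2]] for i in range(0, 8, 2))
    return "".join(bases)


def decode_ascii(seq: str) -> str:
    """Invert encode_ascii: 4 bases -> 8 bits -> one ASCII character."""
    if len(seq) % 4:
        raise ValueError("sequence length must be a multiple of 4")
    bits = "".join(BASE_TO_BITS[base] for base in seq.upper())
    return "".join(chr(int(bits[i:i + 8], 2)) for i in range(0, len(bits), 8))
```

Under this mapping every character costs exactly four bases, so the letter "A" (ASCII 01000001) becomes GAAG and the separator "_" (ASCII 01011111) becomes GGTT.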
Preferably, in the encoding method, "encoding the watermark character string, the variable encoding section, and the usage scene section, obtaining the target artificial nucleic acid sequence" includes:
respectively obtaining non-digital characters and decimal digital characters in the nucleic acid sequences corresponding to the watermark character string, the variable coding section and the use scene section;
converting the non-digital character into an eight-bit binary number encoded by ASCII bytes to obtain a third information set;
mapping the third information set, two bits at a time, according to the rule 00=A, 01=G, 10=C, 11=T or U, so that each non-digital character becomes a four-digit quaternary sequence, to obtain a fourth information set;
converting each decimal number into a quaternary number with a fixed number of digits, padding the front with 0 whenever the converted number has fewer digits than the fixed number, to obtain a fifth information set;
mapping the fifth information set according to the rule 0=A, 1=G, 2=C, 3=T or U, and combining the result with the fourth information set to obtain the target artificial nucleic acid sequence;
in the decoding method, the step of performing information interpretation and conversion on the watermark character string, the variable coding section and the nucleic acid sequence corresponding to the usage scene section to obtain decoding information includes:
cutting the watermark character string, the variable coding section and the nucleic acid sequence corresponding to the use scene section into subsequences with the length of 4+n, and taking each subsequence as a decoding unit; wherein n is a fixed number of bits of the quaternary number converted by the quaternary system;
converting the first four bases of each decoding unit into an eight-bit binary ASCII code according to the rule A=00, G=01, C=10, T or U=11, and converting the ASCII code into a character to obtain a sixth information set; and
converting the last n bases of each decoding unit into a quaternary number according to the rule A=0, G=1, C=2, T or U=3, and converting that quaternary number into a decimal number to obtain a seventh information set;
and connecting the sixth information set and the seventh information set in all decoding units to obtain the decoding information.
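The second scheme, in which each letter takes four bases and each number takes a fixed n bases, can be sketched as follows. This is a sketch under assumptions: it handles only a variable coding section made of letter-plus-number units, the helper names are hypothetical, and the `pad` argument that restores leading zeros on decoding (for example "003") is an illustrative choice, since the text only says the quaternary number is converted to decimal.

```python
import re

BASES = "AGCT"  # index is the quaternary digit: 0=A, 1=G, 2=C, 3=T (or U)


def to_quaternary_bases(value: int, width: int) -> str:
    """Write value in base 4 with exactly `width` digits, front-padded with 0 (A)."""
    digits = []
    for _ in range(width):
        value, remainder = divmod(value, 4)
        digits.append(BASES[remainder])
    if value:
        raise ValueError("number does not fit in the fixed width")
    return "".join(reversed(digits))


def from_quaternary_bases(seq: str) -> int:
    """Read a run of bases as a base-4 number."""
    value = 0
    for base in seq:
        value = value * 4 + BASES.index(base)
    return value


def encode_segment(segment: str, n: int) -> str:
    """Encode units such as 'A003': letter -> 4 bases, number -> n bases."""
    units = re.findall(r"([A-Za-z])(\d+)", segment)
    if "".join(letter + number for letter, number in units) != segment:
        raise ValueError("segment must consist of letter+number units")
    return "".join(
        to_quaternary_bases(ord(letter), 4) + to_quaternary_bases(int(number), n)
        for letter, number in units
    )


def decode_segment(seq: str, n: int, pad: int = 3) -> str:
    """Cut seq into decoding units of length 4+n and rebuild the text."""
    seq = seq.upper().replace("U", "T")
    if len(seq) % (4 + n):
        raise ValueError("sequence length must be a multiple of 4+n")
    out = []
    for i in range(0, len(seq), 4 + n):
        unit = seq[i:i + 4 + n]
        letter = chr(from_quaternary_bases(unit[:4]))
        number = from_quaternary_bases(unit[4:])
        out.append(f"{letter}{number:0{pad}d}")
    return "".join(out)
```

With n = 5 a number field can hold values up to 4^5 - 1 = 1023, so a three-digit element number costs five bases instead of the twelve it would need under the all-ASCII scheme.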
The invention provides an artificial nucleic acid sequence watermark coding system, a watermark character string, and coding and decoding methods. The artificial nucleic acid sequence watermark encoding system comprises: a variable encoding segment, and at least one watermark string coupled to the variable encoding segment; wherein the variable coding section is arranged at a midstream position in the artificial nucleic acid sequence watermark coding system, and the watermark character string is arranged at an upstream position and/or a downstream position in the artificial nucleic acid sequence watermark coding system; the variable encoding section includes at least one information unit; the information unit comprises a nucleic acid feature and a numbering part connected with the nucleic acid feature and corresponding to the nucleic acid feature. In the coding and decoding method provided by the invention, the relative positions of the variable coding section, the watermark character string and the usage scene segment are set by design: the variable coding section is fixed at the midstream position, and the watermark character strings and the usage scene segment are fixed upstream and downstream of it. After the artificial nucleic acid sequence watermark coding system is converted into a nucleic acid sequence, that sequence has no high homology with any known natural biological genome and can therefore serve as a watermark sequence identifying the specific unit or laboratory that designed the artificial nucleic acid sequence, which protects the intellectual property of the gene product. In addition, the artificial nucleic acid sequence watermark coding system is 100-300 bp long and can be covered by a single Sanger sequencing reaction, so the element composition of the artificial nucleic acid fragment can be obtained with one Sanger reaction, without sequencing the whole artificial nucleic acid sequence. The complete sequence of the artificial nucleic acid fragment can then be deduced by combining the predetermined mapping relationship between the nucleic acid features (assembly method, etc.), the numbering part (element library number) and the sequence.
Detailed Description
The technical solution of the present invention is further described in detail below with reference to specific embodiments, but the present invention is not limited thereto, and any modifications made by anyone within the scope of the claims of the present invention are still within the scope of the claims of the present invention.
The present embodiment provides an artificial nucleic acid sequence watermark encoding system, including:
a variable encoding segment, and at least one watermark string coupled to the variable encoding segment; wherein the variable coding section is arranged at a midstream position in the artificial nucleic acid sequence watermark coding system, and the watermark character string is arranged at an upstream position and/or a downstream position in the artificial nucleic acid sequence watermark coding system;
the variable encoding section includes at least one information unit;
the information unit comprises a nucleic acid feature and a numbering part connected with the nucleic acid feature and corresponding to the nucleic acid feature.
The overall length of the artificial nucleic acid sequence watermark coding system is 100-300 bp. The artificial nucleic acid fragment carrying it may be, but is not limited to, a vector plasmid, a partial fragment of a plasmid, a BAC fragment, a YAC fragment, a nucleic acid fragment inserted into a biological genome, or an RNA fragment.
The watermark string and variable coding segments of the artificial nucleic acid sequence watermark coding system described above should be within the same engineered organism as the artificial nucleic acid fragment, but not necessarily on the same nucleic acid molecule, e.g., within the same chromosome, within the same vector plasmid.
The above-mentioned artificial nucleic acid sequence watermark encoding system is composed of several parts: a variable encoding section, and at least one watermark character string connected to the variable encoding section. The length of the variable encoding section is set by design and can be long or short, so a single variable encoding section suffices and no joining of multiple variable encoding sections is involved. There may be one or more watermark strings, used for representing different unit information and the like. The individual parts may be joined by the character "_".
The artificial nucleic acid sequence watermark encoding system may be divided into an upstream position, a midstream position and a downstream position. The variable encoding section may be set at the midstream position and/or the downstream position, and because there may be one or more watermark strings, the following layouts may occur:
1. watermark string_variable encoding section;
2. watermark string 1_variable encoding section_watermark string 2.
Fixing the variable coding section at the midstream position, and fixing the other parts relative to it, allows the boundary of the artificial nucleic acid sequence watermark coding system to be determined quickly. This improves the accuracy and efficiency of identification and avoids the garbled output that would arise during coding and decoding if the position of the variable coding section were not fixed.
The variable coding segment is composed of one or more information units, and each information unit comprises two parts, namely a nucleic acid feature and a numbering part. The nucleic acid feature identifies a nucleic acid assembly method or a type of nucleic acid element; the numbering part identifies the number of the specific assembly method or the number of the selected nucleic acid element in the element library.
Furthermore, the element library is a preset database that is indexed by the numbering part and stores the corresponding DNA sequences, and it can be used for coding or decoding an artificial nucleic acid sequence watermark coding system.
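An element library of this kind can be pictured as a lookup table keyed by nucleic acid feature and number. The sketch below is purely illustrative: the feature letters follow the example used later in this description, the name `ELEMENT_LIBRARY` and the function `assemble` are hypothetical, and the sequences are made-up placeholders rather than real genetic parts.

```python
from typing import Dict, List, Tuple

Unit = Tuple[str, str]  # (nucleic acid feature letter, number as written)

# Placeholder entries only; a real library would hold the designer's own parts.
ELEMENT_LIBRARY: Dict[Unit, str] = {
    ("P", "001"): "TTGACA",     # placeholder promoter sequence
    ("B", "001"): "AGGAGG",     # placeholder RBS sequence
    ("C", "010"): "ATGAAATAA",  # placeholder CDS sequence
    ("T", "003"): "GCGCTTTT",   # placeholder terminator sequence
}


def assemble(units: List[Unit], library: Dict[Unit, str]) -> str:
    """Concatenate the library sequences for the given units in 5' to 3' order."""
    missing = [unit for unit in units if unit not in library]
    if missing:
        raise KeyError(f"units not found in element library: {missing}")
    return "".join(library[unit] for unit in units)
```

For example, `assemble([("P", "001"), ("B", "001")], ELEMENT_LIBRARY)` returns the two placeholder sequences joined end to end, which is how a decoded unit list could be turned back into a full sequence.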
The variable code segment is formed of a plurality of information units, each of which includes a nucleic acid feature, which may be represented by a single letter, and a numbering part, which may be represented by a multi-digit number (e.g., A123B456C789). The number of information units can be increased, decreased or customized according to actual demands, information quantity requirements and conditions.
The watermark string may be disposed at an upstream position or a downstream position of the artificial nucleic acid sequence watermark encoding system, and may be designed by the unit that designs the sequence in order to identify that specific unit; for example, the watermark string "biosystem" indicates that the sequence is provided by "biosystem ltd".
For example, the artificial nucleic acid sequence watermark encoding system is:
biosystem_A003S001O001R001I005P001B001C010T003; the system includes two parts: the upstream watermark string "biosystem" and the midstream (or midstream-to-downstream) variable encoding segment "A003S001O001R001I005P001B001C010T003". The variable encoding section includes 9 information units, namely "A003", "S001", "O001", "R001", "I005", "P001", "B001", "C010" and "T003"; in the first information unit, the nucleic acid feature is "A" and the number is "003". The nucleic acid feature of each information unit selects the corresponding element library, and the numbering part indexes the corresponding sequence within it, so the storage capacity and the sequencing efficiency are greatly improved.
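Splitting a variable encoding segment into its information units is a simple pattern-matching task. The sketch below assumes each unit is exactly one letter followed by one or more digits, as in this embodiment; the names `InfoUnit` and `parse_variable_segment` are hypothetical.

```python
import re
from typing import List, NamedTuple


class InfoUnit(NamedTuple):
    feature: str  # nucleic acid feature, one letter
    number: str   # numbering part, kept as text to preserve leading zeros


UNIT_PATTERN = re.compile(r"([A-Za-z])(\d+)")


def parse_variable_segment(segment: str) -> List[InfoUnit]:
    """Split e.g. 'A003S001' into [InfoUnit('A', '003'), InfoUnit('S', '001')]."""
    units = []
    pos = 0
    while pos < len(segment):
        match = UNIT_PATTERN.match(segment, pos)
        if match is None:
            raise ValueError(f"unexpected text at position {pos}: {segment[pos:]!r}")
        units.append(InfoUnit(match.group(1), match.group(2)))
        pos = match.end()
    return units
```

Applied to the variable encoding segment of the example above, the function yields nine units, the first being feature "A" with number "003".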
The application provides an artificial nucleic acid sequence watermark coding system in which the relative positions of the variable coding section and the watermark character string are set by design: the variable coding section is fixed at the midstream position, and the watermark character strings and other parts are fixed upstream and downstream of it. After the artificial nucleic acid sequence watermark coding system is converted into a nucleic acid sequence, that sequence has no high homology with any known natural biological genome and can therefore serve as a watermark sequence identifying the specific unit or laboratory that designed the artificial nucleic acid sequence, which protects the intellectual property of the gene product. In addition, the artificial nucleic acid sequence watermark coding system is 100-300 bp long and can be covered by a single Sanger sequencing reaction, so the element composition of the artificial nucleic acid fragment can be obtained with one Sanger reaction, without sequencing the whole artificial nucleic acid sequence. The complete sequence of the artificial nucleic acid fragment can then be deduced by combining the predetermined mapping relationship between the nucleic acid features (assembly method, etc.), the numbering part (element library number) and the sequence.
Further, the length of the watermark character string is not less than 3;
in the variable encoding section, a plurality of the information units are arranged in 5 'to 3' order for representing the kinds and corresponding numbers of the functional fragments on the artificial nucleic acid fragment.
The length of the watermark string is 3 or more, so that the length of the universal primer can reach 16nt after encoding into nucleic acid.
In the above-described variable encoding section, a plurality of the information units are arranged in 5 'to 3' order, for example:
Biosysen_A003S001O001R001I005P001B001C010T003;
A003 is the start information, identifying the nucleic acid assembly method and interface information; 003 is the numbering part, which indexes the corresponding information in the element library. Each subsequent information unit then indicates the type and number of a functional fragment on the artificial nucleic acid fragment in 5' to 3' order (for a circular nucleic acid system such as a plasmid, generally starting from the first functional element downstream of the insert). If there are several information units after Ixxx and before the downstream scene string, these information units represent the internal structure of the insert (I005P001B001C010T003 in this example, which indicates that the backbone type of the insert is 005 and that its interior is composed of a 001 promoter, a 001 RBS, a 010 CDS and a 003 terminator connected in sequence).
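The reading rule described above, where the units from Ixxx onward describe the insert, can be sketched as follows. The letter meanings for A, I, P, B, C and T are taken from the example; those for S, O and R are an assumption inferred from the order in which the feature types are listed in this description. The names `FEATURE_NAMES` and `split_insert` are hypothetical.

```python
import re
from typing import List, Tuple

Unit = Tuple[str, str]

# Letter meanings; S, O and R are inferred from the listed order, not stated outright.
FEATURE_NAMES = {
    "A": "assembly method and interface",
    "S": "screening marker",
    "O": "replication origin",
    "R": "reporter",
    "I": "insert backbone type",
    "P": "insert promoter",
    "B": "insert RBS",
    "C": "insert CDS",
    "T": "insert terminator",
}


def split_insert(segment: str) -> Tuple[List[Unit], List[Unit]]:
    """Return (units before the insert, units from the I unit to the end)."""
    units = re.findall(r"([A-Z])(\d+)", segment)
    for index, (feature, _number) in enumerate(units):
        if feature == "I":
            return units[:index], units[index:]
    return units, []


def describe(units: List[Unit]) -> List[str]:
    """Render each unit as '<number> <feature name>' for human reading."""
    return [f"{number} {FEATURE_NAMES.get(feature, 'unknown')}" for feature, number in units]
```

For the example segment, the second list returned by `split_insert` starts with the I005 unit and is followed by the promoter, RBS, CDS and terminator units in that order.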
Further, the nucleic acid features in each of the information units are used to represent DNA assembly methods and interface information, screening marker information, replication origin information, reporter information, insert backbone type information, insert promoter information, insert RBS information, and insert CDS information or insert terminator information.
The uses of the nucleic acid features in each information unit described above include, but are not limited to, DNA assembly methods and interface information, screening marker information, replication origin information, reporter information, insert backbone type information, insert promoter information, insert RBS information, and insert CDS information or insert terminator information, among others.
The above-mentioned nucleic acid feature denotes a nucleic acid assembly method or a type of nucleic acid element, and is indicated by a single letter in this embodiment, although it is not limited to this. In this embodiment, the definition and representation can be performed as follows:
The nucleic acid feature in each information unit identifies the nucleic acid assembly method or the type of nucleic acid element, and the numbering part identifies the corresponding number in the element library, so the corresponding nucleic acid information can be indexed accurately and rapidly. Through the corresponding coding and decoding, a short artificial nucleic acid sequence watermark coding system comprising a plurality of information units can therefore store a large amount of information about the nucleic acid sequence. By sequencing the coding system alone, the complete functional module composition of the artificial nucleic acid sequence can be determined, and even the complete nucleic acid sequence can be deduced, which greatly reduces the sequencing cost, improves the sequencing efficiency and makes sequencing work more convenient.
Further, the system also comprises a usage scene segment for marking the end of the artificial nucleic acid sequence watermark encoding system and indicating its usage scene;
the usage scene segment is connected with the variable coding segment;
the usage scene segment is arranged at an upstream position or a downstream position in the artificial nucleic acid sequence watermark encoding system;
the variable coding section is arranged at a midstream position between the usage scene segment and/or the watermark character string.
Further, the usage scene segment includes a bacterial artificial chromosome signature, a yeast artificial chromosome signature, a nuclear genome signature, a mitochondrial genome signature, a chloroplast genome signature and/or an RNA fragment signature for identifying the usage scene.
The usage scene segment described above includes, but is not limited to, a bacterial artificial chromosome signature, a yeast artificial chromosome signature, a nuclear genome signature, a mitochondrial genome signature, a chloroplast genome signature and/or an RNA fragment signature for identifying the usage scene. It can be used to mark the end of the coding system and also to identify the usage scene of the artificial nucleic acid sequence. It may be set at an upstream position or a downstream position, for example:
1. watermark string_variable encoding segment_usage scene segment;
2. usage scene segment_variable encoding segment_watermark string.
For example: Biosysen_A003S001O001R001I005P001B001C010T003_Vector;
where "Vector" is the usage scene segment at the downstream position, indicating that the artificial nucleic acid sequence is a vector plasmid. In addition, usage scene segments such as, but not limited to, "BAC", "YAC", "Genome", "Mito", "Chloro" and "RNA" may be used to denote bacterial artificial chromosomes, yeast artificial chromosomes, nuclear genomes, mitochondrial genomes, chloroplast genomes and RNA fragments, respectively. With the usage scene segment, the scene in which the sequence is used (such as a vector plasmid) can be obtained during sequencing, encoding and decoding, which improves sequencing efficiency and allows the related information to be obtained accurately.
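The layouts listed above can be separated into their parts by splitting on the "_" character and locating the variable encoding segment. The sketch below assumes that the variable encoding segment is the only part made up entirely of letter-plus-number units and that usage scene segments come from the names mentioned in the text; `KNOWN_SCENES` and `split_system` are hypothetical names.

```python
import re
from typing import Dict, List, Tuple, Union

# Usage scene names mentioned in the text; the list is not exhaustive.
KNOWN_SCENES = {"Vector", "BAC", "YAC", "Genome", "Mito", "Chloro", "RNA"}
VARIABLE_PATTERN = re.compile(r"(?:[A-Za-z]\d+)+")

Part = Tuple[str, str]  # (kind, text), kind is "scene" or "watermark"


def classify(part: str) -> Part:
    """Label a flanking part as a usage scene segment or a watermark string."""
    return ("scene" if part in KNOWN_SCENES else "watermark", part)


def split_system(text: str) -> Dict[str, Union[str, List[Part]]]:
    """Split a coding-system string into upstream parts, variable segment, downstream parts."""
    parts = text.split("_")
    hits = [i for i, part in enumerate(parts) if VARIABLE_PATTERN.fullmatch(part)]
    if len(hits) != 1:
        raise ValueError("expected exactly one variable encoding segment")
    mid = hits[0]
    return {
        "upstream": [classify(p) for p in parts[:mid]],
        "variable": parts[mid],
        "downstream": [classify(p) for p in parts[mid + 1:]],
    }
```

Because the variable encoding segment is always at the midstream position, everything before it is upstream and everything after it is downstream, whichever of the listed layouts is used.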
In addition, in order to solve the above-mentioned problem, this embodiment also provides a watermark string in the watermark encoding system of artificial nucleic acid sequence, its length is not less than 15bp/nt; and the watermark character string can be used as a universal PCR primer sequence to amplify and sequence the artificial nucleic acid fragment.
Further, the watermark character string is one of the following four sequences itself, a subsequence thereof, or a sequence having a similarity of not less than 95% to one of them: the nucleic acid sequence, the reverse complement of the nucleic acid sequence, the RNA sequence, and the reverse complement of the RNA sequence; wherein,
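The nucleic acid sequence is as follows:
GAACGCCGGCTTGTATGTCGGTATGCGGGCTCGGTT;
the reverse complement of the nucleic acid sequence is:
AACCGAGCCCGCATACCGACATACAAGCCGGCGTTC;
the RNA sequence is as follows:
GAACGCCGGCUUGUAUGUCGGUAUGCGGGCUCGGUU;
the reverse complement of the RNA sequence is:
AACCGAGCCCGCAUACCGACAUACAAGCCGGCGUUC;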
The above nucleic acid sequence corresponds to the watermark string "Biosysen_" and has no high homology with any known natural genome (a blastn search against the NCBI NT database returns no hits), so it can be regarded as a specific sequence designed by Biosysen Ltd.
Accordingly, the following sequences may be included:
(1) The four sequences described above themselves;
(2) Any subsequence of the four sequences with a length of at least 15 bp/nt;
(3) Any sequence with a similarity of at least 95% (blastn percent identity) to the four sequences or to any of their subsequences of at least 15 bp/nt;
All these cases can be regarded as watermark information of an artificial nucleic acid sequence designed by Biosysen Ltd.; artificial nucleic acid sequences or genetically engineered organisms containing such watermark information are considered to contain watermark information of Biosysen Ltd.
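Cases (1) and (2) above can be sketched with plain string matching, using the 36-nt watermark sequence listed in claim 1. The function names are hypothetical, and case (3), the 95% similarity match, is not covered here because it calls for an alignment tool such as blastn rather than exact matching.

```python
WATERMARK_DNA = "GAACGCCGGCTTGTATGTCGGTATGCGGGCTCGGTT"  # sequence from claim 1

def to_rna(seq):
    """Write a DNA sequence as RNA (T becomes U)."""
    return seq.replace("T", "U")

def reverse_complement(seq, rna=False):
    """Reverse complement of a DNA sequence, or of an RNA sequence when rna=True."""
    pairs = {"A": "U" if rna else "T", "T": "A", "U": "A", "G": "C", "C": "G"}
    return "".join(pairs[base] for base in reversed(seq))

def watermark_forms():
    """The four forms: DNA, its reverse complement, RNA, and its reverse complement."""
    rna = to_rna(WATERMARK_DNA)
    return [WATERMARK_DNA, reverse_complement(WATERMARK_DNA),
            rna, reverse_complement(rna, rna=True)]

def contains_watermark(query, min_len=15):
    """True if the query holds an exact stretch of at least min_len nt of any form.

    Checking every window of exactly min_len is enough, because any longer
    shared subsequence also contains such a window.
    """
    query = query.upper()
    for form in watermark_forms():
        for start in range(len(form) - min_len + 1):
            if form[start:start + min_len] in query:
                return True
    return False

print(reverse_complement(WATERMARK_DNA))
print(contains_watermark("TTTT" + WATERMARK_DNA[5:25] + "CCCC"))  # True
```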
By inserting the watermark string into the artificial nucleic acid sequence watermark encoding system, the specific unit information of the artificial nucleic acid sequence can be obtained quickly during encoding or decoding, showing that the producer of the sequence is Biosysen Ltd. For example:
Biosysen_A003S001O001R001I005P001B001C010T003_Vector;
For this system, the four corresponding sequences can be derived from "Biosysen_", and conversely, during sequencing, the watermark string corresponding to those four sequences can be recovered as "Biosysen_", showing that the producer is Biosysen Ltd. The information of the specific production unit is thus retained in the gene segment itself and can be recovered during encoding or decoding, which strengthens intellectual property protection and provides another means of intellectual property protection, product anti-counterfeiting and tracing.
In addition, to solve the above-mentioned problem, the present embodiment further provides a coding and decoding method of an artificial nucleic acid sequence watermark coding system, including:
the coding method comprises the following steps:
presetting a watermark character string, a variable coding section and a use scene section;
encoding the watermark character string, the variable encoding section and the use scene section to obtain a target artificial nucleic acid sequence;
the decoding method comprises the following steps:
performing Sanger sequencing on the target artificial nucleic acid sequence to obtain a sequencing result;
obtaining, from the sequencing result, the nucleic acid sequences corresponding to the watermark string, the variable encoding segment and the usage scene segment at the upstream, midstream and downstream positions, respectively;
and performing information interpretation and conversion on the nucleic acid sequences corresponding to the watermark string, the variable encoding segment and the usage scene segment to obtain decoding information.
To load the watermark string, variable encoding segment and usage scene segment described above into a complete sequence, they must be encoded, and a corresponding decoding procedure is needed. During encoding, the watermark string, the variable encoding segment and the usage scene segment are preset and then encoded to obtain the corresponding target artificial nucleic acid sequence. Conversely, to recover the information from a target artificial nucleic acid sequence obtained by sequencing, that sequence must be decoded.
This embodiment provides an encoding and decoding method for the artificial nucleic acid sequence watermark encoding system. In encoding, the variable encoding segment, the watermark string and the usage scene segment are preset, designed and arranged, and then encoded to obtain the target artificial nucleic acid sequence. In decoding, the target artificial nucleic acid sequence is obtained by sequencing and then decoded to obtain the corresponding decoding information. Once converted into a nucleic acid sequence, the watermark encoding system has no high homology with known natural genomes, so it can serve as a watermark sequence identifying the specific unit or laboratory that designed the artificial nucleic acid sequence, thereby protecting the intellectual property of the gene product. In addition, the artificial nucleic acid sequence watermark encoding system is 100-300 bp long and can be covered by a single Sanger sequencing reaction, so the element composition of the artificial nucleic acid fragment can be obtained from one Sanger read without sequencing the whole artificial nucleic acid sequence. The complete sequence of the artificial nucleic acid fragment can then be deduced from the predetermined mapping between the nucleic acid features (assembly method, etc.), the numbering parts (element library numbers) and the sequences.
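As a quick arithmetic check of the 100-300 bp figure, the sketch below computes the encoded length of the example string used in this document, assuming the ASCII scheme of Example 1 (8 bits per character, 2 bits per nucleotide, so 4 nt per character). The variable names are illustrative.

```python
coding_system = "Biosysen_A003S001O001R001I005P001B001C010T003_Vector"

BITS_PER_CHAR = 8        # one ASCII byte per character
BITS_PER_NUCLEOTIDE = 2  # 00=A, 01=G, 10=C, 11=T or U

nt_per_char = BITS_PER_CHAR // BITS_PER_NUCLEOTIDE
length_nt = len(coding_system) * nt_per_char
print(len(coding_system), "characters ->", length_nt, "nt")  # 52 characters -> 208 nt
```

At 208 nt this example falls inside the stated 100-300 bp range.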
Example 1:
in the encoding method, the step of "encoding the watermark string, the variable encoding segment and the usage scene segment to obtain a target artificial nucleic acid sequence" includes:
converting each character in the watermark string, the variable encoding segment and the usage scene segment into an eight-bit binary number (its ASCII byte code) to obtain a first information set;
converting the first information set into quaternary form according to the mapping 00=A, 01=G, 10=C, 11=T or U, so as to obtain the target artificial nucleic acid sequence;
in the decoding method, the step of "performing information interpretation and conversion on the nucleic acid sequences corresponding to the watermark string, the variable encoding segment and the usage scene segment to obtain decoding information" includes:
converting the nucleic acid sequences corresponding to the watermark string, the variable encoding segment and the usage scene segment into binary digits according to the rule A=00, G=01, C=10, T or U=11, so as to obtain a second information set;
and decoding the second information set as ASCII in groups of 8 bits and converting it into a character string, to obtain the decoding information corresponding to the target artificial nucleic acid sequence.
In the encoding process of this embodiment (ASCII transcoding to obtain the target artificial nucleic acid sequence), each character in the artificial nucleic acid sequence watermark encoding system is converted into its ASCII byte code (an eight-bit binary number), and every two bits are then mapped according to 00=A, 01=G, 10=C, 11=T or U, so that each character becomes four quaternary digits, that is, four nucleotides (A, T/U, C, G) of the DNA or RNA sequence.
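The ASCII transcoding just described can be sketched as follows. The function name `encode_ascii` and the `rna` flag are illustrative choices, not taken from the patent. Applied to the watermark string "Biosysen_", the sketch yields the 36-nt nucleic acid sequence listed in claim 1.

```python
BITS_TO_BASE = {"00": "A", "01": "G", "10": "C", "11": "T"}

def encode_ascii(text, rna=False):
    """Encode ASCII text as nucleotides: 8 bits per character, 2 bits per base."""
    if any(ord(ch) > 127 for ch in text):
        raise ValueError("only ASCII characters are supported")
    bits = "".join(format(ord(ch), "08b") for ch in text)
    seq = "".join(BITS_TO_BASE[bits[i:i + 2]] for i in range(0, len(bits), 2))
    return seq.replace("T", "U") if rna else seq

print(encode_ascii("B"))          # GAAC  (0x42 = 01 00 00 10)
print(encode_ascii("Biosysen_"))  # GAACGCCGGCTTGTATGTCGGTATGCGGGCTCGGTT
```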
For example: Biosysen_A003S001O001R001I005P001B001C010T003_Vector;
the watermark encoding system of the artificial nucleic acid sequence in this case has the following nucleic acid sequence converted by the method provided in this embodiment:
In the decoding process, the sequenced sequence is converted into a string of binary digits according to A=00, G=01, C=10, T or U=11; the binary string is then read as ASCII in groups of 8 bits and converted into characters, thereby recovering the variable encoded string.
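The decoding direction can be sketched in the same way. The function name `decode_ascii` is hypothetical; the input is assumed to be an error-free read whose length is a multiple of four nucleotides and which starts on a character boundary.

```python
BASE_TO_BITS = {"A": "00", "G": "01", "C": "10", "T": "11", "U": "11"}

def decode_ascii(seq):
    """Decode nucleotides back to ASCII text: 4 bases (8 bits) per character."""
    seq = seq.upper()
    if len(seq) % 4 != 0:
        raise ValueError("sequence length must be a multiple of 4")
    bits = "".join(BASE_TO_BITS[base] for base in seq)
    return "".join(chr(int(bits[i:i + 8], 2)) for i in range(0, len(bits), 8))

print(decode_ascii("GAACGCCGGCTTGTATGTCGGTATGCGGGCTCGGTT"))  # Biosysen_
```

Decoding the 21-nt primer 5'-GAACGCCGGCTTGTATGTCGG-3' used in the sequencing examples would fail the length check, since it ends one base into the sixth character; its first 20 nt decode to "Biosy".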
Specifically, a target artificial nucleic acid sequence obtained with the ASCII transcoding method of Example 1 was sequenced:
the plasmid numbered Biosysen_A001S001O001R001I001_Vector was selected for Sanger sequencing with primer 5'-GAACGCCGGCTTGTATGTCGG-3', with the sequencing result:
the sequence obtained was decoded by the method of Example 1, giving:
Biosysen_A001S001O001R001I001_Vector; the encoded information is completely restored, and the test is successful.
Example 2:
in the encoding method, the step of "encoding the watermark string, the variable encoding segment and the usage scene segment to obtain a target artificial nucleic acid sequence" includes:
separating the non-digit characters and the decimal digit characters in the watermark string, the variable encoding segment and the usage scene segment;
converting each non-digit character into an eight-bit binary number (its ASCII byte code) to obtain a third information set;
mapping the third information set according to the rule 00=A, 01=G, 10=C, 11=T or U, so that each character becomes four quaternary digits, to obtain a fourth information set;
converting the decimal digit characters into quaternary numbers with a fixed number of digits, padding the front with 0 whenever the converted number has fewer digits than the fixed number, to obtain a fifth information set;
mapping the fifth information set according to the rule 0=A, 1=G, 2=C, 3=T or U, thereby converting the decimal numbers into nucleotides, to obtain the target artificial nucleic acid sequence;
in the decoding method, the step of "performing information interpretation and conversion on the nucleic acid sequences corresponding to the watermark string, the variable encoding segment and the usage scene segment to obtain decoding information" includes:
cutting the nucleic acid sequences corresponding to the watermark string, the variable encoding segment and the usage scene segment into subsequences of length 4+n, each subsequence being one decoding unit, where n is the fixed number of digits of the quaternary numbers;
converting the first four nucleotides of each decoding unit into an eight-bit binary ASCII code according to the rule A=00, G=01, C=10, T or U=11, and converting that code into a character, to obtain a sixth information set; and
converting the last n nucleotides of each decoding unit into a quaternary number according to the rule A=0, G=1, C=2, T or U=3, and converting it into a decimal number, to obtain a seventh information set;
and connecting the sixth information set and the seventh information set of all decoding units to obtain the decoding information.
In the encoding process, a hybrid encoding method is adopted: the non-digit characters in the artificial nucleic acid sequence watermark encoding system are converted as in Example 1, while each decimal digit string is converted directly into a quaternary number. The converted quaternary number must have a fixed number of digits and is padded at the front with 0 when it is shorter; for example, four-digit quaternary numbers are used in this example, so the quaternary number 130 is padded to 0130. The quaternary digits are then mapped to a nucleic acid sequence according to the rule 0=A, 1=G, 2=C, 3=T or U (for example, 0130→AGTA).
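A minimal sketch of the hybrid encoding for the variable encoding segment is given below. It assumes each information unit is one letter followed by a decimal number, and that n = 4 quaternary digits are used as in this example; all function names are hypothetical. The quaternary number 0130 from the text corresponds to decimal 28.

```python
import re

BITS_TO_BASE = {"00": "A", "01": "G", "10": "C", "11": "T"}
DIGIT_TO_BASE = "AGCT"  # quaternary digit 0=A, 1=G, 2=C, 3=T

def encode_char(ch):
    """One non-digit character -> 4 nt via its 8-bit ASCII code (as in Example 1)."""
    bits = format(ord(ch), "08b")
    return "".join(BITS_TO_BASE[bits[i:i + 2]] for i in range(0, 8, 2))

def encode_number(value, n=4):
    """One decimal number -> n quaternary digits (front-padded with 0) -> n nt."""
    digits = ""
    while value:
        digits = str(value % 4) + digits
        value //= 4
    if len(digits) > n:
        raise ValueError("number does not fit in n quaternary digits")
    digits = digits.rjust(n, "0")
    return "".join(DIGIT_TO_BASE[int(d)] for d in digits)

def encode_units(variable_segment, n=4):
    """Encode units such as 'A003S001' as (4 + n) nt per unit."""
    units = re.findall(r"([A-Za-z])(\d+)", variable_segment)
    return "".join(encode_char(f) + encode_number(int(num), n) for f, num in units)

print(encode_number(28))         # AGTA  (decimal 28 = quaternary 0130)
print(encode_units("A003S001"))  # GAAGAAATGGATAAAG
```

With n = 4 each number field can hold decimal values 0 to 255, and each information unit occupies 8 nt instead of the 16 nt that pure ASCII transcoding of "A003" would need.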
For example: Biosysen_A003S001O001R001I005P001B001C010T003_Vector;
the artificial nucleic acid sequence watermark encoding system of this case has the following nucleic acid sequences converted in example 2:
In the decoding process, the sequenced sequence is first cut into subsequences of length 4+n (n is the fixed number of quaternary digits used when encoding numbers, 4 in this case), each subsequence representing one information unit. In each information unit, the first 4 nucleotides are converted into an 8-bit binary ASCII code according to A=00, G=01, C=10, T or U=11, and then into a character; the last n nucleotides are converted into a quaternary number according to A=0, G=1, C=2, T or U=3, and then into a decimal number. The characters and decimal numbers restored from all information units are then concatenated to recover the variable encoded string.
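The unit-wise decoding can be sketched as follows. The function name `decode_units` is hypothetical, and printing each number as three zero-padded decimal digits (the `width` parameter) is an assumption made to match strings such as "A003"; the text does not state the decimal field width.

```python
BASE_TO_BITS = {"A": "00", "G": "01", "C": "10", "T": "11", "U": "11"}
BASE_TO_DIGIT = {"A": 0, "G": 1, "C": 2, "T": 3, "U": 3}

def decode_units(seq, n=4, width=3):
    """Decode (4 + n)-nt units: 4 nt -> ASCII character, n nt -> decimal number."""
    seq = seq.upper()
    unit_len = 4 + n
    if len(seq) % unit_len != 0:
        raise ValueError("sequence length must be a multiple of 4 + n")
    parts = []
    for start in range(0, len(seq), unit_len):
        unit = seq[start:start + unit_len]
        bits = "".join(BASE_TO_BITS[base] for base in unit[:4])
        character = chr(int(bits, 2))
        value = 0
        for base in unit[4:]:  # read the quaternary number digit by digit
            value = value * 4 + BASE_TO_DIGIT[base]
        parts.append(f"{character}{value:0{width}d}")
    return "".join(parts)

print(decode_units("GAAGAAATGGATAAAG"))  # A003S001
```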
Sequencing for Example 2:
the plasmid numbered Biosysen_A003S001O001R001I001_Vector was selected for Sanger sequencing with primer 5'-GAACGCCGGCTTGTATGTCGG-3', with the sequencing result:
decoding was performed by the method of Example 2, giving:
Biosysen_A003S001O001R001I001_Vector; the encoded information is completely restored, and the test is successful.
Example 3:
In addition, biological products are widely used in medical care and health, food safety, crop planting, livestock farming and other fields, and are closely tied to human life, health and safety. Ensuring that a product remains unique and matches its records throughout logistics, storage and clinical delivery is therefore important for biosafety. Existing solutions rely on seals and manual checks of physical labels to ensure that a product is unique and corresponds to what left the factory. Adding the watermark encoding system and watermark string described above can remedy the security and anti-counterfeiting shortcomings of those means, but it still cannot rule out the case in which, at an intermediate link, the label is swapped and the product is replaced by another product carrying a corresponding artificial nucleic acid sequence.
In order to solve the above-mentioned problems, in this embodiment, an encryption and decryption method for checking a specific biological product is provided, and an artificial nucleic acid sequence watermark encoding system, a watermark string technique, and an encryption algorithm are comprehensively utilized, specifically as follows:
after the step of presetting the watermark character string, the variable coding section and the usage scene section, the method further comprises the following steps:
In the encoding method:
generating a public key and a private key based on an SHA-256 algorithm according to the preset watermark character string, the variable coding section and the use scene section;
based on the private key, carrying out encryption calculation on the watermark character string, the variable coding section and the usage scene section to obtain the watermark character string, the variable coding section and the usage scene section which are encrypted by the private key;
and in the step of encoding the watermark character string, the variable encoding section and the using scene section to obtain a target artificial nucleic acid sequence, the watermark character string, the variable encoding section and the using scene section are the watermark character string, the variable encoding section and the using scene section which are encrypted by the private key.
In the decoding method:
the step of "performing information interpretation and conversion on the watermark character string, the variable coding section and the nucleic acid sequence corresponding to the usage scene section" further includes, after obtaining decoding information:
decrypting the decoding information by using the public key to obtain a first hash value; meanwhile, calculating the decoding information by using an SHA-256 algorithm to obtain a second hash value;
Comparing the first hash value with the second hash value, and judging whether the first hash value is consistent with the second hash value or not;
and if the first hash value is consistent with the second hash value, judging that the test is passed.
In the above-mentioned process, the public key and the private key are generated together, wherein the private key can calculate the public key, and the public key cannot calculate the private key.
The private key is held only by the manufacturer and is used only to compute and encode the sequence of the manufactured product. Specifically, the private key is used to encrypt the watermark string, the variable encoding segment and the usage scene segment, yielding their private-key-encrypted forms; each item (category) yields a corresponding 256-bit value under the private-key calculation, and the resulting value is then used to obtain the target artificial nucleic acid sequence.
The public key is public and may be held by the transferee; for example, it may be handed over to the transferee together with the product, or obtained by the transferee from a public source.
The transferee uses the public key to process the information decoded from the nucleic acid sequence and obtains a 256-bit hash value, namely the first hash value.
At the same time, the SHA-256 algorithm is applied to the decoded information to obtain a 256-bit hash value, namely the second hash value.
The two hash values are compared. If the first hash value is consistent with the second hash value, i.e. first hash value = second hash value, the following conclusions can be drawn:
1. the decoding information obtained belongs to the received product carrying the target artificial nucleic acid sequence;
2. the decoding information corresponds to the transferor of the product;
3. the decoding information received is identical to the transferor's original shipping record for the product and has not been tampered with.
The SHA-256 algorithm, a member of the SHA-2 family, is a hash function (also called a hash algorithm), that is, a method for creating a small digital "fingerprint" from any kind of data. The hash function compresses the message or data into a digest, reducing the amount of data and fixing its format. The function scrambles and mixes the data to create a fingerprint called a hash value. The hash value is typically represented as a short string of random-looking letters and numbers.
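As an illustration of the fingerprint idea, the sketch below computes a SHA-256 digest with Python's standard `hashlib` module. The input string and its ASCII byte encoding are assumptions for illustration; the document does not specify how its own hash values were produced, so the digest printed here is not claimed to match them.

```python
import hashlib

decoded_information = "Biosysen_A003S001O001R001I001_Vector"

# SHA-256 always yields 32 bytes (256 bits), shown here as 64 hexadecimal characters.
digest = hashlib.sha256(decoded_information.encode("ascii")).hexdigest().upper()
print(len(digest) * 4, "bits")
print(digest)

# Changing a single character gives a different fingerprint.
altered = hashlib.sha256(b"Biosysen_A003S001O001R001I002_Vector").hexdigest().upper()
print(digest != altered)
```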
For example, X hospital receives, after cold-chain transport, biological product A shipped by company B according to the hospital's order, and performs Sanger sequencing on it; the sequencing result is:
Decoding is performed by the method of Example 2, and the result (i.e., the decoding information) is:
Biosysen_A003S001O001R001I001_Vector;
the decoding information "Biosysen_A003S001O001R001I001_Vector" is processed with the public key corresponding to biological product A to obtain a first hash value, namely:
55CA1E14F0FC5D3C3FDE11E223B8ACA5A7944297866DCF923F1F5E5DD80B2075;
meanwhile, the decoding information "Biosysen_A003S001O001R001I001_Vector" is processed with an SHA-256 calculation tool to obtain a second hash value, namely:
55CA1E14F0FC5D3C3FDE11E223B8ACA5A7944297866DCF923F1F5E5DD80B2075;
the first hash value is then compared with the second hash value. The comparison shows that the two are consistent, i.e. first hash value = second hash value, so it can be determined that:
1. the decoding information obtained belongs to the received biological product A carrying the target artificial nucleic acid sequence;
2. the decoding information corresponds to company B, the producer of biological product A;
3. the decoding information received is identical to company B's original shipping record for product A and has not been tampered with.
That is, product A received by X hospital was produced by company B and is the designated product that company B sent to X hospital to fulfil the hospital's order.
In this embodiment, the RSA asymmetric encryption algorithm is used to encrypt and decrypt the information in the encoding and decoding methods, so that the product is protected along its whole circulation path from source to destination. The encrypted information is written into the sequence of the biological product; after the product is received, it is sequenced and decoded, the result is decrypted with the public key, and the hash values are compared to reach a conclusion on the uniqueness of the product. This improves the safety, uniqueness and traceability of the product on the one hand, and on the other hand fully protects the intellectual property of the designing, researching and producing parties, giving the product a high degree of confidentiality at the level of biological information.
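The comparison of a first and a second hash value can be illustrated with a toy hash-then-sign sketch. This is only one possible reading of the scheme: the patent does not give key sizes, padding, or how the encrypted value is laid out in the nucleic acid sequence. The key below is a hard-coded textbook RSA example (primes 61 and 53), far too small for real use, and the digest is reduced modulo N solely so that it fits this toy modulus; all names are hypothetical.

```python
import hashlib

# Toy textbook RSA key: N = 61 * 53, public exponent E, private exponent D.
# Illustrative assumption only; a real system would use a vetted library and key size.
N, E, D = 3233, 17, 2753

def digest_int(text):
    """SHA-256 digest of the text as an integer, reduced to fit the toy modulus."""
    raw = hashlib.sha256(text.encode("ascii")).digest()
    return int.from_bytes(raw, "big") % N

def sign(text):
    """Manufacturer side: transform the digest with the private exponent."""
    return pow(digest_int(text), D, N)

def verify(text, signature):
    """Transferee side: compare the two hash values."""
    first_hash = pow(signature, E, N)  # recovered with the public key
    second_hash = digest_int(text)     # recomputed from the decoded information
    return first_hash == second_hash

record = "Biosysen_A003S001O001R001I001_Vector"
signature = sign(record)
print(verify(record, signature))  # True
```

With a realistic key size the digest would not be reduced, and a record altered in transit would fail this comparison except with negligible probability.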
This greatly raises the threshold for encryption and decryption. For the first time, the RSA asymmetric encryption method used in blockchains is combined with the encoding and decoding of the artificial nucleic acid sequence watermark encoding system and watermark string. Across research, production, clinical medicine, industrial application and other links, this embodiment further guards against a product being swapped or counterfeited, or its information being tampered with, during logistics and handover, which helps address the security of biological products applied at different links in the food, medical and health fields.
While the preferred embodiments and examples of the present invention have been described, it should be noted that those skilled in the art may make various modifications and improvements without departing from the inventive concept, including but not limited to, adjustments of proportions, procedures, and amounts, which fall within the scope of the present invention.

Claims (4)

1. A method of encoding and decoding an artificial nucleic acid sequence watermark encoding system, comprising:
the coding method comprises the following steps:
presetting a watermark character string, a variable coding section and a use scene section;
encoding the watermark character string, the variable encoding section and the use scene section to obtain a target artificial nucleic acid sequence;
The decoding method comprises the following steps:
performing Sanger sequencing on the target artificial nucleic acid sequence to obtain a sequencing result;
obtaining watermark character strings, variable coding segments and nucleic acid sequences corresponding to using scene segments, wherein the watermark character strings, the variable coding segments and the nucleic acid sequences correspond to the upstream position, the midstream position and the downstream position respectively, according to the sequencing result;
wherein, the nucleic acid sequence corresponding to the watermark character string comprises: the nucleic acid sequence, the reverse complement of the nucleic acid sequence, the RNA sequence and/or the reverse complement of the RNA sequence are each as follows:
the nucleic acid sequence is as follows:
GAACGCCGGCTTGTATGTCGGTATGCGGGCTCGGTT;
the reverse complement of the nucleic acid sequence is:
AACCGAGCCCGCATACCGACATACAAGCCGGCGTTC;
the RNA sequence is as follows:
GAACGCCGGCUUGUAUGUCGGUAUGCGGGCUCGGUU;
the reverse complement of the RNA sequence is:
AACCGAGCCCGCAUACCGACAUACAAGCCGGCGUUC;
performing information interpretation and conversion on the watermark character string, the variable coding section and the nucleic acid sequence corresponding to the use scene section to obtain decoding information;
the watermark character string is used for identifying specific unit information;
the variable encoding section includes at least one information unit;
the information unit comprises a nucleic acid characteristic part and a numbering part which is connected with the nucleic acid characteristic part and corresponds to the nucleic acid characteristic part;
the nucleic acid features in each of the information units are used to represent DNA assembly methods and interface information, screening marker information, replication origin information, reporter gene information, insert backbone type information, insert promoter information, insert RBS information, and insert CDS information or insert terminator information;
the usage scene segment includes a bacterial artificial chromosome identifier, a yeast artificial chromosome identifier, a nuclear genome identifier, a mitochondrial genome identifier, a chloroplast genome identifier and/or an RNA fragment identifier for identifying the usage scenario.
2. The encoding and decoding method of an artificial nucleic acid sequence watermark encoding system according to claim 1, wherein said watermark string length is not less than 3;
in the variable encoding section, a plurality of the information units are arranged in 5 'to 3' order for representing the kinds and corresponding numbers of the functional fragments on the artificial nucleic acid fragment.
3. A method of encoding and decoding an artificial nucleic acid sequence watermark encoding system according to claim 1,
in the encoding method, "encoding the watermark character string, the variable encoding section, and the usage scene section, obtaining a target artificial nucleic acid sequence" includes:
converting each character in the watermark character string, the variable coding section and the nucleic acid sequence corresponding to the use scene section into an eight-bit binary number coded by ASCII bytes to obtain a first information set;
converting the first information set into a quaternary number according to a 00=A, 01=G, 10=C, 11=T or U mapping, so as to obtain the target artificial nucleic acid sequence;
In the decoding method, the step of performing information interpretation and conversion on the watermark character string, the variable coding section and the nucleic acid sequence corresponding to the usage scene section to obtain decoding information includes:
converting the watermark character string, the variable coding section and the nucleic acid sequence corresponding to the using scene section into binary digital characters according to the rules of A=00, G=01, C=10, T or U=11, so as to obtain a second information set;
and decoding the second information set as ASCII in groups of 8 bits and converting it into a character string, to obtain decoding information corresponding to the target artificial nucleic acid sequence.
4. A method of encoding and decoding an artificial nucleic acid sequence watermark encoding system according to claim 1,
in the encoding method, "encoding the watermark character string, the variable encoding section, and the usage scene section, obtaining a target artificial nucleic acid sequence" includes:
respectively obtaining non-digital characters and decimal digital characters in the nucleic acid sequences corresponding to the watermark character string, the variable coding section and the use scene section;
converting the non-digital character into an eight-bit binary number encoded by ASCII bytes to obtain a third information set;
mapping the third information set according to a rule of 00=A, 01=G, 10=C, 11=T or U, and converting the third information set into a four-bit quaternary number to obtain a fourth information set;
performing quaternary conversion on the decimal digital character; according to the fixed digit number of the quaternary digit converted by the quaternary system, the front end of the digit which is less than the fixed digit number after the quaternary system conversion is complemented by 0 to reach the fixed digit number; obtaining a fifth information set;
mapping the fifth information set according to rules of 0=A, 1=G, 2=C, 3=T or U, and converting the mapping into decimal numbers to obtain the target artificial nucleic acid sequence;
in the decoding method, the step of performing information interpretation and conversion on the watermark character string, the variable coding section and the nucleic acid sequence corresponding to the usage scene section to obtain decoding information includes:
cutting the watermark character string, the variable coding section and the nucleic acid sequence corresponding to the use scene section into subsequences with the length of 4+n, and taking each subsequence as a decoding unit; wherein n is a fixed number of bits of the quaternary number converted by the quaternary system;
converting the nucleic acid sequence of the first four bits in each decoding unit into eight-bit binary ASCII codes according to the rule of A=00, G=01, C=10, T or U=11, and converting the eight-bit binary ASCII codes into characters to obtain a sixth information set; and
Converting the nucleic acid sequence of the last n bits in each decoding unit into quaternary numbers according to the rule of A=0, G=1, C=2, T or U=3, and converting into decimal numbers to obtain a seventh information set;
and connecting the sixth information set and the seventh information set in all decoding units to obtain the decoding information.
CN202110716040.6A 2021-06-25 2021-06-25 Artificial nucleic acid sequence watermark coding system, watermark character string and coding and decoding method Active CN113380322B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110716040.6A CN113380322B (en) 2021-06-25 2021-06-25 Artificial nucleic acid sequence watermark coding system, watermark character string and coding and decoding method

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202110716040.6A CN113380322B (en) 2021-06-25 2021-06-25 Artificial nucleic acid sequence watermark coding system, watermark character string and coding and decoding method

Publications (2)

Publication Number Publication Date
CN113380322A CN113380322A (en) 2021-09-10
CN113380322B true CN113380322B (en) 2023-10-24

Family

ID=77579354

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110716040.6A Active CN113380322B (en) 2021-06-25 2021-06-25 Artificial nucleic acid sequence watermark coding system, watermark character string and coding and decoding method

Country Status (1)

Country Link
CN (1) CN113380322B (en)

Citations (18)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
KR20040094550A (en) * 2003-05-03 2004-11-10 한국과학기술원 Digital Watermarking Method And Digital Watermark Certification Method Utilizing Gene Information
CN101200763A (en) * 2007-11-13 2008-06-18 东南大学 Coding and decoding method for determined nucleic acid sequence
WO2009013910A1 (en) * 2007-07-24 2009-01-29 Keio University Encoder apparatus, decoder apparatus, and information recording medium
CN101400803A (en) * 2006-01-19 2009-04-01 纽约州立大学研究基金会 Methods and devices for detection and identification of encoded beads and biological molecules
CN101932725A (en) * 2007-10-08 2010-12-29 合成基因组股份有限公司 The assembling of large nucleic acids
CN102007407A (en) * 2007-11-21 2011-04-06 考斯摩斯德公司 Genome identification system
WO2011053868A1 (en) * 2009-10-30 2011-05-05 Synthetic Genomics, Inc. Encoding text into nucleic acid sequences
WO2014025056A1 (en) * 2012-08-07 2014-02-13 Hitachi Zosen Corporation Method of encryption and decryption using nucleic acid
KR20140096788A (en) * 2013-01-29 2014-08-06 동명대학교산학협력단 Dna watermarking scheme robust to mutation
CN104850760A (en) * 2015-03-27 2015-08-19 苏州泓迅生物科技有限公司 Artificially synthesized DNA storage medium with coding information, storage reading method for information, and applications
TW201541346A (en) * 2014-04-22 2015-11-01 Univ Nat Taiwan Ocean Method for protecting DNA sequence content by using watermark technology
CN105119717A (en) * 2015-07-21 2015-12-02 郑州轻工业学院 DNA coding based encryption system and encryption method
WO2019040871A1 (en) * 2017-08-24 2019-02-28 Miller Julian Device for information encoding and, storage using artificially expanded alphabets of nucleic acids and other analogous polymers
CN109460822A (en) * 2018-11-19 2019-03-12 天津大学 Information storage means based on DNA
CN109830263A (en) * 2019-01-30 2019-05-31 东南大学 A kind of DNA storage method based on oligonucleotide sequence code storage
CN110060734A (en) * 2019-03-29 2019-07-26 天津大学 A highly robust DNA sequencing barcode generation and reading method
CN110352253A (en) * 2016-07-22 2019-10-18 核素示踪有限公司 The method of amplifying nucleic acid sequence
CN111095423A (en) * 2017-08-25 2020-05-01 深圳华大生命科学研究院 Encoding/decoding method, apparatus and data processing apparatus

Family Cites Families (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20110269119A1 (en) * 2009-10-30 2011-11-03 Synthetic Genomics, Inc. Encoding text into nucleic acid sequences
US20170338943A1 (en) * 2014-10-29 2017-11-23 Massachusetts Institute Of Technology Dna encryption technologies
FR3044140B1 (en) * 2015-11-24 2017-11-24 B<>Com METHOD FOR PROCESSING A SEQUENCE OF DIGITAL IMAGES, WATERMARKING METHOD, DEVICES AND COMPUTER PROGRAMS
US10726110B2 (en) * 2017-03-01 2020-07-28 Seven Bridges Genomics, Inc. Watermarking for data security in bioinformatic sequence analysis
US10810495B2 (en) * 2017-09-20 2020-10-20 University Of Wyoming Methods for data encoding in DNA and genetically modified organism authentication

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
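Research on digital fingerprinting and its application to image copyright protection; Wang Yujun; China Master's Theses Full-text Database (No. 06); full text *
Implementation of genetic algorithms in image watermark embedding; Xu Xijun, Xu Yuebin; Journal of Qingdao University (Natural Science Edition) (No. 02); full text *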

Also Published As

Publication number Publication date
CN113380322A (en) 2021-09-10

Similar Documents

Publication Publication Date Title
Welsh et al. Genomic fingerprints produced by PCR with consensus tRNA gene primers
Baranov et al. Transcriptional slippage in bacteria: distribution in sequenced genomes and utilization in IS element gene expression
CN107798219B (en) Method for biologically storing and restoring data
Moilanen et al. Phylogenetic network and physicochemical properties of nonsynonymous mutations in the protein-coding genes of human mitochondrial DNA
CN112802549B (en) Coding and decoding method for DNA sequence integrity check and error correction
DeWoody et al. A translocated mitochondrial cytochrome b pseudogene in voles (Rodentia: Microtus)
Pfeiffer et al. The complete and fully assembled genome sequence of Aeromonas salmonicida subsp. pectinolytica and its comparative analysis with other Aeromonas species: investigation of the mobilome in environmental and pathogenic strains
Pati et al. Complete genome sequence of Arcobacter nitrofigilis type strain (CI T)
Heider et al. DNA watermarking: challenging perspectives for biotechnological applications
CN113380322B (en) Artificial nucleic acid sequence watermark coding system, watermark character string and coding and decoding method
Metzgar et al. The microsatellites of Escherichia coli: rapidly evolving repetitive DNAs in a non‐pathogenic prokaryote
Wang et al. Theoretical prediction and experimental verification of protein-coding genes in plant pathogen genome Agrobacterium tumefaciens strain C58
CN111095423B (en) Encoding/decoding method, apparatus and data processing apparatus
Nakano et al. Phylogenetic trees of closely related bacterial species and subspecies based on frequencies of short nucleotide sequences
Yachie et al. Stabilizing synthetic data in the DNA of living organisms
Lee et al. Reversible DNA data hiding using multiple difference expansions for DNA authentication and storage
Jiao et al. Code for encryption hiding data into genomic DNA of living organisms
Pombert et al. The complete mitochondrial genome from an unidentified Phalansterium species
Abdullah et al. New data hiding approach based on biological functionality of DNA sequence
Rodriguez et al. Characterization of twenty‐four microsatellite markers for rainbow trout (Oncorhynchus mykiss)
WO2014025056A1 (en) Method of encryption and decryption using nucleic acid
Assis et al. A strong deletion bias in nonallelic gene conversion
Jiao et al. Hiding data in DNA of living organisms
Pchelin et al. Genotyping of Russian isolates of fungal pathogen Trichophyton rubrum, based on simple sequence repeat and single nucleotide polymorphism
Eastman et al. Development and validation of an rDNA operon based primer walking strategy applicable to de novo bacterial genome finishing

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant