CN110718272B - Non-numerical field encryption and decryption method based on gene sequence and gene function - Google Patents
- Publication number
- CN110718272B (application CN201910850865.XA)
- Authority
- CN
- China
- Prior art keywords
- gene
- gene sequence
- sequences
- markov
- monte carlo
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Active
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F21/00—Security arrangements for protecting computers, components thereof, programs or data against unauthorised activity
- G06F21/60—Protecting data
- G06F21/602—Providing cryptographic facilities or services
-
- G—PHYSICS
- G16—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
- G16B—BIOINFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR GENETIC OR PROTEIN-RELATED DATA PROCESSING IN COMPUTATIONAL MOLECULAR BIOLOGY
- G16B50/00—ICT programming tools or database systems specially adapted for bioinformatics
- G16B50/40—Encryption of genetic data
Abstract
The invention provides a method for encrypting and decrypting non-numerical fields based on gene sequences and gene functions, comprising the following steps: establishing a Markov-Monte Carlo model; for each non-numerical field in the database, obtaining gene functions and gene sequences and using them to train the Markov-Monte Carlo model; and performing decryption with the trained Markov-Monte Carlo model.
Description
Technical Field
The invention relates to an encryption method for non-numerical fields.
Background
Modern society is in the era of big data: people's clothing, food, housing, and travel all depend on data analysis, and data is increasingly becoming a valuable asset. Although the big data era advocates open and shared data, much sensitive data, such as data touching on core business interests, national security, or personal privacy, remains a valuable resource of companies, countries, and individuals that must not be violated. To keep such sensitive data secure and prevent it from being maliciously stolen, the corresponding database must be encrypted.
The difficulty of encrypting the non-numerical field of the database is as follows:
(1) The content of the field and the type of encoding of the field are not controlled
Unlike a numeric field, whose plaintext consists only of digits, the plaintext of a non-numeric field is in principle unconstrained and may combine characters, symbols, and digits from the writing systems of any country.
(2) It is not feasible to use a fixed one-to-one mapping of plaintext and ciphertext
Any encryption algorithm based on a one-to-one mapping between plaintext and ciphertext is a relatively weak encryption method. Encryption of non-numeric fields is no exception and such encryption methods should be avoided.
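The weakness of a fixed one-to-one mapping can be illustrated with a minimal sketch. The substitution table `FIXED_MAP` and the sample column are hypothetical and are not taken from the invention; the point is only that the ciphertext column keeps the frequency profile of the plaintext column.

```python
from collections import Counter

# Hypothetical fixed one-to-one table (plaintext value -> ciphertext token).
FIXED_MAP = {"Shanghai": "Q7", "Beijing": "K2", "Shenzhen": "Z9"}

def encrypt_column(values):
    """Encrypt a column with a fixed one-to-one mapping."""
    return [FIXED_MAP[v] for v in values]

column = ["Shanghai", "Beijing", "Shanghai", "Shanghai", "Shenzhen", "Beijing"]
cipher_column = encrypt_column(column)

# The ciphertext histogram has the same shape as the plaintext histogram,
# so an attacker who knows rough value frequencies can guess the mapping.
plain_counts = sorted(Counter(column).values(), reverse=True)
cipher_counts = sorted(Counter(cipher_column).values(), reverse=True)
most_common_token = Counter(cipher_column).most_common(1)[0][0]
```

Because the most frequent token always stands for the most frequent plaintext, equal values and their frequencies leak directly.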
Disclosure of Invention
The purpose of the invention is to provide an encryption method for non-numeric fields of a database.
To achieve this purpose, the technical solution of the invention is an encryption and decryption method for non-numerical fields based on gene sequences and gene functions, comprising the following steps:
step 1, establishing a Markov-Monte Carlo model;
step 2, for each non-numerical field in the database, extracting all distinct plaintexts in the field and encrypting each distinct plaintext to generate a gene function, wherein each gene function generates Y different ciphertexts and each ciphertext is a gene sequence;
step 3, calculating the length len(seq(x)) of each gene sequence, where seq(x) denotes gene sequence x; selecting all gene sequences of length len(N) from the gene sequence data set to form a training set; inputting the training set into the Markov-Monte Carlo model established in step 1; and having the model compute, from the first position to the last position of the gene sequence, the conditional probability distribution over all possible following letters given the letter at each position, thereby completing the training of the Markov-Monte Carlo model;
step 4, obtaining all distinct plaintexts in the non-numerical field to be encrypted and encrypting each distinct plaintext to generate a gene function, wherein each gene function generates Y different ciphertexts, each ciphertext is a gene sequence, and all the gene sequences together form a gene sequence set;
step 5, during decryption, obtaining the gene sequence to be compared, calculating its length, and selecting all gene sequences of the same length from the gene sequence set to form a gene sequence subset;
step 6, inputting the gene sequence subset and the gene sequence to be compared into the trained Markov-Monte Carlo model for comparison, and using maximum likelihood estimation under the conditional probabilities to select the k most similar gene sequences from the subset, forming a new gene sequence data subset;
step 7, returning the gene functions corresponding to all gene sequences in the new gene sequence data subset obtained in step 6 and calculating the proportion of the most frequent gene function among them; if this proportion is greater than or equal to p, that gene function is the gene function corresponding to the gene sequence to be compared; otherwise, returning to step 5.
Preferably, in step 2 and step 4, each distinct plaintext is encoded with Base64.
Preferably, in step 6, a comparison length z is supplied as input, and during comparison the contents of the gene sequences are compared position by position, z letters at a time.
Preferably, in step 6, k is an externally input parameter; in step 7, p is an externally input parameter.
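The roles of the parameters z, k, and p in steps 5 to 7 can be sketched as follows. This is an illustration under stated assumptions, not the patented comparison: the helper names `block_similarity` and `vote` are hypothetical, and the fraction of identical z-letter blocks is used as a simple stand-in for the model-based similarity of step 6.

```python
from collections import Counter

def block_similarity(a, b, z):
    """Fraction of aligned z-letter blocks that are identical in a and b."""
    if len(a) != len(b) or not a or z <= 0:
        raise ValueError("need non-empty sequences of equal length and z > 0")
    starts = range(0, len(a), z)
    same = sum(a[i:i + z] == b[i:i + z] for i in starts)
    return same / len(starts)

def vote(query, labelled, z, k, p):
    """Majority gene function among the k most similar same-length sequences,
    or None when its share of the votes is below p (the caller may retry)."""
    same_len = [(s, f) for s, f in labelled if len(s) == len(query)]   # step 5
    ranked = sorted(same_len,
                    key=lambda sf: block_similarity(query, sf[0], z),
                    reverse=True)                                      # step 6
    top = ranked[:k]
    if not top:
        return None
    function, count = Counter(f for _, f in top).most_common(1)[0]    # step 7
    return function if count / len(top) >= p else None

# Toy data: (gene sequence, gene function) pairs.
labelled = [("ACGTAC", "f1"), ("ACGTTT", "f1"), ("GGGTAC", "f2"),
            ("TTTTTT", "f2"), ("ACG", "f3")]
accepted = vote("ACGTAC", labelled, z=3, k=3, p=0.6)
rejected = vote("ACGTAC", labelled, z=3, k=3, p=0.9)
```

With k = 3, two of the three nearest sequences carry the function `f1`, so the vote passes at p = 0.6 and fails at p = 0.9, which in the method corresponds to returning to step 5.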
The advantages of the invention are: all text can be encrypted; encryption and decryption efficiency does not degrade as the data volume grows; and the mapping between plaintext and ciphertext cannot be cracked by brute force.
Drawings
FIG. 1 is a flow chart of training for each field;
FIG. 2 is a partial flow chart after threading;
FIG. 3 is a flow chart of the model training phase of the present invention;
FIG. 4 is a flowchart of the present invention after it is on-line.
Detailed Description
The invention will be further illustrated with reference to the following specific examples. It should be understood that these examples are for illustrative purposes only and are not intended to limit the scope of the present invention. Further, it should be understood that various changes or modifications of the present invention may be made by those skilled in the art after reading the teaching of the present invention, and such equivalents may fall within the scope of the present invention as defined in the appended claims.
The encryption algorithm of the invention is based on gene sequences and gene functions:
In biology, a gene sequence is a sequence of letters representing, for example, DNA. A gene function is a function that a gene sequence (gene fragment) can perform, such as controlling hormone secretion. Different gene sequences can provide similar or even identical gene functions, and the same class of gene function may correspond to many different gene sequences.
In the encryption algorithm, the plaintexts of the non-numerical fields of the database correspond to gene functions and the ciphertexts correspond to gene sequences, so a given plaintext can produce several different ciphertexts. When a ciphertext is decrypted, it maps back to its designated plaintext.
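The plaintext to gene function to gene sequence relationship can be sketched as below. Several points are assumptions made for illustration: the four-letter alphabet, the use of random strings as gene sequences (the source does not say how the Y ciphertexts are produced), and the helper names `gene_function`, `build_tables`, and `decrypt_exact`. Base64 is used for the gene function, as in the preferred embodiment.

```python
import base64
import secrets

ALPHABET = "ACGT"  # assumption: DNA letters; the source does not fix an alphabet

def gene_function(plaintext):
    """Base64-encode a plaintext; the result plays the role of the gene function."""
    return base64.b64encode(plaintext.encode("utf-8")).decode("ascii")

def random_sequence(length):
    """Draw one random gene sequence (illustrative ciphertext generator)."""
    return "".join(secrets.choice(ALPHABET) for _ in range(length))

def build_tables(plaintexts, y, length):
    """Assign y distinct random gene sequences to each distinct plaintext.

    Assumes 4**length is much larger than the number of sequences needed,
    otherwise the loop below would struggle to find unused sequences.
    """
    forward, reverse = {}, {}
    for text in sorted(set(plaintexts)):
        func = gene_function(text)
        forward[func] = []
        while len(forward[func]) < y:
            seq = random_sequence(length)
            if seq not in reverse:          # keep ciphertexts globally unique
                reverse[seq] = func
                forward[func].append(seq)
    return forward, reverse

def decrypt_exact(seq, reverse):
    """Map a stored gene sequence back to its plaintext via the gene function."""
    return base64.b64decode(reverse[seq]).decode("utf-8")

forward, reverse = build_tables(["张三", "Alice", "Alice"], y=3, length=12)
```

Each distinct plaintext ends up with Y = 3 ciphertexts, and any of them decrypts to the same plaintext, which is the one-to-many behaviour described above.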
The encryption of the non-numerical field of the database by using the gene sequence and the gene function needs to overcome the following difficulties:
At first glance, the encryption method based on gene sequences and gene functions is merely a one-to-many mapping from plaintext to ciphertext. In practice, however, the database holds a large volume of data and the plaintexts are numerous and varied, so the number of ciphertexts is very large. For example, with 1 million plaintext items each corresponding to 10 ciphertext items, there are 10 million different ciphertext items. Converting between plaintext and ciphertext through a direct one-to-many mapping may then perform poorly and respond slowly, and in some scenarios an untimely response is fatal. An algorithm is therefore needed to accelerate the conversion between plaintext and ciphertext.
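The scale in this example, and one simple way to narrow the search, can be sketched as follows. The length-bucket index mirrors the same-length selection of step 5; the helper name `index_by_length` and the toy sequences are hypothetical.

```python
from collections import defaultdict

plaintext_items = 1_000_000            # number of plaintext items in the example
ciphertexts_per_plaintext = 10
total_ciphertexts = plaintext_items * ciphertexts_per_plaintext

def index_by_length(sequences):
    """Group gene sequences by length so a lookup only scans one bucket."""
    buckets = defaultdict(list)
    for seq in sequences:
        buckets[len(seq)].append(seq)
    return buckets

buckets = index_by_length(["ACGT", "TTGA", "ACGTAC", "GG"])
candidates = buckets[len("CCCC")]      # only sequences of length 4 remain
```

A naive lookup would compare a query against all 10 million ciphertexts, whereas the bucket restricts comparison to sequences of the same length.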
Gene sequences can be compared by directly solving the joint distribution of the letters at each position in the gene sequence. However, when the random variables of the individual dimensions in such a joint distribution are strongly correlated, solving the joint distribution directly tends to become trapped in local extrema (which is why matching through a direct one-to-many mapping performs poorly). An algorithm based on the Markov chain Monte Carlo computing framework avoids this weakness.
In essence, an algorithm based on the Markov chain Monte Carlo framework fits the joint distribution of the gene sequences by computing their conditional probability distributions, which is very efficient for sequence matching.
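One way to read this statement, assuming a first-order Markov dependence between adjacent letters (the source does not state the order of the chain), is the following factorization, where x_i denotes the letter at position i of a gene sequence seq(x) and n = len(seq(x)):

```latex
P(x_1, x_2, \dots, x_n) = P(x_1) \prod_{i=2}^{n} P(x_i \mid x_{i-1})
```

Under this assumption, the joint distribution over n strongly correlated positions is replaced by n - 1 conditional distributions, each of which involves only a pair of adjacent letters.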
The symbols used in the present invention are listed in the following table:
Content | Symbol |
---|---|
Sequence data set | {seq(1), seq(2), seq(3), ..., seq(n)} or |
Specifically, the invention provides an encryption and decryption method for non-numerical fields based on gene sequences and gene functions, comprising the following steps:
step 1, establishing a Markov-Monte Carlo model;
step 2, for each non-numerical field in the database, extracting all distinct plaintexts in the field and applying Base64 encoding to each distinct plaintext to generate a gene function, wherein each gene function generates Y different ciphertexts and each ciphertext is a gene sequence;
step 3, calculating the length len(seq(x)) of each gene sequence, where seq(x) denotes gene sequence x; selecting all gene sequences of length len(N) from the gene sequence data set to form a training set; inputting the training set into the Markov-Monte Carlo model established in step 1; and having the model compute, from the first position to the last position of the gene sequence, the conditional probability distribution over all possible following letters given the letter at each position, thereby completing the training of the Markov-Monte Carlo model;
step 4, obtaining all distinct plaintexts in the non-numerical field to be encrypted and applying Base64 encoding to each distinct plaintext to generate a gene function, wherein each gene function generates Y different ciphertexts, each ciphertext is a gene sequence, and all the gene sequences together form a gene sequence set;
step 5, during decryption, obtaining the gene sequence to be compared, calculating its length, and selecting all gene sequences of the same length from the gene sequence set to form a gene sequence subset;
step 6, inputting the gene sequence subset and the gene sequence to be compared into the trained Markov-Monte Carlo model for comparison; a comparison length z is supplied as input, and the contents of the gene sequences are compared position by position, z letters at a time; maximum likelihood estimation under the conditional probabilities is then used to select the k most similar gene sequences from the subset, forming a new gene sequence data subset, where k is an externally input parameter;
step 7, returning the gene functions corresponding to all gene sequences in the new gene sequence data subset obtained in step 6 and calculating the proportion of the most frequent gene function among them; if this proportion is greater than or equal to p, where p is an externally input parameter, that gene function is the gene function corresponding to the gene sequence to be compared; otherwise, returning to step 5. As more and more data is sampled, this likelihood increases and can in theory reach 1.
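The training of step 3 and the likelihood computation that step 6 relies on can be sketched as follows. This is a simplified reading, not the patented model: it assumes a four-letter alphabet, position-specific first-order transitions, and add-alpha smoothing, and the names `train` and `log_likelihood` are hypothetical. How the model-based likelihood is turned into a similarity between two sequences is not specified in the source and is left out.

```python
import math
from collections import defaultdict

def train(sequences, alphabet="ACGT", alpha=1.0):
    """Estimate P(letter at i+1 | letter at i) for every position i.

    All sequences must share one length and use only letters of `alphabet`.
    alpha is add-alpha smoothing (an assumption), which keeps unseen
    transitions at a non-zero probability.
    """
    length = len(sequences[0])
    if any(len(s) != length for s in sequences):
        raise ValueError("training sequences must have equal length")
    counts = [defaultdict(lambda: defaultdict(float)) for _ in range(length - 1)]
    for s in sequences:
        for i in range(length - 1):
            counts[i][s[i]][s[i + 1]] += 1.0
    model = []
    for i in range(length - 1):
        table = {}
        for cur in alphabet:
            total = sum(counts[i][cur].values()) + alpha * len(alphabet)
            table[cur] = {nxt: (counts[i][cur][nxt] + alpha) / total
                          for nxt in alphabet}
        model.append(table)
    return model

def log_likelihood(model, seq):
    """Sum of log conditional probabilities along seq (first letter unscored)."""
    if len(seq) != len(model) + 1:
        raise ValueError("sequence length does not match the model")
    return sum(math.log(model[i][seq[i]][seq[i + 1]])
               for i in range(len(seq) - 1))

model = train(["ACGT", "ACGA", "ACGT"])
seen_score = log_likelihood(model, "ACGT")
unseen_score = log_likelihood(model, "TTTT")
```

A sequence that resembles the training set scores a higher log-likelihood than an unrelated one, which is the kind of signal a ranking of the k closest candidates could be built on.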
Claims (4)
1. A method for encrypting and decrypting non-numerical fields based on gene sequences and gene functions, characterized by comprising the following steps:
step 1, establishing a Markov-Monte Carlo model;
step 2, for each non-numerical field in the database, extracting all distinct plaintexts in the field and encrypting each distinct plaintext to generate a gene function, wherein each gene function generates Y different ciphertexts and each ciphertext is a gene sequence;
step 3, calculating the length len(seq(x)) of each gene sequence, where seq(x) denotes gene sequence x; selecting all gene sequences of length len(N) from the gene sequence data set to form a training set; inputting the training set into the Markov-Monte Carlo model established in step 1; and having the model compute, from the first position to the last position of the gene sequence, the conditional probability distribution over all possible following letters given the letter at each position, thereby completing the training of the Markov-Monte Carlo model;
step 4, obtaining all distinct plaintexts in the non-numerical field to be encrypted and encrypting each distinct plaintext to generate a gene function, wherein each gene function generates Y different ciphertexts, each ciphertext is a gene sequence, and all the gene sequences together form a gene sequence set;
step 5, during decryption, obtaining the gene sequence to be compared, calculating its length, and selecting all gene sequences of the same length from the gene sequence set to form a gene sequence subset;
step 6, inputting the gene sequence subset and the gene sequence to be compared into the trained Markov-Monte Carlo model for comparison, and using maximum likelihood estimation under the conditional probabilities to select the k most similar gene sequences from the subset, forming a new gene sequence data subset;
step 7, returning the gene functions corresponding to all gene sequences in the new gene sequence data subset obtained in step 6 and calculating the proportion of the most frequent gene function among them; if this proportion is greater than or equal to p, determining that gene function to be the gene function corresponding to the gene sequence to be compared and obtaining the plaintext corresponding to the gene sequence to be compared from that gene function; otherwise, returning to step 5.
2. The method for encrypting and decrypting non-numerical fields based on gene sequences and gene functions as claimed in claim 1, wherein in step 2 and step 4, Base64 encoding is applied to each distinct plaintext.
3. The method for encrypting and decrypting non-numerical fields based on gene sequences and gene functions as claimed in claim 1, wherein in step 6, a comparison length z is input, and during comparison the contents of the gene sequences are compared position by position, z letters at a time.
4. The method for encrypting and decrypting non-numerical fields based on gene sequences and gene functions as claimed in claim 1, wherein in step 6, k is an externally input parameter, and in step 7, p is an externally input parameter.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201910850865.XA CN110718272B (en) | 2019-09-10 | 2019-09-10 | Non-numerical field encryption and decryption method based on gene sequence and gene function |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201910850865.XA CN110718272B (en) | 2019-09-10 | 2019-09-10 | Non-numerical field encryption and decryption method based on gene sequence and gene function |
Publications (2)
Publication Number | Publication Date |
---|---|
CN110718272A (en) | 2020-01-21 |
CN110718272B (en) | 2020-11-17 |
Family
ID=69209694
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201910850865.XA (Active; CN110718272B (en)) | Non-numerical field encryption and decryption method based on gene sequence and gene function | 2019-09-10 | 2019-09-10 |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN110718272B (en) |
Family Cites Families (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20110202486A1 (en) * | 2009-07-21 | 2011-08-18 | Glenn Fung | Healthcare Information Technology System for Predicting Development of Cardiovascular Conditions |
CN106817218A (en) * | 2015-12-01 | 2017-06-09 | 国基电子(上海)有限公司 | Encryption method based on DNA technique |
US20180322246A1 (en) * | 2017-05-04 | 2018-11-08 | Annai Systems, Inc. | System and method for secure, high-speed transfer of very large files |
CN110070914B (en) * | 2019-03-15 | 2020-07-03 | 崔大超 | Gene sequence identification method, system and computer readable storage medium |
Also Published As
Publication number | Publication date |
---|---|
CN110718272A (en) | 2020-01-21 |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||