CN110718272B - Non-numerical field encryption and decryption method based on gene sequence and gene function - Google Patents


Info

Publication number
CN110718272B
Authority
CN
China
Prior art keywords
gene
gene sequence
sequences
markov
monte carlo
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201910850865.XA
Other languages
Chinese (zh)
Other versions
CN110718272A (en)
Inventor
张毅骏
谭翔
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Shanghai Para Software Co ltd
Original Assignee
Shanghai Para Software Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
2019-09-10
Filing date
2019-09-10
Publication date
2020-11-17
Application filed by Shanghai Para Software Co ltd
Priority to CN201910850865.XA
Publication of CN110718272A
Application granted
Publication of CN110718272B

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F21/00 Security arrangements for protecting computers, components thereof, programs or data against unauthorised activity
    • G06F21/60 Protecting data
    • G06F21/602 Providing cryptographic facilities or services
    • G PHYSICS
    • G16 INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
    • G16B BIOINFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR GENETIC OR PROTEIN-RELATED DATA PROCESSING IN COMPUTATIONAL MOLECULAR BIOLOGY
    • G16B50/00 ICT programming tools or database systems specially adapted for bioinformatics
    • G16B50/40 Encryption of genetic data

Abstract

The invention provides a method for encrypting and decrypting non-numerical fields based on gene sequences and gene functions, comprising the following steps: establishing a Markov-Monte Carlo model; for each non-numerical field in the database, obtaining gene functions and gene sequences and using them to train the Markov-Monte Carlo model; and performing decryption using the trained Markov-Monte Carlo model.

Description

Non-numerical field encryption and decryption method based on gene sequence and gene function
Technical Field
The invention relates to a method for encrypting non-numerical fields.
Background
Modern society is in the era of big data. Data analysis and big data analysis are indispensable to people's clothing, food, housing, and work, so data is increasingly becoming a precious asset. Although the big data era advocates open data and data sharing, much sensitive data, for example data bearing on core business interests, national security, or personal privacy, remains a valuable resource that companies, countries, and individuals must keep inviolable. To keep such sensitive data secure and prevent it from being maliciously stolen, the databases that hold it need to be encrypted.
The difficulties of encrypting the non-numerical fields of a database are as follows:
(1) The content of the field and its encoding type are not controlled
The plaintext of a numeric field consists only of numbers. The plaintext of a non-numeric field, by contrast, is in principle unconstrained and may be any combination of characters from the scripts of any country, symbols, and digits.
(2) A fixed one-to-one mapping between plaintext and ciphertext is not feasible
Any encryption algorithm based on a one-to-one mapping between plaintext and ciphertext is a relatively weak encryption method. Encryption of non-numeric fields is no exception, and such encryption methods should be avoided.
Disclosure of Invention
The purpose of the invention is to provide an encryption method for the non-numeric fields of a database.
To achieve this purpose, the technical solution of the invention is a method for encrypting and decrypting non-numeric fields based on gene sequences and gene functions, comprising the following steps:
step 1, establishing a Markov-Monte Carlo model;
step 2, for each non-numerical field in the database, extracting all distinct plaintexts in the field and encrypting each distinct plaintext to generate a gene function, wherein each gene function generates Y different ciphertexts and each ciphertext is a gene sequence;
step 3, calculating the length len(seq(x)) of each gene sequence, where seq(x) denotes gene sequence x; selecting from the gene sequence data set all gene sequences of length len(N) to form a gene sequence set for training; inputting this set into the Markov-Monte Carlo model established in step 1; and having the model calculate, for each position from the first to the last position of the gene sequence, the conditional probability distribution of all possible following letters given the letter at that position, thereby completing the training of the Markov-Monte Carlo model;
step 4, obtaining all distinct plaintexts in the non-numerical field to be encrypted and encrypting each distinct plaintext to generate a gene function, wherein each gene function generates Y different ciphertexts, each ciphertext is a gene sequence, and all the gene sequences together form a gene sequence set;
step 5, during decryption, obtaining the gene sequence to be compared, calculating its length, and selecting all gene sequences of the same length from the gene sequence set to form a gene sequence subset;
step 6, inputting the gene sequence subset and the gene sequence to be compared into the trained Markov-Monte Carlo model for comparison, and matching the k gene sequences with the highest similarity from the gene sequence subset, using maximum likelihood estimation under the conditional probabilities, to form a new gene sequence data subset;
step 7, returning the gene functions corresponding to all gene sequences in the new gene sequence data subset obtained in step 6, and calculating the proportion of the most frequent gene function among them; if that proportion is greater than or equal to p, that gene function is the gene function corresponding to the gene sequence to be compared; otherwise, returning to step 5.
Preferably, in step 2 and step 4, base64 encryption is applied to each distinct plaintext.
Preferably, in step 6, a comparison length z is input, and during comparison the contents of the gene sequences are compared position by position, z positions at a time.
Preferably, in step 6, k is an externally input parameter; in step 7, p is an externally input parameter.
The advantages of the invention are: any text can be encrypted; the efficiency of encryption and decryption does not degrade as the data volume grows; and the mapping between plaintext and ciphertext cannot be cracked by brute force.
Drawings
FIG. 1 is a flow chart of training for each field;
FIG. 2 is a partial flow chart after threading;
FIG. 3 is a flow chart of the model training phase of the present invention;
FIG. 4 is a flowchart of the present invention after it is on-line.
Detailed Description
The invention will be further illustrated with reference to the following specific examples. It should be understood that these examples are for illustrative purposes only and are not intended to limit the scope of the present invention. Further, it should be understood that various changes or modifications of the present invention may be made by those skilled in the art after reading the teaching of the present invention, and such equivalents may fall within the scope of the present invention as defined in the appended claims.
The encryption algorithm provided by the invention is based on gene sequences and gene functions.
In biology, a gene sequence is a sequence of letters representing, for example, DNA. A gene function is the function that a gene sequence (gene fragment) can perform, such as controlling hormone secretion. Different gene sequences can provide similar or even identical gene functions, and one class of gene function may correspond to many different gene sequences.
In the encryption algorithm, the plaintexts of the non-numerical fields of the database correspond to gene functions and the ciphertexts correspond to gene sequences, so that several different ciphertexts can be generated for a given plaintext. When a ciphertext is decrypted, it maps back to its designated plaintext.
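The one-to-many relationship between a plaintext (gene function) and its ciphertexts (gene sequences) can be sketched in Python as follows. This is only an illustration under stated assumptions: the text does not say how a gene function produces its gene sequences, so the hypothetical helper `make_gene_sequences` simply draws random strings over an assumed four-letter alphabet `ACGT`. The base64 step follows the preferred embodiment and uses Python's standard `base64` module, and `y`, `length`, and `seed` are placeholder parameters.

```python
import base64
import random

ALPHABET = "ACGT"  # assumption: the source only speaks of a "letter sequence"


def gene_function(plaintext: str) -> str:
    """Base64-encode the plaintext; the result plays the role of the gene function."""
    return base64.b64encode(plaintext.encode("utf-8")).decode("ascii")


def decode_gene_function(function: str) -> str:
    """Recover the plaintext from a gene function produced by gene_function()."""
    return base64.b64decode(function).decode("utf-8")


def make_gene_sequences(function, y, length, rng, taken):
    """Hypothetical generator: draw y unused random sequences for one gene function."""
    sequences = []
    while len(sequences) < y:
        seq = "".join(rng.choice(ALPHABET) for _ in range(length))
        if seq not in taken:  # each ciphertext must identify exactly one gene function
            taken[seq] = function
            sequences.append(seq)
    return sequences


def build_mapping(plaintexts, y=3, length=12, seed=0):
    """Return (function -> sequences, sequence -> function) for the distinct plaintexts."""
    rng = random.Random(seed)
    forward, reverse = {}, {}
    for text in sorted(set(plaintexts)):  # duplicates in the field are encrypted once
        fn = gene_function(text)
        forward[fn] = make_gene_sequences(fn, y, length, rng, reverse)
    return forward, reverse


forward, reverse = build_mapping(["Shanghai", "上海", "Shanghai"])
```

Here `forward` holds Y ciphertexts per distinct plaintext, and `reverse` is the direct lookup table whose growth with the data volume motivates the model-based matching described in the text.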
Encrypting the non-numerical fields of a database with gene sequences and gene functions must overcome the following difficulties:
The encryption method based on gene sequences and gene functions may appear to be nothing more than a one-to-many mapping from plaintext to ciphertext. In practice, however, the database holds a large volume of data with a rich variety of plaintexts, so the number of ciphertexts is very large. For example, 1 million plaintexts with 10 ciphertexts each yield 10 million different ciphertexts. Converting between plaintext and ciphertext through a direct one-to-many mapping may then cause performance problems and slow responses, and in some scenarios a delayed response is fatal. An algorithm is therefore needed to accelerate the conversion between plaintext and ciphertext.
The encryption method based on gene sequences and gene functions can compare gene sequences by directly solving the joint distribution of the letters at each position in the gene sequence. However, when the random variables of the individual dimensions of this joint distribution are strongly correlated with one another, solving the joint distribution directly tends to get stuck in local extrema (which is why matching directly with a one-to-many mapping performs poorly). An algorithm based on the Markov chain Monte Carlo computing framework avoids this weakness.
In essence, an algorithm based on the Markov chain Monte Carlo computing framework fits the joint distribution of the gene sequences by calculating their conditional probability distributions, which is very efficient for sequence matching.
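A minimal sketch of what calculating the conditional probability distributions could look like, assuming a first-order model in which the letter at each position depends only on the letter at the previous position. The source does not state the order of the model, the alphabet, or any smoothing, so `ALPHABET`, `train_conditionals`, `log_likelihood`, and the add-`alpha` smoothing are illustrative assumptions. No sampling step is shown: the sketch only estimates the per-position conditional tables by counting.

```python
import math

ALPHABET = "ACGT"  # assumption: the source does not fix the alphabet


def train_conditionals(sequences, alpha=1.0):
    """Estimate P(letter at i+1 | letter at i) for every position i by counting.

    All sequences must share one length and use only letters in ALPHABET.
    alpha is an add-alpha smoothing constant (an assumption, not from the source).
    """
    lengths = {len(s) for s in sequences}
    if len(lengths) != 1:
        raise ValueError("training sequences must be non-empty and share one length")
    n = lengths.pop()
    # counts[i][a][b]: smoothed count of letter b at position i+1 following a at i
    counts = [{a: {b: alpha for b in ALPHABET} for a in ALPHABET} for _ in range(n - 1)]
    for seq in sequences:
        for i in range(n - 1):
            counts[i][seq[i]][seq[i + 1]] += 1.0
    # Normalise every row so that it becomes a probability distribution.
    return [
        {a: {b: c / sum(row.values()) for b, c in row.items()} for a, row in pos.items()}
        for pos in counts
    ]


def log_likelihood(model, seq):
    """Sum of log conditional probabilities along the sequence (first letter ignored)."""
    return sum(math.log(model[i][seq[i]][seq[i + 1]]) for i in range(len(seq) - 1))


model = train_conditionals(["ACGT", "ACGA", "ACCT"])
```

Under this assumption the joint probability of a sequence is approximated by the product of the per-position conditionals, which is what `log_likelihood` accumulates in log space.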
The designations of the symbols in the present invention are shown in the following table:
Content    Symbol
Sequence data set    {seq(1), seq(2), seq(3), ..., seq(n)} or the notation shown in Figure GDA0002671812040000031 (image not reproduced)
Specifically, the invention provides a method for encrypting and decrypting non-numerical fields based on gene sequences and gene functions, comprising the following steps:
step 1, establishing a Markov-Monte Carlo model;
step 2, for each non-numerical field in the database, extracting all distinct plaintexts in the field and applying base64 encryption to each distinct plaintext to generate a gene function, wherein each gene function generates Y different ciphertexts and each ciphertext is a gene sequence;
step 3, calculating the length len(seq(x)) of each gene sequence, where seq(x) denotes gene sequence x; selecting from the gene sequence data set all gene sequences of length len(N) to form a gene sequence set for training; inputting this set into the Markov-Monte Carlo model established in step 1; and having the model calculate, for each position from the first to the last position of the gene sequence, the conditional probability distribution of all possible following letters given the letter at that position, thereby completing the training of the Markov-Monte Carlo model;
step 4, obtaining all distinct plaintexts in the non-numerical field to be encrypted and applying base64 encryption to each distinct plaintext to generate a gene function, wherein each gene function generates Y different ciphertexts, each ciphertext is a gene sequence, and all the gene sequences together form a gene sequence set;
step 5, during decryption, obtaining the gene sequence to be compared, calculating its length, and selecting all gene sequences of the same length from the gene sequence set to form a gene sequence subset;
step 6, inputting the gene sequence subset and the gene sequence to be compared into the trained Markov-Monte Carlo model for comparison; a comparison length z is input, and during comparison the contents of the gene sequences are compared position by position, z positions at a time; the k gene sequences with the highest similarity are matched from the gene sequence subset, using maximum likelihood estimation under the conditional probabilities, to form a new gene sequence data subset, where k is an externally input parameter;
step 7, returning the gene functions corresponding to all gene sequences in the new gene sequence data subset obtained in step 6, and calculating the proportion of the most frequent gene function among them; if that proportion is greater than or equal to p, where p is an externally input parameter, that gene function is the gene function corresponding to the gene sequence to be compared; otherwise, returning to step 5. As more and more data is extracted, the likelihood increases and can theoretically reach 1.
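The decryption side (steps 5 to 7) can be sketched as below. This is an illustration under assumptions, not the patented comparison itself: the text does not define the similarity measure, so the sketch swaps in a simple stand-in, the fraction of position-aligned z-letter blocks on which two sequences agree, in place of the likelihood-based comparison under the trained model. The names `block_similarity`, `decrypt`, and `store` and the sample sequences are hypothetical, and a `None` result stands for "return to step 5".

```python
import base64
from collections import Counter


def block_similarity(a: str, b: str, z: int) -> float:
    """Stand-in similarity: share of position-aligned z-letter blocks that are identical."""
    starts = range(0, len(a), z)
    return sum(a[i:i + z] == b[i:i + z] for i in starts) / len(starts)


def decrypt(query, sequence_to_function, k, p, z):
    """Return the gene function matched to query, or None if no function reaches share p."""
    if not query or k < 1 or z < 1:
        raise ValueError("query must be non-empty and k, z must be positive")
    # Step 5: keep only the stored sequences that have the same length as the query.
    subset = [s for s in sequence_to_function if len(s) == len(query)]
    if not subset:
        return None
    # Step 6: keep the k sequences most similar to the query.
    top_k = sorted(subset, key=lambda s: block_similarity(query, s, z), reverse=True)[:k]
    # Step 7: majority vote over the gene functions of the k matches, thresholded by p.
    votes = Counter(sequence_to_function[s] for s in top_k)
    function, count = votes.most_common(1)[0]
    return function if count / len(top_k) >= p else None


shanghai = base64.b64encode(b"Shanghai").decode("ascii")
beijing = base64.b64encode(b"Beijing").decode("ascii")
store = {
    "ACGTACGT": shanghai, "ACGTACGA": shanghai, "ACGTTCGT": shanghai,
    "TTGACCAA": beijing, "TTGACCAT": beijing, "TTGGCCAA": beijing,
    "ACGT": beijing,  # different length, filtered out in step 5
}
result = decrypt("ACGTACGA", store, k=3, p=0.6, z=2)
plaintext = base64.b64decode(result).decode("utf-8")
```

With `k=3` all three nearest sequences belong to one gene function, so the vote passes the threshold; with `k=6` the two functions split the vote evenly and the call returns `None`, which corresponds to going back to step 5.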

Claims (4)

1. A method for encrypting and decrypting non-numerical fields based on gene sequences and gene functions, characterized by comprising the following steps:
step 1, establishing a Markov-Monte Carlo model;
step 2, for each non-numerical field in the database, extracting all distinct plaintexts in the field and encrypting each distinct plaintext to generate a gene function, wherein each gene function generates Y different ciphertexts and each ciphertext is a gene sequence;
step 3, calculating the length len(seq(x)) of each gene sequence, where seq(x) denotes gene sequence x; selecting from the gene sequence data set all gene sequences of length len(N) to form a gene sequence set for training; inputting this set into the Markov-Monte Carlo model established in step 1; and having the model calculate, for each position from the first to the last position of the gene sequence, the conditional probability distribution of all possible following letters given the letter at that position, thereby completing the training of the Markov-Monte Carlo model;
step 4, obtaining all distinct plaintexts in the non-numerical field to be encrypted and encrypting each distinct plaintext to generate a gene function, wherein each gene function generates Y different ciphertexts, each ciphertext is a gene sequence, and all the gene sequences together form a gene sequence set;
step 5, during decryption, obtaining the gene sequence to be compared, calculating its length, and selecting all gene sequences of the same length from the gene sequence set to form a gene sequence subset;
step 6, inputting the gene sequence subset and the gene sequence to be compared into the trained Markov-Monte Carlo model for comparison, and matching the k gene sequences with the highest similarity from the gene sequence subset, using maximum likelihood estimation under the conditional probabilities, to form a new gene sequence data subset;
step 7, returning the gene functions corresponding to all gene sequences in the new gene sequence data subset obtained in step 6, and calculating the proportion of the most frequent gene function among them; if that proportion is greater than or equal to p, determining that gene function to be the gene function corresponding to the gene sequence to be compared, and obtaining the plaintext corresponding to the gene sequence to be compared from that gene function; otherwise, returning to step 5.
2. The method for encrypting and decrypting non-numerical fields based on gene sequences and gene functions as claimed in claim 1, wherein in step 2 and step 4, base64 encryption is applied to each distinct plaintext.
3. The method for encrypting and decrypting non-numerical fields based on gene sequences and gene functions as claimed in claim 1, wherein in step 6, a comparison length z is input, and during comparison the contents of the gene sequences are compared position by position, z positions at a time.
4. The method for encrypting and decrypting non-numerical fields based on gene sequences and gene functions as claimed in claim 1, wherein in step 6, k is an externally input parameter; in step 7, p is an externally input parameter.
CN201910850865.XA 2019-09-10 2019-09-10 Non-numerical field encryption and decryption method based on gene sequence and gene function Active CN110718272B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201910850865.XA CN110718272B (en) 2019-09-10 2019-09-10 Non-numerical field encryption and decryption method based on gene sequence and gene function


Publications (2)

Publication Number Publication Date
CN110718272A (en) 2020-01-21
CN110718272B (en) 2020-11-17

Family

ID=69209694


Country Status (1)

Country Link
CN (1) CN110718272B (en)

Family Cites Families (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20110202486A1 (en) * 2009-07-21 2011-08-18 Glenn Fung Healthcare Information Technology System for Predicting Development of Cardiovascular Conditions
CN106817218A (en) * 2015-12-01 2017-06-09 国基电子(上海)有限公司 Encryption method based on DNA technique
US20180322246A1 (en) * 2017-05-04 2018-11-08 Annai Systems, Inc. System and method for secure, high-speed transfer of very large files
CN110070914B (en) * 2019-03-15 2020-07-03 崔大超 Gene sequence identification method, system and computer readable storage medium

Also Published As

Publication number Publication date
CN110718272A (en) 2020-01-21


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant