CN114117487A - Plaintext similarity estimation method, device, equipment and medium for encrypted character string - Google Patents

Info

Publication number
CN114117487A
Authority
CN
China
Prior art keywords
data set
distribution
plaintext
estimated
distribution corresponding
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202111402823.3A
Other languages
Chinese (zh)
Inventor
徐莉莎
陈远猷
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Shanghai Para Software Co ltd
Original Assignee
Shanghai Para Software Co ltd

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F21/00 Security arrangements for protecting computers, components thereof, programs or data against unauthorised activity
    • G06F21/60 Protecting data
    • G06F21/602 Providing cryptographic facilities or services
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/22 Matching criteria, e.g. proximity measures
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F30/00 Computer-aided design [CAD]
    • G06F30/20 Design optimisation, verification or simulation
    • G06F30/27 Design optimisation, verification or simulation using machine learning, e.g. artificial intelligence, neural networks, support vector machines [SVM] or training a model
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N7/00 Computing arrangements based on specific mathematical models
    • G06N7/01 Probabilistic graphical models, e.g. probabilistic networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F2111/00 Details relating to CAD techniques
    • G06F2111/08 Probabilistic or stochastic CAD

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Evolutionary Computation (AREA)
  • General Engineering & Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Artificial Intelligence (AREA)
  • Software Systems (AREA)
  • Computer Hardware Design (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Evolutionary Biology (AREA)
  • Mathematical Physics (AREA)
  • Probability & Statistics with Applications (AREA)
  • Algebra (AREA)
  • Computational Mathematics (AREA)
  • Mathematical Analysis (AREA)
  • Mathematical Optimization (AREA)
  • Pure & Applied Mathematics (AREA)
  • Computing Systems (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Health & Medical Sciences (AREA)
  • Bioethics (AREA)
  • General Health & Medical Sciences (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Computer Security & Cryptography (AREA)
  • Medical Informatics (AREA)
  • Geometry (AREA)
  • Complex Calculations (AREA)

Abstract

An embodiment of the invention discloses a plaintext similarity estimation method, apparatus, device, and medium for encrypted character strings. The method comprises: acquiring a plaintext data set and encrypting it with a preset encryption algorithm to obtain a ciphertext data set; modeling the plaintext data set and the ciphertext data set separately based on a multinomial distribution to obtain an estimated distribution corresponding to the plaintext data set and an estimated distribution corresponding to the ciphertext data set; estimating, based on a Bayesian statistical model, the estimated distribution corresponding to the decryption function from the estimated distribution corresponding to the plaintext data set and the estimated distribution corresponding to the ciphertext data set; and estimating the plaintext similarity between different target encryption character strings according to the estimated distribution corresponding to the decryption function. With this technical scheme, the similarity of plaintext data before encryption can be estimated from the encrypted ciphertext data, so that the privacy of the plaintext data is protected while the association among multiple pieces of plaintext data can still be estimated.

Description

Plaintext similarity estimation method, device, equipment and medium for encrypted character string
Technical Field
Embodiments of the invention relate to the field of computer technology, and in particular to a plaintext similarity estimation method, apparatus, device, and medium for encrypted character strings.
Background
With the rapid development of informatization, the requirements on information security keep rising, and data encryption is a basic technology for protecting information in information security and data confidentiality applications.
In the prior art there are many methods for judging the similarity of character strings, but from a data security standpoint there are few schemes that take encrypted ciphertext data as the algorithm input and output the similarity of the plaintext data before encryption.
Disclosure of Invention
Embodiments of the invention provide a plaintext similarity estimation method, apparatus, device, and medium for encrypted character strings, which can improve on existing schemes for estimating plaintext data.
In a first aspect, an embodiment of the present invention provides a plaintext similarity estimation method for an encrypted character string, including:
acquiring a plaintext data set, and carrying out encryption operation on the plaintext data set by using a preset encryption algorithm to obtain a ciphertext data set;
modeling the plaintext data set and the ciphertext data set separately based on a multinomial distribution to obtain an estimated distribution corresponding to the plaintext data set and an estimated distribution corresponding to the ciphertext data set;
based on a Bayesian statistical model, estimating the estimated distribution corresponding to the decryption function according to the estimated distribution corresponding to the plaintext data set and the estimated distribution corresponding to the ciphertext data set;
and estimating the plaintext similarity between different target encryption character strings according to the estimated distribution corresponding to the decryption function, wherein the encryption algorithm corresponding to the target encryption character strings is the preset encryption algorithm.
In a second aspect, an embodiment of the present invention provides a plaintext similarity estimation apparatus for an encrypted character string, including:
the encryption operation module is used for acquiring a plaintext data set, and performing encryption operation on the plaintext data set by using a preset encryption algorithm to obtain a ciphertext data set;
the estimated distribution obtaining module is used for modeling the plaintext data set and the ciphertext data set separately based on a multinomial distribution to obtain an estimated distribution corresponding to the plaintext data set and an estimated distribution corresponding to the ciphertext data set;
the estimated distribution calculation module is used for estimating, based on a Bayesian statistical model, the estimated distribution corresponding to the decryption function according to the estimated distribution corresponding to the plaintext data set and the estimated distribution corresponding to the ciphertext data set;
and the plaintext similarity estimation module is used for estimating the plaintext similarity between different target encryption character strings according to the estimation distribution corresponding to the decryption function, wherein the encryption algorithm corresponding to the target encryption character string is the preset encryption algorithm.
In a third aspect, an embodiment of the present invention provides a computer device, which includes a memory, a processor, and a computer program stored on the memory and executable on the processor, where the processor implements the plaintext similarity estimation method for an encrypted character string according to an embodiment of the present invention when executing the computer program.
In a fourth aspect, an embodiment of the present invention provides a computer-readable storage medium, on which a computer program is stored, where the computer program, when executed by a processor, implements the plaintext similarity estimation method for an encrypted character string according to an embodiment of the present invention.
In the plaintext similarity estimation scheme for encrypted character strings provided by the embodiments of the invention, a plaintext data set is first acquired and encrypted with a preset encryption algorithm to obtain a ciphertext data set. The plaintext data set and the ciphertext data set are then modeled separately based on a multinomial distribution to obtain an estimated distribution corresponding to the plaintext data set and an estimated distribution corresponding to the ciphertext data set. Next, based on a Bayesian statistical model, the estimated distribution corresponding to the decryption function is estimated from the estimated distribution corresponding to the plaintext data set and the estimated distribution corresponding to the ciphertext data set. Finally, the plaintext similarity between different target encryption character strings is estimated according to the estimated distribution corresponding to the decryption function, where the encryption algorithm corresponding to the target encryption character strings is the preset encryption algorithm. With this technical scheme, the similarity of plaintext data before encryption can be estimated from the encrypted ciphertext data, so that the privacy of the plaintext data is protected while the association among multiple pieces of plaintext data can still be estimated.
Drawings
Fig. 1 is a schematic flow chart of a plaintext similarity estimation method for an encrypted character string according to an embodiment of the present invention;
Fig. 2 is a schematic flow chart of another plaintext similarity estimation method for an encrypted character string according to an embodiment of the present invention;
Fig. 3 is a block diagram of a plaintext similarity estimation apparatus for encrypted character strings according to an embodiment of the present invention;
Fig. 4 is a block diagram of a computer device according to an embodiment of the present invention.
Detailed Description
The technical scheme of the invention is further explained by the specific implementation mode in combination with the attached drawings. It is to be understood that the specific embodiments described herein are merely illustrative of the invention and are not limiting of the invention. It should be further noted that, for the convenience of description, only some of the structures related to the present invention are shown in the drawings, not all of the structures.
Before discussing exemplary embodiments in more detail, it should be noted that some exemplary embodiments are described as processes or methods depicted as flowcharts. Although a flowchart may describe the steps as a sequential process, many of the steps can be performed in parallel, concurrently or simultaneously. In addition, the order of the steps may be rearranged. The process may be terminated when its operations are completed, but may have additional steps not included in the figure. The processes may correspond to methods, functions, procedures, subroutines, and the like.
Example one
Fig. 1 is a schematic flow chart of a plaintext similarity estimation method for an encrypted string according to an embodiment of the present invention, which may be performed by a plaintext similarity estimation apparatus for an encrypted string, where the apparatus may be implemented by software and/or hardware, and may be generally integrated in a computer device such as a server. As shown in fig. 1, the method includes:
S110, a plaintext data set is obtained, and a preset encryption algorithm is used for carrying out encryption operation on the plaintext data set to obtain a ciphertext data set.
The plaintext data are unencrypted character strings. A character string may have any length and any combination of characters, and includes at least one of digits, letters, and symbols.
Accordingly, the plaintext data set includes a preset number of character strings with different lengths and/or different combinations. The preset number may be, for example, 10000 or 20000; it is determined by the number of training samples the developers require and is not limited herein. The purpose of encrypting the plaintext data set is to ensure the confidentiality of the data during transmission.
Correspondingly, the ciphertext data set is obtained by encrypting the character strings in the plaintext data set with the preset encryption algorithm. The preset encryption algorithm may be a symmetric algorithm such as the Data Encryption Standard (DES) or the International Data Encryption Algorithm (IDEA), the Digital Signature Algorithm (DSA), or the like, and is not limited herein.
Correspondingly, the ciphertext data set obtained by the encryption operation comprises the same preset number of encrypted character strings.
S120, the plaintext data set and the ciphertext data set are modeled separately based on a multinomial distribution to obtain the estimated distribution corresponding to the plaintext data set and the estimated distribution corresponding to the ciphertext data set.
The multinomial distribution is a generalization of the binomial distribution. It is derived from the joint distribution of two or more random variables $X_1, X_2, \ldots, X_k$ (where $k \geq 2$), each taking a finite number of possible values.
Because the plaintext data in this embodiment consist of character strings, and a conventional input method offers only a finite number m of distinct inputs (digits, letters, and symbols), the estimated distribution obtained by modeling the plaintext data set with a multinomial distribution can be represented by the following expression:
$\mathrm{Multinomial}(n_1, n_2, \ldots, n_m, p_1, p_2, \ldots, p_m)$    (1)
where Multinomial denotes a multinomial distribution, m denotes the total dimension, that is, the number of character types, n denotes the mean vector of the variables in the corresponding dimensions, and p denotes the covariance vector of the variables in the corresponding dimensions.
Specifically, counting the characters available on a conventional keyboard gives 94 different inputs, so the estimated distribution corresponding to the plaintext data set can be expressed as:
$\mathrm{Multinomial}(n_1, n_2, \ldots, n_{94}, p_1, p_2, \ldots, p_{94})$    (2)
That is, the 94 possible inputs yield a mean vector and a covariance vector with 94 dimensions each.
Correspondingly, since the ciphertext data set is obtained by encrypting the plaintext data set, when the ciphertext data set is modeled based on the multinomial distribution, the expression of its estimated distribution has the same form as that of the plaintext data set.
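As an illustration of step S120, the following minimal sketch fits a multinomial model over the 94 printable keyboard characters by counting occurrences. It assumes the standard parameterization of the multinomial distribution (per-category counts and probabilities); the function name `fit_multinomial` and the choice of alphabet are hypothetical and are not taken from the original text.

```python
import string
from collections import Counter

# Assumption: the 94 keyboard inputs are the printable, non-space ASCII characters
# (52 letters + 10 digits + 32 punctuation symbols).
ALPHABET = string.ascii_letters + string.digits + string.punctuation


def fit_multinomial(strings):
    """Return (counts, probabilities) over ALPHABET for a list of strings.

    Hypothetical helper: counts[i] is how often ALPHABET[i] occurs in the data,
    and probabilities[i] is the corresponding relative frequency.
    """
    tally = Counter(ch for s in strings for ch in s if ch in ALPHABET)
    total = sum(tally.values())
    counts = [tally.get(ch, 0) for ch in ALPHABET]
    probabilities = [c / total if total else 0.0 for c in counts]
    return counts, probabilities


counts, probs = fit_multinomial(["abc123", "a!b?c"])
```

The same helper could be applied to a ciphertext data set, provided the ciphertext is first encoded over a fixed, finite alphabet.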
S130, based on the Bayesian statistical model, estimating the estimated distribution corresponding to the decryption function according to the estimated distribution corresponding to the plaintext data set and the estimated distribution corresponding to the ciphertext data set.
The decryption function is a way of decrypting the encrypted ciphertext data set obtained after encryption, and can be understood as an inverse function of the encryption algorithm.
To determine the estimated distribution corresponding to the decryption function, the m different inputs obtained in step S120 need to be modeled. Since the m inputs correspond to m dimensions, the estimated distribution corresponding to the decryption function can be determined using the law of large numbers and the multivariate normal distribution, and can be represented by the following expression:
$N(\mu_m, \Sigma_m)$    (3)
where $\mu_m$ denotes the mean vector of the decryption function over the total dimension, and $\Sigma_m$ denotes the variance matrix of the decryption function over the total dimension.
For 94 different inputs obtained from the existing statistics, the estimated distribution corresponding to the decryption function can be expressed as:
$N(\mu_{94}, \Sigma_{94})$    (4)
Further, the obtained distribution corresponding to the plaintext data set and the distribution corresponding to the ciphertext data set are used as the input of the Bayesian statistical model, which outputs the estimated distribution corresponding to the decryption function.
As the expression of the estimated distribution corresponding to the decryption function shows, what is mainly estimated with the Bayesian statistical model is the distribution of the mean vector parameter μ and the distribution of the variance matrix parameter Σ for the current dimension.
S140, estimating the plaintext similarity between different target encryption character strings according to the estimated distribution corresponding to the decryption function.
In the embodiment of the invention, a target encryption character string can be understood as an encrypted character string whose plaintext similarity needs to be estimated. There may be two or more target encryption character strings, and the encryption algorithm corresponding to them is the preset encryption algorithm.
After the pre-estimated distribution corresponding to the decryption function is determined, the target encryption character string can be decrypted according to the pre-estimated distribution corresponding to the current decryption function, and therefore the pre-estimated plaintext character string corresponding to the encryption character string can be obtained. After the estimated plaintext character strings corresponding to the target encryption character strings are obtained, the similarity between any two estimated plaintext character strings can be calculated, so that the plaintext similarity between different target encryption character strings is estimated.
In the plaintext similarity estimation method for encrypted character strings provided by this embodiment, a plaintext data set is first acquired and encrypted with a preset encryption algorithm to obtain a ciphertext data set. The plaintext data set and the ciphertext data set are then modeled separately based on a multinomial distribution to obtain an estimated distribution corresponding to the plaintext data set and an estimated distribution corresponding to the ciphertext data set. Next, based on a Bayesian statistical model, the estimated distribution corresponding to the decryption function is estimated from these two distributions. Finally, the plaintext similarity between different target encryption character strings is estimated according to the estimated distribution corresponding to the decryption function, where the encryption algorithm corresponding to the target encryption character strings is the preset encryption algorithm. With this technical scheme, the similarity of plaintext data before encryption can be estimated from the encrypted ciphertext data, so that the privacy of the plaintext data is protected while the association among multiple pieces of plaintext data can still be estimated.
Example two
This embodiment further refines the above embodiment. The step of estimating, based on the Bayesian statistical model, the estimated distribution corresponding to the decryption function from the estimated distribution corresponding to the plaintext data set and the estimated distribution corresponding to the ciphertext data set is refined to comprise: converting the estimated distribution corresponding to the plaintext data set into the posterior distribution of the Bayesian statistical model, and converting the estimated distribution corresponding to the ciphertext data set into the prior distribution of the Bayesian statistical model; and estimating the likelihood distribution of the Bayesian statistical model based on the posterior distribution and the prior distribution, where the likelihood distribution of the Bayesian statistical model is the estimated distribution corresponding to the decryption function. The advantage of this arrangement is that the problem of finding the decryption function is converted into a Bayesian statistical model problem, which is convenient to compute.
The step of estimating the plaintext similarity between different target encryption character strings according to the estimated distribution corresponding to the decryption function is also refined to comprise: selecting a first target encryption character string and a second target encryption character string; inputting the first target encryption character string and the second target encryption character string separately into the estimated distribution corresponding to the decryption function to obtain first estimated plaintext data corresponding to the first target encryption character string and second estimated plaintext data corresponding to the second target encryption character string; and calculating the similarity between the first estimated plaintext data and the second estimated plaintext data to obtain the plaintext similarity corresponding to the first target encryption character string and the second target encryption character string. The advantage of this arrangement is that the estimated plaintext data for an encrypted character string are obtained from the estimated distribution corresponding to the decryption function, so the plaintext similarity is obtained by prediction and data security is maintained during data transmission.
Referring to fig. 2, fig. 2 is a schematic flow chart of another plaintext similarity estimation method for encrypted strings according to an embodiment of the invention; specifically, the method comprises the following steps:
S210, a plaintext data set is obtained, and a preset encryption algorithm is used for carrying out encryption operation on the plaintext data set to obtain a ciphertext data set.
If the obtained plaintext data set is denoted as x, the encryption algorithm as f(·), and the ciphertext data set as y, then the relation y = f(x) holds.
To estimate the plaintext similarity of encrypted character strings from their ciphertext, as provided by the embodiment of the present invention, a decryption function n(·) of the encryption algorithm f(·) needs to be found. Here n(·) is a generalized inverse function of f(·); in general it only needs to yield a decryption result that does not affect the similarity evaluation of the decrypted characters.
To obtain the plaintext data set x, a certain number of character strings with different lengths and different combinations can be randomly generated; 20000 strings are used here as an example.
Then, each plaintext character string in the plaintext data set is encrypted with the preset encryption algorithm f(·) to obtain the corresponding ciphertext data set y.
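The data preparation in step S210 can be sketched as follows. This is only an illustration under stated assumptions: the helper names `random_plaintexts` and `toy_encrypt`, the length range of 4 to 16 characters, and the key are all hypothetical, and `toy_encrypt` is a deliberately simple, insecure stand-in for the preset encryption algorithm f(·). It is not DES, IDEA, or any algorithm named in the text.

```python
import hashlib
import random
import string

# Assumption: the 94 printable, non-space ASCII characters serve as the input alphabet.
ALPHABET = string.ascii_letters + string.digits + string.punctuation


def random_plaintexts(count, min_len=4, max_len=16, seed=0):
    """Generate `count` random strings of varying length (hypothetical helper)."""
    rng = random.Random(seed)
    return [
        "".join(rng.choice(ALPHABET) for _ in range(rng.randint(min_len, max_len)))
        for _ in range(count)
    ]


def toy_encrypt(plaintext, key=b"demo-key"):
    """Hypothetical stand-in for f(.): XOR with a SHA-256-derived keystream.

    NOT secure and not one of the algorithms named in the text; it only
    provides a deterministic plaintext-to-ciphertext mapping for the sketch.
    """
    data = plaintext.encode("ascii")
    stream = b""
    counter = 0
    while len(stream) < len(data):
        stream += hashlib.sha256(key + counter.to_bytes(4, "big")).digest()
        counter += 1
    return bytes(a ^ b for a, b in zip(data, stream)).hex()


x = random_plaintexts(20000)        # plaintext data set x
y = [toy_encrypt(s) for s in x]     # ciphertext data set y = f(x)
```

In practice the stand-in would be replaced by whichever preset encryption algorithm the target encryption character strings actually use.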
S220, the plaintext data set and the ciphertext data set are modeled separately based on a multinomial distribution to obtain the estimated distribution corresponding to the plaintext data set and the estimated distribution corresponding to the ciphertext data set.
Further, the estimated distribution obtained by modeling the plaintext data set based on the multinomial distribution is denoted as X, and the estimated distribution obtained by modeling the ciphertext data set based on the multinomial distribution is denoted as Y; following S210, the relation Y = f(X) is then obtained.
S230, the pre-estimated distribution corresponding to the plaintext data set is converted into the posterior distribution related to the Bayesian statistical model, and the pre-estimated distribution corresponding to the ciphertext data set is converted into the prior distribution related to the Bayesian statistical model.
The plaintext similarity estimation method for encrypted character strings provided by the embodiment of the invention converts the problem of computing the decryption function into a Bayesian statistical model problem. The distribution corresponding to the plaintext data set is denoted as X and the distribution corresponding to the ciphertext data set as Y. The estimated distribution X corresponding to the plaintext data set is treated as the posterior distribution of the Bayesian statistical model; a posterior distribution can be understood as the probability distribution of the cause inferred once the result is known. The distribution Y corresponding to the ciphertext data set is treated as the prior distribution of the Bayesian statistical model; a prior distribution can be understood as the probability distribution of the cause determined before the result is observed.
S240, the likelihood distribution of the Bayesian statistical model is estimated based on the posterior distribution and the prior distribution, wherein the likelihood distribution of the Bayesian statistical model is the estimated distribution corresponding to the decryption function.
Further, denoting the estimated distribution corresponding to the decryption function as Q, the estimated distribution X corresponding to the plaintext data set, the distribution Y corresponding to the ciphertext data set, and the estimated distribution Q corresponding to the decryption function satisfy the following relationship:
$X \propto Q\,Y$    (5)
That is, the distribution X corresponding to the plaintext data set is proportional to the product of the distribution Y corresponding to the ciphertext data set and the estimated distribution Q corresponding to the decryption function.
In formula (5), the estimated distribution corresponding to the decryption function can be regarded as the likelihood distribution of the Bayesian statistical model; a likelihood distribution can be understood as the probability distribution of the result given a cause that is fixed first.
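For reference, relation (5) has the same shape as the proportional form of Bayes' theorem. In the block below, θ stands for the unknown cause and D for the observed result; both symbols are introduced here only for illustration and do not appear in the original text.

```latex
% Proportional form of Bayes' theorem (theta and D are illustrative symbols)
p(\theta \mid D) \;\propto\; p(D \mid \theta)\, p(\theta)
% Correspondence with relation (5), as the text assigns the roles:
%   X  <->  p(\theta \mid D)   (posterior)
%   Q  <->  p(D \mid \theta)   (likelihood)
%   Y  <->  p(\theta)          (prior)
```

Read this way, estimating Q amounts to asking which likelihood, combined with the prior Y, reproduces the posterior X.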
Further, before predicting the likelihood distribution of the bayesian statistical model based on the posterior distribution and the prior distribution, the method further comprises the following steps: and estimating parameters of the pre-estimated distribution corresponding to the decryption function by using a law of large numbers and the multivariate normal distribution.
Since the estimated distribution corresponding to the decryption function depends on the input dimension of the plaintext data set, when the input dimension is 94, the estimated distribution corresponding to the decryption function determined using the law of large numbers and the multivariate normal distribution can be expressed as:
$N(\mu_{94}, \Sigma_{94})$
where $\mu_{94}$ denotes the mean vector of the decryption function over the total dimension, and $\Sigma_{94}$ denotes the variance matrix of the decryption function over the total dimension.
Therefore, before the estimated distribution corresponding to the decryption function can be determined, its parameters $\mu_{94}$ and $\Sigma_{94}$ need to be determined.
Specifically, a multivariate normal distribution is selected as the estimated distribution corresponding to the decryption function, and the Bayesian statistical model obtained after the problem transformation, that is, X and Y, is used to estimate the parameters $\mu_{94}$ and $\Sigma_{94}$ of that distribution, thereby determining the estimated distribution Q corresponding to the decryption function.
Further, the parameters of the pre-estimated distribution corresponding to the decryption function can be estimated by using the law of large numbers and the multivariate normal distribution.
The law of large numbers describes the convergence of the arithmetic mean of a sequence of random variables to the arithmetic mean of the mathematical expectations of those random variables. The multivariate normal distribution is the generalization of the one-dimensional normal distribution to multiple dimensions; once the mean and standard deviation of a normally distributed variable are known, the proportion of values falling in any range can be estimated from the distribution's formula. Embodiments of the present invention use the law of large numbers and the multivariate normal distribution to solve for the parameters contained in the decryption function.
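The role of the law of large numbers can be illustrated with a small sketch: with many samples, the sample mean vector and sample variance matrix approach the true parameters of a normal distribution. The example uses two dimensions rather than 94 to stay short, and the helper `sample_mean_and_cov`, the true parameters, and the sample size are assumptions for illustration only; the text does not specify how the parameters are actually computed.

```python
import random


def sample_mean_and_cov(samples):
    """Return the sample mean vector and the unbiased sample variance matrix."""
    n = len(samples)
    d = len(samples[0])
    mean = [sum(s[j] for s in samples) / n for j in range(d)]
    cov = [
        [
            sum((s[i] - mean[i]) * (s[j] - mean[j]) for s in samples) / (n - 1)
            for j in range(d)
        ]
        for i in range(d)
    ]
    return mean, cov


rng = random.Random(42)
# Assumed ground truth: two independent normal coordinates with
# means (1.0, -2.0) and standard deviations (1.0, 0.5).
samples = [(rng.gauss(1.0, 1.0), rng.gauss(-2.0, 0.5)) for _ in range(20000)]

mu_hat, sigma_hat = sample_mean_and_cov(samples)
```

With 20000 samples the estimates land close to the assumed means, to variances of about 1.0 and 0.25, and to an off-diagonal entry near zero.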
Accordingly, in step S240, once the posterior distribution X of the Bayesian statistical model (that is, the estimated distribution corresponding to the plaintext data set), the prior distribution Y (that is, the estimated distribution corresponding to the ciphertext data set), and the parameters $\mu_{94}$ and $\Sigma_{94}$ of the estimated distribution corresponding to the decryption function have been obtained, the likelihood distribution of the Bayesian statistical model can be estimated; this likelihood distribution is the estimated distribution corresponding to the decryption function.
And S250, selecting a first target encryption character string and a second target encryption character string.
The first target encryption character string and the second target encryption character string are ciphertext character strings obtained after encryption operation is performed by using an encryption algorithm, and plaintext similarity corresponding to the first target encryption character string and the second target encryption character string needs to be estimated.
S260, inputting the first target encryption character string and the second target encryption character string into the pre-estimated distribution corresponding to the decryption function respectively to obtain first pre-estimated plaintext data corresponding to the first target encryption character string and second pre-estimated plaintext data corresponding to the second target encryption character string.
Given the estimated distribution Q corresponding to the decryption function, the first target encryption character string is denoted y_new1 and the second target encryption character string is denoted y_new2. y_new1 and y_new2 are each input into Q, which yields the first estimated plaintext data x_new1 corresponding to y_new1 and the second estimated plaintext data x_new2 corresponding to y_new2.
S270, similarity calculation is carried out on the first pre-estimated plaintext data and the second pre-estimated plaintext data, and plaintext similarity corresponding to the first target encryption character string and the second target encryption character string is obtained.
The similarity between the first estimated plaintext data and the second estimated plaintext data may be calculated as, for example, the cosine similarity, the Euclidean distance, or the Mahalanobis distance; the specific calculation method is not limited here.
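The three measures named above can be sketched as follows, assuming the estimated plaintext data are available as equal-length numeric vectors. The function names, the example vectors, and the choice to pass the inverse covariance matrix directly to the Mahalanobis function are illustrative assumptions; the embodiment does not prescribe any of them.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between vectors a and b."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_a * norm_b)


def euclidean_distance(a, b):
    """Straight-line distance between vectors a and b."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def mahalanobis_distance(a, b, inv_cov):
    """Mahalanobis distance, given the inverse covariance matrix inv_cov.

    Assumption: the caller supplies the inverse covariance matrix as a list
    of rows, so no matrix inversion is performed here.
    """
    diff = [x - y for x, y in zip(a, b)]
    size = len(diff)
    # Quadratic form diff^T * inv_cov * diff.
    quad = sum(
        diff[i] * inv_cov[i][j] * diff[j] for i in range(size) for j in range(size)
    )
    return math.sqrt(quad)


# Hypothetical estimated plaintext vectors for two target ciphertexts.
x_new1 = [3.0, 4.0]
x_new2 = [4.0, 3.0]
identity = [[1.0, 0.0], [0.0, 1.0]]

cos_sim = cosine_similarity(x_new1, x_new2)
euc_dist = euclidean_distance(x_new1, x_new2)
mah_dist = mahalanobis_distance(x_new1, x_new2, identity)
```

Cosine similarity grows with similarity while the two distances shrink, so a caller comparing results across measures needs to account for the direction of each scale. With an identity inverse covariance matrix the Mahalanobis distance reduces to the Euclidean distance.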
With the plaintext similarity estimation method for encrypted character strings described above, ciphertext data is used to estimate the similarity of the corresponding plaintexts. In the field of data security, this allows the association between plaintext data items to be computed while the private or plaintext data itself remains protected. The plaintext similarity is obtained from the known encryption algorithm and the known relationship between ciphertexts. The calculation is also a generalized decryption model: because the decryption problem is converted into a Bayesian statistical model, the method is in principle applicable to a variety of encryption algorithms, requires no exact decryption of the encryption algorithm, and greatly reduces decryption cost while retaining a certain degree of accuracy.
Example three
Fig. 3 is a block diagram of a plaintext similarity prediction apparatus for an encrypted string according to an embodiment of the present invention, which may be implemented by software and/or hardware, and may be generally integrated in a computer device such as a server, and may predict plaintext similarity of the encrypted string by performing a plaintext similarity prediction method for the encrypted string. As shown in fig. 3, the apparatus includes: an encryption operation module 31, an estimated distribution obtaining module 32, an estimated distribution calculating module 33 and a plaintext similarity estimating module 34, wherein:
the encryption operation module 31 is configured to obtain a plaintext data set, and perform an encryption operation on the plaintext data set by using a preset encryption algorithm to obtain a ciphertext data set;
an estimated distribution obtaining module 32, configured to model the plaintext data set and the ciphertext data set separately based on a multinomial distribution, to obtain an estimated distribution corresponding to the plaintext data set and an estimated distribution corresponding to the ciphertext data set;
the estimated distribution calculating module 33 is configured to estimate, based on a Bayesian statistical model, the estimated distribution corresponding to the decryption function according to the estimated distribution corresponding to the plaintext data set and the estimated distribution corresponding to the ciphertext data set;
a plaintext similarity estimation module 34, configured to estimate plaintext similarity between different target encryption character strings according to the estimated distribution corresponding to the decryption function, where an encryption algorithm corresponding to the target encryption character string is the preset encryption algorithm.
The plaintext similarity estimation device for an encrypted character string provided by this embodiment of the invention first obtains a plaintext data set and encrypts it with a preset encryption algorithm to obtain a ciphertext data set. It then models the plaintext data set and the ciphertext data set separately, based on a multinomial distribution, to obtain the estimated distribution corresponding to each. Next, based on a Bayesian statistical model, it estimates the estimated distribution corresponding to the decryption function from those two distributions. Finally, it estimates the plaintext similarity between different target encryption character strings according to the estimated distribution corresponding to the decryption function, where the encryption algorithm corresponding to the target encryption character strings is the preset encryption algorithm. With this technical solution, the similarity of plaintext data before encryption can be estimated from the encrypted ciphertext data, so that the privacy of the plaintext data is protected while the association among multiple plaintext data items can still be estimated.
Optionally, the plaintext data set includes a preset number of character strings with different lengths and/or different combinations including at least one of numbers, letters, and symbols.
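A plaintext data set of this kind could be assembled as in the following minimal sketch. The function name `generate_plaintext_dataset`, the length bounds, and the `seed` parameter are hypothetical choices made for illustration; the embodiment fixes only that the strings differ in length and/or in their combination of numbers, letters, and symbols. The encryption step is not shown.

```python
import random
import string

# Character classes named in the text: numbers, letters, symbols.
ALPHABETS = {
    "digits": string.digits,
    "letters": string.ascii_letters,
    "symbols": string.punctuation,
}


def generate_plaintext_dataset(count, min_len=4, max_len=16, seed=None):
    """Generate `count` random strings of varying length and character mix."""
    rng = random.Random(seed)
    kinds = list(ALPHABETS)
    dataset = []
    for _ in range(count):
        # Pick a non-empty combination of character classes for this string.
        chosen = rng.sample(kinds, rng.randint(1, len(kinds)))
        alphabet = "".join(ALPHABETS[kind] for kind in chosen)
        # Pick a length within the (arbitrary) bounds.
        length = rng.randint(min_len, max_len)
        dataset.append("".join(rng.choice(alphabet) for _ in range(length)))
    return dataset


plaintexts = generate_plaintext_dataset(1000, seed=42)
```

Fixing the seed makes the data set reproducible, which is convenient when the same plaintexts must be encrypted again under a different preset algorithm.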
Optionally, the estimated distribution calculating module 33 includes: an estimated distribution conversion unit and an estimated distribution calculation unit, wherein:
the estimated distribution conversion unit is used for converting the estimated distribution corresponding to the plaintext data set into a posterior distribution of a Bayesian statistical model, and converting the estimated distribution corresponding to the ciphertext data set into a prior distribution of the Bayesian statistical model;
and the estimated distribution calculating unit is used for estimating the likelihood distribution of the Bayes statistical model based on the posterior distribution and the prior distribution, wherein the likelihood distribution of the Bayes statistical model is the estimated distribution corresponding to the decryption function.
Optionally, the estimated distribution calculating unit further includes: a parameter estimation subunit;
and the parameter estimation subunit is used for estimating the parameters of the estimated distribution corresponding to the decryption function by using the law of large numbers and a multivariate normal distribution.
Correspondingly, the estimated distribution calculating unit is further configured to estimate the likelihood distribution of the bayesian statistical model based on the posterior distribution, the prior distribution and the parameters of the estimated distribution corresponding to the decryption function.
Optionally, the estimated distribution corresponding to the plaintext data set and the estimated distribution corresponding to the ciphertext data set are represented by the following expressions:
Multinomial(n_1, n_2, …, n_m, p_1, p_2, …, p_m)
where Multinomial denotes a multinomial distribution, m denotes the total dimension of the character types, n denotes the mean vector of the variables in the corresponding dimensions, and p denotes the covariance vector of the variables in the corresponding dimensions.
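One way to obtain inputs for such an expression from character strings is sketched below. It is an illustration under explicit assumptions, not the embodiment's procedure: it takes m = 94 character types (the digits, ASCII letters, and ASCII punctuation), and it reads n_i as per-type counts and p_i as per-type relative frequencies, which is the usual multinomial parameterization. The names `CHAR_TYPES`, `count_vector`, and `empirical_probabilities` are hypothetical.

```python
from collections import Counter
import string

# Assumption: the m character types are the 94 printable, non-space ASCII characters.
CHAR_TYPES = string.digits + string.ascii_letters + string.punctuation


def count_vector(text):
    """Count occurrences of each character type in `text`.

    Characters outside CHAR_TYPES (for example spaces) are ignored.
    """
    counts = Counter(text)
    return [counts.get(ch, 0) for ch in CHAR_TYPES]


def empirical_probabilities(dataset):
    """Relative frequency of each character type over a list of strings."""
    totals = [0] * len(CHAR_TYPES)
    for text in dataset:
        for index, count in enumerate(count_vector(text)):
            totals[index] += count
    grand_total = sum(totals)
    if grand_total == 0:
        raise ValueError("data set contains no countable characters")
    return [total / grand_total for total in totals]


n_example = count_vector("aab1")
p_example = empirical_probabilities(["aab1", "zz"])
```

The same two functions can be applied to the plaintext data set and to the ciphertext data set, giving one count vector per string and one probability vector per data set.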
Optionally, the pre-estimated distribution corresponding to the decryption function is represented by the following expression:
N(μ_m, Σ_m)
where μ_m denotes the mean vector of the decryption function over the total dimension, and Σ_m denotes the variance matrix of the decryption function over the total dimension.
Optionally, the plaintext similarity estimation module 34 includes a target encryption character string selection unit, a target encryption character string input unit, and a plaintext similarity estimation unit, wherein:
the target encryption character string selection unit is used for selecting a first target encryption character string and a second target encryption character string;
a target encryption character string input unit, configured to input the first target encryption character string and the second target encryption character string to the pre-estimated distribution corresponding to the decryption function, respectively, to obtain first pre-estimated plaintext data corresponding to the first target encryption character string and second pre-estimated plaintext data corresponding to the second target encryption character string;
and the plaintext similarity estimation unit is used for calculating the similarity of the first estimated plaintext data and the second estimated plaintext data to obtain plaintext similarity corresponding to the first target encryption character string and the second target encryption character string.
The device for estimating the plaintext similarity of the encrypted character string provided by the embodiment of the invention can execute the method for estimating the plaintext similarity of the encrypted character string provided by any embodiment of the invention, and has corresponding functional modules and beneficial effects for executing the method.
Example four
The embodiment of the invention provides computer equipment, and the plaintext similarity estimation device for the encrypted character string provided by the embodiment of the invention can be integrated in the computer equipment. Fig. 4 is a block diagram of a computer device according to an embodiment of the present invention. The computer device 400 may include: a memory 401, a processor 402 and a computer program stored in the memory 401 and executable by the processor, wherein the processor 402 implements the plaintext similarity estimation method for an encrypted string according to an embodiment of the invention when executing the computer program.
The computer device provided by the embodiment of the invention can execute the plaintext similarity estimation method for the encrypted character string provided by any embodiment of the invention, and has corresponding functional modules and beneficial effects for executing the method.
Example five
Embodiments of the present invention further provide a storage medium containing computer-executable instructions, which when executed by a computer processor, are configured to perform a plaintext similarity prediction method for an encrypted string, the method including:
acquiring a plaintext data set, and carrying out encryption operation on the plaintext data set by using a preset encryption algorithm to obtain a ciphertext data set;
modeling the plaintext data set and the ciphertext data set separately based on a multinomial distribution to obtain an estimated distribution corresponding to the plaintext data set and an estimated distribution corresponding to the ciphertext data set;
based on a Bayesian statistical model, estimating the estimated distribution corresponding to the decryption function according to the estimated distribution corresponding to the plaintext data set and the estimated distribution corresponding to the ciphertext data set;
and estimating the plaintext similarity between different target encryption character strings according to the estimated distribution corresponding to the decryption function, wherein the encryption algorithm corresponding to the target encryption character strings is the preset encryption algorithm.
Storage medium: any of various types of memory devices or storage devices. The term "storage medium" is intended to include: installation media such as CD-ROM, floppy disk, or tape devices; computer system memory or random access memory such as DRAM, DDR RAM, SRAM, EDO RAM, Rambus RAM, etc.; non-volatile memory such as flash memory, magnetic media (e.g., a hard disk), or optical storage; registers or other similar types of memory elements, etc. The storage medium may also include other types of memory or combinations thereof. In addition, the storage medium may be located in a first computer system in which the program is executed, or may be located in a different second computer system connected to the first computer system through a network (such as the internet). The second computer system may provide program instructions to the first computer system for execution. The term "storage medium" may include two or more storage media that may reside in different locations, such as in different computer systems that are connected by a network. The storage medium may store program instructions (e.g., embodied as a computer program) that are executable by one or more processors.
Of course, the storage medium containing the computer-executable instructions provided in the embodiments of the present invention is not limited to the plaintext similarity estimation operation for the encrypted string as described above, and may also perform related operations in the plaintext similarity estimation method for the encrypted string provided in any embodiment of the present invention.
The device, the equipment and the storage medium for estimating the plaintext similarity of the encrypted character string provided by the embodiment can execute the method for estimating the plaintext similarity of the encrypted character string provided by any embodiment of the invention, and have corresponding functional modules and beneficial effects for executing the method. For details of the technique not described in detail in the above embodiments, reference may be made to the plaintext similarity estimation method for encrypted strings according to any embodiment of the present invention.
It is to be noted that the foregoing describes only the preferred embodiments of the present invention and the technical principles employed. Those skilled in the art will understand that the present invention is not limited to the particular embodiments described herein, and that various obvious changes, rearrangements, and substitutions can be made without departing from the scope of protection of the invention. Therefore, although the present invention has been described in some detail through the above embodiments, it is not limited to them and may include other equivalent embodiments without departing from its spirit; the scope of the present invention is determined by the scope of the appended claims.

Claims (10)

1. A plaintext similarity estimation method for an encrypted character string is characterized by comprising the following steps:
acquiring a plaintext data set, and carrying out encryption operation on the plaintext data set by using a preset encryption algorithm to obtain a ciphertext data set;
modeling the plaintext data set and the ciphertext data set separately based on a multinomial distribution to obtain an estimated distribution corresponding to the plaintext data set and an estimated distribution corresponding to the ciphertext data set;
based on a Bayesian statistical model, estimating the estimated distribution corresponding to the decryption function according to the estimated distribution corresponding to the plaintext data set and the estimated distribution corresponding to the ciphertext data set;
and estimating the plaintext similarity between different target encryption character strings according to the estimated distribution corresponding to the decryption function, wherein the encryption algorithm corresponding to the target encryption character strings is the preset encryption algorithm.
2. The method according to claim 1, characterized in that the plaintext data set comprises a preset number of character strings of different lengths and/or different combinations comprising at least one of numbers, letters and symbols.
3. The method according to claim 1, wherein estimating, based on the Bayesian statistical model, the estimated distribution corresponding to the decryption function according to the estimated distribution corresponding to the plaintext data set and the estimated distribution corresponding to the ciphertext data set comprises:
converting the pre-estimated distribution corresponding to the plaintext data set into posterior distribution related to the Bayesian statistical model, and converting the pre-estimated distribution corresponding to the ciphertext data set into prior distribution related to the Bayesian statistical model;
and estimating the likelihood distribution of the Bayes statistical model based on the posterior distribution and the prior distribution, wherein the likelihood distribution of the Bayes statistical model is the estimated distribution corresponding to the decryption function.
4. The method according to claim 3, wherein before estimating the likelihood distribution of the Bayesian statistical model based on the posterior distribution and the prior distribution, further comprising:
estimating parameters of pre-estimated distribution corresponding to the decryption function by using a law of large numbers and multivariate normal distribution;
correspondingly, the estimating the likelihood distribution of the bayesian statistical model based on the posterior distribution and the prior distribution comprises:
and estimating the likelihood distribution of the Bayesian statistical model based on the posterior distribution, the prior distribution, and the parameters of the estimated distribution corresponding to the decryption function.
5. The method of claim 1, wherein the estimated distribution corresponding to the plaintext data set and the estimated distribution corresponding to the ciphertext data set are represented by the following expressions:
Multinomial(n_1, n_2, …, n_m, p_1, p_2, …, p_m)
where Multinomial denotes a multinomial distribution, m denotes the total dimension of the character types, n denotes the mean vector of the variables in the corresponding dimensions, and p denotes the covariance vector of the variables in the corresponding dimensions.
6. The method of claim 1, wherein the pre-estimated distribution of the decryption function is expressed by the following expression:
N(μ_m, Σ_m)
where μ_m denotes the mean vector of the decryption function over the total dimension, and Σ_m denotes the variance matrix of the decryption function over the total dimension.
7. The method according to claim 1, wherein estimating the plaintext similarity between different target encryption character strings according to the estimated distribution corresponding to the decryption function comprises:
selecting a first target encryption character string and a second target encryption character string;
inputting the first target encryption character string and the second target encryption character string to the pre-estimated distribution corresponding to the decryption function respectively to obtain first pre-estimated plaintext data corresponding to the first target encryption character string and second pre-estimated plaintext data corresponding to the second target encryption character string;
and carrying out similarity calculation on the first pre-estimated plaintext data and the second pre-estimated plaintext data to obtain plaintext similarity corresponding to the first target encryption character string and the second target encryption character string.
8. A plaintext similarity estimation device for an encrypted character string is characterized by comprising the following components:
the encryption operation module is used for acquiring a plaintext data set, and performing encryption operation on the plaintext data set by using a preset encryption algorithm to obtain a ciphertext data set;
the estimated distribution obtaining module is used for modeling the plaintext data set and the ciphertext data set separately based on a multinomial distribution to obtain an estimated distribution corresponding to the plaintext data set and an estimated distribution corresponding to the ciphertext data set;
the estimated distribution calculating module is used for estimating, based on a Bayesian statistical model, the estimated distribution corresponding to the decryption function according to the estimated distribution corresponding to the plaintext data set and the estimated distribution corresponding to the ciphertext data set;
and the plaintext similarity estimation module is used for estimating the plaintext similarity between different target encryption character strings according to the estimation distribution corresponding to the decryption function, wherein the encryption algorithm corresponding to the target encryption character string is the preset encryption algorithm.
9. A computer device comprising a memory, a processor and a computer program stored on the memory and executable on the processor, characterized in that the processor implements the method according to any of claims 1-7 when executing the computer program.
10. A computer-readable storage medium, on which a computer program is stored which, when being executed by a processor, carries out the method according to any one of claims 1 to 7.
Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202111402823.3A CN114117487A (en) 2021-11-24 2021-11-24 Plaintext similarity estimation method, device, equipment and medium for encrypted character string

Publications (1)

Publication Number Publication Date
CN114117487A true CN114117487A (en) 2022-03-01

Family

ID=80371774

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202111402823.3A Pending CN114117487A (en) 2021-11-24 2021-11-24 Plaintext similarity estimation method, device, equipment and medium for encrypted character string

Country Status (1)

Country Link
CN (1) CN114117487A (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115062299A (en) * 2022-07-26 2022-09-16 华控清交信息科技(北京)有限公司 Security detection method and device for data leakage and electronic equipment

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination