CN114117487A

CN114117487A - Plaintext similarity estimation method, device, equipment and medium for encrypted character string

Info

Publication number: CN114117487A
Application number: CN202111402823.3A
Authority: CN
Inventors: 徐莉莎; 陈远猷
Original assignee: Shanghai Para Software Co ltd
Current assignee: Shanghai Para Software Co ltd
Priority date: 2021-11-24
Filing date: 2021-11-24
Publication date: 2022-03-01

Abstract

An embodiment of the invention discloses a plaintext similarity estimation method, apparatus, device and medium for encrypted character strings. The method comprises: acquiring a plaintext data set, and encrypting the plaintext data set with a preset encryption algorithm to obtain a ciphertext data set; modeling the plaintext data set and the ciphertext data set separately based on a multinomial distribution to obtain an estimated distribution corresponding to the plaintext data set and an estimated distribution corresponding to the ciphertext data set; estimating, based on a Bayesian statistical model, the estimated distribution corresponding to the decryption function according to the estimated distribution corresponding to the plaintext data set and the estimated distribution corresponding to the ciphertext data set; and estimating the plaintext similarity between different target encryption character strings according to the estimated distribution corresponding to the decryption function. With this technical scheme, the similarity of the plaintext data before encryption can be estimated from the encrypted ciphertext data, so that the privacy of the plaintext data is protected while the association relationships among multiple pieces of plaintext data can still be estimated.

Description

Plaintext similarity estimation method, device, equipment and medium for encrypted character string

Technical Field

The embodiment of the invention relates to the technical field of computers, in particular to a plaintext similarity estimation method, a plaintext similarity estimation device, plaintext similarity estimation equipment and a plaintext similarity estimation medium for encrypted character strings.

Background

With the rapid development of informatization, people have higher and higher requirements on information security, and data encryption is a basic application technology for protecting information in information security and data confidentiality applications.

In the prior art there are many methods for judging the similarity of character strings. In the data security field, however, there are few schemes that take encrypted ciphertext data as the algorithm input and output the similarity of the plaintext data before encryption.

Disclosure of Invention

The embodiment of the invention provides a plaintext similarity estimation method, apparatus, device and medium for encrypted character strings, which can improve on existing schemes for estimating plaintext data.

In a first aspect, an embodiment of the present invention provides a plaintext similarity estimation method for an encrypted character string, including:

acquiring a plaintext data set, and carrying out encryption operation on the plaintext data set by using a preset encryption algorithm to obtain a ciphertext data set;

respectively modeling the plaintext data set and the ciphertext data set based on a multinomial distribution to obtain an estimated distribution corresponding to the plaintext data set and an estimated distribution corresponding to the ciphertext data set;

based on a Bayesian statistical model, estimating the estimated distribution corresponding to the decryption function according to the estimated distribution corresponding to the plaintext data set and the estimated distribution corresponding to the ciphertext data set;

and estimating the plaintext similarity between different target encryption character strings according to the estimated distribution corresponding to the decryption function, wherein the encryption algorithm corresponding to the target encryption character strings is the preset encryption algorithm.

In a second aspect, an embodiment of the present invention provides a plaintext similarity estimation apparatus for an encrypted character string, including:

the encryption operation module is used for acquiring a plaintext data set, and performing encryption operation on the plaintext data set by using a preset encryption algorithm to obtain a ciphertext data set;

the estimated distribution obtaining module is used for respectively modeling the plaintext data set and the ciphertext data set based on a multinomial distribution to obtain an estimated distribution corresponding to the plaintext data set and an estimated distribution corresponding to the ciphertext data set;

the estimated distribution calculating module is used for estimating, based on a Bayesian statistical model, the estimated distribution corresponding to the decryption function according to the estimated distribution corresponding to the plaintext data set and the estimated distribution corresponding to the ciphertext data set;

and the plaintext similarity estimation module is used for estimating the plaintext similarity between different target encryption character strings according to the estimation distribution corresponding to the decryption function, wherein the encryption algorithm corresponding to the target encryption character string is the preset encryption algorithm.

In a third aspect, an embodiment of the present invention provides a computer device, which includes a memory, a processor, and a computer program stored on the memory and executable on the processor, where the processor implements the plaintext similarity estimation method for an encrypted character string according to an embodiment of the present invention when executing the computer program.

In a fourth aspect, an embodiment of the present invention provides a computer-readable storage medium, on which a computer program is stored, where the computer program, when executed by a processor, implements the plaintext similarity estimation method for an encrypted character string according to an embodiment of the present invention.

In the plaintext similarity estimation scheme for encrypted character strings provided by the embodiment of the invention, a plaintext data set is first obtained and encrypted with a preset encryption algorithm to obtain a ciphertext data set; the plaintext data set and the ciphertext data set are then modeled separately based on a multinomial distribution to obtain an estimated distribution corresponding to the plaintext data set and an estimated distribution corresponding to the ciphertext data set; next, based on a Bayesian statistical model, the estimated distribution corresponding to the decryption function is estimated according to the estimated distribution corresponding to the plaintext data set and the estimated distribution corresponding to the ciphertext data set; finally, the plaintext similarity between different target encryption character strings is estimated according to the estimated distribution corresponding to the decryption function, wherein the encryption algorithm corresponding to the target encryption character strings is the preset encryption algorithm. With this technical scheme, the similarity of the plaintext data before encryption can be estimated from the encrypted ciphertext data, so that the privacy of the plaintext data is protected while the association relationships among multiple pieces of plaintext data can still be estimated.

Drawings

Fig. 1 is a schematic flow chart of a plaintext similarity estimation method for an encrypted character string according to an embodiment of the present invention;

Fig. 2 is a schematic flow chart of another plaintext similarity estimation method for an encrypted character string according to an embodiment of the present invention;

Fig. 3 is a block diagram of a plaintext similarity estimation apparatus for encrypted character strings according to an embodiment of the present invention;

Fig. 4 is a block diagram of a computer device according to an embodiment of the present invention.

Detailed Description

The technical scheme of the invention is further explained by the specific implementation mode in combination with the attached drawings. It is to be understood that the specific embodiments described herein are merely illustrative of the invention and are not limiting of the invention. It should be further noted that, for the convenience of description, only some of the structures related to the present invention are shown in the drawings, not all of the structures.

Before discussing exemplary embodiments in more detail, it should be noted that some exemplary embodiments are described as processes or methods depicted as flowcharts. Although a flowchart may describe the steps as a sequential process, many of the steps can be performed in parallel, concurrently or simultaneously. In addition, the order of the steps may be rearranged. The process may be terminated when its operations are completed, but may have additional steps not included in the figure. The processes may correspond to methods, functions, procedures, subroutines, and the like.

Example one

Fig. 1 is a schematic flow chart of a plaintext similarity estimation method for an encrypted string according to an embodiment of the present invention, which may be performed by a plaintext similarity estimation apparatus for an encrypted string, where the apparatus may be implemented by software and/or hardware, and may be generally integrated in a computer device such as a server. As shown in fig. 1, the method includes:

S110, a plaintext data set is obtained, and a preset encryption algorithm is used to encrypt the plaintext data set to obtain a ciphertext data set.

Plaintext data is a character string that has not been encrypted. A character string may have any length and may be any combination of characters that includes at least one of digits, letters and symbols.

Accordingly, the plaintext data set includes a preset number of character strings with different lengths and/or different combinations. The preset number may be 10000 or 20000, and is determined according to training samples required by developers, and is not limited herein. The purpose of the encryption operation on the plaintext data set is to ensure the confidentiality of the data in the transmission process.

Correspondingly, the ciphertext data set is obtained by encrypting the character strings in the plaintext data set with the preset encryption algorithm. The preset encryption algorithm may be the Data Encryption Standard (DES, a symmetric algorithm), the International Data Encryption Algorithm (IDEA), the Digital Signature Algorithm (DSA), or the like, and is not limited herein.

Correspondingly, the ciphertext data set subjected to the encryption operation comprises a preset number of encryption character strings.
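As a rough illustration of S110, the following sketch generates random plaintext strings over the 94 printable ASCII characters and encrypts each of them. The substitution cipher `toy_encrypt`, the helper names and the length range are assumptions made only for this illustration; they are not part of the patent, and a real system would call the preset algorithm (for example DES or IDEA) from a cryptographic library instead.

```python
import random
import string

# The 94 printable, non-space ASCII characters (digits, letters, symbols).
ALPHABET = string.digits + string.ascii_letters + string.punctuation


def make_plaintext_set(count, min_len=4, max_len=16, seed=0):
    """Randomly generate `count` strings of varying length and composition."""
    rng = random.Random(seed)
    return [
        "".join(rng.choice(ALPHABET) for _ in range(rng.randint(min_len, max_len)))
        for _ in range(count)
    ]


def make_toy_cipher(seed=1):
    """Build a keyed substitution table; a hypothetical stand-in for f."""
    rng = random.Random(seed)
    shuffled = list(ALPHABET)
    rng.shuffle(shuffled)
    return str.maketrans(ALPHABET, "".join(shuffled))


def toy_encrypt(plaintext, table):
    """Encrypt one string with the substitution table (NOT secure)."""
    return plaintext.translate(table)


table = make_toy_cipher()
x = make_plaintext_set(20000)            # plaintext data set
y = [toy_encrypt(s, table) for s in x]   # ciphertext data set
```

A substitution cipher was chosen here only because it needs nothing beyond the standard library; it preserves string length, which a block cipher such as DES generally would not.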

S120, respectively modeling the plaintext data set and the ciphertext data set based on a multinomial distribution to obtain the estimated distribution corresponding to the plaintext data set and the estimated distribution corresponding to the ciphertext data set.

The multinomial distribution is a generalization of the binomial distribution: it is the joint distribution induced by two or more random variables X₁, X₂, …, X_k (where k ≥ 2), each taking a finite number of possible values.

Because the plaintext data in the embodiment of the invention consists of character strings, and a conventional input method provides a finite number m of different input characters (including digits, letters and symbols), when the plaintext data set is modeled with a multinomial distribution, the estimated distribution corresponding to the plaintext data set can be represented by the following expression:

Multinomial(n₁,n₂,…,n_m,p₁,p₂,…,p_m) (1)

where Multinomial denotes the multinomial distribution, m denotes the total dimension of the character types, n denotes the mean vector of the variables in the corresponding dimensions, and p denotes the covariance vector of the variables in the corresponding dimensions.

Specifically, a conventional keyboard provides 94 different input characters, so the estimated distribution corresponding to the plaintext data set can be expressed as:

Multinomial(n₁,n₂,…,n₉₄,p₁,p₂,…,p₉₄) (2)

That is, the 94 input characters yield a mean vector and a covariance vector of 94 dimensions each.

Correspondingly, because the ciphertext data set is obtained by encrypting the plaintext data set, when the ciphertext data set is modeled based on the multinomial distribution, the expression of the estimated distribution corresponding to the ciphertext data set has the same form as that of the estimated distribution corresponding to the plaintext data set.
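One way to picture the modeling in S120 is to count how often each of the 94 characters occurs in a data set. The sketch below does this under the textbook reading of a multinomial distribution, in which n_i is the observed count of character i and p_i its empirical probability; that reading, and the function name `fit_multinomial`, are assumptions of this illustration and may differ from the parameterization the inventors intended.

```python
import string
from collections import Counter

# The 94 printable, non-space ASCII characters.
ALPHABET = string.digits + string.ascii_letters + string.punctuation


def fit_multinomial(strings, alphabet=ALPHABET):
    """Return per-character counts n_i and empirical probabilities p_i."""
    counts = Counter()
    for s in strings:
        # Ignore any character outside the modeled alphabet.
        counts.update(ch for ch in s if ch in alphabet)
    total = sum(counts.values())
    n = [counts[ch] for ch in alphabet]
    p = [c / total if total else 0.0 for c in n]
    return n, p


n, p = fit_multinomial(["abc123", "a!b?", "zz9"])
```

The same function can be applied unchanged to a ciphertext data set whose characters fall in the same alphabet, which matches the statement that both estimated distributions share one form.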

S130, based on the Bayesian statistical model, estimating the estimated distribution corresponding to the decryption function according to the estimated distribution corresponding to the plaintext data set and the estimated distribution corresponding to the ciphertext data set.

The decryption function is a way of decrypting the encrypted ciphertext data set obtained after encryption, and can be understood as an inverse function of the encryption algorithm.

To determine the estimated distribution corresponding to the decryption function, the m different input characters obtained in step S120 need to be modeled. These m characters correspond to m dimensions, so the estimated distribution corresponding to the decryption function can be determined by using the law of large numbers and the multivariate normal distribution, and can be represented by the following expression:

N(μ_m,Σ_m) (3)

where μ_m denotes the mean vector of the decryption function over the total dimension, and Σ_m denotes the variance matrix of the decryption function over the total dimension.

For 94 different inputs obtained from the existing statistics, the estimated distribution corresponding to the decryption function can be expressed as:

N(μ₉₄,Σ₉₄) (4)

Further, the obtained distribution corresponding to the plaintext data set and the distribution corresponding to the ciphertext data set are used as the input of the Bayesian statistical model, which outputs the estimated distribution corresponding to the decryption function.

According to the expression of the estimated distribution corresponding to the decryption function, what the Bayesian statistical model mainly estimates is the mean vector parameter μ and the variance matrix parameter Σ for the current dimension.
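The text does not spell out how μ and Σ are computed. As a minimal sketch, assuming they are estimated as the ordinary sample mean vector and sample covariance matrix of a set of numeric feature vectors, the calculation could look as follows. The toy two-dimensional vectors and both function names are hypothetical; the 94-character case would use 94-dimensional vectors.

```python
def mean_vector(samples):
    """Component-wise arithmetic mean of equal-length vectors."""
    count = len(samples)
    dim = len(samples[0])
    return [sum(v[i] for v in samples) / count for i in range(dim)]


def covariance_matrix(samples):
    """Unbiased sample covariance matrix (divides by count - 1)."""
    count = len(samples)
    mu = mean_vector(samples)
    dim = len(mu)
    centered = [[v[i] - mu[i] for i in range(dim)] for v in samples]
    return [
        [sum(c[i] * c[j] for c in centered) / (count - 1) for j in range(dim)]
        for i in range(dim)
    ]


# Three toy 2-dimensional feature vectors.
samples = [[1.0, 2.0], [2.0, 4.0], [3.0, 6.0]]
mu = mean_vector(samples)            # estimate of the mean vector
sigma = covariance_matrix(samples)   # estimate of the variance matrix
```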

S140, estimating the plaintext similarity between different target encryption character strings according to the estimated distribution corresponding to the decryption function.

In the embodiment of the invention, the target encryption character string can be understood as the encryption character string needing plaintext similarity estimation, the specific number can be more than two, and the encryption algorithm corresponding to the target encryption character string is a preset encryption algorithm.

After the pre-estimated distribution corresponding to the decryption function is determined, the target encryption character string can be decrypted according to the pre-estimated distribution corresponding to the current decryption function, and therefore the pre-estimated plaintext character string corresponding to the encryption character string can be obtained. After the estimated plaintext character strings corresponding to the target encryption character strings are obtained, the similarity between any two estimated plaintext character strings can be calculated, so that the plaintext similarity between different target encryption character strings is estimated.

In the plaintext similarity estimation method for encrypted character strings provided by the embodiment of the invention, a plaintext data set is first obtained and encrypted with a preset encryption algorithm to obtain a ciphertext data set; the plaintext data set and the ciphertext data set are then modeled separately based on a multinomial distribution to obtain an estimated distribution corresponding to the plaintext data set and an estimated distribution corresponding to the ciphertext data set; next, based on a Bayesian statistical model, the estimated distribution corresponding to the decryption function is estimated according to the estimated distribution corresponding to the plaintext data set and the estimated distribution corresponding to the ciphertext data set; finally, the plaintext similarity between different target encryption character strings is estimated according to the estimated distribution corresponding to the decryption function, wherein the encryption algorithm corresponding to the target encryption character strings is the preset encryption algorithm. With this technical scheme, the similarity of the plaintext data before encryption can be estimated from the encrypted ciphertext data, so that the privacy of the plaintext data is protected while the association relationships among multiple pieces of plaintext data can still be estimated.

Example two

This embodiment further optimizes the above embodiment. The step of estimating the estimated distribution corresponding to the decryption function according to the estimated distribution corresponding to the plaintext data set and the estimated distribution corresponding to the ciphertext data set is refined to include: converting the estimated distribution corresponding to the plaintext data set into the posterior distribution of the Bayesian statistical model, and converting the estimated distribution corresponding to the ciphertext data set into the prior distribution of the Bayesian statistical model; and estimating the likelihood distribution of the Bayesian statistical model based on the posterior distribution and the prior distribution, wherein the likelihood distribution of the Bayesian statistical model is the estimated distribution corresponding to the decryption function. The advantage of this arrangement is that the problem of finding the decryption function is converted into a Bayesian statistical model problem, which is convenient to compute.

The step of estimating the plaintext similarity between different target encryption character strings according to the estimated distribution corresponding to the decryption function is likewise refined to include: selecting a first target encryption character string and a second target encryption character string; inputting the first target encryption character string and the second target encryption character string respectively into the estimated distribution corresponding to the decryption function to obtain first estimated plaintext data corresponding to the first target encryption character string and second estimated plaintext data corresponding to the second target encryption character string; and performing similarity calculation on the first estimated plaintext data and the second estimated plaintext data to obtain the plaintext similarity corresponding to the first target encryption character string and the second target encryption character string. The advantage of this arrangement is that the estimated plaintext data for an encrypted character string is obtained from the estimated distribution corresponding to the decryption function, so that the plaintext similarity is obtained by estimation and data security is maintained during data transmission.

Referring to fig. 2, fig. 2 is a schematic flow chart of another plaintext similarity estimation method for encrypted strings according to an embodiment of the invention; specifically, the method comprises the following steps:

S210, a plaintext data set is obtained, and a preset encryption algorithm is used to encrypt the plaintext data set to obtain a ciphertext data set.

If the obtained plaintext data set is denoted as x, the encryption algorithm as f(·) and the ciphertext data set as y, the relation y = f(x) is obtained.

In order to implement the ciphertext-based plaintext similarity estimation method for encrypted character strings provided by the embodiment of the invention, a decryption function n(·) of the encryption algorithm f(·) needs to be found, where n(·) is a generalized inverse function of f(·); in general it only needs to produce a decryption result that does not affect the similarity evaluation of the decrypted characters.

To obtain the plaintext data set x, a certain number of character strings with different lengths and different combinations can be randomly generated; 20000 strings are used here as an example.

The preset encryption algorithm f(·) is then used to encrypt each plaintext character string in the plaintext data set, which yields the corresponding ciphertext data set y.

S220, respectively modeling the plaintext data set and the ciphertext data set based on a multinomial distribution to obtain the estimated distribution corresponding to the plaintext data set and the estimated distribution corresponding to the ciphertext data set.

Further, the estimated distribution obtained by modeling the plaintext data set based on the multinomial distribution is denoted as X; accordingly, if the estimated distribution obtained by modeling the ciphertext data set based on the multinomial distribution is denoted as Y, the relation Y = f(X) follows from S210.

S230, the pre-estimated distribution corresponding to the plaintext data set is converted into the posterior distribution related to the Bayesian statistical model, and the pre-estimated distribution corresponding to the ciphertext data set is converted into the prior distribution related to the Bayesian statistical model.

The plaintext similarity estimation method for encrypted character strings provided by the embodiment of the invention converts the problem of computing the decryption function into a Bayesian statistical model problem. Let X denote the distribution corresponding to the plaintext data set and Y denote the distribution corresponding to the ciphertext data set. The estimated distribution X is treated as the posterior distribution of the Bayesian statistical model; a posterior distribution can be understood as the probability distribution of the cause inferred from a known result. The distribution Y is treated as the prior distribution of the Bayesian statistical model; a prior distribution can be understood as the probability distribution of the cause determined before the result is observed.

S240, estimating the likelihood distribution of the Bayesian statistical model based on the posterior distribution and the prior distribution, wherein the likelihood distribution of the Bayesian statistical model is the estimated distribution corresponding to the decryption function.

Further, the decryption function may be denoted as Q, and the estimated distribution corresponding to the decryption function may be denoted as Q, so that the estimated distribution X corresponding to the plaintext data set, the distribution Y corresponding to the ciphertext data set, and the estimated distribution Q corresponding to the decryption function have the following relationship:

X∝QY (5)

that is, the distribution X corresponding to the plaintext data set is proportional to the product of the distribution Y corresponding to the ciphertext data set and the estimated distribution Q corresponding to the decryption function.

In the formula (5), the estimated distribution corresponding to the decryption function can be regarded as likelihood distribution in the bayesian statistical model, and the likelihood distribution can be understood as determining the reason first and estimating the probability distribution of the result according to the reason.
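The proportionality in formula (5) is the usual Bayesian relation "posterior is proportional to likelihood times prior". A small discrete sketch, with made-up numbers and a hypothetical helper `posterior_from`, shows what the proportionality means in practice: the element-wise product is formed and then rescaled so that it sums to one.

```python
def posterior_from(prior, likelihood):
    """Normalize the element-wise product likelihood * prior."""
    unnormalized = [q * y for q, y in zip(likelihood, prior)]
    total = sum(unnormalized)
    return [u / total for u in unnormalized]


prior = [0.5, 0.3, 0.2]        # plays the role of Y (illustrative values)
likelihood = [0.1, 0.6, 0.3]   # plays the role of Q (illustrative values)
posterior = posterior_from(prior, likelihood)  # plays the role of X
```

In the scheme described here X and Y are the known quantities and Q is the unknown, so the relation is used in the opposite direction from this toy calculation; the sketch only illustrates how the three quantities are tied together.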

Further, before the likelihood distribution of the Bayesian statistical model is estimated based on the posterior distribution and the prior distribution, the method further comprises: estimating the parameters of the estimated distribution corresponding to the decryption function by using the law of large numbers and the multivariate normal distribution.

Since the estimated distribution corresponding to the decryption function is related to the input dimension of the plaintext data set, when the input dimension is 94, the estimated distribution corresponding to the decryption function determined by using the law of large numbers and the multivariate normal distribution can be expressed as:

N(μ₉₄,Σ₉₄)

where μ₉₄ denotes the mean vector of the decryption function over the total dimension, and Σ₉₄ denotes the variance matrix of the decryption function over the total dimension.

Therefore, before the estimated distribution corresponding to the decryption function is determined, its parameters μ₉₄ and Σ₉₄ need to be determined.

Specifically, a multivariate normal distribution is selected as the estimated distribution corresponding to the decryption function, and the Bayesian statistical model obtained after the problem transformation, namely X and Y, is used to estimate the parameters μ₉₄ and Σ₉₄, thereby determining the estimated distribution Q corresponding to the decryption function.

Further, the parameters of the pre-estimated distribution corresponding to the decryption function can be estimated by using the law of large numbers and the multivariate normal distribution.

The law of large numbers describes the convergence of the arithmetic mean of a sequence of random variables to the arithmetic mean of their mathematical expectations. The multivariate normal distribution is the generalization of the one-dimensional normal distribution to multiple dimensions; for a normally distributed variable, once the mean and standard deviation are known, the proportion of values falling in any range can be estimated from the formula. Embodiments of the invention use the law of large numbers and the multivariate normal distribution to solve for the parameters contained in the decryption function.
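The convergence described by the law of large numbers can be observed with a few lines of simulation. The sketch below is a generic demonstration, not a step of the claimed method: it averages draws from a uniform distribution on [0, 1), whose expectation is 0.5, for a small and a large sample size. The function name and the seed are arbitrary choices for this illustration.

```python
import random


def sample_mean(n, seed=0):
    """Arithmetic mean of n draws from Uniform(0, 1); the expectation is 0.5."""
    rng = random.Random(seed)
    return sum(rng.random() for _ in range(n)) / n


small = sample_mean(10)      # may still be noticeably off 0.5
large = sample_mean(20000)   # expected to lie very close to 0.5
```

With 20000 draws, the same sample size used as an example for the plaintext data set, the standard deviation of the sample mean is about 0.002, which is why a large training set makes the estimated parameters stable.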

Accordingly, in step S240, once the posterior distribution X of the Bayesian statistical model (i.e., the estimated distribution corresponding to the plaintext data set), the prior distribution Y (i.e., the estimated distribution corresponding to the ciphertext data set), and the parameters μ₉₄ and Σ₉₄ of the estimated distribution corresponding to the decryption function have been obtained, the likelihood distribution of the Bayesian statistical model can be estimated; this likelihood distribution is the estimated distribution corresponding to the decryption function.

S250, selecting a first target encryption character string and a second target encryption character string.

The first target encryption character string and the second target encryption character string are ciphertext character strings obtained after encryption operation is performed by using an encryption algorithm, and plaintext similarity corresponding to the first target encryption character string and the second target encryption character string needs to be estimated.

S260, inputting the first target encryption character string and the second target encryption character string into the pre-estimated distribution corresponding to the decryption function respectively to obtain first pre-estimated plaintext data corresponding to the first target encryption character string and second pre-estimated plaintext data corresponding to the second target encryption character string.

With the estimated distribution Q corresponding to the decryption function, denote the first target encryption character string as y_new₁ and the second target encryption character string as y_new₂. Inputting y_new₁ and y_new₂ respectively into Q yields the first estimated plaintext data corresponding to y_new₁, denoted x_new₁, and the second estimated plaintext data corresponding to y_new₂, denoted x_new₂.

S270, similarity calculation is carried out on the first pre-estimated plaintext data and the second pre-estimated plaintext data, and plaintext similarity corresponding to the first target encryption character string and the second target encryption character string is obtained.

The similarity between the first estimated plaintext data and the second estimated plaintext data may be calculated using cosine similarity, Euclidean distance, Mahalanobis distance, or the like; the specific calculation method is not limited herein.
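Two of the measures named above can be sketched in a few lines. The text does not say how estimated plaintext strings are turned into numeric vectors, so the character-count encoding in `char_vector`, the four-letter alphabet and the example strings are assumptions made purely for illustration.

```python
import math
from collections import Counter


def char_vector(text, alphabet):
    """Map a string to a vector of per-character counts over `alphabet`."""
    counts = Counter(text)
    return [counts[ch] for ch in alphabet]


def cosine_similarity(u, v):
    """Cosine of the angle between u and v; 0.0 if either vector is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


def euclidean_distance(u, v):
    """Straight-line distance between u and v."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


alphabet = "abcd"
x_new1 = char_vector("aabc", alphabet)   # [2, 1, 1, 0]
x_new2 = char_vector("abcd", alphabet)   # [1, 1, 1, 1]
similarity = cosine_similarity(x_new1, x_new2)
distance = euclidean_distance(x_new1, x_new2)
```

Cosine similarity grows with resemblance while Euclidean distance shrinks, so the two results should be read in opposite directions when they are compared.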

With the plaintext similarity estimation method for encrypted character strings provided by the embodiment of the invention, the similarity of the corresponding plaintexts is estimated from ciphertext data, so that in the data security field the association relationships among plaintext data can be computed while private or plaintext data remain protected. The plaintext similarity can be obtained from the known encryption algorithm and the known ciphertexts. Moreover, the calculation method is a generalized decryption model that converts the decryption problem into a Bayesian statistical model; it is in theory applicable to various encryption algorithms, requires no exact decryption computation for the encryption algorithm, and greatly reduces the decryption cost while retaining a certain accuracy.

Example three

Fig. 3 is a block diagram of a plaintext similarity estimation apparatus for encrypted character strings according to an embodiment of the present invention. The apparatus may be implemented by software and/or hardware, may generally be integrated in a computer device such as a server, and estimates the plaintext similarity of encrypted character strings by performing the plaintext similarity estimation method for encrypted character strings. As shown in fig. 3, the apparatus includes: an encryption operation module 31, an estimated distribution obtaining module 32, an estimated distribution calculating module 33 and a plaintext similarity estimating module 34, wherein:

the encryption operation module 31 is configured to obtain a plaintext data set, and perform an encryption operation on the plaintext data set by using a preset encryption algorithm to obtain a ciphertext data set;

the estimated distribution obtaining module 32 is configured to model the plaintext data set and the ciphertext data set respectively based on multinomial distributions, to obtain an estimated distribution corresponding to the plaintext data set and an estimated distribution corresponding to the ciphertext data set;

the estimated distribution calculating module 33 is configured to estimate, based on a Bayesian statistical model, the estimated distribution corresponding to the decryption function according to the estimated distribution corresponding to the plaintext data set and the estimated distribution corresponding to the ciphertext data set;

the plaintext similarity estimating module 34 is configured to estimate plaintext similarity between different target encryption character strings according to the estimated distribution corresponding to the decryption function, where the encryption algorithm corresponding to the target encryption character strings is the preset encryption algorithm.

The plaintext similarity estimation device for the encrypted character string provided by the embodiment of the invention first obtains a plaintext data set and performs an encryption operation on the plaintext data set by using a preset encryption algorithm to obtain a ciphertext data set; then models the plaintext data set and the ciphertext data set respectively based on multinomial distributions to obtain an estimated distribution corresponding to the plaintext data set and an estimated distribution corresponding to the ciphertext data set; then estimates, based on a Bayesian statistical model, the estimated distribution corresponding to the decryption function according to the estimated distribution corresponding to the plaintext data set and the estimated distribution corresponding to the ciphertext data set; and finally estimates the plaintext similarity between different target encryption character strings according to the estimated distribution corresponding to the decryption function, where the encryption algorithm corresponding to the target encryption character strings is the preset encryption algorithm. By adopting this technical scheme, the similarity of the plaintext data before encryption can be estimated from the encrypted ciphertext data, so that the privacy of the plaintext data is protected while the association relation among a plurality of plaintext data is estimated.

Optionally, the plaintext data set includes a preset number of character strings with different lengths and/or different combinations including at least one of numbers, letters, and symbols.
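By way of a hedged example, such a plaintext data set and its ciphertext counterpart might be generated as sketched below. The embodiments do not name the preset encryption algorithm, so `toy_encrypt` is only a hypothetical placeholder (a cyclic shift over the character set), and the function names, length bounds and seed are likewise assumptions made for illustration.

```python
import random
import string

# Character pool: digits, letters and symbols, as named in the text.
ALPHABET = string.digits + string.ascii_letters + string.punctuation

def make_plaintext_set(count, min_len=4, max_len=16, seed=0):
    """Generate `count` random strings of varying length from ALPHABET."""
    rng = random.Random(seed)  # fixed seed keeps the sketch reproducible
    return [
        "".join(rng.choice(ALPHABET) for _ in range(rng.randint(min_len, max_len)))
        for _ in range(count)
    ]

def toy_encrypt(text, shift=7):
    """Placeholder for the preset encryption algorithm (NOT a real cipher):
    shifts every character cyclically within ALPHABET."""
    return "".join(
        ALPHABET[(ALPHABET.index(ch) + shift) % len(ALPHABET)] for ch in text
    )

plaintexts = make_plaintext_set(1000)
ciphertexts = [toy_encrypt(p) for p in plaintexts]
```

In practice the placeholder would be replaced by the actual preset encryption algorithm; the rest of the sketch only relies on each plaintext having exactly one ciphertext.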

Optionally, the estimated distribution calculating module 33 includes: an estimated distribution conversion unit and an estimated distribution calculation unit, wherein:

the estimated distribution conversion unit is configured to convert the estimated distribution corresponding to the plaintext data set into a posterior distribution of the Bayesian statistical model, and to convert the estimated distribution corresponding to the ciphertext data set into a prior distribution of the Bayesian statistical model;

and the estimated distribution calculating unit is configured to estimate the likelihood distribution of the Bayesian statistical model based on the posterior distribution and the prior distribution, where the likelihood distribution of the Bayesian statistical model is the estimated distribution corresponding to the decryption function.
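The relation between these three distributions follows from Bayes' theorem and may be written out as below. The symbols θ and D are generic placeholders introduced here for illustration and are not taken from the embodiments; in the scheme described above, the posterior is filled by the estimated distribution of the plaintext data set and the prior by that of the ciphertext data set.

```latex
\[
p(\theta \mid D) = \frac{p(D \mid \theta)\, p(\theta)}{p(D)}
\quad\Longrightarrow\quad
p(D \mid \theta) = \frac{p(\theta \mid D)\, p(D)}{p(\theta)}
\;\propto\; \frac{p(\theta \mid D)}{p(\theta)}
\]
```

Here p(θ | D) is the posterior, p(θ) the prior and p(D | θ) the likelihood; for fixed D the likelihood is proportional to the ratio of posterior to prior, which is why it can be estimated once the other two are known.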

Optionally, the estimated distribution calculating unit further includes: a parameter estimation subunit;

and the parameter estimation subunit is configured to estimate the parameters of the estimated distribution corresponding to the decryption function by using the law of large numbers and a multivariate normal distribution.

Correspondingly, the estimated distribution calculating unit is further configured to estimate the likelihood distribution of the Bayesian statistical model based on the posterior distribution, the prior distribution and the parameters of the estimated distribution corresponding to the decryption function.

Optionally, the estimated distribution corresponding to the plaintext data set and the estimated distribution corresponding to the ciphertext data set are represented by the following expressions:

Multinomial(n₁,n₂,…,n_m,p₁,p₂,…,p_m)
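As an illustrative sketch only, the counts n₁,…,n_m and probabilities p₁,…,p_m of such a multinomial distribution might be estimated from a set of strings as follows. The embodiments do not state what the m categories are; treating each distinct character as one category is an assumption of this sketch, and `fit_multinomial` is a hypothetical name.

```python
from collections import Counter

def fit_multinomial(strings):
    """Estimate multinomial parameters from character frequencies.

    Returns (categories, n, p): the sorted distinct characters, their
    counts n_i, and the relative frequencies p_i = n_i / sum(n).
    Assumption: one category per distinct character.
    """
    counts = Counter(ch for s in strings for ch in s)
    total = sum(counts.values())
    categories = sorted(counts)
    n = [counts[c] for c in categories]
    p = [k / total for k in n]
    return categories, n, p

categories, n, p = fit_multinomial(["ab1", "a2", "b!a"])
```

The same function can be applied separately to the plaintext data set and to the ciphertext data set to obtain the two estimated distributions.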

Optionally, the pre-estimated distribution corresponding to the decryption function is represented by the following expression:

N(μ_m,Σ_m)
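One possible way to obtain the parameters of such a multivariate normal distribution, sketched here under stated assumptions, is to use the sample mean vector and the sample covariance matrix, which by the law of large numbers approach the true values as the number of samples grows. The embodiments do not specify the estimator; the function name and the representation of samples as lists of equal-length numeric vectors are assumptions.

```python
def estimate_mean_and_covariance(samples):
    """Sample mean vector and unbiased sample covariance matrix.

    samples: list of equal-length numeric vectors (at least two rows).
    """
    n = len(samples)
    dim = len(samples[0])
    mean = [sum(row[j] for row in samples) / n for j in range(dim)]
    cov = [
        [
            sum((row[i] - mean[i]) * (row[j] - mean[j]) for row in samples) / (n - 1)
            for j in range(dim)
        ]
        for i in range(dim)
    ]
    return mean, cov

samples = [[1.0, 2.0], [3.0, 6.0], [5.0, 10.0]]
mu, sigma = estimate_mean_and_covariance(samples)
```

With a larger plaintext and ciphertext data set the estimates become more stable, which is consistent with generating a preset number of character strings before estimation.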

Optionally, the plaintext similarity estimating module 34 includes: a target encryption character string selecting unit, a target encryption character string input unit and a plaintext similarity estimating unit, wherein:

the target encryption character string selecting unit is configured to select a first target encryption character string and a second target encryption character string;

the target encryption character string input unit is configured to input the first target encryption character string and the second target encryption character string to the estimated distribution corresponding to the decryption function, respectively, to obtain first pre-estimated plaintext data corresponding to the first target encryption character string and second pre-estimated plaintext data corresponding to the second target encryption character string;

and the plaintext similarity estimating unit is configured to calculate the similarity between the first pre-estimated plaintext data and the second pre-estimated plaintext data to obtain the plaintext similarity corresponding to the first target encryption character string and the second target encryption character string.
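The cooperation of the three units may be sketched as below. How a ciphertext is passed through the estimated distribution of the decryption function is not detailed here, so `toy_decrypt_estimate` is a purely hypothetical stand-in that maps a string to a fixed-width numeric vector; the function names, the offset and the width are assumptions, and cosine similarity is chosen only as one of the permitted measures.

```python
import math

def cosine(a, b):
    """Cosine similarity of two equal-length numeric vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))

def toy_decrypt_estimate(cipher, mean_offset=3.0, width=8):
    """Hypothetical stand-in for the estimated decryption function:
    subtracts an assumed mean offset from each code point and pads
    with zeros to a fixed width."""
    values = [ord(ch) - mean_offset for ch in cipher[:width]]
    return values + [0.0] * (width - len(values))

def estimate_plaintext_similarity(first_cipher, second_cipher,
                                  decrypt_estimate, similarity=cosine):
    """Select two ciphertexts, estimate their plaintexts, compare them."""
    first_plain = decrypt_estimate(first_cipher)    # first pre-estimated plaintext data
    second_plain = decrypt_estimate(second_cipher)  # second pre-estimated plaintext data
    return similarity(first_plain, second_plain)

same = estimate_plaintext_similarity("khoor", "khoor", toy_decrypt_estimate)
different = estimate_plaintext_similarity("khoor", "zruog", toy_decrypt_estimate)
```

Passing the decryption estimate and the similarity measure as arguments keeps the comparison step independent of the encryption algorithm, in line with the generalized decryption model described above.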

The device for estimating the plaintext similarity of the encrypted character string provided by the embodiment of the invention can execute the method for estimating the plaintext similarity of the encrypted character string provided by any embodiment of the invention, and has corresponding functional modules and beneficial effects for executing the method.

EXAMPLE IV

The embodiment of the invention provides computer equipment, and the plaintext similarity estimation device for the encrypted character string provided by the embodiment of the invention can be integrated in the computer equipment. Fig. 4 is a block diagram of a computer device according to an embodiment of the present invention. The computer device 400 may include: a memory 401, a processor 402 and a computer program stored in the memory 401 and executable by the processor, wherein the processor 402 implements the plaintext similarity estimation method for an encrypted string according to an embodiment of the invention when executing the computer program.

The computer device provided by the embodiment of the invention can execute the plaintext similarity estimation method for the encrypted character string provided by any embodiment of the invention, and has corresponding functional modules and beneficial effects for executing the method.

EXAMPLE V

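Embodiments of the present invention further provide a storage medium containing computer-executable instructions which, when executed by a computer processor, perform a plaintext similarity estimation method for an encrypted character string, the method including: obtaining a plaintext data set, and performing an encryption operation on the plaintext data set by using a preset encryption algorithm to obtain a ciphertext data set; modeling the plaintext data set and the ciphertext data set respectively based on multinomial distributions to obtain an estimated distribution corresponding to the plaintext data set and an estimated distribution corresponding to the ciphertext data set; estimating, based on a Bayesian statistical model, the estimated distribution corresponding to the decryption function according to the estimated distribution corresponding to the plaintext data set and the estimated distribution corresponding to the ciphertext data set; and estimating plaintext similarity between different target encryption character strings according to the estimated distribution corresponding to the decryption function, where the encryption algorithm corresponding to the target encryption character strings is the preset encryption algorithm.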

Storage medium: any of various types of memory devices or storage devices. The term "storage medium" is intended to include: installation media such as CD-ROM, floppy disk, or tape devices; computer system memory or random access memory such as DRAM, DDR RAM, SRAM, EDO RAM, Rambus RAM, etc.; non-volatile memory such as flash memory, magnetic media (e.g., a hard disk), or optical storage; registers or other similar types of memory elements, etc. The storage medium may also include other types of memory or combinations thereof. In addition, the storage medium may be located in a first computer system in which the program is executed, or may be located in a different second computer system connected to the first computer system through a network (such as the internet). The second computer system may provide program instructions to the first computer system for execution. The term "storage medium" may include two or more storage media that may reside in different locations, such as in different computer systems that are connected by a network. The storage medium may store program instructions (e.g., embodied as a computer program) that are executable by one or more processors.

Of course, the computer-executable instructions contained in the storage medium provided by the embodiments of the present invention are not limited to the plaintext similarity estimation operations for the encrypted character string described above, and may also perform related operations in the plaintext similarity estimation method for the encrypted character string provided by any embodiment of the present invention.

The device, the equipment and the storage medium for estimating the plaintext similarity of the encrypted character string provided by the embodiment can execute the method for estimating the plaintext similarity of the encrypted character string provided by any embodiment of the invention, and have corresponding functional modules and beneficial effects for executing the method. For details of the technique not described in detail in the above embodiments, reference may be made to the plaintext similarity estimation method for encrypted strings according to any embodiment of the present invention.

It is to be noted that the foregoing is only illustrative of the preferred embodiments of the present invention and the technical principles employed. It will be understood by those skilled in the art that the present invention is not limited to the particular embodiments described herein, but is capable of various obvious changes, rearrangements and substitutions as will now become apparent to those skilled in the art without departing from the scope of the invention. Therefore, although the present invention has been described in greater detail by the above embodiments, the present invention is not limited to the above embodiments, and may include other equivalent embodiments without departing from the spirit of the present invention, and the scope of the present invention is determined by the scope of the appended claims.

Claims

1. A plaintext similarity estimation method for an encrypted character string is characterized by comprising the following steps:

2. The method according to claim 1, characterized in that the plaintext data set comprises a preset number of character strings of different lengths and/or different combinations comprising at least one of numbers, letters and symbols.

3. The method according to claim 1, wherein the estimating, based on the Bayesian statistical model, the estimated distribution corresponding to the decryption function according to the estimated distribution corresponding to the plaintext data set and the estimated distribution corresponding to the ciphertext data set comprises:

converting the estimated distribution corresponding to the plaintext data set into a posterior distribution of the Bayesian statistical model, and converting the estimated distribution corresponding to the ciphertext data set into a prior distribution of the Bayesian statistical model;

and estimating the likelihood distribution of the Bayesian statistical model based on the posterior distribution and the prior distribution, wherein the likelihood distribution of the Bayesian statistical model is the estimated distribution corresponding to the decryption function.

4. The method according to claim 3, wherein before estimating the likelihood distribution of the Bayesian statistical model based on the posterior distribution and the prior distribution, further comprising:

estimating parameters of pre-estimated distribution corresponding to the decryption function by using a law of large numbers and multivariate normal distribution;

correspondingly, the estimating the likelihood distribution of the Bayesian statistical model based on the posterior distribution and the prior distribution comprises:

estimating the likelihood distribution of the Bayesian statistical model based on the posterior distribution, the prior distribution and the parameters of the estimated distribution corresponding to the decryption function.

5. The method of claim 1, wherein the estimated distribution corresponding to the plaintext data set and the estimated distribution corresponding to the ciphertext data set are represented by the following expressions:

Multinomial(n₁,n₂,…,n_m,p₁,p₂,…,p_m)

6. The method of claim 1, wherein the pre-estimated distribution of the decryption function is expressed by the following expression:

N(μ_m,Σ_m)

in the formula, μ_m represents the mean vector corresponding to the decryption function over the total dimension, and Σ_m represents the covariance matrix corresponding to the decryption function over the total dimension.

7. The method according to claim 1, wherein the estimating plaintext similarity between different target encryption character strings according to the estimated distribution corresponding to the decryption function comprises:

selecting a first target encryption character string and a second target encryption character string;

inputting the first target encryption character string and the second target encryption character string to the pre-estimated distribution corresponding to the decryption function respectively to obtain first pre-estimated plaintext data corresponding to the first target encryption character string and second pre-estimated plaintext data corresponding to the second target encryption character string;

and carrying out similarity calculation on the first pre-estimated plaintext data and the second pre-estimated plaintext data to obtain plaintext similarity corresponding to the first target encryption character string and the second target encryption character string.

8. A plaintext similarity estimation device for an encrypted character string is characterized by comprising the following components:

9. A computer device comprising a memory, a processor and a computer program stored on the memory and executable on the processor, characterized in that the processor implements the method according to any of claims 1-7 when executing the computer program.

10. A computer-readable storage medium, on which a computer program is stored which, when being executed by a processor, carries out the method according to any one of claims 1 to 7.