CN113536377A

CN113536377A - Encrypted domain cross-modal information retrieval method based on hyperchaos pseudorandom sequence

Info

Publication number: CN113536377A
Application number: CN202110819110.0A
Authority: CN
Inventors: 周亮; 徐建博; 匡雅鑫; 索云飞; 冶占远; 魏昕
Original assignee: Nanjing University of Posts and Telecommunications
Current assignee: Nanjing University of Posts and Telecommunications
Priority date: 2021-07-20
Filing date: 2021-07-20
Publication date: 2021-10-22
Anticipated expiration: 2041-07-20
Also published as: CN113536377B

Abstract

The invention discloses an encrypted domain cross-modal information retrieval method based on a hyperchaotic pseudorandom sequence, comprising the following steps: preprocessing data of three modalities, namely images, audio and tactile signals; inputting the three preprocessed modalities into a constructed hyperchaotic pseudorandom sequence encryption system and performing row scrambling, column scrambling and pixel value replacement on each to obtain ciphertext information of the three modalities; extracting features from the ciphertext information of the three modalities with a pre-trained VGG16 network and training each modality in its corresponding branch network; and inputting the three trained branches into a multi-modal semantic fusion network for semantic fusion, performing retrieval on the output, decrypting the retrieved result and outputting the resulting plaintext. Unlike traditional retrieval methods, the method addresses cross-modal retrieval across images, audio and tactile signals while also protecting data security, thereby realizing cross-modal information retrieval in the encrypted domain.

Description

Encrypted domain cross-modal information retrieval method based on hyperchaos pseudorandom sequence

Technical Field

The invention relates to the technical field of information security and encrypted domain retrieval, in particular to an encrypted domain cross-modal information retrieval method based on a hyperchaotic pseudorandom sequence.

Background

With the continuous development of the information society, methods such as deep learning and artificial intelligence keep emerging, and multi-modal data such as images, audio, time-frequency data and text are produced in large quantities. At the same time, information security has attracted wide attention: the comprehensive strength of a country is reflected not only in its gross national product but also in its protection of information security. Meanwhile, retrieval has shifted from single-modality retrieval to cross-modal retrieval, so overcoming the semantic gap between different modalities is a major challenge; for example, an intelligent robot can acquire image, audio and tactile signal data simultaneously during human-machine interaction. However, existing retrieval methods in the encrypted domain either cannot retrieve ciphertext information directly or have low retrieval accuracy, and they cannot realize cross-modal retrieval in the encrypted domain. It is therefore necessary to design a sound cross-modal information retrieval method for the encrypted domain that both protects data security and solves the retrieval problem across multiple modalities.

Disclosure of Invention

This section is for the purpose of summarizing some aspects of embodiments of the invention and to briefly introduce some preferred embodiments. In this section, as well as in the abstract and the title of the invention of this application, simplifications or omissions may be made to avoid obscuring the purpose of the section, the abstract and the title, and such simplifications or omissions are not intended to limit the scope of the invention.

The present invention has been made in view of the above-mentioned conventional problems.

Therefore, the technical problem solved by the invention is as follows: the existing retrieval method under the encryption domain can not directly retrieve the ciphertext information, or has low retrieval accuracy, and even can not realize cross-modal retrieval under the encryption domain.

In order to solve the above technical problems, the invention provides the following technical scheme: preprocessing data of three modalities, namely images, audio and tactile signals; inputting the three preprocessed modalities into a constructed hyperchaotic pseudorandom sequence encryption system and performing row scrambling, column scrambling and pixel value replacement on each to obtain ciphertext information of the three modalities; extracting features from the ciphertext information of the three modalities with a pre-trained VGG16 network and training each modality in its corresponding branch network; and inputting the three trained branches into a multi-modal semantic fusion network for semantic fusion, performing retrieval on the output, decrypting the retrieved result and outputting the resulting plaintext.

As a preferred scheme of the encrypted domain cross-modal information retrieval method based on the hyperchaotic pseudorandom sequence: preprocessing the three modalities of image, audio and tactile signals comprises adjusting the resolution of the image to M × M × 3; applying pre-emphasis, framing, windowing, power spectrum calculation and filter bank processing to the audio to obtain a time-frequency map of the audio, and adjusting its resolution to M × M × 3; and applying pre-emphasis, framing, windowing, power spectrum calculation and filter bank processing to the tactile signal to obtain a time-frequency map of the tactile signal, and adjusting its resolution to M × M × 3.

As a preferred scheme of the encrypted domain cross-modal information retrieval method based on the hyperchaotic pseudorandom sequence: the three preprocessed modalities are input into the constructed hyperchaotic pseudorandom sequence encryption system for encryption as follows. Each M × M × 3 image obtained by preprocessing the three modalities contains three channels R, G and B; each channel is an M × M matrix and is encrypted separately, the M × M matrix being denoted I. A four-dimensional hyperchaotic Chen system is used, in which x, y, z and w are the system state variables and a, b, c, d and e are the control parameters of the system; when a = 35, b = 3, c = 12, d = 7 and e = 0.58, the system is in a hyperchaotic state. A start time TIME_START, an end time TIME_FINAL and a step size STEP are set, together with four initial values [x₀ y₀ z₀ w₀] of the system, which constitute the key of the encryption system; the hyperchaotic Chen system is solved with the Runge-Kutta method to obtain the final four-dimensional hyperchaotic pseudorandom sequence l_m. The encryption process comprises position scrambling and pixel value replacement.

As a preferred scheme of the encrypted domain cross-modal information retrieval method based on the hyperchaotic pseudorandom sequence: the loss function of the multi-modal semantic fusion network is:

L＝L₁+λ·L₂

where λ denotes the hyperparameter of the loss function; E_V, E_A and E_T denote the numbers of encrypted image, audio and tactile signal samples, respectively; for each of the three modalities, the remaining symbols denote the feature output by the model for the k-th sample of the encrypted samples (E_V for images, E_A for audio, E_T for tactile signals) and the label corresponding to that feature; g(·) denotes the multi-class cross-entropy loss function; E_s denotes the total number of image, audio and tactile signal samples; e^m denotes the feature output by the model for the m-th of the E_s ciphertext samples; and c_m denotes the class center corresponding to the m-th sample.

As a preferred scheme of the encrypted domain cross-modal information retrieval method based on the hyperchaotic pseudorandom sequence, the method comprises the following steps: the structure of the multi-mode semantic fusion network comprises a full connection layer 3, a Dropout layer 3, a full connection layer 4, a Dropout layer 4, a full connection layer 5, a Dropout layer 5 and a full connection layer 6.

As a preferred scheme of the encrypted domain cross-modal information retrieval method based on the hyperchaotic pseudorandom sequence: the output result is obtained from a ciphertext sample used for querying and ciphertext samples used for retrieval, whose output vectors after the multi-modal semantic fusion network are denoted q_i and r_j, respectively, and similarity is measured with the cosine function:

cos(q_i, r_j)＝(q_i · r_j) / (‖q_i‖ ‖r_j‖)

where i and j denote the indices of the samples in the query set and the retrieval set, respectively. With i fixed, the retrieval set is traversed and the candidates j are sorted by similarity value in descending order; the candidate with the largest value is the most similar retrieved result, i.e. the output result.

As a preferred scheme of the encrypted domain cross-modal information retrieval method based on the hyperchaotic pseudorandom sequence, the method comprises the following steps: and decrypting the retrieval result comprises pixel value replacement, column scrambling and row scrambling.

As a preferred scheme of the encrypted domain cross-modal information retrieval method based on the hyperchaotic pseudorandom sequence: the pixel value replacement comprises denoting the matrix of the first channel R of the retrieved ciphertext result as E and performing pixel value replacement on E. The elements of the matrix E are denoted E(i,j) (1 ≤ i ≤ M, 1 ≤ j ≤ M). For the first interval [0,32), the matrix is traversed from the first element position (1,1) to the last position (M,M), from left to right and from top to bottom; if E(i,j) < 32, pixel value replacement is performed:

I(i,j)＝bitxor(E(i,j),l_p1(num1,1))

I(i,j)＝mod(I(i,j),32)

where num1 is initially 0 and is incremented by 1 for each element with E(i,j) < 32. For the second interval [32,64), the matrix is traversed in the same order; if 32 ≤ E(i,j) < 64, pixel value replacement is performed:

I(i,j)＝bitxor(E(i,j),l_p2(num2,1))

I(i,j)＝mod(I(i,j),32)+32

where num2 is initially 0 and is incremented by 1 for each element with 32 ≤ E(i,j) < 64. And so on; for the last interval [224,256), the matrix is traversed in the same order; if 224 ≤ E(i,j) < 256, pixel value replacement is performed:

I(i,j)＝bitxor(E(i,j),l_p8(num8,1))

I(i,j)＝mod(I(i,j),32)+224

where num8 is initially 0 and is incremented by 1 for each element with 224 ≤ E(i,j) < 256. The image I after pixel value replacement is finally obtained.

As a preferred scheme of the encrypted domain cross-modal information retrieval method based on the hyperchaotic pseudorandom sequence: the column scrambling and the row scrambling comprise performing column permutation and row permutation on the image I after pixel value replacement. Column scrambling is performed first: column c_M of the matrix I is interchanged with the first column, then column c_M-1 is interchanged with the first column, and so on, until finally column c₁ is interchanged with the first column. Row scrambling is then performed: row r_M of the matrix I after column scrambling is interchanged with the first row, then row r_M-1 is interchanged with the first row, and so on, until finally row r₁ is interchanged with the first row. The matrix I obtained after column scrambling and row scrambling is the final decryption result.

The invention has the beneficial effects that: the method is different from the traditional retrieval method, the cross-modal information retrieval of three modes of images, audios and touch signals is considered, the information security problem of data is considered, and the cross-modal information retrieval in the encryption domain is realized.

Drawings

In order to more clearly illustrate the technical solutions of the embodiments of the present invention, the drawings needed to be used in the description of the embodiments will be briefly introduced below, and it is obvious that the drawings in the following description are only some embodiments of the present invention, and it is obvious for those skilled in the art to obtain other drawings based on these drawings without inventive exercise. Wherein:

fig. 1 is a schematic basic flow chart of an encrypted domain cross-modal information retrieval method based on a hyper-chaotic pseudorandom sequence according to an embodiment of the present invention;

fig. 2 is a schematic diagram illustrating encryption and decryption effects of an encryption domain cross-modal information retrieval method based on a hyper-chaotic pseudorandom sequence according to an embodiment of the present invention;

fig. 3 is an original image and a ciphertext graph of three components of an image R, G, B and a corresponding image histogram of an encryption domain cross-modal information retrieval method based on a hyper-chaotic pseudorandom sequence according to an embodiment of the present invention.

Detailed Description

In order to make the aforementioned objects, features and advantages of the present invention comprehensible, specific embodiments accompanied with figures are described in detail below, and it is apparent that the described embodiments are a part of the embodiments of the present invention, not all of the embodiments. All other embodiments, which can be obtained by a person skilled in the art without making creative efforts based on the embodiments of the present invention, shall fall within the protection scope of the present invention.

In the following description, numerous specific details are set forth in order to provide a thorough understanding of the present invention, but the present invention may be practiced in other ways than those specifically described and will be readily apparent to those of ordinary skill in the art without departing from the spirit of the present invention, and therefore the present invention is not limited to the specific embodiments disclosed below.

Furthermore, reference herein to "one embodiment" or "an embodiment" means that a particular feature, structure, or characteristic described in connection with the embodiment is included in at least one implementation of the invention. The appearances of the phrase "in one embodiment" in various places in the specification are not necessarily all referring to the same embodiment, nor are separate or alternative embodiments mutually exclusive of other embodiments.

The present invention will be described in detail with reference to the drawings, wherein the cross-sectional views illustrating the structure of the device are not enlarged partially in general scale for convenience of illustration, and the drawings are only exemplary and should not be construed as limiting the scope of the present invention. In addition, the three-dimensional dimensions of length, width and depth should be included in the actual fabrication.

Meanwhile, in the description of the present invention, it should be noted that the terms "upper, lower, inner and outer" and the like indicate orientations or positional relationships based on the orientations or positional relationships shown in the drawings, and are only for convenience of describing the present invention and simplifying the description, but do not indicate or imply that the referred device or element must have a specific orientation, be constructed in a specific orientation and operate, and thus, cannot be construed as limiting the present invention. Furthermore, the terms first, second, or third are used for descriptive purposes only and are not to be construed as indicating or implying relative importance.

The terms "mounted, connected and connected" in the present invention are to be understood broadly, unless otherwise explicitly specified or limited, for example: can be fixedly connected, detachably connected or integrally connected; they may be mechanically, electrically, or directly connected, or indirectly connected through intervening media, or may be interconnected between two elements. The specific meanings of the above terms in the present invention can be understood in specific cases to those skilled in the art.

Example 1

Referring to fig. 1 to 3, an embodiment of the present invention provides an encrypted domain cross-modal information retrieval method based on a hyper-chaotic pseudorandom sequence, including:

s1: preprocessing three modes of images, audio and tactile signals;

specifically, the method comprises the following steps:

(1) adjusting the resolution of the image to M × M × 3;

(2) pre-emphasis, framing, windowing, power spectrum calculation and filter bank processing are applied to the audio to obtain a time-frequency map of the audio, and the resolution is adjusted to M × M × 3;

(3) similarly, the haptic signal is processed by pre-emphasis, framing, windowing, power spectrum calculation and filter bank to obtain a time-frequency diagram of the haptic signal, and the resolution is adjusted to be M × M × 3.
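The preprocessing chain named in step S1 can be sketched as follows. This is a minimal illustration rather than the patented implementation: the frame length, frame step, FFT size, mel-spaced triangular filter bank, nearest-neighbour resizing and the replication of one gray map into three channels are all assumptions, and the function names `time_frequency_map` and `to_image` are hypothetical.

```python
import numpy as np

def time_frequency_map(signal, sample_rate, frame_len=0.025, frame_step=0.010,
                       n_fft=512, n_filters=40, pre_emph=0.97):
    """Pre-emphasis -> framing -> windowing -> power spectrum -> filter bank."""
    signal = np.asarray(signal, dtype=float)
    # Pre-emphasis: y[t] = x[t] - alpha * x[t-1]
    emphasized = np.append(signal[0], signal[1:] - pre_emph * signal[:-1])

    # Framing: overlapping frames, zero-padding the tail of the signal
    flen = int(round(frame_len * sample_rate))
    fstep = int(round(frame_step * sample_rate))
    n_frames = 1 + max(0, int(np.ceil((len(emphasized) - flen) / fstep)))
    pad = (n_frames - 1) * fstep + flen - len(emphasized)
    padded = np.append(emphasized, np.zeros(max(0, pad)))
    idx = np.arange(flen)[None, :] + fstep * np.arange(n_frames)[:, None]

    # Windowing (Hamming window, an assumed choice) and power spectrum
    frames = padded[idx] * np.hamming(flen)
    power = np.abs(np.fft.rfft(frames, n_fft)) ** 2 / n_fft

    # Triangular filter bank spaced on the mel scale (assumed filter bank type)
    hz_to_mel = lambda f: 2595.0 * np.log10(1.0 + f / 700.0)
    mel_to_hz = lambda m: 700.0 * (10.0 ** (m / 2595.0) - 1.0)
    mel_pts = np.linspace(hz_to_mel(0.0), hz_to_mel(sample_rate / 2), n_filters + 2)
    bins = np.floor((n_fft + 1) * mel_to_hz(mel_pts) / sample_rate).astype(int)
    fbank = np.zeros((n_filters, n_fft // 2 + 1))
    for m in range(1, n_filters + 1):
        left, center, right = bins[m - 1], bins[m], bins[m + 1]
        for k in range(left, center):
            fbank[m - 1, k] = (k - left) / (center - left)
        for k in range(center, right):
            fbank[m - 1, k] = (right - k) / (right - center)
    feats = power @ fbank.T
    feats = np.where(feats == 0, np.finfo(float).eps, feats)
    return 20 * np.log10(feats)          # shape: (n_frames, n_filters)

def to_image(tf_map, M):
    """Scale a time-frequency map to 0..255 and resize it to M x M x 3."""
    lo, hi = tf_map.min(), tf_map.max()
    scaled = (tf_map - lo) / (hi - lo + 1e-12) * 255.0
    rows = np.arange(M) * tf_map.shape[0] // M   # nearest-neighbour row picks
    cols = np.arange(M) * tf_map.shape[1] // M   # nearest-neighbour column picks
    gray = scaled[np.ix_(rows, cols)].astype(np.uint8)
    return np.stack([gray] * 3, axis=-1)         # same map in all three channels
```

The same two functions would apply to both the audio and the tactile signal, since the text describes an identical chain for each.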

S2: inputting the three preprocessed modalities into a constructed hyperchaotic pseudorandom sequence encryption system, and performing row scrambling, column scrambling and pixel value replacement on each to obtain ciphertext information of the three modalities;

specifically, the method comprises the following steps:

(1) the M × M × 3 images obtained by preprocessing the three modalities each contain three channels R, G and B; each channel is a matrix of size M × M and is encrypted separately with the same encryption method; taking the R channel as an example, the M × M matrix is denoted I;

(2) a four-dimensional hyperchaotic Chen system is used, in which x, y, z and w are the system state variables and a, b, c, d and e are the control parameters of the system; when a = 35, b = 3, c = 12, d = 7 and e = 0.58, the system is in a hyperchaotic state.

(3) A start time TIME_START, an end time TIME_FINAL and a step size STEP are set, together with four initial values [x₀ y₀ z₀ w₀] of the system, which constitute the key of the encryption system; the hyperchaotic Chen system is solved with the Runge-Kutta method to obtain the final four-dimensional hyperchaotic pseudorandom sequence l_m;
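Step (3) can be illustrated with a fixed-step fourth-order Runge-Kutta integrator. The system equations are not legible in this text, so the sketch assumes the commonly cited form of the hyperchaotic Chen system, which matches the parameter values given above; the function names, default time span and step size are hypothetical.

```python
import numpy as np

# Control parameters from the text: a=35, b=3, c=12, d=7, e=0.58
A, B, C, D, E = 35.0, 3.0, 12.0, 7.0, 0.58

def chen(state):
    """Right-hand side of the (assumed) hyperchaotic Chen system."""
    x, y, z, w = state
    return np.array([
        A * (y - x) + w,
        D * x - x * z + C * y,
        x * y - B * z,
        y * z + E * w,
    ])

def hyperchaotic_sequence(key, time_start=0.0, time_final=10.0, step=0.001):
    """Integrate from the key [x0, y0, z0, w0]; returns an (N+1) x 4 array l_m."""
    n_steps = int(round((time_final - time_start) / step))
    seq = np.empty((n_steps + 1, 4))
    seq[0] = key
    for n in range(n_steps):
        s = seq[n]
        k1 = chen(s)
        k2 = chen(s + 0.5 * step * k1)
        k3 = chen(s + 0.5 * step * k2)
        k4 = chen(s + step * k3)
        seq[n + 1] = s + step / 6.0 * (k1 + 2 * k2 + 2 * k3 + k4)
    return seq
```

Because the four initial values act as the key, the same key always reproduces the same sequence, which is what the decryption side relies on.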

(4) The encryption method comprises the following steps:

A. position scrambling:

generating a row vector and a column vector for use in position scrambling:

r＝round(mod((abs(l_m(100:M+99,1))-floor(abs(l_m(100:M+99,1))))×10¹⁴,M))

c＝round(mod((abs(l_m(100:M+99,2))-floor(abs(l_m(100:M+99,2))))×10¹⁴,M))

where round denotes rounding to the nearest integer, abs denotes the absolute value, floor denotes rounding down and mod denotes the remainder operation; r and c are vectors of M elements each, and each element is an integer between 0 and M-1. Adding 1 to every element of the vectors r and c gives r＝[r₁,r₂,…,r_M] and c＝[c₁,c₂,…,c_M]；

Row scrambling is performed first: row r₁ of the matrix I is interchanged with the first row, then row r₂ is interchanged with the first row, and so on, until finally row r_M is interchanged with the first row;

column scrambling is then performed: column c₁ of the matrix I after row scrambling is interchanged with the first column, then column c₂ is interchanged with the first column, and so on, until finally column c_M is interchanged with the first column;

finally, the matrix I after row scrambling and column scrambling is obtained;
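The position scrambling of sub-step A, together with the reverse order used later for decryption, can be sketched as below. The function names are hypothetical, indices are 0-based (so the "+1" of the 1-based description is not needed), and the final `% M` is an added guard, not part of the source, because rounding a remainder in [0, M) can yield M.

```python
import numpy as np

def permutation_vector(seq_column, M):
    """Map M chaotic values to integers in 0..M-1, following the r/c formulas."""
    frac = np.abs(seq_column) - np.floor(np.abs(seq_column))
    v = np.round(np.mod(frac * 1e14, M)).astype(np.int64)
    return v % M   # guard (assumption): keeps a rounded-up value of M in range

def scramble(I, r, c):
    """Swap row r[k] with the first row for each k, then do the same for columns."""
    out = I.copy()
    for ri in r:
        out[[0, ri], :] = out[[ri, 0], :]
    for ci in c:
        out[:, [0, ci]] = out[:, [ci, 0]]
    return out

def unscramble(E, r, c):
    """Undo scramble(): columns first, then rows, each in reverse order."""
    out = E.copy()
    for ci in c[::-1]:
        out[:, [0, ci]] = out[:, [ci, 0]]
    for ri in r[::-1]:
        out[[0, ri], :] = out[[ri, 0], :]
    return out
```

With a sequence `l_m` from the chaotic system, `r = permutation_vector(l_m[99:M + 99, 0], M)` and `c = permutation_vector(l_m[99:M + 99, 1], M)` correspond to the slices `l_m(100:M+99,1)` and `l_m(100:M+99,2)` in the formulas above.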

B. pixel value replacement:

the numbers of elements of the matrix I whose values fall in the intervals [0,32), [32,64), [64,96), [96,128), [128,160), [160,192), [192,224) and [224,256) are counted and denoted COUNT₁, COUNT₂, COUNT₃, COUNT₄, COUNT₅, COUNT₆, COUNT₇ and COUNT₈, respectively;

For the first interval [0,32), if COUNT₁ is not 0, a hyperchaotic pseudorandom sequence l_p1 of length COUNT₁ is generated:

l_m1(:,i)＝round(mod((abs(l_m(100:COUNT₁+99,1))-floor(abs(l_m(100:COUNT₁+99,1))))×10¹⁴,32))(1≤i≤4)

l_p1＝mod(l_sum,32)

In the same way, the hyperchaotic pseudorandom sequences l_p2,…,l_p8 corresponding to the remaining seven intervals are obtained;

The elements of the matrix I are denoted I(i,j) (1 ≤ i ≤ M, 1 ≤ j ≤ M). For the first interval [0,32), the matrix is traversed from the first element position (1,1) to the last position (M,M), from left to right and from top to bottom; if I(i,j) < 32, pixel value replacement is performed:

E(i,j)＝bitxor(I(i,j),l_p1(num1,1))

E(i,j)＝mod(E(i,j),32)

in the above formulas, bitxor denotes the bitwise exclusive-OR operation; num1 is initially 0 and is incremented by 1 for each element with I(i,j) < 32;

for the second interval [32,64), the matrix is traversed in the same order; if 32 ≤ I(i,j) < 64, pixel value replacement is performed:

E(i,j)＝bitxor(I(i,j),l_p2(num2,1))

E(i,j)＝mod(E(i,j),32)+32

in the above formulas, num2 is initially 0 and is incremented by 1 for each element with 32 ≤ I(i,j) < 64;

and so on; for the last interval [224,256), the matrix is traversed in the same order; if 224 ≤ I(i,j) < 256, pixel value replacement is performed:

E(i,j)＝bitxor(I(i,j),l_p8(num8,1))

E(i,j)＝mod(E(i,j),32)+224

in the above formulas, num8 is initially 0 and is incremented by 1 for each element with 224 ≤ I(i,j) < 256;

and the ciphertext image E is finally obtained.
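The interval-wise pixel value replacement of sub-step B can be sketched as follows. Two points are assumptions because the text does not settle them: `l_sum` is taken to be the row-wise sum of the four columns of `l_m1`, and column i of `l_m` is used for the i-th column of `l_m1`. The function names `interval_key` and `substitute` are hypothetical, and indexing is 0-based.

```python
import numpy as np

def interval_key(l_m, count):
    """Key sequence of length `count` with values in 0..31 for one interval."""
    seg = np.abs(l_m[99:count + 99, :])          # rows 100..count+99 (1-based)
    frac = seg - np.floor(seg)
    l_m1 = np.round(np.mod(frac * 1e14, 32)).astype(np.int64)
    # Assumption: l_sum is the row-wise sum over the four columns of l_m1.
    return np.mod(l_m1.sum(axis=1), 32)

def substitute(img, l_m):
    """XOR every pixel with a key value drawn from its own 32-wide interval."""
    flat = img.ravel().astype(np.int64)          # row-major: left-right, top-bottom
    out = flat.copy()
    for k in range(8):
        lo = 32 * k
        mask = (flat >= lo) & (flat < lo + 32)
        count = int(mask.sum())
        if count == 0:
            continue                             # empty interval: no key needed
        key = interval_key(l_m, count)
        # A key below 32 only changes the low 5 bits, so the result stays
        # inside [lo, lo + 32); mod/+lo mirrors the formulas in the text.
        out[mask] = np.mod(np.bitwise_xor(flat[mask], key), 32) + lo
    return out.reshape(img.shape).astype(np.uint8)
```

Since each pixel stays in its interval, the interval counts of the ciphertext equal those of the plaintext, and XOR is its own inverse; under these assumptions calling `substitute` again with the same `l_m` restores the input, which corresponds to the decryption formulas that swap the roles of I and E.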

(5) Similarly, the same encryption method is adopted for G, B channel components of the image, and finally the two-dimensional matrixes obtained by respectively encrypting the three channel components of the original image R, G, B are spliced to obtain a final three-dimensional color ciphertext image E.

S3: feature extraction is carried out on ciphertext information of three modes by using a pre-trained VGG16 network, and the three modes are trained in respective corresponding branch networks;

s4: inputting the three trained modes into a multi-mode semantic fusion network for semantic fusion, retrieving the output result, decrypting the retrieved result to obtain a plaintext result and outputting the plaintext result;

specifically, a cross-modal information retrieval model related to three-modal ciphertext information is constructed:

(1) the ciphertext information of the image, audio and tactile signals, obtained after preprocessing and encryption, is input into a VGG16 network that is pre-trained on ImageNet and has its top fully connected layers removed, and the output is flattened into a one-dimensional vector; each modality is then trained in its own network branch with the Adam optimizer and a multi-class cross-entropy loss function. Each branch consists of a batch normalization layer, Dropout layer 1, fully connected layer 1 (ReLU activation), Dropout layer 2 and fully connected layer 2 (Softmax activation, used for the final classification output). After training, fully connected layer 2 is removed and the remaining network, with its trained weight parameters, is input into the multi-modal fusion network;

(2) design the loss function L:

L＝L₁+λ·L₂

where λ denotes the hyperparameter of the loss function; E_V, E_A and E_T denote the numbers of encrypted image, audio and tactile signal samples, respectively; for each of the three modalities, the remaining symbols denote the feature output by the model for the k-th sample of the encrypted samples (E_V for images, E_A for audio, E_T for tactile signals) and the label corresponding to that feature; g(·) denotes the multi-class cross-entropy loss function; E_s denotes the total number of image, audio and tactile signal samples; e^m denotes the feature output by the model for the m-th of the E_s ciphertext samples; and c_m denotes the class center corresponding to the m-th sample, the centers being continuously updated with the batch of each iteration.

The multi-modal semantic fusion network structure comprises: fully connected layer 3 (ReLU activation), Dropout layer 3, fully connected layer 4 (ReLU activation), Dropout layer 4, fully connected layer 5 (ReLU activation), Dropout layer 5 and fully connected layer 6 (Softmax activation, used for the final classification output). The network is trained with the Adam optimizer and the loss function L; the total number of iterations is set to K, iteration stops once this number is reached, and the model structure and the trained model weights are saved;
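A loss of the shape L = L₁ + λ·L₂ can be illustrated in plain NumPy. The exact formulas for L₁ and L₂ are not legible in this text, so the sketch assumes L₁ is the sum of the mean multi-class cross-entropy over the three modalities and L₂ is half the mean squared distance between each feature and its class center; the normalisation, the default λ and all function names are assumptions.

```python
import numpy as np

def cross_entropy(probs, labels):
    """Mean multi-class cross-entropy; probs is (n, classes), labels are class ids."""
    picked = probs[np.arange(len(labels)), labels]
    return float(-np.mean(np.log(picked + 1e-12)))

def center_term(features, labels, centers):
    """Half the mean squared distance between features and their class centers."""
    diff = features - centers[labels]
    return float(0.5 * np.mean(np.sum(diff ** 2, axis=1)))

def total_loss(per_modality, features, labels, centers, lam=0.01):
    """per_modality: list of (probs, labels) pairs for image, audio and tactile."""
    l1 = sum(cross_entropy(p, y) for p, y in per_modality)
    l2 = center_term(features, labels, centers)
    return l1 + lam * l2
```

The second term pulls features of the same class toward a shared center regardless of modality, which is one common way to narrow the gap between modalities in a joint feature space.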

(3) a ciphertext sample for querying and ciphertext samples for retrieval are provided, and their output vectors after the multi-modal semantic fusion network are denoted q_i and r_j, respectively; similarity is measured with the cosine function:

cos(q_i, r_j)＝(q_i · r_j) / (‖q_i‖ ‖r_j‖)

where i and j denote the indices of the samples in the query set and the retrieval set, respectively;

with i fixed, the retrieval set is traversed and the candidates j are sorted by similarity value in descending order; the candidate with the largest value is the most similar retrieved result, i.e. the output result.
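The ranking described in step (3) amounts to a cosine similarity matrix followed by a descending sort per query. A minimal sketch, with the hypothetical function name `cosine_rank` and the assumption that no output vector is all zeros:

```python
import numpy as np

def cosine_rank(queries, gallery):
    """Return the similarity matrix and, per query, gallery indices best-first."""
    q = queries / np.linalg.norm(queries, axis=1, keepdims=True)
    r = gallery / np.linalg.norm(gallery, axis=1, keepdims=True)
    sims = q @ r.T                       # sims[i, j] = cos(q_i, r_j)
    order = np.argsort(-sims, axis=1)    # descending similarity for each query i
    return sims, order
```

Here `order[i, 0]` is the index of the retrieval-set sample most similar to query i, which is the ciphertext that would then be decrypted and returned.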

Further, the retrieval result is decrypted and output:

the matrix of the first channel R of the retrieved ciphertext result is denoted E, and pixel value replacement is performed on E;

the elements of the matrix E are denoted E(i,j) (1 ≤ i ≤ M, 1 ≤ j ≤ M). For the first interval [0,32), the matrix is traversed from the first element position (1,1) to the last position (M,M), from left to right and from top to bottom; if E(i,j) < 32, pixel value replacement is performed:

I(i,j)＝bitxor(E(i,j),l_p1(num1,1))

I(i,j)＝mod(I(i,j),32)

where num1 is initially 0 and is incremented by 1 for each element with E(i,j) < 32;

for the second interval [32,64), the matrix is traversed in the same order; if 32 ≤ E(i,j) < 64, pixel value replacement is performed:

I(i,j)＝bitxor(E(i,j),l_p2(num2,1))

I(i,j)＝mod(I(i,j),32)+32

where num2 is initially 0 and is incremented by 1 for each element with 32 ≤ E(i,j) < 64;

and so on; for the last interval [224,256), the matrix is traversed in the same order; if 224 ≤ E(i,j) < 256, pixel value replacement is performed:

I(i,j)＝bitxor(E(i,j),l_p8(num8,1))

I(i,j)＝mod(I(i,j),32)+224

where num8 is initially 0 and is incremented by 1 for each element with 224 ≤ E(i,j) < 256;

and the image I after pixel value replacement is finally obtained.

Column permutation and row permutation are then performed on the image I after pixel value replacement:

using the vectors r＝[r₁,r₂,…,r_M] and c＝[c₁,c₂,…,c_M] introduced in step S2, column scrambling is performed first: column c_M of the matrix I is interchanged with the first column, then column c_M-1 is interchanged with the first column, and so on, until finally column c₁ is interchanged with the first column;

row scrambling is then performed: row r_M of the matrix I after column scrambling is interchanged with the first row, then row r_M-1 is interchanged with the first row, and so on, until finally row r₁ is interchanged with the first row;

the matrix I obtained after column scrambling and row scrambling is the final decryption result;

similarly, the same decryption method is adopted for G, B channel components of the image, and finally the two-dimensional matrixes obtained by respectively decrypting the three channel components of the ciphertext image R, G, B are spliced to obtain the finally decrypted original image I.

In this method, the audio and tactile signals are preprocessed to obtain their respective time-frequency maps, and the three modalities are adjusted to the same resolution; the information of the three modalities is encrypted with an encryption method based on a hyperchaotic pseudorandom sequence, comprising position scrambling and pixel value replacement; cross-modal information retrieval is performed on the ciphertext information of the three modalities; and finally the retrieved result is decrypted and the plaintext result is output. Unlike traditional schemes, the invention addresses cross-modal information retrieval across images, audio and tactile signals while also protecting data security, and thereby realizes cross-modal information retrieval in the encrypted domain.

Example 2

Different from the first embodiment, the embodiment provides a test for the encrypted domain cross-modal information retrieval method based on the hyperchaotic pseudorandom sequence, and in order to verify the technical effects adopted in the method, the embodiment adopts the conventional technical scheme and the method of the invention to perform a comparison test, and compares the test results by means of scientific demonstration to verify the real effect of the method.

The traditional technical scheme is as follows: a hyperchaotic pseudorandom sequence image encryption method is provided, in the step of 'pixel value replacement' of the encryption method, the division of a pixel interval is not considered, namely, the traditional method only considers the encryption of the whole pixel interval of 0-255, but the method is often low in retrieval performance. In order to verify that the method has higher retrieval performance compared with the traditional method. In this embodiment, the above-mentioned conventional scheme and the method suitable for searching under the encryption condition are respectively used for comparing the searching performance.

And (3) testing environment: the data set selects a surface texture material public data set (https:// zeus. lmt. ei. tum. de/downloads/texture) containing an image (V), an audio (A) and a touch signal (T), and a training set, a verification set and a test set are respectively according to the following steps of 3: 1: 1, encrypting data in a code written on MATLAB2017a to obtain ciphertext data, performing a retrieval experiment on the ciphertext data in a Jupiter notewood software by using a retrieval model code written by Python, and measuring retrieval performances of different methods by using an MAP value to obtain experimental data, wherein the larger the MAP value is, the better the retrieval performance is represented, and the results are shown in the following table:

Table 1: Comparison of experimental results.

From the above table, it can be seen that the retrieval performance of the method of the present invention is much better than that of the conventional method.
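For reference, the MAP metric used in this test can be sketched as follows. This is a minimal illustration rather than the evaluation code of the embodiment: the function names are hypothetical, a retrieved item is assumed to count as relevant when it shares the category label of the query, and each ranked list is assumed to contain every relevant item.

```python
def average_precision(ranked_relevance):
    """Average precision for one query.

    ranked_relevance[k] is True when the (k+1)-th retrieved item is relevant
    (assumption: relevant means same category label as the query).
    """
    hits = 0
    precision_sum = 0.0
    for rank, relevant in enumerate(ranked_relevance, start=1):
        if relevant:
            hits += 1
            precision_sum += hits / rank  # precision at this rank
    return precision_sum / hits if hits else 0.0


def mean_average_precision(all_ranked_relevance):
    """Mean of the per-query average precision values."""
    aps = [average_precision(r) for r in all_ranked_relevance]
    return sum(aps) / len(aps) if aps else 0.0
```

A query whose relevant items all appear at the top of the ranking yields an average precision of 1, so a larger MAP value corresponds to better retrieval performance, as stated above.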

It should be noted that the above-mentioned embodiments are only for illustrating the technical solutions of the present invention and not for limiting, and although the present invention has been described in detail with reference to the preferred embodiments, it should be understood by those skilled in the art that modifications or equivalent substitutions may be made on the technical solutions of the present invention without departing from the spirit and scope of the technical solutions of the present invention, which should be covered by the claims of the present invention.

Claims

1. An encrypted domain cross-modal information retrieval method based on a hyperchaotic pseudorandom sequence is characterized by comprising the following steps:

preprocessing three modes of images, audio and tactile signals;

inputting the preprocessed three modes into a constructed hyperchaotic pseudorandom sequence encryption system, and respectively performing row scrambling, column scrambling and pixel value replacement to obtain ciphertext information of the three modes;

extracting features of the ciphertext information of the three modes by using a pre-trained VGG16 network, and training the three modes in respective corresponding branch networks;

inputting the three trained modes into a multi-mode semantic fusion network for semantic fusion, retrieving the output result, decrypting the retrieved result to obtain a plaintext result and outputting the plaintext result.

2. The encrypted domain cross-modal information retrieval method based on the hyperchaotic pseudorandom sequence as recited in claim 1, wherein preprocessing the three modes of images, audio and tactile signals comprises:

adjusting the resolution of the image to M×M×3;

performing pre-emphasis, framing, windowing, power spectrum calculation and filter bank processing on the audio to obtain a time-frequency graph of the audio, and adjusting its resolution to M×M×3;

performing pre-emphasis, framing, windowing, power spectrum calculation and filter bank processing on the tactile signal to obtain a time-frequency graph of the tactile signal, and adjusting its resolution to M×M×3.
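The preprocessing chain named in this claim (pre-emphasis, framing, windowing, power spectrum calculation, filter bank processing) can be illustrated with the following minimal sketch. It is not the implementation of the invention: the function name, the pre-emphasis coefficient 0.97, the Hamming window, the frame sizes and the linearly spaced triangular filters are all assumptions chosen for illustration, and the final resizing to M×M×3 is omitted.

```python
import cmath
import math


def time_frequency_map(signal, frame_len=16, hop=8, n_filters=4, alpha=0.97):
    """Return one row of filter-bank energies per frame (all parameters are assumptions)."""
    # 1) Pre-emphasis: y[n] = x[n] - alpha * x[n-1]
    emphasized = [signal[0]] + [
        signal[n] - alpha * signal[n - 1] for n in range(1, len(signal))
    ]
    # 2) Framing: overlapping frames of frame_len samples, advanced by hop samples
    frames = [
        emphasized[s:s + frame_len]
        for s in range(0, len(emphasized) - frame_len + 1, hop)
    ]
    # 3) Windowing: Hamming window (assumed window type)
    window = [
        0.54 - 0.46 * math.cos(2 * math.pi * n / (frame_len - 1))
        for n in range(frame_len)
    ]
    n_bins = frame_len // 2 + 1
    # Edges and centres of triangular filters, evenly spaced over the bins (assumption)
    centers = [i * (n_bins - 1) / (n_filters + 1) for i in range(n_filters + 2)]

    tf_map = []
    for frame in frames:
        windowed = [x * w for x, w in zip(frame, window)]
        # 4) Power spectrum via a naive DFT (adequate for a small illustration)
        power = []
        for k in range(n_bins):
            coeff = sum(
                x * cmath.exp(-2j * math.pi * k * n / frame_len)
                for n, x in enumerate(windowed)
            )
            power.append(abs(coeff) ** 2 / frame_len)
        # 5) Filter bank: weighted sum of the power spectrum under each triangle
        energies = []
        for m in range(1, n_filters + 1):
            lo, mid, hi = centers[m - 1], centers[m], centers[m + 1]
            energy = 0.0
            for k, p in enumerate(power):
                if lo < k <= mid:
                    energy += p * (k - lo) / (mid - lo)
                elif mid < k < hi:
                    energy += p * (hi - k) / (hi - mid)
            energies.append(energy)
        tf_map.append(energies)
    return tf_map
```

The returned list of rows plays the role of the time-frequency graph; in practice it would still have to be rescaled and resized to the common M×M×3 resolution.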

3. The encrypted domain cross-modal information retrieval method based on the hyperchaotic pseudorandom sequence as recited in claim 2, wherein inputting the three preprocessed modes into the constructed hyperchaotic pseudorandom sequence encryption system for encryption comprises:

each of the M×M×3 images obtained after preprocessing the three modes comprises three channels R, G and B, each channel being an M×M matrix; each channel is encrypted separately, and the M×M matrix is denoted I;

four-dimensional hyperchaotic Chen system:

wherein x, y, z and w denote the state variables of the system, and a, b, c, d and e denote the control parameters of the system; when a = 35, b = 3, c = 12, d = 7 and e = 0.58, the system is in a hyperchaotic state;

setting a start time point TIME_START, an end time point TIME_FINAL and a step size STEP, setting four initial values [x0 y0 z0 w0] of the system, which serve as the key of the encryption system, and solving the hyperchaotic Chen system by the Runge-Kutta method to obtain the final four-dimensional hyperchaotic pseudorandom sequence lm;

the encryption process comprises position scrambling and pixel value replacement.
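The sequence generation described in this claim can be sketched as follows, under stated assumptions. The right-hand sides below are the commonly cited form of the four-dimensional hyperchaotic Chen system and are assumed here rather than copied from the claim; the classical fourth-order scheme is assumed for the Runge-Kutta method; the function names, the default time span and the step size are hypothetical. The sketch stops at the raw trajectory and does not show how it is quantized into the integer key sequences used for scrambling and pixel value replacement.

```python
def chen_derivatives(state, a=35.0, b=3.0, c=12.0, d=7.0, e=0.58):
    """Assumed form of the four-dimensional hyperchaotic Chen system."""
    x, y, z, w = state
    return (
        a * (y - x) + w,
        d * x - x * z + c * y,
        x * y - b * z,
        y * z + e * w,
    )


def rk4_step(state, h, f=chen_derivatives):
    """One classical fourth-order Runge-Kutta step of size h."""
    k1 = f(state)
    k2 = f(tuple(s + 0.5 * h * k for s, k in zip(state, k1)))
    k3 = f(tuple(s + 0.5 * h * k for s, k in zip(state, k2)))
    k4 = f(tuple(s + h * k for s, k in zip(state, k3)))
    return tuple(
        s + h / 6.0 * (p + 2.0 * q + 2.0 * r + t)
        for s, p, q, r, t in zip(state, k1, k2, k3, k4)
    )


def hyperchaotic_sequence(key, time_start=0.0, time_final=1.0, step=0.001):
    """Integrate from the key [x0, y0, z0, w0] and return the list of states."""
    n_steps = int(round((time_final - time_start) / step))
    state = tuple(float(v) for v in key)
    trajectory = [state]
    for _ in range(n_steps):
        state = rk4_step(state, step)
        trajectory.append(state)
    return trajectory
```

Because the four initial values act as the key, the same key always reproduces the same trajectory, while a slightly different key yields a different one.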

4. The encrypted domain cross-modal information retrieval method based on the hyperchaotic pseudorandom sequence as recited in claim 3, wherein the loss function of the multi-modal semantic fusion network is set as:

L = L₁ + λ·L₂

5. The encrypted domain cross-modal information retrieval method based on the hyperchaotic pseudorandom sequence as recited in claim 4, wherein the structure of the multi-modal semantic fusion network comprises:

fully connected layer 3, Dropout layer 3, fully connected layer 4, Dropout layer 4, fully connected layer 5, Dropout layer 5, and fully connected layer 6.

6. The encrypted domain cross-modal information retrieval method based on the hyperchaotic pseudorandom sequence as recited in claim 5, wherein retrieving the output result comprises:

for a ciphertext sample used for query and a ciphertext sample used for retrieval, denoting their output vectors from the multi-modal semantic fusion network as q_i and r_j respectively, and using the cosine function as the similarity measure:

cos(q_i, r_j) = (q_i · r_j) / (‖q_i‖ ‖r_j‖)
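The cosine similarity measure named in this claim can be illustrated with the following minimal sketch, which ranks candidate vectors against a query vector. The function names are hypothetical, plain Python lists stand in for the output vectors of the fusion network, and returning 0 for a zero vector is a convention chosen here.

```python
import math


def cosine_similarity(q, r):
    """Cosine of the angle between vectors q and r (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(q, r))
    norm_q = math.sqrt(sum(a * a for a in q))
    norm_r = math.sqrt(sum(b * b for b in r))
    if norm_q == 0.0 or norm_r == 0.0:
        return 0.0
    return dot / (norm_q * norm_r)


def rank_candidates(query_vec, candidate_vecs):
    """Return candidate indices sorted from most to least similar to the query."""
    scores = [(cosine_similarity(query_vec, r), j) for j, r in enumerate(candidate_vecs)]
    scores.sort(key=lambda t: (-t[0], t[1]))  # ties broken by lower index
    return [j for _, j in scores]
```

The top-ranked ciphertext samples are then the retrieved results that are passed on to decryption.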

7. The encrypted domain cross-modal information retrieval method based on the hyperchaotic pseudorandom sequence as recited in claim 6, wherein decrypting the retrieved result comprises pixel value replacement, column scrambling and row scrambling.

8. The encrypted domain cross-modal information retrieval method based on the hyperchaotic pseudorandom sequence as recited in claim 7, wherein the pixel value replacement comprises:

denoting the matrix of the first channel R of the retrieved ciphertext result as E, and performing pixel value replacement on E:

I(i, j) = bitxor(E(i, j), l_p1(num1, 1))

I(i, j) = mod(I(i, j), 32)

I(i, j) = bitxor(E(i, j), l_p2(num2, 1))

I(i, j) = mod(I(i, j), 32) + 32

wherein the initial value of num2 is 0, and for each element E(i, j) satisfying 32 ≤ E(i, j) < 64, the value of num2 is incremented by 1;

I(i, j) = bitxor(E(i, j), l_p8(num8, 1))

I(i, j) = mod(I(i, j), 32) + 224

and finally obtaining the image I after the pixel value replacement.
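The interval-wise pixel value replacement of this claim can be sketched as follows. This is an illustration under assumptions, not the MATLAB code of the invention: the function name is hypothetical, the eight key sequences are assumed to hold integers in the range 0 to 31, pixels are visited in row-major order, and the intermediate intervals not written out in the claim are assumed to follow the same pattern (the k-th interval of width 32 uses its own key sequence and offset). Indices are zero-based here, so each counter is read before it is incremented.

```python
def replace_pixels(E, keys):
    """Interval-wise XOR replacement of an 8-bit channel.

    E    : 2-D list of integers in 0..255
    keys : list of 8 key sequences (assumed integers in 0..31), one per interval
    """
    counters = [0] * 8  # num1 .. num8, all starting at 0
    I = [[0] * len(row) for row in E]
    for i, row in enumerate(E):
        for j, value in enumerate(row):
            k = value // 32                   # interval index: 0 for 0..31, 1 for 32..63, ...
            key_value = keys[k][counters[k]]  # next unused key element of this interval
            counters[k] += 1
            # XOR with the key, then map the result back into the same interval
            I[i][j] = ((value ^ key_value) % 32) + 32 * k
    return I
```

Because every pixel stays inside its own interval of width 32, the interval index and the counter sequence are identical for plaintext and ciphertext, so applying the same function with the same keys a second time restores the original values.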

9. The encrypted domain cross-modal information retrieval method based on the hyperchaotic pseudorandom sequence as recited in claim 7, wherein the column scrambling and the row scrambling comprise:

performing column replacement and row replacement on the image I obtained after the pixel value replacement:

first, column scrambling is performed: the c_M-th column of the matrix I is interchanged with the first column, then the c_(M-1)-th column is interchanged with the first column, and so on, until finally the c_1-th column is interchanged with the first column;

finally, obtaining a matrix I after column scrambling and row scrambling, wherein I is the final decryption result.
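The column step of this claim can be illustrated with the following minimal sketch. It assumes, as a reading of the claim, that encryption interchanged the columns c_1, c_2, ..., c_M with the first column in that order, so that decryption applies the same interchanges in reverse order. The function names are hypothetical and column indices are zero-based; the row step would be analogous with rows in place of columns.

```python
def swap_columns(matrix, p, q):
    """Interchange columns p and q of a 2-D list in place."""
    for row in matrix:
        row[p], row[q] = row[q], row[p]


def scramble_columns(matrix, c):
    """Assumed encryption order: swap column c_1, then c_2, ..., c_M with the first column."""
    out = [row[:] for row in matrix]
    for ci in c:
        swap_columns(out, ci, 0)
    return out


def unscramble_columns(matrix, c):
    """Decryption order from the claim: swap column c_M, then c_(M-1), ..., c_1 with the first column."""
    out = [row[:] for row in matrix]
    for ci in reversed(c):
        swap_columns(out, ci, 0)
    return out
```

Each interchange is its own inverse, so undoing the interchanges in the opposite order restores the original column arrangement.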