CN113536377B

CN113536377B - Encryption domain cross-modal information retrieval method based on hyperchaotic pseudorandom sequence

Info

Publication number: CN113536377B
Application number: CN202110819110.0A
Authority: CN
Inventors: 周亮; 徐建博; 匡雅鑫; 索云飞; 冶占远; 魏昕
Original assignee: Nanjing University of Posts and Telecommunications
Current assignee: Nanjing University of Posts and Telecommunications
Priority date: 2021-07-20
Filing date: 2021-07-20
Publication date: 2023-09-05
Anticipated expiration: 2041-07-20
Also published as: CN113536377A

Abstract

The application discloses an encryption domain cross-modal information retrieval method based on a hyperchaotic pseudorandom sequence, comprising the following steps: preprocessing the three modalities of image, audio and haptic signals; inputting the three preprocessed modalities into a constructed hyperchaotic pseudorandom sequence encryption system and performing row scrambling, column scrambling and pixel value substitution on each to obtain ciphertext information of the three modalities; extracting features from the ciphertext information of the three modalities using a pretrained VGG16 network, and training each modality in its corresponding branch network; inputting the three trained branches into a multimodal semantic fusion network for semantic fusion, performing retrieval on the output, and decrypting the retrieved result to obtain the plaintext output. Unlike conventional retrieval methods, the application addresses cross-modal information retrieval across image, audio and haptic signals while also protecting data security, thereby realizing cross-modal information retrieval in the encryption domain.

Description

Encryption domain cross-modal information retrieval method based on hyperchaotic pseudorandom sequence

Technical Field

The application relates to the technical fields of information security and encryption domain retrieval, and in particular to an encryption domain cross-modal information retrieval method based on a hyperchaotic pseudorandom sequence.

Background

With the continuous development of the information society, methods such as deep learning and artificial intelligence keep advancing, and multimodal data such as images, audio, time-frequency data and text are produced. At the same time, information security has attracted wide attention: the comprehensive strength of a country is reflected not only in its gross national product but also in its protection of information security. Meanwhile, attention has shifted from retrieval within a single modality to cross-modal retrieval, which must overcome the semantic gap between different modalities; handling the image, audio and haptic signal data acquired simultaneously during human-machine interaction of an intelligent robot is also a great challenge. However, existing retrieval methods in the encryption domain either cannot retrieve directly on ciphertext information or have low retrieval accuracy, and none of them realizes cross-modal retrieval in the encryption domain. A sound cross-modal information retrieval method in the encryption domain therefore needs to be designed, one that protects data security on the one hand and solves the multimodal retrieval problem on the other.

Disclosure of Invention

This section is intended to outline some aspects of embodiments of the application and to briefly introduce some preferred embodiments. Some simplifications or omissions may be made in this section as well as in the description of the application and in the title of the application, which may not be used to limit the scope of the application.

The present application has been made in view of the above-described problems occurring in the prior art.

Therefore, the technical problem solved by the application is as follows: existing retrieval methods in the encryption domain either cannot retrieve directly on ciphertext information or have low retrieval accuracy, and cross-modal retrieval in the encryption domain has not been realized.

In order to solve this technical problem, the application provides the following technical scheme: preprocessing the three modalities of image, audio and haptic signals; inputting the three preprocessed modalities into a constructed hyperchaotic pseudorandom sequence encryption system and performing row scrambling, column scrambling and pixel value substitution on each to obtain ciphertext information of the three modalities; extracting features from the ciphertext information of the three modalities using a pretrained VGG16 network, and training each modality in its corresponding branch network; inputting the three trained branches into a multimodal semantic fusion network for semantic fusion, performing retrieval on the output, and decrypting the retrieved result to obtain the plaintext output.

As a preferred scheme of the encryption domain cross-modal information retrieval method based on a hyperchaotic pseudorandom sequence according to the application: preprocessing the image, audio and haptic signals includes adjusting the resolution of the image to M×M×3; subjecting the audio to pre-emphasis, framing, windowing, power spectrum calculation and filter bank processing to obtain a time-frequency map of the audio, and adjusting its resolution to M×M×3; and subjecting the haptic signal to pre-emphasis, framing, windowing, power spectrum calculation and filter bank processing to obtain a time-frequency map of the haptic signal, and adjusting its resolution to M×M×3.

As a preferred scheme of the encryption domain cross-modal information retrieval method based on a hyperchaotic pseudorandom sequence according to the application: inputting the three preprocessed modalities into the constructed hyperchaotic pseudorandom sequence encryption system for encryption includes the following. The M×M×3 image obtained by preprocessing each of the three modalities comprises the three channel components R, G and B; each channel is a matrix of size M×M, each channel is encrypted separately, and the M×M matrix is denoted I. The four-dimensional hyperchaotic Chen system is:

$$
\begin{cases}
\dot{x} = a(y - x) + w \\
\dot{y} = dx - xz + cy \\
\dot{z} = xy - bz \\
\dot{w} = yz + ew
\end{cases}
$$

wherein $\dot{x}, \dot{y}, \dot{z}, \dot{w}$ represent the time derivatives of the state variables, x, y, z, w represent the system variables, and a, b, c, d, e represent the control parameters of the system; the system is in a hyperchaotic state when a=35, b=3, c=12, d=7, e=0.58. A start time point TIME_START, an end time point TIME_FINAL and a step size STEP are set; the four initial values [x_0, y_0, z_0, w_0] of the system serve as the secret key of the encryption system; and the hyperchaotic Chen system is solved using the Runge-Kutta method to obtain the final four-dimensional hyperchaotic pseudorandom sequence l_m. The encryption process comprises position scrambling and pixel value substitution.

As a preferred scheme of the encryption domain cross-modal information retrieval method based on a hyperchaotic pseudorandom sequence according to the application: the loss function of the multimodal semantic fusion network is:

$$L = L_1 + \lambda \cdot L_2$$

wherein λ represents the hyperparameter of the loss function; E_V, E_A and E_T respectively represent the numbers of encrypted image, audio and haptic signal samples; for the k-th sample among the encrypted image samples E_V, the encrypted audio samples E_A and the encrypted haptic signal samples E_T, respectively, the loss uses the feature of that sample output by the model and the label corresponding to that feature; g(·) represents the multi-class cross-entropy loss function; E_s represents the total number of image, audio and haptic signal samples; e^m represents the feature output by the model for the m-th of the E_s ciphertext samples; and c_m represents the class center corresponding to the m-th sample.

As a preferred scheme of the encryption domain cross-modal information retrieval method based on a hyperchaotic pseudorandom sequence according to the application: the multimodal semantic fusion network comprises fully connected layer 3, Dropout layer 3, fully connected layer 4, Dropout layer 4, fully connected layer 5, Dropout layer 5 and fully connected layer 6.

As a preferred scheme of the encryption domain cross-modal information retrieval method based on a hyperchaotic pseudorandom sequence according to the application: obtaining the output result includes the following. Given a ciphertext sample used for the query and ciphertext samples used for retrieval, their output vectors from the multimodal semantic fusion network are denoted q_i and r_j, and similarity is measured with the cosine function:

$$\cos(q_i, r_j) = \frac{q_i \cdot r_j}{\lVert q_i \rVert \, \lVert r_j \rVert}$$

wherein i and j respectively represent the indices of the samples in the query set and the retrieval set; for a fixed i, the retrieval set is traversed over j and the similarity values are sorted in descending order, and the sample with the largest value is the most similar retrieved result, i.e., the output result.

As a preferred scheme of the encryption domain cross-modal information retrieval method based on a hyperchaotic pseudorandom sequence according to the application: decrypting the retrieved result includes, in order, pixel value substitution, column scrambling and row scrambling.

As a preferred scheme of the encryption domain cross-modal information retrieval method based on a hyperchaotic pseudorandom sequence according to the application: the pixel value substitution comprises denoting the matrix of the first channel R of the retrieved ciphertext result as E and performing pixel value substitution on E. The elements of matrix E are denoted E(i,j) (1 ≤ i ≤ M, 1 ≤ j ≤ M). For the first interval [0, 32), starting from the first element position (1,1) and traversing from left to right and top to bottom to the last position (M,M), if the element E(i,j) is less than 32, pixel value substitution is performed:

I(i,j) = bitxor(E(i,j), l_p1(num1,1))

I(i,j) = mod(I(i,j), 32)

wherein the initial value of num1 is 0, and num1 is incremented by 1 each time an element E(i,j) with a value less than 32 is encountered; for the second interval [32, 64), starting from the first element position (1,1) and traversing from left to right and top to bottom to the last position (M,M), if 32 ≤ E(i,j) < 64, pixel value substitution is performed:

I(i,j) = bitxor(E(i,j), l_p2(num2,1))

I(i,j) = mod(I(i,j), 32) + 32

wherein the initial value of num2 is 0, and num2 is incremented by 1 each time an element with 32 ≤ E(i,j) < 64 is encountered; and so on, for the last interval [224, 256), starting from the first element position (1,1) and traversing from left to right and top to bottom to the last position (M,M), if 224 ≤ E(i,j) < 256, pixel value substitution is performed:

I(i,j) = bitxor(E(i,j), l_p8(num8,1))

I(i,j) = mod(I(i,j), 32) + 224

wherein the initial value of num8 is 0, and num8 is incremented by 1 each time an element with 224 ≤ E(i,j) < 256 is encountered; finally, the image I after pixel value substitution is obtained.

As a preferred scheme of the encryption domain cross-modal information retrieval method based on a hyperchaotic pseudorandom sequence according to the application: the column scrambling and row scrambling comprise performing column permutation and row permutation on the image I obtained after pixel value substitution. Column scrambling is performed first: the c_M-th column of matrix I is interchanged with the first column, then the c_{M-1}-th column is interchanged with the first column, and so on, until finally the c_1-th column is interchanged with the first column. Row scrambling is then performed on the column-scrambled matrix I: the r_M-th row is interchanged with the first row, then the r_{M-1}-th row is interchanged with the first row, and so on, until finally the r_1-th row is interchanged with the first row. The matrix I obtained after column scrambling and row scrambling is the final decryption result.

The beneficial effects of the application are as follows: unlike conventional retrieval methods, the application addresses cross-modal information retrieval across the three modalities of image, audio and haptic signals while also protecting data security, thereby realizing cross-modal information retrieval in the encryption domain.

Drawings

In order to more clearly illustrate the technical solutions of the embodiments of the present application, the drawings that are needed in the description of the embodiments will be briefly described below, it being obvious that the drawings in the following description are only some embodiments of the present application, and that other drawings may be obtained according to these drawings without inventive effort for a person skilled in the art. Wherein:

FIG. 1 is a basic flow diagram of an encryption domain cross-modal information retrieval method based on a hyperchaotic pseudorandom sequence according to an embodiment of the application;

FIG. 2 is a schematic diagram of the encryption and decryption effects of an encryption domain cross-modal information retrieval method based on a hyperchaotic pseudorandom sequence according to an embodiment of the application;

FIG. 3 shows the original images and ciphertext images of the R, G and B components of an image, together with the corresponding image histograms, for an encryption domain cross-modal information retrieval method based on a hyperchaotic pseudorandom sequence according to an embodiment of the application.

Detailed Description

So that the manner in which the above recited objects, features and advantages of the present application can be understood in detail, a more particular description of the application, briefly summarized above, may be had by reference to the embodiments, some of which are illustrated in the appended drawings. All other embodiments, which can be made by one of ordinary skill in the art based on the embodiments of the present application without making any inventive effort, shall fall within the scope of the present application.

In the following description, numerous specific details are set forth in order to provide a thorough understanding of the present application, but the present application may be practiced in other ways other than those described herein, and persons skilled in the art will readily appreciate that the present application is not limited to the specific embodiments disclosed below.

Further, reference herein to "one embodiment" or "an embodiment" means that a particular feature, structure, or characteristic can be included in at least one implementation of the application. The appearances of the phrase "in one embodiment" in various places in the specification are not necessarily all referring to the same embodiment, nor are separate or alternative embodiments mutually exclusive of other embodiments.

While the embodiments of the present application have been illustrated and described in detail in the drawings, the cross-sectional view of the device structure is not to scale in the general sense for ease of illustration, and the drawings are merely exemplary and should not be construed as limiting the scope of the application. In addition, the three-dimensional dimensions of length, width and depth should be included in actual fabrication.

Also in the description of the present application, it should be noted that the orientation or positional relationship indicated by the terms "upper, lower, inner and outer", etc. are based on the orientation or positional relationship shown in the drawings, are merely for convenience of describing the present application and simplifying the description, and do not indicate or imply that the apparatus or elements referred to must have a specific orientation, be constructed and operated in a specific orientation, and thus should not be construed as limiting the present application. Furthermore, the terms "first, second, or third" are used for descriptive purposes only and are not to be construed as indicating or implying relative importance.

The terms "mounted, connected, and coupled" should be construed broadly in this disclosure unless otherwise specifically indicated and defined, such as: can be fixed connection, detachable connection or integral connection; it may also be a mechanical connection, an electrical connection, or a direct connection, or may be indirectly connected through an intermediate medium, or may be a communication between two elements. The specific meaning of the above terms in the present application will be understood in specific cases by those of ordinary skill in the art.

Example 1

Referring to FIGS. 1 to 3, one embodiment of the present application provides an encryption domain cross-modal information retrieval method based on a hyperchaotic pseudorandom sequence, including:

S1: preprocessing the three modalities of image, audio and haptic signals;

Specifically:

(1) The resolution of the image is adjusted to M×M×3;

(2) The audio is subjected to pre-emphasis, framing, windowing, power spectrum calculation and filter bank processing to obtain a time-frequency map of the audio, and its resolution is adjusted to M×M×3;

(3) Similarly, the haptic signal is subjected to pre-emphasis, framing, windowing, power spectrum calculation and filter bank processing to obtain a time-frequency map of the haptic signal, and its resolution is adjusted to M×M×3.
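The preprocessing chain in items (2) and (3) can be illustrated with a minimal NumPy sketch. The text does not specify the window, frame lengths, filter bank type or resizing method, so the Hamming window, the 25 ms / 10 ms framing, the 40-filter mel bank, the nearest-neighbour resize and the function names `time_frequency_map` and `to_image` are all assumptions made only for illustration.

```python
import numpy as np

def time_frequency_map(signal, sample_rate, frame_len=0.025, frame_step=0.010,
                       n_fft=512, n_filters=40, pre_emphasis=0.97):
    """Pre-emphasis -> framing -> windowing -> power spectrum -> mel filter bank."""
    signal = np.asarray(signal, dtype=np.float64)
    # Pre-emphasis: y[n] = x[n] - alpha * x[n-1]
    emphasized = np.append(signal[0], signal[1:] - pre_emphasis * signal[:-1])

    # Framing: overlapping frames, zero-padding the tail of the signal
    flen = int(round(frame_len * sample_rate))
    fstep = int(round(frame_step * sample_rate))
    n_frames = 1 + int(np.ceil(max(len(emphasized) - flen, 0) / fstep))
    pad = (n_frames - 1) * fstep + flen - len(emphasized)
    padded = np.append(emphasized, np.zeros(pad))
    idx = np.arange(flen)[None, :] + fstep * np.arange(n_frames)[:, None]
    frames = padded[idx] * np.hamming(flen)          # windowing (assumed Hamming)

    # Power spectrum of every frame
    power = (np.abs(np.fft.rfft(frames, n_fft)) ** 2) / n_fft

    # Triangular mel filter bank (assumed filter bank type)
    hz_to_mel = lambda f: 2595.0 * np.log10(1.0 + f / 700.0)
    mel_to_hz = lambda m: 700.0 * (10.0 ** (m / 2595.0) - 1.0)
    mel_pts = np.linspace(hz_to_mel(0.0), hz_to_mel(sample_rate / 2.0), n_filters + 2)
    bins = np.floor((n_fft + 1) * mel_to_hz(mel_pts) / sample_rate).astype(int)
    fbank = np.zeros((n_filters, n_fft // 2 + 1))
    for m in range(1, n_filters + 1):
        left, center, right = bins[m - 1], bins[m], bins[m + 1]
        for k in range(left, center):
            fbank[m - 1, k] = (k - left) / (center - left)
        for k in range(center, right):
            fbank[m - 1, k] = (right - k) / (right - center)

    energies = power @ fbank.T
    energies = np.where(energies == 0, np.finfo(float).eps, energies)
    return 10.0 * np.log10(energies).T               # shape (n_filters, n_frames)

def to_image(tf_map, M):
    """Scale to 0..255, resize to M x M (nearest neighbour), replicate to 3 channels."""
    lo, hi = tf_map.min(), tf_map.max()
    scaled = (tf_map - lo) / (hi - lo + 1e-12) * 255.0
    rows = np.arange(M) * tf_map.shape[0] // M
    cols = np.arange(M) * tf_map.shape[1] // M
    gray = scaled[np.ix_(rows, cols)].astype(np.uint8)
    return np.stack([gray, gray, gray], axis=-1)
```

For example, `to_image(time_frequency_map(x, 16000), 224)` would turn a one-dimensional signal `x` sampled at 16 kHz into a 224×224×3 array; the same call could be applied to audio and haptic recordings alike, which is what lets all three modalities share one encryption and feature-extraction path.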

S2: inputting the three preprocessed modalities into a constructed hyperchaotic pseudorandom sequence encryption system, and performing row scrambling, column scrambling and pixel value substitution on each to obtain ciphertext information of the three modalities;

Specifically:

(1) The M×M×3 image obtained by preprocessing each of the three modalities comprises the three channel components R, G and B; each channel is a matrix of size M×M, and each channel is encrypted separately by the same method. Taking the R channel as an example, the M×M matrix is denoted I;

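(2) The four-dimensional hyperchaotic Chen system is:

$$
\begin{cases}
\dot{x} = a(y - x) + w \\
\dot{y} = dx - xz + cy \\
\dot{z} = xy - bz \\
\dot{w} = yz + ew
\end{cases}
$$

wherein $\dot{x}, \dot{y}, \dot{z}, \dot{w}$ represent the time derivatives of the state variables, x, y, z, w represent the system variables, and a, b, c, d, e represent the control parameters of the system; the system is in a hyperchaotic state when a=35, b=3, c=12, d=7, e=0.58.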

(3) A start time point TIME_START, an end time point TIME_FINAL and a step size STEP are set; the four initial values [x_0, y_0, z_0, w_0] of the system serve as the secret key of the encryption system; and the hyperchaotic Chen system is solved using the Runge-Kutta method to obtain the final four-dimensional hyperchaotic pseudorandom sequence l_m;
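As a rough illustration of step (3), the sketch below integrates the system with a classical fixed-step fourth-order Runge-Kutta scheme. It assumes the commonly cited form of the four-dimensional hyperchaotic Chen system (ẋ = a(y − x) + w, ẏ = dx − xz + cy, ż = xy − bz, ẇ = yz + ew); the text does not say which Runge-Kutta variant is used, and the function names, the default time span and the example key are hypothetical.

```python
import numpy as np

def chen_rhs(s, a=35.0, b=3.0, c=12.0, d=7.0, e=0.58):
    """Right-hand side of the (assumed) hyperchaotic Chen system."""
    x, y, z, w = s
    return np.array([a * (y - x) + w,
                     d * x - x * z + c * y,
                     x * y - b * z,
                     y * z + e * w])

def hyperchaotic_sequence(key, time_start=0.0, time_final=10.0, step=0.001):
    """Integrate from the key [x0, y0, z0, w0]; returns an (N+1, 4) array l_m."""
    n_steps = int(round((time_final - time_start) / step))
    out = np.empty((n_steps + 1, 4))
    s = np.asarray(key, dtype=np.float64)
    out[0] = s
    for n in range(n_steps):
        # Classical RK4 update (assumed variant of the Runge-Kutta method)
        k1 = chen_rhs(s)
        k2 = chen_rhs(s + 0.5 * step * k1)
        k3 = chen_rhs(s + 0.5 * step * k2)
        k4 = chen_rhs(s + step * k3)
        s = s + step * (k1 + 2.0 * k2 + 2.0 * k3 + k4) / 6.0
        out[n + 1] = s
    return out

# Hypothetical key; each column of l_m is one of the four state variables.
l_m = hyperchaotic_sequence([0.3, -0.4, 1.2, 1.0], time_final=1.0)
```

Because the initial values act as the key, a receiver who integrates with the same key, time span and step size reproduces the same sequence l_m, while a different key yields a different trajectory.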

(4) The encryption method comprises the following steps:

A. Position scrambling:

The row vector and column vector used for position scrambling are generated:

r = round(mod((abs(l_m(100:M+99,1)) - floor(abs(l_m(100:M+99,1)))) × 10^14, M))

c = round(mod((abs(l_m(100:M+99,2)) - floor(abs(l_m(100:M+99,2)))) × 10^14, M))

wherein round denotes rounding to the nearest integer, abs denotes taking the absolute value, floor denotes rounding down, and mod denotes taking the remainder; r and c are vectors of M elements, each element being an integer between 0 and M-1; after 1 is added to every element, the vectors are written r = [r_1, r_2, …, r_M] and c = [c_1, c_2, …, c_M];

Row scrambling is performed first: the r_1-th row of matrix I is interchanged with the first row, then the r_2-th row is interchanged with the first row, and so on, until finally the r_M-th row is interchanged with the first row;

Column scrambling is then performed on the row-scrambled matrix I: the c_1-th column is interchanged with the first column, then the c_2-th column is interchanged with the first column, and so on, until finally the c_M-th column is interchanged with the first column;

Finally, the matrix I after row scrambling and column scrambling is obtained;
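The position scrambling above, together with its inverse, can be sketched as follows. The sketch uses 0-based indices (so the "+1" of the text is not needed), and it adds a final `% M` as a guard because rounding can in principle produce the value M; that guard and the function names are assumptions for illustration, not part of the described method.

```python
import numpy as np

def index_vector(column, M, offset=99):
    """Derive M swap indices in 0..M-1 from one column of the chaotic sequence.

    offset=99 corresponds to the 1-based row 100 used in the formulas.
    """
    vals = np.abs(np.asarray(column, dtype=np.float64)[offset:offset + M])
    frac = vals - np.floor(vals)
    idx = np.round(np.mod(frac * 1e14, M)).astype(np.int64)
    return idx % M   # guard (assumption): keeps a rounded value of M in range

def scramble(I, r, c):
    """Swap row r[k] with the first row for k = 1..M, then the same for columns."""
    S = np.array(I, copy=True)
    for ri in r:
        S[[0, ri]] = S[[ri, 0]]
    for ci in c:
        S[:, [0, ci]] = S[:, [ci, 0]]
    return S

def unscramble(S, r, c):
    """Undo scramble(): replay the column swaps, then the row swaps, in reverse."""
    I = np.array(S, copy=True)
    for ci in c[::-1]:
        I[:, [0, ci]] = I[:, [ci, 0]]
    for ri in r[::-1]:
        I[[0, ri]] = I[[ri, 0]]
    return I
```

Each interchange is its own inverse, so replaying the same swaps in reverse order restores the original matrix; this is why decryption walks the index vectors from the last element back to the first.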

B. pixel value substitution:

The numbers of elements of matrix I whose values fall in the intervals [0,32), [32,64), [64,96), [96,128), [128,160), [160,192), [192,224) and [224,256) are counted and denoted COUNT_1, COUNT_2, COUNT_3, COUNT_4, COUNT_5, COUNT_6, COUNT_7 and COUNT_8, respectively;

For the first interval [0, 32), if COUNT_1 is not 0, a hyperchaotic pseudorandom sequence l_p1 of length COUNT_1 is generated:

l_m1(:,i) = round(mod((abs(l_m(100:COUNT_1+99,1)) - floor(abs(l_m(100:COUNT_1+99,1)))) × 10^14, 32)) (1 ≤ i ≤ 4)

l_p1 = mod(l_sum, 32)

Similarly, the hyperchaotic pseudorandom sequences l_p2, …, l_p8 corresponding to the remaining seven intervals are obtained;

The elements of matrix I are denoted I(i,j) (1 ≤ i ≤ M, 1 ≤ j ≤ M). For the first interval [0, 32), starting from the first element position (1,1) and traversing from left to right and top to bottom to the last position (M,M), if the element I(i,j) is less than 32, pixel value substitution is performed:

E(i,j) = bitxor(I(i,j), l_p1(num1,1))

E(i,j) = mod(E(i,j), 32)

In the above formulas, bitxor denotes the bitwise exclusive-OR operation; the initial value of num1 is 0, and num1 is incremented by 1 each time an element I(i,j) with a value less than 32 is encountered;

For the second interval [32, 64), starting from the first element position (1,1) and traversing from left to right and top to bottom to the last position (M,M), if 32 ≤ I(i,j) < 64, pixel value substitution is performed:

E(i,j) = bitxor(I(i,j), l_p2(num2,1))

E(i,j) = mod(E(i,j), 32) + 32

In the above formulas, the initial value of num2 is 0, and num2 is incremented by 1 each time an element with 32 ≤ I(i,j) < 64 is encountered;

And so on, for the last interval [224, 256), starting from the first element position (1,1) and traversing from left to right and top to bottom to the last position (M,M), if 224 ≤ I(i,j) < 256, pixel value substitution is performed:

E(i,j) = bitxor(I(i,j), l_p8(num8,1))

E(i,j) = mod(E(i,j), 32) + 224

In the above formulas, the initial value of num8 is 0, and num8 is incremented by 1 each time an element with 224 ≤ I(i,j) < 256 is encountered;

Finally, the ciphertext image E is obtained.

(5) The same encryption method is applied to the G and B channel components of the image; finally, the two-dimensional matrices obtained by separately encrypting the R, G and B channel components of the original image are stacked to obtain the final three-dimensional color ciphertext image E.
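The interval-wise pixel value substitution of step B can be sketched in NumPy as follows. The argument `keystreams` is a hypothetical stand-in for the eight sequences l_p1, …, l_p8 (integers in 0 to 31), which are assumed to have been generated already; the function name is likewise illustrative.

```python
import numpy as np

def substitute(I, keystreams):
    """XOR each pixel with the next keystream value of its own 32-wide interval.

    keystreams[k] must hold at least as many values (each in 0..31) as there
    are pixels in [32k, 32k + 32). Pixels are visited in row-major order,
    i.e. left to right and top to bottom.
    """
    vals = np.asarray(I).astype(np.int64).ravel()
    out = np.empty_like(vals)
    for k in range(8):
        lo = 32 * k
        # Positions of this interval's pixels, in traversal order (the num_k counter)
        pos = np.flatnonzero((vals >= lo) & (vals < lo + 32))
        ks = np.asarray(keystreams[k], dtype=np.int64)[:pos.size]
        out[pos] = np.mod(np.bitwise_xor(vals[pos], ks), 32) + lo
    return out.reshape(np.shape(I)).astype(np.uint8)
```

Since every keystream value is below 32, the XOR changes only the five low-order bits, so each pixel stays inside its original interval. This means the ciphertext has the same per-interval pixel counts and positions as the plaintext, and applying `substitute` to the ciphertext with the same keystreams recovers the plaintext, which matches the decryption formulas having the same form as the encryption formulas.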

S3: extracting features from the ciphertext information of the three modalities using a pretrained VGG16 network, and training each modality in its corresponding branch network;

S4: inputting the three trained branches into a multimodal semantic fusion network for semantic fusion, performing retrieval on the output, and decrypting the retrieved result to obtain the plaintext output;

Specifically, a cross-modal information retrieval model for the ciphertext information of the three modalities is constructed:

(1) The ciphertext information of the image, audio and haptic signals obtained after preprocessing and encryption is input into a VGG16 network that has been pretrained on ImageNet and has had its top fully connected layers removed, and the output is flattened into a one-dimensional vector. Each network branch is then trained with the Adam optimizer and a multi-class cross-entropy loss function; the branch structure comprises a batch normalization layer, Dropout layer 1, fully connected layer 1 (ReLU activation), Dropout layer 2 and fully connected layer 2 (Softmax activation, used for the final classification output). After training, fully connected layer 2 is removed, and the remaining network with its trained weight parameters is fed into the multimodal semantic fusion network;

(2) A loss function L is designed:

$$L = L_1 + \lambda \cdot L_2$$

wherein λ represents the hyperparameter of the loss function; E_V, E_A and E_T respectively represent the numbers of encrypted image, audio and haptic signal samples; for the k-th sample among the encrypted image samples E_V, the encrypted audio samples E_A and the encrypted haptic signal samples E_T, respectively, the loss uses the feature of that sample output by the model and the label corresponding to that feature; g(·) represents the multi-class cross-entropy loss function; E_s represents the total number of image, audio and haptic signal samples; e^m represents the feature output by the model for the m-th of the E_s ciphertext samples; and c_m represents the class center corresponding to the m-th sample, which is updated continuously with the batch at each iteration.
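One plausible reading of this loss, consistent with the symbols defined above, is sketched below in NumPy. It assumes that L_1 is the sum over the three modalities of the mean multi-class cross entropy g, and that L_2 is a center-loss term, namely half the mean squared distance between each feature e^m and its class center c_m. The exact normalization, the value `lam=0.01` and the function names are assumptions, since the expressions for L_1 and L_2 are not reproduced in the text.

```python
import numpy as np

def cross_entropy(probs, labels):
    """Mean multi-class cross entropy.

    probs:  (n, n_classes) softmax outputs; labels: (n,) integer class ids.
    """
    picked = probs[np.arange(len(labels)), labels]
    return float(-np.mean(np.log(np.clip(picked, 1e-12, 1.0))))

def center_loss(features, labels, centers):
    """Half the mean squared distance between features and their class centers."""
    diff = features - centers[labels]
    return float(0.5 * np.mean(np.sum(diff ** 2, axis=1)))

def total_loss(probs_by_modality, labels_by_modality,
               features, labels, centers, lam=0.01):
    """L = L1 + lam * L2 under the assumptions stated in the lead-in."""
    l1 = sum(cross_entropy(p, y)
             for p, y in zip(probs_by_modality, labels_by_modality))
    l2 = center_loss(features, labels, centers)
    return l1 + lam * l2
```

Under this reading, the cross-entropy term keeps each modality separable by class, while the center term pulls features of the same class from all three modalities toward a shared center, which is one way to narrow the semantic gap between modalities.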

The multimodal semantic fusion network structure comprises: fully connected layer 3 (ReLU activation), Dropout layer 3, fully connected layer 4 (ReLU activation), Dropout layer 4, fully connected layer 5 (ReLU activation), Dropout layer 5 and fully connected layer 6 (Softmax activation, used for the final classification output). The network is trained with the Adam optimizer and the loss function L; the total number of iterations is set to K, iteration stops once this number is reached, and the model structure and the trained model weights are saved;

(3) Given a ciphertext sample used for the query and ciphertext samples used for retrieval, their output vectors from the multimodal semantic fusion network are denoted q_i and r_j, and similarity is measured with the cosine function:

$$\cos(q_i, r_j) = \frac{q_i \cdot r_j}{\lVert q_i \rVert \, \lVert r_j \rVert}$$

wherein i and j respectively represent the indices of the samples in the query set and the retrieval set;

For a fixed i, the retrieval set is traversed over j and the similarity values are sorted in descending order; the sample with the largest value is the most similar retrieved result, i.e., the output result.
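The cosine-similarity ranking described in step (3) can be sketched as follows; `Q` and `R` stand for matrices whose rows are the output vectors q_i and r_j, and the function names are illustrative.

```python
import numpy as np

def cosine_similarity_matrix(Q, R):
    """sim[i, j] = cosine similarity between query vector i and retrieval vector j."""
    Q = np.asarray(Q, dtype=np.float64)
    R = np.asarray(R, dtype=np.float64)
    Qn = Q / np.linalg.norm(Q, axis=1, keepdims=True)
    Rn = R / np.linalg.norm(R, axis=1, keepdims=True)
    return Qn @ Rn.T

def rank_retrieval_set(Q, R):
    """For each query i, indices j of the retrieval set sorted by descending similarity."""
    sim = cosine_similarity_matrix(Q, R)
    order = np.argsort(-sim, axis=1, kind="stable")
    return order, sim

# Small example: two queries against three retrieval vectors.
order, sim = rank_retrieval_set([[1, 0], [0, 1]], [[0, 2], [3, 0], [1, 1]])
best = order[:, 0]   # most similar retrieval sample for each query
```

In this example the first query is closest to the second retrieval vector and the second query to the first, so `best` holds the indices 1 and 0; the corresponding ciphertext samples would then be passed on to decryption.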

Further, the retrieved result is decrypted and output:

The matrix of the first channel R of the retrieved ciphertext result is denoted E, and pixel value substitution is performed on E;

The elements of matrix E are denoted E(i,j) (1 ≤ i ≤ M, 1 ≤ j ≤ M). For the first interval [0, 32), starting from the first element position (1,1) and traversing from left to right and top to bottom to the last position (M,M), if the element E(i,j) is less than 32, pixel value substitution is performed:

I(i,j) = bitxor(E(i,j), l_p1(num1,1))

I(i,j) = mod(I(i,j), 32)

wherein the initial value of num1 is 0, and num1 is incremented by 1 each time an element E(i,j) with a value less than 32 is encountered;

For the second interval [32, 64), starting from the first element position (1,1) and traversing from left to right and top to bottom to the last position (M,M), if 32 ≤ E(i,j) < 64, pixel value substitution is performed:

I(i,j) = bitxor(E(i,j), l_p2(num2,1))

I(i,j) = mod(I(i,j), 32) + 32

wherein the initial value of num2 is 0, and num2 is incremented by 1 each time an element with 32 ≤ E(i,j) < 64 is encountered;

And so on, for the last interval [224, 256), starting from the first element position (1,1) and traversing from left to right and top to bottom to the last position (M,M), if 224 ≤ E(i,j) < 256, pixel value substitution is performed:

I(i,j) = bitxor(E(i,j), l_p8(num8,1))

I(i,j) = mod(I(i,j), 32) + 224

wherein the initial value of num8 is 0, and num8 is incremented by 1 each time an element with 224 ≤ E(i,j) < 256 is encountered;

Finally, the image I after pixel value substitution is obtained.

Column scrambling and row scrambling are then performed on the image I obtained after pixel value substitution:

Here the vectors r = [r_1, r_2, …, r_M] and c = [c_1, c_2, …, c_M] from step S2 are used. Column scrambling is performed first: the c_M-th column of matrix I is interchanged with the first column, then the c_{M-1}-th column is interchanged with the first column, and so on, until finally the c_1-th column is interchanged with the first column;

Row scrambling is then performed on the column-scrambled matrix I: the r_M-th row is interchanged with the first row, then the r_{M-1}-th row is interchanged with the first row, and so on, until finally the r_1-th row is interchanged with the first row;

Finally, the matrix I after column scrambling and row scrambling is obtained, which is the final decryption result;

The same decryption method is applied to the G and B channel components of the image; finally, the two-dimensional matrices obtained by separately decrypting the R, G and B channel components of the ciphertext image are stacked to obtain the final decrypted original image I.

In summary, the method preprocesses the audio and haptic signals to obtain their respective time-frequency maps and adjusts the three modalities to the same resolution; it encrypts the information of the three modalities with an encryption method based on a hyperchaotic pseudorandom sequence, which comprises position scrambling and pixel value substitution; it performs cross-modal information retrieval on the ciphertext information of the three modalities; and it finally decrypts the retrieved result to obtain and output the final plaintext result. Unlike conventional schemes, the application addresses cross-modal information retrieval across the three modalities of image, audio and haptic signals while also protecting data security, thereby realizing cross-modal information retrieval in the encryption domain.

Example 2

This embodiment differs from the first embodiment in that it provides a test of the encryption domain cross-modal information retrieval method based on a hyperchaotic pseudorandom sequence. To verify and explain the technical effects of the method, a conventional technical scheme is compared experimentally with the method of the present application, and the test results are compared to verify the actual effect of the method.

The conventional technical scheme is an image encryption method based on a hyperchaotic pseudorandom sequence in which the pixel value substitution step does not divide the pixel range into intervals; that is, the conventional method encrypts over the entire pixel range 0 to 255, which results in low retrieval performance. To verify that the present method achieves higher retrieval performance than the conventional method, this embodiment compares the retrieval performance of the present method with that of retrieval on data encrypted by the conventional scheme.

Test environment: the data set is a public surface texture material data set (https://zeus.lmt.ei.tum.de/downloads/texture/) containing image (V), audio (A) and touch signal (T) data, and it is divided into a training set, a verification set and a test set in a ratio of 3:1:1. The data are encrypted with code written in MATLAB 2017a to obtain ciphertext data, and a retrieval experiment is then carried out on the ciphertext data with retrieval model code written in Python in Jupyter Notebook. The retrieval performance of the different methods is measured by the MAP (mean average precision) value, where a larger MAP value indicates better retrieval performance. The results are shown in the following table:

Table 1: Comparison of experimental results.

As can be seen from the above table, the retrieval performance of the method of the application is far better than that of the traditional method.
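For reference, a MAP value of the kind used in this comparison can be computed as in the following minimal Python sketch. The exact MAP variant used in the experiment is not stated, so this sketch assumes the common definition in which the average precision of one query is the mean of the precision values at each rank where a relevant item appears. The function names and the sample relevance lists are hypothetical and are not experimental data.

```python
def average_precision(relevance):
    """Average precision of one ranked result list.

    relevance: sequence of 0/1 flags, ordered from most to least similar,
    where 1 marks a retrieved item of the same class as the query.
    """
    hits = 0
    precision_sum = 0.0
    for rank, is_relevant in enumerate(relevance, start=1):
        if is_relevant:
            hits += 1
            precision_sum += hits / rank  # precision at this rank
    return precision_sum / hits if hits else 0.0


def mean_average_precision(ranked_relevance):
    """Mean of the average precision over all queries."""
    return sum(average_precision(r) for r in ranked_relevance) / len(ranked_relevance)


# Hypothetical relevance flags for two queries (illustration only).
example_map = mean_average_precision([[1, 0, 1], [0, 1]])
```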

It should be noted that the above embodiments are only for illustrating the technical solution of the present application and not for limiting the same, and although the present application has been described in detail with reference to the preferred embodiments, it should be understood by those skilled in the art that the technical solution of the present application may be modified or substituted without departing from the spirit and scope of the technical solution of the present application, which is intended to be covered in the scope of the claims of the present application.

Claims

1. An encryption domain cross-modal information retrieval method based on a hyper-chaos pseudorandom sequence is characterized by comprising the following steps:

preprocessing three modes of image, audio and touch signals;

inputting the three preprocessed modes into a constructed encryption system of the hyper-chaotic pseudorandom sequence, and respectively carrying out row scrambling, column scrambling and pixel value replacement to obtain ciphertext information of the three modes;

the encryption process, in which the three preprocessed modes are input into the constructed encryption system of the hyperchaotic pseudo-random sequence, comprises,

the M×M×3 images obtained after preprocessing the three modes comprise the three channels R, G and B, each channel is a matrix of size M×M, each channel is encrypted separately, and the M×M matrix is denoted I;

four-dimensional hyper-chaos Chen system:

wherein ,representing state variables, x, y, z, w representing system variables, a, b, c, d, e representing control parameters of the system, the system being in a hyperchaotic state when a=35, b=3, c=12, d=7, e=0.58;

setting a start time point TIME_START, an end time point TIME_FINAL and a step size STEP, setting four initial values [x0 y0 z0 w0] of the system, which serve as the key of the encryption system, and solving the hyperchaotic Chen system by using the Runge-Kutta method to obtain the final four-dimensional hyperchaotic pseudo-random sequence lm;

the encryption process comprises position scrambling and pixel value replacement;

extracting characteristics of ciphertext information of the three modes by utilizing a pretrained VGG16 network, and training the three modes in branch networks corresponding to the three modes;

inputting the trained three modes into a multi-mode semantic fusion network for semantic fusion, retrieving the output result, decrypting the retrieved result, and obtaining a plaintext result output;

the architecture of the multimodal semantic fusion network includes,

fully connected layer 3, dropout layer 3, fully connected layer 4, dropout layer 4, fully connected layer 5, dropout layer 5, fully connected layer 6;

decrypting the retrieved result includes pixel value permutation, column scrambling and row scrambling;

the pixel value permutation includes that,

the elements of the matrix E are denoted E(i, j) (1 ≤ i ≤ M, 1 ≤ j ≤ M); for the first interval [0, 32), the matrix is traversed from the first element position (1, 1), from left to right and from top to bottom, to the last position (M, M), and if the element E(i, j) is smaller than 32, pixel value replacement is performed:

I(i, j) = bitxor(E(i, j), l_p1(num1, 1))

I(i, j) = mod(I(i, j), 32)

for the second interval [32, 64), if 32 ≤ E(i, j) < 64, pixel value replacement is performed likewise:

I(i, j) = bitxor(E(i, j), l_p2(num2, 1))

I(i, j) = mod(I(i, j), 32) + 32

and so on; for the last interval [224, 256), the matrix is traversed from the first element position (1, 1), from left to right and from top to bottom, to the last position (M, M), and if 224 ≤ E(i, j) < 256, pixel value replacement is performed:

I(i, j) = bitxor(E(i, j), l_p8(num8, 1))

I(i, j) = mod(I(i, j), 32) + 224

finally obtaining an image I with the pixel value replaced;

the column scrambling and row scrambling include,

column scrambling and row scrambling are performed on the image I after the pixel value permutation:

first, column scrambling is performed: the c_M-th column of the matrix I is interchanged with the first column, then the c_(M-1)-th column is interchanged with the first column, and so on, until finally the c_1-th column is interchanged with the first column;

and finally obtaining a matrix I after column scrambling and row scrambling, wherein the matrix I is the final decryption result.
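The interval-wise pixel value replacement recited in claim 1 can be illustrated by the following minimal Python sketch. It assumes that each of the eight intervals of width 32 has its own integer key stream (corresponding to l_p1 through l_p8) and its own counter (num1 through num8). A single row-major pass with per-interval counters is used here in place of eight separate traversals, which visits the elements of each interval in the same order. The function name, the key values and the sample matrix are hypothetical.

```python
INTERVAL_WIDTH = 32                   # [0, 32), [32, 64), ..., [224, 256)
NUM_INTERVALS = 256 // INTERVAL_WIDTH


def replace_pixel_values(matrix, key_streams):
    """Interval-preserving XOR replacement over a matrix of 0-255 values.

    key_streams[k] is the (hypothetical) integer key stream for interval k.
    """
    counters = [0] * NUM_INTERVALS    # per-interval positions (num1..num8)
    result = []
    for row in matrix:                # top to bottom
        new_row = []
        for value in row:             # left to right
            k = value // INTERVAL_WIDTH
            key = key_streams[k][counters[k]]
            counters[k] += 1
            # XOR, reduce modulo 32, then shift back into interval k.
            new_row.append((value ^ key) % INTERVAL_WIDTH + k * INTERVAL_WIDTH)
        result.append(new_row)
    return result


# Hypothetical 3x3 channel and key streams (illustration only).
plain = [[0, 31, 32], [63, 224, 255], [100, 130, 200]]
keys = [[17 + 13 * k + 7 * n for n in range(9)] for k in range(NUM_INTERVALS)]
cipher = replace_pixel_values(plain, keys)
restored = replace_pixel_values(cipher, keys)
```

Because every value stays inside its original interval, applying the same operation with the same key streams selects the same elements in the same order and restores the input, which is consistent with pixel value permutation appearing in both the encryption and the decryption process.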

2. The encryption domain cross-modal information retrieval method based on a hyper-chaos pseudorandom sequence according to claim 1, characterized in that: preprocessing the three modes of image, audio and haptic signals includes,

adjusting the resolution of the image to M×M×3;

subjecting the audio to pre-emphasis, framing, windowing, power spectrum calculation and a filter bank to obtain a time-frequency diagram of the audio, and adjusting its resolution to M×M×3;

subjecting the haptic signal to pre-emphasis, framing, windowing, power spectrum calculation and a filter bank to obtain a time-frequency diagram of the haptic signal, and adjusting its resolution to M×M×3.

3. The encryption domain cross-modal information retrieval method based on a hyper-chaos pseudorandom sequence according to claim 1, characterized in that: the loss function of the multimodal semantic fusion network includes,

the loss function is set as follows:

L = L₁ + λ·L₂

4. The encryption domain cross-modal information retrieval method based on a hyper-chaos pseudorandom sequence according to claim 1, characterized in that: the output result includes,

for a ciphertext sample used for query and a ciphertext sample used for retrieval, the output vectors obtained through the multi-mode semantic fusion network are denoted q_i and r_j respectively, and a cosine function is used to measure their similarity:

with i fixed, the retrieval set is traversed over j and the similarities are sorted in descending order, and the sample with the maximum value is the most similar result obtained by retrieval, namely the output result.
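The similarity measurement and ranking recited in claim 4 can be illustrated by the following minimal Python sketch. It assumes the standard cosine similarity, i.e. the inner product of q_i and r_j divided by the product of their Euclidean norms (the formula itself is not reproduced in this text). The function names and the sample vectors are hypothetical.

```python
import math


def cosine_similarity(q, r):
    """Cosine of the angle between two non-zero vectors of equal length."""
    dot = sum(a * b for a, b in zip(q, r))
    norm_q = math.sqrt(sum(a * a for a in q))
    norm_r = math.sqrt(sum(b * b for b in r))
    return dot / (norm_q * norm_r)


def rank_retrieval_set(query, retrieval_set):
    """Indices j of the retrieval set, sorted by descending similarity."""
    scores = [cosine_similarity(query, r) for r in retrieval_set]
    return sorted(range(len(retrieval_set)), key=lambda j: scores[j], reverse=True)


# Hypothetical output vectors of the fusion network (illustration only).
q_i = [1.0, 0.0]
r_all = [[0.0, 1.0], [2.0, 0.1], [1.0, 1.0]]
ranking = rank_retrieval_set(q_i, r_all)
best_match = ranking[0]  # index of the most similar retrieved sample
```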