CN118093911A - Fuzzy Transformer hash method for medical image retrieval target attack - Google Patents
Fuzzy Transformer hash method for medical image retrieval target attack
- Publication number
- CN118093911A CN118093911A CN202410234959.5A CN202410234959A CN118093911A CN 118093911 A CN118093911 A CN 118093911A CN 202410234959 A CN202410234959 A CN 202410234959A CN 118093911 A CN118093911 A CN 118093911A
- Authority
- CN
- China
- Prior art keywords
- loss
- hash
- model
- prototype
- layer
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Links
- 238000000034 method Methods 0.000 title claims abstract description 49
- 230000006870 function Effects 0.000 claims abstract description 64
- 230000000007 visual effect Effects 0.000 claims abstract description 39
- 230000008569 process Effects 0.000 claims abstract description 18
- 238000012360 testing method Methods 0.000 claims abstract description 7
- 238000011176 pooling Methods 0.000 claims description 37
- 238000012549 training Methods 0.000 claims description 21
- 230000007246 mechanism Effects 0.000 claims description 17
- 238000013139 quantization Methods 0.000 claims description 12
- 238000013507 mapping Methods 0.000 claims description 11
- 238000004364 calculation method Methods 0.000 claims description 9
- 238000000605 extraction Methods 0.000 claims description 7
- 238000006243 chemical reaction Methods 0.000 claims description 6
- 230000009191 jumping Effects 0.000 claims description 6
- 239000011159 matrix material Substances 0.000 claims description 6
- 230000009466 transformation Effects 0.000 claims description 4
- 238000012935 Averaging Methods 0.000 claims description 3
- 230000003044 adaptive effect Effects 0.000 claims description 3
- 230000004927 fusion Effects 0.000 claims description 3
- 230000010354 integration Effects 0.000 claims description 3
- 238000003475 lamination Methods 0.000 claims description 3
- 238000010606 normalization Methods 0.000 claims description 3
- 238000005457 optimization Methods 0.000 claims description 3
- 230000009286 beneficial effect Effects 0.000 abstract description 2
- 238000010586 diagram Methods 0.000 description 4
- 238000013528 artificial neural network Methods 0.000 description 3
- 201000010099 disease Diseases 0.000 description 3
- 208000037265 diseases, disorders, signs and symptoms Diseases 0.000 description 3
- 238000013135 deep learning Methods 0.000 description 2
- 238000011161 development Methods 0.000 description 2
- 239000000284 extract Substances 0.000 description 2
- 230000001575 pathological effect Effects 0.000 description 2
- 238000012545 processing Methods 0.000 description 2
- 238000003745 diagnosis Methods 0.000 description 1
- 238000002059 diagnostic imaging Methods 0.000 description 1
- 238000012986 modification Methods 0.000 description 1
- 230000004048 modification Effects 0.000 description 1
- 238000007781 pre-processing Methods 0.000 description 1
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/50—Information retrieval; Database structures therefor; File system structures therefor of still image data
- G06F16/51—Indexing; Data structures therefor; Storage structures
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/50—Information retrieval; Database structures therefor; File system structures therefor of still image data
- G06F16/58—Retrieval characterised by using metadata, e.g. metadata not derived from the content or metadata generated manually
- G06F16/583—Retrieval characterised by using metadata, e.g. metadata not derived from the content or metadata generated manually using metadata automatically derived from the content
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/04—Architecture, e.g. interconnection topology
- G06N3/045—Combinations of networks
- G06N3/0455—Auto-encoder networks; Encoder-decoder networks
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/04—Architecture, e.g. interconnection topology
- G06N3/0464—Convolutional networks [CNN, ConvNet]
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/08—Learning methods
-
- Y—GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
- Y02—TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
- Y02T—CLIMATE CHANGE MITIGATION TECHNOLOGIES RELATED TO TRANSPORTATION
- Y02T10/00—Road transport of goods or passengers
- Y02T10/10—Internal combustion engine [ICE] based vehicles
- Y02T10/40—Engine management systems
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- Physics & Mathematics (AREA)
- Data Mining & Analysis (AREA)
- General Physics & Mathematics (AREA)
- General Engineering & Computer Science (AREA)
- Software Systems (AREA)
- General Health & Medical Sciences (AREA)
- Artificial Intelligence (AREA)
- Evolutionary Computation (AREA)
- Biophysics (AREA)
- Molecular Biology (AREA)
- Computing Systems (AREA)
- Biomedical Technology (AREA)
- Computational Linguistics (AREA)
- Mathematical Physics (AREA)
- Life Sciences & Earth Sciences (AREA)
- Health & Medical Sciences (AREA)
- Library & Information Science (AREA)
- Databases & Information Systems (AREA)
- Medical Treatment And Welfare Office Work (AREA)
Abstract
The invention provides a fuzzy Transformer hash method for medical image retrieval targeted attack, which solves the technical problems that current deep hash models have poor robustness in medical image retrieval and are easily affected by adversarial examples. The technical scheme is as follows: a medical image database is established and a fuzzy Transformer hash model is built, the model mainly comprising four parts: a visual Transformer hash model, a prototype network, a residual fuzzy generator and a discriminator; the loss function of each part is calculated and optimized with an alternating learning algorithm; the prototype codes and adversarial examples generated from the test set are used as query samples to retrieve from the database, and the targeted attack performance of the model is evaluated with the targeted mean average precision (t-MAP). The beneficial effects of the invention are as follows: the robustness and anti-interference performance of the model in the medical image retrieval process are enhanced, and the accuracy of medical image retrieval is improved.
Description
Technical Field
The invention relates to the technical field of medical image processing, in particular to a fuzzy Transformer hash method for medical image retrieval targeted attack.
Background
In recent years, with the development of medical imaging technology, the number of medical images has been growing rapidly. In the medical field, doctors make diagnoses and formulate treatment plans by examining pathological images and the related cases of patients. Therefore, doctors often need to retrieve valuable images from different medical image databases to support their clinical practice. However, since medical images generally have similar structures and textures, medical image retrieval presents significant challenges. Furthermore, the pathological images of different diseases may exhibit similar characteristics, which further highlights the necessity of improving retrieval accuracy.
In the early days, approximate nearest neighbor (ANN) search methods attracted great interest because of their efficiency and effectiveness. However, these methods suffer from high memory cost, slow search speed and limited accuracy. To address these problems, hashing techniques were introduced as one solution. Hashing can map high-dimensional data into compact binary codes while preserving semantic similarity, and therefore offers significant advantages in storage cost and retrieval speed. With the rapid development of deep learning, methods combining deep learning and hashing have been widely applied. Among them, deep hashing methods that use deep neural networks (DNNs) to automatically extract features have achieved great success in learning to hash and have been shown to outperform conventional hashing methods. However, deep neural networks tend to be susceptible to adversarial examples, which fool the network into making false predictions. Deep hashing methods achieve encouraging performance on many benchmarks, but inevitably inherit the poor robustness of DNNs and their susceptibility to adversarial examples.
Disclosure of Invention
The invention aims to provide a fuzzy Transformer hash method for medical image retrieval targeted attack, which mainly solves the technical problems that current deep hash models have poor robustness and are easily affected by adversarial examples, and belongs to the technical field of medical image processing. The method greatly enhances the robustness and anti-interference performance of the model in the medical image retrieval process, and improves the accuracy of medical image retrieval.
In order to achieve the aim of the invention, the invention adopts the following technical scheme: a fuzzy Transformer hash method for medical image retrieval targeted attacks, comprising the steps of:
S10: firstly, a medical image database is established, then the samples in the database are preprocessed and augmented, and finally the data are divided into a training set T_r, a test set T_e and a database sample set T_d;
S20: a fuzzy Transformer hash model is constructed, which is mainly composed of four parts: a visual Transformer hash model, a prototype network, a residual fuzzy generator and a discriminator; the visual Transformer hash model comprises two modules, a feature learning module and a hash code learning module, wherein the feature learning module is responsible for extracting deep features of medical images and the hash code learning module is responsible for mapping the extracted deep features into hash codes; the prototype network is responsible for extracting the semantic features f_L of the image labels of each class in the database and mapping them into prototype codes; the residual fuzzy generator first deconvolves the semantic features f_L extracted by the prototype network into a feature map F_L with the same size as the original image, and then adds it to the original image q_t to generate an adversarial example q'_t; the discriminator is responsible for discriminating whether its input comes from the original image or the generated adversarial example;
S30: according to the model constructed in step S20, the loss function of each part is calculated: the loss L_H of the visual Transformer hash model, the loss L_P of the prototype network, the loss L_G of the residual fuzzy generator, and the loss L_D of the discriminator;
S40: according to the calculated loss functions of each part of the model, the visual Transformer hash model, the prototype network, the residual fuzzy generator and the discriminator are optimized in turn using an alternating learning algorithm, and the optimized model is saved;
S50: the medical images in the test set T_e are sequentially input into the visual Transformer hash model, the prototype network and the residual fuzzy generator to generate the corresponding prototype codes and adversarial examples, which are then used as query samples to retrieve from the database sample set T_d, and finally the targeted mean average precision (t-MAP) is used to evaluate the targeted attack performance.
In the fuzzy Transformer hash method for medical image retrieval targeted attack provided by the invention, the whole model is constructed in detail in step S20, and the specific steps are as follows:
S21: the visual Transformer hash model is mainly responsible for extracting the features of the attacked object and mapping them into hash codes, and a visual Transformer model is adopted in the feature extraction stage;
S211: a power mean transformation (PMT) layer is integrated into the input and output stages of the Transformer encoder to enhance the nonlinearity of the model; if the input and output of the PMT layer are x and y respectively, the PMT transforms x into y = [ln(x+β), ln²(x+β)], where β is a constant; during backward propagation, the gradient of the PMT output y = [ln(x+β), ln²(x+β)] with respect to the input is computed using the chain rule, i.e. ∂y/∂x = [1/(x+β), 2·ln(x+β)/(x+β)];
Through forward and backward propagation, the power mean transformation layer is integrated into the whole model; this integration enables the model to learn more complex information during training, thereby enhancing the nonlinearity of the model;
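A minimal PyTorch sketch of the PMT layer described in S211 is given below; the value β = 1 and the concatenation of the two terms along the last dimension are assumptions for illustration, and autograd supplies the chain-rule backward pass:

```python
import torch
import torch.nn as nn

class PowerMeanTransform(nn.Module):
    """Sketch of the power mean transformation (PMT) layer: maps x to [ln(x+beta), ln(x+beta)^2]."""
    def __init__(self, beta: float = 1.0):
        super().__init__()
        self.beta = beta

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # Clamp keeps ln well-defined when x + beta is not strictly positive (an added safeguard).
        z = torch.log(torch.clamp(x + self.beta, min=1e-6))
        return torch.cat([z, z ** 2], dim=-1)
```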
S212: after step S211, the spatial pyramid pooling layer module accepts the input feature map and, through adaptive average pooling layers, generates feature sub-maps of sizes 1×1, 2×2, 4×4 and 16×16; the generated feature sub-maps are flattened and concatenated, which makes it convenient to extract features of different scales from the input feature map and produce multi-scale features, so that information from different regions is integrated;
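A minimal sketch of the SPP module in S212 follows; pooling a 2-D feature map to the four named grid sizes is taken from the text, while applying it to token sequences reshaped into 2-D maps is an assumption:

```python
import torch
import torch.nn as nn

class SpatialPyramidPooling(nn.Module):
    """Adaptive average pooling to 1x1, 2x2, 4x4 and 16x16 grids, flattened and concatenated."""
    def __init__(self, grid_sizes=(1, 2, 4, 16)):
        super().__init__()
        self.pools = nn.ModuleList(nn.AdaptiveAvgPool2d(g) for g in grid_sizes)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, channels, height, width)
        parts = [pool(x).flatten(start_dim=2) for pool in self.pools]  # each (B, C, g*g)
        return torch.cat(parts, dim=2)  # (B, C, 1+4+16+256) multi-scale tokens
```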
The spatial pyramid pooling layer module is integrated into the multi-head attention mechanism to form the multi-head spatial pyramid pooling attention (MHSPA) module; in the MHSPA module, multi-scale feature extraction and fusion are first performed on the value V and the key K through the spatial pyramid pooling module; next, the scaled dot-product attention layer processes the query Q and the pooled key-value pair K' and V'; this layer computes the attention scores between the query Q and the key-value pair K' and V';
S213: through step S212, the output of the multi-head spatial pyramid pooling attention module and the output of the Transformer encoder can be obtained; if the dimension of the query Q and the key K' is d_k, the output of the multi-head spatial pyramid pooling attention module can be expressed as:
MHSPA(Q, K', V') = softmax(QK'^T / √d_k) V'
where K' is the result of applying the spatial pyramid pooling operation to K, and V' is the result of applying the spatial pyramid pooling operation to V; MHSPA(·,·,·) denotes the multi-head spatial pyramid pooling attention operation; QK'^T computes the similarity between each query and all keys; the similarity scores are normalized by the softmax(·) function to obtain the attention weight of each query, with the weights summing to 1; the scaling factor √d_k reduces the magnitude of the dot products; the scaled dot-product attention layer not only enhances the ability of the model to handle long-range dependencies in the input, but also alleviates the numerical instability caused by large dot-product magnitudes;
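A minimal sketch of MHSPA is given below: keys and values are shortened by pyramid pooling before standard scaled dot-product attention. The head count, the projection layout and the use of 1-D adaptive pooling over the token dimension are assumptions for illustration; pooling K and V reduces the length of the key-value sequence and hence the attention cost:

```python
import math
import torch
import torch.nn as nn
import torch.nn.functional as F

class MHSPAttention(nn.Module):
    """Multi-head attention in which K and V are pooled to multi-scale tokens (K', V')."""
    def __init__(self, dim: int, num_heads: int = 8, grid_sizes=(1, 2, 4, 16)):
        super().__init__()
        assert dim % num_heads == 0
        self.num_heads, self.head_dim = num_heads, dim // num_heads
        self.qkv = nn.Linear(dim, dim * 3)
        self.proj = nn.Linear(dim, dim)
        self.pools = nn.ModuleList(nn.AdaptiveAvgPool1d(g * g) for g in grid_sizes)

    def _spp(self, t: torch.Tensor) -> torch.Tensor:
        # t: (B, N, dim) -> pooled tokens (B, 1+4+16+256, dim)
        t = t.transpose(1, 2)
        pooled = torch.cat([p(t) for p in self.pools], dim=2)
        return pooled.transpose(1, 2)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        B, N, D = x.shape
        q, k, v = self.qkv(x).chunk(3, dim=-1)
        k, v = self._spp(k), self._spp(v)                  # K', V'
        def heads(t):
            return t.view(B, t.shape[1], self.num_heads, self.head_dim).transpose(1, 2)
        q, k, v = heads(q), heads(k), heads(v)
        attn = F.softmax(q @ k.transpose(-2, -1) / math.sqrt(self.head_dim), dim=-1)
        out = (attn @ v).transpose(1, 2).reshape(B, N, D)
        return self.proj(out)
```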
If the input to the Transformer encoder is z_l, l = 1, 2, ..., L, then the output Z_o can be expressed as:
Z_o = MLP(LN(MHSPA(LN(z_l)) + z_l)) + MHSPA(LN(z_l)) + z_l
where MLP(·) denotes the multi-layer perceptron operation and LN(·) denotes the layer normalization operation;
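The encoder expression above is the usual pre-norm residual block. A short sketch, reusing the MHSPAttention sketch defined above (the MLP expansion ratio is an assumption):

```python
import torch
import torch.nn as nn

class SPPTransformerEncoderBlock(nn.Module):
    """Pre-norm block: Z_o = MLP(LN(MHSPA(LN(z)) + z)) + MHSPA(LN(z)) + z."""
    def __init__(self, dim: int, num_heads: int = 8, mlp_ratio: int = 4):
        super().__init__()
        self.norm1 = nn.LayerNorm(dim)
        self.attn = MHSPAttention(dim, num_heads)
        self.norm2 = nn.LayerNorm(dim)
        self.mlp = nn.Sequential(
            nn.Linear(dim, dim * mlp_ratio), nn.GELU(), nn.Linear(dim * mlp_ratio, dim))

    def forward(self, z: torch.Tensor) -> torch.Tensor:
        z = z + self.attn(self.norm1(z))    # MHSPA(LN(z)) + z
        return z + self.mlp(self.norm2(z))  # + MLP(LN(...))
```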
S214: after obtaining the output of the Transformer encoder according to step S213, features are generated through a power mean transformation layer, and classification and hash code generation are then performed through three fully connected layers;
S22: the prototype network is mainly responsible for extracting the semantic features of labels and generating the corresponding prototype codes; its input is the unique label set L_u of the image classes in the training set T_r, where n_u denotes the number of unique labels and c denotes the number of classes; firstly, the semantic features f_L are extracted from the label set L_u through two fully connected layers; secondly, the semantic features f_L pass through a further fully connected layer to obtain the classification information and the corresponding prototype codes; finally, the semantic features f_L serve as part of the input of the residual fuzzy generator to assist the generation of adversarial examples;
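A minimal sketch of the prototype network in S22 follows; the hidden width, the tanh relaxation of the prototype code and the exact head arrangement are assumptions for illustration:

```python
import torch
import torch.nn as nn

class PrototypeNet(nn.Module):
    """Two FC layers extract semantic features f_L from labels; further FC heads
    produce class predictions and k-bit prototype codes."""
    def __init__(self, num_classes: int, code_bits: int, hidden: int = 512):
        super().__init__()
        self.feature = nn.Sequential(
            nn.Linear(num_classes, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU())
        self.cls_head = nn.Linear(hidden, num_classes)
        self.code_head = nn.Linear(hidden, code_bits)

    def forward(self, labels_onehot: torch.Tensor):
        f_L = self.feature(labels_onehot)              # semantic features, also fed to the generator
        logits = self.cls_head(f_L)                    # classification information
        proto_code = torch.tanh(self.code_head(f_L))   # relaxed prototype code in (-1, 1)
        return f_L, logits, proto_code
```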
S23: the input of the residual fuzzy generator consists of two parts: one part is the feature map F_L, which has the same size as the original image and is obtained by passing the semantic features f_L extracted by the prototype network through 5 deconvolution layers; the other part is the original image q_t; taking (q_t + F_L) as a new input, a feature map F'_L with the same size as the original image is formed through 3 convolution layers, 1 residual fuzzy block and 2 deconvolution layers; then (q_t + F'_L) is processed by 1 convolution layer to form the adversarial example q'_t with the same size as the original image;
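A minimal sketch of the generator forward pass in S23, assuming 3-channel 224×224 images and a semantic feature vector f_L; the initial linear projection, the channel widths and the identity stand-in for the residual fuzzy block are assumptions:

```python
import torch
import torch.nn as nn

class ResidualFuzzyGenerator(nn.Module):
    """F_L = 5 deconvs(f_L); F'_L = 2 deconvs(fuzzy_block(3 convs(q_t + F_L))); q'_t = conv(q_t + F'_L)."""
    def __init__(self, f_dim: int = 512, channels: int = 3):
        super().__init__()
        self.project = nn.Linear(f_dim, 128 * 7 * 7)             # reshape f_L into a small feature map
        self.up = nn.Sequential(                                  # 5 deconvolution layers: 7 -> 224
            nn.ConvTranspose2d(128, 128, 4, 2, 1), nn.ReLU(),
            nn.ConvTranspose2d(128, 64, 4, 2, 1), nn.ReLU(),
            nn.ConvTranspose2d(64, 32, 4, 2, 1), nn.ReLU(),
            nn.ConvTranspose2d(32, 16, 4, 2, 1), nn.ReLU(),
            nn.ConvTranspose2d(16, channels, 4, 2, 1))
        self.encode = nn.Sequential(                              # 3 convolution layers
            nn.Conv2d(channels, 64, 3, 2, 1), nn.ReLU(),
            nn.Conv2d(64, 128, 3, 2, 1), nn.ReLU(),
            nn.Conv2d(128, 128, 3, 1, 1), nn.ReLU())
        self.fuzzy_block = nn.Identity()                          # stand-in for the residual fuzzy block (S231)
        self.decode = nn.Sequential(                              # 2 deconvolution layers
            nn.ConvTranspose2d(128, 64, 4, 2, 1), nn.ReLU(),
            nn.ConvTranspose2d(64, channels, 4, 2, 1))
        self.final = nn.Conv2d(channels, channels, 3, 1, 1)       # 1 final convolution

    def forward(self, q_t: torch.Tensor, f_L: torch.Tensor) -> torch.Tensor:
        F_L = self.up(self.project(f_L).view(-1, 128, 7, 7))      # F_L, same size as q_t
        F_L_prime = self.decode(self.fuzzy_block(self.encode(q_t + F_L)))
        return self.final(q_t + F_L_prime)                        # adversarial example q'_t
```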
S231: the residual fuzzy block in step S23 is an important component of the residual fuzzy generator and is formed by stacking 6 identical residual fuzzy units; in each residual fuzzy unit, the input first passes through 2 convolution layers and is then combined with the input feature map through a skip connection to obtain the feature map F_R; F_R then passes through the fuzzy layer Fl(·) and is combined with F_R through another skip connection to obtain the feature map F_R + Fl(F_R); finally, F_R + Fl(F_R) serves as the new input and passes through the remaining residual fuzzy units in turn to obtain the blurred feature map F'_R; the input of the fuzzy layer is a feature map of size H_R × W_R × C_R, where H_R, W_R and C_R denote the height, width and number of channels of the input feature map, respectively;
S232: the feature map of each channel obtained in step S231 is converted into a one-dimensional feature vector, and every feature point in the feature vector of each channel undergoes M fuzzy membership function calculations; each calculation of the fuzzy membership function assigns a fuzzy label to the feature point, namely:
Z_{i,k,c} = exp(-(x_{i,c} - μ_{k,c})² / (2σ_{k,c}²)), i = 1, ..., H_R×W_R; c = 1, ..., C_R; k = 1, ..., M
where x_{i,c} denotes the i-th feature point in the c-th channel, μ_{k,c} and σ_{k,c} denote the mean and standard deviation of the c-th channel in the k-th Gaussian fuzzy membership function, and Z_{i,k,c} denotes the fuzzy label assigned by the k-th Gaussian fuzzy membership function to the i-th feature point in the c-th channel;
The fuzzy labels calculated by the M Gaussian fuzzy membership functions are then merged through the fuzzy logic AND operation to obtain Z_{i,c}, i = 1, ..., H_R×W_R; c = 1, ..., C_R, where Z_{i,c} denotes the fuzzy degree; the fuzzy degree Z_{i,c} is then reshaped to obtain the blurred feature map Fl(F_R);
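A minimal sketch of the fuzzy layer Fl(·) in S231 and S232 follows; using the product t-norm as the AND operation and learnable per-channel means and standard deviations are assumptions for illustration:

```python
import torch
import torch.nn as nn

class FuzzyLayer(nn.Module):
    """M Gaussian membership functions per channel, merged with a fuzzy AND (product t-norm)."""
    def __init__(self, channels: int, num_memberships: int = 4):
        super().__init__()
        self.mu = nn.Parameter(torch.randn(num_memberships, channels))         # mu_{k,c}
        self.log_sigma = nn.Parameter(torch.zeros(num_memberships, channels))  # sigma_{k,c}

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (B, C, H, W) -> flatten each channel to a 1-D vector of H*W feature points.
        B, C, H, W = x.shape
        flat = x.view(B, C, H * W).unsqueeze(1)                  # (B, 1, C, H*W)
        mu = self.mu.unsqueeze(0).unsqueeze(-1)                  # (1, M, C, 1)
        sigma = self.log_sigma.exp().unsqueeze(0).unsqueeze(-1)
        z = torch.exp(-((flat - mu) ** 2) / (2 * sigma ** 2))    # Z_{i,k,c}
        z_and = z.prod(dim=1)                                    # fuzzy AND over the M memberships
        return z_and.view(B, C, H, W)                            # reshaped blurred feature map Fl(F_R)
```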
S24: the discriminator takes the original image q_t and the adversarial example q'_t as input and passes them through 5 convolution layers and 3 fully connected layers in turn; in the last fully connected layer it predicts the label of the image and determines whether the input comes from the original image or from the adversarial example.
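A minimal sketch of the discriminator in S24 follows; the channel widths, 224×224 input and the way the final layer is split into a class head and a real/adversarial score are assumptions for illustration:

```python
import torch
import torch.nn as nn

class Discriminator(nn.Module):
    """5 convolution layers followed by 3 fully connected layers; the last layer
    predicts the class label and a real-vs-adversarial score."""
    def __init__(self, num_classes: int, channels: int = 3):
        super().__init__()
        widths = [channels, 32, 64, 128, 256, 256]
        convs = []
        for c_in, c_out in zip(widths[:-1], widths[1:]):
            convs += [nn.Conv2d(c_in, c_out, 4, stride=2, padding=1), nn.LeakyReLU(0.2)]
        self.convs = nn.Sequential(*convs)                        # 5 conv layers, 224 -> 7
        self.fc = nn.Sequential(
            nn.Flatten(),
            nn.Linear(256 * 7 * 7, 1024), nn.LeakyReLU(0.2),
            nn.Linear(1024, 256), nn.LeakyReLU(0.2))
        self.head = nn.Linear(256, num_classes + 1)               # class logits + real/fake logit

    def forward(self, x: torch.Tensor):
        h = self.fc(self.convs(x))
        out = self.head(h)
        return out[:, :-1], out[:, -1]                            # (label prediction, real/fake score)
```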
In the fuzzy Transformer hash method for medical image retrieval targeted attack provided by the invention, in step S30 the loss L_H of the visual Transformer hash model, the loss L_P of the prototype network, the loss L_G of the residual fuzzy generator and the loss L_D of the discriminator are specifically as follows:
S31: the loss L_H of the visual Transformer hash model is calculated; it is divided into a pairwise loss L_HP, a quantization loss L_HQ, a balance loss L_HB and a classification loss L_HC, and the objective function of the visual Transformer hash model is as follows:
s.t. V ∈ {-1, +1}^{n×k}
where α_h, β_h and γ_h are insensitive hyper-parameters; n is the number of database samples, k is the number of hash bits, and V represents the hash codes of the database samples;
S311: first, the pairwise loss L_HP is calculated:
s.t. U = [u_1, u_2, ..., u_m]^T ∈ {-1, +1}^{m×k}, V = [v_1, v_2, ..., v_n]^T ∈ {-1, +1}^{n×k}
where u_i and v_j denote the hash codes of the i-th training sample and the j-th database sample respectively, U and V denote the hash codes of the training samples and the database samples, S denotes the similarity matrix between the query set and the database, u_t = h(q_t) = sign(Φ(q_t; θ_h)), h(·) denotes the hash function, q_t denotes the original image, Φ(q_t; θ_h) ∈ R^k denotes the output of the hash layer, k denotes the number of bits of the hash code, m is the number of training samples, and θ_h denotes the network parameters of the fully connected layers before the hash layer;
Considering that solving for the sign function sign(·) in L_HP is a discrete optimization problem, the sign function is approximated by the hyperbolic tangent function tanh(·), i.e. u_t is relaxed to tanh(Φ(q_t; θ_h));
S312: since the hyperbolic tangent function tanh(·) is used to approximate the sign(·) function in the pairwise loss L_HP, a regularization term is added between the real-valued output and the hash code, namely the quantization loss L_HQ:
S313: in order to make the hash codes fill the entire 2^k Hamming space as much as possible and to guarantee the balance of each bit, a balance loss L_HB is proposed to ensure that -1 and +1 appear with as equal probability as possible on each bit, defined as follows:
where the mean(·) function is used to calculate the average of the elements in a vector;
S314: because the number of labels differs greatly across images, different loss functions are used for single-label and multi-label images during training:
(1) multi-label classification loss L_multi:
(2) single-label classification loss L_single:
where pred(q_t; θ_c) denotes the predicted label of image q_t, l_t denotes the true label of image q_t, and θ_c denotes the network parameters of the fully connected layers before the classification layer;
therefore, the classification loss of the visual Transformer hash model is L_HC = L_multi + L_single;
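A hedged sketch of the four loss terms named in S31 is given below. The exact formulas appear as equations in the patent and are not reproduced here; the pairwise negative log-likelihood, the squared-error quantization and balance terms, and the BCE/CE classification terms are assumptions about their typical form:

```python
import torch
import torch.nn.functional as F

def hash_model_losses(phi, labels, logits, db_codes, similarity, is_multi_label):
    """phi: hash-layer outputs of the batch; db_codes: +-1 database codes;
    similarity: batch-vs-database similarity matrix (all assumptions about interfaces)."""
    u = torch.tanh(phi)                                   # relaxed hash codes of the batch
    theta = 0.5 * u @ db_codes.t()                        # inner-product similarity with database codes
    l_hp = (torch.log1p(torch.exp(theta)) - similarity * theta).mean()   # pairwise likelihood loss
    l_hq = (u - torch.sign(u)).pow(2).mean()              # quantization loss: push tanh outputs to +-1
    l_hb = u.mean(dim=0).pow(2).mean()                    # balance loss: each bit averages to ~0
    if is_multi_label:
        l_hc = F.binary_cross_entropy_with_logits(logits, labels.float())
    else:
        l_hc = F.cross_entropy(logits, labels.argmax(dim=1))
    return l_hp, l_hq, l_hb, l_hc
```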
S32: the loss L_P of the prototype network is calculated; it is divided into a pairwise likelihood loss L_PP, a quantization loss L_PQ and a classification loss L_PC, and the objective function of the prototype network is as follows:
where α_p, β_p and γ_p are insensitive hyper-parameters, and θ_p denotes the network parameters of the prototype network;
S321: first, the pairwise likelihood loss L_PP is calculated:
where the similarity matrix between the label set L_u of the image classes in the database and the label set L_Q of the query set is used, n_u is the number of prototype codes, U ∈ {-1, +1}^{m×k}, and V_p is the prototype code;
S322: a regularization term, i.e. the quantization loss, is introduced between the real-valued output f_p of the prototype network and the prototype code V_p:
L_PQ = ||V_p - f_p||² = ||sign(f_p) - f_p||²
S323: similar to the visual Transformer hash model, and taking into account the variation in the number of labels across images, the classification loss of the prototype network combines a multi-label classification loss and a single-label classification loss:
(1) multi-label classification loss L_mul-p:
(2) single-label classification loss L_sin-p:
where the predicted labels are compared with L_u, the labels of the image classes;
therefore, the classification loss of the prototype network is L_PC = L_mul-p + L_sin-p;
S33: the loss L_G of the residual fuzzy generator is calculated; it is divided into a reconstruction loss L_GR and a Hamming loss L_GH:
where q_t denotes the original image and q'_t denotes the corresponding adversarial example;
where V_p denotes the prototype code, h(·) denotes the hash function, h(q'_t) denotes the hash code of the adversarial example q'_t, dist_H(·) denotes the Hamming distance function, and k denotes the number of bits of the hash code;
finally, the objective function of the residual fuzzy generator is as follows:
where α_g and β_g are insensitive hyper-parameters, and θ_g denotes the network parameters of the residual fuzzy generator;
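A hedged sketch of the two generator loss terms in S33: the reconstruction loss keeps the adversarial example close to the original image, and the Hamming loss pulls its hash code toward the target prototype code V_p. The L2 reconstruction term and the inner-product relaxation of the Hamming distance are assumptions, since the original equations are given as images in the patent:

```python
import torch

def generator_losses(q_t, q_t_adv, adv_code_relaxed, proto_code):
    """q_t_adv: generated adversarial example; adv_code_relaxed: tanh-relaxed hash code of q'_t."""
    k = proto_code.shape[1]                                   # number of hash bits
    l_gr = (q_t_adv - q_t).pow(2).mean()                      # reconstruction loss L_GR
    # For +-1 codes, dist_H(V_p, h(q'_t)) = (k - <V_p, h(q'_t)>) / 2; normalized by k here.
    l_gh = ((k - (proto_code * adv_code_relaxed).sum(dim=1)) / (2 * k)).mean()
    return l_gr, l_gh
```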
S34: the loss L_D of the discriminator is calculated:
where D(·) denotes the discriminator operation;
finally, the objective function of the discriminator is as follows:
where θ_d denotes the network parameters of the discriminator.
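A hedged sketch of the alternating learning scheme of S40, in which the four parts are updated in turn on each batch against their own objectives; the update order, the one-step-per-part schedule and the loss_fns dictionary of callables are assumptions for illustration:

```python
import torch

def alternating_training_step(batch, hash_model, prototype_net, generator, discriminator,
                              opt_h, opt_p, opt_g, opt_d, loss_fns):
    """One alternating update of the four parts; loss_fns maps part names to loss callables."""
    images, labels, target_labels = batch

    opt_h.zero_grad(); loss_fns["hash"](hash_model, images, labels).backward(); opt_h.step()
    opt_p.zero_grad(); loss_fns["prototype"](prototype_net, target_labels).backward(); opt_p.step()

    with torch.no_grad():                                   # semantic features guide the generator
        f_L, _, proto_code = prototype_net(target_labels)
    adv = generator(images, f_L)

    opt_g.zero_grad()
    loss_fns["generator"](hash_model, discriminator, images, adv, proto_code).backward()
    opt_g.step()

    opt_d.zero_grad()
    loss_fns["discriminator"](discriminator, images, adv.detach(), labels).backward()
    opt_d.step()
```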
Compared with the prior art, the invention has the beneficial effects that:
(1) In order to solve the problems of weak anti-interference capability and insufficient robustness of ViTH, the invention introduces a generative adversarial network on the basis of visual Transformer hashing. In addition, the invention adds a residual fuzzy block to the generator and embeds a fuzzy layer in it to handle the complex relationship between the feature maps and the corresponding generated samples, thereby improving the performance of the model under targeted attack.
(2) Compared with natural images, medical images place higher demands on the extraction of fine-grained features. The invention introduces a spatial pyramid pooling (SPP) layer into ViTH and integrates it with the multi-head attention (MHA) mechanism of the Transformer encoder, so that features of different scales in medical images can be effectively extracted and fused.
(3) To further improve model performance, the prototype network takes image labels as input, extracts semantic features from them, and passes these features to the residual fuzzy generator to assist in generating adversarial examples.
(4) In order to solve the problems of poor robustness and susceptibility to adversarial examples of deep hashing methods, a targeted attack method based on deep hashing for medical image retrieval is proposed and can gradually be put into practical application. With this method, the accuracy and reliability of medical image retrieval can be improved, so that the clinical practice of doctors can be better supported.
Drawings
The accompanying drawings are included to provide a further understanding of the invention and are incorporated in and constitute a part of this specification, illustrate the invention and together with the embodiments of the invention, serve to explain the invention.
FIG. 1 is a schematic flow chart of the fuzzy Transformer hash method for medical image retrieval targeted attack provided by the invention;
FIG. 2 is a block diagram of the fuzzy Transformer hash method for medical image retrieval targeted attack according to the invention;
FIG. 3 is a schematic diagram of the spatial pyramid pooling (SPP) layer according to the invention;
FIG. 4 is a schematic diagram of the multi-head spatial pyramid pooling attention (Multi-Head Spatial Pyramid Attention, MHSPA) module according to the invention;
FIG. 5 is a schematic diagram of the residual fuzzy block (Residual Fuzzy Block) according to the invention;
FIG. 6 is an example of image retrieval on the ISIC 2018 dataset using an original image as the query;
FIG. 7 is an example of image retrieval on the ISIC 2018 dataset using an adversarial example as the query, in accordance with the invention.
Detailed Description
The present invention will be described in further detail with reference to the drawings and examples, in order to make the objects, technical solutions and advantages of the present invention more apparent. Of course, the specific embodiments described herein are for purposes of illustration only and are not intended to limit the invention.
Example 1:
Referring to FIG. 1 to FIG. 7, the present embodiment provides a fuzzy Transformer hash method for medical image retrieval targeted attack, which includes the following steps:
S10: firstly, a dermatological medical image database containing 7 common diseases is established; the samples in the database are preprocessed and augmented to obtain samples of 5 disease categories, and finally the data are divided into a training set T_r, a test set T_e and a database sample set T_d;
S20: a fuzzy Transformer hash model is constructed, which is mainly composed of four parts: a visual Transformer hash model, a prototype network, a residual fuzzy generator and a discriminator; the visual Transformer hash model comprises two modules, a feature learning module and a hash code learning module, wherein the feature learning module is responsible for extracting deep features of medical images and the hash code learning module is responsible for mapping the extracted deep features into hash codes; the prototype network is responsible for extracting the semantic features f_L of the image labels of each class in the database and mapping them into prototype codes; the residual fuzzy generator first deconvolves the semantic features f_L extracted by the prototype network into a feature map F_L with the same size as the original image, and then adds it to the original image q_t to generate an adversarial example q'_t; the discriminator is responsible for discriminating whether its input comes from the original image or the generated adversarial example;
S30: according to the model constructed in step S20, the loss function of each part is calculated: the loss L_H of the visual Transformer hash model, the loss L_P of the prototype network, the loss L_G of the residual fuzzy generator, and the loss L_D of the discriminator;
S40: according to the calculated loss functions of each part of the model, the visual Transformer hash model, the prototype network, the residual fuzzy generator and the discriminator are optimized in turn using an alternating learning algorithm, and the optimized model is saved;
S50: the medical images in the test set T_e are sequentially input into the visual Transformer hash model, the prototype network and the residual fuzzy generator to generate the corresponding prototype codes and adversarial examples, which are then used as query samples to retrieve from the database sample set T_d, and finally the targeted mean average precision (t-MAP) is used to evaluate the targeted attack performance;
Finally, on ISIC 2018 the model achieves t-MAP values of 0.981, 0.984, 0.986, 0.989 and 0.984 at the 5 code lengths (12, 24, 36, 48 and 60 bits) respectively, and the corresponding MAP values reach 0.927, 0.931, 0.935, 0.945 and 0.955; this further confirms that ViTH-RFG plays an important role in improving the targeted attack performance of the model.
Specifically, in step S20, the specific structures of the visual Transformer hash model, the prototype network, the residual fuzzy generator and the discriminator of the fuzzy Transformer hash method for medical image retrieval targeted attack are as follows:
S21: the visual Transformer hash model is mainly responsible for extracting the features of the attacked object and mapping them into hash codes, and a visual Transformer model is adopted in the feature extraction stage;
S211: a power mean transformation (PMT) layer is integrated into the input and output stages of the Transformer encoder to enhance the nonlinearity of the model; if the input and output of the PMT layer are x and y respectively, the PMT transforms x into y = [ln(x+β), ln²(x+β)], where β is a constant (β = 1); during backward propagation, the gradient of the PMT output y = [ln(x+β), ln²(x+β)] with respect to the input is computed using the chain rule, i.e. ∂y/∂x = [1/(x+β), 2·ln(x+β)/(x+β)];
Through forward and backward propagation, the power mean transformation layer is integrated into the whole model; this integration enables the model to learn more complex information during training, thereby enhancing the nonlinearity of the model;
S212: after step S211, the spatial pyramid pooling layer module accepts the input feature map and, through adaptive average pooling layers, generates feature sub-maps of sizes 1×1, 2×2, 4×4 and 16×16; the generated feature sub-maps are flattened and concatenated, which makes it convenient to extract features of different scales from the input feature map and produce multi-scale features, so that information from different regions is integrated;
The spatial pyramid pooling layer module is integrated into the multi-head attention mechanism to form the multi-head spatial pyramid pooling attention (MHSPA) module; in the MHSPA module, multi-scale feature extraction and fusion are first performed on the value V and the key K through the spatial pyramid pooling module; next, the scaled dot-product attention layer processes the query Q and the pooled key-value pair K' and V'; this layer computes the attention scores between the query Q and the key-value pair K' and V';
S213: through step S212, the output of the multi-head spatial pyramid pooling attention module and the output of the Transformer encoder can be obtained; if the dimension of the query Q and the key K' is d_k, the output of the multi-head spatial pyramid pooling attention module can be expressed as:
MHSPA(Q, K', V') = softmax(QK'^T / √d_k) V'
where K' is the result of applying the spatial pyramid pooling operation to K, and V' is the result of applying the spatial pyramid pooling operation to V; MHSPA(·,·,·) denotes the multi-head spatial pyramid pooling attention operation; QK'^T computes the similarity between each query and all keys; the similarity scores are normalized by the softmax(·) function to obtain the attention weight of each query, with the weights summing to 1; the scaling factor √d_k reduces the magnitude of the dot products; the scaled dot-product attention layer not only enhances the ability of the model to handle long-range dependencies in the input, but also alleviates the numerical instability caused by large dot-product magnitudes;
If the input to the Transformer encoder is z_l, l = 1, 2, ..., L, then the output Z_o can be expressed as:
Z_o = MLP(LN(MHSPA(LN(z_l)) + z_l)) + MHSPA(LN(z_l)) + z_l
where MLP(·) denotes the multi-layer perceptron operation and LN(·) denotes the layer normalization operation;
S214: after obtaining the output of the Transformer encoder according to step S213, features are generated through a power mean transformation layer, and classification and hash code generation are then performed through three fully connected layers;
S22: the prototype network is mainly responsible for extracting the semantic features of labels and generating the corresponding prototype codes; its input is the unique label set L_u of the image classes in the training set T_r, where n_u denotes the number of unique labels and c = 5 denotes the number of classes; firstly, the semantic features f_L are extracted from the label set L_u through two fully connected layers; secondly, the semantic features f_L pass through a further fully connected layer to obtain the classification information and the corresponding prototype codes; finally, the semantic features f_L serve as part of the input of the residual fuzzy generator to assist the generation of adversarial examples;
S23: the input of the residual fuzzy generator consists of two parts: one part is the feature map F_L, which has the same size as the original image and is obtained by passing the semantic features f_L extracted by the prototype network through 5 deconvolution layers; the other part is the original image q_t; taking (q_t + F_L) as a new input, a feature map F'_L with the same size as the original image is formed through 3 convolution layers, 1 residual fuzzy block and 2 deconvolution layers; then (q_t + F'_L) is processed by 1 convolution layer to form the adversarial example q'_t with the same size as the original image;
S231: the residual fuzzy block in step S23 is an important component of the residual fuzzy generator and is formed by stacking 6 identical residual fuzzy units; in each residual fuzzy unit, the input first passes through 2 convolution layers and is then combined with the input feature map through a skip connection to obtain the feature map F_R; F_R then passes through the fuzzy layer Fl(·) and is combined with F_R through another skip connection to obtain the feature map F_R + Fl(F_R); finally, F_R + Fl(F_R) serves as the new input and passes through the remaining residual fuzzy units in turn to obtain the blurred feature map F'_R; the input of the fuzzy layer is a feature map of size H_R × W_R × C_R, where H_R, W_R and C_R denote the height, width and number of channels of the input feature map, respectively;
S232: the feature map of each channel obtained in step S231 is converted into a one-dimensional feature vector, and every feature point in the feature vector of each channel undergoes M fuzzy membership function calculations; each calculation of the fuzzy membership function assigns a fuzzy label to the feature point, namely:
Z_{i,k,c} = exp(-(x_{i,c} - μ_{k,c})² / (2σ_{k,c}²)), i = 1, ..., H_R×W_R; c = 1, ..., C_R; k = 1, ..., M
where x_{i,c} denotes the i-th feature point in the c-th channel, μ_{k,c} and σ_{k,c} denote the mean and standard deviation of the c-th channel in the k-th Gaussian fuzzy membership function, and Z_{i,k,c} denotes the fuzzy label assigned by the k-th Gaussian fuzzy membership function to the i-th feature point in the c-th channel;
The fuzzy labels calculated by the M Gaussian fuzzy membership functions are then merged through the fuzzy logic AND operation to obtain Z_{i,c}, i = 1, ..., H_R×W_R; c = 1, ..., C_R, where Z_{i,c} denotes the fuzzy degree; the fuzzy degree Z_{i,c} is then reshaped to obtain the blurred feature map Fl(F_R);
S24: the discriminator takes the original image q_t and the adversarial example q'_t as input and passes them through 5 convolution layers and 3 fully connected layers in turn; in the last fully connected layer it predicts the label of the image and determines whether the input comes from the original image or from the adversarial example.
Specifically, in step S30, the loss L_H of the visual Transformer hash model, the loss L_P of the prototype network, the loss L_G of the residual fuzzy generator and the loss L_D of the discriminator are as follows:
S31: the loss L_H of the visual Transformer hash model is calculated; it is divided into a pairwise loss L_HP, a quantization loss L_HQ, a balance loss L_HB and a classification loss L_HC, and the objective function of the visual Transformer hash model is as follows:
s.t. V ∈ {-1, +1}^{n×k}
where α_h = 100, β_h = 500 and γ_h = 10 are insensitive hyper-parameters; n is the number of database samples, k is the number of hash bits, and V represents the hash codes of the database samples;
S311: first, the pairwise loss L_HP is calculated:
s.t. U = [u_1, u_2, ..., u_m]^T ∈ {-1, +1}^{m×k}, V = [v_1, v_2, ..., v_n]^T ∈ {-1, +1}^{n×k}
where u_i and v_j denote the hash codes of the i-th training sample and the j-th database sample respectively, U and V denote the hash codes of the training samples and the database samples, S denotes the similarity matrix between the query set and the database, u_t = h(q_t) = sign(Φ(q_t; θ_h)), h(·) denotes the hash function, q_t denotes the original image, Φ(q_t; θ_h) ∈ R^k denotes the output of the hash layer, k denotes the number of bits of the hash code, m is the number of training samples, and θ_h denotes the network parameters of the fully connected layers before the hash layer;
Considering that solving for the sign function sign(·) in L_HP is a discrete optimization problem, the sign function is approximated by the hyperbolic tangent function tanh(·), i.e. u_t is relaxed to tanh(Φ(q_t; θ_h));
S312: since the hyperbolic tangent function tanh(·) is used to approximate the sign(·) function in the pairwise loss L_HP, a regularization term is added between the real-valued output and the hash code, namely the quantization loss L_HQ:
S313: in order to make the hash codes fill the entire 2^k Hamming space as much as possible and to guarantee the balance of each bit, a balance loss L_HB is proposed to ensure that -1 and +1 appear with as equal probability as possible on each bit, defined as follows:
where the mean(·) function is used to calculate the average of the elements in a vector;
S314: because the number of labels differs greatly across images, different loss functions are used for single-label and multi-label images during training:
(1) multi-label classification loss L_multi:
(2) single-label classification loss L_single:
where pred(q_t; θ_c) denotes the predicted label of image q_t, l_t denotes the true label of image q_t, and θ_c denotes the network parameters of the fully connected layers before the classification layer;
therefore, the classification loss of the visual Transformer hash model is L_HC = L_multi + L_single;
S32: the loss L_P of the prototype network is calculated; it is divided into a pairwise likelihood loss L_PP, a quantization loss L_PQ and a classification loss L_PC, and the objective function of the prototype network is as follows:
where α_p, β_p and γ_p (with γ_p = 1) are insensitive hyper-parameters, and θ_p denotes the network parameters of the prototype network;
S321: first, the pairwise likelihood loss L_PP is calculated:
where the similarity matrix between the label set L_u of the image classes in the database and the label set L_Q of the query set is used, n_u is the number of prototype codes, U ∈ {-1, +1}^{m×k}, and V_p is the prototype code;
S322: a regularization term, i.e. the quantization loss, is introduced between the real-valued output f_p of the prototype network and the prototype code V_p:
L_PQ = ||V_p - f_p||² = ||sign(f_p) - f_p||²
S323: similar to the visual Transformer hash model, and taking into account the variation in the number of labels across images, the classification loss of the prototype network combines a multi-label classification loss and a single-label classification loss:
(1) multi-label classification loss L_mul-p:
(2) single-label classification loss L_sin-p:
where the predicted labels are compared with L_u, the labels of the image classes;
therefore, the classification loss of the prototype network is L_PC = L_mul-p + L_sin-p;
S33: the loss L_G of the residual fuzzy generator is calculated; it is divided into a reconstruction loss L_GR and a Hamming loss L_GH:
where q_t denotes the original image and q'_t denotes the corresponding adversarial example;
where V_p denotes the prototype code, h(·) denotes the hash function, h(q'_t) denotes the hash code of the adversarial example q'_t, dist_H(·) denotes the Hamming distance function, and k denotes the number of bits of the hash code;
finally, the objective function of the residual fuzzy generator is as follows:
where α_g = 25 and β_g = 5 are insensitive hyper-parameters, and θ_g denotes the network parameters of the residual fuzzy generator;
S34: the loss L_D of the discriminator is calculated:
where D(·) denotes the discriminator operation;
finally, the objective function of the discriminator is as follows:
where θ_d denotes the network parameters of the discriminator.
The foregoing description of the preferred embodiments of the invention is not intended to limit the invention to the precise form disclosed, and any such modifications, equivalents, and alternatives falling within the spirit and scope of the invention are intended to be included within the scope of the invention.
Claims (3)
1. A fuzzy Transformer hash method for medical image retrieval target attacks, comprising the steps of:
S10: firstly, a medical image database is established, then samples in the database are preprocessed and expanded, and finally a training set T r, a test set T e and a database sample set T d are divided;
S20: a fuzzy transform hash model is constructed, which consists of four parts: a visual transducer hash model, a prototype network, a residual fuzzy generator and a discriminator; the visual transducer hash model comprises two modules, namely a feature learning module and a hash code learning module, wherein the feature learning module is responsible for extracting depth features of medical images, and the hash code learning module is responsible for mapping the extracted depth features into hash codes; the prototype network is responsible for extracting semantic features f L of various image labels in the database and mapping the semantic features f L into prototype codes; the residual ambiguity generator firstly carries out deconvolution on semantic features F L extracted by a prototype network to form a feature map F L with the same size as an original image, and then adds the feature map F L with the original image q t to generate a contrast sample q t'; the discriminator is then responsible for discriminating whether its input is from the original image or the generated challenge sample;
S30: respectively calculating loss functions of all the parts according to the model constructed in the step S20: loss of visual transducer hash model L H, loss of prototype network L P, loss of residual blur generator L G, loss of arbiter L D;
s40: according to the loss function of each part of the model, sequentially optimizing a visual transducer hash model, a prototype network, a residual error fuzzy generator and a discriminator by using an alternative learning algorithm, and storing the optimized model;
S50: medical images in the test set T e are sequentially input into a visual transducer hash model, a prototype network and a residual error fuzzy generator to generate corresponding prototype codes and countermeasure samples, the prototype codes and the countermeasure samples are used as query samples to be retrieved from the database sample set T d, and finally target average precision T-MAP is used for evaluating target attack performance.
2. The fuzzy Transformer hash method for medical image retrieval targeted attack of claim 1, wherein the specific structures of the visual Transformer hash model, the prototype network, the residual fuzzy generator and the discriminator are as follows:
S21: the visual Transformer hash model is mainly responsible for extracting the features of the attacked object and mapping them into hash codes, and a visual Transformer model is adopted in the feature extraction stage;
S211: a power mean transformation (PMT) layer is integrated into the input and output stages of the Transformer encoder to enhance the nonlinearity of the model; if the input and output of the PMT layer are x and y respectively, the PMT transforms x into y = [ln(x+β), ln²(x+β)], where β is a constant; during backward propagation, the gradient of the PMT output y = [ln(x+β), ln²(x+β)] with respect to the input is computed using the chain rule, i.e. ∂y/∂x = [1/(x+β), 2·ln(x+β)/(x+β)];
Through forward and backward propagation, the power mean transformation layer is integrated into the whole model; this integration enables the model to learn more complex information during training, thereby enhancing the nonlinearity of the model;
S212: after step S211, the spatial pyramid pooling layer module accepts the input feature map and, through adaptive average pooling layers, generates feature sub-maps of sizes 1×1, 2×2, 4×4 and 16×16; the generated feature sub-maps are flattened and concatenated, which makes it convenient to extract features of different scales from the input feature map and produce multi-scale features, so that information from different regions is integrated;
The spatial pyramid pooling layer module is integrated into the multi-head attention mechanism to form the multi-head spatial pyramid pooling attention (MHSPA) module; in the MHSPA module, multi-scale feature extraction and fusion are first performed on the value V and the key K through the spatial pyramid pooling module; next, the scaled dot-product attention layer processes the query Q and the pooled key-value pair K' and V'; this layer computes the attention scores between the query Q and the key-value pair K' and V';
S213: through step S212, the output of the multi-head spatial pyramid pooling attention module and the output of the Transformer encoder are obtained; if the dimension of the query Q and the key K' is d_k, the output of the multi-head spatial pyramid pooling attention module is expressed as:
MHSPA(Q, K', V') = softmax(QK'^T / √d_k) V'
where K' is the result of applying the spatial pyramid pooling operation to K, and V' is the result of applying the spatial pyramid pooling operation to V; MHSPA(·,·,·) denotes the multi-head spatial pyramid pooling attention operation; QK'^T computes the similarity between each query and all keys; the similarity scores are normalized by the softmax(·) function to obtain the attention weight of each query, with the weights summing to 1; the scaling factor √d_k reduces the magnitude of the dot products; the scaled dot-product attention layer not only enhances the ability of the model to handle long-range dependencies in the input, but also alleviates the numerical instability caused by large dot-product magnitudes;
If the input to the Transformer encoder is z_l, l = 1, 2, ..., L, then the output Z_o is expressed as:
Z_o = MLP(LN(MHSPA(LN(z_l)) + z_l)) + MHSPA(LN(z_l)) + z_l
where MLP(·) denotes the multi-layer perceptron operation and LN(·) denotes the layer normalization operation;
S214: after obtaining the output of the Transformer encoder according to step S213, features are generated through a power mean transformation layer, and classification and hash code generation are then performed through three fully connected layers;
S22: the prototype network is mainly responsible for extracting the semantic features of labels and generating the corresponding prototype codes; its input is the unique label set L_u of the image classes in the training set T_r, where n_u denotes the number of unique labels and c denotes the number of classes; firstly, the semantic features f_L are extracted from the label set L_u through two fully connected layers; secondly, the semantic features f_L pass through a further fully connected layer to obtain the classification information and the corresponding prototype codes; finally, the semantic features f_L serve as part of the input of the residual fuzzy generator to assist the generation of adversarial examples;
S23: the input of the residual fuzzy generator consists of two parts: one part is the feature map F_L, which has the same size as the original image and is obtained by passing the semantic features f_L extracted by the prototype network through 5 deconvolution layers; the other part is the original image q_t; taking (q_t + F_L) as a new input, a feature map F'_L with the same size as the original image is formed through 3 convolution layers, 1 residual fuzzy block and 2 deconvolution layers; then (q_t + F'_L) is processed by 1 convolution layer to form the adversarial example q'_t with the same size as the original image;
S231: the residual fuzzy block in step S23 is an important component of the residual fuzzy generator and is formed by stacking 6 identical residual fuzzy units; in each residual fuzzy unit, the input first passes through 2 convolution layers and is then combined with the input feature map through a skip connection to obtain the feature map F_R; F_R then passes through the fuzzy layer Fl(·) and is combined with F_R through another skip connection to obtain the feature map F_R + Fl(F_R); finally, F_R + Fl(F_R) serves as the new input and passes through the remaining residual fuzzy units in turn to obtain the blurred feature map F'_R; the input of the fuzzy layer is a feature map of size H_R × W_R × C_R, where H_R, W_R and C_R denote the height, width and number of channels of the input feature map, respectively;
S232: the feature map of each channel obtained in step S231 is converted into a one-dimensional feature vector, and every feature point in the feature vector of each channel undergoes M fuzzy membership function calculations; each calculation of the fuzzy membership function assigns a fuzzy label to the feature point, namely:
Z_{i,k,c} = exp(-(x_{i,c} - μ_{k,c})² / (2σ_{k,c}²)), i = 1, ..., H_R×W_R; c = 1, ..., C_R; k = 1, ..., M
where x_{i,c} denotes the i-th feature point in the c-th channel, μ_{k,c} and σ_{k,c} denote the mean and standard deviation of the c-th channel in the k-th Gaussian fuzzy membership function, and Z_{i,k,c} denotes the fuzzy label assigned by the k-th Gaussian fuzzy membership function to the i-th feature point in the c-th channel;
The fuzzy labels calculated by the M Gaussian fuzzy membership functions are then merged through the fuzzy logic AND operation to obtain Z_{i,c}, i = 1, ..., H_R×W_R; c = 1, ..., C_R, where Z_{i,c} denotes the fuzzy degree; the fuzzy degree Z_{i,c} is then reshaped to obtain the blurred feature map Fl(F_R);
S24: the discriminator takes the original image q_t and the adversarial example q'_t as input and passes them through 5 convolution layers and 3 fully connected layers in turn; in the last fully connected layer it predicts the label of the image and determines whether the input comes from the original image or from the adversarial example.
3. The fuzzy Transformer hash method for medical image retrieval targeted attack according to claim 1, wherein the loss L_H of the visual Transformer hash model, the loss L_P of the prototype network, the loss L_G of the residual fuzzy generator and the loss L_D of the discriminator are specifically as follows:
S31: the loss L_H of the visual Transformer hash model is calculated; it is divided into a pairwise loss L_HP, a quantization loss L_HQ, a balance loss L_HB and a classification loss L_HC, and the objective function of the visual Transformer hash model is as follows:
s.t. V ∈ {-1, +1}^{n×k}
where α_h, β_h and γ_h are insensitive hyper-parameters; n is the number of database samples, k is the number of hash bits, and V represents the hash codes of the database samples;
S311: first, the pairwise loss L_HP is calculated:
s.t. U = [u_1, u_2, ..., u_m]^T ∈ {-1, +1}^{m×k}, V = [v_1, v_2, ..., v_n]^T ∈ {-1, +1}^{n×k}
where u_i and v_j denote the hash codes of the i-th training sample and the j-th database sample respectively, U and V denote the hash codes of the training samples and the database samples, S denotes the similarity matrix between the query set and the database, u_t = h(q_t) = sign(Φ(q_t; θ_h)), h(·) denotes the hash function, q_t denotes the original image, Φ(q_t; θ_h) ∈ R^k denotes the output of the hash layer, k denotes the number of bits of the hash code, m is the number of training samples, and θ_h denotes the network parameters of the fully connected layers before the hash layer;
Considering that solving for the sign function sign(·) in L_HP is a discrete optimization problem, the sign function is approximated by the hyperbolic tangent function tanh(·), i.e. u_t is relaxed to tanh(Φ(q_t; θ_h));
S312: since the hyperbolic tangent function tanh(·) is used to approximate the sign(·) function in the pairwise loss L_HP, a regularization term is added between the real-valued output and the hash code, namely the quantization loss L_HQ:
S313: in order to make the hash codes fill the entire 2^k Hamming space as much as possible and to guarantee the balance of each bit, a balance loss L_HB is proposed to ensure that -1 and +1 appear with as equal probability as possible on each bit, defined as follows:
where the mean(·) function is used to calculate the average of the elements in a vector;
S314: because the number of labels differs greatly across images, different loss functions are used for single-label and multi-label images during training:
(1) multi-label classification loss L_multi:
(2) single-label classification loss L_single:
where pred(q_t; θ_c) denotes the predicted label of image q_t, l_t denotes the true label of image q_t, and θ_c denotes the network parameters of the fully connected layers before the classification layer;
therefore, the classification loss of the visual Transformer hash model is L_HC = L_multi + L_single;
S32: the loss L P of the prototype network is calculated, and is specifically divided into a pair likelihood loss L PP, a quantization loss L PQ and a classification loss L PC, and the objective function of the prototype network is as follows:
Where α p、αp and γ p are insensitive super parameters, θ p represents the network parameters of the prototype network;
s321: first, the pair likelihood loss L PP:
Wherein, Similarity matrix between the tag set L u representing various images in the database and the tag set L Q of the query set, n u is the number of prototype codes,/>U epsilon-1, +1} m×k,Vp is a prototype code;
s322: a regularization term, i.e. quantization loss, is introduced between the real value output f p of the prototype network and the prototype code V p:
LPQ=||Vp-fp||2
=||sign(fp)-fp||2
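This step's formula survives in the text, so the sketch below simply implements L_PQ = ||sign(f_p) - f_p||^2; the shapes are illustrative:

```python
import torch

def prototype_quantization_loss(f_p):
    # L_PQ = ||sign(f_p) - f_p||^2: distance between the real-valued prototype
    # output and the binary prototype code V_p = sign(f_p).
    return ((torch.sign(f_p) - f_p) ** 2).sum()

f_p = torch.randn(5, 16, requires_grad=True)   # n_u prototype outputs, k bits each
prototype_quantization_loss(f_p).backward()
```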
S323: similar to the visual transducer hash model, the classification loss of the prototype network combines multi-label classification loss and single-label classification loss, taking into account the variation in the number of labels on different images:
(1) Multi-label classification loss L_mul-p:
(2) Single-label classification loss L_sin-p:
Wherein the predicted labels are output by the classification layer of the prototype network, and L_u represents the labels of the various classes of images;
Therefore, the classification loss of the prototype network is L_PC = L_mul-p + L_sin-p;
S33: the loss L G of the residual blurring generator is calculated, and the method is specifically divided into a reconstruction loss L GR and a hamming loss L GH:
where q t represents the original image and q' t represents the corresponding challenge sample;
Wherein V p represents the prototype code, h (·) represents the hash function, h (q 't) represents the hash code against the sample q t', dist H (·) represents the hamming distance function, and k represents the number of bits of the hash code;
Finally, the objective function of the residual fuzzy generator is as follows:
Wherein α_g and β_g are insensitive hyper-parameters, and θ_g represents the network parameters of the residual fuzzy generator;
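The reconstruction, Hamming and combined generator formulas are not reproduced above; a sketch under the assumptions that L_GR is a pixel-wise distance between q_t and q'_t, that L_GH is a differentiable surrogate of the normalized Hamming distance between h(q'_t) and the prototype code V_p (for ±1 codes, dist_H(a, b) = (k - a·b)/2), and that the two terms are weighted by α_g and β_g; all names and the weighting scheme are assumptions:

```python
import torch
import torch.nn.functional as F

def reconstruction_loss(q_adv, q_orig):
    # L_GR sketch: keep the adversarial sample visually close to the original image.
    return F.mse_loss(q_adv, q_orig)

def hamming_loss(u_adv, V_p):
    # L_GH sketch: differentiable surrogate of dist_H(V_p, h(q'_t)) / k,
    # using the tanh-relaxed code u_adv in place of the binary h(q'_t).
    k = u_adv.shape[-1]
    return ((k - (u_adv * V_p).sum(dim=-1)) / (2 * k)).mean()

q_t   = torch.rand(2, 3, 224, 224)                          # original images
delta = torch.zeros_like(q_t, requires_grad=True)           # perturbation from the generator
q_adv = q_t + delta                                         # adversarial samples q'_t
u_adv = torch.tanh(torch.randn(2, 16, requires_grad=True))  # relaxed hash codes of q'_t
V_p   = torch.sign(torch.randn(2, 16))                      # target prototype codes
alpha_g, beta_g = 1.0, 1.0                                  # placeholder hyper-parameters
L_G = alpha_g * reconstruction_loss(q_adv, q_t) + beta_g * hamming_loss(u_adv, V_p)
L_G.backward()
```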
S34: calculating loss L D of the discriminator:
Wherein, D (·) represents the arbiter operation;
finally, the objective function of the arbiter is as follows:
where θ d represents the network parameters of the arbiter.
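The discriminator formulas are also not reproduced; the sketch assumes the standard GAN binary cross-entropy objective in which D(·) scores the original image q_t as real and the adversarial sample q'_t as generated. The logit-based form is an assumption:

```python
import torch
import torch.nn.functional as F

def discriminator_loss(d_real_logit, d_fake_logit):
    # L_D sketch: classify q_t as real (label 1) and q'_t as generated (label 0).
    real = F.binary_cross_entropy_with_logits(d_real_logit, torch.ones_like(d_real_logit))
    fake = F.binary_cross_entropy_with_logits(d_fake_logit, torch.zeros_like(d_fake_logit))
    return real + fake

d_real = torch.randn(4, 1, requires_grad=True)   # D(q_t)
d_fake = torch.randn(4, 1, requires_grad=True)   # D(q'_t)
discriminator_loss(d_real, d_fake).backward()
```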
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202410234959.5A CN118093911A (en) | 2024-03-01 | 2024-03-01 | Fuzzy Transformer hash method for medical image retrieval target attack |
Publications (1)
Publication Number | Publication Date |
---|---|
CN118093911A true CN118093911A (en) | 2024-05-28 |
Family
ID=91162776
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202410234959.5A Pending CN118093911A (en) | 2024-03-01 | 2024-03-01 | Fuzzy Transformer hash method for medical image retrieval target attack |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN118093911A (en) |
Patent Citations (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
WO2022037295A1 (en) * | 2020-08-20 | 2022-02-24 | 鹏城实验室 | Targeted attack method for deep hash retrieval and terminal device |
CN113204522A (en) * | 2021-07-05 | 2021-08-03 | 中国海洋大学 | Large-scale data retrieval method based on Hash algorithm combined with generation countermeasure network |
CN116128846A (en) * | 2023-02-01 | 2023-05-16 | 南通大学 | Visual transducer hash method for lung X-ray image retrieval |
Non-Patent Citations (1)
Title |
---|
WEIPING DING et al.: "ViTH-RFG: Vision Transformer Hashing with Residual Fuzzy Generation for Targeted Attack in Medical Image Retrieval", IEEE TRANSACTIONS ON FUZZY SYSTEMS (EARLY ACCESS), 14 December 2023 (2023-12-14), pages 1-15 |
Cited By (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN118396985A (en) * | 2024-06-25 | 2024-07-26 | 吉林大学 | Robust segmentation method and system for breast ultrasound image aiming at attack resistance |
CN118396985B (en) * | 2024-06-25 | 2024-09-03 | 吉林大学 | Robust segmentation method and system for breast ultrasound image aiming at attack resistance |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||